Category Archives: AWS

Amazon Redshift adds new AI capabilities, including Amazon Q, to boost efficiency and productivity

Amazon Redshift puts artificial intelligence (AI) at your service to optimize efficiencies and make you more productive with two new capabilities that we are launching in preview today.

First, Amazon Redshift Serverless becomes smarter. It scales capacity proactively and automatically along dimensions such as the complexity of your queries, their frequency, the size of the dataset, and so on to deliver tailored performance optimizations. This allows you to spend less time tuning your data warehouse instances and more time getting value from your data.

Second, Amazon Q generative SQL in Amazon Redshift Query Editor generates SQL recommendations from natural language prompts. This helps you to be more productive in extracting insights from your data.

Let’s start with Amazon Redshift Serverless
When you use Amazon Redshift Serverless, you can now opt in for a preview of AI-driven scaling and optimizations. When enabled, the system observes and learns from your usage patterns, such as the concurrent number of queries, their complexity, and the time it takes to run them. Then, it automatically optimizes your serverless endpoint to meet your price performance target. Based on AWS internal testing, this new capability may give you up to ten times better price performance for variable workloads without any manual intervention.

AI-driven scaling and optimizations eliminate the time and effort to manually resize your workgroup and plan background optimizations based on workload needs. It continually runs automatic optimizations when they are most valuable for better performance, avoiding performance cliffs and time-outs.

This new capability goes beyond the existing self-tuning capabilities of Amazon Redshift Serverless, such as machine learning (ML)-enhanced techniques to adjust your compute, modify the physical schema of the database, create or drop materialized views as needed (the ones we manage automatically, not yours), and vacuum tables. This new capability brings more intelligence to decide how to adjust the compute, what background optimizations are required, and when to apply them, and it makes its decisions based on more dimensions. We also orchestrate ML-based optimizations for materialized views, table optimizations, and workload management when your queries need it.

During the preview, you must opt in to enable these AI-driven scaling and optimizations on your workgroups. You configure the system to balance the optimization for price or performance. There is only one slider to adjust in the console.

Redshift serverless - AI driven workgroups

As usual, you can track resource usage and associated changes through the console, Amazon CloudWatch metrics, and the system table SYS_SERVERLESS_USAGE.
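
If you prefer to query that system table programmatically, here is a minimal sketch using the Redshift Data API from the AWS SDK for Python (Boto3). The workgroup and database names are placeholders for illustration, and the columns you select will depend on what you want to track.

import time
import boto3

redshift_data = boto3.client("redshift-data")

# Submit a query against the serverless workgroup (names are hypothetical)
statement = redshift_data.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="SELECT * FROM sys_serverless_usage ORDER BY start_time DESC LIMIT 10;",
)

# Wait for the statement to finish, then fetch the result set
while True:
    description = redshift_data.describe_statement(Id=statement["Id"])
    if description["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

result = redshift_data.get_statement_result(Id=statement["Id"])
for row in result["Records"]:
    print(row)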

Now, let’s look at Amazon Q generative SQL in Amazon Redshift Query Editor
What if you could use generative AI to help analysts write effective SQL queries more rapidly? This is the new experience we introduce today in Amazon Redshift Query Editor, our web-based SQL editor.

You can now describe the information you want to extract from your data in natural language, and we generate the SQL query recommendations for you. Behind the scenes, Amazon Q generative SQL uses a large language model (LLM) and Amazon Bedrock to generate the SQL query. We use different techniques, such as prompt engineering and Retrieval Augmented Generation (RAG), to query the model based on your context: the database you’re connected to, the schema you’re working on, your query history, and optionally the query history of other users connected to the same endpoint. The system also remembers previous questions. You can ask it to refine a previously generated query.

The SQL generation model uses metadata specific to your data schema to generate relevant queries. For example, it uses the table and column names and the relationship between the tables in your database. In addition, your database administrator can authorize the model to use the query history of all users in your AWS account to generate even more relevant SQL statements. We don’t share your query history with other AWS accounts and we don’t train our generation models with any data coming from your AWS account. We maintain the high level of privacy and security that you expect from us.

Using generated SQL queries helps you to get started when discovering new schemas. It does the heavy lifting of discovering the column names and relationships between tables for you. Senior analysts also benefit from asking what they want in natural language and having the SQL statement automatically generated. They can review the queries and run them directly from their notebook.

Let’s explore a schema and extract information
For this demo, let’s pretend I am a data analyst at a company that sells concert tickets. The database schema and data are available for you to download. My manager asks me to analyze the ticket sales data to send a thank you note with discount coupons to the highest-spending customers in Seattle.

I open Amazon Redshift Query Editor and connect to the analytics endpoint. I create a new tab for a Notebook (SQL generation is available from notebooks only).

Instead of writing a SQL statement, I open the chat panel and type, “Find the top five users from Seattle who bought the most number of tickets in 2022.” I take the time to verify the generated SQL statement. It seems correct, so I decide to run it. I select Add to notebook and then Run. The query returns the list of the top five buyers in Seattle.

sql generation - top 5 users

I had no previous knowledge of the data schema, and I did not type a single line of SQL to find the information I needed.

But generative SQL is not limited to a single interaction. I can chat with it to dynamically refine the queries. Here is another example.

I ask “Which state has the most venues?” Generative SQL proposes the following query. The answer is New York, with 49 venues, if you’re curious.

generative sql chat 01
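
For reference, a hand-written sketch of the kind of statement the assistant produces for this question might look like the following, assuming the venue table from the sample ticket sales schema; it is not the assistant's actual output.

generated_sql = """
SELECT venuestate, COUNT(*) AS venue_count
FROM venue
GROUP BY venuestate
ORDER BY venue_count DESC
LIMIT 1;
"""
# Add the statement to the notebook and run it from the Query Editor,
# or execute it with the Redshift Data API as in the earlier sketch.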

I change my mind, and I want to know the top three cities with the most venues. I simply rephrase my question: “What about the top three venues?”

generative sql chat 02

I add the query to the notebook and run it. It returns the expected result.

generative sql chat 03

Best practices for prompting
Here are a couple of tips and tricks to get the best results out of your prompts.

Be specific – When asking questions in natural language, be as specific as possible to help the system understand exactly what you need. For example, instead of writing “find the top venues that sold the most tickets,” provide more details like “find the names of the top three venues that sold the most tickets in 2022.” Use consistent entity names like venue, ticket, and location instead of referring to the same entity in different ways, which can confuse the system.

Iterate – Break your complex requests into multiple simple statements that are easier for the system to interpret. Iteratively ask follow-up questions to get more detailed analysis from the system. For example, start by asking, “Which state has the most venues?” Then, based on the response, ask a follow-up question like “Which is the most popular venue from this state?”

Verify – Review the generated SQL before running it to ensure accuracy. If the generated SQL query has errors or does not match your intent, provide instructions to the system on how to correct it instead of rephrasing the entire request. For example, if the query is missing a filter clause on year, write “provide venues from year 2022.”

Availability and pricing
AI-driven scaling and optimizations are in preview in six AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland, Stockholm). They come at no additional cost. You pay only for the compute capacity your data warehouse consumes when it is active. Pricing is per Redshift Processing Unit (RPU) per hour. The billing is per second of used capacity. The pricing page for Amazon Redshift has the details.

Amazon Q generative SQL for Amazon Redshift Query Editor is in preview in two AWS Regions today: US East (N. Virginia) and US West (Oregon). There is no charge during the preview period.

These are two examples of how AI helps to optimize performance and increase your productivity, either by automatically adjusting the price-performance ratio of your Amazon Redshift Serverless endpoints or by generating correct SQL statements from natural language prompts.

Previews are essential for us to capture your feedback before we make these capabilities available for all. Experiment with these today and let us know what you think on the re:Post forums or using the feedback button on the bottom left side of the console.

— seb

AWS Clean Rooms ML helps customers and partners apply ML models without sharing raw data (preview)

Today, we’re introducing AWS Clean Rooms ML (preview), a new capability of AWS Clean Rooms that helps you and your partners apply machine learning (ML) models on your collective data without copying or sharing raw data with each other. With this new capability, you can generate predictive insights using ML models while continuing to protect your sensitive data.

During this preview, AWS Clean Rooms ML introduces its first model specialized to help companies create lookalike segments for marketing use cases. With AWS Clean Rooms ML lookalike, you can train your own custom model, and you can invite partners to bring a small sample of their records to collaborate and generate an expanded set of similar records while protecting everyone’s underlying data.

In the coming months, AWS Clean Rooms ML will release a healthcare model. This will be the first of many models that AWS Clean Rooms ML will support next year.

AWS Clean Rooms ML helps you unlock various opportunities to generate insights. For example:

  • Airlines can take signals about loyal customers, collaborate with online booking services, and offer promotions to users with similar characteristics.
  • Auto lenders and car insurers can identify prospective auto insurance customers who share characteristics with a set of existing lease owners.
  • Brands and publishers can model lookalike segments of in-market customers and deliver highly relevant advertising experiences.
  • Research institutions and hospital networks can find candidates similar to existing clinical trial participants to accelerate clinical studies (coming soon).

AWS Clean Rooms ML lookalike modeling helps you apply an AWS managed, ready-to-use model that is trained in each collaboration to generate lookalike datasets in a few clicks, saving months of development work to build, train, tune, and deploy your own model.

How to use AWS Clean Rooms ML to generate predictive insights
Today, I will show you how to use lookalike modeling in AWS Clean Rooms ML, assuming you have already set up a data collaboration with your partner. If you want to learn how to do that, check out the AWS Clean Rooms Now Generally Available — Collaborate with Your Partners without Sharing Raw Data post.

With your collective data in the AWS Clean Rooms collaboration, you can work with your partners to apply ML lookalike modeling to generate a lookalike segment. It works by taking a small sample of representative records from your data, creating a machine learning (ML) model, and then applying that model to identify an expanded set of similar records from your business partner’s data.

The following screenshot shows the overall workflow for using AWS Clean Rooms ML.

By using AWS Clean Rooms ML, you don’t need to build complex and time-consuming ML models on your own. AWS Clean Rooms ML trains a custom, private ML model, which saves months of your time while still protecting your data.

Eliminating the need to share data
As ML models are natively built within the service, AWS Clean Rooms ML helps you protect your dataset and your customers’ information because you don’t need to share your data to build your ML model.

You can specify the training dataset using the AWS Glue Data Catalog table, which contains user-item interactions.

Under Additional columns to train, you can define numerical and categorical data. This is useful if you need to add more features to your dataset, such as the number of seconds spent watching a video, the topic of an article, or the product category of an e-commerce item.

Applying custom-trained AWS-built models
Once you have defined your training dataset, you can now create a lookalike model. A lookalike model is a machine learning model used to find similar profiles in your partner’s dataset without either party having to share their underlying data with each other.

When creating a lookalike model, you need to specify the training dataset. From a single training dataset, you can create many lookalike models. You also have the flexibility to define the date window in your training dataset using Relative range or Absolute range. This is useful when you have data that is constantly updated within AWS Glue, such as articles read by users.

Easy-to-tune ML models
After you create a lookalike model, you need to configure it to use in AWS Clean Rooms collaboration. AWS Clean Rooms ML provides flexible controls that enable you and your partners to tune the results of the applied ML model to garner predictive insights.

On the Configure lookalike model page, you can choose which Lookalike model you want to use and define the Minimum matching seed size you need. This seed size defines the minimum number of profiles in your seed data that overlap with profiles in the training data.

You also have the flexibility to choose whether the partner in your collaboration receives metrics in Metrics to share with other members.

With your lookalike models properly configured, you can now make the ML models available for your partners by associating the configured lookalike model with a collaboration.

Creating lookalike segments
Once the lookalike models have been associated, your partners can now start generating insights by selecting Create lookalike segment and choosing the associated lookalike model for your collaboration.

Here on the Create lookalike segment page, your partners need to provide the Seed profiles. Examples of seed profiles include your top customers or all customers who purchased a specific product. The resulting lookalike segment will contain profiles from the training data that are most similar to the profiles from the seed.

Lastly, your partner gets the Relevance metrics for the lookalike segment generated by the ML model. At this stage, they can use the score to make a decision about the segment.

Export data and use programmatic API
You also have the option to export the lookalike segment data. Once it’s exported, the data is available in JSON format and you can process this output by integrating with AWS Clean Rooms API and your applications.
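
As a minimal sketch of that last step, you could read the exported JSON with the AWS SDK for Python (Boto3) and hand it to your downstream application; this assumes the export lands in an S3 bucket you control, and the bucket, key, and output structure below are hypothetical placeholders.

import json
import boto3

s3 = boto3.client("s3")

# Hypothetical location where the lookalike segment export was written
export = s3.get_object(
    Bucket="my-cleanrooms-exports",
    Key="lookalike-segments/output.json",
)
segment = json.loads(export["Body"].read())

# Inspect the structure before wiring it into your application
print(type(segment).__name__)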

Join the preview
AWS Clean Rooms ML is now in preview and available via AWS Clean Rooms in US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Seoul, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland, London). Support for additional models is in the works.

Learn how to apply machine learning with your partners without sharing underlying data on the AWS Clean Rooms ML page.

Happy collaborating!
— Donnie

Announcing Amazon OpenSearch Service zero-ETL integration with Amazon S3 (preview)

Today we are announcing a preview of Amazon OpenSearch Service zero-ETL integration with Amazon S3, a new way to query operational logs in Amazon S3 and S3-based data lakes without needing to switch between services. You can now analyze infrequently queried data in cloud object stores and simultaneously use the operational analytics and visualization capabilities of OpenSearch Service.

Amazon OpenSearch Service direct queries with Amazon S3 provide a zero-ETL integration that reduces the operational complexity of duplicating data or managing multiple analytics tools by enabling you to query your operational data directly, reducing costs and time to action. This zero-ETL integration is configurable within OpenSearch Service, where you can take advantage of various log type templates, including predefined dashboards, and configure data accelerations tailored to that log type. Templates include VPC Flow Logs, Elastic Load Balancing logs, and NGINX logs, and accelerations include skipping indexes, materialized views, and covering indexes.

With direct queries with Amazon S3, you can perform complex queries critical to security forensics and threat analysis that correlate data across multiple data sources, which aids teams in investigating service downtime and security events. After creating an integration, you can start querying your data directly from OpenSearch Dashboards or the OpenSearch API. You can easily audit connections to ensure that they are set up in a scalable, cost-efficient, and secure way.

Getting started with direct queries with Amazon S3
You can easily get started by creating a new Amazon S3 direct query data source for OpenSearch Service through the AWS Management Console or the API. Each new data source uses AWS Glue Data Catalog to manage tables that represent S3 buckets. Once you create a data source, you can configure Amazon S3 tables and data indexing and query data in OpenSearch Dashboards.

1. Create a data source in OpenSearch Service
Before you create a data source, you should have an OpenSearch Service domain with version 2.11 or later and a target Amazon S3 table in the AWS Glue Data Catalog with the appropriate IAM permissions. The IAM role will need access to the desired S3 bucket(s) and read and write access to the AWS Glue Data Catalog. To learn more about IAM prerequisites, see Creating a data source in the AWS documentation.

Go to the OpenSearch Service console and choose the domain you want to set up a new data source for. On the domain details page, choose the Connections tab below the general information and find the Direct Query section.

To create a new data source, choose Create, input the name of your new data source, select the data source type as Amazon S3 with AWS Glue Data Catalog, and choose the IAM role for your data source.

Once you create a data source, you can go to the OpenSearch Dashboards of the domain, which you use to configure access control, define tables, set up log type–based dashboards for popular log types, and query your data.

2. Configuring your data source in OpenSearch Dashboards
To configure the data source in OpenSearch Dashboards, choose Configure in the console and go to OpenSearch Dashboards. In the left-hand navigation of OpenSearch Dashboards, under Management, choose Data sources. Under Manage data sources, choose the name of the data source you created in the console.

Direct queries from OpenSearch Service to Amazon S3 use Spark tables within the AWS Glue Data Catalog. To create a new table to query directly, go to the Query Workbench in the OpenSearch Plugins menu.

Now run the following SQL statement to create the http_logs table, and then run the MSCK REPAIR TABLE mys3.default.http_logs command to update the metadata in the catalog:

CREATE EXTERNAL TABLE IF NOT EXISTS mys3.default.http_logs (
    `@timestamp` TIMESTAMP,
    clientip STRING,
    request STRING,
    status INT,
    size INT,
    year INT,
    month INT,
    day INT
)
USING json
PARTITIONED BY (year, month, day)
OPTIONS (path 's3://mys3/data/http_log/http_logs_partitioned_json_bz2/', compression 'bzip2')

To ensure a fast experience with your data in Amazon S3, you can set up any of three types of accelerations to index data into OpenSearch Service: skipping indexes, materialized views, and covering indexes. To create OpenSearch indexes from external data connections for better performance, choose Accelerate Table.

  • Skipping indexes allow you to index only the metadata of the data stored in Amazon S3. They help queries quickly identify relevant data by narrowing down the specific location where it is stored (see the conceptual sketch after this list).
  • Materialized views enable you to use complex queries such as aggregations, which can be used for querying or powering dashboard visualizations. Materialized views ingest data into OpenSearch Service for anomaly detection or geospatial capabilities.
  • Covering indexes will ingest all the data from the specified table column. Covering indexes are the most performant of the three indexing types.
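
To make the skipping index idea concrete, here is a purely conceptual sketch in Python, not how OpenSearch Service implements it: the index keeps lightweight per-file metadata (for example, min/max values) so a query can skip files that cannot contain matching rows. The paths and statistics are hypothetical.

# Per-file metadata collected by a hypothetical skipping index on the status column
skipping_index = {
    "s3://mys3/data/http_log/day=01/part-0000.json.bz2": {"status_min": 200, "status_max": 304},
    "s3://mys3/data/http_log/day=02/part-0000.json.bz2": {"status_min": 200, "status_max": 503},
    "s3://mys3/data/http_log/day=03/part-0000.json.bz2": {"status_min": 301, "status_max": 404},
}

def files_to_scan(index, status):
    """Return only the files whose value range can contain the requested status code."""
    return [path for path, meta in index.items()
            if meta["status_min"] <= status <= meta["status_max"]]

# A query filtering on status = 500 only needs to scan one of the three files
print(files_to_scan(skipping_index, 500))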

3. Query your data source in OpenSearch Dashboards
After you set up your tables, you can query your data using Discover. You can run a sample SQL query for the http_logs table you created in the AWS Glue Data Catalog.

To learn more, see Working with Amazon OpenSearch Service direct queries with Amazon S3 in the AWS documentation.

Join the preview
Amazon OpenSearch Service zero-ETL integration with Amazon S3 is now in preview in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland) AWS Regions.

OpenSearch Service charges separately, and only for the compute (measured in OpenSearch Compute Units) needed to query your external data and maintain indexes in OpenSearch Service. For more information, see Amazon OpenSearch Service Pricing.

Give it a try and send feedback to the AWS re:Post for Amazon OpenSearch Service or through your usual AWS Support contacts.

Channy

Analyze large amounts of graph data to get insights and find trends with Amazon Neptune Analytics

I am happy to announce the general availability of Amazon Neptune Analytics, a new analytics database engine that makes it faster for data scientists and application developers to analyze large amounts of graph data. With Neptune Analytics, you can now quickly load your dataset from Amazon Neptune or your data lake on Amazon Simple Storage Service (Amazon S3), run your analysis tasks in near real time, and optionally terminate your graph afterward.

Graph data enables the representation and analysis of intricate relationships and connections within diverse data domains. Common applications include social networks, where it aids in identifying communities, recommending connections, and analyzing information diffusion. In supply chain management, graphs facilitate efficient route optimization and bottleneck identification. In cybersecurity, they reveal network vulnerabilities and identify patterns of malicious activity. Graph data finds application in knowledge management, financial services, digital advertising, and network security, performing tasks such as identifying money laundering networks in banking transactions and predicting network vulnerabilities.

Since the launch of Neptune in May 2018, thousands of customers have embraced the service for storing their graph data and performing updates and deletions on specific subsets of the graph. However, analyzing data for insights often involves loading the entire graph into memory. For instance, a financial services company aiming to detect fraud may need to load and correlate all historical account transactions.

Performing analyses on extensive graph datasets, such as running common graph algorithms, requires specialized tools. Utilizing separate analytics solutions demands the creation of intricate pipelines to transfer data for processing, which is challenging to operate, time-consuming, and prone to errors. Furthermore, loading large datasets from existing databases or data lakes to a graph analytic solution can take hours or even days.

Neptune Analytics offers a fully managed graph analytics experience. It takes care of the infrastructure heavy lifting, enabling you to concentrate on problem-solving through queries and workflows. Neptune Analytics automatically allocates compute resources according to the graph’s size and quickly loads all the data in memory to run your queries in seconds. Our initial benchmarking shows that Neptune Analytics loads data from Amazon S3 up to 80x faster than existing AWS solutions.

Neptune Analytics supports five families of algorithms covering 15 different algorithms, each with multiple variants. For example, we provide algorithms for path-finding, detecting communities (clustering), identifying important data (centrality), and quantifying similarity. Path-finding algorithms are used for use cases such as route planning for supply chain optimization. Centrality algorithms like PageRank identify the most influential sellers in a graph. Algorithms like connected components, clustering, and similarity algorithms can be used for fraud-detection use cases to determine whether the connected network is a group of friends or a fraud ring formed by a set of coordinated fraudsters.
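
As a refresher on what a centrality algorithm computes, here is a small, self-contained PageRank power-iteration sketch in Python; Neptune Analytics runs its own optimized implementations, so this is only to illustrate the idea, and the tiny example graph is made up.

import numpy as np

def pagerank(adjacency, damping=0.85, iterations=50):
    """Toy PageRank: adjacency[i, j] = 1 if node i links to node j."""
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1          # avoid division by zero for dangling nodes
    transition = adjacency / out_degree      # row-stochastic transition matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * (transition.T @ rank)
    return rank

# Four sellers where everyone points at seller 0
graph = np.array([[0, 0, 0, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0]], dtype=float)
print(pagerank(graph))  # seller 0 gets the highest score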

Neptune Analytics facilitates the creation of graph applications using openCypher, presently one of the widely adopted graph query languages. Developers, business analysts, and data scientists appreciate openCypher’s SQL-inspired syntax, finding it familiar and structured for composing graph queries.

Let’s see it at work
As we usually do on the AWS News blog, let’s show how it works. For this demo, I first navigate to Neptune in the AWS Management Console. There is a new Analytics section on the left navigation pane. I select Graphs and then Create graph.

Neptune Analytics - create graph 1

On the Create graph page, I enter the details of my graph analytics database engine. I won’t detail each parameter here; their names are self-explanatory.

Neptune Analytics - Create graph 1

Pay attention to Allow from public: the vast majority of the time, you want to keep your graph accessible only from within the boundaries of your VPC. I also create a Private endpoint to allow private access from machines and services inside my account VPC network.

Neptune Analytics - Create graph 2

In addition to network access control, users will need proper IAM permissions to access the graph.

Finally, I enable Vector search to perform similarity search using embeddings in the dataset. The dimension of the vector depends on the large language model (LLM) that you use to generate the embedding.

Neptune Analytics - Create graph 3

When I am ready, I select Create graph (not shown here).

After a few minutes, my graph is available. Under Connectivity & security, I take note of the Endpoint. This is the DNS name I will use later to access my graph from my applications.

I can also create Replicas. A replica is a warm standby copy of the graph in another Availability Zone. You might decide to create one or more replicas for high availability. By default, we create one replica, and depending on your availability requirements, you can choose not to create replicas.

Neptune Analytics - create graph 3

Business queries on graph data
Now that the Neptune Analytics graph is available, let’s load and analyze data. For the rest of this demo, imagine I’m working in the finance industry.

I have a dataset obtained from the US Securities and Exchange Commission (SEC). This dataset contains the list of positions held by investors that have more than $100 million in assets. Here is a diagram to illustrate the structure of the dataset I use in this demo.

Neptune graph analytics - dataset structure

I want to get a better understanding of the positions held by one investment firm (let’s name it “Seb’s Investments LLC”). I wonder what its top five holdings are and who else holds more than $1 billion in the same companies. I am also curious to know which other investment companies have a portfolio similar to that of Seb’s Investments LLC.

To start my analysis, I create a Jupyter notebook in the Neptune section of the AWS Management Console. In the notebook, I first define my analytics endpoint and load the data set from an S3 bucket. It takes only 18 seconds to load 17 million records.

Neptune Analytics - load data

Then, I start to explore the dataset using openCypher queries. I start by defining my parameters:

params = {'name': "Seb's Investments LLC", 'quarter': '2023Q4'}

First, I want to know what the top five holdings are for Seb’s Investments LLC in this quarter and who else holds more than $1 billion in the same companies. In openCypher, it translates to the query hereafter. The $name parameter’s value is “Seb’s Investments LLC” and the $quarter parameter’s value is 2023Q4.

MATCH p=(h:Holder)-->(hq1)-[o:owns]->(holding)
WHERE h.name = $name AND hq1.name = $quarter
WITH DISTINCT holding as holding, o ORDER BY o.value DESC LIMIT 5
MATCH (holding)<-[o2:owns]-(hq2)<--(coholder:Holder)
WHERE hq2.name = '2023Q4'
WITH sum(o2.value) AS totalValue, coholder, holding
WHERE totalValue > 1000000000
RETURN coholder.name, collect(holding.name)

Neptune Analytics - query 1

Then, I want to know the top five other companies that have holdings similar to those of “Seb’s Investments LLC.” I use the topKByNode() function to perform a vector search.

MATCH (n:Holder)
WHERE n.name = $name
CALL neptune.algo.vectors.topKByNode(n)
YIELD node, score
WHERE score >0
RETURN node.name LIMIT 5

This query identifies a specific Holder node with the name “Seb’s Investments LLC.” Then, it utilizes the Neptune Analytics custom vector similarity search algorithm on the embedding property of the Holder node to find other nodes in the graph that are similar. The results are filtered to include only those with a positive similarity score, and the query finally returns the names of up to five related nodes.

Neptune Analytics - query 2

Pricing and availability
Neptune Analytics is available today in seven AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Singapore, Tokyo), and Europe (Frankfurt, Ireland).

AWS charges for the usage on a pay-as-you-go basis, with no recurring subscriptions or one-time setup fees.

Pricing is based on configurations of memory-optimized Neptune capacity units (m-NCU). Each m-NCU corresponds to one hour of compute and networking capacity and 1 GiB of memory. You can choose configurations starting with 128 m-NCUs and up to 4096 m-NCUs. In addition to m-NCU, storage charges apply for graph snapshots.

I invite you to read the Neptune pricing page for more details.

Neptune Analytics is a new analytics database engine to analyze large graph datasets. It helps you discover insights faster for use cases such as fraud detection and prevention, digital advertising, cybersecurity, transportation logistics, and bioinformatics.

Get started
Log in to the AWS Management Console to give Neptune Analytics a try.

— seb

Vector search for Amazon DocumentDB (with MongoDB compatibility) is now generally available

Today, we are announcing the general availability of vector search for Amazon DocumentDB (with MongoDB compatibility), a new built-in capability that lets you store, index, and search millions of vectors with millisecond response times within your document database.

Vector search is an emerging technique used in machine learning (ML) to find similar data points to given data by comparing their vector representations using distance or similarity metrics. Vectors are numerical representations of unstructured data created from large language models (LLMs) hosted in Amazon Bedrock, Amazon SageMaker, and other open source or proprietary ML services. This approach is useful in creating generative artificial intelligence (AI) applications, such as intuitive search, product recommendation, personalization, and chatbots using the Retrieval Augmented Generation (RAG) approach. For example, if your dataset contained individual documents for movies, you could semantically search for movies similar to Titanic based on shared context such as “boats”, “tragedy”, or “movies based on true stories” instead of simply matching keywords.

With vector search for Amazon DocumentDB, you can effectively search the database based on nuanced meaning and context without spending time and cost to manage a separate vector database infrastructure. You also benefit from the fully managed, scalable, secure, and highly available JSON-based document database that Amazon DocumentDB provides.

Getting started with vector search on Amazon DocumentDB
The vector search feature is available on your Amazon DocumentDB 5.0 instance-based clusters. To implement a vector search application, you generate vectors using embedding models for fields inside your documents and store the vectors side by side with your source data inside Amazon DocumentDB.

Next, you create a vector index on the vector field, which helps retrieve similar vectors, and you can then search the Amazon DocumentDB database using semantic search. Finally, user-submitted queries are converted to vectors using the same embedding model, and the database returns semantically similar documents to the client.

Let’s look at how to implement a simple semantic search application using vector search on Amazon DocumentDB.
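
The walkthrough below assumes you already have a connection to your Amazon DocumentDB 5.0 cluster. A minimal connection sketch with PyMongo looks like this; the user, password, cluster endpoint, and collection name are placeholders, and global-bundle.pem is the Amazon DocumentDB certificate bundle you download separately.

import pymongo

# Placeholders: replace with your own credentials and cluster endpoint
client = pymongo.MongoClient(
    "mongodb://myuser:mypassword@mycluster.cluster-abc123.us-east-1.docdb.amazonaws.com:27017/"
    "?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"
)
db = client["vectordb"]
collection = db["sentences"]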

Step 1. Create vector embeddings using the Amazon Titan Embeddings model
Let’s use the Amazon Titan Embeddings model to create an embedding vector. The Amazon Titan Embeddings model is available in Amazon Bedrock, a serverless generative AI service. You can easily access it using a single API and without managing any infrastructure.

prompt = "I love dog and cat."
response = bedrock_runtime.invoke_model(
    body= json.dumps({"inputText": prompt}), 
    modelId='amazon.titan-embed-text-v1', 
    accept='application/json', 
    contentType='application/json'
)
response_body = json.loads(response['body'].read())
embedding = response_body.get('embedding')

The returned vector embedding will look similar to this:

[0.82421875, -0.6953125, -0.115722656, 0.87890625, 0.05883789, -0.020385742, 0.32421875, -0.00078201294, -0.40234375, 0.44140625, ...]

Step 2. Insert vector embeddings and create a vector index
You can add generated vector embeddings using the insertMany( [{},...,{}] ) operation with a list of the documents that you want added to your collection in Amazon DocumentDB.

db.collection.insertMany([
    {sentence: "I love a dog and cat.", vectorField: [0.82421875, -0.6953125,...]},
    {sentence: "My dog is very cute.", vectorField: [0.05883789, -0.020385742,...]},
    {sentence: "I write with a pen.", vectorField: [-0.020385742, 0.32421875,...]},
  ...
]);

You can create a vector index using the createIndex command. Amazon DocumentDB performs an approximate nearest neighbor (ANN) search using the inverted file with flat compression (IVFFLAT) vector index. The feature supports three distance metrics: Euclidean, cosine, and inner product. We will use the Euclidean distance, a measure of the straight-line distance between two points in space. The smaller the Euclidean distance, the closer the vectors are to each other.

db.collection.createIndex (
   { vectorField: "vector" },
   { "name": "index name",
     "vectorOptions": {
        "dimensions": 100, // the number of vector data dimensions
        "similarity": "euclidean", // Or cosine and dotProduct
        "lists": 100 
      }
   }
);

Step 3. Search vector embeddings from Amazon DocumentDB
You can now search for similar vectors within your documents using a new aggregation pipeline operator within $search. The example code to search “I like pets” is as follows:

db.collection.aggregate([{
  $search: {
    "vectorSearch": {
      "vector": [0.82421875, -0.6953125,...], // Search for ‘I like pets’
      "path": "vectorField",
      "k": 5,
      "similarity": "euclidean", // Or cosine and dotProduct
      "probes": 1 // the number of clusters for vector search
    }
  }
}]);

This returns search results such as “I love a dog and cat.”, which is semantically similar to the query.

To learn more, see the Amazon DocumentDB documentation. To see a more practical example (a semantic movie search with Amazon DocumentDB), find the Python source code and datasets in the GitHub repository.

Now available
Vector search for Amazon DocumentDB is now available at no additional cost to all customers using Amazon DocumentDB 5.0 instance-based clusters in all AWS Regions where Amazon DocumentDB is available. Standard compute, I/O, storage, and backup charges will apply as you store, index, and search vector embeddings on Amazon DocumentDB.

To learn more, see the Amazon DocumentDB documentation and send feedback to AWS re:Post for Amazon DocumentDB or through your usual AWS Support contacts.

Channy

Vector engine for Amazon OpenSearch Serverless is now available

Today we are announcing the general availability of the vector engine for Amazon OpenSearch Serverless with new features. In July 2023, we introduced the preview release of the vector engine for Amazon OpenSearch Serverless, a simple, scalable, and high-performing similarity search capability. The vector engine makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (generative AI) applications without needing to manage the underlying vector database infrastructure.

You can now store, update, and search billions of vector embeddings with thousands of dimensions in milliseconds. The highly performant similarity search capability of vector engine enables generative AI-powered applications to deliver accurate and reliable results with consistent milliseconds-scale response times.

The vector engine also enables you to optimize and tune results with hybrid search by combining vector search and full-text search in the same query, removing the need to manage and maintain separate data stores or a complex application stack. The vector engine provides a secure, reliable, scalable, and enterprise-ready platform to cost-effectively build a prototype application and then seamlessly scale to production.

You can now get started in minutes with the vector engine by creating a specialized vector engine–based collection, which is a logical grouping of embeddings that works together to support a workload.
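
If you prefer to create the collection programmatically, here is a minimal sketch with the AWS SDK for Python (Boto3); the collection name is a placeholder, the encryption, network, and data access policies for that name must already exist, and the standbyReplicas setting shown for the development and test option is an assumption based on the GA feature description, so verify it against the current documentation.

import boto3

aoss = boto3.client("opensearchserverless")

# Create a vector search collection (security policies for this
# collection name must be set up beforehand)
response = aoss.create_collection(
    name="my-vector-collection",          # placeholder name
    type="VECTORSEARCH",
    description="Collection for ML-augmented search experiments",
    # standbyReplicas="DISABLED",         # development/test option; verify the parameter name
)
print(response)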

The vector engine uses OpenSearch Compute Units (OCUs), a unit of compute capacity, to ingest data and run similarity search queries. One OCU can handle up to 2 million vectors with 128 dimensions, or 500,000 vectors with 768 dimensions, at a 99 percent recall rate.

The vector engine built on OpenSearch Serverless is a highly available service by default. It requires a minimum of four OCUs (2 OCUs for the ingest, including primary and standby, and 2 OCUs for the search with two active replicas across Availability Zones) for the first collection in an account. All subsequent collections using the same AWS Key Management Service (AWS KMS) key can share those OCUs.

What’s new at GA?
Since the preview, the vector engine for Amazon OpenSearch Serverless became one of the vector database options in the knowledge base of Amazon Bedrock to build generative AI applications using a Retrieval Augmented Generation (RAG) concept.

Here are some new or improved features for this GA release:

Disable redundant replica (development and test focused) option
As we announced in our preview blog post, this feature eliminates the need to have redundant OCUs in another Availability Zone solely for availability purposes. A collection can be deployed with two OCUs – one for indexing and one for search. This cuts the cost in half compared to the default deployment with redundant replicas. The reduced cost makes this configuration suitable and economical for development and testing workloads.

With this option, we will still provide durability guarantees since the vector engine persists all the data in Amazon S3, but single-AZ failures would impact your availability.

If you want to disable a redundant replica, uncheck Enable redundancy when creating a new vector search collection.

Fractional OCU for the development and test focused option
Support for fractional OCU billing for development and test focused workloads (that is, the no redundant replica option) reduces the floor price for vector search collections. The vector engine initially deploys a smaller 0.5 OCU while providing the same capabilities at lower scale and scales up to a full OCU and beyond to meet your workload demand. This option further reduces the monthly cost of experimenting with the vector engine.

Automatic scaling for a billion scale
With vector engine’s seamless auto-scaling, you no longer have to reindex for scaling purposes. At preview, we were supporting about 20 million vector embeddings. With the general availability of vector engine, we have raised the limits to support a billion vector scale.

Now available
The vector engine for Amazon OpenSearch Serverless is now available in all AWS Regions where Amazon OpenSearch Serverless is available.

To get started, you can refer to the following resources:

Give it a try and send feedback to AWS re:Post for Amazon OpenSearch Service or through your usual AWS support contacts.

Channy

Introducing Amazon SageMaker HyperPod, a purpose-built infrastructure for distributed training at scale

Today, we are introducing Amazon SageMaker HyperPod, which helps reduce the time to train foundation models (FMs) by providing a purpose-built infrastructure for distributed training at scale. You can now use SageMaker HyperPod to train FMs for weeks or even months while SageMaker actively monitors the cluster health and provides automated node and job resiliency by replacing faulty nodes and resuming model training from a checkpoint.

The clusters come preconfigured with SageMaker’s distributed training libraries that help you split your training data and model across all the nodes to process them in parallel and fully utilize the cluster’s compute and network infrastructure. You can further customize your training environment by installing additional frameworks, debugging tools, and optimization libraries.

Let me show you how to get started with SageMaker HyperPod. In the following demo, I create a SageMaker HyperPod and show you how to train a Llama 2 7B model using the example shared in the AWS ML Training Reference Architectures GitHub repository.

Create and manage clusters
As the SageMaker HyperPod admin, you can create and manage clusters using the AWS Management Console or AWS Command Line Interface (AWS CLI). In the console, navigate to Amazon SageMaker, select Cluster management under HyperPod Clusters in the left menu, then choose Create a cluster.

Amazon SageMaker HyperPod Clusters

In the setup that follows, provide a cluster name and configure instance groups with your instance types of choice and the number of instances to allocate to each instance group.

Amazon SageMaker HyperPod

You also need to prepare and upload one or more lifecycle scripts to your Amazon Simple Storage Service (Amazon S3) bucket to run in each instance group during cluster creation. With lifecycle scripts, you can customize your cluster environment and install required libraries and packages. You can find example lifecycle scripts for SageMaker HyperPod in the GitHub repo.

Using the AWS CLI
You can also use the AWS CLI to create and manage clusters. For my demo, I specify my cluster configuration in a JSON file. I choose to create two instance groups, one for the cluster controller node(s) that I call “controller-group,” and one for the cluster worker nodes that I call “worker-group.” For the worker nodes that will perform model training, I specify Amazon EC2 Trn1 instances powered by AWS Trainium chips.

// demo-cluster.json
{
   "InstanceGroups": [
        {
            "InstanceGroupName": "controller-group",
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "lifecycleConfig": {
                "SourceS3Uri": "s3://<your-s3-bucket>/<lifecycle-script-directory>/",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/my-role-for-cluster",
            "ThreadsPerCore": 1
        },
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "trn1.32xlarge",
            "InstanceCount": 4,
            "lifecycleConfig": {
                "SourceS3Uri": "s3://<your-s3-bucket>/<lifecycle-script-directory>/",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/my-role-for-cluster",
            "ThreadsPerCore": 1
        }
    ]
}

To create the cluster, I run the following AWS CLI command:

aws sagemaker create-cluster \
--cluster-name antje-demo-cluster \
--instance-groups file://demo-cluster.json

Upon creation, you can use aws sagemaker describe-cluster and aws sagemaker list-cluster-nodes to view your cluster and node details. Note down the cluster ID and instance ID of your controller node. You need that information to connect to your cluster.

You also have the option to attach a shared file system, such as Amazon FSx for Lustre. To use FSx for Lustre, you need to set up your cluster with an Amazon Virtual Private Cloud (Amazon VPC) configuration. Here’s an AWS CloudFormation template that shows how to create a SageMaker VPC and how to deploy FSx for Lustre.

Connect to your cluster
As a cluster user, you need to have access to the cluster provisioned by your cluster admin. With access permissions in place, you can connect to the cluster using SSH to schedule and run jobs. You can use the preinstalled AWS CLI plugin for AWS Systems Manager to connect to the controller node of your cluster.

For my demo, I run the following command, specifying my cluster ID and the instance ID of the controller node as the target.

aws ssm start-session \
--target sagemaker-cluster:ntg44z9os8pn_i-05a854e0d4358b59c \
--region us-west-2

Schedule and run jobs on the cluster using Slurm
At launch, SageMaker HyperPod supports Slurm for workload orchestration. Slurm is a popular open source cluster management and job scheduling system. You can install and set up Slurm through lifecycle scripts as part of the cluster creation. The example lifecycle scripts show how. Then, you can use the standard Slurm commands to schedule and launch jobs. Check out the Slurm Quick Start User Guide for architecture details and helpful commands.

For this demo, I’m using this example from the AWS ML Training Reference Architectures GitHub repo that shows how to train Llama 2 7B on Slurm with Trn1 instances. My cluster is already set up with Slurm, and I have an FSx for Lustre file system mounted.

Note
The Llama 2 model is governed by Meta. You can request access through the Meta request access page.

Set up the cluster environment
SageMaker HyperPod supports training in a range of environments, including Conda, venv, Docker, and enroot. Following the instructions in the README, I build my virtual environment aws_neuron_venv_pytorch and set up the torch_neuronx and neuronx-nemo-megatron libraries for training models on Trn1 instances.

Prepare model, tokenizer, and dataset
I follow the instructions to download the Llama 2 model and tokenizer and convert the model into the Hugging Face format. Then, I download and tokenize the RedPajama dataset. As a final preparation step, I pre-compile the Llama 2 model using ahead-of-time (AOT) compilation to speed up model training.

Launch jobs on the cluster
Now, I’m ready to start my model training job using the sbatch command.

sbatch --nodes 4 --auto-resume=1 run.slurm ./llama_7b.sh

You can use the squeue command to view the job queue. Once the training job is running, the SageMaker HyperPod resiliency features are automatically enabled. SageMaker HyperPod will automatically detect hardware failures, replace nodes as needed, and resume training from checkpoints if the auto-resume parameter is set, as shown in the preceding command.
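
Auto-resume relies on your training script writing checkpoints it can restart from. The exact mechanics depend on your framework; as a generic sketch (not the HyperPod or neuronx-nemo-megatron implementation), a PyTorch-style loop might persist and reload state like this, with the checkpoint path on the shared FSx for Lustre file system chosen purely for illustration.

import os
import torch

CHECKPOINT_PATH = "/fsx/checkpoints/llama7b/latest.pt"  # hypothetical shared path

def save_checkpoint(model, optimizer, step):
    # Persist everything needed to resume exactly where training stopped
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "step": step},
        CHECKPOINT_PATH,
    )

def load_checkpoint(model, optimizer):
    # On a fresh or replaced node, pick up from the last saved step
    if os.path.exists(CHECKPOINT_PATH):
        state = torch.load(CHECKPOINT_PATH, map_location="cpu")
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        return state["step"]
    return 0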

You can view the output of the model training job in the following file:

tail -f slurm-run.slurm-<JOB_ID>.out

A sample output indicating that model training has started will look like this:

Epoch 0:  22%|██▏       | 4499/20101 [22:26:14<77:48:37, 17.95s/it, loss=2.43, v_num=5563, reduced_train_loss=2.470, gradient_norm=0.121, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.40]
Epoch 0:  22%|██▏       | 4500/20101 [22:26:32<77:48:18, 17.95s/it, loss=2.43, v_num=5563, reduced_train_loss=2.470, gradient_norm=0.121, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.40]
Epoch 0:  22%|██▏       | 4500/20101 [22:26:32<77:48:18, 17.95s/it, loss=2.44, v_num=5563, reduced_train_loss=2.450, gradient_norm=0.120, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.50]

To further monitor and profile your model training jobs, you can use SageMaker hosted TensorBoard or any other tool of your choice.

Now available
SageMaker HyperPod is available today in AWS Regions US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Learn more:

  • See Amazon SageMaker HyperPod for pricing information and a list of supported cluster instance types
  • Check out the Developer Guide
  • Visit the AWS Management Console to start training your FMs with SageMaker HyperPod

— Antje

PS: Writing a blog post at AWS is always a team effort, even when you see only one name under the post title. In this case, I want to thank Brad Doran, Justin Pirtle, Ben Snyder, Pierre-Yves Aquilanti, Keita Watanabe, and Verdi March for their generous help with example code and sharing their expertise in managing large-scale model training infrastructures, Slurm, and SageMaker HyperPod.

Amazon Titan Image Generator, Multimodal Embeddings, and Text models are now available in Amazon Bedrock

Today, we’re introducing two new Amazon Titan multimodal foundation models (FMs): Amazon Titan Image Generator (preview) and Amazon Titan Multimodal Embeddings. I’m also happy to share that Amazon Titan Text Lite and Amazon Titan Text Express are now generally available in Amazon Bedrock. You can now choose from three available Amazon Titan Text FMs, including Amazon Titan Text Embeddings.

Amazon Titan models incorporate 25 years of artificial intelligence (AI) and machine learning (ML) innovation at Amazon and offer a range of high-performing image, multimodal, and text model options through a fully managed API. AWS pre-trained these models on large datasets, making them powerful, general-purpose models built to support a variety of use cases while also supporting the responsible use of AI.

You can use the base models as is, or you can privately customize them with your own data. To enable access to Amazon Titan FMs, navigate to the Amazon Bedrock console and select Model access on the bottom left menu. On the model access overview page, choose Manage model access and enable access to the Amazon Titan FMs.

Amazon Titan Models

Let me give you a quick tour of the new models.

Amazon Titan Image Generator (preview)
As a content creator, you can now use Amazon Titan Image Generator to quickly create and refine images using English natural language prompts. This helps companies in advertising, e-commerce, and media and entertainment to create studio-quality, realistic images in large volumes and at low cost. The model makes it easy to iterate on image concepts by generating multiple image options based on the text descriptions. The model can understand complex prompts with multiple objects and generates relevant images. It is trained on high-quality, diverse data to create more accurate outputs, such as realistic images with inclusive attributes and limited distortions.

Titan Image Generator’s image editing features include the ability to automatically edit an image with a text prompt using a built-in segmentation model. The model supports inpainting with an image mask and outpainting to extend or change the background of an image. You can also configure image dimensions and specify the number of image variations you want the model to generate.
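
As an illustration of the editing features, a request body for mask-prompt inpainting might look like the following sketch; the taskType and inPaintingParams field names are assumptions to verify against the current Amazon Titan Image Generator API reference, and the image file name is a placeholder.

import base64
import json

# Base64-encode the source image you want to edit (placeholder file name)
with open("my-photo.png", "rb") as image_file:
    source_image = base64.b64encode(image_file.read()).decode("utf8")

inpainting_body = json.dumps(
    {
        "taskType": "INPAINTING",
        "inPaintingParams": {
            "image": source_image,
            "maskPrompt": "tree branch",          # region to repaint, described in text
            "text": "a thick mossy tree branch",  # what to paint inside the mask
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "quality": "premium",
        },
    }
)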

In addition, you can customize the model with proprietary data to generate images consistent with your brand guidelines or to generate images in a specific style, for example, by fine-tuning the model with images from a previous marketing campaign. Titan Image Generator also mitigates harmful content generation to support the responsible use of AI. All images generated by Amazon Titan contain an invisible watermark, by default, designed to help reduce the spread of misinformation by providing a discreet mechanism to identify AI-generated images.

Amazon Titan Image Generator in action
You can start using the model in the Amazon Bedrock console by submitting either an English natural language prompt to generate images or by uploading an image for editing. In the following example, I show you how to generate an image with Amazon Titan Image Generator using the AWS SDK for Python (Boto3).

First, let’s have a look at the configuration options for image generation that you can specify in the body of the inference request. For task type, I choose TEXT_IMAGE to create an image from a natural language prompt.

import boto3
import json

bedrock = boto3.client(service_name="bedrock")
bedrock_runtime = boto3.client(service_name="bedrock-runtime")

# ImageGenerationConfig Options:
#   numberOfImages: Number of images to be generated
#   quality: Quality of generated images, can be standard or premium
#   height: Height of output image(s)
#   width: Width of output image(s)
#   cfgScale: Scale for classifier-free guidance
#   seed: The seed to use for reproducibility  

body = json.dumps(
    {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": "green iguana",   # Required
#           "negativeText": "<text>"  # Optional
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,   # Range: 1 to 5 
            "quality": "premium",  # Options: standard or premium
            "height": 768,         # Supported height list in the docs 
            "width": 1280,         # Supported width list in the docs
            "cfgScale": 7.5,       # Range: 1.0 (exclusive) to 10.0
            "seed": 42             # Range: 0 to 214783647
        }
    }
)

Next, specify the model ID for Amazon Titan Image Generator and use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
    body=body, 
    modelId="amazon.titan-image-generator-v1" 
    accept="application/json", 
    contentType="application/json"
)

Then, parse the response and decode the base64-encoded image.

import base64
from PIL import Image
from io import BytesIO

response_body = json.loads(response.get("body").read())
images = [Image.open(BytesIO(base64.b64decode(base64_image))) for base64_image in response_body.get("images")]

for img in images:
    display(img)

Et voilà, here’s the green iguana (one of my favorite animals, actually):

Green iguana generated by Amazon Titan Image Generator

To learn more about all the Amazon Titan Image Generator features, visit the Amazon Titan product page. (You’ll see more of the iguana over there.)

Next, let’s use this image with the new Amazon Titan Multimodal Embeddings model.

Amazon Titan Multimodal Embeddings
Amazon Titan Multimodal Embeddings helps you build more accurate and contextually relevant multimodal search and recommendation experiences for end users. Multimodal refers to a system’s ability to process and generate information using distinct types of data (modalities). With Titan Multimodal Embeddings, you can submit text, image, or a combination of the two as input.

The model converts images and short English text up to 128 tokens into embeddings, which capture semantic meaning and relationships between your data. You can also fine-tune the model on image-caption pairs. For example, you can combine text and images to describe company-specific manufacturing parts to understand and identify parts more effectively.

By default, Titan Multimodal Embeddings generates vectors of 1,024 dimensions, which you can use to build search experiences that offer a high degree of accuracy and speed. You can also configure smaller vector dimensions to optimize for speed and price performance. The model provides an asynchronous batch API, and the Amazon OpenSearch Service will soon offer a connector that adds Titan Multimodal Embeddings support for neural search.

Amazon Titan Multimodal Embeddings in action
For this demo, I create a combined image and text embedding. First, I base64-encode my image, and then I specify either inputText, inputImage, or both in the body of the inference request.

# Maximum image size supported is 2048 x 2048 pixels
with open("iguana.png", "rb") as image_file:
    input_image = base64.b64encode(image_file.read()).decode('utf8')

# You can specify either text or image or both
body = json.dumps(
    {
        "inputText": "Green iguana on tree branch",
        "inputImage": input_image
    }
)

Next, specify the model ID for Amazon Titan Multimodal Embeddings and use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
    body=body,
    modelId="amazon.titan-embed-image-v1",
    accept="application/json",
    contentType="application/json"
)

Let’s see the response.

response_body = json.loads(response.get("body").read())
print(response_body.get("embedding"))
	
[-0.015633494, -0.011953583, -0.022617092, -0.012395329, 0.03954641, 0.010079376, 0.08505301, -0.022064181, -0.0037248489, ...]

I redacted the output for brevity. The distance between multimodal embedding vectors, measured with metrics like cosine similarity or Euclidean distance, shows how similar or different the represented information is across modalities. Smaller distances mean more similarity, while larger distances mean more dissimilarity.
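For illustration, here is one way to compute cosine similarity between two embedding vectors with NumPy; this is a generic helper, not part of the Bedrock API:

import numpy as np

def cosine_similarity(a, b):
    # Returns a value close to 1.0 for very similar vectors and lower values for dissimilar ones
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))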

As a next step, you could build an image database by storing and indexing the multimodal embeddings in a vector store or vector database. To implement text-to-image search, query the database with inputText. For image-to-image search, query the database with inputImage. For image+text-to-image search, query the database with both inputImage and inputText.
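As a rough sketch of the text-to-image case, you could embed the query text with the same model and rank stored image embeddings by similarity; the image_index list and search_by_text helper below are hypothetical stand-ins for a real vector database:

def embed_text(text):
    response = bedrock_runtime.invoke_model(
        body=json.dumps({"inputText": text}),
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )
    return json.loads(response.get("body").read()).get("embedding")

def search_by_text(query, image_index, top_k=3):
    # image_index: list of (image_id, embedding) pairs built by embedding each image beforehand
    query_embedding = embed_text(query)
    scored = [(image_id, cosine_similarity(query_embedding, embedding))
              for image_id, embedding in image_index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

# Example usage (assuming image_index was built beforehand):
# results = search_by_text("iguana on a tree branch", image_index)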

Amazon Titan Text
Amazon Titan Text Lite and Amazon Titan Text Express are large language models (LLMs) that support a wide range of text-related tasks, including summarization, translation, and conversational chatbot systems. They can also generate code and are optimized to support popular programming languages and text formats like JSON and CSV.

Titan Text Express – Titan Text Express has a maximum context length of 8,192 tokens and is ideal for a wide range of tasks, such as open-ended text generation and conversational chat, and it supports use within Retrieval Augmented Generation (RAG) workflows.

Titan Text Lite – Titan Text Lite has a maximum context length of 4,096 tokens and is a price-performant version that is ideal for English-language tasks. The model is highly customizable and can be fine-tuned for tasks such as article summarization and copywriting.

Amazon Titan Text in action
For this demo, I ask Titan Text to write an email to my team members suggesting they organize a live stream: “Compose a short email from Antje, Principal Developer Advocate, encouraging colleagues in the developer relations team to organize a live stream to demo our new Amazon Titan V1 models.”

# Prompt from the example above
prompt = "Compose a short email from Antje, Principal Developer Advocate, encouraging colleagues in the developer relations team to organize a live stream to demo our new Amazon Titan V1 models."

body = json.dumps({
    "inputText": prompt,
    "textGenerationConfig": {
        "maxTokenCount": 512,
        "stopSequences": [],
        "temperature": 0,
        "topP": 0.9
    }
})

Titan Text FMs support temperature and topP inference parameters to control the randomness and diversity of the response, as well as maxTokenCount and stopSequences to control the length of the response.

Next, choose the model ID for one of the Titan Text models and use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
    body=body,
    # Choose the model ID:
    # Titan Text Express: "amazon.titan-text-express-v1"
    # Titan Text Lite: "amazon.titan-text-lite-v1"
    modelId="amazon.titan-text-express-v1",
    accept="application/json",
    contentType="application/json"
)

Let’s have a look at the response.

response_body = json.loads(response.get('body').read())
outputText = response_body.get('results')[0].get('outputText')

# Keep everything after the first newline (the raw output may start with an introductory line)
text = outputText[outputText.index('\n')+1:]
email = text.strip()
print(email)

Subject: Demo our new Amazon Titan V1 models live!

Dear colleagues,

I hope this email finds you well. I am excited to announce that we have recently launched our new Amazon Titan V1 models, and I believe it would be a great opportunity for us to showcase their capabilities to the wider developer community.

I suggest that we organize a live stream to demo these models and discuss their features, benefits, and how they can help developers build innovative applications. This live stream could be hosted on our YouTube channel, Twitch, or any other platform that is suitable for our audience.

I believe that showcasing our new models will not only increase our visibility but also help us build stronger relationships with developers. It will also provide an opportunity for us to receive feedback and improve our products based on the developer’s needs.

If you are interested in organizing this live stream, please let me know. I am happy to provide any support or guidance you may need. Together, let’s make this live stream a success and showcase the power of Amazon Titan V1 models to the world!

Best regards,
Antje
Principal Developer Advocate

Nice. I could send this email right away!

Availability and pricing
Amazon Titan Text FMs are available today in AWS Regions US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore, Tokyo), and Europe (Frankfurt). Amazon Titan Multimodal Embeddings is available today in the AWS Regions US East (N. Virginia) and US West (Oregon). Amazon Titan Image Generator is available in public preview in the AWS Regions US East (N. Virginia) and US West (Oregon). For pricing details, see the Amazon Bedrock Pricing page.

Learn more

Go to the AWS Management Console to start building generative AI applications with Amazon Titan FMs on Amazon Bedrock today!

— Antje

Amazon Bedrock now provides access to Anthropic’s latest model, Claude 2.1

This post was originally published on this site

Today, we’re announcing the availability of Anthropic’s Claude 2.1 foundation model (FM) in Amazon Bedrock. Last week, Anthropic introduced its latest model, Claude 2.1, delivering key capabilities for enterprises such as an industry-leading 200,000 token context window (2x the context of Claude 2.0), reduced rates of hallucination, improved accuracy over long documents, system prompts, and a beta tool use feature for function calling and workflow orchestration.

With Claude 2.1’s availability in Amazon Bedrock, you can build enterprise-ready generative artificial intelligence (AI) applications using more honest and reliable AI systems from Anthropic. You can now use the Claude 2.1 model provided by Anthropic in the Amazon Bedrock console.

Here are some key highlights about the new Claude 2.1 model in Amazon Bedrock:

200,000 token context window – Enterprise applications demand larger context windows and more accurate outputs when working with long documents such as product guides, technical documentation, or financial or legal statements. Claude 2.1 supports 200,000 tokens, the equivalent of roughly 150,000 words or over 500 pages of documents. When uploading extensive information to Claude, you can summarize, perform Q&A, forecast trends, and compare and contrast multiple documents for drafting business plans and analyzing complex contracts.

Strong accuracy upgrades – Claude 2.1 has also made significant gains in honesty, with a 2x decrease in hallucination rates, 50 percent fewer hallucinations in open-ended conversation and document Q&A, a 30 percent reduction in incorrect answers, and a 3–4 times lower rate of mistakenly concluding that a document supports a particular claim compared to Claude 2.0. Claude increasingly knows what it doesn’t know and will more likely demur rather than hallucinate. With this improved accuracy, you can build more reliable, mission-critical applications for your customers and employees.

System prompts – Claude 2.1 now supports system prompts, a new feature that can improve Claude’s performance in a variety of ways, including greater character depth and role adherence in role-playing scenarios, particularly over longer conversations, as well as stricter adherence to guidelines, rules, and instructions. This represents a structural change, but not a content change from former ways of prompting Claude.

Tool use for function calling and workflow orchestration – Available as a beta feature, Claude 2.1 can now integrate with your existing internal processes, products, and APIs to build generative AI applications. Claude 2.1 accurately retrieves and processes data from additional knowledge sources and invokes functions for a given task. Claude 2.1 can answer questions by searching databases using private APIs and a web search API, translate natural language requests into structured API calls, or connect to product datasets to make recommendations and help customers complete purchases. Access to this feature is currently limited to select early access partners, with plans for open access in the near future. If you are interested in gaining early access, please contact your AWS account team.

To learn more about Claude 2.1’s features and capabilities, visit Anthropic Claude on Amazon Bedrock and the Amazon Bedrock documentation.

Claude 2.1 in action
To get started with Claude 2.1 in Amazon Bedrock, go to the Amazon Bedrock console. Choose Model access on the bottom left pane, then choose Manage model access on the top right side, submit your use case, and request model access to the Anthropic Claude model. It may take several minutes to get access to models. If you already have access to the Claude model, you don’t need to request access separately for Claude 2.1.

To test Claude 2.1 in chat mode, choose Text or Chat under Playgrounds in the left menu pane. Then select Anthropic and then Claude v2.1.

By choosing View API request, you can also access the model via code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. Here is a sample of the AWS CLI command:

$ aws bedrock-runtime invoke-model \
      --model-id anthropic.claude-v2:1 \
      --body '{"prompt":"\n\nHuman: Tell me a funny joke about outer space!\n\nAssistant:", "max_tokens_to_sample": 50}' \
      --cli-binary-format raw-in-base64-out \
      invoke-model-output.txt
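If you prefer the AWS SDK for Python (Boto3), here is a minimal sketch of the equivalent request; for Anthropic text completion models on Bedrock, the generated text is returned in the completion field of the response body:

import boto3
import json

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

body = json.dumps({
    "prompt": "\n\nHuman: Tell me a funny joke about outer space!\n\nAssistant:",
    "max_tokens_to_sample": 50
})

response = bedrock_runtime.invoke_model(
    body=body,
    modelId="anthropic.claude-v2:1",
    accept="application/json",
    contentType="application/json"
)

print(json.loads(response.get("body").read()).get("completion"))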

You can use system prompt engineering techniques provided by the Claude 2.1 model, where you place your inputs and documents before any questions that reference or utilize that content. Inputs can be natural language text, structured documents, or code snippets using <document>, <papers>, <books>, or <code> tags, and so on. You can also use conversational text, such as chat history, and Retrieval Augmented Generation (RAG) results, such as chunked documents.

Here is a system prompt example for support agents to respond to customer questions based on corporate documents.

Here are some documents for you to reference for your task:
<documents>
 <document index="1">
  <document_content>
  (the text content of the document - could be a passage, web page, article, etc.)
  </document_content>
 </document>
 <document index="2">
  <source>https://mycompany.repository/userguide/what-is-it.html</source>
 </document>
 <document index="3">
  <source>https://mycompany.repository/docs/techspec.pdf</source>
 </document>
 ...
</documents>

You are Larry, and you are a customer advisor with deep knowledge of your company's products. Larry has a great deal of patience with his customers, even when they say nonsense or are sarcastic. Larry's answers are polite but sometimes funny. However, he only answers questions about the company's products and doesn't know much about other questions. Use the provided documentation to answer user questions.

Human: Your product is making a weird stuttering sound when I operate it. What might be the problem?
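As a rough sketch, such a documents-first prompt can be assembled as one string and sent with the same invoke_model call shown earlier; system_text and question are illustrative variables holding the document block and the customer question above, not part of any API:

# system_text holds the <documents> block plus the Larry persona instructions above
# question holds the customer's question
prompt = f"{system_text}\n\nHuman: {question}\n\nAssistant:"

body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 300})
response = bedrock_runtime.invoke_model(
    body=body,
    modelId="anthropic.claude-v2:1",
    accept="application/json",
    contentType="application/json"
)
print(json.loads(response.get("body").read()).get("completion"))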

To learn more about prompt engineering on Amazon Bedrock, see the Prompt engineering guidelines included in the Amazon Bedrock documentation. You can learn general prompt techniques, templates, and examples for Amazon Bedrock text models, including Claude.

Now available
Claude 2.1 is available today in the US East (N. Virginia) and US West (Oregon) Regions.

You only pay for what you use, with no time-based term commitments for on-demand mode. For text generation models, you are charged for every input token processed and every output token generated. Or you can choose the provisioned throughput mode to meet your application’s performance requirements in exchange for a time-based term commitment. To learn more, see Amazon Bedrock Pricing.

Give Anthropic Claude 2.1 a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Channy

New generative AI capabilities for Amazon DataZone to further simplify data cataloging and discovery (preview)

This post was originally published on this site

Today, we are announcing a preview of an automation feature backed by generative artificial intelligence (AI) for Amazon DataZone that will dramatically decrease the amount of time needed to provide context for organizational data. The new feature can automate the traditionally labor-intensive process of data cataloging. Powered by the large language models (LLMs) of Amazon Bedrock, it generates detailed descriptions of data assets and their schemas, and suggests analytical use cases. You can generate a comprehensive business context with a single click.

We heard from customers that data consumers such as data analysts, scientists, and engineers in organizations struggle to understand the data’s relevance when little metadata is available. As a result, they either spend more time interpreting the data, or they return to data producers with continued questions. So, data producers such as data owners, engineers, and analysts who own the data and make it available for consumers need to manually enter detailed context for higher-priority data to make it shareable and discoverable. This is time-consuming and is the number one problem customers face when trying to collate their data in a system for self-service by consumers.

When we launched the general availability of Amazon DataZone in October 2023, we introduced the first feature that brings generative AI capabilities to automate the generation of the table name and column names of a business catalog asset. In the data portal of Amazon DataZone, the green brain icon indicates automatically generated metadata suggestions. You could accept, edit, or reject each suggestion recommended by Amazon DataZone.

What’s new with today’s preview announcement?
Now, in addition to column and table names, you can automatically generate more detailed descriptions of the table and schema, as well as suggested uses.

In the Business Metadata tab in the data portal, when you choose Generate summary, new content will be generated to explain the table and its metadata.

You can also accept, edit, or reject this recommendation.

When you choose the Schema tab, you can also see new Description recommendations as well as the Name. You can review generated metadata and choose to accept, edit, or reject the recommendation.

This new feature will enhance data discoverability and reduce back-and-forth communication between data consumers and producers. In the future, you will also have a richer search experience based on these extensive data insights.

Join the preview
The new metadata generation capability is now available in preview in the AWS US East (N. Virginia) and US West (Oregon) Regions. With this new generative AI capability, you can reduce time-to-insight by accelerating data cataloging and boosting data discovery. To learn more, visit Amazon DataZone: Automate Data Discovery.

Give it a try and send feedback to AWS re:Post for Amazon DataZone or through your usual AWS Support contacts.

Channy