Tag Archives: AWS

Vector search for Amazon DocumentDB (with MongoDB compatibility) is now generally available

Today, we are announcing the general availability of vector search for Amazon DocumentDB (with MongoDB compatibility), a new built-in capability that lets you store, index, and search millions of vectors with millisecond response times within your document database.

Vector search is an emerging technique used in machine learning (ML) to find data points similar to given data by comparing their vector representations using distance or similarity metrics. Vectors are numerical representations of unstructured data created from large language models (LLMs) hosted in Amazon Bedrock, Amazon SageMaker, and other open source or proprietary ML services. This approach is useful in creating generative artificial intelligence (AI) applications, such as intuitive search, product recommendation, personalization, and chatbots using the Retrieval Augmented Generation (RAG) approach. For example, if your data set contained individual documents for movies, you could semantically search for movies similar to Titanic based on shared context such as “boats”, “tragedy”, or “movies based on true stories” instead of simply matching keywords.

With vector search for Amazon DocumentDB, you can effectively search the database based on nuanced meaning and context without spending the time and cost of managing a separate vector database infrastructure. You also benefit from the fully managed, scalable, secure, and highly available JSON-based document database that Amazon DocumentDB provides.

Getting started with vector search on Amazon DocumentDB
The vector search feature is available on Amazon DocumentDB 5.0 instance-based clusters. To implement a vector search application, you generate vectors using embedding models for fields inside your documents and store the vectors side by side with your source data inside Amazon DocumentDB.

Next, you create a vector index on the vector field, which helps retrieve similar vectors, so you can search the Amazon DocumentDB database using semantic search. Finally, user-submitted queries are converted to vectors using the same embedding model to retrieve semantically similar documents and return them to the client.

Let’s look at how to implement a simple semantic search application using vector search on Amazon DocumentDB.

Step 1. Create vector embeddings using the Amazon Titan Embeddings model
Let’s use the Amazon Titan Embeddings model to create an embedding vector. Amazon Titan Embeddings model is available in Amazon Bedrock, a serverless generative AI service. You can easily access it using a single API and without managing any infrastructure.

prompt = "I love dog and cat."
response = bedrock_runtime.invoke_model(
    body= json.dumps({"inputText": prompt}), 
    modelId='amazon.titan-embed-text-v1', 
    accept='application/json', 
    contentType='application/json'
)
response_body = json.loads(response['body'].read())
embedding = response_body.get('embedding')

The returned vector embedding will look similar to this:

[0.82421875, -0.6953125, -0.115722656, 0.87890625, 0.05883789, -0.020385742, 0.32421875, -0.00078201294, -0.40234375, 0.44140625, ...]

Step 2. Insert vector embeddings and create a vector index
You can add generated vector embeddings using the insertMany( [{},...,{}] ) operation with a list of the documents that you want added to your collection in Amazon DocumentDB.

db.collection.insertMany([
    {sentence: "I love a dog and cat.", vectorField: [0.82421875, -0.6953125,...]},
    {sentence: "My dog is very cute.", vectorField: [0.05883789, -0.020385742,...]},
    {sentence: "I write with a pen.", vectorField: [-0.020385742, 0.32421875,...]},
  ...
]);

You can create a vector index using the createIndex command. Amazon DocumentDB performs an approximate nearest neighbor (ANN) search using the inverted file with flat compression (IVFFLAT) vector index. The feature supports three distance metrics: euclidean, cosine, and inner product. We will use the euclidean distance, a measure of the straight-line distance between two points in space. The smaller the euclidean distance, the closer the vectors are to each other.

db.collection.createIndex (
   { vectorField: "vector" },
   { "name": "index name",
     "vectorOptions": {
        "dimensions": 100, // the number of vector data dimensions
        "similarity": "euclidean", // Or cosine and dotProduct
        "lists": 100 
      }
   }
);

Step 3.  Search vector embeddings from Amazon DocumentDB
You can now search for similar vectors within your documents using a new aggregation pipeline operator within $search. The example code to search “I like pets” is as follows:

db.collection.aggregate([{
  $search: {
    "vectorSearch": {
      "vector": [0.82421875, -0.6953125, ...], // Search for "I like pets"
      "path": "vectorField",
      "k": 5,
      "similarity": "euclidean", // Or cosine and dotProduct
      "probes": 1 // the number of clusters for vector search
    }
  }
}]);

This returns search results such as “I love a dog and cat.”, which is semantically similar.
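
To tie the steps together, here is a minimal end-to-end sketch of the query path using pymongo. The connection string, database name, and collection name are placeholders; everything else follows the field names and index options shown above. Also note that the Titan Embeddings model used in Step 1 returns 1,536-dimensional vectors, so the dimensions value in your vector index must match the embedding model you use.

import json
import boto3
from pymongo import MongoClient

# Placeholders: replace with your cluster endpoint, credentials, and database name.
client = MongoClient("mongodb://<user>:<password>@<docdb-cluster-endpoint>:27017/?tls=true&retryWrites=false")
collection = client["<database>"]["collection"]

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

def embed(text):
    # Embed the query with the same Titan Embeddings model used for the stored documents
    response = bedrock_runtime.invoke_model(
        body=json.dumps({"inputText": text}),
        modelId="amazon.titan-embed-text-v1",
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

results = collection.aggregate([
    {"$search": {
        "vectorSearch": {
            "vector": embed("I like pets"),
            "path": "vectorField",
            "k": 5,
            "similarity": "euclidean",
            "probes": 1,
        }
    }}
])

for doc in results:
    print(doc["sentence"])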

To learn more, see the Amazon DocumentDB documentation. For a more practical example, a semantic movie search with Amazon DocumentDB, you can find the Python source code and datasets in the GitHub repository.

Now available
Vector search for Amazon DocumentDB is now available at no additional cost to all customers using Amazon DocumentDB 5.0 instance-based clusters in all AWS Regions where Amazon DocumentDB is available. Standard compute, I/O, storage, and backup charges will apply as you store, index, and search vector embeddings on Amazon DocumentDB.

To learn more, see the Amazon DocumentDB documentation and send feedback to AWS re:Post for Amazon DocumentDB or through your usual AWS Support contacts.

Channy

Vector engine for Amazon OpenSearch Serverless is now available

This post was originally published on this site

Today we are announcing the general availability of the vector engine for Amazon OpenSearch Serverless with new features. In July 2023, we introduced the preview release of the vector engine for Amazon OpenSearch Serverless, a simple, scalable, and high-performing similarity search capability. The vector engine makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (generative AI) applications without needing to manage the underlying vector database infrastructure.

You can now store, update, and search billions of vector embeddings with thousands of dimensions in milliseconds. The highly performant similarity search capability of the vector engine enables generative AI-powered applications to deliver accurate and reliable results with consistent millisecond-scale response times.

The vector engine also enables you to optimize and tune results with hybrid search by combining vector search and full-text search in the same query, removing the need to manage and maintain separate data stores or a complex application stack. The vector engine provides a secure, reliable, scalable, and enterprise-ready platform to cost-effectively build a prototype application and then seamlessly scale to production.

You can now get started in minutes with the vector engine by creating a specialized vector engine–based collection, which is a logical grouping of embeddings that works together to support a workload.
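
As a quick illustration, the sketch below creates a vector search collection with the AWS SDK for Python (Boto3). The collection name is a placeholder, and the sketch assumes the encryption, network, and data access policies that OpenSearch Serverless requires for that collection name are already in place.

import boto3

aoss = boto3.client("opensearchserverless")

# Create a collection of type VECTORSEARCH to store vector embeddings
response = aoss.create_collection(
    name="my-vector-collection",
    type="VECTORSEARCH",
    description="Collection for vector embeddings",
)
print(response["createCollectionDetail"]["status"])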

The vector engine uses OpenSearch Compute Units (OCUs), a unit of compute capacity, to ingest data and run similarity search queries. One OCU can handle up to 2 million vectors for 128 dimensions or 500,000 vectors for 768 dimensions at a 99 percent recall rate.

The vector engine built on OpenSearch Serverless is a highly available service by default. It requires a minimum of four OCUs (2 OCUs for the ingest, including primary and standby, and 2 OCUs for the search with two active replicas across Availability Zones) for the first collection in an account. All subsequent collections using the same AWS Key Management Service (AWS KMS) key can share those OCUs.

What’s new at GA?
Since the preview, the vector engine for Amazon OpenSearch Serverless became one of the vector database options in the knowledge base of Amazon Bedrock to build generative AI applications using a Retrieval Augmented Generation (RAG) concept.

Here are some new or improved features for this GA release:

Disable redundant replica (development and test focused) option
As we announced in our preview blog post, this feature eliminates the need to have redundant OCUs in another Availability Zone solely for availability purposes. A collection can be deployed with two OCUs – one for indexing and one for search. This cuts costs in half compared to the default deployment with redundant replicas. The reduced cost makes this configuration suitable and economical for development and testing workloads.

With this option, we will still provide durability guarantees since the vector engine persists all the data in Amazon S3, but single-AZ failures would impact your availability.

If you want to disable a redundant replica, uncheck Enable redundancy when creating a new vector search collection.
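
If you create the collection programmatically instead, my understanding is that the equivalent is the standbyReplicas parameter of the CreateCollection API, as in this brief sketch (same prerequisites as the earlier create_collection example):

# Development/test collection without redundant replicas
response = aoss.create_collection(
    name="my-dev-vector-collection",
    type="VECTORSEARCH",
    standbyReplicas="DISABLED",
)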

Fractional OCU for the development and test focused option
Support for fractional OCU billing for development and test focused workloads (that is, the no redundant replica option) reduces the floor price for vector search collections. The vector engine will initially deploy smaller 0.5 OCUs while providing the same capabilities at lower scale and will scale up to a full OCU and beyond to meet your workload demand. This option will further reduce the monthly costs when experimenting with the vector engine.

Automatic scaling for a billion scale
With vector engine’s seamless auto-scaling, you no longer have to reindex for scaling purposes. At preview, we were supporting about 20 million vector embeddings. With the general availability of vector engine, we have raised the limits to support a billion vector scale.

Now available
The vector engine for Amazon OpenSearch Serverless is now available in all AWS Regions where Amazon OpenSearch Serverless is available.

To get started, you can refer to the following resources:

Give it a try and send feedback to AWS re:Post for Amazon OpenSearch Service or through your usual AWS support contacts.

Channy

Introducing Amazon SageMaker HyperPod, a purpose-built infrastructure for distributed training at scale

Today, we are introducing Amazon SageMaker HyperPod, which helps reduce the time to train foundation models (FMs) by providing a purpose-built infrastructure for distributed training at scale. You can now use SageMaker HyperPod to train FMs for weeks or even months while SageMaker actively monitors the cluster health and provides automated node and job resiliency by replacing faulty nodes and resuming model training from a checkpoint.

The clusters come preconfigured with SageMaker’s distributed training libraries that help you split your training data and model across all the nodes to process them in parallel and fully utilize the cluster’s compute and network infrastructure. You can further customize your training environment by installing additional frameworks, debugging tools, and optimization libraries.

Let me show you how to get started with SageMaker HyperPod. In the following demo, I create a SageMaker HyperPod and show you how to train a Llama 2 7B model using the example shared in the AWS ML Training Reference Architectures GitHub repository.

Create and manage clusters
As the SageMaker HyperPod admin, you can create and manage clusters using the AWS Management Console or AWS Command Line Interface (AWS CLI). In the console, navigate to Amazon SageMaker, select Cluster management under HyperPod Clusters in the left menu, then choose Create a cluster.

Amazon SageMaker HyperPod Clusters

In the setup that follows, provide a cluster name and configure instance groups with your instance types of choice and the number of instances to allocate to each instance group.

Amazon SageMaker HyperPod

You also need to prepare and upload one or more lifecycle scripts to your Amazon Simple Storage Service (Amazon S3) bucket to run in each instance group during cluster creation. With lifecycle scripts, you can customize your cluster environment and install required libraries and packages. You can find example lifecycle scripts for SageMaker HyperPod in the GitHub repo.
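
For example, you could upload a directory of lifecycle scripts with a short Boto3 snippet like the following; the bucket name, prefix, and local directory are placeholders that should match the SourceS3Uri used later in the cluster configuration.

import os
import boto3

s3 = boto3.client("s3")
bucket = "<your-s3-bucket>"
prefix = "<lifecycle-script-directory>"

# Upload every file in the local lifecycle-scripts directory to the S3 prefix
for file_name in os.listdir("lifecycle-scripts"):
    s3.upload_file(
        os.path.join("lifecycle-scripts", file_name),
        bucket,
        f"{prefix}/{file_name}",
    )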

Using the AWS CLI
You can also use the AWS CLI to create and manage clusters. For my demo, I specify my cluster configuration in a JSON file. I choose to create two instance groups, one for the cluster controller node(s) that I call “controller-group,” and one for the cluster worker nodes that I call “worker-group.” For the worker nodes that will perform model training, I specify Amazon EC2 Trn1 instances powered by AWS Trainium chips.

// demo-cluster.json
{
   "InstanceGroups": [
        {
            "InstanceGroupName": "controller-group",
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "lifecycleConfig": {
                "SourceS3Uri": "s3://<your-s3-bucket>/<lifecycle-script-directory>/",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/my-role-for-cluster",
            "ThreadsPerCore": 1
        },
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "trn1.32xlarge",
            "InstanceCount": 4,
            "lifecycleConfig": {
                "SourceS3Uri": "s3://<your-s3-bucket>/<lifecycle-script-directory>/",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/my-role-for-cluster",
            "ThreadsPerCore": 1
        }
    ]
}

To create the cluster, I run the following AWS CLI command:

aws sagemaker create-cluster \
    --cluster-name antje-demo-cluster \
    --instance-groups file://demo-cluster.json

Upon creation, you can use aws sagemaker describe-cluster and aws sagemaker list-cluster-nodes to view your cluster and node details. Note down the cluster ID and instance ID of your controller node. You need that information to connect to your cluster.
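
If you prefer the AWS SDK for Python (Boto3), the same information is available from the corresponding SageMaker APIs. This short sketch assumes the cluster name used above:

import boto3

sagemaker = boto3.client("sagemaker")

# Cluster status and details
print(sagemaker.describe_cluster(ClusterName="antje-demo-cluster"))

# Instance IDs and status of the controller and worker nodes
print(sagemaker.list_cluster_nodes(ClusterName="antje-demo-cluster"))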

You also have the option to attach a shared file system, such as Amazon FSx for Lustre. To use FSx for Lustre, you need to set up your cluster with an Amazon Virtual Private Cloud (Amazon VPC) configuration. Here’s an AWS CloudFormation template that shows how to create a SageMaker VPC and how to deploy FSx for Lustre.

Connect to your cluster
As a cluster user, you need to have access to the cluster provisioned by your cluster admin. With access permissions in place, you can connect to the cluster using SSH to schedule and run jobs. You can use the preinstalled AWS CLI plugin for AWS Systems Manager to connect to the controller node of your cluster.

For my demo, I run the following command, specifying my cluster ID and the instance ID of the controller node as the target.

aws ssm start-session \
    --target sagemaker-cluster:ntg44z9os8pn_i-05a854e0d4358b59c \
    --region us-west-2

Schedule and run jobs on the cluster using Slurm
At launch, SageMaker HyperPod supports Slurm for workload orchestration. Slurm is a popular open source cluster management and job scheduling system. You can install and set up Slurm through lifecycle scripts as part of the cluster creation, and the example lifecycle scripts show how. Then, you can use standard Slurm commands to schedule and launch jobs. Check out the Slurm Quick Start User Guide for architecture details and helpful commands.

For this demo, I’m using this example from the AWS ML Training Reference Architectures GitHub repo that shows how to train Llama 2 7B on Slurm with Trn1 instances. My cluster is already set up with Slurm, and I have an FSx for Lustre file system mounted.

Note
The Llama 2 model is governed by Meta. You can request access through the Meta request access page.

Set up the cluster environment
SageMaker HyperPod supports training in a range of environments, including Conda, venv, Docker, and enroot. Following the instructions in the README, I build my virtual environment aws_neuron_venv_pytorch and set up the torch_neuronx and neuronx-nemo-megatron libraries for training models on Trn1 instances.

Prepare model, tokenizer, and dataset
I follow the instructions to download the Llama 2 model and tokenizer and convert the model into the Hugging Face format. Then, I download and tokenize the RedPajama dataset. As a final preparation step, I pre-compile the Llama 2 model using ahead-of-time (AOT) compilation to speed up model training.

Launch jobs on the cluster
Now, I’m ready to start my model training job using the sbatch command.

sbatch --nodes 4 --auto-resume=1 run.slurm ./llama_7b.sh

You can use the squeue command to view the job queue. Once the training job is running, the SageMaker HyperPod resiliency features are automatically enabled. SageMaker HyperPod will automatically detect hardware failures, replace nodes as needed, and resume training from checkpoints if the auto-resume parameter is set, as shown in the preceding command.

You can view the output of the model training job in the following file:

tail -f slurm-run.slurm-<JOB_ID>.out

A sample output indicating that model training has started will look like this:

Epoch 0:  22%|██▏       | 4499/20101 [22:26:14<77:48:37, 17.95s/it, loss=2.43, v_num=5563, reduced_train_loss=2.470, gradient_norm=0.121, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.40]
Epoch 0:  22%|██▏       | 4500/20101 [22:26:32<77:48:18, 17.95s/it, loss=2.43, v_num=5563, reduced_train_loss=2.470, gradient_norm=0.121, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.40]
Epoch 0:  22%|██▏       | 4500/20101 [22:26:32<77:48:18, 17.95s/it, loss=2.44, v_num=5563, reduced_train_loss=2.450, gradient_norm=0.120, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.50]

To further monitor and profile your model training jobs, you can use SageMaker hosted TensorBoard or any other tool of your choice.

Now available
SageMaker HyperPod is available today in AWS Regions US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Learn more:

  • See Amazon SageMaker HyperPod for pricing information and a list of supported cluster instance types
  • Check out the Developer Guide
  • Visit the AWS Management Console to start training your FMs with SageMaker HyperPod

— Antje

PS: Writing a blog post at AWS is always a team effort, even when you see only one name under the post title. In this case, I want to thank Brad Doran, Justin Pirtle, Ben Snyder, Pierre-Yves Aquilanti, Keita Watanabe, and Verdi March for their generous help with example code and sharing their expertise in managing large-scale model training infrastructures, Slurm, and SageMaker HyperPod.

Amazon Titan Image Generator, Multimodal Embeddings, and Text models are now available in Amazon Bedrock

Today, we’re introducing two new Amazon Titan multimodal foundation models (FMs): Amazon Titan Image Generator (preview) and Amazon Titan Multimodal Embeddings. I’m also happy to share that Amazon Titan Text Lite and Amazon Titan Text Express are now generally available in Amazon Bedrock. You can now choose from three available Amazon Titan Text FMs, including Amazon Titan Text Embeddings.

Amazon Titan models incorporate 25 years of artificial intelligence (AI) and machine learning (ML) innovation at Amazon and offer a range of high-performing image, multimodal, and text model options through a fully managed API. AWS pre-trained these models on large datasets, making them powerful, general-purpose models built to support a variety of use cases while also supporting the responsible use of AI.

You can use the base models as is, or you can privately customize them with your own data. To enable access to Amazon Titan FMs, navigate to the Amazon Bedrock console and select Model access on the bottom left menu. On the model access overview page, choose Manage model access and enable access to the Amazon Titan FMs.

Amazon Titan Models

Let me give you a quick tour of the new models.

Amazon Titan Image Generator (preview)
As a content creator, you can now use Amazon Titan Image Generator to quickly create and refine images using English natural language prompts. This helps companies in advertising, e-commerce, and media and entertainment to create studio-quality, realistic images in large volumes and at low cost. The model makes it easy to iterate on image concepts by generating multiple image options based on the text descriptions. The model can understand complex prompts with multiple objects and generates relevant images. It is trained on high-quality, diverse data to create more accurate outputs, such as realistic images with inclusive attributes and limited distortions.

Titan Image Generator’s image editing features include the ability to automatically edit an image with a text prompt using a built-in segmentation model. The model supports inpainting with an image mask and outpainting to extend or change the background of an image. You can also configure image dimensions and specify the number of image variations you want the model to generate.

In addition, you can customize the model with proprietary data to generate images consistent with your brand guidelines or to generate images in a specific style, for example, by fine-tuning the model with images from a previous marketing campaign. Titan Image Generator also mitigates harmful content generation to support the responsible use of AI. All images generated by Amazon Titan contain an invisible watermark, by default, designed to help reduce the spread of misinformation by providing a discreet mechanism to identify AI-generated images.

Amazon Titan Image Generator in action
You can start using the model in the Amazon Bedrock console either by submitting an English natural language prompt to generate images or by uploading an image for editing. In the following example, I show you how to generate an image with Amazon Titan Image Generator using the AWS SDK for Python (Boto3).

First, let’s have a look at the configuration options for image generation that you can specify in the body of the inference request. For task type, I choose TEXT_IMAGE to create an image from a natural language prompt.

import boto3
import json

bedrock = boto3.client(service_name="bedrock")
bedrock_runtime = boto3.client(service_name="bedrock-runtime")

# ImageGenerationConfig Options:
#   numberOfImages: Number of images to be generated
#   quality: Quality of generated images, can be standard or premium
#   height: Height of output image(s)
#   width: Width of output image(s)
#   cfgScale: Scale for classifier-free guidance
#   seed: The seed to use for reproducibility  

body = json.dumps(
    {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": "green iguana",   # Required
#           "negativeText": "<text>"  # Optional
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,   # Range: 1 to 5 
            "quality": "premium",  # Options: standard or premium
            "height": 768,         # Supported height list in the docs 
            "width": 1280,         # Supported width list in the docs
            "cfgScale": 7.5,       # Range: 1.0 (exclusive) to 10.0
            "seed": 42             # Range: 0 to 214783647
        }
    }
)

Next, specify the model ID for Amazon Titan Image Generator and use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
    body=body, 
    modelId="amazon.titan-image-generator-v1" 
    accept="application/json", 
    contentType="application/json"
)

Then, parse the response and decode the base64-encoded image.

import base64
from PIL import Image
from io import BytesIO

response_body = json.loads(response.get("body").read())
images = [Image.open(BytesIO(base64.b64decode(base64_image))) for base64_image in response_body.get("images")]

for img in images:
    display(img)  # display() renders the image inline in a Jupyter notebook

Et voilà, here’s the green iguana (one of my favorite animals, actually):

Green iguana generated by Amazon Titan Image Generator

To learn more about all the Amazon Titan Image Generator features, visit the Amazon Titan product page. (You’ll see more of the iguana over there.)

Next, let’s use this image with the new Amazon Titan Multimodal Embeddings model.

Amazon Titan Multimodal Embeddings
Amazon Titan Multimodal Embeddings helps you build more accurate and contextually relevant multimodal search and recommendation experiences for end users. Multimodal refers to a system’s ability to process and generate information using distinct types of data (modalities). With Titan Multimodal Embeddings, you can submit text, image, or a combination of the two as input.

The model converts images and short English text up to 128 tokens into embeddings, which capture semantic meaning and relationships between your data. You can also fine-tune the model on image-caption pairs. For example, you can combine text and images to describe company-specific manufacturing parts to understand and identify parts more effectively.

By default, Titan Multimodal Embeddings generates vectors of 1,024 dimensions, which you can use to build search experiences that offer a high degree of accuracy and speed. You can also configure smaller vector dimensions to optimize for speed and price performance. The model provides an asynchronous batch API, and the Amazon OpenSearch Service will soon offer a connector that adds Titan Multimodal Embeddings support for neural search.

Amazon Titan Multimodal Embeddings in action
For this demo, I create a combined image and text embedding. First, I base64-encode my image, and then I specify either inputText, inputImage, or both in the body of the inference request.

# Maximum image size supported is 2048 x 2048 pixels
with open("iguana.png", "rb") as image_file:
    input_image = base64.b64encode(image_file.read()).decode('utf8')

# You can specify either text or image or both
body = json.dumps(
    {
        "inputText": "Green iguana on tree branch",
        "inputImage": input_image
    }
)
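
If you want to optimize for speed and price performance as mentioned earlier, my understanding is that you can also request smaller vectors by adding an embeddingConfig section to the same request body; the outputEmbeddingLength value below is an assumption based on the model's documented options, and the rest of the request is unchanged.

body = json.dumps(
    {
        "inputText": "Green iguana on tree branch",
        "inputImage": input_image,
        "embeddingConfig": {"outputEmbeddingLength": 384}  # e.g. 256, 384, or 1024
    }
)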

Next, specify the model ID for Amazon Titan Multimodal Embeddings and use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
	body=body, 
	modelId="amazon.titan-embed-image-v1", 
	accept="application/json", 
	contentType="application/json"
)

Let’s see the response.

response_body = json.loads(response.get("body").read())
print(response_body.get("embedding"))
	
[-0.015633494, -0.011953583, -0.022617092, -0.012395329, 0.03954641, 0.010079376, 0.08505301, -0.022064181, -0.0037248489, ...]

I redacted the output for brevity. The distance between multimodal embedding vectors, measured with metrics like cosine similarity or euclidean distance, shows how similar or different the represented information is across modalities. Smaller distances mean more similarity, while larger distances mean more dissimilarity.
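
To make that concrete, here is a small helper that computes cosine similarity between two embedding vectors; it is a sketch using NumPy, and the variable names in the usage comment are placeholders for embeddings returned by earlier invoke_model calls.

import numpy as np

def cosine_similarity(a, b):
    # Values closer to 1 mean the two embeddings represent more similar content
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: compare the image+text embedding with the embedding of a text query
# score = cosine_similarity(multimodal_embedding, query_embedding)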

As a next step, you could build an image database by storing and indexing the multimodal embeddings in a vector store or vector database. To implement text-to-image search, query the database with inputText. For image-to-image search, query the database with inputImage. For image+text-to-image search, query the database with both inputImage and inputText.

Amazon Titan Text
Amazon Titan Text Lite and Amazon Titan Text Express are large language models (LLMs) that support a wide range of text-related tasks, including summarization, translation, and conversational chatbot systems. They can also generate code and are optimized to support popular programming languages and text formats like JSON and CSV.

Titan Text Express – Titan Text Express has a maximum context length of 8,192 tokens and is ideal for a wide range of tasks, such as open-ended text generation and conversational chat, as well as support within Retrieval Augmented Generation (RAG) workflows.

Titan Text Lite – Titan Text Lite has a maximum context length of 4,096 tokens and is a price-performant version that is ideal for English-language tasks. The model is highly customizable and can be fine-tuned for tasks such as article summarization and copywriting.

Amazon Titan Text in action
For this demo, I ask Titan Text to write an email to my team members suggesting they organize a live stream: “Compose a short email from Antje, Principal Developer Advocate, encouraging colleagues in the developer relations team to organize a live stream to demo our new Amazon Titan V1 models.”

prompt = "Compose a short email from Antje, Principal Developer Advocate, encouraging colleagues in the developer relations team to organize a live stream to demo our new Amazon Titan V1 models."

body = json.dumps({
    "inputText": prompt, 
    "textGenerationConfig": {  
        "maxTokenCount": 512,
        "stopSequences": [],
        "temperature": 0,
        "topP": 0.9
    }
})

Titan Text FMs support temperature and topP inference parameters to control the randomness and diversity of the response, as well as maxTokenCount and stopSequences to control the length of the response.

Next, choose the model ID for one of the Titan Text models and use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
    body=body,
    # Choose modelId
    # Titan Text Express: "amazon.titan-text-express-v1"
    # Titan Text Lite: "amazon.titan-text-lite-v1"
    modelId="amazon.titan-text-express-v1",
    accept="application/json", 
    contentType="application/json"
)

Let’s have a look at the response.

response_body = json.loads(response.get('body').read())
outputText = response_body.get('results')[0].get('outputText')

# Keep the text after the first newline of the model output
text = outputText[outputText.index('\n')+1:]
email = text.strip()
print(email)

Subject: Demo our new Amazon Titan V1 models live!

Dear colleagues,

I hope this email finds you well. I am excited to announce that we have recently launched our new Amazon Titan V1 models, and I believe it would be a great opportunity for us to showcase their capabilities to the wider developer community.

I suggest that we organize a live stream to demo these models and discuss their features, benefits, and how they can help developers build innovative applications. This live stream could be hosted on our YouTube channel, Twitch, or any other platform that is suitable for our audience.

I believe that showcasing our new models will not only increase our visibility but also help us build stronger relationships with developers. It will also provide an opportunity for us to receive feedback and improve our products based on the developer’s needs.

If you are interested in organizing this live stream, please let me know. I am happy to provide any support or guidance you may need. Together, let’s make this live stream a success and showcase the power of Amazon Titan V1 models to the world!

Best regards,
Antje
Principal Developer Advocate

Nice. I could send this email right away!

Availability and pricing
Amazon Titan Text FMs are available today in AWS Regions US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore, Tokyo), and Europe (Frankfurt). Amazon Titan Multimodal Embeddings is available today in the AWS Regions US East (N. Virginia) and US West (Oregon). Amazon Titan Image Generator is available in public preview in the AWS Regions US East (N. Virginia) and US West (Oregon). For pricing details, see the Amazon Bedrock Pricing page.

Learn more

Go to the AWS Management Console to start building generative AI applications with Amazon Titan FMs on Amazon Bedrock today!

— Antje

Amazon Bedrock now provides access to Anthropic’s latest model, Claude 2.1

Today, we’re announcing the availability of Anthropic’s Claude 2.1 foundation model (FM) in Amazon Bedrock. Last week, Anthropic introduced its latest model, Claude 2.1, delivering key capabilities for enterprises such as an industry-leading 200,000 token context window (2x the context of Claude 2.0), reduced rates of hallucination, improved accuracy over long documents, system prompts, and a beta tool use feature for function calling and workflow orchestration.

With Claude 2.1’s availability in Amazon Bedrock, you can build enterprise-ready generative artificial intelligence (AI) applications using more honest and reliable AI systems from Anthropic. You can now use the Claude 2.1 model provided by Anthropic in the Amazon Bedrock console.

Here are some key highlights about the new Claude 2.1 model in Amazon Bedrock:

200,000 token context window – Enterprise applications demand larger context windows and more accurate outputs when working with long documents such as product guides, technical documentation, or financial or legal statements. Claude 2.1 supports 200,000 tokens, the equivalent of roughly 150,000 words or over 500 pages of documents. When uploading extensive information to Claude, you can summarize, perform Q&A, forecast trends, and compare and contrast multiple documents for drafting business plans and analyzing complex contracts.

Strong accuracy upgrades – Claude 2.1 has also made significant gains in honesty, with a 2x decrease in hallucination rates, 50 percent fewer hallucinations in open-ended conversation and document Q&A, a 30 percent reduction in incorrect answers, and a 3–4 times lower rate of mistakenly concluding that a document supports a particular claim compared to Claude 2.0. Claude increasingly knows what it doesn’t know and will more likely demur rather than hallucinate. With this improved accuracy, you can build more reliable, mission-critical applications for your customers and employees.

System prompts – Claude 2.1 now supports system prompts, a new feature that can improve Claude’s performance in a variety of ways, including greater character depth and role adherence in role-playing scenarios, particularly over longer conversations, as well as stricter adherence to guidelines, rules, and instructions. This represents a structural change, but not a content change from former ways of prompting Claude.

Tool use for function calling and workflow orchestration – Available as a beta feature, Claude 2.1 can now integrate with your existing internal processes, products, and APIs to build generative AI applications. Claude 2.1 accurately retrieves and processes data from additional knowledge sources as well as invokes functions for a given task.  Claude 2.1 can answer questions by searching databases using private APIs and a web search API, translate natural language requests into structured API calls, or connect to product datasets to make recommendations and help customers complete purchases. Access to this feature is currently limited to select early access partners, with plans for open access in the near future. If you are interested in gaining early access, please contact your AWS account team.

To learn more about Claude 2.1’s features and capabilities, visit Anthropic Claude on Amazon Bedrock and the Amazon Bedrock documentation.

Claude 2.1 in action
To get started with Claude 2.1 in Amazon Bedrock, go to the Amazon Bedrock console. Choose Model access on the bottom left pane, then choose Manage model access on the top right side, submit your use case, and request model access to the Anthropic Claude model. It may take several minutes to get access to models. If you already have access to the Claude model, you don’t need to request access separately for Claude 2.1.

To test Claude 2.1 in chat mode, choose Text or Chat under Playgrounds in the left menu pane. Then select Anthropic and then Claude v2.1.

By choosing View API request, you can also access the model via code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. Here is a sample of the AWS CLI command:

$ aws bedrock-runtime invoke-model \
      --model-id anthropic.claude-v2:1 \
      --body '{"prompt":"\n\nHuman: Tell me a funny joke about outer space!\n\nAssistant:", "max_tokens_to_sample": 50}' \
      --cli-binary-format raw-in-base64-out \
      invoke-model-output.txt

You can use system prompt engineering techniques provided by the Claude 2.1 model, where you place your inputs and documents before any questions that reference or utilize that content. Inputs can be natural language text, structured documents, or code snippets using <document>, <papers>, <books>, or <code> tags, and so on. You can also use conversational text, such as chat history, and Retrieval Augmented Generation (RAG) results, such as chunked documents.

Here is a system prompt example for support agents to respond to customer questions based on corporate documents.

Here are some documents for you to reference for your task:
<documents>
  <document index="1">
    <document_content>
    (the text content of the document - could be a passage, web page, article, etc.)
    </document_content>
  </document>
  <document index="2">
    <source>https://mycompany.repository/userguide/what-is-it.html</source>
  </document>
  <document index="3">
    <source>https://mycompany.repository/docs/techspec.pdf</source>
  </document>
...
</documents>

You are Larry, and you are a customer advisor with deep knowledge of your company's products. Larry has a great deal of patience with his customers, even when they say nonsense or are sarcastic. Larry's answers are polite but sometimes funny. However, he only answers questions about the company's products and doesn't know much about other questions. Use the provided documentation to answer user questions.

Human: Your product is making a weird stuttering sound when I operate it. What might be the problem?
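
As a rough sketch of how you might send this system prompt through the Bedrock Runtime API with the AWS SDK for Python (Boto3): the system_prompt variable is assumed to hold the document block and Larry persona shown above, and the inference parameters are illustrative only.

import json
import boto3

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

system_prompt = "Here are some documents for you to reference for your task: ..."  # the full prompt shown above
question = "Your product is making a weird stuttering sound when I operate it. What might be the problem?"

# Claude's text-completion format places the system prompt before the first Human turn
body = json.dumps({
    "prompt": f"{system_prompt}\n\nHuman: {question}\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.5
})

response = bedrock_runtime.invoke_model(
    body=body,
    modelId="anthropic.claude-v2:1",
    accept="application/json",
    contentType="application/json"
)
print(json.loads(response["body"].read())["completion"])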

To learn more about prompt engineering on Amazon Bedrock, see the Prompt engineering guidelines included in the Amazon Bedrock documentation. You can learn general prompt techniques, templates, and examples for Amazon Bedrock text models, including Claude.

Now available
Claude 2.1 is available today in the US East (N. Virginia) and US West (Oregon) Regions.

You only pay for what you use, with no time-based term commitments for on-demand mode. For text generation models, you are charged for every input token processed and every output token generated. Or you can choose the provisioned throughput mode to meet your application’s performance requirements in exchange for a time-based term commitment. To learn more, see Amazon Bedrock Pricing.

Give Anthropic Claude 2.1 a try in Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Channy

New generative AI capabilities for Amazon DataZone to further simplify data cataloging and discovery (preview)

Today, we are announcing a preview of an automation feature backed by generative artificial intelligence (AI) for Amazon DataZone that will dramatically decrease the amount of time needed to provide context for organizational data. The new feature can automate the traditionally labor-intensive process of data cataloging. Powered by the large language models (LLMs) of Amazon Bedrock, it generates detailed descriptions of data assets and their schemas, and suggests analytical use cases. You can generate a comprehensive business context with a single click.

We heard from customers that data consumers such as data analysts, scientists, and engineers in organizations struggle to understand the data’s relevance when little metadata is available. As a result, they either spend more time interpreting the data, or they return to data producers with continued questions. So, data producers such as data owners, engineers, and analysts who own the data and make it available for consumers need to manually enter detailed context for higher-priority data to make data shareable and discoverable. This is time-consuming and is the number one problem customers have when trying to collate their data in a system for self-service by consumers.

When we launched the general availability of Amazon DataZone in October 2023, we introduced the first feature that brings generative AI capabilities to automate the generation of the table name and column names of a business catalog asset. In the data portal of Amazon DataZone, the green brain icon indicates automatically generated metadata suggestions. You could accept, edit, or reject each suggestion recommended by Amazon DataZone.

What’s new with today’s preview announcement?
Now, in addition to column and table names, you can automatically generate more detailed descriptions of the table and schema, as well as suggested uses.

In the Business Metadata tab in the data portal, when you choose Generate summary, new content will be generated to explain the table and its metadata.

You can also accept, edit, or reject this recommendation.

When you choose the Schema tab, you can also see new Description recommendations as well as the Name. You can review generated metadata and choose to accept, edit, or reject the recommendation.

This new feature will enhance data discoverability and reduce back-and-forth communication between data consumers and producers. In the future, you will have a richer search experience based on extensive data insights.

Join the preview
The new metadata generation capability is now available in preview in the US East (N. Virginia) and US West (Oregon) AWS Regions. With this new generative AI capability, you can reduce time-to-insight by accelerating data cataloging and boosting data discovery. To learn more, visit Amazon DataZone: Automate Data Discovery.

Give it a try and send feedback to AWS re:Post for Amazon DataZone or through your usual AWS Support contacts.

Channy

Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service is now available

Today, we are announcing the general availability of Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service, which lets you perform a search on your DynamoDB data by automatically replicating and transforming it without custom code or infrastructure. This zero-ETL integration reduces the operational burden and cost involved in writing code for a data pipeline architecture, keeping the data in sync, and updating code with frequent application changes, enabling you to focus on your application.

With this zero-ETL integration, Amazon DynamoDB customers can now use the powerful search features of Amazon OpenSearch Service, such as full-text search, fuzzy search, auto-complete, and vector search for machine learning (ML) capabilities to offer new experiences that boost user engagement and improve satisfaction with their applications.

This zero-ETL integration uses Amazon OpenSearch Ingestion to synchronize the data between Amazon DynamoDB and Amazon OpenSearch Service. You choose the DynamoDB table whose data needs to be synchronized and Amazon OpenSearch Ingestion synchronizes the data to an Amazon OpenSearch managed cluster or serverless collection within seconds of it being available.

You can also specify index mapping templates to ensure that your Amazon DynamoDB fields are mapped to the correct fields in your Amazon OpenSearch Service indexes. Also, you can synchronize data from multiple DynamoDB tables into one Amazon OpenSearch Service managed cluster or serverless collection to offer holistic insights across several applications.

Getting started with this zero-ETL integration
With a few clicks, you can synchronize data from DynamoDB to OpenSearch Service. To create an integration between DynamoDB and OpenSearch Service, choose the Integrations menu in the left pane of the DynamoDB console and the DynamoDB table whose data you want to synchronize.

You must turn on point-in-time recovery (PITR) and the DynamoDB Streams feature. This feature allows you to capture item-level changes in your table and push the changes to a stream. Choose Turn on for PITR and enable DynamoDB Streams in the Exports and streams tab.
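
If you prefer to script these prerequisites, the following Boto3 sketch enables both settings; the table name is a placeholder, and the stream view type shown is an assumption, so pick the one your pipeline needs.

import boto3

dynamodb = boto3.client("dynamodb")
table_name = "my-table"  # placeholder

# Turn on point-in-time recovery (PITR)
dynamodb.update_continuous_backups(
    TableName=table_name,
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Enable DynamoDB Streams to capture item-level changes
dynamodb.update_table(
    TableName=table_name,
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
)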

After turning on PITR and DynamoDB Streams, choose Create to set up an OpenSearch Ingestion pipeline in your account that replicates the data to an OpenSearch Service managed domain.

In the first step, enter a unique pipeline name and set up pipeline capacity and compute resources to automatically scale your pipeline based on the current ingestion workload.

Now you can configure the pre-defined pipeline configuration in YAML file format. You can browse resources to look up and paste information to build the pipeline configuration. This pipeline is a combination of a source part from DynamoDB settings and a sink part for OpenSearch Service.

You must set multiple IAM roles (sts_role_arn) with the necessary permissions to read data from the DynamoDB table and write to an OpenSearch domain. This role is then assumed by OpenSearch Ingestion pipelines to ensure that the right security posture is always maintained when moving the data from source to destination. To learn more, see Setting up roles and users in Amazon OpenSearch Ingestion in the AWS documentation.
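
For orientation, here is a rough sketch of what such a pipeline configuration can look like. The key names follow my understanding of the OpenSearch Ingestion (Data Prepper) DynamoDB source and OpenSearch sink, and all ARNs, bucket names, and endpoints are placeholders; the pre-defined blueprint in the console is the authoritative template and includes additional settings such as document ID and action mappings derived from the stream metadata.

version: "2"
dynamodb-pipeline:
  source:
    dynamodb:
      tables:
        - table_arn: "arn:aws:dynamodb:us-east-1:111122223333:table/my-table"
          stream:
            start_position: "LATEST"
          export:
            s3_bucket: "my-export-bucket"
            s3_prefix: "ddb-export/"
      aws:
        sts_role_arn: "arn:aws:iam::111122223333:role/my-pipeline-role"
        region: "us-east-1"
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "my-table-index"
        aws:
          sts_role_arn: "arn:aws:iam::111122223333:role/my-pipeline-role"
          region: "us-east-1"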

After entering all required values, you can validate the pipeline configuration to ensure that your configuration is valid. To learn more, see Creating Amazon OpenSearch Ingestion pipelines in the AWS documentation.

It takes a few minutes to set up the OpenSearch Ingestion pipeline, after which you can see that the integration is complete for your DynamoDB table.

Now you can search synchronized items in the OpenSearch Dashboards.

Things to know
Here are a couple of things that you should know about this feature:

  • Custom schema – You can specify your custom data schema along with the index mappings used by OpenSearch Ingestion when writing data from Amazon DynamoDB to OpenSearch Service. This experience is added to the console within Amazon DynamoDB so that you have full control over the format of indices that are created on OpenSearch Service.
  • Pricing – There will be no additional cost to use this feature apart from the cost of the existing underlying components. Note that Amazon OpenSearch Ingestion charges for OpenSearch Compute Units (OCUs), which will be used to replicate data between Amazon DynamoDB and Amazon OpenSearch Service. Furthermore, this feature uses Amazon DynamoDB Streams for the change data capture (CDC), and you will incur the standard costs for Amazon DynamoDB Streams.
  • Monitoring – You can monitor the state of the pipelines by checking the status of the integration on the DynamoDB console or using the OpenSearch Ingestion dashboard. Additionally, you can use Amazon CloudWatch to provide real-time metrics and logs, which lets you set up alerts in case of a breach of user-defined thresholds.

Now available
Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service is now generally available in all AWS Regions where OpenSearch Ingestion is available today.

Channy

New generative AI features in Amazon Connect, including Amazon Q, facilitate improved contact center service

If you manage a contact center, then you know the critical role that agents play in helping your organization build customer trust and loyalty. Those of us who have reached out to a contact center know how important agents are in guiding us through complex decisions and providing fast and accurate solutions where needed. This can take time, and if not done correctly, it may lead to frustration.

Generative AI capabilities in Amazon Connect
Today, we’re announcing that the existing artificial intelligence (AI) features of Amazon Connect now have generative AI capabilities that are powered by large language models (LLMs) available through Amazon Bedrock to transform how contact centers provide service to customers. LLMs are pre-trained on vast amounts of data and are commonly known as foundation models (FMs); they can understand and learn, generate text, engage in interactive conversations, answer questions, summarize dialogs and documents, and provide recommendations.

Amazon Q in Connect: recommended responses and actions for faster customer support
Organizations are in a state of constant change. To maintain a high level of performance that keeps up with these organizational changes, contact centers continuously onboard, train, and coach agents. Even with training and coaching, agents must often search through different sources of information, such as product guides and organization policies, to provide exceptional service to customers. This can increase customer wait times, lowering customer satisfaction and increasing contact center costs.

Amazon Q in Connect, a generative AI-powered agent assistant that includes functionality formerly available as Amazon Connect Wisdom, understands customer intents and uses relevant sources of information to deliver accurate responses and actions for the agent to communicate and resolve unique customer needs, all in real-time. Try Amazon Q in Connect for no charge until March 1, 2024. The feature is easy to enable, and you can get started in the Amazon Connect console.

Amazon Connect Contact Lens: generative post-contact summarization for increased productivity
To improve customer interactions and make sure details are available for future reference, contact center managers rely on the notes that agents manually create after every customer interaction. These notes include details on how a customer issue was addressed, key moments of the conversation, and any pending follow-up items.

Amazon Connect Contact Lens now provides generative AI-powered post-contact summarization, and enables contact center managers to more efficiently monitor and help improve contact quality and agent performance. For example, you can use summaries to track commitments made to customers and ensure the prompt completion of follow-up actions. Moments after a customer interaction, Contact Lens now condenses the conversation into a concise and coherent summary.

Amazon Lex in Amazon Connect: assisted slot resolution
Using Amazon Lex, you can already build chatbots, virtual agents, and interactive voice response (IVR) systems that let your customers schedule an appointment without speaking to a human agent. For example, “I need to change my travel reservation for myself and my two children” might be difficult for a traditional bot to resolve to a numeric value (how many people are on the travel reservation?).

With the new assisted slot resolution feature, Amazon Lex can now resolve slot values in user utterances with greater accuracy (for example, answering the previous question with the correct numeric value of three). This is powered by the advanced reasoning capabilities of LLMs, which improve accuracy and provide a better customer experience. Learn about all the features of Amazon Lex, including the new generative AI-powered capabilities that help you build better self-service experiences.

Amazon Connect Customer Profiles: quicker creation of unified customer profiles for personalized customer experiences
Customers expect personalized customer service experiences. To provide this, contact centers need a comprehensive understanding of customers’ preferences, purchases, and interactions. To achieve that, contact center administrators create unified customer profiles by merging customer data from a number of applications. These applications each have different types of customer data stored in varied formats across a range of data stores. Stitching together data from these various data stores requires contact center administrators to understand their data and figure out how to organize and combine it into a unified format. To accomplish this, they spend weeks compiling unified customer profiles.

Starting today, Amazon Connect Customer Profiles uses LLMs to shorten the time needed to create unified customer profiles. When contact center administrators add data sources such as Amazon Simple Storage Service (Amazon S3), Adobe Analytics, Salesforce, ServiceNow, and Zendesk, Customer Profiles analyzes the data to understand what the data format and content represent and how the data relates to customers’ profiles. Then, Customer Profiles automatically determines how to organize and combine data from different sources into complete, accurate profiles. With just a few steps, managers can review, make any necessary edits, and complete the setup of customer profiles.

Review summary mapping

In-app, web, and video capabilities in Amazon Connect
As an organization, you want to provide great, easy-to-use, and convenient customer service. Earlier in this post I talked about self-service chatbots and how they help you with this. At times customers want to move beyond the chatbot, and beyond an audio conversation with the agent.

Amazon Connect now has in-app, web, and video capabilities to help you deliver rich, personalized customer experiences (see Amazon Lex features for details). Using the fully-managed communication widget, and with a few lines of code, you can implement these capabilities on your web and mobile applications. This allows your customers to get support from a web or mobile application without ever having to leave the page. Video can be enabled by either the agent only, by the customer only, or by both agent and customer.

Video calling

Amazon Connect SMS: two-way SMS capabilities
Almost everyone owns a mobile device and we love the flexibility of receiving text-based support on-the-go. Contact center leaders know this, and in the past have relied on disconnected, third-party solutions to provide two-way SMS to customers.

Amazon Connect now has two-way SMS capabilities to enable contact center leaders to provide this flexibility (see Amazon Lex features for details). This improves customer satisfaction and increases agent productivity without costly integration with third-party solutions. SMS chat can be enabled using the same configuration, Amazon Connect agent workspace, and analytics as calls and chats.

Learn more

Send feedback

Veliswa

New Amazon Q in QuickSight uses generative AI assistance for quicker, easier data insights (preview)

Today, I’m happy to share that Amazon Q in QuickSight is available for preview. Now you can experience the Generative BI capabilities in Amazon QuickSight announced on July 26, as well as two additional capabilities for business users.

Turning insights into impact faster with Amazon Q in QuickSight
With this announcement, business users can now generate compelling sharable stories examining their data, see executive summaries of dashboards surfacing key insights from data in seconds, and confidently answer questions of data not answered by dashboards and reports with a reimagined Q&A experience.

Before we go deeper into each capability, here’s a quick summary:

  • Stories — This is a new and visually compelling way to present and share insights. Stories can be automatically generated in minutes using natural language prompts, customized using point-and-click options, and shared securely with others.
  • Executive summaries — With this new capability, Amazon Q helps you to understand key highlights in your dashboard.
  • Data Q&A — This capability provides a new and easy-to-use natural-language Q&A experience to help you get answers for questions beyond what is available in existing dashboards and reports.​​

To get started, you need to enable Preview Q Generative Capabilities in Preview manager.

Once enabled, you’re ready to experience what Amazon Q in QuickSight brings for business users and business analysts building dashboards.

Stories automatically builds formatted narratives
Business users often need to share findings from their data with others to inform team decisions; this has historically involved taking data out of the business intelligence (BI) system. Stories are a new feature that enables business users to create beautifully formatted narratives describing their data, including visuals, images, and text in document or slide format, which can easily be shared with others directly within QuickSight.

Now, business users can use natural language to ask Amazon Q to build a story about their data by starting from the Amazon Q Build menu on an Amazon QuickSight dashboard. Amazon Q extracts data insights and statistics from selected visuals, then uses large language models (LLMs) to build a story in multiple parts, examining what the data may mean to the business and suggesting ideas to achieve specific goals.

For example, a sales manager can ask, “Build me a story about overall sales performance trends. Break down data by product and region. Suggest some strategies for improving sales.” Or, “Write a marketing strategy that uses regional sales trends to uncover opportunities that increase revenue.” Amazon Q will build a story exploring specific data insights, including strategies to grow sales.

Once a story is built, business users get point-and-click tools augmented with artificial intelligence (AI)-driven rewriting capabilities to customize it in a rich text editor: refine the message, add ideas, and highlight important details.

Stories can also be easily and securely shared with other QuickSight users by email.

Executive summaries deliver a quick snapshot of important information
Executive summaries are now available with a single click using the Amazon Q Build menu in Amazon QuickSight. Amazon QuickSight automatically determines interesting facts and statistics, then uses LLMs to write about the most interesting trends.

This new capability saves time in examining detailed dashboards by providing an at-a-glance view of key insights described using natural language.

The executive summaries feature provides two advantages. First, it helps business users surface key insights without having to browse through tens of visuals on a dashboard and interpret the changes in each one. Second, it enables readers to find key insights in the context of dashboards and reports with minimal effort.

New data Q&A experience
Once an interesting insight is discovered, business users frequently need to dig in to understand data more deeply than they can from existing dashboards and reports. Natural language query (NLQ) solutions designed to solve this problem frequently expect that users already know what fields may exist or how they should be combined to answer business questions. However, business users aren’t always experts in underlying data schemas, and their questions frequently come in more general terms, like “How were sales last week in NY?” Or, “What’s our top campaign?”

The new Q&A experience, accessed from within dashboards and reports, helps business users confidently answer questions about their data. It includes AI-suggested questions, a profile of what data can be asked about, and automatically generated multi-visual answers with narrative summaries that explain the data context.

Furthermore, Amazon Q brings the ability to answer vague questions and offer alternatives for specific data. For example, customers can ask a vague question, such as “Top products,” and Amazon Q will provide an answer that breaks down products by sales and offers alternatives for products by customer count and products by profit. Amazon Q explains the answer context in a narrative summarizing total sales and the number of products, and calls out the sales for the top product.

Customers can search for specific data values and even a single word such as, for example, the product name “contactmatcher.” Amazon Q returns a complete set of data related to that product and provides a natural language breakdown explaining important insights like total units sold. Specific visuals from the answers can also be added to a pinboard for easy future access.
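
The new experience lives inside dashboards and reports, but if you want to surface a natural-language Q&A entry point in your own application, QuickSight already exposes an embedding API for the Q search bar. The following is a minimal sketch using boto3; the account ID, user ARN, and topic ID are placeholders, and whether the reimagined preview experience is reachable through this route is an assumption on my part.

import boto3

# Placeholders -- replace with your own account, registered user, and Q topic.
ACCOUNT_ID = "111122223333"
USER_ARN = "arn:aws:quicksight:us-east-1:111122223333:user/default/business-user"
TOPIC_ID = "your-q-topic-id"

quicksight = boto3.client("quicksight", region_name="us-east-1")

# Generate a URL that embeds the Q search bar for a registered QuickSight user.
response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId=ACCOUNT_ID,
    UserArn=USER_ARN,
    SessionLifetimeInMinutes=60,
    ExperienceConfiguration={"QSearchBar": {"InitialTopicId": TOPIC_ID}},
)

print(response["EmbedUrl"])  # load this URL in an iframe in your application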

Watch the demo
To see these new capabilities in action, have a look at the demo.

Things to Know
Here are a few additional things that you need to know:

Join the preview
Amazon Q in QuickSight is available in preview today. To learn more and get started, visit the Amazon Q in QuickSight product page.

Happy building!
— Donnie

Introducing Amazon Q, a new generative AI-powered assistant (preview)

This post was originally published on this site

Today, we are announcing Amazon Q, a new generative artificial intelligence (AI)-powered assistant designed for work that can be tailored to your business. You can use Amazon Q to have conversations, solve problems, generate content, gain insights, and take action by connecting to your company’s information repositories, code, data, and enterprise systems. Amazon Q provides immediate, relevant information and advice to employees to streamline tasks, accelerate decision-making and problem-solving, and help spark creativity and innovation at work.

Amazon Q offers user-based plans, so you get features, pricing, and options tailored to how you use the product. Amazon Q can adapt its interactions to each individual user based on the existing identities, roles, and permissions of your business. AWS never uses customers’ content from Amazon Q to train the underlying models. In other words, your company information remains secure and private.

In this post, I’ll give you a quick tour of how you can use Amazon Q for general business use.

Amazon Q is your business expert
Let’s look at a few examples of how Amazon Q can help business users complete tasks using simple natural language prompts. As a marketing manager, you could ask Amazon Q to transform a press release into a blog post, create a summary of the press release, or create an email draft based on the provided release. Amazon Q searches through your company content, which can include internal style guides, for example, to provide a response appropriate to your company’s brand standards. Then, you could ask Amazon Q to generate tailored social media prompts to promote your story through each of your social media channels. Later, you can ask Amazon Q to analyze the results of your campaign and summarize them for leadership reviews.


In the following example, I deployed Amazon Q with access to my AWS News Blog posts from 2023 and called the assistant “AWS Blog Expert.”


Coming back to my previous example, let’s assume I’m a marketing manager and want Amazon Q to help me create social media posts for recent company blog posts.

I enter the following prompt: “Summarize the key insights from Antje’s recent AWS Weekly Roundup posts and craft a compelling social media post that not only highlights the most important points but also encourages engagement. Consider our target audience and aim for a tone that aligns with our brand identity. The social media post should be concise, informative, and enticing to encourage readers to click through and read the full articles. Please ensure the content is shareable and includes relevant hashtags for maximum visibility.”


Behind the scenes, Amazon Q searches the documents in connected data sources and creates a relevant and detailed suggestion for a social media post based on my blog posts. Amazon Q also tells me which document was used to generate the answer. In this case, it is the PDF file of the blog posts in question.
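
You can also drive the same conversation programmatically instead of through the web experience. Here is a minimal sketch, assuming the Amazon Q boto3 client and its ChatSync operation; the application ID and user ID are placeholders, and the response field names may differ slightly from what the service returns.

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

# Placeholders -- use the application ID from the Amazon Q console and a real user.
response = qbusiness.chat_sync(
    applicationId="your-application-id",
    userId="marketing-manager@example.com",
    userMessage=(
        "Summarize the key insights from Antje's recent AWS Weekly Roundup posts "
        "and draft a social media post that highlights the most important points."
    ),
)

# The generated answer plus the source documents it was grounded on.
print(response["systemMessage"])
for source in response.get("sourceAttributions", []):
    print(source.get("title"), source.get("url"))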

As an administrator, you can define the context for responses, restrict irrelevant topics, and configure whether to respond only using trusted company information or complement responses with knowledge from the underlying model. Restricting responses to trusted company information helps mitigate hallucinations, a common phenomenon where the underlying model generates responses that sound plausible but are based on misinterpreted or nonexistent data.

Amazon Q provides fine-grained access controls that restrict responses to only using data or acting based on the employee’s level of access and provides citations and references to the original sources for fact-checking and traceability. You can choose among 40+ built-in connectors for popular data sources and enterprise systems, including Amazon S3, Google Drive, Microsoft SharePoint, Salesforce, ServiceNow, and Slack.

How to tailor Amazon Q to your business
To tailor Amazon Q to your business, navigate to Amazon Q in the console, select Applications in the left menu, and choose Create application.


This starts the following workflow.

Step 1. Create application. Provide an application name, then create a new AWS Identity and Access Management (IAM) service role that Amazon Q is allowed to assume or select an existing one. I call my application AWS-Blog-Expert. Then, choose Create.
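
If you prefer to script this step instead of using the console, here is a minimal sketch with the Amazon Q boto3 client; the IAM role ARN is a placeholder for a role you have already created.

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

# Create the application (the role ARN below is a placeholder).
app = qbusiness.create_application(
    displayName="AWS-Blog-Expert",
    roleArn="arn:aws:iam::111122223333:role/AmazonQServiceRole",
)
print(app["applicationId"])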


Step 2. Select retriever. A retriever pulls data from the index in real time during a conversation. You can choose between two options: use the Amazon Q native retriever or use an existing Amazon Kendra retriever. The native retriever can connect to the data sources supported by Amazon Q. If you already use Amazon Kendra, you can select the existing Amazon Kendra retriever to connect the associated data sources to your Amazon Q application. I select the native retriever option. Then, choose Next.
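
Continuing the scripted variant of this walkthrough, the native retriever needs an index to pull from. The sketch below assumes the client's create_index and create_retriever operations; the application ID is a placeholder, and the configuration shape is simplified.

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")
application_id = "your-application-id"  # placeholder from the previous step

# Create an index, then attach a native retriever that reads from it.
index = qbusiness.create_index(
    applicationId=application_id,
    displayName="blog-posts-index",
)
retriever = qbusiness.create_retriever(
    applicationId=application_id,
    type="NATIVE_INDEX",
    displayName="native-retriever",
    configuration={"nativeIndexConfiguration": {"indexId": index["indexId"]}},
)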


Step 3. Connect data sources. Amazon Q comes with built-in connectors for popular data sources and enterprise systems. For this demo, I choose Amazon S3 and configure the data source by pointing to my S3 bucket with the PDFs of my blog posts.

Once the data source sync has completed successfully and the retriever shows the correct document count, you can preview the web experience and start a conversation. Note that the data source sync can take from a few minutes to a few hours, depending on the amount and size of data to index.
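
If you are scripting the setup, you can poll the sync status before previewing the web experience. This is a rough sketch; the start_data_source_sync_job and list_data_source_sync_jobs operations and the response fields shown here are assumptions on my part, and all IDs are placeholders.

import time
import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")
application_id = "your-application-id"
index_id = "your-index-id"
data_source_id = "your-data-source-id"

# Kick off a sync of the S3 data source into the index.
qbusiness.start_data_source_sync_job(
    applicationId=application_id,
    indexId=index_id,
    dataSourceId=data_source_id,
)

# Poll until the most recent sync job reaches a terminal state.
while True:
    jobs = qbusiness.list_data_source_sync_jobs(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    history = jobs.get("history", [])
    status = history[0]["status"] if history else "SYNCING"
    if status in ("SUCCEEDED", "FAILED", "ABORTED"):
        break
    time.sleep(30)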

You can also connect plugins that manage access to enterprise systems, including ServiceNow, Jira, Salesforce, and Zendesk. Plugins enable Amazon Q to perform user-requested tasks, such as creating support tickets or analyzing sales forecasts.


Preview and deploy web experience
In the application overview, choose Preview web experience. This opens the web experience, a conversational interface for chatting with the tailored Amazon Q AWS Blog Expert. In the final step, you deploy the Amazon Q web experience. You can integrate your SAML 2.0–compliant external identity provider (IdP) using IAM. Amazon Q can work with any IdP that’s compliant with SAML 2.0. Amazon Q uses service-initiated single sign-on (SSO) to authenticate users.

Join the preview
Amazon Q is available today in preview in AWS Regions US East (N. Virginia) and US West (Oregon). Visit the product page to learn how Amazon Q can become your expert in your business.

Also, check out the Amazon Q Slack Gateway GitHub repository, which shows how to make Amazon Q available to users as a Slack bot application.

Learn more

— Antje