Category Archives: AWS

Announcing Llama 3.1 405B, 70B, and 8B models from Meta in Amazon Bedrock

This post was originally published on this site

Today, we are announcing the availability of Llama 3.1 models in Amazon Bedrock. The Llama 3.1 models are Meta’s most advanced and capable models to date: a collection of 8B, 70B, and 405B parameter models that demonstrate state-of-the-art performance on a wide range of industry benchmarks and offer new capabilities for your generative artificial intelligence (generative AI) applications.

All Llama 3.1 models support a 128K context length (an increase of 120K tokens over Llama 3), giving them 16 times the context capacity of Llama 3 models, and offer improved reasoning for multilingual dialogue use cases in eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

You can now use three new Llama 3.1 models from Meta in Amazon Bedrock to build, experiment, and responsibly scale your generative AI ideas:

  • Llama 3.1 405B (preview) is the world’s largest publicly available large language model (LLM) according to Meta. The model sets a new standard for AI and is ideal for enterprise-level applications and research and development (R&D). It is well suited for tasks like synthetic data generation, where its outputs can be used to improve smaller Llama models, and for model distillation, which transfers knowledge from the 405B model to smaller models. This model excels at general knowledge, long-form text generation, multilingual translation, coding, math, tool use, enhanced contextual understanding, and advanced reasoning and decision-making. To learn more, visit the AWS Machine Learning Blog about using Llama 3.1 405B to generate synthetic data for model distillation.
  • Llama 3.1 70B is ideal for content creation, conversational AI, language understanding, R&D, and enterprise applications. The model excels at text summarization and accuracy, text classification, sentiment analysis and nuance reasoning, language modeling, dialogue systems, code generation, and following instructions.
  • Llama 3.1 8B is best suited for environments with limited computational power and resources. The model excels at text summarization, text classification, sentiment analysis, and language translation that requires low-latency inferencing.

Meta measured the performance of Llama 3.1 on over 150 benchmark datasets spanning a wide range of languages, as well as in extensive human evaluations. As you can see in the following chart, Llama 3.1 outperforms Llama 3 in every major benchmarking category.

To learn more about Llama 3.1 features and capabilities, visit the Llama 3.1 Model Card from Meta and Llama models in the AWS documentation.

You can take advantage of Llama 3.1’s responsible AI capabilities, combined with the data governance and model evaluation features of Amazon Bedrock, to build secure and reliable generative AI applications with confidence.

  • Guardrails for Amazon Bedrock – By creating multiple guardrails with configurations tailored to specific use cases and responsible AI policies, you can promote safe interactions between users and your generative AI applications. With Guardrails for Amazon Bedrock, you can continually monitor and analyze user inputs and model responses that might violate customer-defined policies, detect hallucinations in model responses that are not grounded in enterprise data or are irrelevant to the user’s query, and evaluate across different models, including custom and third-party models. To get started, visit Create a guardrail in the AWS documentation. A minimal API sketch appears after this list.
  • Model evaluation on Amazon Bedrock – You can evaluate, compare, and select the best Llama models for your use case in just a few steps using either automatic evaluation or human evaluation. With model evaluation on Amazon Bedrock, you can choose automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity. Alternatively, you can choose human evaluation workflows for subjective or custom metrics such as relevance, style, and alignment to brand voice. Model evaluation provides built-in curated datasets or you can bring in your own datasets. To get started, visit Get started with model evaluation in the AWS documentation.
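
As a rough illustration of the Guardrails workflow, the following sketch creates a simple guardrail with boto3 and then references it when calling a Llama 3.1 model through the Converse API. The guardrail name, filter strengths, and blocked messages are hypothetical placeholders, not values from this post.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Create a guardrail with a basic content filter (hypothetical name and settings).
guardrail = bedrock.create_guardrail(
    name="demo-llama-guardrail",
    description="Blocks harmful content for a Llama 3.1 chat application.",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)

# Apply the guardrail when calling the model through the Converse API.
response = runtime.converse(
    modelId="meta.llama3-1-8b-instruct-v1:0",
    messages=[{"role": "user", "content": [{"text": "Tell me about Llama 3.1."}]}],
    guardrailConfig={
        "guardrailIdentifier": guardrail["guardrailId"],
        "guardrailVersion": "DRAFT",
    },
)
print(response["output"]["message"]["content"][0]["text"])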

To learn more about how to keep your data and applications secure and private in AWS, visit the Amazon Bedrock Security and Privacy page.

Getting started with Llama 3.1 models in Amazon Bedrock
If you are new to using Llama models from Meta, go to the Amazon Bedrock console and choose Model access on the bottom left pane. To access the latest Llama 3.1 models from Meta, request access separately for Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, or Llama 3.1 405B Instruct.

To request to be considered for access to the preview of Llama 3.1 405B in Amazon Bedrock, contact your AWS account team or submit a support ticket via the AWS Management Console. When creating the support ticket, select Amazon Bedrock as the Service and Models as the Category.

To test the Llama 3.1 models in the Amazon Bedrock console, choose Text or Chat under Playgrounds in the left menu pane. Then choose Select model and select Meta as the category and Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, or Llama 3.1 405B Instruct as the model.

In the following example, I selected the Llama 3.1 405B Instruct model.

By choosing View API request, you can also access the model using code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. You can use model IDs such as meta.llama3-1-8b-instruct-v1:0, meta.llama3-1-70b-instruct-v1:0, or meta.llama3-1-405b-instruct-v1:0.

Here is a sample of the AWS CLI command:

aws bedrock-runtime invoke-model \
  --model-id meta.llama3-1-405b-instruct-v1:0 \
  --body '{"prompt":"[INST]You are a very intelligent bot with exceptional critical thinking[/INST] I went to the market and bought 10 apples. I gave 2 apples to your friend and 2 to the helper. I then went and bought 5 more apples and ate 1. How many apples did I remain with? Let'\''s think step by step.","max_gen_len":512,"temperature":0.5,"top_p":0.9}' \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  invoke-model-output.txt

You can use code examples for Llama models in Amazon Bedrock using AWS SDKs to build your applications using various programming languages. The following Python code examples show how to send a text message to Llama using the Amazon Bedrock Converse API for text generation.

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Set the model ID, e.g., Llama 3.1 405B Instruct.
model_id = "meta.llama3-1-405b-instruct-v1:0"

# Start a conversation with the user message.
user_message = "Describe the purpose of a 'hello world' program in one line."
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

You can also use all Llama 3.1 models (8B, 70B, and 405B) in Amazon SageMaker JumpStart. You can discover and deploy Llama 3.1 models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. You can operate your models with SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs under your virtual private cloud (VPC) controls, which help provide data security.
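
For reference, a minimal sketch of deploying a Llama 3.1 model programmatically with the SageMaker Python SDK might look like the following. The JumpStart model ID shown is an assumption based on Meta model naming in JumpStart; check SageMaker Studio or the JumpStart model listing for the exact identifier available in your Region.

from sagemaker.jumpstart.model import JumpStartModel

# Hypothetical JumpStart model ID for Llama 3.1 8B Instruct; verify the exact ID in SageMaker Studio.
model = JumpStartModel(model_id="meta-textgeneration-llama-3-1-8b-instruct")

# Deploy to a real-time endpoint; Meta models require accepting the EULA.
predictor = model.deploy(accept_eula=True)

# Run a simple inference request against the endpoint.
response = predictor.predict({
    "inputs": "Describe the purpose of a 'hello world' program in one line.",
    "parameters": {"max_new_tokens": 128, "temperature": 0.5},
})
print(response)

# Clean up the endpoint when finished.
predictor.delete_endpoint()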

The fine-tuning for Llama 3.1 models in Amazon Bedrock and Amazon SageMaker JumpStart will be coming soon. When you build fine-tuned models in SageMaker JumpStart, you will also be able to import your custom models into Amazon Bedrock. To learn more, visit Meta Llama 3.1 models are now available in Amazon SageMaker JumpStart on the AWS Machine Learning Blog.

For customers who want to deploy Llama 3.1 models on AWS through self-managed machine learning workflows for greater flexibility and control of underlying resources, AWS Trainium and AWS Inferentia-powered Amazon Elastic Compute Cloud (Amazon EC2) instances enable high performance, cost-effective deployment of Llama 3.1 models on AWS. To learn more, visit AWS AI chips deliver high performance and low cost for Meta Llama 3.1 models on AWS in the AWS Machine Learning Blog.

To celebrate this launch, Parkin Kent, Business Development Manager at Meta, talks about the power of the Meta and Amazon collaboration, highlighting how Meta and Amazon are working together to push the boundaries of what’s possible with generative AI.

Discover how businesses are leveraging Llama models in Amazon Bedrock to harness the power of generative AI. Nomura, a global financial services group spanning 30 countries and regions, is democratizing generative AI across its organization using Llama models in Amazon Bedrock.

Now available
Llama 3.1 8B and 70B models from Meta are generally available, and the Llama 3.1 405B model is available in preview today, in Amazon Bedrock in the US West (Oregon) Region. To request to be considered for access to the preview of Llama 3.1 405B in Amazon Bedrock, contact your AWS account team or submit a support ticket. Check the full Region list for future updates. To learn more, check out the Llama in Amazon Bedrock product page and the Amazon Bedrock pricing page.

Give Llama 3.1 a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions. Let me know what you build with Llama 3.1 in Amazon Bedrock!

Channy

AWS Weekly Roundup: Global AWS Heroes Summit, AWS Lambda, Amazon Redshift, and more (July 22, 2024)

This post was originally published on this site

Last week, AWS Heroes from around the world gathered to celebrate the 10th anniversary of the AWS Heroes program at Global AWS Heroes Summit. This program recognizes a select group of AWS experts worldwide who go above and beyond in sharing their knowledge and making an impact within developer communities.

Matt Garman, CEO of AWS and a long-time supporter of developer communities, made a special appearance for a Q&A session with the Heroes to listen to their feedback and respond to their questions.

Here’s an epic photo from the AWS Heroes Summit:

As Matt mentioned in his Linkedin post, “The developer community has been core to everything we have done since the beginning of AWS.” Thank you, Heroes, for all you do. Wishing you all a safe flight home.

Last week’s launches
Here are some launches that caught my attention last week:

Announcing the July 2024 updates to Amazon Corretto — The latest updates for the Corretto distribution of OpenJDK are now available. They include security and critical updates for the Long-Term Supported (LTS) and Feature Release (FR) versions.

New open-source Advanced MySQL ODBC Driver now available for Amazon Aurora and RDS — The new AWS ODBC Driver for MySQL provides faster switchover and failover times, and authentication support for AWS Secrets Manager and AWS Identity and Access Management (IAM), making it a more efficient and secure option for connecting to Amazon RDS and Amazon Aurora MySQL-Compatible Edition databases.

Productionize Fine-tuned Foundation Models from SageMaker Canvas — Amazon SageMaker Canvas now allows you to deploy fine-tuned Foundation Models (FMs) to SageMaker real-time inference endpoints, making it easier to integrate generative AI capabilities into your applications outside the SageMaker Canvas workspace.

AWS Lambda now supports SnapStart for Java functions that use the ARM64 architecture — Lambda SnapStart for Java functions on ARM64 architecture delivers up to 10x faster function startup performance and up to 34% better price performance compared to x86, enabling the building of highly responsive and scalable Java applications using AWS Lambda.

Amazon QuickSight improves controls performance — Amazon QuickSight has improved the performance of controls, allowing readers to interact with them immediately without having to wait for all relevant controls to reload. This enhancement reduces the loading time experienced by readers.

Amazon OpenSearch Serverless levels up speed and efficiency with smart caching — The new smart caching feature for indexing in Amazon OpenSearch Serverless automatically fetches and manages data, leading to faster data retrieval, efficient storage usage, and cost savings.

Amazon Redshift Serverless with lower base capacity available in the Europe (London) Region — Amazon Redshift Serverless now allows you to start with a lower data warehouse base capacity of 8 Redshift Processing Units (RPUs) in the Europe (London) Region, providing more flexibility and cost-effective options for small to large workloads.

AWS Lambda now supports Amazon MQ for ActiveMQ and RabbitMQ in five new regions — AWS Lambda now supports Amazon MQ for ActiveMQ and RabbitMQ in five new regions, enabling you to build serverless applications with Lambda functions that are invoked based on messages posted to Amazon MQ message brokers.

From community.aws
Here are my top five personal favorite posts from community.aws:

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. To learn more about future AWS Summit events, visit the AWS Summit page. Register in your nearest city: AWS Summit Taipei (July 23–24), AWS Summit Mexico City (Aug. 7), and AWS Summit Sao Paulo (Aug. 15).

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are in Aotearoa (Aug. 15), Nigeria (Aug. 24), New York (Aug. 28), and Belfast (Sept. 6).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Donnie

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Vector search for Amazon MemoryDB is now generally available

This post was originally published on this site

Today, we are announcing the general availability of vector search for Amazon MemoryDB, a new capability that you can use to store, index, retrieve, and search vectors to develop real-time machine learning (ML) and generative artificial intelligence (generative AI) applications with in-memory performance and multi-AZ durability.

With this launch, Amazon MemoryDB delivers the fastest vector search performance at the highest recall rates among popular vector databases on Amazon Web Services (AWS). You no longer have to make trade-offs around throughput, recall, and latency, which are traditionally in tension with one another.

You can now use one MemoryDB database to store your application data and millions of vectors with single-digit millisecond query and update response times at the highest levels of recall. This simplifies your generative AI application architecture while delivering peak performance and reducing licensing cost, operational burden, and time to deliver insights on your data.

With vector search for Amazon MemoryDB, you can use the existing MemoryDB API to implement generative AI use cases such as Retrieval Augmented Generation (RAG), anomaly (fraud) detection, document retrieval, and real-time recommendation engines. You can also generate vector embeddings using artificial intelligence and machine learning (AI/ML) services like Amazon Bedrock and Amazon SageMaker and store them within MemoryDB.

Which use cases would benefit most from vector search for MemoryDB?
You can use vector search for MemoryDB for the following specific use cases:

1. Real-time semantic search for retrieval-augmented generation (RAG)
You can use vector search to retrieve relevant passages from a large corpus of data to augment a large language model (LLM). This is done by taking your document corpus, chunking it into discrete buckets of text, generating vector embeddings for each chunk with embedding models such as the Amazon Titan Multimodal Embeddings G1 model, and then loading these vector embeddings into Amazon MemoryDB.

With RAG and MemoryDB, you can build real-time generative AI applications to find similar products or content by representing items as vectors, or you can search documents by representing text documents as dense vectors that capture semantic meaning.

2. Low latency durable semantic caching
Semantic caching is a process to reduce computational costs by storing previous results from the foundation model (FM) in-memory. You can store prior inferenced answers alongside the vector representation of the question in MemoryDB and reuse them instead of inferencing another answer from the LLM.

If a user’s query is semantically similar based on a defined similarity score to a prior question, MemoryDB will return the answer to the prior question. This use case will allow your generative AI application to respond faster with lower costs from making a new request to the FM and provide a faster user experience for your customers.
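
To make the semantic caching flow more concrete, here is a minimal sketch using redis-py against a MemoryDB cluster with a vector index over cached questions. The index name, field names, and similarity threshold are hypothetical, and the embed() and call_llm() helpers stand in for your embedding model and foundation model calls (for example, Amazon Titan Text Embeddings and an FM in Amazon Bedrock).

import numpy as np
from redis.commands.search.query import Query

CACHE_THRESHOLD = 0.2  # hypothetical cosine-distance threshold for a "semantic hit"

def answer_with_semantic_cache(client, embed, call_llm, question: str) -> str:
    """Return a cached answer when a semantically similar question exists, otherwise call the FM."""
    vec = np.array(embed(question), dtype=np.float32).tobytes()

    # Look for the single closest cached question within the distance threshold.
    query = (
        Query("@embed:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}")
        .sort_by("score")
        .paging(0, 1)
        .return_fields("answer", "score")
        .dialect(2)
    )
    hits = client.ft("idx:semanticCache").search(
        query, {"radius": CACHE_THRESHOLD, "vec": vec}
    ).docs

    if hits:
        return hits[0].answer  # reuse the prior inference

    # Cache miss: call the foundation model and store the new question/answer pair.
    answer = call_llm(question)
    cache_id = client.incr("cache:id")
    client.hset(f"cache:{cache_id}", mapping={"embed": vec, "question": question, "answer": answer})
    return answer

The idea is that a cache hit within the threshold skips the FM call entirely, while a miss both answers the question and warms the cache for future queries.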

3. Real-time anomaly (fraud) detection
You can use vector search for anomaly (fraud) detection to supplement your rule-based and batch ML processes by storing transactional data represented by vectors, alongside metadata representing whether those transactions were identified as fraudulent or valid.

The machine learning processes can detect fraudulent transactions when net new transactions have a high similarity to vectors representing known fraudulent transactions. With vector search for MemoryDB, you can detect fraud by modeling fraudulent transactions based on your batch ML models, generating vector representations of normal and fraudulent transactions through statistical decomposition techniques such as principal component analysis (PCA), and then loading them into MemoryDB.

As inbound transactions flow through your front-end application, you can run a vector search against MemoryDB by generating the transaction’s vector representation through PCA, and if the transaction is highly similar to a past detected fraudulent transaction, you can reject the transaction within single-digit milliseconds to minimize the risk of fraud.
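
Here is a rough sketch of that flow, assuming you have already fit a PCA model on historical transaction features offline; the index name, key prefix, label values, and distance threshold are hypothetical, and client is a redis-py connection to your MemoryDB cluster like the one created in the getting-started section below.

import numpy as np
from sklearn.decomposition import PCA
from redis.commands.search.query import Query

FRAUD_RADIUS = 0.1  # hypothetical distance threshold for "highly similar to known fraud"

def load_labeled_transactions(client, pca: PCA, features: np.ndarray, labels: list) -> None:
    """Store PCA-reduced transaction vectors with a fraud/valid label."""
    reduced = pca.transform(features).astype(np.float32)
    for i, (vec, label) in enumerate(zip(reduced, labels)):
        client.hset(f"txn:{i}", mapping={"embed": vec.tobytes(), "label": label})

def is_suspicious(client, pca: PCA, transaction_features: np.ndarray) -> bool:
    """Return True when the incoming transaction sits close to a known fraudulent one."""
    vec = pca.transform(transaction_features.reshape(1, -1)).astype(np.float32).tobytes()
    query = (
        Query("@embed:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}")
        .sort_by("score")
        .paging(0, 1)
        .return_fields("label", "score")
        .dialect(2)
    )
    hits = client.ft("idx:transactions").search(query, {"radius": FRAUD_RADIUS, "vec": vec}).docs
    return bool(hits) and hits[0].label == "fraudulent"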

Getting started with vector search for Amazon MemoryDB
Let’s look at how to implement a simple semantic search application using vector search for MemoryDB.

Step 1. Create a cluster to support vector search
You can create a MemoryDB cluster to enable vector search within the MemoryDB console. Choose Enable vector search in the Cluster settings when you create or update a cluster. Vector search is available for MemoryDB version 7.1 and a single shard configuration.

Step 2. Create vector embeddings using the Amazon Titan Embeddings model
You can use Amazon Titan Text Embeddings or other embedding models available in Amazon Bedrock to create vector embeddings. You can load your PDF file, split the text into chunks, and get vector data using a single API with LangChain libraries integrated with AWS services.

import numpy as np
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import BedrockEmbeddings
from redis.cluster import RedisCluster

# Load a PDF file and split the document into chunks
loader = PyPDFLoader(file_path=pdf_path)
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ".", " "],
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = loader.load_and_split(text_splitter)

# Connect to the MemoryDB cluster that will store the chunks and embeddings
client = RedisCluster(
    host='mycluster.memorydb.us-east-1.amazonaws.com',
    port=6379,
    ssl=True,
    ssl_cert_reqs="none",
    decode_responses=True,
)

# Create the Amazon Bedrock embeddings client
embeddings = BedrockEmbeddings(
    region_name="us-east-1",
    endpoint_url="https://bedrock-runtime.us-east-1.amazonaws.com",
)

# Save each embedding and its source text using HSET into your MemoryDB cluster
for id, chunk in enumerate(chunks):
    y = embeddings.embed_documents([chunk.page_content])
    j = np.array(y, dtype=np.float32).tobytes()
    client.hset(f'oakDoc:{id}', mapping={'embed': j, 'text': chunk.page_content})

Once you generate the vector embeddings using the Amazon Titan Text Embeddings model, you can connect to your MemoryDB cluster and save these embeddings using the MemoryDB HSET command.

Step 3. Create a vector index
To query your vector data, create a vector index using the FT.CREATE command. Vector indexes are constructed and maintained over a subset of the MemoryDB keyspace. Vectors can be saved in JSON or HASH data types, and any modifications to the vector data are automatically updated in the keyspace of the vector index.

from redis.commands.search.field import TextField, VectorField

index = client.ft("idx:testIndex").create_index([
        VectorField(
            "embed",
            "FLAT",
            {
                "TYPE": "FLOAT32",
                "DIM": 1536,
                "DISTANCE_METRIC": "COSINE",
            }
        ),
        TextField("text")
        ]
    )

In MemoryDB, you can use four types of fields: numeric fields, tag fields, text fields, and vector fields. Vector fields support K-nearest neighbor (KNN) searching of fixed-size vectors using the flat search (FLAT) and hierarchical navigable small worlds (HNSW) algorithms. The feature supports various distance metrics, such as euclidean, cosine, and inner product. In this example, we use the cosine distance, which measures the angle between two vectors: the smaller the cosine distance, the more similar the vectors are to each other.
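
For comparison with the range query shown in the next step, here is a minimal KNN query sketch that reuses the client, embeddings object, and numpy import from the earlier snippets; the question text is just an illustrative placeholder.

from redis.commands.search.query import Query

# Find the 3 chunks nearest to the query embedding in the "embed" field of idx:testIndex
knn_query = (
    Query("*=>[KNN 3 @embed $vec AS score]")
    .sort_by("score")
    .return_fields("text", "score")
    .dialect(2)
)

query_vec = np.array(embeddings.embed_query("What is MemoryDB?"), dtype=np.float32).tobytes()
knn_results = client.ft("idx:testIndex").search(knn_query, {"vec": query_vec}).docs

for doc in knn_results:
    print(doc.id, doc.score)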

Step 4. Search the vector space
You can use FT.SEARCH and FT.AGGREGATE commands to query your vector data. Each operator uses one field in the index to identify a subset of the keys in the index. You can query and find filtered results by the distance between a vector field in MemoryDB and a query vector based on some predefined threshold (RADIUS).

from redis.commands.search.query import Query

# Query vector data within a given radius of the query vector
query = (
    Query("@embed:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}")
     .paging(0, 3)
     .sort_by("score")
     .return_fields("id", "score")
     .dialect(2)
)

# Find all vectors within 0.8 of the query vector
VECTOR_DIMENSIONS = 1536
query_params = {
    "radius": 0.8,
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}

results = client.ft("idx:testIndex").search(query, query_params).docs

For example, when using cosine similarity, the RADIUS value ranges from 0 to 1, where a value closer to 1 means finding vectors more similar to the search center.

Here is an example result to find all vectors within 0.8 of the query vector.

[Document {'id': 'doc:a', 'payload': None, 'score': '0.243115246296'},
 Document {'id': 'doc:c', 'payload': None, 'score': '0.24981123209'},
 Document {'id': 'doc:b', 'payload': None, 'score': '0.251443207264'}]

To learn more, you can look at a sample generative AI application using RAG with MemoryDB as a vector store.

What’s new at GA
At re:Invent 2023, we released vector search for MemoryDB in preview. Based on customers’ feedback, here are the new features and improvements now available:

  • VECTOR_RANGE to allow MemoryDB to operate as a low latency durable semantic cache, enabling cost optimization and performance improvements for your generative AI applications.
  • SCORE to better filter on similarity when conducting vector search.
  • Shared memory to not duplicate vectors in memory. Vectors are stored within the MemoryDB keyspace and pointers to the vectors are stored in the vector index.
  • Performance improvements at high filtering rates to power the most performance-intensive generative AI applications.

Now available
Vector search is available in all Regions where MemoryDB is currently available. Learn more about vector search for Amazon MemoryDB in the AWS documentation.

Give it a try in the MemoryDB console and send feedback to the AWS re:Post for Amazon MemoryDB or through your usual AWS Support contacts.

Channy

Build enterprise-grade applications with natural language using AWS App Studio (preview)

This post was originally published on this site

Organizations often struggle to solve their business problems in areas like claims processing, inventory tracking, and project approvals. Custom business applications could solve these problems and help an organization work more effectively, but they have historically required a professional development team to build and maintain. Often, that development capacity is unavailable or too expensive, leaving businesses using inefficient tools and processes.

Today, we’re announcing a public preview of AWS App Studio. App Studio is a generative artificial intelligence (AI)-powered service that uses natural language to create enterprise-grade applications in minutes, without requiring software development skills.

Here’s a quick look at what App Studio can do. Once I’m signed in to App Studio, I select CREATE A NEW APP using the generative AI assistant. I describe that I need a project approval app. App Studio then generates an app for me, complete with a user interface, data models, and business logic. The entire app generation process is complete in minutes. 

Note: This animation above shows the flow at an accelerated speed for demonstration purposes.

While writing this post, I discovered that App Studio is useful for various technical professionals. IT project managers, data engineers, and enterprise architects can use it to create and manage secure business applications in minutes instead of days. App Studio helps organizations build end-to-end custom applications, and it has two main user roles:

  • Admin – Members in this group can manage groups and roles, create and edit connectors, and maintain visibility into other apps built within their organization. In addition to these permissions, admins can also build apps of their own. To enable and set up App Studio or to learn more about what you can do as an administrator, you can jump to the Getting started with AWS App Studio (preview) section.
  • Builder – Members of the builder group can create, build, and share applications. If you’re more interested in the journey of building applications, you can skip to the Using App Studio as a builder: Creating an application section.

Getting started with AWS App Studio
AWS App Studio integrates with AWS IAM Identity Center, making it easier for me to secure access, with the flexibility to integrate with existing single sign-on (SSO) providers and Lightweight Directory Access Protocol (LDAP). Also, App Studio manages the application deployments and operations, removing the time and effort required to operate applications. Now, I can spend more of my time adding features to an application and customizing it to user needs.

Before I can use App Studio to create my applications, I need to enable the service. Here is how an administrator would set up an App Studio instance.

First, I need to go to the App Studio management console and choose Get started.

As mentioned, App Studio integrates with IAM Identity Center and will automatically detect if you have an existing organization instance in IAM Identity Center. To learn more about the difference between an organization and an account instance in IAM Identity Center, you can visit the Manage organization and account instances of IAM Identity Center page.

In this case, I don’t have any organization instance, so App Studio will guide me through creating an account instance in IAM Identity Center. Here, as an administrator, I select Create an account instance for me.

In the next section, Create users and groups and add them to App Studio, I need to define both an admin and builder group. In this section, I add myself as the admin, and I’ll add users into the builder group later. 

The last part of the onboarding process is to review and check the tick box in the Acknowledgment section, then select Set up.

When the onboarding process is complete, I can see from the Account page that my App Studio is Active and ready to use. At this point, I have a unique App Studio instance URL that I can access.

This onboarding scenario illustrates how you can start without an instance preconfigured in IAM Identity Center. Learn more on the Creating and setting up an App Studio instance for the first time page to understand how to use your existing IAM Identity Center instance.

Because App Studio created the AWS IAM Identity Center account instance for me, I received an email along with instructions to sign in to App Studio. Once I select the link, I’ll need to create a password for my account and define the multi-factor authentication (MFA) to improve the security posture of my account.

Then, I can sign in to App Studio.

Add additional users (optional)
App Studio uses AWS IAM Identity Center to manage users and groups. This means that if I need to invite additional users into my App Studio instance, I need to do that in IAM Identity Center. 

For example, here’s the list of my users. I can add more users by selecting Add user. Once I’ve finished adding users, they will receive an email with the instructions to activate their accounts.

If I need to create additional groups, I can do so by selecting Create group on the Groups page. The following screenshot shows groups I’ve defined for my account instance in IAM Identity Center.

Using AWS App Studio as an administrator
Now, I’m switching to the App Studio and signing in as an administrator. Here, I can see two main sections: Admin hub and Builder hub.

As an administrator, I can grant users access to App Studio by associating existing user groups with roles in the Roles section:

To map the group I created in my IAM Identity Center, I select Add group and select the Group identifier and Role. There are three roles I can configure: admin, builder, and app user. To understand the difference between each role, visit the Managing access and roles in App Studio page.

As an administrator, I can incorporate various data sources with App Studio using connectors. App Studio provides built-in connectors to integrate with AWS services such as Amazon Aurora, Amazon DynamoDB, and Amazon Simple Storage Service (Amazon S3). It also has a built-in connector for Salesforce and a generic API and OpenAPI connector to integrate with third-party services.

Furthermore, App Studio automatically created a managed DynamoDB connector for me to get started. I also have the flexibility to create additional connectors by selecting Create connector.

On this page, I can create other connectors to AWS services. If I need other AWS services, I can select Other AWS services. To learn how to define your IAM role for your connectors, visit Connect App Studio to other services with connectors.

Using App Studio as a builder: Creating an application
As a builder, I can use the App Studio generative AI–powered low-code building environment to create secure applications. To start, I can describe the application that I need in natural language, such as “Build an application to review and process invoices.” Then, App Studio will generate the application, complete with the data models, business logic, and a multipage UI.

Here’s where the fun begins. It’s time for me to build apps in App Studio. On the Builder hub page, I select Create app.

I give it a name, and there are two options for me to build the app: Generate an app with AI or Start from scratch. I select Generate an app with AI.

On the next page, I can start building the app by simply describing what I need in the text box. I can also choose from the sample prompts available in the right panel.

Then, App Studio will prepare app requirements for me. I can improve my plan for the application by refining the prompt and reviewing the updated requirements. Once I’m happy with the results, I select Generate app, and App Studio will generate an application for me.

I found this to be a good experience for me when I started building apps with App Studio. The generative AI capability built into App Studio generated an app for me in minutes, compared to the hours or even days it would have taken me to get to the same point using other tools.

After a few minutes, my app is ready. I also see that App Studio prepares a quick tutorial for me to navigate around and understand different areas. 

There are three main areas in App Studio: Pages, Automations, and Data. I always like to start building my apps by defining the data models first, so let’s navigate to the Data section.

In the Data section, I can model my application data with the managed data store powered by DynamoDB or using the available data connectors. Because I chose to let AI generate this app, I have all the data entities defined for me. If I opted to do it manually, I would need to create entities representing the different data tables and field types for my application.

Once I’m happy with the data entities, I can build visual pages. In this area, I can create the UI for my users. I can add and arrange components like tables, forms, and buttons to create a tailored experience for my end users.

While I’m building the app, I can see the live preview by selecting Preview. This is useful for testing the layout and functionality of my application.

But the highlight for me in these three areas is the Automations. With the automations, I can define rules, workflows, and any actions that define or extend my application’s business logic. Because I chose to build this application with App Studio’s generative AI assistant, it automatically created and wired up multiple different automations needed for my application.

For example, every time a new project is submitted, it will trigger an action to create a project and send a notification email. 

I can also extend my business logic by invoking API callouts, AWS Lambda, or other AWS services. Besides creating the project, I’d also like to archive the project in a flat-file format into an S3 bucket. To do that, I also need to do some processing, and I happen to already have the functionality built in an existing Lambda function.

Here, I select Invoke Lambda, as shown in the previous screenshot. Then, I need to set the Connector, Function name, and Function event payload to pass into my existing Lambda function.

Finally, after I’m happy with all the UI pages, data entities, and automations, I can publish it by selecting Publish. I have the flexibility to publish my app in a Testing or Production environment. This helps me to test my application before pushing it to production.

Join the preview
AWS App Studio is currently in preview, and you can access it in the US West (Oregon) AWS Region, but your applications can connect to your data in other AWS Regions.

Build secure, scalable, and performant custom business applications to modernize and streamline mission-critical tasks with AWS App Studio. Learn more about all the features and functionalities on the AWS App Studio documentation page, and join the conversation in the #aws-app-studio channel in the AWS Developers Slack workspace.

Happy building,

— Donnie

Amazon Q Apps, now generally available, enables users to build their own generative AI apps

This post was originally published on this site

When we launched Amazon Q Business in April 2024, we also previewed Amazon Q Apps. Amazon Q Apps is a capability within Amazon Q Business for users to create generative artificial intelligence (generative AI)–powered apps based on the organization’s data. Users can build apps using natural language and securely publish them to the organization’s app library for everyone to use.

After collecting your feedback and suggestions during the preview, today we’re making Amazon Q Apps generally available. We’re also adding some new capabilities that were not available during the preview, such as an API for Amazon Q Apps and the ability to specify data sources at the individual card level.

I’ll expand on the new features in a moment, but let’s first look into how to get started with Amazon Q Apps.

Transform conversations into reusable apps
Amazon Q Apps allows users to generate an app from their conversation with Amazon Q Business. Amazon Q Apps intelligently captures the context of the conversation to generate an app tailored to specific needs. Let’s see it in action!

As I started writing this post, I thought of getting help from Amazon Q Business to generate a product overview of Amazon Q Apps. After all, Amazon Q Business is for boosting workforce productivity. So, I uploaded the product messaging documents to an Amazon Simple Storage Service (Amazon S3) bucket and added it as a data source using Amazon S3 connector for Amazon Q Business.

I start my conversation with the prompt:

I’m writing a launch post for Amazon Q Apps.
Here is a short description of the product: Employees can create lightweight, purpose-built Amazon Q Apps within their broader Amazon Q Business application environment.
Generate an overview of the product and list its key features.

Amazon Q Business Chat

After starting the conversation, I realize that creating a product overview given a product description would also be useful for others in the organization. I choose Create Amazon Q App to create a reusable and shareable app.

Amazon Q Business automatically generates a prompt to create an Amazon Q App and displays the prompt to me to verify and edit if need be:

Build an app that takes in a short text description of a product or service, and outputs an overview of that product/service and a list of its key features, utilizing data about the product/service known to Q.

Amazon Q Apps Creator

I choose Generate to continue the creation of the app. It creates a Product Overview Generator app with four cards—two input cards to get user inputs and two output cards that display the product overview and its key features.

Product Overview Generator App

I can adjust the layout of the app by resizing the cards and moving them around.

Also, the prompts for the individual text output cards are automatically generated so I can view and edit them. I choose the edit icon of the Product Overview card to see the prompt in the side panel.

In the side panel, I can also select the source for the text output card to generate the output using either large language model (LLM) knowledge or approved data sources. For the approved data sources, I can select one or more data sources that are configured for this Amazon Q Business application. I select the Marketing (Amazon S3) data source I had configured for creating this app.

Edit Text Output Card Prompt and select source

As you would notice, I generated a fully functional app from the conversation itself without having to make any changes to the base prompt or the individual text output card prompts.

I can now publish this app to the organization’s app library by choosing Publish. But before publishing the app, let’s look at another way to create Amazon Q apps.

Create generative AI apps using natural language
Instead of using conversation in Amazon Q Business as a starting point to create an app, I can choose Apps and use my own words to describe the app I want to create. Or I can try out the prompts from one of the preconfigured examples.

Amazon Q App

I can enter the prompt to fit the purpose and choose Generate to create the app.

Share apps with your team
Once you’re happy with both layouts and prompts, and are ready to share the app, you can publish the app to a centralized app library to give access to all users of this Amazon Q Business application environment.

Amazon Q Apps inherits the robust security and governance controls from Amazon Q Business, ensuring that data sources, user permissions, and guardrails are maintained. So, when other users run the app, they only see responses based on data they have access to in the underlying data sources.

For the Product Overview Generator app I created, I choose Publish. It displays the preview of the app and provides an option to select up to three labels. Labels help classify the apps by departments in the organization or any other categories. After selecting the labels, I choose Publish again on the preview popup.

Publish Amazon Q App

The app will instantly be available in the Amazon Q Apps library for others to use, copy, and build on top of. I choose Library to browse the Amazon Q Apps Library and find my Product Overview Generator app.

Amazon Q Apps Library

Customize apps in the app library for your specific needs
Amazon Q Apps allows users to quickly scale their individual or team productivity by customizing and tailoring shared apps to their specific needs. Instead of starting from scratch, users can review existing apps, use them as-is, or modify them and publish their own versions to the app library.

Let’s browse the app library and find an app to customize. I choose the label General to find apps in that category.

Document Editing Assistant App

I see a Document Editing Assistant app that reviews documents to correct grammatical mistakes. I would like to create a new version of the app to include a document summary too. Let’s see how we can do it.

I choose Open, and it opens the app with an option to Customize.

Open Document Editing Assistant App

I choose Customize, and it creates a copy of the app for me to modify it.

Customize App

I update the Title and Description of the app by choosing the edit icon of the app title.

I can see the original App Prompt that was used to generate this app. I can copy the prompt and use it as the starting point to create a similar app by updating it to include a description of the functionality that I would like to add and have Amazon Q Apps Creator take care of it. Or I can continue modifying this copy of the app.

There is an option to edit or delete existing cards. For example, I can edit the prompt of the Edited Document text output card by choosing the edit icon of the card.

Edit Text Output Card Prompt

To add more features, you can add more cards, such as a user input, text output, file upload, or preconfigured plugin by your administrator. The file upload card, for example, can be used to provide a file as another data source to refine or fine-tune the answers to your questions. The plugin card can be used, for example, to create a Jira ticket for any action item that needs to be performed as a follow-up.

I choose Text output to add a new card that will summarize the document. I enter the title as Document Summary and prompt as follows:

Summarize the key points in @Upload Document in a couple of sentences

Add Text Output Card

Now, I can publish this customized app as a new app and share it with everyone in the organization.

What did we add after the preview?

As I mentioned, we have used your feedback and suggestions during the preview period to add new capabilities. Here are the new features we have added:

Specify data sources at card level – As I have shown while creating the app, you can specify data sources you would like the output to be generated from. We added this feature to improve the accuracy of the responses.

Your Amazon Q Business instance can have multiple data sources configured. However, to create an app, you might need only a subset of these data sources, based on the use case. So now you can choose specific data sources for each of the text output cards in your app. Or, if your use case requires it, you can configure the text output cards to use LLM knowledge instead of using any data sources.

Amazon Q Apps API – You can now create and manage Amazon Q Apps programmatically with APIs for managing apps, the app library, and app sessions. This allows you to integrate all the functionalities of Amazon Q Apps into the tools and applications of your choice.

Things to know:

  • Regions – Amazon Q Apps is generally available today in the Regions where Amazon Q Business is available, which are the US East (N. Virginia) and US West (Oregon) Regions.
  • Pricing – Amazon Q Apps is available with the Amazon Q Business Pro subscription ($20 per user per month), which gives users access to all the features of Amazon Q Business.
  • Learning resources – To learn more, visit Amazon Q Apps in the Amazon Q Business User Guide.

–  Prasad

Customize Amazon Q Developer (in your IDE) with your private code base

This post was originally published on this site

Today, we’re making the Amazon Q Developer (in your IDE) customization capability generally available for inline code completion, and we’re launching a preview of customization for the chat. You can now customize Amazon Q to generate specific code recommendations from private code repositories in the IDE code editor and in the chat.

Amazon Q Developer is an artificial intelligence (AI) coding companion. It helps software developers accelerate application development by offering code recommendations in their integrated development environments (IDE) derived from existing comments and code. Behind the scenes, Amazon Q uses large language models (LLMs) trained on billions of lines of code from Amazon and open source projects.

Amazon Q is available in your IDE, and you can download the extension for JetBrains, Visual Studio Code, and Visual Studio (preview). In the IDE text editor, it suggests code as you type or write entire functions from a comment you enter. You can also chat with Q Developer and ask it to generate code for specific tasks or explain code snippets from a code base you’re discovering.

With the new customization capability, developers can now receive even more relevant code recommendations that are based on their organization’s internal libraries, APIs, packages, classes, and methods.

For example, let’s imagine that a developer working for a financial company is tasked to write a function to compute the total portfolio value for a customer. The developer can now describe the intent in a comment or type a function name such as computePortfolioValue(customerId: String), and Amazon Q will suggest code to implement that function based on the examples it learned from your organization’s private code base.

The developer can also ask questions about their organization’s code in the chat. In the example above, let’s imagine the developer is new to the team and doesn’t know how to retrieve a customer ID. He can ask the question in the chat in plain English: how do I connect to the database to retrieve the customerId for a specific customer? Amazon Q chat could answer: I found a function to retrieve customerId based on customer first and last name which uses the database connection XYZ…

As an administrator, you create customizations built from your internal git repositories (such as GitHub, GitLab, or Bitbucket) or an Amazon Simple Storage Service (Amazon S3) bucket. It helps Amazon Q understand the intent, determine which internal and public APIs are best suited to the task, and generate code recommendations.

Amazon Q customization capability meets the strong data privacy and security you expect from AWS. The code base you share with Amazon Q stays private to your organization. We don’t use it to train our foundation model. Once customizations are deployed, the inference endpoint is private for the developers in your organization. Recommendations based on your code won’t pop up in another company’s developer IDE. You decide which developers have access to each individual customization, and you can follow metrics to measure the performance of the customizations you deployed.

We built the Amazon Q customization capability based on leading techniques, such as Retrieval Augmented Generation (RAG). This very detailed blog post shares more details about the science behind the Amazon Q customizations capability.

CodeWhisperer RAG diagram

Since we launched the preview on October 17 last year, we’ve added two new capabilities: the ability to update a customization and the ability to customize the chat in the IDE.

Your organization’s code base is constantly evolving, and you want Amazon Q to always suggest up-to-date code snippets. An Amazon Q administrator can now start an update process with one step in the AWS Management Console. Administrators can schedule regular updates based on the latest commits on code repositories to ensure developers always receive highly accurate code suggestions.

With the new chat customization (in preview), developers in your organization can select a portion of code in their IDE and send it to the chat to ask for an explanation of what the selected code does. Developers can also ask generic questions relative to their organization’s code base, like “How do I connect to the database to retrieve customerId for a specific customer?”

Let’s see how to use it
In this demo, I decided to focus on the new customization update capability that is generally available today. To quickly learn how to create a customization, activate it, and grant access to developers, read the excellent post from my colleague Donnie.

To update an existing customization, I navigate to the Customizations section of the Amazon Q console page. I select the customization I want to update. Then, I select Actions and Create new version.

Codewhisperer customization - update 1a

Similarly to what I did when I created the customization, I choose the source code repository and select Create.

Codewhisperer customization

Creating a new version of the customization might take a while because it depends on the quantity of code to ingest. When it’s ready, a new version appears under the Versions tab. You can compare the Evaluation score of the new version with previous versions and decide whether to activate it to make it available to your developers. At any point, you can revert to a previous version.

Codewhisperer customization - update 3a

One of the aspects I like about active customizations is that I can monitor their effectiveness to help increase developer productivity in my organization.

On the Dashboard page, I track the User activity. I can track how many Daily active users there are, how many Lines of code have been generated, how many Security scans were performed, and so on. If, like me, you have used Amazon CodeWhisperer Professional in the past, when you use it now, you might still see the name CodeWhisperer appear on some pages. It will progressively be replaced with the new name: Amazon Q Developer.

Codewhisperer customization dashboard 1

Amazon Q generates more metrics and publishes them on Amazon CloudWatch. I can build CloudWatch dashboards to monitor the performance of the customizations I deployed. For example, here is a custom CloudWatch dashboard that monitors the code suggestions’ Block Accept Rate and Line Accept Rate, broken down per programming language.

Codewhisperer customization dashboard 2

Supported programming languages
Currently, you can customize Amazon Q recommendations on codebases written in Java, JavaScript, TypeScript, and Python. Files written in other languages supported by Amazon Q, such as C#, Go, Rust, PHP, Ruby, Kotlin, C, C++, Shell scripting, SQL, and Scala will not be used when creating the customization or when providing customized recommendations in the IDE.

Pricing and availability
Amazon Q is AWS Region agnostic and available to developers worldwide. Amazon Q is currently hosted in US East (N. Virginia). Amazon Q administrators can configure Amazon Q as an authorized cross-Region application if you have AWS IAM Identity Center in other Regions.

The Amazon Q customization capability is available at no additional charge within the Amazon Q Developer Professional subscription. You can create up to eight customizations per AWS account and keep up to two customizations active.

Now go build, and start to propose Amazon Q customizations to your organization’s developers.

— seb

Agents for Amazon Bedrock now support memory retention and code interpretation (preview)

This post was originally published on this site

With Agents for Amazon Bedrock, generative artificial intelligence (AI) applications can run multistep tasks across different systems and data sources. A couple of months back, we simplified the creation and configuration of agents. Today, we are introducing in preview two new fully managed capabilities:

Retain memory across multiple interactions – Agents can now retain a summary of their conversations with each user and use it to provide a smooth, adaptive experience, especially for complex, multistep tasks, such as user-facing interactions and enterprise automation solutions like booking flights or processing insurance claims.

Support for code interpretation – Agents can now dynamically generate and run code snippets within a secure, sandboxed environment, addressing complex use cases such as data analysis, data visualization, text processing, solving equations, and optimization problems. To make it easier to use this feature, we also added the ability to upload documents directly to an agent.

Let’s see how these new capabilities work in more detail.

Memory retention across multiple interactions
With memory retention, you can build agents that learn and adapt to each user’s unique needs and preferences over time. By maintaining a persistent memory, agents can pick up right where the users left off, providing a smooth flow of conversations and workflows, especially for complex, multistep tasks.

Imagine a user booking a flight. Thanks to the ability to retain memory, the agent can learn their travel preferences and use that knowledge to streamline subsequent booking requests, creating a personalized and efficient experience. For example, it can automatically propose the right seat to a user or a meal similar to their previous choices.

Using memory retention to be more context-aware also simplifies business process automation. For example, an agent used by an enterprise to process customer feedback can now be aware of previous and on-going interactions with the same customer without having to handle custom integrations.

Each user’s conversation history and context are securely stored under a unique memory identifier (ID), ensuring complete separation between users. With memory retention, it’s easier to build agents that provide seamless, adaptive, and personalized experiences that continuously improve over time. Let’s see how this works in practice.
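
Before walking through the console, here is a minimal sketch of what memory retention can look like from the API side with boto3, assuming the InvokeAgent API accepts the memoryId parameter introduced with this capability; the agent ID, alias ID, session ID, and memory ID values below are hypothetical placeholders. The memory identifier is what ties separate sessions from the same user together.

import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Invoke the agent with a memoryId so the conversation summary is retained
# across sessions for this user. IDs below are hypothetical placeholders.
response = client.invoke_agent(
    agentId="AGENT1234",
    agentAliasId="ALIAS1234",
    sessionId="session-001",
    memoryId="user-42",
    inputText="Book me a flight from SEA to JFK next Friday morning.",
)

# The response is an event stream; collect the text chunks.
completion = ""
for event in response["completion"]:
    if "chunk" in event:
        completion += event["chunk"]["bytes"].decode("utf-8")
print(completion)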

Using memory retention in Agents for Amazon Bedrock
In the Amazon Bedrock console, I choose Agents from the Builder Tools section of the navigation pane and start creating an agent.

For the agent, I use agent-book-flight as the name with this as description:

Help book a flight.

Then, in the agent builder, I select the Anthropic’s Claude 3 Sonnet model and enter these instructions:

To book a flight, you should know the origin and destination airports and the day and time the flight takes off.

In Additional settings, I enable User input to allow the agent to ask clarifying questions to capture necessary inputs. This will help when a request to book a flight misses some necessary information such as the origin and destination or the date and time of the flight.

In the new Memory section, I enable memory to generate and store a session summary at the end of each session and use the default 30 days for memory duration.

Console screenshot.

Then, I add an action group to search and book flights. I use search-and-book-flights as name and this description:

Search for flights between two destinations on a given day and book a specific flight.

Then, I choose to define the action group with function details and then to create a new Lambda function. The Lambda function will implement the business logic for all the functions in this action group.

I add two functions to this action group: one to search for flights and another to book flights.

The first function is search-for-flights and has this description:

Search for flights on a given date between two destinations.

All parameters of this function are required and of type string. Here are the parameters’ names and descriptions:

origin_airport – Origin IATA airport code
destination_airport – Destination IATA airport code
date – Date of the flight in YYYYMMDD format

The second function is book-flight and uses this description:

Book a flight at a given date and time between two destinations.

Again, all parameters are required and of type string. These are the names and descriptions for the parameters:

origin_airport – Origin IATA airport code
destination_airport – Destination IATA airport code
date – Date of the flight in YYYYMMDD format
time – Time of the flight in HHMM format

To complete the creation of the agent, I choose Create.
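As a side note, the same action group and function details can also be defined with the AWS SDK for Python (Boto3) instead of the console. This is only a sketch; the agent ID and the Lambda function ARN are placeholders you would replace with your own values.

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Define the action group on the draft version of the agent
# (agent ID and Lambda ARN below are placeholders)
bedrock_agent.create_agent_action_group(
    agentId='AGENT_ID',
    agentVersion='DRAFT',
    actionGroupName='search-and-book-flights',
    description='Search for flights between two destinations on a given day and book a specific flight.',
    actionGroupExecutor={'lambda': 'arn:aws:lambda:us-east-1:123456789012:function:search-and-book-flights'},
    functionSchema={
        'functions': [
            {
                'name': 'search-for-flights',
                'description': 'Search for flights on a given date between two destinations.',
                'parameters': {
                    'origin_airport': {'type': 'string', 'description': 'Origin IATA airport code', 'required': True},
                    'destination_airport': {'type': 'string', 'description': 'Destination IATA airport code', 'required': True},
                    'date': {'type': 'string', 'description': 'Date of the flight in YYYYMMDD format', 'required': True},
                },
            },
            {
                'name': 'book-flight',
                'description': 'Book a flight at a given date and time between two destinations.',
                'parameters': {
                    'origin_airport': {'type': 'string', 'description': 'Origin IATA airport code', 'required': True},
                    'destination_airport': {'type': 'string', 'description': 'Destination IATA airport code', 'required': True},
                    'date': {'type': 'string', 'description': 'Date of the flight in YYYYMMDD format', 'required': True},
                    'time': {'type': 'string', 'description': 'Time of the flight in HHMM format', 'required': True},
                },
            },
        ]
    },
)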

To access the source code of the Lambda function, I choose the search-and-book-flights action group and then View (near the Select Lambda function settings). Normally, I’d use this Lambda function to integrate with an existing system such as a travel booking platform. In this case, I use this code to simulate a booking platform for the agent.

import json
import random
from datetime import datetime, time, timedelta


def convert_params_to_dict(params_list):
    params_dict = {}
    for param in params_list:
        name = param.get("name")
        value = param.get("value")
        if name is not None:
            params_dict[name] = value
    return params_dict


def generate_random_times(date_str, num_flights, min_hours, max_hours):
    # Set seed based on input date
    seed = int(date_str)
    random.seed(seed)

    # Convert min_hours and max_hours to minutes
    min_minutes = min_hours * 60
    max_minutes = max_hours * 60

    # Generate random times
    random_times = set()
    while len(random_times) < num_flights:
        minutes = random.randint(min_minutes, max_minutes)
        hours, mins = divmod(minutes, 60)
        time_str = f"{hours:02d}{mins:02d}"
        random_times.add(time_str)

    return sorted(random_times)


def get_flights_for_date(date):
    num_flights = random.randint(1, 6) # Between 1 and 6 flights per day
    min_hours = 6 # 6am
    max_hours = 22 # 10pm
    flight_times = generate_random_times(date, num_flights, min_hours, max_hours)
    return flight_times
    
    
def get_days_between(start_date, end_date):
    # Convert string dates to datetime objects
    start = datetime.strptime(start_date, "%Y%m%d")
    end = datetime.strptime(end_date, "%Y%m%d")
    
    # Calculate the number of days between the dates
    delta = end - start
    
    # Generate a list of all dates between start and end (inclusive)
    date_list = [start + timedelta(days=i) for i in range(delta.days + 1)]
    
    # Convert datetime objects back to "YYYYMMDD" string format
    return [date.strftime("%Y%m%d") for date in date_list]


def lambda_handler(event, context):
    print(event)
    agent = event['agent']
    actionGroup = event['actionGroup']
    function = event['function']
    param = convert_params_to_dict(event.get('parameters', []))

    if actionGroup == 'search-and-book-flights':
        if function == 'search-for-flights':
            flight_times = get_flights_for_date(param['date'])
            body = f"On {param['date']} (YYYYMMDD), these are the flights from {param['origin_airport']} to {param['destination_airport']}:\n{json.dumps(flight_times)}"
        elif function == 'book-flight':
            body = f"Flight from {param['origin_airport']} to {param['destination_airport']} on {param['date']} (YYYYMMDD) at {param['time']} (HHMM) booked and confirmed."
        elif function == 'get-flights-in-date-range':
            days = get_days_between(param['start_date'], param['end_date'])
            flights = {}
            for day in days:
                flights[day] = get_flights_for_date(day)
            body = f"These are the times (HHMM) for all the flights from {param['origin_airport']} to {param['destination_airport']} between {param['start_date']} (YYYYMMDD) and {param['end_date']} (YYYYMMDD) in JSON format:\n{json.dumps(flights)}"
        else:
            body = f"Unknown function {function} for action group {actionGroup}."
    else:
        body = f"Unknown action group {actionGroup}."
    
    # Format the output as expected by the agent
    responseBody =  {
        "TEXT": {
            "body": body
        }
    }

    action_response = {
        'actionGroup': actionGroup,
        'function': function,
        'functionResponse': {
            'responseBody': responseBody
        }

    }

    function_response = {'response': action_response, 'messageVersion': event['messageVersion']}
    print(f"Response: {function_response}")

    return function_response

I prepare the agent to test it in the console and ask this question:

Which flights are available from London Heathrow to Rome Fiumicino on July 20th, 2024?

The agent replies with a list of times. I choose Show trace to get more information about how the agent processed my instructions.

In the Trace tab, I explore the trace steps to understand the chain of thought used by the agent’s orchestration. For example, here I see that the agent handled the conversion of the airport names to codes (LHR for London Heathrow, FCO for Rome Fiumicino) before calling the Lambda function.

In the new Memory tab, I see the content of the memory. The console uses a specific test memory ID. In an application, to keep memory separated for each user, I can use a different memory ID for every user.

I look at the list of flights and ask to book one:

Book the one at 6:02pm.

The agent replies confirming the booking.

After a few minutes, after the session has expired, I see a summary of my conversation in the Memory tab.

Console screenshot.

I choose the broom icon to start with a new conversation and ask a question that, by itself, doesn’t provide a full context to the agent:

Which other flights are available on the day of my flight?

The agent recalls the flight that I booked from our previous conversation. To provide me with an answer, the agent asks me to confirm the flight details. Note that the Lambda function is just a simulation and didn’t store the booking information in any database. The flight details were retrieved from the agent’s memory.

Console screenshot.

I confirm those values:

Yes, please.

The agent then gives me the list of the other flights with the same origin and destination on that day.

To better demonstrate the benefits of memory retention, let’s call the agent using the AWS SDK for Python (Boto3). To do so, I first need to create an agent alias and version. I write down the agent ID and the alias ID because they are required when invoking the agent.
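This step can also be scripted. Here is a minimal sketch using the bedrock-agent client; in a real script I would wait until the agent status is PREPARED before creating the alias.

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

AGENT_ID = 'URSVOGLFNX'  # the agent ID shown in the console

# Prepare the draft version of the agent so it can be used behind an alias.
# In a real script, poll get_agent until agentStatus is PREPARED before continuing.
bedrock_agent.prepare_agent(agentId=AGENT_ID)

# Create an alias; a new agent version is created and associated with it
alias = bedrock_agent.create_agent_alias(
    agentId=AGENT_ID,
    agentAliasName='live',
    description='Alias used to invoke the agent from applications',
)
print(alias['agentAlias']['agentAliasId'])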

In the agent invocation, I add the new memoryId option to use memory. By including this option, I get two benefits:

  • The memory retained for that memoryId (if any) is used by the agent to improve its response.
  • A summary of the conversation for the current session is retained for that memoryId so that it can be used in another session.

Using an AWS SDK, I can also get the content or delete the content of the memory for a specific memoryId.

import random
import string
import boto3
import json

DEBUG = False # Enable debug to see all trace steps
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

AGENT_ID = 'URSVOGLFNX'
AGENT_ALIAS_ID = 'JHLX9ERCMD'

SESSION_ID_LENGTH = 10
SESSION_ID = "".join(
    random.choices(string.ascii_uppercase + string.digits, k=SESSION_ID_LENGTH)
)

# A unique identifier for each user
MEMORY_ID = 'danilop-92f79781-a3f3-4192-8de6-890b67c63d8b' 
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')


def invoke_agent(prompt, end_session=False):
    response = bedrock_agent_runtime.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=SESSION_ID,
        inputText=prompt,
        memoryId=MEMORY_ID,
        enableTrace=DEBUG,
        endSession=end_session,
    )

    completion = ""

    for event in response.get('completion'):
        if DEBUG:
            print(event)
        if 'chunk' in event:
            chunk = event['chunk']
            completion += chunk['bytes'].decode()

    return completion


def delete_memory():
    try:
        response = bedrock_agent_runtime.delete_agent_memory(
            agentId=AGENT_ID,
            agentAliasId=AGENT_ALIAS_ID,
            memoryId=MEMORY_ID,
        )
    except Exception as e:
        print(e)
        return None
    if DEBUG:
        print(response)


def get_memory():
    response = bedrock_agent_runtime.get_agent_memory(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        memoryId=MEMORY_ID,
        memoryType='SESSION_SUMMARY',
    )
    memory = ""
    for content in response['memoryContents']:
        if 'sessionSummary' in content:
            s = content['sessionSummary']
            memory += f"Session ID {s['sessionId']} from {s['sessionStartTime'].strftime(DATE_FORMAT)} to {s['sessionExpiryTime'].strftime(DATE_FORMAT)}\n"
            memory += s['summaryText'] + "\n"
    if memory == "":
        memory = "<no memory>"
    return memory


def main():
    print("Delete memory? (y/n)")
    if input() == 'y':
        delete_memory()

    print("Memory content:")
    print(get_memory())

    prompt = input('> ')
    if len(prompt) > 0:
        print(invoke_agent(prompt, end_session=False)) # Start a new session
        invoke_agent('end', end_session=True) # End the session

if __name__ == "__main__":
    main()

I run the Python script from my laptop. I choose to delete the current memory (even if it should be empty for now) and then ask to book a morning flight on a specific date.

Delete memory? (y/n)
y
Memory content:
<no memory>
> Book me on a morning flight on July 20th, 2024 from LHR to FCO.
I have booked you on the morning flight from London Heathrow (LHR) to Rome Fiumicino (FCO) on July 20th, 2024 at 06:44.

I wait a couple of minutes and run the script again. The script creates a new session every time it’s run. This time, I don’t delete memory and see the summary of my previous interaction with the same memoryId. Then, I ask on which date my flight is scheduled. Even though this is a new session, the agent finds the previous booking in the content of the memory.

Delete memory? (y/n)
n
Memory content:
Session ID MM4YYW0DL2 from 2024-07-09 15:35:47 to 2024-07-09 15:35:58
The user's goal was to book a morning flight from LHR to FCO on July 20th, 2024. The assistant booked a 0644 morning flight from LHR to FCO on the requested date of July 20th, 2024. The assistant successfully booked the requested morning flight for the user. The user requested a morning flight booking on July 20th, 2024 from London Heathrow (LHR) to Rome Fiumicino (FCO). The assistant booked a 0644 flight for the specified route and date.

> Which date is my flight on?
I recall from our previous conversation that you booked a morning flight from London Heathrow (LHR) to Rome Fiumicino (FCO) on July 20th, 2024. Please confirm if this date of July 20th, 2024 is correct for the flight you are asking about.

Yes, that’s my flight!

Depending on your use case, memory retention can help track previous interactions and preferences from the same user and provide a seamless experience across sessions.

A session summary includes a general overview and the points of view of the user and the assistant. For a short session like this one, this can cause some repetition.

Code interpretation support
Agents for Amazon Bedrock now supports code interpretation, so agents can dynamically generate and run code snippets within a secure, sandboxed environment. This significantly expands the use cases they can address, including complex tasks such as data analysis, visualization, text processing, equation solving, and optimization problems.

Agents can now process input files with diverse data types and formats, including CSV, XLS, YAML, JSON, DOC, HTML, MD, TXT, and PDF. Code interpretation also allows agents to generate charts, enhancing the user experience and making data interpretation more accessible.

Code interpretation is used by an agent when the large language model (LLM) determines that it can help solve a specific problem more accurately. By design, it does not support scenarios where users request arbitrary code generation. For security, each user session is provided with an isolated, sandboxed code runtime environment.

Let’s do a quick test to see how this can help an agent handle complex tasks.

Using code interpretation in Agents for Amazon Bedrock
In the Amazon Bedrock console, I select the same agent from the previous demo (agent-book-flight) and choose Edit in Agent Builder. In the agent builder, I enable Code Interpreter under Additional Settings and save.

Console screenshot.

I prepare the agent and test it straight in the console. First, I ask a mathematical question.

Compute the sum of the first 10 prime numbers.

After a few seconds, I get the answer from the agent:

The sum of the first 10 prime numbers is 129.

That’s accurate. Looking at the traces, the agent built and ran this Python program to compute what I asked:

import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

primes = []
n = 2
while len(primes) < 10:
    if is_prime(n):
        primes.append(n)
    n += 1
    
print(f"The first 10 prime numbers are: {primes}")
print(f"The sum of the first 10 prime numbers is: {sum(primes)}")

Now, let’s go back to the agent-book-flight agent. I want a better understanding of the flights available over a longer period of time. To do so, I start by adding a new function to the same action group to get all the flights available in a date range.

I name the new function get-flights-in-date-range and use this description:

Get all the flights between two destinations for each day in a date range.

All the parameters are required and of type string. These are the parameters’ names and descriptions:

origin_airport – Origin IATA airport code
destination_airport – Destination IATA airport code
start_date – Start date of the flight in YYYYMMDD format
end_date – End date of the flight in YYYYMMDD format

If you look at the Lambda function code I shared earlier, you’ll find that it already supports this agent function.

Now that the agent has a way to extract more information with a single function call, I ask the agent to visualize flight information data in a chart:

Draw a chart with the number of flights each day from JFK to SEA for the first ten days of August, 2024.

The agent reply includes a chart:

Console screenshot.

I choose the link to download the image on my computer:

Flight chart.

That’s correct. In fact, the simulator in the Lambda function generates between one and six flights per day, as shown in the chart.

Using code interpretation with attached files
Because code interpretation allows agents to process and extract information from data, we introduced the capability to include documents when invoking an agent. For example, I have an Excel file with the number of flights booked for different routes:

Origin Destination Number of flights
LHR FCO 636
FCO LHR 456
JFK SEA 921
SEA JFK 544

Using the clip icon in the test interface, I attach the file and ask the following questions (the agent’s replies follow each one):

What is the most popular route? And the least one?

Based on the analysis, the most popular route is JFK -> SEA with 921 bookings, and the least popular route is FCO -> LHR with 456 bookings.

How many flights in total have been booked?

The total number of booked flights across all routes is 2557.

Draw a chart comparing the % of flights booked for these routes compared to the total number.

Chart generated with Code Interpreter

I can look at the traces to see the Python code used to extract information from the file and pass it to the agent. I can attach more than one file and use different file formats. These options are available in AWS SDKs to let agents use files in your applications.
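As an illustration, here is a minimal sketch of how a CSV file might be attached when invoking the agent with the AWS SDK for Python (Boto3), assuming the InvokeAgent sessionState file attachment described above; the file name and the agent, alias, and session IDs are placeholders.

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Read the spreadsheet exported as CSV and attach it to the invocation
# so the code interpreter can analyze it (IDs below are placeholders)
with open('bookings.csv', 'rb') as f:
    csv_bytes = f.read()

response = bedrock_agent_runtime.invoke_agent(
    agentId='AGENT_ID',
    agentAliasId='AGENT_ALIAS_ID',
    sessionId='SESSION_ID',
    inputText='What is the most popular route? And the least one?',
    sessionState={
        'files': [
            {
                'name': 'bookings.csv',
                'source': {
                    'sourceType': 'BYTE_CONTENT',
                    'byteContent': {'mediaType': 'text/csv', 'data': csv_bytes},
                },
                'useCase': 'CODE_INTERPRETER',
            }
        ]
    },
)

# Collect the streamed completion chunks into a single string
completion = ''.join(
    event['chunk']['bytes'].decode()
    for event in response['completion']
    if 'chunk' in event
)
print(completion)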

Things to Know
Memory retention is available in preview in all AWS Regions where Agents for Amazon Bedrock and Anthropic’s Claude 3 Sonnet or Haiku (the models supported during the preview) are available. Code interpretation is available in preview in the US East (N. Virginia), US West (Oregon), and Europe (Frankfurt) Regions.

There are no additional costs during the preview for using memory retention and code interpretation with your agents. When using agents with these features, normal model use charges apply. When memory retention is enabled, you pay for the model used to summarize the session. For more information, see the Amazon Bedrock Pricing page.

To learn more, see the Agents for Amazon Bedrock section of the User Guide. For deep-dive technical content and to discover how others are using generative AI in their solutions, visit community.aws.

Danilo

Guardrails for Amazon Bedrock can now detect hallucinations and safeguard apps built using custom or third-party FMs

This post was originally published on this site

Guardrails for Amazon Bedrock enables customers to implement safeguards based on application requirements and your company’s responsible artificial intelligence (AI) policies. It can help prevent undesirable content, block prompt attacks (prompt injection and jailbreaks), and remove sensitive information for privacy. You can combine multiple policy types to configure these safeguards for different scenarios and apply them across foundation models (FMs) on Amazon Bedrock, as well as custom and third-party FMs outside of Amazon Bedrock. Guardrails can also be integrated with Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock.

Guardrails for Amazon Bedrock provides additional customizable safeguards on top of native protections offered by FMs, delivering safety features that are among the best in the industry:

  • Blocks as much as 85% more harmful content
  • Allows customers to customize and apply safety, privacy and truthfulness protections within a single solution
  • Filters over 75% of hallucinated responses for RAG and summarization workloads

Guardrails for Amazon Bedrock was first released in preview at re:Invent 2023 with support for policies such as content filter and denied topics. At general availability in April 2024, Guardrails supported four safeguards: denied topics, content filters, sensitive information filters, and word filters.

MAPFRE is the largest insurance company in Spain, operating in 40 countries worldwide. “MAPFRE implemented Guardrails for Amazon Bedrock to ensure Mark.IA (a RAG based chatbot) aligns with our corporate security policies and responsible AI practices.” said Andres Hevia Vega, Deputy Director of Architecture at MAPFRE. “MAPFRE uses Guardrails for Amazon Bedrock to apply content filtering to harmful content, deny unauthorized topics, standardize corporate security policies, and anonymize personal data to maintain the highest levels of privacy protection. Guardrails has helped minimize architectural errors and simplify API selection processes to standardize our security protocols. As we continue to evolve our AI strategy, Amazon Bedrock and its Guardrails feature are proving to be invaluable tools in our journey toward more efficient, innovative, secure, and responsible development practices.”

Today, we are announcing two more capabilities:

  1. Contextual grounding checks to detect hallucinations in model responses based on a reference source and a user query.
  2. ApplyGuardrail API to evaluate input prompts and model responses for all FMs (including FMs on Amazon Bedrock, custom and third-party FMs), enabling centralized governance across all your generative AI applications.

Contextual grounding check – A new policy type to detect hallucinations
Customers usually rely on the inherent capabilities of FMs to generate grounded (credible) responses that are based on the company’s source data. However, FMs can conflate multiple pieces of information, producing incorrect or new information – impacting the reliability of the application. Contextual grounding check is a new and fifth safeguard that enables hallucination detection in model responses that are not grounded in enterprise data or are irrelevant to the user’s query. This can be used to improve response quality in use cases such as RAG, summarization, or information extraction. For example, you can use contextual grounding checks with Knowledge Bases for Amazon Bedrock to deploy trustworthy RAG applications by filtering inaccurate responses that are not grounded in your enterprise data. The results retrieved from your enterprise data sources are used as the reference source by the contextual grounding check policy to validate the model response.

There are two filtering parameters for the contextual grounding check:

  1. Grounding – This can be enabled by providing a grounding threshold that represents the minimum confidence score for a model response to be grounded. That is, it is factually correct based on the information provided in the reference source and does not contain new information beyond the reference source. A model response with a lower score than the defined threshold is blocked and the configured blocked message is returned.
  2. Relevance – This parameter works based on a relevance threshold that represents the minimum confidence score for a model response to be relevant to the user’s query. Model responses with a score below the defined threshold are blocked and the configured blocked message is returned.

A higher threshold for the grounding and relevance scores will result in more responses being blocked. Make sure to adjust the scores based on the accuracy tolerance for your specific use case. For example, a customer-facing application in the finance domain may need a high threshold due to lower tolerance for inaccurate content.

Contextual grounding check in action
Let me walk you through a few examples to demonstrate contextual grounding checks.

I navigate to the AWS Management Console for Amazon Bedrock. From the navigation pane, I choose Guardrails, and then Create guardrail. I configure a guardrail with the contextual grounding check policy enabled and specify the thresholds for grounding and relevance.

To test the policy, I navigate to the Guardrail Overview page and select a model using the Test section. This allows me to easily experiment with various combinations of source information and prompts to verify the contextual grounding and relevance of the model response.

For my test, I use the following content (about bank fees) as the source:

• There are no fees associated with opening a checking account.
• The monthly fee for maintaining a checking account is $10.
• There is a 1% transaction charge for international transfers.
• There are no charges associated with domestic transfers.
• The charges associated with late payments of a credit card bill is 23.99%.

Then, I enter questions in the Prompt field, starting with:

"What are the fees associated with a checking account?"

I choose Run to execute and View Trace to access details:

The model response was factually correct and relevant. Both grounding and relevance scores were above their configured thresholds, allowing the model response to be sent back to the user.

Next, I try another prompt:

"What is the transaction charge associated with a credit card?"

The source data only mentions late payment charges for credit cards and doesn’t mention transaction charges associated with the credit card. Hence, the model response was relevant (related to the transaction charge), but factually incorrect. This resulted in a low grounding score, and the response was blocked since the score was below the configured threshold of 0.85.

Finally, I try this prompt:

"What are the transaction charges for using a checking bank account?"

In this case, the model response was grounded, since the source data mentions the monthly fee for a checking bank account. However, it was irrelevant because the query was about transaction charges, and the response was related to monthly fees. This resulted in a low relevance score, and the response was blocked since it was below the configured threshold of 0.5.

Here is an example of how you would configure contextual grounding with the CreateGuardrail API using the AWS SDK for Python (Boto3):

bedrockClient.create_guardrail(
    name='demo_guardrail',
    description='Demo guardrail',
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {
                "type": "GROUNDING",
                "threshold": 0.85,
            },
            {
                "type": "RELEVANCE",
                "threshold": 0.5,
            }
        ]
    },
)

After creating the guardrail with contextual grounding check, it can be associated with Knowledge Bases for Amazon Bedrock, Agents for Amazon Bedrock, or referenced during model inference.
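For example, to reference the guardrail during model inference, I can pass its ID and version when invoking a model. Here is a minimal sketch with the AWS SDK for Python (Boto3), using an Anthropic Claude model ID as an example and a placeholder guardrail ID:

import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# The guardrail ID and version come from the create_guardrail call above
# (placeholder values shown here); the model ID is an example
response = bedrock_runtime.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    guardrailIdentifier='GUARDRAIL_ID',
    guardrailVersion='1',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 512,
        'messages': [
            {'role': 'user', 'content': 'What are the fees associated with a checking account?'}
        ],
    }),
)
print(json.loads(response['body'].read()))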

But, that’s not all!

ApplyGuardrail – Safeguard applications using FMs available outside of Amazon Bedrock
Until now, Guardrails for Amazon Bedrock was primarily used to evaluate input prompts and model responses for FMs available in Amazon Bedrock, only during model inference.

Guardrails for Amazon Bedrock now supports a new ApplyGuardrail API to evaluate all user inputs and model responses against the configured safeguards. This capability enables you to apply standardized and consistent safeguards for all your generative AI applications built using any self-managed (custom) or third-party FMs, regardless of the underlying infrastructure. In essence, you can now use Guardrails for Amazon Bedrock to apply the same set of safeguards on input prompts and model responses for FMs available in Amazon Bedrock, FMs available in other services (such as Amazon SageMaker), FMs hosted on infrastructure such as Amazon Elastic Compute Cloud (Amazon EC2) or on premises, and other third-party FMs beyond Amazon Bedrock.

In addition, you can also use the ApplyGuardrail API to evaluate user inputs and model responses independently at different stages of your generative AI applications, enabling more flexibility in application development. For example, in a RAG application, you can use guardrails to evaluate and filter harmful user inputs prior to performing a search on your knowledge base. Subsequently, you can evaluate the output separately after completing the retrieval (search) and the generation step from the FM.

Let me show you how to use the ApplyGuardrail API in an application. In the following example, I have used the AWS SDK for Python (Boto3).

I started by creating a new guardrail (using the create_guardrail function) along with a set of denied topics, and created a new version (using the create_guardrail_version function):

import boto3

bedrockRuntimeClient = boto3.client('bedrock-runtime', region_name="us-east-1")
bedrockClient = boto3.client('bedrock', region_name="us-east-1")
guardrail_name = 'fiduciary-advice'

def create_guardrail():
    
    create_response = bedrockClient.create_guardrail(
        name=guardrail_name,
        description='Prevents the model from providing fiduciary advice.',
        topicPolicyConfig={
            'topicsConfig': [
                {
                    'name': 'Fiduciary Advice',
                    'definition': 'Providing personalized advice or recommendations on managing financial assets in a fiduciary capacity.',
                    'examples': [
                        'What stocks should I invest in for my retirement?',
                        'Is it a good idea to put my money in a mutual fund?',
                        'How should I allocate my 401(k) investments?',
                        'What type of trust fund should I set up for my children?',
                        'Should I hire a financial advisor to manage my investments?'
                    ],
                    'type': 'DENY'
                }
            ]
        },
        blockedInputMessaging='I apologize, but I am not able to provide personalized advice or recommendations on managing financial assets in a fiduciary capacity.',
        blockedOutputsMessaging='I apologize, but I am not able to provide personalized advice or recommendations on managing financial assets in a fiduciary capacity.',
    )

    version_response = bedrockClient.create_guardrail_version(
        guardrailIdentifier=create_response['guardrailId'],
        description='Version of Guardrail to block fiduciary advice'
    )

    return create_response['guardrailId'], version_response['version']

Once the guardrail was created, I invoked the apply_guardrail function with the required text to be evaluated along with the ID and version of the guardrail that I just created:

def apply(guardrail_id, guardrail_version):

    response = bedrockRuntimeClient.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='INPUT',
        content=[{"text": {"inputText": "How should I invest for my retirement? I want to be able to generate $5,000 a month"}}],
    )

    print(response["output"][0]["text"])

I used the following prompt:

How should I invest for my retirement? I want to be able to generate $5,000 a month

Thanks to the guardrail, the message got blocked and the pre-configured response was returned:

I apologize, but I am not able to provide personalized advice or recommendations on managing financial assets in a fiduciary capacity. 

In this example, I set the source to INPUT, which means that the content to be evaluated is from a user (typically the LLM prompt). To evaluate the model output, the source should be set to OUTPUT.
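For completeness, here is a minimal sketch of the OUTPUT case, reusing the bedrockRuntimeClient defined earlier and mirroring the content structure of the INPUT example above:

def apply_to_output(guardrail_id, guardrail_version, model_response_text):

    # Evaluate a model response against the same guardrail;
    # the only change from the input case is source='OUTPUT'
    response = bedrockRuntimeClient.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='OUTPUT',
        content=[{"text": {"inputText": model_response_text}}],
    )

    print(response["output"][0]["text"])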

Now available
Contextual grounding check and the ApplyGuardrail API are available today in all AWS Regions where Guardrails for Amazon Bedrock is available. Try them out in the Amazon Bedrock console, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts.

To learn more about Guardrails, visit the Guardrails for Amazon Bedrock product page and the Amazon Bedrock pricing page to understand the costs associated with Guardrail policies.

Don’t forget to visit the community.aws site to find deep-dive technical content on solutions and discover how our builder communities are using Amazon Bedrock in their solutions.

— Abhishek

Knowledge Bases for Amazon Bedrock now supports additional data connectors (in preview)

This post was originally published on this site

Using Knowledge Bases for Amazon Bedrock, foundation models (FMs) and agents can retrieve contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG). RAG helps FMs deliver more relevant, accurate, and customized responses.

Over the past months, we’ve continuously added choices of embedding models, vector stores, and FMs to Knowledge Bases.

Today, I’m excited to share that in addition to Amazon Simple Storage Service (Amazon S3), you can now connect your web domains, Confluence, Salesforce, and SharePoint as data sources to your RAG applications (in preview).

Select Web Crawler as data source

New data source connectors for web domains, Confluence, Salesforce, and SharePoint
By including your web domains, you can give your RAG applications access to your public data, such as your company’s social media feeds, to enhance the relevance, timeliness, and comprehensiveness of responses to user inputs. Using the new connectors, you can now add your existing company data sources in Confluence, Salesforce, and SharePoint to your RAG applications.

Let me show you how this works. In the following examples, I’ll use the web crawler to add a web domain and connect Confluence as a data source to a knowledge base. Connecting Salesforce and SharePoint as data sources follows a similar pattern.

Add a web domain as a data source
To give it a try, navigate to the Amazon Bedrock console and create a knowledge base. Provide the knowledge base details, including name and description, and create a new service role or use an existing one with the relevant AWS Identity and Access Management (IAM) permissions.

Create knowledge base

Then, choose the data source you want to use. I select Web Crawler.

Connect additional data sources with Knowledge Bases for Amazon Bedrock

In the next step, I configure the web crawler. I enter a name and description for the web crawler data source. Then, I define the source URLs. For this demo, I add the URL of my AWS News Blog author page that lists all my posts. You can add up to ten seed or starting point URLs of the websites you want to crawl.

Configure Web Crawler as data source

Optionally, you can configure custom encryption settings and the data deletion policy that defines whether the vector store data will be retained or deleted when the data source is deleted. I keep the default advanced settings.

In the sync scope section, you can configure the scope of domains you want to sync, the maximum number of URLs to crawl per minute, and regular expression patterns to include or exclude certain URLs.

Define sync scope

After you’re done with the web crawler data source configuration, complete the knowledge base setup by selecting an embeddings model and configuring your vector store of choice. You can check the knowledge base details after creation to monitor the data source sync status. After the sync is complete, you can test the knowledge base and see FM responses with web URLs as citations.

Test your knowledge base

To create data sources programmatically, you can use the AWS Command Line Interface (AWS CLI) or AWS SDKs. For code examples, check out the Amazon Bedrock User Guide.
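For instance, a web crawler data source can be added to an existing knowledge base with the bedrock-agent client. This is only a sketch: the knowledge base ID and seed URL are placeholders, and the nested field names for the web crawler configuration are illustrative, so check the User Guide for the exact schema.

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Sketch only: the knowledge base ID and seed URL are placeholders,
# and the web configuration field names are illustrative
bedrock_agent.create_data_source(
    knowledgeBaseId='KNOWLEDGE_BASE_ID',
    name='my-web-crawler-data-source',
    dataSourceConfiguration={
        'type': 'WEB',
        'webConfiguration': {
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [{'url': 'https://example.com/blog/'}]
                }
            },
            'crawlerConfiguration': {
                'crawlerLimits': {'rateLimit': 300},
                'scope': 'HOST_ONLY',
            },
        },
    },
)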

Connect Confluence as a data source
Now, let’s select Confluence as a data source in the knowledge base setup.

Connect Confluence as a data source with Knowledge Bases for Amazon Bedrock

To configure Confluence as a data source, I again provide a name and description for the data source, choose the hosting method, and enter the Confluence URL.

To connect to Confluence, you can choose between base and OAuth 2.0 authentication. For this demo, I choose Base authentication, which expects a user name (your Confluence user account email address) and password (Confluence API token). I store the relevant credentials in AWS Secrets Manager and choose the secret.

Note: Make sure that the secret name starts with “AmazonBedrock-” and your IAM service role for Knowledge Bases has permissions to access this secret in Secrets Manager.
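For reference, here is a minimal sketch of creating such a secret with the AWS SDK for Python (Boto3); the key names are illustrative and follow the user name and API token pattern described above, so check the connector documentation for the exact keys it expects.

import json
import boto3

secrets = boto3.client('secretsmanager', region_name='us-east-1')

# The secret name must start with "AmazonBedrock-";
# the key names below are illustrative
secrets.create_secret(
    Name='AmazonBedrock-confluence-credentials',
    SecretString=json.dumps({
        'username': 'me@example.com',            # Confluence account email
        'password': 'YOUR_CONFLUENCE_API_TOKEN', # Confluence API token
    }),
)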

Configure Confluence as a data source

In the metadata settings, you can control the scope of content you want to crawl using regular expression include and exclude patterns and configure the content chunking and parsing strategy.

Configure Confluence as a data source

After you’re done with the Confluence data source configuration, complete the knowledge base setup by selecting an embeddings model and configuring your vector store of choice.

You can check the knowledge base details after creation to monitor the data source sync status. After the sync is complete, you can test the knowledge base. For this demo, I have added some fictional meeting notes to my Confluence space. Let’s ask about the action items from one of the meetings!

Confluence as a data source for Knowledge Bases

For instructions on how to connect Salesforce and SharePoint as a data source, check out the Amazon Bedrock User Guide.

Things to know

  • Inclusion and exclusion filters – All data sources support inclusion and exclusion filters so you can have granular control over what data is crawled from a given source.
  • Web Crawler – Remember that you must only use the web crawler on your own web pages or web pages that you have authorization to crawl.

Now available
The new data source connectors are available today in all AWS Regions where Knowledge Bases for Amazon Bedrock is available. Check the Region list for details and future updates. To learn more about Knowledge Bases, visit the Amazon Bedrock product page. For pricing details, review the Amazon Bedrock pricing page.

Give the new data source connectors a try in the Amazon Bedrock console today, send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

— Antje

Introducing Amazon Q Developer in SageMaker Studio to streamline ML workflows

This post was originally published on this site

Today, we are announcing a new capability in Amazon SageMaker Studio that simplifies and accelerates the machine learning (ML) development lifecycle. Amazon Q Developer in SageMaker Studio is a generative AI-powered assistant built natively into the SageMaker JupyterLab experience. This assistant takes your natural language inputs and crafts a tailored execution plan for your ML development lifecycle by recommending the best tools for each task, providing step-by-step guidance, generating code to get started, and offering troubleshooting assistance when you encounter errors. It also helps when facing challenges such as translating complex ML problems into smaller tasks and searching for relevant information in the documentation.

You may be a first-time user who is evaluating Amazon SageMaker for generative artificial intelligence (generative AI) or traditional ML use cases, or a returning user who knows how to use SageMaker but wants to further improve productivity and accelerate time to insights. With Amazon Q Developer in SageMaker Studio, you can build, train, and deploy ML models without having to leave SageMaker Studio to search for sample notebooks, code snippets, and instructions on documentation pages and online forums.

Now, let me show you different capabilities of Amazon Q Developer in SageMaker Studio.

Getting started with Amazon Q Developer in SageMaker Studio
In the Amazon SageMaker console, I go to Domains under Admin configurations and enable Amazon Q Developer under domain settings. If you are new to Amazon SageMaker, check out the Amazon SageMaker domain overview documentation. I choose Studio from the Launch dropdown of mytestuser to launch Amazon SageMaker Studio.

When my environment is ready, I choose JupyterLab under Applications and then choose Open JupyterLab to open up my Jupyter notebook.

The generative AI–powered assistant Amazon Q Developer is next to my Jupyter notebook. There are built-in commands that I can now use to get started.

I can immediately start the conversation with Amazon Q Developer by describing an ML problem in natural language. The assistant helps me use SageMaker without having to spend time researching how to use the tool and its features. I use the following prompt:

I have data in my S3 bucket. I want to use that data and train an XGBoost algorithm for prediction. Can you list down the steps with sample code.

Amazon Q Developer provides me with step-by-step guidance and generates code for training an XGBoost algorithm for prediction. I can follow the recommended steps and easily add the required cells to my notebook.

Amazon Q Developer Code Generation
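As an illustration of the kind of cell this guidance leads to (not the assistant’s literal output), training with the built-in XGBoost container typically looks like the following sketch; the S3 bucket and prefixes are placeholders.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name

# Placeholder S3 location for the training data
# (for the built-in XGBoost container, the first CSV column is the target)
train_s3_uri = 's3://my-bucket/flight-data/train.csv'

# Retrieve the built-in XGBoost container image for this Region
container = image_uris.retrieve('xgboost', region, version='1.7-1')

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path='s3://my-bucket/flight-data/output',
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective='binary:logistic', num_round=100)

estimator.fit({'train': TrainingInput(train_s3_uri, content_type='text/csv')})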

Let me try another prompt to generate code for downloading a dataset from S3 and reading it using Pandas. I can use it to build or train my model. This helps streamline the coding process by handling repetitive tasks and reducing manual work. I use the following prompt:

Can you write the code to download a dataset from S3 and read it using Pandas?
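Again, as an illustration of the kind of snippet the assistant generates (the bucket, key, and local file name are placeholders):

import boto3
import pandas as pd

s3 = boto3.client('s3')

# Placeholder bucket, key, and local file name
bucket = 'my-bucket'
key = 'flight-data/train.csv'
local_path = 'train.csv'

# Download the dataset from S3 and load it into a DataFrame
s3.download_file(bucket, key, local_path)
df = pd.read_csv(local_path)
print(df.head())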

I can also ask Amazon Q Developer for guidance to debug and fix errors. The assistant helps me troubleshoot based on frequently seen errors and resolutions, saving me from time-consuming online research and trial-and-error approaches. I use the following prompt:

How can I resolve the error "Unable to infer schema for JSON. It must be specified manually." when running a merge job for model quality monitoring with batch inference in SageMaker?

As a final example, I ask Amazon Q Developer to provide recommendations on how to schedule a notebook job. I use the following prompt to get the answer:

What are the options to schedule a notebook job? 

Now available
You have access to Amazon Q Developer in all Regions where Amazon SageMaker is generally available.

The assistant is available for all Amazon Q Developer Pro Tier users. For pricing information, visit the Amazon Q Developer pricing page.

Get started with Amazon Q Developer in SageMaker Studio today to access the generative AI–powered assistant at any point of your ML development lifecycle.

— Esra