Category Archives: AWS

Now Available – Second-Generation FPGA-Powered Amazon EC2 instances (F2)

Equipped with up to eight AMD Field-Programmable Gate Arrays (FPGAs), AMD EPYC (Milan) processors with up to 192 cores, High Bandwidth Memory (HBM), up to 8 TiB of SSD-based instance storage, and up to 2 TiB of memory, the new F2 instances are available in two sizes, and are ready to accelerate your genomics, multimedia processing, big data, satellite communication, networking, silicon simulation, and live video workloads.

A Quick FPGA Recap
Here’s how I explained the FPGA model when we previewed the first generation of FPGA-powered Amazon Elastic Compute Cloud (Amazon EC2) instances:

One of the more interesting routes to a custom, hardware-based solution is known as a Field Programmable Gate Array, or FPGA. In contrast to a purpose-built chip which is designed with a single function in mind and then hard-wired to implement it, an FPGA is more flexible. It can be programmed in the field, after it has been plugged in to a socket on a PC board. Each FPGA includes a fixed, finite number of simple logic gates. Programming an FPGA is “simply” a matter of connecting them up to create the desired logical functions (AND, OR, XOR, and so forth) or storage elements (flip-flops and shift registers). Unlike a CPU which is essentially serial (with a few parallel elements) and has fixed-size instructions and data paths (typically 32 or 64 bit), the FPGA can be programmed to perform many operations in parallel, and the operations themselves can be of almost any width, large or small.

Since that launch, AWS customers have used F1 instances to host many different types of applications and services. With a newer FPGA, more processing power, and more memory bandwidth, the new F2 instances are an even better host for highly parallelizable, compute-intensive workloads.

Each of the AMD Virtex UltraScale+ HBM VU47P FPGAs has 2.85 million system logic cells and 9,024 DSP slices (up to 28 TOPS of DSP compute performance when processing INT8 values). The FPGA Accelerator Card associated with each F2 instance provides 16 GiB of High Bandwidth Memory and 64 GiB of DDR4 memory per FPGA.

Inside the F2
F2 instances are powered by 3rd generation AMD EPYC (Milan) processors. In comparison to F1 instances, they offer up to 3x as many processor cores, up to twice as much system memory and NVMe storage, and up to 4x the network bandwidth. Each FPGA comes with 16 GiB High Bandwidth Memory (HBM) with up to 460 GiB/s bandwidth. Here are the instance sizes and specs:

Instance Name | vCPUs | FPGAs | FPGA Memory (HBM / DDR4) | Instance Memory | NVMe Storage          | EBS Bandwidth | Network Bandwidth
f2.12xlarge   | 48    | 2     | 32 GiB / 128 GiB         | 512 GiB         | 1900 GiB (2x 950 GiB) | 15 Gbps       | 25 Gbps
f2.48xlarge   | 192   | 8     | 128 GiB / 512 GiB        | 2,048 GiB       | 7600 GiB (8x 950 GiB) | 60 Gbps       | 100 Gbps

The high-end f2.48xlarge instance supports the AWS Cloud Digital Interface (CDI) to reliably transport uncompressed live video between applications, with instance-to-instance latency as low as 8 milliseconds.

Building FPGA Applications
The AWS EC2 FPGA Development Kit contains the tools that you will use to develop, simulate, debug, compile, and run your hardware-accelerated FPGA applications. You can launch the kit’s FPGA Developer AMI on a memory-optimized or compute-optimized instance for development and simulation, then use an F2 instance for final debugging and testing.

The tools included in the developer kit support a variety of development paradigms, tools, accelerator languages, and debugging options. Regardless of your choice, you will ultimately create an Amazon FPGA Image (AFI), which contains your custom acceleration logic and the AWS Shell, which implements access to the FPGA memory, PCIe bus, interrupts, and external peripherals. You can deploy AFIs to as many F2 instances as desired, share them with other AWS accounts, or publish them on AWS Marketplace.
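
For reference, once the developer kit has produced a design checkpoint (DCP) tarball and you have uploaded it to Amazon S3, registering the AFI is a single API call. Here’s a minimal sketch using the AWS SDK for Python (Boto3); the bucket, key, and name values are placeholders for illustration, not values from this post:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Register the DCP tarball (previously uploaded to S3) as an Amazon FPGA Image.
# Bucket and key names below are illustrative placeholders.
response = ec2.create_fpga_image(
    Name="my-f2-accelerator",
    Description="Custom acceleration logic for F2",
    InputStorageLocation={"Bucket": "my-fpga-build-bucket", "Key": "dcp/my-design.tar"},
    LogsStorageLocation={"Bucket": "my-fpga-build-bucket", "Key": "logs/"},
)

# The global AFI ID (agfi-...) is what you load onto the FPGA from an instance;
# the AFI ID (afi-...) is used to manage the image itself.
print(response["FpgaImageId"], response["FpgaImageGlobalId"])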

If you have already created an application that runs on F1 instances, you will need to update your development environment to use the latest AMD tools, then rebuild and validate before upgrading to F2 instances.

FPGA Instances in Action
Here are some cool examples of how F1 and F2 instances can support unique and highly demanding workloads:

Genomics – Multinational pharmaceutical and biotechnology company AstraZeneca used thousands of F1 instances to build the world’s fastest genomics pipeline, able to process over 400K whole genome samples in under two months. They will adopt Illumina DRAGEN for F2 to realize better performance at a lower cost, while accelerating disease discovery, diagnosis, and treatment.

Satellite Communication – Satellite operators are moving from inflexible and expensive physical infrastructure (modulators, demodulators, combiners, splitters, and so forth) toward agile, software-defined, FPGA-powered solutions. Using the digital signal processor (DSP) elements on the FPGA, these solutions can be reconfigured in the field to support new waveforms and to meet changing requirements. Key F2 features such as support for up to 8 FPGAs per instance, generous amounts of network bandwidth, and support for the Data Plane Development Kit (DPDK) using Virtual Ethernet can be used to support processing of multiple, complex waveforms in parallel.

Analytics – NeuroBlade’s SQL Processing Unit (SPU) integrates with Presto, Apache Spark, and other open source query engines, delivering faster query processing and market-leading query throughput efficiency when run on F2 instances.

Things to Know
Here are a couple of final things that you should know about the F2 instances:

Regions – F2 instances are available today in the US East (N. Virginia) and Europe (London) AWS Regions, with plans to extend availability to additional regions over time.

Operating Systems – F2 instances are Linux-only.

Purchasing Options – F2 instances are available in On-Demand, Spot, Savings Plan, Dedicated Instance, and Dedicated Host form.

Jeff;

Introducing Buy with AWS: an accelerated procurement experience on AWS Partner sites, powered by AWS Marketplace

Today, we are announcing Buy with AWS, a new way to discover and purchase solutions available in AWS Marketplace from AWS Partner sites. You can use Buy with AWS to accelerate and streamline your product procurement process on websites outside of Amazon Web Services (AWS). This feature lets you find, try, and buy solutions from Partner websites using your AWS account.

AWS Marketplace is a curated digital store for you to find, buy, deploy, and manage cloud solutions from Partners. Buy with AWS is another step towards AWS Marketplace making it easy for you to find and procure the right Partner solutions, when and where you need them. You can conveniently find and procure solutions in AWS Marketplace, through integrated AWS service consoles, and now on Partner websites.

Accelerate cloud solution discovery and evaluation

You can now discover solutions from Partners available for purchase through AWS Marketplace as you explore solutions on the web beyond AWS.

Look for products that are “Available in AWS Marketplace” when browsing on Partner sites, then accelerate your evaluation process with fast access to free trials, demo requests, and inquiries for custom pricing.

For example, I want to evaluate Wiz to see how it can help with my cloud security requirements. While browsing the Wiz website, I come across a page where I see “Connect Wiz with Amazon Web Services (AWS)”.

Wiz webpage featuring Buy With AWS

I choose Try with AWS. It asks me to sign in to my AWS account if I’m not signed in already. I’m then presented with a Wiz and AWS co-branded page for me to sign up for the free trial.

Wiz and AWS co-branded page to sign up for free trial using Buy with AWS through AWS Marketplace

The discovery experience that you see will vary depending on the type of Partner website you’re shopping on. Wiz is an example of how Buy with AWS can be implemented by an independent software vendor (ISV). Now, let’s look at an example of an AWS Marketplace Channel Partner, or reseller, who operates a storefront of their own.

I browse to the Bytes storefront with product listings from AWS Marketplace. I have the option to filter and search from the curated product listings, which are available in AWS Marketplace, on the Bytes site.

Bytes storefront with product listings from AWS Marketplace

I choose View Details for Fortinet and see an option to Request Private Offer from AWS.

Bytes storefront with option to Request Private Offer for Fortinet from AWS Marketplace

As you can tell, on a Channel Partner site, you can browse curated product listings available in AWS Marketplace, filter products, and request custom pricing directly from their website.

Streamline product procurement on AWS Partner sites
I had a seamless experience using Buy with AWS to access a free trial for Wiz and browse through the Bytes storefront to request a private offer.

Now I want to try Databricks for one of the applications I’m building. I sign up for a Databricks trial through their website.

Databricks homepage after login with option to Upgrade

I choose Upgrade and see that Databricks is available in AWS Marketplace, which gives me the option to Buy with AWS.

Option to upgrade to Databricks premium using Buy with AWS feature of AWS marketplace

I choose Buy with AWS, and after I sign in to my AWS account, I land on a Databricks and AWS Marketplace co-branded procurement page.

Databricks and AWS co-branded page to subscribe using Buy with AWS

I complete the purchase on the co-branded procurement page and continue to set up my Databricks account.

Databricks and AWS co-branded page after subscribing using Buy with AWS

As you can tell, I didn’t have to navigate the challenge of managing procurement processes for multiple vendors. I also didn’t have to speak with a sales representative or onboard a new vendor in my billing system, which would have required multiple approvals and delayed the overall process.

Access centralized billing and benefits through AWS Marketplace
Because Buy with AWS purchases are transacted through and managed in AWS Marketplace, you also benefit from the post-purchase experience of AWS Marketplace, including consolidated AWS billing, centralized subscription management, and access to cost optimization tools.

For example, through the AWS Billing and Cost Management console, I can centrally manage all my AWS purchases, including Buy with AWS purchases, from one dashboard. I can easily access and process invoices for all of my organization’s AWS purchases. I also need to have valid AWS Identity and Access Management (IAM) permissions to manage subscriptions and make a purchase through AWS Marketplace.

AWS Marketplace not only simplifies my billing but also helps in maintaining governance over spending by helping me manage purchasing authority and subscription access for my organization with centralized visibility and controls. I can manage my budget with pricing flexibility, cost transparency, and AWS cost management tools.

Buy with AWS for Partners
Buy with AWS enables Partners who sell or resell products in AWS Marketplace to create new solution discovery and buying experiences for customers on their own websites. By adding call to action (CTA) buttons to their websites such as “Buy with AWS”, “Try free with AWS”, “Request private offer”, and “Request demo”, Partners can help accelerate product evaluation and the path-to-purchase for customers.

By integrating with AWS Marketplace APIs, Partners can display products from the AWS Marketplace catalog, allow customers to sort and filter products, and streamline private offers. Partners implementing Buy with AWS can access AWS Marketplace creative and messaging resources for guidance on building their own web experiences. Partners who implement Buy with AWS can access metrics for insights into engagement and conversion performance.

The Buy with AWS onboarding guide in the AWS Marketplace Management Portal details how Partners can get started.
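
For a general sense of what programmatic access to the AWS Marketplace catalog looks like (this is not the Buy with AWS integration itself, which is covered in the onboarding guide), here’s a small Boto3 sketch that lists SaaS product entities visible to a seller account; treat it as an illustration only:

import boto3

# Illustrative sketch: list SaaS product entities from the AWS Marketplace catalog.
# The Buy with AWS web integration uses its own APIs described in the onboarding guide.
catalog = boto3.client("marketplace-catalog", region_name="us-east-1")

response = catalog.list_entities(
    Catalog="AWSMarketplace",
    EntityType="SaaSProduct",
    MaxResults=10,
)

for entity in response["EntitySummaryList"]:
    print(entity["EntityId"], entity.get("Name"))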

Learn more
Visit the Buy with AWS page to learn more and explore Partner sites that offer Buy with AWS.

To learn more about selling or reselling products using Buy with AWS on your website, visit:

Prasad

Accelerate foundation model training and fine-tuning with new Amazon SageMaker HyperPod recipes

Today, we’re announcing the general availability of Amazon SageMaker HyperPod recipes to help data scientists and developers of all skill sets get started training and fine-tuning foundation models (FMs) in minutes with state-of-the-art performance. They can now access optimized recipes for training and fine-tuning popular publicly available FMs such as Llama 3.1 405B, Llama 3.2 90B, or Mixtral 8x22B.

At AWS re:Invent 2023, we introduced SageMaker HyperPod to reduce time to train FMs by up to 40 percent and scale across more than a thousand compute resources in parallel with preconfigured distributed training libraries. With SageMaker HyperPod, you can find the required accelerated compute resources for training, create the most optimal training plans, and run training workloads across different blocks of capacity based on the availability of compute resources.

SageMaker HyperPod recipes include a training stack tested by AWS, removing tedious work experimenting with different model configurations, eliminating weeks of iterative evaluation and testing. The recipes automate several critical steps, such as loading training datasets, applying distributed training techniques, automating checkpoints for faster recovery from faults, and managing the end-to-end training loop.

With a simple recipe change, you can seamlessly switch between GPU- or Trainium-based instances to further optimize training performance and reduce costs. You can easily run workloads in production on SageMaker HyperPod or SageMaker training jobs.

SageMaker HyperPod recipes in action
To get started, visit the SageMaker HyperPod recipes GitHub repository to browse training recipes for popular publicly available FMs.

You only need to edit straightforward recipe parameters to specify an instance type and the location of your dataset in the cluster configuration, then run the recipe with a single line command to achieve state-of-the-art performance.

You need to edit the recipe config.yaml file to specify the model and cluster type after cloning the repository.

$ git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
$ cd sagemaker-hyperpod-recipes
$ pip3 install -r requirements.txt
$ cd ./recipes_collection
$ vim config.yaml

The recipes support SageMaker HyperPod with Slurm, SageMaker HyperPod with Amazon Elastic Kubernetes Service (Amazon EKS), and SageMaker training jobs. For example, you can set up a cluster type (Slurm orchestrator), a model name (Meta Llama 3.1 405B language model), an instance type (ml.p5.48xlarge), and your data locations, such as storing the training data, results, logs, and so on.

defaults:
  - cluster: slurm # support: slurm / k8s / sm_jobs
  - recipes: fine-tuning/llama/hf_llama3_405b_seq8k_gpu_qlora # name of model to be trained
debug: False # set to True to debug the launcher configuration
instance_type: ml.p5.48xlarge # or other supported cluster instances
base_results_dir: # Location(s) to store the results, checkpoints, logs etc.

You can optionally adjust model-specific training parameters in this YAML file, which outlines the optimal configuration, including the number of accelerator devices, instance type, training precision, parallelization and sharding techniques, the optimizer, and logging to monitor experiments through TensorBoard.

run:
  name: llama-405b
  results_dir: ${base_results_dir}/${.name}
  time_limit: "6-00:00:00"
restore_from_path: null
trainer:
  devices: 8
  num_nodes: 2
  accelerator: gpu
  precision: bf16
  max_steps: 50
  log_every_n_steps: 10
  ...
exp_manager:
  exp_dir: # location for TensorBoard logging
  name: helloworld 
  create_tensorboard_logger: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    ...
  auto_checkpoint: True # for automated checkpointing
use_smp: True 
distributed_backend: smddp # optimized collectives
# Start training from pretrained model
model:
  model_type: llama_v3
  train_batch_size: 4
  tensor_model_parallel_degree: 1
  expert_model_parallel_degree: 1
  # other model-specific params

To run this recipe in SageMaker HyperPod with Slurm, you must prepare the SageMaker HyperPod cluster following the cluster setup instruction.

Then, connect to the SageMaker HyperPod head node, access the Slurm controller, and copy the edited recipe. Next, you run a helper file to generate a Slurm submission script for the job that you can use for a dry run to inspect the content before starting the training job.

$ python3 main.py --config-path recipes_collection --config-name=config

After training completion, the trained model is automatically saved to your assigned data location.

To run this recipe on SageMaker HyperPod with Amazon EKS, clone the recipe from the GitHub repository, install the requirements, and edit the recipe (cluster: k8s) on your laptop. Then, connect your laptop to the running EKS cluster and use the HyperPod Command Line Interface (CLI) to run the recipe.

$ hyperpod start-job --recipe fine-tuning/llama/hf_llama3_405b_seq8k_gpu_qlora \
--persistent-volume-claims fsx-claim:data \
--override-parameters \
'{
  "recipes.run.name": "hf-llama3-405b-seq8k-gpu-qlora",
  "recipes.exp_manager.exp_dir": "/data/<your_exp_dir>",
  "cluster": "k8s",
  "cluster_type": "k8s",
  "container": "658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121",
  "recipes.model.data.train_dir": "<your_train_data_dir>",
  "recipes.model.data.val_dir": "<your_val_data_dir>"
}'

You can also run a recipe on SageMaker training jobs using the SageMaker Python SDK. The following example runs PyTorch training scripts on SageMaker training jobs, overriding parts of the training recipe.

...
recipe_overrides = {
    "run": {
        "results_dir": "/opt/ml/model",
    },
    "exp_manager": {
        "exp_dir": "",
        "explicit_log_dir": "/opt/ml/output/tensorboard",
        "checkpoint_dir": "/opt/ml/checkpoints",
    },   
    "model": {
        "data": {
            "train_dir": "/opt/ml/input/data/train",
            "val_dir": "/opt/ml/input/data/val",
        },
    },
}
pytorch_estimator = PyTorch(
           output_path=<output_path>,
           base_job_name=f"llama-recipe",
           role=<role>,
           instance_type="p5.48xlarge",
           training_recipe="fine-tuning/llama/hf_llama3_405b_seq8k_gpu_qlora",
           recipe_overrides=recipe_overrides,
           sagemaker_session=sagemaker_session,
           tensorboard_output_config=tensorboard_output_config,
)
...

As training progresses, the model checkpoints are stored on Amazon Simple Storage Service (Amazon S3) with the fully automated checkpointing capability, enabling faster recovery from training faults and instance restarts.

Now available
Amazon SageMaker HyperPod recipes are now available in the SageMaker HyperPod recipes GitHub repository. To learn more, visit the SageMaker HyperPod product page and the Amazon SageMaker AI Developer Guide.

Give SageMaker HyperPod recipes a try and send feedback to AWS re:Post for SageMaker or through your usual AWS Support contacts.

Channy

AWS Education Equity Initiative: Applying generative AI to educate the next wave of innovators

Building on the work that we and our partners have been doing for many years, Amazon is committing up to $100 million in cloud technology and technical resources to help existing, dedicated learning organizations reach more learners by creating new and innovative digital learning solutions, all as part of the AWS Education Equity Initiative.

The Work So Far
AWS and Amazon have a long-standing commitment to learning and education. Here’s a sampling of what we have already done:

AWS AI & ML Scholarship Program – This program has awarded $28 million in scholarships to approximately 6000 students.

Machine Learning University – MLU offers a free program helping community colleges and Historically Black Colleges and Universities (HBCUs) teach data management, artificial intelligence, and machine learning concepts. The program is designed to address opportunity gaps by supporting students who are historically underserved and underrepresented in technology disciplines.

Amazon Future Engineer – Since 2021, up to $46 million in scholarships has been awarded to 1,150 students through this program. In the past year, more than 2.1 million students received over 17 million hours of STEM education, literacy, and career exploration courses through this and other Amazon philanthropic education programs in the United States. I was able to speak at one such session last year and it was an amazing experience.

Free Cloud Training – In late 2020 we set a goal of helping 29 million people grow their tech skills with free cloud computing training by 2025. We worked hard and met that target a year ahead of time!

There’s More To Do
Despite all of this work and progress, there’s still more to be done. The future is definitely not evenly distributed: over half a billion students cannot be reached by digital learning today.

We believe that generative AI can amplify the good work that socially minded edtech organizations, nonprofits, and governments are already doing. Our goal is to empower them to build new and innovative digital learning systems that extend their reach to a bigger audience.

With the launch of the AWS Education Equity Initiative, we want to help pave the way for the next generation of technology pioneers as they build powerful tools, train foundation models at scale, and create AI-powered teaching assistants.

We are committing up to $100 million in cloud technology and comprehensive technical advising over the next five years. The awardees will have access to the portfolio of AWS services and technical expertise so that they can build and scale learning management systems, mobile apps, chatbots, and other digital learning tools. As part of the application process, applicants will be asked to demonstrate how their proposed solution will benefit students from underserved and underrepresented communities.

As I mentioned earlier, our partners are already doing a lot of great work in this area. For example:

Code.org has already used AWS to scale their free computer science curriculum to millions of students in more than 100 countries. With this initiative, they will expand their use of Amazon Bedrock to provide an automated assessment of student projects, freeing up educator time that can be used for individual instruction and tailored learning.

Rocket Learning focuses on early childhood education in India. They will use Amazon Q in QuickSight to enhance learning outcomes for more than three million children.

I’m super excited about this initiative and look forward to seeing how it will help to create and educate the next generation of technology pioneers!

Jeff;

Solve complex problems with new scenario analysis capability in Amazon Q in QuickSight

Today, we announced a new capability of Amazon Q in QuickSight that helps users perform scenario analyses to find answers to complex problems quickly. This AI-assisted data analysis experience guides business users step by step through in-depth analysis using natural language prompts: it suggests analytical approaches, automatically analyzes data, and summarizes findings with suggested actions. This new capability eliminates hours of tedious and error-prone manual work traditionally required to perform analyses using spreadsheets or other alternatives. In fact, Amazon Q in QuickSight enables business users to perform complex scenario analysis up to 10x faster than spreadsheets. This capability expands upon the existing data Q&A capabilities of Amazon QuickSight so business professionals can start their analysis by simply asking a question.

How it works
Business users are often faced with complex questions that have traditionally required specialized training and days or weeks of time analyzing data in spreadsheets or other tools to address. For example, let’s say you’re a franchisee with multiple locations to manage. You might use this new capability in Amazon Q in QuickSight to ask, “How can I help our new Chicago store perform as well as the flagship store in New York?” Using an agentic approach, Amazon Q would then suggest analytical approaches needed to address the underlying business goal, automatically analyze data, and present results complete with visualizations and suggested actions. You can conduct this multistep analysis in an expansive analysis canvas, giving you the flexibility to make changes, explore multiple analysis paths simultaneously, and adapt to situations over time.

This new analysis experience is part of Amazon QuickSight, meaning it can read from QuickSight dashboards, which connect to sources such as Amazon Athena, Amazon Aurora, Amazon Redshift, Amazon Simple Storage Service (Amazon S3), and Amazon OpenSearch Service. Specifically, this new experience is part of Amazon Q in QuickSight, which allows it to seamlessly integrate with other generative business intelligence (BI) capabilities such as data Q&A. You can also upload either a .csv file or a single-table, single-sheet .xlsx file to incorporate into your analysis.

Here’s a visual walkthrough of this new analysis experience in Amazon Q in QuickSight.

I’m planning a customer event, and I’ve received an Excel spreadsheet of all who’ve registered to attend the event. I want to learn more about the attendees, so I analyze the spreadsheet and ask a few questions. I start by describing what I want to explore.

I upload the spreadsheet to start my analysis. Firstly, I want to understand how many people have registered for the event.

To design an agenda that’s suitable for the audience, I want to understand the various roles that will be attending. I select the + icon to add a new block for asking a question, following along the thread from the previous block.

I can continue to ask more questions, but there are also suggested questions for analyzing my data even further, and I now select one of them. I want to increase marketing efforts at companies that don’t currently have many attendees, in this case, companies with fewer than two attendees.

Amazon Q executes the required analysis and keeps me updated on the progress. Step 1 of the process identifies companies that have fewer than two attendees and lists them.

Step 2 gives an estimate of how many more attendees I might get from each company if marketing efforts are increased.

In Step 3 I can see the potential increase in total attendees (including the percentage increase) in line with the increase in marketing efforts.

Lastly, Step 4 goes even further to highlight companies I should prioritize for these increased marketing efforts.

To increase the potential number of attendees even more, I want to change the analysis to identify companies with fewer than three attendees instead of two. I choose the AI sparkle icon in the upper right to launch a modal that I then use to provide more context and make specific changes to the previous result.


This change resulted in new projections, and I can choose to consider them for my marketing efforts or keep the previous projections.


Now available
Amazon Q in QuickSight Pro users can use this new capability in preview in the following AWS Regions at launch: US East (N. Virginia) and US West (Oregon). Get started with a free 30-day trial of QuickSight today. To learn more, visit the Amazon QuickSight User Guide. You can submit your questions to AWS re:Post for Amazon QuickSight, or through your usual AWS Support contacts.

Veliswa.

Amazon Bedrock Guardrails now supports multimodal toxicity detection with image support (preview)

Today, we’re announcing the preview of multimodal toxicity detection with image support in Amazon Bedrock Guardrails. This new capability detects and filters out undesirable image content in addition to text, helping you improve user experiences and manage model outputs in your generative AI applications.

Amazon Bedrock Guardrails helps you implement safeguards for generative AI applications by filtering undesirable content, redacting personally identifiable information (PII), and enhancing content safety and privacy. You can configure policies for denied topics, content filters, word filters, PII redaction, contextual grounding checks, and Automated Reasoning checks (preview), to tailor safeguards to your specific use cases and responsible AI policies.

With this launch, you can now use the existing content filter policy in Amazon Bedrock Guardrails to detect and block harmful image content across categories such as hate, insults, sexual, and violence. You can configure thresholds from low to high to match your application’s needs.

This new image support works with all foundation models (FMs) in Amazon Bedrock that support image data, as well as any custom fine-tuned models you bring. It provides a consistent layer of protection across text and image modalities, making it easier to build responsible AI applications.

Tero Hottinen, VP, Head of Strategic Partnerships at KONE, envisions the following use case:

In its ongoing evaluation, KONE recognizes the potential of Amazon Bedrock Guardrails as a key component in protecting gen AI applications, particularly for relevance and contextual grounding checks, as well as the multimodal safeguards. The company envisions integrating product design diagrams and manuals into its applications, with Amazon Bedrock Guardrails playing a crucial role in enabling more accurate diagnosis and analysis of multimodal content.

Here’s how it works.

Multimodal toxicity detection in action
To get started, create a guardrail in the AWS Management Console and configure the content filters for either text or image data or both. You can also use AWS SDKs to integrate this capability into your applications.

Create guardrail
On the console, navigate to Amazon Bedrock and select Guardrails. From there, you can create a new guardrail and use the existing content filters to detect and block image data in addition to text data. The categories for Hate, Insults, Sexual, and Violence under Configure content filters can be configured for either text or image content or both. The Misconduct and Prompt attacks categories can be configured for text content only.

Amazon Bedrock Guardrails Multimodal Support

After you’ve selected and configured the content filters you want to use, you can save the guardrail and start using it to build safe and responsible generative AI applications.
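
If you prefer to configure guardrails programmatically, the same content filters can be defined with the CreateGuardrail API. The Boto3 sketch below is illustrative; in particular, the inputModalities and outputModalities fields reflect my reading of the image-support preview and should be checked against the current API reference:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Sketch: a guardrail with content filters applied to both text and images.
# The modality fields are assumptions based on the preview; verify before use.
response = bedrock.create_guardrail(
    name="multimodal-content-guardrail",
    description="Blocks harmful text and image content",
    contentPolicyConfig={
        "filtersConfig": [
            {
                "type": "HATE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH",
                "inputModalities": ["TEXT", "IMAGE"],
                "outputModalities": ["TEXT", "IMAGE"],
            },
            {
                "type": "VIOLENCE",
                "inputStrength": "MEDIUM",
                "outputStrength": "MEDIUM",
                "inputModalities": ["TEXT", "IMAGE"],
                "outputModalities": ["TEXT", "IMAGE"],
            },
        ]
    },
    blockedInputMessaging="Sorry, I can't process that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)
print(response["guardrailId"], response["version"])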

To test the new guardrail in the console, select the guardrail and choose Test. You have two options: test the guardrail by choosing and invoking a model, or test the guardrail without invoking a model by using the independent Amazon Bedrock Guardrails ApplyGuardrail API.

With the ApplyGuardrail API, you can validate content at any point in your application flow before processing or serving results to the user. You can also use the API to evaluate inputs and outputs for any self-managed (custom), or third-party FMs, regardless of the underlying infrastructure. For example, you could use the API to evaluate a Meta Llama 3.2 model hosted on Amazon SageMaker or a Mistral NeMo model running on your laptop.
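
For example, here’s a minimal Boto3 sketch that validates a text prompt and an image with the ApplyGuardrail API before any model is invoked. The guardrail ID, version, and file name are placeholders, and the exact shape of the image content block comes from the preview, so verify it against the API reference:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("input-image.jpg", "rb") as f:
    image_bytes = f.read()

# Sketch: validate user input (text plus image) independently of any model.
# "my-guardrail-id" and the image block structure are illustrative assumptions.
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="1",
    source="INPUT",
    content=[
        {"text": {"text": "Describe this picture."}},
        {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
    ],
)

if response["action"] == "GUARDRAIL_INTERVENED":
    print("Blocked:", response["assessments"])
else:
    print("Content passed the guardrail checks.")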

Test guardrail by choosing and invoking a model
Select a model that supports image inputs or outputs, for example, Anthropic’s Claude 3.5 Sonnet. Verify that the prompt and response filters are enabled for image content. Next, provide a prompt, upload an image file, and choose Run.

Amazon Bedrock Guardrails Multimodal Support

In my example, Amazon Bedrock Guardrails intervened. Choose View trace for more details.

The guardrail trace provides a record of how safety measures were applied during an interaction. It shows whether Amazon Bedrock Guardrails intervened or not and what assessments were made on both input (prompt) and output (model response). In my example, the content filters blocked the input prompt because they detected insults in the image with high confidence.

Amazon Bedrock Guardrails Multimodal Support

Test guardrail without invoking a model
In the console, choose Use Guardrails independent API to test the guardrail without invoking a model. Choose whether you want to validate an input prompt or an example of a model generated output. Then, repeat the steps from before. Verify that the prompt and response filters are enabled for image content, provide the content to validate, and choose Run.

Amazon Bedrock Guardrails Multimodal Support

I reused the same image and input prompt for my demo, and Amazon Bedrock Guardrails intervened again. Choose View trace again for more details.

Amazon Bedrock Guardrails Multimodal Support

Join the preview
Multimodal toxicity detection with image support is available today in preview in Amazon Bedrock Guardrails in the US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Tokyo), Europe (Frankfurt, Ireland, London), and AWS GovCloud (US-West) AWS Regions. To learn more, visit Amazon Bedrock Guardrails.

Give the multimodal toxicity detection content filter a try today in the Amazon Bedrock console and let us know what you think! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Antje

New Amazon Bedrock capabilities enhance data processing and retrieval

Today, Amazon Bedrock introduces four enhancements that streamline how you can analyze data with generative AI:

Amazon Bedrock Data Automation (preview) – A fully managed capability of Amazon Bedrock that streamlines the generation of valuable insights from unstructured, multimodal content such as documents, images, audio, and videos. With Amazon Bedrock Data Automation, you can build automated intelligent document processing (IDP), media analysis, and Retrieval-Augmented Generation (RAG) workflows quickly and cost-effectively. Insights include video summaries of key moments, detection of inappropriate image content, automated analysis of complex documents, and much more. You can customize outputs to tailor insights into your specific business needs. Amazon Bedrock Data Automation can be used as a standalone feature or as a parser when setting up a knowledge base for RAG workflows.

Amazon Bedrock Knowledge Bases now processes multimodal data – To help build applications that process both text and visual elements in documents and images, you can configure a knowledge base to parse documents using either Amazon Bedrock Data Automation or a foundation model (FM) as the parser. Multimodal data processing can improve the accuracy and relevancy of the responses you get from a knowledge base that includes information embedded in both images and text.

Amazon Bedrock Knowledge Bases now supports GraphRAG (preview) – We now offer one of the first fully-managed GraphRAG capabilities. GraphRAG enhances generative AI applications by providing more accurate and comprehensive responses to end users by using RAG techniques combined with graphs.

Amazon Bedrock Knowledge Bases now supports structured data retrieval – This capability extends a knowledge base to support natural language querying of data warehouses and data lakes so that applications can access business intelligence (BI) through conversational interfaces and improve the accuracy of the responses by including critical enterprise data. Amazon Bedrock Knowledge Bases provides one of the first fully-managed out-of-the-box RAG solutions that can natively query structured data from where it resides. This capability helps break data silos across data sources and accelerates building generative AI applications from over a month to just a few days.

These new capabilities make it easier to build comprehensive AI applications that can process, understand, and retrieve information from structured and unstructured data sources. For example, a car insurance company can use Amazon Bedrock Data Automation to automate their claims adjudication workflow to reduce the time taken to process automobile claims, improving the productivity of their claims department.

Similarly, a media company can analyze TV shows and extract insights needed for smart advertisement placement such as scene summaries, industry standard advertising taxonomies (IAB), and company logos. A media production company can generate scene-by-scene summaries and capture key moments in their video assets. A financial services company can process complex financial documents containing charts and tables and use GraphRAG to understand relationships between different financial entities. All these companies can use structured data retrieval to query their data warehouse while retrieving information from their knowledge base.

Let’s take a closer look at these features.

Introducing Amazon Bedrock Data Automation
Amazon Bedrock Data Automation is a capability of Amazon Bedrock that simplifies the process of extracting valuable insights from multimodal, unstructured content, such as documents, images, videos, and audio files.

Amazon Bedrock Data Automation provides a unified, API-driven experience that developers can use to process multimodal content through a single interface, eliminating the need to manage and orchestrate multiple AI models and services. With built-in safeguards, such as visual grounding and confidence scores, Amazon Bedrock Data Automation helps promote the accuracy and trustworthiness of the extracted insights, making it easier to integrate into enterprise workflows.

Amazon Bedrock Data Automation supports 4 modalities (documents, images, video, and audio). When used in an application, all modalities use the same asynchronous inference API, and results are written to an Amazon Simple Storage Service (Amazon S3) bucket.

For each modality, you can configure the output based on your processing needs and generate two types of outputs:

Standard output – With standard output, you get predefined default insights that are relevant to the input data type. Examples include semantic representation of documents, summaries of videos by scene, audio transcripts and more. You can configure which insights you want to extract with just a few steps.

Custom output – With custom output, you have the flexibility to define and specify your extraction needs using artifacts called “blueprints” to generate insights tailored to your business needs. You can also transform the generated output into a specific format or schema that is compatible with your downstream systems such as databases or other applications.

Standard output can be used with all formats (audio, documents, images, and videos). During the preview, custom output can only be used with documents and images.

Both standard and custom output configurations can be saved in a project to reference in the Amazon Bedrock Data Automation inference API. A project can be configured to generate both standard output and custom output for each processed file.

Let’s look at an example of processing a document for both standard and custom outputs.

Using Amazon Bedrock Data Automation
On the Amazon Bedrock console, I choose Data Automation in the navigation pane. Here, I can review how this capability works with a few sample use cases.

Console screenshot.

Then, I choose Demo in the Data Automation section of the navigation pane. I can try this capability using one of the provided sample documents or by uploading my own. For example, let’s say I am working on an application that needs to process birth certificates.

I start by uploading a birth certificate to see the standard output results. The first time I upload a document, I’m asked to confirm the creation of an S3 bucket to store the assets. When I look at the standard output, I can tailor the result with a few quick settings.

Console screenshot.

I choose the Custom output tab. The document is recognized by one of the sample blueprints and information is extracted across multiple fields.

Console screenshot.

Most of the data for my application is there, but I need a few customizations. For example, the date the birth certificate was issued (JUNE 10, 2022) is in a different format than the other dates in the document. I also need the state that issued the certificate and a couple of flags that tell me whether the child’s last name matches the mother’s or the father’s.

Most of the fields in the previous blueprint use the Explicit extraction type. That means they’re extracted as they are from the document.

If I want a date in a specific format, I can create a new field using the Inferred extraction type and add instructions on how to format the result starting from the content of the document. Inferred extractions can be used to perform transformations, such as date or Social Security number (SSN) format, or validations, for example, to check if a person is over 21 based on today’s date.

Sample blueprints cannot be edited. I choose Duplicate blueprint to create a new blueprint that I can edit and then Add field from the Fields drop down.

I add four fields with extraction type Inferred and these instructions:

  1. The date the birth certificate was issued in MM/DD/YYYY format
  2. The state that issued the birth certificate 
  3. Is ChildLastName equal to FatherLastName
  4. Is ChildLastName equal to MotherLastName

The first two fields are strings and the last two booleans.

Console screenshot.

After I create the new fields, I can apply the new blueprint to the document I previously uploaded.

I choose Get result and look for the new fields in the results. I see the date formatted as I need, the two flags, and the state.

Console screenshot.

Now that I have created this custom blueprint tailored to the needs of my application, I can add it to a project. I can associate multiple blueprints with a project for the different document types I want to process, such as a blueprint for passports, a blueprint for birth certificates, a blueprint for invoices, and so on. When processing documents, Amazon Bedrock Data Automation matches each document to a blueprint within the project to extract relevant information.

I can also create a new blueprint from scratch. In that case, I can start with a prompt where I declare any fields I expect to find in the uploaded document and perform normalizations or validations.

Amazon Bedrock Data Automation can also process audio and video files. For example, here’s the standard output when uploading a video from a keynote presentation by Swami Sivasubramanian, VP of AI and Data at AWS.

Console screenshot.

It takes a few minutes to get the output. The results include a summarization of the overall video, a summary scene by scene, and the text that appears during the video. From here, I can toggle the options to have a full audio transcript, content moderation, or Interactive Advertising Bureau (IAB) taxonomy.

I can also use Amazon Bedrock Data Automation as a parser when creating a knowledge base to extract insights from visually rich documents and images, for retrieval and response generation. Let’s see that in the next section.

Using multimodal data processing in Amazon Bedrock Knowledge Bases
Multimodal data processing support enables applications to understand both text and visual elements in documents.

With multimodal data processing, applications can use a knowledge base to:

  • Retrieve answers from visual elements in addition to existing support of text.
  • Generate responses based on the context that includes both text and visual data.
  • Provide source attribution that references visual elements from the original documents.

When creating a knowledge base in the Amazon Bedrock console, I now have the option to select Amazon Bedrock Data Automation as Parsing strategy.

When I select Amazon Bedrock Data Automation as parser, Amazon Bedrock Data Automation handles the extraction, transformation, and generation of insights from visually rich content, while Amazon Bedrock Knowledge Bases manages ingestion, retrieval, model response generation, and source attribution.

Alternatively, I can use the existing Foundation models as a parser option. With this option, there’s now support for Anthropic’s Claude 3.5 Sonnet as parser, and I can use the default prompt or modify it to suit a specific use case.

Console screenshot.

In the next step, I specify the Multimodal storage destination on Amazon S3 that will be used by Amazon Bedrock Knowledge Bases to store images extracted from my documents in the knowledge base data source. These images can be retrieved based on a user query, used to generate the response, and cited in the response.

Console screenshot.

When using the knowledge base, the information extracted by Amazon Bedrock Data Automation or FMs as parser is used to retrieve information about visual elements, understand charts and diagrams, and provide responses that reference both textual and visual content.

Using GraphRAG in Amazon Bedrock Knowledge Bases
Extracting insights from scattered data sources presents significant challenges for RAG applications, requiring multi-step reasoning across these data sources to generate relevant responses. For example, a customer might ask a generative AI-powered travel application to identify family-friendly beach destinations with direct flights from their home location that also offer good seafood restaurants. This requires a connected workflow to identify suitable beaches that other families have enjoyed, match these to flight routes, and select highly-rated local restaurants. A traditional RAG system may struggle to synthesize all these pieces into a cohesive recommendation because the information lives in disparate sources and is not interlinked.

Knowledge graphs can address this challenge by modeling complex relationships between entities in a structured way. However, building and integrating graphs into an application requires significant expertise and effort.

Amazon Bedrock Knowledge Bases now offers one of the first fully managed GraphRAG capabilities that enhances generative AI applications by providing more accurate and comprehensive responses to end users by using RAG techniques combined with graphs.

When creating a knowledge base, I can now enable GraphRAG in just a few steps by choosing Amazon Neptune Analytics as the database, automatically generating vector and graph representations of the underlying data, entities, and their relationships, and reducing development effort from several weeks to just a few hours.

I start the creation of a new knowledge base. In the Vector database section, when creating a new vector store, I select Amazon Neptune Analytics (GraphRAG). If I don’t want to create a new graph, I can provide an existing vector store and select a Neptune Analytics graph from the list. GraphRAG uses Anthropic’s Claude 3 Haiku to automatically build graphs for a knowledge base.

Console screenshot.

After I complete the creation of the knowledge base, Amazon Bedrock automatically builds a graph, linking related concepts and documents. When retrieving information from the knowledge base, GraphRAG traverses these relationships to provide more comprehensive and accurate responses.

Using structured data retrieval in Amazon Bedrock Knowledge Bases
Structured data retrieval allows natural language querying of databases and data warehouses. For example, a business analyst might ask, “What were our top-selling products last quarter?” and the system automatically generates and runs the appropriate SQL query for a data warehouse stored in an Amazon Redshift database.

When creating a knowledge base, I now have the option to use a structured data store.

Console screenshot.

I enter a name and description for the knowledge base. In Data source details, I use Amazon Redshift as Query engine. I create a new AWS Identity and Access Management (IAM) service role to manage the knowledge base resources and choose Next.

Console screenshot.

I choose Redshift serverless in Connection options and the Workgroup to use. Amazon Redshift provisioned clusters are also supported. I use the previously created IAM role for Authentication. Storage metadata can be managed with AWS Glue Data Catalog or directly within an Amazon Redshift database. I select a database from the list.

Console screenshot.

In the configuration of the knowledge base, I can define the maximum duration for a query and include or exclude access to tables or columns. To improve the accuracy of query generation from natural language, I can optionally add a description for tables and columns and a list of curated queries that provides practical examples of how to translate a question into a SQL query for my database. I choose Next, review the settings, and complete the creation of the knowledge base.

After a few minutes, the knowledge base is ready. Once synced, Amazon Bedrock Knowledge Bases handles generating, running, and formatting the result of the query, making it easy to build natural language interfaces to structured data. When invoking a knowledge base using structured data, I can ask to only generate SQL, retrieve data, or summarize the data in natural language.
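
In an application, the same natural language questions can be sent through the RetrieveAndGenerate API. The Boto3 sketch below is illustrative; the knowledge base ID and model ARN are placeholders, and structured knowledge bases may expose additional options (such as returning only the generated SQL) that are not shown here:

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

# Sketch: ask a natural language question against a knowledge base backed by
# structured data. "KB_ID" and the model ARN are placeholders.
response = agent_runtime.retrieve_and_generate(
    input={"text": "What were our top-selling products last quarter?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID",
            "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0",
        },
    },
)
print(response["output"]["text"])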

Things to know
These new capabilities are available today in the following AWS Regions:

  • Amazon Bedrock Data Automation is available in preview in US West (Oregon).
  • Multimodal data processing support in Amazon Bedrock Knowledge Bases using Amazon Bedrock Data Automation as parser is available in preview in US West (Oregon). FM as a parser is available in all Regions where Amazon Bedrock Knowledge Bases is offered.
  • GraphRAG in Amazon Bedrock Knowledge Bases is available in preview in all commercial Regions where Amazon Bedrock Knowledge Bases and Amazon Neptune Analytics are offered.
  • Structured data retrieval is available in Amazon Bedrock Knowledge Bases in all commercial Regions where Amazon Bedrock Knowledge Bases is offered.

As usual with Amazon Bedrock, pricing is based on usage:

  • Amazon Bedrock Data Automation charges per image, per page for documents, and per minute for audio or video.
  • Multimodal data processing in Amazon Bedrock Knowledge Bases is charged based on the use of either Amazon Bedrock Data Automation or the FM as parser.
  • There is no additional cost for using GraphRAG in Amazon Bedrock Knowledge Bases but you pay for using Amazon Neptune Analytics as the vector store. For more information, visit Amazon Neptune pricing.
  • There is an additional cost when using structured data retrieval in Amazon Bedrock Knowledge Bases.

For detailed pricing information, see Amazon Bedrock pricing.

Each capability can be used independently or in combination. Together, they make it easier and faster to build applications that use AI to process data. To get started, visit the Amazon Bedrock console. To learn more, you can access the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

Danilo

Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)

Today, Amazon Bedrock has introduced in preview two capabilities that help reduce costs and latency for generative AI applications:

Amazon Bedrock Intelligent Prompt Routing – When invoking a model, you can now use a combination of foundation models (FMs) from the same model family to help optimize for quality and cost. For example, with Anthropic’s Claude model family, Amazon Bedrock can intelligently route requests between Claude 3.5 Sonnet and Claude 3 Haiku depending on the complexity of the prompt. Similarly, Amazon Bedrock can route requests between Meta Llama 3.1 70B and 8B. The prompt router predicts which model will provide the best performance for each request while optimizing the quality of response and cost. This is particularly useful for applications such as customer service assistants, where uncomplicated queries can be handled by smaller, faster, and more cost-effective models, and complex queries are routed to more capable models. Intelligent Prompt Routing can reduce costs by up to 30 percent without compromising on accuracy.

Amazon Bedrock now supports prompt caching – You can now cache frequently used context in prompts across multiple model invocations. This is especially valuable for applications that repeatedly use the same context, such as document Q&A systems where users ask multiple questions about the same document or coding assistants that need to maintain context about code files. The cached context remains available for up to 5 minutes after each access. Prompt caching in Amazon Bedrock can reduce costs by up to 90% and latency by up to 85% for supported models.
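
As a rough sketch of how caching is expressed with the Converse API, you mark a cache checkpoint after the large, repeated context so that subsequent requests reusing the same prefix can read it from the cache. The cachePoint block and the model ID below are assumptions based on the preview documentation, so confirm the exact request shape and supported models before relying on it:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# A large document that many requests will reuse as context.
with open("contract.txt") as f:
    long_document = f.read()

# Sketch: the cachePoint content block marks where the reusable prefix ends.
# Both the block shape and the model ID are assumptions based on the preview.
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    system=[
        {"text": "You answer questions about the attached document."},
        {"text": long_document},
        {"cachePoint": {"type": "default"}},
    ],
    messages=[
        {"role": "user", "content": [{"text": "Summarize the termination clause."}]},
    ],
)
print(response["output"]["message"]["content"][0]["text"])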

These features make it easier to reduce latency and balance performance with cost efficiency. Let’s look at how you can use them in your applications.

Using Amazon Bedrock Intelligent Prompt Routing in the console
Amazon Bedrock Intelligent Prompt Routing uses advanced prompt matching and model understanding techniques to predict the performance of each model for every request, optimizing for quality of responses and cost. During the preview, you can use the default prompt routers for Anthropic’s Claude and Meta Llama model families.

Intelligent prompt routing can be accessed through the AWS Management Console, the AWS Command Line Interface (AWS CLI), and the AWS SDKs. In the Amazon Bedrock console, I choose Prompt routers in the Foundation models section of the navigation pane.

Console screenshot.

I choose the Anthropic Prompt Router default router to get more information.

Console screenshot.

From the configuration of the prompt router, I see that it’s routing requests between Claude 3.5 Sonnet and Claude 3 Haiku using cross-Region inference profiles. The routing criteria defines the quality difference between the response of the largest model and the smallest model for each prompt as predicted by the router internal model at runtime. The fallback model, used when none of the chosen models meet the desired performance criteria, is Anthropic’s Claude 3.5 Sonnet.

I choose Open in Playground to chat using the prompt router and enter this prompt:

Alice has N brothers and she also has M sisters. How many sisters does Alice’s brothers have?

The result is quickly provided. I choose the new Router metrics icon on the right to see which model was selected by the prompt router. In this case, because the question is rather complex, Anthropic’s Claude 3.5 Sonnet was used.

Console screenshot.

Now I ask a straightforward question to the same prompt router:

Describe the purpose of a 'hello world' program in one line.

This time, Anthropic’s Claude 3 Haiku has been selected by the prompt router.

Console screenshot.

I select the Meta Prompt Router to check its configuration. It’s using the cross-Region inference profiles for Llama 3.1 70B and 8B with the 70B model as fallback.

Console screenshot.

Prompt routers are integrated with other Amazon Bedrock capabilities, such as Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents, or when performing evaluations. For example, here I create a model evaluation to help me compare, for my use case, a prompt router to another model or prompt router.

Console screenshot.

To use a prompt router in an application, I need to set the prompt router Amazon Resource Name (ARN) as model ID in the Amazon Bedrock API. Let’s see how this works with the AWS CLI and an AWS SDK.

Using Amazon Bedrock Intelligent Prompt Routing with the AWS CLI
The Amazon Bedrock API has been extended to handle prompt routers. For example, I can list the existing prompt routers in an AWS Region using ListPromptRouters:

aws bedrock list-prompt-routers

The output includes a summary of the existing prompt routers, similar to what I saw in the console.

Here’s the full output of the previous command:

{
    "promptRouterSummaries": [
        {
            "promptRouterName": "Anthropic Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.26
            },
            "description": "Routes requests among models in the Claude family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        },
        {
            "promptRouterName": "Meta Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.0
            },
            "description": "Routes requests among models in the LLaMA family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        }
    ]
}

I can get information about a specific prompt router using GetPromptRouter with a prompt router ARN. For example, for the Meta Llama model family:

aws bedrock get-prompt-router --prompt-router-arn arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1
{
    "promptRouterName": "Meta Prompt Router",
    "routingCriteria": {
        "responseQualityDifference": 0.0
    },
    "description": "Routes requests among models in the LLaMA family",
    "createdAt": "2024-11-20T00:00:00+00:00",
    "updatedAt": "2024-11-20T00:00:00+00:00",
    "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
    "models": [
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
        },
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
        }
    ],
    "fallbackModel": {
        "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
    },
    "status": "AVAILABLE",
    "type": "default"
}

To use a prompt router with Amazon Bedrock, I set the prompt router ARN as model ID when making API calls. For example, here I use the Anthropic Prompt Router with the AWS CLI and the Amazon Bedrock Converse API:

aws bedrock-runtime converse \
    --model-id arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1 \
    --messages '[{ "role": "user", "content": [ { "text": "Alice has N brothers and she also has M sisters. How many sisters do Alice’s brothers have?" } ] }]'

In the output, invocations using a prompt router include a new trace section that shows which model was actually used. In this case, it’s Anthropic’s Claude 3.5 Sonnet:

{
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "To solve this problem, let's think it through step-by-step:nn1) First, we need to understand the relationships:n   - Alice has N brothersn   - Alice has M sistersnn2) Now, we need to consider who Alice's brothers' sisters are:n   - Alice herself is a sister to all her brothersn   - All of Alice's sisters are also sisters to Alice's brothersnn3) So, the total number of sisters that Alice's brothers have is:n   - The number of Alice's sisters (M)n   - Plus Alice herself (+1)nn4) Therefore, the answer can be expressed as: M + 1nnThus, Alice's brothers have M + 1 sisters."
                }
            ]
        }
    },
    . . .
    "trace": {
        "promptRouter": {
            "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
        }
    }
}

Using Amazon Bedrock Intelligent Prompt Routing with an AWS SDK
Using an AWS SDK with a prompt router is similar to the previous command line experience. When invoking a model, I set the model ID to the prompt router ARN. For example, in this Python code I’m using the Meta Llama router with the ConverseStream API:

import json
import boto3

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1"

user_message = "Describe the purpose of a 'hello world' program in one line."
messages = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

streaming_response = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=messages,
)

for chunk in streaming_response["stream"]:
    if "contentBlockDelta" in chunk:
        text = chunk["contentBlockDelta"]["delta"]["text"]
        print(text, end="")
    if "messageStop" in chunk:
        print()
    if "metadata" in chunk:
        if "trace" in chunk["metadata"]:
            print(json.dumps(chunk['metadata']['trace'], indent=2))

This script prints the response text and the content of the trace in response metadata. For this uncomplicated request, the faster and more affordable model has been selected by the prompt router:

A "Hello World" program is a simple, introductory program that serves as a basic example to demonstrate the fundamental syntax and functionality of a programming language, typically used to verify that a development environment is set up correctly.
{
  "promptRouter": {
    "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
  }
}
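
To see how often the router picks each model for your own traffic, you can read the same trace section programmatically. Here’s a minimal sketch that sends a few sample prompts through the non-streaming Converse API and tallies the invokedModelId values; the prompts are just examples and the router ARN is the placeholder account ARN used above:

import boto3
from collections import Counter

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder - use the prompt router ARN from your account
ROUTER_ARN = "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1"

sample_prompts = [
    "Describe the purpose of a 'hello world' program in one line.",
    "Alice has N brothers and she also has M sisters. How many sisters do Alice's brothers have?",
]

counts = Counter()
for prompt in sample_prompts:
    response = bedrock_runtime.converse(
        modelId=ROUTER_ARN,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # The trace section reports which underlying model handled the request
    trace = response.get("trace", {}).get("promptRouter", {})
    counts[trace.get("invokedModelId", "unknown")] += 1

for model_arn, count in counts.items():
    print(f"{count}x {model_arn}")

A tally like this can help estimate how much of your traffic is served by the smaller, more affordable model.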

Using prompt caching with an AWS SDK
You can use prompt caching with the Amazon Bedrock Converse API. When you tag content for caching and send it to the model for the first time, the model processes the input and saves the intermediate results in a cache. For subsequent requests containing the same content, the model loads the preprocessed results from the cache, significantly reducing both costs and latency.

You can implement prompt caching in your applications with a few steps:

  1. Identify the portions of your prompts that are frequently reused.
  2. Tag these sections for caching in the list of messages using the new cachePoint block.
  3. Monitor cache usage and latency improvements in the response metadata usage section.

Here’s an example of implementing prompt caching when working with documents.

First, I download three decision guides in PDF format from the AWS website. These guides help choose the AWS services that fit your use case.

Then, I use a Python script to ask three questions about the documents. In the code, I create a converse() function to handle the conversation with the model. The first time I call the function, I include a list of documents and a flag to add a cachePoint block.

import json
import boto3

MODEL_ID = "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
AWS_REGION = "us-west-2"

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name=AWS_REGION,
)

DOCS = [
    "bedrock-or-sagemaker.pdf",
    "generative-ai-on-aws-how-to-choose.pdf",
    "machine-learning-on-aws-how-to-choose.pdf",
]

messages = []


def converse(new_message, docs=[], cache=False):

    if len(messages) == 0 or messages[-1]["role"] != "user":
        messages.append({"role": "user", "content": []})

    for doc in docs:
        print(f"Adding document: {doc}")
        name, format = doc.rsplit('.', maxsplit=1)
        with open(doc, "rb") as f:
            bytes = f.read()
        messages[-1]["content"].append({
            "document": {
                "name": name,
                "format": format,
                "source": {"bytes": bytes},
            }
        })

    messages[-1]["content"].append({"text": new_message})

    if cache:
        messages[-1]["content"].append({"cachePoint": {"type": "default"}})

    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=messages,
    )

    output_message = response["output"]["message"]
    response_text = output_message["content"][0]["text"]

    print("Response text:")
    print(response_text)

    print("Usage:")
    print(json.dumps(response["usage"], indent=2))

    messages.append(output_message)


converse("Compare AWS Trainium and AWS Inferentia in 20 words or less.", docs=DOCS, cache=True)
converse("Compare Amazon Textract and Amazon Transcribe in 20 words or less.")
converse("Compare Amazon Q Business and Amazon Q Developer in 20 words or less.")

For each invocation, the script prints the response and the usage counters.

Adding document: bedrock-or-sagemaker.pdf
Adding document: generative-ai-on-aws-how-to-choose.pdf
Adding document: machine-learning-on-aws-how-to-choose.pdf
Response text:
AWS Trainium is optimized for machine learning training, while AWS Inferentia is designed for low-cost, high-performance machine learning inference.
Usage:
{
  "inputTokens": 4,
  "outputTokens": 34,
  "totalTokens": 29879,
  "cacheReadInputTokenCount": 0,
  "cacheWriteInputTokenCount": 29841
}
Response text:
Amazon Textract extracts text and data from documents, while Amazon Transcribe converts speech to text from audio or video files.
Usage:
{
  "inputTokens": 59,
  "outputTokens": 30,
  "totalTokens": 29930,
  "cacheReadInputTokenCount": 29841,
  "cacheWriteInputTokenCount": 0
}
Response text:
Amazon Q Business answers questions using enterprise data, while Amazon Q Developer assists with building and operating AWS applications and services.
Usage:
{
  "inputTokens": 108,
  "outputTokens": 26,
  "totalTokens": 29975,
  "cacheReadInputTokenCount": 29841,
  "cacheWriteInputTokenCount": 0
}

The usage section of the response contains two new counters: cacheReadInputTokenCount and cacheWriteInputTokenCount. The total number of tokens for an invocation is the sum of the input and output tokens plus the tokens read and written into the cache.
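
As a quick sanity check on that accounting, here’s a small sketch that recomputes totalTokens from the other counters, using the numbers reported for the first invocation above:

# Usage counters reported for the first invocation
usage = {
    "inputTokens": 4,
    "outputTokens": 34,
    "totalTokens": 29879,
    "cacheReadInputTokenCount": 0,
    "cacheWriteInputTokenCount": 29841,
}

computed_total = (
    usage["inputTokens"]
    + usage["outputTokens"]
    + usage["cacheReadInputTokenCount"]
    + usage["cacheWriteInputTokenCount"]
)

# 4 + 34 + 0 + 29841 = 29879
print(computed_total == usage["totalTokens"])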

Each invocation processes a list of messages. The messages in the first invocation contain the documents, the first question, and the cache point. Because the messages preceding the cache point aren’t currently in the cache, they’re written to cache. According to the usage counters, 29,841 tokens have been written into the cache.

"cacheWriteInputTokenCount": 29841

For the next invocations, the previous response and the new question are appended to the list of messages. The messages before the cachePoint are unchanged and are found in the cache.

As expected, we can tell from the usage counters that the same number of tokens previously written is now read from the cache.

"cacheReadInputTokenCount": 29841

In my tests, the next invocations take 55 percent less time to complete compared to the first one. Depending on your use case (for example, with more cached content), prompt caching can improve latency up to 85 percent.
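
If you want to measure this in your own environment, a quick way is to time the calls to the converse() function defined in the script above (run in a fresh session so the messages list starts empty). Actual timings will vary with document size, model, and Region:

import time

# First call: the documents are processed and written to the cache
start = time.perf_counter()
converse("Compare AWS Trainium and AWS Inferentia in 20 words or less.", docs=DOCS, cache=True)
print(f"First call (cache write): {time.perf_counter() - start:.1f}s")

# Second call: the cached context is read instead of being reprocessed
start = time.perf_counter()
converse("Compare Amazon Textract and Amazon Transcribe in 20 words or less.")
print(f"Second call (cache read): {time.perf_counter() - start:.1f}s")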

Depending on the model, you can set more than one cache point in a list of messages. To find the right cache points for your use case, try different configurations and look at the effect on the reported usage.
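
For example, one option is to place a first cache point after a large shared document and a second one after reusable instructions, so that each part can be cached and reused independently. Here’s a hedged sketch of how the content list of a user message could look in that case; whether both cache points are honored depends on the limits of the model you use:

# Reuse one of the documents downloaded earlier
with open("machine-learning-on-aws-how-to-choose.pdf", "rb") as f:
    doc_bytes = f.read()

content = [
    # Large shared context, cached behind the first cache point
    {"document": {"name": "ml-on-aws", "format": "pdf", "source": {"bytes": doc_bytes}}},
    {"cachePoint": {"type": "default"}},
    # Reusable instructions, cached behind the second cache point
    {"text": "Answer using only the document above. Keep answers under 20 words."},
    {"cachePoint": {"type": "default"}},
    # The part that changes on every request
    {"text": "Compare AWS Trainium and AWS Inferentia."},
]

messages = [{"role": "user", "content": content}]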

Things to know
Amazon Bedrock Intelligent Prompt Routing is available in preview today in US East (N. Virginia) and US West (Oregon) AWS Regions. During the preview, you can use the default prompt routers, and there is no additional cost for using a prompt router. You pay the cost of the selected model. You can use prompt routers with other Amazon Bedrock capabilities such as performing evaluations, using knowledge bases, and configuring agents.

Because the internal model used by the prompt routers needs to understand the complexity of a prompt, intelligent prompt routing currently only supports English language prompts.

Amazon Bedrock support for prompt caching is available in preview in US West (Oregon) for Anthropic’s Claude 3.5 Sonnet V2 and Claude 3.5 Haiku. Prompt caching is also available in US East (N. Virginia) for Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro.

With prompt caching, cache reads receive a 90 percent discount compared to noncached input tokens. There are no additional infrastructure charges for cache storage. When using Anthropic models, you pay an additional cost for tokens written in the cache. There are no additional costs for cache writes with Amazon Nova models. For more information, see Amazon Bedrock pricing.
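
As a back-of-the-envelope illustration of what that discount means for a workload like the document Q&A example above, here’s a rough sketch. The per-token price is a placeholder, not an actual Amazon Bedrock rate, and the calculation ignores output tokens, the tokens of each new question, and the additional cache-write cost that applies to Anthropic models:

# Hypothetical price, for illustration only - check Amazon Bedrock pricing for real rates
PRICE_PER_1K_INPUT_TOKENS = 0.003  # USD, placeholder

cached_tokens = 29841  # shared document context from the example above
questions = 10         # follow-up questions reusing the same context

# Without caching, the full context is billed as input tokens on every question
without_cache = questions * cached_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

# With caching, the context is written once and then read at a 90 percent discount
with_cache = (
    cached_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
    + (questions - 1) * cached_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS * 0.1
)

print(f"Without caching: ${without_cache:.2f}, with caching: ${with_cache:.2f}")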

When using prompt caching, content is cached for up to 5 minutes, with each cache hit resetting this countdown. Prompt caching has been implemented to transparently support cross-Region inference. In this way, your applications can get the cost optimization and latency benefit of prompt caching with the flexibility of cross-Region inference.

These new capabilities make it easier to build cost-effective and high-performing generative AI applications. By intelligently routing requests and caching frequently used content, you can significantly reduce your costs while maintaining and even improving application performance.

To learn more and start using these new capabilities today, visit the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws.

Danilo

Amazon Bedrock Marketplace: Access over 100 foundation models in one place

This post was originally published on this site

Today, we’re introducing Amazon Bedrock Marketplace, a new capability that gives you access to over 100 popular, emerging, and specialized foundation models (FMs) through Amazon Bedrock. With this launch, you can now discover, test, and deploy new models from enterprise providers such as IBM and NVIDIA, as well as specialized models such as Upstage’s Solar Pro for Korean language processing and EvolutionaryScale’s ESM3 for protein research, alongside Amazon Bedrock general-purpose FMs from providers such as Anthropic and Meta.

Models deployed with Amazon Bedrock Marketplace can be accessed through the same standard APIs as the serverless models and, for models that are compatible with the Converse API, can be used with tools such as Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases.

As generative AI continues to reshape how organizations work, the need for specialized models optimized for specific domains, languages, or tasks is growing. However, finding and evaluating these models can be challenging and costly. You need to discover them across different services, build abstractions to use them in your applications, and create complex security and governance layers. Amazon Bedrock Marketplace addresses these challenges by providing a single interface to access both specialized and general-purpose FMs.

Using Amazon Bedrock Marketplace
To get started, in the Amazon Bedrock console, I choose Model catalog in the Foundation models section of the navigation pane. Here, I can search for models that help me with a specific use case or language. The results of the search include both serverless models and models available in Amazon Bedrock Marketplace. I can filter results by provider, modality (such as text, image, or audio), or task (such as classification or text summarization).

In the catalog, there are models from organizations like Arcee AI, which builds context-adapted small language models (SLMs), and Widn.AI, which provides multilingual models.

For example, I am interested in the IBM Granite models and search for models from IBM Data and AI.

Console screenshot.

I select Granite 3.0 2B Instruct, a language model designed for enterprise applications. Choosing the model opens the model detail page, where I can see more information from the model provider, such as highlights about the model, pricing, and usage, including sample API calls.

Console screenshot.

This specific model requires a subscription, and I choose View subscription options.

From the subscription dialog, I review pricing and legal notes. In Pricing details, I see the software price set by the provider. For this model, there are no additional costs on top of the deployed infrastructure. The Amazon SageMaker infrastructure cost is charged separately and can be seen in Amazon SageMaker pricing.

To proceed with this model, I choose Subscribe.

Console screenshot.

After the subscription has been completed, which usually takes a few minutes, I can deploy the model. For Deployment details, I use the default settings and the recommended instance type.

Console screenshot.

I expand the optional Advanced settings. Here, I can choose to deploy in a virtual private cloud (VPC) or specify the AWS Identity and Access Management (IAM) service role used by the deployment. Amazon Bedrock Marketplace automatically creates a service role to access Amazon Simple Storage Service (Amazon S3) buckets where the model weights are stored, but I can choose to use an existing role.

I keep the default values and complete the deployment.

Console screenshot.

After a few minutes, the deployment is In Service and can be reviewed in the Marketplace deployments page from the navigation pane.

There, I can choose an endpoint to view details and edit the configuration such as the number of instances. To test the deployment, I choose Open in playground and ask for some poetry.

Console screenshot.

I can also select the model from the Chat/text page of the Playground using the new Marketplace category where the deployed endpoints are listed.

In a similar way, I can use the model with other tools such as Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, Amazon Bedrock Prompt Management, Amazon Bedrock Guardrails, and model evaluations, by choosing Select Model and selecting the Marketplace model endpoint.

Console screenshot.

The model I used here is text-to-text, but I can use Amazon Bedrock Marketplace to deploy models with different modalities. For example, after I deploy Stability AI Stable Diffusion 3.5 Large, I can run a quick test in the Amazon Bedrock Image playground.

Console screenshot.

The models I deployed are now available through the Amazon Bedrock InvokeModel API. When a model is deployed, I can use it with the AWS Command Line Interface (AWS CLI) and any AWS SDKs using the endpoint Amazon Resource Name (ARN) as model ID.

For chat-tuned text-to-text models, I can also use the Amazon Bedrock Converse API, which abstracts model differences and enables model switching with a single parameter change.
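
For example, here’s a minimal sketch of calling a Converse-compatible Marketplace deployment from Python; the endpoint ARN is a placeholder for the ARN shown on the Marketplace deployments page, and the Region is an assumption for this walkthrough:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder - use the ARN of your Marketplace model endpoint as the model ID
ENDPOINT_ARN = "arn:aws:sagemaker:us-east-1:123412341234:endpoint/my-granite-endpoint"

response = bedrock_runtime.converse(
    modelId=ENDPOINT_ARN,
    messages=[{"role": "user", "content": [{"text": "Write a short poem about the cloud."}]}],
)

print(response["output"]["message"]["content"][0]["text"])

The same endpoint ARN can be used as the model ID with the InvokeModel API, using the request format documented by the model provider.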

Things to know
Amazon Bedrock Marketplace is available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), and South America (São Paulo).

With Amazon Bedrock Marketplace, you pay a software fee to the third-party model provider (which can be zero, as in the previous example) and a hosting fee based on the type and number of instances you choose for your model endpoints.

Start browsing the new models using the Model catalog in the Amazon Bedrock console, visit the Amazon Bedrock Marketplace documentation, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws.

Danilo

Meet your training timelines and budgets with new Amazon SageMaker HyperPod flexible training plans

This post was originally published on this site

Today, we’re announcing the general availability of Amazon SageMaker HyperPod flexible training plans to help data scientists train large foundation models (FMs) within their timelines and budgets, saving them weeks of effort otherwise spent managing the training process around compute availability.

At AWS re:Invent 2023, we introduced SageMaker HyperPod to reduce the time to train FMs by up to 40 percent and scale across thousands of compute resources in parallel with preconfigured distributed training libraries and built-in resiliency. Most generative AI model development tasks need accelerated compute resources in parallel. Our customers struggle to find timely access to compute resources to complete their training within their timeline and budget constraints.

With today’s announcement, you can find the required accelerated compute resources for training, create optimal training plans, and run training workloads across different blocks of capacity based on the availability of the compute resources. Within a few steps, you can specify your training completion date, budget, and compute resource requirements, create an optimal training plan, and run fully managed training jobs without manual intervention.

SageMaker HyperPod training plans in action
To get started, go to the Amazon SageMaker AI console, choose Training plans in the left navigation pane, and choose Create training plan.

For example, choose your preferred training dates and duration (10 days) and the instance type and count (16 ml.p5.48xlarge) for your SageMaker HyperPod cluster, and then choose Find training plan.

SageMaker HyperPod suggests a training plan that is split into two five-day segments. This includes the total upfront price for the plan.

If you accept this training plan, add your training details in the next step and choose Create your plan.

After creating your training plan, you can see it in the list of training plans. You have to pay for the plan upfront within 12 hours of creating it. One plan is in the Active state and has already started, with all the instances being used. The second plan is Scheduled to start later, but you can already submit jobs that will start automatically when the plan begins.

While a plan is active, the compute resources are available in SageMaker HyperPod, resume automatically after pauses in availability, and terminate at the end of the plan. In this example, the first segment is currently running and the second segment is queued to run after it.

This is similar to the Managed Spot training in SageMaker AI, where SageMaker AI takes care of instance interruptions and continues the training with no manual intervention. To learn more, visit the SageMaker HyperPod training plans in the Amazon SageMaker AI Developer Guide.

Now available
Amazon SageMaker HyperPod training plans are now available in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions and support ml.p4d.48xlarge, ml.p5.48xlarge, ml.p5e.48xlarge, ml.p5en.48xlarge, and ml.trn2.48xlarge instances. Trn2 and P5en instances are available only in the US East (Ohio) Region. To learn more, visit the SageMaker HyperPod product page and SageMaker AI pricing page.

Give HyperPod training plans a try in the Amazon SageMaker AI console and send feedback to AWS re:Post for SageMaker AI or through your usual AWS Support contacts.

Channy