Category Archives: AWS

Computer Vision at the Edge with AWS Panorama

This post was originally published on this site

Today, the AWS Panorama Appliance is generally available to all of you. The AWS Panorama Appliance is a computer vision (CV) appliance designed to be deployed on your network to analyze images provided by your on-premises cameras.

AWS Panorama Appliance

Every week, I read about new and innovative use cases for computer vision. Some customers are using CV to verify pallet trucks are parked in designated areas to ensure worker safety in warehouses, some are analyzing customer walking flows in retail stores to optimize space and product placement, and some are using it to recognize cats and mice, just to name a few.

AWS customers agree the cloud is the most convenient place to train computer vision models thanks to its virtually infinite access to storage and compute resources. In the cloud, data scientists have access to powerful tools such as Amazon SageMaker and a wide variety of compute resources and frameworks.

However, when it’s time to analyze images from one or multiple video feeds, many of you are telling us the cloud is not the place where you want to run such workloads. There are a number of reasons for that: sometimes the facilities where the images are captured do not have enough bandwidth to send video feeds to the cloud, some use cases require very low latency, or some just want to keep their images on premises and not send them for analysis outside of their network.

At re:Invent 2020, we announced the AWS Panorama Appliance and SDK to address these requirements.

AWS Panorama is a machine learning appliance and software development kit (SDK) that allows you to bring computer vision to on-premises cameras to make predictions locally with high accuracy and low latency. With the AWS Panorama Appliance, you can automate tasks that have traditionally required human inspection to improve visibility into potential issues. For example, you can use AWS Panorama Appliance to evaluate manufacturing quality, identify bottlenecks in industrial processes, and monitor workplace security even in environments with limited or no internet connectivity. The software development kit allows camera manufacturers to bring equivalent capabilities directly inside their IP camera.

As usual on this blog, I would like to walk you through the development and deployment of a computer vision application for the AWS Panorama Appliance. The demo application from this blog uses a machine learning model to recognise objects in frames of video from a network camera. The application loads a model onto the AWS Panorama Appliance, gets images from a camera, and runs those images through the model. The application then overlays the results on top of the original video and outputs it to a connected display. The application uses libraries provided by AWS Panorama to interact with input and output video streams and the model, no low level programming is required.

Let’s first define a few concepts. I borrowed the following definitions from the AWS Panorama documentation page.

The AWS Panorama Appliance is the hardware that runs your applications. You use the AWS Panorama console or AWS SDKs to register an appliance, update its software, and deploy applications to it. The software that runs on the appliance discovers and connects to camera streams, sends frames of video to your application, and optionally displays video output on an attached display.

The appliance is an edge device. Instead of sending images to the AWS Cloud for processing, it runs applications locally on optimized hardware. This enables you to analyze video in real time and process the results with limited connectivity. The appliance only requires an internet connection to report its status, upload logs, and get software updates and deployments.

An application comprises multiple components called nodes, which represent cameras, models, code, or global variables. A node can be configuration only (inputs and outputs), or include artifacts (models and code). Application nodes are bundled in node packages that you upload to an S3 access point, where the AWS Panorama Appliance can access them. An application manifest is a configuration file that defines connections between the nodes.

A computer vision model is a machine learning network that is trained to process images. Computer vision models can perform various tasks such as classification, detection, segmentation, and tracking. A computer vision model takes an image as input and outputs information about the image or objects in the image.

AWS Panorama supports models built with Apache MXNet, DarkNet, GluonCV, Keras, ONNX, PyTorch, TensorFlow, and TensorFlow Lite. You can build models with Amazon SageMaker and import them from an Amazon Simple Storage Service (Amazon S3) bucket.

AWS Panorama Context

Now that we grasp the concepts, let’s get our hands on.

Unboxing Your AWS Panorama Appliance
In the box the service team sent me, I found the appliance itself (no surprise!), a power cord and two ethernet cables. The box also contains a USB key to initially configure the appliance. The device is designed to work in industrial environments. It has two ethernet ports next to the power connector on the back. On the front, protected behind a sliding door, I found a SD card reader, one HDMI connector and two USB ports. There is also a power button and a reset button to reinitialise the device to its factory state.

AWS Panorama Appliance Front Side AWS Panorama Appliance Back Side AWS Panorama USB key

Configuring Your Appliance
I first configured it for my network (cable + DHCP, but it also supports static IP configuration) and registered it to securely connect back to my AWS Account. To do so, I navigated to the AWS Management Console, entered my network configuration details. It generated a set of configuration files and certificates. I copied them to the appliance using the provided USB key. My colleague Martin Beeby shared screenshots of this process. The team slightly modified the screens based on the feedback they received during the preview, but I don’t think it is worth going through the step-by-step process again. Tip from the field: be sure to use the USB key provided in the box, it is correctly formatted and automatically recognised by the appliance (my own USB key was not recognized properly).

I then downloaded a sample application from the Panorama GitHub repository and tried it with the Test Utility for Panorama, also available on this GitHub (the test utility is an EC2 instance configured to act as a simulator). The Test Utility for Panorama uses Jupyter notebooks to quickly experiment with sample applications or your code before deploying it to the appliance. It also lists commands allowing you to deploy your applications to the appliance programmatically.

Panorama Command Line
The Panorama command line simplifies the operations to create a project, import assets, package it, and deploy it to the AWS Panorama Appliance. You can follow these instructions to download and install the Panorama command line.

When receiving an application developed by someone else, like the sample application, I have to replace AWS account IDs in all application files and directory names. I do this with one single command:

panorama-cli import-application

Application Structure
A Panorama application structure looks as follows:

├── assets
├── graphs
│   └── example_project
│       └── graph.json
└── packages
    ├── accountXYZ-model-1.0
    │   ├── descriptor.json
    │   └── package.json
    └── accountXYZ-sample-app-1.0
        ├── Dockerfile
        ├── descriptor.json
        ├── package.json
        └── src

  • graph.json lists all the packages and nodes in this application. Nodes are the way to define an application in Panorama.
  • in each package package.json has details about the package and the assets it uses.
  • model package model has a descriptor.json which contains the metadata required for compiling the model.
  • container packagesample-app package contains the application code in the src directory and a Dockerfile to build the container. descriptor.json has details about which command and file to use when the container is launched.
  • assets directory is where all the assets reside, such as packaged code and compiled models. You should not make any changes in this directory.

Note that package names are prefixed with your account number.

When my application is ready, I build the container (I am using a Linux machine with Docker Engine and Docker CLI to avoid using Docker Desktop for macOS or Windows.)

$ panorama-cli build-container                               
               --container-asset-name {container_asset_name}  
               --package-path packages/{account_id}-{package_name}-1.0 

A Note About the Cameras
AWS Panorama Appliance has a concept of “abstract cameras”. Abstract camera sources are placeholders that can be mapped to actual camera devices during application deployment. The Test Utility for Panorama allows you to map abstract cameras to video files for easy, repeatable tests.

Adding a ML Model
The AWS Panorama Appliance supports multiple ML Model frameworks. Models may be trained on Amazon SageMaker or any other solution of your choice. I downloaded my ML model from S3 and import it to my project:

panorama-cli add-raw-model                                                 
    --model-asset-name {asset_name}                                        
    --model-s3-uri s3://{S3_BUCKET}/{project_name}/{ML_MODEL_FNAME}.tar.gz 
    --descriptor-path {descriptor_path}                                    
    --packages-path {package_path}

Behind the scenes, ML Models are compiled to optimise them to the Nvidia Accelerated Linux Arm64 architecture of the AWS Panorama Appliance.

Package the Application
Now that I have a ML model and my application code packaged in a container, I am ready to package my application assets for AWS Panorama Appliance:

panorama-cli package-application

This command uploads all my application assets to the AWS cloud account along with all the manifests.

Deploy the Application
Finally I deploy the application to the AWS Panorama Appliance. A deployment copies the application and its configuration, like camera stream selection, from the AWS cloud to my on-premise AWS Panorama Appliance. I may deploy my application programmatically using Python code (and the Boto3 SDK you might know already):

client = boto3.client('panorama')
    Name="AWS News Blog Sample Application",
    Description="An object detection app",
       'PayloadData': manifest         # <== this is the graph.json file content 
    RuntimeRoleArn=role,               # <== this is a role that gives my app permissions to use AWS Services such as Cloudwatch
    DefaultRuntimeContextDevice=device # <== this is my device name 

Alternatively, I may use the AWS Management Console:

On Deployed applications, I select Deploy application.

AWS Panorama - Start the deployment

I copy and paste the content of graphs/<project name>/graph.json to the console and select Next.

AWS Panorama - Copy Graph JSON

I give my application a name and an optional description. I select Proceed to deploy.

AWS Panoram Deploy - Application Name

The next steps are

  • declare an IAM role to give permissions to my application to use AWS Service. The minimal permissions set allows to call the PuMetricData API on CloudWatch.
  • select the AWS Panorama Appliance I want to deploy to
  • map the abstract cameras defined in the application descriptors.json to physical cameras known by the AWS Panorama Appliance
  • fill in any application-specific inputs, such as acceptable threshold value, log level etc.

An example IAM policy is

AWSTemplateFormatVersion: '2010-09-09'
Description: Resources for an AWS Panorama application.
    Type: AWS::IAM::Role
        Version: "2012-10-17"
            Effect: Allow
              - sts:AssumeRole
        - PolicyName: cloudwatch-putmetrics
            Version: 2012-10-17
              - Effect: Allow
                Action: 'cloudwatch:PutMetricData'
                Resource: '*'
      Path: /service-role/

These six screenhots capture this process:

AWS Panorama - Deploy 2 AWS Panorama - Deploy 1 AWS Panorama - Deploy 3
AWS Panorama - Deploy 4 AWS Panorama - Deploy 5 AWS Panorama - Deploy 5

The deployment takes 15-30 minutes depending on the size of your code and your ML models, and the appliance available bandwidth. Eventually, the status turn green to “Running”.

AWS Panorama - Deployment Completed

Once the application is deployed to your AWS Panorama Appliance it begins to run, continuously analyzing video and generating highly accurate predictions locally within milliseconds. I connect an HDMI cable to the AWS Panorama Appliance to monitor the output, and I can see:

AWS Panorama HDMI output

Should anything goes wrong during the deployment or during the life of the application, I have access to the logs on Amazon CloudWatch. There are two log streams created, one for the AWS Panorama Appliance itself and one for the application.

AWS Panorama Appliance log AWS Panorama application log

Pricing and Availability
The AWS Panorama Appliance is available to purchase at AWS Elemental order page in the AWS Console. You can place orders from the United States, Canada, the United Kingdom, and the European Union. There is a one-time charge of $4,000 for the appliance itself.

There is a usage charge of $8.33 / month / camera feed.

AWS Panorama stores versioned copies of all assets deployed to the AWS Panorama Appliance (including ML models and business logic) in the cloud. You are charged $0.10 per-GB, per-month for this storage.

You may incur additional charges if the business logic deployed to your AWS Panorama Appliance uses other AWS services. For example, if your business logic uploads ML predictions to S3 for offline analysis, you will be billed separately by S3 for any storage charges incurred.

The AWS Panorama Appliance can be installed anywhere. The appliance connects back to the AWS Panorama service in the AWS cloud in one of the following AWS Region : US East (N. Virginia), US West (Oregon), Canada (Central), or Europe (Ireland).

Go and build your first computer vision model today.

— seb

New – AWS Data Exchange for Amazon Redshift

This post was originally published on this site

Back in 2019 I told you about AWS Data Exchange and showed you how to Find, Subscribe To, and Use Data Products. Today, you can choose from over 3600 data products in ten categories:

In my introductory post I showed you how could subscribe to data products and then download the data sets into an Amazon Simple Storage Service (Amazon S3) bucket. I then suggested various options for further processing, including AWS Lambda functions, a AWS Glue crawler, or an Amazon Athena query.

Today we are making it even easier for you to find, subscribe to, and use third-party data with the introduction of AWS Data Exchange for Amazon Redshift. As a subscriber, you can directly use data from providers without any further processing, and no need for an Extract Transform Load (ETL) process. Because you don’t have to do any processing, the data is always current and can be used directly in your Amazon Redshift queries. AWS Data Exchange for Amazon Redshift takes care of managing all entitlements and payments for you, with all charges billed to your AWS account.

As a provider, you now have a new way to license your data and make it available to your customers.

As I was writing this post, it was cool to realize just how many existing aspects of Redshift, and Data Exchange played central roles. Because Redshift has a clean separation of storage and compute, along with built-in data sharing features, the data provider allocates and pays for storage, and the data subscriber does the same for compute. The provider does not need to scale their cluster in proportion to the size of their subscriber base, and can focus on acquiring and providing data.

Let’s take a look at this feature from two vantage points: subscribing to a data product, and publishing a data product.

AWS Data Exchange for Amazon Redshift – Subscribing to a Data Product
As a data subscriber I can browse through the AWS Data Exchange catalog and find data products that are relevant to my business, and subscribe to them.

Data providers can also create private offers and extend them to me for access via the AWS Data Exchange Console. I click My product offers, and review the offers that have been extended to me. I click on Continue to subscribe to proceed:

Then I complete my subscription by reviewing the offer and the subscription terms, noting the data sets that I will get, and clicking Subscribe:

Once the subscription is completed, I am notified and can move forward:

From the Redshift Console, I click Datashares, select From other accounts, and I can see the subscribed data set:

Next, I associate it with one or more of my Redshift clusters by creating a database that points to the subscribed datashare, and use the tables, views, and stored procedures to power my Redshift queries and my applications.

AWS Data Exchange for Amazon Redshift – Publishing a Data Product
As a data provider I can include Redshift tables, views, schemas and user-defined functions in my AWS Data Exchange product. To keep things simple, I’ll create a product that includes just one Redshift table.

I use the spiffy new Redshift Query Editor V2 to create a table that maps US area codes to a city and a state:

Then I examine the list of existing datashares for my Redshift cluster, and click Create datashare to make a new one:

Next, I go through the usual process for creating a datashare. I select AWS Data Exchange datashare, assign a name (area_code_reference), pick the database within the cluster, and make the datashare accessible to publicly accessible clusters:

Then I scroll down and click Add to move forward:

I choose my schema (public), opt to include only tables and views in my datashare, and then add the area_codes table:

At this point I can click Add to wrap up, or Add and repeat to make a more complex product that contains additional objects.

I confirm that the datashare contains the table, and click Create datashare to move forward:

Now I am ready to start publishing my data! I visit the AWS Data Exchange Console, expand the navigation on the left, and click Owned data sets:

I review the Data set creation steps, and click Create data set to proceed:

I select Amazon Redshift datashare, give my data set a name (United States Area Codes), enter a description, and click Create data set to proceed:

I create a revision called v1:

I select my datashare and click Add datashare(s):

Then I finalize the revision:

I showed you how to create a datashare and a dataset, and to publish a product using the console. If you are publishing multiple products and/or making regular revisions, you can automate all of these steps using the AWS Command Line Interface (CLI) and the Amazon Data Exchange APIs.

Initial Data Products
Multiple data providers are working to make their data products available to you through AWS Data Exchange for Amazon Redshift. Here are some of the initial offerings and the official descriptions:

  • FactSet Supply Chain Relationships – FactSet Revere Supply Chain Relationships data is built to expose business relationship interconnections among companies globally. This feed provides access to the complex networks of companies’ key customers, suppliers, competitors, and strategic partners, collected from annual filings, investor presentations, and press releases.
  • Foursquare Places 2021: New York City Sample – This trial dataset contains Foursquare’ss integrated Places (POI) database for New York City, accessible as a Redshift Data Share. Instantly load Foursquare’s Places data in to a Redshift table for further processing and analysis. Foursquare data is privacy-compliant, uniquely sourced, and trusted by top enterprises like Uber, Samsung, and Apple.
  • Mathematica Medicare Pilot Dataset – Aggregate Medicare HCC counts and prevalence by state, county, payer, and filtered to the diabetic population from 2017 to 2019.
  • COVID-19 Vaccination in Canada – This listing contains sample datasets for COVID-19 Vaccination in Canada data.
  • Revelio Labs Workforce Composition and Trends Data (Trial data) – Understand the workforce composition and trends of any company.
  • Facteus – US Card Consumer Payment – CPG Backtest – Historical sample from panel of SKU-level transaction detail from cash and card transactions across hundreds of Consumer-Packaged Goods sold at over 9,000 urban convenience stores and bodegas across the U.S.
  • Decadata Argo Supply Chain Trial Data – Supply chain data for CPG firms delivering products to US Grocery Retailers.


VMware Cloud on AWS Outposts Brings VMware SDDC as a Fully Managed Service on Premises

This post was originally published on this site

In 2017, AWS and VMware brought VMware Cloud on AWS, the VMware enterprise-class Software-Defined Data Center (SDDC) software for all vSphere-based workloads, to the AWS Cloud with optimized access to native AWS services. VMware Cloud on AWS provides dedicated, single-tenant cloud infrastructure, delivered on the next-generation bare-metal AWS infrastructure based on the latest Amazon EC2 storage optimized high I/O instances and featuring low-latency non-volatile memory express (NVMe) based SSDs.

Still, some customers have certain workloads that will likely need to remain on-premises such as applications that are latency sensitive, have to meet specific data residency needs, require local data processing, or need to be in close proximity to on-premises assets. These customers would like to be able to use the same VMware tools, APIs, and skill sets that they’ve been using to run their infrastructure on premises and seamlessly integrate these on-premises workloads with the rest of their applications in the AWS Cloud.

AWS and VMware are bringing the VMware Cloud on AWS experience on premises by announcing the general availability of VMware Cloud on AWS Outposts, VMware’s enterprise-class SDDC software deployed on AWS Nitro System-based EC2 bare-metal instances in AWS Outposts, a fully managed service that brings AWS infrastructure and services on premises for a truly consistent hybrid experience. VMware Cloud on AWS Outposts runs on AWS infrastructure within any location provided to us by the customer to support applications that require low latency and to accommodate local data processing and data residency needs as long as the network requirements are met.

AWS Outposts VMware Cloud on AWS Outposts VMware Cloud on AWS
Use cases Low-latency compute, Local data processing, Data residency, Migration and modernization Low-latency compute, Local data processing, Data residency, Migration and modernization with consistent VMware environments Cloud Migrations, Data center extension, Disaster Recovery, Scalable VDI and DevTest, and App modernization
Control Plane AWS console VMware Cloud portal
Software AWS services VMware SDDC built on AWS services*
Infrastructure AWS custom-built hardware
Hardware Location Customers’ datacenters, co-location space, or on-premises facilities AWS Regions

* VMware Cloud runs on Amazon Nitro System-based EC2 bare-metal instances provisioned in AWS Outposts or AWS Regions.

With VMware Cloud on AWS Outposts, you can remove the overhead associated with designing, procuring, and managing IT infrastructure, thereby improving IT efficiency. You can get operational consistency with a single pane of glass in vCenter that allows you to manage your SDDCs in the AWS Regions, on VMware Cloud on AWS Outposts, and in your self-managed on-premises VMware environments.

Preview with VMware on AWS Outposts
To get started with VMware Cloud on AWS Outposts, a group of experts from AWS or VMware will help you understand your specific requirements and sizing needs. Please contact your usual sales representatives from either AWS or VMware before the order.

After your requirements and site conditions are collected, you can simply log into the VMware Cloud Service Portal. Choose VMware Cloud on AWS Outposts from My Services, and start an order. This ordering process via the VMware Cloud Service Portal will be generally available by the end of VMware’s Q3FY22 fiscal quarter (October 29, 2021). If you need to order VMware Cloud on AWS Outposts sooner, contact your AWS or VMware representative.

This order initiates a process for us to collect the necessary site and installation information. You will receive an email confirmation once the order is successfully submitted and confirmed.

AWS will contact you to schedule and perform the site assessment. If your site is compliant with all requirements, then your Outpost will be ordered and installed on your site. Once the VMware Cloud on AWS Outposts capacity is delivered to your site and plugged into power and network connections, AWS will provision the Amazon EC2 instances for SDDC consumption. VMware will perform additional validation and notify you when the VMware Cloud on AWS Outposts service is available.

Next, you will be able to see the available capacity on the VMware Cloud Service Portal and create your SDDC as needed. The connection to your on-premises network is already configured based on the information we collected from the previous steps.

You will need to configure or use your own Virtual Private Cloud (VPC) and subnet to connect. Workloads running on VMware Cloud on AWS Outpost communicate with other resources in your VPC through elastic network interface (ENI) the same way they do for VMware Cloud on AWS. Performance is subject to the service link connection to the parent AWS Region.

Things to Know
Here are a couple of things to keep in mind about VMware Cloud on AWS Outposts:

Support Process: Unlike AWS Outposts, the VMware Cloud on AWS Outposts service is operated and managed by AWS and VMware. VMware will be your first line of support for VMware Cloud on AWS Outposts. AWS will contact you regarding hardware-related maintenance and replacement. For all other issues, you can use the in-service chat support, which is available 24×5 in English across all global Regions or contact your enterprise support personnel from VMware.

Hybrid with Other Outposts: Similar to VMware Cloud on AWS, you can use ENI to connect your SDDCs to AWS services running on another AWS Outpost or in an AWS Region. VMware Cloud on AWS Outposts has been validated to function with native AWS services such as Amazon CloudWatch, AWS Systems Manager, and Amazon S3 from the connected VPC in the AWS Region. Also, it can fully integrate with all native AWS services such as Amazon EC2, S3, and Amazon RDS that are supported on a native AWS Outposts rack that exists in the same location. Please reach out to your AWS or VMware representative for additional assistance in setting up the connectivity between your Outpost and the nearest AWS Region.

Network Connectivity: While VMware Cloud on AWS Outposts requires reliable network connectivity to the nearest AWS Region, the SDDC continues functioning if network connectivity to the AWS Region is temporarily unavailable. However, the VMware Cloud control plane will be unavailable when network connectivity is down. SDDC configurations can be accessed but network functions such as creating a new logical network , deleting an existing logical network and modifying a logical network will fail. However, you can still access vCenter to perform VM operations, and your data remains safely stored on your Outpost during periods of disconnect.

Data Residency: Your data will remain on VMware Cloud on AWS Outposts by default. This is enabled through the local storage and VMware vSAN technology. You may choose to replicate some or all of your data to AWS Regions or VMware Cloud on AWS based on your specific residency requirements. Some limited metadata will flow back to the AWS Region and the VMware Cloud Service Platform. As an example, information about instance health, instance activity (launched, stopped), and the underlying hypervisor system may be sent back to the parent AWS Region. This information enables AWS to provide alerting on instance health and capacity and apply patches and updates to the Outpost.

Get started with VMware Cloud on AWS Outposts
We’re pleased to announce the general availability of VMware Cloud on AWS Outposts. It can be shipped to the United States and connected to an AWS Region where VMware Cloud on AWS is supported: US East (N. Virginia) or US West (Oregon). If you want to deploy VMware Cloud on AWS Outposts outside the United States or connect VMware Cloud on AWS Outposts to other AWS Regions, please contact your AWS or VMware sales representative.

You can contact your AWS or VMware sales representative to place an order. You will also be able to place an order via the VMware Cloud Service Portal by October 29, 2021. When purchasing through AWS, you can also take advantage of your existing AWS enterprise contracts and AWS field promotional programs such as Migration Acceleration Program (MAP).

To learn more, visit the VMware Cloud on AWS Outposts page. Please send feedback to the VMware forum for VMware on AWS or through your usual AWS support contacts.


AWS Cloud Control API, a Uniform API to Access AWS & Third-Party Services

This post was originally published on this site

Today, I am happy to announce the availability of AWS Cloud Control API a set of common application programming interfaces (APIs) that are designed to make it easy for developers to manage their AWS and third-party services.

AWS delivers the broadest and deepest portfolio of cloud services. Builders leverage these to build any type of cloud infrastructure. It started with Amazon Simple Storage Service (Amazon S3) 15 years ago and grew over 200+ services. Each AWS service has a specific API with its own vocabulary, input parameters, and error reporting. For example, you use the S3 CreateBucket API to create an Amazon Simple Storage Service (Amazon S3) bucket and the Amazon Elastic Compute Cloud (Amazon EC2) RunInstances API to create an EC2 instances.

Some of you use AWS APIs to build infrastructure-as-code, some to inspect and automatically improve your security posture, some others for configuration management, or to provision and to configure high performance compute clusters. The use cases are countless.

As applications and infrastructures become increasingly sophisticated and you work across more AWS services, it becomes increasingly difficult to learn and manage distinct APIs. This challenge is exacerbated when you also use third-party services in your infrastructure, since you have to build and maintain custom code to manage both the AWS and third-party services together.

Cloud Control API is a standard set of APIs to Create, Read, Update, Delete, and List (CRUDL) resources across hundreds of AWS Services (more being added) and dozens of third-party services (and growing).

It exposes five common verbs (CreateResource, GetResource, UpdateResource, DeleteResource, ListResource) to manage the lifecycle of services. For example, to create an Amazon Elastic Container Service (Amazon ECS) cluster or an AWS Lambda function, you call the same CreateResource API, passing as parameters the type and attributes of the resource you want to create: an Amazon ECS cluster or an Lambda function. The input parameters are defined by an unified resource model using JSON. Similarly, the return types and error messages are uniform across all verbs and all resources.

Cloud Control API provides support for hundreds of AWS resources today, and we will continue to add support for existing AWS resources across services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3) in the coming months. It will support new AWS resources typically on the day of launch.

Until today, when I want to get the details about an Lambda function or a Amazon Kinesis stream, I use the get-function API to call Lambda and the describe-stream API to call Kinesis. Notice in the example below how different these two API calls are: they have different names, different naming conventions, different JSON outputs, etc.

aws lambda get-function --function-name TictactoeDatabaseCdkStack
    "Configuration": {
        "FunctionName": "TictactoeDatabaseCdkStack",
        "FunctionArn": "arn:aws:lambda:us-west-2:0123456789:function:TictactoeDatabaseCdkStack",
        "Runtime": "nodejs14.x",
        "Role": "arn:aws:iam::0123456789:role/TictactoeDatabaseCdkStack",
        "Handler": "framework.onEvent",
        "CodeSize": 21539,
        "Timeout": 900,
        "MemorySize": 128,
        "LastModified": "2021-06-07T11:26:39.767+0000",


aws kinesis describe-stream --stream-name AWSNewsBlog
    "StreamDescription": {
        "Shards": [
                "ShardId": "shardId-000000000000",
                "HashKeyRange": {
                    "StartingHashKey": "0",
                    "EndingHashKey": "340282366920938463463374607431768211455"
                "SequenceNumberRange": {
                    "StartingSequenceNumber": "49622132796672989268327879810972713309953024040638611458"
        "StreamARN": "arn:aws:kinesis:us-west-2:012345678901:stream/AWSNewsBlog",
        "StreamName": "AWSNewsBlog",
        "StreamStatus": "ACTIVE",
        "RetentionPeriodHours": 24,
        "EncryptionType": "NONE",
        "KeyId": null,
        "StreamCreationTimestamp": "2021-09-17T14:58:20+02:00"

In contrast, when using the Cloud Control API, I use a single API name get-resource, and I receive a consistent output.

aws cloudcontrol get-resource        
    --type-name AWS::Kinesis::Stream 
    --identifier NewsBlogDemo
   "TypeName": "AWS::Kinesis::Stream",
   "ResourceDescription": {
      "Identifier": "NewsBlogDemo",
      "Properties": "{"Arn":"arn:aws:kinesis:us-west-2:486652066693:stream/NewsBlogDemo","RetentionPeriodHours":168,"Name":"NewsBlogDemo","ShardCount":3}"

Similary, to create the resource above I used the create-resource API.

aws cloudcontrol create-resource    
   --type-name AWS::Kinesis::Stream 
   --desired-state "{"Name": "NewsBlogDemo","RetentionPeriodHours":168, "ShardCount":3}"

In my opinion, there are three types of builders that are going to adopt Cloud Control API:

The first community is builders using AWS Services APIs to manage their infrastructure or their customer’s infrastructure. The ones requiring usage of low-level AWS Services APIs rather than higher level tools. For example, I know companies that manages AWS infrastructures on behalf of their clients. Many developed solutions to list and describe all resources deployed in their client’s AWS Accounts, for management and billing purposes. Often, they built specific tools to address their requirements, but find it hard to keep up with new AWS Services and features. Cloud Control API simplifies this type of tools by offering a consistent, resource-centric approach. It makes easier to keep up with new AWS Services and features.

Another example: Stedi is a developer-focused platform for building automated Electronic Data Interchange (EDI) solutions that integrate with any business system. “We have a strong focus on infrastructure as code (IaC) within Stedi and have been looking for a programmatic way to discover and delete legacy cloud resources that are no longer managed through CloudFormation – helping us reduce complexity and manage cost,” said Olaf Conjin, Serverless Engineer at Stedi, Inc. “With AWS Cloud Control API, our teams can easily list each of these legacy resources, cross-reference them against CloudFormation managed resources, apply additional logic and delete the legacy resources. By deleting these unused legacy resources using Cloud Control API, we can manage our cloud spend in a simpler and faster manner. Cloud Control API allows us to remove the need to author and maintain custom code to discover and delete each type of resource, helping us improve our developer velocity”.

APN Partners
The second community that benefits from Cloud Control API is APN Partners, such as HashiCorp (maker of Terraform) and Pulumi, and other APN Partners offering solutions that relies on AWS Services APIs. When AWS releases a new service or feature, our partner’s engineering teams need to learn, integrate, and test a new set of AWS Service APIs to expose it in their offerings. This is a time consuming process and often leads to a lag between the AWS release and the availability of the service or feature in their solution. With the new Cloud Control API, partners are now able to build a unique REST API code base, using unified API verbs, common input parameters, and common error types. They just have to merge the standardized pre-defined uniform resource model to interact with new AWS Services exposed as REST resources.

Launch Partners
HashiCorp and Pulumi are our launch partners, both solutions are integrated with Cloud Control API today.

HashiCorp provides cloud infrastructure automation software that enables organizations to provision, secure, connect, and run any infrastructure for any application. “AWS Cloud Control API makes it easier for our teams to build solutions to integrate with new and existing AWS services,” said James Bayer – EVP Product, HashiCorp. “Integrating HashiCorp Terraform with AWS Cloud Control API means developers are able to use the newly released AWS features and services, typically on the day of launch.”

Pulumi’s new AWS Native Provider, powered by the AWS Cloud Control API, “gives Pulumi’s users faster access to the latest AWS innovations, typically the day they launch, without any need for us to manually implement support,” said Joe Duffy, CEO at Pulumi. “The full surface area of AWS resources provided by AWS Cloud Control API can now be automated from familiar languages like Python, TypeScript, .NET, and Go, with standard IDEs, package managers, and test frameworks, with high fidelity and great quality. Using this new provider, developers and infrastructure teams can develop and ship modern AWS applications and infrastructure faster and with more confidence than ever before.”

To learn more about HashiCorp and Pulumi’s integration with Cloud Control API, refer to their blog post and announcements. I will add the links here as soon as they are available.

AWS Customers
The third type of builders that will benefit from Cloud Control API is AWS customers using solution such as Terraform or Pulumi. You can benefit from Cloud Control API too. For example, when using the new Terraform AWS Cloud Control provider or Pulumi’s AWS Native Provider, you can benefit from availability of new AWS Services and features typically on the day of launch.

Now that you understand the benefits, let’s see Cloud Control API in action.

How It Works?
To start using Cloud Control API, I first make sure I use the latest AWS Command Line Interface (CLI) version. Depending on how the CLI was installed, there are different methods to update the CLI. Cloud Control API is available from our AWS SDKs as well.

To create an AWS Lambda function, I first create an handler, I zip it, and upload the zip file to one of my private bucket. I pay attention that the S3 bucket is in the same AWS Region where I will create the Lambda function:

cat << EOF >  
heredoc> import json 
def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')

aws s3 cp s3://private-bucket-seb/

Then, I call the create-resource API, passing the same set of arguments as required by the corresponding CloudFormation resource. In this example, the Code, Role, Runtime, and Handler arguments are mandatory, as per the CloudFormation AWS::Lambda::Function documentation.

aws cloudcontrol create-resource          
       --type-name AWS::Lambda::Function   
       --desired-state '{"Code":{"S3Bucket":"private-bucket-seb","S3Key":""},"Role":"arn:aws:iam::0123456789:role/lambda_basic_execution","Runtime":"python3.9","Handler":"index.lambda_handler"}' 
       --client-token 123

    "ProgressEvent": {
        "TypeName": "AWS::Lambda::Function",
        "RequestToken": "56a0782b-2b26-491c-b082-18f63d571bbd",
        "Operation": "CREATE",
        "OperationStatus": "IN_PROGRESS",
        "EventTime": "2021-09-26T12:05:42.210000+02:00"

I may call the same command again to get the status or to learn about an eventual error:

aws cloudcontrol create-resource          
       --type-name AWS::Lambda::Function   
       --desired-state '{"Code":{"S3Bucket":"private-bucket-seb","S3Key":""},"Role":"arn:aws:iam::0123456789:role/lambda_basic_execution","Runtime":"python3.9","Handler":"index.lambda_handler"}' 
       --client-token 123

    "ProgressEvent": {
        "TypeName": "AWS::Lambda::Function",
        "Identifier": "ukjfq7sqG15LvfC30hwbRAMfR-96K3UNUCxNd9",
        "RequestToken": "f634b21d-22ed-41bb-9612-8740297d20a3",
        "Operation": "CREATE",
        "OperationStatus": "SUCCESS",
        "EventTime": "2021-09-26T19:46:46.643000+02:00"

Here, the OperationStatus is SUCCESS and the function name is ukjfq7sqG15LvfC30hwbRAMfR-96K3UNUCxNd9 (I can pass my own name if I want something more descriptive 🙂 )

I then invoke the Lambda function to ensure it works as expected:

aws lambda invoke 
    --function-name ukjfq7sqG15LvfC30hwbRAMfR-96K3UNUCxNd9 
    out.txt && cat out.txt && rm out.txt 

    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
{"statusCode": 200, "body": ""Hello from Lambda!""}

When finished, I delete the Lambda function using Cloud Control API:

aws cloudcontrol delete-resource 
     --type-name AWS::Lambda::Function 
     --identifier ukjfq7sqG15LvfC30hwbRAMfR-96K3UNUCxNd9 
    "ProgressEvent": {
        "TypeName": "AWS::Lambda::Function",
        "Identifier": "ukjfq7sqG15LvfC30hwbRAMfR-96K3UNUCxNd9",
        "RequestToken": "8923991d-72b3-4981-8160-4d9a585965a3",
        "Operation": "DELETE",
        "OperationStatus": "IN_PROGRESS",
        "EventTime": "2021-09-26T20:06:22.013000+02:00"

You might have noticed the client-token parameter I passed to the create-resource API call. Create, Update, and Delete requests all accept a ClientToken, which is used to ensure idempotency of the request.

  • We recommend always passing a client token. This will disambiguate requests in case a retry is needed. Otherwise, you may encounter unexpected errors like ConcurrentOperationException or AlreadyExists.
  • We recommend that client tokens always be unique for every single request, such as by passing a UUID.

One More Thing
At the heart of AWS Cloud Control API source of data, there is the CloudFormation Public Registry, which my colleague Steve announced earlier this month in this blog post. It allows anyone to expose a set of AWS resources through CloudFormation and AWS CDK. This is the mechanism AWS Service teams are now using to release their services and features as CloudFormation and AWS CDK resources. Multiple third-party vendors are also publishing their solutions in the CloudFormation Public Registry. All resources published are modelled with a standard schema that defines the resource, its properties, and their attributes in a uniform way.

AWS Cloud Control API is a CRUDL API layer on top of resources published in the CloudFormation Public Registry. Any resource published in the registry exposes its attributes with standard JSON schemas. The resource can then be created, updated, deleted, or listed using Cloud Control API with no additional work.

For example, imagine I decide to expose a public CloudFormation stack to let any AWS customer create VPN servers, based on EC2 instances. I model the VPNServer resource type and publish it in the CloudFormation Public Registry. With no additional work on my side, my custom resource “VPNServer” is now available to all AWS customers through the Cloud Control API REST API. Not only, it is also automatically available through solutions like Hashicorp’s Terraform and Pulumi, and potentially others who adopt Cloud Control API in the future.

It is worth mentioning Cloud Control API is not aimed at replacing the traditional AWS service-level APIs. They are still there and will always be there, but we think that Cloud Control API is easier and more consistent to use and you should use it for new apps.

Availability and Pricing
Cloud Control API is available in all AWS Regions, except China.

You will only pay for the usage of underlying AWS resources, such as a CloudWatch logs or Lambda functions invocations, or pay for the number of handler operations and handler operation duration associated with using third-party resources (such as Datadog monitors or MongoDB Atlas clusters). There are no minimum fees and no required upfront commitments.

I can’t wait to discover what you are going to build on top of this new Cloud Control API. Go build!

— seb

Now — AWS Step Functions Supports 200 AWS Services To Enable Easier Workflow Automation

This post was originally published on this site

Today AWS Step Functions expands the number of supported AWS services from 17 to over 200 and AWS API Actions from 46 to over 9,000 with its new capability AWS SDK Service Integrations.

When developers build distributed architectures, one of the patterns they use is the workflow-based orchestration pattern. This pattern is helpful for workflow automation inside a service to perform distributed transactions. An example of a distributed transaction is all the tasks required to handle an order and keep track of the transaction status at all times.

Step Functions is a low-code visual workflow service used for workflow automation, to orchestrate services, and help you to apply this pattern. Developers use Step Functions with managed services such as Artificial Intelligence services, Amazon Simple Storage Service (Amazon S3), and Amazon DynamoDB.

Introducing Step Functions AWS SDK Service Integrations
Until today, when developers were building workflows that integrate with AWS services, they had to choose from the 46 supported services integrations that Step Functions provided. If the service integration was not available, they had to code the integration in an AWS Lambda function. This is not ideal as it added more complexity and costs to the application.

Now with Step Functions AWS SDK Service Integrations, developers can integrate their state machines directly to AWS service that has AWS SDK support.

You can create state machines that use AWS SDK Service Integrations with Amazon States Language (ASL), AWS Cloud Development Kit (AWS CDK), or visually using AWS Step Function Workflow Studio. To get started, create a new Task state. Then call AWS SDK services directly from the ASL in the resource field of a task state. To do this, use the following syntax.


Let me show you how to get started with a demo.

In this demo, you are building an application that, when given a video file stored in S3, transcribes it and translates from English to Spanish.

Let’s build this demo with Step Functions. The state machine, with the service integrations, integrates directly to S3, Amazon Transcribe, and Amazon Translate. The API for transcribing is asynchronous. To verify that the transcribing job is completed, you need a polling loop, which waits for it to be ready.

State machine we are going to build

Create the state machine
To follow this demo along, you need to complete these prerequisites:

  • An S3 bucket where you will put the original file that you want to process
  • A video or audio file in English stored in that bucket
  • An S3 bucket where you want the processing to happen

I will show you how to do this demo using the AWS Management Console. If you want to deploy this demo as infrastructure as code, deploy the AWS CloudFormation template for this project.

To get started with this demo, create a new standard state machine. Choose the option Write your workflow in code to build the state machine using ASL. Create a name for the state machine and create a new role.

Creating a state machine

Start a transcription job
To get started working on the state machine definition, you can Edit the state machine.

Edit the state machine definition

The following piece of ASL code is a state machine with two tasks that are using the new AWS SDK Service Integrations capability. The first task is copying the file from one S3 bucket to another, and the second task is starting the transcription job by directly calling Amazon Transcribe.

For using this new capability from Step Functions, the state type needs to be a Task. You need to specify the service name and API action using this syntax: “arn:aws:states:::aws-sdk:serviceName:apiAction.<serviceIntegrationPattern>”. Use camelCase for apiAction names in the Resource field, such as “copyObject”, and use PascalCase for parameter names in the Parameters field, such as “CopySource”.

For the parameters, find the name and required parameters in the AWS API documentation for this service and API action.

  "Comment": "A State Machine that process a video file",
  "StartAt": "GetSampleVideo",
  "States": {
    "GetSampleVideo": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:copyObject",
      "Parameters": {
        "Bucket.$": "$.S3BucketName",
        "Key.$": "$.SampleDataInputKey",
        "CopySource.$": "States.Format('{}/{}',$.SampleDataBucketName,$.SampleDataInputKey)"
      "ResultPath": null,
      "Next": "StartTranscriptionJob"
    "StartTranscriptionJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
      "Parameters": {
        "Media": {
          "MediaFileUri.$": "States.Format('s3://{}/{}',$.S3BucketName,$.SampleDataInputKey)"
        "TranscriptionJobName.$": "$$.Execution.Name",
        "LanguageCode": "en-US",
        "OutputBucketName.$": "$.S3BucketName",
        "OutputKey": "transcribe.json"
      "ResultPath": "$.transcription",
      "End": true

In the previous piece of code, you can see an interesting use case of the intrinsic functions that ASL provides. You can construct a string using different parameters. Using intrinsic functions in combination with AWS SDK Service Integrations allows you to manipulate data without the needing a Lambda function. For example, this line:

"MediaFileUri.$": "States.Format('s3://{}/{}',$.S3BucketName,$.SampleDataInputKey)"

Give permissions to the state machine
If you start the execution of the state machine now, it will fail. This state machine doesn’t have permissions to access the S3 buckets or use Amazon Transcribe. Step Functions can’t autogenerate IAM policies for most AWS SDK Service Integrations, so you need to add those to the role manually.

Add those permissions to the IAM role that was created for this state machine. You can find a quick link to the role in the state machine details. Attach the “AmazonTranscribeFullAccess” and the “AmazonS3FullAccess” policies to the role.

Link of the IAM role

Running the state machine for the first time
Now that the permissions are in place, you can run this state machine. This state machine takes as an input the S3 bucket name where the original video is uploaded, the name for the file and the name of the S3 bucket where you want to store this file and do all the processing.

For this to work, this file needs to be a video or audio file and it needs to be in English. When the transcription job is done, it saves the result in the bucket you specify in the input with the name transcribe.json.

  "SampleDataBucketName": "<name of the bucket where the original file is>",
  "SampleDataInputKey": "<name of the original file>",
  "S3BucketName": "<name of the bucket where the processing will happen>"

As StartTranscriptionJob is an asynchronous call, you won’t see the results right away. The state machine is only calling the API, and then it completes. You need to wait until the transcription job is ready and then see the results in the output bucket in the file transcribe.json.

Adding a polling loop
Because you want to translate the text using your transcriptions results, your state machine needs to wait for the transcription job to complete. For building an API poller in a state machine, you can use a Task, Wait, and Choice state.

  • Task state gets the job status. In your case, it is calling the service Amazon Transcribe and the API getTranscriptionJob.
  • Wait state waits for 20 seconds, as the transcription job’s length depends on the size of the input file.
  • Choice state moves to the right step based on the result of the job status. If the job is completed, it moves to the next step in the machine, and if not, it keeps on waiting.

States of a polling loop

Wait state
The first of the states you are going to add is the Wait state. This is a simple state that waits for 20 seconds.

"Wait20Seconds": {
        "Type": "Wait",
        "Seconds": 20,
        "Next": "CheckIfTranscriptionDone"

Task state
The next state to add is the Task state, which calls the API getTranscriptionJob. For calling this API, you need to pass the transcription job name. This state returns the job status that is the input of the Choice state.

"CheckIfTranscriptionDone": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:transcribe:getTranscriptionJob",
        "Parameters": {
          "TranscriptionJobName.$": "$.transcription.TranscriptionJob.TranscriptionJobName"
        "ResultPath": "$.transcription",
        "Next": "IsTranscriptionDone?"

Choice state
The Choice state has one rule that checks if the transcription job status is completed. If that rule is true, then it goes to the next state. If not, it goes to the Wait state.

 "IsTranscriptionDone?": {
        "Type": "Choice",
        "Choices": [
            "Variable": "$.transcription.TranscriptionJob.TranscriptionJobStatus",
            "StringEquals": "COMPLETED",
            "Next": "GetTranscriptionText"
        "Default": "Wait20Seconds"

Getting the transcription text
In this step you are extracting only the transcription text from the output file returned by the transcription job. You need only the transcribed text, as the result file has a lot of metadata that makes the file too long and confusing to translate.

This is a step that you would generally do with a Lambda function. But you can do it directly from the state machine using ASL.

First you need to create a state using AWS SDK Service Integration that gets the result file from S3. Then use another ASL intrinsic function to convert the file text from a String to JSON.

In the next state you can process the file as a JSON object. This state is a Pass state, which cleans the output from the previous state to get only the transcribed text.

 "GetTranscriptionText": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
        "Parameters": {
          "Bucket.$": "$.S3BucketName",
          "Key": "transcribe.json"
        "ResultSelector": {
          "filecontent.$": "States.StringToJson($.Body)"
        "ResultPath": "$.transcription",
        "Next": "PrepareTranscriptTest"
      "PrepareTranscriptTest" : {
        "Type": "Pass",
        "Parameters": {
          "transcript.$": "$.transcription.filecontent.results.transcripts[0].transcript"
        "Next": "TranslateText"

Translating the text
After preparing the transcribed text, you can translate it. For that you will use Amazon Translate API translateText directly from the state machine. This will be the last state for the state machine and it will return the translated text in the output of this state.

"TranslateText": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:translate:translateText",
        "Parameters": {
          "SourceLanguageCode": "en",
          "TargetLanguageCode": "es",
          "Text.$": "$.transcript"
         "ResultPath": "$.translate",
        "End": true

Add the permissions to the state machine to call the Translate API, by attaching the managed policy “TranslateReadOnly”.

Now with all these in place, you can run your state machine. When the state machine finishes running, you will see the translated text in the output of the last state.

Final state machine

Important things to know
Here are some things that will help you to use AWS SDK Service Integration:

  • Call AWS SDK services directly from the ASL in the resource field of a task state. To do this, use the following syntax: arn:aws:states:::aws-sdk:serviceName:apiAction.[serviceIntegrationPattern]
  • Use camelCase for apiAction names in the Resource field, such as “copyObject”, and use PascalCase for parameter names in the Parameters field, such as “CopySource”.
  • Step Functions can’t autogenerate IAM policies for most AWS SDK Service Integrations, so you need to add those to the IAM role of the state machine manually.
  • Take advantage of ASL intrinsic functions, as those allow you to manipulate the data and avoid using Lambda functions for simple transformations.

Get started today!
AWS SDK Service Integration is generally available in the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Canada (Central), Europe (Ireland), Europe (Milan), Africa (Cape Town) and Asia Pacific (Tokyo). It will be generally available in all other commercial regions where Step Functions is available in the coming days.

Learn more about this new capability by reading its documentation.


AWS Lambda Functions Powered by AWS Graviton2 Processor – Run Your Functions on Arm and Get Up to 34% Better Price Performance

This post was originally published on this site

Many of our customers (such as Formula One, Honeycomb, Intuit, SmugMug, and Snap Inc.) use the Arm-based AWS Graviton2 processor for their workloads and enjoy better price performance. Starting today, you can get the same benefits for your AWS Lambda functions. You can now configure new and existing functions to run on x86 or Arm/Graviton2 processors.

With this choice, you can save money in two ways. First, your functions run more efficiently due to the Graviton2 architecture. Second, you pay less for the time that they run. In fact, Lambda functions powered by Graviton2 are designed to deliver up to 19 percent better performance at 20 percent lower cost.

With Lambda, you are charged based on the number of requests for your functions and the duration (the time it takes for your code to execute) with millisecond granularity. For functions using the Arm/Graviton2 architecture, duration charges are 20 percent lower than the current pricing for x86. The same 20 percent reduction also applies to duration charges for functions using Provisioned Concurrency.

In addition to the price reduction, functions using the Arm architecture benefit from the performance and security built into the Graviton2 processor. Workloads using multithreading and multiprocessing, or performing many I/O operations, can experience lower execution time and, as a consequence, even lower costs. This is particularly useful now that you can use Lambda functions with up to 10 GB of memory and 6 vCPUs. For example, you can get better performance for web and mobile backends, microservices, and data processing systems.

If your functions don’t use architecture-specific binaries, including in their dependencies, you can switch from one architecture to the other. This is often the case for many functions using interpreted languages such as Node.js and Python or functions compiled to Java bytecode.

All Lambda runtimes built on top of Amazon Linux 2, including the custom runtime, are supported on Arm, with the exception of Node.js 10 that has reached end of support. If you have binaries in your function packages, you need to rebuild the function code for the architecture you want to use. Functions packaged as container images need to be built for the architecture (x86 or Arm) they are going to use.

To measure the difference between architectures, you can create two versions of a function, one for x86 and one for Arm. You can then send traffic to the function via an alias using weights to distribute traffic between the two versions. In Amazon CloudWatch, performance metrics are collected by function versions, and you can look at key indicators (such as duration) using statistics. You can then compare, for example, average and p99 duration between the two architectures.

You can also use function versions and weighted aliases to control the rollout in production. For example, you can deploy the new version to a small amount of invocations (such as 1 percent) and then increase up to 100 percent for a complete deployment. During rollout, you can lower the weight or set it to zero if your metrics show something suspicious (such as an increase in errors).

Let’s see how this new capability works in practice with a few examples.

Changing Architecture for Functions with No Binary Dependencies
When there are no binary dependencies, changing the architecture of a Lambda function is like flipping a switch. For example, some time ago, I built a quiz app with a Lambda function. With this app, you can ask and answer questions using a web API. I use an Amazon API Gateway HTTP API to trigger the function. Here’s the Node.js code including a few sample questions at the beginning:

const questions = [
      "Are there more synapses (nerve connections) in your brain or stars in our galaxy?",
    answers: [
      "More stars in our galaxy.",
      "More synapses (nerve connections) in your brain.",
      "They are about the same.",
    correctAnswer: 1,
      "Did Cleopatra live closer in time to the launch of the iPhone or to the building of the Giza pyramids?",
    answers: [
      "To the launch of the iPhone.",
      "To the building of the Giza pyramids.",
      "Cleopatra lived right in between those events.",
    correctAnswer: 0,
      "Did mammoths still roam the earth while the pyramids were being built?",
    answers: [
      "No, they were all exctint long before.",
      "Mammooths exctinction is estimated right about that time.",
      "Yes, some still survived at the time.",
    correctAnswer: 2,

exports.handler = async (event) => {

  const method = event.requestContext.http.method;
  const path = event.requestContext.http.path;
  const splitPath = path.replace(/^/+|/+$/g, "").split("/");

  console.log(method, path, splitPath);

  var response = {
    statusCode: 200,
    body: "",

  if (splitPath[0] == "questions") {
    if (splitPath.length == 1) {
      response.body = JSON.stringify(Object.keys(questions));
    } else {
      const questionId = splitPath[1];
      const question = questions[questionId];
      if (question === undefined) {
        response = {
          statusCode: 404,
          body: JSON.stringify({ message: "Question not found" }),
      } else {
        if (splitPath.length == 2) {
          const publicQuestion = {
            question: question.question,
            answers: question.answers.slice(),
          response.body = JSON.stringify(publicQuestion);
        } else {
          const answerId = splitPath[2];
          if (answerId == question.correctAnswer) {
            response.body = JSON.stringify({ correct: true });
          } else {
            response.body = JSON.stringify({ correct: false });

  return response;

To start my quiz, I ask for the list of question IDs. To do so, I use curl with an HTTP GET on the /questions endpoint:

$ curl https://<api-id>

Then, I ask more information on a question by adding the ID to the endpoint:

$ curl https://<api-id>
  "question": "Did Cleopatra live closer in time to the launch of the iPhone or to the building of the Giza pyramids?",
  "answers": [
    "To the launch of the iPhone.",
    "To the building of the Giza pyramids.",
    "Cleopatra lived right in between those events."

I plan to use this function in production. I expect many invocations and look for options to optimize my costs. In the Lambda console, I see that this function is using the x86_64 architecture.

Console screenshot.

Because this function is not using any binaries, I switch architecture to arm64 and benefit from the lower pricing.

Console screenshot.

The change in architecture doesn’t change the way the function is invoked or communicates its response back. This means that the integration with the API Gateway, as well as integrations with other applications or tools, are not affected by this change and continue to work as before.

I continue my quiz with no hint that the architecture used to run the code has changed in the backend. I answer back to the previous question by adding the number of the answer (starting from zero) to the question endpoint:

$ curl https://<api-id>
  "correct": true

That’s correct! Cleopatra lived closer in time to the launch of the iPhone than the building of the Giza pyramids. While I am digesting this piece of information, I realize that I completed the migration of the function to Arm and optimized my costs.

Changing Architecture for Functions Packaged Using Container Images
When we introduced the capability to package and deploy Lambda functions using container images, I did a demo with a Node.js function generating a PDF file with the PDFKit module. Let’s see how to migrate this function to Arm.

Each time it is invoked, the function creates a new PDF mail containing random data generated by the faker.js module. The output of the function is using the syntax of the Amazon API Gateway to return the PDF file using Base64 encoding. For convenience, I replicate the code (app.js) of the function here:

const PDFDocument = require('pdfkit');
const faker = require('faker');
const getStream = require('get-stream');

exports.lambdaHandler = async (event) => {

    const doc = new PDFDocument();

    const randomName =;

    doc.text(randomName, { align: 'right' });
    doc.text(faker.address.streetAddress(), { align: 'right' });
    doc.text(faker.address.secondaryAddress(), { align: 'right' });
    doc.text(faker.address.zipCode() + ' ' +, { align: 'right' });
    doc.text('Dear ' + randomName + ',');
    for(let i = 0; i < 3; i++) {
    doc.text(, { align: 'right' });

    pdfBuffer = await getStream.buffer(doc);
    pdfBase64 = pdfBuffer.toString('base64');

    const response = {
        statusCode: 200,
        headers: {
            'Content-Length': Buffer.byteLength(pdfBase64),
            'Content-Type': 'application/pdf',
            'Content-disposition': 'attachment;filename=test.pdf'
        isBase64Encoded: true,
        body: pdfBase64
    return response;

To run this code, I need the pdfkit, faker, and get-stream npm modules. These packages and their versions are described in the package.json and package-lock.json files.

I update the FROM line in the Dockerfile to use an AWS base image for Lambda for the Arm architecture. Given the chance, I also update the image to use Node.js 14 (I was using Node.js 12 at the time). This is the only change I need to switch architecture.

COPY app.js package*.json ./
RUN npm install
CMD [ "app.lambdaHandler" ]

For the next steps, I follow the post I mentioned previously. This time I use random-letter-arm for the name of the container image and for the name of the Lambda function. First, I build the image:

$ docker build -t random-letter-arm .

Then, I inspect the image to check that it is using the right architecture:

$ docker inspect random-letter-arm | grep Architecture

"Architecture": "arm64",

To be sure the function works with the new architecture, I run the container locally.

$ docker run -p 9000:8080 random-letter-arm:latest

Because the container image includes the Lambda Runtime Interface Emulator, I can test the function locally:

$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'

It works! The response is a JSON document containing a base64-encoded response for the API Gateway:

    "statusCode": 200,
    "headers": {
        "Content-Length": 2580,
        "Content-Type": "application/pdf",
        "Content-disposition": "attachment;filename=test.pdf"
    "isBase64Encoded": true,
    "body": "..."

Confident that my Lambda function works with the arm64 architecture, I create a new Amazon Elastic Container Registry repository using the AWS Command Line Interface (CLI):

$ aws ecr create-repository --repository-name random-letter-arm --image-scanning-configuration scanOnPush=true

I tag the image and push it to the repo:

$ docker tag random-letter-arm:latest
$ aws ecr get-login-password | docker login --username AWS --password-stdin
$ docker push

In the Lambda console, I create the random-letter-arm function and select the option to create the function from a container image.

Console screenshot.

I enter the function name, browse my ECR repositories to select the random-letter-arm container image, and choose the arm64 architecture.

Console screenshot.

I complete the creation of the function. Then, I add the API Gateway as a trigger. For simplicity, I leave the authentication of the API open.

Console screenshot.

Now, I click on the API endpoint a few times and download some PDF mails generated with random data:

Screenshot of some PDF files.

The migration of this Lambda function to Arm is complete. The process will differ if you have specific dependencies that do not support the target architecture. The ability to test your container image locally helps you find and fix issues early in the process.

Comparing Different Architectures with Function Versions and Aliases
To have a function that makes some meaningful use of the CPU, I use the following Python code. It computes all prime numbers up to a limit passed as a parameter. I am not using the best possible algorithm here, that would be the sieve of Eratosthenes, but it’s a good compromise for an efficient use of memory. To have more visibility, I add the architecture used by the function to the response of the function.

import json
import math
import platform
import timeit

def primes_up_to(n):
    primes = []
    for i in range(2, n+1):
        is_prime = True
        sqrt_i = math.isqrt(i)
        for p in primes:
            if p > sqrt_i:
            if i % p == 0:
                is_prime = False
        if is_prime:
    return primes

def lambda_handler(event, context):
    start_time = timeit.default_timer()
    N = int(event['queryStringParameters']['max'])
    primes = primes_up_to(N)
    stop_time = timeit.default_timer()
    elapsed_time = stop_time - start_time

    response = {
        'machine': platform.machine(),
        'elapsed': elapsed_time,
        'message': 'There are {} prime numbers <= {}'.format(len(primes), N)
    return {
        'statusCode': 200,
        'body': json.dumps(response)

I create two function versions using different architectures.

Console screenshot.

I use a weighted alias with 50% weight on the x86 version and 50% weight on the Arm version to distribute invocations evenly. When invoking the function through this alias, the two versions running on the two different architectures are executed with the same probability.

Console screenshot.

I create an API Gateway trigger for the function alias and then generate some load using a few terminals on my laptop. Each invocation computes prime numbers up to one million. You can see in the output how two different architectures are used to run the function.

$ while True
    curl https://<api-id>

{"machine": "aarch64", "elapsed": 1.2595275060011772, "message": "There are 78498 prime numbers <= 1000000"}
{"machine": "aarch64", "elapsed": 1.2591725109996332, "message": "There are 78498 prime numbers <= 1000000"}
{"machine": "x86_64", "elapsed": 1.7200910530000328, "message": "There are 78498 prime numbers <= 1000000"}
{"machine": "x86_64", "elapsed": 1.6874686619994463, "message": "There are 78498 prime numbers <= 1000000"}
{"machine": "x86_64", "elapsed": 1.6865161940004327, "message": "There are 78498 prime numbers <= 1000000"}
{"machine": "aarch64", "elapsed": 1.2583248640003148, "message": "There are 78498 prime numbers <= 1000000"}

During these executions, Lambda sends metrics to CloudWatch and the function version (ExecutedVersion) is stored as one of the dimensions.

To better understand what is happening, I create a CloudWatch dashboard to monitor the p99 duration for the two architectures. In this way, I can compare the performance of the two environments for this function and make an informed decision on which architecture to use in production.

Console screenshot.

For this particular workload, functions are running much faster on the Graviton2 processor, providing a better user experience and much lower costs.

Comparing Different Architectures with Lambda Power Tuning
The AWS Lambda Power Tuning open-source project, created by my friend Alex Casalboni, runs your functions using different settings and suggests a configuration to minimize costs and/or maximize performance. The project has recently been updated to let you compare two results on the same chart. This comes in handy to compare two versions of the same function, one using x86 and the other Arm.

For example, this chart compares x86 and Arm/Graviton2 results for the function computing prime numbers I used earlier in the post:


The function is using a single thread. In fact, the lowest duration for both architectures is reported when memory is configured with 1.8 GB. Above that, Lambda functions have access to more than 1 vCPU, but in this case, the function can’t use the additional power. For the same reason, costs are stable with memory up to 1.8 GB. With more memory, costs increase because there are no additional performance benefits for this workload.

I look at the chart and configure the function to use 1.8 GB of memory and the Arm architecture. The Graviton2 processor is clearly providing better performance and lower costs for this compute-intensive function.

Availability and Pricing
You can use Lambda Functions powered by Graviton2 processor today in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), Europe (Ireland), EU (London), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo).

The following runtimes running on top of Amazon Linux 2 are supported on Arm:

  • Node.js 12 and 14
  • Python 3.8 and 3.9
  • Java 8 (java8.al2) and 11
  • .NET Core 3.1
  • Ruby 2.7
  • Custom Runtime (provided.al2)

You can manage Lambda Functions powered by Graviton2 processor using AWS Serverless Application Model (SAM) and AWS Cloud Development Kit (AWS CDK). Support is also available through many AWS Lambda Partners such as AntStack, Check Point, Cloudwiry, Contino, Coralogix, Datadog, Lumigo, Pulumi, Slalom, Sumo Logic, Thundra, and Xerris.

Lambda functions using the Arm/Graviton2 architecture provide up to 34 percent price performance improvement. The 20 percent reduction in duration costs also applies when using Provisioned Concurrency. You can further reduce your costs by up to 17 percent with Compute Savings Plans. Lambda functions powered by Graviton2 are included in the AWS Free Tier up to the existing limits. For more information, see the AWS Lambda pricing page.

You can find help to optimize your workloads for the AWS Graviton2 processor in the Getting started with AWS Graviton repository.

Start running your Lambda functions on Arm today.


Amazon Managed Service for Prometheus Is Now Generally Available with Alert Manager and Ruler

This post was originally published on this site

At AWS re:Invent 2020, we introduced the preview of Amazon Managed Service for Prometheus, an open source Prometheus-compatible monitoring service that makes it easy to monitor containerized applications at scale. With Amazon Managed Service for Prometheus, you can use the Prometheus query language (PromQL) to monitor the performance of containerized workloads without having to manage the underlying infrastructure required to scale and secure the ingestion, storage, alert, and querying of operational metrics.

Amazon Managed Service for Prometheus automatically scales as your monitoring needs grow. It is a highly available service deployed across multiple Availability Zones (AZs) that integrates AWS security and compliance capabilities. The service offers native support for PromQL as well as the ability to ingest Prometheus metrics from over 150 Prometheus exporters maintained by the open source community.

With Amazon Managed Service for Prometheus, you can collect Prometheus metrics from Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (Amazon EKS) environments using AWS Distro for OpenTelemetry (ADOT) or Prometheus servers as collection agents.

During the preview, we contributed the high-availability alert manager to the open source Cortex project, a project providing horizontally scalable, highly available, multi-tenant, long-term store for Prometheus. Also, we reduced the price of metric samples ingested by up to 84 percent, and supported collection of metrics for AWS Lambda applications by ADOT.

Today, I am happy to announce the general availability of Amazon Managed Service for Prometheus with new features such as alert manager and ruler that support Amazon Simple Notification Service (Amazon SNS) as a receiver destination for notifications from Alert Manager. You can integrate Amazon SNS with destinations such as email, webhook, Slack, PagerDuty, OpsGenie, or VictorOps with Amazon SNS.

Getting Started with Alert Manager and Ruler
To get started in the AWS Management Console, you can simply create a workspace, a logical space dedicated to the storage, alerting, and querying of metrics from one or more Prometheus servers. You can set up the ingestion of Prometheus metrics to this workspace using Helm and query those metrics. To learn more, see Getting started in the Amazon Managed Service for Prometheus User Guide.

At general availability, we added new alert manager and rules management features. The service supports two types of rules: recording rules and alerting rules. These rules files are the same YAML format as standalone Prometheus, which may be configured and then evaluated at regular intervals.

To configure your workspace with a set of rules, choose Add namespace in Rules management and select a YAML format rules file.

An example rules file would record CPU usage metrics in container workloads and triggers an alert if CPU usage is high for five minutes.

Next, you can create a new Amazon SNS topic or reuse an existing SNS topic where it will route the alerts. The alertmanager routes the alerts to SNS and SNS routes to downstream locations. Configured alerting rules will fire alerts to the Alert Manager, which deduplicate, group, and route alerts to Amazon SNS via the SNS receiver. If you’d like to receive email notifications for your alerts, configure an email subscription for the SNS topic you had.

To give Amazon Managed Service for Prometheus permission to send messages to your SNS topic, select the topic you plan to send to, and add the access policy block:

    "Sid": "Allow_Publish_Alarms",
    "Effect": "Allow",
    "Principal": {
        "Service": ""
    "Action": [
    "Resource": "arn:aws:sns:us-east-1:123456789012:Notifyme"

If you have a topic to get alerts, you can configure this SNS receiver in the alert manager configuration. An example config file is the same format as Prometheus, but you have to provide the config underneath an alertmanager_config: block in for the service’s Alert Manager. For more information about the Alert Manager config, visit Alerting Configuration in Prometheus guide.

    receiver: default
    repeat_interval: 5m
    name: default
      topic_arn: "arn:aws:sns:us-east-1:123456789012:Notifyme"
        region: us-west-2
        key: severity
        value: "warning"

You can replace the topic_arn for the topic that you create while setting up the SNS connection. To learn more about the SNS receiver in the alert manager config, visit Prometheus SNS receiver on the Prometheus Github page.

To configure the Alert Manager, open the Alert Manager and choose Add definition, then select a YAML format alert config file.

When an alert is created by Prometheus and sent to the Alert Manager, it can be queried by hitting the ListAlerts endpoint to see all the active alerts in the system. After hitting the endpoint, you can see alerts in the list of actively firing alerts.

$ curl
GET /workspaces/ws-0123456/alertmanager/api/v2/alerts HTTP/1.1
X-Amz-Date: 20210628T171924Z
    "receivers": [
        "name": "default"
    "startsAt": "2021-09-24T01:37:42.393Z",
    "updatedAt": "2021-09-24T01:37:42.393Z",
    "endsAt": "2021-09-24T01:37:42.393Z",
    "status": {
      "state": "unprocessed",
    "labels": {
      "severity": "warning"

A successful notification will result in an email received from your SNS topic with the alert details. Also, you can output messages in JSON format to be easily processed downstream of SNS by AWS Lambda or other APIs and webhook receiving endpoints. For example, you can connect SNS with a Lambda function for message transformation or triggering automation. To learn more, visit Configuring Alertmanager to output JSON to SNS in the Amazon Managed Service for Prometheus User Guide.

Sending from Amazon SNS to Other Notification Destinations
You can connect Amazon SNS to a variety of outbound destinations such as email, webhook (HTTP), Slack, PageDuty, and OpsGenie.

  • Webhook – To configure a preexisting SNS topic to output messages to a webhook endpoint, first create a subscription to an existing topic. Once active, your HTTP endpoint should receive SNS notifications.
  • Slack – You can either integrate with Slack’s email-to-channel integration where Slack has the ability to accept an email and forward it to a Slack channel, or you can utilize a Lambda function to rewrite the SNS notification to Slack. To learn more, see forwarding emails to Slack channels and AWS Lambda function to convert SNS messages to Slack.
  • PagerDuty – To customize the payload sent to PagerDuty, customize the template that is used in generating the message sent to SNS by adjusting or updating template_files block in your alertmanager definition.

Available Now
Amazon Managed Service for Prometheus is available today in nine AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), Europe (Ireland), Europe (Stockholm), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo).

You pay only for what you use, based on metrics ingested, queried, and stored. As part of the AWS Free Tier, you can get started with Amazon Managed Service for Prometheus for 40 million metric samples ingested and 10 GB metrics stored per month. To learn more, visit the pricing page.

If you want to learn about AWS observability on AWS, visit One Observability Workshop which provides a hands-on experience for you on the wide variety of toolsets AWS offers to set up monitoring and observability on your applications.

Please send feedback to the AWS forum for Amazon Managed Service for Prometheus or through your usual AWS support contacts.


Introducing Amazon Redshift Query Editor V2, a Free Web-based Query Authoring Tool for Data Analysts

This post was originally published on this site

When it comes to manipulating and analyzing relational data, Structured Query Language (SQL) has been an international standard since 1986, a couple of years before I was born. And yet, it sometimes takes hours to get access to a new database or data warehouse, configure credentials or single sign-on, download and install multiple desktop libraries or drivers, and get familiar with the new schema—all this before you even run a query. Not to mention the challenge of sharing queries, results, and analyses securely between members of the same team or across teams.

Today, I’m glad to announce the general availability of Amazon Redshift Query Editor V2, a web-based tool that you can use to explore, analyze, and share data using SQL. It allows you to explore, analyze, share, and collaborate on data stored on Amazon Redshift. It supports data warehouses on Amazon Redshift and data lakes through Amazon Redshift Spectrum.

Amazon Redshift Query Editor V2 provides a free serverless web interface that reduces the operational costs of managing query tools and infrastructure. Because it’s a managed SQL editor in your browser and it’s integrated with your single sign-on provider, the Query Editor V2 reduces the number of steps to the first query so you gain insights faster. You also get in-place visual analysis of query results (no data download required), all in one place. As an additional team productivity boost, it improves collaboration with saved queries and the ability to share results and analyses between users.

From a security standpoint, analysts can access Query Editor V2 without requiring any admin privileges on the Amazon Redshift cluster, using an IAM role for READ, WRITE, or ADMIN access. Check out the documentation for more details.

Connection Setup for Amazon Redshift Query Editor V2
First, you’ll need to configure the connection to your Amazon Redshift cluster.

After you have configured the connection, you can reuse it for future sessions. And, of course, you can edit or delete a connection at any time.

Simply click on a cluster to connect with Query Editor V2.

Amazon Redshift Query Editor V2 in Action
The web interface allows you to browse schemas, tables, views, functions, and stored procedures. You can also preview a table’s columns with one click and create or delete schemas, tables, or functions.

The interface is intuitive for newcomers and expert users alike. You can resize panels, create tabs, and configure your editor preferences.

Running or explaining a query is quite straightforward: You simply write (or paste) the query and choose Run. You can visualize and interact with the result set in the bottom pane. For example, you might want to change the row ordering or search for a specific word. Even though Amazon Redshift Query Editor V2 is a browser-based tool, the data movement between your browser and the Amazon Redshift cluster is optimized, so your browser doesn’t need to download any raw data. A lot of the filtering and reordering happens directly in the browser, without any wait time.

To export a result set as a JSON or CSV file on your local machine, simply right-click it.

So far so good! Running queries is the minimum you’d expect from a Query Editor. Let’s have a look at some of the more interesting features.

Team Collaboration with Amazon Redshift Query Editor V2
Amazon Redshift Query Editor V2 allows you to manage the permissions of your team members based on their IAM roles, so that you can easily share queries and cluster access in a secure way.

For example, you can use IAM managed policies such as AmazonRedshiftQueryEditorV2FullAccess, AmazonRedshiftQueryEditorV2ReadSharing, or AmazonRedshiftQueryEditorV2ReadWriteSharing. Also, don’t forget to include the redshift:GetClusterCredentials permission.

After you’ve set up the IAM roles for your team, choose Save to save a query.

The Untitled tab will show the query name. From now on, you edit this saved query to make updates and then choose Save again.

Individual users with WRITE access can run, edit, and delete shared queries, while users with READ access can only run shared queries.

If you work on multiple projects and collaborate with many different teams, it might be difficult to remember query names or even find them in a long list. In Amazon Redshift Query Editor V2, saved and shared queries are available from the left navigation in Queries. You can keep your queries organized into folders. Even nested folders are supported.

Last but not least, each saved query is versioned and the version history is always available. That’s pretty useful when you need to restore an older version.

Plot Your Queries with Amazon Redshift Query Editor V2
Sharing queries with teammates is great, but wouldn’t it even better if you could visualize a result set, export it as PNG or JPEG, and save the chart for later? Amazon Redshift Query Editor V2 allows you to perform in-place visualizations of your results. When you’re happy with the look and feel of your chart, you can save it for later and organize all your saved charts into folders. This allows you to simply choose a saved chart, rerun the corresponding query, and export the new image. No need to configure the plot from scratch or remember the configuration of hundreds of charts and queries across different projects.

Available Today
Amazon Redshift Query Editor V2 is available today in all commercial AWS Regions, except AP-Northeast-3 regions. It requires no license and it’s free, except for the cost for your Amazon Redshift cluster.

You can interact with the service using the Amazon Redshift console. It doesn’t require any driver or software on your local machine.

For more information, see the Amazon Redshift Query Editor V2 technical documentation or take a look at this video:

We look forward to your feedback.


New – Amazon Genomics CLI Is Now Open Source and Generally Available

This post was originally published on this site

Less than 70 years separate us from one of the greatest discoveries of all time: the double helix structure of DNA. We now know that DNA is a sort of a twisted ladder composed of four types of compounds, called bases. These four bases are usually identified by an uppercase letter: adenine (A), guanine (G), cytosine (C), and thymine (T). One of the reasons for the double helix structure is that when these compounds are at the two sides of the ladder, A always bonds with T, and C always bonds with G.

If we unroll the ladder on a table, we’d see two sequences of “letters”, and each of the two sides would carry the same genetic information. For example, here are two series (AGCT and TCGA) bound together:

A – T
G – C
C – G
T – A

These series of letters can be very long. For example, the human genome is composed of over 3 billion letters of code and acts as the biological blueprint of every cell in a person. The information in a person’s genome can be used to create highly personalized treatments to improve the health of individuals and even the entire population. Similarly, genomic data can be use to track infectious diseases, improve diagnosis, and even track epidemics, food pathogens and toxins. This is the emerging field of environmental genomics.

Accessing genomic data requires genome sequencing, which with recent advances in technology, can be done for large groups of individuals, quickly and more cost-effectively than ever before. In the next five years, genomics datasets are estimated to grow and contain more than a billion sequenced genomes.

How Genomics Data Analysis Works
Genomics data analysis uses a variety of tools that need to be orchestrated as a specific sequence of steps, or a workflow. To facilitate developing, sharing, and running workflows, the genomics and bioinformatics communities have developed specialized workflow definition languages like WDL, Nextflow, CWL, and Snakemake.

However, this process generates petabytes of raw genomic data and experts in genomics and life science struggle to scale compute and storage resources to handle data at such massive scale.

To process data and provide answers quickly, cloud resources like compute, storage, and networking need to be configured to work together with analysis tools. As a result, scientists and researchers often have to spend valuable time deploying infrastructure and modifying open-source genomics analysis tools instead of making contributions to genomics innovations.

Introducing Amazon Genomics CLI
A couple of months ago, we shared the preview of Amazon Genomics CLI, a tool that makes it easier to process genomics data at petabyte scale on AWS. I am excited to share that the Amazon Genomics CLI is now an open source project and is generally available today. You can use it with publicly available workflows as a starting point and develop your analysis on top of these.

Amazon Genomics CLI simplifies and automates the deployment of cloud infrastructure, providing you with an easy-to-use command line interface to quickly setup and run genomics workflows on AWS. By removing the heavy lifting from setting up and running genomics workflows in the cloud, software developers and researchers can automatically provision, configure and scale cloud resources to enable faster and more cost-effective population-level genetics studies, drug discovery cycles, and more.

Amazon Genomics CLI lets you run your workflows on an optimized cloud infrastructure. More specifically, the CLI:

  • Includes improvements to genomics workflow engines to make them integrate better with AWS, removing the burden to manually modify open-source tools and tune them to run efficiently at scale. These tools work seamlessly across Amazon Elastic Container Service (Amazon ECS), Amazon DynamoDB, Amazon Elastic File System (Amazon EFS), and Amazon Simple Storage Service (Amazon S3), helping you to scale compute and storage and at the same time optimize your costs using features like EC2 Spot Instances.
  • Eliminates the most time-consuming tasks like provisioning storage and compute capacities, deploying the genomics workflow engines, and tuning the clusters used to execute workflows.
  • Automatically increases or decreases cloud resources based on your workloads, which eliminates the risk of buying too much or too little capacity.
  • Tags resources so that you can use tools like AWS Cost & Usage Report to understand the costs related to your genomics data analysis across multiple AWS services.

The use of Amazon Genomics CLI is based on these three main concepts:

Workflow – These are bioinformatics workflows written in languages like WDL or Nextflow. They can be either single script files or packages of multiple files. These workflow script files are workflow definitions and combined with additional metadata, like the workflow language the definition is written in, form a workflow specification that is used by the CLI to execute workflows on appropriate compute resources.

Context – A context encapsulates and automates time-consuming tasks to configure and deploy workflow engines, create data access policies, and tune compute clusters (managed using AWS Batch) for operation at scale.

Project – A project links together workflows, datasets, and the contexts used to process them. From a user perspective, it handles resources related to the same problem or used by the same team.

Let’s see how this works in practice.

Using Amazon Genomics CLI
I follow the instructions to install Amazon Genomics CLI on my laptop. Now, I can use the agc command to manage genomic workloads. I see the available options with:

$ agc --help

The first time I use it, I activate my AWS account:

$ agc account activate

This creates the core infrastructure that Amazon Genomics CLI needs to operate, which includes an S3 bucket, a virtual private cloud (VPC), and a DynamoDB table. The S3 bucket is used for durable metadata, and the VPC is used to isolate compute resources.

Optionally, I can bring my own VPC. I can also use one of my named profiles for the AWS Command Line Interface (CLI). In this way, I can customize the AWS Region and the AWS account used by the Amazon Genomics CLI.

I configure my email address in the local settings. This wil be used to tag resources created by the CLI:

$ agc configure email

There are a few demo projects in the examples folder included by the Amazon Genomics CLI installation. These projects use different engines, such as Cromwell or Nextflow. In the demo-wdl-project folder, the agc-project.yaml file describes the workflows, the data, and the contexts for the Demo project:

name: Demo
schemaVersion: 1
      language: wdl
      version: 1.0
    sourceURL: workflows/hello
      language: wdl
      version: 1.0
    sourceURL: workflows/read
      language: wdl
      version: 1.0
    sourceURL: workflows/haplotype
      language: wdl
      version: 1.0
    sourceURL: workflows/words
  - location: s3://gatk-test-data
    readOnly: true
  - location: s3://broad-references
    readOnly: true
      - type: wdl
        engine: cromwell

    requestSpotInstances: true
      - type: wdl
        engine: cromwell

For this project, there are four workflows (hello, read, words-with-vowels, and haplotype). The project has read-only access to two S3 buckets and can run workflows using two contexts. Both contexts use the Cromwell engine. One context (spotCtx) uses Amazon EC2 Spot Instances to optimize costs.

In the demo-wdl-project folder, I use the Amazon Genomics CLI to deploy the spotCtx context:

$ agc context deploy -c spotCtx

After a few minutes, the context is ready, and I can execute the workflows. Once started, a context incurs about $0.40 per hour of baseline costs. These costs don’t include the resources created to execute workflows. Those resources depend on your specific use case. Contexts have the option to use spot instances by adding the requestSpotInstances flag to their configuration.

I use the CLI to see the status of the contexts of the project:

$ agc context status


Now, let’s look at the workflows included in this project:

$ agc workflow list

2021-09-24T11:15:29+01:00 𝒊  Listing workflows.
WORKFLOWNAME words-with-vowels

The simplest workflow is hello. The content of the hello.wdl file is quite understandable if you know any programming language:

version 1.0
workflow hello_agc {
    call hello {}
task hello {
    command { echo "Hello Amazon Genomics CLI!" }
    runtime {
        docker: "ubuntu:latest"
    output { String out = read_string( stdout() ) }

The hello workflow defines a single task (hello) that prints the output of a command. The task is executed on a specific container image (ubuntu:latest). The output is taken from standard output (stdout), the default file descriptor where a process can write output.

Running workflows is an asynchronous process. After submitting a workflow from the CLI, it is handled entirely in the cloud. I can run multiple workflows at a time. The underlying compute resources will automatically scale and I will be charged only for what I use.

Using the CLI, I start the hello workflow:

$ agc workflow run hello -c spotCtx

2021-09-24T13:03:47+01:00 𝒊  Running workflow. Workflow name: 'hello', Arguments: '', Context: 'spotCtx'

The workflow was successfully submitted, and the last line is the workflow execution ID. I can use this ID to reference a specific workflow execution. Now, I check the status of the workflow:

$ agc workflow status

2021-09-24T13:04:21+01:00 𝒊  Showing workflow run(s). Max Runs: 20
WORKFLOWINSTANCE	spotCtx	fcf72b78-f725-493e-b633-7dbe67878e91	true	RUNNING	2021-09-24T12:03:53Z	hello

The hello workflow is still running. After a few minutes, I check again:

$ agc workflow status

2021-09-24T13:12:23+01:00 𝒊  Showing workflow run(s). Max Runs: 20
WORKFLOWINSTANCE	spotCtx	fcf72b78-f725-493e-b633-7dbe67878e91	true	COMPLETE	2021-09-24T12:03:53Z	hello

The workflow has terminated and is now complete. I look at the workflow logs:

$ agc logs workflow hello

2021-09-24T13:13:08+01:00 𝒊  Showing the logs for 'hello'
2021-09-24T13:13:12+01:00 𝒊  Showing logs for the latest run of the workflow. Run id: 'fcf72b78-f725-493e-b633-7dbe67878e91'
Fri, 24 Sep 2021 13:07:22 +0100	download: s3://agc-123412341234-eu-west-1/scripts/1a82f9a96e387d78ae3786c967f97cc0 to tmp/tmp.498XAhEOy/batch-file-temp
Fri, 24 Sep 2021 13:07:22 +0100	*** LOCALIZING INPUTS ***
Fri, 24 Sep 2021 13:07:23 +0100	download: s3://agc-123412341234-eu-west-1/project/Demo/userid/danilop20tbvT/context/spotCtx/cromwell-execution/hello_agc/fcf72b78-f725-493e-b633-7dbe67878e91/call-hello/script to agc-024700040865-eu-west-1/project/Demo/userid/danilop20tbvT/context/spotCtx/cromwell-execution/hello_agc/fcf72b78-f725-493e-b633-7dbe67878e91/call-hello/script
Fri, 24 Sep 2021 13:07:23 +0100	*** COMPLETED LOCALIZATION ***
Fri, 24 Sep 2021 13:07:23 +0100	Hello Amazon Genomics CLI!
Fri, 24 Sep 2021 13:07:23 +0100	*** DELOCALIZING OUTPUTS ***
Fri, 24 Sep 2021 13:07:24 +0100	upload: ./hello-rc.txt to s3://agc-123412341234-eu-west-1/project/Demo/userid/danilop20tbvT/context/spotCtx/cromwell-execution/hello_agc/fcf72b78-f725-493e-b633-7dbe67878e91/call-hello/hello-rc.txt
Fri, 24 Sep 2021 13:07:25 +0100	upload: ./hello-stderr.log to s3://agc-123412341234-eu-west-1/project/Demo/userid/danilop20tbvT/context/spotCtx/cromwell-execution/hello_agc/fcf72b78-f725-493e-b633-7dbe67878e91/call-hello/hello-stderr.log
Fri, 24 Sep 2021 13:07:25 +0100	upload: ./hello-stdout.log to s3://agc-123412341234-eu-west-1/project/Demo/userid/danilop20tbvT/context/spotCtx/cromwell-execution/hello_agc/fcf72b78-f725-493e-b633-7dbe67878e91/call-hello/hello-stdout.log
Fri, 24 Sep 2021 13:07:25 +0100	*** COMPLETED DELOCALIZATION ***
Fri, 24 Sep 2021 13:07:25 +0100	*** EXITING WITH RETURN CODE ***
Fri, 24 Sep 2021 13:07:25 +0100	0

In the logs, I find as expected the Hello Amazon Genomics CLI! message printed by workflow.

I can also look at the content of hello-stdout.log on S3 using the information in the log above:

aws s3 cp s3://agc-123412341234-eu-west-1/project/Demo/userid/danilop20tbvT/context/spotCtx/cromwell-execution/hello_agc/fcf72b78-f725-493e-b633-7dbe67878e91/call-hello/hello-stdout.log -

Hello Amazon Genomics CLI!

It worked! Now, let’s look for at more complex workflows. Before I change project, I destroy the context for the Demo project:

$ agc context destroy -c spotCtx

In the gatk-best-practices-project folder, I list the available workflows for the project:

$ agc workflow list

2021-09-24T11:41:14+01:00 𝒊  Listing workflows.
WORKFLOWNAME	bam-to-unmapped-bams
WORKFLOWNAME	cram-to-bam
WORKFLOWNAME	gatk4-basic-joint-genotyping
WORKFLOWNAME	gatk4-data-processing
WORKFLOWNAME	gatk4-germline-snps-indels
WORKFLOWNAME	gatk4-rnaseq-germline-snps-indels
WORKFLOWNAME	interleaved-fastq-to-paired-fastq
WORKFLOWNAME	paired-fastq-to-unmapped-bam
WORKFLOWNAME	seq-format-validation

In the agc-project.yaml file, the gatk4-data-processing workflow points to a local directory with the same name. This is the content of that directory:

$ ls gatk4-data-processing


This workflow processes high-throughput sequencing data with GATK4, a genomic analysis toolkit focused on variant discovery.

The directory contains a MANIFEST.json file. The manifest file describes which file contains the main workflow to execute (there can be more than one WDL file in the directory) and where to find input parameters and options. Here’s the content of the manifest file:

  "mainWorkflowURL": "processing-for-variant-discovery-gatk4.wdl",
  "inputFileURLs": [
  "optionFileURL": "options.json"

In the gatk-best-practices-project folder, I create a context to run the workflows:

$ agc context deploy -c spotCtx

Then, I start the gatk4-data-processing workflow:

$ agc workflow run gatk4-data-processing -c spotCtx

2021-09-24T12:08:22+01:00 𝒊  Running workflow. Workflow name: 'gatk4-data-processing', Arguments: '', Context: 'spotCtx'

After a couple of hours, the workflow has terminated:

$ agc workflow status

2021-09-24T14:06:40+01:00 𝒊  Showing workflow run(s). Max Runs: 20
WORKFLOWINSTANCE	spotCtx	630e2d53-0c28-4f35-873e-65363529c3de	true	COMPLETE	2021-09-24T11:08:28Z	gatk4-data-processing

I look at the logs:

$ agc logs workflow gatk4-data-processing

Fri, 24 Sep 2021 14:02:32 +0100	*** DELOCALIZING OUTPUTS ***
Fri, 24 Sep 2021 14:03:45 +0100	upload: ./NA12878.hg38.bam to s3://agc-123412341234-eu-west-1/project/GATK/userid/danilop20tbvT/context/spotCtx/cromwell-execution/PreProcessingForVariantDiscovery_GATK4/630e2d53-0c28-4f35-873e-65363529c3de/call-GatherBamFiles/NA12878.hg38.bam
Fri, 24 Sep 2021 14:03:46 +0100	upload: ./NA12878.hg38.bam.md5 to s3://agc-123412341234-eu-west-1/project/GATK/userid/danilop20tbvT/context/spotCtx/cromwell-execution/PreProcessingForVariantDiscovery_GATK4/630e2d53-0c28-4f35-873e-65363529c3de/call-GatherBamFiles/NA12878.hg38.bam.md5
Fri, 24 Sep 2021 14:03:47 +0100	upload: ./NA12878.hg38.bai to s3://agc-123412341234-eu-west-1/project/GATK/userid/danilop20tbvT/context/spotCtx/cromwell-execution/PreProcessingForVariantDiscovery_GATK4/630e2d53-0c28-4f35-873e-65363529c3de/call-GatherBamFiles/NA12878.hg38.bai
Fri, 24 Sep 2021 14:03:48 +0100	upload: ./GatherBamFiles-rc.txt to s3://agc-123412341234-eu-west-1/project/GATK/userid/danilop20tbvT/context/spotCtx/cromwell-execution/PreProcessingForVariantDiscovery_GATK4/630e2d53-0c28-4f35-873e-65363529c3de/call-GatherBamFiles/GatherBamFiles-rc.txt
Fri, 24 Sep 2021 14:03:49 +0100	upload: ./GatherBamFiles-stderr.log to s3://agc-123412341234-eu-west-1/project/GATK/userid/danilop20tbvT/context/spotCtx/cromwell-execution/PreProcessingForVariantDiscovery_GATK4/630e2d53-0c28-4f35-873e-65363529c3de/call-GatherBamFiles/GatherBamFiles-stderr.log
Fri, 24 Sep 2021 14:03:50 +0100	upload: ./GatherBamFiles-stdout.log to s3://agc-123412341234-eu-west-1/project/GATK/userid/danilop20tbvT/context/spotCtx/cromwell-execution/PreProcessingForVariantDiscovery_GATK4/630e2d53-0c28-4f35-873e-65363529c3de/call-GatherBamFiles/GatherBamFiles-stdout.log
Fri, 24 Sep 2021 14:03:50 +0100	*** COMPLETED DELOCALIZATION ***
Fri, 24 Sep 2021 14:03:50 +0100	*** EXITING WITH RETURN CODE ***
Fri, 24 Sep 2021 14:03:50 +0100	0

Results have been written to the S3 bucket created during the account activation. The name of the bucket is in the logs but I can also find it stored as a parameter by AWS Systems Manager. I can save it in an environment variable with the following command:

$ export AGC_BUCKET=$(aws ssm get-parameter 
  --name /agc/_common/bucket 
  --query 'Parameter.Value' 
  --output text)

Using the AWS Command Line Interface (CLI), I can now explore the results on the S3 bucket and get the outputs of the workflow.

Before looking at the results, I remove the resources that I don’t need by stopping the context. This will destroy all compute resources, but retain data in S3.

$ agc context destroy -c spotCtx

Additional examples on configuring different contexts and running additional workflows are provided in the documentation on GitHub.

Availability and Pricing
Amazon Genomics CLI is an open source tool, and you can use it today in all AWS Regions with the exception of AWS GovCloud (US) and Regions located in China. There is no cost for using the AWS Genomics CLI. You pay for the AWS resources created by the CLI.

With the Amazon Genomics CLI, you can focus on science instead of architecting infrastructure. This gets you up and running faster, enabling research, development, and testing workloads. For production workloads that scale to several thousand parallel workflows, we can provide recommended ways to leverage additional Amazon services, like AWS Step Functions, just reach out to our account teams for more information.


Amazon QuickSight Q – Business Intelligence Using Natural Language Questions

This post was originally published on this site

Making sense of business data so that you can get value out of it is worthwhile yet still challenging. Even though the term Business Intelligence (BI) has been around since the mid-1800s (according to Wikipedia) adoption of contemporary BI tools within enterprises is still fairly low.

Amazon QuickSight was designed to make it easier for you to put BI to work in your organization. Announced in 2015 and launched in 2016, QuickSight is a scalable BI service built for the cloud. Since that 2016 launch, we have added many new features, including geospatial visualization and private VPC access in 2017, pay-per-session pricing in 2018, additional APIs (data, dashboard, SPICE, and permissions in 2019), embedded authoring of dashboards & support for auto-naratives in 2020, and Dataset-as-a-Source in 2021.

QuickSight Q is Here
My colleague Harunobu Kameda announced Amazon QuickSight Q (or Q for short) last December and gave you a sneak peek. Today I am happy to announce the general availability of Q, and would like to show you how it works!

To recap, Q is a natural language query tool for the Enterprise Edition of QuickSight. Powered by machine learning, it makes your existing data more accessible, and therefore more valuable. Think of Q as your personal Business Intelligence Engineer or Data Analyst, one that is on call 24 hours a day and always ready to provide you with quick, meaningful results! You get high-quality results in seconds, always shown in an appropriate form.

Behind the scenes, Q uses Natural Language Understanding (NLU) to discover the intent of your question. Aided by models that have been trained to recognize vocabulary and concepts drawn from multiple domains (sales, marketing, retail, HR, advertising, financial services, health care, and so forth), Q is able to answer questions that refer all data sources supported by QuickSight. This includes data from AWS sources such as Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Aurora, Amazon Athena, and Amazon Simple Storage Service (Amazon S3) as well as third party sources & SaaS apps such as Salesforce, Adobe Analytics, ServiceNow, and Excel.

Q in Action
Q is powered by topics, which are generally created by QuickSight Authors for use within an organization (if you are a QuickSight Author, you can learn more about getting started). Topics represent subject areas for questions, and are created interactively. To learn more about the five-step process that Authors use to create a topic, be sure to watch our new video, Tips to Create a Great Q Topic.

To use Q, I simply select a topic (B2B Sales in this case) and enter a question in the Q bar at the top of the page:

Q query --

In addition to the actual results, Q gives me access to explanatory information that I can review to ensure that my question was understood and processed as desired. For example, I can click on sales and learn how Q handles the field:

Detailed information on the use of the sales field.

I can fine-tune each aspect as well; here I clicked Sorted by:

Changing sort order for sales field.

Q chooses an appropriate visual representation for each answer, but I can fine-tune that as well:

Select a new visual type.

Perhaps I want a donut chart instead:

Now that you have seen how Q processes a question and gives you control over how the question is processed & displayed, let’s take a look at a few more questions, starting with “which product sells best in south?”


Here’s “what is total sales by region and category?” using the vertical stacked bar chart visual:

Total sales by region and catergory.

Behind the Scenes – Q Topics
As I mentioned earlier, Q uses topics to represent a particular subject matter. I click Topics to see the list of topics that I have created or that have been shared with me:

I click B2B Sales to learn more. The Summary page is designed to provide QuickSight Authors with information that they can use to fine-tune the topic:

Info about the B2B Sales Topic.

I can click on the Data tab and learn more about the list of fields that Q uses to answer questions. Each field can have some synonyms or friendly names to make the process of asking questions simpler and more natural:

List of fields for the B2B Sales topic.

I can expand a field (row) to learn more about how Q “understands” and uses the field. I can make changes in order to exercise control over the types of aggregations that make sense for the field, and I can also provide additional semantic information:

Information about the Product Name field.

As an example of providing additional semantic information, if the field’s Semantic Type is Location, I can choose the appropriate sub-type:

The User Activity tab shows me the questions that users are asking of this topic:

User activirty for the B2B Sales topic.

QuickSight Authors can use this tab to monitor user feedback, get a sense of the most common questions, and also use the common questions to drive improvements to the content provided on QuickSight dashboards.

Finally, the Verified answers tab shows the answers that have been manually reviewed and approved:

Things to Know
Here are a couple of things to know about Amazon QuickSight Q:

Pricing – There’s a monthly fee for each Reader and each Author; take a look at the QuickSight Pricing Page for more information.

Regions – Q is available in the US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Europe (Frankfurt), and Europe (London) Regions.

Supported Languages – We are launching with question support in English.