New – Convert Your Single-Region Amazon DynamoDB Tables to Global Tables

Hundreds of thousands of AWS customers are using Amazon DynamoDB. In 2017, we launched DynamoDB global tables, a fully managed solution to deploy multi-region, multi-master DynamoDB tables without having to build and maintain your own replication solution. When you create a global table, you specify the AWS Regions where you want the table to be available. DynamoDB performs all of the necessary tasks to create identical tables in these regions and propagate ongoing data changes to all of them.

AWS customers are using DynamoDB global tables for two main reasons: to provide a low-latency experience to their clients and to facilitate their backup or disaster recovery process. Latency is the time it takes for a piece of information to travel through the network and back. Lower-latency apps have higher customer engagement and generate more revenue. Deploying your backend to multiple regions close to your customers allows you to reduce the latency in your app. Having a full copy of your data in another region makes it easy to switch traffic to that region if you break your regional setup, or in the exceedingly rare event of a regional failure. As our CTO, Dr. Werner Vogels wrote: “failures are a given, and everything will eventually fail over time.”

Starting today, you can convert your existing DynamoDB tables to global tables with a few clicks in the AWS Management Console, or by using the AWS Command Line Interface (CLI) or the Amazon DynamoDB API. Previously, only empty tables could be converted to global tables, so you had to guess the regional usage of a table at the time you created it. Now you can go global, or extend existing global tables to additional regions, at any time.

Your applications can continue to use the table while we set up the replication. When you add a region to your table, DynamoDB begins populating the new replica using a snapshot of your existing table. Your applications can continue writing to your existing region while DynamoDB builds the new replica, and all in-flight updates will be eventually replicated to your new replica.

To create a DynamoDB global table using the AWS Command Line Interface (CLI), I first create a local table in the US West (Oregon) Region (us-west-2):

aws dynamodb create-table --region us-west-2 \
                          --table-name demo-global-table \
                          --key-schema AttributeName=id,KeyType=HASH \
                          --attribute-definitions AttributeName=id,AttributeType=S \
                          --billing-mode PAY_PER_REQUEST

The command returns:

{
    "TableDescription": {
        "AttributeDefinitions": [
            {
                "AttributeName": "id",
                "AttributeType": "S"
            }
        ],
        "TableName": "demo-global-table",
        "KeySchema": [
            {
                "AttributeName": "id",
                "KeyType": "HASH"
            }
        ],
        "TableStatus": "CREATING",
        "CreationDateTime": 1570278914.419,
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0,
            "ReadCapacityUnits": 0,
            "WriteCapacityUnits": 0
        },
        "TableSizeBytes": 0,
        "ItemCount": 0,
        "TableArn": "arn:aws:dynamodb:us-west-2:400000000003:table/demo-global-table",
        "TableId": "0a04bd34-bbff-42dd-ae18-78d05ce641fd",
        "BillingModeSummary": {
            "BillingMode": "PAY_PER_REQUEST"
        }
    }
}

Once the table is created, I insert some items:

aws dynamodb batch-write-item --region us-west-2 --request-items file://./batch-write-items.json

(The json file is available as a gist)
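For reference, a batch-write-item request file follows the standard DynamoDB BatchWriteItem format. The gist contains a few items; a minimal sketch that would insert the item queried later in this post (the exact contents of the gist may differ) looks like this:

{
    "demo-global-table": [
        {
            "PutRequest": {
                "Item": {
                    "id": {"S": "0123456789"},
                    "firstname": {"S": "Jeff"},
                    "lastname": {"S": "Barr"}
                }
            }
        }
    ]
}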

Then, I update the table to add an additional region, the US East (N. Virginia) Region (us-east-1):

aws dynamodb update-table --region us-west-2 --table-name demo-global-table --add-region us-east-1

The command returns a long JSON document; the attributes you need to pay attention to are:

{
...
        "TableStatus": "UPDATING",
        "TableSizeBytes": 124,
        "ItemCount": 3,
        "StreamSpecification": {
            "StreamEnabled": true,
            "StreamViewType": "NEW_AND_OLD_IMAGES"
        },
        "LatestStreamLabel": "2019-10-22T19:33:37.819",
        "LatestStreamArn": "arn:aws:dynamodb:us-west-2:400000000003:table/demo-global-table/stream/2019-10-22T19:33:37.819"
    }
...
}

I can make the same update in the AWS Management Console. I select the table to update and click Global Tables.

Enabling streaming is a requirement for global tables. I first click on Enable stream, then Add region:

I choose the region I want to replicate to, in this example EU West (Ireland), and click Create replica.

DynamoDB asynchronously replicates the table to the new region. I monitor the progress of the replication in the AWS Management Console. The table's status will eventually change from Creating to Active. I can also check the status by calling the DescribeTable API and verifying that TableStatus = Active.
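For example, a quick status check from the CLI (the replica region shown here is based on the Ireland example above):

aws dynamodb describe-table --region eu-west-1 --table-name demo-global-table --query 'Table.TableStatus'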

After a while, I can query the table in the new region:

aws dynamodb get-item --region eu-west-1 --table-name demo-global-table --key '{"id" : {"S" : "0123456789"}}'

{
    "Item": {
        "firstname": {
            "S": "Jeff"
        },
        "id": {
            "S": "0123456789"
        },
        "lastname": {
            "S": "Barr"
        }
    }
}

Starting today, you can update existing local tables to global tables. In a few weeks, we’ll release a tool that enables you to update your existing global tables to take advantage of this new capability. The update itself will take a few minutes at most. Your table will be available for your applications during the update process.

Other Improvements
We are also simplifying the internal mechanism used for data synchronization. Previously, DynamoDB global tables leveraged DynamoDB Streams and added three attributes (aws:rep:*) in your schema to keep your data in sync. DynamoDB now manages replication natively. It does not expose synchronization attributes to your data and it does not consume additional write capacity:

  • Only one write operation occurs in each region of your global table, which reduces the consumed replicated write capacity that is required on your table.
  • Because of that, a second DynamoDB Streams record is no longer published.
  • The three aws:rep:* attributes that were previously populated are no longer inserted in the item record.

These changes have two consequences for your apps. First, they reduce your DynamoDB costs when using global tables, because no extra write capacity is required to manage the synchronization. Second, if your application relies on the three technical attributes (aws:rep:*), it requires a slight code change. In particular, DynamoDB Mapper must not require the aws:rep:* attributes to exist in the item record.

With this change, we are also updating the UpdateTable API. Any operation that modifies global secondary indexes (GSIs), billing mode, server-side encryption, or write capacity units on a global table is applied to all other replicas asynchronously.

Availability
The improved Amazon DynamoDB global tables experience is available today in the 13 AWS Regions where global tables are offered, and more regions are planned for the future. As of today, the list of AWS Regions is us-east-1 (Northern Virginia), us-west-2 (Oregon), us-east-2 (Ohio), us-west-1 (Northern California), ap-northeast-2 (Seoul), ap-southeast-1 (Singapore), ap-southeast-2 (Sydney), ap-northeast-1 (Tokyo), eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-2 (London), GovCloud (US-East), and GovCloud (US-West).

There is no change in pricing. You pay only for the resources you use in the additional regions and the data transfer between regions.

This update addresses the most common feedback that we have heard from you and will serve as the platform on which we will build additional features in the future. Continue to tell us how you are using global tables and what is important for your apps.

— seb

Announcing CloudTrail Insights: Identify and Respond to Unusual API Activity

Building software in the cloud makes it easy to instrument systems for logging from the very beginning. With tools like AWS CloudTrail, tracking every action taken on AWS accounts and services is straightforward, providing a way to find the event that caused a given change. But not all log entries are useful. When things are running smoothly, those log entries are like the steady, reassuring hum of machinery on a factory floor. When things start going wrong, that hum can make it harder to hear which piece of equipment has gone a bit wobbly. The same is true with large scale software systems: the volume of log data can be overwhelming. Sifting through those records to find actionable information is tedious. It usually requires a lot of custom software or custom integrations, and can result in false positives and alert fatigue when new services are added.

That’s where software automation and machine learning can help. Today, we’re launching AWS CloudTrail Insights in all commercial AWS regions. CloudTrail Insights automatically analyzes write management events from CloudTrail trails and alerts you to unusual activity. For example, if there is an increase in TerminateInstances events that differs from established baselines, you’ll see it as an Insight event. These events make finding and responding to unusual API activity easier than ever.

Enabling AWS CloudTrail Insights

CloudTrail tracks user activity and API usage. It provides an event history of AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. With the launch of AWS CloudTrail Insights, you can enable machine learning models that detect unusual activity in these logs with just a few clicks. AWS CloudTrail Insights will analyze historical API calls, identifying usage patterns and generating Insight Events for unusual activity.

Screenshot showing how to enable CloudTrail Insights

You can also enable Insights on a trail from the AWS Command Line Interface (CLI) by using the put-insight-selectors command:

$ aws cloudtrail put-insight-selectors --trail-name trail_name --insight-selectors '[{"InsightType": "ApiCallRateInsight"}]'

Once enabled, CloudTrail Insights sends events to the S3 bucket specified on the trail details page. Events are also sent to CloudWatch Events, and optionally to a CloudWatch Logs log group, just like other CloudTrail events. This gives you options when it comes to alerting, from sophisticated rules that respond to CloudWatch Events to custom AWS Lambda functions. After enabling Insights, historical events for the trail will be analyzed. Anomalous usage patterns found will appear in the CloudTrail console within 30 minutes.
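To confirm that Insights is enabled on a trail, you can also read the current selectors back from the CLI (a quick sketch):

$ aws cloudtrail get-insight-selectors --trail-name trail_name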

Using CloudTrail Insights

In this post we’ll take a look at some AWS CloudTrail Insights Events from the AWS Console. If you’d like to view Insight events from the AWS CLI, you use the CloudTrail LookupEvents call with the event-category parameter.

$ aws cloudtrail lookup-events --event-category insight [--max-items] [--lookup-attributes]

Quickly scanning the list of CloudTrail Insights, the RunInstances event jumps out at me. Spinning up more EC2 instances can be expensive, and I’ve definitely misconfigured things before and created more instances than I needed, so I want to take a closer look. Let’s filter the list down to just these events and see what we can learn from AWS CloudTrail Insights.

Let’s dig in to the latest event.

Here we see that over the course of one minute, there was a spike in RunInstances API call volume. From the Insights graph, we can see the raw event as JSON.

{
    "Records": [
        {
            "eventVersion": "1.07",
            "eventTime": "2019-11-07T13:25:00Z",
            "awsRegion": "us-east-1",
            "eventID": "a9edc959-9488-4790-be0f-05d60e56b547",
            "eventType": "AwsCloudTrailInsight",
            "recipientAccountId": "-REDACTED-",
            "sharedEventID": "c2806063-d85d-42c3-9027-d2c56a477314",
            "insightDetails": {
                "state": "Start",
                "eventSource": "ec2.amazonaws.com",
                "eventName": "RunInstances",
                "insightType": "ApiCallRateInsight",
                "insightContext": {
                    "statistics": {
                        "baseline": {
                            "average": 0.0020833333},
                        "insight": {
                            "average": 6}
                    }
                }
            },
            "eventCategory": "Insight"},
        {
            "eventVersion": "1.07",
            "eventTime": "2019-11-07T13:26:00Z",
            "awsRegion": "us-east-1",
            "eventID": "33a52182-6ff8-49c8-baaa-9caac16a96ce",
            "eventType": "AwsCloudTrailInsight",
            "recipientAccountId": "-REDACTED-",
            "sharedEventID": "c2806063-d85d-42c3-9027-d2c56a477314",
            "insightDetails": {
                "state": "End",
                "eventSource": "ec2.amazonaws.com",
                "eventName": "RunInstances",
                "insightType": "ApiCallRateInsight",
                "insightContext": {
                    "statistics": {
                        "baseline": {
                            "average": 0.0020833333},
                        "insight": {
                            "average": 6},
                        "insightDuration": 1}
                }
            },
            "eventCategory": "Insight"}
    ]}

Here we can see that the baseline API call volume is 0.002 calls per minute. That means that there’s usually one call to RunInstances roughly every 500 minutes (a little over eight hours), so the activity we see in the graph is definitely not normal. By clicking over to the CloudTrail Events tab we can see the individual events that are grouped into this Insight event. It looks like this was probably a normal EC2 autoscaling activity, but I still want to dig in and confirm.

By expanding an event in this tab and clicking “View Event,” I can head directly to the event in CloudTrail for more information. After reviewing the event metadata and associated EC2 and IAM resources, I’ve confirmed that while this behavior was unusual, it’s not a cause for concern. It looks like autoscaling did what it was supposed to and that the correct type of instance was created.

Things to Know

Before you get started, here are some important things to know:

  • CloudTrail Insights costs $0.35 for every 100,000 write management events analyzed for each Insight type. At launch, API call volume insights are the only type available.
  • Activity baselines are scoped to the region and account in which the CloudTrail trail is operating.
  • After an account enables Insights events for the first time, if unusual activity is detected, you can expect to receive the first Insights events within 36 hours of enabling Insights.
  • New unusual activity is logged as it is discovered, sending Insight Events to your destination S3 buckets and the AWS console within 30 minutes in most cases.

Let me know if you have any questions or feature requests, and happy building!

— Brandon

 

Improving Containers by Listening to Customers

At AWS, we build our product roadmap based upon feedback from our customers. The following three new features have all come about because customers have asked us to solve specific issues they face when building and operating sophisticated container-based applications.

Managed Node Groups for Amazon Elastic Kubernetes Service
Our customers have told us that they want to focus on building innovative solutions for their customers, and focus less on the heavy lifting of managing Kubernetes infrastructure.

Amazon Elastic Kubernetes Service already provides you with a standard, highly-available Kubernetes cluster control plane, and now, AWS can also manage the nodes (Amazon Elastic Compute Cloud (EC2) instances) for your Kubernetes cluster. Amazon Elastic Kubernetes Service makes it easy to apply bug fixes and security patches to nodes, and updates them to the latest Kubernetes versions along with the cluster.

The Amazon Elastic Kubernetes Service console and API give you a single place to understand the state of your cluster; you no longer have to jump around different services to see all of the resources that make up your cluster.

You can provision managed nodes today when you create a new Amazon EKS cluster. There is no additional cost to use Amazon EKS managed node groups; you only pay for the Amazon EKS cluster and the AWS resources they provision. To find out more, check out this blog post: Extending the EKS API: Managed Node Groups.
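If you prefer the CLI over the console, a managed node group can also be created with a single call; this is only a minimal sketch, with hypothetical cluster, subnet, and role names:

aws eks create-nodegroup --cluster-name my-cluster \
    --nodegroup-name standard-workers \
    --subnets subnet-0abc1234 subnet-0def5678 \
    --node-role arn:aws:iam::123456789012:role/MyEKSNodeRole \
    --scaling-config minSize=2,maxSize=4,desiredSize=2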

Managing your container Logs with AWS FireLens
Customers building container-based applications told us that they wanted more flexibility when it came to logging; however, they didn’t wish to install, configure, or troubleshoot logging agents.

AWS FireLens gives you this flexibility: you can now forward container logs to storage and analytics tools by configuring your task definition in Amazon ECS or AWS Fargate.

This means that developers have their containers send logs to stdout, and FireLens then picks up these logs and forwards them to the destination that has been configured.

FireLens works with the open-source projects Fluent Bit and Fluentd, which means that you can send logs to any destination supported by either of those projects.
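To make this concrete, here is a sketch of the relevant part of a task definition that routes an application container’s stdout through a Fluent Bit sidecar to CloudWatch Logs; the container names, images, and log group shown are hypothetical:

{
    "containerDefinitions": [
        {
            "name": "log_router",
            "image": "amazon/aws-for-fluent-bit:latest",
            "firelensConfiguration": {"type": "fluentbit"}
        },
        {
            "name": "app",
            "image": "my-app:latest",
            "logConfiguration": {
                "logDriver": "awsfirelens",
                "options": {
                    "Name": "cloudwatch",
                    "region": "us-east-1",
                    "log_group_name": "firelens-demo",
                    "log_stream_prefix": "app-",
                    "auto_create_group": "true"
                }
            }
        }
    ]
}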

There are a lot of configuration options with FireLens, and you can choose to filter logs and even have logs sent to multiple destinations. For more information, you can take a look at the demo I wrote earlier in the week: Announcing Firelens – A New Way to Manage Container Logs.

If you would like a deeper understanding of how the technology works and was built, Wesley Pettit goes into even further depth on the Containers Blog in his article: Under the hood: FireLens for Amazon ECS Tasks.

Amazon Elastic Container Registry EventBridge Support
Customers using Amazon Elastic Container Registry have told us they want to be able to start a build process when new container images are pushed to Elastic Container Registry.

We have therefore added Amazon Elastic Container Registry EventBridge support.

Using events that Elastic Container Registry now publishes to EventBridge, you can trigger actions such as starting a pipeline or posting a message to somewhere like Amazon Chime or Slack when your image is successfully pushed.

To learn more about this new feature, check out the following blog post where I give a more detailed explanation and demo: EventBridge support in Amazon Elastic Container Registry.

More to come
These three new releases add to other great releases we have already had this year, such as Savings Plans, Amazon EKS Windows Containers support, and Native Container Image Scanning in Amazon ECR.

We are still listening, and we need your feedback, so if you have a feature request or a pain point with your container applications, please let us know by creating or commenting on issues in our public containers roadmap. Sometime in the future, I might one day be writing about a new feature that was inspired by you.

Martin

 

EventBridge Support in Amazon Elastic Container Registry

Many of our customers require a secure and private place to store their container images, and that’s why they use our fully managed container registry, Amazon Elastic Container Registry. We recently added support for Amazon EventBridge so that you can trigger actions when images are pushed or deleted. These events can trigger a continuous integration and continuous deployment pipeline when an image is pushed, or post a message to your DevOps team’s Slack channel when an image has been deleted.

This new capability can even enable complicated workflows, for example, customers can use the image push event on a base image to trigger a rebuild of images built on top of that base. In this scenario, a base image might be rebuilt weekly to pick up the latest security patches. A push event from the base image repository can trigger other builds, so that all derivative images are patched, too.

To show you how to go about using this new capability, I thought I’d open up the console and work through an example of how all the pieces fit together.

In the Amazon EventBridge console, I create a new rule, and I enter a unique name and description.

Next, I scroll down to Define pattern and begin to customise the type of event pattern that I want to use. I leave the default Event pattern radio button selected, and choose a Pre-defined pattern by service. Since Elastic Container Registry is an AWS service, I select AWS as the Service Provider.

In the Service Name section, you can select one of the many different AWS services as the event source. I am going to choose the newest addition to this list, Elastic Container Registry (ECR). Lastly, in this section, I select ECR Image Action as the Event type. This ECR Image Action contains both DELETE and PUSH as action types.
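Behind the scenes, these selections translate into an event pattern along the following lines; the detail block is an assumption and can be omitted if you want to match both actions regardless of other fields:

{
    "source": ["aws.ecr"],
    "detail-type": ["ECR Image Action"],
    "detail": {
        "action-type": ["PUSH", "DELETE"]
    }
}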

Next, I’m asked to configure which event bus I want to use. For this example, I select the AWS default event bus that comes with every AWS account.

Now that I have identified where my events are coming from, I now need to say where I want them to go. We call these targets, and there are plenty of options here. For example, I could send the event to a Lambda Function, a Kinesis stream, or any one of the wide variety of AWS targets.

To keep things simple, I’m going to choose to invoke an Amazon Simple Notification Service (SNS) topic. This topic is called ImageAction, and I have subscribed to this topic so that I receive an email when new messages are received by this topic.

Back over on my laptop, I push a new version of my container image to my repository in Elastic Container Registry.

If I go over to the Elastic Container Registry console, I can see that my Docker image was successfully pushed. I’m now going to select the image and click the Delete button, which will delete my new image.

This will have sent both a PUSH and a DELETE event through to my SNS topic, which in turn delivers two emails to me as a subscriber to that topic.

 

If I open up Outlook, sure enough, I have two (admittedly not pretty) emails with the respective action types of PUSH and DELETE.

So there you have it, you can now wire up events in Elastic Container Registry and enable exciting and wonderful things to happen as a result. Amazon EventBridge support in Amazon Elastic Container Registry is available in all public AWS Regions and GovCloud (US). Try it now in the Amazon EventBridge console.

Happy Eventing!

Martin

Welcome to AWS Storage Day

Everyone on the AWS team has been working non-stop to make sure that re:Invent 2019 is the biggest and best one yet. Way back in September, the entire team of AWS News Bloggers gathered in Seattle for a set of top-secret briefings. We listened to the teams, read their PRFAQs (Press Release + FAQ), and chose the launches that we wanted to work on. We’ve all been heads-down ever since, reading the docs, putting the services to work, writing drafts, and responding to feedback.

Heads-Up
Today, a week ahead of opening day, we are making the first round of announcements, all of them related to storage. We are doing this in order to spread out the launches a bit, and to give you some time to sign up for the appropriate re:Invent sessions. We’ve written individual posts for some of the announcements, and are covering the rest in summary form in this post.

Regardless of the AWS storage service that you use, I think you’ll find something interesting and useful here. We are launching significant new features for Amazon Elastic Block Store (EBS), Amazon FSx for Windows File Server, Amazon Elastic File System (EFS), AWS DataSync, AWS Storage Gateway, and Amazon Simple Storage Service (S3). As I never tire of saying, all of these features are available now and you can start using them today!

Let’s get into it…

Elastic Block Store (EBS)
The cool new Fast Snapshot Restore (FSR) feature enables the creation of fully-initialized, full-performance EBS volumes.

Amazon FSx for Windows File Server
This file system now includes a long list of enterprise-ready features, including remote management, native multi-AZ file systems, user quotas, and more! Several new features make this file system even more cost-effective, including data deduplication and support for smaller SSD file systems.

Elastic File System (EFS)
Amazon EFS is now available in all commercial AWS regions. Check out the EFS Pricing page for pricing in your region, or read my original post, Amazon Elastic File System – Production-Ready in Three Regions, to learn more.

AWS DataSync
AWS DataSync was launched at re:Invent 2018. It supports Automated and Accelerated Data Transfer, and can be used for migration, upload & process operations, and backup/DR.

Effective November 1, 2019, we are reducing the per-GB price for AWS DataSync from $0.04/GB to $0.0125/GB. For more information, check out the AWS DataSync Pricing page.

The new task scheduling feature allows you to periodically execute a task that detects changes and copies them from the source storage system to the destination, with options to run tasks on an hourly, daily, weekly, or custom basis:
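If you drive DataSync from the CLI, the same schedule can be attached to an existing task; a minimal sketch, with a hypothetical task ARN:

aws datasync update-task \
    --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-0123456789abcdef0 \
    --schedule 'ScheduleExpression=rate(1 hour)'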

We recently added support in the Europe (London), Europe (Paris), and Canada (Central) Regions. Today we are adding support in the Europe (Stockholm), Asia Pacific (Mumbai), South America (São Paulo), Asia Pacific (Hong Kong), and AWS GovCloud (US-East) Regions. As a result, AWS DataSync is now available in all commercial and GovCloud regions.

For more information, check out the AWS DataSync Storage Day post!

AWS Storage Gateway
AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage. With this launch, you now have access to a new set of enterprise features:

High Availability – Storage Gateway now includes a range of health checks when running within a VMware vSphere High Availability (VMware HA) environment, and can now recover from most service interruptions in under 60 seconds. In the unlikely event that a recovery is necessary, sessions will be maintained and applications should continue to operate unaffected after a pause. The new gateway health checks integrate automatically with VMware through the VM heartbeat. You have the ability to adjust the sensitivity of the heartbeat from within VMware:

To learn more, read Deploy a Highly Available AWS Storage Gateway on a VMware vSphere Cluster.

Enhanced Metrics – If you enable Amazon CloudWatch Integration, AWS Storage Gateway now publishes cache utilization, access pattern, throughput, and I/O metrics to Amazon CloudWatch and makes them visible in the Monitoring tab for each gateway. To learn more, read Monitoring Gateways.

More Maintenance Options – You now have additional control over the software updates that are applied to each Storage Gateway. Mandatory security updates are always applied promptly, and you can control the schedule for feature updates. You have multiple options including day of the week and day of the month, with more coming soon. To learn more, read Managing Storage Gateway Updates.

Increased Performance – AWS Storage Gateway now delivers higher read performance when used as a Virtual Tape Library, and for reading data and listing directories when used as a File Gateway, providing you faster access to data managed through these gateways.

Amazon S3
We launched Same-Region Replication (SRR) in mid-September, giving you the ability to configure in-region replication based on bucket, prefix, or object tag. When an object is replicated using SRR, the metadata, Access Control Lists (ACLs), and object tags associated with the object are also replicated. Once SRR has been configured on a source bucket, any changes to these elements will trigger a replication to the destination bucket. To learn more, read about S3 Replication.

Today we are launching a Service Level Agreement (SLA) for S3 Replication, along with the ability to monitor the status of each of your replication configurations. To learn more, read S3 Replication Update: Replication SLA, Metrics, and Events.

AWS Snowball Edge
This is, as I have already shared, a large-scale data migration and edge computing device with on-board compute and storage capabilities. We recently launched three free training courses that will help you to learn more about this unique and powerful device:

You may also enjoy reading about Data Migration Best Practices with AWS Snowball Edge.

Jeff;

 

New – Amazon EBS Fast Snapshot Restore (FSR)

Amazon Elastic Block Store (EBS) has been around for more than a decade and is a fundamental AWS building block. You can use it to create persistent storage volumes that can store up to 16 TiB and supply up to 64,000 IOPS (Input/Output Operations per Second). You can choose between four types of volumes, making the choice that best addresses your data transfer throughput, IOPS, and pricing requirements. If your requirements change, you can modify the type of a volume, expand it, or change the performance while the volume remains online and active. EBS snapshots allow you to capture the state of a volume for backup, disaster recovery, and other purposes. Once created, a snapshot can be used to create a fresh EBS volume. Snapshots are stored in Amazon Simple Storage Service (S3) for high durability.

Our ever-creative customers are using EBS snapshots in many interesting ways. In addition to the backup and disaster recovery use cases that I just mentioned, they are using snapshots to quickly create analytical or test environments using data drawn from production, and to support Virtual Desktop Infrastructure (VDI) environments. As you probably know, the AMIs (Amazon Machine Images), that you use to launch EC2 instances are also stored as one or more snapshots.

Fast Snapshot Restore
Today we are launching Fast Snapshot Restore (FSR) for EBS. You can enable it for new and existing snapshots on a per-AZ (Availability Zone) basis, and then create new EBS volumes that deliver their maximum performance and do not need to be initialized.

This performance enhancement will allow you to build AWS-based systems that are even faster and more responsive than before. Faster boot times will speed up your VDI environments and allow your Auto Scaling Groups to come online and start processing traffic more quickly, even if you use large and/or custom AMIs. I am sure that you will soon dream up new applications that can take advantage of this new level of speed and predictability.

Fast Snapshot Restore can be enabled on a snapshot even while the snapshot is being created. If you create nightly backup snapshots, enabling them for FSR will allow you to do fast restores the following day regardless of the size of the volume or the snapshot.

Enabling & Using Fast Snapshot Restore
I can get started in minutes! I open the EC2 Console and find the first snapshot that I want to set up for fast restore:

I select the snapshot and choose Manage Fast Snapshot Restore from the Actions menu:

Then I select the Availability Zones where I plan to create EBS volumes, and click Save:

After the settings are saved, I receive a confirmation:

The console shows me that my snapshot is being enabled for Fast Snapshot Restore:

The status progresses from enabling to optimizing, and then to enabled. Behind the scenes and with no extra effort on my part, the optimization process provisions extra resources to deliver the fast restores, proceeding at a rate of one TiB per hour. By contrast, non-optimized volumes retrieve data directly from the S3-stored snapshot on an incremental, on-demand basis.

Once the optimization is complete, I can create volumes from the snapshot in the usual way, confident that they will be ready in seconds and pre-initialized for full performance! Each FSR-enabled snapshot supports creation of up to 10 initialized volumes per hour per Availability Zone; additional volume creations will be non-initialized. As my needs change, I can enable Fast Snapshot Restore in additional Availability Zones and I can disable it in Zones where I had previously enabled it.

When Fast Snapshot Restore is enabled for a snapshot in a particular Availability Zone, a bucket-based credit system governs the acceleration process. Creating a volume consumes a credit; the credits refill over time, and the maximum number of credits is a function of the FSR-enabled snapshot size. Here are some guidelines:

  • A 100 GiB FSR-enabled snapshot will have a maximum credit balance of 10, and a fill rate of 10 credits per hour.
  • A 4 TiB FSR-enabled snapshot will have a maximum credit balance of 1, and a fill rate of 1 credit every 4 hours.

In other words, you can do 1 TiB of restores per hour for a given FSR-enabled snapshot within an AZ.

Things to Know
Here are some things to know about Fast Snapshot Restore:

Regions & AZs – Fast Snapshot Restore is available in all Availability Zones of the US East (N. Virginia), US West (Oregon), US West (N. California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.

Pricing – You pay $0.75 for each hour that Fast Snapshot Restore is enabled for a snapshot in a particular Availability Zone, pro-rated and with a minimum of one hour.

Monitoring – You can use the following per-minute CloudWatch metrics to track the state of the credit bucket for each FSR-enabled snapshot:

  • FastSnapshotRestoreCreditsBalance – The number of volume creation credits that are available.
  • FastSnapshotRestoreCreditsBucketSize – The maximum number of volume creation credits that can be accumulated.

CLI & Programmatic Access – You can use the enable-fast-snapshot-restores, describe-fast-snapshot-restores, and disable-fast-snapshot-restores commands to create and manage your accelerated snapshots from the command line. You can also use the EnableFastSnapshotRestores, DescribeFastSnapshotRestores, and DisableFastSnapshotRestores API functions from your application code.
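For example, enabling FSR on a snapshot in two Availability Zones from the CLI looks roughly like this (the snapshot ID and zones are placeholders):

aws ec2 enable-fast-snapshot-restores \
    --source-snapshot-ids snap-0123456789abcdef0 \
    --availability-zones us-east-1a us-east-1b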

CloudWatch Events – You can use the EBS Fast Snapshot Restore State-change Notification event type to invoke Lambda functions or other targets when the state of a snapshot/AZ pair changes. Events are emitted on successful and unsuccessful transitions to the enabling, optimizing, enabled, disabling, and disabled states.

Data Lifecycle Manager – You can enable FSR on snapshots created by your DLM lifecycle policies, specify AZs, and specify the number of snapshots to be FSR-enabled. You can use an existing CloudFormation template to integrate FSR into your DLM policies (read about the AWS::DLM::LifecyclePolicy to learn more).

In the Works
We are launching with support for snapshots that you own. Over time, we intend to expand coverage and allow you to enable Fast Snapshot Restore for snapshots that you have been granted access to.

Available Now
Fast Snapshot Restore is available now and you can start using it today!

Jeff;

 

S3 Replication Update: Replication SLA, Metrics, and Events

S3 Cross-Region Replication has been around since early 2015 (new Cross-Region Replication for Amazon S3), and Same-Region Replication has been around for a couple of months.

Replication is very easy to set up, and lets you use rules to specify that you want to copy objects from one S3 bucket to another one. The rules can specify replication of the entire bucket, or of a subset based on prefix or tag:

You can use replication to copy critical data within or between AWS regions in order to meet regulatory requirements for geographic redundancy as part of a disaster recovery plan, or for other operational reasons. You can copy within a region to aggregate logs, set up test & development environments, and to address compliance requirements.

S3’s replication features have been put to great use: Since the launch in 2015, our customers have replicated trillions of objects and exabytes of data! Today I am happy to be able to tell you that we are making it even more powerful, with the addition of Replication Time Control. This feature builds on the existing rule-driven replication and gives you fine-grained control based on tag or prefix so that you can use Replication Time Control with the data set you specify. Here’s what you get:

Replication SLA – You can now take advantage of a replication SLA to increase the predictability of replication time.

Replication Metrics – You can now monitor the maximum replication time for each rule using new CloudWatch metrics.

Replication Events – You can now use events to track any object replications that deviate from the SLA.

Let’s take a closer look!

New Replication SLA
S3 replicates your objects to the destination bucket, with timing influenced by object size & count, available bandwidth, other traffic to the buckets, and so forth. In situations where you need additional control over replication time, you can use our new Replication Time Control feature, which is designed to perform as follows:

  • Most of the objects will be replicated within seconds.
  • 99% of the objects will be replicated within 5 minutes.
  • 99.99% of the objects will be replicated within 15 minutes.

When you enable this feature, you benefit from the associated Service Level Agreement. The SLA is expressed in terms of a percentage of objects that are expected to be replicated within 15 minutes, and provides for billing credits if the SLA is not met:

  • 99.9% to 98.0% – 10% credit
  • 98.0% to 95.0% – 25% credit
  • 95% to 0% – 100% credit

The billing credit applies to a percentage of the Replication Time Control fee, replication data transfer, S3 requests, and S3 storage charges in the destination for the billing period.

I can enable Replication Time Control when I create a new replication rule, and I can also add it to an existing rule:
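For those who configure replication from the CLI rather than the console, a rule with Replication Time Control enabled might look roughly like this; the bucket names, role, and rule ID are hypothetical:

{
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "rtc-rule",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": "critical/"},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::my-destination-bucket",
                "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
                "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
            }
        }
    ]
}

You would pass a document like this to aws s3api put-bucket-replication --bucket my-source-bucket --replication-configuration file://replication.json.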

Replication begins as soon as I create or update the rule. I can use the Replication Metrics and the Replication Events to monitor compliance.

In addition to the existing charges for S3 requests and data transfer between regions, you will pay an extra per-GB charge to use Replication Time Control; see the S3 Pricing page for more information.

Replication Metrics
Each time I enable Replication Time Control for a rule, S3 starts to publish three new metrics to CloudWatch. They are available in the S3 and CloudWatch Consoles:

I created some large tar files, and uploaded them to my source bucket. I took a quick break, and inspected the metrics. Note that I did my testing before the launch, so don’t get overly concerned with the actual numbers. Also, keep in mind that these metrics are aggregated across the replication for display, and are not a precise indication of per-object SLA compliance.

BytesPendingReplication jumps up right after the upload, and then drops down as the replication takes place:

ReplicationLatency peaks and then quickly drops down to zero after S3 Replication transfers over 37 GB from the United States to Australia with a maximum latency of 8.3 minutes:

And OperationsPendingCount tracks the number of objects to be replicated:

I can also set CloudWatch Alarms on the metrics. For example, I might want to know if I have a replication backlog larger than 75 GB (for this to work as expected, I must set the Missing data treatment to Treat missing data as ignore (maintain the alarm state)):
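The same alarm can be created from the CLI. This is only a sketch: the dimension names in particular are assumptions based on how the replication metrics are scoped to a source bucket, destination bucket, and rule, and the threshold is 75 GB expressed in bytes (75 × 1024³ = 80,530,636,800):

aws cloudwatch put-metric-alarm --alarm-name s3-replication-backlog \
    --namespace AWS/S3 --metric-name BytesPendingReplication \
    --dimensions Name=SourceBucket,Value=my-source-bucket Name=DestinationBucket,Value=my-destination-bucket Name=RuleId,Value=rtc-rule \
    --statistic Maximum --period 60 --evaluation-periods 1 \
    --threshold 80530636800 --comparison-operator GreaterThanThreshold \
    --treat-missing-data ignore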

These metrics are billed as CloudWatch Custom Metrics.

Replication Events
Finally, you can track replication issues by setting up events on an SQS queue, SNS topic, or Lambda function. Start at the console’s Events section:

You can use these events to monitor adherence to the SLA. For example, you could store Replication time missed threshold and Replication time completed after threshold events in a database to track occasions where replication took longer than expected. The first event will tell you that the replication is running late, and the second will tell you that it has completed, and how late it was.

To learn more, read about Replication.

Available Now
You can start using these features today in all commercial AWS Regions, excluding the AWS China (Beijing) and AWS China (Ningxia) Regions.

Jeff;

PS – If you want to learn more about how S3 works, be sure to attend the re:Invent session: Beyond Eleven Nines: Lessons from the Amazon S3 Culture of Durability.

 

Amazon FSx For Windows File Server Update – Multi-AZ, & New Enterprise-Ready Features

Last year I told you about Amazon FSx for Windows File Server — Fast, Fully Managed, and Secure. That launch was well-received, and our customers (Neiman Marcus, Ancestry, Logicworks, and Qube Research & Technologies to name a few) are making great use of the service. They love the fact that they can access their shares from a wide variety of sources, and that they can use their existing Active Directory environment to authenticate users. They benefit from a native implementation with fast, SSD-powered performance, and no longer spend time attaching and formatting storage devices, updating Windows Server, or recovering from hardware failures.

Since the launch, we have continued to enhance Amazon FSx for Windows File Server, largely in response to customer requests. Some of the more significant enhancements include:

Self-Managed Directories – This launch gave you the ability to join your Amazon FSx file systems to on-premises or in-cloud self-managed Microsoft Active Directories. To learn how to get started with this feature, read Using Amazon FSx with Your Self-Managed Microsoft Active Directory.

Fine-Grained File Restoration – This launch (powered by Windows shadow copies) gave your users the ability to easily view and restore previous versions of their files. To learn how to configure and use this feature, read Working with Shadow Copies.

On-Premises Access – This launch gave you the power to access your file systems from on-premises using AWS Direct Connect or an AWS VPN connection. You can host user shares in the cloud for on-premises access, and you can also use it to support your backup and disaster recovery model. To learn more, read Accessing Amazon FSx for Windows File Server File Systems from On-Premises.

Remote Management CLI – This launch focused on a set of CLI commands (PowerShell Cmdlets) to manage your Amazon FSx for Windows File Server file systems. The commands support remote management and give you the ability to fully automate many types of setup, configuration, and backup workflows from a central location.

Enterprise-Ready Features
Today we are launching an extensive list of new features that are designed to address the top-priority requests from our enterprise customers.

Native Multi-AZ File Systems – You can now create file systems that span AWS Availability Zones (AZs). You no longer need to set up or manage replication across AZs; instead, you select the multi-AZ deployment option when you create your file system:

Then you select two subnets where your file system will reside:

This will create an active file server and a hot standby, each with its own storage, and synchronous replication across AZs to the standby. If the active file server fails, Amazon FSx will automatically fail over to the standby, so that you can maintain operations without losing any data. Failover typically takes less than 30 seconds. The DNS name remains unchanged, making replication and failover transparent, even during planned maintenance windows. This feature is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (London), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Europe (Stockholm) Regions.

Support for SQL Server – Amazon FSx now supports the creation of Continuously Available (CA) file shares, which are optimized for use by Microsoft SQL Server. This allows you to store your active SQL Server data on a fully managed Windows file system in AWS.

Smaller Minimum Size – Single-AZ file systems can now be as small as 32 GiB (the previous minimum was 300 GiB).

Data Deduplication – You can optimize your storage by seeking out and eliminating low-level duplication of data, with the potential to reduce your storage costs. The actual space savings will depend on your use case, but you can expect it to be around 50% for typical workloads (read Microsoft’s Data Deduplication Overview and Understanding Data Deduplication to learn more).

Once enabled for a file system with Enable-FSxDedup, deduplication jobs are run on a default schedule that you can customize if desired. You can use the Get-FSxDedupStatus command to see some interesting stats about your file system:

To learn more, read Using Data Deduplication.

Programmatic File Share Configuration – You can now programmatically configure your file shares using PowerShell commands (this is part of the Remote Management CLI that I mentioned earlier). You can use these commands to automate your setup, migration, and synchronization workflows. The commands include:

  • New-FSxSmbShare – Create a new shared folder.
  • Grant-FSxSmbShareAccess – Add an access control entry (ACE) to an ACL.
  • Get-FSxSmbSession – Get information about active SMB sessions.
  • Get-FSxSmbOpenFile – Get information about files opened on SMB sessions.

To learn more, read Managing File Shares.

Enforcement of In-Transit Encryption – You can insist that connections to your file shares make use of in-transit SMB encryption:

PS> Set-FSxSmbServerConfiguration -RejectUnencryptedAccess $True

To learn more, read about Encryption of Data in Transit.

Quotas – You can now use quotas to monitor and control the amount of storage space consumed by each user. You can set up per-user quotas, monitor usage, track violations, and choose to deny further consumption to users who exceed their quotas:

PS> Enable-FSxUserQuotas Mode=Enforce
PS> Set-FSXUserQuota jbarr ...

To learn more, read about Managing User Quotas.

Available Now
Putting it all together, this laundry list of new enterprise-ready features and the power to create Multi-AZ file systems makes Amazon FSx for Windows File Server a great choice when you are moving your existing NAS (Network Attached Storage) to the AWS Cloud.

All of these features are available now and you can start using them today in all commercial AWS Regions where Amazon FSx for Windows File Server is available, unless otherwise noted above.

Jeff;

 

New – Using Step Functions to Orchestrate Amazon EMR workloads

AWS Step Functions allows you to add serverless workflow automation to your applications. The steps of your workflow can run anywhere, including in AWS Lambda functions, on Amazon Elastic Compute Cloud (EC2), or on-premises. To simplify building workflows, Step Functions is directly integrated with multiple AWS Services: Amazon ECS, AWS Fargate, Amazon DynamoDB, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), AWS Batch, AWS Glue, Amazon SageMaker, and (to run nested workflows) with Step Functions itself.

Starting today, Step Functions connects to Amazon EMR, enabling you to create data processing and analysis workflows with minimal code, saving time, and optimizing cluster utilization. For example, building data processing pipelines for machine learning is time-consuming and hard. With this new integration, you have a simple way to orchestrate workflow capabilities, including parallel executions and dependencies from the result of a previous step, and handle failures and exceptions when running data processing jobs.

Specifically, a Step Functions state machine can now:

  • Create or terminate an EMR cluster, including the possibility to change the cluster termination protection. In this way, you can reuse an existing EMR cluster for your workflow, or create one on-demand during execution of a workflow.
  • Add or cancel an EMR step for your cluster. Each EMR step is a unit of work that contains instructions to manipulate data for processing by software installed on the cluster, including tools such as Apache Spark, Hive, or Presto.
  • Modify the size of an EMR cluster instance fleet or group, allowing you to manage scaling programmatically depending on the requirements of each step of your workflow. For example, you may increase the size of an instance group before adding a compute-intensive step, and reduce the size just after it has completed.

When you create or terminate a cluster or add an EMR step to a cluster, you can use synchronous integrations to move to the next step of your workflow only when the corresponding activity has completed on the EMR cluster.

Reading the configuration or the state of your EMR clusters is not part of the Step Functions service integration. In case you need that, the EMR List* and Describe* APIs can be accessed using Lambda functions as tasks.

Building a Workflow with EMR and Step Functions
On the Step Functions console, I create a new state machine. The console renders it visually, which makes it much easier to understand:

To create the state machine, I use the following definition using the Amazon States Language (ASL):

{
  "StartAt": "Should_Create_Cluster",
  "States": {
    "Should_Create_Cluster": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.CreateCluster",
          "BooleanEquals": true,
          "Next": "Create_A_Cluster"
        },
        {
          "Variable": "$.CreateCluster",
          "BooleanEquals": false,
          "Next": "Enable_Termination_Protection"
        }
      ],
      "Default": "Create_A_Cluster"
    },
    "Create_A_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "WorkflowCluster",
        "VisibleToAllUsers": true,
        "ReleaseLabel": "emr-5.28.0",
        "Applications": [{ "Name": "Hive" }],
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "LogUri": "s3://aws-logs-123412341234-eu-west-1/elasticmapreduce/",
        "Instances": {
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "InstanceFleetType": "MASTER",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m4.xlarge"
                }
              ]
            },
            {
              "InstanceFleetType": "CORE",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m4.xlarge"
                }
              ]
            }
          ]
        }
      },
      "ResultPath": "$.CreateClusterResult",
      "Next": "Merge_Results"
    },
    "Merge_Results": {
      "Type": "Pass",
      "Parameters": {
        "CreateCluster.$": "$.CreateCluster",
        "TerminateCluster.$": "$.TerminateCluster",
        "ClusterId.$": "$.CreateClusterResult.ClusterId"
      },
      "Next": "Enable_Termination_Protection"
    },
    "Enable_Termination_Protection": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:setClusterTerminationProtection",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "TerminationProtected": true
      },
      "ResultPath": null,
      "Next": "Add_Steps_Parallel"
    },
    "Add_Steps_Parallel": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step_One",
          "States": {
            "Step_One": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                  "Name": "The first step",
                  "ActionOnFailure": "CONTINUE",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "hive-script",
                      "--run-hive-script",
                      "--args",
                      "-f",
                      "s3://eu-west-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
                      "-d",
                      "INPUT=s3://eu-west-1.elasticmapreduce.samples",
                      "-d",
                      "OUTPUT=s3://MY-BUCKET/MyHiveQueryResults/"
                    ]
                  }
                }
              },
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait_10_Seconds",
          "States": {
            "Wait_10_Seconds": {
              "Type": "Wait",
              "Seconds": 10,
              "Next": "Step_Two (async)"
            },
            "Step_Two (async)": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                  "Name": "The second step",
                  "ActionOnFailure": "CONTINUE",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "hive-script",
                      "--run-hive-script",
                      "--args",
                      "-f",
                      "s3://eu-west-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
                      "-d",
                      "INPUT=s3://eu-west-1.elasticmapreduce.samples",
                      "-d",
                      "OUTPUT=s3://MY-BUCKET/MyHiveQueryResults/"
                    ]
                  }
                }
              },
              "ResultPath": "$.AddStepsResult",
              "Next": "Wait_Another_10_Seconds"
            },
            "Wait_Another_10_Seconds": {
              "Type": "Wait",
              "Seconds": 10,
              "Next": "Cancel_Step_Two"
            },
            "Cancel_Step_Two": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:cancelStep",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "StepId.$": "$.AddStepsResult.StepId"
              },
              "End": true
            }
          }
        }
      ],
      "ResultPath": null,
      "Next": "Step_Three"
    },
    "Step_Three": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "Step": {
          "Name": "The third step",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "hive-script",
              "--run-hive-script",
              "--args",
              "-f",
              "s3://eu-west-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
              "-d",
              "INPUT=s3://eu-west-1.elasticmapreduce.samples",
              "-d",
              "OUTPUT=s3://MY-BUCKET/MyHiveQueryResults/"
            ]
          }
        }
      },
      "ResultPath": null,
      "Next": "Disable_Termination_Protection"
    },
    "Disable_Termination_Protection": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:setClusterTerminationProtection",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "TerminationProtected": false
      },
      "ResultPath": null,
      "Next": "Should_Terminate_Cluster"
    },
    "Should_Terminate_Cluster": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.TerminateCluster",
          "BooleanEquals": true,
          "Next": "Terminate_Cluster"
        },
        {
          "Variable": "$.TerminateCluster",
          "BooleanEquals": false,
          "Next": "Wrapping_Up"
        }
      ],
      "Default": "Wrapping_Up"
    },
    "Terminate_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
      "Parameters": {
        "ClusterId.$": "$.ClusterId"
      },
      "Next": "Wrapping_Up"
    },
    "Wrapping_Up": {
      "Type": "Pass",
      "End": true
    }
  }
}

I let the Step Functions console create a new AWS Identity and Access Management (IAM) role for the executions of this state machine. The role automatically includes all permissions required to access EMR.

This state machine can either use an existing EMR cluster or create a new one. I can use the following input to create a new cluster that is terminated at the end of the workflow:

{
  "CreateCluster": true,
  "TerminateCluster": true
}

To use an existing cluster, I need to provide the cluster ID in the input, using this syntax:

{
  "CreateCluster": false,
  "TerminateCluster": false,
  "ClusterId": "j-..."
}
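
To start an execution with one of these inputs from the AWS Command Line Interface (CLI), I can pass the JSON document to the start-execution command. This is a minimal sketch; the state machine name and account ID in the ARN are placeholders for the ones in your account:

aws stepfunctions start-execution \
    --state-machine-arn arn:aws:states:eu-west-1:123456789012:stateMachine:EMR-Workflow \
    --input '{"CreateCluster": true, "TerminateCluster": true}'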

Let’s see how that works. As the workflow starts, the Should_Create_Cluster Choice state looks into the input to decide if it should enter the Create_A_Cluster state or not. There, I use a synchronous call (elasticmapreduce:createCluster.sync) to wait for the new EMR cluster to reach the WAITING state before progressing to the next workflow state. The AWS Step Functions console shows the resource that is being created with a link to the EMR console:
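
If I want to check the cluster state myself while the workflow is waiting, I can query EMR directly. This is just a quick sketch, with the cluster ID as a placeholder for the one created by the workflow:

aws emr describe-cluster --cluster-id j-... --query 'Cluster.Status.State'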

After that, the Merge_Results Pass state merges the input state with the cluster ID of the newly created cluster to pass it to the next step in the workflow.

Before starting to process any data, I use the Enable_Termination_Protection state (elasticmapreduce:setClusterTerminationProtection) to help ensure that the EC2 instances in my EMR cluster are not shut down by accident or error.
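
This is the same setting you can toggle with the EMR CLI. For example, here is a quick sketch with a placeholder cluster ID:

aws emr set-termination-protection --cluster-ids j-... --termination-protected

To turn it off again, use the --no-termination-protected flag instead.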

Now I am ready to do something with the EMR cluster. I have three EMR steps in the workflow. For the sake of simplicity, these steps are all based on this Hive tutorial. For each step, I use Hive’s SQL-like interface to run a query on some sample CloudFront logs and write the results to Amazon Simple Storage Service (S3). In a production use case, you’d probably have a combination of EMR tools processing and analyzing your data in parallel (two or more steps running at the same time) or with some dependencies (the output of one step is required by another step). Let’s try to do something similar.

First I execute Step_One and Step_Two inside a Parallel state:

  • Step_One runs the EMR step synchronously as a job (elasticmapreduce:addStep.sync). This means the execution waits for the EMR step to complete (or be cancelled) before moving on to the next state in the workflow. You can optionally add a timeout to ensure the EMR step completes within an expected time frame.
  • Step_Two adds an EMR step asynchronously (elasticmapreduce:addStep). In this case, the workflow moves on as soon as EMR acknowledges that the request has been received. After a few seconds, to try another integration, I cancel Step_Two (elasticmapreduce:cancelStep). This integration can be useful in production. For example, you can cancel an EMR step if you get an error from another step running in parallel that would make it pointless to continue with this one; a CLI sketch of canceling a step follows this list.
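
As a rough CLI equivalent of the cancelStep integration, I could cancel a step directly on the cluster. This is just a sketch; the cluster and step IDs are placeholders:

aws emr cancel-steps --cluster-id j-... --step-ids s-...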

After both steps have completed and produced their results, I execute Step_Three as a job, similarly to what I did for Step_One. When Step_Three has completed, I enter the Disable_Termination_Protection step, because I am done using the cluster for this workflow.

Depending on the input state, the Should_Terminate_Cluster Choice state is going to enter the Terminate_Cluster state (elasticmapreduce:terminateCluster.sync) and wait for the EMR cluster to terminate, or go straight to the Wrapping_Up state and leave the cluster running.
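
The terminateCluster.sync integration behaves much like issuing the following CLI command and then waiting for the cluster to reach a terminated state (the cluster ID is a placeholder):

aws emr terminate-clusters --cluster-ids j-...

Note that termination only succeeds once termination protection has been disabled, which is why the Disable_Termination_Protection state runs first.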

Finally, I have the Wrapping_Up state. I am not doing much in this final state, but a workflow cannot end on a Choice state.

In the EMR console I see the status of my cluster and of the EMR steps:

Using the AWS Command Line Interface (CLI), I find the results of my query in the S3 bucket configured as output for the EMR steps:

aws s3 ls s3://MY-BUCKET/MyHiveQueryResults/
...
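
If I want to inspect the results locally, I can download them with a recursive copy (the bucket name is the same placeholder used in the workflow definition):

aws s3 cp s3://MY-BUCKET/MyHiveQueryResults/ ./MyHiveQueryResults/ --recursive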

Based on my input, the EMR cluster is still running at the end of this workflow execution. I follow the resource link in the Create_A_Cluster step to go to the EMR console and terminate it. If you are following along with this demo, be careful not to leave your EMR cluster running if you don’t need it.

Available Now
Step Functions integration with EMR is available in all regions. There is no additional cost for using this feature on top of the usual Step Functions and EMR pricing.

You can now use Step Functions to quickly build complex workflows for executing EMR jobs. A workflow can include parallel executions, dependencies, and exception handling. Step Functions makes it easy to retry failed jobs and terminate workflows after critical errors, because you can specify what happens when something goes wrong. Let me know what you are going to use this feature for!

Danilo

AWS Systems Manager Explorer – A Multi-Account, Multi-Region Operations Dashboard

This post was originally published on this site

Since 2006, Amazon Web Services has been striving to simplify IT infrastructure. Thanks to services like Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Amazon Relational Database Service (RDS), AWS CloudFormation and many more, millions of customers can build reliable, scalable, and secure platforms in any AWS region in minutes. Having spent 10 years procuring, deploying and managing more hardware than I care to remember, I’m still amazed every day by the pace of innovation that builders achieve with our services.

With great power comes great responsibility. The second you create AWS resources, you’re responsible for them: security of course, but also cost and scaling. This makes monitoring and alerting all the more important, which is why we built services like Amazon CloudWatch, AWS Config and AWS Systems Manager.

Still, customers told us that their operations work would be much simpler if they could just look at a single dashboard listing potential issues on AWS resources, no matter which of their accounts or regions those resources were created in.

We got to work, and today we’re very happy to announce the availability of AWS Systems Manager Explorer, a unified operations dashboard built as part of Systems Manager.

Introducing AWS Systems Manager Explorer
Collecting monitoring information and alerts from EC2, Config, CloudWatch and Systems Manager, Explorer presents you with an intuitive graphical dashboard that lets you quickly view and browse problems affecting your AWS resources. By default, this data comes from the account and region you’re running in, and you can easily include other regions as well as other accounts managed with AWS Organizations.

Specifically, Explorer can provide operational information about:

  • EC2 issues, such as unhealthy instances,
  • EC2 instances that have a non-compliant patching status,
  • AWS resources that don’t comply with Config rules (predefined or your own),
  • AWS resources that have triggered a CloudWatch Events rule (predefined or your own).

Each issue is stored as an OpsItem in AWS Systems Manager OpsCenter, and is assigned a status (open, in progress, resolved), a severity and a category (security, performance, cost, etc.). Widgets let you quickly browse OpsItems, and a timeline of all OpsItems is also available.
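
For example, assuming the Status filter key, here is a quick sketch of listing open OpsItems from the AWS CLI:

aws ssm describe-ops-items --ops-item-filters Key=Status,Values=Open,Operator=Equal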

In addition to OpsItems, the Explorer dashboard also includes widgets that show consolidated information on EC2 instances:

  • Instance count, with a tag filter,
  • Instances managed by Systems Manager, as well as unmanaged instances,
  • Instances sorted by AMI ID.

As you would expect, all information can be exported to S3 for archival or further processing, and you can also set up Amazon Simple Notification Service (SNS) notifications. Last but not least, all data visible on the dashboard can be accessed from the AWS CLI or any AWS SDK with the GetOpsSummary API.
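
For example, a minimal sketch of calling that API from the AWS CLI, without any filters or aggregators, looks like this:

aws ssm get-ops-summary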

Let’s take a quick tour.

A Look at AWS Systems Manager Explorer
Before using Explorer, we recommend that you first set up Config and Systems Manager. This will help populate your Explorer dashboard immediately. No setup is required for CloudWatch Events.

Setting up Config is a best practice, and the procedure is extremely simple: don’t forget to enable EC2 rules too. Setting up Systems Manager is equally simple, thanks to the quick setup procedure: add managed instances and check for patch compliance in just a few clicks! Don’t forget to do this in all regions and accounts you want Explorer to manage.

If you set these services up later, you’ll have to wait a little while for data to be retrieved and displayed.

Now, let’s head out to the AWS console for Explorer.

Once I’ve completed the one-click setup page, which creates a service role and enables data sources, a quick look at the CloudWatch Events console confirms that rules have been created automatically.

Explorer recommends that I add regions and accounts in order to get a unified view. Of course, you can skip this step if you just want a quick taste of the service.

If you’re keen on synchronizing data, you can easily create a resource data sync, which will fetch operations data coming from other regions and other accounts. I’m going all in here, but please make sure you tick the boxes that work for you.

Once data has been retrieved and processed, my dashboard lights up. Good thing it’s only a test account!

I can also see information on all EC2 instances.

From here on, I can group OpsItems and instances according to different dimensions (accounts, regions, tags). I can also drill down on OpsItems, view their details in OpsCenter, and apply runbooks to fix them. If you want to know more about OpsCenter, here’s the launch post.

Now Available!
We believe AWS Systems Manager Explorer will help operations teams find and solve problems faster and more easily, no matter the scale of their AWS infrastructure.

This feature is available today in all regions where AWS Systems Manager OpsCenter is available. Give it a try, and please share your feedback in the AWS forum for AWS Systems Manager, or with your usual AWS support contacts.

Julien