NoSQL Workbench for Amazon DynamoDB – Available in Preview

I am always impressed by the flexibility of Amazon DynamoDB, providing our customers a fully-managed key-value and document database that can easily scale from a few requests per month to millions of requests per second.

The DynamoDB team has released so many great features recently, from on-demand capacity to support for native ACID transactions. Here’s a great recap of other recent DynamoDB announcements such as global tables, point-in-time recovery, and instant adaptive capacity. DynamoDB now encrypts all customer data at rest by default.

However, switching mindset from a relational database to NoSQL is not that easy. Last year we had two amazing talks at re:Invent that can help you understand how DynamoDB works, and how you can use it for your use cases:

To help you even further, we are introducing today in preview NoSQL Workbench for Amazon DynamoDB, a free, client-side application available for Windows and macOS to help you design and visualize your data model, run queries on your data, and generate the code for your application!

The three main capabilities provided by the NoSQL Workbench are:

  • Data modeler — to build new data models, adding tables and indexes, or to import, modify, and export existing data models.
  • Visualizer — to visualize data models based on their application’s access patterns, with sample data that you can add manually or import via a SQL query.
  • Operation builder — to define and execute data-plane operations or generate ready-to-use sample code for them.

To see how this new tool can simplify working with DynamoDB, let’s build an application to retrieve information on customers and their orders.

Using the NoSQL Workbench
In the Data modeler, I start by creating a CustomerOrders data model, and I add a table, CustomerAndOrders, to hold my customer data and the information on their orders. You can use this tool to create a simple data model where customers and orders are in two distinct tables, each one with its own primary key. There would be nothing wrong with that. Here I’d like to show how this tool can also help you use more advanced design patterns. By having the customer and order data in a single table, I can construct queries that return all the data I need with a single interaction with DynamoDB, speeding up the performance of my application.

As partition key, I use the customerId. This choice provides an even distribution of data across multiple partitions. The sort key in my data model will be an overloaded attribute, in the sense that it can hold different data depending on the item:

  • A fixed string, for example customer, for the items containing the customer data.
  • The order date, written using ISO 8601 strings such as 20190823, for the items containing orders.

By overloading the sort key with these two possible values, I am able to run a single query that returns the customer data and the most recent orders. For this reason, I use a generic name for the sort key. In this case, I use sk.

Apart from the partition key and the optional sort key, DynamoDB has a flexible schema, and the other attributes can be different for each item in a table. However, with this tool I have the option to describe in the data model all the possible attributes I am going to use for a table. In this way, I can check later that all the access patterns I need for my application work well with this data model.

For this table, I add the following attributes:

  • customerName and customerAddress, for the items in the table containing customer data.
  • orderId and deliveryAddress, for the items in the table containing order data.

I am not adding an orderDate attribute, because for this data model the value will be stored in the sk sort key. For a real production use case, you would probably have many more attributes to describe your customers and orders, but I am trying to keep things simple enough here to show what you can do, without getting lost in the details.

Another access pattern for my application is to be able to get a specific order by ID. For that, I add a global secondary index to my table, with orderId as partition key and no sort key.
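For reference, here is a rough sketch of how the same table and index could be created by hand with the AWS CLI. The table name, keys, and the index partition key come from the data model above; the index name, the string attribute types, and the on-demand billing mode are my own assumptions, since the Workbench takes care of these details when it commits the model.

# sketch: equivalent table plus sparse GSI, created manually instead of via the Workbench
aws dynamodb create-table \
    --table-name CustomerAndOrders \
    --attribute-definitions AttributeName=customerId,AttributeType=S AttributeName=sk,AttributeType=S AttributeName=orderId,AttributeType=S \
    --key-schema AttributeName=customerId,KeyType=HASH AttributeName=sk,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST \
    --global-secondary-indexes '[{"IndexName":"orderId-index","KeySchema":[{"AttributeName":"orderId","KeyType":"HASH"}],"Projection":{"ProjectionType":"ALL"}}]'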

I add the table definition to the data model, and move on to the Visualizer. There, I update the table by adding some sample data. I add data manually, but I could import a few rows from a table in a MySQL database, for example to simplify a NoSQL migration from a relational database.

Now, I visualize my data model with the sample data to have a better understanding of what to expect from this table. For example, if I select a customerId and query for all the orders greater than a specific date, I also get the customer data at the end, because the string customer, stored in the sk sort key, is always greater than any date written in ISO 8601 syntax.

In the Visualizer, I can also see how the global secondary index on the orderId works. Interestingly, items without an orderId are not part of this index, so I get only 4 of the 6 items that are part of my sample data. This happens because DynamoDB writes a corresponding index entry only if the index key value is present in the item. If an index key doesn’t appear in every table item, the index is said to be sparse. Sparse indexes are useful for queries over a subsection of a table.

I now commit my data model to DynamoDB. This step creates server-side resources such as tables and global secondary indexes for the selected data model, and loads the sample data. To do so, I need AWS credentials for an AWS account. I have the AWS Command Line Interface (CLI) installed and configured in the environment where I am using this tool, so I can just select one of my named profiles.

I move to the Operation builder, where I see all the tables in the selected AWS Region. I select the newly created CustomerAndOrders table to browse the data and build the code for the operations I need in my application.

In this case, I want to run a query that, for a specific customer, selects all orders more recent than a date I provide. As we saw previously, the overloaded sort key also returns the customer data as the last item. The Operation builder can help you use the full syntax of DynamoDB operations, for example adding conditions and child expressions. In this case, I add a condition to only return orders where the deliveryAddress contains Seattle.
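The Operation builder generates this request for you, but it may help to see roughly what it looks like. Here is a sketch of the equivalent AWS CLI call; the table name, key attributes, and the Seattle filter come from the walkthrough above, while the customer ID and date values are placeholders I made up.

# sketch: orders after a given date for one customer, filtered on deliveryAddress
aws dynamodb query \
    --table-name CustomerAndOrders \
    --key-condition-expression "customerId = :c AND sk > :d" \
    --filter-expression "contains(deliveryAddress, :city)" \
    --expression-attribute-values '{":c":{"S":"123"},":d":{"S":"20190801"},":city":{"S":"Seattle"}}'

The key condition on sk returns both the orders after the given date and the customer item, because the string customer sorts after any ISO 8601 date.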

I have the option to execute the operation on the DynamoDB table, but this time I want to use the query in my application. To generate the code, I select between Python, JavaScript (Node.js), or Java.

You can use the Operation builder to generate the code for all the access patterns that you plan to use with your application, using all the advanced features that DynamoDB provides, including ACID transactions.

Available Now
You can find how to set up NoSQL Workbench for Amazon DynamoDB (Preview) for Windows and macOS here.

We welcome your suggestions in the DynamoDB discussion forum. Let us know what you build with this new tool and how we can help you more!

Learn From Your VPC Flow Logs With Additional Meta-Data

Flow Logs for Amazon Virtual Private Cloud enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow Logs data can be published to Amazon CloudWatch Logs or Amazon Simple Storage Service (S3).

Since we launched VPC Flow Logs in 2015, you have been using it for a variety of use cases, such as troubleshooting connectivity issues across your VPCs, intrusion detection, anomaly detection, or archival for compliance purposes. Until today, VPC Flow Logs provided information that included source IP, source port, destination IP, destination port, action (accept, reject), and status. Once enabled, a VPC Flow Log entry looks like the one below.

While this information was sufficient to understand most flows, it required additional computation and lookup to match IP addresses to instance IDs or to guess the directionality of the flow to come to meaningful conclusions.

Today we are announcing the availability of additional metadata that you can include in your Flow Logs records to better understand network flows. The enriched Flow Logs allow you to simplify your scripts or remove the need for post-processing altogether, by reducing the number of computations or lookups required to extract meaningful information from the log data.

When you create a new VPC Flow Log, in addition to the existing fields, you can now choose to add the following metadata:

  • vpc-id : the ID of the VPC containing the source Elastic Network Interface (ENI).
  • subnet-id : the ID of the subnet containing the source ENI.
  • instance-id : the Amazon Elastic Compute Cloud (EC2) instance ID of the instance associated with the source interface. When the ENI is placed by AWS services (for example, AWS PrivateLink, NAT Gateway, Network Load Balancer, etc.) this field will be “-”.
  • tcp-flags : the bitmask for TCP flags observed within the aggregation period. For example, FIN is 0x01 (1), SYN is 0x02 (2), ACK is 0x10 (16), SYN + ACK is 0x12 (18), etc. (the bits are specified in the “Control Bits” section of RFC 793, “Transmission Control Protocol Specification”).
    This allows you to understand who initiated or terminated the connection. TCP uses a three-way handshake to establish a connection. The connecting machine sends a SYN packet to the destination, the destination replies with a SYN + ACK and, finally, the connecting machine sends an ACK. In the Flow Logs, the handshake is shown as two lines, with tcp-flags values of 2 (SYN) and 18 (SYN + ACK). ACK is reported only when it is accompanied with SYN (otherwise it would be too much noise for you to filter out).
  • type : the type of traffic: IPv4, IPv6, or Elastic Fabric Adapter.
  • pkt-srcaddr : the packet-level IP address of the source. You typically use this field in conjunction with srcaddr to distinguish the IP address of an intermediate layer through which traffic flows (such as a NAT gateway) from the original source.
  • pkt-dstaddr : the packet-level destination IP address; similar to the previous field, but for the destination.

To create a VPC Flow Log, you can use the AWS Management Console, the AWS Command Line Interface (CLI), or the CreateFlowLogs API, and select which additional fields you want to include and the order in which you want to consume them, for example:

Or using the AWS Command Line Interface (CLI) as below:

$ aws ec2 create-flow-logs --resource-type VPC \
                            --region eu-west-1 \
                            --resource-ids vpc-12345678 \
                            --traffic-type ALL \
                            --log-destination-type s3 \
                            --log-destination arn:aws:s3:::sst-vpc-demo \
                            --log-format '${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} ${account-id} ${type} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${pkt-srcaddr} ${pkt-dstaddr} ${protocol} ${bytes} ${packets} ${start} ${end} ${action} ${tcp-flags} ${log-status}'

# be sure to replace the bucket name and VPC ID !

{
    "ClientToken": "1A....HoP=",
    "FlowLogIds": [
        "fl-12345678123456789"
    ],
    "Unsuccessful": [] 
}

Enriched VPC Flow Logs are delivered to S3. We will automatically add the required S3 bucket policy to authorize VPC Flow Logs to write to your S3 bucket. VPC Flow Logs does not capture real-time log streams for your network interface; it might take several minutes to begin collecting and publishing data to the chosen destinations. Your logs will eventually be available on S3 at s3://<bucket name>/AWSLogs/<account id>/vpcflowlogs/<region>/<year>/<month>/<day>/

An SSH connection from my laptop with IP address 90.90.0.200 to an EC2 instance appears like this:

3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 1566328660 1566328672 ACCEPT 18 OK
3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 90.90.0.200 172.31.22.145 62897 22 90.90.0.200 172.31.22.145 6 4877 29 1566328660 1566328672 ACCEPT 2 OK

172.31.22.145 is the private IP address of the EC2 instance, the one you see when you type ifconfig on the instance. All flags are OR-ed during the aggregation period. When a connection is short-lived, both SYN and FIN (3), as well as SYN + ACK and FIN (19), may be set on the same lines.
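Because the flags are now part of the record, a couple of shell commands are enough to see which side initiated each connection. Here is a rough sketch, assuming the custom field order used in the create-flow-logs command above (tcp-flags is the 20th field, srcaddr and dstaddr the 8th and 9th); the account ID and date in the S3 path are placeholders to replace with your own.

# sketch: download one day of enriched logs and list flows whose flags are exactly SYN (2),
# i.e. the initiating side; short-lived flows may instead show 3 (SYN + FIN), as noted above
aws s3 cp "s3://sst-vpc-demo/AWSLogs/<account id>/vpcflowlogs/eu-west-1/<year>/<month>/<day>/" . --recursive
gunzip -c *.log.gz | awk '$20 == 2 {print $8, "->", $9}'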

Once a Flow Log is created, you cannot add additional fields or modify the structure of the log; this ensures you will not accidentally break scripts consuming this data. Any modification requires you to delete and recreate the VPC Flow Log. There is no additional cost to capture the extra information in the VPC Flow Logs; normal VPC Flow Logs pricing applies. Remember that enriched VPC Flow Log records might consume more storage when you select all fields, so we recommend selecting only the fields relevant to your use cases.

Enriched VPC Flow Logs are available in all regions where VPC Flow Logs is available; you can start to use them today.

— seb

PS: I heard from the team they are working on adding additional meta-data to the logs, stay tuned for updates.

Now Available – Amazon Quantum Ledger Database (QLDB)

Given the wide range of data types, query models, indexing options, scaling expectations, and performance requirements, databases are definitely not one size fits all products. That’s why there are many different AWS database offerings, each one purpose-built to meet the needs of a different type of application.

Introducing QLDB
Today I would like to tell you about Amazon QLDB, the newest member of the AWS database family. First announced at AWS re:Invent 2018 and made available in preview form, it is now available in production form in five AWS regions.

As a ledger database, QLDB is designed to provide an authoritative data source (often known as a system of record) for stored data. It maintains a complete, immutable history of all committed changes to the data that cannot be updated, altered, or deleted. QLDB supports PartiQL SQL queries against the historical data, and also provides an API that allows you to cryptographically verify that the history is accurate and legitimate. These features make QLDB a great fit for banking & finance, ecommerce, transportation & logistics, HR & payroll, manufacturing, and government applications, as well as many other use cases that need to maintain the integrity and history of stored data.

Important QLDB Concepts
Let’s review the most important QLDB concepts before diving in:

Ledger – A QLDB ledger consists of a set of QLDB tables and a journal that maintains the complete, immutable history of changes to the tables. Ledgers are named and can be tagged.

Journal – A journal consists of a sequence of blocks, each cryptographically chained to the previous block so that changes can be verified. Blocks, in turn, contain the actual changes that were made to the tables, indexed for efficient retrieval. This append-only model ensures that previous data cannot be edited or deleted, and makes the ledgers immutable. QLDB allows you to export all or part of a journal to S3.

Table – Tables exist within a ledger, and contain a collection of document revisions. Tables support optional indexes on document fields; the indexes can improve performance for queries that make use of the equality (=) predicate.

Documents – Documents exist within tables, and must be in Amazon Ion form. Ion is a superset of JSON that adds additional data types, type annotations, and comments. QLDB supports documents that contain nested JSON elements, and gives you the ability to write queries that reference and include these elements. Documents need not conform to any particular schema, giving you the flexibility to build applications that can easily adapt to changes.

PartiQL – PartiQL is a new open standard query language that supports SQL-compatible access to relational, semi-structured, and nested data while remaining independent of any particular data source. To learn more, read Announcing PartiQL: One Query Language for All Your Data.

Serverless – You don’t have to worry about provisioning capacity or configuring read & write throughput. You create a ledger, define your tables, and QLDB will automatically scale to meet the needs of your application.

Using QLDB
You can create QLDB ledgers and tables from the AWS Management Console, AWS Command Line Interface (CLI), a CloudFormation template, or by making calls to the QLDB API. I’ll use the QLDB Console and I will follow the steps in Getting Started with Amazon QLDB. I open the console and click Start tutorial to get started:
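(As an aside, the CLI route is a one-liner. The sketch below shows the equivalent call, assuming ALLOW_ALL, the only permissions mode available at launch; the ledger name matches the one created in the console walkthrough that follows.)

# sketch: create the same ledger from the AWS CLI instead of the console
aws qldb create-ledger --name vehicle-registration --permissions-mode ALLOW_ALL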

The Getting Started page outlines the first three steps; I click Create ledger to proceed (this opens in a fresh browser tab):

I enter a name for my ledger (vehicle-registration), tag it, and (again) click Create ledger to proceed:

My ledger starts out in Creating status, and transitions to Active within a minute or two:

I return to the Getting Started page, refresh the list of ledgers, choose my new ledger, and click Load sample data:

This takes a second or so, and creates four tables & six indexes:

I could also use PartiQL statements such as CREATE TABLE, CREATE INDEX, and INSERT INTO to accomplish the same task.

With my tables, indexes, and sample data loaded, I click on Editor and run my first query (a single-table SELECT):

This returns a single row, and also benefits from the index on the VIN field. I can also run a more complex query that joins two tables:

I can obtain the ID of a document (using a query from here), and then update the document:

I can query the modification history of a table or a specific document in a table, with the ability to find modifications within a certain range and on a particular document (read Querying Revision History to learn more). Here’s a simple query that returns the history of modifications to all of the documents in the VehicleRegistration table that were made on the day that I wrote this post:

As you can see, each row is a structured JSON object. I can select any desired rows and click View JSON for further inspection:

Earlier, I mentioned that PartiQL can deal with nested data. The VehicleRegistration table contains ownership information that looks like this:

{
   "Owners":{
      "PrimaryOwner":{
         "PersonId":"6bs0SQs1QFx7qN1gL2SE5G"
      },
      "SecondaryOwners":[

      ]
   }
}

PartiQL lets me reference the nested data using “.” notation:

I can also verify the integrity of a document that is stored within my ledger’s journal. This is fully described in Verify a Document in a Ledger, and is a great example of the power (and value) of cryptographic verification. Each QLDB ledger has an associated digest. The digest is a 256-bit hash value that uniquely represents the ledger’s entire history of document revisions as of a point in time. To access the digest, I select a ledger and click Get digest:

When I click Save, the console provides me with a short file that contains all of the information needed to verify the ledger. I save this file in a safe place, for use when I want to verify a document in the ledger. When that time comes, I get the file, click on Verification in the left-navigation, and enter the values needed to perform the verification. This includes the block address of a document revision, and the ID of the document. I also choose the digest that I saved earlier, and click Verify:

QLDB recomputes the hashes to ensure that the document has not been surreptitiously changed, and displays the verification:

In a production environment, you would use the QLDB APIs to periodically download digests and to verify the integrity of your documents.
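In a script, that periodic download could be as simple as the following CLI sketch (the ledger name is the one from this walkthrough); you would store the returned digest and tip address somewhere safe for later verification.

# sketch: fetch the current digest for the vehicle-registration ledger
aws qldb get-digest --name vehicle-registration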

Building Applications with QLDB
You can use the Amazon QLDB Driver for Java to write code that accesses and manipulates your ledger database. This is a Java driver that allows you to create sessions, execute PartiQL commands within the scope of a transaction, and retrieve results. Drivers for other languages are in the works; stay tuned for more information.

Available Now
Amazon QLDB is available now in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) Regions. Pricing is based on the following factors, and is detailed on the Amazon QLDB Pricing page, including some real-world examples:

  • Write operations
  • Read operations
  • Journal storage
  • Indexed storage
  • Data transfer

Jeff;

Introducing the Newest AWS Heroes – September 2019

Leaders within the AWS technical community educate others about the latest AWS services in a variety of ways: some share knowledge in person by speaking at events or running workshops and Meetups, while others prefer to share their insights online via social media, blogs, or open source contributions.

The most prominent AWS community leaders worldwide are recognized as AWS Heroes, and today we are excited to introduce to you the latest members of the AWS Hero program:

Alex Schultz – Fort Wayne, USA

Machine Learning Hero Alex Schultz works in the Innovation Labs at Advanced Solutions where he develops machine learning enabled products and solutions for the biomedical and product distribution industries. After receiving a DeepLens at re:Invent 2017, he dove headfirst into machine learning where he used the device to win the AWS DeepLens challenge by building a project which can read books to children. As an active advocate for AWS, he runs the Fort Wayne AWS User Group and loves to share his knowledge and experience with other developers. He also regularly contributes to the online DeepRacer community where he has helped many people who are new to machine learning get started.

Chase Douglas – Portland, USA

Serverless Hero Chase Douglas is the CTO and co-founder at Stackery, where he steers engineering and technical architecture of development tools that enable individuals and teams of developers to successfully build and manage serverless applications. He is a deeply experienced software architect and a long-time engineering leader focused on building products that increase development efficiency while delighting users. Chase is also a frequent conference speaker on the topics of serverless, instrumentation, and software patterns. Most recently, he discussed the serverless development-to-production pipeline at the Chicago and New York AWS Summits, and provided insight into how the future of serverless may be functionless in his blog series.

Chris Williams – Portsmouth, USA

Community Hero Chris Williams is an Enterprise Cloud Consultant for GreenPages Technology Solutions—a digital transformation and cloud enablement company. There he helps customers design and deploy the next generation of public, private, and hybrid cloud solutions, specializing in AWS and VMware. Chris blogs about virtualization, technology, and design at Mistwire. He is an active community leader, co-organizing the AWS Portsmouth User Group, and both hosts and presents on vBrownBag.

Dave Stauffacher – Milwaukee, USA

Community Hero Dave Stauffacher is a Principal Platform Engineer focused on cloud engineering at Direct Supply where he has helped navigate a 30,000% data growth over the last 15 years. In his current role, Dave is focused on helping drive Direct Supply’s cloud migration, combining his storage background with cloud automation and standardization practices. Dave has published his automation work for deploying AWS Storage Gateway for use with SQL Server data protection. He is a participant in the Milwaukee AWS User Group and the Milwaukee Docker User Group and has showcased his cloud experience in presentations at the AWS Midwest Community Day, AWS re:Invent, HashiConf, the Milwaukee Big Data User Group and other industry events.

Gojko Adzic – London, United Kingdom

Serverless Hero Gojko Adzic is a partner at Neuri Consulting LLP and a co-founder of MindMup, a collaborative mind mapping application that has been running on AWS Lambda since 2016. He is the author of the book Running Serverless and co-author of Serverless Computing: Economic and Architectural Impact, one of the first academic papers about AWS Lambda. He is also a key contributor to Claudia.js, an open-source tool that simplifies Lambda application deployment, and is a minor contributor to the AWS SAM CLI. Gojko frequently blogs about serverless application development on Serverless.Pub and his personal blog, and he has authored numerous other books.

Liz Rice – Enfield, United Kingdom

Container Hero Liz Rice is VP Open Source Engineering with cloud native security specialists Aqua Security, where she and her team look after several container-related open source projects. She is chair of the CNCF’s Technical Oversight Committee, and was Co-Chair of the KubeCon + CloudNativeCon 2018 events in Copenhagen, Shanghai and Seattle. She is a regular speaker at conferences including re:Invent, Velocity, DockerCon and many more. Her talks usually feature live coding or demos, and she is known for making complex technical concepts accessible.

Lyndon Leggate – London, United Kingdom

Machine Learning Hero Lyndon Leggate is a senior technology leader with extensive experience of defining and delivering complex technical solutions on large, business critical projects for consumer facing brands. Lyndon is a keen participant in the AWS DeepRacer league. Racing as Etaggel, he has regularly positioned in the top 10, features in DeepRacer TV and in May 2019 established the AWS DeepRacer Community. This vibrant and rapidly growing community provides a space for new and experienced racers to seek advice and share tips. The Community has gone on to expand the DeepRacer toolsets, making the platform more accessible and pushing the bounds of the technology. He also organises the AWS DeepRacer London Meetup series.

Maciej Lelusz – Cracow, Poland

Community Hero Maciej Lelusz is Co-Founder of Chaos Gears, a company focused on serverless, automation, IaC, and chaos engineering as a way to improve system resiliency. He is focused on community development, blogging, company management, and public/hybrid/private cloud design. He cares about enterprise technology, IT transformation, its culture, and the people involved in it. Maciej is Co-Leader of the AWS User Group Poland – Cracow Chapter, and the Founder and Leader of the InfraXstructure conference and the Polish VMware User Group.

Nathan Glover – Perth, Australia

Community Hero Nathan Glover is a DevOps Consultant at Mechanical Rock in Perth, Western Australia. Prior to that he worked as a Hardware Systems Designer and Embedded Developer in the IoT space. He is passionate about Cloud Native architecture and loves sharing his successes and failures on his blog. A key focus for him is breaking down the learning barrier by building practical examples using cloud services. On top of these, he has a number of online courses teaching people how to get started building with Amazon Alexa Skills and AWS IoT. In his spare time, he loves to dabble in all areas of technology: building cloud-connected toasters and embedded vehicle-tracking systems, and competing in online capture-the-flag security events.

Prashanth HN – Bengaluru, India

Serverless Hero Prashanth HN is the Chief Technology Officer at WheelsBox and one of the community leaders of the AWS Users Group, Bengaluru. He mentors and consults other startups to embrace a serverless approach, frequently blogs about serverless topics for all skill levels including topics for beginners and advanced users on his personal blog and Amplify-related topics on the AWS Amplify Community Blog, and delivers talks about building using microservices and serverless. In a recent talk, he demonstrated how microservices patterns can be implemented using serverless. Prashanth maintains the open-source project Lanyard, a serverless agenda app for event organizers and attendees, which was well received at AWS Community Day India.

Ran Ribenzaft – Tel Aviv, Israel

Serverless Hero Ran Ribenzaft is the Chief Technology Officer at Epsagon, an AWS Advanced Technology Partner that specializes in monitoring and tracing for serverless applications. Ran is a passionate developer that loves sharing open-source tools to make everyone’s lives easier and writing technical blog posts on the topics of serverless, microservices, cloud, and AWS on Medium and the Epsagon blog. Ran is also dedicated to educating and growing the community around serverless, organizing Serverless meetups in SF and TLV, delivering online webinars and workshops, and frequently giving talks at conferences.

Rolf Koski – Tampere, Finland

Community Hero Rolf Koski works at Cybercom, an AWS Premier Partner from the Nordics headquartered in Sweden, where he is the CTO of the Cybercom AWS Business Group. In his role he is both hands-on technical and a thought leader in the cloud. Rolf has been one of the leading figures in the Nordic AWS communities, as one of the community leads for the Helsinki and Stockholm user groups, and he initially founded and organized the first ever AWS Community Days Nordics. Rolf is professionally certified and additionally works as a Well-Architected Lead, performing Well-Architected Reviews for customer workloads.

Learn more about AWS Heroes and connect with a Hero near you by checking out the Hero website.

Optimize Storage Cost with Reduced Pricing for Amazon EFS Infrequent Access

Today we are announcing a new price reduction – one of the largest in AWS Cloud history to date – when using Infrequent Access (IA) with Lifecycle Management with Amazon Elastic File System. This price reduction makes it possible to optimize cost even further and automatically save up to 92% on file storage costs as your access patterns change. With this new reduced pricing you can now store and access your files natively in a file system for effectively $0.08/GB-month, as we’ll see in an example later in this post.

Amazon Elastic File System (EFS) is a low-cost, simple to use, fully managed, and cloud-native NFS file system for Linux-based workloads that can be used with AWS services and on-premises resources. EFS provides elastic storage, growing and shrinking automatically as files are created or deleted – even to petabyte scale – without disruption. Your applications always have the storage they need immediately available. EFS also includes, for free, multi-AZ availability and durability right out of the box, with strong file system consistency.

Easy Cost Optimization using Lifecycle Management
As storage grows, the likelihood that a given application needs access to all of the files all of the time lessens, and access patterns can also change over time. Industry analysts such as IDC, and our own analysis of usage patterns, confirm that around 80% of data is not accessed very often. The remaining 20% is in active use. Two common drivers for moving applications to the cloud are to maximize operational efficiency and to reduce the total cost of ownership, and this applies equally to storage costs. Instead of keeping all of the data on hand on the fastest performing storage, it may make sense to move infrequently accessed data into a different storage class/tier, with an associated cost reduction. Identifying this data manually can be a burden, so it’s also ideal to have the system monitor access over time and perform the movement of data between storage tiers automatically, again without disruption to your running applications.

EFS Infrequent Access (IA) with Lifecycle Management provides an easy to use, cost-optimized price and performance tier suitable for files that are not accessed regularly. With the new price reduction announced today builders can now save up to 92% on their file storage costs compared to EFS Standard. EFS Lifecycle Management is easy to enable and runs automatically behind the scenes. When enabled on a file system, files not accessed according to the lifecycle policy you choose will be moved automatically to the cost-optimized EFS IA storage class. This movement is transparent to your application.

Although the infrequently accessed data is held in a different storage class/tier, it’s still immediately accessible. This is one of the advantages of EFS IA – you don’t have to sacrifice any of EFS’s benefits to get the cost savings. Your files are still immediately accessible, all within the same file system namespace. The only tradeoff is slightly higher per-operation latency (double digit ms vs single digit ms – think magnetic vs SSD) for the files in the IA storage class/tier.

As an example of the cost optimization EFS IA provides let’s look at storage costs for 100 terabytes (100TB) of data. The EFS Standard storage class is currently charged at $0.30/GB-month. When it was launched in July the EFS IA storage class was priced at $0.045/GB-month. It’s now been reduced to $0.025/GB-month. As I noted earlier, this is one of the largest price drops in the history of AWS to date!

Using the 20/80 access statistic mentioned earlier for EFS IA:

  • 20% of 100TB = 20TB at $0.30/GB-month = $0.30 x 20 x 1,000 = $6,000
  • 80% of 100TB = 80TB at $0.025/GB-month = $0.025 x 80 x 1,000 = $2,000
  • Total for 100TB = $8,000/month or $0.08/GB-month. Remember, this price also includes (for free) multi-AZ, full elasticity, and strong file system consistency.

Compare this to using only EFS Standard, storing 100% of the data in that storage class: the cost would be $0.30 x 100 x 1,000 = $30,000/month. A saving of $22,000/month is significant, and it’s so easy to enable. Remember too that you have control over the lifecycle policy, specifying how frequently data is moved to the IA storage tier.

Getting Started with Infrequent Access (IA) Lifecycle Management
From the EFS Console I can quickly get started in creating a file system by choosing an Amazon Virtual Private Cloud and the subnets in the Virtual Private Cloud where I want to expose mount targets for my instances to connect to.

In the next step I can configure options for the new volume. This is where I select the Lifecycle policy I want to apply to enable use of the EFS IA storage class. Here I am going to enable files that have not been accessed for 14 days to be moved to the IA tier automatically.

In the final step I simply review my settings and then click Create File System to create the volume. Easy!

A Lifecycle Management policy can also be enabled, or changed, for existing volumes. Navigating to the volume in the EFS Console I can view the applied policy, if any. Here I’ve selected an existing file system that has no policy attached and therefore is not benefiting from EFS IA.

Clicking the pencil icon to the right of the field takes me to a dialog box where I can select the appropriate Lifecycle policy, just as I did when creating a new volume.
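If you prefer to script this, the same policy can be applied from the AWS CLI. Here is a minimal sketch that enables the 14-day policy used above; the file system ID is a placeholder to replace with your own.

# sketch: move files not accessed for 14 days to the IA storage class
aws efs put-lifecycle-configuration \
    --file-system-id fs-12345678 \
    --lifecycle-policies TransitionToIA=AFTER_14_DAYS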

Amazon Elastic File System IA with Lifecycle Management is available now in all regions where Elastic File System is present.

— Steve

Operational Insights for Containers and Containerized Applications

The increasing adoption of containerized applications and microservices also brings an increased burden for monitoring and management. Builders have an expectation of, and requirement for, the same level of monitoring as would be used with longer-lived infrastructure such as Amazon Elastic Compute Cloud (EC2) instances. By contrast, containers are relatively short-lived, and usually subject to continuous deployment. This can make it difficult to reliably collect monitoring data and to analyze performance or other issues, which in turn affects remediation time. In addition, builders have to resort to a disparate collection of tools to perform this analysis and inspection, manually correlating context across a set of infrastructure and application metrics, logs, and other traces.

Announcing general availability of Amazon CloudWatch Container Insights
At the AWS Summit in New York this past July, Amazon CloudWatch Container Insights support for Amazon ECS and AWS Fargate was announced as an open preview for new clusters. Starting today, Container Insights is generally available, with the added ability to also monitor existing clusters. Immediate insights into compute utilization and failures for both new and existing cluster infrastructure and containerized applications can be easily obtained from container management services including Kubernetes, Amazon Elastic Container Service for Kubernetes, Amazon ECS, and AWS Fargate.

Once enabled, Amazon CloudWatch discovers all of the running containers in a cluster and collects performance and operational data at every layer in the container stack. It also continuously monitors and updates as changes occur in the environment, reducing the number of tools required to collect, monitor, act on, and analyze container metrics and logs, and giving complete end-to-end visibility.

Being able to easily access this data means customers can shift focus to increased developer productivity and away from building mechanisms to curate and build dashboards.

Getting started with Amazon CloudWatch Container Insights
I can enable Container Insights by following the instructions in the documentation. Once enabled and new clusters launched, when I visit the CloudWatch console for my region I see a new option for Container Insights in the list of dashboards available to me.

Clicking this takes me to the relevant dashboard where I can select the container management service that hosts the clusters that I want to observe.

In the image below I have selected to view metrics for my ECS clusters that are hosting a sample application I have deployed in AWS Fargate. I can examine the metrics for standard time periods such as 1 hour, 3 hours, etc., but can also specify custom time periods. Here I am looking at the metrics for a custom time period of the past 15 minutes.

You can see that I can quickly gain operational oversight of the overall performance of the cluster. Clicking the cluster name takes me deeper to view the metrics for the tasks inside the cluster.

Selecting a container allows me to then dive into either AWS X-Ray traces or performance logs.

Selecting performance logs takes me to the Amazon CloudWatch Logs Insights page where I can run queries against the performance events collected for my container ecosystem (e.g., Container, Task/Pod, Cluster, etc.) that I can then use to troubleshoot and dive deeper.
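The same performance events can also be queried programmatically with CloudWatch Logs Insights. The sketch below assumes an ECS cluster; the log group naming convention and the field names (Type, TaskId, CpuUtilized, MemoryUtilized) are taken from the sample events I see, so treat them as assumptions and adjust them to match the events emitted for your own cluster.

# sketch: task-level CPU and memory over the last 15 minutes for a cluster named my-cluster
QUERY_ID=$(aws logs start-query \
    --log-group-name "/aws/ecs/containerinsights/my-cluster/performance" \
    --start-time $(($(date +%s) - 900)) --end-time $(date +%s) \
    --query-string 'fields @timestamp, TaskId, CpuUtilized, MemoryUtilized | filter Type = "Task" | sort @timestamp desc | limit 20' \
    --query queryId --output text)
aws logs get-query-results --query-id $QUERY_ID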

Container Insights makes it easy for me to get started monitoring my containers and enables me to quickly drill down into performance metrics and log analytics without the need to build custom dashboards to curate data from multiple tools. Beyond monitoring and troubleshooting I can also use the data and dashboards Container Insights provides me to support other use cases such as capacity requirements and planning, by helping me understand compute utilization by Pod/Task, Container, and Service for example.

Availability
Amazon CloudWatch Container Insights is generally available today to customers in all public AWS regions where Amazon Elastic Container Service for Kubernetes, Kubernetes, Amazon ECS, and AWS Fargate are present.

— Steve

New – Port Forwarding Using AWS Systems Manager Session Manager

I increasingly see customers adopting the immutable infrastructure architecture pattern: they rebuild and redeploy an entire infrastructure for each update. They very rarely connect to servers over SSH or RDP to update configuration or to deploy software updates. However, when migrating existing applications to the cloud, it is common to connect to your Amazon Elastic Compute Cloud (EC2) instances to perform a variety of management or operational tasks. To reduce the attack surface, AWS recommends using a bastion host, also known as a jump host. This special-purpose EC2 instance is designed to be the primary access point from the Internet and acts as a proxy to your other EC2 instances. To connect to your EC2 instance, you first SSH / RDP into the bastion host and, from there, to the destination EC2 instance.

To further reduce the attack surface, as well as the operational burden and additional costs of managing bastion hosts, AWS Systems Manager Session Manager allows you to securely connect to your EC2 instances without the need to run and operate your own bastion hosts, and without the need to run SSH on your EC2 instances. When the Systems Manager Agent is installed on your instances and you have IAM permissions to call the Systems Manager API, you can use the AWS Management Console or the AWS Command Line Interface (CLI) to securely connect to your Linux or Windows EC2 instances.

An interactive shell on EC2 instances is not the only use case for SSH. Many customers also use SSH tunnels to remotely access services not exposed to the public internet. SSH tunneling is a powerful but lesser-known feature of SSH that allows you to create a secure tunnel between a local host and a remote service. Let’s imagine I am running a web server for easy private file transfer between an EC2 instance and my laptop. These files are private, and I do not want anybody else to access that web server, so I configure my web server to bind only on 127.0.0.1 and I do not add port 80 to the instance’s security group. Only local processes can access the web server. To access the web server from my laptop, I create an SSH tunnel between my laptop and the web server, as shown below:

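(The original post shows this command as a screenshot; reconstructed from the description that follows, it would look roughly like this, with instance standing in for the instance’s address.)

ssh -L 9999:localhost:80 ec2-user@instance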
This command tells SSH to connect to instance as user ec2-user, open port 9999 on my local laptop, and forward everything from there to localhost:80 on instance. When the tunnel is established, I can point my browser at http://localhost:9999 to connect to my private web server on port 80.

Today, we are announcing Port Forwarding for AWS Systems Manager Session Manager. Port Forwarding allows you to securely create tunnels to your instances deployed in private subnets, without the need to start the SSH service on the server, to open the SSH port in the security group, or to use a bastion host.

Similar to SSH tunnels, Port Forwarding allows you to forward traffic between your laptop and open ports on your instance. Once port forwarding is configured, you can connect to the local port and access the server application running inside the instance. Systems Manager Session Manager’s Port Forwarding use is controlled through IAM policies on API access and the Port Forwarding SSM document. These are two different places where you can control who in your organisation is authorised to create tunnels.

To experiment with Port Forwarding today, you can use this CDK script to deploy a VPC with private and public subnets, and a single instance running a web server in the private subnet. The drawing below illustrates the infrastructure that I am using for this blog post.

The instance is private: it has neither a public IP address nor a public DNS name. The VPC default security group does not authorise connection over SSH. The Systems Manager Agent, running on your EC2 instance, must be able to communicate with the Systems Manager service endpoint. The private subnet must therefore have a route to a NAT Gateway, or you must configure an AWS PrivateLink endpoint to allow this.

Let’s use Systems Manager Session Manager Port Forwarding to access the web server running on this private instance.

Before doing so, you must ensure the following prerequisites are met on the EC2 instance:

  • The Systems Manager Agent must be installed and running (version 2.3.672.0 or more recent; see instructions for Linux or Windows). The agent is installed and started by default on the Amazon Linux 1 & 2, Windows, and Ubuntu AMIs provided by Amazon (see this page for the exact versions), so no action is required when you are using these.
  • The EC2 instance must have an IAM role with permission to invoke the Systems Manager API. For this example, I am using the AmazonSSMManagedInstanceCore managed policy.

On your laptop, you must have the AWS Command Line Interface (CLI) installed, along with the Session Manager plugin for the AWS CLI.

Once the prerequisites are met, you use the AWS Command Line Interface (CLI) to create the tunnel (assuming you started the instance using the CDK script mentioned above):

# find the instance ID based on Tag Name
INSTANCE_ID=$(aws ec2 describe-instances \
               --filter "Name=tag:Name,Values=CodeStack/NewsBlogInstance" \
               --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" \
               --output text)
# create the port forwarding tunnel
aws ssm start-session --target $INSTANCE_ID \
                       --document-name AWS-StartPortForwardingSession \
                       --parameters '{"portNumber":["80"],"localPortNumber":["9999"]}'

Starting session with SessionId: sst-00xxx63
Port 9999 opened for sessionId sst-00xxx63
Connection accepted for session sst-00xxx63.

You can now point your browser to port 9999 and access your private web server. Type ctrl-c to terminate the port forwarding session.

The Session Manager Port Forwarding creates a tunnel similar to SSH tunneling, as illustrated below.

Port Forwarding works for Windows and Linux instances. It is available in every public AWS region today, at no additional cost when connecting to EC2 instances; you will be charged for the outgoing bandwidth from the NAT Gateway or your VPC PrivateLink endpoint.

— seb

Amazon Transcribe Now Supports Mandarin and Russian

As speech is central to human interaction, artificial intelligence research has long focused on speech recognition, the first step in designing and building systems allowing humans to interact intuitively with machines. The diversity in languages, accents and voices makes this an incredibly difficult problem, requiring expert skills, extremely large data sets, and vast amounts of computing power to train efficient models.

In order to help organizations and developers use speech recognition in their applications, we launched Amazon Transcribe at AWS re:Invent 2017, an automatic speech recognition service. Thanks to Amazon Transcribe, customers such as VideoPeel, Echo360, or GE Appliances have been able to quickly and easily add speech recognition capabilities to their applications and devices.

A single API call is all that it takes… and you don’t need to know the first thing about machine learning. You can analyze audio files stored in Amazon Simple Storage Service (S3) and have the service return a text file of the transcribed speech. You can also send a live audio stream to Amazon Transcribe and receive a stream of transcripts in real time.

Since launch, the team has constantly added new languages, and today we are happy to announce support for Mandarin and Russian, bringing the total number of supported languages to 16.

Introducing Mandarin
Working with Amazon Transcribe is extremely simple: let me show you how to get started in just a few minutes.

Let’s try Mandarin first. Starting from this Little Red Riding Hood video, I extracted the audio track, saved it in MP3 format, and uploaded it to one of my Amazon Simple Storage Service (S3) buckets. Here’s the actual file.

Then, I started a transcription job using the AWS CLI:

$ aws transcribe start-transcription-job --media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/little_red_riding_hood-mandarin.mp3 --media-format mp3 --language-code zh-CN --transcription-job-name little_red_riding_hood-mandarin

After a few minutes, the job is complete. Looking at the AWS console, I can either download it using the URL provided by Amazon Transcribe, or read it directly.
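If you prefer to script this step, the transcript location can also be retrieved from the CLI; here is a minimal sketch for the job started above.

# sketch: print the URL of the completed transcript
aws transcribe get-transcription-job \
    --transcription-job-name little_red_riding_hood-mandarin \
    --query 'TranscriptionJob.Transcript.TranscriptFileUri' --output text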

Unfortunately, I don’t speak Mandarin, but using Amazon Translate, this text is about a sick grandmother and a big bad wolf, so it looks like Amazon Transcribe did its job!

Introducing Russian
Let’s try Russian now, using the dialogue in this short video.

Здравствуйте! Greetings!
Добрый день! Good day!
Давайте познакомимся. Меня зовут Слава. Let’s introduce ourselves. My name is Slava.
Очень приятно, а меня – Наташа. Nice to meet you, and mine – Natasha.
Наташа, кто вы по профессии? Natasha, what is your profession?
Я врач. А вы? I (am a) doctor. And you?
Я инженер. I (am an) engineer.

This time, I will ask Amazon Transcribe to perform speaker identification too.

$ aws transcribe start-transcription-job --media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/russian-dialogue.mp3 --media-format mp3 --language-code ru-RU --transcription-job-name russian_dialogue --settings ShowSpeakerLabels=true,MaxSpeakerLabels=2

Here is the result.

As you can see, not only has Amazon Transcribe faithfully converted speech to text, it has also correctly assigned each sentence to the correct speaker.
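If you want to work with the speaker labels programmatically, the transcript JSON can be fetched and filtered with standard tools. This sketch assumes the speaker_labels structure I see in the output at the time of writing (segments with speaker_label, start_time, and end_time fields); treat the field names as assumptions.

# sketch: list who spoke when, straight from the transcript JSON
curl -s "$(aws transcribe get-transcription-job --transcription-job-name russian_dialogue \
    --query 'TranscriptionJob.Transcript.TranscriptFileUri' --output text)" \
  | jq -r '.results.speaker_labels.segments[] | "\(.speaker_label): \(.start_time)s -> \(.end_time)s"'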

Now Available!
You can start using these two new languages today in the following regions:

  • Americas: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), AWS GovCloud (US-West), Canada (Central), South America (Sao Paulo).
  • Europe: EU (Frankfurt), EU (Ireland), EU (London), EU (Paris).
  • Asia Pacific: Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney).

The free tier covers 60 minutes per month for the first 12 months, starting from your first transcription request.

As always, we’d love to hear your feedback: please post it to the AWS forum for Amazon Transcribe, or send it through your usual AWS contacts.

Julien;

Amazon Forecast – Now Generally Available

Getting accurate time series forecasts from historical data is not an easy task. Last year at re:Invent we introduced Amazon Forecast, a fully managed service that requires no experience in machine learning to deliver highly accurate forecasts. I’m excited to share that Amazon Forecast is generally available today!

With Amazon Forecast, there are no servers to provision. You only need to provide historical data, plus any additional metadata that you think may have an impact on your forecasts. For example, the demand for a particular product you need or produce may change with the weather, the time of the year, and the location where the product is used.

Amazon Forecast is based on the same technology used at Amazon, and packages our years of experience in building and operating scalable, highly accurate forecasting technology in a way that is easy to use. Because it uses deep learning to learn from multiple datasets and automatically tries different algorithms, it can be used for lots of different use cases, such as estimating product demand, cloud computing usage, financial planning, or resource planning in a supply chain management system.

Using Amazon Forecast
For this post, I need some sample data. To have an interesting use case, I go for the individual household electric power consumption dataset from the UCI Machine Learning Repository. For simplicity, I am using a version where data is aggregated hourly in a file in CSV format. Here are the first few lines where you can see the timestamp, the energy consumption, and the client ID:

2014-01-01 01:00:00,38.34991708126038,client_12
2014-01-01 02:00:00,33.5820895522388,client_12
2014-01-01 03:00:00,34.41127694859037,client_12
2014-01-01 04:00:00,39.800995024875625,client_12
2014-01-01 05:00:00,41.044776119402975,client_12

Let’s see how easy it is to build a predictor and get forecasts by using the Amazon Forecast console. Another option, for more advanced users, would be to use a Jupyter notebook and the AWS SDK for Python. You can find some sample notebooks in this GitHub repository.

In the Amazon Forecast console, the first step is to create a dataset group. Dataset groups act as containers for datasets that are related.

I can select a forecasting domain for my dataset group. Each domain covers a specific use case, such as retail, inventory planning, or web traffic, and brings its own dataset types based on the type of data used for training. For now, I use a custom domain that covers all use cases that don’t fall in other categories.

Next, I create a dataset. The data I am going to upload is aggregated by the hour, so 1 hour is the frequency of my data. The default data schema depends on the forecasting domain I selected earlier. I am using a custom domain here, and I change the data schema to have a timestamp, a target_value, and an item_id, in that order, as you can see in the sample few lines of data at the beginning of this post.

Now is the time to upload my time series data from Amazon Simple Storage Service (S3) into my dataset. The default timestamp format is exactly what I have in my data, so I don’t need to change it. I need an AWS Identity and Access Management (IAM) role to give Amazon Forecast access to the S3 bucket. I can select one here, or create a new one for this use case. As usual, avoid creating IAM roles that are too permissive and apply a least privilege approach to reduce the amount of permissions to the minimum required for this activity. After I tell Amazon Forecast in which S3 bucket and folder to look for my historical data, I start the import job.

The dataset group dashboard gives an overview of the process. My target time series data is being imported, and I can optionally add:

  • item metadata information on the items I want to predict on; for example, the color of the items in a retail scenario, or the kind of household (is this an apartment or a detached house?) for this electricity-focused use case.
  • related time series data that don’t include the target variable I want to predict, but can help improve my model; for example, price and promotions used by an ecommerce company are probably related to actual sales.

I am not adding more data for this use case. As soon as my dataset is imported, I start to train a predictor that I can then use to generate forecasts. I give the predictor a name, then select the forecast horizon, which in my case is 24 hours, and the frequency at which my forecasts are generated.

To train the predictor, I can select a specific machine learning algorithm of my choice, such as ARIMA or DeepAR+, but I prefer simplicity and use AutoML to let Amazon Forecast evaluate all algorithms and choose the one that performs best for my dataset.

In the case of my dataset, each household is identified by a single variable, the item_id, but you can add more dimensions if required. I can then select the Country for holidays. This is optional, but can improve your results if the data you are using may be affected by people being on holidays or not. I think energy usage is different on holidays, so I select United States, the country my dataset is coming from.

The configuration of the backtest windows is a more advanced topic, and you can skip the next paragraph if you’re not interested into the details of how a machine learning model is evaluated in case of time series. In this case, I am leaving the default.

When training a machine learning model, you need to split your dataset in two: a training dataset you use to train the machine learning algorithm, and an evaluation dataset that you use to evaluate the performance of your trained model. With time series, you can’t just create these two subsets of your data randomly, like you would normally do, because the order of your data points is important. The approach we use for Amazon Forecast is to split the time series in one or more parts, each one called a backtest window, preserving the order of the data. When evaluating your model against a backtest window, you should always use an evaluation dataset of the same length; otherwise it would be very difficult to compare different results. The backtest window offset tells how many points ahead of a split point you want to use for evaluation, and this is the same value for all the splits. For example, by leaving 24 (hours) I always use one day of data for evaluating my model against multiple window offsets.

In the advanced configurations, I have the option to enable hyperparameter optimization (HPO), for the algorithms that support it, and featurizations, to develop additional features computed from the ones in your data. I am not touching those settings now.

After a few minutes, the predictor is active. To understand the quality of a predictor, I look at some of the metrics that are automatically computed.

Quantile loss (QL) calculates how far off the forecast at a certain quantile is from the actual demand. It weights underestimation and overestimation according to a specific quantile. For example, a P90 forecast, if calibrated, means that 90% of the time the true demand is less than the forecast value. Thus, when the demand turns out to be higher than the forecast, the loss would be greater than the other way around.

When the predictor is ready, and I am satisfied by its metrics, I can use it to create a forecast.

When the forecast is active, I can query it to get predictions. I can export the whole forecast as a CSV file, or query for specific lookups. Let’s do a lookup. In the case of the dataset I am using, I can forecast the energy used by a household for a specific range of time. Dates here are in the past because I used an old dataset. I am pretty sure you’re going to use Amazon Forecast to look into the future.
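Lookups are also available from the command line through the forecast query endpoint. Here is a rough sketch; the forecast ARN is a placeholder for the one created above, while client_12 is one of the households in the sample dataset.

# sketch: query the forecast for a single household
aws forecastquery query-forecast \
    --forecast-arn arn:aws:forecast:us-east-1:123456789012:forecast/my_forecast \
    --filters '{"item_id":"client_12"}'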

For each timestamp in the forecast, I get a range of values. The P10, P50, and P90 forecasts have respectively 10%, 50%, and 90% probability of satisfying the actual demand. How you use these three values depends on your use case and how it is impacted by overestimating or underestimating demand. The P50 forecast is the most likely estimate for the demand. The P10 and P90 forecasts give you an 80% confidence interval for what to expect.

Available Now
You can use Amazon Forecast via the console, the AWS Command Line Interface (CLI) and the AWS SDKs. For example, you can use Amazon Forecast within a Jupyter notebook with the AWS SDK for Python to create a new predictor, or use the AWS SDK for JavaScript in the Browser to get predictions from within a web or mobile app, or the AWS SDK for Java or AWS SDK for .NET to add forecast capabilities to an existing enterprise application.

Here’s the overall flow of the Amazon Forecast API, from creating the dataset group to querying and extracting the forecast:

The dataset I used for this walkthrough and other examples are available in this GitHub repository:

Amazon Forecast is now available in US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo).

More information on specific features and pricing is one click away at:

I look forward to seeing what you’re going to use it for, so please share your results with me!

Danilo

Amazon Prime Day 2019 – Powered by AWS

What did you buy for Prime Day? I bought a 34″ Alienware Gaming Monitor and used it to replace a pair of 25″ monitors that had served me well for the past six years:

As I have done in years past, I would like to share a few of the many ways that AWS helped to make Prime Day a reality for our customers. You can read How AWS Powered Amazon’s Biggest Day Ever and Prime Day 2017 – Powered by AWS to learn more about how we evaluate the results of each Prime Day and use what we learn to drive improvements to our systems and processes.

This year I would like to focus on three ways that AWS helped to support record-breaking amounts of traffic and sales on Prime Day: Amazon Prime Video Infrastructure, AWS Database Infrastructure, and Amazon Compute Infrastructure. Let’s take a closer look at each one…

Amazon Prime Video Infrastructure
Amazon Prime members were able to enjoy the second Prime Day Concert (presented by Amazon Music) on July 10, 2019. Headlined by 10-time Grammy winner Taylor Swift, this live-streamed event also included performances from Dua Lipa, SZA, and Becky G.

Live-streaming an event of this magnitude and complexity to an audience in over 200 countries required a considerable amount of planning and infrastructure. Our colleagues at Amazon Prime Video used multiple AWS Media Services including AWS Elemental MediaPackage and AWS Elemental live encoders to encode and package the video stream.

The streaming setup made use of two AWS Regions, with a redundant pair of processing pipelines in each region. The pipelines delivered 1080p video at 30 fps to multiple content distribution networks (including Amazon CloudFront), and worked smoothly.

AWS Database Infrastructure
A combination of NoSQL and relational databases were used to deliver high availability and consistent performance at extreme scale during Prime Day:

Amazon DynamoDB supports multiple high-traffic sites and systems including Alexa, the Amazon.com sites, and all 442 Amazon fulfillment centers. Across the 48 hours of Prime Day, these sources made 7.11 trillion calls to the DynamoDB API, peaking at 45.4 million requests per second.

Amazon Aurora also supports the network of Amazon fulfillment centers. On Prime Day, 1,900 database instances processed 148 billion transactions, stored 609 terabytes of data, and transferred 306 terabytes of data.

Amazon Compute Infrastructure
Prime Day 2019 also relied on a massive, diverse collection of EC2 instances. The internal scaling metric for these instances is known as a server equivalent; Prime Day started off with 372K server equivalents and scaled up to 426K at peak.

Those EC2 instances made great use of a massive fleet of Elastic Block Store (EBS) volumes. The team added an additional 63 petabytes of storage ahead of Prime Day; the resulting fleet handled 2.1 trillion requests per day and transferred 185 petabytes of data per day.

And That’s a Wrap
These are some impressive numbers, and they show you the kind of scale that you can achieve with AWS. As you can see, scaling up for one-time (or periodic) events and then scaling back down afterward is easy and straightforward, even at world scale!

If you want to run your own world-scale event, I’d advise you to check out the blog posts that I linked above, and also be sure to read about AWS Infrastructure Event Management. My colleagues are ready (and eager) to help you to plan for your large-scale product or application launch, infrastructure migration, or marketing event. Here’s an overview of their process:

Jeff;