Release of PowerShell Script Analyzer (PSScriptAnalyzer) 1.18.2

In keeping with the tradition of releasing improvements to PSScriptAnalyzer more often, we’re happy to announce that 1.18.2 is now available! As a dependency of PowerShell Editor Services (a module used by editor extensions like the PowerShell Visual Studio Code extension), this release is motivated by a desire to further stabilize our editor experience. At the moment, the Visual Studio Code PowerShell extension still ships with PSScriptAnalyzer 1.18.0. After fixing some undesirable edge cases in 1.18.1 and 1.18.2, we intend to ship an update to the Visual Studio Code extension that will include 1.18.2.

The blocking issue that it resolves is quite technical and should not concern end-users, but for those who are interested: starting with 1.18.1, a performance optimization was added whereby we started to share and cache a PowerShell runspace pool instead of creating a new one for every command invocation. However, it turns out that there is an edge case where, when dealing with specific commands from the PackageManagement module, the runspace pool can get into a deadlock, which causes the execution of PSScriptAnalyzer to hang indefinitely. This is due to a bug in PackageManagement itself (a very unfortunate asynchronous API call that leads to the deadlock), but also to one in PowerShell itself, which should be able to handle bad scenarios like this. Therefore, a workaround had to be implemented in PSScriptAnalyzer by blacklisting the PackageManagement commands.

Given that the other changes in this release are mainly fixes and small enhancements, we decided not to bump the minor version number. We ask that the community participate in testing and giving feedback on this update before it ships by default in the Visual Studio Code extension. You can start using this new update with the Visual Studio Code extension by executing the following command:

Install-Module -Name PSScriptAnalyzer -Repository PSGallery -Scope CurrentUser
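You can then verify which versions are visible to your session; a quick, optional sanity check:

PS> Get-Module -Name PSScriptAnalyzer -ListAvailable | Select-Object Name, Version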

Should you find that there are changes that you are not happy with, please report them here.

Optionally, you can roll back to the default included version of PSScriptAnalyzer by running Uninstall-Module -Name PSScriptAnalyzer.

In this release, we’ve made the following fixes:

  • PipelineIndentation: More edge cases when using non-default values of this setting (NoIndentation in the Visual Studio Code extension) were fixed. This feature was only introduced in 1.18.0, and we hope to now be closer to a state where we could potentially change the default. (A settings sketch follows this list.)
  • New compatibility rule profiles were added for non-Windows OSs on PowerShell 7 (preview). Additionally, fixes were made to profile generation to support macOS and Linux.
  • A fix was made to PSCloseBrace to correctly flag the closing brace of a one-line hashtable, correcting some broken formatting.
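As a rough sketch of how the PipelineIndentation setting can be exercised through a settings hashtable (rule and option names as documented for PSUseConsistentIndentation; the sample script text is made up):

$settings = @{
    IncludeRules = @('PSUseConsistentIndentation')
    Rules        = @{
        PSUseConsistentIndentation = @{
            Enable              = $true
            PipelineIndentation = 'NoIndentation'
        }
    }
}

# Format a small pipeline without indenting its continuation lines
$scriptText = @'
Get-ChildItem |
        Select-Object -First 1
'@
Invoke-Formatter -ScriptDefinition $scriptText -Settings $settings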

Enhancements were made in the following areas:

  • When using settings files, error messages are now much more actionable.
PS> Invoke-ScriptAnalyzer -Settings /tmp/MySettings.psd1 -ScriptDefinition 'gci'

Invoke-ScriptAnalyzer : includerule is not a valid key in the settings hashtable.
Valid keys are CustomRulePath, ExcludeRules, IncludeRules, IncludeDefaultRules,
RecurseCustomRulePath, Rules and Severity. 
...

  • PSScriptAnalyzer now has a logo, thanks to community member @adilio.
  • The formatter was enhanced to also take commented lines into account in multi-line commands.
  • The formatter was enhanced to optionally allow correction of aliases as well. With this change, a setting in the Visual Studio Code extension will soon be made available to configure this. For the moment, this setting will be off by default. We are open to feedback: while there are very likely a few people who would love for it to be enabled, it may upset others.
  • UseDeclaredVarsMoreThanAssignments now also takes into account the usage of Get-Variable with an array of variables and usage of the named parameter -Name.

We’ve also made some changes in our GitHub repository and changed the default branch from development to master to simplify the development workflow and be consistent with other repositories in the PowerShell organization. If you have a fork of the project, you will need to make this change in your fork as well, or remember to use master as a base and open pull requests against master. This also means that the next version of the Visual Studio Code extension will point to master for the documentation of PSScriptAnalyzer’s rules.

The Changelog has more details if you want to dig further.

Future Directions

We are thinking of following an approach similar to the Visual Studio Code extension, where we make a version 2.0 that drops support for PowerShell versions 3 and 4. One of the next changes could be to improve how PowerShell Editor Services calls into PSScriptAnalyzer: currently, Editor Services uses the PSScriptAnalyzer PowerShell cmdlets, which means that we have to create an entire instance of PowerShell for these invocations. Knowing that both PowerShellEditorServices and PSScriptAnalyzer are binary .NET modules, we could directly call into PSScriptAnalyzer’s .NET code by publishing a NuGet package of PSScriptAnalyzer with suitable public APIs. Given that PSScriptAnalyzer currently performs a conditional compilation for each PowerShell version (3, 4, 5, and 6+), dropping support for versions 3 and 4 could help make the aforementioned move to an API model much easier to implement. Please give feedback if your use case of PSScriptAnalyzer would be impacted by this.

On behalf of the Script Analyzer team,

Christoph Bergmeister, Project Maintainer from the community, BJSS
Jim Truher, Senior Software Engineer, Microsoft

Introducing the Newest AWS Heroes – September 2019

Leaders within the AWS technical community educate others about the latest AWS services in a variety of ways: some share knowledge in person by speaking at events or running workshops and Meetups, while others prefer to share their insights online via social media, blogs, or open source contributions.

The most prominent AWS community leaders worldwide are recognized as AWS Heroes, and today we are excited to introduce to you the latest members of the AWS Hero program:

Alex Schultz – Fort Wayne, USA

Machine Learning Hero Alex Schultz works in the Innovation Labs at Advanced Solutions where he develops machine learning enabled products and solutions for the biomedical and product distribution industries. After receiving a DeepLens at re:Invent 2017, he dove headfirst into machine learning where he used the device to win the AWS DeepLens challenge by building a project which can read books to children. As an active advocate for AWS, he runs the Fort Wayne AWS User Group and loves to share his knowledge and experience with other developers. He also regularly contributes to the online DeepRacer community where he has helped many people who are new to machine learning get started.

Chase Douglas – Portland, USA

Serverless Hero Chase Douglas is the CTO and co-founder at Stackery, where he steers engineering and technical architecture of development tools that enable individuals and teams of developers to successfully build and manage serverless applications. He is a deeply experienced software architect and a long-time engineering leader focused on building products that increase development efficiency while delighting users. Chase is also a frequent conference speaker on the topics of serverless, instrumentation, and software patterns. Most recently, he discussed the serverless development-to-production pipeline at the Chicago and New York AWS Summits, and provided insight into how the future of serverless may be functionless in his blog series.

Chris Williams – Portsmouth, USA

Community Hero Chris Williams is an Enterprise Cloud Consultant for GreenPages Technology Solutions—a digital transformation and cloud enablement company. There he helps customers design and deploy the next generation of public, private, and hybrid cloud solutions, specializing in AWS and VMware. Chris blogs about virtualization, technology, and design at Mistwire. He is an active community leader, co-organizing the AWS Portsmouth User Group, and both hosts and presents on vBrownBag.

Dave Stauffacher – Milwaukee, USA

Community Hero Dave Stauffacher is a Principal Platform Engineer focused on cloud engineering at Direct Supply where he has helped navigate a 30,000% data growth over the last 15 years. In his current role, Dave is focused on helping drive Direct Supply’s cloud migration, combining his storage background with cloud automation and standardization practices. Dave has published his automation work for deploying AWS Storage Gateway for use with SQL Server data protection. He is a participant in the Milwaukee AWS User Group and the Milwaukee Docker User Group and has showcased his cloud experience in presentations at the AWS Midwest Community Day, AWS re:Invent, HashiConf, the Milwaukee Big Data User Group and other industry events.

Gojko Adzic – London, United Kingdom

Serverless Hero Gojko Adzic is a partner at Neuri Consulting LLP and a co-founder of MindMup, a collaborative mind mapping application that has been running on AWS Lambda since 2016. He is the author of the book Running Serverless and co-author of Serverless Computing: Economic and Architectural Impact, one of the first academic papers about AWS Lambda. He is also a key contributor to Claudia.js, an open-source tool that simplifies Lambda application deployment, and is a minor contributor to the AWS SAM CLI. Gojko frequently blogs about serverless application development on Serverless.Pub and his personal blog, and he has authored numerous other books.

Liz Rice – Enfield, United Kingdom

Container Hero Liz Rice is VP Open Source Engineering with cloud native security specialists Aqua Security, where she and her team look after several container-related open source projects. She is chair of the CNCF’s Technical Oversight Committee, and was Co-Chair of the KubeCon + CloudNativeCon 2018 events in Copenhagen, Shanghai and Seattle. She is a regular speaker at conferences including re:Invent, Velocity, DockerCon and many more. Her talks usually feature live coding or demos, and she is known for making complex technical concepts accessible.

Lyndon Leggate – London, United Kingdom

Machine Learning Hero Lyndon Leggate is a senior technology leader with extensive experience of defining and delivering complex technical solutions on large, business critical projects for consumer facing brands. Lyndon is a keen participant in the AWS DeepRacer league. Racing as Etaggel, he has regularly positioned in the top 10, features in DeepRacer TV and in May 2019 established the AWS DeepRacer Community. This vibrant and rapidly growing community provides a space for new and experienced racers to seek advice and share tips. The Community has gone on to expand the DeepRacer toolsets, making the platform more accessible and pushing the bounds of the technology. He also organises the AWS DeepRacer London Meetup series.

Maciej Lelusz – Cracow, Poland

Community Hero Maciej Lelusz is Co-Founder of Chaos Gears, a company focused on serverless, automation, IaC, and chaos engineering as ways to improve system resiliency. He concentrates on community development, blogging, company management, and public/hybrid/private cloud design. He cares about enterprise technology, IT transformation, and the culture and people involved in it. Maciej is Co-Leader of the AWS User Group Poland – Cracow Chapter, and the Founder and Leader of the InfraXstructure conference and the Polish VMware User Group.

Nathan Glover – Perth, Australia

Community Hero Nathan Glover is a DevOps Consultant at Mechanical Rock in Perth, Western Australia. Prior to that he worked as a Hardware Systems Designer and Embedded Developer in the IoT space. He is passionate about Cloud Native architecture and loves sharing his successes and failures on his blog. A key focus for him is breaking down the learning barrier by building practical examples using cloud services. On top of these he has a number of online courses teaching people how to get started building with Amazon Alexa Skills and AWS IoT. In his spare time, he loves to dabble in all areas of technology; building cloud connected toasters, embedded systems vehicle tracking, and competing in online capture the flag security events.

Prashanth HN – Bengaluru, India

Serverless Hero Prashanth HN is the Chief Technology Officer at WheelsBox and one of the community leaders of the AWS Users Group, Bengaluru. He mentors and consults other startups to embrace a serverless approach, frequently blogs about serverless topics for all skill levels including topics for beginners and advanced users on his personal blog and Amplify-related topics on the AWS Amplify Community Blog, and delivers talks about building using microservices and serverless. In a recent talk, he demonstrated how microservices patterns can be implemented using serverless. Prashanth maintains the open-source project Lanyard, a serverless agenda app for event organizers and attendees, which was well received at AWS Community Day India.

Ran Ribenzaft – Tel Aviv, Israel

Serverless Hero Ran Ribenzaft is the Chief Technology Officer at Epsagon, an AWS Advanced Technology Partner that specializes in monitoring and tracing for serverless applications. Ran is a passionate developer that loves sharing open-source tools to make everyone’s lives easier and writing technical blog posts on the topics of serverless, microservices, cloud, and AWS on Medium and the Epsagon blog. Ran is also dedicated to educating and growing the community around serverless, organizing Serverless meetups in SF and TLV, delivering online webinars and workshops, and frequently giving talks at conferences.

Rolf Koski – Tampere, Finland

Community Hero Rolf Koski works at Cybercom, an AWS Premier Partner from the Nordics headquartered in Sweden, where he is CTO of the Cybercom AWS Business Group. In his role he is both hands-on technical and a thought leader in the cloud. Rolf has been one of the leading figures in the Nordic AWS communities, as one of the community leads of the Helsinki and Stockholm user groups, and he founded and organized the first ever AWS Community Day Nordics. Rolf is professionally certified and additionally works as a Well-Architected Lead, doing Well-Architected Reviews for customer workloads.

Learn more about AWS Heroes and connect with a Hero near you by checking out the Hero website.

PowerShell ForEach-Object Parallel Feature

PowerShell 7.0 Preview 3 is now available with a new ForEach-Object Parallel Experimental feature. This feature is a great new tool for parallelizing work, but like any tool, it has its uses and drawbacks.

This article describes this new feature, how it works, when to use it and when not to.

What is ForEach-Object -Parallel?

ForEach-Object -Parallel is a new parameter set added to the existing PowerShell ForEach-Object cmdlet.

ForEach-Object -Parallel <scriptblock> [-InputObject <psobject>] [-ThrottleLimit <int>] [-TimeoutSeconds <int>] [-AsJob] [-WhatIf] [-Confirm] [<CommonParameters>]

Normally, when you use the ForEach-Object cmdlet, each object piped to the cmdlet is processed sequentially.

PS C:> 1..5 | ForEach-Object { "Hello $_"; sleep 1 }
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5

PS C:> (Measure-Command { 1..5 | ForEach-Object { "Hello $_"; sleep 1 } }).Seconds
5

But with the new ForEach-Object -Parallel parameter set, you can run the script block in parallel for each piped input object.

PS C:> 1..5 | ForEach-Object -Parallel { "Hello $_"; sleep 1; } -ThrottleLimit 5
Hello 1
Hello 3
Hello 2
Hello 4
Hello 5

PS C:> (Measure-Command { 1..5 | ForEach-Object -Parallel { "Hello $_"; sleep 1; } -ThrottleLimit 5 }).Seconds
1

Because each script block in the ForEach-Object example above takes 1 second to run, running all five in parallel takes only one second, compared to 5 seconds when run sequentially.

Since the script blocks are run in parallel for each of the 1-5 piped input integers, the order of execution is not guaranteed. The -ThrottleLimit parameter limits the number of script blocks running in parallel at a given time, and its default value is 5.

This new feature also supports jobs, where you can choose to have a job object returned instead of having results written to the console.

PS C:> $job = 1..5 | ForEach-Object -Parallel { "Hello $_"; sleep 1; } -ThrottleLimit 5 -AsJob
PS C:> $job | Wait-Job | Receive-Job
Hello 1
Hello 2
Hello 3
Hello 5
Hello 4

ForEach-Object -Parallel is not the same as the foreach language keyword

Don’t confuse the ForEach-Object cmdlet with PowerShell’s foreach keyword. The foreach keyword does not handle piped input, but instead iterates over an enumerable object. There is currently no parallel support for the foreach keyword.

PS C:> foreach ($item in (1..5)) { "Hello $item" }
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5

How does it work?

The new ForEach-Object -Parallel parameter set uses existing PowerShell APIs for running script blocks in parallel. These APIs have been around since PowerShell v2, but are cumbersome and difficult to use correctly. This new feature makes it much easier to run script blocks in parallel. But there is a fair amount of overhead involved and many times there is no gain in running scripts in parallel, and in fact it can end up being significantly slower than running ForEach-Object normally.

PowerShell currently supports parallelism in three main categories.

  1. PowerShell remoting. Here PowerShell sends script to external machines to run, using PowerShell’s remoting system.
  2. PowerShell jobs. This is the same as remoting except that script is run in separate processes on the local machine, rather than on external machines.
  3. PowerShell runspaces. Here script is run on the local machine within the same process but on separate threads.

This new feature uses the third method for running scripts in parallel. It has less overhead than the other two methods and does not use the PowerShell remoting system, so it is generally much faster.
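For a sense of the contrast, here is the earlier one-second workload dispatched as PowerShell jobs (category 2 above); each job runs in a separate process, which is where much of the extra overhead comes from. This is an illustrative sketch:

PS C:> $jobs = 1..5 | ForEach-Object { Start-Job -ScriptBlock { param($n) "Hello $n"; Start-Sleep 1 } -ArgumentList $_ }
PS C:> $jobs | Wait-Job | Receive-Job
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5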

However, there is still quite a bit of overhead to run script blocks in parallel. Script blocks run in a context called a PowerShell runspace. The runspace context contains all of the defined variables, functions and loaded modules. So initializing a runspace for script to run in takes time and resources. When scripts are run in parallel they must be run within their own runspace. And each runspace must load whatever module is needed and have any variable be explicitly passed in from the calling script. The only variable that automatically appears in the parallel script block is the piped in object. Other variables are passed in using the $using: keyword.

$computers = 'computerA','computerB','computerC','computerD'
$logsToGet = 'LogA','LogB','LogC'

# Read specified logs on each machine, using custom module
$logs = $computers | ForEach-Object -ThrottleLimit 10 -Parallel {
    Import-Module MyLogsModule
    Get-Logs -ComputerName $_ -LogName $using:logsToGet
}

Given the overhead required to run scripts in parallel, the -ThrottleLimit becomes very useful to prevent the system from being overwhelmed. There are some cases where running a lot of script blocks in parallel makes sense, but also many cases where it does not.

When should it be used?

There are two primary reasons to run script blocks in parallel with the ForEach-Object -Parallel feature (keeping in mind that this feature runs the script on separate system threads).

  1. Highly compute intensive script. If your script is crunching a lot of data over a significant period of time and the scripts can be run independently, then it is worthwhile to run them in parallel. But only if the machine you are running on has multiple cores that can host the script block threads. In this case the -ThrottleLimit parameter should be set approximately to the number of available cores. If you are running on a VM with a single core, then it makes little sense to run high compute script blocks in parallel since the system must serialize them anyway to run on the single core.
  2. Script that must wait on something. If you have script that can run independently and performs long-running work that requires waiting for something to complete, then it makes sense to run these tasks in parallel. If you have 5 scripts that take 5 minutes each to run but spend most of the time waiting, you can have them all run/wait at the same time, and complete all 5 tasks in 5 minutes instead of 25 minutes. Scripts that do a lot of file operations, or perform operations on external machines, can benefit from running in parallel. Since the running script cannot use all of the machine cores, it makes sense to set the -ThrottleLimit parameter to something greater than the number of cores. If one script execution waits many minutes to complete, you may want to allow tens or hundreds of scripts to run in parallel, as in the example below.
PS C:> $logNames.Count
10

PS C:> Measure-Command { $logs = $logNames | ForEach-Object -Parallel { Get-WinEvent -LogName $_ -MaxEvents 5000 2>$null } -ThrottleLimit 10 }
TotalMilliseconds : 115994.3 (1 minute 56 seconds)

PS C:> $logs.Count
50000

PS C:> Measure-Command { $logs = $logNames | ForEach-Object { Get-WinEvent -LogName $_ -MaxEvents 5000 2>$null } }
TotalMilliseconds : 229768.2364 (3 minutes 50 seconds)

PS C:> $logs.Count
50000

The script above collects 50,000 log entries on the local machine from 10 system log names. Running this in parallel is almost twice as fast as running it sequentially, because it involves some relatively slow disk access and can also take advantage of the machine’s multiple cores as it processes the log entries.

When should it be avoided?

ForEach-Object -Parallel should not be thought of as something that will always speed up script execution. In fact, it can significantly slow down script execution if used heedlessly. For example, if your script block executes trivial script, then running in parallel adds a huge amount of overhead and will run much slower.

PS C:> (measure-command { 1..1000 | ForEach-Object -Parallel { "Hello: $_" } }).TotalMilliseconds
10457.962

PS C:> (measure-command { 1..1000 | ForEach-Object { "Hello: $_" } }).TotalMilliseconds
18.4473

In the example above, a trivial script block is run 1000 times. The ThrottleLimit is 5 by default, so only 5 runspaces/threads are created at a time, but a runspace and thread are still created 1000 times to do a simple string evaluation. Consequently, it takes over 10 seconds to complete. But removing the -Parallel parameter and running the ForEach-Object cmdlet normally results in completion in about 18 milliseconds.

So, it is important to use this feature wisely.

Implementation details

As previously mentioned, the new ForEach-Object -Parallel feature uses existing PowerShell functionality to run script blocks concurrently. The primary addition is the ability to limit the number of concurrent scripts running at a given time with the -ThrottleLimit parameter. Throttling is accomplished by a PSTaskPool class that holds running tasks (running scripts), and has a settable size limit which is set to the throttle limit value. An Add method allows tasks to be added to the pool, but if it is full then the method blocks until a new slot becomes available. Adding tasks to the task pool was initially performed on the ForEach-Object cmdlet piped input processing thread. But that turned out to be a performance bottleneck, and now a dedicated thread is used to add tasks to the pool.
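The throttling semantics can be pictured with a bounded blocking collection. The sketch below is illustrative only, not the actual PSTaskPool code:

# A bounded collection blocks Add() when full, just as the task pool
# blocks when the number of running scripts reaches the throttle limit.
$pool = [System.Collections.Concurrent.BlockingCollection[object]]::new(5)
1..5 | ForEach-Object { $pool.Add("task $_") }  # fills all 5 slots
# A sixth Add() would now block; completing a task frees a slot first:
$null = $pool.Take()   # a running task finishes
$pool.Add('task 6')    # the pending add can now proceed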

PowerShell itself imposes conditions on how scripts run concurrently, based on its design and history. Scripts have to run in runspace contexts and only one script thread can run at a time within a runspace. So in order to run multiple scripts simultaneously multiple runspaces must be created. The current implementation of ForEach-Object -Parallel creates a new runspace for each script block execution instance. It may be possible to optimize this by re-using runspaces from a pool, but one concern in doing this is leaking state from one script execution to another.

Runspace contexts are an isolation unit for running scripts, and generally do not allow sharing state between themselves. However, variables can be passed at the beginning of script execution from the calling script to the parallel script block through the $using: keyword. This was borrowed from the remoting layer, which uses the keyword for the same purpose but over a remote connection. There is a big difference when using the $using: keyword in ForEach-Object -Parallel, though: for remoting, the variable being passed is a copy sent over the remoting connection, but with ForEach-Object -Parallel, the actual object reference is passed from one script to another, violating normal isolation restrictions. So it is possible to have a non-thread-safe variable used in two scripts running on different threads, which can lead to unpredictable behavior.

# This does not throw an error, but is not guaranteed to work since the dictionary object is not thread safe
$threadUnSafeDictionary = [System.Collections.Generic.Dictionary[string,object]]::new()
Get-Process | ForEach-Object -Parallel {
    $dict = $using:threadUnSafeDictionary
    $dict.TryAdd($_.ProcessName, $_)
}
# This *is* guaranteed to work because the passed in concurrent dictionary object is thread safe
$threadSafeDictionary = [System.Collections.Concurrent.ConcurrentDictionary[string,object]]::new()
Get-Process | ForEach-Object -Parallel {
    $dict = $using:threadSafeDictionary
    $dict.TryAdd($_.ProcessName, $_)
}

$threadSafeDictionary["pwsh"]

 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
    112   108.25     124.43      69.75   16272   1 pwsh

Conclusion

This feature can greatly improve your life for many workload scenarios. As long as you understand how it works and what its limitations are, you can experiment with parallelism and make real performance improvements to your scripts.

Paul Higinbotham
Senior Software Engineer
PowerShell Team

Optimize Storage Cost with Reduced Pricing for Amazon EFS Infrequent Access

Today we are announcing a new price reduction – one of the largest in AWS Cloud history to date – when using Infrequent Access (IA) with Lifecycle Management with Amazon Elastic File System. This price reduction makes it possible to optimize cost even further and automatically save up to 92% on file storage costs as your access patterns change. With this new reduced pricing you can now store and access your files natively in a file system for effectively $0.08/GB-month, as we’ll see in an example later in this post.

Amazon Elastic File System (EFS) is a low-cost, simple to use, fully managed, and cloud-native NFS file system for Linux-based workloads that can be used with AWS services and on-premises resources. EFS provides elastic storage, growing and shrinking automatically as files are created or deleted – even to petabyte scale – without disruption. Your applications always have the storage they need immediately available. EFS also includes, for free, multi-AZ availability and durability right out of the box, with strong file system consistency.

Easy Cost Optimization using Lifecycle Management
As storage grows, the likelihood that a given application needs access to all of the files all of the time lessens, and access patterns can also change over time. Industry analysts such as IDC, as well as our own analysis of usage patterns, confirm that around 80% of data is not accessed very often; the remaining 20% is in active use. Two common drivers for moving applications to the cloud are to maximize operational efficiency and to reduce the total cost of ownership, and this applies equally to storage costs. Instead of keeping all of the data on hand on the fastest performing storage, it may make sense to move infrequently accessed data into a different storage class/tier, with an associated cost reduction. Identifying this data manually can be a burden, so it’s also ideal to have the system monitor access over time and perform the movement of data between storage tiers automatically, again without disruption to your running applications.

EFS Infrequent Access (IA) with Lifecycle Management provides an easy to use, cost-optimized price and performance tier suitable for files that are not accessed regularly. With the new price reduction announced today builders can now save up to 92% on their file storage costs compared to EFS Standard. EFS Lifecycle Management is easy to enable and runs automatically behind the scenes. When enabled on a file system, files not accessed according to the lifecycle policy you choose will be moved automatically to the cost-optimized EFS IA storage class. This movement is transparent to your application.

Although the infrequently accessed data is held in a different storage class/tier, it’s still immediately accessible. This is one of the advantages of EFS IA – you don’t have to sacrifice any of EFS‘s benefits to get the cost savings. Your files are still immediately accessible, all within the same file system namespace. The only tradeoff is slightly higher per-operation latency (double-digit ms vs single-digit ms; think magnetic vs SSD) for the files in the IA storage class/tier.

As an example of the cost optimization EFS IA provides, let’s look at storage costs for 100 terabytes (100TB) of data. The EFS Standard storage class is currently charged at $0.30/GB-month. When it was launched in July, the EFS IA storage class was priced at $0.045/GB-month. It’s now been reduced to $0.025/GB-month. As I noted earlier, this is one of the largest price drops in the history of AWS to date!

Using the 20/80 access statistic mentioned earlier for EFS IA:

  • 20% of 100TB = 20TB at $0.30/GB-month = $0.30 x 20 x 1,000 = $6,000
  • 80% of 100TB = 80TB at $0.025/GB-month = $0.025 x 80 x 1,000 = $2,000
  • Total for 100TB = $8,000/month or $0.08/GB-month. Remember, this price also includes (for free) multi-AZ, full elasticity, and strong file system consistency.

Compare this to using only EFS Standard, storing 100% of the data in that storage class: $0.30 x 100 x 1,000 = $30,000/month. Saving $22,000/month is significant, and it’s so easy to enable. Remember too that you have control over the lifecycle policy, specifying how frequently data is moved to the IA storage tier.

Getting Started with Infrequent Access (IA) Lifecycle Management
From the EFS Console, I can quickly get started in creating a file system by choosing an Amazon Virtual Private Cloud and the subnets in the Virtual Private Cloud where I want to expose mount targets for my instances to connect to.

In the next step I can configure options for the new volume. This is where I select the Lifecycle policy I want to apply to enable use of the EFS IA storage class. Here I am going to enable files that have not been accessed for 14 days to be moved to the IA tier automatically.
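The same policy can also be applied from the AWS CLI; a sketch, with a placeholder file system ID:

$ aws efs put-lifecycle-configuration \
    --file-system-id fs-0123456789abcdef0 \
    --lifecycle-policies TransitionToIA=AFTER_14_DAYS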

In the final step I simply review my settings and then click Create File System to create the volume. Easy!

A Lifecycle Management policy can also be enabled, or changed, for existing volumes. Navigating to the volume in the EFS Console I can view the applied policy, if any. Here I’ve selected an existing file system that has no policy attached and therefore is not benefiting from EFS IA.

Clicking the pencil icon to the right of the field takes me to a dialog box where I can select the appropriate Lifecycle policy, just as I did when creating a new volume.

Amazon Elastic File System IA with Lifecycle Management is available now in all regions where Elastic File System is present.

— Steve

Operational Insights for Containers and Containerized Applications

The increasing adoption of containerized applications and microservices also brings an increased burden for monitoring and management. Builders have an expectation of, and requirement for, the same level of monitoring as would be used with longer-lived infrastructure such as Amazon Elastic Compute Cloud (EC2) instances. By contrast, containers are relatively short-lived, and usually subject to continuous deployment. This can make it difficult to reliably collect monitoring data and to analyze performance or other issues, which in turn affects remediation time. In addition, builders have to resort to a disparate collection of tools to perform this analysis and inspection, manually correlating context across a set of infrastructure and application metrics, logs, and other traces.

Announcing general availability of Amazon CloudWatch Container Insights
At the AWS Summit in New York this past July, Amazon CloudWatch Container Insights support for Amazon ECS and AWS Fargate was announced as an open preview for new clusters. Starting today, Container Insights is generally available, with the added ability to also monitor existing clusters. Immediate insights into compute utilization and failures, for both new and existing cluster infrastructure and containerized applications, can be easily obtained from container management services including Kubernetes, Amazon Elastic Container Service for Kubernetes, Amazon ECS, and AWS Fargate.

Once enabled, Amazon CloudWatch discovers all of the running containers in a cluster and collects performance and operational data at every layer in the container stack. It also continuously monitors and updates as changes occur in the environment, reducing the number of tools required to collect, monitor, act on, and analyze container metrics and logs, and giving complete end-to-end visibility.

Being able to easily access this data means customers can shift focus to increased developer productivity and away from building mechanisms to curate and build dashboards.

Getting started with Amazon CloudWatch Container Insights
I can enable Container Insights by following the instructions in the documentation. Once it is enabled and new clusters are launched, when I visit the CloudWatch console for my region I see a new option for Container Insights in the list of dashboards available to me.
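As an aside, for an ECS cluster, enabling Container Insights comes down to a single cluster setting; a sketch with a placeholder cluster name (existing clusters can be updated the same way):

$ aws ecs update-cluster-settings \
    --cluster my-cluster \
    --settings name=containerInsights,value=enabled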

Clicking this takes me to the relevant dashboard where I can select the container management service that hosts the clusters that I want to observe.

In the image below, I have selected to view metrics for my ECS Clusters that are hosting a sample application I have deployed in AWS Fargate. I can examine the metrics for standard time periods such as 1 hour, 3 hours, and so on, but can also specify custom time periods. Here I am looking at the metrics for a custom time period of the past 15 minutes.

You can see that I can quickly gain operational oversight of the overall performance of the cluster. Clicking the cluster name takes me deeper to view the metrics for the tasks inside the cluster.

Selecting a container allows me to then dive into either AWS X-Ray traces or performance logs.

Selecting performance logs takes me to the Amazon CloudWatch Logs Insights page where I can run queries against the performance events collected for my container ecosystem (e.g., Container, Task/Pod, Cluster, etc.) that I can then use to troubleshoot and dive deeper.

Container Insights makes it easy for me to get started monitoring my containers and enables me to quickly drill down into performance metrics and log analytics without the need to build custom dashboards to curate data from multiple tools. Beyond monitoring and troubleshooting I can also use the data and dashboards Container Insights provides me to support other use cases such as capacity requirements and planning, by helping me understand compute utilization by Pod/Task, Container, and Service for example.

Availability
Amazon CloudWatch Container Insights is generally available today to customers in all public AWS regions where Amazon Elastic Container Service for Kubernetes, Kubernetes, Amazon ECS, and AWS Fargate are present.

— Steve

Careers in Cybersecurity

Have you considered a career in Cybersecurity? It is a fast-paced, highly dynamic field with a huge number of specialties to choose from, including forensics, endpoint security, critical infrastructure, incident response, secure coding, and awareness and training. In addition, a career in cybersecurity allows you to work almost anywhere in the world, with amazing benefits and an opportunity to make a real difference. However, the most exciting thing is you do NOT need a technical background; anyone can get started.

New – Port Forwarding Using AWS Systems Manager Session Manager

I increasingly see customers adopting the immutable infrastructure architecture pattern: they rebuild and redeploy an entire infrastructure for each update. They very rarely connect to servers over SSH or RDP to update configuration or to deploy software updates. However, when migrating existing applications to the cloud, it is common to connect to your Amazon Elastic Compute Cloud (EC2) instances to perform a variety of management or operational tasks. To reduce the surface of attack, AWS recommends using a bastion host, also known as a jump host. This special purpose EC2 instance is designed to be the primary access point from the Internet and acts as a proxy to your other EC2 instances. To connect to your EC2 instance, you first SSH / RDP into the bastion host and, from there, to the destination EC2 instance.

To further reduce the attack surface, along with the operational burden and the additional costs of managing bastion hosts, AWS Systems Manager Session Manager allows you to securely connect to your EC2 instances without the need to run and operate your own bastion hosts, and without the need to run SSH on your EC2 instances. When the Systems Manager Agent is installed on your instances and you have IAM permissions to call the Systems Manager API, you can use the AWS Management Console or the AWS Command Line Interface (CLI) to securely connect to your Linux or Windows EC2 instances.

An interactive shell on EC2 instances is not the only use case for SSH. Many customers also use SSH tunnels to remotely access services not exposed to the public internet. SSH tunneling is a powerful but lesser-known feature of SSH that allows you to create a secure tunnel between a local host and a remote service. Let’s imagine I am running a web server for easy private file transfer between an EC2 instance and my laptop. These files are private, and I do not want anybody else to access that web server, so I configure my web server to bind only on 127.0.0.1 and I do not add port 80 to the instance’s security group. Only local processes can access the web server. To access the web server from my laptop, I create an SSH tunnel between my laptop and the web server, as shown below.
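The tunnel is created with a command along these lines (a sketch; instance stands for the instance’s host name or IP address):

$ ssh -L 9999:localhost:80 ec2-user@instance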

This command tells SSH to connect to instance as user ec2-user, open port 9999 on my local laptop, and forward everything from there to localhost:80 on instance. When the tunnel is established, I can point my browser at http://localhost:9999 to connect to my private web server on port 80.

Today, we are announcing Port Forwarding for AWS Systems Manager Session Manager. Port Forwarding allows you to securely create tunnels between your instances deployed in private subnets, without the need to start the SSH service on the server, to open the SSH port in the security group or the need to use a bastion host.

Similar to SSH tunnels, Port Forwarding allows you to forward traffic between your laptop and open ports on your instance. Once port forwarding is configured, you can connect to the local port and access the server application running inside the instance. Session Manager’s Port Forwarding use is controlled through IAM policies on API access and the Port Forwarding SSM Document. These are two different places where you can control who in your organisation is authorised to create tunnels.

To experiment with Port Forwarding today, you can use this CDK script to deploy a VPC with private and public subnets, and a single instance running a web server in the private subnet. The drawing below illustrates the infrastructure that I am using for this blog post.

The instance is private: it does not have a public IP address, nor a DNS name. The VPC default security group does not authorise connection over SSH. The Systems Manager Agent, running on your EC2 instance, must be able to communicate with the Systems Manager service endpoint. The private subnet must therefore have a route to a NAT Gateway, or you must configure AWS PrivateLink to allow this.

Let’s use Systems Manager Session Manager Port Forwarding to access the web server running on this private instance.

Before doing so, you must ensure the following prerequisites are met on the EC2 instance:

  • Systems Manager Agent must be installed and running (version 2.3.672.0 or more recent; see instructions for Linux or Windows). The agent is installed and started by default on Amazon Linux 1 & 2, Windows, and Ubuntu AMIs provided by Amazon (see this page for the exact versions), so no action is required when you are using these.
  • the EC2 instance must have an IAM role with permission to invoke Systems Manager API. For this example, I am using AmazonSSMManagedInstanceCore.

On your laptop, you must:

  • have the AWS Command Line Interface (CLI) installed, and
  • have the Session Manager plugin for the AWS CLI installed.

Once the prerequisites are met, you use the AWS Command Line Interface (CLI) to create the tunnel (assuming you started the instance using the CDK script mentioned above):

# find the instance ID based on Tag Name
INSTANCE_ID=$(aws ec2 describe-instances \
               --filter "Name=tag:Name,Values=CodeStack/NewsBlogInstance" \
               --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" \
               --output text)

# create the port forwarding tunnel
aws ssm start-session --target $INSTANCE_ID \
                       --document-name AWS-StartPortForwardingSession \
                       --parameters '{"portNumber":["80"],"localPortNumber":["9999"]}'

Starting session with SessionId: sst-00xxx63
Port 9999 opened for sessionId sst-00xxx63
Connection accepted for session sst-00xxx63.

You can now point your browser to port 9999 and access your private web server. Type ctrl-c to terminate the port forwarding session.

The Session Manager Port Forwarding creates a tunnel similar to SSH tunneling, as illustrated below.

Port Forwarding works for Windows and Linux instances. It is available in every public AWS region today, at no additional cost when connecting to EC2 instances; you will be charged for the outgoing bandwidth from the NAT Gateway or your VPC PrivateLink.

— seb

Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs

Amazon SageMaker is a fully-managed, modular machine learning (ML) service that enables developers and data scientists to easily build, train, and deploy models at any scale. With a choice of using built-in algorithms, bringing your own, or choosing from algorithms available in AWS Marketplace, it’s never been easier and faster to get ML models from experimentation to scale-out production.

One of the key benefits of Amazon SageMaker is that it frees you of any infrastructure management, no matter the scale you’re working at. For instance, instead of having to set up and manage complex training clusters, you simply tell Amazon SageMaker which Amazon Elastic Compute Cloud (EC2) instance type to use, and how many you need: the appropriate instances are then created on-demand, configured, and terminated automatically once the training job is complete. As customers have quickly understood, this means that they will never pay for idle training instances, a simple way to keep costs under control.

Introducing Managed Spot Training
Going one step further, we’re extremely happy to announce Managed Spot Training for Amazon SageMaker, a new feature based on Amazon EC2 Spot Instances that will help you lower ML training costs by up to 90% compared to using on-demand instances in Amazon SageMaker. Launched almost 10 years ago, Spot Instances have since been one of the cornerstones of building scalable and cost-optimized IT platforms on AWS. Starting today, not only will your Amazon SageMaker training jobs run on fully-managed infrastructure, they will also benefit from fully-managed cost optimization, letting you achieve much more with the same budget. Let’s dive in!

Managed Spot Training is available in all training configurations.

Setting it up is extremely simple, as it should be when working with a fully-managed service:

  • If you’re using the console, just switch the feature on.
  • If you’re working with the Amazon SageMaker SDK, just set train_use_spot_instances to true in the Estimator constructor.

That’s all it takes: do this, and you’ll save up to 90%. Pretty cool, don’t you think?

Interruptions and Checkpointing
There’s an important difference when working with Managed Spot Training. Unlike on-demand training instances that are expected to be available until a training job completes, Managed Spot Training instances may be reclaimed at any time if we need more capacity.

With Amazon Elastic Compute Cloud (EC2) Spot Instances, you would receive a termination notification 2 minutes in advance, and would have to take appropriate action yourself. Don’t worry, though: as Amazon SageMaker is a fully-managed service, it will handle this process automatically, interrupting the training job, obtaining adequate spot capacity again, and either restarting or resuming the training job. This makes Managed Spot Training particularly interesting when you’re flexible on job starting time and job duration. You can also use the MaxWaitTimeInSeconds parameter to control the total duration of your training job (actual training time plus waiting time).

To avoid restarting a training job from scratch should it be interrupted, we strongly recommend that you implement checkpointing, a technique that saves the model in training at periodic intervals. Thanks to this, you can resume a training job from a well-defined point in time, continuing from the most recent partially trained model:

  • Built-in frameworks and custom models: you have full control over the training code. Just make sure that you use the appropriate APIs to save model checkpoints to Amazon Simple Storage Service (S3) regularly, using the location you defined in the CheckpointConfig parameter and passed to the SageMaker Estimator. Please note that TensorFlow uses checkpoints by default. For other frameworks, you’ll find examples in our sample notebooks and in the documentation.
  • Built-in algorithms: computer vision algorithms support checkpointing (Object Detection, Semantic Segmentation, and very soon Image Classification). As they tend to train on large data sets and run for longer than other algorithms, they have a higher likelihood of being interrupted. Other built-in algorithms do not support checkpointing for now.
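For the curious, the same knobs are visible on the low-level CreateTrainingJob API; a rough sketch of the relevant AWS CLI parameters (everything else elided, and all names and paths are placeholders):

$ aws sagemaker create-training-job \
    ... \
    --enable-managed-spot-training \
    --stopping-condition MaxRuntimeInSeconds=3600,MaxWaitTimeInSeconds=7200 \
    --checkpoint-config S3Uri=s3://my-bucket/checkpoints/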

Alright, enough talk, time for a quick demo!

Training a Built-in Object Detection Model with Managed Spot Training
Reading from this sample notebook, let’s use the AWS console to train the same job with Managed Spot Training instead of on-demand training. As explained before, I only need to take care of two things:

  • Enable Managed Spot Training (obviously).
  • Set MaxWaitTimeInSeconds.

First, let’s name our training job, and make sure it has appropriate AWS Identity and Access Management (IAM) permissions (no change).

Then, I select the built-in algorithm for object detection.

Then, I select the instance count and instance type for my training job, making sure I have enough storage for the checkpoints.

The next step is to set hyperparameters, and I’ll use the same ones as in the notebook. I then define the location and properties of the training data set.

I do the same for the validation data set.

I also define where model checkpoints should be saved. This is where Amazon SageMaker will pick them up to resume my training job should it be interrupted.

This is where the final model artifact should be saved.

Good things come to those who wait! This is where I enable Managed Spot Training, configuring a very relaxed 48 hours of maximum wait time.

I’m done, let’s train this model. Once training is complete, cost savings are clearly visible in the console.

As you can see, my training job ran for 2423 seconds, but I’m only billed for 837 seconds, saving 65% thanks to Managed Spot Training! While we’re on the topic, let me explain how pricing works.

Pricing
A Managed Spot training job is priced for the duration for which it ran before it completed, or before it was terminated.

For built-in algorithms and AWS Marketplace algorithms that don’t use checkpointing, we’re enforcing a maximum training time of 60 minutes (MaxWaitTimeInSeconds parameter).

Last but not least, no matter how many times the training job restarts or resumes, you only get charged for data download time once.

Now Available!
This new feature is available in all regions where Amazon SageMaker is available, so don’t wait and start saving now!

As always, we’d love to hear your feedback: please post it to the AWS forum for Amazon SageMaker, or send it through your usual AWS contacts.

Julien;

Amazon Transcribe Now Supports Mandarin and Russian

As speech is central to human interaction, artificial intelligence research has long focused on speech recognition, the first step in designing and building systems allowing humans to interact intuitively with machines. The diversity in languages, accents and voices makes this an incredibly difficult problem, requiring expert skills, extremely large data sets, and vast amounts of computing power to train efficient models.

In order to help organizations and developers use speech recognition in their applications, we launched Amazon Transcribe at AWS re:Invent 2017, an automatic speech recognition service. Thanks to Amazon Transcribe, customers such as VideoPeel, Echo360, or GE Appliances have been able to quickly and easily add speech recognition capabilities to their applications and devices.

A single API call is all that it takes… and you don’t need to know the first thing about machine learning. You can analyze audio files stored in Amazon Simple Storage Service (S3) and have the service return a text file of the transcribed speech. You can also send a live audio stream to Amazon Transcribe and receive a stream of transcripts in real time.

Since launch, the team has constantly added new languages, and today we are happy to announce support for Mandarin and Russian, bringing the total number of supported languages to 16.

Introducing Mandarin
Working with Amazon Transcribe is extremely simple: let me show you how to get started in just a few minutes.

Let’s try Mandarin first. Starting from this Little Red Riding Hood video, I extracted the audio track, saved it in MP3 format, and uploaded it to one of my Amazon Simple Storage Service (S3) buckets. Here’s the actual file.

Then, I started a transcription job using the AWS CLI:

$ aws transcribe start-transcription-job \
    --media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/little_red_riding_hood-mandarin.mp3 \
    --media-format mp3 \
    --language-code zh-CN \
    --transcription-job-name little_red_riding_hood-mandarin

After a few minutes, the job is complete. Looking at the AWS console, I can either download it using the URL provided by Amazon Transcribe, or read it directly.
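The same information is available from the CLI; for instance, this sketch retrieves the transcript URI once the job has completed:

$ aws transcribe get-transcription-job \
    --transcription-job-name little_red_riding_hood-mandarin \
    --query TranscriptionJob.Transcript.TranscriptFileUri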

Unfortunately, I don’t speak Mandarin, but using Amazon Translate, this text is about a sick grandmother and a big bad wolf, so it looks like Amazon Transcribe did its job!

Introducing Russian
Let’s try Russian now, using the dialogue in this short video.

Здравствуйте! Greetings!
Добрый день! Good day!
Давайте познакомимся. Меня зовут Слава. Let’s introduce ourselves. My name is Slava.
Очень приятно, а меня – Наташа. Nice to meet you, and mine – Natasha.
Наташа, кто вы по профессии? Natasha, what is your profession?
Я врач. А вы? I (am a) doctor. And you?
Я инженер. I (am an) engineer.

This time, I will ask Amazon Transcribe to perform speaker identification too.

$ aws transcribe start-transcription-job \
    --media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/russian-dialogue.mp3 \
    --media-format mp3 \
    --language-code ru-RU \
    --transcription-job-name russian_dialogue \
    --settings ShowSpeakerLabels=true,MaxSpeakerLabels=2

Here is the result.

As you can see, not only has Amazon Transcribe faithfully converted speech to text, it has also correctly assigned each sentence to the correct speaker.

Now Available!
You can start using these two new languages today in the following regions:

  • Americas: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), AWS GovCloud (US-West), Canada (Central), South America (Sao Paulo).
  • Europe: EU (Frankfurt), EU (Ireland), EU (London), EU (Paris).
  • Asia Pacific: Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney).

The free tier covers 60 minutes for the first 12 months, starting from your first transcription request.

As always, we’d love to hear your feedback: please post it to the AWS forum for Amazon Transcribe, or send it through your usual AWS contacts.

Julien;

Amazon Forecast – Now Generally Available

Getting accurate time series forecasts from historical data is not an easy task. Last year at re:Invent we introduced Amazon Forecast, a fully managed service that requires no experience in machine learning to deliver highly accurate forecasts. I’m excited to share that Amazon Forecast is generally available today!

With Amazon Forecast, there are no servers to provision. You only need to provide historical data, plus any additional metadata that you think may have an impact on your forecasts. For example, the demand for a particular product you need or produce may change with the weather, the time of the year, and the location where the product is used.

Amazon Forecast is based on the same technology used at Amazon, and packages our years of experience in building and operating scalable, highly accurate forecasting technology in a way that is easy to use. Because it uses deep learning to learn from multiple datasets and automatically tries different algorithms, it can be applied to lots of different use cases, such as estimating product demand, cloud computing usage, financial planning, or resource planning in a supply chain management system.

Using Amazon Forecast
For this post, I need some sample data. To have an interesting use case, I go for the individual household electric power consumption dataset from the UCI Machine Learning Repository. For simplicity, I am using a version where data is aggregated hourly in a file in CSV format. Here are the first few lines where you can see the timestamp, the energy consumption, and the client ID:

2014-01-01 01:00:00,38.34991708126038,client_12
2014-01-01 02:00:00,33.5820895522388,client_12
2014-01-01 03:00:00,34.41127694859037,client_12
2014-01-01 04:00:00,39.800995024875625,client_12
2014-01-01 05:00:00,41.044776119402975,client_12

Let’s see how easy it is to build a predictor and get forecasts by using the Amazon Forecast console. Another option, for more advanced users, would be to use a Jupyter notebook and the AWS SDK for Python. You can find some sample notebooks in this GitHub repository.

In the Amazon Forecast console, the first step is to create a dataset group. Dataset groups act as containers for datasets that are related.

I can select a forecasting domain for my dataset group. Each domain covers a specific use case, such as retail, inventory planning, or web traffic, and brings its own dataset types based on the type of data used for training. For now, I use a custom domain that covers all use cases that don’t fall in other categories.
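For reference, the console step above maps to a single API call; a sketch with the AWS CLI, using a placeholder name and the custom domain:

$ aws forecast create-dataset-group \
    --dataset-group-name household_power \
    --domain CUSTOM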

Next, I create a dataset. The data I am going to upload is aggregated by the hour, so 1 hour is the frequency of my data. The default data schema depends on the forecasting domain I selected earlier. I am using a custom domain here, and I change the data schema to have a timestamp, a target_value, and an item_id, in that order, as you can see in the sample few lines of data at the beginning of this post.

Now is the time to upload my time series data from Amazon Simple Storage Service (S3) into my dataset. The default timestamp format is exactly what I have in my data, so I don’t need to change it. I need an AWS Identity and Access Management (IAM) role to give Amazon Forecast access to the S3 bucket. I can select one here, or create a new one for this use case. As usual, avoid creating IAM roles that are too permissive and apply a least privilege approach to reduce the amount of permissions to the minimum required for this activity. After I tell Amazon Forecast in which S3 bucket and folder to look for my historical data, I start the import job.

The dataset group dashboard gives an overview of the process. My target time series data is being imported, and I can optionally add:

  • item metadata information on the items I want to predict on; for example, the color of the items in a retail scenario, or the kind of household (is this an apartment or a detached house?) for this electricity-focused use case.
  • related time series data that don’t include the target variable I want to predict, but can help improve my model; for example, price and promotions used by an ecommerce company are probably related to actual sales.

I am not adding more data for this use case. As soon as my dataset is imported, I start to train a predictor that I can then use to generate forecasts. I give the predictor a name, then select the forecast horizon, which in my case is 24 hours, and the frequency at which my forecasts are generated.

To train the predictor, I can select a specific machine learning algorithm of my choice, such as ARIMA or DeepAR+, but I prefer simplicity and use AutoML to let Amazon Forecast evaluate all algorithms and choose the one that performs best for my dataset.

In the case of my dataset, each household is identified by a single variable, the item_id, but you can add more dimensions if required. I can then select the Country for holidays. This is optional, but can improve your results if the data you are using may be affected by people being on holidays or not. I think energy usage is different on holidays, so I select United States, the country my dataset is coming from.

The configuration of the backtest windows is a more advanced topic, and you can skip the next paragraph if you’re not interested in the details of how a machine learning model is evaluated in the case of time series. Here, I am leaving the default.

When training a machine learning model, you need to split your dataset in two: a training dataset you use to train the machine learning algorithm, and an evaluation dataset that you use to evaluate the performance of your trained model. With time series, you can’t just create these two subsets of your data randomly, like you would normally do, because the order of your data points is important. The approach we use for Amazon Forecast is to split the time series in one or more parts, each one called a backtest window, preserving the order of the data. When evaluating your model against a backtest window, you should always use an evaluation dataset of the same length, otherwise it would be very difficult to compare different results. The backtest window offset tells how many points ahead of a split point you want to use for evaluation, and this is the same value for all the splits. For example, by leaving 24 (hours) I always use one day of data for evaluating my model against multiple window offsets.

In the advanced configurations, I have the option to enable hyperparameter optimization (HPO), for the algorithms that support it, and featurizations, to develop additional features computed from the ones in your data. I am not touching those settings now.

After a few minutes, the predictor is active. To understand the quality of a predictor, I look at some of the metrics that are automatically computed.

Quantile loss (QL) calculates how far off the forecast at a certain quantile is from the actual demand. It weights underestimation and overestimation according to a specific quantile. For example, a P90 forecast, if calibrated, means that 90% of the time the true demand is less than the forecast value. Thus, when the demand turns out to be higher than the forecast, the loss would be greater than the other way around.

When the predictor is ready, and I am satisfied by its metrics, I can use it to create a forecast.

When the forecast is active, I can query it to get predictions. I can export the whole forecast as CSV file, or query for specific lookups. Let’s do a lookup. In the case of the dataset I am using, I can forecast the energy used by a household for a specific range of time. Dates here are in the past because I used an old dataset. I am pretty sure you’re going to use Amazon Forecast to look into the future.
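Lookups can also be done programmatically; a sketch using the forecastquery CLI, with a placeholder ARN and the item_id from my schema:

$ aws forecastquery query-forecast \
    --forecast-arn arn:aws:forecast:us-east-1:123456789012:forecast/household_power \
    --filters '{"item_id":"client_12"}'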

For each timestamp in the forecast, I get a range of values. The P10, P50, and P90 forecasts have respectively 10%, 50%, and 90% probability of satisfying the actual demand. How you use these three values depends on your use case and how it is impacted by overestimating or underestimating demand. The P50 forecast is the most likely estimate for the demand. The P10 and P90 forecasts give you an 80% confidence interval for what to expect.

Available Now
You can use Amazon Forecast via the console, the AWS Command Line Interface (CLI) and the AWS SDKs. For example, you can use Amazon Forecast within a Jupyter notebook with the AWS SDK for Python to create a new predictor, or use the AWS SDK for JavaScript in the Browser to get predictions from within a web or mobile app, or the AWS SDK for Java or AWS SDK for .NET to add forecast capabilities to an existing enterprise application.

Here’s the overall flow of the Amazon Forecast API, from creating the dataset group to querying and extracting the forecast:

The dataset I used for this walkthrough and other examples are available in this GitHub repository.

Amazon Forecast is now available in US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo).

More information on specific features and pricing is just one click away.

I look forward to seeing what you’re going to use it for. Please share your results with me!

Danilo