Asia-Pacific Series: Power up your Security Operations Center with the new SEC450 Part 1 – Blue Team Fundamentals: Creating an on-ramp for new defenders! – September 16, 2019 11:00pm US/Eastern

Speakers: John Hubbard

Note: This webcast is free of charge; however, a SANS portal account is required (see the webcast link for details)

SANS Asia-Pacific Webcast Series – Power up your Security Operations Center with the new SEC450 Part 1 – Blue Team Fundamentals: Creating an on-ramp for new defenders!

Ready to bring your blue team to the next level? Whether you have a multi-national SOC or a team of one, SANS has you covered with the brand new SEC450: Blue Team Fundamentals – Security Operations and Analysis. This exciting new addition to the SANS lineup distills years of security operations experience and best practices into a 6-day course focused specifically on blue team operations. Created as an on-ramp for new defenders to quickly learn the art of security monitoring, triage, investigation, and event analysis, SEC450 is the fastest way to improve and retain your defensive security talent. With a focus on people, process, and tools, SEC450 teaches not just what to monitor but how to monitor your network, and how your defense team can avoid burnout by having fun doing it! Join SEC450 author John Hubbard for this webinar to learn more about the course and the new content, tools, and labs it brings to the SANS curriculum!

Analyst Webcast: How to Build a Threat Detection Strategy in AWS – September 12, 2019 1:00pm US/Eastern

Speakers: David Szili and David Aiken

One of the major concerns security teams have when their organization migrates business to a cloud environment is losing visibility into their systems and threat detection capabilities. Traditional network- and host-based monitoring can be adapted to support intrusion detection in the cloud. In this recorded webcast, SANS Analyst David Szili focuses on the keys to detecting threats in the AWS environment and presents use cases to demonstrate best practices.

Attendees at this webcast will learn:

  • How organizations can ensure intrusion detection and prevention and enhance visibility for threat detection in AWS using tools such as Amazon VPC Traffic Mirroring
  • What data sources are available for continuous monitoring
  • Which AWS-native tools are most useful for event management and analysis
  • How to automate monitoring processes

Register today to be among the first to receive the associated whitepaper written by SANS analyst and forensics expert David Szili.

Learn From Your VPC Flow Logs With Additional Meta-Data

Flow Logs for Amazon Virtual Private Cloud enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow Logs data can be published to Amazon CloudWatch Logs or Amazon Simple Storage Service (S3).

Since we launched VPC Flow Logs in 2015, you have been using it for a variety of use cases, such as troubleshooting connectivity issues across your VPCs, intrusion detection, anomaly detection, or archival for compliance purposes. Until today, VPC Flow Logs records included source IP, source port, destination IP, destination port, action (accept, reject), and status.

While this information was sufficient to understand most flows, it required additional computation and lookup to match IP addresses to instance IDs or to guess the directionality of the flow to come to meaningful conclusions.

Today we are announcing the availability of additional metadata that you can include in your Flow Logs records to better understand network flows. The enriched Flow Logs allow you to simplify your scripts or remove the need for post-processing altogether, by reducing the number of computations or lookups required to extract meaningful information from the log data.

When you create a new VPC Flow Log, in addition to the existing fields, you can now choose to add the following metadata:

  • vpc-id : the ID of the VPC containing the source Elastic Network Interface (ENI).
  • subnet-id : the ID of the subnet containing the source ENI.
  • instance-id : the Amazon Elastic Compute Cloud (EC2) instance ID of the instance associated with the source interface. When the ENI is placed by an AWS service (for example, AWS PrivateLink, NAT Gateway, or Network Load Balancer), this field will be "-".
  • tcp-flags : the bitmask for the TCP flags observed within the aggregation period. For example, FIN is 0x01 (1), SYN is 0x02 (2), ACK is 0x10 (16), SYN + ACK is 0x12 (18), etc. (the bits are specified in the "Control Bits" section of RFC 793, "Transmission Control Protocol Specification").
    This allows you to understand who initiated or terminated the connection. TCP uses a three-way handshake to establish a connection: the connecting machine sends a SYN packet to the destination, the destination replies with a SYN + ACK, and, finally, the connecting machine sends an ACK. In the Flow Logs, the handshake shows up as two lines, with tcp-flags values of 2 (SYN) and 18 (SYN + ACK). ACK is reported only when it is accompanied by SYN (otherwise it would be too much noise for you to filter out). A short decoding sketch follows this list.
  • type : the type of traffic: IPv4, IPv6, or Elastic Fabric Adapter.
  • pkt-srcaddr : the packet-level IP address of the source. You typically use this field in conjunction with srcaddr to distinguish between the IP address of an intermediate layer through which traffic flows (such as a NAT gateway) and the original source address.
  • pkt-dstaddr : the packet-level destination IP address, similar to the previous one, but for destination IP addresses.
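
To make the bitmask concrete, here is a small PowerShell sketch (illustrative only, not part of the AWS tooling) that decodes a tcp-flags value from a Flow Logs record into flag names, using the RFC 793 control-bit values mentioned above:

# Decode a VPC Flow Logs tcp-flags value into TCP flag names (illustrative sketch only).
function ConvertFrom-TcpFlags {
    param([int]$TcpFlags)

    # Control-bit values as defined in RFC 793.
    $flagBits = [ordered]@{ FIN = 0x01; SYN = 0x02; RST = 0x04; PSH = 0x08; ACK = 0x10; URG = 0x20 }

    $flagBits.GetEnumerator() |
        Where-Object { $TcpFlags -band $_.Value } |
        ForEach-Object { $_.Key }
}

ConvertFrom-TcpFlags 18   # SYN, ACK - the SYN + ACK of the handshake
ConvertFrom-TcpFlags 3    # FIN, SYN - a short-lived connection aggregated into one record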

To create a VPC Flow Log, you can use the AWS Management Console, the AWS Command Line Interface (CLI), or the CreateFlowLogs API, and select which additional fields you want to include and the order in which they appear.

For example, using the AWS Command Line Interface (CLI):

$ aws ec2 create-flow-logs --resource-type VPC \
                           --region eu-west-1 \
                           --resource-ids vpc-12345678 \
                           --traffic-type ALL \
                           --log-destination-type s3 \
                           --log-destination arn:aws:s3:::sst-vpc-demo \
                           --log-format '${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} ${account-id} ${type} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${pkt-srcaddr} ${pkt-dstaddr} ${protocol} ${bytes} ${packets} ${start} ${end} ${action} ${tcp-flags} ${log-status}'

# be sure to replace the bucket name and VPC ID !

{
    "ClientToken": "1A....HoP=",
    "FlowLogIds": [
        "fl-12345678123456789"
    ],
    "Unsuccessful": [] 
}

Enriched VPC Flow Logs are delivered to S3. We will automatically add the required S3 bucket policy to authorize VPC Flow Logs to write to your S3 bucket. VPC Flow Logs does not capture real-time log streams for your network interface; it might take several minutes to begin collecting and publishing data to the chosen destinations. Your logs will eventually be available on S3 at s3://<bucket name>/AWSLogs/<account id>/vpcflowlogs/<region>/<year>/<month>/<day>/

An SSH connection from my laptop with IP address 90.90.0.200 to an EC2 instance looks like this:

3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 1566328660 1566328672 ACCEPT 18 OK
3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 90.90.0.200 172.31.22.145 62897 22 90.90.0.200 172.31.22.145 6 4877 29 1566328660 1566328672 ACCEPT 2 OK

172.31.22.145 is the private IP address of the EC2 instance, the one you see when you type ifconfig on the instance. All flags are OR'ed together over the aggregation period. When a connection is short-lived, both SYN and FIN (3), as well as SYN + ACK and FIN (19), are likely to be set on the same lines.
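
Each Flow Logs record is a space-separated line, so a few lines of PowerShell are enough to turn one back into named fields. The sketch below is only an illustration and assumes the exact field order used in the create-flow-logs command shown earlier; adjust the list if you chose a different --log-format:

# Field order must match the --log-format used when the Flow Log was created (see the CLI example above).
$fields = 'version','vpc-id','subnet-id','instance-id','interface-id','account-id','type',
          'srcaddr','dstaddr','srcport','dstport','pkt-srcaddr','pkt-dstaddr','protocol',
          'bytes','packets','start','end','action','tcp-flags','log-status'

# Example record copied from the SSH session above.
$record = '3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 1566328660 1566328672 ACCEPT 18 OK'

# Split on whitespace and pair each value with its field name.
$values = $record -split '\s+'
$parsed = [ordered]@{}
for ($i = 0; $i -lt $fields.Count; $i++) { $parsed[$fields[$i]] = $values[$i] }
[pscustomobject]$parsed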

Once a Flow Log is created, you cannot add fields or modify its structure; this ensures you will not accidentally break scripts consuming this data. Any modification requires you to delete and recreate the VPC Flow Log. There is no additional cost to capture the extra information; normal VPC Flow Logs pricing applies. Keep in mind that enriched VPC Flow Logs records might consume more storage when you select all fields, so we recommend selecting only the fields relevant to your use cases.

Enriched VPC Flow Logs are available in all regions where VPC Flow Logs is available. You can start using them today.

— seb

PS: I heard from the team that they are working on adding more metadata to the logs, so stay tuned for updates.

Special Webcast: Advanced Zeek Usage: Scripting and Framework – September 10, 2019 10:30am US/Eastern

Speakers: David Szili

The open-source Network Security Monitor (NSM) and analytics platform Zeek (formerly known as Bro) is well known among information security professionals. At its core, Zeek inspects traffic and creates an extensive set of detailed, well-structured log files that record a network's activity. Because it is very scalable and can run on commodity hardware, Zeek provides an alternative to commercial solutions. Most deployments run with little or no configuration customization, and thus only generate the default set of log files.

However, Zeek is much more than just log files. It has a domain-specific, event-driven, Turing-complete scripting language that allows you to perform arbitrary analysis tasks such as extracting files from sessions, detecting brute-force attacks, or generating statistics. It also enables security analysts to modify, extend, and optimize logs, or to create new log files. Zeek comes with a broad set of libraries, called frameworks, to facilitate script development.

This webcast gives an introduction to Zeek scripting, starting with the basics and demonstrating the potential of this powerful platform through real-life examples. The second half of the webcast shows how to use Zeek frameworks such as the Intelligence Framework to consume and detect indicators from threat intelligence feeds.

Now Available – Amazon Quantum Ledger Database (QLDB)

Given the wide range of data types, query models, indexing options, scaling expectations, and performance requirements, databases are definitely not one size fits all products. That’s why there are many different AWS database offerings, each one purpose-built to meet the needs of a different type of application.

Introducing QLDB
Today I would like to tell you about Amazon QLDB, the newest member of the AWS database family. First announced at AWS re:Invent 2018 and made available in preview form, it is now available in production form in five AWS regions.

As a ledger database, QLDB is designed to provide an authoritative data source (often known as a system of record) for stored data. It maintains a complete, immutable history of all committed changes to the data, which cannot be updated, altered, or deleted. QLDB supports PartiQL SQL queries against the historical data, and also provides an API that allows you to cryptographically verify that the history is accurate and legitimate. These features make QLDB a great fit for banking & finance, ecommerce, transportation & logistics, HR & payroll, manufacturing, and government applications, as well as many other use cases that need to maintain the integrity and history of stored data.

Important QLDB Concepts
Let’s review the most important QLDB concepts before diving in:

Ledger – A QLDB ledger consists of a set of QLDB tables and a journal that maintains the complete, immutable history of changes to the tables. Ledgers are named and can be tagged.

Journal – A journal consists of a sequence of blocks, each cryptographically chained to the previous block so that changes can be verified. Blocks, in turn, contain the actual changes that were made to the tables, indexed for efficient retrieval. This append-only model ensures that previous data cannot be edited or deleted, and makes the ledgers immutable. QLDB allows you to export all or part of a journal to S3.

Table – Tables exist within a ledger, and contain a collection of document revisions. Tables support optional indexes on document fields; the indexes can improve performance for queries that make use of the equality (=) predicate.

Documents – Documents exist within tables, and must be in Amazon Ion form. Ion is a superset of JSON that adds additional data types, type annotations, and comments. QLDB supports documents that contain nested JSON elements, and gives you the ability to write queries that reference and include these elements. Documents need not conform to any particular schema, giving you the flexibility to build applications that can easily adapt to changes.

PartiQL – PartiQL is a new open standard query language that supports SQL-compatible access to relational, semi-structured, and nested data while remaining independent of any particular data source. To learn more, read Announcing PartiQL: One Query Language for All Your Data.

Serverless – You don’t have to worry about provisioning capacity or configuring read & write throughput. You create a ledger, define your tables, and QLDB will automatically scale to meet the needs of your application.

Using QLDB
You can create QLDB ledgers and tables from the AWS Management Console, AWS Command Line Interface (CLI), a CloudFormation template, or by making calls to the QLDB API. I’ll use the QLDB Console and I will follow the steps in Getting Started with Amazon QLDB. I open the console and click Start tutorial to get started:

The Getting Started page outlines the first three steps; I click Create ledger to proceed (this opens in a fresh browser tab):

I enter a name for my ledger (vehicle-registration), tag it, and (again) click Create ledger to proceed:

My ledger starts out in Creating status, and transitions to Active within a minute or two:

I return to the Getting Started page, refresh the list of ledgers, choose my new ledger, and click Load sample data:

This takes a second or so, and creates four tables & six indexes:

I could also use PartiQL statements such as CREATE TABLE, CREATE INDEX, and INSERT INTO to accomplish the same task.

With my tables, indexes, and sample data loaded, I click on Editor and run my first query (a single-table SELECT):

This returns a single row, and also benefits from the index on the VIN field. I can also run a more complex query that joins two tables:

I can obtain the ID of a document (using a query from here), and then update the document:

I can query the modification history of a table or a specific document in a table, with the ability to find modifications within a certain range and on a particular document (read Querying Revision History to learn more). Here’s a simple query that returns the history of modifications to all of the documents in the VehicleRegistration table that were made on the day that I wrote this post:

As you can see, each row is a structured JSON object. I can select any desired rows and click View JSON for further inspection:

Earlier, I mentioned that PartiQL can deal with nested data. The VehicleRegistration table contains ownership information that looks like this:

{
   "Owners":{
      "PrimaryOwner":{
         "PersonId":"6bs0SQs1QFx7qN1gL2SE5G"
      },
      "SecondaryOwners":[

      ]
   }
}

PartiQL lets me reference the nested data using “.” notation:

I can also verify the integrity of a document that is stored within my ledger’s journal. This is fully described in Verify a Document in a Ledger, and is a great example of the power (and value) of cryptographic verification. Each QLDB ledger has an associated digest. The digest is a 256-bit hash value that uniquely represents the ledger’s entire history of document revisions as of a point in time. To access the digest, I select a ledger and click Get digest:

When I click Save, the console provides me with a short file that contains all of the information needed to verify the ledger. I save this file in a safe place, for use when I want to verify a document in the ledger. When that time comes, I get the file, click on Verification in the left-navigation, and enter the values needed to perform the verification. This includes the block address of a document revision, and the ID of the document. I also choose the digest that I saved earlier, and click Verify:

QLDB recomputes the hashes to ensure that the document has not been surreptitiously changed, and displays the verification:

In a production environment, you would use the QLDB APIs to periodically download digests and to verify the integrity of your documents.
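
To see why a hash-chained, append-only journal makes silent edits detectable, here is a toy PowerShell illustration of the general idea behind digests and verification. It is emphatically not QLDB's actual block format or hashing scheme, just the textbook hash-chain concept the journal builds on:

# Toy hash chain: each "block" hash covers its content plus the previous hash,
# so changing any earlier block changes every hash that follows it.
$sha256 = [System.Security.Cryptography.SHA256]::Create()
function Get-BlockHash([string]$PreviousHash, [string]$Content) {
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($PreviousHash + $Content)
    [System.BitConverter]::ToString($sha256.ComputeHash($bytes)) -replace '-', ''
}

$prev = ''
foreach ($change in 'INSERT doc1', 'UPDATE doc1', 'INSERT doc2') {
    $prev = Get-BlockHash -PreviousHash $prev -Content $change
    "{0,-12} -> {1}" -f $change, $prev
}
# The final hash plays the role of a digest: recomputing the chain over tampered
# history yields a different value, which is what verification detects.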

Building Applications with QLDB
You can use the Amazon QLDB Driver for Java to write code that accesses and manipulates your ledger database. This is a Java driver that allows you to create sessions, execute PartiQL commands within the scope of a transaction, and retrieve results. Drivers for other languages are in the works; stay tuned for more information.

Available Now
Amazon QLDB is available now in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) Regions. Pricing is based on the following factors, and is detailed on the Amazon QLDB Pricing page, including some real-world examples:

  • Write operations
  • Read operations
  • Journal storage
  • Indexed storage
  • Data transfer

Jeff;

Release of PowerShell Script Analyzer (PSScriptAnalyzer) 1.18.2

In keeping with the tradition of releasing improvements to PSScriptAnalyzer more often, we're happy to announce that 1.18.2 is now available! As a dependency of PowerShell Editor Services (a module used by editor extensions like the PowerShell Visual Studio Code extension), this release is motivated by a desire to further stabilize our editor experience. At the moment, the Visual Studio Code PowerShell extension still ships with PSScriptAnalyzer 1.18.0. After fixing some undesirable edge cases between 1.18.1 and 1.18.2, we intend to ship an update to the Visual Studio Code extension that includes 1.18.2.

The blocking issue that it resolves is quite technical and should not concern end users, but for those who are interested: starting with 1.18.1, a performance optimization was added whereby we started to share and cache a PowerShell runspace pool instead of creating a new one for every command invocation. However, it turns out that there is an edge case where, when dealing with specific commands from the PackageManagement module, the runspace pool can get into a deadlock, which causes the execution of PSScriptAnalyzer to hang indefinitely. This is due to a bug in PackageManagement itself (a very unfortunate asynchronous API call that leads to the deadlock), but also in PowerShell itself, which should be able to handle bad scenarios like this. Therefore, a workaround had to be implemented in PSScriptAnalyzer by blacklisting the PackageManagement commands.

Given that the other changes in this release are mainly fixes and small enhancements, we decided not to bump the minor version number. We ask that the community participate in testing and giving feedback on this update before it ships by default in the Visual Studio Code extension. You can start using this new update with the Visual Studio Code extension by executing the following command:

Install-Module -Name PSScriptAnalyzer -Repository PSGallery -Scope CurrentUser
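
After installing, a quick way to confirm that the CurrentUser install from the command above landed (before restarting Visual Studio Code) is to list every copy of the module visible to your session:

# List every installed copy of PSScriptAnalyzer, its version, and where it lives;
# the newly installed version should be listed here.
Get-Module -Name PSScriptAnalyzer -ListAvailable |
    Select-Object Name, Version, ModuleBase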

Should you find that there are changes that you are not happy with, please report them here.

Optionally, you can roll back to the default included version of PSScriptAnalyzer by running Uninstall-Module -Name PSScriptAnalyzer.

In this release, we've made the following fixes:

  • PipelineIndentation: More edge cases when using non-default values of this setting (NoIndentation in the Visual Studio Code extension) were fixed. This feature was only introduced in 1.18.0 and we hope to now be closer to a state where we could potentially change the default.
  • New compatibility rule profiles were added for non-Windows OSs on PowerShell 7 (preview). Additionally, fixes were made to profile generation to support macOS and Linux.
  • A fix was made to PSCloseBrace to correctly flag the closing brace of a one-line hashtable, correcting some broken formatting.

Enhancements were made in the following areas:

  • When using settings files, error messages are now much more actionable.
PS> Invoke-ScriptAnalyzer -Settings /tmp/MySettings.psd1 -ScriptDefinition 'gci'

Invoke-ScriptAnalyzer : includerule is not a valid key in the settings hashtable.
Valid keys are CustomRulePath, ExcludeRules, IncludeRules, IncludeDefaultRules,
RecurseCustomRulePath, Rules and Severity. 
...

  • PSScriptAnalyzer has a logo now thanks to the community member @adilio
  • The formatter was enhanced to also take commented lines into account in multi-line commands
  • The formatter was enhanced to optionally allow correction of aliases as well. With this change, a setting in the Visual Studio Code extension will soon be made available to configure this. By default, this setting will not be on for the moment. We are open to feedback: while there are very likely a few people that would love for it to be enabled, it may upset others.
  • UseDeclaredVarsMoreThanAssignments now also takes into account the usage of Get-Variable with an array of variables and usage of the named parameter -Name (see the sketch below)
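
For example, a pattern like this hypothetical snippet, where variables are assigned and then only read through Get-Variable -Name, is now recognized by the rule as usage rather than being flagged as assigned-but-never-used:

# Hypothetical example: $foo and $bar are read via Get-Variable rather than $-syntax,
# which UseDeclaredVarsMoreThanAssignments now counts as usage.
$foo = 1
$bar = 2
Get-Variable -Name foo, bar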

We've also made some changes in our GitHub repository and changed the default branch from development to master to simplify the development workflow and be consistent with other repositories in the PowerShell organization. If you have a fork of the project, you will need to make this change in your fork as well, or remember to use master as a base and open pull requests against master. This also means that the next version of the Visual Studio Code extension will point to master for the documentation of PSScriptAnalyzer's rules.

The Changelog has more details if you want to dig further.

Future Directions

We are thinking of following an approach similar to the Visual Studio Code extension where we make a version 2.0 that drops support for PowerShell versions 3 and 4. One of the next changes could be to improve how PowerShellEditorServices calls into PSScriptAnalyzer: currently, Editor Services uses the PSScriptAnalyzer PowerShell cmdlets, which means that we have to create an entire instance of PowerShell for these invocations. Knowing that both PowerShellEditorServices and PSScriptAnalyzer are binary .NET modules, we could directly call into PSScriptAnalyzer's .NET code by publishing a NuGet package of PSScriptAnalyzer with suitable public APIs. Given that PSScriptAnalyzer currently performs a conditional compilation for each PowerShell version (3, 4, 5, and 6+), dropping support for versions 3 and 4 could help make the aforementioned move to an API model much easier to implement. Please give feedback if your use case of PSScriptAnalyzer would be impacted by this.

On behalf of the Script Analyzer team,

Christoph Bergmeister, Project Maintainer from the community, BJSS
Jim Truher, Senior Software Engineer, Microsoft

Analyst Webcast: Success Patterns for Supply Chain Security – September 10, 2019 1:00pm US/Eastern

Speakers: John Pescatore

Many CISOs report that addressing supply chain security is one of their top challenges. Damage from supply chain security failures is already happening today. The NotPetya malware, which caused hard costs of more than $300 million to both FedEx and Merck, originally spread through compromised business tax software. Outsourcer and system integrator Wipro was targeted to serve as the launch point for attacks against its clients.

These types of supply chain attacks are on the rise, according to multiple reports, and the high financial impact of these attacks has increased CEO, board of directors, and regulatory and auditor attention to supply chain security.

In this webcast, John Pescatore, SANS Director of Emerging Security Trends, provides recommendations and guidance for answering the following:

  • What are the key processes, skills and technologies required for an effective supply chain security program?
  • What are the patterns of success at companies that are able to implement and operate effective and affordable supply chain security programs?
  • What business-relevant metrics can demonstrate the value of a supply chain security program?
  • What are some quick wins in getting started in improving the security of your company’s supply chain?

Register today to be among the first to receive the associated whitepaper written by John Pescatore.

Special Webcast: How to accelerate your cyber security career – September 5, 2019 3:30pm US/Eastern

Speakers: Stephen Sims

Worldwide, the cyber security skills gap is growing and there's an ever-increasing demand for cyber security practitioners. There is no doubt it's an exciting time to join the infosec community, but where do you start if you're trying to get into the industry? Do you think you've got what it takes?

In this webcast Stephen Sims discusses his personal journey into security and the attributes he thinks make a great security professional.

Introducing the Newest AWS Heroes – September 2019

Leaders within the AWS technical community educate others about the latest AWS services in a variety of ways: some share knowledge in person by speaking at events or running workshops and Meetups, while others prefer to share their insights online via social media, blogs, or open source contributions.

The most prominent AWS community leaders worldwide are recognized as AWS Heroes, and today we are excited to introduce to you the latest members of the AWS Hero program:

Alex Schultz – Fort Wayne, USA

Machine Learning Hero Alex Schultz works in the Innovation Labs at Advanced Solutions where he develops machine learning enabled products and solutions for the biomedical and product distribution industries. After receiving a DeepLens at re:Invent 2017, he dove headfirst into machine learning where he used the device to win the AWS DeepLens challenge by building a project which can read books to children. As an active advocate for AWS, he runs the Fort Wayne AWS User Group and loves to share his knowledge and experience with other developers. He also regularly contributes to the online DeepRacer community where he has helped many people who are new to machine learning get started.

Chase Douglas – Portland, USA

Serverless Hero Chase Douglas is the CTO and co-founder at Stackery, where he steers engineering and technical architecture of development tools that enable individuals and teams of developers to successfully build and manage serverless applications. He is a deeply experienced software architect and a long-time engineering leader focused on building products that increase development efficiency while delighting users. Chase is also a frequent conference speaker on the topics of serverless, instrumentation, and software patterns. Most recently, he discussed the serverless development-to-production pipeline at the Chicago and New York AWS Summits, and provided insight into how the future of serverless may be functionless in his blog series.

Chris Williams – Portsmouth, USA

Community Hero Chris Williams is an Enterprise Cloud Consultant for GreenPages Technology Solutions—a digital transformation and cloud enablement company. There he helps customers design and deploy the next generation of public, private, and hybrid cloud solutions, specializing in AWS and VMware. Chris blogs about virtualization, technology, and design at Mistwire. He is an active community leader, co-organizing the AWS Portsmouth User Group, and both hosts and presents on vBrownBag.

Dave Stauffacher – Milwaukee, USA

Community Hero Dave Stauffacher is a Principal Platform Engineer focused on cloud engineering at Direct Supply where he has helped navigate a 30,000% data growth over the last 15 years. In his current role, Dave is focused on helping drive Direct Supply’s cloud migration, combining his storage background with cloud automation and standardization practices. Dave has published his automation work for deploying AWS Storage Gateway for use with SQL Server data protection. He is a participant in the Milwaukee AWS User Group and the Milwaukee Docker User Group and has showcased his cloud experience in presentations at the AWS Midwest Community Day, AWS re:Invent, HashiConf, the Milwaukee Big Data User Group and other industry events.

Gojko Adzic – London, United Kingdom

Serverless Hero Gojko Adzic is a partner at Neuri Consulting LLP and a co-founder of MindMup, a collaborative mind mapping application that has been running on AWS Lambda since 2016. He is the author of the book Running Serverless and co-author of Serverless Computing: Economic and Architectural Impact, one of the first academic papers about AWS Lambda. He is also a key contributor to Claudia.js, an open-source tool that simplifies Lambda application deployment, and is a minor contributor to the AWS SAM CLI. Gojko frequently blogs about serverless application development on Serverless.Pub and his personal blog, and he has authored numerous other books.

Liz Rice – Enfield, United Kingdom

Container Hero Liz Rice is VP Open Source Engineering with cloud native security specialists Aqua Security, where she and her team look after several container-related open source projects. She is chair of the CNCF’s Technical Oversight Committee, and was Co-Chair of the KubeCon + CloudNativeCon 2018 events in Copenhagen, Shanghai and Seattle. She is a regular speaker at conferences including re:Invent, Velocity, DockerCon and many more. Her talks usually feature live coding or demos, and she is known for making complex technical concepts accessible.

Lyndon Leggate – London, United Kingdom

Machine Learning Hero Lyndon Leggate is a senior technology leader with extensive experience of defining and delivering complex technical solutions on large, business critical projects for consumer facing brands. Lyndon is a keen participant in the AWS DeepRacer league. Racing as Etaggel, he has regularly positioned in the top 10, features in DeepRacer TV and in May 2019 established the AWS DeepRacer Community. This vibrant and rapidly growing community provides a space for new and experienced racers to seek advice and share tips. The Community has gone on to expand the DeepRacer toolsets, making the platform more accessible and pushing the bounds of the technology. He also organises the AWS DeepRacer London Meetup series.

Maciej Lelusz – Cracow, Poland

Community Hero Maciej Lelusz is Co-Founder of Chaos Gears, a company concentrated on serverless, automation, IaC and chaos engineering as a way for the improvement of system resiliency. He is focused on community development, blogging, company management, and Public/Hybrid/Private cloud design. He cares about enterprise technology, IT transformation, its culture and people involved in it. Maciej is Co-Leader of the AWS User Group Poland – Cracow Chapter, and the Founder and Leader of the InfraXstructure conference and Polish VMware User Group.

Nathan Glover – Perth, Australia

Community Hero Nathan Glover is a DevOps Consultant at Mechanical Rock in Perth, Western Australia. Prior to that he worked as a Hardware Systems Designer and Embedded Developer in the IoT space. He is passionate about Cloud Native architecture and loves sharing his successes and failures on his blog. A key focus for him is breaking down the learning barrier by building practical examples using cloud services. On top of these he has a number of online courses teaching people how to get started building with Amazon Alexa Skills and AWS IoT. In his spare time, he loves to dabble in all areas of technology; building cloud connected toasters, embedded systems vehicle tracking, and competing in online capture the flag security events.

Prashanth HN – Bengaluru, India

Serverless Hero Prashanth HN is the Chief Technology Officer at WheelsBox and one of the community leaders of the AWS Users Group, Bengaluru. He mentors and consults other startups to embrace a serverless approach, frequently blogs about serverless topics for all skill levels including topics for beginners and advanced users on his personal blog and Amplify-related topics on the AWS Amplify Community Blog, and delivers talks about building using microservices and serverless. In a recent talk, he demonstrated how microservices patterns can be implemented using serverless. Prashanth maintains the open-source project Lanyard, a serverless agenda app for event organizers and attendees, which was well received at AWS Community Day India.

Ran Ribenzaft – Tel Aviv, Israel

Serverless Hero Ran Ribenzaft is the Chief Technology Officer at Epsagon, an AWS Advanced Technology Partner that specializes in monitoring and tracing for serverless applications. Ran is a passionate developer that loves sharing open-source tools to make everyone’s lives easier and writing technical blog posts on the topics of serverless, microservices, cloud, and AWS on Medium and the Epsagon blog. Ran is also dedicated to educating and growing the community around serverless, organizing Serverless meetups in SF and TLV, delivering online webinars and workshops, and frequently giving talks at conferences.

Rolf Koski – Tampere, Finland

Community Hero Rolf Koski works at Cybercom, which is an AWS Premier Partner from the Nordics headquartered in Sweden. He works as the CTO at Cybercom AWS Business Group. In his role he is both technical as well as being the thought leader in the Cloud. Rolf has been one of the leading figures at the Nordic AWS Communities as one of the community leads in Helsinki and Stockholm user groups and he initially founded and organized the first ever AWS Community Days Nordics. Rolf is professionally certified and additionally works as Well-Architected Lead doing Well-Architected Reviews for customer workloads.

Learn more about AWS Heroes and connect with a Hero near you by checking out the Hero website.

PowerShell ForEach-Object Parallel Feature

PowerShell 7.0 Preview 3 is now available with a new ForEach-Object Parallel Experimental feature. This feature is a great new tool for parallelizing work, but like any tool, it has its uses and drawbacks.

This article describes this new feature, how it works, when to use it and when not to.

What is ForEach-Object -Parallel?

ForEach-Object -Parallel is a new parameter set added to the existing PowerShell ForEach-Object cmdlet.

ForEach-Object -Parallel <scriptblock> [-InputObject <psobject>] [-ThrottleLimit <int>] [-TimeoutSeconds <int>] [-AsJob] [-WhatIf] [-Confirm] [<CommonParameters>]

Normally, when you use the ForEach-Object cmdlet, each object piped to the cmdlet is processed sequentially.

PS C:> 1..5 | ForEach-Object { "Hello $_"; sleep 1 }
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5

PS C:> (Measure-Command { 1..5 | ForEach-Object { "Hello $_"; sleep 1 } }).Seconds
5

But with the new ForEach-Object -Parallel parameter set, you can run the script block in parallel for each piped input object.

PS C:> 1..5 | ForEach-Object -Parallel { "Hello $_"; sleep 1; } -ThrottleLimit 5
Hello 1
Hello 3
Hello 2
Hello 4
Hello 5

PS C:> (Measure-Command { 1..5 | ForEach-Object -Parallel { "Hello $_"; sleep 1; } -ThrottleLimit 5 }).Seconds
1

Because each script block in the ForEach-Object example above takes 1 second to run, running all five in parallel takes only one second instead of 5 seconds when run sequentially.

Since the script blocks are run in parallel for each of the 1-5 piped input integers, the order of execution is not guaranteed. The -ThrottleLimit parameter limits the number of script blocks running in parallel at a given time, and its default value is 5.

This new feature also supports jobs, where you can choose to have a job object returned instead of having results written to the console.

PS C:> $Job = 1..5 | ForEach-Object -Parallel { "Hello $_"; sleep 1; } -ThrottleLimit 5 -AsJob
PS C:> $Job | Wait-Job | Receive-Job
Hello 1
Hello 2
Hello 3
Hello 5
Hello 4

ForEach-Object -Parallel is not the same as the foreach language keyword

Don't confuse the ForEach-Object cmdlet with PowerShell's foreach keyword. The foreach keyword does not handle piped input; instead, it iterates over an enumerable object. There is currently no parallel support for the foreach keyword.

PS C:> foreach ($item in (1..5)) { "Hello $item" }
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5

How does it work?

The new ForEach-Object -Parallel parameter set uses existing PowerShell APIs for running script blocks in parallel. These APIs have been around since PowerShell v2, but they are cumbersome and difficult to use correctly. This new feature makes it much easier to run script blocks in parallel. But there is a fair amount of overhead involved, and in many cases there is no gain from running scripts in parallel; in fact, it can end up being significantly slower than running ForEach-Object normally.

PowerShell currently supports parallelism in three main categories.

  1. PowerShell remoting. Here PowerShell sends script to external machines to run, using PowerShell’s remoting system.
  2. PowerShell jobs. This is the same as remoting except that script is run in separate processes on the local machine, rather than on external machines.
  3. PowerShell runspaces. Here script is run on the local machine within the same process but on separate threads.

This new feature uses the third method for running scripts in parallel. It has the least overhead of the three and does not use the PowerShell remoting system, so it is generally much faster than the other two methods.

However, there is still quite a bit of overhead to run script blocks in parallel. Script blocks run in a context called a PowerShell runspace. The runspace context contains all of the defined variables, functions, and loaded modules, so initializing a runspace for a script to run in takes time and resources. When scripts are run in parallel, they must be run within their own runspaces. Each runspace must load whatever modules are needed and have any variables explicitly passed in from the calling script. The only variable that automatically appears in the parallel script block is the piped-in object. Other variables are passed in using the $using: keyword.

$computers = 'computerA','computerB','computerC','computerD'
$logsToGet = 'LogA','LogB','LogC'

# Read specified logs on each machine, using custom module
$logs = $computers | ForEach-Object -ThrottleLimit 10 -Parallel {
    Import-Module MyLogsModule
    Get-Logs -ComputerName $_ -LogName $using:logsToGet
}

Given the overhead required to run scripts in parallel, the -ThrottleLimit becomes very useful to prevent the system from being overwhelmed. There are some cases where running a lot of script blocks in parallel makes sense, but also many cases where it does not.

When should it be used?

There are two primary reasons to run script blocks in parallel with the ForEach-Object -Parallel feature (keeping in mind that this feature runs the script on separate system threads).

  1. Highly compute-intensive script. If your script is crunching a lot of data over a significant period of time and the scripts can be run independently, then it is worthwhile to run them in parallel, but only if the machine you are running on has multiple cores that can host the script block threads. In this case the -ThrottleLimit parameter should be set approximately to the number of available cores. If you are running on a VM with a single core, then it makes little sense to run high-compute script blocks in parallel, since the system must serialize them anyway to run on the single core.
  2. Script that must wait on something. If you have script that can run independently and performs long-running work that requires waiting for something to complete, then it makes sense to run these tasks in parallel. If you have 5 scripts that take 5 minutes each to run but spend most of the time waiting, you can have them all run/wait at the same time and complete all 5 tasks in 5 minutes instead of 25 minutes. Scripts that do a lot of file operations, or perform operations on external machines, can benefit from running in parallel. Since such scripts do not keep all of the machine cores busy, it makes sense to set the -ThrottleLimit parameter to something greater than the number of cores. If one script execution waits many minutes to complete, you may want to allow tens or hundreds of scripts to run in parallel. The example below collects event logs in parallel and then sequentially for comparison.
PS C:> $logNames.Count
10

PS C:> Measure-Command { $logs = $logNames | ForEach-Object -Parallel { Get-WinEvent -LogName $_ -MaxEvents 5000 2>$null } -ThrottleLimit 10 }
TotalMilliseconds : 115994.3 (1 minute 56 seconds)

PS C:> $logs.Count
50000

PS C:> Measure-Command { $logs = $logNames | ForEach-Object { Get-WinEvent -LogName $_ -MaxEvents 5000 2>$null } }
TotalMilliseconds : 229768.2364 (3 minutes 50 seconds)

PS C:> $logs.Count
50000

The script above collects 50,000 log entries on the local machine from 10 system log names. Running this in parallel is almost twice as fast as running it sequentially, because it involves some relatively slow disk access and can also take advantage of the machine's multiple cores as it processes the log entries.

When should it be avoided?

ForEach-Object -Parallel should not be thought of as something that will always speed up script execution. In fact, it can significantly slow down script execution if used heedlessly. For example, if your script block executes trivial script, then running it in parallel adds a huge amount of overhead and it will run much slower.

PS C:> (measure-command { 1..1000 | ForEach-Object -Parallel { "Hello: $_" } }).TotalMilliseconds
10457.962

PS C:> (measure-command { 1..1000 | ForEach-Object { "Hello: $_" } }).TotalMilliseconds
18.4473

In the above example, a trivial script block is run 1000 times. The ThrottleLimit is 5 by default, so only 5 runspaces/threads are created at a time, but a runspace and thread are still created 1000 times to do a simple string evaluation. Consequently, it takes over 10 seconds to complete. Removing the -Parallel parameter and running the ForEach-Object cmdlet normally completes in about 18 milliseconds.

So, it is important to use this feature wisely.

Implementation details

As previously mentioned, the new ForEach-Object -Parallel feature uses existing PowerShell functionality to run script blocks concurrently. The primary addition is the ability to limit the number of concurrent scripts running at a given time with the -ThrottleLimit parameter. Throttling is accomplished by a PSTaskPool class that holds running tasks (running scripts), and has a settable size limit which is set to the throttle limit value. An Add method allows tasks to be added to the pool, but if it is full then the method blocks until a new slot becomes available. Adding tasks to the task pool was initially performed on the ForEach-Object cmdlet piped input processing thread. But that turned out to be a performance bottleneck, and now a dedicated thread is used to add tasks to the pool.
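
As a mental model for that blocking Add behavior, the sketch below uses a counted semaphore to mimic a fixed-size pool. This is only an illustration of the idea; the real PSTaskPool is implemented in C# inside the cmdlet:

# Illustration only: a counted semaphore behaves like the task pool's slots.
$throttleLimit = 5
$slots = [System.Threading.SemaphoreSlim]::new($throttleLimit, $throttleLimit)

# "Adding" a task blocks until one of the $throttleLimit slots is free...
$slots.Wait()
try {
    # ...the script block would run here, on its own runspace and thread...
}
finally {
    # ...and completing the task releases the slot for the next queued script block.
    $slots.Release() | Out-Null
}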

PowerShell itself imposes conditions on how scripts run concurrently, based on its design and history. Scripts have to run in runspace contexts and only one script thread can run at a time within a runspace. So in order to run multiple scripts simultaneously multiple runspaces must be created. The current implementation of ForEach-Object -Parallel creates a new runspace for each script block execution instance. It may be possible to optimize this by re-using runspaces from a pool, but one concern in doing this is leaking state from one script execution to another.

Runspace contexts are an isolation unit for running scripts, and generally do not allow sharing state between themselves. However, variables can be passed at the beginning of script execution from the calling script to the parallel script block through the $using: keyword. This was borrowed from the remoting layer, which uses the keyword for the same purpose but over a remote connection. There is a big difference when using the $using: keyword in ForEach-Object -Parallel, though: for remoting, the variable being passed is a copy sent over the remoting connection, whereas with ForEach-Object -Parallel the actual object reference is passed from one script to another, violating normal isolation restrictions. So it is possible to have a non-thread-safe variable used in two scripts running on different threads, which can lead to unpredictable behavior.

# This does not throw an error, but is not guaranteed to work since the dictionary object is not thread safe
$threadUnSafeDictionary = [System.Collections.Generic.Dictionary[string,object]]::new()
Get-Process | ForEach-Object -Parallel {
    $dict = $using:threadUnSafeDictionary
    $dict.TryAdd($_.ProcessName, $_)
}
# This *is* guaranteed to work because the passed in concurrent dictionary object is thread safe
$threadSafeDictionary = [System.Collections.Concurrent.ConcurrentDictionary[string,object]]::new()
Get-Process | ForEach-Object -Parallel {
    $dict = $using:threadSafeDictionary
    $dict.TryAdd($_.ProcessName, $_)
}

$threadSafeDictionary["pwsh"]

 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
    112   108.25     124.43      69.75   16272   1 pwsh

Conclusion

This feature can greatly improve your life in many workload scenarios. As long as you understand how it works and what its limitations are, you can experiment with parallelism and make real performance improvements to your scripts.

Paul Higinbotham
Senior Software Engineer
PowerShell Team
