Amazon Transcribe Now Supports Mandarin and Russian

This post was originally published on this site

As speech is central to human interaction, artificial intelligence research has long focused on speech recognition, the first step in designing and building systems allowing humans to interact intuitively with machines. The diversity in languages, accents and voices makes this an incredibly difficult problem, requiring expert skills, extremely large data sets, and vast amounts of computing power to train efficient models.

To help organizations and developers use speech recognition in their applications, we launched Amazon Transcribe, an automatic speech recognition service, at AWS re:Invent 2017. Thanks to Amazon Transcribe, customers such as VideoPeel, Echo360, and GE Appliances have been able to quickly and easily add speech recognition capabilities to their applications and devices.

A single API call is all that it takes… and you don’t need to know the first thing about machine learning. You can analyze audio files stored in Amazon Simple Storage Service (S3) and have the service return a text file of the transcribed speech. You can also send a live audio stream to Amazon Transcribe and receive a stream of transcripts in real time.

Since launch, the team has constantly added new languages, and today we are happy to announce support for Mandarin and Russian, bringing the total number of supported languages to 16.

Introducing Mandarin
Working with Amazon Transcribe is extremely simple: let me show you how to get started in just a few minutes.

Let’s try Mandarin first. Starting from this Little Red Riding Hood video, I extracted the audio track, saved it in MP3 format, and uploaded it to one of my Amazon Simple Storage Service (S3) buckets. Here’s the actual file.

Then, I started a transcription job using the AWS CLI:

$ aws transcribe start-transcription-job --media MediaFileUri= --media-format mp3 --language-code zh-CN --transcription-job-name little_red_riding_hood-mandarin

After a few minutes, the job is complete. In the AWS console, I can either download the transcript using the URL provided by Amazon Transcribe, or read it directly.

Unfortunately, I don’t speak Mandarin, but according to Amazon Translate, this text is about a sick grandmother and a big bad wolf, so it looks like Amazon Transcribe did its job!
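If you download the transcript instead of reading it in the console, you get a JSON document. The minimal Python sketch below assumes the layout of Amazon Transcribe result files as I understand it, where the full text sits under `results.transcripts[*].transcript`; `extract_transcript` is a hypothetical helper, and the sample string is hand-written, not output from this job.

```python
import json


def extract_transcript(raw_json: str) -> str:
    """Return the plain-text transcript from an Amazon Transcribe result file.

    Assumes the layout {"results": {"transcripts": [{"transcript": ...}]}}.
    """
    document = json.loads(raw_json)
    transcripts = document["results"]["transcripts"]
    # Usually there is a single entry; join defensively in case there are several.
    return " ".join(entry["transcript"] for entry in transcripts)


if __name__ == "__main__":
    # Hypothetical sample, not real output from the job above.
    sample = ('{"jobName": "demo", "status": "COMPLETED", '
              '"results": {"transcripts": [{"transcript": "sample transcript text"}], "items": []}}')
    print(extract_transcript(sample))
```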

Introducing Russian
Let’s try Russian now, using the dialogue in this short video.

Здравствуйте! Greetings!
Добрый день! Good day!
Давайте познакомимся. Меня зовут Слава. Let’s introduce ourselves. My name is Slava.
Очень приятно, а меня – Наташа. Nice to meet you, and mine – Natasha.
Наташа, кто вы по профессии? Natasha, what is your profession?
Я врач. А вы? I (am a) doctor. And you?
Я инженер. I (am an) engineer.

This time, I will ask Amazon Transcribe to perform speaker identification too.

$ aws transcribe start-transcription-job --media MediaFileUri= --media-format mp3 --language-code ru-RU --transcription-job-name russian_dialogue --settings ShowSpeakerLabels=true,MaxSpeakerLabels=2

Here is the result.

As you can see, not only has Amazon Transcribe faithfully converted speech to text, it has also assigned each sentence to the correct speaker.
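With speaker identification enabled, the JSON output also contains a `speaker_labels` section. The sketch below assumes that each entry in `speaker_labels.segments[*].items` carries a `start_time` and a `speaker_label`, and that word entries in `results.items` can be matched to them by `start_time`; punctuation entries have no timestamps, so they are glued to the preceding word. The grouping logic and the `speaker_turns` name are my own, not an AWS library function.

```python
import json
from typing import Dict, List, Tuple


def speaker_turns(raw_json: str) -> List[Tuple[str, str]]:
    """Group transcribed words into consecutive (speaker_label, text) turns."""
    results = json.loads(raw_json)["results"]

    # Map each word's start time to the speaker label assigned to it.
    speaker_at: Dict[str, str] = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    turns: List[Tuple[str, List[str]]] = []
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if turns:  # attach punctuation to the previous word
                turns[-1][1][-1] += content
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(content)  # same speaker keeps talking
        else:
            turns.append((speaker, [content]))  # a new turn begins
    return [(speaker, " ".join(words)) for speaker, words in turns]
```

Printing each pair on its own line gives a dialogue-style transcript similar to the result shown above.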

Now Available!
You can start using these two new languages today in the following regions:

  • Americas: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), AWS GovCloud (US-West), Canada (Central), South America (Sao Paulo).
  • Europe: EU (Frankfurt), EU (Ireland), EU (London), EU (Paris).
  • Asia Pacific: Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney).

The free tier covers 60 minutes per month for the first 12 months, starting from your first transcription request.

As always, we’d love to hear your feedback: please post it to the AWS forum for Amazon Transcribe, or send it through your usual AWS contacts.


Special Webcast: Purple Teaming: The Pen-Test Grows Up – August 22, 2019 3:30pm US/Eastern

Speakers: Bryce Galbraith

With 20+ years of experience in the ethical hacking field, and a few million miles, I’ve seen a lot. As a consultant, I’ve seen firsthand the inner workings of organizations across industry sectors. As an instructor, I’ve had the unique privilege of conversing with thousands of professionals from around the world. I’ve seen their faces, I’ve heard their stories, I’ve felt their frustrations.

I’ve spent my entire career studying adversarial Tactics, Techniques, and Procedures (TTPs) to understand them, so I can help others understand how adversaries do what they do. I’ve learned many things along the way, including that sometimes you shouldn’t sugar-coat harsh realities. So, here goes…

1. If you’re relying on an annual pen-test and traditional static defenses to defend against advanced adversaries – you’re toast.

2. If your Red and Blue Teams see each other as the adversary and measure their success by the other’s failure – you’re toast.

3. If compliance is the goal – you’re burnt toast.

But, with a few adjustments and a change in alignment, organizations can begin to effectively prevent, detect, and respond to real-world TTPs through adversary emulation and Purple Teaming!

This webcast will cover:

  • Why your annual pen-test is a recipe for disaster, and what you can do about it.
  • Why many Red and Blue Teams are ineffective despite their efforts, and how to turn this around.
  • Several real-world TTPs that adversaries utilize (including demos) to completely dominate organizations, shockingly fast.
  • How to begin to perform adversary emulation and Purple Teaming
  • Several helpful tools and resources you can begin to explore immediately…

As Einstein wisely stated, “Insanity is doing the same thing over and over again and expecting different results.”

Special Webcast: ICS 612 Practitioner-Focused, Hands-On Cybersecurity – August 21, 2019 10:30am US/Eastern

Speakers: Tim Conway

With the new ICS612 course being released at the Oil and Gas Summit in Houston on September 17th, the course authors and instructors will walk through the layout of the course material, labs, learning objectives, and future direction as the course progresses through Beta 1 and 2, before a planned release at the ICS Summit in March. Please join us with any questions you may have about the course, prerequisites, materials, or what’s next, or just to hear some really excited ICS authors and instructors talk about how this course will help save the world.

Amazon Forecast – Now Generally Available

Getting accurate time series forecasts from historical data is not an easy task. Last year at re:Invent we introduced Amazon Forecast, a fully managed service that requires no experience in machine learning to deliver highly accurate forecasts. I’m excited to share that Amazon Forecast is generally available today!

With Amazon Forecast, there are no servers to provision. You only need to provide historical data, plus any additional metadata that you think may have an impact on your forecasts. For example, the demand for a particular product you need or produce may change with the weather, the time of the year, and the location where the product is used.

Amazon Forecast is based on the same technology used at Amazon, and it packages our years of experience in building and operating scalable, highly accurate forecasting technology in a way that is easy to use. Because it uses deep learning to learn from multiple datasets and automatically tries different algorithms, it can be applied to many different use cases, such as estimating product demand, cloud computing usage, financial planning, or resource planning in a supply chain management system.

Using Amazon Forecast
For this post, I need some sample data. To have an interesting use case, I go for the individual household electric power consumption dataset from the UCI Machine Learning Repository. For simplicity, I am using a version where data is aggregated hourly in a file in CSV format. Here are the first few lines where you can see the timestamp, the energy consumption, and the client ID:

2014-01-01 01:00:00,38.34991708126038,client_12
2014-01-01 02:00:00,33.5820895522388,client_12
2014-01-01 03:00:00,34.41127694859037,client_12
2014-01-01 04:00:00,39.800995024875625,client_12
2014-01-01 05:00:00,41.044776119402975,client_12

Let’s see how easy it is to build a predictor and get forecasts by using the Amazon Forecast console. Another option, for more advanced users, would be to use a Jupyter notebook and the AWS SDK for Python. You can find some sample notebooks in this GitHub repository.

In the Amazon Forecast console, the first step is to create a dataset group. Dataset groups act as containers for datasets that are related.

I can select a forecasting domain for my dataset group. Each domain covers a specific use case, such as retail, inventory planning, or web traffic, and brings its own dataset types based on the type of data used for training. For now, I use a custom domain that covers all use cases that don’t fall in other categories.

Next, I create a dataset. The data I am going to upload is aggregated by the hour, so 1 hour is the frequency of my data. The default data schema depends on the forecasting domain I selected earlier. I am using a custom domain here, and I change the data schema to have a timestamp, a target_value, and an item_id, in that order, matching the sample lines of data at the beginning of this post.
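The console builds the schema for you; if you script this step instead, the schema is a JSON document listing the attributes in column order. The sketch below shows what I understand the custom-domain schema to look like for this file. As far as I know, the AWS SDK for Python passes a dictionary of this shape as the `Schema` argument of the `forecast` client’s `create_dataset` call (alongside `Domain="CUSTOM"`, `DatasetType="TARGET_TIME_SERIES"`, and `DataFrequency="H"`), but treat that as an assumption and check the current API reference.

```python
import json

# Attribute order must match the column order of the CSV file:
# timestamp, target_value, item_id.
TARGET_SCHEMA = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
        {"AttributeName": "item_id", "AttributeType": "string"},
    ]
}

if __name__ == "__main__":
    print(json.dumps(TARGET_SCHEMA, indent=2))
```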

Now is the time to upload my time series data from Amazon Simple Storage Service (S3) into my dataset. The default timestamp format is exactly what I have in my data, so I don’t need to change it. I need an AWS Identity and Access Management (IAM) role to give Amazon Forecast access to the S3 bucket. I can select one here, or create a new one for this use case. As usual, avoid creating IAM roles that are too permissive, and apply a least privilege approach to keep permissions to the minimum required for this activity. After I tell Amazon Forecast in which S3 bucket and folder to look for my historical data, I start the import job.
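To follow the least privilege advice, the role’s permissions policy can be limited to reading the one bucket and folder that hold the training data. The sketch below generates such a policy; the bucket and folder names are hypothetical, and the role also needs a trust policy that lets the `forecast.amazonaws.com` service principal assume it (plus KMS permissions if the objects are encrypted with a customer managed key). Listing could be narrowed further with an `s3:prefix` condition, which I left out to keep the sketch simple.

```python
import json


def s3_read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing read access to one S3 bucket folder only.

    `bucket` and `prefix` are placeholders for your own names.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # Reading objects is limited to the training-data folder.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and folder names.
    print(json.dumps(s3_read_only_policy("my-forecast-data", "electricity/"), indent=2))
```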

The dataset group dashboard gives an overview of the process. My target time series data is being imported, and I can optionally add:

  • item metadata information on the items I want to predict on; for example, the color of the items in a retail scenario, or the kind of household (is this an apartment or a detached house?) for this electricity-focused use case.
  • related time series data that don’t include the target variable I want to predict, but can help improve my model; for example, price and promotions used by an ecommerce company are probably related to actual sales.

I am not adding more data for this use case. As soon as my dataset is imported, I start to train a predictor that I can then use to generate forecasts. I give the predictor a name, then select the forecast horizon, which in my case is 24 hours, and the frequency at which my forecasts are generated.

To train the predictor, I can select a specific machine learning algorithm of my choice, such as ARIMA or DeepAR+, but I prefer simplicity and use AutoML to let Amazon Forecast evaluate all algorithms and choose the one that performs best for my dataset.

In the case of my dataset, each household is identified by a single variable, the item_id, but you can add more dimensions if required. I can then select the Country for holidays. This is optional, but it can improve your results if your data may be affected by whether people are on holiday. I think energy usage is different on holidays, so I select United States, the country my dataset comes from.

The configuration of the backtest windows is a more advanced topic, and you can skip the next paragraph if you’re not interested in the details of how a machine learning model is evaluated in the case of time series. In this case, I am leaving the default.

When training a machine learning model, you need to split your dataset in two: a training dataset that you use to train the machine learning algorithm, and an evaluation dataset that you use to evaluate the performance of your trained model. With time series, you can’t just create these two subsets of your data randomly, as you normally would, because the order of your data points is important. The approach we use for Amazon Forecast is to split the time series into one or more parts, each one called a backtest window, preserving the order of the data. When evaluating your model against a backtest window, you should always use an evaluation dataset of the same length; otherwise, it would be very difficult to compare different results. The backtest window offset tells how many points ahead of a split point you want to use for evaluation, and this is the same value for all the splits. For example, by leaving it at 24 (hours), I always use one day of data to evaluate my model against each backtest window.
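To make the idea concrete, here is a minimal Python sketch of order-preserving backtest splits: each window evaluates on exactly `horizon` consecutive points, and the model only trains on data that comes before them. It illustrates the general technique under my own assumption that windows are laid back to back from the end of the series; it does not reproduce how Amazon Forecast positions its windows internally, and `backtest_splits` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def backtest_splits(series: Sequence[float], num_windows: int, horizon: int
                    ) -> List[Tuple[Sequence[float], Sequence[float]]]:
    """Split a time series into (train, evaluation) pairs without shuffling.

    Windows are laid back to back from the end of the series, most recent first,
    and every evaluation slice has exactly `horizon` points so results are comparable.
    """
    if num_windows < 1 or horizon < 1:
        raise ValueError("num_windows and horizon must be positive")
    if num_windows * horizon >= len(series):
        raise ValueError("series is too short for the requested windows")
    splits = []
    for window in range(1, num_windows + 1):
        split_point = len(series) - window * horizon
        train = series[:split_point]                             # only past data
        evaluation = series[split_point:split_point + horizon]  # the next `horizon` points
        splits.append((train, evaluation))
    return splits
```

With a week of hourly data (168 points) and two windows of 24 hours, the first pair trains on the first 144 points and evaluates on the last 24; the second trains on the first 120 points and evaluates on points 120 to 143.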

In the advanced configurations, I have the option to enable hyperparameter optimization (HPO), for the algorithms that support it, and featurizations, to develop additional features computed from the ones in your data. I am not touching those settings now.

After a few minutes, the predictor is active. To understand the quality of a predictor, I look at some of the metrics that are automatically computed.

Quantile loss (QL) calculates how far off the forecast at a certain quantile is from the actual demand. It weights underestimation and overestimation according to a specific quantile. For example, a P90 forecast, if calibrated, means that 90% of the time the true demand is less than the forecast value. Thus, when the demand turns out to be higher than the forecast, the loss would be greater than the other way around.
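In code, the per-point quantile (or pinball) loss looks like this. The `weighted_quantile_loss` normalisation (twice the summed loss divided by the summed absolute actual values) is how I understand Amazon Forecast to report weighted quantile loss; treat that part as an assumption and check the metric definitions in the Amazon Forecast documentation.

```python
from typing import Sequence


def quantile_loss(actual: Sequence[float], forecast: Sequence[float], tau: float) -> float:
    """Summed pinball loss of a quantile forecast at quantile tau (0 < tau < 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = 0.0
    for y, q in zip(actual, forecast):
        if y > q:
            total += tau * (y - q)          # underestimation, weighted by tau
        else:
            total += (1.0 - tau) * (q - y)  # overestimation, weighted by 1 - tau
    return total


def weighted_quantile_loss(actual: Sequence[float], forecast: Sequence[float],
                           tau: float) -> float:
    """Assumed wQL normalisation: 2 * loss / sum(|actual|)."""
    denominator = sum(abs(y) for y in actual)
    if denominator == 0:
        raise ValueError("sum of absolute actual values must be non-zero")
    return 2.0 * quantile_loss(actual, forecast, tau) / denominator
```

With tau = 0.9, forecasting 8 when the actual value is 10 costs 0.9 × 2 = 1.8, while forecasting 12 costs only 0.1 × 2 = 0.2, which is the asymmetry described above.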

When the predictor is ready, and I am satisfied with its metrics, I can use it to create a forecast.

When the forecast is active, I can query it to get predictions. I can export the whole forecast as a CSV file, or query for specific lookups. Let’s do a lookup. In the case of the dataset I am using, I can forecast the energy used by a household for a specific range of time. Dates here are in the past because I used an old dataset. I am pretty sure you’re going to use Amazon Forecast to look into the future.

For each timestamp in the forecast, I get a range of values. The P10, P50, and P90 forecasts have respectively 10%, 50%, and 90% probability of satisfying the actual demand. How you use these three values depends on your use case and how it is impacted by overestimating or underestimating demand. The P50 forecast is the most likely estimate for the demand. The P10 and P90 forecasts give you an 80% confidence interval for what to expect.
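One way to check whether the P10 and P90 values behave like an 80% interval on your own data is to count how often the actual values land between them once those values are known. A small Python sketch (the function name is mine); on a well-calibrated forecast, the result should be close to 0.8 over a long enough period.

```python
from typing import Sequence


def interval_coverage(actual: Sequence[float], p10: Sequence[float],
                      p90: Sequence[float]) -> float:
    """Fraction of actual values that fall inside the [P10, P90] band."""
    if not (len(actual) == len(p10) == len(p90)):
        raise ValueError("all sequences must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    inside = sum(1 for y, low, high in zip(actual, p10, p90) if low <= y <= high)
    return inside / len(actual)
```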

Available Now
You can use Amazon Forecast via the console, the AWS Command Line Interface (CLI) and the AWS SDKs. For example, you can use Amazon Forecast within a Jupyter notebook with the AWS SDK for Python to create a new predictor, or use the AWS SDK for JavaScript in the Browser to get predictions from within a web or mobile app, or the AWS SDK for Java or AWS SDK for .NET to add forecast capabilities to an existing enterprise application.

Here’s the overall flow of the Amazon Forecast API, from creating the dataset group to querying and extracting the forecast:

The dataset I used for this walkthrough and other examples are available in this GitHub repository:

Amazon Forecast is now available in US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo).

More information on specific features and pricing is one click away at:

I look forward to seeing what you’re going to use it for; please share your results with me!


Special Webcast: Kerberos & Attacks 101 – August 21, 2019 3:30pm US/Eastern

Speakers: Tim Medin

Want to understand how Kerberos works? Would you like to understand modern Kerberos attacks? If so, then join Tim Medin as he walks you through how to attack Kerberos with ticket attacks and Kerberoasting. We’ll cover the basics of Kerberos authentication and then show you how the trust model can be exploited for persistence, pivoting, and privilege escalation.

Special Webcast: Leveraging OSINT for Better DFIR Investigations – August 20, 2019 11:00pm US/Eastern

Speakers: Jeff Lomas and Micah Hoffman

Note: This webcast is free of charge; however, a SANS portal account is required (see webcast link for details).

SANS Asia-Pacific Webcast Series- Leveraging OSINT for Better DFIR Investigations

Are you a digital forensic examiner or investigator? Do you use OSINT? Are you unsure if you are using OSINT? If you answered yes to any of these questions, this webinar is for you! Nearly all examiners have used OSINT at some point in their work, but many are not sure if they are maximizing their use of OSINT. SEC487 author and certified SANS instructor Micah Hoffman and law enforcement digital forensic examiner/detective Jeff Lomas will discuss how OSINT techniques can add value to digital forensic investigations, perform a live demo using OSINT in concert with digital forensics, and discuss how digital forensic examiners can improve their OSINT.

New Telemetry in PowerShell 7 Preview 3

Beginning in PowerShell 7 Preview 3, PowerShell will be sending some additional data points to Microsoft.
This data will allow us to better understand usage of PowerShell and enable us to prioritize our future investments.
These additional points of data were reviewed with the PowerShell community and approved by the PowerShell Committee through the PowerShell RFC process.

What we added

We will continue to use Application Insights to collect the following new telemetry points:

- Count of PowerShell starts by type (API vs console)
- Count of unique PowerShell usage
- Count of the following execution types:
    - Application (native commands)
    - ExternalScript
    - Script
    - Function
    - Cmdlet
- Enabled Microsoft experimental features or experimental features shipped with PowerShell
- Count of hosted sessions
- Microsoft-owned modules loaded (based on white list)

This data will include the OS name, OS version, the PowerShell version, and the distribution channel when provided.

We will continue to share portions of our aggregated data with the PowerShell community through the public Power BI report.

Why we added it

We want to make PowerShell better and believe this can be achieved by better understanding how PowerShell is being used.
Through these additional data points we will get answers backed by data to the following questions:

  • Is the PowerShell Core user-base growing?
  • How is PowerShell being used? What is the usage distribution across command types and session type?
  • How can we encourage PowerShell Core usage growth?
  • What are issues that customers are hitting in PowerShell Core?
  • What versions of PowerShell tools and services should Microsoft continue to support?
  • Which experimental features are being used and tested? Which experimental features should we invest in?
  • How can we optimize the engine size and efficiency of PowerShell for cloud scenarios?

To ensure we are getting an accurate picture of how everyone uses PowerShell, not just those most vocal or involved in the community, we made improvements in our telemetry.
PowerShell usage telemetry will allow us to better prioritize testing, support, and investments.

Performance testing

When implementing this telemetry we took special care to ensure that there would not be a discernible performance impact.
The telemetry is collected through Application Insights and is batched and sent on a separate thread in order to reduce impact.
We also conducted tests to verify that there would not be a noticeable difference in PowerShell performance.

In order to test the performance impact of the telemetry, we ran our test suite 5 times with and 5 times without the telemetry changes and compared the average time for test completion.
The tests had a 1% difference in average completion time with the telemetry-enabled test runs actually having the faster average completion. The difference in average completion time, however, was not statistically significant.
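The post does not say which significance test was used. As an illustration only, Welch’s t-test is a common choice for comparing two small sets of run times whose variances may differ; the sketch below computes the t statistic and its approximate degrees of freedom with the standard library. With 5 runs per side, the degrees of freedom fall between 4 and 8, where the two-sided 5% critical value of t is roughly 2.3 to 2.8, so a |t| below that would not be significant. The `welch_t` helper is hypothetical and no timings from the post are used.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    se_a = variance(a) / len(a)  # squared standard error of each mean
    se_b = variance(b) / len(b)
    if se_a + se_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df
```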

We also tested the impact of collecting telemetry on startup time for both cold starts (first start-up of PowerShell) and warm starts (all future starts). We found that, on average, cold startups were 0.028 seconds slower with the additional telemetry, while warm startups were, on average, 0.027 seconds slower. The average performance impact was around 4%, and all startups during the test runs took less than 0.6023 seconds.

How to disable

The telemetry reporting can be disabled by setting the environment variable POWERSHELL_TELEMETRY_OPTOUT to true, yes, or 1.
This should not be done in your profile, as PowerShell reads this value from your system before executing your profile.
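Because PowerShell reads the variable before the profile runs, it has to be present in the environment of the process that launches `pwsh`. As an illustration in Python, the sketch below builds such an environment for a child process and mirrors the accepted values listed above; matching the value case-insensitively and ignoring surrounding whitespace are my assumptions, so stick to `true`, `yes`, or `1` in practice. Both helper names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

OPT_OUT_VARIABLE = "POWERSHELL_TELEMETRY_OPTOUT"
# Values listed in the post; case-insensitive matching is an assumption.
OPT_OUT_VALUES = {"true", "yes", "1"}


def telemetry_opted_out(env: Mapping[str, str]) -> bool:
    """Return True if the environment requests telemetry opt-out."""
    return env.get(OPT_OUT_VARIABLE, "").strip().lower() in OPT_OUT_VALUES


def env_with_opt_out(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and set the opt-out variable."""
    env = dict(os.environ if base is None else base)
    env[OPT_OUT_VARIABLE] = "1"
    return env
```

For example, `subprocess.run(["pwsh", "-NoLogo", "-Command", "$PSVersionTable.PSVersion"], env=env_with_opt_out())` would start PowerShell with the variable already set, assuming `pwsh` is on your PATH.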

Feedback and issues

If you encounter any issues with PowerShell telemetry, the best place to get support is through our GitHub page.

The post New Telemetry in PowerShell 7 Preview 3 appeared first on PowerShell.

PowerShell 7 Preview 3

In May, I published our PowerShell 7 Roadmap. We have been making progress on our roadmap and are currently on track to have a Generally Available (GA) release by the end of this calendar year.

Long Term Servicing

PowerShell 7 GA will also be our first Long Term Servicing (LTS) release which is a change from our current Modern Lifecycle support for PowerShell Core 6.
We will support PowerShell 7 GA for as long as .NET Core 3.1 is supported; after that, you must upgrade to a newer version to continue to be supported by Microsoft.

Windows PowerShell compatibility

One of the main goals of PowerShell 7 is to have a viable replacement for Windows PowerShell 5.1 in production and we’ve made significant progress towards that goal.

PowerShell 7 Preview 3 is built on .NET Core 3.0 Preview 8 and leverages the work from the .NET Team to close the gap between .NET Core and .NET Framework. .NET Core 3.0 reintroduces a large number of .NET Framework APIs, opening up a large number of PowerShell modules shipped with Windows to be validated and marked as compatible by our team. Because the compatibility changes to the modules come as part of Windows, the latest version of Windows 10/Windows Server is required for full module compatibility.

However, on older versions of Windows, some modules may just work if you use:

Import-Module <moduleName> -SkipEditionCheck

If you have issues with a Microsoft PowerShell module, please open an issue in the PowerShellModuleCoverage repository!

Expect more content on this specific topic from Joey Aiello in the near future with more detail on which modules are compatible and where they’re marked as such.

New Features in Preview 3

This is just a small part of the entire changelog.
New features in this preview from the community and also the PowerShell team:

Experimental Features on by default in Preview builds

We decided to enable all Experimental Features by default in order to solicit more feedback for the PowerShell Committee to determine if a feature should continue as experimental, move from experimental to stable (non-experimental), or be withdrawn. On Stable builds (as well as Release Candidates), experimental features will continue to be disabled by default.

Note that if you had previously enabled experimental features manually, your powershell.config.json settings file will take precedence, and only experimental features listed within that file will be enabled. You can delete that file or run Get-ExperimentalFeature | Enable-ExperimentalFeature to ensure all experimental features are enabled. However, if you use the pipeline, you’ll have to run it again with each future Preview release that introduces new experimental features.


Single Apartment Thread as default

In general, you don’t need to worry about a concept called ApartmentState, which only applies to Windows.

Prior to this release, pwsh ran as a multi-threaded apartment by default. However, graphical user interface (GUI) APIs such as WinForms and WPF require a single-threaded apartment. What is important here is that pwsh now matches powershell.exe with regard to apartment state and, as such, supports calling WinForms and WPF APIs from PowerShell script.


Display COM Method Signature Argument Names

On Windows, if you happen to call COM APIs from PowerShell, a new capability by nbkalex now shows the argument names of COM methods instead of just the type information, which serves as simple documentation of what arguments should be passed.


Consider DBNull and NullString as $null

If you work with database types, you may get back a [dbnull]::Value, which is equivalent to $null within the database; however, in PowerShell, this was not equal to $null, so you couldn’t compare it directly. This change from Joel Sallow allows you to compare both [dbnull]::Value and [nullstring]::Value to $null and get $true.


Read-Host -Prompt works for all input

Due to how Read-Host calls into the console host and how the console host prompts for input (such as mandatory parameters that are given a value), you might encounter a situation where using Read-Host to prompt for input in your script exhibits unintended behavior when certain characters are used. This has been fixed so Read-Host will accept input as expected.


Support negative numbers with -Split operator

The -Split operator splits one or more strings into substrings. You can optionally specify a value to indicate the maximum number of substrings you want returned.

This new capability by Jacob Scott now allows you to specify the maximum number of substrings as a negative value signifying that the split should happen right to left instead of the usual left to right.
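To illustrate the semantics described above, here is a Python sketch that mimics a maximum-substring count using Python’s `str.split` and `str.rsplit`. It is an analogy, not PowerShell code: unlike -Split, which treats the delimiter as a regular expression by default, it splits on a literal separator, and the name `split_max` is hypothetical.

```python
from typing import List


def split_max(text: str, separator: str, max_substrings: int = 0) -> List[str]:
    """Split `text`, returning at most abs(max_substrings) pieces.

    0 means no limit; a positive count splits left to right (the remainder is
    the last piece); a negative count splits right to left (the remainder is
    the first piece). Pieces are always returned in their original order.
    """
    if max_substrings == 0:
        return text.split(separator)
    if max_substrings > 0:
        return text.split(separator, max_substrings - 1)
    return text.rsplit(separator, -max_substrings - 1)
```

For example, `split_max("a,b,c,d", ",", 2)` gives `["a", "b,c,d"]`, while `split_max("a,b,c,d", ",", -2)` gives `["a,b,c", "d"]`.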


ForEach-Object -Parallel

We’ve received consistent feedback that PowerShell users use PSWorkflow primarily to easily run scriptblocks in parallel.

We’ve added a -Parallel parameter to ForEach-Object that accepts a scriptblock to execute in parallel. There is an optional -ThrottleLimit parameter to set the maximum number of threads to use in parallel, which defaults to 5.
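The behavior resembles a bounded worker pool. As a rough analogy in Python (not how PowerShell implements -Parallel, which runs script blocks in separate runspaces), `ThreadPoolExecutor` with `max_workers` plays the role of -ThrottleLimit; the helper name is hypothetical. This sketch returns results in input order, but I would not assume ForEach-Object -Parallel preserves input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def foreach_parallel(items: Iterable[T], script_block: Callable[[T], R],
                     throttle_limit: int = 5) -> List[R]:
    """Run `script_block` on each item with at most `throttle_limit` threads."""
    if throttle_limit < 1:
        raise ValueError("throttle_limit must be at least 1")
    with ThreadPoolExecutor(max_workers=throttle_limit) as pool:
        # map() yields results in input order, even if they finish out of order.
        return list(pool.map(script_block, items))
```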


Resolve AppX reparse points

On Windows 10, if you have apps installed from the Windows Store and list them from the command line, they show up as 0-byte files. These files are actually a different type of link to the actual executable. With this change, the target executable will now show up when using Get-ChildItem.


pwsh as a login shell

On Linux and macOS systems, there is a concept of a login shell, which sets up the environment that other apps and shells inherit. Prior to this release, if you used pwsh as your default login shell, you may have noticed that some environment variables were missing or incomplete.

With this change, pwsh sets up the login environment the same way as the Bourne shell (sh), so that everything works correctly.

Additional Telemetry

In this Preview release, we’ve added more telemetry. Please see Sydney Smith’s blog post on New Telemetry in PowerShell 7 Preview 3.


Although this blog post focuses on new features, this release also contains many bug fixes as well as targeted performance improvements.

You can always get the latest version of PowerShell from

Expect more new features from the community and the PowerShell team in future Preview releases!

Steve Lee
PowerShell Team

The post PowerShell 7 Preview 3 appeared first on PowerShell.

Ask The Expert Webcast: Focus On People, Process, and Technology to Take Your SOC to the Next Level – August 20, 2019 1:00pm US/Eastern

Speakers: John Pescatore and John Kitchen

Please join us for a webinar featuring John Pescatore, SANS Director of Emerging Technologies, and John Kitchen, Anomali Solution Engineering Manager Americas, as they discuss key themes developed through analyzing the results of the SANS Common and Best Practices for Security Operations Centers: 2019 Survey. They will tie real-world SOC experience to the survey findings with an emphasis on people, process, and technology. How are SOC managers successfully incorporating SOAR technologies and metrics that show measurable business benefit? And what are SOC organizations doing to tackle the problems associated with staffing and skills gap issues? Take your SOC to the next level with actionable insights.

Special Webcast: Legacy Authentication and Password Spray, Understanding and Stopping Attackers’ Favorite TTPs in Azure AD – August 19, 2019 1:00pm US/Eastern

Speakers: Mark Morowczynski and Ramiro Calderon

One of attackers’ favorite techniques today is password spraying, and for good reason: in August 2018, 200,000 accounts were compromised using it. Nearly all password spray attacks target legacy authentication protocols. The good news is that there are several steps you can take to prevent this type of attack. In this session, we will focus on what legacy authentication is, how to look for it in your environment, and what you need to do to prevent it from compromising your accounts.