Parsing Text with PowerShell (2/3)

This post was originally published on this site

This is the second post in a three-part series.

  • Part 1:
    • Useful methods on the String class
    • Introduction to Regular Expressions
    • The Select-String cmdlet
  • Part 2:
    • the -split operator
    • the -match operator
    • the switch statement
    • the Regex class
  • Part 3:
    • a real world, complete and slightly bigger, example of a switch-based parser

The -split operator

The -split operator splits one or more strings into substrings.

The first example is a name-value pattern, which is a common parsing task. Note the usage of the Max-substrings parameter to the -split operator.
We want to ensure that it doesn’t matter if the value contains the character to split on.

$text = "Description=The '=' character is used for assigning values to a variable"
$name, $value = $text -split "=", 2

@"
Name  =  $name
Value =  $value
"@

Name  =  Description
Value =  The '=' character is used for assigning values to a variable

When the line to parse contains fields separated by a well known separator, that is never a part of the field values, we can use the -split operator in combination with multiple assignment to get the fields into variables.

$name, $location, $occupation = "Spider Man,New York,Super Hero" -split ','

If only the location is of interest, the unwanted items can be assigned to $null.

$null, $location, $null = "Spider Man,New York,Super Hero" -split ','

$location

New York

If there are many fields, assigning to null doesn’t scale well. Indexing can be used instead, to get the fields of interest.

$inputText = "x,Staffan,x,x,x,x,x,x,x,x,x,x,Stockholm,x,x,x,x,x,x,x,x,11,x,x,x,x"
$name, $location, $age = ($inputText -split ',')[1,12,21]

$name
$location
$age

Staffan
Stockholm
11

It is almost always a good idea to create an object that gives context to the different parts.

$inputText = "x,Steve,x,x,x,x,x,x,x,x,x,x,Seattle,x,x,x,x,x,x,x,x,22,x,x,x,x"
$name, $location, $age = ($inputText -split ',')[1,12,21]
[PSCustomObject] @{
    Name = $name
    Location = $location
    Age = [int] $age
}

Name  Location Age
----  -------- ---
Steve Seattle   22

Instead of creating a PSCustomObject, we can create a class. It’s a bit more to type, but we can get more help from the engine, for example with tab completion.

The example below also shows an example of type conversion, where the default string to number conversion doesn’t work.
The age field is handled by PowerShell’s built-in type conversion. It is of type [int], and PowerShell will handle the conversion from string to int,
but in some cases we need to help out a bit. The ShoeSize field is also an [int], but the data is hexadecimal,
and without the hex specifier (‘0x’), this conversion fails for some values, and provides incorrect results for the others.

class PowerSheller {
    [string] $Name
    [string] $Location
    [int] $Age
    [int] $ShoeSize
}

$inputText = "x,Staffan,x,x,x,x,x,x,x,x,x,x,Stockholm,x,x,x,x,x,x,x,x,33,x,11d,x,x"
$name, $location, $age, $shoeSize = ($inputText -split ',')[1,12,21,23]
[PowerSheller] @{
    Name = $name
    Location = $location
    Age = $age
    # ShoeSize is expressed in hex, with no '0x' because reasons :)
    # And yes, it's in millimeters.
    ShoeSize = [Convert]::ToInt32($shoeSize, 16)
}

Name    Location  Age ShoeSize
----    --------  --- --------
Staffan Stockholm  33      285

The split operator’s first argument is actually a regex (by default, can be changed with options).
I use this on long command lines in log files (like those given to compilers) where there can be hundreds of options specified. This makes it hard to see if a certain option is specified or not, but when split into their own lines, it becomes trivial.
The pattern below uses a positive lookahead assertion.
It can be very useful to make patterns match only in a given context, like if they are, or are not, preceded or followed by another pattern.

$cmdline = "cl.exe /D Bar=1 /I SomePath /D Foo  /O2 /I SomeOtherPath /Debug a1.cpp a3.cpp a2.cpp"

$cmdline -split "s+(?=[-/])"

cl.exe
/D Bar=1
/I SomePath
/D Foo
/O2
/I SomeOtherPath
/Debug a1.cpp a2.cpp

Breaking down the regex, by rewriting it with the x option:

(?x)      # ignore whitespace in the pattern, and enable comments after '#'
s+       # one or more spaces
(?=[-/])  # only match the previous spaces if they are followed by any of '-' or '/'.

Splitting with a scriptblock

The -split operator also comes in another form, where you can pass it a scriptblock instead of a regular expression.
This allows for more complicated logic, that can be hard or impossible to express as a regular expression.

The scriptblock accepts two parameters, the text to split and the current index. $_ is bound to the character at the current index.

function SplitWhitespaceInMiddleOfText {
    param(
        [string]$Text,
        [int] $Index
    )
    if ($Index -lt 10 -or $Index -gt 40){
        return $false
    }
    $_ -match 's'
}

$inputText = "Some text that only needs splitting in the middle of the text"
$inputText -split $function:SplitWhitespaceInMiddleOfText

Some text that
only
needs
splitting
in
the middle of the text

The $function:SplitWhitespaceInMiddleOfText syntax is a way to get to content (the scriptblock that implements it) of the function, just as $env:UserName gets the content of an item in the env: drive.
It provides a way to document and/or reuse the scriptblock.

The -match operator

The -match operator works in conjunction with the $matches automatic variable. Each time a -match or a -notmatch succeeds, the $matches variable is populated so that each capture group gets its own entry. If the capture group is named, the key will be the name of the group, otherwise it will be the index.

As an example:

if ('a b c' -match '(w) (?<named>w) (w)'){
    $matches
}

Name                           Value
----                           -----
named                          b
2                              c
1                              a
0                              a b c

Notice that the indices only increase on groups without names. I.E. the indices of later groups change when a group is named.

Armed with the regex knowledge from the earlier post, we can write the following:

PS> "    10,Some text" -match '^s+(d+),(.+)'

True

PS> $matches
Name                           Value
----                           -----
2                              Some text
1                              10
0                              10,Some text

or with named groups

PS> "    10,Some text" -match '^s+(?<num>d+),(?<text>.+)'

True

PS> $matches
Name                           Value
----                           -----
num                            10
text                           Some text
0                              10,Some text

The important thing here is to put parentheses around the parts of the pattern that we want to extract. That is what creates the capture groups that allow us to reference those parts of the matching text, either by name or by index.

Combining this into a function makes it easy to use:

function ParseMyString($text){
    if ($text -match '^s+(d+),(.+)') {
        [PSCustomObject] @{
            Number = [int] $matches[1]
            Text    = $matches[2]
        }
    }
    else {
        Write-Warning "ParseMyString: Input `$text` doesn't match pattern"
    }
}

ParseMyString "    10,Some text"

 

Number  Text
------- ----
     10 Some text

Notice the type conversion when assigning the Number property. As long as the number is in range of an integer, this will always succeed, since we have made a successful match in the if statement above. ([long] or [bigint] could be used. In this case I provide the input, and I have promised myself to stick to a range that fits in a 32-bit integer.)
Now we will be able to sort or do numerical operations on the Number property, and it will behave like we want it to – as a number, not as a string.

The switch statement

Now we’re at the big guns 🙂

The switch statement in PowerShell has been given special functionality for parsing text.
It has two flags that are useful for parsing text and files with text in them. -regex and -file.

When specifying -regex, the match clauses that are strings are treated as regular expressions. The switch statement also sets the $matches automatic variable.

When specifying -file, PowerShell treats the input as a file name, to read input from, rather than as a value statement.

Note the use of a ScriptBlock instead of a string as the match clause to determine if we should skip preamble lines.

class ParsedOutput {
    [int] $Number
    [string] $Text

    [string] ToString() { return "{0} ({1})" -f $this.Text, $this.Number }
}

$inputData =
    "Preamble line",
    "LastLineOfPreamble",
    "    10,Some Text",
    "    Some other text,20"

$inPreamble = $true
switch -regex ($inputData) {

    {$inPreamble -and $_ -eq 'LastLineOfPreamble'} { $inPreamble = $false; continue }

    "^s+(?<num>d+),(?<text>.+)" {  # this matches the first line of non-preamble input
        [ParsedOutput] @{
            Number = $matches.num
            Text = $matches.text
        }
        continue
    }

    "^s+(?<text>[^,]+),(?<num>d+)" { # this matches the second line of non-preamble input
        [ParsedOutput] @{
            Number = $matches.num
            Text = $matches.text
        }
        continue
    }
}

 

Number  Text
------ ----
    10 Some Text
    20 Some other text

The pattern [^,]+ in the text group in the code above is useful. It means match anything that is not a comma ,. We are using the any-of construct [], and within those brackets, ^ changes meaning from the beginning of the line to anything but.

That is useful when we are matching delimited fields. A requirement is that the delimiter cannot be part of the set of allowed field values.

The regex class

regex is a type accelerator for System.Text.RegularExpressions.Regex. It can be useful when porting code from C#, and sometimes when we want to get more control in situations when we have many matches of a capture group. It also allows us to pre-create the regular expressions which can matter in performance sensitive scenarios, and to specify a timeout.

One instance where the regex class is needed is when you have multiple captures of a group.

Consider the following:

Text Pattern
a,b,c, (w,)+

If the match operator is used, $matches will contain

Name                           Value
----                           -----
1                              c,
0                              a,b,c,

The pattern matched three times, for a,, b, and c,. However, only the last match is preserved in the $matches dictionary.
However, the following will allow us to get to all the captures of the group:

[regex]::match('a,b,c,', '(w,)+').Groups[1].Captures
Index Length Value
----- ------ -----
    0      2 a,
    2      2 b,
    4      2 c,

Below is an example that uses the members of the Regex class to parse input data

class ParsedOutput {
    [int] $Number
    [string] $Text

    [string] ToString() { return "{0} ({1})" -f $this.Text, $this.Number }
}

$inputData =
    "    10,Some Text",
    "    Some other text,20"  # this text will not match

[regex] $pattern = "^s+(d+),(.+)"

foreach($d in $inputData){
    $match = $pattern.Match($d)
    if ($match.Success){
        $number, $text = $match.Groups[1,2].Value
        [ParsedOutput] @{
            Number = $number
            Text = $text
        }
    }
    else {
        Write-Warning "regex: '$d' did not match pattern '$pattern'"
    }
}

 

WARNING: regex: '    Some other text,20' did not match pattern '^s+(d+),(.+)'
Number Text
------ ----
    10 Some Text

It may surprise you that the warning appears before the output. PowerShell has a quite complex formatting system at the end of the pipeline, which treats pipeline output different than other streams. Among other things, it buffers output in the beginning of a pipeline to calculate sensible column widths. This works well in practice, but sometimes gives strange reordering of output on different streams.

Summary

In this post we have looked at how the -split operator can be used to split a string in parts, how the -match operator can be used to extract different patterns from some text, and how the powerful switch statement can be used to match against multiple patterns.

We ended by looking at how the regex class, which in some cases provides a bit more control, but at the expense of ease of use. This concludes the second part of this series. Next time, we will look at a complete, real world, example of a switch-based parser.

Thanks to Jason Shirk, Mathias Jessen and Steve Lee for reviews and feedback.

Staffan Gustafsson, @StaffanGson, powercode@github

Staffan works at DICE in Stockholm, Sweden, as a Software Engineer and has been using PowerShell since the first public beta.
He was most seriously pleased when PowerShell was open sourced, and has since contributed bug fixes, new features and performance improvements.
Staffan is a speaker at PSConfEU and is always happy to talk PowerShell.

The post Parsing Text with PowerShell (2/3) appeared first on Powershell.

Parsing Text with PowerShell (2/3)

This post was originally published on this site

This is the second post in a three-part series.

  • Part 1:
    • Useful methods on the String class
    • Introduction to Regular Expressions
    • The Select-String cmdlet
  • Part 2:
    • the -split operator
    • the -match operator
    • the switch statement
    • the Regex class
  • Part 3:
    • a real world, complete and slightly bigger, example of a switch-based parser

The -split operator

The -split operator splits one or more strings into substrings.

A common pattern is a name-value pattern:
Note the usage of the Max-substrings parameter to the -split operator.
We want to ensure that is doesn’t matter if the value contains the character to split on.

$text = "Description=The '=' character is used for assigning values to a variable"
$name, $value = $text -split "=", 2

@"
Name  =  $name
Value =  $value
"@
Name  =  Description
Value =  The '=' character is used for assigning values to a variable

When the line to parse contains fields separated by a well known separator, that is never a part of the field values, we can use the -split operator
in combination with multiple assignment to get the fields into variables.

$name, $location, $occupation = "Spider Man,New York,Super Hero" -split ','

If only the location is of interest, the unwanted items can be assigned to $null.

$null, $location, $null = "Spider Man,New York,Super Hero" -split ','

$location
New York

If there are many fields, assigning to null doesn’t scale well. Indexing can be used instead, to get the fields of interest.

$inputText = "x,Staffan,x,x,x,x,x,x,x,x,x,x,Stockholm,x,x,x,x,x,x,x,x,11,x,x,x,x"
$name, $location, $age = ($inputText -split ',')[1,12,21]

$name
$location
$age
Staffan
Stockholm
11

It is almost always a good idea to create an object that gives context to the different parts.

$inputText = "x,Steve,x,x,x,x,x,x,x,x,x,x,Seattle,x,x,x,x,x,x,x,x,22,x,x,x,x"
$name, $location, $age = ($inputText -split ',')[1,12,21]
[PSCustomObject] @{
    Name = $name
    Location = $location
    Age = [int] $age
}
Name  Location Age
----  -------- ---
Steve Seattle   22

Instead of creating a PSCustomObject, we can create a class. It’s a bit more to type, but we can get more help from the engine, for example with tab completion.

The example below also shows an example of type conversion, where the default string to number conversion doesn’t work.
The age field is handled by PowerShell’s built-in type conversion. It is of type [int], and PowerShell will handle the conversion from string to int,
but in some cases we need to help out a bit. The ShoeSize field is also an [int], but the data is hexadecimal,
and without the hex specifier (‘0x’), this conversion fails for some values, and provides incorrect results for the others.

class PowerSheller {
    [string] $Name
    [string] $Location
    [int] $Age
    [int] $ShoeSize
}

$inputText = "x,Staffan,x,x,x,x,x,x,x,x,x,x,Stockholm,x,x,x,x,x,x,x,x,33,x,11d,x,x"
$name, $location, $age, $shoeSize = ($inputText -split ',')[1,12,21,23]
[PowerSheller] @{
    Name = $name
    Location = $location
    Age = $age
    # ShoeSize is expressed in hex, with no '0x' because reasons :)
    # And yes, it's in millimeters.
    ShoeSize = [Convert]::ToInt32($shoeSize, 16)
}
Name    Location  Age ShoeSize
----    --------  --- --------
Staffan Stockholm  33      285

The split operator’s first argument is actually a regex (by default, can be changed with options).
I use this on long command lines in log files (like those given to compilers) where there can be hundreds of options specified. This makes it hard to see if a certain option is specified or not, but when split into their own lines, it becomes trivial.
The pattern below uses a positive lookahead assertion.
It can be very useful to make patterns match only in a given context, like if they are, or are not, preceded or followed by another pattern.

$cmdline = "cl.exe /D Bar=1 /I SomePath /D Foo  /O2 /I SomeOtherPath /Debug a1.cpp a3.cpp a2.cpp"

$cmdline -split "s+(?=[-/])"
cl.exe
/D Bar=1
/I SomePath
/D Foo
/O2
/I SomeOtherPath
/Debug a1.cpp a2.cpp

Breaking down the regex, by rewriting it with the x option:

(?x)      # ignore whitespace in the pattern, and enable comments after '#'
s+       # one or more spaces
(?=[-/])  # only match the previous spaces if they are followed by any of '-' or '/'.

Splitting with a scriptblock

The -split operator also comes in another form, where you can pass it a scriptblock instead of a regular expression.
This allows for more complicated logic, that can be hard or impossible to express as a regular expression.

The scriptblock accepts two parameters, the text to split and the current index. $_ is bound to the character at the current index.

function SplitWhitespaceInMiddleOfText {
    param(
        [string]$Text,
        [int] $Index
    )
    if ($Index -lt 10 -or $Index -gt 40){
        return $false
    }
    $_ -match 's'
}

$inputText = "Some text that only needs splitting in the middle of the text"
$inputText -split $function:SplitWhitespaceInMiddleOfText
Some text that
only
needs
splitting
in
the middle of the text

The $function:SplitWhitespaceInMiddleOfText syntax is a way to get to content (the scriptblock that implements it) of the function, just as $env:UserName gets the content of an item in the env: drive.
It provides a way to document and/or reuse the scriptblock.

The -match operator

The -match operator works in conjunction with the $matches automatic variable.
Each time a -match or a -notmatch succeeds, the $matches variable is populated so that each capture group gets its own entry.
If the capture group is named, the key will be the name of the group, otherwise it will be the index.

As an example:

if ('a b c' -match '(w) (?<named>w) (w)'){
    $matches
}
Name                           Value
----                           -----
named                          b
2                              c
1                              a
0                              a b c

Notice that the indices only increase on groups without names. I.E. the indices of later groups change when a group is named.

Armed with the regex knowledge from the earlier post, we can write the following:

PS> "    10,Some text" -match '^s+(d+),(.+)'
True
PS> $matches
Name                           Value
----                           -----
2                              Some text
1                              10
0                              10,Some text

or with named groups

PS> "    10,Some text" -match '^s+(?<num>d+),(?<text>.+)'
True
PS> $matches
Name                           Value
----                           -----
num                            10
text                           Some text
0                              10,Some text

The important thing here is that the parts of the pattern that we want to extract has parenthesis around them.
That is what creates the capture groups that allow us to reference those parts of the matching text, either by name or by index.

Combining this into a function makes it easy to use:

function ParseMyString($text){
    if ($text -match '^s+(d+),(.+)') {
        [PSCustomObject] @{
            Number = [int] $matches[1]
            Text    = $matches[2]
        }
    }
    else {
        Write-Warning "ParseMyString: Input `$text` doesn't match pattern"
    }
}

ParseMyString "    10,Some text"
Number  Text
------- ----
     10 Some text

Notice the type conversion when assigning the Number property. As long as the number is in range of an integer, this will always succeed, since we have made a successful match in the if statement above. ([long] or [bigint] could be used. In this case I provide the input, and I have promised myself to stick to a range that fits in a 32-bit integer.)
Now we will be able to sort or do numerical operations on the Number property, and it will behave like we want it to – as a number, not as a string.

The switch statement

Now we’re at the big guns 🙂

The switch statement in PowerShell has been given special functionality for parsing text.
It has two flags that are useful for parsing text and files with text in them. -regex and -file.

When specifying -regex, the match clauses that are strings are treated as regular expressions. The switch statement also sets the $matches automatic variable.

When specifying -file, PowerShell treats the input as a file name, to read input from, rather than as a value statement.

Note the use of a ScriptBlock instead of a string as the match clause to determine if we should skip preamble lines.

class ParsedOutput {
    [int] $Number
    [string] $Text

    [string] ToString() { return "{0} ({1})" -f $this.Text, $this.Number }
}

$inputData =
    "Preamble line",
    "LastLineOfPreamble",
    "    10,Some Text",
    "    Some other text,20"

$inPreamble = $true
switch -regex ($inputData) {

    {$inPreamble -and $_ -eq 'LastLineOfPreamble'} { $inPreamble = $false; continue }

    "^s+(?<num>d+),(?<text>.+)" {  # this matches the first line of non-preamble input
        [ParsedOutput] @{
            Number = $matches.num
            Text = $matches.text
        }
        continue
    }

    "^s+(?<text>[^,]+),(?<num>d+)" { # this matches the second line of non-preamble input
        [ParsedOutput] @{
            Number = $matches.num
            Text = $matches.text
        }
        continue
    }
}
Number  Text
------ ----
    10 Some Text
    20 Some other text

The pattern [^,]+ in the text group in the code above is useful. It means match anything that is not a comma ,. We are using the any-of construct [], and within those brackets, ^ changes meaning from the beginning of the line to anything but.

That is useful when we are matching delimited fields. A requirement is that the delimiter cannot be part of the set of allowed field values.

The regex class

regex is a type accelerator for System.Text.RegularExpressions.Regex. It can be useful when porting code from C#, and sometimes when we want to get more control in situations when we have many matches of a capture group. It also allows us to pre-create the regular expressions which can matter in performance sensitive scenarios, and to specify a timeout.

One instance where the regex class is needed is when you have multiple captures of a group.

Consider the following:

Text Pattern
a,b,c, (w,)+

If the match operator is used, $matches will contain

Name                           Value
----                           -----
1                              c,
0                              a,b,c,

The pattern matched three times, for a,, b, and c,. However, only the last match is preserved in the $matches dictionary.
However, the following will allow us to get to all the captures of the group:

[regex]::match('a,b,c,', '(w,)+').Groups[1].Captures
Index Length Value
----- ------ -----
    0      2 a,
    2      2 b,
    4      2 c,

Below is an example that uses the members of the Regex class to parse input data

class ParsedOutput {
    [int] $Number
    [string] $Text

    [string] ToString() { return "{0} ({1})" -f $this.Text, $this.Number }
}

$inputData =
    "    10,Some Text",
    "    Some other text,20"  # this text will not match

[regex] $pattern = "^s+(d+),(.+)"

foreach($d in $inputData){
    $match = $pattern.Match($d)
    if ($match.Success){
        $number, $text = $match.Groups[1,2].Value
        [ParsedOutput] @{
            Number = $number
            Text = $text
        }
    }
    else {
        Write-Warning "regex: '$d' did not match pattern '$pattern'"
    }
}
WARNING: regex: '    Some other text,20' did not match pattern '^s+(d+),(.+)'
Number Text
------ ----
    10 Some Text

It may surprise you that the warning appears before the output. PowerShell has a quite complex formatting system at the end of the pipeline, which treats pipeline output different than other streams. Among other things, it buffers output in the beginning of a pipeline to calculate sensible column widths. This works well in practice, but sometimes gives strange reordering of output on different streams.

Summary

In this post we have looked at how the -split operator can be used to split a string in parts, how the -match operator can be used to extract different patterns from some text, and how the powerful switch statement can be used to match against multiple patterns.

We ended by looking at how the regex class, which in some cases provides a bit more control, but at the expense of ease of use.

This concludes the second part of this post. Next time, we will look at a complete, real world, example of a switch-based parser.

Thanks to Jason Shirk, Mathias Jessen and Steve Lee for reviews and feedback.

Staffan Gustafsson, @StaffanGson, powercode@github

Staffan works at DICE in Stockholm, Sweden, as a Software Engineer and has been using PowerShell since the first public beta.
He was most seriously pleased when PowerShell was open sourced, and has since contributed bug fixes, new features and performance improvements.
Staffan is a speaker at PSConfEU and is always happy to talk PowerShell.

The PowerShell-Docs repositories have been moved

This post was originally published on this site

The PowerShell-Docs repositories have been moved from the PowerShell organization to the MicrosoftDocs organization in GitHub.

The tools we use to build the documentation are designed to work in the MicrosoftDocs org. Moving the repository lets us build the foundation for future improvements in our documentation experience.

Impact of the move

During the move there may be some downtime. The affected repositories will be inaccessible during
the move process. Also, the documentation processes will be paused. After the move, we need to test
access permissions and automation scripts.After these tasks are complete, access and operations will return to normal. GitHub automatically
redirects requests to the old repo URL to the new location.
For more information about transferring repositories in GitHub, see About repository transfers.
If the transferred repository has any forks, then those forks will remain associated with the
repository after the transfer is complete.
  • All Git information about commits, including contributions, are preserved.
  • All of the issues and pull requests remain intact when transferring a repository.
  • All links to the previous repository location are automatically redirected to the new location.

When you use git clone, git fetch, or git push on a transferred repository, these commands will redirect to the new repository location or URL.

However, to avoid confusion, we strongly recommend updating any existing local clones to point to
the new repository URL. For more information, see Changing a remote’s URL.

The following example shows how to change the “upstream” remote to point to the new location:

[Wed 06:08PM] [staging =]
PS C:GitPS-DocsPowerShell-Docs> git remote -v
origin  https://github.com/sdwheeler/PowerShell-Docs.git (fetch)
origin  https://github.com/sdwheeler/PowerShell-Docs.git (push)
upstream        https://github.com/PowerShell/PowerShell-Docs.git (fetch)
upstream        https://github.com/PowerShell/PowerShell-Docs.git (push)

[Wed 06:09PM] [staging =]
PS C:GitPS-DocsPowerShell-Docs> git remote set-url upstream https://github.com/MicrosoftDocs/PowerShell-Docs.git

[Wed 06:10PM] [staging =]
PS C:GitPS-DocsPowerShell-Docs> git remote -v
origin  https://github.com/sdwheeler/PowerShell-Docs.git (fetch)
origin  https://github.com/sdwheeler/PowerShell-Docs.git (push)
upstream        https://github.com/MicrosoftDocs/PowerShell-Docs.git (fetch)
upstream        https://github.com/MicrosoftDocs/PowerShell-Docs.git (push)

Which repositories were moved?

 

The following repositories were transferred:

  • PowerShell/PowerShell-Docs
  • PowerShell/powerShell-Docs.cs-cz
  • PowerShell/powerShell-Docs.de-de
  • PowerShell/powerShell-Docs.es-es
  • PowerShell/powerShell-Docs.fr-fr
  • PowerShell/powerShell-Docs.hu-hu
  • PowerShell/powerShell-Docs.it-it
  • PowerShell/powerShell-Docs.ja-jp
  • PowerShell/powerShell-Docs.ko-kr
  • PowerShell/powerShell-Docs.nl-nl
  • PowerShell/powerShell-Docs.pl-pl
  • PowerShell/powerShell-Docs.pt-br
  • PowerShell/powerShell-Docs.pt-pt
  • PowerShell/powerShell-Docs.ru-ru
  • PowerShell/powerShell-Docs.sv-se
  • PowerShell/powerShell-Docs.tr-tr
  • PowerShell/powerShell-Docs.zh-cn
  • PowerShell/powerShell-Docs.zh-tw

Call to action

If you have a fork that you cloned, change your remote configuration to point to the new upstream URL.
Help us make the documentation better.
  • Submit issues when you find a problem in the docs.
  • Suggest fixes to documentation by submitting changes through the PR process.
Sean Wheeler
Senior Content Developer for PowerShell
https://github.com/sdwheeler

The post The PowerShell-Docs repositories have been moved appeared first on Powershell.

The PowerShell-Docs repositories have been moved

This post was originally published on this site

The PowerShell-Docs repositories have been moved from the PowerShell organization to the MicrosoftDocs organization in GitHub.

The tools we use to build the documentation are designed to work in the MicrosoftDocs org. Moving the repository lets us build the foundation for future improvements in our documentation experience.

Impact of the move

During the move there may be some downtime. The affected repositories will be inaccessible during
the move process. Also, the documentation processes will be paused. After the move, we need to test
access permissions and automation scripts.
 

After these tasks are complete, access and operations will return to normal. GitHub automatically
redirects requests to the old repo URL to the new location.
 

For more information about transferring repositories in GitHub, see About repository transfers

If the transferred repository has any forks, then those forks will remain associated with the
repository after the transfer is complete.
  • All Git information about commits, including contributions, are preserved.
  • All of the issues and pull requests remain intact when transferring a repository.
  • All links to the previous repository location are automatically redirected to the new location.


When you use git clone, git fetch, or git push on a transferred repository, these commands will r
edirect to the new repository location or URL.

However, to avoid confusion, we strongly recommend updating any existing local clones to point to
the new repository URL.
For more information, see Changing a remote’s URL.

The following example shows how to change the “upstream” remote to point to the new location:

[Wed 06:08PM] [staging =]
PS C:GitPS-DocsPowerShell-Docs> git remote -v
origin  https://github.com/sdwheeler/PowerShell-Docs.git (fetch)
origin  https://github.com/sdwheeler/PowerShell-Docs.git (push)
upstream        https://github.com/PowerShell/PowerShell-Docs.git (fetch)
upstream        https://github.com/PowerShell/PowerShell-Docs.git (push)

[Wed 06:09PM] [staging =]
PS C:GitPS-DocsPowerShell-Docs> git remote set-url upstream https://github.com/MicrosoftDocs/PowerShell-Docs.git

[Wed 06:10PM] [staging =]
PS C:GitPS-DocsPowerShell-Docs> git remote -v
origin  https://github.com/sdwheeler/PowerShell-Docs.git (fetch)
origin  https://github.com/sdwheeler/PowerShell-Docs.git (push)
upstream        https://github.com/MicrosoftDocs/PowerShell-Docs.git (fetch)
upstream        https://github.com/MicrosoftDocs/PowerShell-Docs.git (push)


Which repositories were moved?

 

The following repositories were transferred:

  • PowerShell/PowerShell-Docs
  • PowerShell/powerShell-Docs.cs-cz
  • PowerShell/powerShell-Docs.de-de
  • PowerShell/powerShell-Docs.es-es
  • PowerShell/powerShell-Docs.fr-fr
  • PowerShell/powerShell-Docs.hu-hu
  • PowerShell/powerShell-Docs.it-it
  • PowerShell/powerShell-Docs.ja-jp
  • PowerShell/powerShell-Docs.ko-kr
  • PowerShell/powerShell-Docs.nl-nl
  • PowerShell/powerShell-Docs.pl-pl
  • PowerShell/powerShell-Docs.pt-br
  • PowerShell/powerShell-Docs.pt-pt
  • PowerShell/powerShell-Docs.ru-ru
  • PowerShell/powerShell-Docs.sv-se
  • PowerShell/powerShell-Docs.tr-tr
  • PowerShell/powerShell-Docs.zh-cn
  • PowerShell/powerShell-Docs.zh-tw

Call to action

If you have a fork that you cloned, change your remote configuration to point to the new upstream URL.
Help us make the documentation better.
  • Submit issues when you find a problem in the docs.
  • Suggest fixes to documentation by submitting changes through the PR process.
Sean Wheeler
Senior Content Developer for PowerShell
https://github.com/sdwheeler

Announcing the PowerShell Preview Extension in VSCode

This post was originally published on this site

Preview builds of the PowerShell extension are now available in VSCode

We are excited to announce the PowerShell Preview extension in the VSCode marketplace!
The PowerShell Preview extension allows users on Windows PowerShell 5.1, Powershell 6.0, and all newer versions to get and test the latest updates to the PowerShell extension and comes with some exciting features. The PowerShell Preview extension is a substitute for the PowerShell extension so both the PowerShell extension and the PowerShell Preview
extension should not be enabled at the same time.

Features of the PowerShell Preview extension

The PowerShell Preview extension is built on .NET Standard thereby enabling simplification of our code and dependency structure.

The PowerShell Preview extension also contains PSReadLine support in the integrated console for Windows behind a
feature flag. PSReadLine provides a consistent and rich interactive experience, including syntax coloring and
multi-line editing and history, in the PowerShell console, in Cloud Shell, and now in VSCode terminal.
For more information on the benefits of PSReadLine, check out their documentation.

To enable PSReadLine support in the Preview version on Windows, please add the following to your user settings:

"powershell.developer.featureFlags": [ "PSReadLine" ]

HUGE thanks to @SeeminglyScience for all his amazing work getting PSReadLine working in PowerShell Editor Services!

Why we created the PowerShell Preview extension

By having a preview channel, which supports Windows Powershell 5.1 and PowerShell Core 6, in addition to our existing stable channel, we can get new features out faster. PSReadLine support for the VSCode integrated console is a great
example of a feature that the preview build makes possible. Having a preview channel also allows us to get more feedback
on new features and to iterate on changes before they arrive in our stable channel.

How to Get/Use the PowerShell Preview extension

If you dont already have VSCode, start here.

Once you have VSCode open, click Clt+Shift+X to open the extensions marketplace.
Next, type PowerShell Preview in the search bar.
Click Install on the PowerShell Preview page.
Finally, click Reload in order to refresh VSCode.

If you already have the PowerShell extension please disable it to use the Powershell Preview extension.
To disable the PowerShell extension find it in the Extensions sidebar view, specifically under the list of Enabled extensions, Right-click on the PowerShell extension and select Disable. Please note that it is important to only have either the
PowerShell extension or the PowerShell Preview extension endabled at one time.

Breaking Changes

As stated above, this version of the PowerShell extension only works with Windows PowerShell versions 5.1 and
PowerShell Core 6.

Reporting Feedback

An important benefit of a preview extension is getting feedback from users.
To report issues with the extension use our GitHub repo.
When reporting issues be sure to specify the version of the extension you are using.

Sydney Smith
Program Manager
PowerShell Team

The post Announcing the PowerShell Preview Extension in VSCode appeared first on Powershell.

Announcing the PowerShell Preview Extension in VSCode

This post was originally published on this site

Preview builds of the PowerShell extension are now available in VSCode

We are excited to announce the PowerShell Preview extension in the VSCode marketplace!
The PowerShell Preview extension allows users on Windows PowerShell 5.1, Powershell 6.0, and all newer versions to get and test the latest updates to the PowerShell extension and comes with some exciting features. The PowerShell Preview extension is a substitute for the PowerShell extension so both the PowerShell extension and the PowerShell Preview
extension should not be enabled at the same time.

Features of the PowerShell Preview extension

The PowerShell Preview extension is built on .NET Standard thereby enabling simplification of our code and dependency structure.

The PowerShell Preview extension also contains PSReadLine support in the integrated console for Windows behind a
feature flag. PSReadLine provides a consistent and rich interactive experience, including syntax coloring and
multi-line editing and history, in the PowerShell console, in Cloud Shell, and now in VSCode terminal.
For more information on the benefits of PSReadLine, check out their documentation.

To enable PSReadLine support in the Preview version on Windows, please add the following to your user settings:

"powershell.developer.featureFlags": [ "PSReadLine" ]

HUGE thanks to @SeeminglyScience for all his amazing work getting PSReadLine working in PowerShell Editor Services!

Why we created the PowerShell Preview extension

By having a preview channel, which supports Windows Powershell 5.1 and PowerShell Core 6, in addition to our existing stable channel, we can get new features out faster. PSReadLine support for the VSCode integrated console is a great
example of a feature that the preview build makes possible. Having a preview channel also allows us to get more feedback
on new features and to iterate on changes before they arrive in our stable channel.

How to Get/Use the PowerShell Preview extension

If you dont already have VSCode, start here.

Once you have VSCode open, click Clt+Shift+X to open the extensions marketplace.
Next, type PowerShell Preview in the search bar.
Click Install on the PowerShell Preview page.
Finally, click Reload in order to refresh VSCode.

If you already have the PowerShell extension please disable it to use the Powershell Preview extension.
To disable the PowerShell extension find it in the Extensions sidebar view, specifically under the list of Enabled extensions, Right-click on the PowerShell extension and select Disable. Please note that it is important to only have either the
PowerShell extension or the PowerShell Preview extension endabled at one time.
How to Disable

Breaking Changes

As stated above, this version of the PowerShell extension only works with Windows PowerShell versions 5.1 and
PowerShell Core 6.

Reporting Feedback

An important benefit of a preview extension is getting feedback from users.
To report issues with the extension use our GitHub repo.
When reporting issues be sure to specify the version of the extension you are using.

Sydney Smith
Program Manager
PowerShell Team

Parsing Text with PowerShell (1/3)

This post was originally published on this site

This is the first post in a three part series.

  • Part 1:
    • Useful methods on the String class
    • Introduction to Regular Expressions
    • The Select-String cmdlet
  • Part 2:
    • The -split operator
    • The -match operator
    • The switch statement
    • The Regex class
  • Part 3:
    • A real world, complete and slightly bigger, example of a switch-based parser

A task that appears regularly in my workflow is text parsing. It may be about getting a token from a single line of text or about turning the text output of native tools into structured objects so I can leverage the power of PowerShell.

I always strive to create structure as early as I can in the pipeline, so that later on I can reason about the content as properties on objects instead of as text at some offset in a string. This also helps with sorting, since the properties can have their correct type, so that numbers, dates etc. are sorted as such and not as text.

There are a number of options available to a PowerShell user, and I’m giving an overview here of the most common ones.

This is not a text about how to create a high performance parser for a language with a structured EBNF grammar. There are better tools available for that, for example ANTLR.

.Net methods on the string class

Any treatment of string parsing in PowerShell would be incomplete if it didn’t mention the methods on the string class.
There are a few methods that I’m using more often than others when parsing strings:

Name Description
Substring(int startIndex) Retrieves a substring from this instance. The substring starts at a specified character position and continues to the end of the string.
Substring(int startIndex, int length) Retrieves a substring from this instance. The substring starts at a specified character position and has a specified length.
IndexOf(string value) Reports the zero-based index of the first occurrence of the specified string in this instance.
IndexOf(string value, int startIndex) Reports the zero-based index of the first occurrence of the specified string in this instance. The search starts at a specified character position.
LastIndexOf(string value) Reports the zero-based index of the last occurrence of the specified string in this instance. Often used together with Substring.
LastIndexOf(string value, int startIndex) Reports the zero-based index position of the last occurrence of a specified string within this instance. The search starts at a specified character position and proceeds backward toward the beginning of the string.

This is a minor subset of the available functions. It may be well worth your time to read up on the string class since it is so fundamental in PowerShell.
Docs are found here.

As an example, this can be useful when we have very large input data of comma-separated input with 15 columns and we are only interested in the third column from the end. If we were to use the -split ',' operator, we would create 15 new strings and an array for each line. On the other hand, using LastIndexOf on the input string a few times and then SubString to get the value of interest is faster and results in just one new string.

function parseThirdFromEnd([string]$line){
    $i = $line.LastIndexOf(",")             # get the last separator
    $i = $line.LastIndexOf(",", $i - 1)     # get the second to last separator, also the end of the column we are interested in
    $j = $line.LastIndexOf(",", $i - 1)     # get the separator before the column we want
    $j++                                    # more forward past the separator
    $line.SubString($j,$i-$j)               # get the text of the column we are looking for
}

In this sample, I ignore that the IndexOf and LastIndexOf returns -1 if they cannot find the text to search for. From experience, I also know that it is easy to mess up the index arithmetics.
So while using these methods can improve performance, it is also more error prone and a lot more to type. I would only resort to this when I know the input data is very large and performance is an issue. So this is not a recommendation, or a starting point, but something to resort to.

On rare occasions, I write the whole parser in C#. An example of this is in a module wrapping the Perforce version control system, where the command line tool can output python dictionaries. It is a binary format, and the use case was complicated enough that I was more comfortable with a compiler checked implementation language.

Regular Expressions

Almost all of the parsing options in PowerShell make use of regular expressions, so I will start with a short intro of some regular expression concepts that are used later in these posts.

Regular expressions are very useful to know when writing simple parsers since they allow us to express patterns of interest and to capture text that matches those patterns.

It is a very rich language, but you can get quite a long way by learning a few key parts. I’ve found regular-expressions.info to be a good online resource for more information. It is not written directly for the .net regex implementation, but most of the information is valid across the different implementations.

Regex Description
* Zero or more of the preceding character. a* matches the empty string, a, aa, etc, but not b.
+ One or more of the preceding character. a+ matches a, aa, etc, but not the empty string or b.
. Matches any character
[ax1] Any of a,x,1
a-d matches any of a,b,c,d
w The w meta character is used to find a word character. A word character is a character from a-z, A-Z, 0-9, including the _ (underscore) character. It also matches variants of the characters such as ??? and ???.
W The inversion of w. Matches any non-word character
s The s meta character is used to find white space
S The inversion of s. Matches any non-whitespace character
d Matches digits
D The inversion of d. Matches non-digits
b Matches a word boundary, that is, the position between a word and a space.
B The inversion of b. . erB matches the er in verb but not the er in never.
^ The beginning of a line
$ The end of a line
(<expr>) Capture groups

Combining these, we can create a pattern like below to match a text like:

Text Pattern
" 42,Answer" ^s+d+,.+

The above pattern can be written like this using the x (ignore pattern whitespace) modifier.

Starting the regex with (?x) ignores whitespace in the pattern (it has to be specified explicitly, with s) and also enables the comment character #.

(?x)  # this regex ignores whitespace in the pattern. Makes it possible do document a regex with comments.
^     # the start of the line
s+   # one or more whitespace character
d+   # one or more digits
,     # a comma
.+    # one or more characters of any kind

By using capture groups, we make it possible to refer back to specific parts of a matched expression.

Text Pattern
" 42,Answer" ^s+(d+),(.+)
(?x)  # this regex ignores whitespace in the pattern. Makes it possible to document a regex with comments.
^     # the start of the line
s+   # one or more whitespace character
(d+) # capture one or more digits in the first group (index 1)
,     # a comma
(.+)  # capture one or more characters of any kind in the second group (index 2)

Naming regular expression groups

There is a construct called named capturing groups, (?<group_name>pattern), that will create a capture group with a designated name.

The regex above can be rewritten like this, which allows us to refer to the capture groups by name instead of by index.

^s+(?<num>d+),(?<text>.+)

Different languages have implementation specific solutions to accessing the values of the captured groups. We will see later on in this series how it is done in PowerShell.

The Select-String cmdlet

The Select-String command is a work horse, and is very powerful when you understand the output it produces.
I use it mainly when searching for text in files, but occasionally also when looking for something in command output and similar.

The key to being efficient with Select-String is to know how to get to the matched patterns in the output. In its internals, it uses the same regex class as the -match and -split operator, but instead of populating a global variable with the resulting groups, as -match does, it writes an object to the pipeline, with a Matches property that contains the results of the match.

Set-Content twitterData.txt -value @"
Lee, Steve-@Steve_MSFT,2992
Lee Holmes-13000 @Lee_Holmes
Staffan Gustafsson-463 @StaffanGson
Tribbiani, Joey-@Matt_LeBlanc,463400
"@

# extracting captured groups
Get-ChildItem twitterData.txt |
    Select-String -Pattern "^(w+) ([^-]+)-(d+) (@w+)" |
    Foreach-Object {
        $first, $last, $followers, $handle = $_.Matches[0].Groups[1..4].Value   # this is a common way of getting the groups of a call to select-string
        [PSCustomObject] @{
            FirstName = $first
            LastName = $last
            Handle = $handle
            TwitterFollowers = [int] $followers
        }
    }

FirstName LastName   Handle       TwitterFollowers
--------- --------   ------       ----------------
Lee       Holmes     @Lee_Holmes             13000
Staffan   Gustafsson @StaffanGson              463

Support for Multiple Patterns

As we can see above, only half of the data matched the pattern to Select-String.

A technique that I find useful is to take advantage of the fact that Select-String supports the use of multiple patterns.

The lines of input data in twitterData.txt contain the same type of information, but they’re formatted in slightly different ways.
Using multiple patterns in combination with named capture groups makes it a breeze to extract the groups even when the positions of the groups differ.

$firstLastPattern = "^(?<first>w+) (?<last>[^-]+)-(?<followers>d+) (?<handle>@.+)"
$lastFirstPattern = "^(?<last>[^s,]+),s+(?<first>[^-]+)-(?<handle>@[^,]+),(?<followers>d+)"
Get-ChildItem twitterData.txt |
     Select-String -Pattern $firstLastPattern, $lastFirstPattern |
    Foreach-Object {
        # here we access the groups by name instead of by index
        $first, $last, $followers, $handle = $_.Matches[0].Groups['first', 'last', 'followers', 'handle'].Value
        [PSCustomObject] @{
            FirstName = $first
            LastName = $last
            Handle = $handle
            TwitterFollowers = [int] $followers
        }
    }

FirstName LastName   Handle        TwitterFollowers
--------- --------   ------        ----------------
Steve     Lee        @Steve_MSFT               2992
Lee       Holmes     @Lee_Holmes              13000
Staffan   Gustafsson @StaffanGson               463
Joey      Tribbiani  @Matt_LeBlanc           463400

Breaking down the $firstLastPattern gives us

(?x)                # this regex ignores whitespace in the pattern. Makes it possible do document a regex with comments.
^                   # the start of the line
(?<first>w+)       # capture one or more of any word characters into a group named 'first'
s                  # a space
(?<last>[^-]+)      # capture one of more of any characters but `-` into a group named 'last'
-                   # a '-'
(?<followers>d+)   # capture 1 or more digits into a group named 'followers'
s                  # a space
(?<handle>@.+)      # capture a '@' followed by one or more characters into a group named 'handle'

The second regex is similar, but with the groups in different order. But since we retrieve the groups by name, we don’t have to care about the positions of the capture groups, and multiple assignment works fine.

Context around Matches

Select-String also has a Context parameter which accepts an array of one or two numbers specifying the number of lines before and after a match that should be captured. All text parsing techniques in this post can be used to parse information from the context lines.
The result object has a Context property, that returns an object with PreContext and PostContext properties, both of the type string[].

This can be used to get the second line before a match:

# using the context property
Get-ChildItem twitterData.txt |
    Select-String -Pattern "Staffan" -Context 2,1 |
    Foreach-Object { $_.Context.PreContext[1], $_.Context.PostContext[0] }

Lee Holmes-13000 @Lee_Holmes
Tribbiani, Joey-@Matt_LeBlanc,463400

To understand the indexing of the Pre- and PostContext arrays, consider the following:

Lee, Steve-@Steve_MSFT,2992                  &lt;- PreContext[0]
Lee Holmes-13000 @Lee_Holmes                 &lt;- PreContext[1]
Staffan Gustafsson-463 @StaffanGson          &lt;- Pattern matched this line
Tribbiani, Joey-@Matt_LeBlanc,463400         &lt;- PostContext[0]

The pipeline support of Select-String makes it different from the other parsing tools available in PowerShell, and makes it the undisputed king of one-liners.

I would like stress how much more useful Select-String becomes once you understand how to get to the parts of the matches.

Summary

We have looked at useful methods of the string class, especially how to use Substring to get to text at a specific offset. We also looked at regular expression, a language used to describe patterns in text, and on the Select-String cmdlet, which makes heavy use of regular expression.

Next time, we will look at the operators -split and -match, the switch statement (which is surprisingly useful for text parsing), and the regex class.

Staffan Gustafsson, @StaffanGson, github

Thanks to Jason Shirk, Mathias Jessen and Steve Lee for reviews and feedback.

The post Parsing Text with PowerShell (1/3) appeared first on Powershell.

Parsing Text with PowerShell (1/3)

This post was originally published on this site

This is the first post in a three part series.

  • Part 1:
    • Useful methods on the String class
    • Introduction to Regular Expressions
    • The Select-String cmdlet
  • Part 2:
    • The -split operator
    • The -match operator
    • The switch statement
    • The Regex class
  • Part 3:
    • A real world, complete and slightly bigger, example of a switch-based parser

A task that appears regularly in my workflow is text parsing. It may be about getting a token from a single line of text or about turning the text output of native tools into structured objects so I can leverage the power of PowerShell.

I always strive to create structure as early as I can in the pipeline, so that later on I can reason about the content as properties on objects instead of as text at some offset in a string. This also helps with sorting, since the properties can have their correct type, so that numbers, dates etc. are sorted as such and not as text.

There are a number of options available to a PowerShell user, and I’m giving an overview here of the most common ones.

This is not a text about how to create a high performance parser for a language with a structured EBNF grammar. There are better tools available for that, for example ANTLR.

.Net methods on the string class

Any treatment of string parsing in PowerShell would be incomplete if it didn’t mention the methods on the string class.
There are a few methods that I’m using more often than others when parsing strings:

Name Description
Substring(int startIndex) Retrieves a substring from this instance. The substring starts at a specified character position and continues to the end of the string.
Substring(int startIndex, int length) Retrieves a substring from this instance. The substring starts at a specified character position and has a specified length.
IndexOf(string value) Reports the zero-based index of the first occurrence of the specified string in this instance.
IndexOf(string value, int startIndex) Reports the zero-based index of the first occurrence of the specified string in this instance. The search starts at a specified character position.
LastIndexOf(string value) Reports the zero-based index of the first occurrence of the specified string in this instance. Often used together with Substring.
LastIndexOf(string value, int startIndex) Reports the zero-based index position of the last occurrence of a specified string within this instance. The search starts at a specified character position and proceeds backward toward the beginning of the string.

This is a minor subset of the available functions. It may be well worth your time to read up on the string class since it is so fundamental in PowerShell.
Docs are found here.

As an example, this can be useful when we have very large input data of comma-separated input with 15 columns and we are only interested in the third column from the end. If we were to use the -split ',' operator, we would create 15 new strings and an array for each line. On the other hand, using LastIndexOf on the input string a few times and then SubString to get the value of interest is faster and results in just one new string.

function parseThirdFromEnd([string]$line){
    $i = $line.LastIndexOf(",")             # get the last separator
    $i = $line.LastIndexOf(",", $i - 1)     # get the second to last separator, also the end of the column we are interested in
    $j = $line.LastIndexOf(",", $i - 1)     # get the separator before the column we want
    $j++                                    # more forward past the separator
    $line.SubString($j,$i-$j)               # get the text of the column we are looking for
}

In this sample, I ignore that the IndexOf and LastIndexOf returns -1 if they cannot find the text to search for. From experience, I also know that it is easy to mess up the index arithmetics.
So while using these methods can improve performance, it is also more error prone and a lot more to type. I would only resort to this when I know the input data is very large and performance is an issue. So this is not a recommendation, or a starting point, but something to resort to.

On rare occasions, I write the whole parser in C#. An example of this is in a module wrapping the Perforce version control system, where the command line tool can output python dictionaries. It is a binary format, and the use case was complicated enough that I was more comfortable with a compiler checked implementation language.

Regular Expressions

Almost all of the parsing options in PowerShell make use of regular expressions, so I will start with a short intro of some regular expression concepts that are used later in these posts.

Regular expressions are very useful to know when writing simple parsers since they allow us to express patterns of interest and to capture text that matches those patterns.

It is a very rich language, but you can get quite a long way by learning a few key parts. I’ve found regular-expressions.info to be a good online resource for more information. It is not written directly for the .net regex implementation, but most of the information is valid across the different implementations.

Regex Description
* Zero or more of the preceding character. a* matches the empty string, a, aa, etc, but not b.
+ One or more of the preceding character. a+ matches a, aa, etc, but not the empty string or b.
. Matches any character
[ax1] Any of a,x,1
a-d matches any of a,b,c,d
w The w meta character is used to find a word character. A word character is a character from a-z, A-Z, 0-9, including the _ (underscore) character. It also matches variants of the characters such as ??? and ???.
W The inversion of w. Matches any non-word character
s The s meta character is used to find white space
S The inversion of s. Matches any non-whitespace character
d Matches digits
D The inversion of d. Matches non-digits
b Matches a word boundary, that is, the position between a word and a space.
B The inversion of b. . erB matches the er in verb but not the er in never.
^ The beginning of a line
$ The end of a line
(<expr>) Capture groups

Combining these, we can create a pattern like below to match a text like:

Text Pattern
" 42,Answer" ^s+d+,.+

The above pattern can be written like this using the x (ignore pattern whitespace) modifier.

Starting the regex with (?x) ignores whitespace in the pattern (it has to be specified explicitly, with s) and also enables the comment character #.

(?x)  # this regex ignores whitespace in the pattern. Makes it possible do document a regex with comments.
^     # the start of the line
s+   # one or more whitespace character
d+   # one or more digits
,     # a comma
.+    # one or more characters of any kind

By using capture groups, we make it possible to refer back to specific parts of a matched expression.

Text Pattern
" 42,Answer" ^s+(d+),(.+)
(?x)  # this regex ignores whitespace in the pattern. Makes it possible to document a regex with comments.
^     # the start of the line
s+   # one or more whitespace character
(d+) # capture one or more digits in the first group (index 1)
,     # a comma
(.+)  # capture one or more characters of any kind in the second group (index 2)

Naming regular expression groups

There is a construct called named capturing groups, (?<group_name>pattern), that will create a capture group with a designated name.

The regex above can be rewritten like this, which allows us to refer to the capture groups by name instead of by index.

^s+(?<num>d+),(?<text>.+)

Different languages have implementation specific solutions to accessing the values of the captured groups. We will see later on in this series how it is done in PowerShell.

The Select-String cmdlet

The Select-String command is a work horse, and is very powerful when you understand the output it produces.
I use it mainly when searching for text in files, but occasionally also when looking for something in command output and similar.

The key to being efficient with Select-String is to know how to get to the matched patterns in the output. In its internals, it uses the same regex class as the -match and -split operator, but instead of populating a global variable with the resulting groups, as -match does, it writes an object to the pipeline, with a Matches property that contains the results of the match.

Set-Content twitterData.txt -value @"
Lee, Steve-@Steve_MSFT,2992
Lee Holmes-13000 @Lee_Holmes
Staffan Gustafsson-463 @StaffanGson
Tribbiani, Joey-@Matt_LeBlanc,463400
"@

# extracting captured groups
Get-ChildItem twitterData.txt |
    Select-String -Pattern "^(w+) ([^-]+)-(d+) (@w+)" |
    Foreach-Object {
        $first, $last, $followers, $handle = $_.Matches[0].Groups[1..4].Value   # this is a common way of getting the groups of a call to select-string
        [PSCustomObject] @{
            FirstName = $first
            LastName = $last
            Handle = $handle
            TwitterFollowers = [int] $followers
        }
    }
FirstName LastName   Handle       TwitterFollowers
--------- --------   ------       ----------------
Lee       Holmes     @Lee_Holmes             13000
Staffan   Gustafsson @StaffanGson              463

Support for Multiple Patterns

As we can see above, only half of the data matched the pattern to Select-String.

A technique that I find useful is to take advantage of the fact that Select-String supports the use of multiple patterns.

The lines of input data in twitterData.txt contain the same type of information, but they’re formatted in slightly different ways.
Using multiple patterns in combination with named capture groups makes it a breeze to extract the groups even when the positions of the groups differ.

$firstLastPattern = "^(?<first>w+) (?<last>[^-]+)-(?<followers>d+) (?<handle>@.+)"
$lastFirstPattern = "^(?<last>[^s,]+),s+(?<first>[^-]+)-(?<handle>@[^,]+),(?<followers>d+)"
Get-ChildItem twitterData.txt |
     Select-String -Pattern $firstLastPattern, $lastFirstPattern |
    Foreach-Object {
        # here we access the groups by name instead of by index
        $first, $last, $followers, $handle = $_.Matches[0].Groups['first', 'last', 'followers', 'handle'].Value
        [PSCustomObject] @{
            FirstName = $first
            LastName = $last
            Handle = $handle
            TwitterFollowers = [int] $followers
        }
    }
FirstName LastName   Handle        TwitterFollowers
--------- --------   ------        ----------------
Steve     Lee        @Steve_MSFT               2992
Lee       Holmes     @Lee_Holmes              13000
Staffan   Gustafsson @StaffanGson               463
Joey      Tribbiani  @Matt_LeBlanc           463400

Breaking down the $firstLastPattern gives us

(?x)                # this regex ignores whitespace in the pattern. Makes it possible do document a regex with comments.
^                   # the start of the line
(?<first>w+)       # capture one or more of any word characters into a group named 'first'
s                  # a space
(?<last>[^-]+)      # capture one of more of any characters but `-` into a group named 'last'
-                   # a '-'
(?<followers>d+)   # capture 1 or more digits into a group named 'followers'
s                  # a space
(?<handle>@.+)      # capture a '@' followed by one or more characters into a group named 'handle'

The second regex is similar, but with the groups in different order. But since we retrieve the groups by name, we don’t have to care about the positions of the capture groups, and multiple assignment works fine.

Context around Matches

Select-String also has a Context parameter which accepts an array of one or two numbers specifying the number of lines before and after a match that should be captured. All text parsing techniques in this post can be used to parse information from the context lines.
The result object has a Context property, that returns an object with PreContext and PostContext properties, both of the type string[].

This can be used to get the second line before a match:

# using the context property
Get-ChildItem twitterData.txt |
    Select-String -Pattern "Staffan" -Context 2,1 |
    Foreach-Object { $_.Context.PreContext[1], $_.Context.PostContext[0] }
Lee Holmes-13000 @Lee_Holmes
Tribbiani, Joey-@Matt_LeBlanc,463400

To understand the indexing of the Pre- and PostContext arrays, consider the following:

Lee, Steve-@Steve_MSFT,2992                  <- PreContext[0]
Lee Holmes-13000 @Lee_Holmes                 <- PreContext[1]
Staffan Gustafsson-463 @StaffanGson          <- Pattern matched this line
Tribbiani, Joey-@Matt_LeBlanc,463400         <- PostContext[0]

The pipeline support of Select-String makes it different from the other parsing tools available in PowerShell, and makes it the undisputed king of one-liners.

I would like stress how much more useful Select-String becomes once you understand how to get to the parts of the matches.

Summary

We have looked at useful methods of the string class, especially how to use Substring to get to text at a specific offset. We also looked at regular expression, a language used to describe patterns in text, and on the Select-String cmdlet, which makes heavy use of regular expression.

Next time, we will look at the operators -split and -match, the switch statement (which is surprisingly useful for text parsing), and the regex class.

Staffan Gustafsson, @StaffanGson, github

Thanks to Jason Shirk, Mathias Jessen and Steve Lee for reviews and feedback.

Windows Security change affecting PowerShell

This post was originally published on this site

Windows Security change affecting PowerShell

January 9, 2019

The recent (1/8/2019) Windows security patch CVE-2019-0543, has introduced a breaking change for a PowerShell remoting scenario. It is a narrowly scoped scenario that should have low impact for most users.

The breaking change only affects local loopback remoting, which is a PowerShell remote connection made back to the same machine, while using non-Administrator credentials.

PowerShell remoting endpoints do not allow access to non-Administrator accounts by default. However, it is possible to modify endpoint configurations, or create new custom endpoint configurations, that do allow non-Administrator account access. So you would not be affected by this change, unless you explicitly set up loopback endpoints on your machine to allow non-Administrator account access.

Example of broken loopback scenario

# Create endpoint that allows Users group access
PS > Register-PSSessionConfiguration -Name MyNonAdmin -SecurityDescriptorSddl 'O:NSG:BAD:P(A;;GA;;;BA)(A;;GA;;;BU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)' -Force

# Create non-Admin credential
PS > $nonAdminCred = Get-Credential ~NonAdminUser

# Create a loopback remote session to custom endpoint using non-Admin credential
PS > $session = New-PSSession -ComputerName localhost -ConfigurationName MyNonAdmin -Credential $nonAdminCred

New-PSSession : [localhost] Connecting to remote server localhost failed with the following error message : The WSMan
service could not launch a host process to process the given request.  Make sure the WSMan provider host server and
proxy are properly registered. For more information, see the about_Remote_Troubleshooting Help topic.
At line:1 char:1
+ New-PSSession -ComputerName localhost -ConfigurationName MyNonAdmin - ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OpenError: (System.Manageme....RemoteRunspace:RemoteRunspace) [New-PSSession], PSRemotin
   gTransportException
    + FullyQualifiedErrorId : -2146959355,PSSessionOpenFailed

The above example fails only when using non-Administrator credentials, and the connection is made back to the same machine (localhost). Administrator credentials still work. And the above scenario will work when remoting off-box to another machine.

Example of working loopback scenario

# Create Admin credential
PS > $adminCred = Get-Credential ~AdminUser

# Create a loopback remote session to custom endpoint using Admin credential
PS > $session = New-PSSession -ComputerName localhost -ConfigurationName MyNonAdmin -Credential $adminCred
PS > $session

 Id Name            ComputerName    ComputerType    State         ConfigurationName     Availability
 -- ----            ------------    ------------    -----         -----------------     ------------
  1 WinRM1          localhost       RemoteMachine   Opened        MyNonAdmin               Available

The above example uses Administrator credentials to the same MyNonAdmin custom endpoint, and the connection is made back to the same machine (localhost). The session is created successfully using Administrator credentials.

The breaking change is not in PowerShell but in a system security fix that restricts process creation between Windows sessions. This fix is preventing WinRM (which PowerShell uses as a remoting transport and host) from successfully creating the remote session host, for this particular scenario. There are no plans to update WinRM.

This affects Windows PowerShell and PowerShell Core 6 (PSCore6) WinRM based remoting.

This does not affect SSH remoting with PSCore6.

This does not affect JEA (Just Enough Administration) sessions.

A workaround for a loopback connection is to always use Administrator credentials.

Another option is to use PSCore6 with SSH remoting.

Paul Higinbotham
Senior Software Engineer
PowerShell Team

The post Windows Security change affecting PowerShell appeared first on Powershell.

Windows Security change affecting PowerShell

This post was originally published on this site

Windows Security change affecting PowerShell

January 9, 2019

The recent (1/8/2019) Windows security patch CVE-2019-0543, has introduced a breaking change for a PowerShell remoting scenario. It is a narrowly scoped scenario that should have low impact for most users.

The breaking change only affects local loopback remoting, which is a PowerShell remote connection made back to the same machine, while using non-Administrator credentials.

PowerShell remoting endpoints do not allow access to non-Administrator accounts by default. However, it is possible to modify endpoint configurations, or create new custom endpoint configurations, that do allow non-Administrator account access. So you would not be affected by this change, unless you explicitly set up loopback endpoints on your machine to allow non-Administrator account access.

Example of broken loopback scenario

# Create endpoint that allows Users group access
PS > Register-PSSessionConfiguration -Name MyNonAdmin -SecurityDescriptorSddl 'O:NSG:BAD:P(A;;GA;;;BA)(A;;GA;;;BU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)' -Force

# Create non-Admin credential
PS > $nonAdminCred = Get-Credential ~NonAdminUser

# Create a loopback remote session to custom endpoint using non-Admin credential
PS > $session = New-PSSession -ComputerName localhost -ConfigurationName MyNonAdmin -Credential $nonAdminCred

New-PSSession : [localhost] Connecting to remote server localhost failed with the following error message : The WSMan
service could not launch a host process to process the given request.  Make sure the WSMan provider host server and
proxy are properly registered. For more information, see the about_Remote_Troubleshooting Help topic.
At line:1 char:1
+ New-PSSession -ComputerName localhost -ConfigurationName MyNonAdmin - ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OpenError: (System.Manageme....RemoteRunspace:RemoteRunspace) [New-PSSession], PSRemotin
   gTransportException
    + FullyQualifiedErrorId : -2146959355,PSSessionOpenFailed

The above example fails only when using non-Administrator credentials, and the connection is made back to the same machine (localhost). Administrator credentials still work. And the above scenario will work when remoting off-box to another machine.

Example of working loopback scenario

# Create Admin credential
PS > $adminCred = Get-Credential ~AdminUser

# Create a loopback remote session to custom endpoint using Admin credential
PS > $session = New-PSSession -ComputerName localhost -ConfigurationName MyNonAdmin -Credential $adminCred
PS > $session

 Id Name            ComputerName    ComputerType    State         ConfigurationName     Availability
 -- ----            ------------    ------------    -----         -----------------     ------------
  1 WinRM1          localhost       RemoteMachine   Opened        MyNonAdmin               Available

The above example uses Administrator credentials to the same MyNonAdmin custom endpoint, and the connection is made back to the same machine (localhost). The session is created successfully using Administrator credentials.

The breaking change is not in PowerShell but in a system security fix that restricts process creation between Windows sessions. This fix is preventing WinRM (which PowerShell uses as a remoting transport and host) from successfully creating the remote session host, for this particular scenario. There are no plans to update WinRM.

This affects Windows PowerShell and PowerShell Core 6 (PSCore6) WinRM based remoting.

This does not affect SSH remoting with PSCore6.

This does not affect JEA (Just Enough Administration) sessions.

A workaround for a loopback connection is to always use Administrator credentials.

Another option is to use PSCore6 with SSH remoting.

Paul Higinbotham
Senior Software Engineer
PowerShell Team