Working with Regular Expressions in Perl

Matching with Regular Expressions (Regex)

Modifiers

/i # case insensitive

/s # let . match any character, including newlines

/x # ignore whitespace and allow comments inside the pattern. Example, to match floating point numbers:

/
 -? # optional minus sign
 \d+ # one or more digits
 \.? # optional decimal point
 \d* # zero or more digits after the decimal point
/x # end of pattern, followed by the x modifier
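
A minimal runnable sketch of the floating point pattern above. The sample inputs and the ^ and $ anchors are assumptions added for this illustration, so that the whole string has to look like a number.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, not taken from the text above.
my @samples = ("42", "-3.14", "7.", "abc", "1.2.3");

my @accepted;
for my $text (@samples) {
    if ($text =~ /
            ^       # start of string (anchor added for this sketch)
            -?      # optional minus sign
            \d+     # one or more digits
            \.?     # optional decimal point
            \d*     # zero or more digits after the decimal point
            $       # end of string (anchor added for this sketch)
        /x) {
        push @accepted, $text;
    }
}

print join(", ", @accepted), "\n";   # 42, -3.14, 7.
```

Without the anchors the pattern would also succeed on "abc1", since it only needs to find a digit somewhere in the string.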

if ( /word/i ) # ignore case (the modifier must be a lowercase i)

$_ = "word here \n and there \n end";
 if ( /here.*there/s) ... # this would pass, because /s lets . match the newline

if ( /barney.*fred/is) { print "String Fred is after Barney \n"; }

# the above can be rewritten as:
 if ( m{
 barney # comments here
 .* # more comments
 fred # comments
 }six) # modifiers s, i, and x; they must directly follow the closing delimiter
 { print "String Fred is after Barney \n"; } # Notice braces replace slashes
  • Anchors

^ # at beginning of string

$ # at end of string (or just before a trailing newline)

\b # word boundary; use it to match whole words only. Example:

/\bfred\b/ # will only match fred, and not Frederick, Alfred, or Manfred
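
A short sketch comparing the three anchors side by side. The word list is hypothetical and simply echoes the names used above, in lowercase so that only the anchors decide the result.

```perl
use strict;
use warnings;

# Hypothetical sample strings for illustration.
my @words = ("fred", "frederick", "alfred", "fred flintstone", "manfred");

my @whole_word = grep { /\bfred\b/ } @words;   # "fred" as a whole word
my @at_start   = grep { /^fred/ }    @words;   # string begins with "fred"
my @at_end     = grep { /fred$/ }    @words;   # string ends with "fred"

print "whole word: ", join(" | ", @whole_word), "\n";   # fred | fred flintstone
print "at start:   ", join(" | ", @at_start),   "\n";   # fred | frederick | fred flintstone
print "at end:     ", join(" | ", @at_end),     "\n";   # fred | alfred | manfred
```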

  • Binding Operator =~

By default, a pattern match is tested against $_

To match against any other variable, use the binding operator =~
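
A minimal sketch of the binding operator, assuming a hypothetical variable $greeting. The negated form !~ is not covered in the text above; it is included only for contrast.

```perl
use strict;
use warnings;

my $greeting = "Hello, world";   # hypothetical variable for illustration
$_ = "nothing to see";           # the default variable, for contrast

my $default_hit = /world/ ? 1 : 0;                # tests $_
my $bound_hit   = $greeting =~ /world/ ? 1 : 0;   # tests $greeting
my $negated_hit = $greeting !~ /world/ ? 1 : 0;   # !~ is true when there is NO match

print "default: $default_hit, bound: $bound_hit, negated: $negated_hit\n";
# default: 0, bound: 1, negated: 0
```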

  • Persistence of Memory

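An unsuccessful match leaves the previous match memories ($1, $2, ...) intact, but a successful one resets them all. Check that the match succeeded before using $1.

$var1 =~ /(\w+)/; # $1 contains the match, if it succeeded

$var2 =~ /(\w+)/; # $1 contains the new match, or still the old one if this match failed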

if ($var2 =~ /(\w+)/) { print "var2 is $1 \n"; }
 else { print "Var2 did not match \n"; }
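
A small sketch of why the success check matters. The variable names and inputs are hypothetical; the second string is chosen so that the match fails.

```perl
use strict;
use warnings;

my $first  = "hello";   # hypothetical inputs
my $second = "!!!";     # contains no word characters, so matching it fails

$first  =~ /(\w+)/;     # succeeds: $1 is "hello"
$second =~ /(\w+)/;     # fails: $1 is left untouched
my $stale = $1;         # still "hello", easy to mistake for a match on $second

# Testing the result of the match avoids reading a stale $1.
my $checked = ($second =~ /(\w+)/) ? $1 : "no match";

print "stale: $stale, checked: $checked\n";   # stale: hello, checked: no match
```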
  • Non-capturing Parentheses

To group with parentheses without capturing into memory, start the group with ?: as in (?:...)

#$_ = "brontosaurus steak"; # 1
$_ = "bonosaurus steak"; # 2
#$_ = "brontosaurus burger"; # 3

if (/(bronto)?saurus (steak|burger)/) {
 print "$2\n"; # matches 1,2,3
}

if (/(?:bronto)?saurus (steak|burger)/) {
 print "$1\n"; # matches 1,2,3 - same as above, but match is now in $1
}
  • Named Captured Parentheses

Another way to control what goes into memory for pattern match capturing is by using capture names. The captured text is then read from the %+ hash.

Syntax is: (?<capture_name>pattern)

Only available in Perl version 5.10 and higher

Example:

use 5.010;
 if (m/(?<name1>\w+) (?<name2>steak|burger)/) { # named captures require Perl 5.10
 print "$+{name1} - $+{name2}\n";
 }
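
A runnable sketch of named captures with a hypothetical input string and hypothetical capture names (animal, meal). It also shows that a named group still fills the numbered variables.

```perl
use strict;
use warnings;
use 5.010;   # named captures (and say) need Perl 5.10 or later

my $order = "brontosaurus burger";   # hypothetical input

my ($animal, $meal, $first_group) = ("", "", "");
if ($order =~ /(?<animal>\w+) (?<meal>steak|burger)/) {
    $animal      = $+{animal};   # looked up by name in the %+ hash
    $meal        = $+{meal};
    $first_group = $1;           # named groups are numbered as well
}

say "$animal / $meal / $first_group";   # brontosaurus / burger / brontosaurus
```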
  • Automatic Match Variables

The part of the string that actually matched the pattern is automatically stored in: $&

Whatever was before the match is in: $`

Whatever was after the match is in: $'

$_ = "a aa a bbb ccc dd e ee e";

if (/(\w+) (ccc) (\w+)/i) {
 print "$1 - $2 - $3\n"; # bbb - ccc - dd
 print "$& - $` - $'\n"; # bbb ccc dd - a aa a - e ee e
}
  • General Quantifiers

A quantifier in a pattern repeats the preceding item a certain number of times

  • * (zero or more), + (one or more), ? (zero or one, i.e. optional)

Other quantifiers set specific repeat counts

  • X{3,} # match 3 or more times
  • X{2,5} # match 2 to 5 times
  • X{5} # match exactly 5 times
$var1 = "aaabbbbccccc";
 if ( $var1 =~ /a{3,}/ ) { print "$` , $& , $' \n"; } # , aaa , bbbbccccc
 if ( $var1 =~ /b{2,5}/) { print "$` , $& , $' \n"; } # aaa , bbbb , ccccc
 if ( $var1 =~ /c{5}/ ) { print "$` , $& , $' \n"; } # aaabbbb , ccccc ,
  • Precedence

Regex precedence, from highest (binds tightest) to lowest, goes as follows:

  • Parentheses # (…), (?:…), (?<LABEL>…)
  • Quantifiers # a* a+ a? a{n,m}
  • Anchors and Sequence # abc ^a a$
  • Alternation # a|b|c
  • Atoms # [abc] \d \1
# 1 - match "match"
 $_ = "beforematchafter";
 if ( /match/) {
 print "Matched: |$`<$&>$'| \n";
 } else {
 print "No Match: |$_| \n";
 }

# 2 - match a word ending with "a"
 $_ = "wilma";
 if ( /a\b/ ) {
 print "Matched: |$`<$&>$'| \n";
 } else {
 print "No Match: |$_| \n";
 }

# 3 - match a word ending with "a" and capture it into $1
 $_ = "wilma";
 if ( /(\w*a\b)/ ) {
 print "Matched: |$`<$&>$'| - '$1' \n";
 } else {
 print "No Match: |$_| \n";
 }
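
The practical effect of the precedence order can be sketched as follows. Because alternation binds more loosely than anchors, /^fred|barney$/ reads as "starts with fred OR ends with barney". The name list is hypothetical, and (?:...) is used only to group.

```perl
use strict;
use warnings;

# Hypothetical sample strings for illustration.
my @names = ("fred", "barney", "frederick", "alfred", "fredbarney");

# Alternation binds loosest: this is (^fred) OR (barney$).
my @loose = grep { /^fred|barney$/ } @names;

# Parentheses bind tightest: both anchors now apply to either alternative.
my @grouped = grep { /^(?:fred|barney)$/ } @names;

print "loose:   ", join(" | ", @loose),   "\n";   # fred | barney | frederick | fredbarney
print "grouped: ", join(" | ", @grouped), "\n";   # fred | barney
```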

Processing text with Regular Expressions (Regex)

Regular expressions can also be used to change text, not only to match it

Substitutions using “s///”

# Basic Substitution
 $_ = "foey foo foobar!"; # set the default input
 print "$_ \n"; # foey foo foobar!
 s/foo/bar/; # replace the first foo with bar
 print "$_ \n"; # foey bar foobar!

# Complex Substitutions
 $_ = "John is out bowling with Joe tonight.";
 print "$_ \n"; # John is out bowling with Joe tonight.
 s/with (\w+)/against $1's team/; # replace 'with Joe' with "against Joe's team"
 print "$_ \n"; # John is out bowling against Joe's team tonight.
  • Global Replacements are done using “/g”
$_ = "John is out bowling with Joe tonight.";
 print "$_ \n"; # John is out bowling with Joe tonight.
 s/^/Tonight, /; # Prepend 'Tonight, ' to the front of line
 print "$_ \n"; # Tonight, John is out bowling with Joe tonight.
 s/tonight/Tomorrow/gi; # global sub
 print "$_ \n"; # Tomorrow, John is out bowling with Joe Tomorrow.
 s/(John)/\U$1/; # case shifting
 print "$_ \n"; # Tomorrow, JOHN is out bowling with Joe Tomorrow.
 s/(Joe)/\L$1/; # case shifting
 print "$_ \n"; # Tomorrow, JOHN is out bowling with joe Tomorrow.
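
A short sketch of the difference /g makes, using hypothetical sample strings. It relies on s/// returning the number of substitutions it made, and it combines \u with \L to capitalize each word.

```perl
use strict;
use warnings;

my $once = "foo foo foo";   # hypothetical sample strings
my $all  = "foo foo foo";

my $n_once = ($once =~ s/foo/bar/);    # without /g: only the first match
my $n_all  = ($all  =~ s/foo/bar/g);   # with /g: every match

print "$once ($n_once)\n";   # bar foo foo (1)
print "$all ($n_all)\n";     # bar bar bar (3)

my $name = "jOHN sMITH";     # hypothetical input
$name =~ s/(\w+)/\u\L$1/g;   # \L lowercases the word, \u capitalizes its first letter
print "$name\n";             # John Smith
```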
  • Split Operator

The split operator breaks up a string according to a pattern

Split drags the pattern through a string and returns a list of the fields that were separated by the delimiter

# Split function
 @result = split /:/, "abc:def:g:h"; # result now has ("abc","def","g","h"), 4 total elements
 @result = split /:/, ":::a:b:c:::"; # result now has ("","","","a","b","c"), 6 elements: leading empty fields are kept, trailing ones are dropped
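
The handling of empty fields can be checked with a small sketch. It uses the optional third argument of split (the limit), which the text above does not cover: a negative limit keeps trailing empty fields, and a positive limit caps the number of fields returned.

```perl
use strict;
use warnings;

my $line = ":::a:b:c:::";

my @default  = split /:/, $line;                # trailing empty fields dropped
my @keep_all = split /:/, $line, -1;            # negative limit keeps them
my @two      = split /:/, "abc:def:g:h", 2;     # at most two fields

print scalar(@default),  "\n";   # 6
print scalar(@keep_all), "\n";   # 9
print join(" | ", @two), "\n";   # abc | def:g:h
```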
  • Join Function

Does the opposite of split: it glues the elements of a list together with a separator string. The separator is a plain string, not a pattern.

# Join function
 $myjoin = join ":", 1, 2, 3, 4, 5; 
 print $myjoin . "\n"; # 1:2:3:4:5
 $myjoin = join "foo", "bar"; 
 print $myjoin . "\n"; # bar
 $myjoin = join "foo", "bar", "zoo"; 
 print $myjoin . "\n"; # barfoozoo
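
split and join are often used together. A minimal sketch with hypothetical values, which also shows that join adds no separator when there is one element or none:

```perl
use strict;
use warnings;

my @fields = split /:/, "4:6:8:10:12";   # hypothetical input
my $dashed = join "-", @fields;          # change the separator
my $single = join "-", "only";           # one element: no separator added
my $empty  = join "-", ();               # empty list: empty string

print "$dashed\n";             # 4-6-8-10-12
print "$single\n";             # only
print length($empty), "\n";    # 0
```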