Matching with Regular Expressions (Regex)
- Modifiers
/i # case insensitive
/s # let . match any character, including newlines
/x # ignore whitespace in the pattern and allow comments. Example, to match floating point numbers:
/
-? # optional minus sign
\d+ # one or more digits
\.? # optional decimal point
\d* # zero or more digits after the decimal point
/x # end of pattern
if ( /word/i ) ... # ignore case

$_ = "word here \n and there \n end";
if ( /here.*there/s ) ... # this would pass

if ( /barney.*fred/is ) {
    print "String Fred is after Barney \n";
}

# the above can be rewritten as:
if ( m{
        barney # comments here
        .*     # more comments
        fred   # comments
    }six )     # modifiers
{
    print "String Fred is after Barney \n";
}
# Notice that braces replace the slashes
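As a runnable sketch of the /x floating point pattern above, the pattern can be tried against a few sample strings. The sub name is_float and the sample inputs are made up for illustration, and the ^ and $ anchors are an addition so that the whole string has to be a number.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the whole string looks like a decimal number.
sub is_float {
    my ($text) = @_;
    return $text =~ /
        ^       # start of string (added for this sketch)
        -?      # optional minus sign
        \d+     # one or more digits
        \.?     # optional decimal point
        \d*     # zero or more digits after the decimal point
        $       # end of string (added for this sketch)
    /x ? 1 : 0;
}

for my $sample ( "3.14", "-42", "7.", "abc", ".5" ) {
    printf "%-5s %s\n", $sample, is_float($sample) ? "matches" : "does not match";
}
```

Because \d+ requires at least one digit before the optional decimal point, ".5" is rejected while "7." is accepted.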
- Anchors
^ # at beginning of string
$ # at end of string
\b # at a word boundary; use on both sides to match whole words only. Example:
/\bfred\b/ # will only match fred, and not Frederick, Alfred, or Manfred
- Binding Operator =~
By default, a pattern is matched against $_
To match against any other variable, use the binding operator: $var =~ /pattern/
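A minimal sketch of the binding operator, assuming a made-up variable $name: without =~ the pattern is tested against $_, and with =~ it is tested against the variable on the left-hand side.

```perl
use strict;
use warnings;

$_ = "the default string";
my $name = "Fred Flintstone";    # made-up sample data

# No binding operator: the pattern is matched against $_
my $default_hit = /default/ ? 1 : 0;

# Binding operator: the pattern is matched against $name instead of $_
my $name_hit = $name =~ /\bfred\b/i ? 1 : 0;

# $_ does not contain "fred", so the same pattern fails without the binding
my $fred_in_default = /\bfred\b/i ? 1 : 0;

print "default_hit=$default_hit name_hit=$name_hit fred_in_default=$fred_in_default\n";
```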
- Persistence of Memory
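An unsuccessful match leaves the previous memories intact, but a successful one resets them all. Check that a match succeeded before using $1, otherwise it may still hold a value from an earlier match.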
$var1 =~ /(\w+)/; # $1 contains match
$var2 =~ /(\w+)/; # $1 contains new match
if ($var2 =~ /(\w+)/) {
    print "var2 is $1 \n";
} else {
    print "Var2 did not match \n";
}
- Non-capturing Parentheses
To group with parentheses without capturing into memory, start the group with ?: as in (?:...)
#$_ = "brontosaurus steak"; # 1
$_ = "bonosaurus steak"; # 2
#$_ = "brontosaurus burger"; # 3
if ( /(bronto)?saurus (steak|burger)/ ) {
    print "$2\n"; # matches 1,2,3
}
if ( /(?:bronto)?saurus (steak|burger)/ ) {
    print "$1\n"; # matches 1,2,3 - same as above, but match is now in $1
}
- Named Captured Parentheses
Another way to control what goes into memory for pattern match capturing is by using capture names
Syntax is: (?<capture_name>...), and the captured text is read back from the %+ hash as $+{capture_name}
Only available in Perl version 5.10 and higher
Example:
use 5.010; # named captures require Perl 5.10
if ( m/(?<name1>\w+) (?<name2>steak|burger)/ ) {
    print "$+{name1} - $+{name2}\n";
}
- Automatic Match Variables
The part of the string that actually matched the pattern is automatically stored in: $&
Whatever was before the match is in: $`
Whatever was after the match is in: $'
$_ = "a aa a bbb ccc dd e ee e";
if ( /(\w+) (ccc) (\w+)/i ) {
    print "$1 - $2 - $3\n"; # bbb - ccc - dd
    print "$& - $` - $'\n"; # bbb ccc dd - a aa a - e ee e
}
- General Quantifiers
A quantifier in a pattern repeats the preceding item a certain number of times
- *, +, ? (zero or more, one or more, optional)
Other quantifiers set specific repeat counts
- X{3,} # match 3 or more times
- X{2,5} # match 2 to 5 times (no space inside the braces)
- X{5} # match exactly 5 times
$var1 = "aaabbbbccccc";
if ( $var1 =~ /a{3,}/ ) { print "$` , $& , $' \n"; } # , aaa , bbbbccccc
if ( $var1 =~ /b{2,5}/ ) { print "$` , $& , $' \n"; } # aaa , bbbb , ccccc
if ( $var1 =~ /c{5}/ ) { print "$` , $& , $' \n"; } # aaabbbb , ccccc ,
- Precedence
Regex precedence, from highest (binds most tightly) to lowest, is as follows:
- Parentheses # (…), (?:…), (?<LABEL>…)
- Quantifiers # a* a+ a? a{n,m}
- Anchors and Sequence # abc ^a a$
- Alternation # a|b|c
- Atoms # [abc] \d \1
# 1 - match "match"
$_ = "beforematchafter";
if ( /match/ ) {
    print "Matched: |$`<$&>$'| \n";
} else {
    print "No Match: |$_| \n";
}

# 2 - match a word ending with "a"
$_ = "wilma";
if ( /a\b/ ) {
    print "Matched: |$`<$&>$'| \n";
} else {
    print "No Match: |$_| \n";
}

# 3 - match a word ending with "a" and capture it into $1
$_ = "wilma";
if ( /(\w*a\b)/ ) {
    print "Matched: |$`<$&>$'| - '$1' \n";
} else {
    print "No Match: |$_| \n";
}
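One consequence of the precedence list: anchors and sequence bind more tightly than alternation, so /^fred|barney$/ means "starts with fred" or "ends with barney". The sketch below, with made-up sub names and test strings, compares that pattern with one that uses non-capturing parentheses so both anchors apply to the whole alternation.

```perl
use strict;
use warnings;

# Hypothetical helpers for this sketch.
# Alternation has low precedence: this reads as (^fred) or (barney$)
sub loose_match {
    my ($s) = @_;
    return $s =~ /^fred|barney$/ ? 1 : 0;
}

# Parentheses have the highest precedence: both anchors apply to either name
sub grouped_match {
    my ($s) = @_;
    return $s =~ /^(?:fred|barney)$/ ? 1 : 0;
}

for my $s ( "fred", "frederick", "barney", "say barney", "wilma" ) {
    printf "%-10s without parens=%d, with parens=%d\n",
        $s, loose_match($s), grouped_match($s);
}
```

Only "fred" and "barney" pass the grouped version, while "frederick" and "say barney" also pass the ungrouped one.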
Processing text with Regular Expressions (Regex)
Regex can be used to change text, not only to match it
- Substitutions using "s///"
# Basic Substitution
$_ = "foey foo foobar!"; # set the default input
print "$_ \n"; # foey foo foobar!
s/foo/bar/; # replace the first foo with bar
print "$_ \n"; # foey bar foobar!

# Complex Substitutions
$_ = "John is out bowling with Joe tonight.";
print "$_ \n"; # John is out bowling with Joe tonight.
s/with (\w+)/against $1's team/; # replace 'with Joe' with 'against Joe's team'
print "$_ \n"; # John is out bowling against Joe's team tonight.
- Global Replacements are done using “/g”
$_ = "John is out bowling with Joe tonight.";
print "$_ \n"; # John is out bowling with Joe tonight.
s/^/Tonight, /; # Prepend 'Tonight, ' to the front of the line
print "$_ \n"; # Tonight, John is out bowling with Joe tonight.
s/tonight/Tomorrow/gi; # global, case-insensitive substitution
print "$_ \n"; # Tomorrow, John is out bowling with Joe Tomorrow.
s/(John)/\U$1/; # case shifting: \U converts to upper case
print "$_ \n"; # Tomorrow, JOHN is out bowling with Joe Tomorrow.
s/(Joe)/\L$1/; # case shifting: \L converts to lower case
print "$_ \n"; # Tomorrow, JOHN is out bowling with joe Tomorrow.
- Split Operator
Split operator breaks up a string according to a pattern
Split drags the pattern through a string and returns a list of fields that were separated by the delimiter
# Split function
@result = split /:/, "abc:def:g:h";
# result now has ("abc","def","g","h"), 4 elements
@result = split /:/, ":::a:b:c:::";
# result now has ("","","","a","b","c"), 6 elements: leading empty fields are kept, trailing ones are dropped
- Join Function
Performs the opposite function of split, but does not use patterns: the first argument is a plain string that is placed between the remaining items
# Join function
$myjoin = join ":", 1, 2, 3, 4, 5;
print $myjoin . "\n"; # 1:2:3:4:5
$myjoin = join "foo", "bar";
print $myjoin . "\n"; # bar (only one item, so the glue is never used)
$myjoin = join "foo", "bar", "zoo";
print $myjoin . "\n"; # barfoozoo
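Since join undoes what split does, the two are often combined to edit one field of a delimited string. A small sketch, assuming a made-up colon-separated record:

```perl
use strict;
use warnings;

my $record = "fred:flintstone:bedrock";    # made-up sample record

my @fields = split /:/, $record;           # ("fred", "flintstone", "bedrock")
$fields[2] = uc $fields[2];                # change the third field
my $rebuilt = join ":", @fields;           # glue the fields back together

print "$rebuilt\n";                        # fred:flintstone:BEDROCK
```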