Matching with Regular Expressions (Regex)
- Modifiers
/i # case insensitive
/s # let . match any character, including newline
/x # ignore whitespace and allow comments inside the pattern. Example, to match floating point numbers:
/
-? # optional minus sign
\d+ # one or more digits
\.? # optional decimal point
\d* # zero or more digits after the decimal point
/x # end of pattern, followed by the x modifier
if ( /word/i ) # ignore case
$_ = "word here \n and there \n end";
if ( /here.*there/s) ... # this would pass
if ( /barney.*fred/is) { print "String Fred is after Barney \n"; }
# the above can be rewritten as:
if ( m{
barney # comments here
.* # more comments
fred # comments
}six) # modifiers s, i and x; no space is allowed between the closing brace and the modifiers
{ print "String Fred is after Barney \n"; } # Notice braces replace the slashes
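The floating point pattern above can be tried out as a runnable sketch. The sample inputs and the ^ and $ anchors are assumptions added here for illustration, so that the whole string has to be a number:

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ("3.14", "-42", "-0.5", "abc", "7.");

my @floats;
for my $text (@samples) {
    # ^ and $ are added so the entire string must look like a number.
    if ($text =~ /
            ^
            -?      # optional minus sign
            \d+     # one or more digits
            \.?     # optional decimal point
            \d*     # zero or more digits after the decimal point
            $
        /x) {
        push @floats, $text;
    }
}
print "Numbers: @floats\n";   # Numbers: 3.14 -42 -0.5 7.
```

Without /x the spaces and comments would be treated as part of the pattern, so the same layout would not match anything useful.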
- Anchors
^ # at beginning of string
$ # at end of string
\b # word boundary; put one on each side to match whole words only. Example:
/\bfred\b/ # will only match fred, and not Frederick, Alfred, or Manfred
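A short sketch of the difference between anchoring on word boundaries and anchoring on the whole string. The list of names is a made-up test set, not part of the notes:

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration.
my @names = ("fred", "Alfred", "frederick", "big fred here");

my (@whole_word, @whole_string);
for my $name (@names) {
    push @whole_word,   $name if $name =~ /\bfred\b/;   # fred as a separate word
    push @whole_string, $name if $name =~ /^fred$/;     # nothing but fred
}
print "whole word:   ", join(", ", @whole_word), "\n";    # fred, big fred here
print "whole string: ", join(", ", @whole_string), "\n";  # fred
```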
- Binding Operator =~
$_ is the default target of a pattern match
To match against any other variable, use the binding operator: $var =~ /pattern/
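A minimal sketch of the two forms side by side. The variable name $greeting and both strings are assumptions for illustration:

```perl
use strict;
use warnings;

my $greeting = "hello world";   # hypothetical variable to match against
local $_ = "goodbye";           # default target, used when there is no =~

my $bound   = ($greeting =~ /world/) ? "yes" : "no";   # tests $greeting
my $default = (/world/)              ? "yes" : "no";   # tests $_
print "bound=$bound default=$default\n";   # bound=yes default=no
```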
- Persistence of Memory
An unsuccessful match leaves the previous memories intact, but a successful one resets them all
$var1 =~ /(\w+)/; # $1 contains match
$var2 =~ /(\w+)/; # $1 contains new match
if ($var2 =~ /(\w+)/) { print "var2 is $1 \n"; }
else { print "Var2 did not match \n"; }
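The rule can be checked with a small sketch like the one below (the three strings are made up). It shows why $1 should only be trusted inside a test such as the if above:

```perl
use strict;
use warnings;

# Hypothetical inputs: the second one contains no word characters.
my ($var1, $var2, $var3) = ("first", "!!!", "second");

my @seen;
$var1 =~ /(\w+)/;            # succeeds: $1 is "first"
push @seen, $1;
$var2 =~ /(\w+)/;            # fails: $1 still holds "first"
push @seen, $1;
if ($var3 =~ /(\w+)/) {      # succeeds: $1 is replaced
    push @seen, $1;
}
print "@seen\n";   # first first second
```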
- Non-capturing Parentheses
To group with parentheses without capturing into memory, start the group with “?:”, as in (?:...)
#$_ = "brontosaurus steak"; # 1
$_ = "bonosaurus steak"; # 2
#$_ = "brontosaurus burger"; # 3
if(/(bronto)?saurus (steak|burger)/) {
print "$2\n"; # matches 1,2,3
}
if(/(?:bronto)?saurus (steak|burger)/){
print "$1\n"; # matches 1,2,3 - same as above, but match is now in $1
}
- Named Captured Parentheses
Another way to control what goes into memory for pattern match capturing is by using capture names
Syntax is: (?<capture_name>...), and the captured text is read back with $+{capture_name}
Only available in Perl version 5.10 and higher
Example:
use 5.010;
if(m/(?<name1>\w+) (?<name2>steak|burger)/){ # named captures require Perl 5.10
print "$+{name1} - $+{name2}\n";
}
- Automatic Match Variables
The part of the string that actually matched the pattern is automatically stored in: $&
Whatever was before the match is in: $`
Whatever was after the match is in: $'
$_ = "a aa a bbb ccc dd e ee e";
if (/(\w+) (ccc) (\w+)/i) {
print "$1 - $2 - $3\n"; # bbb - ccc - dd
print "$& - $` - $'\n"; # bbb ccc dd - a aa a - e ee e
}
- General Quantifiers
A quantifier in a pattern repeats the preceding item a certain number of times
- * (zero or more), + (one or more), ? (optional: zero or one)
Other quantifiers set specific repeat counts, using braces
- X{3,} # match 3 or more times
- X{2,5} # match 2 to 5 times (no space after the comma)
- X{5} # match exactly 5 times
$var1 = "aaabbbbccccc";
if ( $var1 =~ /a{3,}/ ) { print "$` , $& , $' \n"; } # , aaa , bbbbccccc
if ( $var1 =~ /b{2,5}/) { print "$` , $& , $' \n"; } # aaa , bbbb , ccccc
if ( $var1 =~ /c{5}/ ) { print "$` , $& , $' \n"; } # aaabbbb , ccccc ,
- Precedence
Regex precedence, from highest to lowest:
- Parentheses # (…), (?:…), (?<LABEL>…)
- Quantifiers # a* a+ a? a{n,m}
- Anchors and Sequence # abc ^a a$
- Alternation # a|b|c
- Atoms # [abc] \d \1
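A rough sketch of how this ordering changes what a pattern means. The input strings are hypothetical; the point is that alternation binds more loosely than anchors and sequence, and that a quantifier applies only to the single preceding item unless parentheses group more:

```perl
use strict;
use warnings;

my $text = "fred and wilma";   # hypothetical input

# Without parentheses this reads as (^fred) or (barney$).
my $loose = ($text =~ /^fred|barney$/) ? 1 : 0;

# With parentheses the whole string must be fred or barney.
my $tight = ($text =~ /^(fred|barney)$/) ? 1 : 0;

# Quantifier on one item versus on a group.
my ($ungrouped) = "abbb ababab" =~ /(ab+)/;         # a, then one or more b
my ($grouped)   = "abbb ababab" =~ /((?:ab){2,})/;  # ab repeated 2 or more times

print "loose=$loose tight=$tight ungrouped=$ungrouped grouped=$grouped\n";
# loose=1 tight=0 ungrouped=abbb grouped=ababab
```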
# 1 - match "match"
$_ = "beforematchafter";
if ( /match/) {
print "Matched: |$`<$&>$'| \n";
} else {
print "No Match: |$_| \n";
}
# 2 - match a word ending in "a"
$_ = "wilma";
if ( /a\b/ ) {
print "Matched: |$`<$&>$'| \n";
} else {
print "No Match: |$_| \n";
}
# 3 - match a word ending in "a" and capture it into $1
$_ = "wilma";
if ( /(\w*a\b)/ ) {
print "Matched: |$`<$&>$'| - '$1' \n";
} else {
print "No Match: |$_| \n";
}
Processing text with Regular Expressions (Regex)
Regex can be used to change text, not only to match it
- Substitutions using “s///”
# Basic Substitution
$_ = "foey foo foobar!"; # set the default input
print "$_ \n"; # foey foo foobar!
s/foo/bar/; # replace the first foo with bar
print "$_ \n"; # foey bar foobar!
# Complex Substitutions
$_ = "John is out bowling with Joe tonight.";
print "$_ \n"; # John is out bowling with Joe tonight.
s/with (\w+)/against $1's team/; # replace 'with Joe' with 'against Joe's team'
print "$_ \n"; # John is out bowling against Joe's team tonight.
- Global Replacements are done using “/g”
$_ = "John is out bowling with Joe tonight.";
print "$_ \n"; # John is out bowling with Joe tonight.
s/^/Tonight, /; # prepend 'Tonight, ' to the front of the line
print "$_ \n"; # Tonight, John is out bowling with Joe tonight.
s/tonight/Tomorrow/gi; # global, case-insensitive substitution
print "$_ \n"; # Tomorrow, John is out bowling with Joe Tomorrow.
s/(John)/\U$1/; # case shifting: \U uppercases what follows
print "$_ \n"; # Tomorrow, JOHN is out bowling with Joe Tomorrow.
s/(Joe)/\L$1/; # case shifting: \L lowercases what follows
print "$_ \n"; # Tomorrow, JOHN is out bowling with joe Tomorrow.
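A small sketch contrasting a single substitution with a global one. The input string is hypothetical; the count relies on s/// returning the number of substitutions it made, which goes slightly beyond what the notes above state:

```perl
use strict;
use warnings;

my $text = "Tonight we bowl, and tonight we win.";   # hypothetical input

my $once = $text;
$once =~ s/tonight/tomorrow/i;                 # only the first match is replaced

my $all   = $text;
my $count = ($all =~ s/tonight/tomorrow/gi);   # every match; returns how many

print "$once\n";                # tomorrow we bowl, and tonight we win.
print "$all\n";                 # tomorrow we bowl, and tomorrow we win.
print "$count replacements\n";  # 2 replacements
```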
- Split Operator
The split operator breaks up a string according to a pattern
Split drags the pattern through a string and returns a list of fields that were separated by the delimiter
# Split function
@result = split /:/, "abc:def:g:h"; # result now has ("abc","def","g","h"), 4 total elements
@result = split /:/, ":::a:b:c:::"; # result now has ("","","","a","b","c"), 6 elements
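The handling of empty fields can be seen in a sketch like this one. The optional third argument (a limit) is not covered in the notes above; a negative limit is used here only to show the trailing empty fields that split drops by default:

```perl
use strict;
use warnings;

my $line = ":::a:b:c:::";

my @default  = split /:/, $line;                  # trailing empty fields dropped
my @keep_all = split /:/, $line, -1;              # negative limit keeps them
my @words    = split /\s+/, "one  two\tthree";    # split on runs of whitespace

print scalar(@default), " ", scalar(@keep_all), " ", scalar(@words), "\n";   # 6 9 3
```

Leading empty fields are always returned, which is why the default result still starts with three empty strings.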
- Join Function
Performs the opposite function of split, but doesn’t use patterns: the first argument is a plain string used as glue between the items
# Join function
$myjoin = join ":", 1, 2, 3, 4, 5;
print $myjoin . "\n"; # 1:2:3:4:5
$myjoin = join "foo", "bar";
print $myjoin . "\n"; # bar
$myjoin = join "foo", "bar", "zoo";
print $myjoin . "\n"; # barfoozoo