Using Perl Regular Expressions to Replace substr Calls
Suppose we want to format digits with commas after the 2nd and 5th digits.
I.e. convert 12345678 to 12,345,678.
Using substr, Perl's substring function, this is achieved with:
$old = '12345678';
$new = substr( $old, 0, 2 ) . ',' . substr( $old, 2, 3 ) . ',' . substr( $old, 5, 3 ); # result: 12,345,678
It works and it's fine, but Perl::Critic will complain about magic numbers among the substr arguments 0, 2, 3 and 5. You could go ahead and define them as constants, but that's cumbersome for trivial substr parameters.
Using a regular expression provides another option.
$old = '12345678';
$old =~ m/(\d{2})(\d{3})(\d{3})/;
$new = $1 . ',' . $2 . ',' . $3; # result: 12,345,678
It's easily readable to anyone looking at your code and, best of all, Perl::Critic won't complain about magic numbers.
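The capture variables only hold meaningful values when the match succeeds, so it can be worth guarding the match. Here is a minimal sketch, assuming the input should be exactly eight digits; the add_commas name and the \A and \z anchors are illustrative additions, not part of the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the formatted string, or undef when the
# input is not exactly eight digits.
sub add_commas {
    my ($digits) = @_;

    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, which makes the test false.
    if ( my @groups = $digits =~ m/\A(\d{2})(\d{3})(\d{3})\z/ ) {
        return join ',', @groups;
    }
    return;
}

print add_commas('12345678'), "\n";    # prints 12,345,678
```

Joining the captured list also avoids repeating $1, $2 and $3 by hand.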
-
Perl Regular Expressions - Extracting substrings between two characters
Suppose we need to extract a program name from a URL.
I.e. '//www.domain.com/program_name'
Simple enough with,
my $url = '//www.domain.com/program_name';
my ($domain, $program) = $url =~ m/(.*\/)(.*)/;
(.*\/) tells Perl to grab everything up to and including the furthest right '/'. The '/' is escaped as '\/' because '/' is also the delimiter.
(.*) grabs everything else after.
The '/' on either side of the regular expression defines the delimiter.
Perl puts whatever each () matches into variables for you: $1 for the first set of (), $2 for the second, and so on. In the example above the match is evaluated in list context, so the captures are assigned straight to our own variables instead.
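The two ways of reading captures can be compared side by side. This is a small sketch, assuming a URL without a query string; the variable names ending in _a and _b are made up for the comparison.

```perl
use strict;
use warnings;

my $url = '//www.domain.com/program_name';

# Style 1: numbered capture variables, read only after a successful match.
my ( $domain_a, $program_a );
if ( $url =~ m/(.*\/)(.*)/ ) {
    ( $domain_a, $program_a ) = ( $1, $2 );
}

# Style 2: list assignment, where the match itself returns the captures.
my ( $domain_b, $program_b ) = $url =~ m/(.*\/)(.*)/;

print "$domain_a | $program_a\n";    # prints //www.domain.com/ | program_name
print "$domain_b | $program_b\n";    # prints the same line
```

Both styles yield the same values; the list form simply skips the intermediate $1 and $2.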
Taking It Further
Now what if we want to cater for parameters or other variables in our URL? We want to extract everything from the furthest right '/' to the end of the string or the first '?' encountered.
I.e. '//www.domain.com/program_name' or '//www.domain.com/program_name?extra=junk'
This does the trick,
my $url = '//www.domain.com/program_name?extra=junk';
my ($domain, $program) = $url =~ m{(.*/)([^?]*)};
([^?]*) tells perl to grab everything that is not a '?'.
The [] represents a character class: a set of characters, any one of which can match at that position.
The ^ at the start of the class negates it (everything that isn't a '?').
As the '?' is inside a character class, it does not need to be escaped.
Pleasing The Critics
Perl::Critic complains about the regular expressions above, for a few reasons.
Prohibited Escaped Characters - It doesn't like how we've used '\/', because our delimiter is a '/'. The fix is to use a different delimiter like '{}'.
Missing /x (Extended Format) Flag - Adding this flag allows for comments and extra whitespace in the regular expression, to make it easier to read.
Missing /m (Line Boundary Matching) Flag - Adding this flag makes '^' and '$' match at the start and end of each line, not just of the whole string, which is what most would expect.
Missing /s (Dot Anything Matching) Flag - Adding this flag makes '.' match anything, instead of anything but a '\n'.
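Putting those fixes together might look like the sketch below, which uses '{}' delimiters and all three flags. The program_name helper is a hypothetical name, and the sketch assumes the query string itself contains no '/'.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the last '/' and before any '?'.
sub program_name {
    my ($url) = @_;

    # With /x, whitespace and comments inside the pattern are ignored.
    my ( $domain, $program ) = $url =~ m{
        (.*/)      # everything up to and including the last '/'
        ([^?]*)    # then everything that is not a '?'
    }xms;

    return $program;
}

print program_name('//www.domain.com/program_name'), "\n";               # prints program_name
print program_name('//www.domain.com/program_name?extra=junk'), "\n";    # prints program_name
```

The /m and /s flags do not change the result for these single-line inputs; they are there so the pattern behaves predictably if it is later given multi-line text.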