jwigley.com

Looping through key/value pairs from a Perl scalar hash reference

If you’ve got a hash ( %hash ), it’s easy enough to loop through all the key/value pairs with the following:

foreach my $key ( keys %hash )
{
    print "key: $key, value: $hash{$key}\n";
}

We can take a reference to the hash with:

 my $href = \%hash;

A reference is a scalar value, so it is stored in a scalar variable and uses the $ sigil.

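To access the key/value pairs through a hash reference, the syntax differs slightly from the first example: the reference has to be dereferenced first. %{ $href } gives the whole hash and ${$href}{$key} a single value.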

foreach my $key ( keys %{ $href } )
{
    print "key: $key, value: ${$href}{$key}\n";
}
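
As a minimal sketch of the same loop, the arrow operator ( $href->{$key} ) is an equivalent and often easier to read way of getting at a single value. The sample data and the describe_pairs helper are made up for illustration and are not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my %hash = ( apple => 'red', banana => 'yellow' );
my $href = \%hash;

# Hypothetical helper: returns one "key: ..., value: ..." line per pair.
# Keys are sorted so the order is predictable.
sub describe_pairs {
    my ($ref) = @_;
    my @lines;
    foreach my $key ( sort keys %{$ref} ) {
        # $ref->{$key} is equivalent to ${$ref}{$key}
        push @lines, "key: $key, value: $ref->{$key}";
    }
    return @lines;
}

print "$_\n" for describe_pairs($href);
```

Both forms do the same thing, so the choice between ${$href}{$key} and $href->{$key} is a matter of style.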

perlreftut contains a more in-depth explanation of Perl references.

September 13, 2011 - 1 minute read - perl looping key value pairs hash reference scalar

Using Perl Regular Expressions to Replace substr Calls

Suppose we want to format an eight-digit number with commas after the 2nd and 5th digits,
i.e. convert 12345678 to 12,345,678.

Using substr, Perl's substring function, this is achieved with:

$old = '12345678';
$new = substr( $old, 0, 2 ) . ',' . substr( $old, 2, 3 ) . ',' . substr( $old, 5, 3 );
# result: 12,345,678

It works and it’s fine, but Perl Critic will complain about hard-coded "magic" numbers such as 3 and 5. You could go ahead and define them as constants, but that’s cumbersome for trivial substr parameters.

Using a regular expression provides another option.

$old = '12345678';
$old =~ m/(\d{2})(\d{3})(\d{3})/;
$new = $1 . ',' . $2 . ',' . $3;
# result: 12,345,678

It’s easily readable to anyone looking at your code and, best of all, Perl Critic no longer complains about magic numbers.
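
One caveat with the snippet above is that $1, $2 and $3 are only meaningful if the match succeeded. The following is a minimal sketch of a safer variant, assuming the input should be exactly eight digits. The format_digits name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits grouped as 2,3,3,
# or undef when the input is not exactly eight digits.
sub format_digits {
    my ($old) = @_;

    # In list context the match returns the three captured groups;
    # an empty list (failed match) makes the condition false.
    if ( my @groups = $old =~ m/\A(\d{2})(\d{3})(\d{3})\z/ ) {
        return join ',', @groups;
    }
    return;
}

print format_digits('12345678'), "\n";    # prints 12,345,678
```

The \A and \z anchors are an addition here: they make the pattern reject longer or shorter input instead of silently matching part of it.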

August 10, 2011 - 1 minute read - perl regex regular expressions characters substrings strings

Perl Regular Expressions - Extracting substrings between two characters

Suppose we need to extract a program name from a URL.
i.e. '//www.domain.com/program_name'

This is simple enough with:

my $url = '//www.domain.com/program_name';
my ($domain, $program) = $url =~ m/(.*\/)(.*)/;

(.*\/) tells Perl to grab everything up to and including the rightmost '/'. The backslash escapes the '/' so that it is not read as the closing delimiter.
(.*) grabs everything else after it.

The '/' characters on either side of the regular expression are its delimiters.

Perl puts whatever is matched inside each () into variables for you: $1 for the first set of (), $2 for the second, and so on. In the example above the match is evaluated in list context, so the captured values are assigned straight to our own variables instead.
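
To make the two styles concrete, here is a small sketch that extracts the same two pieces both ways. The function names are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Style 1: test the match, then read the numbered capture variables.
sub split_with_numbered_vars {
    my ($url) = @_;
    return unless $url =~ m/(.*\/)(.*)/;
    return ( $1, $2 );
}

# Style 2: a match in list context returns the captures directly.
sub split_with_list_assignment {
    my ($url) = @_;
    my ( $domain, $program ) = $url =~ m/(.*\/)(.*)/;
    return ( $domain, $program );
}

my ( $domain, $program ) = split_with_list_assignment('//www.domain.com/program_name');
print "domain: $domain, program: $program\n";
```

The list assignment is shorter, while the numbered variables are handy when the match is already being tested in an if statement.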

Taking It Further

Now what if we want to cater for parameters or other variables in our URL? We want to extract everything from the rightmost '/' to the end of the string or the first '?' encountered,
i.e. '//www.domain.com/program_name' or '//www.domain.com/program_name?extra=junk'

This does the trick,

my $url = '//www.domain.com/program_name?extra=junk';
my ($domain, $program) = $url =~ m{(.*/)([^?]*)};

([^?]*) tells Perl to grab as many characters as it can that are not a '?'.
The [] defines a character class, which matches a single character from the set listed inside it.
The ^ at the start of the class is a negation (any character that isn't a '?').
As the '?' is inside a character class, it does not need to be escaped.
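
As a quick check, the sketch below runs the pattern against both forms of the URL. The program_name function is a hypothetical wrapper added for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns the part after the rightmost '/',
# stopping at the first '?' if there is one.
sub program_name {
    my ($url) = @_;
    my ( $domain, $program ) = $url =~ m{(.*/)([^?]*)};
    return $program;
}

for my $url ( '//www.domain.com/program_name',
              '//www.domain.com/program_name?extra=junk' ) {
    print program_name($url), "\n";    # prints program_name both times
}
```

Note that (.*/) is greedy, so a '/' inside the query string (for example '?path=/tmp') would move the split point. Real-world URLs are usually better handled by a dedicated URL parsing module.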

Pleasing The Critics

Perl Critic complains about the regular expressions above, for a few reasons.

Prohibited Escaped Characters - It doesn't like the escaped '\/' in the first example, which is only needed because our delimiter is also a '/'. The fix is to use a different delimiter like '{}', so the '/' no longer needs escaping.

 m{(.*/)([^?]*)};

Missing /x (Extended Format) Flag - Adding this flag allows for comments and extra whitespace in the regular expression, to make it easier to read.

 m{(.*/)([^?]*)}x;

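Missing /m (Line Boundary Matching) Flag - Adding this flag makes '^' and '$' match at the start and end of every line in the string, rather than only at the boundaries of the whole string, which is what most people expect.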

 m{(.*/)([^?]*)}m;

Missing /s (Dot Anything Matching) Flag - Adding this flag makes '.' match any character, instead of any character except a newline ('\n').

 m{(.*/)([^?]*)}s;
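
The three flags can also be combined. The sketch below shows one way the pattern might look with all of them applied, using the whitespace and comments that /x allows. The $url_pattern variable and the parse_url function are hypothetical names.

```perl
use strict;
use warnings;

# The pattern from above, spread out and commented thanks to /x.
# Whitespace outside the character class is ignored.
my $url_pattern = qr{
    ( .* / )     # first capture: everything up to the rightmost slash
    ( [^?]* )    # second capture: the rest, up to a question mark or the end
}xms;

# Hypothetical helper: returns the two captured parts as a list.
sub parse_url {
    my ($url) = @_;
    return $url =~ $url_pattern;
}

my ( $domain, $program ) = parse_url('//www.domain.com/program_name?extra=junk');
print "domain: $domain, program: $program\n";
```

For this particular pattern /m and /s do not change what a single-line URL matches, since it contains no '^', '$' or newlines. They are there to keep Perl Critic quiet and to make the behaviour predictable if the pattern is edited later.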

Perl Regular Expression Resources

perlre
perlrequick
perlretut