
Brief Summary of Perl

Perl acts like an interpreter. You type the source into a file and then tell the Perl program (usually called perl or perl.exe) to execute it. Actually, Perl compiles the program before it runs it, but the user doesn't really notice this. The only difference from a true interpreter is that syntax errors anywhere in the program cause an error before execution starts.

Perl generally treats all white space, including new lines, equally. Blocks of code are enclosed between { and }, and statements usually end with a semicolon (;).

Comments are marked with a # symbol; anything following that symbol on a line is ignored.

Data Types and Variables

Variables do not need to be declared; they can simply be used. Variables that have not been assigned a value evaluate to the special reserved value, undef.

A variable that holds one value is known as a scalar. The scalars used in this book are all numbers or strings. Scalar variable names start with a $. Perl distinguishes between strings and numbers in literals, so you can assign one or the other to a scalar variable:


$x = "Hello";


or


$y = 5;


Numbers can be treated as if they were all floating-point numbers; Perl automatically converts between integers and floating-point numbers as needed. In addition, Perl converts between strings and numbers, depending on which operator is being used. For example, the + operator is defined to be a numeric addition of two scalars, so the following statement


$x = 5 + "7";


sets $x to 12. The string "7" is automatically converted to the number 7. Similarly, the operator . concatenates two strings together, so the following statement


$x = 5 . "7";


sets $x to "57".

In practice, this means that you can almost always treat a number and its string representation as the same. When converting a string to a number, only the characters up to the first non-numeric one are evaluated, so the string "123ABC" evaluates as the number 123. A variable that is undef does the "right thing" when used in an expression, evaluating to the number 0 or an empty string as appropriate.
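
The conversion rules above can be checked with a short sketch. The variable names are illustrative only, and warnings are deliberately left off, because with warnings enabled Perl reports both the partly numeric string and the use of an undefined value.

```perl
use strict;   # warnings are intentionally not enabled in this sketch

my $sum    = "123ABC" + 1;    # only the leading digits count: 123 + 1
my $joined = 5 . "7";         # both operands are treated as strings
my $nothing;                  # never assigned, so it is undef
my $zero   = $nothing + 10;   # undef acts as 0 in a numeric expression
my $empty  = "[" . $nothing . "]";   # undef acts as "" in a string expression

print "$sum $joined $zero $empty\n";   # prints "124 57 10 []"
```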

Assignment is done with the = sign, as previously shown, and the assignment returns a value, so you can use an assignment anywhere you would use a variable:


$x = ($y = 4);    # $x will be 4 also


Perl uses +, -, *, and / for the basic mathematical operations. % is modulo (numbers are truncated to integers first) and ** is the exponentiation operator, for example 2**16. ++ and -- work as they do in C/C++ and Java. Perl supports binary assignment operators such as +=, -=, %=, and even .= and **=:


$x += 5;

$mystring .= "(s)";


Strings

Strings can be quoted with either single or double quotes. Within single quotes, the only special characters in a string are the backslash and the single quote. \\ represents a single backslash, and \' represents a single quote, as shown in these examples:


$directory = 'windows\\system32';

$answer = 'I don\'t agree';


Within double quotes, you can specify backslash escapes such as \n and \t. Perl also does variable interpolation within double-quoted strings, which means that scalar variables (and some others) are replaced with their values:


$prompt = "Enter your name\n";

$name = "Sally";

$reply = "Hello, $name!";


The length() function returns the length of a string. substr() returns a part of a string, and index() finds the index of a match within a string. String positions are zero-based, so the first character is at position 0. You can use negative numbers to count backward from the end, with index -1 as the last character in a string. With these built-in functions, as with most functions in Perl, the parentheses around the arguments are optional, as long as the fact that it is a function call is unambiguous (nonetheless, I will continue to append () to indicate functions):


$j = length $somestring;

$secondfivechars = substr ($x, 5, 5);

$lastchar = substr $inputline, -1;

$firstspace = index($text, " ");


Function parameters can be optional; if the third parameter to substr() is not provided, the rest of the string is returned. index() can also take a third parameter, which is the offset within the string to start searching.

You can modify a string by assigning to a substring, even if the new string is not the same length as the substring:


substr($currency, index($currency, '$'), 1) = "<dollar>";   # single quotes keep $ literal
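
As a rough illustration of the string functions described in this section, the following sketch applies them to one sample string. The string and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "Perl makes text easy";

my $len        = length $text;          # 20 characters
my $first_word = substr($text, 0, 4);   # "Perl"
my $last_char  = substr $text, -1;      # "y"
my $space      = index($text, " ");     # 4, the first space
my $next_space = index($text, " ", $space + 1);   # 10, searching from offset 5

# Assigning to a substring replaces it, even with a longer string.
substr($text, 0, 4) = "Python";

print "$text\n";   # prints "Python makes text easy"
```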


Lists

Perl uses the term list to describe an ordered collection of scalars. An array is a variable that contains a list, so the terms array and list are often thought of as being the same.

A list is specified using comma-separated scalars within parentheses, which can then be assigned to an array variable:


@mylist = (1, 2, 3);


An entire array is referenced by preceding the name with @. If a list is included in another list, the elements in the list are included, not the list itself, so you don't get the "list member that is itself a list":


@listA = ('A', 'B', 'C');

@listB = (@listA, 'D', 'E');


@listB is now a list with five elements: ('A', 'B', 'C', 'D', 'E').

The range operator (..) can be used as a shortcut for a list that is a sequence of numbers (or letters):


@numbers1to5 = (1..5);


Access to elements in an array uses zero-based indexing, and supports negative numbers to indicate counting back from the end. When indexing into an array, the array name is preceded with $, not @, except for certain circumstances that this book doesn't cover:


$firstel = $mylist[0];

$lastel = $mylist[-1];


The index of the last element in an array can be retrieved with $#arrayname. This is one less than the size because arrays are zero-based. You can assign to the element just past it to extend the array, or assign to $#arrayname itself to chop the end off the array:


$array[$#array+1] = $value;    # extend array by one

$#myarray = 3;   # drop all elements after the fourth one


Consistent with being one less than the size of the array, $#arrayname will be -1 for an empty array.

You can assign to an element of an array past the current size. The array is extended and any intervening elements return the value undef if accessed (as will any elements past the new end of the array):


$array[0] = "A";

$array[1] = "B";

$array[25] = "Z";  # array[2] through array[24] are undef

$k = $array[26];   # this is undef also


You can also assign to a list containing variable names, which lines up the values as you would expect. Putting undef at a position in the list discards the corresponding value:


@numarray = (1..5);

($a, undef, undef, $b, undef) = @numarray;


This code sets $a to 1 and $b to 4.

You can "slice" a list by specifying a list of indices into a list or array to produce a smaller list or array. For example:


($a, $b) = @arr[2, 3];   # $a = $arr[2], $b = $arr[3]

print @arr[0..4];        # print first five elements


The range is evaluated before the slice. Because the range [0..-1] is empty, the slice @arr[0..-1] is also empty (despite the temptation to think that the -1 would refer to the last element in the list and that slice would therefore contain every element in the list).
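
The following sketch, using a made-up five-element array, shows the empty 0..-1 slice next to the usual way of naming every element, 0..$#arr. The scalar() built-in is used here only to print the element counts.

```perl
use strict;
use warnings;

my @arr = ('a' .. 'e');              # five elements, indices 0 through 4

my $last_index = $#arr;              # 4, one less than the size
my @middle     = @arr[1, 2];         # ('b', 'c')
my @empty      = @arr[0 .. -1];      # the range 0..-1 is empty, so is the slice
my @everything = @arr[0 .. $#arr];   # every element

$#arr = 2;                           # drop all elements after the third

print scalar(@empty), " ", scalar(@everything), " ", scalar(@arr), "\n";
# prints "0 5 3"
```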

The shift() function takes an element off the beginning of a list (index 0) and pop() takes an element off the end (index -1). unshift() and push() place an element or list of elements back on the list at the beginning or end:


$next = shift @mylist;

push (@mylist, $newelement);

$last = pop (@biglist);

unshift @numberlist, (1, 2, 3);
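
A small sketch of the four functions working on one list may help. The list contents are arbitrary.

```perl
use strict;
use warnings;

my @queue = (2, 3, 4);

unshift @queue, 0, 1;        # (0, 1, 2, 3, 4)
push @queue, 5;              # (0, 1, 2, 3, 4, 5)
my $first = shift @queue;    # 0; @queue is now (1, 2, 3, 4, 5)
my $last  = pop @queue;      # 5; @queue is now (1, 2, 3, 4)

print "$first $last @queue\n";   # prints "0 5 1 2 3 4"
```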


Hashes

A Perl hash is similar to a list, except it is indexed using strings known as keys. The entries are also unordered. To access an element of a hash, the key is surrounded by curly braces:


$iphash{"router"} = "192.0.0.1";


There can be only one value for a given key. It is replaced if a new value is assigned.

The entire contents of a hash are referred to by preceding the name with %. The functions keys() and values() return lists of the keys and values of a hash:


@machinenames = keys %iphash;

@ipaddrs = values %iphash;


Although the hash is unordered, the elements in the lists returned by keys() and values() line up as long as the hash is not modified in between.
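
The sketch below illustrates both points: a second assignment to the same key replaces the value, and the lists from keys() and values() can be walked in parallel. The machine names and addresses are hypothetical.

```perl
use strict;
use warnings;

my %iphash;
$iphash{"router"}  = "192.0.2.1";
$iphash{"printer"} = "192.0.2.20";
$iphash{"router"}  = "192.0.2.254";   # replaces the earlier value

my @names = keys %iphash;
my @addrs = values %iphash;

# The order is arbitrary, but $names[$i] and $addrs[$i] belong together.
foreach my $i (0 .. $#names) {
    print "$names[$i] => $addrs[$i]\n";
}
```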

Conditionals

Conditionals can be tested with the if statement, which is followed by a block of code inside curly braces (which are required even if the block has only one line of code):


if ($i == 5) {

    $i = 0;

}


A scalar that is equal to undef evaluates to false, as will an empty string and the number 0. To preserve the rule that a number and its string representation can be treated as equivalent, the string "0" also evaluates to false. Everything else evaluates to true. No specific boolean type exists.
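
These truth rules can be tried out with a short sketch. The helper subroutine truth() is hypothetical and exists only to label each value.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl treats a value in a condition.
sub truth {
    my ($value) = @_;
    if ($value) {
        return "true";
    }
    return "false";
}

my @results = (
    truth(undef),   # false
    truth(""),      # false
    truth(0),       # false
    truth("0"),     # false
    truth("0.0"),   # true: a non-empty string other than "0"
    truth("00"),    # true
    truth(" "),     # true
);

print "@results\n";   # prints "false false false false true true true"
```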

If a string has a number in it, Perl does not know whether to compare it as a number or a string. Therefore, there are two complete sets of comparison operators. Numeric comparisons are done using ==, !=, <, >, <=, and >=, and string comparisons are done using eq, ne, lt, gt, le, and ge. Thus, with the following assignments


$a = "5";

$b = "10";


($a < $b) is true, but ($a lt $b) is false.
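
The same comparison written out as a runnable sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my ($five, $ten) = ("5", "10");

my $numeric_less = ($five <  $ten) ? "yes" : "no";   # 5 is less than 10
my $string_less  = ($five lt $ten) ? "yes" : "no";   # "5" sorts after "1"

print "numeric: $numeric_less, string: $string_less\n";
# prints "numeric: yes, string: no"
```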

Perl uses || and && for logical or and logical and. It guarantees that in the following expression


if ((expr1) || (expr2)) {


expr2 is only evaluated if expr1 is false. (Similarly, in the case of ((expr1) && (expr2)), expr2 is only evaluated if expr1 is true.)

Perl also supports the words or and and. The difference is that or and and have lower precedence than || and &&. In particular, the = assignment operator has higher precedence than or and and but lower precedence than || and &&. Therefore, a test such as the following


($j = myfunc() || $x)


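is parsed as $j = (myfunc() || $x), so $j receives the value of $x whenever myfunc() returns a false value, which is probably not what you expect. By contrast,


($j = myfunc() or $x)


is parsed as ($j = myfunc()) or $x, so $j always receives the return value of myfunc().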
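
The difference in precedence can be observed directly. In this sketch, myfunc() is a stand-in that always returns a false value, and $x is an arbitrary fallback string.

```perl
use strict;
use warnings;

# Stand-in for the function in the text: always returns a false value.
sub myfunc { return 0; }

my $x = "fallback";
my ($j, $k);

if ($j = myfunc() || $x) {   # parsed as $j = (myfunc() || $x)
    print "j is '$j'\n";     # prints "j is 'fallback'"
}

if ($k = myfunc() or $x) {   # parsed as ($k = myfunc()) or $x
    print "k is '$k'\n";     # prints "k is '0'"
}
```

In both cases the test succeeds, but only the second form leaves the return value of myfunc() in the variable.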

Perl supports else and elsif (note the spelling) blocks after if statements:


if ($command eq "sort") {

    do_sort();

} elsif ($command eq "print") {

    do_print();

} else {

    invalid_command();

}


Perl also supports unless, which is like if except that the sense is reversed: the unless block executes if the condition is false:


unless (defined($name)) {

    $name = "default";

}


(defined() is a built-in function that returns false if the argument is undef.) An unless statement can have elsif and else clauses, but the meaning is not reversed for those:


unless ($age < 18) {

    print ("can drive and vote\n");

} elsif ($age >= 16) {

    print ("can drive but not vote\n");

} else {

    print ("cannot drive or vote\n");

}


Loops

Perl has several ways to loop. The while loop works as it does in many other languages:


while ($k < 100) {

    $k = $k + 1;

}


There are also until loops, which execute as long as their test is false (while and until are related the same way as if and unless), and also do/while and do/until loops.

Perl has for loops that look the same as C and Java:


for ($j = 0; $j < 10; $j++) {

    print $j;

}


Perl also has foreach loops that loop through a list:


foreach $counter (0..9) {

    print $counter;

}



foreach (@mylist) {

    print $_;

}


The second example shows the Perl default variable $_. If a foreach loop does not name its loop control variable, each element is placed in $_ in turn. Perl uses the $_ default in other places, too. For example, by default, the print() function takes $_ as its parameter. Thus, the body of the second loop could simply have been print;.

Perl allows if, unless, while, until, and foreach to be written as modifiers to expressions, which can sometimes be easier to read:


$x += 1 unless $x > 100;

print $_ foreach (1..10);


This is just a reordering of the traditional way of writing the code. In particular, the conditional is still evaluated before the code is executed, even though it is to the right of it. With foreach written as a modifier, the control variable can't be named; it is always $_.

Inside a loop, the last statement exits the loop (which is similar to break in some other languages), the next statement moves to the next iteration of the loop (which is similar to continue in some other languages), and the redo statement restarts the current iteration without changing the control variable.
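
A brief sketch of next and last in a foreach loop, written with statement modifiers (the numbers are arbitrary):

```perl
use strict;
use warnings;

my @seen;
foreach my $n (1 .. 10) {
    next if $n % 2;      # skip odd numbers
    last if $n > 6;      # leave the loop once past 6
    push @seen, $n;
}

print "@seen\n";   # prints "2 4 6"
```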

Subroutines

Perl user-defined functions (called subroutines) are declared using sub. The parameters to the subroutine are passed in the @_ array:


sub addtwo {

    return $_[0] + $_[1];

}


A return statement at the end of a subroutine is actually optional. If it is missing, the subroutine returns the value of the last expression calculated, or undef if no expressions were calculated.

Variables local to a function can be declared with the my operator, so the previous function could be written as follows:


sub addtwo {

    my ($a, $b);    # declare them local

    ($a, $b) = @_;  # list assignment

    return $a + $b;

}


You don't have to put my on a separate line. Instead, it can be applied the first time the variables are used:


my ($a, $b) = @_;


There is also a local operator, an older Perl operator that works somewhat like my. Instead of creating a truly local variable for the subroutine, it reuses the global variable of the same name (if one exists) and saves the global's current value, which is restored when the subroutine completes. The difference matters only if the subroutine calls another subroutine that accesses the global variable by name. (For reasons that are best left to Perl wizards to explain, you can't use my on a named file handle such as STDIN (see "File Handles," later in this chapter); you have to use local.)
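
The difference between my and local shows up only when another subroutine reads the global by name, as in this sketch. All names are hypothetical, and our is used to declare the global so that the example runs under use strict.

```perl
use strict;
use warnings;

our $level = "global";        # a global (package) variable

sub show { return $level; }   # reads the global by name

sub with_local {
    local $level = "local";   # temporarily replaces the global's value
    return show();            # show() sees "local"
}

sub with_my {
    my $level = "my";         # a new variable visible only in this block
    return show();            # show() still sees the global
}

my @results = (with_local(), with_my(), show());
print "@results\n";   # prints "local global global"
```

The last call to show() returns "global" again because the value saved by local is restored when with_local() finishes.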

Scalar Versus List Context

An important concept in Perl is scalar context versus list context. This refers to where an expression is used. For example, when assigning to a scalar, the right side of the assignment statement is in scalar context. When assigning to a list, the right side of the assignment statement is in list context. The conditional expression of a while statement is in scalar context, but the expression controlling a foreach loop is in list context.

This matters because certain expressions, such as the name of an array, produce different values in list context versus scalar context. In scalar context, the name of an array returns the number of values, but in list context, it returns the entire array. Thus, you can say both of the following:


$arraysize = @myarray;  # scalar context - length of array


and


@arraycopy = @myarray;  # list context - entire array
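
The sketch below puts the same array in several contexts. The parentheses around $first are what turn that assignment into a list assignment. The array contents are arbitrary.

```perl
use strict;
use warnings;

my @myarray = ('x', 'y', 'z');

my $arraysize = @myarray;    # scalar context: 3
my ($first)   = @myarray;    # list context (note the parentheses): 'x'
my @arraycopy = @myarray;    # list context: all three elements

my $count = 0;
while (@myarray) {           # scalar context: true while elements remain
    shift @myarray;
    $count++;
}

print "$arraysize $first $count @arraycopy\n";   # prints "3 x 3 x y z"
```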


File Handles

Scalar versus list context also matters when you deal with file handles. The most commonly used file handle is STDIN, which is the standard input to the Perl program. File handles are accessed by enclosing the handle between < and > and assigning the result to a variable. In scalar context, a file handle returns the next line of a file, or undef when the end of a file is reached. In list context, it returns every line of the file. Thus, you can loop through standard input with either of the following code lines:


while (<STDIN>) {   # scalar context

    process($_);

}


or


foreach (<STDIN>) {   # list context

    process($_);

}


But, in the first case, only one line of the input is read into memory at a time. In the second, the entire input is read into a list, which is then stepped through.

When Perl reads a line from a file handle, it includes the newline character ("\n") at the end. Because it is common to want to remove this, Perl provides the built-in function chomp(), which removes the last character from a string if it is a newline. (Strictly, chomp() removes whatever string is stored in the input record separator variable $/, which is a newline by default.)
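
Here is a minimal sketch of reading in both contexts and then using chomp(). Two assumptions keep it self-contained: the input comes from a string opened as an in-memory file (available since Perl 5.8), and the handle is kept in a scalar variable rather than in a named handle such as STDIN.

```perl
use strict;
use warnings;

# Assumption for the sketch: read from a string instead of a real file.
my $data = "alpha\nbeta\ngamma\n";
open(my $in, '<', \$data) or die "cannot open in-memory file: $!";

my $first = <$in>;        # scalar context: one line, newline included
my @rest  = <$in>;        # list context: every remaining line

chomp $first;             # removes the trailing newline
chomp @rest;              # chomp also accepts a list

close $in;

print "$first, $rest[0], $rest[1]\n";   # prints "alpha, beta, gamma"
```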

Of special note is the diamond operator, which is called that because of its appearance: <>. The diamond operator is used for programs that specify a list of files as command-line parameters. It is a magic file handle that reads in turn from each file specified on the command line, or from standard input if no files were specified:


while (<>) {

    lookformatches($matchstring, $_);

}


The diamond operator follows the UNIX convention that a filename that is a single hyphen refers to the standard input stream. The diamond operator uses the @ARGV array (discussed under "Command-Line Parameters," later in this chapter) to determine which files to read, so you can tweak @ARGV as you like before invoking the diamond operator.

Regular Expressions

Perl includes built-in regular expression matching. The simplest form is an if that contains only the regular expression. This compares it to the value of $_:


while (<>) {

    if (/hello/) {

        ++$hellolinecount;

    }

}


In addition to matching literal strings, the regular expressions can include the following:

  • . Matches any character except newline.

  • \ Escapes the next character (so \. matches only a period).

  • () Groups parts of a regular expression.

  • * Matches the previous item zero or more times.

  • + Matches the previous item one or more times.

  • ? Makes an item optional. It can appear zero or one time.

  • {n} Matches the previous item exactly n times.

  • {n,m} Matches the previous item between n and m times.

  • | Between two items, it means to match either one.

  • [abcd] Matches any of the characters listed.

  • [a-z] Matches any character in a range.

  • [^abcd] Matches any character except the ones listed.

  • \d Matches any digit, same as [0-9].

  • \w Matches a word character, same as [a-zA-Z0-9_].

  • \s Matches white space, same as [\f\t\n\r ].

  • \D, \W, and \S Match any character except their lowercase equivalent.

  • ^ Matches the beginning of the string.

  • $ Matches the end of the string.

Thus, you can get sophisticated with your matching (the binding operator, =~, matches a string against a regular expression):


if ($phone =~ /\d{3}-\d{4}/) {

    print ("$phone contains a US phone number\n");

}

if ($number =~ /^([0-9a-fA-F]+)$/) {

    print ("$number is a valid hexadecimal number\n");

}

if ($inputline =~ /^#/) {

    print("comment line, ignored");

}


Beyond grouping, parentheses (()) around a part of the match string tell Perl to remember what part of the string matched that part of the regular expression. The escape \1 can be used later in the match string to refer to the first grouped match. So, the match string /(.)\1/ matches any character that appears twice in a row. Furthermore, the part of the string that matched is put in a special variable, $1. The same goes for \2 and $2, \3 and $3, and so on:


if ($word =~ /([aeiou])\1/) {

    print("$word has a repeated vowel: $1\n");

}
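
As a further sketch of captured groups, the first match below pulls three fields out of a date-like string, and the second uses \1 to find a doubled character. The sample strings are made up.

```perl
use strict;
use warnings;

my $stamp = "2024-03-15";

my ($year, $month, $day) = ("", "", "");
if ($stamp =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
    ($year, $month, $day) = ($1, $2, $3);   # one variable per group
}

my $word    = "bookkeeper";
my $doubled = "";
if ($word =~ /(.)\1/) {
    $doubled = $1;        # first character that appears twice in a row
}

print "$year $month $day $doubled\n";   # prints "2024 03 15 o"
```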


When a match is complete, the part of the string that matched the regular expression is stored in $&, the part before is stored in $`, and the part after is stored in $':


if ($text =~ /[\w\.]+\.(com|org|net)/) {

    print ("$`<a href=\"$&\">$&</a>$'");

}


Finally, the /i modifier after the match string makes the matching case-insensitive. Perl regular expressions allow even more escapes and modifiers, but this book doesn't use them.
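
The following sketch combines the /i modifier with the three match variables. The sample text is invented for the example.

```perl
use strict;
use warnings;

my $text = "Visit Example.COM today";

my ($before, $match, $after) = ("", "", "");
if ($text =~ /[\w.]+\.(com|org|net)/i) {    # /i ignores case
    ($before, $match, $after) = ($`, $&, $');
}

print "[$before] [$match] [$after]\n";
# prints "[Visit ] [Example.COM] [ today]"
```

Without /i the match would fail here, because the pattern spells the domain suffix in lowercase.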

Output

Printing in Perl is done with the print() function, which was shown in previous examples. More sophisticated printing can be done with printf(), which does formatting similar to the C printf(), and its relative sprintf(), which returns the formatted string rather than printing it:


printf "The date is %02d/%02d/%4d\n", $day, $month, $year;   # %02d pads with zeros

$time = sprintf "%02d:%02d:%02d", $hour, $minute, $second;
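
The choice of format matters for values below 10. This sketch compares a width of 2 padded with spaces (%2d) with the same width padded with zeros (%02d). The time values are arbitrary.

```perl
use strict;
use warnings;

my ($hour, $minute, $second) = (9, 5, 3);

my $padded_spaces = sprintf "%2d:%2d:%2d", $hour, $minute, $second;
my $padded_zeros  = sprintf "%02d:%02d:%02d", $hour, $minute, $second;

print "[$padded_spaces] [$padded_zeros]\n";
# prints "[ 9: 5: 3] [09:05:03]"
```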


Command-Line Parameters

When a program is invoked, the list @ARGV contains the command-line parameters that were passed to it:


$firstarg = shift @ARGV;


Unlike the argv[] array in C, which stores the name of the program in argv[0] and the first argument in argv[1], @ARGV holds the first argument to the program in its first element; the name of the program is stored in the variable $0 (that's the number 0, not the letter O).
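
To experiment without typing real arguments, a sketch can fill @ARGV itself. The program name myprog.pl and the arguments below are hypothetical.

```perl
use strict;
use warnings;

# Assumption for the sketch: simulate a command line, as if the program
# had been run as
#     perl myprog.pl -v input.txt
local @ARGV = ("-v", "input.txt");

my $firstarg  = shift @ARGV;     # "-v"
my $remaining = @ARGV;           # scalar context: 1 argument left

print "program: $0\n";           # the script name as it was invoked
print "first: $firstarg, remaining: $remaining\n";
# the second line prints "first: -v, remaining: 1"
```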
