Chapter 29

Taking Advantage of Perl



Perl is an interpreted language designed for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It is also a good language for many Web site system management tasks. This chapter shows you how to use it for a few handy CGI scripts, including Web site statistics. Web site statistics are perhaps more relevant on an Internet Web site than on an Intranet, but whether or not you decide to put the information in Chapter 28 to use and connect to the Internet, you will still need to automate countless server chores. If you take the time to learn Perl, you will find that it is an ideal employee for those tasks. And if you don't have time to learn Perl, you still win, because many Perl scripts that you can use as-is are available on the Internet.

For non-GUI systems programming, Perl fills the gap between low-level programming languages, such as C, and high-level languages such as AWK, SED, and the UNIX Shell. Although C is a very powerful language, it has a steep learning curve. Perl does not offer the speed of a compiled language (like C), but it does offer very good string-handling capabilities and it is easier to learn. Most people with experience in any of the languages just mentioned will find Perl to be an easy migration.

Note
Here's some good news for those who like Perl but hate to give up performance. Hip Communications Inc., in Vancouver, BC, has recently announced PerlIS.dll for Microsoft IIS. PerlIS is an ISAPI version of the Perl interpreter, so the Web server is able to process Perl scripts at a greatly increased speed compared to traditional CGI. You can download the software and documentation for PerlIS and Perl for Win32 at this URL:
http://www.perl.hip.com/
Hip Communications maintains current versions of Perl 5.001m for Windows NT and Windows 95 using Visual C++ 4.x. PerlIS should have no problems running on the Purveyor 1.2 Web server from Process Software, and probably other ISAPI-compliant HTTP servers for NT as well.

WWWusage, which is available for free download at http://rick.wzl.rwth-aachen.de/RickG/WWWusage/wwwusage.html, is an excellent example of a powerful CGI script written in Perl. WWWusage was written by Richard Graessler to perform HTTP statistics on Windows NT. Christopher Brown, with whom I co-authored Web Site Construction Kit for Windows NT and Web Site Construction Kit for Windows 95, gets credit for the original draft of this chapter. Chris ported WWWusage to Windows 95, although Rick's latest version of the program for Windows NT is discussed here.

Note
Although Perl is an acronym for Practical Extraction and Report Language, like the names of many other program languages formed from acronyms, it isn't always written in uppercase letters.

Obtaining Perl

Perl comes in two versions: Perl 4 and Perl 5. Version 5 is the new kid on the block, and it comes with object-oriented extensions. Perl was invented on UNIX by Larry Wall. Keep in mind that Perl has always had strong roots on UNIX platforms, and most of the Webmasters who use it and post public-domain Perl source code work on UNIX. Fortunately, some nice folks have ported Perl to Win32 and made it available by anonymous FTP.

Dick Hardt of Hip Communications led the porting of Perl 5 to Windows NT/95. You can obtain the latest version of it from ftp://ntperl.hip.com/ntperl/. This site contains the Visual C++ source code for Perl, binary files, and documentation.

Tip
If you plan to install Perl 5 on Windows 95, the batch files that come with Perl will not work. You will have to make three registry entries manually, as described in the file win95.txt that accompanies Perl 5.

You can retrieve Perl 4 for Windows NT at the FTP site of Intergraph. Point your Web browser or CuteFTP (see Chapter 7, "Running the Intranet Web Server") to this URL: ftp://ftp.intergraph.com/pub/win32/perl/. You may want to download the file ntperlb.zip (if you only want the compiled version) or ntperls.zip (if you want the Visual C++ 2.0 source code). Using Perl 4 on Windows 95 requires a patch (which you can find at Yahoo) developed by Bob Denny.

Learning Perl

This book is already covering a great deal of information, and I don't intend to deluge you with a complete course on the Perl language. What this chapter does do is give you a quick introduction to the Perl syntax and show you how to put some Perl scripts to work so you can jump right in. For more information about Perl, please see Teach Yourself Perl 5 in 21 Days, 2nd Edition by David Till, published by Sams.

Note
Much of the material in this section was generously contributed originally by Gary E. Major (Systems Analyst) at Seattle Pacific University. You can visit his site on the Web for the latest information: http://www.spu.edu/tech/basic-perl/

There's so much to cover and only one chapter in which to do it. I caution you that this material is not intended for those who are new to programming. In fact, I am going to take somewhat of a hit-and-run approach and present the material largely in reference format. The second half of the chapter contains a very useful sample application.

Symbols

Table 29.1 presents the most common symbols unique to Perl and their meaning.

Table 29.1. Common Perl symbols.

Symbol  Purpose
$       For scalar values.
@       For indexed arrays.
%       For hashed arrays (associative arrays).
*       For all types of that symbol name. These are sometimes used like pointers in Perl 4, but Perl 5 uses references.
<>      Used for inputting a record from a filehandle.
\       Takes a reference to something.
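
To see the symbols from Table 29.1 in context, here is a minimal sketch. The host name, port numbers, and hit counts are made up for illustration, and the <> input operator is left out so the example runs without waiting for input.

```perl
use strict;
use warnings;

# $ marks a scalar: a single string or number.
my $server = 'www.example.com';   # hypothetical host name

# @ marks an indexed array; one element is a scalar, hence $ports[0].
my @ports = (80, 8001);

# % marks a hash (associative array) keyed by strings.
my %hits = ('index.html' => 12, 'logo.gif' => 40);

# \ takes a reference; @{...} dereferences it again.
my $ports_ref = \@ports;

print "Server: $server\n";
print "First port: $ports[0]\n";
print "Hits on index.html: $hits{'index.html'}\n";
print "Ports seen through the reference: ", scalar(@{$ports_ref}), "\n";
```

Notice that the symbol in front of a name describes what the expression yields, which is why a single array element is written with $ rather than @.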

Script Components

This section lists the basic components of a Perl script. The first line of a Perl program is normally a special comment that tells the system where to find the Perl interpreter. For example:

#!/usr/local/bin/perl

This list shows the predefined data types:

Special Variables and Characters

Table 29.2 lists several predefined variables and reserved characters in Perl.

Table 29.2. Predefined Perl variables.

Variable  Purpose
$0        Contains the name of the script being executed.
$_        Default input and pattern search variable.
$/        Input record separator, newline by default.
@ARGV     Contains command-line arguments. $ARGV[0] is the first argument.
@INC      Contains the list of places to look for scripts to be evaluated by the do or require commands.
%INC      Contains entries for each file included by the do or require commands.
%ENV      Contains your environment settings. Changes made affect child processes.
STDIN     Default input stream.
STDOUT    Default output stream.
STDERR    Default error stream.
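
The following sketch exercises a few of the variables in Table 29.2. The environment variable REPORT_MONTH and the file names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# $0 holds the name of the running script.
print STDOUT "Running: $0\n";

# @ARGV holds the command-line arguments; scalar(@ARGV) counts them.
my $arg_count = scalar(@ARGV);
print "Argument count: $arg_count\n";

# %ENV holds the environment; changes are inherited by child processes.
$ENV{'REPORT_MONTH'} = '1995-12';   # hypothetical variable name

# $_ is the default variable for foreach and for pattern matches.
my @matches;
foreach ('access.log', 'error.log', 'readme.txt') {
    push @matches, $_ if /\.log$/;   # the match tests $_ implicitly
}
print STDERR "Log files: @matches\n";
```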

Arithmetic Operators

Table 29.3 lists the common mathematical operators.

Table 29.3. Perl mathematical operators.

Operator  Example   Meaning
+         $a + $b   Sum of $a and $b
-         $a - $b   Difference of $a and $b
*         $a * $b   Product of $a times $b
/         $a / $b   Quotient of $a divided by $b
%         $a % $b   Remainder of $a divided by $b
**        $a ** $b  $a to the power of $b

Assignment Operators

Perl supports a rich array of assignment operators for many purposes. If the list in Table 29.4 seems overwhelming, try to stick to the easy ones and learn about the others after you have more experience with Perl programming.

Table 29.4. Perl assignment operators.

Operator  Example           Meaning
=         $var = 5          Assign 5 to $var
++        $var++ or ++$var  Increment $var by 1 and assign to $var
--        $var-- or --$var  Decrement $var by 1 and assign to $var
+=        $var += 3         Increase $var by 3 and assign to $var
-=        $var -= 2         Decrease $var by 2 and assign to $var
.=        $str .= "ing"     Concatenate "ing" to $str and assign to $str
*=        $var *= 4         Multiply $var by 4 and assign to $var
/=        $var /= 2         Divide $var by 2 and assign to $var
**=       $var **= 2        Raise $var to the second power and assign to $var
%=        $var %= 2         Divide $var by 2 and assign remainder to $var
x=        $str x= 20        Repeat $str 20 times and assign to $str
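
As a quick illustration of Table 29.4, this sketch applies several of the assignment operators in sequence. The starting values are arbitrary, and each comment shows the value held after that statement.

```perl
use strict;
use warnings;

my $var = 5;      # 5
$var++;           # 6
$var += 3;        # 9
$var -= 2;        # 7
$var *= 4;        # 28
$var /= 2;        # 14
$var **= 2;       # 196
$var %= 10;       # 6, the remainder of 196 divided by 10

my $str = "log";
$str .= "ging";   # "logging"

my $rule = "-";
$rule x= 20;      # a line of twenty hyphens

print "$var $str\n$rule\n";
```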

Logical Operators

The logical operators in Perl (shown in Table 29.5) are useful in if statements, which are typical of nearly all programming languages.

Table 29.5. Perl logical operators.

Operator  Example    Meaning
&&        $a && $b   True if $a is true and $b is true
||        $a || $b   True if $a is true or if $b is true
!         ! $a       True if $a is not true
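
Here is a small sketch of the logical operators at work. The status code, byte count, and file name are hypothetical values of the kind you might pull from a log entry.

```perl
use strict;
use warnings;

my $status = 404;   # hypothetical HTTP status code
my $bytes  = 0;     # hypothetical byte count

# && is true only when both sides are true.
my $is_error = ($status >= 400 && $status < 600) ? 1 : 0;

# ! inverts a truth value; 0 and the empty string count as false.
my $is_empty = (!$bytes) ? 1 : 0;

# || yields its first true operand, which makes it handy for defaults.
my $requested = '';
my $page = $requested || 'index.html';

print "error=$is_error empty=$is_empty page=$page\n";
```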

Pattern-Matching Operators

Pattern matching is one of the areas in which Perl shows its strength. These operators, shown in Table 29.6, are very useful for string operations.

Table 29.6. Perl pattern-matching operators.

Operator   Example              Meaning
=~ //      $a =~ /pat/          True if $a contains pattern pat
=~ s///    $a =~ s/p/r/         Replace the first occurrence of p with r in $a (add the g modifier to replace every occurrence)
=~ tr///   $a =~ tr/a-z/A-Z/    Translate each character to the corresponding character (here, lowercase to uppercase)
!~ //      $a !~ /pat/          True if $a does not contain pattern pat
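
The operators in Table 29.6 can be tried out on a single string. This is only a sketch; the request line is a hypothetical example, not taken from any particular server.

```perl
use strict;
use warnings;

my $line = 'GET /docs/index.html HTTP/1.0';   # hypothetical request string

# =~ // tests for a pattern; parentheses capture the match into $1.
my $path = '';
if ($line =~ /^GET\s+(\S+)/) {
    $path = $1;
}

# !~ // is true when the pattern is absent.
my $is_image = ($path !~ /\.(gif|jpg)$/) ? 0 : 1;

# =~ s/// substitutes; the g modifier replaces every occurrence.
(my $dos_path = $path) =~ s/\//\\/g;

# =~ tr/// translates characters one for one.
(my $upper = $path) =~ tr/a-z/A-Z/;

print "$path\n$dos_path\n$upper\n$is_image\n";
```

Copying the value into a new variable inside the parentheses, as in the last two statements, leaves the original $path untouched.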

String Operators

String operators in Perl, as shown in Table 29.7, are the mainstay of the language.

Table 29.7. Perl string operators.

Operator   Example              Meaning
.          $a . $b              Concatenate $b to the end of $a
x          $a x $b              Value of $a strung together $b times
substr()   substr($a, $o, $l)   Substring of $a at offset $o of length $l
index()    index($a, $b)        Offset of string $b in string $a
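
A short sketch of the string operators in Table 29.7 follows. The strings are arbitrary; note that offsets count from zero and that index() returns -1 when the substring is not found.

```perl
use strict;
use warnings;

my $name = 'WWW' . 'usage';         # concatenation gives "WWWusage"
my $bar  = '=' x 10;                # ten equal signs
my $tail = substr($name, 3, 5);     # offset 3, length 5: "usage"
my $pos  = index($name, 'usage');   # offset of the substring: 3
my $none = index($name, 'perl');    # -1 because 'perl' does not occur

print "$name\n$bar\n$tail\n$pos $none\n";
```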

Relational Operators

The relational operators shown in Table 29.8 are essential to if and while statements.

Table 29.8. Perl relational operators.

Numeric Operator  String Operator  Example         Meaning
==                eq               $str eq "Word"  Equal to
!=                ne               $str ne "Word"  Not equal to
>                 gt               $var > 10       Greater than
>=                ge               $var >= 10      Greater than or equal to
<                 lt               $var < 10       Less than
<=                le               $var <= 10      Less than or equal to
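
Choosing between the numeric and the string column of Table 29.8 matters, because the same two values can compare differently. The sketch below uses arbitrary values to show the difference.

```perl
use strict;
use warnings;

my ($x, $y) = ('10', '9');

# Numeric comparison: 10 is greater than 9.
my $numeric = ($x > $y) ? 'greater' : 'not greater';

# String comparison: "10" sorts before "9" because "1" precedes "9".
my $string = ($x gt $y) ? 'greater' : 'not greater';

print "numeric: $numeric, string: $string\n";

# A relational test controls a while loop in the same way.
my $count = 0;
while ($count < 3) {
    $count++;
}
print "count: $count\n";
```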

Basic Perl Commands

Here are several predefined Perl commands that you will come across repeatedly.

Conversion Character  Definition
%s                    String
%c                    Character
%d                    Decimal number
%ld                   Long decimal number
%u                    Unsigned decimal number
%lu                   Unsigned long decimal number
%x                    Hexadecimal number
%lx                   Long hexadecimal number
%o                    Octal number
%lo                   Long octal number
%e                    Floating-point number in scientific notation
%f                    Floating-point number
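
Conversion characters like these are used in the format strings of Perl's printf and sprintf functions. The sketch below formats a few hypothetical values. The width and precision modifiers (such as %-12s and %6.2f) are standard printf features that the table does not list.

```perl
use strict;
use warnings;

my $file  = '/index.html';   # hypothetical values for one report row
my $hits  = 255;
my $share = 12.3456;

# sprintf returns the formatted string; printf prints it directly.
# %% produces a literal percent sign.
my $row = sprintf("%-12s %5d %6.2f%%", $file, $hits, $share);
my $hex = sprintf("%x", $hits);   # hexadecimal
my $oct = sprintf("%o", $hits);   # octal
my $chr = sprintf("%c", 65);      # the character with code 65
my $sci = sprintf("%e", 1500);    # scientific notation

printf("%s\n%s %s %s %s\n", $row, $hex, $oct, $chr, $sci);
```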

Flow of Control Statements

Note
Expression1 is used to set the initial value of the loop variables. Expression2 is used to test whether the loop should continue or stop. Expression3 is used to update the loop variables.

Note
VARIABLE is local to the foreach loop and regains its former value when the loop terminates. If VARIABLE is missing, the special scalar $_ is used.
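
The two notes above describe the for and foreach loops. A minimal sketch of both follows; the file names and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# for (Expression1; Expression2; Expression3): initialize, test, update.
my $total = 0;
for (my $i = 1; $i <= 5; $i++) {
    $total += $i;
}
print "Sum of 1 through 5: $total\n";

# foreach with a loop variable: the variable is local to the loop.
my $file = 'unchanged';
my @seen;
foreach $file ('a.log', 'b.log') {
    push @seen, $file;
}
print "After the loop, \$file is '$file'\n";   # its former value is back

# foreach without a loop variable: each element is available as $_.
my $length = 0;
foreach ('a.log', 'b.log') {
    $length += length($_);
}
print "Seen: @seen, total name length: $length\n";
```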

Perl Modules

A Perl module is a set of functions grouped into a package that deal with a similar problem. You can use module functions in a Perl script by telling your script the name of the module with the use command. For example, use CGI;.

One example of a Perl module is the CGI.pm module. This file includes functions that provide an easy interface to CGI programming, enabling you to write HTML forms and easily deal with the results. For more information about CGI.pm, visit its home page at http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html. This page has information about the functions available and examples of how they are used.
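
The use command works the same way for any module. Because CGI.pm may not be installed on every system, this sketch demonstrates the mechanism with File::Basename instead, a module that is bundled with Perl. The path is a hypothetical one in the style of the listings later in this chapter.

```perl
use strict;
use warnings;
use File::Basename;   # makes basename, dirname, and fileparse available

my $path = 'c:/www/alibaba/logs/access.log';   # hypothetical log file path

my $file = basename($path);   # the file name without its directory
my $dir  = dirname($path);    # the directory without the file name

# fileparse can also split off a suffix that matches a pattern.
my ($name, $where, $suffix) = fileparse($path, qr/\.[^.]*/);

print "$file lives in $dir\n";
print "name=$name suffix=$suffix\n";
```

Once the module is loaded, its functions are called like built-in ones; nothing else in the script has to change.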

Executing the Script

To run a Perl program, you can type the script name at the command prompt. Here are several example commands that you can use for debugging scripts:

Web Site Statistical Analysis

Two of the most common uses of Perl by Webmasters are statistical analysis and forms processing. This section and the next present two Perl CGI scripts that prove very useful for these purposes.

As a Webmaster, you want to know who's coming to your site, how often, and what they are doing there. To accomplish this, the examples use the Perl programming language interpreter and the WWWusage CGI application.

Actually, before getting into Perl, let's mention a very interesting tool that can help you chart your Web site statistics without requiring any custom programming. It will analyze your Web page usage based on your server log files. A company called Logical Design Solutions has invented a cool program called WebTrac. You can download the free program and give it a try. Visit their home page at http://www.lds.com.

WWWusage

WWWusage is a Perl script written by Richard Graessler ([email protected]) to analyze and calculate monthly usage statistics from log files generated by World Wide Web servers. This application is designed for use with Windows NT. Once the script is customized for your Web server (which is easy to do), WWWusage should work on any Windows NT system with NT Perl 5.001 installed.

WWWusage generates a new statistics page each month, and its output is easy to read. For more information about WWWusage and to download a free copy, please visit Rick's Web page, which also contains many other interesting resources and Perl scripts for Windows NT:

http://rick.wzl.rwth-aachen.de/rick/

Tip
Remember that some Web server statistics tools require the Web server to close the log files before the files can be analyzed. This is true of IIS 2.0.

WWWusage will process HTTP access log files in the Common Logfile Format and output monthly statistics in HTML format ready for publishing on the Web. It creates reports on any or all of the following:

WWWusage does not make any changes to the access log files or write any files in the server directories (with the exception of two output HTML files per month).

Log File Formats

Gone are the days when every Web server used its own proprietary log file format (but for a few notable exceptions). Numerous formats made it very difficult to write general statistics collectors. Therefore, the Web community designed the Common Logfile Format, which will soon become the default, if it hasn't already.

The Common Logfile Format

Here is the format of each line in the logfile, followed by an explanation of each field:

remotehost rfc931 authuser [date] "request" status bytes
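
In this format, remotehost is the client's host name or IP address, rfc931 is the remote logname reported by the client's identification service, authuser is the name the user authenticated with, date is the time of the request, request is the request line sent by the client, status is the HTTP status code returned, and bytes is the size of the document transferred. A hyphen stands in for any field that is not available.

A line in this format can be split with a single pattern match. The following is only a sketch, not the code WWWusage itself uses; the log line is a made-up example, and the hash keys simply reuse the field names above.

```perl
use strict;
use warnings;

# A hypothetical log line in the Common Logfile Format.
my $line = 'host.example.com - - [26/Dec/1995:10:15:32 +0100] '
         . '"GET /rick/index.html HTTP/1.0" 200 4521';

my %entry;
if ($line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/) {
    # Store the seven captured fields under their format names.
    @entry{qw(remotehost rfc931 authuser date request status bytes)}
        = ($1, $2, $3, $4, $5, $6, $7);
} else {
    warn "Skipping malformed line: $line\n";
}

# The request field itself holds the method, the URL, and the protocol.
my ($method, $url, $protocol) = split ' ', $entry{request};

print "$entry{remotehost} requested $url ",
      "($entry{status}, $entry{bytes} bytes)\n";
```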

Note
Microsoft IIS does not write HTTP logfiles using the Common Logfile Format. However, IIS does include a simple command-line utility which can convert from the IIS format to the Common Logfile Format. You can then use WWWusage to process the results.

Some Web servers use a single log file. Some servers write logfiles which can be closed automatically (sometimes called cycled); others must be closed manually. Still others write a separate log file for each day, so there is no need to cycle the log file.

Other Perl Statistic Scripts

Before we get to WWWusage, let's take a quick look at some other great Perl analysis scripts for HTTP log files. Just check Appendix C, "Resources for the Windows NT Webmaster," or search Yahoo for CGI or Perl.

Configuring WWWusage

Listing 29.1 shows the configuration section from the top of the file wwwusage.pl. All you need to do is read the comments in the source code to determine the modifications you need to make to customize the program for your site.


Listing 29.1. This excerpt of WWWusage shows the lines that can be modified for your Web site.

#!/cgi32/perl
#
# WWWusage - Perl script to calculate monthly usage statistics
# from log files
# generated by the Windows NT World Wide Web servers (https).
#
# Copyright (c) 1995 Richard Graessler ([email protected])
#
# For the latest version, DOCUMENTATION and LICENSE AGREEMENT see
#   <URL: http://pobox.com/~rickg/rickg/wwwusage/wwwusage.html>
#
#      This program is provided "AS IS", WITHOUT ANY WARRANTY 
#      (see License Agreement)
#
# Bug reports, comments, questions and suggestions are welcome.
# Please mail to
# [email protected] with the "subject: WWWusage" but please
# check first that you have the latest version.
#
# CREDITS:
#
# There are some other Perl logfile analyse scripts on the net:
#  Roy Fielding's wwwstat
#   <URL: http://www.ics.uci.edu/WebSoft/wwwstat/>
#  Nick Phillips's musage
#   <URL: http://www.blpes.lse.ac.uk/misc/musage.htm>
#  Steven Nemetz's iisstat
#   <URL:
# ftp://ftp.ccmail.com/pub/utils/InternetServices/iisstat/iisstat.html>
# Looking into these scripts helped me to write this script and
# there might be still some parts based on them.
#
# Requires timelocal.pl and getopts.pl which are included in the Perl
# disribution package.
#
# Thanks to the authors!
#

######################################################################
#  Program internal variables (please do not change!)
######################################################################
$VERNAME = 'WWWusage'; # Program name
$VERSION = '0.99';     # Program version
$VERDATE = '26 December 1995';  # Program version date

######################################################################
#  Present setting
######################################################################
# In Perl for Windows NT you can use forward slash (/) or double
# backslash (\\) in pathnames (e.g. C:/LOGS/ or c:\\LOGS\\). File
# and path names could
# be absolute(e.g. C:/LOGS/) or relative to current directory
# (e.g. ./LOGS/). 

# hostname of www server (HTTPS)
$ServerName = 'rick.wzl.rwth-aachen.de';

# flag - specifies the logfile format
# 1 : common log file format, 
# 0 : EMWAC HTTPS
$LogFormat = 1;

# file containing the country-codes to allow expansion from domain
# to country name
$CountryCodeFile = 'C:/www/alibaba/admin/country-codes.txt';

# Pattern used to recognise log files translated into a Perl regular
# expression, e.g. ('.+\.log' for *.log), ('ac.+\.log' for ac*.log).
# If your HTTPS have only one logfile simply set "access.log"
# Note: If you have more than one logfile the script assumes that the
#       alphabetical order of the filenames is the same as the
#       chronological order
$LogFilePattern = '.+\.log';

# directory containing external configuration files
# (without ending slash!)
$ConfigFileDir = 'c:/www/alibaba/admin';

# directory containing log files (without ending slash!)
$LogFileDir = 'c:/www/alibaba/logs/HTTP';

# filename (incl. path and arguments if necessary) of shell for
# unpacking archives. Note: If you use this feature please note
# that the archive contains only the logfiles for a single month
# and that you didn't analyse archives and normal logfile at 
#       the same time.
$Gzip = 'gunzip -c';          # Gzip Format: *.gz, *.Z
$Zip  = 'unzip -p';           # Zip Format:  *.zip
$Tar  = 'tar -x -O -f';       # Tar Format:  *.tar

# WWWusage directory to write statistics reports
# (without ending slash!)
$OutPutDir = 'c:/www/alibaba/htmldocs/usage/wwwusage';

# WWWusage Error file name including path
$ErrorFile = 'c:/www/alibaba/admin/WWWusage.log';

# Filename without extension for HTML main output file
# (e.g. "WWWusage", "index" or "default")
$MenuFile = "index";

# Extension for HTML output files
$HTMLextension = "html";

# show top nn statistics in main output, the detail output
# contains all (e.g. 30)
$Top = 30;

# format of the output HTML page (0 = <PRE></PRE>, 1 = <TABLE></TABLE>)
$HTMLOutput = 0;

# flags - disable if you don't want that output
$DoDomain = 1;       # Transfers by Client Domain (top level)
$DoDomain2 = 1;      # Transfers by Client Domain (second level)
$DoSubdomain = 1;    # Transfers by Client Subdomain
$DoHost = 1;         # Transfers by Client Host
$DoFileType = 1;     # Transfers by File Type
$DoFileName = 1;     # Transfers by File Name (URL)
$DoHTTPSMethod = 1;  # Transmission Statistics HTTPS Method
$DoStatusCode = 1;   # Transmission Statistics Status Code
$DoDaily = 1;        # Transmission Statistics Day
$DoWeekdaily = 1;    # Transmission Statistics Weekday
$DoHourly = 1;       # Transmission Statistics Hour
$DoIdent = 2;        # Transfers by Remote Identifer
# NOTE for $DoIdent: For security reasons, you should not
# publish to the web any report that lists the Remote Identifiers
# (rfc931 or authuser):
# 0 : no display, 1 : real user name, 2 : cookie name

# flag - disable if you don't want to create the detail statistics
# to save time
$DoDetail = 1;

# flag - disable if you don't want to create links to your
# accessed pages
$FileNameHREF=0;

# user specific parameters for the TABLE tag
$HTMLTable = 'Border=2 CELLPADDING=8 CELLSPACING=5';

# user specific backgrounds for all returns. Here you can set
# all elements of the body tag which can appear between "<BODY ... >
# in HTML format.
$HTMLBackground = 
  'BACKGROUND="/gif/bg0.gif" BGCOLOR="#63637b" TEXT="#ffffff" '
 .'LINK="#00ffff" ALINK="#ff0000" VLINK="#ffff00" HRCOLOR="#ff0000" ';

# user specific header for all returned HTML pages in HTML format
@HTMLHeader = ( 
 '<P><CENTER><A HREF="/image/ntrick.map"><IMG BORDER=0 HSPACE=10 ',
 'ALIGN=MIDDLE SRC="/gif/ntrick.gif" ',
 'ALT="Rick\'s Windows NT Info Center"',
 ' ISMAP WIDTH=550 HEIGHT=44></A></CENTER></P>', 
 '<H1><CENTER>World Wide Web Server Usage Statistic</CENTER></H1><HR>'
);

# user specific footer for all returned HTML pages in HTML formats
@HTMLAddress = (
    '<HR><HR><A NAME="Bottom"></A><A HREF="/image/address.map" >',
    '<IMG BORDER=0 HSPACE=10 ALIGN=MIDDLE SRC="/gif/address.gif" ',
    'ALT="Addressbar" ISMAP WIDTH=293 HEIGHT=31></A>'
);

# flag - disable if you don't want a detailed output on the console
$VerboseMode = 1;

# flag - disable if you don't want to see the skiped lines of
#  the logfiles on the console
$ShowSkippedLines = 1;

# flag - disable if you don't want to show unresolved addresses
$ShowUnresolved = 1;

# file containing DNS names
# (will be created and updated by the script)
$DnsNamesFile = 'c:/www/alibaba/admin/dns-names.txt';

# flag - to set the DNS lookup. Note: DNS lookup needs much time
#        and slow up the execution of WWWusage 
# 0 : disable if you don't want to look up dnsname if ip address
# is given
# 1 : if you don't want to look up new dnsname but used the 
# saved dnsnames
# 2 : if you want to look up new and old unresloved dnsname
# 3 : if you want only to look up new dnsname
$LookupDnsNames = 3;

# flag - disable if you don't want to sort the host list to save time
$SortHostList = 0;

# flag - disable if you don't want to encode filenames
$UrlEncode = 0;

# flag - disable if you don't want to detect on disk if filename is a 
#       directoryor file. If flag is set, you should run the script on
#       your HTTPS machine
$FileCheck = 0;

# flag - enable it if you https automatically add a "/" to
#        slashless dirs (1 for EMWAC HTTPS, Netscape - 0 for Alibaba)
$DirWorksWithSlash = 0;

# real directory name of document root of the www server
# (without ending Slash!)
$DocumentRoot = 'c:/www/alibaba/htmldocs';

# list of configured "default/index" filename(s) for your HTTPS
@DefaultHTML = ('index.html','index.sht','default.htm');

# flag - enable to convert all filenames (URLs) to lower case
$FileNamesToLowerCase = 1;

# time zone information. Only necessary for EMWAC log file format.
# if not set it will be computed. Format: "+0100" or "-1100"
# $TimeZone = "+0100";

# exclude filter: optional list of IP addresses to ignore, please
# include ipnummer as well as dns name(s) in the list! IPnumber will
# be checked forward, DNSnames will be checked backward. Perl
# expressions are possible.
# (e.g. "137.226" for "137.226.*.*", "rwth-aachen.de" for 
# "*.wzl.rwth-aachen.de")
# @IgnoreHost = ('137.226.92.10','rick.wzl.rwth-aachen.de');

# include filter: optional list of IP addresses to focus on, please
# include ipnummer as well as dns name(s) in the list! IPnumber will
# be checked forward, DNSnames will be checked backward. Perl
# expressions are possible.
# (e.g. "137.226" for "137.226.*.*", "wzl.rwth-aachen.de" for 
# "*.wzl.rwth-aachen.de")
# @FocusOnHost = ('137.226.','wzl.rwth-aachen.de');

# exclude filter: optional list of paths/files to ignore. Paths
# will be checked forward from the beginning of the url filename.
# Perl expressions are possible.
# @IgnorePath = ('/gif/','/images/');

# include filter: optional list of paths/files to focus on. Paths
# will be checked forward from the beginning of the url filename.
# Perl expressions are possible.
# @FocusOnPath = ('/rick/');

# exclude filter: optional list of file extensions to ignore.
# Extension will be checked backward from the beginning of the
# url filename. Perl expressions are possible.
@IgnoreExt = ('gif','jpeg','jpg');

# include filter: optional list of file extensions to focus on.
# Extension will be checked backward from the beginning of the
# url filename. Perl expressions are possible.
# @FocusOnExt = ('.htm','html');

# Alias list for virtual paths.
# Format: '/aliasname/' => 'drive:/path/'
# Key: alias or pathnames relative to HTTPS document root.
# Value: pathnames relative to disk root.
# Do not include the $DocumentRoot (with its value '/'). This array
# does not make sense with EMWAC HTTPS because it doesn't
# support alias.
%WWWAlias = (
'/ALIBABA/', 'C:/WWW/ALIBABA/DOCS/',
'/ALIPROXY/', 'C:/WWW/ALIBABA/HTML/',
'/ICONS/', 'C:/WWW/ALIBABA/ICONS/',
'/IMAGE/', 'C:/WWW/ALIBABA/CONF/',
'/COUNTER/', 'C:/WWW/ALIBABA/COUNTER/',
'/CFDOCS/', 'C:/WWW/ALIBABA/CFUSION/CFDOCS/',
'/PERFORM/', 'C:/WWW/ALIBABA/PERFORM/DOCS/',
'/RICK/PERFORM/', 'C:/WWW/ALIBABA/PERFORM/OUTPUT/',
'/CGI-BIN/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGIDOS/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGI-32/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGI32/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGI-SHL/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/WINCGI/', 'C:/WWW/ALIBABA/WINCGI/',
'/WINBIN/', 'C:/WWW/ALIBABA/WINCGI/',
'/DLLALIAS/', 'C:/WWW/ALIBABA/CGIDLL/',
'/ALIPROXY/', 'C:/WWW/ALIBABA/HTML/',
);

# List of used file types and its extensions. The extensions must
# be written in regular Perl expression. If @FileTypesSort is
# given it determine the search order.
%FileTypes = (
    'CGI Scripts', '(\/cgi32\/|\/cgi-32\/|\/cgi-shl\/)',
    'DOS CGI Scripts', '(\/cgi-bin\/|\/cgidos\/)',
    'WinCGI Scripts', '(\/wincgi\/|\/winbin\/)',
    'DllCGI Scripts', '\/dllalias\/',
    'Images', '\.(bmp|gif|xbm|jpg|jpeg)$',
    'Movies', '\.(mpg|mov|scm)$',
    'Archive Files', '\.(gz|z|zip|tar)$',
    'HTML Files', '\.(htm|html)$',
    'Imagemaps', '($\/image\/|\.map$)',
    'Server Side Includes', '\.(sht|shtm|shtml)$',
    'Text Files', '\.txt$',
    'Binary Executables', '\.(com|exe)$',
    'Script Executables', '\.(pl|sh|cmd|bat)$',
    'Readme Files', '\/README.*$',
    'Directory Listings', '\/$',
    'Java Applets', '\.CLASS$',
);

@FileTypesSort = (
    'HTML files',
    'Images',
    'CGI Scripts',
    'Server side includes',
    'Java Applets',
    'Text files',
    'Directory listings',
    'DOS CGI Scripts',
    'WinCGI Scripts',
    'DllCGI Scripts',
    'Movies',
    'Archive files',
    'Imagemaps',
    'Binary Executables',
    'Script Executables',
    'Readme files',
);

# Response Codes taken from <draft-ieft-http-v10-spec-01.ps>,
# August 3,1995 Normally you don't need to change!
%StatusCode = (
    '200', '200 OK',
    '201', '201 Created',
    '202', '202 Accepted',
    '203', '203 Non-Authoritative Information',
    '204', '204 No Content',
    '300', '300 Multiple Choices',
    '301', '301 Moved Permanently',
    '302', '302 Moved Temporarily',
    '303', '303 See Other',
    '304', '304 Not Modified',
    '400', '400 Bad Request',
    '401', '401 Unauthorized',
    '402', '402 Payment Required',
    '403', '403 Forbidden',
    '404', '404 Not found',
    '405', '405 Method Not Allowed',
    '406', '406 None Acceptable',
    '407', '407 Proxy Authorization Required',
    '408', '408 Request Timeout',
    '409', '409 Conflict',
    '410', '410 Gone',
    '411', '411 Authorization Refused',
    '500', '500 Internal Server Errors',
    '501', '501 Not implemented',
    '502', '502 Bad Gateway',
    '503', '503 Service Unavailable',
    '504', '504 Gateway Timeout',
);

##############################################################################
# END CONFIG
##############################################################################
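
To get a feel for how pattern tables such as %FileTypes and @FileTypesSort can work together, consider the following sketch. It is not taken from WWWusage: the helper classify_url is hypothetical, the tables are abridged copies of the ones in Listing 29.1, and the match is made case-insensitive as an assumption.

```perl
use strict;
use warnings;

# Abridged copies of two configuration tables from Listing 29.1.
my %FileTypes = (
    'CGI Scripts',        '(\/cgi32\/|\/cgi-32\/|\/cgi-shl\/)',
    'Images',             '\.(bmp|gif|xbm|jpg|jpeg)$',
    'HTML Files',         '\.(htm|html)$',
    'Directory Listings', '\/$',
);
my @FileTypesSort = ('HTML Files', 'Images', 'CGI Scripts',
                     'Directory Listings');

# Hypothetical helper: return the first type, in search order,
# whose pattern matches the URL.
sub classify_url {
    my ($url) = @_;
    foreach my $type (@FileTypesSort) {
        return $type if $url =~ /$FileTypes{$type}/i;
    }
    return 'Other';
}

foreach my $url ('/rick/index.html', '/gif/bg0.gif',
                 '/cgi32/mailto.pl', '/rick/', '/data.bin') {
    printf("%-20s %s\n", $url, classify_url($url));
}
```

In a sketch like this, the names in the search-order list must match the hash keys exactly, including case, or the lookup finds no pattern.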

CGI Forms Handling with Perl

The WWW MailTo & CommentTo gateway is a Windows NT HTTP CGI Perl script. (Whew!) It enables you to send a message by SMTP and/or to log the message to a local file. You can check Rick's Web site for the latest and greatest (along with other resources and documentation) at this URL:

http://rick.wzl.rwth-aachen.de/rick/

Using the HTTP GET method, the script creates a predefined or user-supplied fill-out form whose ACTION attribute refers back to the script itself. After the form is submitted, the script is executed a second time by the POST method to create the mail and send it by SMTP if mail is enabled, or save it in the comment file if comment is enabled.

The features depend on the configuration. The script can do any of the following:

Installation

You need to put mailto.pl into your scripts or cgi-bin directory. Some HTTP servers use a different CGI directory for DOS CGI, Win32/NT CGI, or WinCGI binaries. If so, put the scripts in your Win32/NT CGI binaries directory, for example, the CGI32 directory. If your HTTP server does not support ALIAS, it must be in your WWW data directory or one of its subdirectories.

Now would be a good time to install Blat from the CD-ROM, if you have not done so already. (See Chapter 8, "Serving E-mail via TCP/IP," for more information about installing Blat.)

To install the WWW MailTo&CommentTo Gateway, you only need to modify the configuration as described in the following section titled "Configuring the Script." Beyond the simple configuration, the main issue is how to call it properly. This depends on how your HTTP server executes scripts.

If your HTTP server can execute scripts directly (for example, the Alibaba Web server), you can use HTML such as this:

<A HREF="http://rick.wzl.rwth-aachen.de/cgi32/mailto.pl">

If your HTTP server must execute a program binary (for example, the EMWAC HTTPS), you can use HTML such as this:

<A HREF="http://rick.wzl.rwth-aachen.de:8001/cgi32/perl.exe?cgi32/mailto.pl">

If you are unfamiliar with any part of the syntax of the above URL, please refer to Chapter 5, "What You Need to Know About HTML," for a refresher course. The question mark character is a special CGI marker indicating the start of the command-line arguments to be passed in the QUERY_STRING variable. (See Chapter 19.)

Alternatively, you can use Rick's CGI2Shell Gateway. In this case, you could do the following:

<A HREF="http://rick.wzl.rwth-aachen.de:8001/cgi32/cgi2perl.exe/cgi32/mailto.pl?">

The last way is much easier if you want to specify parameters. See the following "Usage" section for more information about parameters.

Usage

First of all, you must create an HTML tag for WWW MailTo&CommentTo Gateway in your HTML document, which calls the script by the GET method. When called by the GET method, the script displays a standard e-mail form. Here are two examples of the HTML code, one with an absolute URL and one with a relative URL:

<A HREF="http://rick.wzl.rwth-aachen.de/cgi32/mailto.pl">Mailto</A>
<A HREF="/cgi32/mailto.pl">Mailto</A>

You can also include command-line parameters in the HTML tag. The parameter is either source or one or more variable/value pairs, with the pairs separated from one another by an ampersand (&) and each variable separated from its value by an equal sign (=). Note that all parameters must be URL-encoded: spaces are replaced with plus signs (+), so a literal plus sign must be written in hexadecimal as %2B. Other reserved characters must be encoded in the same way.

The source parameter returns the script source code if source viewing is enabled and source is the only parameter. The variable/value pairs can use any of the reserved variables except from and HTTPpage. These variables can be supplied in the GET request when linking to the mailto script. If you simply want your mail address to appear in the mail form as the default value, make your HTML look something like this:

<A HREF="/cgi32/[email protected]">

If you want your default subject to be "This is a subject!", give the subject variable separated by an ampersand. For example:

<A HREF="/cgi32/[email protected]&subject=This+is+a+subject!">

Notice that the subject must be URL-encoded.
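
If you build such links from a script, the encoding can be automated. The following sketch assumes a hypothetical helper named url_encode and a made-up mail address; the set of characters it leaves unencoded is one reasonable choice, not the only one.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one parameter value for a query string.
sub url_encode {
    my ($value) = @_;
    # Replace every character outside the safe set with a %XX escape.
    # The space is kept here and turned into a plus sign afterward.
    $value =~ s/([^A-Za-z0-9_.!~*'()\@ -])/sprintf("%%%02X", ord($1))/ge;
    $value =~ tr/ /+/;
    return $value;
}

my $to      = url_encode('webmaster@example.com');   # hypothetical address
my $subject = url_encode('This is a subject!');
my $sum     = url_encode('1+1=2');   # shows how + and = are escaped

my $href = "/cgi32/mailto.pl?to=$to&subject=$subject";
print "<A HREF=\"$href\">Mailto</A>\n";
print "$sum\n";
```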

Reserved Variables

There are several reserved variables that the script will check for explicitly.

All of these variables (except from and HTTPpage) could be set to default values, which can protect against overwriting. All of these variables can also be set at the command line following the "?" (which will then be inserted into the CGI environment variable QUERY_STRING).

These reserved variables have a special meaning for the script and must be set by either the Webmaster or the user. With the exception of the to and from variables, all variables are set to default values if they are undefined.

For easy questionnaires, every non-reserved CGI variable is logged after the mail body as a variable/value pair, regardless of whether it comes from a hidden field or from a visible part of the fill-out form. Remember that the GET method limits the number of characters that can be passed. As before, a variable and its value are separated by =, and different pairs by &. Spaces are replaced with +, so a literal plus sign must be written as %2B, and other reserved characters must be given as their own hexadecimal codes. To use user-defined variables, you first need to create a user-defined form.
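The idea of logging the leftover variables after the body can be sketched in a few lines of Perl. Everything here is an assumption made for illustration: the list of reserved names contains only the ones mentioned in this chapter (the script's real list is longer), extra_variable_lines is a hypothetical helper, and the name=value layout is not necessarily the format the script writes.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Assumption: only the reserved names mentioned in the text.
my %reserved = map { $_ => 1 } qw(to from subject form HTTPpage);

# Hypothetical helper: return one "name=value" line for each variable
# that is not reserved, sorted by name.
sub extra_variable_lines {
    my (%vars) = @_;
    my @extra = grep { !$reserved{$_} } keys %vars;
    return map { $_ . '=' . $vars{$_} } sort @extra;
}

# Made-up questionnaire input: two reserved variables and two extras.
my %vars = (
    to      => 'webmaster@example.com',
    subject => 'Survey',
    rating  => '5',
    browser => 'Mosaic',
);

my $mail = "Thanks for running the survey.\n";
$mail .= "$_\n" foreach extra_variable_lines(%vars);
print $mail;
# prints:
# Thanks for running the survey.
# browser=Mosaic
# rating=5
```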

Configuring the Script

Before starting to use the script, you must configure it. All configurable variables are in the first section of the script, as follows:

Setting Default Values

You can set default values for all reserved variables (except from and HTTPpage) with the $def{} variables in the script. These variables are also found in the first section of the script. If the variable $default is set, the defaults are fixed: they cannot be overwritten by parameters given in the script tag in an HTML page or by user input when filling out the form. If $default is not set, the defaults are used only when the reserved variables are not set by command-line parameters or user form input. For example:
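The precedence rule just described can be sketched as follows. This is an illustration of the rule, not an excerpt from the script. apply_defaults is a hypothetical helper, its first argument stands in for $default, and the values are made up.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Made-up defaults in the style of the script's $def{} variables.
my %def = (
    to      => 'webmaster@example.com',
    subject => 'Feedback',
);

# Hypothetical helper. $lock plays the role of $default: when it is
# true the defaults always win, otherwise they only fill in blanks.
sub apply_defaults {
    my ($lock, $def, $given) = @_;
    my %result = %$given;
    foreach my $name (keys %$def) {
        # from and HTTPpage never take default values
        next if $name eq 'from' or $name eq 'HTTPpage';
        if ($lock or !defined $result{$name} or $result{$name} eq '') {
            $result{$name} = $def->{$name};
        }
    }
    return %result;
}

my %given = (subject => 'Hello');

my %open   = apply_defaults(0, \%def, \%given);
my %locked = apply_defaults(1, \%def, \%given);

print "not locked: $open{subject}\n";      # not locked: Hello
print "locked:     $locked{subject}\n";    # locked:     Feedback
```

In both cases the to address comes from the defaults, because the caller did not supply one.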

Restricted Mail Addresses

You can restrict mail to a single address by setting the $def{'to'} variable to an e-mail address and preventing this value from being overwritten by setting $default.

You can also restrict the to address to a fixed set of addresses by setting the %defto array, which is also found in the first section of the script. For this feature, you must run a separate copy of the script because the standard form always includes a selection list for the addresses.
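Here is one way such a restriction could work. Treat it as a sketch under stated assumptions. The layout of %defto shown here (address as the key, a label for the selection list as the value) is a guess, not taken from the script. The helper names and the addresses are hypothetical too.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Assumed layout: address => label shown in the selection list.
my %defto = (
    'webmaster@example.com' => 'Webmaster',
    'sales@example.com'     => 'Sales',
);

# Hypothetical helper: accept a recipient only if it is in the list.
sub recipient_allowed {
    my ($to) = @_;
    return 0 unless defined $to;
    return exists $defto{lc $to} ? 1 : 0;
}

# Hypothetical helper: build the entries for the selection list.
sub recipient_options {
    my $html = '';
    foreach my $address (sort keys %defto) {
        $html .= qq{<OPTION VALUE="$address">$defto{$address}\n};
    }
    return $html;
}

print qq{<SELECT NAME="to">\n}, recipient_options(), "</SELECT>\n";

foreach my $to ('sales@example.com', 'someone@elsewhere.example') {
    print "$to: ", (recipient_allowed($to) ? 'accepted' : 'rejected'), "\n";
}
```

Checking the address again when the mail is submitted matters as much as the selection list itself, because nothing forces a browser to send back one of the listed values.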

User-Defined Forms

You can create your own forms without modifying the script. To do so, you define form files, which are themselves small Perl scripts. There are two kinds of form files. The first is executed when the main script is called with the GET method, and it must create the form. The second, if it exists, is executed when the main script is called with the POST method (after the user has submitted the mail), and it is intended for preparing the mail. To use the form file feature, the first (GET) form must exist. The second (POST) form is optional.

You can specify the name of the form with the predefined variable $defto{form}=form name inside the script or with the parameter form=form name. Form name is the filename of the form without the path and the file extension. The path, the GET, and the form extension must be configured in the script. If they are not configured, the forms will not be executed. This is for security reasons. The form files will be executed with the eval function of Perl. Therefore, use a separate path for the form files. If you don't do this, other files could also be executed!
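Because the form files are run through eval, it is worth seeing how a form name can be tied down to one directory. The following sketch shows the general technique only. form_file_path and run_form are hypothetical helpers, the .get extension and the survey form are made up, and the real script may do its checking differently.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: turn a form name into a file name inside $dir.
# Only plain names are accepted, so a name such as "../secret" or
# "c:\other" can never point outside the form directory.
sub form_file_path {
    my ($dir, $ext, $name) = @_;
    return unless defined $name && $name =~ /^[A-Za-z0-9_-]+$/;
    my $file = "$dir/$name$ext";
    return unless -f $file;
    return $file;
}

# Hypothetical helper: read a form file and execute it with eval.
sub run_form {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $code = do { local $/; <$fh> };    # read the whole file at once
    close($fh);
    my $result = eval $code;
    die "Form file failed: $@" if $@;
    return $result;
}

# Demonstration with a throwaway directory and a one-line form file.
my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/survey.get") or die "Cannot write form: $!";
print $out q{'<FORM METHOD="POST"><INPUT NAME="rating"></FORM>'};
close($out);

foreach my $name ('survey', '../survey') {
    my $file = form_file_path($dir, '.get', $name);
    if (defined $file) {
        print "$name: ", run_form($file), "\n";
    } else {
        print "$name: refused\n";
    }
}
```

Keeping the form directory separate from everything else, as the text advises, is what makes the name check meaningful: any file that lands in that directory with the right extension will be executed.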

Inside your form files, you can use all the variables and subroutines of the main Perl script. You can overwrite variables from the main script, for example $commentfile. You can even write your own mailto application.

CGI Form Handling in Perl

As mentioned before, another excellent use for Perl is writing code to handle Common Gateway Interface (CGI) forms, which have become the mainstay of interactive communication on the World Wide Web.

cgi-lib.pl is a simple Perl library designed to make writing CGI scripts in Perl easy. Many Perl CGI scripts that you find on the Web use cgi-lib.pl. See Listing 29.2 for an example.


Listing 29.2. A minimal Perl application using cgi-lib.pl.

#!/usr/local/bin/perl
# minimal.cgi
# Copyright (C) 1995 Steven E. Brenner
# $Header: /cys/people/brenner/http/docs/web/RCS/minimal.cgi,v 1.2 1995/04/07 21:36:29 brenner Exp $
# This is the minimalist script to demonstrate the use of
# the cgi-lib.pl library -- it needs only 7 lines
# --
# This is NOT intended to be a "typical" script
# Most importantly, the <form> key should normally have parameters like
#  <form method=POST action="minimal.cgi">
require "cgi-lib.pl";
if (&ReadParse(*input)) {
   print &PrintHeader, &PrintVariables(%input);
} else {
  print &PrintHeader,'<form><input type="submit">Data: <input name="myfield">';
}

Perl 5

Perl 5 adds more features to the language than this short introduction has space to cover in full. Some of the more noteworthy enhancements are references, object-oriented extensions, general cleanup, support for modules, and importing.

Like any programming language, Perl will take some time to master. Alas, this is not a subject I can completely cover in this book. However, I can give you some information about where to look. This information will also tell you how you can quickly use existing Perl applications. The first thing you might want to do is check out these three text files that come with Perl:

To learn more about Perl, try the University of Florida's Perl Archive at http://www.cis.ufl.edu/perl/. Users in the UK might like to try something closer to home, such as the NEXOR Ltd Perl Page at http://pubweb.nexor.co.uk/public/perl/perl.html.

Here are a few other Perl resources on the Net; the last one consists of a few newsgroups dedicated to Perl topics.

http://www.metronet.com/perlinfo/perl5.html
http://www.perl.com/perl/faq/
comp.lang.perl

Summary

It has been a very educational experience writing this book. I hope it has been, and will continue to be, as useful for you as it has been fun to write.

You have chosen a very exciting time to be involved with Windows NT and Web technologies. I wish you continued success on your Windows NT Intranet.