Chapter 18

Discussion Forums




The World Wide Web discussion forum is a new way to deliver information the way Usenet newsgroups and computer bulletin board services have for years. The hyperlinking nature of the Web has made this migration quite natural for users and relatively simple to program through the Common Gateway Interface.

Discussion Forums: Everything Old Is New Again

Prior to the explosion of the World Wide Web, the User Network (Usenet) newsgroups were the most popular service on the Internet. They allowed users from all over the world to carry on a sort of "public discussion." The protocol used to transfer this information borrowed heavily from the e-mail protocol, and local systems were obliged to, in essence, keep a local copy of Usenet for their users or face severe performance penalties. Usenet is still very popular today. In Usenet, you can find discussion groups on thousands of subjects.

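Something very similar to Usenet can be simulated on the World Wide Web through CGI programming: a discussion forum. In fact, the earliest versions of Netscape allowed users to read Usenet newsgroups by parsing news articles and displaying them as HTML within the main Netscape client window. In this chapter, I cover how discussion forums are displayed and how bistate CGI programming supports that display, which data fields and parent/sibling/child relationships a posting needs, a complete example forum, and forum administration.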

Discussion Forum Display and Bistate CGI Programming

Though my opinions on a lot of things can run counter to the norm, when it comes to the display of a discussion forum, there tends to be a general degree of agreement between other programmers and me. Discussion forums tend to have two sorts of displays: an entry-level display and a display for people who are actually reading the postings within. This concept is demonstrated in Figure 18.1.

Figure 18.1: An outline of how discussion forums will look for users both entering the forum and reading a posting.

Though the ends are generally agreed upon, the means can differ. Entries within a discussion forum can either be full HTML documents that are updated as needed, or they can be stored as files of information possessing header fields and a body area that are processed as needed by the CGI program and displayed from within the CGI. There are good arguments for both methods: creating "ready-made" HTML documents requires no CGI involvement at view-time, so pages can be displayed faster when being viewed. Dynamic presentation of postings can give a discussion forum much greater information-handling capabilities should they be required. For instance, if you wanted to create an auxiliary CGI program or routine that would display only messages by a given author, it would be much more straightforward to generate this sort of report if your data was handled as data files rather than as full HTML documents.
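To make the data-file argument concrete, here is a minimal sketch of such an author report. It assumes each posting is a text file whose first line reads Author, a tab, then the name (the layout the example program later in this chapter writes). The subroutine name posts_by_author and its arguments are my own inventions for illustration.

```perl
use strict;
use warnings;

# Return the posting files whose Author header matches $wanted.
# Assumes the first line of each file looks like "Author\tSome Name".
sub posts_by_author {
    my ($wanted, @files) = @_;
    my @hits;
    foreach my $file (@files) {
        open(my $fh, '<', $file) or next;    # skip unreadable files
        my $line = <$fh>;
        close($fh);
        next unless defined $line;
        chomp $line;
        my ($label, $author) = split(/\t/, $line, 2);
        push(@hits, $file) if defined $author && $author eq $wanted;
    }
    return @hits;
}
```

With complete HTML documents, the same report would mean picking the author's name back out of markup, which is exactly the extra work the data-file approach avoids.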

I often talk about the capability of CGI programs to be "multistate"; that is, they can present different faces to the user depending on circumstances. Circumstances include GET/POST information, environment variables (the IP address the user is accessing the CGI program from, what browser he or she is using, what link the user clicked on to get to the CGI, and so on), the time of day, the HTTPd system load, phase of the moon, or just about anything else that can vary. I find that discussion forums naturally lend themselves to a very special multistate configuration: a bistate.
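A bistate program needs only one piece of information to pick its face. The sketch below assumes that piece is the QUERY_STRING environment variable, which is what the example forum in this chapter uses. The subroutine name choose_state and the state labels are hypothetical.

```perl
use strict;
use warnings;

# Decide which "face" to show from the query string alone.
# An empty query string means the entry-level page; anything else
# is taken to name the posting being read.
sub choose_state {
    my ($query) = @_;
    $query = '' unless defined $query;
    return $query eq '' ? 'entry' : 'reading';
}

my $state = choose_state($ENV{QUERY_STRING});
print "Content-type: text/plain\n\n";
print "state: $state\n";
```

A real program would call different page-building subroutines for each state rather than printing the label.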

The division between just entering into the discussion forum and reading through it is an obvious one. In the first case, only two general areas are needed: a list of postings and a form for entering in your own posting.

If we are currently reading a posting, the discussion forum should display three areas: the contents of the posting being read, a list of related postings, and a form to submit a new posting. Any submission from the form in this area will relate the submission to the current posting. Also, this form can include "prequoted" text from the current posting to help the poster refer to specific elements in the message being replied to.
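Prequoting is nothing more than putting a marker in front of every line of the original body before it goes into the reply form. Here is a small sketch of the idea. The ":: " marker is the one the example program in this chapter uses; the subroutine name prequote is hypothetical.

```perl
use strict;
use warnings;

# Prefix every line of the original body with a quote marker so the
# reply form can be pre-filled with the text being answered.
sub prequote {
    my ($marker, @body) = @_;
    return map { "$marker$_" } @body;
}

print prequote(':: ', "First line\n", "Second line\n");
```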

Useful Data Fields for Discussion Forums and Parent/Sibling/Child Relationships

Any posting in a discussion forum obviously includes a message body. Beyond that we get to choose what data fields we think are important to have included in a message. Here are the data fields that have gained general acceptance:

The first five fields in this list strongly resemble what you'd find in an e-mail. Part of this stems from the historical organization of discussion lists, but mostly these five fields are here because this sort of organization scheme just makes sense. The remaining two fields allow the CGI system to "thread" discussions.
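As one concrete instance, the example program later in this chapter stores five header fields (Author, Email, Subject, To, and Time), one per line as a label, a tab, and a value, followed by the message body. It encodes the threading in the file name rather than in fields. The sketch below reads that layout back into a hash; the subroutine name parse_posting and the Body key are my own labels.

```perl
use strict;
use warnings;

# Split a stored posting into its header fields and body.
# Assumes five "Label\tvalue" header lines followed by the body text.
sub parse_posting {
    my ($text) = @_;
    my @lines = split(/^/, $text);           # split into lines, keeping newlines
    my %post;
    foreach my $label (qw(Author Email Subject To Time)) {
        my $line = shift(@lines);
        $line = '' unless defined $line;
        chomp $line;
        my (undef, $value) = split(/\t/, $line, 2);
        $post{$label} = defined $value ? $value : '';
    }
    $post{Body} = join('', @lines);
    return %post;
}
```

Optional fields such as Email and To simply come back as empty strings.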

Threading turns what might otherwise be white noise babble into meaningful conversations. After reading a message, the reader is given the opportunity to reply. The two messages are linked to each other with the reply as a "child message" and the first message as the "parent." A parent can have many children, which are "siblings." The discussion forum is responsible for organizing its postings in a way that reflects these familial relationships. On the Web, discussion forums often take advantage of the ordered and unordered list features of HTML. Figure 18.2 is a screen shot of the London Chat BBS, a discussion forum I have attached to one of my chat rooms. It illustrates the use of HTML list features to "thread" postings.
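The mapping from family relationships to HTML lists can be sketched in a few lines. This example assumes the postings have already been put in order (each reply after its parent) and that each one carries a depth: 0 for a top-level posting, 1 for a reply, 2 for a reply to a reply. The subroutine name render_thread and the input format are hypothetical, and the subjects are assumed to be safe to print as HTML.

```perl
use strict;
use warnings;

# Render postings as nested <UL> lists. Each entry is [depth, subject],
# already ordered so that replies follow their parent.
sub render_thread {
    my (@entries) = @_;
    my $html  = '';
    my $level = 0;                        # number of <UL>s currently open
    foreach my $entry (@entries) {
        my ($depth, $subject) = @$entry;
        my $want = $depth + 1;            # a top-level posting sits inside one <UL>
        while ($level < $want) { $html .= "<UL>\n";  $level++; }
        while ($level > $want) { $html .= "</UL>\n"; $level--; }
        $html .= "<LI>$subject\n";
    }
    while ($level > 0) { $html .= "</UL>\n"; $level--; }
    return $html;
}

print render_thread([0, 'Hello'], [1, 'Re: Hello'], [0, 'Another topic']);
```

Opening a list on the way down and closing one on the way up is all the browser needs to indent children under their parents.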

Figure 18.2: The London Chat BBS shows how discussion forums are well matched to Chat Rooms.

Note
The parent/sibling/child scheme is a way of conceptualizing the "tree" data structure. If ever you want to design a particularly complicated discussion forum, it might be useful to consult a reference on trees. The Art of Computer Programming by Donald Knuth is the classic book in the field of data structures, but it likely would be overkill for your problem. Any good book store or library should be able to cough up a book that will work out for you. I use Data Structures and Program Design in C by Kruse, Leung, and Tondo.


A Discussion Forum Example

Listing 18.1 is the source code for my example discussion forum. It works along the model of CGI processing of data files rather than generation of complete HTML documents. You can see this discussion forum on-line at

http://www.anadas.com/cgiunleashed/discussion/forum.cgi

Listing 18.1. forum.cgi: A "threaded discussion forum" CGI program, written in Perl.
#!/usr/bin/perl

#
# This program was written by Richard Dice of Anadas Software Development
# as part of the Sams Net "CGI Programming Unleashed" book.  The author
# intends this code to be used for instructional purposes and not for
# resale or commercial gain.
#
# Any questions or comments regarding this program are welcome.  You
# may contact the author by Internet email: [email protected]
#

# --- Configuration Variables ---
$prog_url = 'http://www.anadas.com/cgiunleashed/discussion/forum.cgi';

#
# the Query String is treated as a prefix to qualify which forum postings
# are ones of interest.  
#
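$prefix = $ENV{QUERY_STRING};

#
# posting names contain only digits and dashes, so anything else in the
# query string is discarded before it can reach a file name or the shell
#
$prefix =~ tr/0-9\-//cd;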

# --- start of GET/POST method handler ---

# puts all POST query information into the variable $input_line
read(STDIN, $input_line, $ENV{CONTENT_LENGTH});

# replace all '+' coded spaces with real spaces
$input_line =~ tr/+/ /;

# creates array of all data fields in $input_line from & separated info
@fields = split(/\&/,$input_line);
undef($input_line); # free up memory

#
# decodes hex info for each name/value pair and places pairs in
# %input associative array
#
foreach $i (0 .. $#fields) {
   ($name,$value) = split(/=/,$fields[$i]);
   $name =~ s/%(..)/pack("c",hex($1))/ge;
   $value =~ s/%(..)/pack("c",hex($1))/ge;
   $input{$name} = $value;
}

# --- end of GET/POST method handler ---

#
# should this page be accessed with form data and having a referring URL
# that does not begin with $prog_url, someone is attempting to post data
# from an invalid form -- exit the program with error message
# (only the start of the referring URL is compared, because reply pages
# carry a query string after $prog_url)
#
if ( %input && (index($ENV{HTTP_REFERER},$prog_url) != 0) ) {
   &referer_error;
}

if ( $input{'submit'} eq 'Submit this Article' ) {
   &new_article;
   print "Location: $prog_url?$input{'basepost'}\n\n";
   exit 0;
}

#
# gets the names of all relevant posting files and puts in the @posts array
#
&get_posts($prefix);

if ( $prefix eq '' ) {
   &print_header;
   &print_list;
   &print_form;
   &print_footer;
} else {

   &read_article($prefix);

   &print_header;
   &print_posting;
   &print_list;
   &print_form;
   &print_footer;
}

exit 0;

sub print_header {

   print "Content-type: text/html\n\n";
   if ( $prefix eq '' ) {
      print <<END;
<HTML><HEAD><TITLE>Discussion Forum</TITLE></HEAD>
<BODY>
<P>
<H3>All Discussion Forum Postings</H3>
END
   } else {
      print "<HTML><HEAD><TITLE>Posting: $title</TITLE></HEAD>\n";
      print "<BODY>\n";
      print "<P>\n";
      print "<A HREF=$prog_url>Return to Discussion Forum front page</A>\n";
   }
}

sub referer_error {

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>Referring URL Error!</TITLE></HEAD>
<BODY>
<P>
The form which was submitted to this CGI program did not originate with
this program.  This is forbidden.
<P>
<A HREF=$prog_url>Return to the Discussion Forum</A>
</BODY></HTML>
END
   exit 1;
}

sub no_author_error {

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>No Author Error!</TITLE></HEAD>
<BODY>
<FONT SIZE=+1><B>
<P>
No author name was entered into the posting form.  One is required.<BR>
Press the BACK button on your browser to re-enter the form.</B></FONT>
</BODY></HTML>
END
   exit 1;
}

sub no_subject_error {

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>No Subject Error!</TITLE></HEAD>
<BODY>
<FONT SIZE=+1><B>
<P>
No subject was entered into the posting form.  One is required.<BR>
Press the BACK button on your browser to re-enter the form.</B></FONT>
</BODY></HTML>
END
   exit 1;
}

sub print_list {

   if ( $prefix ne '' ) {
      print "<H3>Follow-up Postings:</H3>\n";
      shift(@posts);
   }

   if ( $#posts == -1 ) {
      print "<H4>None</H4>\n<HR>\n";
      return 0;
   }

#
# reconstructs header fields from posting and place into the list
#
   $ul_count = 0;
   foreach $i ( 0 .. $#posts ) {

#
# this if structure implements my "typographical trick" which threads
# postings based solely on their file names.  "bits" of posting names,
# separated by '-', are compared, first to see how many left-most bits
# the current posting has in common with its immediate predecessor,
# and then the signed inequality of right-most not-in-common bits is
# used to determine the extent of <UL> pushing/popping needed to thread
#
      if ( $i != 0 ) {

         $this = $posts[$i];
         $previous = $posts[$i-1];

         @this_bits = split(/-/,$this);
         @previous_bits = split(/-/,$previous);

         if ( $#this_bits > $#previous_bits ) {
            $lesser_bits = $#previous_bits;
         } else {
            $lesser_bits = $#this_bits;
         }

         $common = 0;
         for $j ( 0 .. $lesser_bits ) {
            last if $this_bits[$j] ne $previous_bits[$j];
            $common++;
         }
    
         splice(@this_bits,$[,$common);
         splice(@previous_bits,$[,$common);
    
         if ( $common == 0 ) {
            while ( $ul_count ) {
               print "</UL>";
               $ul_count--;
            }
         } else {
            if ( $#this_bits > $#previous_bits ) {
               for $k ( 1 .. ($#this_bits - $#previous_bits) ) {
                  print "<UL>\n";
                  $ul_count++;
               }
            } elsif ( $#this_bits < $#previous_bits ) {
               for $k ( 1 .. ($#previous_bits - $#this_bits) ) {
                  print "</UL>\n";
                  $ul_count--;
               }
            }
         }
      } else {
         $this = $posts[0];
         print "<UL>\n";
      }

      open(POST,"$this.post");
      &read_header;
      close(POST);

      print "<LI> <A HREF=$prog_url?$this>$subject</A> ";
      if ( $email ne '' ) {
         print "<FONT SIZE=-1><A HREF=mailto:$email><B>$author</B></A> ";
      } else {
         print "<FONT SIZE=-1><B>$author</B> ";
      }
      if ( $to ne '' ) { print "<I>To:</I> <B>$to</B> "; }
      print "<I>$time</I></FONT>\n";
   }

   while ( $ul_count ) {
      print "</UL>";
      $ul_count--;
   }
   print "</UL>\n<HR>\n";
}

sub get_posts {

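   local($pre);
   ($pre) = @_;

#
# $pre arrives from the URL or a hidden form field, so strip everything
# except digits and dashes before it reaches the shell
#
   $pre =~ tr/0-9\-//cd;

   @posts = `ls -r1 $pre*.post`;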
   $post_len = length(".post\n");
   foreach $post ( @posts ) {
      substr($post,-$post_len) = ''; # remove unwanted tailing characters
   }
}

sub new_article {

   $temp = $input{'author'};
   $temp =~ s/\s//g;   # remove all whitespace, not just the first character
   if ( $temp eq '' ) {
      &no_author_error;
   }
   $temp = $input{'subject'};
   $temp =~ s/\s//g;
   if ( $temp eq '' ) {
      &no_subject_error;
   }

   &get_posts($input{'basepost'});

   if ( $#posts == -1 ) {
#
# case where there are absolutely no postings in the tree
#
      $fname = '00000';
   } else {
      if ( $input{'basepost'} eq '' ) {
#
# case where we are adding a new posting to the base level of the tree
#
         @parts = split(/-/,$posts[0]);
         $fname = ++($parts[0]);
      } else {
         $temp = shift(@posts);
         if ( $#posts == -1 ) {
#
# case of a first reply
#
            $fname = $temp . '-' . '00000';
         } else {
#
# case of each subsequent reply
#
            @parts = split(/-/,$posts[0]);
            ++($parts[$#parts]);
            $fname = join('-',@parts);
         }
      }
   }

#
# gets the current system time and removes some info for the sake of
# horizontal space (some info = seconds & time zone)
#
   chop($time = `date`);
   @timefields = split(/[\s]+/,$time);
   substr($timefields[3],-3,3) = '';
   $time = join(' ',@timefields[0..3],$timefields[5]);

#
# remove DOS-style newline info from posting body
#
   $input{'body'} =~ s/\cM//g;

   open(POST,"> $fname.post");
   print POST "Author\t$input{'author'}\n";
   print POST "Email\t$input{'email'}\n";
   print POST "Subject\t$input{'subject'}\n";
   print POST "To\t$input{'to'}\n";
   print POST "Time\t$time\n";
   print POST $input{'body'};
   close(POST);

   chmod 0660,"$fname.post";
}

sub footer {
   print "</BODY></HTML>\n\n";
}

sub print_form {

   if ( $prefix eq '' ) {
      print <<END;
<H3>Submit a New Posting</H3>
<FORM METHOD=POST ACTION=$prog_url>
<INPUT TYPE=HIDDEN NAME="basepost" VALUE="$prefix">
<PRE>
Author  : <INPUT TYPE=text NAME="author">
Email   : <INPUT TYPE=text NAME="email"> (optional)
Subject : <INPUT TYPE=text NAME="subject">
To      : <INPUT TYPE=text NAME="to"> (optional) <BR>
Body of Article:
<TEXTAREA NAME="body" ROWS=5 COLS=50></TEXTAREA></PRE>
<INPUT TYPE=SUBMIT NAME="submit" VALUE="Submit this Article">
<INPUT TYPE=RESET NAME=clear Value="Clear this form">
</FORM>
END
   } else {

      &read_article($prefix);

      print <<END;
<H3>Reply to this Posting</H3>
<FORM METHOD=POST ACTION=$prog_url>
<INPUT TYPE=HIDDEN NAME="basepost" VALUE="$prefix">
<PRE>
Author  : <INPUT TYPE=text NAME="author">
Email   : <INPUT TYPE=text NAME="email"> (optional)
Subject : <INPUT TYPE=text NAME="subject" VALUE="Re: $subject">
To      : <INPUT TYPE=text NAME="to"  VALUE=\"$author\"> (optional)
<BR> Body of Article:
END
      print "<TEXTAREA NAME=\"body\" ROWS=5 COLS=50>";
      foreach ( @body ) { print ":: $_"; }
         print "</TEXTAREA></PRE>\n" ,
"<INPUT TYPE=SUBMIT NAME=\"submit\" VALUE=\"Submit this Article\">\n" ,
"<INPUT TYPE=RESET NAME=clear Value=\"Clear this form\">\n" , "</FORM>\n";
   }
}

sub read_article {

   local ($post_id);

   ($post_id) = @_;

   open(POST,"$post_id.post");
   &read_header;
   @body = <POST>;
   close(POST);

}

sub read_header {

   chop($author = <POST>);
   ($discard,$author) = split(/\t/,$author);
   chop($email = <POST>);
   ($discard,$email) = split(/\t/,$email);
   chop($subject = <POST>);
   ($discard,$subject) = split(/\t/,$subject);
   chop($to = <POST>);
   ($discard,$to) = split(/\t/,$to);
   chop($time = <POST>);
   ($discard,$time) = split(/\t/,$time);

}

sub print_posting {

   print "<HR>\n";

   &read_article($prefix);

   if ( $email ne '' ) {
      print "<A HREF=mailto:$email>$author</A> ";
   } else {
      print "$author ";
   }
   print "on <I>$time</I> said:\n";
   print "<H2>$subject</H2>\n";

   foreach $line ( @body ) {
      $line =~ s/\n/<BR>\n/g;
      print $line;
   }
   print "<BR><HR>\n";

}

sub print_footer {

   print "</BODY></HTML>\n";

}

My general philosophy when programming is to find the sneakiest way of doing something. I try to use the tools of a language to do something that likely no one ever thought of doing before. I do this to avoid real work.

There are a great number of studied and well understood data structures in computer science that people use to get jobs done. When you need a solution, you go to your books or your source code libraries and invoke them as prescribed. Unless absolutely necessary, I'm too much of a loner to plug in someone else's code, and I'm too lazy to recode a classical solution from the ground up. Instead, I come up with a hack.

My hack for this program is to name data files so that the postings are automatically threaded for me, more or less. A top-level posting (one with no parent) is named XXXXX.post, where the Xs are digits. The program assigns 00000 to the first post, and each subsequent top-level post is given the number of the most recent top-level post plus 1. A reply to this is named XXXXX-XXXXX.post, a reply to that reply is named XXXXX-XXXXX-XXXXX.post, and so on. I get these in the correct order with the Perl command

@posts = `ls -1r *.post`;

The ASCII collating of the ls command arranges the postings from oldest to newest under the numbering scheme I described in the preceding paragraph. The -r flag reverses that ordering so that the newest postings are at the top of the list; the -1 flag simply lists one name per line. Because a dash sorts before a period in ASCII, the reversed listing also places each parent directly ahead of its replies. The program compares the dash-separated parts of each file name with those of the name before it, and this acts as the basis for determining which level of the "family tree" a posting rests on.
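The same ordering can be reproduced inside Perl, which makes the trick easier to see. This is only a sketch under assumptions: it uses the file names that Listing 18.1 actually creates (00000.post, 00000-00000.post, and so on), it relies on Perl's plain byte-wise string comparison rather than on ls, and the subroutine names thread_order and depth_of are hypothetical.

```perl
use strict;
use warnings;

# Order posting names newest-first with each parent ahead of its replies:
# a plain byte-wise sort, reversed (what "ls -r" gives with ASCII collating).
sub thread_order {
    my (@files) = @_;
    return reverse sort { $a cmp $b } @files;
}

# Depth in the family tree = number of dashes in the name.
sub depth_of {
    my ($file) = @_;
    (my $id = $file) =~ s/\.post$//;
    return ($id =~ tr/-//);
}

foreach my $file (thread_order('00000.post', '00000-00000.post', '00001.post')) {
    print '  ' x depth_of($file), $file, "\n";
}
```

The parent 00000.post lands ahead of 00000-00000.post only because '.' has a higher character code than '-', so the choice of separator is doing real work here.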

The bistate nature of this discussion forum is controlled by the QUERY_STRING environment variable. Though all posting-oriented forum information is passed to the program through the form using the POST method, I'm "manually" appending a query string to the URL that invokes the discussion forum. If this string is empty, the CGI program knows that it is being called directly and that it should show its "entry-level" face. If there is a string there, it displays the posting that corresponds to that string and the family of postings that surround that posting. This string is put to use quite simply. Called $prefix, the place where it is most important is

@posts = `ls -1r $prefix*.post`;

This is the form of the line that is actually in use within the program. With this simple trick, the CGI program will focus in on only those posts that are relevant to the user's needs.
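Because the prefix comes straight from the URL, it is worth seeing the same selection done without handing user input to a shell. The sketch below assumes the list of posting names is already in hand and that valid names contain only digits and dashes, as they do in Listing 18.1. The subroutine name select_posts is hypothetical.

```perl
use strict;
use warnings;

# Keep only the postings whose names start with $prefix.
# The prefix is first reduced to digits and dashes, since it
# arrives from the URL and must not be trusted.
sub select_posts {
    my ($prefix, @files) = @_;
    $prefix = '' unless defined $prefix;
    $prefix =~ tr/0-9-//cd;
    return grep { index($_, $prefix) == 0 } @files;
}

print join("\n", select_posts('00000', '00001.post', '00000.post', '00000-00000.post')), "\n";
```

An empty prefix matches every name, which is what the entry-level page wants.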

Discussion Forum Administration

In computer programming, even common sense questions have to be answered explicitly: once postings are entered into the discussion forum, how do they get removed? Do they ever get removed?

The answer to the second question is yes, of course they get removed. A discussion forum with several hundred thousand messages in it is just about as worthless as one with no messages at all. The problem then becomes finding a good way to accomplish these posting removals. I'll outline a few popular options.

Remove Posting by Date

This is a very popular option because it allows a very natural organization: old postings go, and new ones stay. There are three ways to program this feature: manually controlled by the discussion forum administrator, automatically by the discussion forum, or through an auxiliary program as a cron job.

The cron command is a UNIX daemon that runs other commands according to time-based rules found within the various crontab files. These rules tell cron which command to run, how, and when. If the cron method is used to clean out old postings within the discussion forum, the solution isn't strictly CGI. The discussion forum programmer would be responsible for writing a crontab file that details how cron should run another program, written to clean out the discussion forum.

If the job of cleaning out old postings is given to the discussion forum CGI itself, the strategy would be that whenever the discussion forum was activated, it would scan through its postings and remove ones that fit some programmed criteria of age.
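The scan itself can be very short. This sketch assumes that a posting's age is simply the age of its data file and uses Perl's built-in -M file test, which reports days since last modification. The subroutine name expired_posts and the 30-day limit in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Return the posting files last modified more than $max_days ago.
# -M gives a file's age in days, measured from when the program started.
sub expired_posts {
    my ($max_days, @files) = @_;
    return grep { -e $_ && (-M $_) > $max_days } @files;
}

# Hypothetical usage: drop postings older than 30 days.
# unlink(expired_posts(30, glob('*.post')));
```

The same subroutine would serve a stand-alone cleanup program run by cron just as well as it serves the forum CGI.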

Remove Thread by Date

This option for removing articles in a discussion forum will look for the first (and therefore oldest) file in a thread and will delete all postings in that thread. The mechanisms that invoke this option are identical to those for removing individual postings by date.
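With the file naming used by forum.cgi, a thread is every posting that shares the leading number of its top-level posting, so the selection reduces to a lookup on that number. The sketch below takes the ages as a ready-made hash of file name to age in days, so that it stays independent of how ages are measured. The subroutine name expired_threads and the input format are hypothetical.

```perl
use strict;
use warnings;

# Given posting name => age in days, return every posting that belongs to
# a thread whose first (top-level) posting is older than $max_days.
# Names follow the forum.cgi pattern: 00000.post, 00000-00001.post, ...
sub expired_threads {
    my ($max_days, %age_of) = @_;
    my @doomed;
    foreach my $file (sort keys %age_of) {
        my ($root) = $file =~ /^(\d+)/;       # leading number names the thread
        next unless defined $root;
        my $root_age = $age_of{"$root.post"};
        push(@doomed, $file) if defined $root_age && $root_age > $max_days;
    }
    return @doomed;
}
```

A recent reply does not rescue an old thread here; only the age of the first posting counts, which matches the rule described above.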

Remove Posting by Author

This is a handy feature to have in the event that a discussion forum is graced by a less than graceful individual. Programming this sort of removal is highly dependent upon the programming language being used and the overall organization of the discussion forum.

Remove Individual Postings

This option can be the easiest to program but requires the most effort on the part of the discussion forum administrator. The only consideration with this option is how the program will deal with threads left tattered by the procedure. In discussion forum setups based on data files and dynamic generation of lists, this problem wouldn't normally be too great. With discussion forums that deal with complete HTML files, the easy part is removing the data file. The hard part is hunting through all the other HTML files for references to that file and killing those references. Re-threading the discussion forum might be appropriate or needed.

Remove Individual Threads

This option is the same as the preceding but focuses on whole threads rather than individual postings.

The following code, Listing 18.2, will remove postings from forum.cgi by threads, authors, dates, and individual postings. I have not included any automatic date removal features.


Listing 18.2. admin.cgi: The Discussion Forum administration program that accompanies forum.cgi.
#!/usr/bin/perl

#
# This program was written by Richard Dice of Anadas Software Development
# as part of the Sams Net "CGI Programming Unleashed" book.  The author
# intends this code to be used for instructional purposes and not for
# resale or commercial gain.
#
# Any questions or comments regarding this program are welcome.  You
# may contact the author by Internet email: [email protected]
#

#
# the Time::Local module is needed for access to the timegm() subroutine,
# which I use in the sorting of dates (it takes the place of the Perl 4
# timelocal.pl library on Perl 5 installations)
#
use Time::Local;

# -- Configuration Variables --
$prog_url = 'http://www.anadas.com/cgiunleashed/discussion/admin.cgi';
$forum_url = 'http://www.anadas.com/cgiunleashed/discussion/forum.cgi';

# -- start of GET/POST method handler --

# puts all POST query information into the variable $input_line
read(STDIN, $input_line, $ENV{CONTENT_LENGTH});

# replace all '+' coded spaces with real spaces
$input_line =~ tr/+/ /;

# creates array of all data fields in $input_line from & separated info
@fields = split(/\&/,$input_line);
undef($input_line); # free up memory

#
# decodes hex info for each name/value pair and places pairs in
# %input associative array
#
foreach $i (0 .. $#fields) {
   ($name,$value) = split(/=/,$fields[$i]);
   $name =~ s/%(..)/pack("c",hex($1))/ge;
   $value =~ s/%(..)/pack("c",hex($1))/ge;
   $input{$name} = $value;
}

# --- end of GET/POST method handler ---

#
# should this page be accessed with form data and having a referring URL
# that does not begin with $prog_url, someone is attempting to post data
# from an invalid form -- exit the program with error message
# (only the start of the referring URL is compared, because the query
# pages carry a query string after $prog_url)
#
if ( %input && (index($ENV{HTTP_REFERER},$prog_url) != 0) ) {
   &referer_error;
}

#
# Program will switch on the query string.  Also, if the program is being
# invoked via a form submission, then actual deleting of postings needs
# to be done and not just displaying the page which queries for which
# postings to delete
#
$mode = $ENV{QUERY_STRING};
if ( !defined($input{'method'}) ) {
   &query_remove if ( ($mode eq 'posts') || ($mode eq 'thread') );
   &query_date_remove if $mode eq 'date';
   &query_author_remove if $mode eq 'author';
} else {
   &remove_posts if $input{'method'} eq 'posts';
   &date_remove if $input{'method'} eq 'date';
   &thread_remove if $input{'method'} eq 'thread';
   &author_remove if $input{'method'} eq 'author';
}

#
# if no query string nor $input{'method'} is found in the invoking of this
# page, present an intro page which supplies a menu of options
#
&intro_page;

exit 0;

#
# display this page if an invalid form submission is being made
#
sub referer_error {

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>Referring URL Error!</TITLE></HEAD>
<BODY>
<P>
The form which was submitted to this CGI program did not originate with
this program.  This is forbidden.
<P>
<A HREF=$prog_url>Return to the Discussion Forum</A>
</BODY></HTML>
END
   exit 1;
}

#
# the following subroutine presents a threaded list of postings which
# may be removed by either thread or individual posting, depending on
# the value of the $mode variable
#
sub query_remove {

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>Discussion Forum Administration</TITLE></HEAD>
<BODY>
END
   &get_posts;

   if ( $#posts == -1 ) {
      print "<H4>No postings found</H4>\n</BODY></HTML>";
      return 0;
   } elsif ( $mode eq 'posts' ) {
      print "<H3>Discussion Forum Administration : Remove Individual",
      " Postings</H3>\n";
      print "<FORM METHOD=POST ACTION=$prog_url>\n";
      print "<INPUT TYPE=HIDDEN NAME=method VALUE=\"posts\">\n";

      $code = 'POST';
   } elsif ( $mode eq 'thread' ) {
      print "<H3>Discussion Forum Administration : Remove Threads</H3>\n";
      print "<P>Clicking in a checkbox will remove that posting and all",
      "its children upon form submission.</P>\n";
      print "<FORM METHOD=POST ACTION=$prog_url>\n";
      print "<INPUT TYPE=HIDDEN NAME=method VALUE=\"thread\">\n";

      $code = 'THREAD';
   }

#
# reconstructs header fields from posting and place into the list
#
   $ul_count = 0;
   foreach $i ( 0 .. $#posts ) {

#
# this if structure implements my "typographical trick" which threads
# postings based solely on their file names.  "bits" of posting names,
# separated by '-', are compared, first to see how many left-most bits
# the current posting has in common with its immediate predecessor,
# and then the signed inequality of right-most not-in-common bits is
# used to determine the extent of <UL> pushing/popping needed to thread
#
      if ( $i != 0 ) {

         $this = $posts[$i];
         $previous = $posts[$i-1];

         @this_bits = split(/-/,$this);
         @previous_bits = split(/-/,$previous);

         if ( $#this_bits > $#previous_bits ) {
            $lesser_bits = $#previous_bits;
         } else {
            $lesser_bits = $#this_bits;
         }

         $common = 0;
         for $j ( 0 .. $lesser_bits ) {
            last if $this_bits[$j] ne $previous_bits[$j];
            $common++;
         }
    
         splice(@this_bits,$[,$common);
         splice(@previous_bits,$[,$common);
    
         if ( $common == 0 ) {
            while ( $ul_count ) {
               print "</UL>";
               $ul_count--;
            }
         } else {
            if ( $#this_bits > $#previous_bits ) {
               for $k ( 1 .. ($#this_bits - $#previous_bits) ) {
                  print "<UL>\n";
                  $ul_count++;
               }
            } elsif ( $#this_bits < $#previous_bits ) {
               for $k ( 1 .. ($#previous_bits - $#this_bits) ) {
                  print "</UL>\n";
                  $ul_count--;
               }
            }
         }
      } else {
         $this = $posts[0];
         print "<UL>\n";
      }

      open(POST,"$this.post");
      &read_header;
      close(POST);

      print "<LI><I>remove</I> <INPUT TYPE=checkbox NAME=\"$code$this\"> ";
      print "<FONT SIZE=-1><A HREF=$forum_url?$this>$subject</A> ";
      print "$author ";
      if ( defined($to) ) { print "<B><I>To:</I></B> $to "; }
      print "<I><B>On:</B> $time</I></FONT>\n";
   }

   while ( $ul_count ) {
      print "</UL>";
      $ul_count--;
   }
   print "</UL>\n<HR>\n";

   print "<P><INPUT TYPE=SUBMIT NAME=submit VALUE=\"Submit this form\">\n";
   print "<INPUT TYPE=reset NAME=reset VALUE=\"Reset this form\">\n";
   print "\n</FORM>\n";

   print "</BODY></HTML>\n";
   exit 0;
}

#
# - it's easy to remove posts -- just delete the corresponding .post file
# - proper threading is maintained by the spiffy "typographic data structure"
#
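sub remove_posts {

   foreach $key ( keys %input ) {
      if ( $key =~ /^POST/ ) {
         substr($key,0,4) = '';
         $key =~ tr/0-9\-//cd;   # posting names hold only digits and dashes
         unlink("$key.post");    # delete directly; no shell is involved
      }
   }
   &report_removal;
}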

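#
# - it's easy to remove threads -- just delete every .post file whose
#   name begins with the name of the thread's first posting
# - proper threading is maintained by the spiffy "typographic data structure"
#
sub thread_remove {

   foreach $key ( keys %input ) {
      if ( $key =~ /^THREAD/ ) {
         substr($key,0,6) = '';
         $key =~ tr/0-9\-//cd;   # posting names hold only digits and dashes
         next if $key eq '';     # never let the pattern widen to *.post
         unlink(glob("$key*.post"));
      }
   }
   &report_removal;
}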

#
# generate the form which asks for names of to-be-removed authors
#
sub query_author_remove {

   &get_posts;

#
# build list of author names and their corresponding postings by
# parsing the entire collection of postings
#
   foreach $post ( @posts ) {
      open(POST,"$post.post");
      &read_header;
      $allauthors{$author} .= "<LI><A HREF=$forum_url?$post>$subject</A> ";
      $allauthors{$author} .= "$author ";
      if ( defined($to) ) { $allauthors{$author} .= "<B><I>To:</I></B> $to "; }
      $allauthors{$author} .= "<I><B>On:</B> $time</I>\n";
      close(POST);
   }
   foreach $key ( keys(%allauthors)) {
      substr($allauthors{$key},$[,0) = "<UL>\n";
      substr($allauthors{$key},-1,0) = "\n</UL>\n";
   }

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>Discussion Forum Administration</TITLE></HEAD>
<BODY>
<H3>Discussion Forum Administration : Remove Postings by Author</H3>
<FORM ACTION=$prog_url METHOD=POST>
<INPUT TYPE=HIDDEN NAME=method VALUE="author">
<TABLE BORDER CELLPADDING=8>
<TR><TH VALIGN=TOP ALIGN=CENTER>Author Name</TH>
<TH VALIGN=TOP ALIGN=CENTER>Check to Remove</TH>
<TH VALIGN=TOP ALIGN=CENTER>This Author's Postings</TH></TR>
END
   foreach $authname (sort case_insensitive (keys(%allauthors))) {
      print "<TR><TD VALIGN=TOP ALIGN=LEFT>$authname</TD>\n";
      print "<TD VALIGN=TOP ALIGN=CENTER><INPUT TYPE=checkbox NAME=\"AUTHOR" .
      &hex_encode($authname) . "\"></TD>\n";
      print "<TD VALIGN=TOP ALIGN=LEFT>$allauthors{$authname}</TD></TR>\n";
   }

   print <<END;
</TABLE>
<P><INPUT TYPE=SUBMIT NAME=submit VALUE="Submit this form">
<INPUT TYPE=reset NAME=reset VALUE="Reset this form"></FORM>
</BODY></HTML>
END
   exit 0;
}

#
# it's easy to remove postings by authors -- once you have the author's
# name, just scan through all header fields and check to see if the name
# in the header field matches the name to be deleted
#
sub author_remove {

   local($i,$mark);

   foreach $auth ( keys(%input) ) {
      next if !($auth =~ /^AUTHOR/);
      substr($auth,$[,length('AUTHOR')) = '';
      $ex_authors[$i++] = &hex_decode($auth);
   }

   &get_posts;

POSTINGS:   foreach $post ( @posts ) {
      open(POST,"$post.post");
      &read_header;
      close(POST);
      foreach $entry ( @ex_authors ) {
         if ( $author eq $entry ) {
            unlink("$post.post");
            next POSTINGS;
         }
      }
   }
   &report_removal;
}

#
# generate the form which the user fills in to decide which dates are
# supposed to be removed from the body of discussion forum postings
#
sub query_date_remove {

   &get_posts;

   foreach $post ( @posts ) {
      open(POST,"$post.post");
      &read_header;
      @tfields = split(/[\s]+/,$time);
      $timetemp = join(' ',@tfields[0..2],$tfields[4]);
      $dates{$timetemp} .= "<LI><A HREF=$forum_url?$post>$subject</A> ";
      $dates{$timetemp} .= "$author ";
      if ( defined($to) ) { $dates{$timetemp} .= "<B><I>To:</I></B> $to "; }
      $dates{$timetemp} .= "<I><B>On:</B> $time</I>\n";
      close(POST);
   }
   foreach $key ( keys(%dates)) {
      substr($dates{$key},$[,0) = "<UL>\n";
      substr($dates{$key},-1,0) = "\n</UL>\n";
   }

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>Discussion Forum Administration</TITLE></HEAD>
<BODY>
<H3>Discussion Forum Administration : Remove Postings by Date</H3>
<FORM ACTION=$prog_url METHOD=POST>
<INPUT TYPE=HIDDEN NAME=method VALUE="date">
<TABLE BORDER CELLPADDING=8>
<TR><TH VALIGN=TOP ALIGN=CENTER>Date</TH>
<TH VALIGN=TOP ALIGN=CENTER>Check to Remove</TH>
<TH VALIGN=TOP ALIGN=CENTER>Postings on this Date</TH></TR>
END
   foreach $date (sort by_date (keys(%dates)) ) {
      print "<TR><TD VALIGN=TOP ALIGN=LEFT>$date</TD>\n";
      print "<TD VALIGN=TOP ALIGN=CENTER><INPUT TYPE=CHECKBOX NAME=\"DATE" .
      &hex_encode($date) . "\"></TD>\n";
      print "<TD VALIGN=TOP ALIGN=LEFT>$dates{$date}</TD></TR>\n";
   }

   print <<END;
</TABLE>
<P><INPUT TYPE=SUBMIT NAME=submit VALUE="Submit this form">
<INPUT TYPE=reset NAME=reset VALUE="Reset this form"></FORM>
</BODY></HTML>
END
   exit 0;
}

#
# actually does the grunt-work of removing the
# previously-selected "bad dates"
#
sub date_remove {

   local($i,$mark);

   foreach $date ( keys(%input) ) {
      next if !($date =~ /^DATE/);
      substr($date,$[,length('DATE')) = '';
      $bad_dates[$i++] = &hex_decode($date);
   }

   &get_posts;

POSTINGS:   foreach $post ( @posts ) {
      open(POST,"$post.post");
      &read_header;
      close(POST);
      @tfields = split(/[\s]+/,$time);
      $time = join(' ',@tfields[0..2],$tfields[4]);
      foreach $date ( @bad_dates ) {
         if ( $time eq $date ) {
            unlink("$post.post");   # delete directly instead of spawning a shell
            next POSTINGS;
         }
      }
   }
   &report_removal;
}

sub report_removal {

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>Discussion Forum Administration</TITLE></HEAD>
<BODY>
<H3>Postings Successfully Removed</H3>
<P>
<A HREF=$prog_url>Return to the Discussion Forum Administration Page</A>
</BODY></HTML>
END

   exit 0;
}

sub get_posts {

   @posts = `ls -r1 *.post`;   # backticks run the command and capture its output
   $post_len = length(".post\n");
   foreach $post ( @posts ) {
      substr($post,-$post_len) = ''; # remove unwanted trailing characters
   }
}

sub read_article {

   local ($post_id);

   ($post_id) = @_;

   open(POST,"$post_id.post");
   &read_header;
   @body = <POST>;
   close(POST);
}

sub read_header {

   chop($author = <POST>);
   ($discard,$author) = split(/\t/,$author);
   chop($email = <POST>);
   ($discard,$email) = split(/\t/,$email);
   chop($subject = <POST>);
   ($discard,$subject) = split(/\t/,$subject);
   chop($to = <POST>);
   ($discard,$to) = split(/\t/,$to);
   chop($time = <POST>);
   ($discard,$time) = split(/\t/,$time);
}

sub intro_page {

   print "Content-type: text/html\n\n";
   print <<END;
<HTML><HEAD><TITLE>Discussion Forum Administration</TITLE></HEAD>
<BODY>
<H3>Discussion Forum Administration</H3>
<P>
Please choose one of the following methods for removing postings from the
discussion forum:
<UL>
<LI><A HREF=$prog_url?posts>Remove Individual Postings</A>
<LI><A HREF=$prog_url?date>Remove Postings by Date</A>
<LI><A HREF=$prog_url?thread>Remove Postings by Thread</A>
<LI><A HREF=$prog_url?author>Remove Postings by Author</A>
</UL>
</BODY></HTML>
END

}

#
# this function is used by the sort command when a case-insensitive string
# comparison is performed... for instance, in a situation where I want 'a'
# to come before 'Z' rather than after, as would usually be the case
# given that Z comes before a in the ASCII sequence
#
sub case_insensitive {

   local($atemp,$btemp);
   $atemp = $a; $btemp = $b;

   $atemp =~ tr/A-Z/a-z/;
   $btemp =~ tr/A-Z/a-z/;

   $atemp cmp $btemp;
}

#
# this function is used by the sort command when trying to compare the
# dates of two postings
#
sub by_date {

   local($akey,$bkey);

   $akey = $a;
   $bkey = $b;

   @afields = split(/[\s]+/,$akey);
   @bfields = split(/[\s]+/,$bkey);

   substr($afields[3],$[,2) = '';
   substr($bfields[3],$[,2) = '';

   $months{'Jan'} = 0;
   $months{'Feb'} = 1;
   $months{'Mar'} = 2;
   $months{'Apr'} = 3;
   $months{'May'} = 4;
   $months{'Jun'} = 5;
   $months{'Jul'} = 6;
   $months{'Aug'} = 7;
   $months{'Sep'} = 8;
   $months{'Oct'} = 9;
   $months{'Nov'} = 10;
   $months{'Dec'} = 11;
   $weekday{'Sun'} = 0;
   $weekday{'Mon'} = 1;
   $weekday{'Tue'} = 2;
   $weekday{'Wed'} = 3;
   $weekday{'Thu'} = 4;
   $weekday{'Fri'} = 5;
   $weekday{'Sat'} = 6;

   &timegm('0','0','0',$afields[2],$months{$afields[1]},$afields[3],
   $weekday{$afields[0]},'','') <=>
   &timegm('0','0','0',$bfields[2],$months{$bfields[1]},$bfields[3],
   $weekday{$bfields[0]},'','');

}

#
# I hex-encode certain fields to avoid the possibility that info within
# the fields will botch up certain HTML situations.
#
sub hex_encode {

   local($an,$temp);
   ($an) = @_;

   undef($temp);
   for $i ( 0 .. (length($an)-1) ) {
      $temp .= sprintf("%02x",ord(substr($an,$[+$i,1)));   # always two digits, as hex_decode expects
   }
   $temp;
}

#
# hex-decoding is necessary to retrieve info that was hex-encoded before
#
sub hex_decode {

   local($acode,$temp,$t);
   ($acode) = @_;

   undef($temp);
   while ( length($acode) ) {
      $t = substr($acode,$[,2);
      substr($acode,$[,2) = '';
      $temp .= pack("C",hex($t));   # "C" = unsigned char, so codes above 127 survive
   }
   $temp;
}
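
The hex_encode and hex_decode routines above are meant to be a round trip: whatever goes into a form field name comes back out unchanged. The following is only a minimal sketch of that idea, not the code used in this chapter. It swaps in Perl's built-in pack and unpack with the "H*" template rather than the character-by-character loops, and the routine names and the sample author name are made up for illustration.

```perl
use strict;
use warnings;

# Encode every byte of a string as exactly two hexadecimal digits.
# (Illustrative name; uses unpack instead of the sprintf loop above.)
sub hex_encode_sketch {
   my ($text) = @_;
   return unpack('H*', $text);
}

# Turn each pair of hexadecimal digits back into the original byte.
sub hex_decode_sketch {
   my ($code) = @_;
   return pack('H*', $code);
}

# A made-up author name containing characters that would be awkward
# inside an HTML attribute value.
my $name    = 'Jane "JD" Doe <jd>';
my $encoded = hex_encode_sketch($name);

print "$encoded\n";
print hex_decode_sketch($encoded), "\n";
```

Because the encoded form contains only the characters 0-9 and a-f, it can be dropped into a NAME attribute without worrying about quotes or angle brackets in the original value.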

Discussion Forum Additions

Now that the basic concept of a discussion forum and its administration has been established, let's consider what sorts of useful "bells and whistles" can be brought to the field.

Selective Sorting Criteria

Just as postings can be deleted by author or subject as well as by date, the list of discussion forum postings could be displayed sorted by any of these criteria, which could be a welcome addition.
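
One way to approach selectable sorting is to keep one comparison routine per criterion and pick among them at run time. This is a rough sketch under several assumptions: the postings are already in memory as hashes, the field names and sample values are invented, and the time is stored as a plain number rather than the date string used in this chapter's postings.

```perl
use strict;
use warnings;

# Hypothetical in-memory postings; in the forum these values would come
# from the header fields of each .post file.
my @postings = (
   { id => 3, author => 'zoe',   subject => 'Re: Welcome', time => 820000300 },
   { id => 1, author => 'Adam',  subject => 'Welcome',     time => 820000100 },
   { id => 2, author => 'brian', subject => 'apples',      time => 820000200 },
);

# One comparison routine per sort criterion (names are illustrative).
my %sorters = (
   author  => sub { lc($_[0]{author})  cmp lc($_[1]{author})  },
   subject => sub { lc($_[0]{subject}) cmp lc($_[1]{subject}) },
   date    => sub { $_[0]{time} <=> $_[1]{time} },
);

# Return the postings ordered by the requested criterion, falling back
# to date order when the criterion is not recognized.
sub sort_postings {
   my ($criterion, @list) = @_;
   my $cmp = $sorters{$criterion} || $sorters{date};
   return sort { $cmp->($a, $b) } @list;
}

foreach my $post ( sort_postings('author', @postings) ) {
   print "$post->{author}: $post->{subject}\n";
}
```

The criterion could arrive in the query string, much as the administration program picks its removal method. Adding a new ordering then means adding one entry to the table of comparison routines.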

Search Engines

Sometimes threads and subject lines aren't enough to find what you want within a discussion forum. A search engine that scans all postings for certain words or phrases and returns a report of relevant articles can be a great boon for a large discussion forum. Many Usenet newsgroup readers have this feature, and many modern Web search engines also include an option to search a database of Usenet postings.
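
A very simple search can be sketched as a case-insensitive scan of each posting's text. The routine name, the posting IDs, and the sample text below are all assumptions for illustration; a real version would read the text from the .post files instead of a hard-coded hash.

```perl
use strict;
use warnings;

# Return the IDs of all postings whose text contains the given phrase.
# %$texts maps a posting ID to its full text (headers plus body).
sub search_postings {
   my ($phrase, $texts) = @_;

   # An empty pattern has special meaning to Perl, so refuse it outright.
   return () unless defined($phrase) && length($phrase);

   my $pattern = quotemeta($phrase);   # treat the phrase literally, not as a regex
   my @hits;
   foreach my $id ( sort keys %$texts ) {
      push(@hits, $id) if $texts->{$id} =~ /$pattern/i;
   }
   return @hits;
}

# Hypothetical sample data standing in for the contents of the .post files.
my %texts = (
   '0001' => "Subject\tWelcome\nGlad to see the forum is up.",
   '0002' => "Subject\tPerl question\nHow do I sort a hash in Perl?",
   '0003' => "Subject\tRe: Perl question\nUse sort with keys().",
);

print join(', ', search_postings('perl', \%texts)), "\n";
```

Calling quotemeta on the phrase matters because the phrase comes from a user: without it, characters such as parentheses would be interpreted as regular expression syntax.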

Registered Users and .htaccess Schemes

If the discussion forum was meant to be a sort of "company intranet" solution, it would be mandatory to restrict access to only those people who were authorized. One means of accomplishing this is the .htaccess system. Figure 18.3 is a screen shot showing a .htaccess-inspired pop-up dialog box.

Figure 18.3: A view of the .htaccess scheme, as interpreted by Netscape.

The .htaccess scheme is an intrinsic feature of most Web servers. To create a .htaccess security system, the Web programmer will need to place a .htaccess file in the directory that is to be protected. The .htaccess file references files that are usually called .htgroup and .htpasswd. The creation and use of these files would justify an entire chapter or two within this book, so I can't go into all the details here. One important point I will leave you with is this: The User Name typed into the appropriate field in the .htaccess pop-up dialog box will be returned as the REMOTE_USER environment variable. Given this tidbit of information, your discussion forum can have precise knowledge of who is making (and even reading!) postings.
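
Picking up the user name inside a CGI program takes only a lookup in the environment. The sketch below assumes the server has already authenticated the request; the routine name and the 'anonymous' fallback are my own choices for illustration, not part of any server's behavior.

```perl
use strict;
use warnings;

# Return the authenticated user name supplied by the Web server, or
# 'anonymous' (an illustrative fallback) when the request was not
# authenticated.
sub current_user {
   my $user = $ENV{'REMOTE_USER'};
   return ( defined($user) && length($user) ) ? $user : 'anonymous';
}

# Plain text output keeps the sketch short; an HTML page would need to
# escape the name before printing it.
print "Content-type: text/plain\n\n";
print "Posting as: ", current_user(), "\n";
```

A forum could store this value in the author header field of each posting instead of trusting whatever name the user types into the form.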

Summary

In your CGI programming experiences, you'll often find an excuse to include a discussion forum. I find them to be one of the most marketable aspects of the field. A full appreciation of them will pay you dividends.

Conceptually, discussion forums are based on the Usenet newsgroups. Understandably, they share many of the same data fields. As a discussion forum programmer, you must ask yourself if you want these fields to be embedded within pre-processed HTML documents or as part of a header section in postings that are meant to be parsed each and every time they are reviewed.

Related to this choice is how you organize your parent/child hierarchy. How will you build family trees at run time? How will you permanently store that information?

No discussion forum is complete without an accompanying administration utility. This will no doubt be intimately related to how the discussion forum operates and should include different ways of tending to the postings.

Before I end this chapter, I want to point out a popular CGI discussion forum system: WWWBoard, from Matt's Script Archive. This discussion forum uses the "ready-made HTML file" philosophy, in contrast to how I've done things in this chapter. You can find this CGI system at

http://worldwidemart.com/scripts/wwwboard.shtml

Matt's Script Archive is an impressive CGI resource, and I take my hat off to Matt for his work in promoting, creating, and distributing quality CGI programs.