A Case Study in Web Site Management

Brad Marshall
brad.marshall@member.sage-au.org.au
Plugged In Software
http://www.pisoftware.com/

Index
  1. Introduction
  2. Web Site Change Management
    1. Revision Control
    2. Templates
  3. Publishing
  4. Client Areas
  5. Conclusion
  6. Useful Links
  7. Appendix
    1. Publish Script
    2. CVS Reference

1.0 Introduction

There are many issues that need to be resolved to allow hassle-free management of websites. These include allowing multiple people to work on a site at the same time, revision control over changes to the content, and the ability to use global headers and footers to control the look and feel of a site. It is also useful to have an easy way to publish the website to a location where the client can view it - be it a production server, a Quality Assurance (QA) site, or a development site.

Over the years, Plugged In Software has developed a set of procedures, using freely available tools along with a little bit of scripting, that makes this easy to achieve. These tools include make, m4, CVS, and a Perl script that uses rsync and ssh.

2.0 Web Site Change Management

The most fundamental requirement for proper change management of a website is revision control over your content. Revision control ensures that changes are not lost, by providing conflict resolution when multiple people edit the same site. It also lets you keep different revisions of a website - for example, you could be making minor changes to an existing site while a new version is in development.

Alongside this, it is almost as important to ease the maintenance of a consistent look and feel by using templates. Templates let the web content authors deal with what they do well - maintaining the content - without having to worry about the look and feel of the site. This reduces time spent on work that can be automated, by letting you maintain the look and feel in one place.

2.1 Revision Control

Plugged In uses CVS for revision control of websites. CVS (Concurrent Versions System) keeps track of who has made what changes to which files, and when. It does this by storing just the changes that are made, along with a log message describing what each change actually was.

CVS works on a client-server model: a server is set up in a central location, and clients connect to it in a variety of ways. At Plugged In, the most common way of connecting to our CVS server is over NFS - the file system holding the CVS repositories is NFS-exported, clients mount the directory, then talk to CVS using the standard CVS commands. Perhaps the best option, however, is ssh, which allows users to connect to the server securely from anywhere on the Internet (see the CVS Reference in the appendix).

One good way of ensuring people can only edit the projects they are authorised for is to keep the projects in separate repositories. Each repository can have permissions such that only a certain group can write to it, and the users who are allowed to edit a project are placed in that group.
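
As a minimal sketch, setting up such a repository on the server might look like this (the group name and paths are purely illustrative):

   # Create a repository that only members of group "projfoo" may write to
   $ mkdir /cvs/projfoo
   $ CVSROOT=/cvs/projfoo cvs init
   $ chgrp -R projfoo /cvs/projfoo
   $ chmod -R g+w /cvs/projfoo
   $ find /cvs/projfoo -type d -exec chmod g+s {} \;

The setgid bit on the directories ensures that files created in them inherit the group; users are then added to the projfoo group as required.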

2.2 Templates

Most websites these days have a similar look and feel across the whole site - for example, there is usually a common header which specifies things such as stylesheets and navigation links. m4 is a macro processor designed to preprocess files, expanding macros and optionally spawning external shell programs. make is a utility designed to detect which parts of a program need recompiling - in this case it is used to determine which files in the website need to be reprocessed.
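
As an illustration of how the two fit together, a minimal sketch of a Makefile for such a site might look like the following (GNU make pattern rules are assumed, the page list is illustrative, and m4's -P option matches the m4_-prefixed macro names used in the listings below):

PAGES = index.html contact.html

all: $(PAGES)

# Rebuild a page whenever its m4 source or the shared include file changes
%.html: %.m4 include/stdlib.m4
	m4 -P $< > $@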

By using m4 it is possible to abstract things such as headers and footers into a separate file, where they can be maintained separately and referenced wherever they are needed. It is also possible to spawn external programs, which allows functionality such as automatically updating "last updated" dates, among many other possibilities.
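
For example, a macro along the following lines (a sketch using GNU m4's m4_esyscmd builtin, which expands to the output of a shell command) could stamp each page with the date on which it was generated:

m4_define(`_LASTUPDATED',
`Last updated: m4_esyscmd(`date "+%d %B %Y"')')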

One of the easiest and most useful ways of changing the look and feel of a website is to define headers and footers for each page. This can be done simply by defining m4 macros that expand to each of the headers and footers needed. Listing 2.2.1 shows an example of a very basic page that includes a header and a footer, and listing 2.2.2 shows an include file that defines these headers and footers.

m4_include(include/stdlib.m4)

_HEADER(`Home Page', `./')

<P>This is a test page</P>

_FOOTER(`./', `./')
Listing 2.2.1

m4_define(`_HEADER',`
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<!--- Web site design and development by Plugged In Software.    --->
<!--- COPYRIGHT(c) 2000 Queensland Sugar.                        --->

<HEAD>
  <TITLE>Test Page: $1</TITLE>
</HEAD>
<BODY>
')

m4_define(`_FOOTER',`
<A HREF="$1index.html" TARGET="_top">Home page</A> | <A HREF="$1contact.html" TARGET="_top">Contact Us</A>
<IMG SRC="$2images/logo.gif" HEIGHT="100" WIDTH="200" ALT="Test logo">
</BODY>
</HTML>
')
Listing 2.2.2
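
Assuming the page in listing 2.2.1 is saved as index.m4, it can also be processed by hand (GNU m4's -P option is needed to match the m4_-prefixed macro names):

   $ m4 -P index.m4 > index.html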

3.0 Publishing

While having a convenient way of manipulating the HTML is useful, it became clear that a lot of time was being lost uploading files to the webserver for viewing. Valuable developer time was spent remembering machine details and file paths, and on frustration over permission problems. Systems administration time was then lost helping to fix these problems, when this was clearly something that could (and should) be automated as much as possible.

One obvious solution was to use rsync over ssh to synchronise the local master copy of the web pages with the remote server, but this still required people to remember information such as the directory structure, and was still chewing up valuable time.
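
This is the sort of command everyone had to remember and type correctly each time (the paths and hostname follow the example configuration in the appendix):

   $ rsync -e ssh -avH --exclude "*.m4*" --exclude "Makefile" \
        /path/to/src/of/website/ username@host.domain.com:/path/to/dest/on/remote/site/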

A Perl wrapper was quickly developed which simply took care of "remembering" the local and remote directory structure. The first version of the script was fairly primitive, just allowing specification of a flat local directory to pull the files from, and of the remote directory on a fixed machine. Surprisingly, this worked remarkably well and saved quite a few headaches.

Over time the script has gradually evolved to the point where it is fairly flexible and covers most of the situations it is needed for. The source can be either a specified directory, or a CVS module, specified by its CVS repository and module name. Another trivial extension was the ability to publish to different hosts - this has not been as important, as there is currently only one production server we use this on; the other host is the staff webpage server.
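
Assuming the module definitions shown in the appendix, and that the script is saved as publish.pl (the filename here is illustrative), publishing becomes a one-liner, and running the script with no arguments lists the available modules:

   $ ./publish.pl cvs_dirs
   $ ./publish.pl
   usage: ./publish.pl <module>
   ./publish.pl: available modules:
           cvs_dirs
           static_dirs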

It is also possible to specify files to exclude - such as backup files, CVS metadata files and a wide variety of others - as well as to override this for special cases where those files are required. This exclusion and inclusion can be done at either a global or a module level.

Another nice feature is that, by careful use of the exclusion rules, you can create modules that can be published inside other modules. This is useful where clearly defined areas of a module are updated by different individuals, as in Plugged In's client area: some areas are maintained by the developers (release notes, design documents) and others by administrative staff (invoices and so on).

The server-side setup for this is fairly minimal - you simply have to give users the ability to connect to the webserver via ssh as a user with the appropriate permissions to write to the desired areas. In our case, it is simply a matter of adding people's ssh public keys to the authorised keys file of a user who has those permissions. In a more complicated setup, you could easily associate different users with different modules, and restrict uploads in this manner.
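
With OpenSSH, that simply means appending each developer's public key to the authorised keys file of the publishing user (the username follows the appendix example; the key filename is illustrative):

   $ cat developer.pub >> ~www-data/.ssh/authorized_keys
   $ chmod 600 ~www-data/.ssh/authorized_keys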

There is much that could be done to improve the script - at the moment it is a fairly simple Perl script that has had features added to it over time, and as such could certainly do with a redesign and rewrite. A few ideas have been tossed around, such as pre- and post-publish scripts, possibly both local and remote, as well as "meta-modules" which include other modules. All in all, though, the mechanism has avoided a lot of problems with uploading files to a webserver.

4.0 Client Areas

To facilitate distribution of information to clients, an area has been set up on an Internet-connected webserver, commonly known as a client area. The area is simply a flat directory structure, protected by passwords using the standard htaccess mechanism, and accessible only over https. These precautions ensure that access is authenticated and that all information is transmitted encrypted.

Each client area has one or more usernames associated with it, each with access to a different part of the tree. Usually the access for technical content and for financial content is separated. This is done by having a directory named financials, protected for a different user than the one that has access to the rest of the site.
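
A sketch of the htaccess setup involved, assuming Apache with mod_auth and mod_ssl (the usernames and paths are illustrative):

   # clientarea/.htaccess - the general client area
   SSLRequireSSL
   AuthType Basic
   AuthName "Client Area"
   AuthUserFile /path/to/client.htpasswd
   Require user acme

   # clientarea/financials/.htaccess - financial documents only
   SSLRequireSSL
   AuthType Basic
   AuthName "Client Financials"
   AuthUserFile /path/to/client.htpasswd
   Require user acme-finance

The passwords themselves are managed with Apache's htpasswd utility (-c creates the file the first time):

   $ htpasswd -c /path/to/client.htpasswd acme
   $ htpasswd /path/to/client.htpasswd acme-finance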

Client areas contain the sorts of information already mentioned: release notes and design documents maintained by the developers, and invoices and other administrative documents maintained by administrative staff.

To make it easy for clients to find their area, a simple script has been written that redirects them to the correct URL. It presents a form into which users enter their username, and then redirects them to the appropriate URL.
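
A minimal sketch of such a CGI script in Perl is shown below - the URL layout is illustrative, not the actual production script:

#!/usr/bin/perl -w
use strict;
use CGI;

my $q = CGI->new;
my $user = $q->param('username') || '';
$user =~ s/[^A-Za-z0-9_-]//g;	# strip anything but sane username characters

if ($user) {
	# Hypothetical layout: one directory per client username
	print $q->redirect("https://www.pisoftware.com/clients/$user/");
} else {
	print $q->header, "<P>Please go back and enter a username.</P>\n";
}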

This has proved a valuable tool for communicating with clients, many of whom are in a different timezone. It has saved a lot of messing around organising expensive long-distance phone calls and shipping documents overseas, which has saved the company money.

5.0 Conclusion

Overall, this process allows rapid updates to be made and pushed into production with minimal impact on the systems administration team. There is no fussing about with permissions, or making sure people upload to the correct location, as this is all handled automatically. It also has the advantage of being fairly secure, with the possibility of further access controls being put in place if necessary.


6.0 Useful Links

   CVS - http://www.cvshome.org/
   rsync - http://rsync.samba.org/
   OpenSSH - http://www.openssh.com/
   GNU m4 - http://www.gnu.org/software/m4/
   GNU make - http://www.gnu.org/software/make/

7.0 Appendix

7.1 Publish Script


#!/usr/bin/perl -w

use strict;

# Modules definitions
	# cvsroot, cvsmodule - cvs details
	# dest - destination directory on remote server
	# src - directory to get files from
	# host - username@host for ssh'ing in
	# exclude - files to exclude - adds to global exclude list
	# include - files to include - override for exclude
my %modules = (
	"static_dirs" => {
		"src" => "/path/to/src/of/website/",
		"dest" => "/path/to/dest/on/remote/site/"},
	"cvs_dirs" => {
		"host" => "username\@host.domain.com",
		"cvsroot" => "/path/to/cvs/root/",
		"cvsmodule" => "path/to/module",
		"dest" => "/path/to/dest/on/remote/site/"},
);

# Configuration details
	# exclude - files not to upload - can be overridden on a per module basis
	# host - default username@host to use for ssh
my %config = (
	"modules" => \%modules,
	"exclude" => [ "*.m4*", ".*.swp", "*.swp", "*.inc", "*~", "Makefile", "*.psd", "*.tgz" ],
	"host" => "www-data\@kernigan.pisoftware.com",
);

my $mod = shift @ARGV;
my $exclude_string = "";
my $include_string = "";

# Print a usage message unless a valid module was named
unless (defined $mod && exists $modules{$mod}) {
	print "usage: $0 <module>\n";
	print "$0: available modules:\n\t", join "\n\t", sort keys %modules;
	print "\n";
	exit 1;
}

# Build up the exclude string (global list plus any per-module additions)
map {
	$exclude_string .= "--exclude \"$_\" ";
} @{$config{"exclude"} || []}, @{$config{"modules"}->{"$mod"}->{"exclude"} || []};

# Build up the include string (per-module overrides for the excludes)
map {
	$include_string .= "--include \"$_\" ";
} @{$config{"include"} || []}, @{$config{"modules"}->{"$mod"}->{"include"} || []};

my ($srcdir, $cvsroot, $cvsmodule, $dir);

# If we're simply pulling from a directory
if ($config{"modules"}->{"$mod"}->{"src"}) {
	$srcdir = $config{"modules"}->{"$mod"}->{"src"};
# else if we're getting it from cvs
} elsif ($config{"modules"}->{"$mod"}->{"cvsroot"}) {
	$cvsroot = $config{"modules"}->{"$mod"}->{"cvsroot"};
	$cvsmodule = $config{"modules"}->{"$mod"}->{"cvsmodule"}; 
	# Deal with a cvs checkout, then set $srcdir
	$dir = "/tmp/" . $$ . $< . "/";
	#print "\$dir = $dir\n";
	system("mkdir -p $dir");
	system("cd $dir; cvs -d $cvsroot export -Dtoday $cvsmodule");
	$srcdir = $dir . $cvsmodule . "/";
}

# Get the destination directory and host
my $destdir = $config{"modules"}->{"$mod"}->{"dest"};
my $host = $config{"modules"}->{"$mod"}->{"host"} || $config{"host"};

# Build up the rsync command
# added -H (preserve hard links)
my $rsync = "rsync -e ssh $include_string $exclude_string -avH $srcdir $host:$destdir";

# Run it
system("$rsync");

# Clean up, if we got it from cvs
if ($cvsroot) {
	system("rm -r $dir");
}

7.2 CVS Reference

Creating a CVS repository is a fairly simple matter - all that is necessary is to initialise the repository, then import or add the desired files.

   $ export CVSROOT=/path/to/cvs
   $ cvs init
   $ cd /path/to/data
   $ cvs import -m "Initial import" module VendorTag ReleaseTag

CVS has two methods of getting data out of the repository: checkouts and exports. A checkout is the standard way of getting a working copy of the data, and includes the CVS metadata:

   $ export CVSROOT=/path/to/cvs
   $ cvs checkout module

To check out data from a remote CVS repository over ssh:

   $ export CVS_RSH=ssh
   $ export CVSROOT=:ext:username@host:/path/to/cvs
   $ cvs checkout module
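
Once a module is checked out, the usual day-to-day cycle is to update the working copy, edit, and commit the changes back:

   $ cvs update
   $ cvs commit -m "Describe the change here"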

To get a copy of the data without the CVS metadata - for instance, for publishing a website - it is best to use an export:

   $ export CVSROOT=/path/to/cvs
   $ cvs export -Dnow module