CGIWrapper User's Guide

CGIWrapper version 1.0.2

Synopsis

The CGI Wrapper is a CGI script used to execute other Python CGI scripts. The wrapper provides convenient access to form fields and headers, exception catching, and usage and performance logging. Hooks are provided for cookies and class-based CGI scripts.

Description

Overview

The CGI Wrapper is a single CGI script used to execute other Python CGI scripts. A typical URL for a site that uses the wrapper looks like this:

http://www.somesite.com/server.cgi/SignUp

The server.cgi part is the CGI Wrapper and the SignUp part is the target Python script, whose real filename is SignUp.py. Also, that file is located in a directory named Scripts/, but there's no need for the user of the site to know about Scripts/ or .py; those are implementation details.
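The mapping from URL to script file can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: the function name script_path_for and the validation rules are assumptions, and only the Scripts/ directory and the .py suffix come from the description above.

```python
import posixpath

# Directory name taken from the guide; everything else here is hypothetical.
SCRIPTS_DIR = 'Scripts'


def script_path_for(path_info, scripts_dir=SCRIPTS_DIR):
    """Map the extra URL path (e.g. '/SignUp') to a script filename."""
    name = path_info.strip('/')
    # Reject empty names and anything that could leave the scripts directory.
    if not name or '/' in name or name.startswith('.'):
        raise ValueError('invalid script name: %r' % path_info)
    return posixpath.join(scripts_dir, name + '.py')


print(script_path_for('/SignUp'))  # prints Scripts/SignUp.py
```

The site's visitors only ever see the short name; the directory and the extension are added on the server side.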

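The wrapper provides the following benefits: convenient access to form fields and headers, exception catching, and usage and performance logging, plus hooks for cookies and class-based CGI scripts.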

You don't have to immediately write code to play with CGI Wrapper. There are several samples included. See Running and Testing below.

Globals

The globals set up by the wrapper for the target CGI script are listed below, each followed by its type or class:

headers   (dictionary)
Contains all the HTTP headers that will be sent back to the client. The default contents are 'Content-type': 'text/html'. Often, the headers don't need to be modified at all. One popular use of the headers is 'Redirect': 'someURL' to point the client to a different place.
fields   (cgi.FieldStorage)
This instance of FieldStorage comes from the standard Python cgi module. Typical uses include fields.has_key('someField') and fields['someField'].value. See the Python standard module documentation for cgi for more information.
environ   (dictionary)
This dictionary represents the environment variables passed to the CGI scripts. Scripts should use this rather than os.environ since future versions of CGI Wrapper could be tightly integrated into web servers, thereby changing the nature of how environment variables get passed around (e.g., no longer through the OS). Also, note that the environment may seem a little non-standard to the target CGI script since the web server is setting it up to run the CGI Wrapper instead. In most CGI scripts (that execute under the wrapper), the environment is not even needed.
wrapper   (CGIWrapper)
This is a pointer back to the CGI Wrapper instance. This allows CGI scripts to communicate with the wrapper if they want. However, this is hardly ever needed.
cookies   (Cookie)
This global is not set up by the wrapper, but is looked for upon exit of the CGI script. See the Cookies section below for more information.
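To make the idea of injected globals concrete, here is a small simulation. It is only a sketch under stated assumptions: run_script is a hypothetical helper, fields is a plain dictionary standing in for cgi.FieldStorage, and the real wrapper does considerably more. The point is that the target script uses headers and fields without importing or creating them.

```python
import contextlib
import io

# Source of a hypothetical target script. It relies on 'fields' and
# 'headers' already existing as globals when it runs.
SCRIPT_SOURCE = '''
name = fields.get('name', 'stranger')
headers['Content-type'] = 'text/plain'
print('Hello, %s!' % name)
'''


def run_script(source, fields):
    """Run script source with wrapper-style globals; return (headers, body)."""
    namespace = {
        'headers': {'Content-type': 'text/html'},  # default named in the guide
        'fields': fields,   # stand-in: a plain dict, not cgi.FieldStorage
        'environ': {},      # stand-in for the CGI environment
        'wrapper': None,    # stand-in for the CGIWrapper instance
    }
    output = io.StringIO()
    # Capture what the script prints so the headers can be sent first.
    with contextlib.redirect_stdout(output):
        exec(source, namespace)
    return namespace['headers'], output.getvalue()


headers, body = run_script(SCRIPT_SOURCE, {'name': 'Chuck'})
print(headers)
print(body, end='')
```

Because the script's output is captured rather than sent immediately, a script can still change headers after it has started printing.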

Errors / Uncaught Exceptions

One of the main benefits of the wrapper is the handling of uncaught exceptions raised by target CGI scripts. The typical behavior of the wrapper upon detecting an uncaught exception is:

  1. Log the time, error, script name and traceback to stderr. This information will typically appear in the web server's error log.
  2. Display a web page containing an apologetic message to the user and useful debugging information for developers.
  3. Save the above web page to a file so that developers can look at it after the fact. These HTML-based error messages are stored one per file if the SaveErrorMessages setting is true (the default). They are stored in the directory named by the ErrorMessagesDir setting (defaults to 'ErrorMsgs').
  4. Add an entry to the CGI Wrapper's error log, called Errors.csv.
  5. E-mail the error message if the EmailErrors setting is true, using the settings ErrorEmailServer and ErrorEmailHeaders.
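Steps 1 through 4 above can be sketched roughly as follows. This is an illustration only: handle_exception, its arguments, the page markup and the exact log columns are assumptions rather than the wrapper's real implementation, and the e-mail step is left out.

```python
import csv
import html
import os
import sys
import tempfile
import time
import traceback


def handle_exception(script_name, error_dir, error_log):
    """Call inside an except block: log, archive and record the exception."""
    exc_type, exc_value = sys.exc_info()[:2]
    stamp = time.strftime('%Y-%m-%d %H:%M:%S')
    trace = traceback.format_exc()

    # Step 1: report to stderr (usually the web server's error log).
    sys.stderr.write('[%s] Error in %s\n%s' % (stamp, script_name, trace))

    # Step 2: build a page with a user message and debugging details.
    page = ('<html><body><p>Sorry, something went wrong.</p>'
            '<pre>%s</pre></body></html>' % html.escape(trace))

    # Step 3: save the page, one error per file, under a unique name.
    os.makedirs(error_dir, exist_ok=True)
    prefix = 'Error-%s-%s-' % (script_name, stamp[:10])
    fd, page_path = tempfile.mkstemp(prefix=prefix, suffix='.html',
                                     dir=error_dir)
    with os.fdopen(fd, 'w') as f:
        f.write(page)

    # Step 4: append one row to the CSV error log.
    with open(error_log, 'a', newline='') as f:
        csv.writer(f).writerow([stamp, script_name, exc_type.__name__,
                                str(exc_value), os.path.basename(page_path)])
    return page, page_path


work_dir = tempfile.mkdtemp()
try:
    1 / 0
except Exception:
    page, page_path = handle_exception(
        'Error', os.path.join(work_dir, 'ErrorMsgs'),
        os.path.join(work_dir, 'Errors.csv'))
print(os.path.basename(page_path))
```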

Here is a sample error page.

Archived error messages can be browsed through the administration page.

Error handling behavior can be configured as described in Configuration.

Configuration

There are several configuration parameters through which you can alter how CGI Wrapper behaves. They are described below, including their default values:

ScriptsHomeDir   = 'Examples'
This is where the wrapper always looks for the CGI scripts. This location would not appear in URLs. The path can be relative to the CGI Wrapper's location, or an absolute path. You should change this to your own Scripts directory instead of putting your scripts in the Examples directory.
ChangeDir   = 1
If true, the current working directory is changed to the same directory as the target script. Otherwise, the current working directory is left alone and likely to be the same as the CGI Wrapper.
ExtraPaths   = []
A list of strings which are inserted into sys.path. This setting is useful if your CGI scripts share one or more modules and expect to be able to import them.
ExtraPathsIndex   = 1
This is the index into sys.path where the ExtraPaths values are inserted. Often the first path in sys.path is '.' which is why the default value of ExtraPathsIndex is 1.
LogScripts   = 1
If true, then the execution of each script is logged with useful information such as time, duration and whether or not an error occurred.
ScriptLogFilename   = 'Scripts.csv'
This is the name of the file that script executions are logged to if LogScripts is true. If the filename is not an absolute path, then it is relative to the directory of the CGI Wrapper.
ScriptLogColumns   = ['environ.REMOTE_ADDR', 'environ.REQUEST_METHOD', 'environ.REQUEST_URI', 'responseSize', 'scriptName', 'serverStartTimeStamp', 'serverDuration', 'scriptDuration', 'errorOccurred']

Specifies the columns that will be stored in the script log. Each column is the name of an attribute of CGI Wrapper. The Introspect CGI example gives a list of all CGI Wrapper attributes. Note that attributes which are dictionaries, UserDicts or subclasses of MiddleKit's NamedValueAccess class can have their attributes used through dot notation (e.g., obj1.obj2.attr).
ClassNames   = ['', 'Page']
This is the list of class names that CGI Wrapper looks for after executing a script. An empty string signifies a class whose name is the same as its script (e.g., _admin in _admin.py). See Class-based CGIs below.
ShowDebugInfoOnErrors   = 1
If true, then uncaught exceptions will display not only a message for the user, but also debugging information for the developer. This includes the traceback, HTTP headers, CGI form fields, environment and process ids.
UserErrorMessage   = 'The site is having technical difficulties with this page. An error has been logged, and the problem will be fixed as soon as possible. Sorry!'

This is the error message that is displayed to the user when an uncaught exception escapes the target CGI script.
LogErrors   = 1
If true, then CGI Wrapper logs exceptions. Each entry contains the date & time, filename, pathname, exception name & data, and the HTML error message filename (assuming there is one).
ErrorLogFilename   = 'Errors.csv'
This is the name of the file where CGI Wrapper logs exceptions if LogErrors is true.
SaveErrorMessages   = 1
If true, then errors (e.g., uncaught exceptions) will produce an HTML file with both the user message and debugging information. Developers/administrators can view these files after the fact, to see the details of what went wrong.
ErrorMessagesDir   = 'ErrorMsgs'
This is the name of the directory where HTML error messages get stored if SaveErrorMessages is true.
EmailErrors   = 0
If true, error messages are e-mailed out according to the ErrorEmailServer and ErrorEmailHeaders settings. This setting defaults to false because the other settings need to be configured first.
ErrorEmailServer   = 'localhost'
The SMTP server to use for sending e-mail error messages.
ErrorEmailHeaders   =
{
    'From': 'webware@mydomain',
    'To': ['webware@mydomain'],
    'Reply-to': 'webware@mydomain',
    'Content-type': 'text/html',
    'Subject': 'Error'
}
The e-mail MIME headers used for e-mailing error messages. Be sure to configure 'From', 'To' and 'Reply-to' before using this feature.
AdminRemoteAddr   = ['127.0.0.1']
A list of IP addresses or networks from which admin scripts can be accessed.

You can override any of these values by creating a CGIWrapper.config file in the same directory as the wrapper and selectively specifying values in a dictionary like so:

{
    'ExtraPaths':       ['Backend', 'ThirdParty'],
    'ScriptLogFilename': 'Logs/Scripts.csv'
}
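One way such an override file could be combined with the defaults is sketched below. This is an assumption about the mechanism, not a description of the wrapper's code: load_config is a hypothetical helper, only a few of the defaults are listed, and ast.literal_eval is used here because it accepts a plain dictionary literal without executing arbitrary code.

```python
import ast
import os
import tempfile

# A few of the documented defaults, for illustration.
DEFAULTS = {
    'ScriptsHomeDir': 'Examples',
    'ExtraPaths': [],
    'LogScripts': 1,
    'ScriptLogFilename': 'Scripts.csv',
}


def load_config(path, defaults=DEFAULTS):
    """Return the defaults updated with the dictionary stored at path."""
    config = dict(defaults)
    if os.path.exists(path):
        with open(path) as f:
            overrides = ast.literal_eval(f.read())
        if not isinstance(overrides, dict):
            raise ValueError('config file must contain a dictionary')
        config.update(overrides)
    return config


# Demonstration with a temporary config file.
work_dir = tempfile.mkdtemp()
config_path = os.path.join(work_dir, 'CGIWrapper.config')
with open(config_path, 'w') as f:
    f.write("{\n    'ExtraPaths': ['Backend', 'ThirdParty'],\n}\n")

config = load_config(config_path)
print(config['ExtraPaths'], config['ScriptsHomeDir'])
```

Settings that the file does not mention keep their default values, which is what makes selective overriding possible.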

Running and Testing

Let's assume you have a web server running on a Unix box and a public HTML directory in your home directory. First, make a link from your public HTML directory to the source directory of the CGI Wrapper:

cd ~/public_html
ln -s ~/Projects/Webware/CGIWrapper pycgi

Note that in the Source directory there is an Examples directory and that the CGI Wrapper will automatically look there (you can configure this; see Configuration). Therefore you can type a URL like:

http://localhost/~echuck/pycgi/server.cgi/Hello

Note that you didn't need to include Examples in the URL or a .py at the end.

There is a special CGI example called Directory.py that lists the other examples as links you can click on to run or view source. In this way, you can see the full set of scripts.

http://localhost/~echuck/pycgi/server.cgi/Directory

The resulting page will look something like the following. (Note: Those links aren't real!)

Size   Script      View
  96   Hello       view
 167   Time        view
 210   Error       view
 565   View        view
 802   Introspect  view
 925   Colors      view
1251   Directory   view

The above instructions rely on your web server executing any files that end in .cgi. However, some servers require that executable scripts are also located in a special directory (such as cgi-bin) so you may need to take that into consideration when getting CGI Wrapper to work. Please consult your web server admin or your web server docs. You may also have to specify the exact location of the Python interpreter in the first line of the server.cgi script, particularly under Windows.

Administration

CGI Wrapper comes with a script for administration purposes. You can access it by specifying the _admin script in the URL. You typically only have to remember _admin because it contains links to the other scripts.

Note that access to the admin scripts is restricted to the local host by default, but you can add more hosts or networks for administrators in the configuration.
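A check of this kind can be sketched with the standard ipaddress module. The function name is_admin_allowed is hypothetical, and the wrapper may well match addresses differently (for example by comparing strings); the sketch only shows how a list mixing single hosts and networks could be evaluated.

```python
import ipaddress


def is_admin_allowed(remote_addr, allowed=('127.0.0.1',)):
    """Return True if remote_addr matches any allowed host or network."""
    address = ipaddress.ip_address(remote_addr)
    for entry in allowed:
        # A single address is treated as a one-host network; strict=False
        # also accepts entries such as '192.168.1.7/24'.
        if address in ipaddress.ip_network(entry, strict=False):
            return True
    return False


print(is_admin_allowed('127.0.0.1'))
print(is_admin_allowed('10.0.0.5'))
print(is_admin_allowed('10.0.0.5', ['127.0.0.1', '10.0.0.0/8']))
```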

From the administration page, you can view the script log, the error log and the configuration of the server. The error log display also contains links to the archived error messages so that you can browse through them.

The administration scripts are good examples of class-based CGIs so you may wish to read through their code.

Here's an example of the admin page.

Script Log

The script log uses the comma-separated-value (CSV) format, which can be easily read by scripts, databases and spreadsheets. The file is located in the same directory as the CGI Wrapper. The columns are fairly self-explanatory, especially once you look at the actual file. The Configuration section has more details under the ScriptLogColumns setting.
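The dot notation used in ScriptLogColumns (for example environ.REMOTE_ADDR) can be resolved with a small helper like the one below. This is a sketch only: value_for_name and FakeWrapper are hypothetical stand-ins, and the real wrapper relies on its own lookup mechanism.

```python
import csv
import io


def value_for_name(obj, dotted_name):
    """Follow 'a.b.c' through dictionary keys and object attributes."""
    for part in dotted_name.split('.'):
        if isinstance(obj, dict):
            obj = obj[part]
        else:
            obj = getattr(obj, part)
    return obj


class FakeWrapper:
    """Hypothetical stand-in carrying a few wrapper-like attributes."""

    def __init__(self):
        self.environ = {'REMOTE_ADDR': '127.0.0.1', 'REQUEST_METHOD': 'GET'}
        self.scriptName = 'Hello'
        self.errorOccurred = 0


columns = ['environ.REMOTE_ADDR', 'environ.REQUEST_METHOD',
           'scriptName', 'errorOccurred']
wrapper = FakeWrapper()

# Write one log row; a real log would append to the file on disk.
buffer = io.StringIO()
csv.writer(buffer).writerow([value_for_name(wrapper, c) for c in columns])
print(buffer.getvalue(), end='')
```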

Class-based CGIs

As you write CGI scripts, and especially if they are for the same site, you may find that they have several things in common. For example, the generated pages may all have a common toolbar, heading and/or footing. You might also find that you display programmatically collected data in a similar fashion throughout your pages.

When you see these kinds of similarities, it's time to start designing a class hierarchy that takes advantage of inheritance, encapsulation and polymorphism in order to save you from duplicative work.

For example, your base class could have methods header(), body() and footer() with the header and footer being fully implemented. Subclasses would then only need to override body() and would therefore inherit their look and feel from one source. You could take this much further by providing several utility methods in the base class that are available to subclasses for use or customization.
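A minimal sketch of such a hierarchy follows. The class names SitePage and Welcome, the title attribute and the html() method that joins the three parts are illustrative choices, not classes shipped with CGI Wrapper.

```python
class SitePage:
    """Hypothetical base class: fixed header and footer, empty body."""

    title = 'Untitled'

    def __init__(self, info=None):
        self.info = info or {}

    def header(self):
        return '<html><head><title>%s</title></head><body>' % self.title

    def body(self):
        # Subclasses override this to supply the page-specific content.
        return ''

    def footer(self):
        return '</body></html>'

    def html(self):
        return '\n'.join([self.header(), self.body(), self.footer()])


class Welcome(SitePage):
    """A page only needs to provide its title and body."""

    title = 'Welcome'

    def body(self):
        return '<p>Hello from a subclass.</p>'


print(Welcome().html())
```

Changing header() or footer() in the base class then changes every page of the site at once.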

CGI Wrapper provides a hook to support such class-based CGI scripts by checking for certain classes in the target script. The ClassNames setting, whose default value is ['', 'Page'], controls this behavior. After a script executes, CGI Wrapper checks these classes. The empty string is a special case which specifies a class whose name is the same name as its containing script (e.g., the class _admin in the script _admin.py).

If a matching class is found, it is automatically instantiated so that you don't have to do so in every script. The instantiation is basically:

print TheClass(info).html()

Where info has keys ['wrapper', 'fields', 'environ', 'headers'].
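The lookup described above can be approximated as follows. This is a hedged sketch rather than the wrapper's code: find_page_class is a hypothetical helper, the script is simulated with exec, and print is written as a function so the example runs on current Python versions.

```python
def find_page_class(namespace, script_name, class_names=('', 'Page')):
    """Return the first class in namespace matching the ClassNames list."""
    for class_name in class_names:
        if class_name == '':
            # The empty string means "same name as the script".
            class_name = script_name
        candidate = namespace.get(class_name)
        if isinstance(candidate, type):
            return candidate
    return None


# Source of a hypothetical class-based target script.
SCRIPT_SOURCE = '''
class Page:
    def __init__(self, info):
        self.info = info

    def html(self):
        return '<p>Method: %s</p>' % self.info['environ']['REQUEST_METHOD']
'''

namespace = {}
exec(SCRIPT_SOURCE, namespace)

info = {
    'wrapper': None,  # stand-in for the CGIWrapper instance
    'fields': {},     # stand-in for cgi.FieldStorage
    'environ': {'REQUEST_METHOD': 'GET'},
    'headers': {'Content-type': 'text/html'},
}

page_class = find_page_class(namespace, 'Hello')
if page_class is not None:
    print(page_class(info).html())
```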

Good examples of class-based CGIs are the admin pages for CGI Wrapper. Start by reading AdminPage.py and then continue with the various admin scripts such as _admin.py and _showConfig.py. All of these are located in the same directory as CGI Wrapper.

On a final note, if you find that you're developing a sophisticated web-based application with accounts, sessions, persistence, etc. then you should consider using the WebKit, which is analogous to Apple's WebObjects and Sun's Java Servlets.

Other File Types

CGI Wrapper assumes that a URL with no extension (such as .html) is a Python script. However, if the URL does contain an extension, the wrapper simply passes it through via HTTP redirection (e.g., the Location: header).

This becomes important when one of your CGI scripts writes a relative URL to a non-CGI resource. Such a relative URL ends up forcing server.cgi to come into play.
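The decision between running a script and redirecting can be sketched like this. The dispatch function and its return values are assumptions for illustration; in particular, how the wrapper computes the target of the redirect is not described here, so the sketch simply echoes the requested path.

```python
import posixpath


def dispatch(path_info):
    """Classify a request path as a script to run or a resource to redirect."""
    extension = posixpath.splitext(path_info)[1]
    if extension:
        # A path with an extension is treated as a non-CGI resource.
        return ('redirect', path_info)
    return ('script', path_info.strip('/') + '.py')


print(dispatch('/SignUp'))
print(dispatch('/logo.gif'))
```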

Cookies

Cookies are often an important part of web programming. CGI Wrapper does not provide explicit support for cookies, but it does provide an easily utilized hook for them.

If upon completion, the target script has a cookies global variable, the CGI Wrapper will print it to stdout. This fits in nicely with the Cookie module written by Timothy O'Malley that is part of the standard library since Python 2.0. There is also a copy of this module in the WebUtils package of Webware for Python.
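As a brief illustration of why printing the global is enough, the sketch below uses http.cookies, the name under which the Cookie module lives in Python 3. The cookie name and value are made up; a script running under the wrapper would only need to leave such an object in a global named cookies.

```python
from http.cookies import SimpleCookie

# A script would assign this to the global name 'cookies'.
cookies = SimpleCookie()
cookies['visits'] = '3'
cookies['visits']['path'] = '/'

# Printing the object yields a ready-made Set-Cookie header line.
print(cookies)
```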

Subclassing CGI Wrapper

This is just a note that CGI Wrapper is a class with well-defined, broken-out methods. If it doesn't behave precisely as you need, you may very well be able to subclass it and override the appropriate methods. See the source, which contains numerous doc strings and comments.

Private Scripts

Any script starting with an underscore ('_') is considered private to CGI Wrapper and is expected to be found in the CGI Wrapper's directory (as opposed to the directory named by the ScriptsHomeDir setting).

The most famous private script is the admin script, which contains links to the others:

http://localhost/~echuck/pycgi/server.cgi/_admin

A second script is _dumpCSV which dumps the contents of a CSV file (such as the script log or the error log).
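The rule for where a script is looked up can be sketched as below. directory_for is a hypothetical helper; it only combines the underscore rule from this section with the ScriptsHomeDir setting described under Configuration.

```python
import os


def directory_for(script_name, wrapper_dir, scripts_home_dir='Examples'):
    """Return the directory in which the named script is expected."""
    if script_name.startswith('_'):
        # Private scripts live next to the wrapper itself.
        return wrapper_dir
    if os.path.isabs(scripts_home_dir):
        return scripts_home_dir
    # A relative ScriptsHomeDir is relative to the wrapper's location.
    return os.path.join(wrapper_dir, scripts_home_dir)


wrapper_dir = os.path.join('home', 'echuck', 'pycgi')
print(directory_for('_admin', wrapper_dir))
print(directory_for('Hello', wrapper_dir))
```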

Files

server.cgi - Just a Python script with a generic name (that appears in URLs) that imports CGIWrapper.py. By keeping the CGIWrapper.py file separate we get byte code caching (CGIWrapper.pyc) and syntax highlighting when viewing or editing the script.

CGIWrapper.py - The main script that does the work.

CGIWrapper.config - An optional file containing a dictionary that overrides default configuration settings.

Scripts.csv - The log of script executions as described above.

Errors.csv - The log of uncaught exceptions including date & time, script filename and archived error message filename.

ErrorMsgs/Error-scriptname-YYYY-MM-DD-*.html - Archived error messages.

_*.py - Administration scripts for CGI Wrapper.

Release Notes

Limitations/Future

Note: CGI scripts are fine for small features, but if you're developing a full-blown web-based application then you typically want more support, persistence and classes. That's where other Webware components like WebKit and MiddleKit come into play.

Here are some future ideas, with no commitments or timelines as to when/if they'll be realized. This is open source, so feel free to jump in!

The following are in approximate order of the author's perceived priority, but the numbering is mostly for reference.

To Do

  1. Examples: Make a Cookie example. (In the meantime, just see the main doc string of Cookie.py in WebUtils.)
  2. Wrapper: When a script produces no output, the CGI Wrapper should report that problem. (This most often happens for class based CGIs with incorrect class names.)
  3. Wrapper: There should probably be an option to clear the output of a script that raised an uncaught exception. Sometimes that could help in debugging.
  4. Admin: Create a summary page for the script and error logs.
  5. Wrapper: It's intended that the CGIWrapper class could be embedded in a server and a single instance reused several times. The class is not quite there yet.
  6. Wrapper: CGI scripts never get cached as byte code (.pyc) which would provide a performance boost.
  7. Wrapper: The error log columns should be configurable just like the script log columns.
  8. Code review: Misc functions towards bottom of CGIWrapper
  9. Code review: Use of _realStdout and sys.stdout on multiple serve() calls.
  10. Wrapper: Create a subclass of Python's CGI server that uses CGIWrapper. This would include caching the byte code in memory.
  11. Wrapper: htmlErrorPageFilename() uses a "mostly works" technique that could be better. See source.
  12. Wrapper: Keep a list of file extensions (such as .py .html .pl) mapped to their handlers. When processing a URL, iterate through the list until a file with that extension is found, then serve it up through its handler.
  13. Admin: Add password protection on the administration scripts.
  14. Wrapper: Provide integration (and therefore increased performance) with web servers such as Apache.
  15. Wrapper: Error e-mails are always in HTML format. It may be useful to have a plain text version for those with more primitive e-mail clients.

Credit

Author: Chuck Esterbrook

Some improvements were made by Christoph Zwerschke.

The idea of a CGI wrapper is based on a WebTechniques article by Andrew Kuchling.