
mod_gunzip howto

Want to hold gzip'd files on the webserver, hand out compressed data to the browsers that can cope, and uncompress it for browsers which cannot?

Mod_gunzip is Helge Oldach's Apache module which provides this functionality for static data.

If you want details of the command line gzip and gunzip programs, see the GNU documentation or O'Reilly's nutshell.

Don't confuse mod_gunzip with mod_gzip: mod_gunzip uncompresses data 'on the fly', whereas mod_gzip compresses it.

Steps

Notes and queries

Compilation

Design

Decide how you want to handle your compressed files. There is a choice of:

Configuration

The next step is to edit the Apache configuration files. Add the lines

and

in amongst the similar looking lines of the httpd.conf file

It does not seem to make any difference where the LoadModule and AddModule lines are placed in the order (at least for static pages). For mod_gzip, which is doing more, the ordering seems to be more complicated.

The LoadModule and AddModule directives need to be included in the server configuration file, so loading a module is the responsibility of the server admin. Once it is loaded, however, the behaviour of the mod_gunzip module can be controlled on a per-directory basis in the .htaccess files.

No, you have not copied a mod_gunzip.c file anywhere. You may have notes in the httpd.conf file saying you need to add both these lines. Yes, these are correct: the module does not work without the AddModule. The LoadModule brings the module code 'into' Apache through the dynamic loading process and the AddModule tells Apache that it can use it.

The second configuration step depends on whether you are following Option A or Option B.

Option A:

If you are compressing the HTML files and renaming them back to .html, you need to add a line:

to your config or .htaccess file.

There's useful information about .htaccess files in the Using .htaccess tutorial and the Apache configuration files documentation.

Handlers are briefly described in the Apache What is a handler? documentation.

This AddHandler tells Apache that there is a module wanting to have a look at files with a .html extension. The mod_gunzip code picks up the reference because it is looking for 'send_gunzipped'.

If the file is not compressed, mod_gunzip does nothing with it: it just 'passes' and lets Apache or another module hand back the HTML to the browser.

There is a slight overhead in serving pages if using mod_gunzip and the data is not compressed. There is also the distinctly human difficulty of remembering whether your source file is compressed or not.
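One way to take the guesswork out of "is this .html file compressed or not?" is to look at the first two bytes of the file. The following is a minimal sketch in Python, not part of mod_gunzip and not necessarily how the module itself decides. The function name `is_gzipped` is hypothetical; the only fact it relies on is that every gzip stream starts with the bytes 0x1f 0x8b.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Demonstration with two throwaway files that both carry a .html name.
    page = b"<html><body>hello</body></html>"
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "plain.html")
        packed = os.path.join(tmp, "packed.html")
        with open(plain, "wb") as fh:
            fh.write(page)
        with open(packed, "wb") as fh:
            fh.write(gzip.compress(page))
        print(is_gzipped(plain), is_gzipped(packed))  # prints: False True
```

Run over a document tree, a check like this gives a quick list of which pages are held compressed under Option A.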

Option B:

It is also possible to hold compressed data in .htmz files (using this extension as an example).

The configuration files would then need to include two extra lines instead of the one:

where the first is saying that the MIME type of .htmz files is html text. Without this line the browser is likely to show the raw HTML.

It is then useful to edit the DirectoryIndex line to include index.htmz files:

without this Apache will not serve the compressed index file if it is given a URL without the 'file' component -

rather than

This directive has to be set in the Apache Configuration file (not the per-directory .htaccess files).

Testing

A telnet session is fine for testing. Connect to the webserver with:

and type the HTTP commands:

should give you uncompressed text whether or not the original is compressed (this is assuming a setup as per option A). Including an accept-encoding statement:

should give you the compressed data (assuming index.html is a compressed file) with a header line of

The entry in the Apache access-log will show the size of the data sent down the wire: that is, the size of the gzip'd data if the browser accepts it, or the size of the uncompressed data if mod_gunzip has to uncompress it.
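The same two requests can be scripted rather than typed into telnet. This is a rough sketch using Python's standard http.client; the function name `probe` and the host name in the usage note are placeholders, not anything from the mod_gunzip distribution. It assumes an Option A setup where /index.html is held compressed on disc.

```python
import http.client


def probe(host, path="/index.html", accept_gzip=False, port=80):
    """Fetch `path` and return (Content-Encoding header, bytes received).

    When no Accept-Encoding header is supplied, http.client sends
    "Accept-Encoding: identity", i.e. it asks for uncompressed data.
    """
    headers = {"Accept-Encoding": "gzip"} if accept_gzip else {}
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        # http.client never decompresses the body, so this is the wire size.
        body = response.read()
        return response.getheader("Content-Encoding"), len(body)
    finally:
        conn.close()
```

Calling `probe("www.example.org", accept_gzip=True)` and then `probe("www.example.org")` against your own server should show a Content-Encoding of gzip and a smaller byte count for the first call, and no Content-Encoding for the second, if the module is doing its job. The byte counts should match what appears in the access-log.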

Notes and Queries

Compression or Uncompression?

Is it better to hold files compressed and uncompress them when needed, or the other way round? That is, is it better to try to set up a mod_gunzip or a mod_gzip?

Most likely the first option, mod_gunzip. Most browsers can handle gzip-compressed data now, and the overhead for the browsers and robots which cannot is fairly low: it is easier on the server to uncompress data when required than to compress it, as gunzip runs faster than gzip. You are also able to use higher levels of compression (gzip -9) because you are compressing offline.
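To get a feel for what the higher compression level buys on your own pages, the comparison can be sketched with Python's gzip module, which uses the same levels 1 to 9 as the command line program. The function name `compare_levels` and the sample markup are made up for illustration.

```python
import gzip


def compare_levels(data):
    """Return (original, level-1, level-9) sizes in bytes for `data`."""
    fast = gzip.compress(data, compresslevel=1)
    best = gzip.compress(data, compresslevel=9)
    # Whatever the level, uncompressing gives back the original bytes.
    assert gzip.decompress(fast) == data
    assert gzip.decompress(best) == data
    return len(data), len(fast), len(best)


if __name__ == "__main__":
    # Stand-in for a real page: a long, fairly repetitive HTML table.
    page = b"".join(
        b"<tr><td>row %d</td><td>value %d</td></tr>\n" % (i, i * i)
        for i in range(2000)
    )
    original, fast, best = compare_levels(page)
    print("original:", original, "level 1:", fast, "level 9:", best)
```

The numbers depend entirely on the input, so it is worth running this on a few representative pages rather than trusting a single figure.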

Check the mod_gzip pages and FAQ though, as the two modules have different strengths. Mod_gzip compresses dynamically generated content on the fly (which includes Server Side Includes and CGI output) and, since version 1.3.19a, can also return compressed copies of pages on disc.

If looking to mod_gunzip to reduce the storage requirements on the webhost then compare the disc space allocated (with du) rather than the size of the file in bytes. File systems allocate space in terms of blocks or segments of several Kbytes so compressing small files may not reduce the quota.
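The block arithmetic can be made concrete with a small sketch. The 4096-byte block size is an assumption (check your own file system), and the function names `allocated_size` and `disk_usage` are hypothetical. `disk_usage` reads the `st_blocks` field, which counts 512-byte units on POSIX systems and is not available everywhere.

```python
import os


def allocated_size(size_bytes, block_size=4096):
    """Bytes occupied by a file when space is handed out in whole blocks."""
    blocks = (size_bytes + block_size - 1) // block_size  # round up
    return blocks * block_size


def disk_usage(path):
    """Allocated bytes as reported by the OS, or None if st_blocks is missing."""
    blocks = getattr(os.stat(path), "st_blocks", None)
    return None if blocks is None else blocks * 512


if __name__ == "__main__":
    for before, after in [(3000, 1000), (20000, 6000)]:
        print(before, "->", after, "bytes:",
              allocated_size(before), "->", allocated_size(after), "allocated")
```

With these assumed figures, a 3000-byte page that compresses to 1000 bytes still occupies one 4096-byte block, so nothing is saved, whereas a 20000-byte page that compresses to 6000 bytes drops from 20480 to 8192 allocated bytes.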

What files to compress?

This howto concentrates specifically on HTML files: the AddHandler commands only tell Apache to check whether .html or .htmz files are compressed. This lightly sidesteps some of the traps presented by some older browsers which have trouble with compressed .css or .js (JavaScript) files.
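Restricting compression to HTML is easy to do offline. As a sketch under the Option B naming used in this howto (a .htmz file next to each .html file), the following Python walks a document tree and compresses only the .html files at the highest level, leaving .css, .js and everything else alone. The function name `compress_html_tree` is hypothetical, and it assumes the .html files it finds are not already compressed.

```python
import gzip
from pathlib import Path


def compress_html_tree(root, suffix=".htmz"):
    """Write a gzip'd copy of every .html file under `root`.

    index.html becomes index.htmz in the same directory; the original
    file is left in place. Returns the list of files written.
    """
    written = []
    for src in sorted(Path(root).rglob("*.html")):
        dst = src.with_suffix(suffix)
        dst.write_bytes(gzip.compress(src.read_bytes(), compresslevel=9))
        written.append(dst)
    return written
```

A call such as `compress_html_tree("/path/to/htdocs")` would be run once after each site update; whether to then delete the uncompressed originals is a separate decision.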

gzip or x-gzip?

Mod_gunzip recognises the header lines 'Accept-Encoding: gzip' and 'Accept-Encoding: x-gzip' on incoming requests and responds with a 'Content-Encoding: gzip' header if it passes back gzip-compressed data.
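The decision described here can be sketched in a few lines of Python. This illustrates the behaviour only; it is not the module's own code. The function names `accepts_gzip` and `build_response` are hypothetical, and the sketch deliberately ignores quality values such as "gzip;q=0".

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding header value lists gzip or x-gzip.

    Simplification: parameters after ';' (q-values) are ignored.
    """
    if not accept_encoding:
        return False
    for item in accept_encoding.split(","):
        token = item.split(";", 1)[0].strip().lower()
        if token in ("gzip", "x-gzip"):
            return True
    return False


def build_response(accept_encoding, stored):
    """Return (extra headers, body) for gzip'd bytes `stored` held on disc."""
    if accepts_gzip(accept_encoding):
        # The client can cope: pass the stored bytes through untouched.
        return {"Content-Encoding": "gzip"}, stored
    # The client cannot cope: uncompress before sending.
    return {}, gzip.decompress(stored)
```

For example, `build_response("x-gzip", data)` hands back `data` unchanged with a Content-Encoding: gzip header, while `build_response(None, data)` returns the uncompressed page and no extra header.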

The practice amongst older browsers was to send 'x-gzip' rather than 'gzip'. See the HTTP 1.1 spec, RFC 2616:

In the meantime the term "gzip" seems to have been adopted as a description of the compression algorithm as much as the name of the program and Apache itself treats gzip and x-gzip as equivalent.

.htmz or .html.gz?

Why use .htmz rather than .html.gz?

This is a matter of personal preference but concatenated extensions add additional complication to the Apache configuration. With a .htmz extension it is practical to set the MIME type to be returned with AddType:

Whereas a file with an appended .gz extension is considered deliberately compressed and sent with the MIME type application/x-gzip. The browser then prompts you to save to disc. With Apache 1.3.12, at least, adding

line does not solve the problem (presumably as the AddEncoding config is still saying .gz files are gzipped but this should not really affect the MIME type?)

Accept-Encoding or Transfer-Encoding?

Mod_gunzip recognises the Accept-Encoding request header, not TE.

Do any browsers make use of the TE: gzip request header and Transfer-Encoding? This is mentioned in the Mozilla Performance HTTP Compression page but does not seem to be used by Mozilla. Opera sends both the Accept-Encoding and TE headers.

The difference seems to be that using Transfer-Encoding is completely unambiguous: the compression is done simply for the transport and the data would be presented uncompressed in the browser.

In comparison, Accept-Encoding does not carry a clear implication as to whether the browser will uncompress the data or offer to save it compressed. Mod_gunzip makes an assumption that the browser will uncompress and display any content flagged with a Content-Encoding: gzip response.

According to a description of the differences between HTTP 1.0 and 1.1, there was no formal specification of HTTP 1.0; RFC 1945 is a description of 'common usage' at the time rather than a formal standard. See the comparisons of:

Mail: mgk@iv.mmv.innerjoin.org, 2002/09/10
URI: http://www.innerjoin.org/apache-compression/howto.html