Want to hold gzip'd files on the webserver and hand out compressed data to the browsers that can cope and uncompress it for browsers which cannot?
Mod_gunzip is Helge Oldach's Apache module which provides this functionality for static data.
If you want details of the command line gzip and gunzip programs, see the GNU documentation or O'Reilly's nutshell.
Don't confuse mod_gunzip with mod_gzip: mod_gunzip uncompresses data 'on-the-fly', whereas mod_gzip compresses it.
Download the mod_gunzip.tar.gz file from Helge Oldach's http://www.oldach.net/ site and unpack it with a
tar -zxvf mod_gunzip.tar.gz
to get the mod_gunzip.c source.
Tested with an Apache 1.3.12 probably from a RedHat 7.0 install.
A note on the mod_gunzip-1-2 version: this works but it will always decompress the data; it will not send the compressed data to browsers which say they can handle it.
Version 2 of mod_gunzip gives Apache the ability to return the
uncompressed data only when needed.
Compile the source with the apxs utility, which is part of the Apache Development Environment (in the way that Red Hat packages things anyway):
apxs -c -lz mod_gunzip.c
A full description of compiling Apache with a set of commonly needed modules is given by Luc de Louw's Apache Compilation Howto (mirrored at the Linux Documentation Project, and also available in French and German).
Place the resulting mod_gunzip.so module in the /usr/httpd/modules directory (or the /usr/lib/apache/ directory, depending on your version or distribution). This .so file is read into Apache through dynamic loading/linking; the file is read from disc and made part of the running Apache system.
Alternatively, apxs can compile the module, install it and edit the httpd.conf file to reference it in one step:
apxs -i -a -c -lz mod_gunzip.c
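Decide how you want to handle your compressed files. There is a choice of:
Option A: compress the file and rename the .gz file back to html, for example by:
gzip -9 index.html && mv index.html.gz index.html
Option B: compress the file and rename it with a distinct extension such as .htmz, for example by:
gzip -9 index.html && mv index.html.gz index.htmz
The disadvantage of .htmz files is that you need to change your links.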
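For more than a handful of pages, either option can be scripted. The following is a minimal Python sketch and not part of mod_gunzip: the function name compress_page and its new_suffix parameter are hypothetical, and it assumes each page is small enough to read into memory. It compresses at level 9 (the equivalent of gzip -9) and writes the result either back over the original name (option A) or to a renamed file such as .htmz (option B).

```python
import gzip
from pathlib import Path


def compress_page(src, new_suffix=None):
    """Gzip src at level 9 and return the path of the compressed file.

    new_suffix=None writes the compressed data back under the same
    name (option A). new_suffix=".htmz" writes a renamed file and
    removes the uncompressed original (option B).
    """
    src = Path(src)
    data = src.read_bytes()
    dest = src if new_suffix is None else src.with_suffix(new_suffix)
    dest.write_bytes(gzip.compress(data, compresslevel=9))
    if dest != src:
        src.unlink()  # like "mv": the uncompressed original goes away
    return dest
```

Note that running it twice on the same option A file would compress the data twice, so keep track of which files have already been processed.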
The next step is to edit the Apache configuration files. Add the lines
LoadModule gunzip_module /usr/httpd/modules/mod_gunzip.so
and
AddModule mod_gunzip.c
in amongst the similar looking lines of the httpd.conf file.
It does not seem to make any difference where the LoadModule and AddModule lines are placed in the order (at least for static pages). For mod_gzip, which is doing more, it seems to be more complicated.
The LoadModule and AddModule directives need to be included in the server configuration file; loading a module is thus the responsibility of the server admin.
Once loaded, however, the behaviour of the mod_gunzip module can be controlled on a per-directory basis in the .htaccess files.
No, you have not copied a mod_gunzip.c file anywhere. You may have
notes in the httpd.conf file saying you need to add both these
lines. Yes, these are correct; the module does not work without the AddModule.
The LoadModule brings the module code 'into' Apache through the dynamic
loading process and the AddModule tells Apache that it can use it.
The second configuration step depends on whether you are following Option A or Option B.
If you are compressing the html files and renaming them back to
.html you need to add a line:
AddHandler send-gunzipped .html
to your config or .htaccess file.
There's useful information about .htaccess files in the Using .htaccess tutorial and the Apache configuration files documentation.
Handlers are briefly described in the Apache What is a handler? documentation.
This AddHandler tells Apache that there is a module wanting to have a look at files with a .html extension. The mod_gunzip code picks up the reference because it is looking for 'send-gunzipped'.
If the file is not compressed then mod_gunzip will not do anything with the file; it just 'passes' and lets Apache or another module hand back the html to the browser.
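How mod_gunzip decides that a file is not compressed is not described here. One plausible check, shown purely as an assumption and not taken from the mod_gunzip source, is to look for the two-byte gzip magic number at the start of the file. The helper name looks_gzipped is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(data):
    """Return True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


if __name__ == "__main__":
    page = b"<html><body>hello</body></html>"
    print(looks_gzipped(page))                 # False: plain HTML
    print(looks_gzipped(gzip.compress(page)))  # True: gzip'd copy
```

The same check is a quick way to find out whether a given index.html on disc is the compressed or the uncompressed copy.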
There is a slight overhead in serving pages if using mod_gunzip and the data is not compressed. There is also the distinctly human difficulty of remembering whether your source file is compressed or not.
It is also possible to hold compressed data in .htmz files
(using this extension as an example).
The configuration files would then need to include two extra lines instead of the one:
AddType text/html .htmz
AddHandler send-gunzipped .htmz
where the first is saying that the MIME type of .htmz files
is html text. Without this line the browser is likely
to show the raw HTML.
It is then useful to edit the DirectoryIndex line to include
index.htmz files:
DirectoryIndex index.htmz index.html index.htm ....
Without this, Apache will not serve the compressed index file if it is given a URL without the 'file' component, for example
http://www.somewhere.net/
rather than
http://www.somewhere.net/index.htmz
This directive is normally set in the Apache configuration file; it is only honoured in
per-directory .htaccess files if AllowOverride permits it (the Indexes override).
A telnet session is fine for testing. Connect to the webserver with:
telnet localhost 80
and type the HTTP commands:
GET /index.html HTTP/1.1
Host: localhost
<cr>
This should give you uncompressed text whether or not the original is compressed (assuming a setup as per option A).
Including an accept-encoding statement:
GET /index.html HTTP/1.1
Host: localhost
Accept-Encoding: gzip
<cr>
should give you the compressed data (assuming index.html is a compressed file) with a header line of
Content-Encoding: gzip
The entry in the Apache access-log will show the size of the data
sent down the wire, so the size of the gzip'd data if the browser accepts it or
the size of the uncompressed data if mod_gunzip has to uncompress it.
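The same two requests can be scripted rather than typed. This is a minimal Python sketch using only the standard socket module. The function names build_request and fetch are hypothetical, and the sketch adds a Connection: close header (not in the telnet session above) so that the server closes the connection and the read loop ends.

```python
import socket


def build_request(path, host, accept_gzip=False):
    """Return the raw HTTP/1.1 request bytes, as typed in the telnet session."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if accept_gzip:
        lines.append("Accept-Encoding: gzip")
    lines.append("Connection: close")  # assumption: lets us read until EOF
    # A blank line (the <cr> in the telnet session) ends the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(path, host="localhost", port=80, accept_gzip=False):
    """Send the request and return (header_text, body_bytes)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(path, host, accept_gzip))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body
```

Calling fetch("/index.html") and then fetch("/index.html", accept_gzip=True) against the test server lets you compare the two body lengths and check the second set of headers for Content-Encoding: gzip.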
Is it better to hold files compressed and uncompress them when needed, or the
other way round? That is, is it better to try to set up a mod_gunzip or
a mod_gzip?
Most likely the first option, mod_gunzip. Most browsers can handle gzip compressed data now, and the overhead for the browsers and robots which cannot is fairly low: it is easier on the server to uncompress data when required than to compress it, gunzip runs faster than gzip, and you are able to use higher levels of compression (gzip -9) as you are compressing offline.
Check the mod_gzip pages and FAQ though; the two modules have different strengths. Mod_gzip compresses dynamically generated content on the fly (which includes Server Side Includes and CGI output) and, since version 1.3.19a, can also return compressed copies of pages on disc.
If looking to mod_gunzip to reduce the storage requirements on the webhost then
compare the disc space allocated (with du) rather than the size of the
file in bytes. File systems allocate space in terms of blocks or segments of several
Kbytes so compressing small files may not reduce the quota.
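As a rough illustration of why small files may not shrink on disc, the sketch below rounds a file size up to whole blocks. The 4096-byte block size is an assumption (check your own file system), the file sizes are made-up examples, and the function name allocated_bytes is hypothetical.

```python
def allocated_bytes(size, block_size=4096):
    """Round size up to a whole number of blocks (block_size is an assumption)."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


if __name__ == "__main__":
    # A 3000-byte page gzip'd to 1200 bytes still occupies one block.
    print(allocated_bytes(3000), allocated_bytes(1200))  # 4096 4096
    # A 9000-byte page gzip'd to 3000 bytes drops from three blocks to one.
    print(allocated_bytes(9000), allocated_bytes(3000))  # 12288 4096
```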
This howto concentrates specifically on html files; the AddHandler commands only tell Apache to check whether .html or .htmz files are compressed. This lightly sidesteps some of the traps presented by some older browsers which have trouble with compressed .css or .js JavaScript files.
Mod_gunzip recognises the header lines 'Accept-Encoding: gzip' and 'Accept-Encoding: x-gzip' on incoming requests and responds with a 'Content-Encoding: gzip' if it passes back gzip compressed data.
The practice amongst older browsers was to send 'x-gzip' rather than 'gzip'. See the HTTP 1.1 spec, RFC 2616.
In the meantime the term "gzip" seems to have been adopted as a description of the compression algorithm as much as the name of the program, and Apache itself treats gzip and x-gzip as equivalent.
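The header handling described above can be summarised in a few lines. This is a sketch of the decision only, written in Python for illustration. It is not the mod_gunzip source, the function names accepts_gzip and choose_response are hypothetical, and it ignores any q-values in the Accept-Encoding header.

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value lists gzip or x-gzip (q-values ignored)."""
    if not accept_encoding:
        return False
    tokens = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
    return "gzip" in tokens or "x-gzip" in tokens


def choose_response(stored_gzip_bytes, accept_encoding):
    """Return (extra_headers, body) for a file held gzip'd on disc."""
    if accepts_gzip(accept_encoding):
        # Client copes with gzip: send the stored bytes untouched.
        return {"Content-Encoding": "gzip"}, stored_gzip_bytes
    # Otherwise uncompress on the fly and send plain data.
    return {}, gzip.decompress(stored_gzip_bytes)
```

Passing None or an Accept-Encoding value without gzip or x-gzip yields the uncompressed body with no Content-Encoding header, which matches the first telnet test.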
Why use .htmz rather than .html.gz?
This is a matter of personal preference but concatenated extensions add additional complication to the Apache configuration. With a .htmz extension it is practical to set the MIME type to be returned with AddType:
AddType text/html .htmz
Whereas a file with an appended .gz extension is considered deliberately compressed and sent with the MIME type application/x-gzip. The browser then prompts you to save to disc. With Apache 1.3.12, at least, adding an
AddType text/html .html.gz
line does not solve the problem (presumably as the AddEncoding config is still saying .gz files are gzipped, but this should not really affect the MIME type?).
mod_gunzip recognises the Accept-Encoding request header, not TE
Do any browsers make use of the TE: gzip request and Transfer-Encoding? This is mentioned in the Mozilla Performance HTTP Compression page but does not seem to be used by Mozilla. Opera sends both the Accept-Encoding and TE headers.
The difference seems to be that using Transfer-Encoding is completely unambiguous: the compression done is simply for the transport and the data would be presented uncompressed in the browser. In comparison, Accept-Encoding does not carry a clear implication as to whether the browser will uncompress the data or offer to save it compressed. Mod_gunzip makes an assumption that the browser will uncompress and display any content flagged with a Content-Encoding: gzip response.
According to a description of the differences between HTTP 1.0 and 1.1 there was not a formal specification of HTTP 1.0, RFC1945 being a description of 'common usage' at the time rather than a formal standard. See the comparisons of: