Apache::ParseLog - Object-oriented Perl extension for parsing Apache log files


NAME

Apache::ParseLog - Object-oriented Perl extension for parsing Apache log files


SYNOPSIS

    use Apache::ParseLog;
    $base = new Apache::ParseLog();
    $transferlog = $base->getTransferLog();
    %dailytransferredbytes = $transferlog->bytebydate();
    ...


DESCRIPTION

Apache::ParseLog provides an easy way to parse Apache log files using object-oriented constructs. The data it returns are generic enough to be used flexibly in your own applications, such as CGI programs, simple text-only report generators, RDBMS feeds, or data sources for Perl/Tk-based GUI applications.


FEATURES

  1. Easy and Portable Log-Parsing Methods

    Because all of the work (parsing logs, constructing regular expressions, matching and assigning to variables, etc.) is done inside this module, you can easily create log reports (unless your logs need intense scrutiny). Read the rest of this manpage, including the EXAMPLES section, to see how easy it is to create log reports with this module.

    Also, this module does not require a C compiler, and it should run on any platform supported by Perl.

  2. Support for LogFormat/CustomLog

    The LogFormat/CustomLog feature (with mod_log_config), new in Apache Web Server 1.3.x, is supported.

    The log format specified with Apache's LogFormat directive in the httpd.conf file is parsed, and the regular expressions are created dynamically inside this module, so you will need to rewrite very little of your existing code when the log format changes.

  3. Reports on Unique Visitor Counts

    Traditionally, the hit count is calculated based on the number of files requested by visitors (the simplest form is the total number of lines in the log file, counted as the ``total hits'').

    As such, the hit count obviously can be misleading in the sense of ``how many visitors have actually visited my site?'', especially if the pages of your site contain many images (because each image is counted as one hit).

    Apache::ParseLog provides the methods to obtain such traditional data, because those data also are very important for monitoring your web site's activities. However, this module also provides the methods to obtain the unique visitor counts, i.e., the actual number of ``people'' (well, IP or hostname) who visited your site, by date, time, and date and time.

    See the LOG OBJECT METHODS for details about those methods.

  4. Pre-Compiled Regex

    The pre-compiled regex feature introduced in Perl 5.005 is used (if you have that version installed on your machine).

    For the pre-compiled regex and the new quote-like assignment operator (qr), see perlop(1) and perlre(1) manpages.
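The ideas in features 2 through 4 can be illustrated with a small standalone sketch. It is not the module's own code: the %fragment table, the build_regex() helper, and the sample log lines are all hypothetical, and only the directives of the Common Log Format are handled. The sketch turns a LogFormat string into a regex pre-compiled with qr, then counts both hits and unique visitors per date.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping from a few LogFormat directives to regex fragments.
# This is an assumption for illustration, not the module's internal table.
my %fragment = (
    '%h'  => '(\S+)',          # remote host
    '%l'  => '(\S+)',          # remote logname
    '%u'  => '(\S+)',          # remote user
    '%t'  => '\[([^\]]+)\]',   # time, logged inside square brackets
    '%r'  => '(.*?)',          # first line of the request
    '%>s' => '(\d{3})',        # final status
    '%b'  => '(\d+|-)',        # bytes sent, or "-" when none
);

# Build a pre-compiled regex (qr) from a LogFormat string.
# Returns the regex followed by the directives in capture order.
sub build_regex {
    my ($format) = @_;
    my ($pattern, @fields) = ('');
    foreach my $token (split /(%>?[A-Za-z])/, $format) {
        next if $token eq '';
        if ($token =~ /^%/) {
            die "Unsupported directive: $token\n" unless exists $fragment{$token};
            $pattern .= $fragment{$token};
            push @fields, $token;
        } else {
            $pattern .= quotemeta $token;   # literal text between directives
        }
    }
    return (qr/^$pattern$/, @fields);
}

# Made-up sample entries in Common Log Format.
my @lines = (
    '192.0.2.1 - - [10/Oct/1998:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/1998:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    '192.0.2.7 - - [10/Oct/1998:14:02:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [11/Oct/1998:09:12:01 -0700] "GET /index.html HTTP/1.0" 304 -',
);

my ($regex, @fields) = build_regex('%h %l %u %t "%r" %>s %b');
my (%hitbydate, %seen);
foreach my $line (@lines) {
    my @values = $line =~ $regex or next;   # skip lines that do not match
    my %entry;
    @entry{@fields} = @values;
    my ($date) = $entry{'%t'} =~ m{^(\d+/\w+/\d+)};
    $hitbydate{$date}++;                    # every requested file is a hit
    $seen{$date}{ $entry{'%h'} } = 1;       # each host counts once per date
}

foreach my $date (sort keys %hitbydate) {
    my $visitors = scalar keys %{ $seen{$date} };
    print "$date: $hitbydate{$date} hits, $visitors unique visitors\n";
}
```

With the sample lines above, 10/Oct/1998 has three hits but only two unique visitors, because the image request from 192.0.2.1 adds a hit without adding a visitor.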


CONSTRUCTOR

To construct an Apache::ParseLog object, the new() method is available, just as in other modules.

The new() constructor returns an Apache::ParseLog base object with which to obtain basic server information as well as to construct log objects.

New Method

new([$path_to_httpd_conf[, $virtual_host]]);

With the new() method, an Apache::ParseLog object can be created in three different ways.

  1. $base = new Apache::ParseLog();

    This first method creates an empty object, which means that the fields of the object are undefined (undef); i.e., the object does not know what the server name is, where the log files are, etc. It is useful when you need to parse log files that are not created on the local Apache server (e.g., the log files FTP'd from elsewhere).

    You have to use the config() method (see below) before calling any other methods.

  2. $base = new Apache::ParseLog($httpd_conf);

    The second way creates an object with the necessary information extracted from $httpd_conf. $httpd_conf is a scalar string containing the absolute path to the httpd.conf file; e.g.,

        $httpd_conf = "/usr/local/httpd/conf/httpd.conf";

    This method tries to extract the information from $httpd_conf, specified by the following Apache directives: ServerName, Port, ServerAdmin, TransferLog, ErrorLog, AgentLog, RefererLog, and any user-defined CustomLog along with LogFormat.

    If any of the directives cannot be found or is commented out in $httpd_conf, the field for that directive will be empty (undef), and the corresponding methods that use that field will either return an empty string when called or error out (for log object methods, refer to the section below).

  3. $base = new Apache::ParseLog($httpd_conf, $virtual_host);

    This method creates an object just like the second method, but only for the VirtualHost specified by $virtual_host. The Apache directives and rules not specified within the <VirtualHost xxx> and </VirtualHost> tags are parsed from the ``regular'' server section in the httpd.conf file.

    Note that the $httpd_conf must be specified in order to create an object for the $virtual_host.


BASE OBJECT METHODS

This section describes the methods available for the base object created by the new() constructor described above.

Unless the object is created with no arguments, the Apache::ParseLog module parses the basic information configured in the httpd.conf file (passed as the first argument). The object uses this information to construct the log objects.

The available methods are (return values are in parentheses):

    $base->config([%fields]); # (object)
    $base->version(); # (scalar)
    $base->serverroot(); # (scalar)
    $base->servername(); # (scalar)
    $base->httpport(); # (scalar)
    $base->serveradmin(); # (scalar)
    $base->transferlog(); # (scalar)
    $base->errorlog(); # (scalar)
    $base->agentlog(); # (scalar)
    $base->refererlog(); # (scalar)
    $base->customlog(); # (array)
    $base->customlogLocation($name); # (scalar)
    $base->customlogExists($name); # (scalar boolean, 1 or 0)
    $base->customlogFormat($name); # (scalar)
    $base->getTransferLog(); # (object)
    $base->getErrorLog(); # (object)
    $base->getRefererLog(); # (object)
    $base->getAgentLog(); # (object)
    $base->getCustomLog(); # (object)


LOG OBJECT METHODS

This section describes the methods available for the log object created by any of the following base object methods: getTransferLog(), getErrorLog(), getRefererLog(), getAgentLog(), and getCustomLog($log_nickname).

This section is divided into six subsections, each of which describes the available methods for a certain log object.

Note that all the methods for TransferLog, RefererLog, and AgentLog can be used for the object created with getCustomLog($name).

TransferLog/CustomLog Methods

The following methods are available for the TransferLog object (created by getTransferLog() method), as well as the CustomLog object that logs appropriate arguments to the corresponding LogFormat.

ErrorLog Methods

Up to Apache version 1.2.x, each error log entry was just an error, meaning that there was no distinction between ``real'' errors (e.g., File Not Found, malfunctioning CGI, etc.) and non-significant errors (e.g., sending kill -1 to the httpd processes, etc.).

Starting with version 1.3.x, Apache httpd logs the ``type'' of each error log entry, namely ``error'', ``notice'', and ``warn''.

If you use Apache 1.2.x, the errorbyxxx(), noticebyxxx(), and warnbyxxx() methods should not be used, because those methods are for 1.3.x only and will merely return an empty hash. The allbyxxx() methods will return the desired results.
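The difference between typed (1.3.x) and untyped (1.2.x) entries can be sketched in standalone Perl. This is not the module's implementation: the sample lines are made up for illustration, the regex is an assumption about the usual bracketed layout, and the ``Mon DD YYYY'' date key is a hypothetical choice. The hash names only loosely mirror the allbyxxx() and per-type methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample entries; the last one has no type, as in Apache 1.2.x.
my @lines = (
    '[Thu Oct  1 10:15:02 1998] [error] File does not exist: /usr/local/httpd/htdocs/missing.html',
    '[Thu Oct  1 10:20:45 1998] [notice] SIGHUP received.  Attempting to restart',
    '[Fri Oct  2 08:01:13 1998] [warn] child process 1234 still did not exit, sending a SIGTERM',
    '[Fri Oct  2 08:05:00 1998] access to /private failed for host.example.com, reason: user unknown',
);

# Captures month, day, year, and the optional bracketed type.
my $entry = qr/^\[\w+ (\w+) +(\d+) [\d:]+ (\d{4})\] (?:\[(\w+)\] )?/;

my (%allbydate, %bytype);
foreach my $line (@lines) {
    my ($mon, $day, $year, $type) = $line =~ $entry or next;
    my $date = sprintf "%s %02d %s", $mon, $day, $year;
    $allbydate{$date}++;                        # every entry counts here
    $bytype{$type}{$date}++ if defined $type;   # only typed (1.3.x) entries
}

foreach my $date (sort keys %allbydate) {
    print "$date: $allbydate{$date} entries\n";
    foreach my $type (sort keys %bytype) {
        my $count = $bytype{$type}{$date} || 0;
        print "  $type: $count\n";
    }
}
```

The untyped line is counted in %allbydate but in none of the per-type hashes, which is why only the all-entries counts are meaningful for a 1.2.x log.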

The following methods are available for the ErrorLog object (created by getErrorLog() method).

RefererLog/CustomLog Methods

The following methods are available for the RefererLog object (created by getRefererLog() method), as well as the CustomLog object that logs %{Referer}i to the corresponding LogFormat.

AgentLog/CustomLog Methods

This subsection describes the methods available for the AgentLog object (created by getAgentLog() method), as well as the CustomLog object that logs %{User-agent}i to the corresponding LogFormat.

CustomLog Methods

This subsection describes the methods available only for the CustomLog object. See each method for what Apache directive is used for each returned result.

Special Method

The special method described below, getMethods(), can be used with any of the log objects to extract the methods available for the calling object.


MISCELLANEOUS

This section describes some miscellaneous methods that might be useful.

Exported Methods

This subsection describes exported methods provided by the Apache::ParseLog module. (For information about exported methods, see Exporter(3).)

Note that those exported methods can be used (called) just like local (main package) subroutines.


EXAMPLES

The most basic, easiest way to create reports is presented as an example in the getMethods() section above, but the format of the output is rather crude and not very user-friendly.

Shown below are some other examples of using Apache::ParseLog.

Example 1: Basic Report

The example code below checks the TransferLog and ErrorLog generated by Apache 1.2.x, and prints the reports to STDOUT. (To run this code, all you have to do is change the $conf value.)

    #!/usr/local/bin/perl
    $|++;
    use Apache::ParseLog;
    $conf = "/usr/local/httpd/conf/httpd.conf"; 
    $base = new Apache::ParseLog($conf);
    print "TransferLog Report\n\n";
    $transferlog = $base->getTransferLog();
    %hit = $transferlog->hit();
    %hitbydate = $transferlog->hitbydate();
    print "Total Hit Counts: ", $hit{'Total'}, "\n";
    foreach (sort keys %hitbydate) {
        print "$_:\t$hitbydate{$_}\n"; # <date>: <hit counts>
    }
    $hitaverage = int($hit{'Total'} / scalar(keys %hitbydate));
    print "Average Daily Hits: $hitaverage\n\n";
    %byte = $transferlog->byte();
    %bytebydate = $transferlog->bytebydate();
    print "Total Bytes Transferred: ", $byte{'Total'}, "\n";
    foreach (sort keys %bytebydate) {
        print "$_:\t$bytebydate{$_}\n"; # <date>: <bytes transferred>
    }
    $byteaverage = int($byte{'Total'} / scalar(keys %bytebydate));
    print "Average Daily Bytes Transferred: $byteaverage\n\n";
    %visitorbydate = $transferlog->visitorbydate();
    %host = $transferlog->host();
    print "Total Unique Visitors: ", scalar(keys %host), "\n";
    foreach (sort keys %visitorbydate) {
        print "$_:\t$visitorbydate{$_}\n"; # <date>: <visitor counts>
    }
    $visitoraverage = int(scalar(keys %host) / scalar(keys %visitorbydate));
    print "Average Daily Unique Visitors: $visitoraverage\n\n";
    
    print "ErrorLog Report\n\n";
    $errorlog = $base->getErrorLog();
    %count = $errorlog->count();
    %allbydate = $errorlog->allbydate();
    print "Total Errors: ", $count{'Total'}, "\n";
    foreach (sort keys %allbydate) {
        print "$_:\t$allbydate{$_}\n"; # <date>: <error counts>
    }
    $erroraverage = int($count{'Total'} / scalar(keys %allbydate));
    print "Average Daily Errors: $erroraverage\n\n";
    exit;

Example 2: Referer Report

The RefererLog (or a CustomLog with the referer logged) contains the referer for every single file requested. This means that every time a page containing 10 images is requested, 11 lines are added to the RefererLog: one line for the actual referer (where the visitor came from), and 10 lines for the images, each with the just-requested page as the referer. That is probably more than you want to know.

The example code below checks a CustomLog that contains the referer (among other things) and reports the names of the referer sites other than the local server itself.

    #!/usr/local/bin/perl
    $|++;
    use Apache::ParseLog;
    $conf = "/usr/local/httpd/conf/httpd.conf"; 
    $base = new Apache::ParseLog($conf);
    $localserver = $base->servername();
    $log = $base->getCustomLog("combined");
    %referer = $log->referer();
    @sortedkeys = sortHashByValue(%referer);
    print "External Referers Report\n";
    foreach (@sortedkeys) {
        # \Q...\E keeps the dots in the server name from acting as regex wildcards
        print "$_:\t$referer{$_}\n" unless m/\Q$localserver\E/i or m/^\-/;
    }
    exit;
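For readers without a log at hand, the filtering step can be tried on its own. The sketch below is standalone and does not use the module: the %referer data and the server name are made up, and the plain sort is a hypothetical stand-in for sortHashByValue() that assumes descending order by count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $localserver = 'www.example.com';   # hypothetical ServerName

# Made-up referer counts, shaped like a referer => count hash.
my %referer = (
    'http://www.example.com/index.html'   => 120,
    'http://search.example.net/?q=apache' => 14,
    'http://links.example.org/perl.html'  => 31,
    '-'                                   => 57,
);

# Stand-in for sortHashByValue(): highest count first, ties alphabetical.
my @sortedkeys = sort { $referer{$b} <=> $referer{$a} or $a cmp $b } keys %referer;

# Drop the local server (quoted with \Q...\E) and the "-" placeholder.
my @external = grep { !/\Q$localserver\E/i and !/^-/ } @sortedkeys;

print "External Referers Report\n";
print "$_:\t$referer{$_}\n" foreach @external;
```

Only the two external sites remain, listed with links.example.org (31) ahead of search.example.net (14).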

Example 3: Access-Controlled User Report

Let's suppose that you have a directory tree on your site that is access-controlled by .htaccess or the like, and you want to check how frequently the section is used by the users.

    #!/usr/local/bin/perl
    $|++;
    use Apache::ParseLog;
    $conf = "/usr/local/httpd/conf/httpd.conf";
    $base = new Apache::ParseLog($conf);
    $log = $base->getCustomLog("common");
    %user = $log->user();
    print "Users Report\n";
    foreach (sort keys %user) {
        print "$_:\t$user{$_}\n" unless m/^-$/;
    }
    exit;


SEE ALSO

perl(1), perlop(1), perlre(1), Exporter(3)


BUGS

The reports on lesser-known browsers returned from the AgentLog methods are not always informative.

The data returned from the referer() method for RefererLog may be irrelevant if the referred files are not accessed via HTTP (i.e., the referer does not start with the ``http://'' string).

If the base object is created with the $virtualhost specified, the ServerAdmin and ServerName values in the global section of the httpd.conf are not shared with the $virtualhost; they are available only if they are specified within the <VirtualHost xxx> ... </VirtualHost>.


TO DO

Increase the performance (speed).


VERSION

Apache::ParseLog 1.01 (10/01/1998).


AUTHOR

Apache::ParseLog was written and is maintained by Akira Hangai (akira@discover-net.net).

For bug reports, comments, suggestions, etc., please email me.


COPYRIGHT

Copyright 1998, Akira Hangai. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


DISCLAIMER

This package is distributed in the hope that it will be useful for many web administrators/webmasters who are too busy to write their own programs to analyze the Apache log files. However, this package is distributed WITHOUT ANY WARRANTY: any use of the data generated by this package is at the user's own discretion, and the author shall not be held accountable for any results from the use of this package.
