The Webalizer - A web server log file analysis tool
Copyright 1997-2013 by Bradford L. Barrett
Distributed under the GNU GPL. See the files "COPYING" and
"Copyright" supplied with the distribution for additional info.
What is The Webalizer?
----------------------
The Webalizer is a web server log file analysis program which produces
usage statistics in HTML format for viewing with a browser. The results
are presented in both columnar and graphical format, which facilitates
interpretation. Yearly, monthly, daily and hourly usage statistics are
presented, along with the ability to display usage by site, URL, referrer,
user agent (browser), search string, entry/exit page, username and country
(some information is only available if supported and present in the log
files being processed). Processed data may also be exported into most
database and spreadsheet programs that support tab delimited data formats.
The Webalizer supports CLF (common log format) log files, as well as
Combined log formats as defined by NCSA and others, and variations
of these which it attempts to handle intelligently. In addition, The
Webalizer supports wu-ftpd xferlog (FTP) formatted logs, squid proxy logs
and W3C extended format logs.
Gzip compressed logs may be used as input directly. Any log filename
that ends with a '.gz' extension will be assumed to be in gzip format and
uncompressed on the fly as it is being read. The Webalizer now also has
the ability to handle BZip2 compressed logs, if enabled at compile time.
Similar to gzipped logs, any log filename that ends with a '.bz2' will be
assumed to be in bzip2 format and uncompressed on the fly as it is being
read.
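The extension-based selection described above can be sketched in a few
lines of Python. This is only an illustration, not The Webalizer's own
(C) code: the function name open_log and the latin-1 decoding are
assumptions made for the example.

```python
import bz2
import gzip


def open_log(path):
    """Open a log for text reading, picking a decompressor by extension.

    Hypothetical helper for illustration only.
    """
    if path.endswith(".gz"):
        # '.gz' is assumed to be gzip data, decompressed while reading.
        return gzip.open(path, "rt", encoding="latin-1")
    if path.endswith(".bz2"):
        # '.bz2' is assumed to be bzip2 data, decompressed while reading.
        return bz2.open(path, "rt", encoding="latin-1")
    # Anything else is read as an uncompressed log.
    return open(path, "r", encoding="latin-1")
```

Note that the choice is made purely on the filename, so a compressed
log without the expected extension would be read as plain text.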
For sites that do not enable hostname lookups (DNS resolution) on their
web servers (and have only IP addresses in their logs), The Webalizer
provides its own internal DNS lookup capability as well as geolocation
services (GeoDB). The optional GeoIP library from MaxMind Inc. is also
supported and may be used instead of the native GeoDB database.
A utility program, "The Webalizer (DNS) Cache file Manager", or 'wcmgr'
is also provided which allows the creation and manipulation of the DNS
cache files used and produced by the webalizer. See the file DNS.README
for additional information regarding DNS support.
This documentation applies to The Webalizer Version 2.23
Running the Webalizer
---------------------
The Webalizer was designed to be run from a Unix command line prompt or
as a cron job. There are several command line options which will modify
the results it produces, and configuration files can be used as well.
The format of the command line is:
webalizer [options ...] [log-file]
Where 'options' can be one or more of the supported command line
switches described below. 'log-file' is the name of the log file
to process (see below for more detailed information). If a dash
("-") is specified for the log-file name, STDIN will be used.
Once executed, the general flow of the program follows:
o A default configuration file is scanned for. A file named
'webalizer.conf' is searched for in the current directory, and if
found, its configuration data is parsed. If the file is not
present in the current directory, the file '/etc/webalizer.conf'
is searched for and, if found, is used instead.
o Any command line arguments given to the program are parsed. This
may include the specification of a configuration file, which is
processed at the time it is encountered.
o If a log file was specified, it is opened and made ready for
processing. If no log file was given, or the filename '-' is
specified on the command line, STDIN is used for input.
o If an output directory was specified, the program does a 'chdir' to
that directory in preparation for generating output. If no output
directory was given, the current directory is used.
o If a non-zero number of DNS Children processes were specified, they
will be started, and the specified log file will be processed,
either creating or updating the specified DNS cache file.
o If no hostname was given, the program attempts to get the hostname
using a uname system call. If that fails, 'localhost' is used.
o A history file is searched for. This file keeps previous month
totals used on the main index.html page. The default file is
named 'webalizer.hist', kept in the specified output directory,
however may be changed using the "HistoryName" configuration file
keyword.
o If incremental processing was specified, a data file is searched for
and loaded if found, containing the 'internal state' data of the
program at the end of a previous run. The default file is named
'webalizer.current', kept in the specified output directory, however
may be changed using the "IncrementalName" configuration file keyword.
o Main processing begins on the log file. If the log spans multiple
months, a separate HTML document is created for each month.
o After main processing, the main 'index.html' page is created, which
has totals by month and links to each month's HTML document.
o A new history file is saved to disk, which includes totals generated
by The Webalizer during the current run.
o If incremental processing was specified, a data file is written that
contains the 'internal state' data at the end of this run.
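The first step of the flow above (the default configuration file
search) can be sketched as follows. This is a minimal Python
illustration, not the program's actual code: the function name
find_default_config and its parameters are assumptions, and only the
two locations named above are checked.

```python
import os


def find_default_config(cwd=".", system_path="/etc/webalizer.conf"):
    """Return the first default configuration file found, or None.

    Hypothetical helper: checks 'webalizer.conf' in the current
    directory first, then the system-wide file.
    """
    local_path = os.path.join(cwd, "webalizer.conf")
    for candidate in (local_path, system_path):
        if os.path.isfile(candidate):
            return candidate
    # Neither file exists, so only command line options would apply.
    return None
```

A file in the current directory therefore takes precedence over the
one in /etc.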
Incremental Processing
----------------------
Version 1.2x of The Webalizer adds incremental run capability. Simply
put, this allows processing large log files by breaking them up into
smaller pieces, and processing these pieces instead. What this means
in real terms is that you can now rotate your log files as often as you
want, and still be able to produce monthly usage statistics without the
loss of any detail. This is accomplished by saving and restoring all
relevant internal data to a disk file between runs. Doing so allows the
program to 'start where it left off' so to speak, and allows the
preservation of detail from one run to the next.
Some special precautions need to be taken when using the incremental
run capability of The Webalizer. Configuration options should not be
changed between runs, as that could cause corruption of the internal
stored data. For example, changing the MangleAgents level will cause
different representations of user agents to be stored, producing invalid
results in the user agents section of the report. If you need to change
configuration options, do it at the end of the month after normal
processing of the previous month and before processing the current month.
You may also want to delete the 'webalizer.current' file as well (or
whatever name was specified using the "IncrementalName" configuration
option).
The Webalizer also attempts to prevent data duplication by keeping
track of the timestamp of the last record processed. This timestamp
is then compared to current records being processed, and any records
that were logged previous to that timestamp are ignored. This, in
theory, should allow you to re-process logs that have already been
processed, or process logs that contain a mix of processed/not yet
processed records, and not produce duplication of statistics. The
only time this may break is if you have duplicate timestamps in two
separate log files... any records in the second log file that do have
the same timestamp as the last record in the previous log file processed,
will be discarded as if they had already been processed. There are
lots of ways to prevent this however, for example, stopping the web
server before rotating logs will prevent this situation. This setup
also necessitates that you always process logs in chronological order,
otherwise data loss will occur as a result of the timestamp compare.
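The timestamp comparison described above can be sketched in Python.
This is an assumption-laden illustration rather than the real
implementation: the function name filter_new_records and the
(timestamp, line) record layout are made up for the example, and the
saved timestamp is compared against every record of the new run.

```python
def filter_new_records(records, last_seen):
    """Drop records that are not newer than the saved timestamp.

    records:   list of (timestamp, line) tuples in chronological order.
    last_seen: timestamp of the last record of the previous run, or
               None on a first run.
    Returns the kept records and the timestamp to save for next time.
    """
    kept = []
    for stamp, line in records:
        # Records at or before the saved timestamp are treated as
        # already processed, including ones with an equal timestamp.
        if last_seen is not None and stamp <= last_seen:
            continue
        kept.append((stamp, line))
    new_last = kept[-1][0] if kept else last_seen
    return kept, new_last
```

The sketch also shows the caveat mentioned above: a record in the
second log that carries the same timestamp as the last record of the
first log is discarded, even though it was never counted.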
Output Produced
---------------
The Webalizer produces several reports (html) and graphics for each
month processed. In addition, a summary page is generated for the
current and previous months (up to 12), a history file is created
and, if incremental mode is used, the current month's processed data
is saved.
The exact location and names of these files can be changed using
configuration files and command line options. The files produced,
(default names) are:
  index.html              - Main summary page (extension may be changed)
  usage.png               - Yearly graph displayed on the main index page
  usage_YYYYMM.html       - Monthly summary page (extension may be changed)
  usage_YYYYMM.png        - Monthly usage graph for specified month/year
  daily_usage_YYYYMM.png  - Daily usage graph for specified month/year
  hourly_usage_YYYYMM.png - Hourly usage graph for specified month/year
  site_YYYYMM.html        - All sites listing (if enabled)
  url_YYYYMM.html         - All urls listing (if enabled)
  ref_YYYYMM.html         - All referrers listing (if enabled)
  agent_YYYYMM.html       - All user agents listing (if enabled)
  search_YYYYMM.html      - All search strings listing (if enabled)
  webalizer.hist          - Previous month history (may be changed)
  webalizer.current       - Incremental Data (may be changed)
  site_YYYYMM.tab         - tab delimited sites file
  url_YYYYMM.tab          - tab delimited urls file
  ref_YYYYMM.tab          - tab delimited referrers file
  agent_YYYYMM.tab        - tab delimited user agents file
  user_YYYYMM.tab         - tab delimited usernames file
  search_YYYYMM.tab       - tab delimited search string file
The yearly (index) report shows statistics for a 12 month period, and
links to each month. The monthly report has detailed statistics for
that month with additional links to any URLs and referrers found.
The various totals shown are explained below.
Hits
Any request made to the server which is logged, is considered a 'hit'.
The requests can be for anything... html pages, graphic images, audio
files, CGI scripts, etc... Each valid line in the server log is
counted as a hit. This number represents the total number of requests
that were made to the server during the specified report period.
Files
Some requests made to the server, require that the server then send
something back to the requesting client, such as an HTML page or graphic
image. When this happens, it is considered a 'file' and the files
total is incremented. The relationship between 'hits' and 'files' can
be thought of as 'incoming requests' and 'outgoing responses'.
Pages
Pages are, well, pages! Generally, any HTML document, or anything
that generates an HTML document, would be considered a page. This
does not include the other stuff that goes into a document, such as
graphic images, audio clips, etc... This number represents the number
of 'pages' requested only, and does not include the other 'stuff' that
is in the page. What actually constitutes a 'page' can vary from
server to server. The default action is to treat anything with the
extension '.htm', '.html' or '.cgi' as a page. A lot of sites will
probably define other extensions, such as '.phtml', '.php3' and '.pl'
as pages as well. Some people consider this number as the number of
'pure' hits... I'm not sure if I totally agree with that viewpoint.
Some other programs (and people :) refer to this as 'Pageviews'.
Sites
Each request made to the server comes from a unique 'site', which can
be referenced by a name or ultimately, an IP address. The 'sites'
number shows how many unique IP addresses made requests to the server
during the reporting time period. This DOES NOT mean the number of
unique individual users (real people) that visited, which is impossible
to determine using just logs and the HTTP protocol (however, this
number might be about as close as you will get).
Visits
Whenever a request is made to the server from a given IP address
(site), the amount of time since a previous request by the address
is calculated (if any). If the time difference is greater than a
pre-configured 'visit timeout' value (or the address has never made a
request before), it is considered a 'new visit', and this total is
incremented (both
for the site, and the IP address). The default timeout value is 30
minutes (can be changed), so if a user visits your site at 1:00 in
the afternoon, and then returns at 3:00, two visits would be registered.
Note: in the 'Top Sites' table, the visits total should be discounted
on 'Grouped' records, and thought of as the "Minimum number of visits"
that came from that grouping instead. Note: Visits only occur on
PageType requests, that is, for any request whose URL is one of the
'page' types defined with the PageType and PagePrefix option, and not
excluded by the OmitPage option. Due to the limitation of the HTTP
protocol, log rotations and other factors, this number should not be
taken as absolutely accurate, rather, it should be considered a pretty
close "guess".
KBytes
The KBytes (kilobytes) value shows the amount of data, in KB, that
was sent out by the server during the specified reporting period. This
value is generated directly from the log file, so it is up to the
web server to produce accurate numbers in the logs (some web servers
do stupid things when it comes to reporting the number of bytes). In
general, this should be a fairly accurate representation of the amount
of outgoing traffic the server had, regardless of the web server's
reporting quirks.
Note: A kilobyte is 1024 bytes, not 1000 :)
Top Entry and Exit Pages
The Top Entry and Exit tables give a rough estimate of what URLs
are used to enter your site, and what the last pages viewed are.
Because of limitations in the HTTP protocol, log rotations, etc...
this number should be considered a good "rough guess" of the actual
numbers, however will give a good indication of the overall trend in
where users come into, and exit, your site.
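The 'Visits' rule described above (a new visit whenever a host has not
been seen before, or was last seen more than the timeout ago) can be
sketched in Python. This is a simplified illustration only: the
function name count_visits and the (host, seconds) input layout are
assumptions, and the input is assumed to contain page requests only,
already in time order.

```python
def count_visits(requests, timeout=1800):
    """Count visits from (host, time_in_seconds) page requests.

    Hypothetical helper; 'timeout' defaults to 30 minutes (1800 s).
    """
    last_seen = {}  # host -> time of its most recent request
    visits = 0
    for host, when in requests:
        previous = last_seen.get(host)
        # First request from this host, or a gap longer than the
        # timeout, starts a new visit.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[host] = when
    return visits
```

With the default timeout, a host that requests a page at 1:00 in the
afternoon and again at 3:00 yields two visits, while requests ten
minutes apart count as a single visit.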
Command Line Options
--------------------
The Webalizer supports many different configuration options that will
alter the way the program behaves and generates output. Most of these
can be specified on the command line, while some can only be specified
in a configuration file. The command line options are listed below,
with references to the corresponding configuration file keywords.
--------------------------------------------------------------------------
General Options
---------------
-h Display all available command line options and exit program.
-v Be Verbose. This will cause the program to print additional
information at run time. It is the same as specifying
"Quiet no", "ReallyQuiet no" and "Debug yes" config options.
-V Display the program version and exit. Additional program
specific information will be displayed if 'verbose' mode is
also used (e.g. '-vV'), which can be useful when submitting
bug reports.
-d Display additional 'debugging' information for errors and
warnings produced during processing. This normally would
not be used except to determine why you are getting all those
errors and wanted to see the actual data. Normally The
Webalizer will just tell you it found an error, not the
actual data. This option will display the data as well.
Config file keyword: Debug
-F Specify the log file type to process. Normally, the
Webalizer expects to find a valid CLF or Combined format
web server log file. This option allows you to process
wu-ftpd xferlogs, squid and W3C formatted web logs as well.
Values can be either 'clf', 'ftp', 'squid' or 'w3c' with
'clf' being the default. Only the first character needs
to be specified (eg: -Fs will process a squid log).
Config file keyword: LogType
-f Fold out of sequence log records back into analysis, by
treating them as if they were the same date/time as the
last good record. Normally, out of sequence log records
are ignored. If you run apache, don't worry about this.
Config file keyword: FoldSeqErr
-i Ignore history file. USE WITH CAUTION. This causes The
Webalizer to ignore any existing history file produced from
previous runs and generate its output from scratch. The
effect will be as if The Webalizer is being run for the
first time and any previous statistics will be lost (although
the HTML documents, if any, will not be deleted) on the main
index.html (yearly) web page.
Config file keyword: IgnoreHist
-b Ignore incremental data file. USE WITH CAUTION. This causes
The Webalizer to ignore any existing incremental (state) data
file produced by previous runs. By ignoring the incremental
data file, all previous processing for the current month will
be lost, and those logs must be re-processed.
Config file keyword: IgnoreState
-p Preserve state (incremental processing). This allows the
processing of partial logs in increments. At the end of
the program, all relevant internal data is saved, so that
it may be restored the next time the program is run. This
allows sites that must rotate their logs more than once a
month to still be able to use The Webalizer, and not worry
about having to gather and feed an entire month's logs to
the program at the end of the month. See the section on
"Incremental Processing" above for additional information.
The default is to not perform incremental processing. Use
this command line option to enable the feature.
Config file keyword: Incremental
-q Quiet mode. Normally, The Webalizer will produce various
messages while it runs letting you know what it's doing.
This option will suppress those messages. It should be
noted that this WILL NOT suppress errors and warnings, which
are output to STDERR.
Config file keyword: Quiet
-Q ReallyQuiet mode. This allows suppression of _all_ messages
generated by The Webalizer, including warnings and errors.
Useful when The Webalizer is run as a cron job.
Config file keyword: ReallyQuiet
-T Display timing information. The Webalizer keeps track of the
time it begins and ends processing, and normally displays the
total processing time at the end of each run. If quiet mode
(-q or 'Quiet yes' in configuration file) is specified, this
information is not displayed. This option forces the display
of timing totals if quiet mode has been specified, otherwise
it is redundant and will have no effect.
Config file keyword: TimeMe
-c file This option specifies a configuration file to use. Configuration
files allow greater control over how The Webalizer behaves, and
there are several ways to use them. As of version 0.98, The
Webalizer searches for a default configuration file in the
current directory named "webalizer.conf", and if not found,
will search in the /etc/ directory for a file of the same name.
In addition, you may specify a configuration file to use with
this command line option.
-n name This option specifies the hostname for the reports generated.
The hostname is used in the title of all reports, and is also
prepended to URLs in the reports. This allows The Webalizer
to be run on log files for 'virtual' web servers or web servers
that are different than the machine the reports are located on,
and still allows clicking on the URLs to go to the proper
location. If a hostname is not specified, either on the
command line or in a configuration file, The Webalizer attempts
to determine the hostname using a 'uname' system call. If this
fails, "localhost" will be used as the hostname.
Config file keyword: HostName
-o dir This option specifies the output directory for the reports.
If not specified here or in a configuration file, the current
default directory will be used for output.
Config file keyword: OutputDir
-x name This option allows the generated pages to have an extension
other than '.html', which is the default. Do not include the
leading period ('.') when you specify the extension.
Config file keyword: HTMLExtension
-P name Specify the file extensions for 'pages'. Pages (sometimes
called 'PageViews') are normally html documents and CGI
scripts that display the whole page, not just parts of it.
Some systems will need to define a few more, such as 'phtml',
'php3' or 'pl' in order to have them counted as well. The
default is 'htm*' and 'cgi' for web logs and 'txt' for ftp.
Config file keyword: PageType
-O name Specify URLs which are not counted as 'pages'. Requests
matching one of these URLs will not be counted as a page, even
if they have an extension matching one of the PageTypes defined
above or have no extension at all.
Config file keyword: OmitPage
-t name This option specifies the title string for all reports. This
string is used, in conjunction with the hostname (if not blank)
to produce the actual title. If not specified, the default of
"Usage Statistics for" will be used.
Config file keyword: ReportTitle
-Y Suppress Country graph. Normally, The Webalizer produces
country statistics in both Graph and Columnar forms. This
option will suppress the Country Graph from being generated.
Config file keyword: CountryGraph
-G Suppress hourly graph. Normally, The Webalizer produces
hourly statistics in both Graph and Columnar forms. This
option will suppress the Hourly Graph only from being generated.
Config file keyword: HourlyGraph
-H Suppress Hourly statistics. Normally, The Webalizer produces
hourly statistics in both Graph and Columnar forms. This
option will suppress the Hourly Statistics table only from
being generated.
Config file keyword: HourlyStats
-K num Specify how many months should be displayed in the main index
(yearly summary) table. Default is 12 months. Can be set to
anything between 12 and 120 months (1 to 10 years).
Config file keyword: IndexMonths
-k num Specify how many months should be displayed in the main index
(yearly summary) graph. Default is 12 months. Can be set to
anything between 12 and 72 months (1 to 6 years).
Config file keyword: GraphMonths
-L Disable Graph Legends. The color coded legends displayed on
the in-line graphs can be disabled with this option. The
default is to display the legends.
Config file keyword: GraphLegend
-l num Graph Lines. Specify the number of background reference
lines displayed on the in-line graphics produced. The default
is 2 lines, however can range anywhere from zero ('0') for
no lines, up to 20 lines (looks funny!).
Config file keyword: GraphLines
-P name Page type. This is the extension of files you consider to
be pages for Pages calculations (sometimes called 'pageviews').
The default is 'htm*' and 'cgi' (plus whatever HTMLExtension
you specified if it is different). Don't use a period!
-m num Specify a 'visit timeout'. Visits are calculated by looking at
the time difference between the current and last request made
by a specific host. If the difference is greater than the
visit timeout value, the request is considered a new visit.
This value is specified in number of seconds. The default
is 30 minutes (1800).
Config file keyword: VisitTimeout
-M num Mangle user agent names. Normally, The Webalizer will keep
track of the user agent field verbatim. Unfortunately, there are
a ton of different names that user agents go by, and the field
also reports other items such as machine type and OS used. For
example, Netscape 4.03 running on Windows 95 will report a
different string than Netscape 4.03 running on Windows NT, so even
though they are the same browser type, they will be considered
as two totally different browsers by The Webalizer. For that
matter, Netscape 4.0 running on Windows NT will report different
names if one is run on an Alpha and the other on an Intel
processor! Internet Exploder is even worse, as it reports itself
as if it were Netscape and you have to search the given string a
little deeper to discover that it is really MSIE! In order to
consolidate generic browser types, this option will cause The
Webalizer to 'mangle' the user agent field, attempting to
consolidate generic browser types. There are 6 levels that can be
specified, each producing different levels of detail. Level 5
displays only the browser name (MSIE or Mozilla) and the major
version number. Level 4 will also display the minor version
number (single decimal place). Level 3 will display the minor
version number to two decimal places. Level 2 will add any
sub-level designation (such as Mozilla/3.01Gold or MSIE 3.0b).
Level 1 will also attempt to add the system type. The default
Level 0 will disable name mangling and leave the user agent
field unmodified, producing the greatest amount of detail.
Configuration file keyword: MangleAgents
-g num This option allows you to specify the level of domain name
grouping to be performed. The numeric value represents the
level of grouping, and can be thought of as the 'number of
dots' to be displayed. The default value of 0 disables any
domain name grouping.
Configuration file keyword: GroupDomains
-D name This allows the specification of a DNS Cache file name. This
filename MUST be specified if you have dns lookups enabled
(using the -N command line switch or DNSChildren configuration
keyword). The filename is relative to the default output
directory if an absolute path is not specified (ie: starts
with a leading '/'). This option is only available if DNS
support was enabled at compile time, otherwise an 'Invalid
Keyword' error will be generated. See the DNS.README file
for additional information regarding DNS lookups.
Configuration file keyword: DNSCache
-N num Number of DNS child processes to use for reverse DNS lookups.
If specified, a DNSCache name MUST be specified also. If you
do not wish a DNS cache file to be generated, specify a value
of zero ('0') to disable it. This does not prevent using an
existing cache file, only the generation of one at run time.
See the DNS.README file for additional information.
Configuration file keyword: DNSChildren
-j Enable native GeoDB geolocation services.
Configuration file keyword: GeoDB
-J name Specify an alternate GeoDB database filename to use. This
shouldn't normally be needed. If used, the filename 'name'
is relative to the output directory being used unless an
absolute path is specified (ie: starts with a leading '/').
Configuration file keyword: GeoDBDatabase
-w Enable GeoIP support if it is available.
Configuration file keyword: GeoIP
-W name Specify an alternate GeoIP database filename to use. This
shouldn't normally be needed. If used, the filename 'name'
is relative to the specified output directory unless an
absolute name is given (ie: starts with a leading '/').
Configuration file keyword: GeoIPDatabase
-z name Specify location of the country flag graphics and enable
their display in the top country table. The directory name
is relative to the output directory unless an absolute path
is specified (ie: starts with a leading '/').
Configuration file keyword: FlagDir
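The 'number of dots' idea behind the -g (GroupDomains) option above can
be sketched in Python. This is an illustration under stated
assumptions, not the program's own logic: the function name
group_domain is made up, and leaving numeric IP addresses and short
names unchanged is an assumption of the example.

```python
def group_domain(hostname, level):
    """Return the name a host would be grouped under.

    level is the number of dots kept in the result; 0 disables
    grouping and returns the hostname unchanged.
    """
    if level <= 0:
        return hostname
    labels = hostname.split(".")
    # Assumption: unresolved numeric addresses are left as they are.
    if all(label.isdigit() for label in labels):
        return hostname
    # Names that already have 'level' dots or fewer stay unchanged.
    if len(labels) <= level + 1:
        return hostname
    return ".".join(labels[-(level + 1):])
```

Under these assumptions, a level of 1 reduces 'www.sub.example.com' to
'example.com' and a level of 2 reduces it to 'sub.example.com'.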
Hide Options
------------
The following options take a string argument to use as a comparison
for matching. Except for the IndexAlias option, the string argument
can be plain text, or plain text that either starts or ends with the
wildcard character '*'.
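The matching rule just described can be sketched in Python. This is a
minimal illustration, not the program's actual code: the function name
hide_match is hypothetical, and the case-sensitive comparison is an
assumption, since case handling is not covered here.

```python
def hide_match(pattern, value):
    """Return True if 'value' matches a plain or wildcard pattern."""
    if pattern.startswith("*"):
        # Leading wildcard: the value must end with the rest.
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        # Trailing wildcard: the value must start with the rest.
        return value.startswith(pattern[:-1])
    # Plain text: the pattern may appear anywhere in the value.
    return pattern in value
```

Applied to the string "yourmama/was/here" used in this section's
example, the patterns "was", "*here" and "your*" all match, while
"here*" does not, because the string does not start with "here".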
For Example:
Given the string "yourmama/was/here", the arguments "was", "*here" and
"your*" will all produce a match.
-a name This option allows hiding of user agents (browsers) from the
"Top User Agents" table in the report. This option really
isn't too useful as there are a zillion different names that
current browsers go by, depending where they were obtained,
however you might have some particular user agents that hit
your site a lot that you would like to exclude from the list.
You must have a web server that includes user agents in its
log files for this option to be of any use. In addition, it
is also useless if you disable the user agent table in the
report (see the -A command line option or "TopAgents"
configuration file keyword). You can specify as many of these
as you want on the command line. The wildcard character '*'
can be used either in front of or at the end of the string.
(ie: Mozilla/4.0* would match anything that starts with the
string "Mozilla/4.0").
Config file keyword: HideAgent
-r name This option allows hiding of referrers from the "Top Referrer"
table in the report. Referrers are URLs, either on your own
local site or a remote site, that referred the user to a URL
on your web server. This option is normally used to hide
your own server from the table, as your own pages are usually
the top referrers to your own pages (well, you get the idea).
You must have a web server that includes referrer information
in the log files for this option to be of any use. In addition,
it is also useless if you disable the referrers table in the
report (see the -R command line option or "TopReferrers"
configuration file keyword). You can specify as many of these
as you like on the command line.
Config file keyword: HideReferrer
-s name This option allows hiding of sites from the "Top Sites" table
in the report. Normally, you will only want to hide your own
domain name from the report, as it usually is one of the top
sites to visit your web server. This option is of no use if
you disable the top sites table in the report (see the -S
command line option or "TopSites" configuration file keyword).
Config file keyword: HideSite
-X This causes all individual sites to be hidden, so that only
grouped sites are displayed in the report.
Config file keyword: HideAllSites
-u name This option allows hiding of URLs from the "Top URLs" table
in the report. Normally, this option is used to hide images,
audio files and other objects your web server dishes out that
would otherwise clutter up the table. This option is of no
use if you disable the top URLs table in the report (see the
-U command line option or "TopURLs" configuration file keyword).
Config file keyword: HideURL
-I name This option allows you to specify additional index.html aliases.
The Webalizer usually strips the string 'index.*' from URLs
before processing (unless disabled using the 'DefaultIndex'
config option), which has the effect of turning a URL such
as /somedir/index.html into just /somedir/ which is really the
same URL and should be treated as such. This option allows you
to specify _additional_ strings that are to be treated the same
way. Use with care, improper use could cause unexpected results.
For example, if you specify the alias string of 'home', a URL
such as /somedir/homepages/brad/home.html would be converted
into just /somedir/ which probably isn't what was intended.
This option is useful if your web server uses a default index
page other than the standard 'index.html' or 'index.htm',
such as 'home.html' or 'homepage.html'. The string specified
is searched for _anywhere_ in the URL, so "home.htm" would
turn both "/somedir/home.htm" and "/somedir/home.html" into
just "/somedir/". Wildcards are _not_ allowed on this one.
Config file keyword: IndexAlias
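The effect of index aliases can be sketched in Python as shown below.
This is a simplified illustration and not The Webalizer's own code. It
assumes that the URL is simply cut at the first place an index name is
found (which is what the examples above imply) and that the default
'index.' name has not been disabled. The function name strip_index is
made up for the example.

```python
def strip_index(url: str, aliases=()) -> str:
    """Illustrative only: cut the URL at the first index name found in it."""
    # "index." is the built-in default; user aliases are tried after it.
    for name in ("index.",) + tuple(aliases):
        pos = url.find(name)
        if pos != -1:
            return url[:pos]
    return url


print(strip_index("/somedir/index.html"))
print(strip_index("/somedir/home.html", ["home.htm"]))
# A careless alias cuts the URL much earlier than intended:
print(strip_index("/somedir/homepages/brad/home.html", ["home"]))
```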
Table Size Options
------------------
-e num This option specifies the number of entries to display in the
"Top Entry Pages" table. To disable the table, use a value of
zero (0).
Config file keyword: TopEntry
-E num This option specifies the number of entries to display in the
"Top Exit Pages" table. To disable the table, use a value of
zero (0).
Config file keyword: TopExit
-A num This option specifies the number of entries to display in the
"Top User Agents" table. To disable the table, use a value of
zero (0).
Config file keyword: TopAgents
-C num This option specifies the number of entries to display in the
"Top Countries" table. To disable the table, use a value of
zero (0).
Config file keyword: TopCountries
-R num This option specifies the number of entries to display in the
"Top Referrers" table. To disable the table, use a value of
zero (0).
Config file keyword: TopReferrers
-S num This option specifies the number of entries to display in the
"Top Sites" table. To disable the table, use a value of
zero (0).
Config file keyword: TopSites
-U num This option specifies the number of entries to display in the
"Top URLs" table. To disable the table, use a value of
zero (0).
Config file keyword: TopURLs
--------------------------------------------------------------------------
CONFIGURATION FILES
-------------------
The Webalizer allows configuration files to be used in order to simplify
life for all. There are several ways that configuration files are accessed
by the Webalizer. When The Webalizer first executes, it looks for a
default configuration file named "webalizer.conf" in the current directory,
and if not found there, will look for "/etc/webalizer.conf". In addition,
configuration files may be specified on the command line with the '-c'
option. There are lots of different ways you can combine the use of
configuration files and command line options to produce various results.
The Webalizer always looks for and reads configuration options from a
default configuration file before doing anything else. Because of this,
you can override options found in the default file by use of additional
configuration files specified on the command line or command line options
themselves. If you specify a configuration file on the command line, you
can override options in it by additional command line options which follow.
For example, most users will likely want to create the default file
/etc/webalizer.conf and place options in it to specify the hostname, log
file, table options, etc... At the end of the month when a different log
file is to be used (the end of month log), you can run The Webalizer as
usual, but put the different filename on the end of the command line, which
will override the log file specified in the configuration file. It should
be noted that you cannot override some configuration file options by the
use of command line arguments. For example, if you specify "Quiet yes" in
a configuration file, you cannot override this with a command line argument,
as the command line option only _enables_ the feature (-q option).
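The override order described above can be pictured with a small Python
sketch. It is an illustration only, not The Webalizer's own code, and
it assumes that any command line options follow the configuration files
given with '-c'. The function name, the host name and the log file
paths are made up for the example.

```python
def effective_options(default_conf, extra_confs=(), cmdline=None):
    """Illustrative only: later sources override earlier ones."""
    options = dict(default_conf)          # default file is read first
    for conf in extra_confs:              # then files given with -c
        options.update(conf)
    options.update(cmdline or {})         # then command line options
    return options


# Hypothetical contents of /etc/webalizer.conf
default_conf = {
    "HostName": "www.example.com",
    "LogFile": "/var/log/access_log",
    "Quiet": "yes",
}
# End of month run: only the log file is given on the command line.
print(effective_options(default_conf, cmdline={"LogFile": "/var/log/access_log.1"}))
```

Note that the sketch does not model switches such as -q, which can
only enable a feature and so cannot undo a "Quiet yes" setting.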
The configuration files are standard ASCII text files that may be created
or edited using any standard editor. Blank lines and lines that begin
with a pound sign ('#') are ignored. Any other lines are considered to
be configuration lines, and have the form "Keyword Value", where the
'Keyword' is one of the currently available configuration keywords defined
below, and 'Value' is the value to assign to that particular option. Any
text found after the keyword up to the end of the line is considered the
keyword's value, so you should not include anything after the actual value
on the line that is not actually part of the value being assigned. The
file "sample.conf" provided with the distribution contains lots of useful
documentation and examples as well. It should be noted that you do not
have to use any configuration files at all, in which case, default values
will be used (which should be sufficient for most sites).
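The file format described above is simple enough to sketch a reader
for it in a few lines of Python. This is an illustration only, not The
Webalizer's own parser. It assumes that leading and trailing whitespace
on a line can be discarded, and the sample values are made up.

```python
def parse_config(text: str):
    """Illustrative only: return (keyword, value) pairs from config text."""
    pairs = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                      # blank line or comment
        parts = stripped.split(None, 1)   # keyword, then rest of the line
        value = parts[1] if len(parts) > 1 else ""
        pairs.append((parts[0], value))
    return pairs


sample = """\
# Sample settings (values are made up)
HostName     www.example.com

HideAgent    Mozilla/4.0*
ReportTitle  Usage Statistics for
"""
for keyword, value in parse_config(sample):
    print(keyword, "=>", repr(value))
```

Pairs are returned as a list rather than a dictionary because some
keywords, such as HideAgent, may be given more than once.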
--------------------------------------------------------------------------
General Configuration Keywords
------------------------------
LogFile This defines the log file to use. It should be a fully qualified
name (ie: contain the path), but relative names will work as
well. If not specified, the logfile defaults to STDIN.
LogType This specifies the log file type being used. Normally, The
Webalizer processes web logs in either CLF or Combined format.
You may also process wu-ftpd xferlog formatted logs, squid
proxy logs or W3C formatted web logs by setting the appropriate
type using this keyword. Values may be either 'clf', 'ftp',
'squid' or 'w3c'. Ensure that you specify the proper file type,
otherwise you will be presented with a long stream of 'invalid
record' messages when the Webalizer is run ;)
Command line argument: -F
OutputDir This defines the output directory to use for the reports. If
it is not specified, the current directory is used.
Command line argument: -o
HistoryName Allows specification of a history path/filename if desired.
The default is to use the file named 'webalizer.hist', kept
in the normal output directory (OutputDir above). Any name
specified is relative to the normal output directory unless
an absolute path name is given (ie: starts with a '/').
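The "relative unless absolute" rule used here (and by the flag
directory option) can be sketched in Python as shown below. This is an
illustration only; the function name and the directory names are made
up for the example.

```python
import posixpath


def resolve_in_output_dir(output_dir: str, name: str) -> str:
    """Illustrative only: names are relative to output_dir unless absolute."""
    # posixpath.join discards output_dir when name starts with '/'.
    return posixpath.join(output_dir, name)


print(resolve_in_output_dir("/var/www/usage", "webalizer.hist"))
print(resolve_in_output_dir("/var/www/usage", "/var/lib/webalizer/site.hist"))
```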
ReportTitle This specifies the title to use for the generated reports.
It is used in conjunction with the hostname (unless blank)
to produce the final report titles. If not defined, the
default of "Usage Statistics for" is used.
Command line argument: -t
HostName This defines the hostname. The hostname is used in the
report title as well as being prepended to URLs in the
"Top URLs" table. This allows The Webalizer to be run
on "virtual" web servers, or servers that do not reside
on the local machine, and allows clicking on the URL to
go to the right place. If not specified, The Webalizer
attempts to get the hostname via a 'uname' system call,
and if that fails, will default to "localhost".
Command line argument: -n
UseHTTPS Causes the links in the 'Top URLs' table to use 'https://'
instead of the default 'http://' prefix. Not much use if
you run a mix of secure/insecure servers on your machine.
Only useful if you run the analysis on a secure server's
logs, and want the links in the table to work properly.
HTAccess Enables the creation of a default .htaccess file in the
output directory. If enabled, the file will be created
(with a single "DirectoryIndex" directive), unless one
already exists. The default is 'no', which disables the
creation of any .htaccess files.
Quiet This allows you to enable or disable informational messages
while The Webalizer is running. The values for this keyword can be
either 'yes' or 'no'. Using "Quiet yes" will suppress these
messages, while "Quiet no" will enable them. The default
is 'no' if not specified, which will allow The Webalizer
to display informational messages. It should be noted that
this option has no effect on Warning or Error messages that
may be generated, as they go to STDERR.
Command line argument: -q
ReallyQuiet This allows all generated output to be suppressed, including
warning and error messages. The values for this keyword
can be either 'yes' or 'no', with 'no' being the default.
Command line argument: -Q
TimeMe This allows you to display timing information regardless of
any "quiet mode" specified. Useful only if you did in fact
tell the webalizer to be quiet either by using the -q command
line option or the "Quiet" keyword, otherwise timing stats
are normally displayed anyway. Values may be either 'yes'
or 'no', with the default being 'no'.
Command line argument: -T
GMTTime This keyword allows timestamps to be displayed in GMT (UTC)
time instead of local time. Normally The Webalizer will
display timestamps in the time-zone of the local machine
(ie: PST or EDT). Values may be either 'yes' or 'no'.
Default is 'no'.
Debug This tells The Webalizer to display additional information
when it encounters Warnings or Errors. Normally, The
Webalizer will just tell you it found a bad record or
field. This option will enable the display of the actual
data that produced the Warning or Error as well. Useful
only if you start getting lots of Warnings or Errors and
want to determine the cause. Values may be either 'yes'
or 'no', with the default being 'no'.
Command line argument: -d
IgnoreHist This suppresses the reading of a history file. USE WITH
EXTREME CAUTION as the history file is how The Webalizer
keeps track of previous months. The effect of this option
is as if The Webalizer was being run for the very first
time, and any previous data is discarded. Values may be
either 'yes' or 'no', with the default being 'no'.
Command line argument: -i
IgnoreState This suppresses the reading of an existing incremental
data file. USE WITH EXTREME CAUTION! By ignoring an
existing incremental data file, all previous processing
for the current month will be lost, and those logs must
be re-processed. Values may be 'yes' or 'no', with the
default being 'no'.
Command line argument: -b
FoldSeqErr Allows log records that are out of sequence to be folded
back into the analysis, by treating them as if they had
the same date/time as the last good record. Normally,
out of sequence log records are simply ignored. If you
run apache, don't worry about this.
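The difference between ignoring and folding out of sequence records
can be sketched in Python as shown below. This is an illustration
only, not The Webalizer's own code. It assumes that "out of sequence"
means a timestamp earlier than that of the last good record, and it
uses plain numbers in place of real log timestamps.

```python
def accept_records(timestamps, fold_seq_err=False):
    """Illustrative only: return the timestamps that would be analysed."""
    accepted = []
    last_good = None
    for ts in timestamps:
        if last_good is not None and ts < last_good:
            if not fold_seq_err:
                continue                  # out of sequence: ignored
            ts = last_good                # folded onto the last good record
        accepted.append(ts)
        last_good = ts
    return accepted


log = [10, 20, 15, 30]                    # the third record is out of sequence
print(accept_records(log))
print(accept_records(log, fold_seq_err=True))
```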
VisitTimeout Set the 'visit timeout' value. Visits are determined by
looking at the time difference between the current and last
request made by a specific site. If the difference in time
is greater than the visit timeout value, the request is
considered a new visit. The value is in number of seconds,
and defaults to 30 minutes (1800).
Command line argument: -m
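The visit rule above can be sketched in Python as shown below. This is
an illustration only, not The Webalizer's own code. It assumes the
request times for a single site are given in seconds and in order, and
the function name is made up for the example.

```python
def count_visits(request_times, visit_timeout=1800):
    """Illustrative only: count visits for ONE site from sorted timestamps."""
    visits = 0
    last = None
    for t in request_times:
        if last is None or t - last > visit_timeout:
            visits += 1                   # first request or gap too long
        last = t
    return visits


# Two gaps of 1900 seconds exceed the 1800 second default.
print(count_visits([0, 100, 2000, 3900, 3950]))
```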
PageType Allows you to define the 'page' type extension. Normally,
people consider HTML and CGI scripts as 'pages'. This
option allows you to specify what extensions you consider
a page. Default is 'htm*' and 'cgi' for web logs, and
'txt' for ftp logs.
Command line argument: -P
PagePrefix Allows all requests with a specified prefix to be considered
as 'pages'. Use this if you want everything under /documents
to be treated as pages no matter what their extension is. Also
useful if you have cgi-scripts with PATH_INFO.
OmitPage Allows specified URLs to not be counted as pages under any
circumstance, even if they have an extension matching a
PageType or PagePrefix as defined above.
GraphLegend Enable/disable the display of color coded legends on the
produced graphs. Default is 'yes', to display them.
Command line argument: -L
GraphLines Specify the number of background reference lines to display
on produced graphs. The default is 2. To disable the use
of background lines, use zero ('0').
Command line argument: -l
IndexMonths Specify the number of months to display in the main index
(yearly summary) table. Default is 12 months. Can be set
to anything between 12 and 120 months (1 to 10 years).
Command line argument: -K
YearHeaders Enable/disable the display of year headers in the main index
(yearly summary) table. If enabled, year headers will be
shown when the table is displaying more than 16 months worth
of data. Values can be 'yes' or 'no'. Default is 'yes'.
GraphMonths Specify the number of months to display in the main index
(yearly summary) graph. Default is 12 months. Can be set
to anything between 12 and 72 months (1 to 6 years).
Command line argument: -k
CountryGraph This keyword is used to either enable or disable the creation
and display of the Country Usage graph. Values may be either
'yes' or 'no', with the default being 'yes'.
Command line argument: -Y
CountryFlags Enables or disables the display of flags in the top country
table. If enabled, the default directory 'flags' directly
under the output directory will be used unless a different
path is specified with the 'FlagDir' option below.
Command line argument: -zflags
FlagDir Specifies the location of flag graphics. If not specified,
the default is in the 'flags' directory directly under the
output directory being used for the reports. If specified,
the display of flags will be enabled by default.
Command line argument: -z
DailyGraph This keyword is used to either enable or disable the creation
and display of the Daily Usage graph. Values may be either
'yes' or 'no', with the default being 'yes'.
DailyStats This keyword is used to either enable or disable the creation
and display of the Daily Usage statistics table. Values may
be either 'yes' or 'no', with the default being 'yes'.
HourlyGraph This keyword is used to either enable or disable the creation
and display of the Hourly Usage graph. Values may be either
'yes' or 'no', with the default being 'yes'.
Command line argument: -G
HourlyStats This keyword is used to either enable or disable the creation
and display of the Hourly Usage statistics table. Values may
be either 'yes' or 'no', with the default being 'yes'.
Command line argument: -H
IndexAlias This allows additional 'index.html' aliases to be defined.
Normally, The Webalizer scans for and strips the string
"index." from URLs before processing them (unless disabled
using the DefaultIndex config option below). This turns a
URL such as /somedir/index.html into just /somedir/ which
is really the same URL. This keyword allows _additional_
names to be treated in the same fashion for sites that use
different default names, such as "home.html". The string
is scanned for anywhere in the URL, so care should be used
if and when you define additional aliases. For example,
if you were to use an alias such as 'home', the URL
/somedir/homepages/brad/home.html would be turned into just
/somedir/ which probably isn't the intended result. Instead,
you should have specified 'home.htm' which would correctly
turn the URL into /somedir/homepages/brad/ like intended.
It should also be noted that specified aliases are scanned
for in EVERY log record... A bunch of aliases will noticeably
degrade performance as each record has to be scanned for
every alias defined. You don't have to specify 'index.' as
it is always the default (unless disabled with the config
option "DefaultIndex" described below).
Command line argument: -I
DefaultIndex This option is used to enable/disable the use of "index." as
a default index name to be stripped from the end of a URL.
Most sites should not need to use this option, however some
may find it useful, particularly those whose default index
file name is something different, or those sites that use
'index.php' or similar URLs to generate dynamic content.
This option does not affect any of the names that may be
defined using the IndexAlias option, and those names will
still function as described. Values may be 'yes' or 'no',
with 'yes' being the default.
MangleAgents The MangleAgents keyword specifies the level of user agent
name mangling, if any. There are 6 levels that may be specified,
each producing a different level of detail displayed. Level 5
displays only the browser name (MSIE or Mozilla) and the major
version number. Level 4 adds the minor version (single
decimal place). Level 3 adds the minor version to two decimal
places. Level 2 will also add any sub-level designation
(such as Mozilla/3.01Gold or MSIE 3.0b). Level 1 will also
attempt to add the system type. The default level 0 will
leave the user agent field unmodified and produces the
greatest amount of detail.
Command line argument: -M
SearchEngine This keyword allows specification of search engines and
their query strings. Search strings are obtained from
the referrer field in the record, and in order to work
properly, the Webalizer needs to know what query strings
different search engines use. The SearchEngine keyword allows
you to specify the search engine and its query string
to parse the search string from. The line is formatted
as: "SearchEngine engine-string query-string" where
'engine-string' is a substring for matching the search
engine with, such as "yahoo.com" or "altavista". The
'query-string' is the unique query string that is added
to the URL for the search engine, such as "search=" or
"MT=" with the actual search strings appended to the
end. There is no command line option for this keyword.
SearchCaseI The SearchCaseI option specifies if search strings should
be lowercased (case insensitive) or not. Since most
search engines use case insensitive searches (ie: a
search for "Hello" is the same as "HELLO" or "hello"),
converting to lowercase will improve keyword accuracy,
which is the default. If desired, case sensitivity can
be forced with this option. The value can be 'yes' or
'no', with 'yes' (case insensitive) being the default.
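How the SearchEngine and SearchCaseI keywords work together can be
sketched in Python as shown below. This is an illustration only, not
The Webalizer's own code, and the real matching and decoding may
differ. It assumes the query string appears as a parameter after the
'?' in the referrer and that URL encoding should be undone. The
function name, the engine name and the referrer URL are made up.

```python
from urllib.parse import unquote_plus, urlsplit


def search_string(referrer, engine, query, case_insensitive=True):
    """Illustrative only: pull the search string out of a referrer URL."""
    if engine not in referrer:
        return None                       # not this search engine
    for param in urlsplit(referrer).query.split("&"):
        if param.startswith(query):
            text = unquote_plus(param[len(query):])
            return text.lower() if case_insensitive else text
    return None


# Equivalent of the line "SearchEngine example-engine search="
ref = "http://www.example-engine.com/find?search=Web+Log+Analysis&page=2"
print(search_string(ref, "example-engine", "search="))
print(search_string(ref, "example-engine", "search=", case_insensitive=False))
```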
Incremental This allows incremental processing to be enabled or disabled.
Incremental processing allows processing partial logs without
the loss of detail data from previous runs in the same month.
This feature saves the 'internal state' of the program so that
it may be restored in following runs. See the section above
titled "Incremental Processing" for additional information.
The value may be 'yes' or 'no', with the default being 'no'.