
Logging

timrdf edited this page Jan 9, 2013 · 32 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

conversion cockpit's doc/logs/*.txt

A log file is created every time the conversion trigger is pulled. It is placed in the conversion cockpit's doc/logs/ directory.

$CSV2RDF4LOD_HOME/bin/convert.sh and $CSV2RDF4LOD_HOME/bin/convert-aggregate.sh always log messages to:

CSV2RDF4LOD_LOG="doc/logs/csv2rdf4lod_log_e${eID}_`date +%Y-%m-%dT%H_%M_%S`.txt"
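Evaluating that pattern in a shell shows the kind of name it produces (eID=1 below is an illustrative assumption; real runs set it to the enhancement id):

```shell
# Sketch: expand the log-name pattern used by convert.sh.
# eID=1 is a fabricated example value, not taken from a real conversion.
eID=1
CSV2RDF4LOD_LOG="doc/logs/csv2rdf4lod_log_e${eID}_`date +%Y-%m-%dT%H_%M_%S`.txt"
echo "$CSV2RDF4LOD_LOG"
```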

The number of logs in this directory is asserted as conversion:num_invocation_logs in the aggregated data dump Turtle file publish/<dataset-id>-<version-id>.ttl.
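A rough sketch of how that count could be reproduced by hand, assuming it is simply the number of files in doc/logs/ (the directory and file names below are fabricated for illustration):

```shell
# Simulate a conversion cockpit's doc/logs/ with two fabricated invocation logs,
# then count them the way conversion:num_invocation_logs could be checked by hand.
mkdir -p /tmp/cockpit-demo/doc/logs
touch /tmp/cockpit-demo/doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt
touch /tmp/cockpit-demo/doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt
num_logs=$(ls /tmp/cockpit-demo/doc/logs | wc -l | tr -d ' ')
echo "$num_logs"
```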

Accessing all logs of latest conversion

cr-latest-logs.sh can be run from the conversion cockpit to list all files produced since the last time the conversion trigger was pulled. This can be used to quickly look for certain types of errors. For example, the following command looks for Java exceptions. The first two commands are shown to indicate where within the conversion root cr-latest-logs.sh is run.

bash-3.2$ cr-pwd.sh 
source/hub-healthdata-gov/hospital-compare/version/2012-Jul-17

bash-3.2$ cr-pwd-type.sh 
cr:conversion-cockpit

bash-3.2$ grep -B1 Exception `cr-latest-logs.sh`
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt-manual/HQI_STATE_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 150]
--
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt-manual/HQI_STATE_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 150]
--
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt-manual/HQI_US_NATIONAL_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 135]
--
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt-manual/HQI_US_NATIONAL_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 135]
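When the same exception recurs across many logs, it can help to tally the distinct messages rather than read them all. A sketch (the log files and message below are fabricated for illustration; in a real cockpit the file list would come from `cr-latest-logs.sh` instead):

```shell
# Fabricate two logs that both contain the same parse exception, then tally
# distinct exception lines across them with sort | uniq -c.
mkdir -p /tmp/logs-demo
echo 'org.openrdf.rio.RDFParseException: Namespace prefix not defined' > /tmp/logs-demo/a.txt
echo 'org.openrdf.rio.RDFParseException: Namespace prefix not defined' > /tmp/logs-demo/b.txt
grep -h Exception /tmp/logs-demo/*.txt | sort | uniq -c
```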

Trimming doc/logs/*.txt

Although having the logs around is useful, they can get big. We can trim them down so that they take up less space while still indicating the amount of effort put into enhancing a dataset.

When we are at the [data root](csv2rdf4lod automation data root):

$ cr-pwd.sh 
source/

We can run $CSV2RDF4LOD_HOME/bin/util/cr-trim-logs.sh to skim through the sizes of the logs and the sizes they will become if trimmed:

$ cr-trim-logs.sh
...
...
========== source/data-gov/1554/version/2011-Jan-12 ========================================

319M doc/logs total
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_37_26.txt   24 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_39_16.txt   24 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_42_29.txt   28 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_42_44.txt   328 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_44_41.txt   24 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_44_48.txt   328 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_59_12.txt   19344 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_00_30.txt   19344 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_12_31.txt   19344 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_14_59.txt   28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_17_32.txt   28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_19_19.txt   28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_21_26.txt   28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_24_47.txt   28 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_25_10.txt   28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_34_23.txt   28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_35_41.txt   19904 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_41_59.txt   19908 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_44_37.txt   4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_51_12.txt   19908 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_52_55.txt   19916 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_08_18.txt   4304 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_08_38.txt   19916 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_11_07.txt   40 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_12_19.txt   40 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_14_22.txt   40 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_56_35.txt   44 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-04-12T08_47_21.txt   44 -> 8
doc/logs/csv2rdf4lod_log_raw_2011-03-21T14_20_41.txt   40 -> 12

Note: did not trim logs. Use cr-trim-logs.sh -w to modify doc/logs/*.txt
...
...

We can see the sizes of the logs, in case we want to verify that they ARE taking up a lot of space:

$ cr-trim-logs.sh | grep total
604K doc/logs total
116K doc/logs total
319M doc/logs total
216K doc/logs total
136K doc/logs total
16K doc/logs total
12K doc/logs total
24K doc/logs total
24K doc/logs total
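To find the worst offenders quickly, the totals can be sorted by size. A sketch (sort -h understands the K/M suffixes in GNU and recent BSD sort; the hard-coded sizes below are the ones from the listing above, standing in for `cr-trim-logs.sh | grep total`):

```shell
# Rank the "doc/logs total" sizes largest-first and pick out the biggest one.
largest=$(printf '604K\n116K\n319M\n216K\n136K\n16K\n12K\n24K\n24K\n' | sort -hr | head -1)
echo "$largest"
```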

When you're ready to trim the files (and save space), use -w to write:

$ cr-trim-logs.sh -w | grep total
604K doc/logs total
116K doc/logs total
319M doc/logs total
216K doc/logs total
136K doc/logs total
16K doc/logs total
12K doc/logs total
24K doc/logs total
24K doc/logs total

Then you can see the new smaller sizes:

$ cr-trim-logs.sh | grep total
552K doc/logs total
96K doc/logs total
160M doc/logs total # This is still huge b/c it was committed to svn before trimming. Bad!
136K doc/logs total
136K doc/logs total
16K doc/logs total
12K doc/logs total
12K doc/logs total
12K doc/logs total

Loading and deleting a graph

These logs are kept in tmp because Virtuoso needs permission to write there.

$CSV2RDF4LOD_HOME/bin/util/virtuoso/vload stores logs to $CSV2RDF4LOD_HOME/tmp/vload/input-files/*.log with the latest at $CSV2RDF4LOD_HOME/tmp/vload/input-files/latest.log. More properly configured installs will log to $conversion_root/$CSV2RDF4LOD_PUBLISH_OUR_SOURCE_ID/$me/version/$versionID/doc/logs, e.g. /srv/logd/data/source/twc-rpi-edu/cr-vload/version/17f34aca66e186e543d3f1a649fdb0fe/doc/logs/.

$CSV2RDF4LOD_HOME/bin/util/virtuoso/vdelete stores logs to $CSV2RDF4LOD_HOME/tmp/vdelete/*.log with the latest at $CSV2RDF4LOD_HOME/tmp/vdelete/latest.log.
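The latest.log entries save you from hunting for the newest timestamped file by hand. A sketch of the equivalent by-mtime lookup (the directory and contents below are fabricated for illustration):

```shell
# Simulate a tmp/vload-style log directory and locate the newest log by
# modification time, which is what a latest.log pointer spares you from doing.
mkdir -p /tmp/vload-demo
echo 'loaded graph A' > /tmp/vload-demo/first.log
sleep 1
echo 'loaded graph B' > /tmp/vload-demo/second.log
latest=$(ls -t /tmp/vload-demo/*.log | head -1)
cat "$latest"
```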

populate-endpoint.sh

populate-endpoint.sh needs to be generalized beyond LOGD. It loads metadata from all conversions into a named graph and caches query results to static files to reduce endpoint load when supporting a web site.

${CSV2RDF4LOD_HOME}/log/populate-endpoint.sh/*.log

Turning on the converter's logging

In debugging situations, I might have you turn this on. It should rarely be needed.

The Java implementation uses java.util.logging to log.

Turning logging on is parameterized by the CSV2RDF4LOD_CONVERT_DEBUG_LEVEL environment variable and takes effect within $CSV2RDF4LOD_HOME/bin/convert.sh:

javaprops="-Djava.util.logging.config.file=$CSV2RDF4LOD_HOME/bin/logging/finest.properties"
#javaprops=""

So,

$ export CSV2RDF4LOD_CONVERT_DEBUG_LEVEL=finer

Other valid values are fine and finest.

$CSV2RDF4LOD_HOME/bin/logging/ contains fine.properties, finer.properties, and finest.properties.

(If you REALLY want to get your hands dirty, add your.properties to $CSV2RDF4LOD_HOME/bin/logging/ and set CSV2RDF4LOD_CONVERT_DEBUG_LEVEL to your.)
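A minimal sketch of what such a custom file might contain, assuming the standard java.util.logging configuration format (the file name your.properties comes from the parenthetical above; the handler choice and level are assumptions, and the exact keys the converter expects may differ):

```shell
# Write a hypothetical your.properties using standard java.util.logging keys,
# then sanity-check that it has the expected number of key=value lines.
cat > /tmp/your.properties <<'EOF'
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = FINER
.level = FINER
EOF
grep -c '=' /tmp/your.properties
```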
