Installing csv2rdf4lod automation complete

Tim L edited this page Sep 4, 2013 · 70 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

What's first?

This page extends the minimal installation introduced in Installing csv2rdf4lod automation. See that first.

The Prizms installer accounts for everything on this page, so use that.

For announcements that affect existing installations, see announcements regarding distribution.

Included Dependencies

The following dependencies are included in the distribution:

lod-materialize has Perl and C implementations; you want the C version, which you build with make:

cd csv2rdf4lod-automation/bin/lod-materialize/c/
make

You don't need to do anything to set up Saxon and Sesame; they were added to your classpath by install.sh.

Dependencies Not Included

Issue 189 proposes a script to run after installation that shows which dependencies are met and which still need to be met.

The following dependencies are NOT included. Missing any of them will cause varying degrees of failure for csv2rdf4lod-automation:

  • SVN
  • git
    • Mac: .dmg
    • Ubuntu: sudo apt-get install git-core
  • Java 6 JVM - needed to invoke the converter (so, relatively essential).
  • curl - used to fetch URLs. (Used in at least pcurl.sh and dg-create-dataset-dir.sh)
    • sudo apt-get install curl
    • If curl can't handle https, try installing libssl-dev and rebuilding curl. Building curl with SSL support needs header files such as ldap_ssl.h, which require the developer version of libssl. (Thanks to Linyun Fu for this note.)
  • serdi has replaced some of csv2rdf4lod-automation's use of rapper, since rapper cannot handle large turtle files and serdi is a jet engine of reserialization.
    • Grab the tarball
    • Don't believe their claim that "This software requires only Python to build." (it requires gcc)
    • Follow the standard dance per INSTALL: ./waf configure, then sudo ./waf, then sudo ./waf install
    • It'll show up at /usr/local/bin/serdi
  • rapper - a must-have for any semantic web practitioner; changes among different RDF serializations.
  • tidy - used when scraping HTML. (Used to parse data.gov info pages in dg-create-dataset-dir.sh)
  • RDF::Trine - used by lod-materialize when publishing lod materializations; see How csv2rdf4lod uses RDF::Trine.
  • URI::Escape - used by pcurl.sh, pvload.sh and cache-queries.sh.
    • perl -MCPAN -e shell
    • perl -MCPAN -e 'install YAML'
    • install URI::Escape
    • install Data::Dumper
    • install HTTP::Config
    • install LWP::UserAgent
      • sudo apt-cache search perl LWP::UserAgent
      • sudo apt-get install liblwp-useragent-determined-perl
      • OR cpan -f -i LWP::UserAgent
    • install IO::Socket::SSL
    • install Text::CSV_XS and install Text::CSV
  • SuRF and rdflib 3.x - used by pcurl.py and friends (Jim McCusker's contributions).
  • tdbloader / tdbquery
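
A rough sketch of the kind of check that issue 189 describes, for the dependencies listed above. This is not a script from the distribution: the helper names check_cmd, check_perl_module and curl_has_https are made up for illustration, and the command and module names are simply the ones named in the list. It does not check for SuRF or rdflib.

```shell
# Sketch only: report which of the dependencies listed above are available.
# check_cmd, check_perl_module and curl_has_https are hypothetical helpers,
# not part of csv2rdf4lod-automation.

check_cmd() {
  # Succeeds if the named command is on the PATH.
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

check_perl_module() {
  # Succeeds if perl can load the named module.
  if perl -M"$1" -e 1 >/dev/null 2>&1; then
    echo "found:   perl module $1"
  else
    echo "MISSING: perl module $1"
    return 1
  fi
}

curl_has_https() {
  # curl --version lists the protocols it was built with.
  curl --version 2>/dev/null | grep -qw https
}

missing=0
for cmd in svn git java curl serdi rapper tidy tdbloader tdbquery; do
  check_cmd "$cmd" || missing=$((missing + 1))
done
for mod in RDF::Trine URI::Escape Data::Dumper LWP::UserAgent IO::Socket::SSL Text::CSV_XS Text::CSV; do
  check_perl_module "$mod" || missing=$((missing + 1))
done

if command -v curl >/dev/null 2>&1 && ! curl_has_https; then
  echo "NOTE:    curl was built without https support"
fi

echo "$missing of the checked dependencies are missing"
```

Anything reported as MISSING can be installed using the notes in the list above.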

Rolling back with git

If the latest version of csv2rdf4lod-automation breaks something that you need before I can fix it, git can be used to "roll" you back to a previous version of csv2rdf4lod-automation.

For example, if I wanted to roll back to the last commit of 2011-07-22,

I'd go to https://github.com/timrdf/csv2rdf4lod-automation/commits/

click https://github.com/timrdf/csv2rdf4lod-automation/commit/27612cbe9ea45058beba260d899dba781a677a52

copy the commit hash from the end of the URL: 27612cbe9ea45058beba260d899dba781a677a52

do:

git checkout 27612cbe9ea45058beba260d899dba781a677a52

to roll back to 22 July.

Then, when I want to return to the latest, I do:

git checkout master
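
If you'd rather not browse the commits page, git can look up the hash for you. The following is only a sketch: last_commit_before is a hypothetical helper name, it assumes the branch is called master as above, and git's --before filter goes by committer date (interpreted in your local timezone unless you give an offset), so check the result against the commits page before relying on it.

```shell
# Sketch only: find the last commit on master made before a given date.
# last_commit_before is a hypothetical helper, not part of csv2rdf4lod-automation.
last_commit_before() {
  # $1 is a date that git understands, e.g. "2011-07-23 00:00"
  git rev-list -n 1 --before="$1" master
}

# Usage, from inside your csv2rdf4lod-automation clone:
#   git checkout "$(last_commit_before '2011-07-23 00:00')"   # roll back to the end of 22 July
#   git checkout master                                        # return to the latest
```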

Misc - Alternative CSV Parsers
