
UTILITY: Transformer Tool

Bogusz Janiak edited this page Jun 9, 2023 · 2 revisions

The GloSIS repository includes scripts that make it easier for domain experts and soil scientists who are not familiar with RDF to contribute. The transformer tool was developed to support the following:

  • Enabling contributions from people who are not familiar with RDF;
  • Reproducibility;
  • Maintainability: changes are easy to compare because the ordering within the modules is preserved.

The tool can perform transformations in both directions: from an RDF document into a CSV file, and from a CSV file into an RDF document (specifically a Turtle file).

The RDF-to-CSV direction requires a reference to the RDF file that will be exported. The CSV-to-RDF direction requires a SPARQL query that defines how the tabular data is translated into RDF. The tool currently supports two essential modules: code lists and procedures. These two are the most likely targets of domain experts' contributions, as both consist of enumerated lists that provide concept details. The transformer tool is a set of Python scripts that are executed from the command line. It reuses existing libraries, among them rdflib, Pandas, and pytarql, whose roles are described in the sections below.

RDF -> CSV

The RDF-to-CSV transformer automatically recognizes the two supported modules: code lists and procedures. It uses rdflib to load the module (a Turtle file) into a graph. First, it iterates through the graph and, with the help of regular expressions, captures all Classes/Procedures and their corresponding instances. Then, for each of them, it collects details from the associated triples. Finally, the collected information is arranged into a table and saved as a CSV file using Pandas. The CSV file has a fixed set of columns that is sufficient for, and compatible with, the reverse transformation.

python transform_to_csv.py [path to rdf file]

CSV -> RDF

The CSV-to-RDF transformer starts by generating an initial RDF representation from the CSV file, running pytarql with the provided SPARQL query. The transformer ships with two prepared SPARQL files, one for each of the two modules. Unlike the previous transformation, this one requires some post-processing. First, the owl:oneOf predicate that connects a Class or Procedure to its list of instances should point to a Collection. Building a Collection directly through pytarql did not seem feasible, but rdflib offers a convenient way of adding a Collection to a graph, so the first post-processing step uses that functionality. The second step uses a template to add the module header to the content. It sets the header's owl:versionInfo and owl:versionIRI to the version value passed on the command line when the tool is invoked.

Finally, post-processing ends by ordering the resources by type, so that the order inside the Turtle file stays stable between runs. The order is fixed as follows:

  1. owl:Ontology (header)
  2. skos:ConceptScheme
  3. owl:Class
  4. skos:Concept
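The fixed ordering above can be sketched as a sort key over the resource types. This is only an illustration in plain Python: the sample identifiers are made up, and breaking ties alphabetically by identifier is an assumption of the sketch, not something the tool is documented to do.

```python
# Fixed type order, as listed above.
TYPE_ORDER = ["owl:Ontology", "skos:ConceptScheme", "owl:Class", "skos:Concept"]


def order_resources(resources):
    """Sort (identifier, rdf_type) pairs by the fixed type order.

    Ties are broken by identifier (an assumption of this sketch);
    unknown types are placed after all known ones.
    """
    rank = {rdf_type: index for index, rdf_type in enumerate(TYPE_ORDER)}
    return sorted(
        resources,
        key=lambda item: (rank.get(item[1], len(TYPE_ORDER)), item[0]),
    )


# Hypothetical resources in arbitrary input order.
resources = [
    ("ex:textureValueCode-S", "skos:Concept"),
    ("ex:TextureValueCode", "owl:Class"),
    ("ex:codelist", "owl:Ontology"),
    ("ex:textureValueCode-C", "skos:Concept"),
    ("ex:TextureScheme", "skos:ConceptScheme"),
]
ordered = order_resources(resources)
```

A deterministic order of this kind is what keeps diffs between two generated Turtle files limited to the actual changes.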

python transform_to_rdf.py [path to input csv] [path to SPARQL query file] [output filename] [version]
