conversion:delimits_object

Timothy Lebo edited this page Feb 14, 2012 · 27 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

See conversion:Enhancement.

If a cell value contains multiple values, they can be separated into distinct triples.

Example 1

It would be preferable to split "Municipality, County, State, Regional, Watershed" into one triple per item in the list.

data:

http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/enviro-reports-and-indicators/version/2011-Jan-12/ (not resolvable because this dataset is not published).

@prefix raw: 
   <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/raw/> .
@prefix e1:  
   <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/enhancement/1/> .
     
environmental-reports-enviro-reports-and-indicators:report_206 
  dcterms:identifier "report_206" ;
  dcterms:isReferencedBy 
  <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12> ;
  a environmental-reports_vocab:Report , foaf:Document ;
  raw:environmental_indicator_scale "Municipality, County, State, Regional, Watershed" ;

becomes

environmental-reports-enviro-reports-and-indicators:report_206 
   dcterms:identifier "report_206" ;
   dcterms:isReferencedBy 
   <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12> ;
   a environmental-reports_vocab:Report , foaf:Document ;
   e1:environmental_indicator_scale "Municipality" , "County" , "State" , "Regional" , "Watershed";

using enhancement parameters:

    :dataset a void:Dataset;
    conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
    conversion:source_identifier  "epa-gov-mcmahon-ethan";
    conversion:dataset_identifier "environmental-reports";
    conversion:dataset_version    "2011-Jan-12";
    conversion:conversion_process [
       a conversion:RawConversionProcess;
       conversion:enhancement_identifier "1";
       #conversion:subject_discriminator  "enviro-reports-and-indicators";
       conversion:interpret [
          conversion:symbol "";
          conversion:interpretation conversion:null;
       ];
       conversion:enhance [
          ov:csvRow 2;
          a conversion:HeaderRow;
       ];
       conversion:enhance [
          ov:csvCol         19;
          ov:csvHeader     "Environmental Indicator Scale";
          conversion:label "Environmental Indicator Scale";
          conversion:comment "";
          conversion:delimits_object ",\\s*";
          conversion:range  rdfs:Literal;
       ];
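
The effect of the delimiter above can be sketched outside the converter. This is a minimal illustration, not the converter's own code: it assumes the delimiter is applied as a regular expression (which the `\s*` suggests), and the helper name `split_cell` and the dropping of empty pieces are assumptions made for the example.

```python
import re
from typing import List


def split_cell(value: str, delimiter_regex: str) -> List[str]:
    """Split one cell value into the separate object values it contains."""
    # Assumption: empty pieces (e.g. from a trailing delimiter) are dropped.
    return [part for part in re.split(delimiter_regex, value) if part]


# The Turtle literal ",\\s*" unescapes to the regular expression ,\s*
cell = "Municipality, County, State, Regional, Watershed"
scales = split_cell(cell, r",\s*")

# One object per list item, all sharing the same subject and predicate.
for scale in scales:
    print(f'e1:environmental_indicator_scale "{scale}"')
```

Because the pattern absorbs any whitespace after each comma, the resulting values carry no leading spaces.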

Example 2 - promoting delimited values to Resources

Input data:

"","normal","","http://a.twimg.stocktwits.com/p/233837225/1302189124/thumb.jpg","2011-05-16T04:18:27Z","Wow, 1 month ago I'd have been surprised to see $NKE my week's leader and $DD $DE and $CAT my greatest losers.","False","False","","233837225","2011-05-16T04:18:20Z","3796018","","","4656,4930,4940,6502","CAT,DD,DE,NKE"

One column of my dataset has cells that may contain multiple stock symbols, such as "CAT,DD,DE,NKE". During the conversion, I'd like to promote each of these comma-separated ticker symbols to a resource. Does the converter have such a function?

In Example 1, we split the string into strings. We can also split it into URIs: keep the conversion:delimits_object ",\\s*"; from Example 1 and change the conversion:range from:

          conversion:range  rdfs:Literal;

to:

          conversion:range  rdfs:Resource;
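
As a rough sketch of what promotion to resources means, the snippet below splits the ticker cell and turns each piece into a URI. The namespace `http://example.org/ticker/` and the helper `to_resources` are hypothetical, chosen only for illustration; the URIs the converter actually mints depend on the dataset's conversion parameters and are not reproduced here.

```python
import re
from typing import List
from urllib.parse import quote

# Hypothetical namespace, for illustration only (not the converter's URI scheme).
BASE = "http://example.org/ticker/"


def to_resources(value: str, delimiter_regex: str, base: str = BASE) -> List[str]:
    """Split a cell value and turn every piece into a URI under `base`."""
    tokens = [t for t in re.split(delimiter_regex, value) if t]
    # Percent-encode each token so it is safe as the last path segment of a URI.
    return [base + quote(t, safe="") for t in tokens]


tickers = to_resources("CAT,DD,DE,NKE", r",\s*")
for uri in tickers:
    print(f"<{uri}>")
```

The only change from the literal case is the last step: each piece becomes the object URI of its own triple instead of a plain string.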

Example 3

   raw:committee "Senate Special Committee on Aging|Senate Committee on Appropriations|Subcommittee on Commerce, Justice, and Science, and Related Agencies|Subcommittee on Defense|Subcommittee on Energy and Water Development|Subcommittee on Homeland Security|Subcommittee on Labor, Health and Human Services, Education, and Related Agencies|Subcommittee on Transportation, Housing and Urban Development, and Related Agencies|Senate Committee on Banking, Housing, and Urban Affairs|Senate Committee on Rules and Administration" ;

with the enhancement parameter

   conversion:delimits_object "\\|";

becomes

   e1:committee 
typed_committee:Subcommittee_on_Homeland_Security , typed_committee:Subcommittee_on_Labor_Health_and_Human_Services_Education_and_Related_Agencies , typed_committee:Subcommittee_on_Transportation_Housing_and_Urban_Development_and_Related_Agencies , typed_committee:Subcommittee_on_Energy_and_Water_Development , typed_committee:Subcommittee_on_Commerce_Justice_and_Science_and_Related_Agencies , typed_committee:Subcommittee_on_Defense , typed_committee:Senate_Committee_on_Rules_and_Administration , typed_committee:Senate_Committee_on_Banking_Housing_and_Urban_Affairs , typed_committee:Senate_Committee_on_Appropriations , typed_committee:Senate_Special_Committee_on_Aging ;
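
Two details of this example can be sketched in a few lines: the pipe has to be escaped because an unescaped `|` means alternation in a regular expression, and splitting on the pipe leaves the commas inside each committee name untouched. The `local_name` helper is only an approximation that reproduces the local names shown above (runs of spaces and punctuation collapsed to one underscore); it is an assumption, not the converter's documented naming rule.

```python
import re


def local_name(label: str) -> str:
    """Approximate the local names above: non-alphanumeric runs become '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")


# A shortened version of the raw:committee value from this example.
cell = (
    "Senate Committee on Appropriations"
    "|Subcommittee on Commerce, Justice, and Science, and Related Agencies"
    "|Subcommittee on Defense"
)

# The Turtle literal "\\|" unescapes to the regular expression \| (a literal pipe).
committees = re.split(r"\|", cell)
names = [local_name(c) for c in committees]

for name in names:
    print(f"typed_committee:{name}")
```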

How many datasets use conversion:delimits_object?

Two, based on the following query:

prefix conversion: <http://purl.org/twc/vocab/conversion/>

# Count the delimits_object enhancements per dataset and enhancement identifier.
select ?dataset (count(*) as ?count)
where {
  graph <http://purl.org/twc/vocab/conversion/ConversionProcess> {
    ?dataset conversion:conversion_process [
      conversion:enhancement_identifier ?e;
      conversion:enhance [ 
        conversion:delimits_object ?delimiter
      ]
    ] 
  }
}
group by ?dataset ?e
order by ?count
