SapFhir

Join genomic variation graphs with public data or internal medical data (e.g. FHIR) through FAIR data access, using W3C SPARQL as the standard protocol.

Status

This is an RDF4J SAIL implementation that can take any handlegraph4j implementation and expose it as a W3C SPARQL 1.1 endpoint.

It is functionally complete. Performance depends heavily on the specific handlegraph4j implementation.

It is currently read-only, but could be made read/write.

A query optimizer is active and can substantially rewrite queries to improve performance.

Example queries

# Find the ten nodes with the most forward-to-forward links (needs a lot of RAM)
PREFIX vg:<http://biohackathon.org/resource/vg#>

SELECT ?node
WHERE
{
  ?node vg:linksForwardToForward ?node2 .
}
GROUP BY ?node
# Sort descending so the most connected nodes come first
ORDER BY DESC(COUNT(?node2))
LIMIT 10
# Count the number of sequences of length 1 in the graph
PREFIX vg:<http://biohackathon.org/resource/vg#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT
  (COUNT(?n) AS ?c)
WHERE {
  ?n rdf:value ?sequence .
  FILTER(strlen(?sequence) = 1)
}
# Count the number of sequences containing the ambiguous nucleotide code R
# handlegraph4j lowercases all DNA sequences, so the filter matches 'r'.
PREFIX vg:<http://biohackathon.org/resource/vg#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT
  (COUNT(?n) AS ?c)
WHERE {
  ?n rdf:value ?sequence .
  FILTER(contains(?sequence, 'r'))
}
# List all the Paths in the variation graph
PREFIX vg:<http://biohackathon.org/resource/vg#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT 
  ?path 
  ?pathLabel
WHERE {
  ?path a vg:Path ;
        rdfs:label ?pathLabel .
}
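
Because the graph is exposed as a W3C SPARQL 1.1 endpoint, any client that speaks the SPARQL 1.1 Protocol should be able to send the queries above. The following is a minimal sketch using only the Python standard library. The endpoint URL `http://localhost:8080/sparql` and the helper names `build_request` and `run_select` are assumptions for illustration and are not part of this project; adjust the URL to wherever your endpoint is actually served.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL (hypothetical); replace with your own deployment.
ENDPOINT = "http://localhost:8080/sparql"

# The "list all paths" query from the examples above.
PATHS_QUERY = """
PREFIX vg:<http://biohackathon.org/resource/vg#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT ?path ?pathLabel
WHERE {
  ?path a vg:Path ;
        rdfs:label ?pathLabel .
}
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query):
    """Send a SELECT query and return the result bindings as a list of dicts."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return json.load(response)["results"]["bindings"]


# Example usage (requires a running endpoint, so it is left commented out):
# for row in run_select(ENDPOINT, PATHS_QUERY):
#     print(row["path"]["value"], row["pathLabel"]["value"])
```

Sending the query as a URL-encoded `query` parameter keeps the sketch dependency-free. For long queries, the protocol's POST variants may be a better fit.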