# sparqlfun

LinkML-based SPARQL template library and execution engine
- modularized core library of SPARQL templates
  - generic templates using common vocabs (rdf, owl, skos, ...)
  - OBO and biology specific, e.g. Ubergraph
  - coming soon: uniprot, wikidata, etc
- Fully FAIR description of templates
  - Each template has a URI
  - Each template parameter has a URI
  - Full metadata including descriptions of each
  - Templates described in YAML, RDF, SHACL, ShEx, ...
- optional python bindings using LinkML
- supports both SELECT and CONSTRUCT
- optional export to TSV, JSON, YAML
Browse the default templates
Note: currently not all metadata from the yaml is shown in the generated docs
## Command Line
```bash
sparqlfun -e ubergraph -T PairwiseCommonSubClassAncestor node1=GO:0046220 node2=GO:0008295
```

results:

```yaml
results:
- node1: GO:0046220
  node2: GO:0008295
  predicate1: rdfs:subClassOf
  predicate2: rdfs:subClassOf
  ancestor: GO:0009987
- node1: GO:0046220
  node2: GO:0008295
  predicate1: rdfs:subClassOf
  predicate2: rdfs:subClassOf
  ancestor: GO:0044237
- node1: GO:0046220
  node2: GO:0008295
  predicate1: rdfs:subClassOf
  predicate2: rdfs:subClassOf
  ancestor: GO:0044271
...
```
## Python
```python
se = SparqlEngine(endpoint='ubergraph')
se.bind_prefixes(GO='http://purl.obolibrary.org/obo/GO_')
for row in se.query(PairwiseCommonSubClassAncestor, node1='GO:0046220', node2='GO:0008295'):
    print(f'ROW={row}')
```
For more examples, see `tests/`
## Service (via FastAPI)
coming soon!
## Browsing the templates
- source is in `sparqlfun/schema`
- add new templates here
- Browse the generated markdown on the site
## How it works
### Basics
Templates are defined as YAML files following the LinkML schema.
A YAML file with a single template might look like this:
```yaml
classes:
  my template:
    slots:
      - my_var1
      - my_var2
    annotations:
      sparql.select: |-
        SELECT * WHERE { ... ?my_var1 ... ?my_var2 }
slots:
  my_var1:
    description: about my var 1
  my_var2:
    description: about my var 2
```
This defines a template `MyTemplate` with two slots/parameters and an arbitrarily complex SPARQL SELECT query.
Note that the definitions of the slots go in a different section from the classes/templates. You are encouraged to "reuse" slots across templates.
The above can be used in queries:
```bash
sparqlfun -e ubergraph -T MyTemplate my_var2=MY_VAL
```
You can ground any or all of your vars on the command line (if you ground all of them, your SELECT is effectively an ASK query).
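To picture what grounding does, here is a minimal sketch (illustrative only: `ground` is a hypothetical helper, not sparqlfun's actual implementation) that substitutes supplied values for `?vars` in a SELECT:

```python
import re

# Illustrative sketch only: `ground` is a hypothetical helper, not part of
# sparqlfun. Supplied parameter values replace the matching ?vars; vars with
# no supplied value remain as SPARQL variables.
def ground(select: str, **params) -> str:
    return re.sub(r"\?(\w+)", lambda m: params.get(m.group(1), m.group(0)), select)

q = ground("SELECT * WHERE { ?s ?my_var1 ?my_var2 }", my_var2="MY_VAL")
print(q)  # SELECT * WHERE { ?s ?my_var1 MY_VAL }
```

If every var were supplied, the result would contain no variables to bind, which is why a fully grounded SELECT behaves like an ASK.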
However, sparqlfun goes beyond other templating systems, leveraging the fact that LinkML is a fully fledged modeling language with bindings to JSON-Schema, SHACL, ShEx, and more.
For example, you get Markdown documentation describing your templates. This documentation will be even richer if you annotate your schemas with metadata such as:
- descriptions
- ranges for slots
- mappings and URIs for your templates and slots
### Template Inheritance
Templates can be inherited, facilitating reuse and composition. To illustrate, consider a simple "base" template that queries a triple:
```yaml
triple:
  aliases:
    - statement
  description: >-
    Represents an RDF triple
  slots:
    - subject
    - predicate
    - object
  class_uri: rdf:Statement
  in_subset:
    - base table
  annotations:
    sparql.select: SELECT * WHERE { ?subject ?predicate ?object }
```
This is not a particularly useful template in isolation; you may as well query directly with SPARQL. Nevertheless, it can be useful to have templates even for this simple pattern, e.g. to facilitate generation of APIs.
This template can be inherited, meaning its slots are inherited too, eliminating boilerplate and the need to redefine them.
Inheritance enables even more powerful features via the LinkML `classification_rules`
construct. Let's say we want to represent type triples as children of generic triples:
```yaml
rdf type triple:
  is_a: triple
  description: >-
    A triple that indicates the asserted type of the subject entity
  slot_usage:
    object:
      description: >-
        The entity type
      range: class node
  classification_rules:
    - is_a: triple
      slot_conditions:
        predicate:
          equals_string: rdf:type
```
Note that we don't need to specify a SPARQL template here: it is autogenerated from the classification rule.
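The derivation can be pictured roughly as follows (a hypothetical sketch, not sparqlfun's internals: the `specialize` helper and the use of a VALUES clause are assumptions for illustration). The child's query is the parent's SELECT with the rule's `equals_string` condition injected as an extra binding:

```python
# Hypothetical sketch (not sparqlfun code): derive a child template's query
# from the parent's SELECT plus classification-rule equals_string conditions,
# expressed here as VALUES bindings appended before the closing brace.
def specialize(parent_select: str, **equals_string) -> str:
    conditions = " ".join(
        f"VALUES ?{slot} {{ {value} }}" for slot, value in equals_string.items()
    )
    return parent_select.rstrip().rstrip("}") + conditions + " }"

q = specialize("SELECT * WHERE { ?subject ?predicate ?object }", predicate="rdf:type")
print(q)  # SELECT * WHERE { ?subject ?predicate ?object VALUES ?predicate { rdf:type } }
```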
### SPARQL CONSTRUCT and nested/inlined objects
Example CONSTRUCT query:
```yaml
obo class:
  is_a: class node
  class_uri: owl:Class
  slots:
    - definition
    - exact_synonyms
  annotations:
    sparql.construct: |-
      CONSTRUCT {
        ?id a owl:Class ;
          IAO:0000115 ?definition ;
          oboInOwl:hasExactSynonym ?exact_synonyms
      }
      WHERE {
        ?id a owl:Class .
        OPTIONAL { ?id IAO:0000115 ?definition } .
        OPTIONAL { ?id oboInOwl:hasExactSynonym ?exact_synonyms } .
      }
...
slots:
  definition:
    slot_uri: IAO:0000115
  exact_synonyms:
    slot_uri: oboInOwl:hasExactSynonym
    multivalued: true
```
We can then query this as follows:
```bash
sparqlfun -e ubergraph -T OboClass id=GO:0000023
```
The results will be nested, following the LinkML specification for the model:
```json
{
  "results": [
    {
      "id": "GO:0000023",
      "definition": "The chemical reactions and pathways involving the disaccharide maltose (4-O-alpha-D-glucopyranosyl-D-glucopyranose), an intermediate in the catabolism of glycogen and starch.",
      "exact_synonyms": [
        "malt sugar metabolic process",
        "malt sugar metabolism",
        "maltose metabolism"
      ]
    }
  ],
  "@type": "ResultSet"
}
```
You can also get the Turtle as returned by the triplestore:
```turtle
@prefix ns1: <...> .
@prefix ns2: <...> .
@prefix ns3: <...> .

ns2:GO_0000023 a <...> ;
    ns2:IAO_0000115 "The chemical reactions and pathways involving the disaccharide maltose (4-O-alpha-D-glucopyranosyl-D-glucopyranose), an intermediate in the catabolism of glycogen and starch." ;
    ns1:hasExactSynonym "malt sugar metabolic process",
        "malt sugar metabolism",
        "maltose metabolism" .

[] a ns3:ResultSet ;
    ns3:results ns2:GO_0000023 .
```
With `-t tsv`, the LinkML CSV dumper will attempt to flatten the nested structure to TSV as closely as possible, e.g. using a pipe internal separator for multivalued slots.
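The flattening idea can be approximated as follows (a sketch only; `flatten_row` is a hypothetical helper, and the real LinkML CSV dumper handles many more cases):

```python
# Hypothetical sketch of TSV flattening (assumption: "|" is the internal
# separator for multivalued slots, per the description above). Not the
# actual LinkML CSV dumper.
def flatten_row(row: dict) -> dict:
    return {k: "|".join(v) if isinstance(v, list) else v for k, v in row.items()}

flat = flatten_row({
    "id": "GO:0000023",
    "exact_synonyms": ["malt sugar metabolic process", "maltose metabolism"],
})
print(flat["exact_synonyms"])  # malt sugar metabolic process|maltose metabolism
```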
### Modularity
LinkML allows imports, so templates can be modularized.
In future this repo may be split up, with the bio/obo specific features migrating to a new repo.
### Use of Jinja commands
You can incorporate additional logic via Jinja2 templating instructions:
```yaml
obo class filtered:
  is_a: class node
  class_uri: owl:Class
  slots:
    - definition
    - exact_synonyms
  annotations:
    sparql.construct: |-
      CONSTRUCT {
        ?id a owl:Class ;
          IAO:0000115 ?definition ;
          oboInOwl:hasExactSynonym ?exact_synonyms
      }
      WHERE {
        ?id a owl:Class .
        OPTIONAL { ?id IAO:0000115 ?definition } .
        OPTIONAL { ?id oboInOwl:hasExactSynonym ?exact_synonyms } .
        {% if query_has_subclass_ancestor %}
        ?id rdfs:subClassOf ?query_has_subclass_ancestor
        {% endif %}
      }
```
## Supported Endpoints
This framework can be used with any SPARQL endpoint. However, the current pre-defined templates are geared towards OBO-style ontologies combined with the storage patterns employed in triplestores such as Ubergraph and Ontobee.
In particular, Ubergraph uses the relation-graph inference tool to pre-compute inferred direct triples from TBox existential axioms, allowing simple and powerful queries over inferred ontologies.
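To see why precomputation matters, compare the query shapes (an illustration under assumptions: GO:0008150 is an arbitrary class, and the exact graph layout in Ubergraph differs in detail). Over a plain endpoint, transitive ancestors require a property path; over a precomputed closure, a direct triple pattern suffices:

```python
# Illustration of the difference (assumed query shapes, arbitrary GO class):
# a transitive property path versus a single direct pattern over
# materialized inferred triples.
path_query = "SELECT ?x WHERE { ?x rdfs:subClassOf+ GO:0008150 }"
direct_query = "SELECT ?x WHERE { ?x rdfs:subClassOf GO:0008150 }"
print(path_query)
print(direct_query)
```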
## See also
This was inspired in part by the powerful but arcane sparqlprog system.
## TODOs
- Better documentation
  - framework
  - templates
  - how-tos for use with Python, SHACL, ...
  - exemplar notebooks
- Unify with SQL/rdftab functionality in semantic-sql
- Split bio-specific templates into a separate repo
- Expose more ubergraph awesomeness
- FastAPI/serverless endpoint
- Expose more validation
- Integrate visualization / obographviz
- Chaining
  - inject output from one into another and merge results, e.g. to get labels
  - similar to wikidata services
- Templates for
  - uniprot
  - gocams
  - wikidata