AN INTRODUCTION TO KNOWLEDGE REPRESENTATION

(This is a multi-part series on semantics and reasoning)

SPARQL is pronounced ‘sparkle’ and stands for SPARQL Protocol and RDF Query Language (a recursive acronym). It looks similar to SQL but is also quite different: there are no fixed field and table names to select from, and it has to deal with named graphs and links attached to links.
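
To give a flavour of the difference: where SQL would select a column from a table, SPARQL matches patterns over triples. A minimal sketch, with a hypothetical predicate URI:

SELECT ?name
WHERE {
  ?person <http://example.org/ns#name> ?name .
}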

The easiest way to play with SPARQL is to navigate to the YASGUI demo or to clone the GitHub repo. This UI allows you to access many open SPARQL endpoints, including DBpedia, PubChem and WordNet.

There are a few gotchas when fiddling with SPARQL:

  • triple stores have ‘databases’ and the SPARQL endpoint usually includes the name of the database.
  • furthermore, endpoints to select are different from endpoints to update data. For example, if you use Fuseki with a ‘Test’ database you have a read-endpoint ‘http://localhost:3030/Test/sparql’ and an update-endpoint ‘http://localhost:3030/Test/update’
  • every dataset has a default graph, which you are using unless you specifically address a different one.
  • field names in SQL are replaced with placeholders starting with a question mark.
  • a triple ends with a dot, for example ‘?s ?p ?o.’. If you do not use a dot the parser will interpret whatever comes next as part of the triple specification.
  • explicit URIs are enclosed in ‘<’ and ‘>’.
  • you can use prefixes (much like XML namespaces) which you define at the beginning of the query, like so: ‘PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>’.
  • a semi-colon can be used to specify multiple triples with the same subject. For example,
    ?s <has_friend> ?p ;
       <has_car> ?k .

will put a constraint on the subject ‘?s’ such that it has both a ‘has_friend’ and a ‘has_car’ predicate. Note the dot at the end to tell the parser that the constraint is finished.
  • the ‘?s ?p ?o’ triple stands for ‘subject predicate object’, but so does ‘?q ?x2 ?card’; the placeholder names themselves carry no meaning. A small query combining these ingredients follows right after this list.
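
Putting a few of these gotchas together, a small query against a hypothetical dataset (the ‘ex:’ prefix and the predicates are made up for illustration) could look like this:

PREFIX ex: <http://example.org/ns#>
SELECT ?person ?car
WHERE {
  # same subject, two predicates: a semicolon between them, a dot at the end
  ?person ex:has_friend <http://example.org/people/Alice> ;
          ex:has_car    ?car .
}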

The book Learning SPARQL by Bob DuCharme is an excellent overview and teaches you far more than you will probably ever need.

Below is a collection of SPARQL queries I often copy/paste. A kind of cheat sheet, if you wish.

All graph triples

Selecting everything in the default graph is as simple as:

SELECT *   {
  ?s ?p ?o.
}

If you want all triples in all (named) graphs:

SELECT *   {
  graph ?g {?s ?p ?o}  
}

or a specific named graph (with a made-up graph IRI, to be replaced by your own):

SELECT *   {
  graph <http://example.org/myGraph> {?s ?p ?o}  
}
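
If you only want to know which named graphs exist in the dataset, you can list their names:

SELECT DISTINCT ?g   {
  graph ?g {?s ?p ?o}  
}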

Delete all

Deleting everything is similar to the selection above:

DELETE 
  where
  {graph ?g {?s ?p ?o}}

or something more specific, for example deleting every ‘has_friend’ link (an illustrative predicate):

  DELETE 
    where
    {?s <has_friend> ?o}

or

delete {?s ?p ?o}
where {
  ?s ?p ?o .
      {?s <has_friend> ?o}  
    }

Note that one cannot select or delete a single node or predicate as such: nodes and links are concepts from a graph database and do not exist in a triple store, which only knows triples.
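
In practice, removing a ‘node’ thus means removing every triple in which it appears, either as subject or as object. A sketch, with a made-up resource URI:

DELETE { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  FILTER (?s = <http://example.org/thing/42> || ?o = <http://example.org/thing/42>)
}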

Collecting triples

A common situation is to select all the triples of nodes which are themselves the result of a query/constraint. Say you have triples of the type (with ‘hasColor’ standing in for whatever predicate your data actually uses)

?something <hasColor> "red"

but you actually want all the triples attached to ‘?something’. This is done with a subquery:

  SELECT ?s ?k ?m 
  where {
      ?s ?k ?m
      {
          select (?something as ?s) 
               {
                  ?something <hasColor> "red"
                  }
      }
  }

where we have used an alias in the subquery to rename ‘?something’ to ‘?s’, though the outer pattern could just as well have been ‘?something ?k ?m’ without the alias.
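
For completeness, the same query without the alias:

  SELECT ?something ?k ?m 
  where {
      ?something ?k ?m
      {
          select ?something 
               {
                  ?something <hasColor> "red"
                  }
      }
  }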

Counting

Counting things requires an alias like so:

SELECT (COUNT(*) AS ?count)
WHERE
{
  ?s ?p ?o.
} 
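
The same aliasing works together with a GROUP BY, for example to count how often each predicate occurs:

SELECT ?p (COUNT(*) AS ?count)
WHERE
{
  ?s ?p ?o.
}
GROUP BY ?p
ORDER BY DESC(?count)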

Using text

You can place various additional constraints on the placeholders, like filtering on the content of the URI:

select distinct ?s
where
{  ?s ?p ?o .
   FILTER (STRSTARTS(str(?s), 'http://www.conceptgraph.com/fish')) 
}

You can also use regular expressions and all that.
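
For instance, a case-insensitive regular expression matching any object value that contains ‘fish’:

select distinct ?s ?o
where
{  ?s ?p ?o .
   FILTER regex(str(?o), "fish", "i") 
}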

Get degrees

Performing typical graph-like operations and traversals is a challenge with SPARQL. Some vendors (like Stardog) implement both SPARQL and TinkerPop Gremlin in order to make such operations possible; Gremlin complements SPARQL in that sense.

Still, if you only need, for example, simple degree measurements you can use:

select ?x (count(*) as ?degree) { 
    { ?x ?p ?o } union
    { ?s ?p ?x }
}
group by ?x order by desc(?degree)
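
If you only care about the out-degree (or, analogously, the in-degree) you can drop the union:

select ?x (count(*) as ?outDegree) { 
    ?x ?p ?o
}
group by ?x order by desc(?outDegree)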

DBPedia

Querying DBpedia can be a lot of fun, but be aware that some queries will time out and that DBpedia consists of billions of triples:

PREFIX  dbpedia-owl:  <http://dbpedia.org/ontology/>
PREFIX      dbpprop:  <http://dbpedia.org/property/>
SELECT  ?name ?p ?o
  WHERE 
    {
      ?s  dbpprop:label  ?name .
      ?name ?p ?o.
      FILTER (contains( str(?name), "London"))
    }
    limit 100

or, for example, everything typed as a weapon whose URI contains ‘bomb’:

PREFIX  dbpedia-owl:  <http://dbpedia.org/ontology/>
SELECT  ?s         
  WHERE 
    {
      ?s  a  dbpedia-owl:Weapon .          
      FILTER (contains( str(?s), "bomb"))    
    }
limit 100

Constrain to English only

Triple stores and SPARQL are intrinsically multilingual. Literals can be adorned with a language tag and if you only want, for example, the English ones you can use a filter with a language match:

 FILTER(LANG(?o) = "" || LANGMATCHES(LANG(?o), "en"))
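
In context, this could look like the following query, fetching only the English labels of a DBpedia resource (rdfs is the usual W3C prefix):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?o
WHERE {
  <http://dbpedia.org/resource/Brussels> rdfs:label ?o .
  FILTER(LANG(?o) = "" || LANGMATCHES(LANG(?o), "en"))
}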

Using dates

Datetime literals can be filtered out with the appropriate XML Schema prefix, like so (the ‘hasTitle’, ‘isRepeat’ and ‘hasDate’ predicates are placeholders for whatever your data uses):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select *
where{
    ?s ?p ?o
    {
        select ?s {?s <hasTitle> 'Late Night News'
                      ; <isRepeat> 'False'
                      ; <hasDate> ?date .
                      FILTER (xsd:date(?date) >= "2012-01-01"^^xsd:date)
                   }
    }    
}
LIMIT 1000
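
SPARQL also has date accessor functions such as YEAR(), MONTH() and DAY(), so a filter on the year alone could be written as:

FILTER (YEAR(xsd:dateTime(?date)) >= 2012)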

Insert data

Note that you need to execute this on the ‘update’ endpoint. For example, with made-up ‘ns:’ and ‘home:’ namespaces and triples:

PREFIX ns: <http://example.org/ns#>
PREFIX home: <http://example.org/home#>
INSERT  data { home:Alice ns:likes home:Jazz }

If you need to insert multiple triples you’d use

PREFIX ns: <http://example.org/ns#>
PREFIX home: <http://example.org/home#>
INSERT  data {
    home:Alice ns:likes home:Jazz .
    home:Alice ns:has_friend home:Bob .
    home:Bob ns:likes home:Blues .
  }

Update a counter

Some queries which are trivial in SQL are quite challenging in SPARQL; updating a counter is one of them. For example, with a placeholder subject and counter predicate:

DELETE {   <http://example.org/thing> <http://example.org/counter> ?outdated }
INSERT {   <http://example.org/thing> <http://example.org/counter> ?updated }
WHERE {
    OPTIONAL {   <http://example.org/thing> <http://example.org/counter> ?outdated }
    BIND ((IF(BOUND(?outdated), ?outdated + 1, 1)) AS ?updated)
}

Representing results in Python with Pandas

If you use Jupyter or nteract for inline experimentation you can use something like the following to get the result into Pandas; here ‘store’ is assumed to be an rdflib Graph and the inner select uses a placeholder type constraint. Once in a Pandas frame, things towards machine learning are easier.

q = f"""
  select (?s as ?id) (?k as ?property) (?m as ?value)
  where
  {{
    ?s ?k ?m
         {{
         select ?s
                {{
                  ?s
                }}
         }}
  }}
 """
results = store.query(q)
import pandas
df = pandas.DataFrame.from_dict(results.bindings)
r = []
for name, group in df.groupby(rdflib.term.Variable('id')):
    x = {
        "id" : name.toPython()
    }
    for v in group.T.to_dict().values():
        x[v[rdflib.Variable('property')].toPython()] =  v[rdflib.Variable('value')].toPython()
    r.append(x)
r        

Using SPARQL in R

You can install the SPARQL package in R via

install.packages("SPARQL")

and once present you can use something like this to return semantic data:

library(SPARQL)
data <- SPARQL("http://localhost:3030/Test/sparql", query = "select * where {?s ?p ?o.}")
data$results