Introduction to semantics: Python

The code below assumes a semantic store endpoint is available at localhost:3030, which is the default of a local Fuseki server. The code is not actually tied to Fuseki: it only relies on SPARQL 1.1, so you could just as well run an AllegroGraph service in a Docker container or point to a remote Stardog service.

If they are not already installed, rdflib and SPARQLWrapper are just a pip away (`pip install rdflib sparqlwrapper`); the latter is what you use to connect to the service.

The query and update endpoints are separate, but you can unify them in one object like so:

This allows you to insert data straight away:

and to query it you would use something like:

Note that the result bindings are dictionaries keyed by rdflib.Variable, not by plain strings. The query can also be copy/pasted into YASGUI, provided you set the appropriate endpoint in the dropdown box.

The toPython method is a utility that uniformly converts URIs and literals for you.

Let’s add some data to our store first so we can demo some more techniques with it. The iris dataset is the hello-world equivalent of data science and is useful for some of the ideas described later on:
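Loading it with pandas is normally a one-liner against the UCI CSV; the sketch below inlines the first few rows instead so it also runs offline (the column names are the conventional ones):

```python
import pandas as pd

names = ["sepal-length", "sepal-width", "petal-length", "petal-width", "class"]
# Normally: df = pd.read_csv(<UCI iris.data URL>, names=names)
# A handful of rows is inlined here to keep the example self-contained.
rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (4.7, 3.2, 1.3, 0.2, "Iris-setosa"),
    (4.6, 3.1, 1.5, 0.2, "Iris-setosa"),
    (5.0, 3.6, 1.4, 0.2, "Iris-setosa"),
]
df = pd.DataFrame(rows, columns=names)
print(df.head())
```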

sepal-length sepal-width petal-length petal-width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

How do you push this typical tabular data into a semantic store? There are various ways; below is the fairly standard star-graph type of storage, where each record becomes a hub node with one edge per column.

To extract these records back via SPARQL you need to use a subquery, something like this:

What about graphing these results? This is easily done with NetworkX, but it first requires a bit of stripping: the URIs are too verbose for a graph, so let’s reduce things a bit:

Note that this technique shows you:

  • how to convert triples to standard record sets
  • how to strip noise away
  • how semantic data can be converted to tabular data ready for machine learning

Now, with some NetworkX API you can easily draw the flower network (or part of it):
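A sketch using stripped labels (the record/class pairs are illustrative, and `nx.draw` needs matplotlib installed; the Agg backend keeps it runnable on a headless machine):

```python
import networkx as nx
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere
import matplotlib.pyplot as plt

G = nx.Graph()
# Stripped record/class pairs, as produced by the earlier queries.
for rec, cls in [("record0", "Iris-setosa"), ("record1", "Iris-setosa"),
                 ("record2", "Iris-versicolor")]:
    G.add_edge(rec, cls)

nx.draw(G, with_labels=True, node_color="lightblue")
plt.savefig("iris-network.png")
```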

There are of course many ways to approach the data: with H2O Flow, with Zeppelin, and so on.

Another way to visualize things is with YASGUI. For example, after having pushed the iris data above you can visualize the distribution of a feature:

YASGUI dataviz

How many records do we have in the triple store?

How many have the word ‘setosa’?

There are plenty of other fun things you can do with triples and Python, but I hope this gets you started.