Tag Archives: Linked Data

The Difference between FROM and FROM NAMED in SPARQL, and the SeRQL Alternative

The SYNAT project makes intensive use of semantic web technologies. We store data in an RDF repository (OWLIM) and use the SPARQL and SeRQL languages to query it. The former is a W3C standard; the latter, proposed by Aduna (the producer of the Sesame RDF repository), is easier to use, at least for some of us.

Last week we realized that, however often we use SPARQL, it was about time to fully understand the difference between FROM and FROM NAMED. Finding a reliable and complete source for this information turned out to be harder than expected, so we decided to write this post (based on this post on a team member’s private blog) to clear things up.

One of the bigger problems seems to be the naming itself. Both FROM and FROM NAMED concern named graphs, which we hold responsible for a lot of misunderstandings around these clauses. Below is a concise Q&A section that describes the situation.

If you do not declare FROM or FROM NAMED, what exactly do you query?
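You query the active graph, which at the start of query evaluation is the default graph of the dataset the query service provides. That default graph does not need to be the unnamed graph of the store! Its contents are implementation-defined; in OWLIM, for instance, it is the whole of the repository.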

Example (from the SPARQL specification):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?nameX ?nameY ?nickY
WHERE
  { ?x foaf:knows ?y ;
       foaf:name ?nameX .
    ?y foaf:name ?nameY .
    OPTIONAL { ?y foaf:nick ?nickY }
  }

What is the active graph?
It is the graph (or graphs) against which triple patterns are matched. Outside a GRAPH block, and when FROM and FROM NAMED are not used, it might be the unnamed graph, the whole repository contents… or possibly something else, depending on the implementation.

What is the default graph?
The default graph is the graph without a name (in Sesame terms, without a context): the graph whose statements really are triples rather than quads.
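To make the difference between "the unnamed graph" and "what a query without FROM sees" concrete, here is a minimal Python sketch. It is a toy model, not OWLIM's or Sesame's code: the store is assumed to be a set of quads, a graph value of None stands for "no context", and the union_default flag is a hypothetical switch between the two behaviours described above.

```python
from typing import Optional, Set, Tuple

# A quad is (subject, predicate, object, graph); graph None means "no context".
Quad = Tuple[str, str, str, Optional[str]]
Triple = Tuple[str, str, str]

# Hypothetical repository contents.
STORE: Set[Quad] = {
    (":alice", "foaf:name", '"Alice"', None),
    (":alice", "foaf:knows", ":bob", "<http://example.org/alice>"),
    (":bob", "foaf:name", '"Bob"', "<http://example.org/bob>"),
}


def unnamed_graph(store: Set[Quad]) -> Set[Triple]:
    """Statements stored without a context: the 'true' triples."""
    return {(s, p, o) for (s, p, o, g) in store if g is None}


def union_of_all_graphs(store: Set[Quad]) -> Set[Triple]:
    """Every statement in the repository, whatever its context."""
    return {(s, p, o) for (s, p, o, _g) in store}


def default_graph(store: Set[Quad], union_default: bool) -> Set[Triple]:
    """What a query without FROM sees; which variant applies is implementation-defined."""
    return union_of_all_graphs(store) if union_default else unnamed_graph(store)


print(len(default_graph(STORE, union_default=False)))  # only the context-less triple
print(len(default_graph(STORE, union_default=True)))   # all three triples
```

With union_default=True (the OWLIM-like behaviour described above), the foaf:knows triple stored in a named graph is visible to a query that declares no FROM at all.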

What does the FROM clause change?
If you use the FROM clause, you restrict the set of graphs that are queried: only the named graph(s) given in the FROM clause(s) are used while matching the patterns outside GRAPH blocks. If there are several FROM clauses, their graphs are merged into a single default graph.

Example. Only triples from the <http://example.org/foaf/aliceFoaf> graph will be used.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT  ?name
FROM    <http://example.org/foaf/aliceFoaf>
WHERE   { ?x foaf:name ?name }
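The FROM behaviour can be sketched in a few lines of Python. This is a toy model under stated assumptions: the repository is a hypothetical dict from graph IRI to a set of triples, the bobFoaf graph is invented for contrast, and the merge ignores blank-node renaming, which a real RDF merge performs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical repository contents: graph IRI -> triples stored in that graph.
GRAPHS: Dict[str, Set[Triple]] = {
    "http://example.org/foaf/aliceFoaf": {("_:a", "foaf:name", '"Alice"')},
    "http://example.org/foaf/bobFoaf": {("_:b", "foaf:name", '"Bob"')},
}


def default_graph_from(graphs: Dict[str, Set[Triple]], from_iris: Iterable[str]) -> Set[Triple]:
    """Merge the graphs listed in FROM clauses into one default graph.

    Simplification: blank node labels are plain strings, so they are not
    kept apart across graphs as a real RDF merge would do.
    """
    merged: Set[Triple] = set()
    for iri in from_iris:
        merged |= graphs.get(iri, set())
    return merged


def match_name(default: Set[Triple]) -> List[str]:
    """Evaluate { ?x foaf:name ?name } against the default graph only."""
    return sorted(o for (_s, p, o) in default if p == "foaf:name")


names = match_name(default_graph_from(GRAPHS, ["http://example.org/foaf/aliceFoaf"]))
print(names)
```

Listing both IRIs in FROM would merge both graphs, so the same pattern would then return both names.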

What does the FROM NAMED clause change?
If you use FROM NAMED, the graphs listed there become the named graphs of the query’s dataset: every GRAPH pattern (GRAPH ?g or GRAPH <iri>) is matched only against those graphs. They are not added to the default graph.

Example (which combines FROM and FROM NAMED). ?g will be matched either to <http://example.org/alice> or to <http://example.org/bob>, but to no other named graph.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?who ?g ?mbox
FROM <http://example.org/dft.ttl>
FROM NAMED <http://example.org/alice>
FROM NAMED <http://example.org/bob>
WHERE
{
   ?g dc:publisher ?who .
   GRAPH ?g { ?x foaf:mbox ?mbox }
}
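The query above can be traced with a toy Python evaluator. This is a minimal sketch, not how any real engine works: the dft.ttl contents, the mailboxes and the extra carol graph are hypothetical data, and only the two patterns of this particular query are evaluated. It shows that a graph mentioned in the default graph but not listed in FROM NAMED is never reached by GRAPH ?g.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical default graph, as if loaded by FROM <http://example.org/dft.ttl>.
DEFAULT: Set[Triple] = {
    ("<http://example.org/alice>", "dc:publisher", '"Alice"'),
    ("<http://example.org/bob>", "dc:publisher", '"Bob"'),
    ("<http://example.org/carol>", "dc:publisher", '"Carol"'),
}

# Hypothetical repository: every graph that exists in the store.
REPOSITORY: Dict[str, Set[Triple]] = {
    "<http://example.org/alice>": {("_:a", "foaf:mbox", "<mailto:alice@work.example>")},
    "<http://example.org/bob>": {("_:b", "foaf:mbox", "<mailto:bob@oldcorp.example.org>")},
    "<http://example.org/carol>": {("_:c", "foaf:mbox", "<mailto:carol@example.org>")},
}


def named_graphs(repo: Dict[str, Set[Triple]], from_named: List[str]) -> Dict[str, Set[Triple]]:
    """Only graphs listed in FROM NAMED become named graphs of the dataset."""
    return {iri: repo[iri] for iri in from_named if iri in repo}


def evaluate(default: Set[Triple], named: Dict[str, Set[Triple]]) -> List[Tuple[str, str, str]]:
    """?g dc:publisher ?who . GRAPH ?g { ?x foaf:mbox ?mbox }"""
    rows = []
    for (g, p, who) in default:          # pattern outside GRAPH: default graph
        if p != "dc:publisher" or g not in named:
            continue                     # GRAPH ?g ranges over named graphs only
        for (_x, p2, mbox) in named[g]:  # pattern inside GRAPH: graph g only
            if p2 == "foaf:mbox":
                rows.append((who, g, mbox))
    return sorted(rows)


rows = evaluate(
    DEFAULT,
    named_graphs(REPOSITORY, ["<http://example.org/alice>", "<http://example.org/bob>"]),
)
for row in rows:
    print(row)
```

The carol graph exists in the repository and is even mentioned in the default graph, but it produces no row because it is missing from FROM NAMED.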

Can you combine FROM and FROM NAMED?
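Yes, see the question above. In that example, the triple pattern inside GRAPH ?g has to be matched in one of the graphs given in the FROM NAMED clauses, while the pattern outside it is matched against the graph given in the FROM clause. Note that if a query has FROM NAMED clauses but no FROM clause, its default graph is empty.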

What if there is only one FROM NAMED clause?
Then the following two queries yield the same results:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?who ?mbox
FROM <http://example.org/dft.ttl>
FROM NAMED <http://example.org/alice>
WHERE
{
   ?g dc:publisher ?who .
   GRAPH ?g { ?x foaf:mbox ?mbox }
}

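is equivalent to

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?who ?mbox
FROM <http://example.org/dft.ttl>
FROM NAMED <http://example.org/alice>
WHERE
{
   <http://example.org/alice> dc:publisher ?who .
   GRAPH <http://example.org/alice> { ?x foaf:mbox ?mbox }
}

(The FROM NAMED clause stays, because a GRAPH pattern only matches graphs that belong to the query’s dataset, and ?g has to be replaced in every pattern where it occurs.)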
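Why ?g must be replaced everywhere can be checked with a small Python model. It is a sketch on hypothetical data (one default graph from FROM, one named graph from FROM NAMED), not a SPARQL engine. It compares the variable form, the form with the constant substituted in every pattern, and a form where the constant replaces ?g only inside GRAPH.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
ALICE = "<http://example.org/alice>"

# Hypothetical data: default graph from FROM, a single named graph from FROM NAMED.
DEFAULT: Set[Triple] = {
    (ALICE, "dc:publisher", '"Alice"'),
    ("<http://example.org/bob>", "dc:publisher", '"Bob"'),
}
NAMED: Dict[str, Set[Triple]] = {
    ALICE: {("_:a", "foaf:mbox", "<mailto:alice@work.example>")},
}


def mboxes(graph: Set[Triple]) -> List[str]:
    """Objects of foaf:mbox triples in one graph."""
    return [o for (_s, p, o) in graph if p == "foaf:mbox"]


def with_variable_g() -> List[Tuple[str, str]]:
    """?g dc:publisher ?who . GRAPH ?g { ?x foaf:mbox ?mbox }"""
    return sorted(
        (who, m)
        for (g, p, who) in DEFAULT
        if p == "dc:publisher" and g in NAMED
        for m in mboxes(NAMED[g])
    )


def with_constant() -> List[Tuple[str, str]]:
    """<alice> dc:publisher ?who . GRAPH <alice> { ?x foaf:mbox ?mbox }"""
    return sorted(
        (who, m)
        for (s, p, who) in DEFAULT
        if s == ALICE and p == "dc:publisher"
        for m in mboxes(NAMED[ALICE])
    )


def constant_only_in_graph() -> List[Tuple[str, str]]:
    """?g dc:publisher ?who . GRAPH <alice> { ... }  (?g is now unconstrained)"""
    return sorted(
        (who, m)
        for (_g, p, who) in DEFAULT
        if p == "dc:publisher"
        for m in mboxes(NAMED[ALICE])
    )


print(with_variable_g() == with_constant())  # the two forms agree
print(len(constant_only_in_graph()))         # extra row: Bob paired with Alice's mailbox
```

With only one named graph, ?g can take only one value, which is what makes the substitution safe; leaving ?g free in the publisher pattern turns the join into a cross product.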

Is it easier in SeRQL?
Yes. Both FROM and FROM NAMED are covered by the FROM CONTEXT clause, which restricts the path expressions that follow it to the given context (example from the SeRQL specification):

SELECT name, mbox
FROM CONTEXT <http://example.org/context/graph2>
     {x} foaf:name {name};
         foaf:mbox {mbox}
USING NAMESPACE
foaf = <http://xmlns.com/foaf/0.1/>

Dublin Core’s dirty little secret

We recommend reading a series of posts published on “The Reinvigorated Programmer” blog, describing different approaches to expressing the same bibliographic data in different standards. The blog author, a programmer working on a library-related metasearch system, starts from traditional library catalogue cards and considers how to describe an article from a scientific journal. Trying different metadata standards with a very practical approach, the blogger mercilessly reveals their weak sides, in most cases related to different levels of interoperability. The entire adventure is described in three parts:

In the context of Polish digital libraries, the second part is especially interesting, as it focuses on Dublin Core and Dublin Core Terms and shows, in a humorous way, where they fail.

Along the way, the author points out the “Dumb-down Principle” of Dublin Core Terms, according to which qualifiers should only narrow the semantic scope of the qualified element, never extend it. When adding our own qualifiers to the Dublin Core schema, we should do it in such a way that, if the qualifier is removed, the remaining value can still be reused in a meaningful way. For example, qualifying the Dublin Core “Publisher” element with a “Place of publishing” qualifier, which is very popular in Polish digital libraries, breaks this rule.
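The principle can be tested mechanically: strip the qualifiers and see whether each value still makes sense under its unqualified element. Below is a minimal Python sketch, assuming a record is a list of (property, value) pairs. The three dcterms refinements follow DCMI Metadata Terms; the publisher.placeOfPublishing qualifier, the title, the date and the publisher name are hypothetical examples.

```python
from typing import Dict, List, Tuple

# Refinement -> the Dublin Core element it narrows. The dcterms entries follow
# DCMI Metadata Terms; "publisher.placeOfPublishing" is a hypothetical local
# qualifier of the kind criticized above.
REFINES: Dict[str, str] = {
    "dcterms:created": "dc:date",
    "dcterms:issued": "dc:date",
    "dcterms:alternative": "dc:title",
    "publisher.placeOfPublishing": "dc:publisher",
}


def dumb_down(record: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop qualifiers, keeping each value under the element it qualifies."""
    return [(REFINES.get(prop, prop), value) for prop, value in record]


# Hypothetical record of a journal article.
record = [
    ("dc:title", "An example article"),
    ("dcterms:issued", "2010-01-15"),
    ("dc:publisher", "Example Press"),
    ("publisher.placeOfPublishing", "Warsaw"),
]

for prop, value in dumb_down(record):
    print(prop, "=", value)
```

After dumbing down, the issue date is still a valid dc:date, but “Warsaw” ends up as a dc:publisher value, which is exactly the loss of meaning the principle is meant to prevent.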

Besides the above blog posts, we also recommend two interesting documents describing the use of Linked Data in libraries (both traditional and digital):

The first of these texts contains many useful links to tools, systems and standards related to Linked Data and libraries.

Free your data now!

We have already written about the idea of Linked Open Data. Recently the CERN Library, putting this idea into practice, has published its entire library catalogue as a single MARC-XML file. Moreover, work on publishing the same data as RDF is now in progress.

Data from the CERN Library will be used by the Open Library project and biblios.net, which will give access to it via the Z39.50, SRU and OAI-PMH protocols. The file with the bibliographic data, published under the Creative Commons CC0 (“No copyright”) license, can be downloaded from the CERN Library website.

The press release about this event was accompanied by a short advertisement – please take a look:

Europeana publishes White Paper #1

On June 1, 2010 Europeana announced its “White Paper 1 Knowledge = Information in Context: on the Importance of Semantic Contextualisation in Europeana”. Europeana’s first White Paper looks at the key role linked data will play in Europeana’s development and in helping Europe’s citizens make connections between existing knowledge to achieve new cultural and scientific developments.

Linked data gives machines the ability to make associations and put search terms into context. Without it, Europeana could be seen as a simple collection of digital objects. With linked data, the potential is far greater, as the author of the white paper, Prof. Stefan Gradmann, explains.

Professor Gradmann used the word “Paris” as an example: a search may lead to items in the Louvre, located in Paris, where one can also see paintings portraying Paris, the Trojan prince who abducted Helen of Troy. From there, links lead to topics associated with the mythical Apple of Discord and further to the forbidden apple eaten by Adam and Eve.

The example presented in the White Paper shows how linked data will allow Europeana to propose connections between millions of items. These connections can then be used to generate new ideas and knowledge, on a scale not possible before.

Full text of the White Paper is available at Europeana.

TED, TEDx and Open Linked Data

TED (Technology, Entertainment, Design) is a series of meetings at which invited speakers have 18 minutes to present an idea for changing the world that the organizers consider an “idea worth spreading”. TED conferences are held once a year in California, USA, and recordings of the talks are available on the TED website. TED meetings are complemented by the TEDx initiative, which focuses on organizing similar conferences around the world. This year, for the first time, a Polish TEDx meeting was organized: TEDxWarsaw. It was held on the 5th of March, and since yesterday the video recordings from this meeting have been available. A TEDxPoznań meeting is planned for the 28th of May.

In February this year, during the TEDUniversity event, Tim Berners-Lee gave a 5-minute “lecture” on open data. It illustrates, with really interesting examples, how important it is to publish the source data on the Internet instead of publishing only the papers and reports prepared from it. This lecture supplements Tim Berners-Lee’s talk at TED2009, in which he called for the wide publishing of source (raw) data in a way that allows its automated reuse (Linked Data). We recommend watching both talks:

This approach is also an important direction in the development of digital libraries. In projects such as Europeana, the automated reuse of metadata describing digital objects distributed across many digital libraries is already implemented on a large scale. The next step can be a stronger semantic integration of this metadata, as shown for example in the “Thought Lab”, a prototype service for advanced search and exploration of a slice of the information available in Europeana, built on automated semantic integration of the metadata. Another example is the work in the DRIVER II project on so-called “enhanced publications”, complex objects that connect scientific papers with the source data used in them. Examples of such prototype enhanced publications can be found on one of the project websites.