The Theory and Practice of Digital Libraries conference (formerly known as the European Conference on Digital Libraries, ECDL) was held in Paphos (Cyprus) on September 23-27, 2012. PSNC presented two papers, both of which can be found in the conference proceedings published as Lecture Notes in Computer Science (7489):
- Creation of Textual Versions of Historical Documents from Polish Digital Libraries (Adam Dudczak, Miłosz Kmieciak, Marcin Werla)
- Advanced Automatic Mapping from Flat or Hierarchical Metadata Schemas to a Semantic Web Ontology (Justyna Walkowska, Marcin Werla)
The former paper describes the prototype of the Virtual Transcription Laboratory created by PSNC as part of the SYNAT project. The work described in the paper included experiments aimed at training an OCR engine to automatically recognize text in digital scans of old documents (Polish texts printed between the 16th and 17th centuries). The paper explains the rationale behind the prototype, its capabilities, and new development directions.
The latter paper concerns the transformation of data described using traditional metadata schemas (such as MARC 21 or Dublin Core) into ontological formats designed to exist in the Semantic Web and Linked Open Data environment. The paper describes requirements for languages expressing such mapping rules and for the tools that implement them. It also briefly presents the jMet2Ont mapping tool.
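To give a feel for what such a mapping involves, here is a minimal Python sketch of rule-driven mapping from a flat Dublin Core-like record to subject-predicate-object triples. It is only an illustration under our own assumptions: the `RULES` table, the `ex:` property names, and the `map_record` and `slug` helpers are hypothetical and do not reflect the actual rule language or API of jMet2Ont.

```python
# Hypothetical rule table: flat metadata field -> (ontology property, value kind).
# The "ex:" names are illustrative and not taken from jMet2Ont or a real ontology.
RULES = {
    "dc:title": ("ex:hasTitle", "literal"),
    "dc:creator": ("ex:createdBy", "resource"),
    "dc:date": ("ex:creationDate", "literal"),
}


def slug(text):
    """Turn a free-text value into a crude identifier fragment."""
    return "".join(ch if ch.isalnum() else "_" for ch in text.strip())


def map_record(record_id, record):
    """Map one flat record (field -> list of values) to a list of triples."""
    subject = f"ex:record/{record_id}"
    triples = []
    for field, values in record.items():
        rule = RULES.get(field)
        if rule is None:
            continue  # fields without a rule are skipped in this sketch
        prop, kind = rule
        for value in values:
            if kind == "resource":
                # The value becomes a node of its own, so other data can link to it.
                node = f"ex:agent/{slug(value)}"
                triples.append((subject, prop, node))
                triples.append((node, "ex:name", value))
            else:
                triples.append((subject, prop, value))
    return triples
```

The point of the "resource" kind is the one that flat schemas cannot express directly: a creator stops being a string attached to a record and becomes an entity that many records can point to.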

For us, the conference started on Sunday with a so-called doctoral consortium. A doctoral consortium is a meeting during which each PhD student is assigned a mentor who is obliged to read (before the meeting) an extended abstract of the planned PhD thesis, and to prepare a list of comments and questions. During the meeting, each student presents their work and results to date. The mentor is expected to facilitate discussion after the presentation. Such an event is very beneficial for the students, who are offered a chance to learn experts’ opinions on the strengths and weaknesses of their research, all in a safe and friendly environment (the meeting is closed to the public).
The main conference lasted three days, Monday to Wednesday.
An outstanding keynote speech was Cathy Marshall’s (Microsoft Research) Whose content is it anyway? Social media, personal data, and the fate of our digital legacy. The speaker raised a number of interesting issues concerning the transience of digital media, the expectations of the general user, and how the situation has been changed by social media such as Twitter or Facebook. The talk was well prepared and full of surprising points, turnabouts, and inspiring conclusions.
The same subject appeared in a presentation by Hany M. SalahEldeen and Michael L. Nelson of Old Dominion University. In their paper entitled Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost? the authors analyzed archival contents of social media corresponding to six important events in the last few years (including the Egyptian revolution, the H1N1 pandemic, and Michael Jackson’s death). It turns out that after a year 11% of content linked from social media is no longer available. One more year means another dozen percent of dead links. The full paper is available on arXiv.org. This study was also considered important by traditional media, including the BBC.
A large number of papers were dedicated to machine learning applications. Digital library data seem to be a perfect field in which to apply and test machine learning algorithms. Another very interesting talk was Finding Quality Issues in SKOS Vocabularies (Christian Mader, Bernhard Haslhofer, Antoine Isaac). The authors defined a set of quality indicators and good practices for thesauri encoded in the SKOS format, and also created a validation tool called qSKOS.
One of the most interesting events during the conference was the poster and demo session. The best demo contest was won by FrbrVis: An Information Visualization Approach to Presenting FRBR Work Families (Tanja Mercun, Maja Zumer, and Trond Aalberg). The authors, aware that more and more libraries and metadata aggregators are thinking about introducing the FRBR model, set themselves the task of designing an effective way of displaying FRBR data, so that the user could benefit from the model without feeling overwhelmed by it. They proposed four interface options, and then performed usability testing on a large number of users. Two graphical representations were picked as favourites: a concentric one (called sunburst) and a hierarchical one. An unexpected conclusion was that the graph-based representation (popular in the Semantic Web world due to the very nature of RDF data), even though considered attractive at first glance, proved difficult to use. A notable poster was presented in this session by the already mentioned Hany M. SalahEldeen, who studied the temporal intention of users publishing links to online resources in social networks.
Thursday was the day of workshops. Conference participants were given the following choice:
- International Workshop on Supporting Users’ Exploration of Digital Libraries
- Networked Knowledge Organisation Systems and Services. The 11th European Networked Knowledge Organisation Systems (NKOS) Workshop
- 2nd International Workshop on Semantic Digital Archives
The NKOS workshop was dedicated mainly to the ISO 25964 thesaurus standard and its relation to SKOS. As of now, only the first part of the standard is ready. The documents describing the standard are not available for free, but a number of materials can be downloaded from the ISO 25964 webpage, including the XML schema (XSD) definition.
The archives workshop included a Semantic Technologies & Ontologies session in which Vladimir Alexiev of Ontotext gave a very interesting presentation about CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Referring to the study A New Framework for Querying Semantic Networks by FORTH (Foundation for Research and Technology – Hellas), he presented a search model which translates the 82 classes and 142 properties of CIDOC CRM to a smaller number of so-called fundamental classes (e.g. Person, Place) and properties, making the search much easier. Ontotext is the producer of the RDF repository called OWLIM. The presentation also described a set of OWLIM reasoning rules producing the simplified model.
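The idea of collapsing many detailed properties into a few fundamental relations can be sketched in a few lines of Python. This is only our own toy illustration, not the OWLIM rule syntax or the rules shown in the presentation: the data, the `FR_from_place` relation name, and the `follow` and `fundamental` helpers are hypothetical, and only the CIDOC CRM property labels are taken from the standard.

```python
# Toy triples using CIDOC CRM property labels; the data itself is invented.
TRIPLES = {
    ("vase", "P108i_was_produced_by", "production1"),
    ("production1", "P7_took_place_at", "Athens"),
    ("coin", "P53_has_former_or_current_location", "Paphos"),
}

# Hypothetical rule: each property path below implies the same fundamental relation.
THING_FROM_PLACE = [
    ["P53_has_former_or_current_location"],
    ["P108i_was_produced_by", "P7_took_place_at"],
]


def follow(start, path, triples):
    """Return the nodes reached from `start` by following `path` property by property."""
    nodes = {start}
    for prop in path:
        nodes = {o for (s, p, o) in triples if s in nodes and p == prop}
    return nodes


def fundamental(triples, paths, name):
    """Infer (subject, name, target) for every subject linked to a target by any path."""
    subjects = {s for (s, _, _) in triples}
    inferred = set()
    for subject in subjects:
        for path in paths:
            for target in follow(subject, path, triples):
                inferred.add((subject, name, target))
    return inferred
```

Calling `fundamental(TRIPLES, THING_FROM_PLACE, "FR_from_place")` links the vase to Athens and the coin to Paphos through a single relation, so a search interface only has to query that one relation instead of every underlying path.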
In the shortest of the workshops, on supporting users’ exploration (additional materials available here), the participants had a chance to listen to a talk by David Haskiya (Europeana Foundation) about Europeana’s existing and planned features supporting users’ exploration of resources. The workshop ended with an interesting panel discussion in which the most prominent subject was the needs and expectations of current and future users of digital libraries, especially in the context of the youngest generation (see the video below).
The conference was held in a beautiful and historically significant corner of Europe, which unfortunately is very hard to reach from Poland. Last year’s location (Berlin) was easier to get to for most of the participants. Next year the conference is to be held in Malta.

Post authors: Adam Dudczak, Justyna Walkowska, Marcin Werla