
TPDL 2012: Theory and Practice of Digital Libraries

The Theory and Practice of Digital Libraries conference (formerly known as the European Conference on Digital Libraries, ECDL) was held in Paphos, Cyprus, on September 23-27, 2012. PSNC presented two papers, published in the conference proceedings as volume 7489 of Lecture Notes in Computer Science.

The first paper describes the prototype of the Virtual Transcription Laboratory created by PSNC as part of the SYNAT project. The work described in the paper included experiments aimed at training an OCR engine to recognize text in digital scans of old documents (Polish texts printed in the 16th and 17th centuries). The paper explains the rationale behind the prototype, its capabilities, and directions for further development.

The second paper concerns transforming data described with traditional metadata schemas (such as MARC 21 or Dublin Core) into ontology-based formats designed for the Semantic Web and Linked Open Data environment. The paper describes the requirements for languages that express such mapping rules and for the tools that implement them. It also briefly presents the jMet2Ont mapping tool.
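To make the idea of declarative mapping rules more concrete, here is a minimal Python sketch. It is not the jMet2Ont rule language: each rule simply pairs a source field with a target property and states whether the value becomes a literal or a resource. The source field names follow the Dublin Core element set, while the target namespace `http://example.org/onto/`, the property names, and the URI scheme are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical target ontology namespace, used only for illustration.
ONTO = "http://example.org/onto/"


@dataclass(frozen=True)
class Rule:
    source_field: str          # field in the flat source record (e.g. a Dublin Core element)
    target_property: str       # property URI in the target ontology
    as_resource: bool = False  # True: the value becomes a URI node instead of a literal


RULES = [
    Rule("title", ONTO + "hasTitle"),
    Rule("creator", ONTO + "createdBy", as_resource=True),
    Rule("date", ONTO + "createdIn"),
]


def escape_literal(value: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def slug(value: str) -> str:
    """Turn a value into a URI-safe local name (a simplistic, hypothetical scheme)."""
    return "".join(c if c.isalnum() else "_" for c in value.strip())


def apply_rules(record_id: str, record: dict, rules: list) -> list:
    """Apply each rule to every value of its source field; return N-Triples lines."""
    subject = f"<{ONTO}record/{slug(record_id)}>"
    triples = []
    for rule in rules:
        for value in record.get(rule.source_field, []):
            if rule.as_resource:
                obj = f"<{ONTO}agent/{slug(value)}>"
            else:
                obj = f'"{escape_literal(value)}"'
            triples.append(f"{subject} <{rule.target_property}> {obj} .")
    return triples


dc_record = {"title": ["Heart of Darkness"], "creator": ["Joseph Conrad"]}
for line in apply_rules("hod-1", dc_record, RULES):
    print(line)
```

Keeping the rules as data rather than code is what makes such mappings reviewable and reusable; the `date` rule produces nothing here because the sample record has no date.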


For us, the conference started on Sunday with a so-called doctoral consortium. A doctoral consortium is a meeting at which each PhD student is assigned a mentor, who reads an extended abstract of the planned thesis in advance and prepares a list of comments and questions. During the meeting, each student presents their work and results to date, and the mentor then leads the discussion. Such an event is very valuable for the students, who get a chance to hear experts' opinions on the strengths and weaknesses of their research in a safe and friendly environment (the meeting is closed to the public).

The main conference lasted three days, Monday to Wednesday.

An outstanding keynote speech was Cathy Marshall's (Microsoft Research) Whose content is it anyway? Social media, personal data, and the fate of our digital legacy. She raised a number of interesting issues concerning the transience of digital media, the expectations of ordinary users, and how social media such as Twitter or Facebook have changed the situation. The talk was well prepared and full of surprising points, twists, and inspiring conclusions.

The same subject came up in a presentation by Hany M. SalahEldeen and Michael L. Nelson of Old Dominion University. In their paper Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?, the authors analyzed archived social media content related to six important events of the last few years (including the Egyptian revolution, the H1N1 pandemic, and Michael Jackson's death). It turns out that after a year, 11% of the content linked from social media is no longer available, and one more year adds roughly another dozen percent of dead links. The full paper is available on arXiv.org. The study also attracted the attention of traditional media, including the BBC.

A large number of papers were dedicated to machine learning applications; digital library data seem to be an ideal field for applying and testing machine learning algorithms. Another very interesting talk was Finding Quality Issues in SKOS Vocabularies (Christian Mader, Bernhard Haslhofer, Antoine Isaac). The authors defined a set of quality indicators and good practices for thesauri encoded in SKOS, and created qSKOS, a tool that checks vocabularies for these quality issues.
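As an illustration of what automated vocabulary quality checks can look like, the sketch below implements three simple ones in plain Python: concepts without a preferred label, concepts with more than one preferred label in the same language (the SKOS Reference allows at most one skos:prefLabel per language tag), and orphan concepts with no links to other concepts. The checks are our own illustrative choice, not qSKOS's implementation, and the in-memory `Concept` structure is a hypothetical stand-in for parsed SKOS/RDF data.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    # (label, language tag) pairs, mirroring skos:prefLabel values
    pref_labels: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def missing_pref_label(concepts):
    """URIs of concepts that have no preferred label at all."""
    return [c.uri for c in concepts if not c.pref_labels]


def duplicate_pref_label_langs(concepts):
    """URIs of concepts with more than one preferred label in the same language."""
    flagged = []
    for c in concepts:
        per_language = Counter(lang for _, lang in c.pref_labels)
        if any(n > 1 for n in per_language.values()):
            flagged.append(c.uri)
    return flagged


def orphan_concepts(concepts):
    """URIs of concepts not linked to any other concept, in either direction."""
    linked = set()
    for c in concepts:
        targets = c.broader + c.narrower + c.related
        if targets:
            linked.add(c.uri)
            linked.update(targets)
    return [c.uri for c in concepts if c.uri not in linked]


vocab = [
    Concept("ex:art", [("art", "en"), ("sztuka", "pl")], narrower=["ex:painting"]),
    Concept("ex:painting", [("painting", "en"), ("paintings", "en")]),
    Concept("ex:misc"),
]
print(missing_pref_label(vocab))          # ['ex:misc']
print(duplicate_pref_label_langs(vocab))  # ['ex:painting']
print(orphan_concepts(vocab))             # ['ex:misc']
```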

One of the most interesting events during the conference was the poster and demo session. The best demo award went to FrbrVis: An Information Visualization Approach to Presenting FRBR Work Families (Tanja Mercun, Maja Zumer, and Trond Aalberg). Aware that more and more libraries and metadata aggregators are considering the FRBR model, the authors set out to design an effective way of displaying FRBR data, so that users could benefit from the model without feeling overwhelmed by it. They proposed four interface options and then ran usability tests with a large number of users. Two graphical representations emerged as favourites: a concentric one (a so-called sunburst) and a hierarchical one. An unexpected conclusion was that the graph-based representation (popular in the Semantic Web world because of the graph nature of RDF data), although attractive at first glance, proved difficult to use. A notable poster in this session was presented by the previously mentioned Hany M. SalahEldeen, who studied the temporal intention of users publishing links to online resources in social networks.
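The sunburst representation can be sketched with a generic layout computation: each node of a FRBR work family tree is drawn in the ring matching its depth, and gets an angular span proportional to the number of leaves beneath it. This is a standard sunburst layout written for illustration, not the FrbrVis code; the tree and its labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def leaf_count(node):
    """Number of leaves under a node (a leaf counts as 1)."""
    if not node.children:
        return 1
    return sum(leaf_count(child) for child in node.children)


def sunburst(node, start=0.0, end=360.0, depth=0, out=None):
    """Return (label, depth, start_deg, end_deg) for every node, parents first.

    Depth selects the ring; each child's arc is proportional to its leaf count.
    """
    if out is None:
        out = []
    out.append((node.label, depth, start, end))
    total = leaf_count(node)
    angle = start
    for child in node.children:
        span = (end - start) * leaf_count(child) / total
        sunburst(child, angle, angle + span, depth + 1, out)
        angle += span
    return out


work = Node("Work", [
    Node("Expression: original text", [Node("Manifestation A"), Node("Manifestation B")]),
    Node("Expression: translation", [Node("Manifestation C")]),
])

for label, depth, a, b in sunburst(work):
    print(f"{'  ' * depth}{label}: ring {depth}, {a:.0f}-{b:.0f} deg")
```

Sizing arcs by leaf count keeps expressions with many editions visually prominent, which is one reasonable choice; sizing by item count would be another.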

Thursday was the day of workshops, and conference participants could choose among several parallel events.

The NKOS workshop was dedicated mainly to the ISO 25964 thesaurus standard and its relation to SKOS. Only the first part of the standard was ready at the time of the workshop. The standard itself is not freely available, but a number of supporting materials, including the XML Schema (XSD) definition, can be downloaded from the ISO 25964 webpage.
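To illustrate how a traditional thesaurus relates to SKOS, the sketch below converts entries that use the classic thesaurus relationship tags into Turtle with the commonly used correspondences: the preferred term becomes skos:prefLabel, UF (non-preferred) terms become skos:altLabel, and BT, NT, and RT become skos:broader, skos:narrower, and skos:related. The dictionary layout, the `ex:` namespace, and the URI scheme are hypothetical; this is not the ISO 25964 XML schema.

```python
# Common thesaurus tags mapped to their usual SKOS counterparts.
SKOS_RELATION = {"BT": "skos:broader", "NT": "skos:narrower", "RT": "skos:related"}


def concept_id(term):
    """Hypothetical local-name scheme: lower case, spaces replaced by underscores."""
    return "ex:" + term.lower().replace(" ", "_")


def to_turtle(thesaurus, lang="en"):
    """Serialize {preferred term: {tag: [terms]}} as SKOS in Turtle.

    Assumes terms contain no double quotes. UF lists non-preferred terms;
    BT/NT/RT must name other preferred terms in the same dictionary.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix ex: <http://example.org/thesaurus/> .",
        "",
    ]
    for term, relations in thesaurus.items():
        props = [f'skos:prefLabel "{term}"@{lang}']
        props += [f'skos:altLabel "{alt}"@{lang}' for alt in relations.get("UF", [])]
        for tag, skos_prop in SKOS_RELATION.items():
            for target in relations.get(tag, []):
                if target not in thesaurus:
                    raise KeyError(f"{tag} target {target!r} is not a preferred term")
                props.append(f"{skos_prop} {concept_id(target)}")
        lines.append(f"{concept_id(term)} a skos:Concept ;\n    " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


thesaurus = {
    "Manuscripts": {"BT": ["Documents"], "UF": ["Handwritten documents"]},
    "Documents": {"NT": ["Manuscripts"]},
}
print(to_turtle(thesaurus))
```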

The archives workshop included a Semantic Technologies & Ontologies session in which Vladimir Alexiev of Ontotext gave a very interesting presentation, CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Referring to the FORTH (Foundation for Research and Technology – Hellas) study A New Framework for Querying Semantic Networks, he presented a search model that maps the 82 classes and 142 properties of CIDOC CRM onto a much smaller number of so-called fundamental classes (e.g. Person, Place) and properties, which makes searching much easier. Ontotext develops OWLIM, an RDF repository, and the presentation also described a set of OWLIM reasoning rules that derive the simplified model.
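The idea of deriving a simpler search relation from a chain of CIDOC CRM properties can be illustrated with a small forward-chaining step in Python. The chain below (an object was produced by an event that took place at a place, and places fall within larger places) uses the CRM properties P108i_was_produced_by, P7_took_place_at, and P89_falls_within. The derived property `fr:from_place`, the sample identifiers, and the tuple-based rule format are hypothetical; they do not reproduce OWLIM's rule syntax or the exact fundamental relations from the FORTH study.

```python
def apply_chain_rules(triples, rules):
    """Naive forward chaining to a fixpoint.

    Each rule (p1, p2, derived) adds (x, derived, z) whenever both
    (x, p1, y) and (y, p2, z) are present. Derived triples are fed back in,
    so one rule's output can trigger another rule.
    """
    facts = set(triples)
    while True:
        new = set()
        for p1, p2, derived in rules:
            # Index the second hop of the chain by its subject.
            second_hop = {}
            for s, p, o in facts:
                if p == p2:
                    second_hop.setdefault(s, []).append(o)
            # Join the first hop with the second hop on the shared node y.
            for x, p, y in facts:
                if p == p1:
                    for z in second_hop.get(y, []):
                        triple = (x, derived, z)
                        if triple not in facts:
                            new.add(triple)
        if not new:
            return facts
        facts |= new


data = [
    ("ex:vase", "crm:P108i_was_produced_by", "ex:production1"),
    ("ex:production1", "crm:P7_took_place_at", "ex:athens"),
    ("ex:athens", "crm:P89_falls_within", "ex:greece"),
]
rules = [
    # object -> production event -> place  =>  object from_place place
    ("crm:P108i_was_produced_by", "crm:P7_took_place_at", "fr:from_place"),
    # propagate from_place up the place hierarchy
    ("fr:from_place", "crm:P89_falls_within", "fr:from_place"),
]

inferred = apply_chain_rules(data, rules)
print(sorted(t for t in inferred if t[1] == "fr:from_place"))
# [('ex:vase', 'fr:from_place', 'ex:athens'), ('ex:vase', 'fr:from_place', 'ex:greece')]
```

A user searching for "objects from Greece" then needs a single property instead of knowing the event-based path through the full model.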

In the shortest of the workshops, on supporting users' exploration, the participants heard a talk by David Haskiya (Europeana Foundation) about Europeana's existing and planned features that help users explore its resources. The workshop ended with an interesting panel discussion whose most prominent subject was the needs and expectations of current and future users of digital libraries, especially the youngest generation.

The conference was held in a beautiful and historically significant corner of Europe, which unfortunately is very hard to reach from Poland. Last year's location (Berlin) was easier to get to for most of the participants. Next year the conference is to be held in Malta.


Post authors: Adam Dudczak, Justyna Walkowska, Marcin Werla

CIDOC 2012: Enriching Cultural Heritage


The CIDOC 2012: Enriching Cultural Heritage conference was held in Helsinki (World Design Capital 2012) on June 10-14. The conference is organized annually by CIDOC/ICOM, the International Committee for Documentation of the International Council of Museums. Last year the conference was held in Sibiu, Romania, and we published a short post about it.

PSNC is interested in the work of CIDOC because we have started using the CIDOC CRM model as the main ontology for organizing the metadata stored in the Semantic Web knowledge base we have built in the SYNAT project. The knowledge base contains information about resources of different types (currently library, catalogue, and museum resources), described with different metadata schemas. We needed a description format to which we could map the existing heterogeneous records. CIDOC CRM (Conceptual Reference Model) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation, so it was a natural choice. There is also an OWL implementation of CIDOC CRM, which is very useful in the Semantic Web environment.

While trying to map library metadata to CIDOC CRM, we realized that representing books is more complicated than representing museum objects (even though, of course, museums can hold books, and libraries can own old volumes of historical value). You can find more information about this issue in the team's publication list.

First we tried to cope with the problem by introducing our own CIDOC CRM extensions (mostly subclasses and subproperties), but then we switched to FRBRoo. FRBRoo is an extension of CIDOC CRM, created by the CIDOC committee, that is also compliant with the FRBR (Functional Requirements for Bibliographic Records) model specified by IFLA (International Federation of Library Associations and Institutions). The most distinctive feature of FRBR is that it describes a publication (e.g. a book) on four levels:

  • work (e.g. ‘Heart of Darkness’ by Joseph Conrad),
  • expression (the intellectual content of the first English edition of ‘Heart of Darkness’),
  • manifestation (all physical copies of the edition, as a set),
  • item (a particular copy from that set).
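The four levels form a containment hierarchy: a work is realized through expressions, an expression is embodied in manifestations, and a manifestation is exemplified by items. A minimal Python sketch of that structure follows; the class and field names are hypothetical and do not correspond to FRBRoo class identifiers. It also simplifies FRBR, where a single manifestation can embody more than one expression, into a strict tree.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    identifier: str  # e.g. the shelf mark of one physical copy


@dataclass
class Manifestation:
    description: str  # an edition, understood as the set of its physical copies
    items: list = field(default_factory=list)


@dataclass
class Expression:
    description: str  # a particular realization of the work, e.g. the English text
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: list = field(default_factory=list)

    def all_items(self):
        """Walk the four levels top-down and yield every item of the work."""
        for expression in self.expressions:
            for manifestation in expression.manifestations:
                yield from manifestation.items


work = Work("Heart of Darkness", "Joseph Conrad", [
    Expression("English text", [
        Manifestation("first English edition", [Item("copy-1"), Item("copy-2")]),
        Manifestation("later edition", [Item("copy-3")]),
    ]),
])
print([item.identifier for item in work.all_items()])  # ['copy-1', 'copy-2', 'copy-3']
```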

During this CIDOC conference we presented our results: the mapping from the MARC 21 and PLMET schemas to FRBRoo and the challenges related to this process.

The conference included a number of workshops (among them a very interesting one on CIDOC CRM, FRBRoo, EDM, and CRM Dig by Martin Doerr), CIDOC working group meetings, keynotes, and ‘regular’ presentations. The main themes of the conference were:

  • Co-operation & exchange,
  • Social media,
  • Semantic Web,
  • Digital technologies and intangible cultural heritage,
  • Innovations in documentation,
  • Multilingualism and regional cultures.

“CIDOC 2011 – Knowledge Management and Museums” Conference

The “CIDOC 2011 – Knowledge Management and Museums” conference took place in Sibiu, Romania, on September 4-9, 2011. The conference is an annual event organized by ICOM-CIDOC, the International Committee for Documentation of the International Council of Museums.

The conference participants came from very different but cooperating communities: museologists, librarians, programmers and museum software vendors, researchers in the field of ontologies and the Semantic Web, as well as people and institutions concerned with museum documentation standards.

The conference included meetings of CIDOC working groups:

  • Archaeological Sites
  • Conceptual Reference Model Special Interest Group
  • Co-reference
  • Data Harvesting and Interchange
  • Digital preservation
  • Documentation Standards
  • Information Centres
  • Multimedia
  • Transdisciplinary Approaches in Documentation

Several topics raised at the conference are closely connected with PSNC's work in the SYNAT project. The most prominent ones were:

  • LIDO (Lightweight Information Describing Objects) specification (www.lido-schema.org/) for description of museum resources made available online
  • the recommendation to use persistent, unique identifiers (URIs) for museum resources
  • the FRBRoo ontology, which merges CIDOC CRM and FRBR (Functional Requirements for Bibliographic Records) to properly describe digital resources online (www.nla.gov.au/lis/stndrds/grps/acoc/tillett2004.ppt, http://www.frbr.org/categories/frbroo)
  • the presentation of the WissKI system (http://wiss-ki.eu/, http://www8.informatik.uni-erlangen.de/transdisc/hohmann_cidoc09_wisski-2.pdf). The goals and assumptions of the project are very close to those of SYNAT, and some of the solutions already used in WissKI might be reused in SYNAT.
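The recommendation to use persistent, unique URIs can be illustrated with a small URI-minting function that derives a stable HTTP URI from an institution code and an inventory number, so that the same object always receives the same identifier. The base URI `http://id.example.org/object/` and the path layout are hypothetical, and real persistence also depends on keeping the URIs resolvable over time, which code alone cannot guarantee.

```python
import re
from urllib.parse import quote

BASE = "http://id.example.org/object/"  # hypothetical base URI


def mint_uri(institution: str, inventory_number: str) -> str:
    """Build a deterministic URI: the same inputs always yield the same URI."""
    inst = institution.strip().lower()
    if not re.fullmatch(r"[a-z0-9-]+", inst):
        raise ValueError(f"unsupported institution code: {institution!r}")
    # Normalize whitespace so cosmetic differences in catalogue data
    # do not produce a second identifier for the same object.
    number = re.sub(r"\s+", " ", inventory_number.strip())
    # Percent-encode everything, including '/', so the number stays one path segment.
    return BASE + inst + "/" + quote(number, safe="")


print(mint_uri("Demo-Museum", "  A 1234/5 "))
# http://id.example.org/object/demo-museum/A%201234%2F5
```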

The next CIDOC conference will take place in June 2012 in Helsinki. Additionally, a CIDOC “summer school” for people responsible for museum documentation is planned for the summer of 2012.