Monthly Archives: October 2012

Polish Collections in Europeana conference in Toruń

Toruń, AD 1641

The Polish Collections in Europeana conference was held in the medieval city of Toruń on October 18-19. It was organized by the International Centre for Information Management Systems and Services (ICIMSS).

The opening speech, The Decision to Digitise, was given by Eleanor Kenny of the Europeana Foundation. The remaining presentations, delivered in Polish, may be divided into the following four categories:

  • Presentation of Europeana-related projects
  • National IT infrastructure for cultural heritage resources
  • Support from the Ministry of Culture and National Heritage for digitization projects
  • Problems and needs of Polish cultural heritage institutions

Two presentations were given by representatives of the Ministry of Culture and National Heritage: The Digitization Strategy of The Ministry of Culture and National Heritage (Anna Duńczyk-Szulc) and The Project of a Ministerial Portal Dedicated to Cultural Heritage Resources Digitization (Agata Bratek). The portal is to be launched at the beginning of 2013.

A number of Europeana-related projects were presented, including:

  • Europeana Photography (Europeana Photography – Documentation of the First Century of Photography, Marta Miskowiec, Museum of History of Photography in Cracow, Piotr Kożurno, ICIMSS)
  • Athena (Athena and Athena Plus – Projects Encouraging Museums to Cooperate with Europeana, Maria Śliwińska, ICIMSS)
  • Judaica Europeana (Judaica Europeana – Digitizing Jewish Cultural Heritage in Europe, Edyta Kurek, Jewish Historical Institute, Warsaw)
  • APEX (Polish Archives’ Participation in the APEX Project, Anna Matejak, Head Office of State Archives, Warsaw)

Representatives of several major Polish institutions presented their current activities, including those related to Europeana:

  • National Institute of Museology and Collections Protection (National Institute of Museology and Collections Protection, Its Activities and Plans Concerning Museum Objects Digitization, Anna Kuśmidrowicz, Monika Jędralska)
  • National Audiovisual Institute (National Audiovisual Institute’s Digitization Support: Europeana Awareness Project Case Study, Jarosław Czuba)
  • The National Library of Poland (The National Library’s Participation in the Ongoing Europeana Projects, Katarzyna Ślaska)

Poznań Supercomputing and Networking Center prepared a presentation entitled The Digital Libraries Federation: Supporting Institutions of Culture in Making Their Resources Available Online, Metadata Aggregation for Europeana (Marcin Werla, Justyna Walkowska), which is available here (in Polish). In the presentation we describe the role of the Polish Digital Libraries Federation in the Polish digital heritage resources environment and in the context of the Polish IT infrastructure for researchers and science. We also present our cooperation with Europeana, including a number of projects we have been involved in or will be involved in in the near future.

The problems section was opened by a presentation prepared by Prof. Folga-Januszewska, Problems Concerning the Delivery of Polish Museums Collections to Europeana. The representatives of smaller institutions were interested in obtaining information on funding for digitization projects.

A very important issue was Europeana’s new Data Exchange Agreement. A set of materials and opinions on this subject in the context of Polish law is available here: http://fbc.pionier.net.pl/pro/dla-dostawcow-danych/wspolpraca-z-zewnetrznymi-serwisami/wspolpraca-z-europeana/. The agreement, based on Creative Commons 0, is quite problematic under Polish law. It is not possible to waive copyright in Poland, and licenses can only be granted for enumerated fields of exploitation. The current ministerial directive is to send to Europeana only those metadata records, or parts of records, which are not copyrighted. This means, for example, excluding the conservation-restoration description of an object’s state. Very good news for all European readers is that the deputy director Katarzyna Ślaska announced that the National Library of Poland has decided to sign the agreement.

Another recurring subject was the need to translate (by a group of GLAM experts) the documentation of the most popular metadata description formats into Polish, so that they are unambiguous and used consistently by institutions.

The conference was open to the general public, and there were a few people interested in publishing their private collections online. One of them was Piotr Grzywacz from Tuchola, who runs the private Hunting Signals Museum.

Human Language Technology Days 2012

Human Language Technology Days was an international conference held on September 27-28 in Warsaw. It was organized by the Institute of Computer Science of the Polish Academy of Sciences and the University of Łódź as part of the CESAR project (Central and South-East European Resources), one of several EU-financed projects that aim to develop a common platform of open tools and resources for European languages.

Presentations on the first day of the conference showed that, although some domains of language technology still need extensive research, other areas are already quite mature and have been successfully introduced in well-known commercial products such as iPhone Siri, Google Translate, or IBM Watson. The last of these is the supercomputer that won the Jeopardy! quiz show in 2011, beating two human champions by giving accurate answers to the largest number of questions. The advanced language processing and knowledge representation in this system were explained by Włodek Zadrożny, a researcher in the DeepQA project under which Watson was developed. The system is now being adapted to assist with medical diagnosis. Another interesting talk was given by Enrique Alfonseca, a researcher at Google. He presented the recently launched Knowledge Graph service (which is based on the semantic database Freebase) and some new ideas currently under development, such as automatic text summarization, which is useful in Google News for producing a short description of many similar documents on the same topic.

The second day of the conference focused on current language technology research in Poland. The 13 presentations covered a wide range of natural language processing areas, such as finite state automata construction and representation (used e.g. for dictionary compression), various services for linguists, speech recognition, and question answering systems.

All presentations can be watched at http://www.hltdays.pl/video.

First Polish THATCamp

The first Polish THATCamp will take place on 24-25 October 2012 alongside the “Zwrot Cyfrowy w humanistyce Internet Nowe Media-Kultura 2.0” conference in Lublin. The event is organized by the Polish THATCamp coalition and will be held at the headquarters of the NN Theater in the Old Town of Lublin (Grodzka 21). Poznań Supercomputing and Networking Center is an official partner of this event.

THATCamps (The Humanities And Technology Camp, http://www.thatcamp.org) are meetings, organized all over the world, of people interested in the use of new technologies in the humanities, in sociology, and in the activities of academic and artistic institutions (universities, galleries, archives, libraries, and museums). Participation in these events is free.

THATCamp dates back to 2008, when it was organized for the first time in the USA by the Center for History and New Media (CHNM) at George Mason University.

More information about the event can be found here (in Polish).

Post authors: Bogna Wróż, Adam Dudczak

TPDL 2012: Theory and Practice of Digital Libraries

The Theory and Practice of Digital Libraries conference (formerly known as the European Conference on Digital Libraries, ECDL) was held in Paphos (Cyprus) on September 23-27, 2012. PSNC presented two papers, which can be found in the conference proceedings published as Lecture Notes in Computer Science (7489):

The former paper describes the prototype of the Virtual Transcription Laboratory created by PSNC as part of the SYNAT project. The work described in the paper included experiments aimed at training an OCR engine to automatically recognize text in digital scans of old documents (Polish texts printed between the 16th and 17th centuries). The paper explains the rationale behind the prototype, its capabilities, and new development directions.

The latter paper concerns the issue of transforming data described using traditional metadata schemas (such as MARC 21 or Dublin Core) to ontological formats, designed to exist in the Semantic Web and Linked Open Data environment. The paper describes requirements for languages expressing such mapping rules and the tools that implement them. It also briefly presents the jMet2Ont mapping tool.
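To give a rough idea of what such a mapping rule might look like, here is a minimal Python sketch. It is not the rule language described in the paper and not how jMet2Ont works; the rule format, the `map_record` and `slug` helpers, and the `example.org` URIs are all assumptions made up for illustration. Only the Dublin Core Terms property URIs are real. The sketch shows one typical requirement: some flat field values stay literals, while others (here, creators) should become resources that other records can link to.

```python
# Hypothetical rule table: flat Dublin Core field -> (RDF predicate, kind of object).
# The rule format is invented for this sketch; the predicates are DC Terms properties.
DCTERMS = "http://purl.org/dc/terms/"

RULES = {
    "title": (DCTERMS + "title", "literal"),
    "creator": (DCTERMS + "creator", "resource"),
    "date": (DCTERMS + "created", "literal"),
}


def slug(value):
    """Turn a name into a simple URI-friendly token (illustrative only)."""
    return "-".join(value.lower().split())


def map_record(record_uri, record, agent_base="http://example.org/agent/"):
    """Translate a flat record (field -> list of values) into a list of triples."""
    triples = []
    for field, values in record.items():
        rule = RULES.get(field)
        if rule is None:
            continue  # fields without a rule are skipped in this sketch
        predicate, kind = rule
        for value in values:
            if kind == "resource":
                # Mint a URI so that the creator becomes a linkable resource.
                obj = ("uri", agent_base + slug(value))
            else:
                obj = ("literal", value)
            triples.append((record_uri, predicate, obj))
    return triples
```

Calling `map_record("http://example.org/object/1", {"title": ["Kronika"], "creator": ["Jan Kowalski"]})` yields one triple with a literal title and one triple pointing to a minted agent URI. A real mapping language would also need conditions, value splitting, and authority lookups, which this sketch leaves out.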

Maa – Palaeokastro Museum

For us, the conference started on Sunday with a so-called doctoral consortium. A doctoral consortium is a meeting during which each PhD student is assigned a mentor who is obliged to read (before the meeting) an extended abstract of the planned PhD thesis, and to prepare a list of comments and questions. During the meeting, each student presents their work and results to date. The mentor is expected to facilitate discussion after the presentation. Such an event is very beneficial for the students who are offered a chance to learn experts’ opinion on the strong and weak points of the research, all in a safe and friendly environment (the meeting is closed to the public).

The main conference lasted three days, Monday to Wednesday.

An outstanding keynote speech was Cathy Marshall’s (Microsoft Research) Whose content is it anyway? Social media, personal data, and the fate of our digital legacy. The author raised a number of interesting issues concerning the transience of digital media, the expectations of the general user, and how the situation has been changed by social media such as Twitter or Facebook. The talk was well prepared and full of surprising points, twists, and inspiring conclusions.

The same subject appeared in a presentation by Hany M. SalahEldeen and Michael L. Nelson of Old Dominion University. In their paper entitled Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost? the authors analyzed archived social media content related to six important events of the last few years (including the Egyptian revolution, the H1N1 pandemic, and Michael Jackson’s death). It turns out that after a year 11% of the content linked from social portals is no longer available. After one more year, roughly another dozen percent of the links are dead. The full paper is available on arXiv.org. The study was also considered important by traditional media, including the BBC.

A large number of papers were dedicated to machine learning applications. Digital library data seems to be a perfect field in which to apply and test machine learning algorithms. Another very interesting talk was Finding Quality Issues in SKOS Vocabularies (Christian Mader, Bernhard Haslhofer, Antoine Isaac). The authors defined a set of quality indicators and good practices for thesauri encoded in the SKOS format, and also created a validation tool called qSKOS.
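As a simple illustration of what an automated quality check on a SKOS vocabulary could look like, here is a small Python sketch. It is not qSKOS and does not reproduce the paper’s list of indicators; the in-memory data layout and the `find_issues` function are assumptions made for this example, and the two checks (concepts without a preferred label, concepts not connected to any other concept) were chosen only because they are easy to show.

```python
def find_issues(concepts):
    """Run two illustrative checks on a vocabulary held in memory.

    `concepts` maps a concept id to a dict with optional keys:
    'prefLabel' (dict: language -> label) and
    'broader', 'narrower', 'related' (lists of concept ids).
    This layout is hypothetical and only serves the sketch.
    """
    issues = {"missing_pref_label": [], "orphan": []}

    # Collect every concept that takes part in at least one semantic relation,
    # either as the source or as the target of the link.
    linked = set()
    for concept_id, data in concepts.items():
        for relation in ("broader", "narrower", "related"):
            targets = data.get(relation, [])
            if targets:
                linked.add(concept_id)
                linked.update(targets)

    for concept_id, data in concepts.items():
        if not data.get("prefLabel"):
            issues["missing_pref_label"].append(concept_id)
        if concept_id not in linked:
            issues["orphan"].append(concept_id)
    return issues
```

For a vocabulary in which "animals" has "cats" as a narrower concept and a third concept "misc" has neither labels nor relations, the function reports "misc" under both checks. A real tool would work on RDF data and cover many more indicators.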

One of the most interesting events during the conference was the poster and demo session. The best demo contest was won by FrbrVis: An Information Visualization Approach to Presenting FRBR Work Families (Tanja Mercun, Maja Zumer, and Trond Aalberg). The authors, aware that more and more libraries and metadata aggregators are considering the introduction of the FRBR model, set themselves the task of designing an effective way of displaying FRBR data, so that the user could benefit from the model without feeling overwhelmed by it. They proposed four interface options and then performed usability testing on a large number of users. Two graphical representations were picked as favourites: a concentric one (called sunburst) and a hierarchical one. An unexpected conclusion was that the graph-based representation (popular in the Semantic Web world due to the very nature of RDF data), even though considered attractive at first glance, proved difficult to use. A notable poster was presented in this session by the already mentioned Hany M. SalahEldeen, who studied the temporal intention of users publishing links to online resources in social networks.

Thursday was the day of workshops. Conference participants were given the following choice:

The NKOS workshop was dedicated mainly to the ISO 25964 thesaurus standard and its relation to SKOS. Only the first part of the standard is ready as of now. The documents describing the standard are not available for free, but a number of materials can be downloaded from the ISO 25964 webpage, including the XML schema (xsd) definition.

The archives workshop included a Semantic Technologies & Ontologies session in which Vladimir Alexiev of Ontotext gave a very interesting presentation about CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Mentioning the FORTH (Foundation for Research and Technology – Hellas) A New Framework for Querying Semantic Networks study, he presented a model of searching which translates the 82 classes and 142 properties of CIDOC CRM to a smaller number of so-called fundamental classes (e.g. Person, Place) and properties, making the search much easier. Ontotext is the producer of the RDF repository called OWLIM. The presentation also described a set of OWLIM reasoning rules producing the simplified model.

In the shortest of the workshops, on supporting users’ exploration (additional materials available here), the participants had a chance to listen to a talk by David Haskiya (Europeana Foundation) about Europeana’s existing and planned features supporting users’ exploration of resources. The workshop ended with an interesting panel discussion in which the most prominent subject was the needs and expectations of current and future users of digital libraries, especially in the context of the youngest generation (see the video below).

The conference was held in a beautiful and historically significant corner of Europe which, unfortunately, is very hard to reach from Poland. Last year’s location (Berlin) was easier to get to for most of the participants. Next year the conference is to be held in Malta.

Cypriot cuisine

Post authors: Adam Dudczak, Justyna Walkowska, Marcin Werla

New edition of DLF’s e-learning courses

Today we started a new edition of the Polish e-learning courses “Digital repositories for small memory institutions” and “Cooperation with Europeana”. Both courses are available in the e-learning area of the Digital Libraries Federation portal. The second course was recently updated to reflect the development of the Europeana portal and the work done in the ACCESS IT Plus project, and it is available in Polish for the first time. This edition will last three months, from today until the first days of January 2013. After finishing each of these free courses, participants may receive an electronic certificate (separate for each course). The course “Digital repositories for small memory institutions” was very successful in the past, and we hope that the time limit will have a positive effect on course participants’ motivation ;-).