The Open Repositories 2012 conference and its accompanying workshops were held in Edinburgh (Scotland) on 9-13 July 2012. The rich programme of the conference and workshops, together with the large number of participants, confirms the importance of the Open Repositories series.
The workshops consisted of several sessions. Especially interesting in the context of digital libraries were those related to data and text mining and to long-term preservation. The data mining workshops mainly covered search engines, semantic search, metadata and data aggregation, information extraction from texts, and workflow systems for text processing. Various topics and systems were presented, including:
- Evidence Finder (http://labs.ukpmc.ac.uk), which allows searching 2 million documents and 71 million sentences.
- MEDIE (http://www.nactem.ac.uk/medie/), which allows semantic searching of biomedical information (it is based on MEDLINE, http://www.nlm.nih.gov/pubs/factsheets/medline.html).
- Argo (www.nactem.ac.uk/Argo), which allows creating workflows for text analysis and processing.
- HIVE and its extension HIVE-ES (https://www.nescent.org/sites/hive/), which make it easy to create metadata and vocabularies.
- CORE (http://core-project.kmi.open.ac.uk/), which allows searching both data and metadata from various documents, including searching for content based on harvested metadata.
Various tools were used during the development of the above systems, e.g. TextCat (http://odur.let.rug.nl/vannoord/TextCat/), U-Compare (http://u-compare.org/), OSCAR4 (https://bitbucket.org/wwmm/oscar4/wiki/Home), ANTLR (http://www.antlr.org/), MAUI (http://code.google.com/p/maui-indexer/), KEA (http://www.nzdl.org/Kea/), Sesame (http://www.openrdf.org/index.jsp), H2 (http://www.h2database.com/).
The workshops on long-term preservation focused mainly on the Trident system and its capabilities. They presented the most important aspects of long-term preservation, including the identification of files that should be migrated or normalised, as well as tools that can be used to create long-term preservation workflows (Kepler (https://kepler-project.org/), Taverna (http://www.taverna.org.uk/), Ptolemy II (http://ptolemy.eecs.berkeley.edu/ptolemyII/), Triana (http://www.trianacode.org/)).
The conference itself lasted three days. Various topics were raised and a number of interesting articles were presented, e.g.:
- “Build to scale” – a presentation showing how to build a search system based on Apache Solr that handles 250 million records and returns results in two seconds or less.
- “Inter-repository Linking of Research Objects with Webtracks” – a presentation describing the InteRCom protocol for exchanging semantic information between repositories.
- “ResourceSync: Web-based Resource Synchronization” – a presentation of a protocol for data synchronisation. It is based on experience with the OAI-PMH and OAI-ORE protocols.
- “Griffith’s Research Data Evolution Journey: Enabling data capture, management, aggregation, discovery and reuse.” – a description of the research infrastructure of Griffith University, including semantic tools such as VIVO (http://sourceforge.net/apps/mediawiki/vivo/) and VITRO (http://vitro.mannlib.cornell.edu/).
- “Multivio, a flexible solution for in-browser access to digital content” – a presentation describing a multi-purpose viewer for PDF, GIF, JPEG and PNG content that can understand Dublin Core, MARC21, MODS and METS.
- “ORCID update and why you should use ORCIDs in your repository” – a presentation on the current status of ORCID (http://about.orcid.org/), a system for identifying researchers.
- “Digital Preservation Network, Saving the Scholarly Record Together” – a presentation on an initiative among several institutions in the USA focused on building a heterogeneous system for long-term preservation (http://d-p-n.org/).
During the conference, a representative of Poznań Supercomputing and Networking Center presented the article entitled “dArceo services: advancing long-term preservation” and described long-term preservation services for text, images and audiovisual content, dedicated to Polish scientific and cultural heritage institutions. We invite you to visit the OR2012 website (http://or2012.ed.ac.uk/) and view the available presentations.