All posts by Tomasz Parkoła

Excellent outputs of the Succeed project

The final technical review of the Succeed project took place at the end of February in Alicante. Experts invited by the European Commission evaluated all project outcomes. The final assessment of the project was “excellent”, which means that “the project has fully achieved its objectives and technical goals for the period and has even exceeded expectations”.

Succeed supported the take-up and validation of research results applicable to digitization activities. The main achievements of the project include:

  • An improved IMPACT Interoperability Framework with 40 integrated tools, currently deployed at the IMPACT Center of Competence.
  • A searchable survey of more than 260 tools and resources useful in text digitization: http://succeed-project.eu/publications/available-tools/index-succeed
  • Over 20 evaluations of tools useful in digitization. The evaluations were carried out in real-life scenarios by 13 libraries from Europe: Wielkopolska Biblioteka Cyfrowa (Poland), Biblioteca Histórica General de Salamanca (Spain), Wroclaw University Library (Poland), University Library of Bratislava (Slovak Republic), National Library of Finland (Finland), Biblioteca de la Universidad de Granada (Spain), University Library of Leuven (Belgium), University Library of Antwerp (Belgium), University Library of Darmstadt (Germany), Biblioteca Virtual Miguel de Cervantes (Spain), British Library (United Kingdom), Bibliothèque nationale de France (France), Koninklijke Bibliotheek (Netherlands).
  • Two sets of recommendations for digitization projects: the first on standards and formats in digitization, the second on licensing schemes in digitization. An online summary can be found at http://www.digitisation.eu/training/recommendations-for-digitisation-projects/.
  • Two contests and two editions of the Succeed awards (in the second edition, the University Library of Wroclaw received a Commendation of Merit).
  • Multiple events organized to disseminate and discuss the latest technology for digitization (three workshops, Digitisation Days, DATeCH conference as well as the ceremony of the Succeed awards).

Poznan Supercomputing and Networking Center participated in all work packages of the project and led work package 4, which dealt with normalization and standardization and produced the two sets of Succeed recommendations.

Open Planets Foundation is becoming the Open Preservation Foundation

 

Last Tuesday the Open Planets Foundation officially announced that it is changing its name to the Open Preservation Foundation (please see the blog post on the OPF website). The Open Planets Foundation emerged from the EU co-funded Planets project, hence the “Planets” part of its name. OPF is currently working on a new strategy for 2015-2018, so this seems to be the right time to change the name as well. Fortunately, although the name is changing, the acronym stays the same: OPF. In the coming weeks OPF will release its new website, which will hopefully address the needs of Open Preservation Foundation members and the community even better.

Succeed survey on the use of licenses and innovative usages of digitised content

The Succeed project is undertaking a survey on the use of licenses in the field of digitisation and on innovative usages of digitised content. Please take a moment to fill it in: https://docs.google.com/forms/d/1LXEjvbgd6hzpY8blv1PWofGgWTm5HscN12oLhRTHPUA/viewform?usp=mail_form_link.

The aim of this survey is to gather information about current practices for licensing data, metadata and tools, and about new trends in the exploitation of digitised content. This information will help us define recommendations for text digitisation in Europe and worldwide. Fill in the survey and provide your e-mail if you wish to get an update on the results of this analysis: https://docs.google.com/forms/d/1LXEjvbgd6hzpY8blv1PWofGgWTm5HscN12oLhRTHPUA/viewform?usp=mail_form_link.

Please respond before the end of June.

Succeed (http://succeed-project.eu) is a support action funded by the European Union. It promotes the take up and validation of research results in mass digitisation, with a focus on textual content.

Thank you very much,
The Succeed Team

Digital Library Conference

We invite you to participate in the Digital Library Conference, which will be held in Slovakia from 1 to 3 April 2014. This is the 15th edition of the conference, which will cover activities related to Europeana, advanced ICT technologies for digital libraries and guidelines for digitization projects. PSNC representatives will give two presentations during the conference. The first (“Cultural Heritage Institutions, Metadata Aggregators and The Cloud”) will present various approaches to applying cloud technologies to digital libraries. The second (“Succeed with us: recommendations for mass digitization projects”) will present the results of the Succeed project, focusing on the recommendations developed in one of its work packages (WP4). We warmly invite you to look at the conference programme (http://www.schk.sk/wordpress/digital-library-english/preliminary-agenda/) and register (http://www.schk.sk/wordpress/digital-library-english/registration/).

SCAPE Project training event: Effective, Evidence-based Preservation Planning

We invite you all to participate in the SCAPE project training event on digital preservation, with a particular focus on preservation planning. The event will be held in Denmark on 13-14 November 2013. Details can be found at: http://www.scape-project.eu/events/11/effective-evidence-based-preservation-planning

Succeed survey: standards and formats in digitisation

The Succeed project is undertaking a survey on standards and formats used in the digitisation of textual documents. Please take a moment to fill it in: https://docs.google.com/forms/d/16qvPbAZYUVmYz1MbeCGVcfxy0UXksr3-v5QQ7h3uETE/viewform

The aim of this survey is to gather information about your current practices for master files, delivery formats and metadata, as well as OCR and emerging technologies. This information will help us define recommendations for text digitisation in Europe and worldwide. Fill in the survey and provide your e-mail if you wish to be informed about the results of our analysis: https://docs.google.com/forms/d/16qvPbAZYUVmYz1MbeCGVcfxy0UXksr3-v5QQ7h3uETE/viewform

Please respond before 30 September.

Succeed (http://succeed-project.eu) is a support action funded by the European Union. It promotes the take up and validation of research results in mass digitisation, with a focus on textual content.

ICDAR2013 competitions

We invite everyone interested in document analysis and recognition to take part in the ICDAR2013 competitions. There are two evaluation opportunities:

The main purpose of these two international evaluation opportunities is to record existing and emerging methods / systems and their performance in complete digitisation / recognition pipelines as well as in each step of a pipeline.

Succeed project kick-off meeting

One week ago (on 1st February 2013) the kick-off meeting of the Succeed project was held. The aim of this two-year project (2013-2014), supported by the EU (7th Framework Programme), is to foster the uptake of advanced tools and resources resulting from research and commercial activities. The main focus is digitization, especially of textual materials, which involves institutions such as libraries, museums and archives. Specialized OCR engines, dedicated linguistic resources and conversion services are just some examples of the tools and resources that facilitate the digitisation process. These and other tools will be promoted through various events, including conferences, competitions and workshops. Selected tools will be used in real-life scenarios and validated in existing digitization projects.

The project coordinator is the University of Alicante. The other partners are: the National Library of the Netherlands, the Dutch Institute for Lexicology, Fraunhofer IAIS, Poznan Supercomputing and Networking Center, the University of Salford, the Foundation Biblioteca Virtual Miguel de Cervantes Saavedra, the French National Library and the British Library.

Poznan Supercomputing and Networking Center will mainly focus on:

  • support for cultural heritage institutions in the uptake of tools and resources within the scope of current digitization projects,
  • coordination of work on recommendations on formats, standards and licensing models in the context of digitization of textual materials.

5th Digital Encounters with Cultural Heritage

This year the topic of the conference was “Digital Representation of the Artefact – methods, reliability, sustainability”. The conference took place in Wrocław (19-20 Nov 2012). It featured many interesting presentations, including ones on digital preservation, visualization and access to cultural heritage digital assets over the internet. Several standards and formats were presented (e.g. the STARC metadata schema for cultural heritage documentation), 3D visualization approaches were proposed, and cataloging techniques were described together with best practices. Conference attendees also had the opportunity to learn about several tools developed by Poznań Supercomputing and Networking Center for cultural heritage institutions, including the dMuseion system for building digital museums, the dLab system for managing digitization workflows, and dArceo, which focuses on long-term preservation of cultural heritage digital assets.

Comparison of FineReader and Tesseract OCR engines – report

Today we’ve published a report comparing the FineReader and Tesseract OCR engines. Both tools were tested on Polish historical documents (printed before 1850) coming from various Polish digital libraries. The comparison covered both gothic and antiqua documents, as well as noisy and clean images. Both engines were appropriately trained before the comparison was conducted.

The OCR results show no single winner: neither engine consistently outperforms the other. We therefore tried to point out the differences between FineReader and Tesseract, along with their advantages and disadvantages. We invite you to read the report for details of our approach and the results obtained.

All test cases are based on the ground truth data produced within the scope of the IMPACT project. The comparison itself was part of the pilot work conducted in the course of the IMPACT project extension in the first half of 2012. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
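
For readers unfamiliar with ground-truth-based OCR evaluation, here is a minimal Python sketch of one common way to score an engine's output against a ground truth transcription: character accuracy derived from the Levenshtein edit distance. This is an illustration only. The post does not say which metric the report uses, so the metric choice is an assumption, and the function names and sample strings below are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # delete ref_char
                current[j - 1] + 1,                   # insert hyp_char
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Share of ground truth characters recognised correctly, clipped to [0, 1]."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


# Hypothetical sample data: one ground truth line and two invented engine outputs.
ground_truth_line = "Rzeczpospolita Polska"
engine_outputs = {
    "engine A": "Rzeczpospolita Polska",
    "engine B": "Rzeczpofpolita Polfka",
}
for engine_name, text in engine_outputs.items():
    print(engine_name, character_accuracy(ground_truth_line, text))
```

In practice such a score would be averaged over many pages and reported separately per document type (for example gothic versus antiqua, noisy versus clean), since a single aggregate number can hide the kind of differences the report discusses.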

The full report is available for download on the PSNC Digital Libraries Team website dedicated to the IMPACT project results.