New functions of the Virtual Transcription Laboratory portal

Source: http://pl.wikipedia.org/wiki/Plik:Escribano.jpg

We are pleased to announce the release of the newest version of the Virtual Transcription Laboratory (http://wlt.synat.pcss.pl).

This version contains new functions and fixes for errors reported by users. The most important changes are:

  • the possibility to export the results of work to an EPUB file,
  • the option to share a project only with selected VTL users,
  • support for scans in TIFF format (after uploading they are automatically converted to PNG at 300 DPI),
  • changes in the transcription editor dialogue,
  • a number of corrections in the resulting hOCR files.

The full list of changes, with screenshots, can be found on our wiki:
https://confluence.man.poznan.pl/community/display/WLT/Note+about+release+from+2013-03-25

ACCESS IT plus course in Bosnia and Herzegovina


More good news from the Balkans ;-). The “Digital Repositories for small memory institutions” and “Cooperation with Europeana” e-learning courses, developed by PSNC in the AccessIT and Access IT plus projects (funded under the EU Culture Programme), are now available in Serbian, thanks to the joint effort of the National and University Library of Republika Srpska (NULRS, our Bosnian partner) and the Belgrade City Library (Serbian partner of ACCESS IT). A few days ago NULRS released both courses on its e-learning portal.

Just as in Poland, Greece, Croatia, Serbia and Turkey, the courses are free for anyone interested in digitisation, digital libraries and Europeana. Congratulations to our colleagues from Bosnia and Herzegovina, you did a great job! 😉

ICDAR2013 competitions

We invite everyone interested in document analysis and recognition to take part in the ICDAR2013 competitions. There are two evaluation opportunities:

The main purpose of these two international evaluation opportunities is to record existing and emerging methods / systems and their performance in complete digitisation / recognition pipelines as well as in each step of a pipeline.

Next stage in beta testing of VTL

On Friday, 15 February 2013, we released a number of new functions and improvements in the Virtual Transcription Laboratory (http://wlt.synat.pcss.pl) portal.

These are the most prominent ones:

  • a noticeable improvement in the performance and stability of the whole portal,
  • a change in the way the transcription editing history is stored,
  • import of an existing DjVu publication on the basis of its OAI identifier (this feature is described in the end-user documentation),
  • batch OCR for all files in a project,
  • notifications showing whether changes made in the transcription editor were saved,
  • many minor improvements and fixes for bugs reported by users,
  • the first version of the user documentation (http://confluence.man.poznan.pl/community/display/WLT).

A few months have passed since the BETA release of VTL. We would like to thank everyone for their feedback ;-). After the initial release it became clear that serious changes had to be made in the portal engine. The most important was the change in the way a transcription is represented and stored in the database. This was a major change, but it resulted in a significant improvement in performance and stability.

In the near future two new functions will be added:

  • export of project results in EPUB format,
  • the possibility to upload TIFF files into a project (they will be automatically converted to PNG files at 300 DPI).

Authors of the post: Bogna Wróż, Adam Dudczak

Summary of 2nd edition of FBC e-learning courses

This edition of the Polish Digital Libraries Federation’s e-learning courses had been available since October 2012 and ended at the end of January. The courses “Cooperation with Europeana” and “Digital repositories for small memory institutions” attracted considerable interest: the Europeana course had exactly 100 participants, and the second one, covering digitisation and digital libraries, had 128 registered participants. Apart from knowledge, every participant could also earn a certificate confirming participation in the course and stating the grade obtained. Passing the tests was a condition for receiving a certificate; together, the courses include 11 tests and 255 questions (57 in the first and 198 in the second). We certified 78 people: 27 participants of the Europeana course and 51 participants of the digital repositories course (31 in this second edition). We congratulate everybody who managed to finish all the tests ;-).

Keene High School (old) Graduating Class of 1875, Keene, New Hampshire

People who finished a course were asked to grade it. Each topic of the course was rated on a scale of 1 to 5, and we also received personal comments about the course content.

As for the “Cooperation with Europeana” course, the first topic, “Overview of Europeana”, received an average score of 4.71 and the second, “Technical aspects of Europeana”, 4.24. The critical comments mostly concerned the technical aspects covered in the course; participants asked for more practical examples. We promise to improve 😉 Nonetheless, most comments were positive: participants wrote that the courses improved skills they use in their everyday work.

As for the course “Digital repositories for small memory institutions”, the highest score went to the topic “Introduction to digitisation” (4.77) and the lowest to “Building digital collections” (4.57). One critical comment concerned standardising the grading of quizzes. A few participants complained about how detailed the instructions for particular software packages were, but other graduates named this as one of the biggest advantages of the course.

With the end of the second edition we are closing enrolment for the courses until late March, when we plan to start another edition.

Authors: Bogna Wróż, Adam Dudczak

ACCESS IT plus course in Croatian

View of Rijeka

The “Digital Repositories for small memory institutions” and “Cooperation with Europeana” e-learning courses, developed by PSNC under the AccessIT and Access IT plus projects (funded under the EU Culture Programme), have been released in Croatian by the City Library in Rijeka.

Just as in Poland, Greece, Serbia and Turkey, the courses are free for anyone interested in digitisation, digital libraries and Europeana. Each course lasts a maximum of 3 months; upon completion it is possible to obtain a certificate, issued in cooperation with CSSU (Center for permanent education of librarians) at the National and University Library in Zagreb.

More information can be found on the Access It Plus project website.

Succeed project kick-off meeting

One week ago (on 1st February 2013) the kick-off meeting of the Succeed project was held. The aim of this two-year project (2013-2014), supported by the EU (7th Framework Programme), is to foster the uptake of advanced tools and resources resulting from research and commercial activities. The main focus is digitisation, especially of textual materials, which involves institutions such as libraries, museums and archives. Specialised OCR engines, dedicated linguistic resources and conversion services are just some examples of tools and resources that facilitate the digitisation process. These and other tools will be promoted through various events, including conferences, competitions and workshops. Selected tools will be used in real-life scenarios and validated in existing digitisation projects.

The project is coordinated by the University of Alicante. The other partners are: the National Library of the Netherlands, the Dutch Institute for Lexicology, Fraunhofer IAIS, Poznan Supercomputing and Networking Center, the University of Salford, Foundation Biblioteca Virtual Miguel de Cervantes Saavedra, the French National Library and the British Library.

Poznan Supercomputing and Networking Center will mainly focus on:

  • supporting cultural heritage institutions in the uptake of tools and resources within their current digitisation projects,
  • coordinating work on recommendations concerning formats, standards and licensing models for the digitisation of textual materials.

Simple Visualisation of Small RDF Graphs

One of the things that we have created in the SYNAT project is an RDF knowledge base that contains information about resources from:

  • the Digital Libraries Federation,
  • NUKAT (a union catalogue of Polish academic libraries),
  • the National Museum in Warsaw,
  • the National Museum in Kraków (data to be added soon).

The metadata is obtained from the sources in their original metadata formats (namely PLMET, MARC 21, the Mona system format and CDWA-Lite) and is transformed to the target format (CIDOC CRM / FRBRoo) using the jMet2Ont mapper and a set of rules expressed in XML.

The RDF data can be queried using the SPARQL language, viewed with a browser at the Linked Open Data endpoint (well, it is going to be open soon) and is processed by the SYNAT portal. Still, sometimes we want to look at the data after mapping to understand how a record has been mapped, especially when we detect that the resulting graph is disconnected.
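To illustrate what "disconnected" means here, below is a minimal Python sketch that groups the nodes of a small graph into connected components. It is not part of the SYNAT tooling: the triples are plain tuples, every object (including literals) is treated as a node, and the `ex:` names are hypothetical placeholders invented for this example.

```python
from collections import defaultdict


def connected_components(triples):
    """Group the nodes of a graph into weakly connected components.

    `triples` is an iterable of (subject, predicate, object) strings.
    Edge direction is ignored; every subject and object counts as a node.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for subject, _predicate, obj in triples:
        root_s, root_o = find(subject), find(obj)
        if root_s != root_o:
            parent[root_s] = root_o

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical example: the person is not linked to the book,
# so the mapped record falls apart into two components.
triples = [
    ("ex:book1", "ex:hasTitle", "ex:title1"),
    ("ex:book1", "ex:createdBy", "ex:creation1"),
    ("ex:person1", "ex:hasName", "ex:name1"),
]
components = connected_components(triples)
print(len(components), "component(s)")
```

A record whose mapping is complete should yield a single component; more than one suggests that a mapping rule failed to link two parts of the record.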

There are a few RDF visualisation tools, but not all of them produce satisfying results. Here is how we visualise the graphs:

  1. The graph (a result of the mapping process) is serialized in RDF/XML form. An example can be found here.
  2. A very simple XSLT transformation (available here) is run against the RDF/XML file, producing PlantUML source code. The transformation can be applied with a tool like Saxon.
  3. PlantUML is a simple tool for creating UML diagrams from text files. Here is the automatically generated PlantUML code.
  4. Finally, PlantUML is used to generate a diagram like this:
PlantUML-generated (class) diagram showing relation between RDF/XML entities

So, to go from a test.xml file to the test.png diagram:

Transform.exe -xsl:RDF2PUML.xsl -s:test.xml -o:test.puml

java -jar plantuml.jar test.puml
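For readers without an XSLT processor at hand, the idea behind the transformation step can be sketched in a few lines of Python. This is not our RDF2PUML.xsl stylesheet, and its output will differ from what the stylesheet produces. It assumes flat RDF/XML (top-level node elements with rdf:about, properties given either as rdf:resource references or as literal text), it skips blank nodes and nested descriptions, and it emits PlantUML `object` elements. The sample data and the example.org names are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def short(name):
    """Return the part of a URI or '{ns}tag' name after the last '#', '/' or '}'."""
    for sep in "#/}":
        name = name.rsplit(sep, 1)[-1]
    return name


def rdfxml_to_plantuml(rdf_xml):
    """Convert flat RDF/XML into PlantUML object-diagram source."""
    root = ET.fromstring(rdf_xml)
    aliases = {}  # resource URI -> PlantUML alias (n0, n1, ...)
    declarations = ["@startuml"]
    body = []

    def alias_for(uri):
        # Declare each resource once and reuse its alias afterwards.
        if uri not in aliases:
            aliases[uri] = f"n{len(aliases)}"
            declarations.append(f'object "{short(uri)}" as {aliases[uri]}')
        return aliases[uri]

    for node in root:
        subject = node.get(RDF + "about")
        if subject is None:
            continue  # blank nodes are skipped in this sketch
        source = alias_for(subject)
        for prop in node:
            label = short(prop.tag)
            target = prop.get(RDF + "resource")
            if target is not None:
                # Resource-valued property: draw an arrow.
                body.append(f"{source} --> {alias_for(target)} : {label}")
            elif prop.text and prop.text.strip():
                # Literal-valued property: show it as a field of the object.
                body.append(f'{source} : {label} = "{prop.text.strip()}"')
    return "\n".join(declarations + body + ["@enduml"])


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms/">
  <rdf:Description rdf:about="http://example.org/book1">
    <ex:title>Sample title</ex:title>
    <ex:createdBy rdf:resource="http://example.org/person1"/>
  </rdf:Description>
</rdf:RDF>"""

print(rdfxml_to_plantuml(SAMPLE))
```

The printed text can be saved as a .puml file and rendered with the plantuml.jar command shown above.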

We thought this might be useful 🙂