One week ago (on 1 February 2013) the kick-off meeting of the Succeed project was held. The aim of this two-year project (2013-2014), supported by the EU (7th Framework Programme), is to foster the uptake of advanced tools and resources resulting from research and commercial activities. The main focus is digitization, especially of textual materials, which involves institutions such as libraries, museums and archives. Specialized OCR engines, dedicated linguistic resources and conversion services are only some examples of the tools and resources that facilitate the digitisation process. These and other tools will be promoted through various events, including conferences, competitions and workshops. Selected tools will be used in real-life scenarios and validated in existing digitization projects.
The project coordinator is the University of Alicante. The other partners are the National Library of the Netherlands, the Dutch Institute for Lexicology, Fraunhofer IAIS, Poznan Supercomputing and Networking Center, the University of Salford, the Foundation Biblioteca Virtual Miguel de Cervantes Saavedra, the French National Library and the British Library.
Poznan Supercomputing and Networking Center will mainly focus on:
- supporting cultural heritage institutions in the uptake of tools and resources within their current digitization projects
- coordinating work on recommendations for formats, standards and licensing models in the context of digitizing textual materials.
The Europeana Newspapers project has published a survey to learn about the digitised newspapers available in Europe. The survey is aimed at institutions that are not currently part of the Europeana Newspapers project. As mentioned on the Europeana Newspapers website, the survey has three purposes:
- To get a clear idea of the extent of newspaper digitisation within Europe
- To record the relevant metadata in the Berlin State Library’s Catalogue of Serials (http://www.zeitschriftendatenbank.de/) and as part of the central index of newspapers being created by The European Library (http://www.theeuropeanlibrary.org/)
- To help locate 10 additional partners to join the project
The survey is available at http://www.surveymonkey.com/s/BQ28579 and is open until 31 July 2012.
We would like to invite you to attend the IMPACT event “Project Outcomes” on 26 June 2012, which will take place at the KB National Library of the Netherlands in The Hague. At this event, the IMPACT project outcomes will be presented by IMPACT staff, along with results of several pilots that have been conducted with some of the tools at IMPACT libraries in early 2012.
The IMPACT project (January 2008 – June 2012) is a European research project focused on innovating OCR software and language technology to improve the digitisation of historical printed text. IMPACT is led by the KB National Library of the Netherlands. Our group of partners includes several major European national libraries, universities, research centres and two private sector companies (ABBYY and IBM Haifa). IMPACT recently launched the IMPACT Centre of Competence (www.digitisation.eu), a productive network of experts in digitisation that will build on the research and development of partners from the IMPACT project and continue to improve access to text.
At the end of the project in June 2012, IMPACT is presenting the following results:
- The improved commercial OCR engine ABBYY FineReader 10 (the IMPACT FineReader)
- IBM’s Adaptive OCR engine with the CONCERT tool for OCR correction
- Computational lexica for 9 European languages and tools for lexicon building
- A digitisation framework for demonstrating and evaluating tools and results
- An invaluable dataset which can foster further research activities
- The Functional Extension Parser capable of decoding layout elements of books
- A postcorrection tool with text and error profiler
- Novel approaches to preprocessing and OCR for future development
- The IMPACT Centre of Competence for digitisation
Attendance at this event is free of charge, but we kindly ask you to register in advance through http://impactocr.eventbrite.com/. The programme will be made available through this page in the near future.
Below you can see a short film about cultural heritage in digital form, prepared by the PLATON TV team.
Today the eighteenth seminar in the “Digitalizacja” (“Digitisation”) series took place in Warsaw, organized by Centrum Promocji Informatyki and chaired by Dr Henryk Hollender. The programme of this edition consisted of eight presentations divided into three thematic blocks. In the first block, Dr Edyta Kotyńska presented an analysis of how Polish digital libraries operate, looking at the processes and tasks carried out in such an undertaking and at how far they are standardized and documented. Besides a theoretical introduction, the presentation also covered the results of a survey that the author conducted among the institutions creating Polish digital libraries.
“Digital repositories for small memory institutions” is a free e-learning course that has been released gradually since June 2011 on the Digital Libraries Federation (DLF) portal.
It contains information about organizing and leading the digitisation of different kinds of documents. One of its main aims is to help to create high-quality digital libraries and enable their promotion by sharing information about available resources with services like Europeana.
The course is aimed primarily (but not only) at people working in small memory institutions such as public libraries and regional museums. It can also be a source of knowledge for students of fields such as librarianship and information science, which are naturally related to the digitisation of resources and the creation of digital libraries. Participants will be able to work through a series of step-by-step instructions for typical digital library tasks. These include, for example, creating descriptions of digitized objects, preparing digital content for web publication and promoting digital objects on the Internet.
At the end of each subject there is a quiz that allows participants to check their knowledge. Eventually, the course will contain several dozen modules grouped into 9 subjects.
Until the end of 2011, participation in the course is not bound by any strict time frame. Participants may choose any available subject, familiarize themselves with it and check their knowledge in a quiz. Scheduled training courses are planned from 2012 onwards.
Information on how to sign up for the course can be found on the Digital Libraries Federation portal: http://fbc.pionier.net.pl/elearning/.
At the end of October 2010, e-bUW made available the first title digitized as part of the “NUKAT – autostrada informacji cyfrowej” project. According to e-bUW, it is “Nowy Pamiętnik Warszawski : [dziennik historyczny, polityczny, tudzież nauk i umiejętności]”, a periodical published in Prussian Warsaw by Franciszek Ksawery Dmochowski, who contributed to the field of literature and took part in the Kościuszko Uprising. The periodical has a literary and socio-political character, and during the five years of its existence (1801-1805) it had a strong influence on the intellectual life of the whole Prussian partition. The periodical can be found at http://ebuw.uw.edu.pl/dlibra/publication?id=5512&tab=3
The “NUKAT – autostrada informacji cyfrowej” project is financed by the EU Innovative Economy Programme. The project partners are BUW, UCI UMK in Toruń, the faculty libraries of UW and the libraries whose catalogues will be combined in the NUKAT database.
According to Agence France-Presse, researchers from the University of Tokyo’s Graduate School of Information Science and Technology have created a prototype system that scans books while their pages are being flipped. In other words, a person who wants to scan an entire book only has to riffle through its pages. Using the prototype, a 170-page book can be scanned in 60 seconds.
The components of the “book-flipping scanning system” include a camera able to take 500 images per second, infrared lasers and a computer. Distortions caused by the curvature of the pages during flipping are measured with infrared beams. The pages are then “flattened” in software using a three-dimensional model.
Although the technology has potential applications beyond book scanning, “book-flipping scanning” can accelerate the digitization of printed cultural heritage. For example, scanning 110,200 objects (the current size of the largest Polish digital library, the Digital Library of Wielkopolska), assuming an average “raw” scan time of 60 seconds per digital object, would take approximately 77 full days.
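The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the object count and the 60-second scan time come from the text, while the assumption of uninterrupted 24-hour scanning is ours.

```python
# Back-of-the-envelope check of the scanning-time estimate.
OBJECTS = 110_200               # object count quoted in the text
SECONDS_PER_OBJECT = 60         # average "raw" scan time assumed in the text
SECONDS_PER_DAY = 24 * 60 * 60  # assumption: scanning runs around the clock

total_seconds = OBJECTS * SECONDS_PER_OBJECT
full_days = total_seconds / SECONDS_PER_DAY

print(f"{total_seconds} s is about {full_days:.1f} days of continuous scanning")
```

The result is a little over 76.5 days, which rounds to the 77 days quoted above.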
Unfortunately, this system is suitable mainly for digitizing books in good condition. Copies in poor condition may be destroyed while the pages are flipped. Other types of publications, such as postcards or newspapers, cannot simply be riffled.
The team plans to finish work on the final version of the prototype of the “world’s fastest scanning system” in two years. Materials that show the technology in action can be found at http://www.k2.tu-tokyo.ac.jp/vision/BookFlipScan/index-e.html
40 years ago Michael Hart started Project Gutenberg. It is a good moment to recall that book digitization did not begin in the past several years but much earlier. Hart has reported on his site that the total number of titles in Project Gutenberg passed 37500 last August and should pass 40000 before the 40th anniversary. An average of 1000 books a year over 40 years does not sound impressive, but the project's author assures us that the figure should reach 5000 this year. Today the project site http://gutenberg.org gives away about 100000 eBooks per day, which means 3 million a month or 36 million a year.
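The figures in this paragraph are easy to reproduce. The sketch below uses only the numbers quoted above; the 30-day month is our assumption, chosen because it matches the rounded monthly and yearly totals.

```python
# Reproducing the Project Gutenberg figures quoted above.
TITLES = 37_500              # titles passed last August, per the text
YEARS = 40                   # age of the project
DOWNLOADS_PER_DAY = 100_000  # approximate daily downloads, per the text
DAYS_PER_MONTH = 30          # assumption: a rounded 30-day month

avg_titles_per_year = TITLES / YEARS
downloads_per_month = DOWNLOADS_PER_DAY * DAYS_PER_MONTH
downloads_per_year = downloads_per_month * 12

print(f"average titles per year: {avg_titles_per_year}")
print(f"downloads per month:     {downloads_per_month}")
print(f"downloads per year:      {downloads_per_year}")
```

Counting 365 days instead of twelve 30-day months would give 36.5 million downloads a year, so the 36 million in the text is a rounded figure.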
Michael Hart notes the progress made in USB flash drives since 2000, when they had 1000 times less capacity and were 3 times more expensive. Portable “pocket drives” can hold up to 2.5 million books in .zip format. Their size hardly makes them “pocket-sized”, but they weigh no more than a book.
If we think back to the year 2000, Google would not announce that it had invented eBooks for another 5 years, and Project Gutenberg would need another 2 3/4 years to pass 10000 titles. Many changes lie ahead in the decade leading up to 2020. The author suggests that by then petabytes of data will be within reach and all findable public domain books will be available in at least some electronic format.
Finally, the author worries that the rules are likely to change so as to halt the growth of the public domain and strengthen copyright protection, deepening the “Digital Divide”.
The complete post by Michael Hart is available here.
According to the e-biblioteka Uniwersytetu Warszawskiego (e-bUW) portal, Warsaw University Library plans to digitize and publish online in e-bUW around 40 thousand publications by 2012. The documents to be scanned are part of a valuable collection of 19th-century periodicals. Readers will be able to browse various titles, including “Kurier Warszawski”, “Gazeta Warszawska” and “Korespondent Warszawski”. The full list of titles that are to be scanned can be found here.
The scanning of these periodicals is being carried out as part of the NUKAT – AUTOSTRADA INFORMACJI CYFROWEJ project, which is co-financed by the Innowacyjna Gospodarka programme. We wish e-bUW a successful project and many new publications online.
Moreover, as part of this project the KARO system will gain the ability to search the metadata of all digital libraries connected to the Digital Libraries Federation portal. The integration will be based on the OpenSearch API implemented in DLF.
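To give an idea of what an OpenSearch-based integration involves, here is a minimal sketch of how a client could turn an OpenSearch URL template into a query URL. The `{searchTerms}`, `{startPage?}` and `{count?}` placeholders are standard OpenSearch template parameters, but the template URL and the `fill_template` helper are hypothetical, since the post does not give the actual DLF endpoint.

```python
import re
from urllib.parse import quote

# Hypothetical template for illustration; not the real DLF endpoint.
TEMPLATE = "https://example.org/search?q={searchTerms}&page={startPage?}&n={count?}"


def fill_template(template: str, **params: str) -> str:
    """Substitute OpenSearch-style {name} and {name?} placeholders."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in params:
            # Percent-encode the value so it is safe inside a URL.
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\??)\}", repl, template)


url = fill_template(TEMPLATE, searchTerms="Kurier Warszawski", startPage="1")
print(url)
```

A real client would first read the template from the service's OpenSearch description document instead of hard-coding it.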