Today we have made available new textual resources from the Polish Digital Libraries dataset of the IMPACT project. The new resources comprise 478 pages of full text with coordinate details at the region, line, word and glyph levels. This is important material for research, especially research on optical character recognition algorithms. The quality of the developed resources is approx. 99.95%. All of them are available for download at http://dl.psnc.pl/activities/projekty/impact/results/.
These resources have been developed for the pilot work carried out by Poznań Supercomputing and Networking Center in the course of the IMPACT project. The pilot compared two well-known OCR engines: FineReader 10 CE and Tesseract 3.0.
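
For readers who want to run a similar comparison against the ground truth, one common way to score OCR output is character-level accuracy based on edit distance. The sketch below is only an illustration and not necessarily the method used in the pilot. It assumes the ground truth and the OCR output are already available as plain-text strings, and the function names `edit_distance` and `char_accuracy` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Character accuracy in [0, 1], relative to the ground-truth length."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # Illustrative strings only, not taken from the dataset.
    truth = "Biblioteka Cyfrowa"
    engine_a = "Biblioteka Cyfrowa"
    engine_b = "Bibliotcka Cyfrowa"
    print("engine A:", char_accuracy(truth, engine_a))
    print("engine B:", char_accuracy(truth, engine_b))
```

Running the same function over the output of each engine for every page, and averaging the results, gives a simple basis for comparing the two engines on identical input.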