Today we’ve published a report comparing the FineReader and Tesseract OCR engines. Both tools were tested on Polish historical documents (printed before 1850) from various Polish digital libraries. The comparison covered both gothic and antiqua documents, as well as both noisy and clean images. Both engines were appropriately trained before the comparison was conducted.
The OCR results show no single winner: neither engine outperforms the other overall. However, we tried to point out the differences between FineReader and Tesseract, along with their advantages and disadvantages. We invite you to read the report for details of our approach and the results obtained.
All test cases are based on the ground truth data produced within the IMPACT project. The comparison itself was part of the pilot work conducted in the course of the IMPACT project extension in the first half of 2012. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
The full report is available for download on the PSNC Digital Libraries Team website dedicated to the IMPACT project results.