All posts by Marcin Heliński

Project Gutenberg 40th anniversary

Forty years ago Michael Hart started Project Gutenberg. It is a good moment to recall that book digitization did not begin in the past several years but much earlier. Hart reports on his site that the total number of titles in Project Gutenberg passed 37,500 last August and should pass 40,000 before the 40th anniversary. An average of 1,000 books a year over 40 years does not sound impressive, but the project's author assures readers that the figure should reach 5,000 this year. Today the project site http://gutenberg.org gives away about 100,000 eBooks per day, which means roughly 3 million a month or 36 million a year.

Michael Hart notes the progress made in USB flash drives since 2000, when they had 1,000 times less capacity and were 3 times more expensive. Portable “pocket drives” can hold up to 2.5 million books in .zip format; although their size means we should not really think of them as “pocket-sized”, they are no heavier than a book.

Thinking back to the year 2000, it turns out that Google would not announce inventing eBooks for 5 more years, and Project Gutenberg would still need 2 3/4 more years to pass 10,000 titles. Many changes lie ahead in the next decade, up to 2020. The author suggests that by then petabytes of data will have been gathered and all findable public domain books will have been put into at least some electronic formats.

Finally, the author worries that the rules are likely to change in ways that halt the growth of the public domain and strengthen copyright protection, deepening the “Digital Divide” problem.

The complete post by Michael Hart is available here.

Jagiellonian Digital Library has started

On July 19th, 2010 the Jagiellonian Digital Library, based on dLibra software, officially started. This digital library is partially sponsored by the European Community from the European Regional Development Fund under the Operational Programme ‘Infrastructure and Environment’ for the years 2007 – 2013, Priority 11 “Culture and Cultural Heritage”, Action 11.1 “Protecting and preserving cultural heritage of supra-regional importance”.

Looking at the first publications, it is worth noting two different ways of presenting maps. One of them uses Zoomify, which has been used before in Kujawsko-Pomorska Biblioteka Cyfrowa. An example of a map presented this way is Mappa szczegulna [!] Woiewodztwa Płockiego i Ziemi Dobrzynskiey… The second way of presentation is based on The Google Maps Image Cutter mechanism developed by the Centre for Advanced Spatial Analysis – University College London. An example is Karta pocztowa Królestwa Polskiego przez K. Widulińskiego Sekretarza Jeneral(nego) Poczt wydana na r. 1827.

Viewing maps with either of these methods does not require installing any additional software. Zoomify, which is based on Flash technology, seems to be a bit slower than the Google solution, which uses JavaScript. However, two different publications were used to compare the two solutions, so this opinion might not be objective. Apart from that, the functionality of Zoomify and The Google Maps Image Cutter is similar. More advanced versions of Zoomify are paid, but its free basic version provides enough functionality to present maps successfully on digital library pages. The Google Maps Image Cutter is completely free.

We encourage digital libraries to use both solutions. Doing so should increase the attractiveness of the presented maps or drawings.

Europeana publishes White Paper #1

On June 1, 2010 Europeana announced its “White Paper 1 Knowledge = Information in Context: on the Importance of Semantic Contextualisation in Europeana”. Europeana’s first White Paper looks at the key role linked data will play in Europeana’s development and in helping Europe’s citizens make connections between existing knowledge to achieve new cultural and scientific developments.

Linked data gives machines the ability to make associations and put search terms into context. Without it, Europeana could be seen as a simple collection of digital objects. With linked data, the potential is far greater, as the author of the white paper, Prof. Stefan Gradmann, explains.

Professor Gradmann used the word “Paris” as an example to show how search results may lead to items in the Louvre, located in Paris, where it is also possible to see paintings portraying Paris, the Trojan prince who abducted Helen of Troy. From there, links lead to topics associated with the mythical Apple of Discord and further to the forbidden apple eaten by Adam and Eve.

The example presented in the White Paper shows how linked data will allow Europeana to propose connections between millions of items. These connections can then be used to generate new ideas and knowledge, on a scale not possible before.
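The chain of associations in the “Paris” example can be sketched as a tiny graph of subject–predicate–object statements. This is only an illustration of the general idea of linked data, not Europeana's actual data model: the node labels, the predicate names and the helper functions `build_graph` and `find_path` are all assumptions made up for this sketch.

```python
from collections import deque

# Hypothetical statements (subject, predicate, object) modelled on the
# "Paris" example. Labels and predicates are invented for illustration.
TRIPLES = [
    ("Paris (city)", "is location of", "Louvre"),
    ("Louvre", "holds", "Painting of Paris"),
    ("Painting of Paris", "depicts", "Paris (mythical prince)"),
    ("Paris (mythical prince)", "abducted", "Helen of Troy"),
    ("Paris (mythical prince)", "associated with", "Apple of Discord"),
    ("Apple of Discord", "related motif", "Forbidden apple"),
    ("Forbidden apple", "eaten by", "Adam and Eve"),
]


def build_graph(triples):
    """Index the statements by subject so outgoing links are easy to follow."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of links from start to goal.

    Returns a list of (subject, predicate, object) hops, an empty list when
    start equals goal, or None when no chain of links exists.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, target in graph.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, predicate, target)]))
    return None


GRAPH = build_graph(TRIPLES)
PATH = find_path(GRAPH, "Paris (city)", "Adam and Eve")

# Print each hop of the discovered chain of associations.
for subject, predicate, obj in PATH:
    print(f"{subject} --[{predicate}]--> {obj}")
```

Because every link is an explicit statement, a machine can follow the chain from the city to the biblical story without any of the intermediate items sharing a search keyword.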

The full text of the White Paper is available at Europeana.

Europeana published the Public Domain Charter

Europeana has just published the Public Domain Charter. Europeana, as Europe’s digital library, museum and archive, belongs to the public and should represent the public interest. Hence there is a need to start a discussion on this topic. The Charter is a declaration of the principles for a healthy Public Domain containing material from which society derives knowledge and fashions new cultural works. The Charter stresses that digitisation of Public Domain content does not create new rights over it. Works that are in the Public Domain in analogue form continue to be part of it once their digital form has been created.

You can find the full text of the Public Domain Charter at Europeana.

Europeana encourages all who want to discuss the Charter to send an e-mail to info@europeana.eu.

Lucene 3.0.0 available

Lucene 3.0.0 was released on 25 November 2009. Lucene is an open-source Java framework that is used for indexing and searching text in dLibra software. Lucene 3.0.0 is the first release with Java 5 as a minimum requirement. The API was cleaned up to make use of Java 5’s advantages. The latest Lucene brings many optimizations and new features, though because of the many changes it is not fully compatible with earlier releases. The most important are near real-time search capabilities added to IndexWriter, new query types, per-segment searching and caching, improvements in wildcard searching, improved Unicode support, high-performance handling of numeric fields and many more. Detailed information on changes in the Lucene framework is available here.

We are planning to use Lucene 3.0.0 in one of the future versions of dLibra software. It will help us improve indexing and searching performance.
