Europeana OpenSearch API is now publicly available with PSNC as one of the pilot partners!

On the 28th of February Europeana published its API, which is compatible with the OpenSearch standard. With this API, external applications and Internet services can search the information aggregated in Europeana and use the search results outside of it. Work on the Europeana API has been under way for the last few months; in the second half of 2010, after the Europeana Open Culture conference, the Europeana Foundation began cooperating with institutions interested in testing the API and developing pilot applications. Poznan Supercomputing and Networking Center (PSNC) was one of these institutions.

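As a part of the Europeana API pilot program, PSNC developed two components: a widget for the search results page of the Polish Digital Libraries Federation and a widget for the metadata record pages of the Digital Library of Wielkopolska.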

More information about the Europeana API can be found on the Europeana website. Besides technical information there is also a gallery of applications using the API. Information about components developed by PSNC is also there.

We encourage everyone interested in using the Europeana OpenSearch API to give it a try, and we recommend the post in which we describe our experiences with it. We would also like to remind you that the Digital Libraries Federation has a similar interface. It is publicly available, and its documentation is published on the DLF website.

 

Europeana API – Example of use in Polish digital libraries

Introduction

After the Europeana Open Culture Conference in 2010 we started cooperation with Europeana on a prototype use of the Europeana API in some of our services. After some initial discussions we decided to develop two widgets based on the API: one for the Polish Digital Libraries Federation (DLF) and the other one for the Digital Library of Wielkopolska (DLW).

DLF is a Polish metadata aggregator which harvests information from around 60 digital libraries. It currently provides information about more than 550,000 objects, and this information is also contributed to Europeana. DLW, on the other hand, is the largest Polish digital library. It holds around 130,000 digital objects, mostly national and local cultural heritage from dozens of memory institutions in the Wielkopolska (Greater Poland) region. DLW contributes its metadata to the Digital Libraries Federation.

We wanted to use the Europeana API to provide easier access to European cultural heritage artifacts for users of Polish digital libraries without forcing the users to change their usual workflow. Therefore we have made some initial assumptions about the workflow. We assumed that search in aggregated metadata is the main DLF functionality for the majority of end users. Each displayed DLF search result contains only a few elements of the harvested metadata and redirects the user to full information in the source digital library (e.g. in DLW). The functionality left to the source digital library is to display the full metadata record and to give access to the content of the digital object.

Finally, we decided to achieve our aim by enriching the information presented to DLF and/or DLW users with links to additional objects available via Europeana. In practice, this means placing widgets based on the Europeana API on the DLF search results page and on the DLW full metadata record page.

Preparations

Further analysis focused on technical aspects. The Europeana API is an Open Search protocol interface, so an input query is needed to get any results. We assumed that for the DLF widget the input would be the query submitted to the DLF by the user, and that for the DLW widget the query would be built from selected elements of the metadata record displayed by the user.
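The two ways of obtaining an input query can be illustrated with a minimal Python sketch. It is not the code of the actual widgets: the endpoint URL, the choice of the "title" and "subject" fields and the helper names are hypothetical, and the query parameter names (here taken from the OpenSearch template parameters searchTerms and startPage) would in practice come from the OpenSearch description document of the given service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; the real URL and the
# real parameter names are defined by the service's OpenSearch description.
BASE_URL = "https://api.example.org/opensearch.rss"


def query_from_user_input(user_query: str) -> str:
    """DLF case: reuse the query typed by the user, trimmed of whitespace."""
    return user_query.strip()


def query_from_record(record: dict, fields=("title", "subject")) -> str:
    """DLW case: build a query from selected elements of a metadata record.

    The record is assumed to map element names to lists of string values.
    Which elements are selected is an assumption of this sketch.
    """
    parts = []
    for field in fields:
        for value in record.get(field, []):
            value = value.strip()
            if value:  # skip empty metadata values
                parts.append(value)
    return " ".join(parts)


def build_search_url(search_terms: str, start_page: int = 1) -> str:
    """Encode the query as URL parameters of the (hypothetical) endpoint."""
    params = {"searchTerms": search_terms, "startPage": start_page}
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, build_search_url(query_from_user_input(" old maps ")) yields a URL whose searchTerms parameter carries the user's query, while query_from_record turns a record's selected elements into a single query string.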

The metadata from DLF is visible in Europeana, but we had to face the fact that the DLF database is updated every night, while the data transfer from DLF to Europeana in practice takes place every three months. As a result, DLF is a more up-to-date source for searching the metadata of Polish digital libraries. On the other hand, Europeana contains far more information than the Federation. The final decision was to join the data from Europeana and the DLF at runtime: when preparing the final set of information shown to the user, the results from Europeana should not include data from DLF, because this data should be taken directly from DLF. Another issue was cross-language searching. We decided that the subject element from the DLW metadata records would be translated with Google Translate into English, Spanish, German and French before the widget sends it as a query to the Europeana API.
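The runtime join described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the provider field and the label used to recognise records contributed by DLF are hypothetical, and the real services may identify the origin of a record differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    link: str
    provider: str  # hypothetical field naming the aggregator behind the record


# Hypothetical label marking records that reached Europeana via DLF.
DLF_PROVIDER = "Digital Libraries Federation"


def merge_results(dlf_results, europeana_results):
    """Join both result sets, taking DLF data directly from DLF.

    Europeana results contributed by DLF are dropped, because the nightly
    updated DLF database is the fresher source for those records.
    """
    europeana_only = [r for r in europeana_results if r.provider != DLF_PROVIDER]
    return list(dlf_results) + europeana_only
```

Filtering on the Europeana side, rather than deduplicating after the merge, keeps the fresher DLF copy of every record without having to compare identifiers between the two services.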

Implementation

The final result of our technical discussions was the architecture presented in the image below.

Figure 1. Final architecture of Europeana API usage by the Digital Libraries Federation and the Digital Library of Wielkopolska.

As you can see in Figure 1, we try to integrate the functionalities provided by three services – Europeana, Digital Libraries Federation and Digital Library of Wielkopolska:

  • Europeana exposes an Open Search API.
  • Digital Libraries Federation uses the Europeana API to provide the search results from Europeana together with the search results from the Federation. Those results are presented together on the Digital Libraries Federation website, as you can see in Figure 2.

    Figure 2. DLF’s search results page with results from Europeana [source].

    Moreover, the Federation exposes two Open Search APIs for external services. One is the Federation’s own API and the other is a proxy to the Europeana API intended for use by Polish digital libraries. The main reason for providing this proxy in the DLF is to make the widget prepared by the Federation for Polish digital libraries easier to develop and use.

  • This widget is embedded in the web pages presenting metadata records of digital objects published by the Digital Library of Wielkopolska. The widget extracts parts of the metadata, translates them on the browser side with the Google Translate service and sends the translated metadata together with the OAI identifier of the digital object to both Open Search APIs exposed by the Federation. After the widget processes the responses, the search results are presented as a part of the web page with the digital object metadata. You can see an example of such results in the left column in Figure 3.

Figure 3. Search results from Europeana and DLF on the Digital Library of Wielkopolska site [source].
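The request flow of the DLW widget can be outlined with the following Python sketch. The real widget runs in the browser and its code is not shown in this article, so everything here is an assumption made for illustration: the record layout, the endpoint names, the injected translate function (standing in for the translation service) and the choice of sending one request per target language.

```python
from typing import Callable, Dict, List

# Target languages named in the article for the translated subject element.
TARGET_LANGUAGES = ("en", "es", "de", "fr")


def build_widget_requests(
    record: Dict[str, str],
    translate: Callable[[str, str], str],
    endpoints: Dict[str, str],
) -> List[Dict[str, str]]:
    """Prepare one request per target language and per Open Search API.

    `record` is assumed to hold the "subject" text and the "oai_id" of the
    displayed object; `translate(text, language)` is a hypothetical stand-in
    for the translation service; `endpoints` maps a name to an API URL.
    """
    requests = []
    for language in TARGET_LANGUAGES:
        translated = translate(record["subject"], language)
        for name, url in endpoints.items():
            requests.append(
                {
                    "endpoint": name,
                    "url": url,
                    "query": translated,
                    "oai_id": record["oai_id"],
                    "language": language,
                }
            )
    return requests
```

Passing the translator in as a parameter keeps the sketch independent of any particular translation service and makes the flow easy to try out with a dummy function.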

If you would like to see some live examples, you can try the following links:

The design and implementation of both widgets and other necessary code took about 10 person-days of a skilled programmer's time. The API-based widgets were first deployed in the test environment and reviewed with the Europeana team, which was also responsible for providing technical information about the API. Then, on the 22nd/23rd of December 2010, the widgets were deployed in the production environment.

Impact

While working on the design, implementation and deployment of the widgets based on the Europeana API, we were hoping to contribute to the following (expected) user flow (see Figure 4):

Figure 4. Expected user flow after widgets’ deployment.

We assumed that the widgets would attract their users to visit Europeana. In return, the growing group of Europeana users should eventually also increase the number of DLF and DLW users. As the widgets were deployed just two months ago, it is not yet possible to observe the increased traffic. Nevertheless, at the beginning of March we contacted the Europeana team and asked for statistics on the traffic coming to Europeana from Poland and on the traffic attracted to Europeana by the Digital Libraries Federation and the Digital Library of Wielkopolska. For the purpose of this article, we compared those statistics with our own data. All statistics were gathered with Google Analytics.

First, let’s try to find out whether the widgets were useful for end users. Both DLF and DLW get about 70,000 visits each month. Figure 5 compares the percentage share of three types of such visits for DLF (blue bars) and DLW (red bars). The middle pair of bars (marked as 100%) represents visits during which a visitor displayed a page with the widgets installed. Those pages (DLF: search results page; DLW: metadata record page) are so crucial to the functionality provided by each service that we assumed any visit skipping them must have been accidental. The pair of bars on the left shows the number of all visits. As you can see, around one third of all visits do not reach the crucial functionality of the website, which is certainly something that could be improved. Coming back to the Europeana API widgets, the last pair of bars in Figure 5 shows the percentage of users who reached the page with the widget and decided to click on a digital object link provided by Europeana via the API. As you can see, 7.5% of the Federation users went to Europeana and 0.67% of DLW users did the same.

Figure 5. Comparison of user visits in DLF and DLW.

At this stage it is hard to estimate whether this is satisfactory, but when we think of the additional links as some kind of targeted advertisement placed on a cultural heritage website, the results may be seen as quite good.

Another interesting analysis is the comparison of the traffic going out to Europeana from those two services with the traffic coming the other way. This is shown in Figure 6. First, let us look at the traffic going out to Europeana (blue bars). As mentioned earlier, the widgets were deployed in the second half of December. In November 2010 there was no traffic going to Europeana at all. In December we sent around 1,000 visits, and in January 2011 almost 3,800. This again confirms that our users found the widgets and the data provided by Europeana useful.

Figure 6. Comparison of traffic to and from Europeana over time.
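As a rough consistency check, the click rates from Figure 5 can be combined with the monthly visit counts to estimate the outgoing traffic. The Python sketch below is only an approximation under stated assumptions: it treats "around one third" as exactly one third, applies the same 70,000 monthly visits to both services, and assumes the Figure 5 shares hold for a typical full month. Under these assumptions the estimate comes to roughly 3,800 visits per month, which is in line with the January 2011 figure quoted above.

```python
# Figures quoted in the article; the exact values are approximations.
MONTHLY_VISITS = 70_000        # "about 70,000 visits each month" per service
SHARE_REACHING_WIDGET = 2 / 3  # assumes exactly one third of visits skip the widget page
CLICK_RATE = {"DLF": 0.075, "DLW": 0.0067}  # share of widget-page visits going to Europeana


def estimated_clicks(monthly_visits: float, share_reaching_widget: float, click_rate: float) -> float:
    """Estimate monthly visits sent to Europeana by one service."""
    return monthly_visits * share_reaching_widget * click_rate


estimates = {
    service: estimated_clicks(MONTHLY_VISITS, SHARE_REACHING_WIDGET, rate)
    for service, rate in CLICK_RATE.items()
}
total = sum(estimates.values())

for service, clicks in estimates.items():
    print(f"{service}: about {clicks:.0f} visits sent to Europeana per month")
print(f"Total: about {total:.0f} visits per month")
```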

At this moment it is hard to say whether the number of users coming from Europeana (red bars) to DLF and DLW changed after the widgets were deployed. The numbers of users in November 2010 and January 2011 are quite similar. The December 2010 traffic is significantly smaller, but the Christmas break contributed to this. Another interesting number is the number of new users coming from Europeana each month. New users are users who have never visited the service before. It seems that Europeana provides us with a constant flow of new users: around 400 each month.

The last statistic that we would like to present is a simple comparison of traffic sources for the Digital Library of Wielkopolska (the largest Polish digital library). Traffic sources are the ways in which users reach our service.

Traffic source        Visits    %
(direct)              27,936    38.34%
google                14,925    20.48%
fbc.pionier.net.pl     8,286    11.37%
europeana.eu           3,143     4.31%
wtg-gniazdo.org        2,391     3.28%
pl.wikipedia.org       2,152     2.95%
genealodzy.pl          1,589     2.18%

The table above shows all traffic sources for the DLW in January 2011 that generated at least 1% of overall January traffic. The first two results are quite obvious: direct access (for example, a bookmark in the web browser) and access from Google search results. Positions 3 and 4, however, are very interesting. Third place goes to the national aggregator, the Federation, and fourth to the European aggregator, Europeana. The last three positions are taken by two genealogical services and Polish Wikipedia. These statistics show that the model of multilevel aggregation described in the Europeana Content Strategy is very good at attracting users to the participating digital libraries.

Summary

In this article we have described our experience with the Europeana API. It offers very interesting possibilities and is easy to use, as it is based on the well-known Open Search standard. We hope that there will be more such mechanisms in Europeana in the near future, as they make it possible to move the knowledge about European cultural heritage from metadata aggregation to services integration. And this seems to be the direction of evolution desired by the users.

The presentation accompanying this article can be found at http://dl.psnc.pl/biblioteka/dlibra/publication/349/content

Digital Library of Wielkopolska at the TERENA Storage Task Force meeting

On the 9th and 10th of September the 7th meeting of the TERENA Storage Task Force was held in Poznań. During this meeting the Digital Library of Wielkopolska was presented as an example of a network service related to the scope of the Task Force activities.

The meeting agenda can be found on the TERENA website, and a short note (in Polish) together with some pictures is available on the PSNC website. Slides from the Digital Library of Wielkopolska presentation, given by Marcin Mielnicki from the PSNC Digital Libraries Team, are also available.

Implementation of The Commission Recommendation on Digitisation […] – Report 2010 by POLAND

The 28th of February 2010 was the deadline for submitting reports on the application of The Commission Recommendation on Digitisation and Online Accessibility of Cultural Material and Digital Preservation. Poland, as a Member State of the European Union, was also obliged to submit such a report on the current stage of implementation. You can download it here.

According to the document, it is estimated that at the beginning of 2010 Poland had:

  • ca. 500 000 digital objects in the libraries (80% available via the Internet),
  • ca. 1 000 000 digital objects in the archives (20% available via the Internet),
  • ca. 500 000 digital objects in the museums (1-2% available via the Internet).

Of these, around 350 000 (17.5%), mainly from libraries, are visible in Europeana, the European digital library, archive and museum. In order to increase the number of digital objects and then make them visible on the Internet, four Competence Centres were set up:

There is more interesting information in the report. The National Library now estimates that there are around 1 600 000 books in the public domain in Poland. This is a great resource, which can be freely digitized and made available on the Internet. That is why the document assumes that by 2013 there will be 1 000 000 digitized objects from Polish libraries available online. Furthermore, by 2020 there will be 15 000 000 Polish digital objects, stored in digital repositories of different types:

  • digital libraries,
  • digital archives,
  • virtual museums,
  • audiovisual collections.

The Committee for Digitisation at the Ministry of Culture and National Heritage and the four aforementioned Competence Centres are responsible for work on reaching that number (you can find more about the programme here).

Moreover, the report contains a short description and contact addresses of 5 selected data providers for Europeana. We are pleased to note that 4 of the 5 presented are visible in Europeana via the PIONIER Network Digital Libraries Federation:

If you are looking for more news about:

  • projected costs of digitisation in the next few years,
  • state of the Public-Private Partnership (PPP),
  • government programs in Poland tackling the issue of digitisation,
  • long-term preservation

I encourage you to read the whole report and the document mentioned in the report: “Program for digitisation of cultural goods and collection, storage and availability of digital items in Poland in 2009-2020”.

100 000 publications in the Wielkopolska Digital Library

On the 22nd of March 2010 the Wielkopolska Digital Library reached 100 000 publications available online. The publication published as number 100 000 is the Kurier Poznański issue of 1884.06.05 (R.13 nr 128), published in the WDL by Mr Wojciech Zagartowski from the Department of Electronic Humanistic Texts of the Kórnik Library of the Polish Academy of Sciences.

Congratulations to Mr Wojciech and all others supporting the existence and development of the Wielkopolska Digital Library!