Tag Archives: Digital Humanities

New version of the Virtual Transcription Laboratory portal

A few days ago a new version of the Virtual Transcription Laboratory portal was released (http://wlt.synat.pcss.pl). The current version introduces new features and conveniences, and fixes the bugs reported by users.

The most important changes are:

  • the transcription editor supports multi-column documents, e.g. newspapers (this option is available in new projects),
  • a line verification mechanism was added: each line now carries information about whether it has been reviewed,
  • the import of TIFF files was sped up,
  • transcription results can be downloaded in plain text format,
  • a link to preview the whole scan was added to the transcription editor,
  • line numbers were added to the transcription editor view,
  • lines can be moved to a given position in the transcription view (by entering its number),
  • an e-mail is sent to the project owner when batch OCR finishes,
  • information about the author of a change was added to the history view,
  • the project author is now an optional field in the new project creation form.

A detailed release note with all changes and adjustments can be found here.

Apart from the changes mentioned above, we have launched a suggestions and improvements forum where you can submit your proposals for improving VTL and vote for other users’ ideas (it is available here). You can enter the forum via the orange “Your suggestion” tab in the upper right corner of the VTL site. We strongly encourage you to report your ideas and to vote for those already visible on the forum. Further work on VTL will focus on the functions that users find most useful.

New functions of the Virtual Transcription Laboratory portal

Source: http://pl.wikipedia.org/wiki/Plik:Escribano.jpg

We are glad to announce the release of the newest version of the Virtual Transcription Laboratory (http://wlt.synat.pcss.pl).

This version contains new features and fixes for errors reported by users. Among the most important changes are:

  • the possibility to export the results of your work as an EPUB file,
  • the possibility to share a project only with selected VTL users,
  • support for scans in TIFF format (after uploading they are automatically converted to PNG at 300 DPI; a minimal conversion sketch follows this list),
  • changes in the transcription editor dialogue,
  • a number of corrections in the resulting hOCR files.
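
The TIFF handling mentioned above could, in principle, look like the following minimal sketch. This is only an illustration using the Pillow library, not VTL's actual implementation, and the file names are hypothetical:

    # Convert an uploaded TIFF scan to PNG with 300 DPI metadata (illustration only).
    from PIL import Image

    def tiff_to_png(src_path, dst_path):
        with Image.open(src_path) as img:
            # Pillow writes the resolution into the PNG's pHYs chunk via the 'dpi' option.
            img.save(dst_path, format="PNG", dpi=(300, 300))

    tiff_to_png("scan.tiff", "scan.png")  # hypothetical file names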

A full list of changes with screenshots can be found on our wiki:
https://confluence.man.poznan.pl/community/display/WLT/Note+about+release+from+2013-03-25

Next stage in beta testing of VTL

On Friday, 15 February 2013, we released a number of new functions and improvements in the Virtual Transcription Laboratory portal (http://wlt.synat.pcss.pl).

These are the most prominent ones:

  • a noticeable improvement in the performance and stability of the whole portal,
  • a change in the way the transcription edit history is stored,
  • import of an existing DjVu publication on the basis of its OAI identifier (this feature is described in the end-user documentation; a minimal sketch of such a request follows this list),
  • batch OCR for all files in a project,
  • notifications showing whether changes performed in the transcription editor were saved,
  • many minor improvements and bug fixes reported by users,
  • the first version of the user documentation has been published (http://confluence.man.poznan.pl/community/display/WLT).
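
The DjVu import mentioned above is based on an OAI identifier, which suggests an OAI-PMH request under the hood. Below is a minimal sketch of such a GetRecord request in Python; the endpoint and identifier are hypothetical, and VTL's internal implementation may differ:

    # Fetch a single record via the OAI-PMH GetRecord verb (illustration only).
    import urllib.parse
    import urllib.request

    def oai_get_record(base_url, identifier, metadata_prefix="oai_dc"):
        params = urllib.parse.urlencode({
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        })
        with urllib.request.urlopen(base_url + "?" + params) as response:
            return response.read().decode("utf-8")  # XML containing the record's metadata

    record_xml = oai_get_record("http://dlibra.example.org/oai-pmh", "oai:example.org:12345")
    print(record_xml[:200])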

A few months have passed since the BETA release of VTL, and we would like to thank everyone for their feedback ;-). After the initial release it became clear that serious changes had to be made in the portal engine. The most important one was the change in the way a transcription is represented and stored in the database. This was a considerable effort, but it resulted in a significant performance and stability improvement.

In the near future two new functions will be added:

  • export of project results in EPUB format,
  • the possibility to upload TIFF files into a project (they will be automatically converted to PNG files at 300 DPI).

Authors of the post: Bogna Wróż, Adam Dudczak

First Polish THATCamp

The first Polish THATCamp will take place on 24-25 October 2012, alongside the “Zwrot Cyfrowy w humanistyce Internet Nowe Media-Kultura 2.0” conference in Lublin. The event is organized by the Polish THATCamp coalition and will be held at the headquarters of the NN Theatre in the Old Town in Lublin (Grodzka 21). Poznań Supercomputing and Networking Center is an official partner of the event.

THATCamp (The Humanities And Technology Camp, http://www.thatcamp.org) is a series of meetings, organized all over the world, for people interested in new technologies in the humanities, sociology, and the activities of academic and artistic institutions (universities, galleries, archives, libraries and museums). Participation in these events is free.

The beginnings of THATCamp date back to 2008, when it was organized for the first time in the USA by the Center for History and New Media (CHNM) at George Mason University.

More information about the event can be found here (in Polish).

Post authors: Bogna Wróż, Adam Dudczak

Digital Humanities 2012 conference

Digital Humanities 2012 was one of the best conferences we attended in 2012. The organizers managed to gather more than 500 attendees from all around the world. The conference was held at the University of Hamburg, a great venue for hosting more than 200 sessions (5 parallel tracks of the main conference plus various workshops and tutorials) over 5 days, starting on Monday, 16 July.

As a summary of the conference, we would like to bring to your attention a few very interesting projects and tools which were presented there. If you want more information about them, check the conference website: all the lectures were filmed and the videos are available online for free.

The first project on our list, “Programming Historian 2”, is an effort to create a second edition of a textbook showing how programming tools like Python can be used by digital historians in their research. It sounds like a very ambitious and interesting task. The project is a collaborative effort; it consists of lessons lasting 30-60 minutes which try to show what can be done with modern programming tools, and how.
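
To give a flavour of what such lessons teach, here is a minimal sketch of a typical starting exercise: counting the most frequent words in a historical text with Python. The file name is hypothetical and the tokenization is deliberately crude; the actual lessons go considerably further.

    # Count the most frequent words in a historical text (illustration only).
    from collections import Counter
    import re

    with open("trial_transcript.txt", encoding="utf-8") as f:  # hypothetical input file
        text = f.read().lower()

    words = re.findall(r"[a-z]+", text)   # very crude tokenization
    frequencies = Counter(words)

    for word, count in frequencies.most_common(10):
        print(word, count)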

One of the most interesting projects from the user interface point of view was Neatline, a set of plugins for the Omeka digital library framework. Neatline allows you to create visually rich presentations, e.g. it helps users tell a story using a timeline and a map (see the example exhibition about the Battle of Chancellorsville). The tool itself is very nicely done; apart from the fully-fledged version it is also mobile ready – really worth trying out.

The next interesting project introduced at the conference was Pelagios, which stands for ‘Pelagios: Enable Linked Ancient Geodata In Open Systems’. It is a collection of online ancient world projects (e.g. Google Ancient Places, LUCERO and many others) used to find information about ancient places and visualize it in a meaningful way. To achieve this, the partner projects use a common RDF model to represent place references and align all of them to the Pleiades Ancient World Gazetteer. As the authors say, the project currently focuses on the ancient world, but it is only the first step in building a Geospatial Semantic Web for the Humanities.
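
To illustrate the general idea (this is not the official Pelagios schema, just a minimal sketch using the rdflib library and the Open Annotation vocabulary), a place reference in a text could be aligned with a Pleiades gazetteer entry roughly like this; the annotation and text URIs below are hypothetical:

    # Minimal illustration of aligning a textual place reference to a Pleiades URI.
    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    OA = Namespace("http://www.w3.org/ns/oa#")  # Open Annotation vocabulary

    g = Graph()
    g.bind("oa", OA)

    annotation = URIRef("http://example.org/annotations/1")            # hypothetical
    text_passage = URIRef("http://example.org/texts/herodotus#p12")    # hypothetical
    pleiades_place = URIRef("http://pleiades.stoa.org/places/423025")  # a Pleiades place URI

    g.add((annotation, RDF.type, OA.Annotation))
    g.add((annotation, OA.hasTarget, text_passage))    # where the place is mentioned
    g.add((annotation, OA.hasBody, pleiades_place))    # which gazetteer entry it denotes

    print(g.serialize(format="turtle"))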

Among many other interesting things, the project “Visualizing the History of English”, introduced by Alexander Marc, was one of the best at the conference. He presented a method to visualize English vocabulary from different time periods using treemap charts. For this purpose it uses the huge database of the Historical Thesaurus of English (793,747 entries within 236,346 categories). I truly recommend having a look at the video of this presentation.
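
For readers who want to try something similar on their own data, here is a minimal sketch of a vocabulary treemap in Python, using the matplotlib and squarify packages. The category names and entry counts below are made up for illustration and are not taken from the Historical Thesaurus.

    # Draw a simple treemap of (illustrative) vocabulary category sizes.
    import matplotlib.pyplot as plt
    import squarify

    categories = ["category A", "category B", "category C"]  # illustrative names
    entry_counts = [250000, 300000, 243747]                   # illustrative counts

    squarify.plot(sizes=entry_counts, label=categories)
    plt.axis("off")
    plt.title("Vocabulary by category (illustrative data)")
    plt.show()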

There were also a few very exciting projects related to different aspects of history and geography. One of the best examples was The MayaArch3D Project, which combines art history and archaeology with GIS and virtual reality for teaching and research on ancient architecture. The current prototype is a virtual, searchable repository of the Maya city of Copan in western Honduras. One of the purposes of this paper is to analyse the visual and spatial relationships between built forms and landscape elements. The project was developed in the Unity3D game engine in combination with PHP and PostgreSQL.

QueryArch3D Demo Film from Jennifer von Schwerin on Vimeo.

This is of course not all; below you can find a list of a few interesting tools mentioned in various presentations during the conference:

  • Stanford NLP – publishes the results of the work of the NLP group at Stanford University. The website offers access to multiple tools which can be used for Natural Language Processing.
  • Apache OpenNLP – a machine-learning-based toolkit for the processing of natural language text.
  • AlchemyAPI – helps to transform text into knowledge. Alchemy is a cloud-based text mining platform providing semantic tagging to over 18,000 developers. It offers a comprehensive set of natural language processing capabilities, including named entity extraction, author extraction, web page cleaning, language detection, keyword extraction, quotation extraction, intent mining, and topic categorization (a small named entity extraction example follows this list).
  • Open Calais – a semantic enrichment API powered by Thomson Reuters, mentioned in a few presentations during DH. Now in version 4.6, it has been available for quite a while, and it is nice to see that it is widely used.
  • D3.js (Data-Driven Documents) – a very nice JavaScript library for manipulating documents based on data; it helps to bring data to life.
  • OKF Annotator – developed by the Open Knowledge Foundation, it allows you to annotate virtually any resource on the web.
  • GeoStoryteller – one of the tools used in the German Traces NYC project. It is an educational tool that allows you to create stories about physical places; users can take a walking tour and engage with the GeoStories you have created using their mobile phones.
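
As a small taste of what the NLP tools listed above do, the sketch below extracts named entities from a sentence. It uses NLTK purely as a stand-in (the listed toolkits have their own, different APIs) and requires the 'punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker' and 'words' NLTK data packages.

    # Named entity extraction with NLTK, standing in for the toolkits listed above.
    import nltk

    sentence = "The conference was held at the University of Hamburg in July."
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)

    # Print the chunks labelled as named entities (e.g. ORGANIZATION, GPE).
    for subtree in tree.subtrees():
        if subtree.label() != "S":
            entity = " ".join(word for word, tag in subtree.leaves())
            print(subtree.label(), entity)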

Last but not least, links to two interesting documents: Research Infrastructure in the Digital Humanities (from the European Science Foundation) and an inventory of FLOSS dig_hum tools.

Post authors: Piotr Smoczyk, Adam Dudczak

Google’s Digital Humanities Research Awards announced

Yesterday the Google Research blog published information about the research projects which have received grants in Google’s Digital Humanities Research Awards. The list is also included below:

  • Steven Abney and Terry Szymanski, University of Michigan. Automatic Identification and Extraction of Structured Linguistic Passages in Texts.
  • Elton Barker, The Open University, Eric C. Kansa, University of California-Berkeley, Leif Isaksen, University of Southampton, United Kingdom. Google Ancient Places (GAP): Discovering historic geographical entities in the Google Books corpus.
  • Dan Cohen and Fred Gibbs, George Mason University. Reframing the Victorians.
  • Gregory R. Crane, Tufts University. Classics in Google Books.
  • Miles Efron, Graduate School of Library and Information Science, University of Illinois. Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques.
  • Brian Geiger, University of California-Riverside, Benjamin Pauley, Eastern Connecticut State University. Early Modern Books Metadata in Google Books.
  • David Mimno and David Blei, Princeton University. The Open Encyclopedia of Classical Sites.
  • Alfonso Moreno, Magdalen College, University of Oxford. Bibliotheca Academica Translationum: link to Google Books.
  • Todd Presner, David Shepard, Chris Johanson, James Lee, University of California-Los Angeles. Hypercities Geo-Scribe.
  • Amelia del Rosario Sanz-Cabrerizo and José Luis Sierra-Rodríguez, Universidad Complutense de Madrid. Collaborative Annotation of Digitalized Literary Texts.
  • Andrew Stauffer, University of Virginia. JUXTA Collation Tool for the Web.
  • Timothy R. Tangherlini, University of California-Los Angeles, Peter Leonard, University of Washington. Northern Insights: Tools & Techniques for Automated Literary Analysis, Based on the Scandinavian Corpus in Google Books.

The total budget of this program is around 1 million dollars over a period of about two years. Of course, we congratulate the award winners 🙂