Million Books Workshop
Friday, 14 March 2008
Imperial College Internet Centre


Workshop Theme and Topics

The Million Books workshop will explore issues raised by the mass digitisation of our libraries. We will focus on the general problems and opportunities of very large collections of books, digitised only as page images and with text automatically generated by optical character recognition (OCR) software. This discussion will build on earlier Million Books meetings in May and November 2007 at Tufts University and the Council on Library and Information Resources (CLIR) in Washington, DC. The London workshop and a subsequent workshop in Berlin will conclude a study on this general topic. The report from the November CLIR meeting will provide the starting point for discussion.
The focus of this workshop will be classical antiquity, especially as reflected in its major languages—Latin and Ancient Greek—and their long afterlife in the archives of European culture. The large quantity of European cultural heritage data referencing classical antiquity that will become available as a result of mass digitisation poses an immense challenge for digital collections on every level from knowledge management to systems engineering. The workshop seeks to bring experts in content, user access, technologies and policy together to address these fundamental issues in a holistic way.
We will examine five basic questions. We will begin with an overview of the current technical state of the art for large corpora, moderated by experts in core technology. In the afternoon session, a roundtable discussion will identify and explore the challenges and opportunities created by mass digitisation, both for Latin and Ancient Greek and for the many disciplines and domains in which they play a role. We will conclude with a general discussion about the way forward.

Programme

9.00: Coffee

9.30: Welcome: John Darlington, Director, Imperial College Internet Centre

9.40: Introduction and Overview: Gregory Crane, Perseus Project

10.00: SESSION I: Services

Dr. Thomas Breuel, DFKI and Technical University Kaiserslautern – From Image to Text: OCR and Mass Digitisation

10.45: Coffee

David Smith, Johns Hopkins University, & David Bamman, Perseus Project – From Text to Information: Machine Translation and Syntax Recognition.

David Mimno, UMass Amherst – From Information to Learning: Machine Learning and Classification Techniques

1.00: Lunch

2.30: SESSION II: Collections

Roundtable discussion. Moderator: Gregory Crane

4.00: Coffee

4.15: SESSION III: Systems and Infrastructure

Open Forum

6.00: Concluding Summary: Gregory Crane

7.00: Dinner


Registration

Participation in the workshop is by invitation. A report will be made publicly available.

For further information about the Workshop Programme, please contact Brian Fuchs (b.fuchs@imperial.ac.uk).

How to find us.



Background Material

Million Books Chicago Statement

Many More than a Million: Building the digital environment for the age of abundance



This workshop is sponsored by the Mellon Foundation.