Further links will be added to this post throughout the afternoon!
1: Grant Young, Jennie Fletcher, Huw Jones, 'Laying the foundations of a new digital library'
First off, Grant explained the background.
In 2010 the Cambridge University Library digital library had ambition and vision and plans. Now, through major donation and JISC and research council funding, there is a major programme worth £2million, which is delivering infrastructure and content. It's already made a media splash with the release of some of the Newton papers: coverage in newspapers and TV, even in New Zealand.
You can explore the Digital Library at http://cudl.lib.cam.ac.uk/.
Backstory:
CUL has an impressive digitisation unit and has been digitising for a long time (particularly for the Genizah and Darwin projects, and has also supported Parker on Web and Freeze Frame).
Donor:
The Polonsky Foundation recognised that funding was needed to support infrastructure and content. £1.5million donation for core three-year project: Foundations Project, Phase 1, to run mid 2010 to mid 2013. Has attracted further funding (£700,000) from JISC and AHRC.
Goals:
- Support creation, delivery and preservation of digital content
- Enable discovery, access, reuse (as licenses permit - can download high res Newton MS and use under Creative Commons license) (will be developing APIs to allow use of data and metadata)
- Enable interaction with content, including personalisation and user interaction
- Enrich content by linking with research, providing digital humanities tools/platform
- Integrate well with existing library infrastructure
- Develop infrastructure that is flexible, scalable, extensible and sustainable
- Two broad areas covering priority collections and hopefully inspiring donors: Foundations of Faith and Foundations of Science
- Content delivery concentrated in years 2 and 3
- Liaising with academics to inform content choices
- Actively seeking further funding
Jennie spoke about the technology infrastructure:
They quickly identified that a commercial system wouldn't serve them well, so they use open source technology.
They have a lightweight modular system: individual parts can be altered and replaced as required.
It's scalable: designed to put minimum load on servers, and uses virtual machines that can be cloned and redeployed rapidly as needed
Production:
- They use Goobi, which is developed by Intranda, and has some special additions for us. It's written in Java, and is open source
- It allows workflow management and custom workflows: this means that experts can help at appropriate points in the process.
- Outputs METS MODS and image files
- Large tiff images and metadata are loaded into DSpace
- Displays using XTF and our own custom digital library viewer.
- XTF processes a variety of metadata into JSON for viewer, and it also indexes data for searching and faceted browsing (to be added to interface in future)
- Allows transcriptions next to MSS - these are provided via the Newton Project
- Focus on images: simple and clear interface. Scales to size of browser window and is customisable by user.
- It uses HTML5, ExtJS, SeaDragonAjax, Java Spring Framework
- Runs on Tomcat and Apache on Ubuntu, but can be run on any OS
- Further user customisation: bookmarks, own notes and annotations to share
- Search and faceted browsing
- Specialised views for other content
- Further improved mobile support
- Mailing list for updates
- More download options: pdf? metadata?
- More content!
- JISC project information: http://www.jisc.ac.uk/whatwedo/programmes/digitisation/content2011_2013/Board%20of%20Longitude.aspx
- AHRC-funded project info: http://www.rmg.co.uk/blogs/longitude/
- This project digitises the Board of Longitude Archive and objects from the National Maritime Museum, in partnership with JISC and AHRC (the Department of History and Philosophy of Science in Cambridge is writing a history of the Board, funded by AHRC, with JISC supporting the digital side)
- Has 65K images as well as archive and accompanying material. It's more than just the Longitude story we know! Includes famous and non-famous people, many areas of science and technology, early example of state-funded research and many other interesting things.
- Will run Nov 2011-July 2013
- Amazing content
- Working in partnership: libraries involved in research projects
- Uniting Museum and Library Collections
- Driving development: i.e. for such a large collection you need to have search, browse, etc., making the most of different types of materials, e.g. maps
- Library as publisher - i.e. integrating research output into Digital Library: raises question of library moving from disinterested provider of content, to being a publisher choosing and curating research output. But this is an opportunity.
2: Christy Henshaw, Programme Manager, Wellcome Digital Library, 'Creating an Online Resource for Medical Archives at the Wellcome Library'
The Wellcome Library has been digitising for a while, but the Digital Library as a long-term strategic programme is new. It's currently in the pilot stage, and although digitisation and infrastructure development are underway, there is no delivery system yet.
The Wellcome Library is smaller than many, and is subject specific, so the idea of digitising it all is maybe less intimidating than in other places.
Digitisation is a particular strand of the Library Transformation Strategy 2009-14, which covers: targeted collecting, expert interpretation, and strategic digitisation
The pilot project is: 'Genetics and its Modern Foundations 2010-2013'. It plans to:
- Build a sustainable expandable mechanism
- Digitise key holdings
- Digitise important third party content
- Use innovative content and tools
- Explore commercial partnerships
They use Goobi, too, and have similar infrastructure to CUL. But they don't have a separate digital library website: content will be searchable through the main OPAC, including full-text search in the main library catalogue.
What's being digitised (2 years, 10 collections, 600K pages)?
- Archival material: Will be 1.1million images: 600K internal, 500K external. The major internal collection is the Francis Crick collection.
- Books: 600K images related to genetics research, including up to modern material.
- Non-genetics stuff: Early printed books as part of ProQuest Early European Books = c. 5.5million images/14-15K books. Also Medical Officer Health Reports for London, 400K images
- Born-digital material: small but growing.
Physical work (0.6FTE):
- Flattening
- Check sequences
- Protect with sleeves
- Remove staples
- Everything, even copyrighted: will block access to inappropriate material at a later stage
- Canon digital cameras
- Make clear what's not currently physically available via catalogue and an online list
- Maintain a schedule on a staff wiki
- Remove items for shortest time possible
- Include buffer in advertised unavailable time
- Set targets based on what's been completed already
- Churchill Archives Centre
- Cold Spring Harbor Laboratory
- King's College London
- University of Glasgow
- University College London
As Wellcome isn't publicly funded they don't have to comply in the main with FOI, and as most data isn't structured, data protection isn't such a huge issue, but they do hold a lot of very sensitive material (which isn't in the public domain) and they try to abide by the spirit of the legislation and behave sensitively. Archivists assess collections based on metadata, then sample items from collections are checked against a checklist: it's an iterative process to determine what can be made available and how. They don't have the resources to check everything straight off. This is done to an extent at cataloguing stage, of course, but they have to be more careful and granular if material is to be made available online.
Graded online access:
- Material >100 yrs old: open access
- Material <100 yrs but open in reading room: register online
- Restricted material (e.g. reference letters, grant applications (within certain dates), letters discussing medical information not in the public domain): users have to sign a form in person
- closed: not available to anyone
- Impossible to do the project if clearance had to be obtained for everything
- Orphan works v. difficult to obtain copyright for
- Taking a managed risk
- Copyright clearance by exception
- Bearing risk for non-Wellcome content
- Ensure clear research value: not commercial purposes. Terms of use prohibit commercial use of material less than 100 years old.
- Take-down policy
Users who register agree to various responsibilities including abiding by the Data Protection Act. The reuse agreement specifies that reuse is encouraged within the limits of copyright and data protection, with acknowledgement, for non-commercial use.
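For anyone who thinks in code, the graded online access tiers described above could be sketched roughly like this. It is only an illustration, not how Wellcome's system actually works: the `Item` fields, the `access_level` function and the order in which the checks are applied are all my own assumptions, and the talk did not say how material that is exactly 100 years old is treated (here it falls into the "register online" tier).

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    OPEN = "open access"
    REGISTER_ONLINE = "register online"
    SIGN_IN_PERSON = "sign a form in person"
    CLOSED = "not available to anyone"


@dataclass
class Item:
    # All three fields are hypothetical; they stand in for whatever an
    # archivist records when assessing a collection.
    age_years: int
    restricted: bool = False  # e.g. reference letters, grant applications
    closed: bool = False


def access_level(item: Item) -> Access:
    """Map an item to one of the four tiers from the talk.

    Assumption: closed and restricted flags take priority over age.
    """
    if item.closed:
        return Access.CLOSED
    if item.restricted:
        return Access.SIGN_IN_PERSON
    if item.age_years > 100:
        return Access.OPEN
    # Under 100 years old (or exactly 100) and open in the reading room.
    return Access.REGISTER_ONLINE


if __name__ == "__main__":
    for item in (Item(150), Item(40), Item(40, restricted=True), Item(40, closed=True)):
        print(item, "->", access_level(item).value)
```

Putting the sensitivity flags ahead of the age check reflects the point that sensitive material needs careful handling regardless of date, but that ordering is my reading of the talk rather than something that was stated.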
Find out more about the archives digitisation: goo.gl/T1RS9
Questions and discussion
There were several interesting questions covering issues including why we provide digital library content for free, how these libraries are advertised, the costs of long-term preservation, what user interaction is expected and how it is encouraged, and how and why digital library content can and should be integrated into the main library website and catalogue.