In September 2009 JISC Digital Media hosted two free seminars focussing on key topics for individuals involved with digital media. The seminars were funded by the Joint Information Systems Committee under the JISC ITT Workshops & Seminars: Achievements & Challenges in Digitisation & e-Content strand.
The presentation slides and details of each session are provided below.
See also our Digital Documentation and Performance seminars.
The hashtag for this event is #jdmcollect09. Please use this when tagging the event (e.g. on Twitter, Flickr, Delicious, etc.).
15th September 2009
In order for a digitisation project to provide useful, effective content, many interlinking aspects need to be considered. This seminar, which drew on the knowledge of experts in the field, considered some of the vital facets in addition to the purely technical details of physically capturing objects. These topics ranged from recording accurate and relevant metadata to dealing with copyright and other rights issues and balancing the demands of users and stakeholders.
John Hargreaves - Technical Support Officer, JISC Digital Media
John gave an overview of metadata for image collections examining some of the key concepts to consider when looking at a schema to suit your needs. There is no one correct answer for all. Much will depend on the content and context of the collection and whether interoperability is a requirement (i.e. sharing information with other systems).
He gave an interesting insight into some web projects which are leveraging the power of Web 2.0 technology to tag up vast collections of images. One such example given was the Galaxy Zoo project in which people can help to describe some of the many thousands of unclassified galaxies - with a little bit of prior web-based training.
John’s overall advice was to do your homework with metadata schemas and find the one that best fits your needs. What you want to avoid is having to start your project over after discovering that the information you are collecting is inappropriate for your needs.
David Dawson - Director of the Wiltshire Archaeological and Natural History Society (WANHS)
David Dawson gave a very informative presentation on the subject of copyright as it pertains to images. He started off with some of the basics and highlighted that possession of an image doesn't equate to ownership of the copyright. He discussed moral rights briefly (the right of a work's creator to have their work correctly represented) and also highlighted the minefield of film copyright, where each constituent element can (and probably does) have its own rights (e.g. the titles, images, story, audio and so on).
David then went on to look at pragmatic approaches to using in-copyright materials and keeping a due diligence record. He also gave a very useful introduction to Creative Commons highlighting this as a sensible system for sourcing images (and making your own available).
Ed I Bremner - Freelance Consultant and Trainer
Ed is a freelance consultant to the museum sector, predominantly on issues pertaining to image creation/capture, storage and use. He has a long track record in the digital imaging field and is a former member of staff at JISC Digital Media (or TASI as it was then). Ed examined approaches to building image collections on a small budget. He first looked at the equipment a project might utilise and examined some of the major pros and cons. Ed’s preferred option overall is to capture using a digital SLR, owing to the flexibility it offers.
He made the point that it is often the setting of too high a standard for image capture that can trip projects up and that standards should meet the objectives of the project in hand and not be arbitrarily defined. Many collections may end up being re-digitised in years to come, a process which is contrary to the expectations of early projects.
Ed highlighted the fact that image capture can now be achieved relatively cheaply but that other elements of the workflow may come with a higher price tag - metadata addition being one - and so equal thought needs to be given to these areas.
Ed then looked at minimising the cost of the workflow by keeping processes running in parallel and making sure that typical bottlenecks were avoided. He made the point that in today's typical workflows the main limitation on work speed should be the safe handling of items, not technical constraints. He also flagged up methods for ensuring that quality assurance was not ignored.
Ed looked at “raw” as being an ideal file format for ensuring high quality files and making for a simple and effective workflow. He then demonstrated a system of image capture using a copy stand and digital SLR linked to a laptop. The camera was controlled remotely and images imported directly into an image management system (Adobe Photoshop Lightroom) where much of the required metadata was added automatically upon importation.
Grant Young - Digitisation and Digital Preservation Specialist, Cambridge University Library
Grant presented a case study of the JISC funded 19th Century Pamphlets digitisation project, for which he was the project manager. He covered the decisions made in planning the project, the challenges encountered, and key lessons learned.
The 19th Century Pamphlets digitisation project was a £1 million large-scale digitisation initiative (LSDI) involving 12 partners which was scheduled to run for two years.
Grant began by showing us the output of the project, hosted by JSTOR: in excess of 26,000 pamphlets comprising over one million pages.
Read the full report of Grant’s project findings (PDF).
Grant highlighted that a project of this size will inevitably experience problems which present both barriers and unforeseen opportunities. He also highlighted that finding and keeping appropriate staff from the start to finish of a time-limited project is difficult – there will almost certainly be a drop off towards the end as people leave to secure new positions ahead of the project close. He also raised the point that the sheer amount of data involved (both analogue and digital) caused unforeseen problems for the project. Initially the scanners weren’t able to cope with the volume and the project entered a period of being behind schedule. It was decided to invest in new hardware which then brought the project back on track (the bottleneck then shifted to an inability to source sufficient pamphlets to keep up!).
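To give a rough sense of the scale behind the scanner bottleneck, the figures quoted for the project (more than 26,000 pamphlets, over one million pages, two years) can be turned into an average daily capture rate. This is only a back-of-the-envelope sketch: the number of working days per year is an assumption made for illustration, not a figure from the project, and the function names are hypothetical.

```python
# Figures quoted in the report; both are lower bounds ("in excess of", "over").
PAMPHLETS = 26_000
PAGES = 1_000_000
PROJECT_YEARS = 2

# Assumption for illustration only; not a figure from the project.
WORKING_DAYS_PER_YEAR = 230


def pages_per_working_day(pages: int, years: int, days_per_year: int) -> float:
    """Average number of pages that must be captured per working day."""
    return pages / (years * days_per_year)


def average_pages_per_pamphlet(pages: int, pamphlets: int) -> float:
    """Average length of a pamphlet in pages."""
    return pages / pamphlets


if __name__ == "__main__":
    daily = pages_per_working_day(PAGES, PROJECT_YEARS, WORKING_DAYS_PER_YEAR)
    length = average_pages_per_pamphlet(PAGES, PAMPHLETS)
    print(f"At least {daily:.0f} pages per working day")
    print(f"Roughly {length:.1f} pages per pamphlet")
```

Under that assumption the project needed to average over two thousand pages every working day, and any period spent behind schedule pushes the required rate for the remaining time higher still.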
Another major barrier experienced was the issue of IPR. This was a very complicated and time-consuming issue given that 12 partners were involved from both the UK and US. Negotiations to iron out the IPR issues ran for the duration of the project (two years) and required a great deal of diplomacy.
Grant highlighted that, once the project had met its objectives and made the content available in a sustainable fashion, the infrastructure created (at Southampton) was left underutilised and staff with considerable expertise were lost because there were no large follow-on projects to occupy capacity. Outsourcing the digitisation may have been an option, but in Southampton’s case the infrastructure and core team of experts were already in existence, so it made sense to use and develop these assets.
Michael Popham - Head of the Oxford Digital Library
Michael spoke about Oxford’s work towards a sustainable business model for the outputs of their JISC-funded mass digitisation Ephemera project and their relationship with Google involving the mass digitisation of books.
Michael began by looking at what makes a project sustainable and examined some contrasting definitions of what sustainability actually is.
His talk was based around the John Johnson ephemera project.
The John Johnson collection is widely recognised as one of the most important collections of printed ephemera in the world and generally regarded as the most significant single collection of ephemera in the UK. Containing 1.5 million items ranging in date from 1508 to 1939, it spans the entire range of printing and social history. It contains a high proportion of unique material which has remained hidden to researchers up until now and which will surface through this innovative digitisation project.
For this project Oxford University Library digitised the content and used a third party (ProQuest) to host the material and manage access to the content on an ongoing basis. The collection can be seen here.
ProQuest licenses access to non-UK subscribers and the revenue is used to cover ProQuest's costs. Royalties are then paid to the Bodleian Library.
Michael then gave an overview of the Oxford-Google Digitization Programme which saw the digitisation of more than 1 million of the Bodleian Library's printed books. This work was based on widening access rather than looking at preservation and began with 19th-century material in an attempt to minimise problems with material being in copyright. This was largely successful although some of the older materials remain in copyright and IPR issues needed to be resolved.
Michael finished his talk by highlighting some of the key findings of the Ithaka report on sustaining digital resources and sustainability and revenue models (PDF), which was commissioned by JISC's Strategic Content Alliance.
The day concluded with a panel discussion, the video of which is here and also available via the Internet Archive.
16th September 2009
Obsolescence, deterioration of physical storage media or withdrawal of institutional support: just what will prove to be the greatest threat to the materials we digitise today? This seminar looked one hundred years into the future and attempted to predict the future ‘preservability’ of what we digitise today. This seminar examined changing user demands and inevitable developments in technology.
Dr William Kilbride - Executive Director of the Digital Preservation Coalition
William began the day with an overview of the digital preservation landscape. He highlighted that long-term preservation has generally turned out to be far trickier than we expected and cited several notable cases where large quantities of data have been lost. One such example came from Amanda Spencer of the National Archives Continuity Project:
"Of all the websites referenced within Hansard* between 1997 and 2006, 60% of the URLs are now broken."
*record of the House of Commons daily debates
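A figure like the one quoted above can be reproduced for any list of references by testing each URL and counting the failures. The following is a minimal sketch of that idea, not the method used by the National Archives; the function names are hypothetical, and the checker is passed in as a parameter so that the calculation can be tried without network access.

```python
import urllib.request
from typing import Callable, Iterable


def url_responds(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers a HEAD request without an error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def broken_fraction(
    urls: Iterable[str],
    is_alive: Callable[[str], bool] = url_responds,
) -> float:
    """Fraction of the given URLs for which the checker reports failure."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    broken = sum(1 for url in url_list if not is_alive(url))
    return broken / len(url_list)
```

Note that a URL which still responds may no longer serve the content that was originally cited, so a check of this kind is likely to understate the real loss.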
He looked at why we might want long-term preservation strategies, for reasons ranging from regulatory requirements to preserving important cultural information. He then identified and examined seven long-term preservation challenges.
Nigel Goldsmith - Technical Support Officer for Still Images, JISC Digital Media
Nigel gave a comprehensive overview of the raw file format for those working with digital cameras. He started by looking at some of the technical requirements and specifications of the format and explained how it works.
Since raw appeared, questions have been raised about its suitability as an archival format, since all raw formats are proprietary. However, the introduction of the DNG format (Adobe's Digital Negative) provides a solution to this. All raw files can be converted into DNGs and this format is natively supported by Photoshop and an increasing number of other applications. Although not 'open source', the code for the DNG format is made openly available, making it a far more suitable choice for archiving images. There are some other advantages to DNGs, including the ability to write metadata into the file, making the management of images and their associated information simpler.
Nigel examined the key differences between raw/DNG and other popular image file formats (namely JPEG and TIFF) and highlighted the additional benefits of capturing more information and being able to edit raw files without information loss. On the downside, he pointed out that raw files need extra processing work on the computer to make the images usable and that the files are considerably larger than JPEGs.
Getaneh Alemu - Humanities Computing Department, The University of Portsmouth
Getaneh, from the KEEP Project (Keeping Emulation Environments Portable), spoke about state-of-the-art metadata standards and how metadata can help ensure the integrity, identity and authenticity of digital documents. He gave an overview of the various metadata initiatives and standards (OAIS, CEDARS, NEDLIB, LMER, PREMIS, and METS) along with information on how each one supports digital preservation.
Getaneh stressed that it has proved possible to preserve written material over millennia but that we struggle to preserve digital information even for a few decades. He argued that metadata is at the heart of preservation.
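One concrete way in which metadata supports integrity is fixity information: a checksum recorded when a file enters the archive and recomputed later to confirm that the file has not changed. The sketch below illustrates the general idea only and is not taken from the talk. The function names are hypothetical, SHA-256 is simply one common choice of algorithm, and how the recorded digest is stored (for example in a PREMIS record) is left out.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large image files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_intact(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded earlier."""
    return sha256_of_file(path) == recorded_digest.lower()
```

A matching digest shows only that the bits are unchanged since the digest was recorded. It says nothing about whether the file format can still be opened, which is where the other kinds of preservation metadata come in.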
Tom Woolley gave an insight into the National Media Museum and its collections highlighting some of the diverse problems they face concerning preservation and trying to keep now obsolete hardware and software accessible. The museum often resorts to emulating the older technologies using new hardware in order to preserve what is often rare, original hardware. It is also generally necessary to provide a user experience that corresponds with today's expectations in terms of speed of access and usability. However, this comes at the expense of authenticity.
Professor James Newman, from Bath Spa University and the National Videogame Archive, then spoke about videogame preservation. He related how the market is almost totally focused on the future and emerging technology, with little interest in old computer game technology. This fixation on new technology is so great that 70-90% of the articles in games magazines feature previews of games not yet available to buy.
This concentration of attention on the future presents certain challenges when it comes to preserving games, since consumers largely lack any interest in preservation.
Simon Tanner - Director, King's Digital Consultancy Services
Simon’s presentation focused on the need for effective collaboration in coping with the reality of implementing an effective digital preservation strategy. He drew the analogy between climate change and preservation: people view both as significant threats but until either directly affects us we will generally assume that they are someone else's problem.
Simon's view was that it is only through both intra- and inter-organisational collaboration that progress towards digital preservation can be made in a holistic way. He highlighted some successful collaborative digital preservation activities and outlined what effective collaboration would involve.
Neil Grindley - JISC Programme Manager - Digital Preservation
Ensuring that an organisation's digital assets are safe, secure and accessible for the long term should (in theory) be an interesting, responsible and useful role for anyone in an organisation to accept. The critical importance of digital assets, the ubiquity of digital methods and the need for people in all walks of life to have effective means to refer to persistent sources of data reinforce this notion. How is it, then, that long-term asset management, information lifecycle management, data curation, digital preservation (call it what you will) is often regarded as a peripheral specialist activity that is difficult to resource, is complex to carry out, and delivers benefits that are, at best, simply an insurance policy rather than an activity that adds value to an organisation?
Neil’s presentation examined the importance of defining clear roles for those involved with digital preservation and considered the importance of associating this professional activity with strategic and tactical frameworks. He proposed that it is likely that automated services will increasingly be required to deal with the colossal amount of digital information that will be produced and consumed over the next century and whilst the type and nature of these services are yet to be defined, we can be fairly certain of one enduring requirement, namely, that human judgement will always be needed to curate interesting and useful content for future generations. The idea of simply keeping everything is not only impractical but also undesirable.
The more defined and measurable elements of his presentation aside, Neil offered us several predictions for the future of the world of Information Technology and preservation. These were that "today’s hardware and software will look antiquated and rather cumbersome", that "methods of storing and accessing information will change in ways we are currently unable to imagine", that "the demands of a swelling global population and dwindling or compromised natural resources will require new paradigms of power consumption" and, rather reassuringly, that "machines will not take over the world!"
So the audience left the day’s set of presentations not only better informed but also able to sleep more soundly at night.
The day concluded with a panel discussion, the video of which is here and also available via the Internet Archive.
If you have any enquiries or comments please contact Dave Kilbey.
You may also be interested in reading about our Digital Performance Seminars.