Metadata and Digital Images
Working with metadata can be very time-consuming and resource-heavy, so it is important to have a firm understanding of your objectives before initial decisions are made. Before reading this advice document, it is worth familiarising yourself with the document An Introduction to Metadata.
Metadata is essential in providing the means to describe, share, search, manage and preserve our digital resources, ensuring maximum potential for use and re-use throughout their lifecycle. The selection of metadata, and its management and administration, should be tailored to meet the specific needs of the collection and its various users, and should follow established best practice, where it exists, within the community in which you are working. While metadata should be created with a focus on the resource at hand, ideally some thought should also be given to sharing your data with other collections, catalogues and systems, to maximise the potential use and re-use of resources.
Identifying required metadata
Preparing a set of specifications is a very good way to start the process of identifying metadata requirements. These specifications should be based on a survey of stakeholders' needs and should both describe the resource at hand and sketch out methods for searching, retrieving, managing and preserving it.
As stressed in the overview, 'An Introduction to Metadata', metadata will always be selective. You cannot possibly say everything there is to say about an image, its context and its potential use. You will be constrained by your practical resources (time and money) and by the extent of your knowledge. You will also be constrained by the limitations of language, since you're using text to describe visual information.
Digital images can be very complex to describe, and before you can begin to say anything about an image you need to be very clear about what you are actually focusing on. For example, are you more concerned with the content depicted within the image (e.g. an object, a place or a person), the analogue image itself (e.g. the painting or photograph), or both? To enable effective re-use of your images, do your users require only some comparatively straightforward low-level descriptors such as 'title', 'creator', 'format', 'colour' and 'size'? Or do they require information on more abstract, higher-level meaning, for example feelings or emotions elicited by an image, such as 'happy', 'sad', 'love' or 'anger', or fuller descriptions of the image content?
A digital image may contain many different layers (e.g. a particular landscape, a drawing of that landscape, a photograph of that drawing, a digital scan of that photograph). Each of these "images within the image" will have its own context (e.g. a geographical location, an art collection, a photo album, a folder on a server). Each will also have its own particular history (e.g. when it was made, and by whom). How much of their context and history do you need to record?
Therefore, in developing metadata for your collection, you will first need to make decisions about what it is you're describing and how best to build the relationships between the various possible 'layers' of your image. You will then need to decide what particular characteristics or categories are going to be necessary to record for each.
It is extremely worthwhile investigating other metadata systems in projects similar to yours. Finding out what members of your community (be it higher education, further education, museum, archive etc.) have implemented, and what standards (if any) they have used, can produce invaluable information. This communication is also beneficial for spreading consistency and good practice within your immediate community. Metadata should accumulate over time, and it is important to take this into account so as to allow for future expansion and for input from all of the stakeholders involved.
Once you are clear in your objectives it will be easier to identify which schema or extension schema should be implemented to best meet the needs of the production staff, the repository and the users. Once these needs have been identified a draft list of required metadata can be drawn up.
Metadata standards and interoperability
In principle there is nothing wrong with taking your resultant list of metadata, indicating just how each field should be filled out (in practice, this might prove to be a lot of work), and beginning to create a simple database record for each digital image file. However, problems might occur if you later wished to share your collection with an outside institution. Further problems might occur if you wished to use software tools to extract your metadata automatically and add it to the database.
For these reasons, working with interoperable standards is recommended. Metadata standards that have been developed specifically for digital still images are comparatively well advanced, and include, among others, the Visual Resources Association Core (VRA Core), Categories for the Description of Works of Art (CDWA) and Metadata for Images in XML (MIX). See the advice document Metadata Standards and Interoperability for more information on these, and also Putting Things in Order - a Directory of Metadata Schemas and Related Standards for a full listing of the main metadata schemas used for images. These standards tend to have been developed largely by the art and design and cultural sectors, but have been adapted successfully over the years by other areas that deal with images in the commercial, medical and scientific communities.
Unfortunately, however, a 'definitive metadata standard' which can be used without modification does not exist. As outlined above, your metadata needs are inextricably linked to the various needs of your own users, making a globally accepted 'definitive metadata standard' impossible. Some modification (even if this is simply omitting some fields) will almost certainly be required to make the standard you choose fit your needs.
Dublin Core (DC) is a set of metadata elements which comes closest (so far) to a universally exchangeable standard, but not without compromise. With its very broad aim as a standard that can be used for any digital resource irrespective of type, DC can be criticised as not being detailed enough when it comes to dealing with specific resources. However, whatever its failings, DC does provide a well-defined core set of descriptors, is widely used and is relatively straightforward to implement. Also in its favour, Simple DC can prove very useful in facilitating simple machine-to-machine data interoperability and is commonly used with data sharing protocols and frameworks such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), Z39.50 and Resource Description Framework (RDF), among others, which facilitate the cross searching and sharing of information from one access point.
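As an illustration of how lightweight a Simple DC record can be, the following Python sketch builds one as XML using only the standard library. The namespace URI is the published DC elements namespace; the `metadata` wrapper element, the `build_simple_dc` helper and the sample values are assumptions made for this example and are not part of any standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_simple_dc(fields):
    """Return a wrapper element holding one dc:* child per value."""
    root = ET.Element("metadata")  # hypothetical wrapper, not part of DC
    for name, values in fields.items():
        # Simple DC elements are repeatable, so accept one value or a list.
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Illustrative values only.
record = build_simple_dc({
    "title": "Monarch butterfly on a flower",
    "creator": "Example Photographer",
    "subject": ["Danaus plexippus", "butterflies"],
    "format": "image/jpeg",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every element is optional and repeatable, a record like this can start small and grow as more information about the image becomes available.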
You may decide one particular schema is more suited to your needs than another. Or, perhaps more likely, that a number of different elements from several different schemas will best fit your needs. Schemas can be mixed together, although when mixing elements from different standards into one record, problems can occur when attempting to share this data with other machines and systems, as they are usually set up to deal with one standard set of metadata at a time. To overcome this problem, the Metadata Encoding and Transmission Standard (METS), maintained by the Library of Congress, provides an Extensible Markup Language (XML) 'wrapper' into which various metadata elements, perhaps drawn from several standards used to describe one object, can be placed and then read by another machine which understands the METS standard. It is possible, for instance, to use Simple DC and VRA Core for descriptive metadata, the METSRights schema for copyright information and PREMIS for preservation metadata (see below).
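The 'wrapper' idea can be sketched in a few lines of Python. The example below places Simple DC elements inside a METS descriptive metadata section (`dmdSec`, `mdWrap`, `xmlData`) and adds a stub structural map. It is a simplified fragment under stated assumptions, not a complete or validated METS document: the `wrap_descriptive` helper, the section identifier and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def wrap_descriptive(dc_fields, section_id="dmd001"):
    """Wrap Simple DC values in a METS descriptive metadata section."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    # MDTYPE names the schema of the wrapped metadata.
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    for name, value in dc_fields.items():
        ET.SubElement(xml_data, f"{{{DC_NS}}}{name}").text = value
    # A METS document also needs a structural map; this one is a bare stub
    # whose single div points back at the descriptive section.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    ET.SubElement(struct, f"{{{METS_NS}}}div", {"DMDID": section_id})
    return mets


# Illustrative values only.
doc = wrap_descriptive({"title": "Harbour at dusk", "creator": "Example Artist"})
mets_xml = ET.tostring(doc, encoding="unicode")
print(mets_xml)
```

Further sections, for example rights or preservation metadata, would be wrapped in the same way in their own sections, each declaring the schema it uses.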
Another approach to mixing elements from different schemas is through the development of a metadata 'Application Profile' usually tailored to the needs of a specific domain or resource type. There is more detail on Application Profiles in the advice document Metadata Standards and Interoperability.
Mapping or crosswalking metadata
If you decide not to use an established schema 'as is' but modify one or several to meet your requirements, interoperability need not be sacrificed. Metadata elements can be mapped between schemas. This involves approximating an element in one schema to an element (or multiple elements) in another. Although this map can be developed when it is required, it is not a foolproof operation and compromise will most likely be needed; an element from one schema may not have a direct equivalent in another, for instance.
If using an in-house schema, it is best to plan ahead and facilitate future mappings as you construct your set of metadata elements. Mapping can also prove useful if a set of older 'legacy' metadata was inherited along with a collection and this is to be added to a newer management system or database.
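A crosswalk of this kind can be expressed as a simple lookup table. The sketch below maps a set of hypothetical in-house field names to Simple DC element names and shows both kinds of compromise mentioned above: two local fields collapsing into one target element, and a field with no equivalent at all. The field names and the `crosswalk` helper are assumptions for illustration only.

```python
# Hypothetical in-house field names mapped to Simple DC element names.
# None marks a field with no direct DC equivalent.
CROSSWALK = {
    "caption": "title",
    "photographer": "creator",
    "artist": "creator",       # two local fields collapse into one element
    "keywords": "subject",
    "shelf_location": None,
}


def crosswalk(record, mapping):
    """Return (mapped, unmapped): DC values as lists, plus fields left over."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            # Keep unmapped fields visible rather than silently dropping them.
            unmapped[field] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


local = {
    "caption": "Harbour at dusk",
    "photographer": "A. Smith",
    "artist": "B. Jones",
    "shelf_location": "Bay 4",
}
dc, leftover = crosswalk(local, CROSSWALK)
print(dc)
print(leftover)
```

Returning the unmapped fields separately makes the information loss explicit, so that a decision can be taken about each one rather than discovering the gap later.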
UKOLN have an extremely useful list of existing mappings between many popular schemas.
Management systems and metadata
Metadata handling can be closely associated with the choice of software management system. A system may be as simple as a small database with fields for the required pieces of descriptive information and the location of a digital image file on an internal hard drive. Such simple systems allow for extensive customisation and are usually relatively straightforward to develop. But for collections that require a more complex set of metadata, a fully functioning Digital Asset Management System (DAMS) may be needed. Dedicated management systems allow for more advanced operations and often support at least some commonly recognised metadata schemas. There are many different commercial and open source management systems available, so if you plan to acquire one you should be able to fulfil your needs with at least one of them. A further point to note is that your choice of system can also be affected by local considerations such as the skills available among your staff and the technical infrastructure of your institution. It may therefore be worthwhile consulting your IT department to ascertain which database software and systems can be supported.
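To give a sense of scale, the simplest kind of system described above can be sketched with Python's built-in `sqlite3` module. The table name, column names and sample record are assumptions chosen for this example; a real catalogue would use the fields identified in your own requirements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent catalogue
conn.execute(
    """
    CREATE TABLE image (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        creator TEXT,
        description TEXT,
        file_path TEXT NOT NULL UNIQUE  -- location of the image file on disk
    )
    """
)
# Parameterised queries keep values separate from the SQL text.
conn.execute(
    "INSERT INTO image (title, creator, description, file_path) VALUES (?, ?, ?, ?)",
    ("Harbour at dusk", "A. Smith", "Fishing boats at low tide", "images/0001.tif"),
)
conn.commit()

rows = conn.execute(
    "SELECT title, file_path FROM image WHERE creator = ?", ("A. Smith",)
).fetchall()
print(rows)
```

A single table like this is easy to customise, but it has no built-in notion of relationships between records, which is one of the reasons larger collections tend to move to a dedicated management system.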
Which information to record?
Metadata can be conveniently separated into different types; it is likely that your required set of metadata elements will include elements of each type (the following categories are taken from the Metadata Encoding and Transmission Standard or METS, other schemas define different but similar types):
- Structural metadata: describing the metadata record and its relationship to the digital image resource
- Descriptive metadata: which summarises the content of the digital image
- Administrative metadata: which includes rights metadata, information about analogue sources of digitised images and preservation metadata
- Technical metadata: a special kind of administrative metadata which describes the properties of the digital image file itself
Structural metadata is the type most closely associated with your content management system (CMS). If your collection is too large or complex to keep track of by using a simple database because it consists of multiple relationships between objects and their metadata records, using a CMS will probably provide an effective means of managing your collection's 'structure'.
Descriptive metadata is the primary retrieval gateway for most end users. As such, the kind of information recorded is likely to be very specific. In order to make your collection interoperable, consider making use of elements from the Simple DC metadata set such as Title, Creator, Subject, Description and Coverage. These can either be used directly, if they fit your identified requirements, or serve as targets to which the elements you decide to use are mapped (see 'Mapping or crosswalking metadata' above). The VRA Core schema is another strong alternative.
Generally, administrative metadata assists collection managers in organising, providing access to and preserving digital collections. Such information may not directly describe the resource itself, but may provide useful, even vital data from elsewhere, such as legal rights. METSRights is an excellent schema which can be used to describe intellectual property rights.
The PREMIS (PREservation Metadata: Implementation Strategies) schema offers elements designed specifically to assist with the complex task of digital preservation and is used alongside other schemas in order to achieve this special purpose.
Adobe's XMP (Extensible Metadata Platform) schema has elements which together form a very detailed description of digital image material. XMP is usually used as embedded metadata (see below) but can also be imported into and exported from a central management system. Another prominent contender for describing the technical aspects of digital images is the NISO Technical Metadata for Still Images data dictionary, which is available in XML format as Metadata for Images in XML (MIX). This extensive schema offers elements specific to digital images which together paint a comprehensive picture of the lifecycle of a digital image resource. Adopters may wish to select elements from such detailed schemas carefully, as it is easy to make descriptions too rich and so overburden cataloguing staff.
Most metadata schemas consist of well-defined elements and descriptions of how these should be used.
DC's 'Title' element, for instance, is described as:
"The name given to the resource. Typically, a Title will be a name by which the resource is formally known... If in doubt about what constitutes the title, repeat the Title element and include the variants in second and subsequent Title iterations. If the item is in HTML, view the source document and make sure that the title identified in the title header (if any) is also included as a Title."
But the values which are used to populate an element's field are not necessarily as tightly controlled. Would a poster of Stanley Kubrick's timeless movie be '2001: A Space Odyssey' or 'Two Thousand One: A Space Odyssey'?
Authority lists and vocabularies are controlled lists of terms or names which institutions can draw upon in order to maintain consistency. While inevitably there are some caveats given its broad remit, the Library of Congress Subject Headings is perhaps the best known of these and offers several different authorities for searching names, subjects, titles etc. You may also want to consult more subject focussed vocabularies which are provided in the advice document Controlling Your Language: a Directory of Metadata Vocabularies.
Using controlled vocabularies, whether from an existing list developed externally or from one drawn up in house, provides consistency in the use of terms and spelling. They will help to make sense of the collection in isolation and, if used effectively, can provide a sound basis for cross searching and data sharing with other similar collections.
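In practice, a controlled vocabulary is often enforced at data entry by normalising whatever the cataloguer types to a single preferred form. The following sketch shows one way this could work with a small in-house authority list. The list contents, the choice of preferred terms and the `normalise` helper are all assumptions for illustration; an externally maintained vocabulary would normally supply the terms.

```python
# Hypothetical in-house authority list: preferred term -> known variants.
AUTHORITY = {
    "2001: A Space Odyssey": ["Two Thousand One: A Space Odyssey", "2001"],
    "Monarch butterfly": ["Danaus plexippus", "monarch"],
}

# Build a case-insensitive lookup from every variant to its preferred term.
LOOKUP = {}
for preferred, variants in AUTHORITY.items():
    for term in [preferred, *variants]:
        LOOKUP[term.casefold()] = preferred


def normalise(term):
    """Return the preferred form of a term, or None if it is not in the list."""
    return LOOKUP.get(term.strip().casefold())


print(normalise("two thousand one: a space odyssey"))
print(normalise("Danaus plexippus"))
print(normalise("Swallowtail"))
```

Returning `None` for unknown terms, rather than accepting them, gives cataloguers a prompt either to choose an existing term or to propose an addition to the list.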
An ISAN (International Standard Audiovisual Number) works in a similar way to a printed book's ISBN (International Standard Book Number), acting as a unique identifier for analogue or digital moving image works.
The International Organization for Standardization (ISO) and similar organisations (such as the British Standards Institution) publish standards documents which can also help improve consistency. ISO 8601, for example, describes standard notation for time and date values. The IANA (Internet Assigned Numbers Authority) list of MIME types should be used for classifying digital image file type and subtype (e.g. image/jpeg). These conventions can be incorporated into cataloguing procedures to improve uniformity.
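Both conventions are straightforward to apply programmatically. As a minimal sketch, Python's standard library can produce ISO 8601 date and time strings and look up a MIME type from a file name; the dates and the file path used here are invented for the example.

```python
import mimetypes
from datetime import date, datetime, timezone

# ISO 8601 calendar date and combined date-time in UTC.
captured_on = date(2009, 7, 14).isoformat()
captured_at = datetime(2009, 7, 14, 15, 30, tzinfo=timezone.utc).isoformat()

# guess_type() works from the file extension only; it does not inspect content.
mime_type, _encoding = mimetypes.guess_type("images/0001.jpg")

print(captured_on)
print(captured_at)
print(mime_type)
```

Note that a type guessed from the extension is only as reliable as the file naming, so for preservation purposes it may be worth confirming the format with a dedicated file identification tool.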
An example of varying vocabulary needs could be if you have a collection of biological images and your audience includes both researchers and the general public. In such a scenario you are probably going to need to draw on formal scientific taxonomies and more popular terminologies. Take the picture below. A researcher might recognise this creature by its scientific name (Danaus plexippus), but a school student almost certainly would not. Moreover an art and design student might be interested in some very different aspects of this image.
Image is from julesexperiment on Flickr - used within terms of a Creative Commons license
Instead of automatically reaching for standard schemas and vocabularies and then seeing your images through the "lenses" of those standards, it might be better to first take a step back and consider your users and how they are going to view your images. Once you are clear about their interests and needs, then you can look for suitable standards to use or adapt.
All digital files can hold a certain amount of metadata in addition to the information which actually makes up the content of the file. Historically the two principal formats for storing metadata within the image file itself were Exchangeable Image File Format (Exif) and the International Press Telecommunications Council (IPTC) standard. Adobe's 'File Info', a subset of IPTC, was also common. This data is typically recorded by the device at the point of capture and is usually technical in nature. Most common image software applications will at least be able to read it; some will offer editing capabilities. As noted above, Adobe have also recently developed XMP for the purpose of embedding metadata. More recently developed image formats such as JPEG 2000 and Portable Network Graphics (PNG) offer potentially more flexibility for working with embedded metadata.
Embedding metadata has the advantage of protecting against loss or unavailability of a central database. For instance, a student can be given, or can download, an image file with no need for an accompanying metadata file and still have access to valuable contextual information. However, a central database is still recommended, as it can perform advanced and speedy searches across collections without having to find and open the metadata embedded within many individual files. If embedded metadata is to be used, the challenge of synchronising the two sets of metadata should be met by the overarching management system. Remember to check this functionality if purchasing a digital asset management system, as additional bolt-on modules or even separate harvesters/editors may be required, depending on the file types within your collection.
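The synchronisation problem comes down to detecting where the embedded copy and the central copy of a record disagree. The sketch below compares two sets of values that are assumed to have already been extracted, one from the image file and one from the database; how they are extracted depends on the file format and tools in use and is not shown. The field names, values and `find_conflicts` helper are hypothetical.

```python
def find_conflicts(embedded, central):
    """Return {field: (embedded_value, central_value)} where the two differ."""
    conflicts = {}
    for field in sorted(set(embedded) | set(central)):
        in_file = embedded.get(field)   # None if the field is absent
        in_db = central.get(field)
        if in_file != in_db:
            conflicts[field] = (in_file, in_db)
    return conflicts


# Hypothetical values: one set read from the image file, one from the database.
embedded = {"title": "Harbour at dusk", "creator": "A. Smith"}
central = {"title": "Harbour at Dusk", "creator": "A. Smith", "rights": "CC BY"}

print(find_conflicts(embedded, central))
```

A report like this leaves open the policy question of which copy should win; many institutions treat the central database as authoritative and write changes back to the files, but that is a local decision.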
Although it's likely you'll want to make use of some of the formal standards, it's useful to first consider the ways your users will approach your images and the different aspects of a visual work. Even if you're sticking very closely to established standards, you will still have some choices to make - especially at the level of vocabulary (the specific terms you apply to your images). Having a clear understanding of your images and the way your users view them will enable you to more critically evaluate potential metadata schemas and vocabularies and to assemble a metadata framework that works well for both your users and your cataloguers.
An effective method of selecting and using metadata is to begin by drawing up a set of requirements that will best describe the resource at hand. When complete, assess your requirements against available standard metadata and vocabularies, and investigate the approaches to metadata handling of collections with similar aims and objectives to your own. This approach will a) ensure your resource is described in a way that makes it fit for its intended purpose, b) potentially save you time in developing cataloguing rules and accepted field terms and mappings, and c) ensure a better likelihood that your resource will be interoperable with other related collections.
JISC Digital Media provide a helpdesk service and can assist by providing expert tailored advice on your own collection and the associated metadata.