Metadata and Digital Video
Working with metadata can be a very time-consuming and resource-heavy activity, so it is important to have a firm understanding of your objectives before initial decisions are made. Before reading this advice document, it is worth familiarising yourself with the document An Introduction to Metadata.
Metadata is essential in providing the means to describe, share, search, manage and preserve our digital resources, ensuring maximum potential for use and re-use throughout their lifecycle. The selection of metadata, and its management and administration, should be tailored to meet the specific needs of the collection and its various users, and should follow established best practice, where it exists, within the community in which you are working. While metadata should be created with the resource at hand in mind, ideally some thought should also be given to making it possible to share your data with other collections, catalogues and systems, to maximise the potential use and re-use of resources.
Identifying required metadata
It is extremely worthwhile investigating the metadata systems used by projects similar to yours. Finding out what members of your community (be it HE, FE, museum, archive etc.) have implemented, and which standards (if any) they have used, can produce invaluable information. This communication also helps to spread consistency and good practice within your immediate community. Metadata is likely to accumulate over time, so it is important to allow for future expansion and for input from all of the stakeholders involved.
From the outset, therefore, metadata usually has to cater for multiple potential uses and end users of a given resource. This specification phase provides the means to express and plan for these requirements at an early stage, potentially saving time and effort later on. By identifying the needs of the various collection users, a draft list of required metadata can be drawn up.
Metadata standards and interoperability
In principle, there is nothing wrong with taking your resulting list of metadata, specifying how each field should be filled in (in practice, this can be a lot of work) and beginning to create a simple database record for each digital video file. However, problems may arise if you later wish to share your collection with an outside institution. Further problems may arise if you want to use software tools to extract metadata automatically from digital video files and add it to the database.
For these reasons, working with interoperable standards is recommended. Several metadata schemas, vocabularies and authority lists exist which can be freely drawn upon and pressed into the service of your video collection. These can save a lot of time and effort, as each typically comes with a ready-made usage guide. Qualified Dublin Core (DC), Public Broadcasting Core (PB Core) and the Moving Picture Experts Group's MPEG-7 are some that can be adapted for, or have been developed specifically for, video resources. See the advice document Metadata Schemas for more information on these and other available standards.
Unfortunately, a ‘definitive metadata standard’ that can be used without modification does not exist. As outlined above, your metadata needs are inextricably linked to the needs of your own users, which makes a globally accepted definitive standard impossible. Some modification (even if this is simply omitting some fields) will almost certainly be required to make the standard you choose fit your needs.
DC is a set of metadata elements which comes closest (so far) to a universally exchangeable standard, but not without compromise. Because it aims to be usable for any digital resource irrespective of type, DC can be criticised as not detailed enough when dealing with specific kinds of resource. However, whatever its failings, DC does provide a well-defined core set of descriptors, is widely used and is relatively straightforward to implement. Also in its favour, Simple DC can be very useful for basic machine-to-machine interoperability. It is commonly used with data-sharing protocols such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which requires every repository to support Simple DC, and Z39.50, as well as with the Resource Description Framework (RDF), among others; these allow information from many sources to be cross-searched and shared from one access point.
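As an illustration of the Simple DC records exchanged through OAI-PMH, the following minimal Python sketch serialises a dictionary of values into the oai_dc XML format using only the standard library. The example values are hypothetical, and a production record would normally also carry an xsi:schemaLocation attribute pointing at the oai_dc schema.

```python
import xml.etree.ElementTree as ET

# Namespaces of the oai_dc container and of the DC element set.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen Simple DC elements, in their usual order.
SIMPLE_DC = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def simple_dc_xml(record: dict) -> str:
    """Serialise {element: [values]} as an oai_dc record.

    Every DC element is optional and repeatable, so each value gets its own tag.
    """
    unknown = set(record) - set(SIMPLE_DC)
    if unknown:
        raise ValueError(f"not Simple DC elements: {sorted(unknown)}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in SIMPLE_DC:  # emit elements in a stable order
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values.
print(simple_dc_xml({
    "title": ["Graduation ceremony 2009"],
    "creator": ["Media Services Unit"],
    "type": ["MovingImage"],
    "format": ["video/mpeg"],
}))
```

Rejecting unknown element names, rather than silently dropping them, makes it obvious when a local field still needs to be mapped to Simple DC.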
You may decide that one particular schema suits your needs better than another or, perhaps more likely, that a combination of elements from several different schemas fits best. Schemas can be mixed, but when elements from different standards are combined in one record, problems can occur when sharing this data with other machines and systems, as these are usually set up to handle one standard set of metadata at a time. To overcome this problem, the Metadata Encoding and Transmission Standard (METS), maintained by the Library of Congress, provides an Extensible Markup Language (XML) 'wrapper' into which the metadata describing one object, perhaps drawn from several standards, can be placed and then read by any other system that understands METS. It is possible, for instance, to use Simple DC and the Metadata Object Description Schema (MODS) for descriptive metadata, the METSRights schema for copyright information and PREMIS for preservation metadata (see below). We have produced an example document using METS so you can have a look at the structures used within it. This particular instance uses MODS for descriptive metadata and PB Core for technical metadata.
Another approach to mixing elements from different schemas is to develop a metadata 'Application Profile', usually tailored to the needs of a specific domain or resource type. There is more detail on Application Profiles in the advice document Metadata Standards and Interoperability.
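To show roughly what such a wrapper looks like, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It places a MODS title in a descriptive metadata section (dmdSec), wraps technical metadata from another schema in a techMD section (declared with MDTYPE="OTHER" and an OTHERMDTYPE label), lists the video file in the file section and ties them together in the structural map. The IDs, the example URL and the placeholder technical element are hypothetical; a real record would use genuine PB Core elements and should be validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Build an ElementTree qualified name of the form {uri}tag."""
    return f"{{{ns}}}{tag}"


def build_mets(title: str, file_url: str, mime_type: str,
               tech_md: ET.Element, tech_schema: str = "PBCore") -> str:
    root = ET.Element(q(METS, "mets"))

    # Descriptive metadata: a MODS record inside a METS wrapper.
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="MODS")
    mods = ET.SubElement(ET.SubElement(wrap, q(METS, "xmlData")), q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # Technical metadata from a different schema, named via OTHERMDTYPE.
    amd = ET.SubElement(root, q(METS, "amdSec"), ID="AMD1")
    tech = ET.SubElement(amd, q(METS, "techMD"), ID="TECH1")
    wrap = ET.SubElement(tech, q(METS, "mdWrap"),
                         MDTYPE="OTHER", OTHERMDTYPE=tech_schema)
    ET.SubElement(wrap, q(METS, "xmlData")).append(tech_md)

    # The video file itself, linked to its technical metadata by ADMID.
    grp = ET.SubElement(ET.SubElement(root, q(METS, "fileSec")), q(METS, "fileGrp"))
    f = ET.SubElement(grp, q(METS, "file"), ID="FILE1",
                      MIMETYPE=mime_type, ADMID="TECH1")
    ET.SubElement(f, q(METS, "FLocat"),
                  {"LOCTYPE": "URL", q(XLINK, "href"): file_url})

    # Structural map (the one mandatory METS section) tying description to file.
    smap = ET.SubElement(root, q(METS, "structMap"))
    div = ET.SubElement(smap, q(METS, "div"), TYPE="video", DMDID="DMD1")
    ET.SubElement(div, q(METS, "fptr"), FILEID="FILE1")
    return ET.tostring(root, encoding="unicode")


# Placeholder technical element; a real record would use PB Core elements.
duration = ET.Element("duration")
duration.text = "PT12M30S"
print(build_mets("Graduation ceremony 2009",
                 "http://example.org/video/grad2009.mpg", "video/mpeg", duration))
```

Because each section is identified by an ID and cross-referenced (DMDID, ADMID, FILEID), a receiving system can find the descriptive and technical metadata for a file even though they come from different schemas.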
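An application profile is essentially a documented selection of elements, each borrowed from a named schema, with local rules about which are required or repeatable. The minimal Python sketch below represents one as data and checks a record against it. The choice of elements, the obligations and the http://example.org/ namespace for local extensions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
LOCAL = "http://example.org/video-profile/"  # hypothetical local namespace


@dataclass(frozen=True)
class ProfileElement:
    namespace: str  # the schema the element is borrowed from
    name: str       # element name within that schema
    required: bool
    repeatable: bool


# Hypothetical profile mixing DC, DC Terms and one local element.
PROFILE = {
    "title": ProfileElement(DC, "title", required=True, repeatable=True),
    "creator": ProfileElement(DC, "creator", required=False, repeatable=True),
    "created": ProfileElement(DCTERMS, "created", required=True, repeatable=False),
    "extent": ProfileElement(DCTERMS, "extent", required=False, repeatable=False),
    "aspectRatio": ProfileElement(LOCAL, "aspectRatio", required=False, repeatable=False),
}


def validate(record: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for key, element in PROFILE.items():
        values = record.get(key, [])
        if element.required and not values:
            problems.append(f"missing required element: {key}")
        if not element.repeatable and len(values) > 1:
            problems.append(f"element not repeatable: {key}")
    for key in record:
        if key not in PROFILE:
            problems.append(f"element not in profile: {key}")
    return problems


print(validate({"title": ["Graduation ceremony 2009"], "created": ["2009-07-10"]}))
```

Recording the source namespace for each element is what keeps the profile interoperable: anything taken from DC can still be exported as DC.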
Mapping or crosswalking metadata
If you decide not to use an established schema ‘as is’ but to modify one or several to meet your requirements, interoperability need not be sacrificed. Metadata elements can be mapped between schemas: each element in one schema is matched to its closest equivalent element (or elements) in another. Although such a map can be developed when it is needed, mapping is not a foolproof operation and compromise will most likely be required; an element in one schema may have no direct equivalent in another, for instance.
If you use an in-house schema, it is best to plan ahead and make future mappings easier as you construct your set of metadata elements. Mapping can also prove useful if a set of older ‘legacy’ metadata has been inherited along with a collection and is to be added to a newer management system or database.
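A crosswalk can be as simple as a table from local element names to target elements. The following minimal Python sketch maps a hypothetical in-house (or legacy) record onto Simple DC. It shows the kind of compromise described above: director and writer both collapse into the DC creator element, and a field with no DC equivalent (here an aspect ratio) is reported rather than silently dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical in-house element names mapped to Simple DC elements.
# Several local elements may map onto one DC element (many-to-one).
CROSSWALK = {
    "programme_title": "title",
    "director": "creator",
    "writer": "creator",
    "camera_operator": "contributor",
    "keywords": "subject",
    "synopsis": "description",
    "date_recorded": "date",
    "filming_location": "coverage",
    "file_type": "format",
}


def to_simple_dc(record: Dict[str, List[str]]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (DC record, list of local fields that could not be mapped)."""
    dc: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for field, values in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)  # keep for review rather than lose it
            continue
        dc.setdefault(target, []).extend(values)
    return dc, unmapped


dc, unmapped = to_simple_dc({
    "programme_title": ["Graduation ceremony 2009"],
    "director": ["A. Smith"],
    "writer": ["B. Jones"],
    "aspect_ratio": ["16:9"],
})
print(dc, unmapped)
```

The loss of the director/writer distinction is exactly the sort of information that mapping to a simpler schema gives up; keeping the richer local record alongside the mapped one avoids losing it permanently.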
UKOLN have an extremely useful list of existing mappings between many popular schemas.
Management systems and metadata
Metadata handling can be closely tied to the choice of software management system. A system may be as simple as a small database with fields for the required pieces of descriptive information and the location of each digital video file on an internal hard drive. Such simple systems can be customised without limit and are usually relatively straightforward to develop. Collections that require a more complex set of metadata, however, may need a fully functioning Digital Asset Management System (DAMS). Dedicated management systems allow more advanced operations and often support at least some commonly recognised metadata schemas. Many different commercial and open source management systems are available, so if you plan to acquire one, you should be able to fulfil your needs with at least one of them. Note also that your choice of system can be affected by local considerations, such as the skills of the staff available to you and the technical infrastructure of your institution. It may therefore be worthwhile consulting your IT department to find out which database software and systems can be supported.
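For illustration, the 'small database' approach might look like the following minimal sketch using Python's built-in sqlite3 module. The table layout, field names and file paths are hypothetical; a real design would follow the draft list of required metadata drawn up for your own collection.

```python
import sqlite3


def create_catalogue(conn: sqlite3.Connection) -> None:
    # One row per video file: a few descriptive fields plus the file location.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS video (
            id           INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            creator      TEXT,
            description  TEXT,
            date_created TEXT,          -- ISO 8601, e.g. 2009-03-15
            file_path    TEXT NOT NULL UNIQUE
        )""")


def add_video(conn, title, file_path, creator=None, description=None,
              date_created=None):
    cur = conn.execute(
        "INSERT INTO video (title, creator, description, date_created, file_path) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, creator, description, date_created, file_path),
    )
    conn.commit()
    return cur.lastrowid


def search_titles(conn, term):
    # Parameterised LIKE search; SQLite's LIKE is case-insensitive for ASCII.
    return conn.execute(
        "SELECT title, file_path FROM video WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_catalogue(conn)
add_video(conn, "Graduation ceremony 2009", "/data/video/grad2009.mpg",
          date_created="2009-07-10")
print(search_titles(conn, "graduation"))
```

A design like this is easy to build and change, but every field name is local; sharing the records later would need a mapping to a standard schema.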
Which information to record?
Metadata can be conveniently separated into different types, and your required set of metadata elements is likely to include elements of each type. (The following categories are taken from the Metadata Encoding and Transmission Standard, or METS; other schemas define different but similar types.)
- Structural metadata: describing the metadata record and its relationship to the digital video resource
- Descriptive metadata: which summarises the content of the digital video
- Administrative metadata: which includes rights metadata, information about analogue sources of digitised videos and preservation metadata
- Technical metadata: a special kind of administrative metadata which describes the properties of the digital video file itself
Structural metadata is the type most closely associated with your content management system (CMS). If your collection is too large or complex to keep track of with a simple database, because it involves multiple relationships between objects and their metadata records, a CMS will probably provide an effective means of managing your collection's 'structure'.
If you wish to use a metadata structure more closely aligned to commercial digital video production, MPEG-21 DIDL (the Moving Picture Experts Group's Digital Item Declaration Language) offers a structure similar to METS but able to hold very complex structural metadata. If your video collection contains interactive elements, multiple picture streams or complex audio, consider MPEG-21.
Descriptive metadata is the primary retrieval gateway for most end users, so the information recorded is likely to be very specific. To make your collection interoperable, consider using elements from the Simple DC metadata set such as Title, Creator, Subject, Description and Coverage. These can either be used directly, if they fit your identified requirements, or be ‘mapped’ to from the elements you decide to use (see above). The MODS schema is another strong alternative.
Generally, administrative metadata assists collection managers in organising, providing access to and preserving digital collections. Such information may not directly describe the resource itself, but may provide useful, even vital, data from elsewhere, such as legal rights or the source of a digitised video's content. METSRights is an excellent schema which can be used to describe intellectual property rights. MPEG's Rights Data Dictionary (or RDD) is a highly detailed rights management schema, often used in conjunction with MPEG-21.
The PREMIS (PREservation Metadata: Implementation Strategies) schema offers elements designed specifically to assist with the complex task of digital preservation and is used alongside other schemas in order to achieve this special purpose.
Some technical metadata, such as the file type, is required simply to make use of digital files. Much richer technical metadata can often be extracted automatically from a file when it is ingested into a management system, and so represents a large gain for little time outlay. JISC's Significant Properties of Moving Images report suggests a number of properties for digital video which it might be useful to record. Many of these are included in VideoMD, a smaller schema created specifically for describing the technical properties of digital video.
Alternatively, if more detail is required, Adobe's XMP (Extensible Metadata Platform) schema has elements which together form a very detailed description of digital video material. XMP is usually used as embedded metadata (see below) but can also be imported into and exported from a central management system. Another prominent contender for describing the technical aspects of digital video is MPEG-7 from the Moving Picture Experts Group. This extensive schema offers video-specific elements which together paint a comprehensive picture of a digital resource. Adopters may wish to select elements from such detailed schemas carefully, as it is easy to make descriptions too rich and so overburden cataloguing staff.
Metadata schemas consist of well-defined elements and descriptions of how these should be used. DC's ‘Title’ element, for instance, is defined as:
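As an example of automatic extraction, the free FFmpeg suite includes a command-line tool, ffprobe, which can report a file's container and stream properties as JSON. The minimal Python sketch below assumes ffprobe is installed and picks out a few properties of the kind suggested for technical metadata. The keys of the resulting dictionary are our own illustrative names, not VideoMD elements; a real ingest process would map them onto whichever schema you adopt.

```python
import json
import subprocess
from typing import Optional


def run_ffprobe(path: str) -> str:
    """Return ffprobe's JSON report for a file (needs FFmpeg's ffprobe on PATH)."""
    cmd = ["ffprobe", "-v", "quiet", "-print_format", "json",
           "-show_format", "-show_streams", path]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_rate(rate: str) -> Optional[float]:
    """Convert a rational string such as '25/1' or '30000/1001' to frames per second."""
    num, _, den = rate.partition("/")
    if not den:
        return float(num)
    return None if int(den) == 0 else int(num) / int(den)


def technical_metadata(report_json: str) -> dict:
    """Pick a few technical properties out of an ffprobe JSON report."""
    report = json.loads(report_json)
    fmt = report.get("format", {})
    # Use the first video stream; audio and data streams are ignored here.
    video = next((s for s in report.get("streams", [])
                  if s.get("codec_type") == "video"), {})
    duration = fmt.get("duration")
    rate = video.get("r_frame_rate")
    return {
        "container": fmt.get("format_name"),
        "duration_seconds": float(duration) if duration else None,
        "video_codec": video.get("codec_name"),
        "width": video.get("width"),
        "height": video.get("height"),
        "frame_rate": parse_rate(rate) if rate else None,
    }


# Usage (requires ffprobe): technical_metadata(run_ffprobe("grad2009.mpg"))
```

Separating the tool call from the parsing keeps the extraction logic testable and makes it easy to swap in a different extraction tool later.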
"The name given to the resource. Typically, a Title will be a name by which the resource is formally known... If in doubt about what constitutes the title, repeat the Title element and include the variants in second and subsequent Title iterations. If the item is in HTML, view the source document and make sure that the title identified in the title header (if any) is also included as a Title."
But the values used to populate an element's field are not necessarily as tightly controlled. Would Stanley Kubrick's timeless movie be ‘2001: A Space Odyssey’ or ‘Two Thousand One: A Space Odyssey’?
Authority lists and vocabularies are controlled lists of terms or names which institutions can draw upon to maintain consistency. Although some caveats inevitably apply given its broad remit, the Library of Congress Subject Headings is perhaps the best known of these and offers several different authorities for searching names, subjects, titles etc. You may also want to consult more subject-focussed vocabularies, which are listed in the advice document Controlling Your Language: a Directory of Metadata Vocabularies.
Using controlled vocabularies, whether an existing list developed externally or one drawn up in house, provides consistency in the use and spelling of terms. They help to make sense of the collection in isolation and, if used effectively, can provide a sound basis for cross-searching and data sharing with other similar collections.
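In practice, an authority list is often applied by mapping known variant forms to a single preferred form. The minimal Python sketch below does this for the Kubrick title example; the list itself is hypothetical, and unknown values are returned as None so that a cataloguer can review them rather than have them guessed.

```python
from typing import Optional

# Hypothetical in-house authority list: preferred form -> known variants.
TITLE_AUTHORITY = {
    "2001: A Space Odyssey": [
        "Two Thousand One: A Space Odyssey",
        "2001 A Space Odyssey",
    ],
}


def _key(text: str) -> str:
    """Normalise case and whitespace so trivial differences do not matter."""
    return " ".join(text.casefold().split())


# Index every preferred form and variant by its normalised key.
_INDEX = {
    _key(term): preferred
    for preferred, variants in TITLE_AUTHORITY.items()
    for term in (preferred, *variants)
}


def authorised_form(value: str) -> Optional[str]:
    """Return the preferred form, or None if the value is not in the list."""
    return _INDEX.get(_key(value))


print(authorised_form("two thousand one: a space odyssey"))
```

Normalising only case and spacing is deliberately conservative: anything more aggressive (stripping punctuation, for example) risks merging genuinely different titles.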
An ISAN (International Standard Audiovisual Number) works in a similar way to a printed book's ISBN (International Standard Book Number), acting as a unique identifier for analogue or digital moving image works.
The International Organization for Standardization (ISO) and similar organisations (such as the British Standards Institution) publish standards documents which can also help to improve consistency. ISO 8601, for example, describes a standard notation for date and time values. The Internet Assigned Numbers Authority (IANA) list of MIME types (also called media types) should be used to classify digital video file types and subtypes (e.g. video/mpeg). These conventions can be drafted into cataloguing procedures to improve uniformity. See our advice document on Vocabularies for more information.
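The following minimal Python sketch shows how such conventions might be applied during cataloguing: converting a UK-style date to ISO 8601, writing a running time as an ISO 8601 duration, and looking up a MIME type from a file extension. The extension table is a small illustrative subset, and because an extension is only a hint, the type recorded should ideally be confirmed from the file itself.

```python
import os
from datetime import datetime
from typing import Optional

# Illustrative subset of IANA-registered video media types, keyed by extension.
VIDEO_MEDIA_TYPES = {
    ".mpg": "video/mpeg",
    ".mpeg": "video/mpeg",
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".ogv": "video/ogg",
}


def iso_date(day_month_year: str) -> str:
    """Convert a UK-style date such as '15/03/2009' to ISO 8601 ('2009-03-15')."""
    return datetime.strptime(day_month_year, "%d/%m/%Y").date().isoformat()


def iso_duration(total_seconds: int) -> str:
    """Express a running time in whole seconds as an ISO 8601 duration."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"PT{hours}H{minutes}M{seconds}S"


def media_type(filename: str) -> Optional[str]:
    """Look up a media type by extension; None means 'check by hand'."""
    return VIDEO_MEDIA_TYPES.get(os.path.splitext(filename)[1].lower())


print(iso_date("15/03/2009"), iso_duration(3723), media_type("Lecture01.MPG"))
```

Writing dates year-first also means they sort correctly as plain text, which helps even in a very simple database.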
All digital files can hold a certain amount of metadata in addition to the information which actually makes up the content of the file. This may be very basic (such as the file's name) or very complex (for instance an .avi file with embedded XMP metadata). A certain amount of embedded technical metadata is typically generated upon the creation of born-digital files. Additional descriptive metadata can be added later if required, using an editor.
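XMP is stored as an XML 'packet' inside the host file, so a simple (if crude) way to find it is to scan the file's bytes for the packet's x:xmpmeta element, as in the minimal Python sketch below. It assumes the packet is uncompressed and uses the conventional x: prefix; dedicated tools such as Adobe's XMP Toolkit or ExifTool understand each file format properly and are preferable in production.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

XMP_NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_xmp_packet(data: bytes) -> Optional[bytes]:
    """Return the first <x:xmpmeta>...</x:xmpmeta> block found in the bytes."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end_tag = b"</x:xmpmeta>"
    end = data.find(end_tag, start)
    if end == -1:
        return None
    return data[start:end + len(end_tag)]


def xmp_titles(packet: bytes) -> List[str]:
    """Extract the dc:title alternatives (one per language) from an XMP packet."""
    root = ET.fromstring(packet)
    return [li.text or "" for li in
            root.iterfind(".//dc:title/rdf:Alt/rdf:li", XMP_NS)]


def read_embedded_titles(path: str) -> List[str]:
    with open(path, "rb") as fh:  # reads the whole file; acceptable for a sketch
        packet = find_xmp_packet(fh.read())
    return xmp_titles(packet) if packet else []
```

XMP stores dc:title as a language alternative (rdf:Alt), which is why the function returns a list rather than a single string.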
Embedding metadata has the advantage of protecting against the loss or unavailability of a central database. For instance, a student can be given, or can download, a video file with no accompanying metadata file and still have access to valuable contextual information. However, a central database is still recommended, as it can perform fast, advanced searches across collections without having to locate and open the metadata embedded within many individual files. If embedded metadata is used, the overarching management system should handle the challenge of keeping the two sets of metadata synchronised. Remember to check this functionality when purchasing a digital asset management system, as additional bolt-on modules or even separate harvesters/editors may be required, depending on the file types within your collection.
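Whichever system you use, synchronisation starts by detecting where the two copies disagree. This minimal Python sketch compares a database record with metadata read from a file (both as simple dictionaries with hypothetical field names) and reports each differing field; deciding which copy should win is a policy decision left to the caller.

```python
from typing import Any, Dict, Tuple


def sync_report(db_record: Dict[str, Any],
                embedded: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each field whose values differ to (database value, embedded value).

    A field missing from one side is reported with None on that side.
    """
    report = {}
    for field in sorted(set(db_record) | set(embedded)):
        db_value = db_record.get(field)
        file_value = embedded.get(field)
        if db_value != file_value:
            report[field] = (db_value, file_value)
    return report


print(sync_report(
    {"title": "Graduation ceremony 2009", "creator": "Media Services"},
    {"title": "Graduation ceremony 2009", "creator": "Media Svcs", "rights": "CC BY"},
))
```

Reporting differences rather than overwriting automatically leaves room for a cataloguer to decide, for example, that a corrected title in the database should be written back into the file.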
Digital video can be stored and managed in many different ways. Similarly, many different approaches exist for metadata handling. A commercial media production company is likely to handle metadata in a very different way to an audiovisual academic research library. Museums, national archives and galleries each have their own approaches. This will be true even if an identical digital video file exists within each organisation.
No single digital video ‘industry’ exists, and so there is no such thing as an ‘industry standard’ (although, confusingly, this term is sometimes used as shorthand for ‘television production and broadcast industry’). As discussed above, interoperability can go some way towards bridging the gaps between the sectors, but entirely different cultures exist for video production, academic use, research and preservation. Ultimately a decision will have to be made: firstly, what kind of digital video collection do you want? And secondly, which other types of collections would you like it to be interoperable with? Finding a comfortable balance between detailed metadata and interoperability can be a challenge.
An effective method of selecting and using metadata is to begin by drawing up a set of requirements that will best describe the resource at hand. When this is complete, assess your requirements against the available standard metadata schemas and vocabularies, and investigate how collections with aims and objectives similar to your own handle metadata. This approach will a) ensure your resource is described in a way that makes it fit for its intended purpose, b) potentially save you time in developing cataloguing rules, accepted field terms and mappings, and c) make it more likely that your resource will be interoperable with other related collections.
JISC Digital Media provide a helpdesk service and can assist by providing expert tailored advice on your own collection and the associated metadata.