Putting Things in Order: a Directory of Metadata Schemas and Related Standards

Last updated: 07 January 2010
Published in: Managing your digital resources
Tags: business & community engagement | metadata

Summary

This directory provides a list of formal metadata schemas and related standards giving brief descriptions and links to further information. It complements JISC Digital Media’s series of advice documents on metadata, which the reader is strongly advised to read alongside this resource.

Introduction

A metadata framework can be viewed as having five key components:

  • A schema (the categories of information you choose to record)
  • Vocabulary (specific ‘words’ or ‘values’ you enter into those categories)
  • Conceptual model - the underlying model that describes how all the information and concepts inherent in a resource are related to one another
  • Content standard - practical standards that describe how specific information (e.g. vocabularies) should be entered within metadata schema categories (e.g. Cataloging Cultural Objects)
  • Encoding - which is concerned with the way the metadata is presented (e.g. XML)

This directory makes reference to all of these except metadata vocabularies, which are covered in a separate advice document.
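
As a rough illustration of how these components fit together, the Python sketch below treats a schema as a set of categories, a vocabulary as the values permitted in one category, and an encoding as a simple XML serialisation. The category names, vocabulary terms and the `record` wrapper element are all hypothetical and are not taken from any of the standards listed in this directory.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the categories of information we choose to record.
SCHEMA = ("title", "creator", "type")

# Hypothetical vocabulary: the values permitted in the "type" category.
TYPE_VOCABULARY = {"image", "sound", "moving image"}


def validate(record):
    """Return a list of problems found in a record (empty if none)."""
    problems = []
    for category in record:
        if category not in SCHEMA:
            problems.append(f"unknown category: {category}")
    if "type" in record and record["type"] not in TYPE_VOCABULARY:
        problems.append(f"value not in vocabulary: {record['type']}")
    return problems


def encode(record):
    """Encode a record as XML, one child element per schema category."""
    root = ET.Element("record")
    for category in SCHEMA:
        if category in record:
            ET.SubElement(root, category).text = record[category]
    return ET.tostring(root, encoding="unicode")


record = {"title": "View of a harbour", "creator": "Unknown", "type": "image"}
print(validate(record))
print(encode(record))
```

A content standard would sit alongside this, as written guidance on how a cataloguer should phrase the values; it is not something the sketch tries to enforce.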

Readers should note that we are using “schemas” in a broad sense, to describe a set of categories (i.e. “elements” or “units”) of information used to describe a resource. As JISC Digital Media’s metadata advice documents describe, metadata schemas can be differentiated in many different ways, for example:

  • Their size and scope (e.g. comprehensive or ‘core’; emphasis on description, administration, preservation; concern with single items or collections or both)
  • Things they describe (e.g. art images, audio, video, objects, books, places)
  • Communities they serve (e.g. libraries, museums, educators)

This document does not attempt to categorise the schemas and other standards: it presents them within an alphabetical listing. The descriptions will provide some information about their scope, but readers should refer to JISC Digital Media’s still image, moving image or sound specific advice documents for more context.

Furthermore, distinctions between schemas, conceptual models, content standards, and encoding standards are often not fixed or discrete. Several metadata schemas describe their underlying conceptual models, provide guidance on what data might be entered within their categories, or indicate how the metadata should be encoded. Dublin Core, for example, provides all of these.

Listing

Cataloging Cultural Objects (CCO)

CCO is a data content standard, providing guidelines for entering data into schemas relating to cultural objects, particularly the VRA Core and CDWA Lite (see below). CCO was developed by the US-based Visual Resources Association (VRA) with significant input from the Getty Research Institute.

Main link:

Categories for the Description of Works of Art (CDWA)

CDWA is an extensive metadata schema for cataloguing objects held by art museums. It was developed in the US in the 1990s by the National Endowment for the Humanities (NEH), College Art Association (CAA), and J. Paul Getty Trust, and is maintained by the Getty Research Institute. A second edition of CDWA was published in 2000 and revised in 2006. An XML encoding of selected categories from CDWA was developed in 2005, called CDWA Lite. CDWA Lite is intended to work with the CCO data content standard (see above) and the OAI-PMH encoding and harvesting standard (see below).

Main links:

Other useful links:

CCO see Cataloging Cultural Objects

CDWA see Categories for the Description of Works of Art

CIDOC CRM see Conceptual Reference Model

CRM see Conceptual Reference Model

Conceptual Reference Model (CRM)

The CRM is a conceptual model, providing formal definitions and structures for describing the concepts and relationships used in cultural heritage documentation (e.g. by museums or archives). The CRM is not a metadata schema but can be used as a tool for analysing and mapping existing schemas or as a guide to creating new schemas. The development of the MIDAS standard (see below), for example, has been influenced by this model. CRM can be represented as XML and used within Semantic Web applications. The CRM was developed by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM) and achieved ISO standardisation in 2006 (ISO 21127). There is an overlap between CRM and FRBR (see below) and there are efforts underway to harmonise these two standards.

Main link:

Other useful links:

DACS see Describing Archives: A Content Standard

DCMES or DCMI see Dublin Core

Describing Archives: A Content Standard (DACS)

DACS is a US content standard for archival description, based on ISAD(G) (see below). Published by the Society of American Archivists in 2004 and only available in print, it provides guidance on how information about archival resources might be entered into schemas such as EAD or MARC 21 (see below).

Main link:

Other useful links:

Dublin Core

Dublin Core is a generic metadata schema (i.e. intended to be able to describe any type of resource) which has been widely used and adapted. Developed from the mid 1990s through a process of international collaboration, it is maintained by the Dublin Core Metadata Initiative (DCMI). In its simple 15-element form, Dublin Core has achieved NISO and ISO standardisation. There is also a larger set of elements and sub-elements (DCMI Metadata Terms) and several ‘application profiles’ (versions of Dublin Core developed for particular purposes). There are XML encodings of Dublin Core and a version for harvesting via OAI-PMH (see below). Many of the other schemas in this listing have been based on or influenced by Dublin Core. Most have established mappings to Dublin Core for the purpose of interoperability.
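
To give a feel for the simple 15-element form, the following Python sketch builds a small record using the Dublin Core element namespace. The `metadata` wrapper element, the function name and the example values are assumptions made for illustration; real records are normally wrapped according to whichever XML encoding or application profile is in use.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(values):
    """Build an XML record from a dict of simple Dublin Core values."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Invented example values.
record = dublin_core_record(
    {"title": "View of a harbour", "creator": "Unknown", "type": "StillImage"}
)
print(ET.tostring(record, encoding="unicode"))
```

All 15 elements are optional and repeatable in simple Dublin Core, so the sketch checks only that each name belongs to the element set.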

Main links:

Other useful links:

EAD see Encoded Archival Description

Encoded Archival Description (EAD)

The EAD metadata schema provides an XML encoding for archival descriptions. It adopts a multi-level approach to description, providing information about a collection as a whole and then breaking it down into groups, series and (if significant) individual items. EAD grew out of work done at UC Berkeley in the mid 1990s and was influenced by TEI and ISAD(G) (see below). Version 1.0 was released in 1998 with a major revision in 2002 (Version 2002). EAD is maintained by the US Library of Congress and Society of American Archivists, but is used internationally, including in the UK. The DACS content standard (see above) provides guidelines for US archivists on how to enter data into EAD.
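
The multi-level approach can be pictured as a tree of descriptive units. The Python sketch below illustrates the idea only: it is not EAD markup, and the `Unit` class, level names and titles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """One level of description: a collection, series, item and so on."""
    level: str
    title: str
    children: List["Unit"] = field(default_factory=list)


def outline(unit, depth=0):
    """Return the hierarchy as indented lines, top level first."""
    lines = [f"{'  ' * depth}{unit.level}: {unit.title}"]
    for child in unit.children:
        lines.extend(outline(child, depth + 1))
    return lines


collection = Unit("collection", "Papers of a photographer", [
    Unit("series", "Correspondence", [
        Unit("item", "Letter, 3 May 1921"),
    ]),
    Unit("series", "Glass negatives"),
])
print("\n".join(outline(collection)))
```

Information recorded at a higher level applies to everything beneath it, which is why item-level description is only needed where individual items are significant.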

Main link:

Other useful links:

EXIF (Exchangeable Image File Format)

EXIF is a technical metadata standard that can be written to and read from a still image file itself (JPEG and TIFF formats). It was developed by JEITA (Japan Electronics and Information Technology Industries Association) to enable camera manufacturers to write technical data into digital images (e.g. camera settings). Although primarily used by digital cameras, some scanners will also write EXIF data.

Main link:

Other useful links:

FRBR (Functional Requirements for Bibliographic Records)

FRBR is a conceptual model for describing information resources within a library context. It describes particular entities (e.g. Item or Person) and their relationships (e.g. Item is owned by Person). Like the CRM (see above) FRBR is not a metadata schema, but a model that can be used to analyse existing schemas or influence the development of new schemas or content standards. It is currently being drawn on in the development of the RDA content standard (see below). FRBR is an international model, published in 1998 by a working group of the International Federation of Library Associations (IFLA). A working group was established in 2002 to review and further develop the standard. One of its tasks is to look at how FRBR and the CRM (see above) can be related.

Main link:

Other useful links:

Functional Requirements for Bibliographic Records see FRBR

General International Standard Archival Description see ISAD(G)

IEEE LOM (Learning Object Metadata)

The IEEE LOM provides a metadata schema for describing learning resources. The first part of the standard, the set of metadata categories, was published in 2002. A later part, published in 2005, outlined how LOM was to be encoded as XML. The UK LOM Core (see below) provides a UK version of the IEEE LOM.

Main link:

Other useful links:

IPTC (International Press Telecommunications Council)

IPTC is both the name of an organisation and a descriptive metadata schema. The IPTC schema can be written to and read from an image file itself. From the mid-1990s, through the Council’s work with Adobe, it has been possible to embed IPTC metadata directly into the header of JPEG and TIFF image files. In 2005 the Council released “IPTC Core”, a standard for using IPTC within Adobe’s XMP schema (see below). This enables IPTC data to be incorporated (via XMP) into a wider range of image formats (e.g. JPEG, TIFF, JPEG 2000, PNG, DNG, SVG). In July 2008 the IPTC Photo Metadata 2008 standard was released.

Main link:

Other useful links:

ISAD(G) (General International Standard Archival Description)

ISAD(G) outlines metadata elements that should be used in the description of archival collections. It adopts a multi-level approach to description, providing information about a collection as a whole and then breaking it down into groups, series and (if significant) individual items. ISAD(G) has influenced national archival standards and the development of the international archival encoding schema: EAD (see above) and the European SEPIADES schema (see below). ISAD(G) is in its 2nd edition, published in 1999.

Main link:

Other useful links:

ISAN (International Standard Audiovisual Number)

ISANs can be compared to ISBNs (International Standard Book Numbers) for printed matter. They are a system of unique identifiers for analogue or digital resources which are used to ensure consistency across catalogues.

Main link:

LOM see IEEE LOM and UK LOM Core

MARC (Machine-Readable Cataloguing)

MARC is a family of metadata standards for representing library resources. Although chiefly used by libraries to describe bibliographic material (books or periodicals), it is also sometimes used to describe non-book material (e.g. images) or archival collections. MARC is a very extensive and formalised standard, with hundreds of potential categories and a rigid way of encoding its data. In the past, individual countries developed their own versions of MARC (e.g. UKMARC), but many are now converging on the current version: MARC 21 (published in 1999 and maintained by the US Library of Congress). MODS (see below) provides a sub-set of MARC encoded as XML, but there are also efforts underway to provide an XML encoding of the larger MARC standard (MARCXML). Many libraries have relied on the Anglo-American Cataloguing Rules for guidance on how to enter data within MARC, but these will be replaced by the new RDA content standard (see below).

Main link:

Other useful links:

Metadata Encoding and Transmission Standard see METS

Metadata Object Description Schema see MODS

METS (Metadata Encoding and Transmission Standard)

METS is a standard for encoding metadata within an XML format. Although it contains descriptive and administrative elements of its own, a key function of the METS standard is to structure or “package” other metadata or data for exchange or delivery. METS can embed or link to other XML-based metadata (e.g. MODS, MIX, PREMIS or TEI, see below). Any number or type of digital files can be described and linked together by a METS record, enabling it to represent very complex digital resources (e.g. a whole digitised book, with bibliographic data, images and transcribed text). METS grew out of work in the mid 1990s on the Making of America II (MOA2) digitisation programme sponsored by the US Digital Library Federation. It is now maintained by the US Library of Congress. METS 1.1 was released in 2001; the current version is 1.5, released in 2005.
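
The “packaging” role can be sketched in Python as follows. The example wraps a single Dublin Core title in a descriptive metadata section, lists the files, and ties them together in a structural map. It uses element and attribute names from the METS schema, but the function name, identifiers and file names are invented, and the output has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(title, file_urls):
    """Package a title and a list of file locations as one METS tree."""
    root = ET.Element(m("mets"))

    # Descriptive metadata section: embeds other XML-based metadata.
    dmd = ET.SubElement(root, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # File section lists the files; the structural map orders them.
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    div = ET.SubElement(ET.SubElement(root, m("structMap")), m("div"), DMDID="DMD1")
    for number, url in enumerate(file_urls, start=1):
        file_id = f"FILE{number}"
        file_el = ET.SubElement(file_grp, m("file"), ID=file_id)
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        ET.SubElement(div, m("fptr"), FILEID=file_id)
    return root


mets = build_mets("A digitised book", ["page1.tif", "page2.tif"])
print(ET.tostring(mets, encoding="unicode"))
```

A real METS record for a digitised book would usually nest further `div` elements in the structural map (for chapters and pages) and might link to external metadata instead of embedding it.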

Main link:

Other useful links:

MIDAS Heritage: the UK Historic Environment Data Standard

MIDAS is a UK standard for describing cultural heritage assets that form the historic environment (buildings, archaeological sites, shipwrecks, areas of interest, artefacts and ecofacts). MIDAS was published by English Heritage in 1998 with slight revisions in 2000 and 2003. It is now being developed by the Forum on Information Standards in Heritage (FISH). In 2004 FISH published MIDAS XML, representing an improved version of MIDAS with XML encoding. The development of MIDAS has been influenced by both SPECTRUM (see below) and the Conceptual Reference Model (CRM, see above).

Main links:

Other useful links:

MIX (NISO Metadata for Images in XML)

MIX is an XML-based schema for encoding the NISO Technical Metadata for Digital Still Images standard (see below). It is being developed and maintained by the US Library of Congress. MIX could be incorporated within any XML-based metadata, but is particularly intended for use within METS (see above). MIX is currently in its second version.

Main link:

MODS (Metadata Object Description Schema)

MODS is an XML-based metadata schema for encoding information about library resources (particularly books). It is based on a subset of the MARC 21 standard (see above). The first (draft) version 1.0 was released in 2002. The current version is 3.2, published in 2006. MODS is maintained by the US Library of Congress and is often used with METS (see above).

Main link:

Other useful links:

MPEG-7 (Moving Picture Experts Group)

MPEG-7 is a multimedia metadata schema which can be used to provide rich descriptions of digital image, digital video or digital audio content. One key strength of MPEG-7 is the ability to segment time-based media and attribute different metadata to each part. When constructed, MPEG-7 was intended to take into account aspects of several other schemas such as: the SMPTE (Society of Motion Picture and Television Engineers) Metadata Dictionary, Dublin Core, P/Meta and TV-Anytime. MPEG-7 can be used alone or as a technical metadata schema within models such as METS or MPEG-21.
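
The idea of segmenting time-based media can be sketched in Python as below. This shows the concept only and does not use MPEG-7 syntax; the `Segment` class, the times and the descriptions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A stretch of a recording with its own description."""
    start: float  # seconds from the start of the recording
    end: float    # seconds; the segment covers start <= t < end
    description: str


def describe_at(segments, seconds):
    """Return the description of the segment covering a point in time."""
    for segment in segments:
        if segment.start <= seconds < segment.end:
            return segment.description
    return None


programme = [
    Segment(0, 30, "Opening titles"),
    Segment(30, 600, "Interview with the curator"),
    Segment(600, 640, "Closing credits"),
]
print(describe_at(programme, 45))
```

Attaching metadata to segments in this way lets a user search for, and jump straight to, the relevant part of a long recording.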

Main link:

MPEG-21 (Moving Picture Experts Group)

MPEG-21 has a similar purpose to METS (see above): providing an XML framework for “packaging” sets of metadata and files representing complex digital resources. Of particular interest is MPEG-21 Part 2: Digital Item Declaration Language (DIDL), which describes the digital resource, and Part 5: Rights Expression Language (REL), which details rights-related information (e.g. copyright). To date MPEG-21 has only been used experimentally with digitised collections.

Main link:

Other useful links:

OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)

OAI is an important initiative to facilitate the interoperability of metadata records. The OAI Protocol for Metadata Harvesting (OAI-PMH) provides a means of requesting metadata records from OAI-compliant repositories. In other words, it works by ‘harvesting’ (from data providers) metadata that has been made available in a standard XML format. This harvested metadata can then be searched together from one place (from a service provider). In order to take advantage of OAI, those maintaining digital repositories or collections must provide an encoding of their metadata as simple Dublin Core (see above), although other metadata schemas can be additionally supplied. The current version of OAI-PMH is 2.0, released in 2002.
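
A harvesting request is an ordinary HTTP request with a `verb` parameter. The Python sketch below builds a `ListRecords` request for simple Dublin Core, which OAI-PMH identifies with the metadata prefix `oai_dc`. The repository address and the function name are placeholders, and the sketch only constructs the URL; it does not contact a repository or handle the resumption tokens used for long result lists.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL for an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict harvesting to one set within the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL for an imaginary data provider.
print(list_records_url("https://repository.example.org/oai"))
```

A service provider would fetch this URL, parse the XML response, and store the harvested records in its own search index.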

Main link:

Other useful links:

NISO Metadata for Images in XML Schema see MIX

NISO Technical Metadata for Digital Still Images

NISO Technical Metadata is a “data dictionary” rather than a formal metadata schema: a list of categories of data that might be used to describe the technical aspects of raster-based digital images (e.g. TIFF, JPEG, GIF). The standard is very lengthy and few, if any, implementers use it in its entirety. NISO Technical Metadata does not specify any particular encoding, so the MIX standard (see above) is being developed to represent it as XML for incorporation into other XML-based schemas (e.g. METS, see above). NISO Technical Metadata has had a long development. Work began in 1999 and its current version was made available in 2006.

Main link:

Other useful links:

P/Meta

P/Meta is the European Broadcasting Union’s own metadata standard, developed by EBU members on a not-for-profit basis. It is intended to be used to exchange media between professional media broadcasting organisations. P/Meta was designed to be flexible and suitable for use in a wide range of broadcasting activities, and to be both language and system independent. P/Meta has, so far, not found wide support outside of the broadcasting industry.

Main link:

PBCore

PBCore (the Public Broadcasting Metadata Dictionary) is intended for use by television, radio and web broadcasters. It aims to provide a standard way of describing broadcast content, allowing that content to be more easily retrieved and shared among colleagues, software systems, institutions, community and production partners, private individuals and educators. It can also be used as a guide for an archival or asset management process at an individual station or institution. As with other, primarily technical, metadata standards, PBCore can be used to supply technical metadata within structures such as a METS record.

Main link:

PREMIS

Like NISO Technical Metadata (see above), PREMIS provides a “dictionary” of core metadata elements that can be used to support the digital preservation of a resource. PREMIS was based on an international survey of practice and on previous research and development. It was particularly influenced by a conceptual model called the Open Archival Information System (OAIS), which provides a conceptual framework for preserving digital (and non-digital) resources. The current version of PREMIS (1.0) was finalised in 2005 but planning has begun (in 2006) for version 2.0. The official website provides an XML-encoding for PREMIS, which is intended to facilitate its use with other XML-based metadata such as METS (see above).

Main link:

Other useful links:

RDA see Resource Description and Access

Resource Description and Access (RDA)

RDA is a data content standard (like CCO or DACS above), the online version of which is due for publication in June 2010.  RDA is intended to provide guidance on how data should be entered into library-based metadata schemas. It represents a major revision of an established library standard called the Anglo-American Cataloguing Rules (AACR). RDA will draw on the FRBR conceptual model (see above) and will focus on the use of library standards such as MARC (see above).

Main link:

SEPIADES (SEPIA Data Element Set)

SEPIADES provides a metadata schema for describing archival photographic collections. It is closely based on ISAD(G) (see above), adopting a similar multi-level approach to description. SEPIADES was developed by the European-funded SEPIA Project (Safeguarding European Photographic Images for Access, 2000-2003) and was published in 2003 as a set of “Recommendations for cataloguing photographic collections”. It does not provide a particular encoding, but in 2004 one of the SEPIA partners released a cataloguing tool that incorporates the standard and generates records in a Dublin Core format suitable for harvesting under OAI-PMH (see above).

Main links:

Other useful links:

SMPTE (Society of Motion Picture and Television Engineers) Data Dictionary

The metadata dictionary structure defined by the SMPTE dictionary covers the use of metadata for video, audio and multimedia data in their various forms. SMPTE metadata must conform to definitions published in the ‘dictionary structure standard’ and the associated ‘metadata dictionary recommended practice’ (SMPTE RP 210). SMPTE RP 210 also defines a registered set of metadata element descriptions. Although the SMPTE dictionary forms the basis of many other descriptive schemas (such as MPEG-7), it is rarely used in isolation.

Main link:

SPECTRUM

SPECTRUM is a key UK standard for museum documentation. It recommends “units of information” that can be used to document museum procedures (and objects). First published in 1994 and currently in its 3rd edition (released in 2005), SPECTRUM is developed and maintained by the Collections Trust. 

Main link:

Technical Metadata for Digital Still Images see NISO Technical Metadata

TEI or TEI-Lite see Text Encoding Initiative

Text Encoding Initiative (TEI)

TEI is a standard for describing and encoding literary texts. The first part of an encoded work (the TEI Header) provides metadata about the work, while the remainder of the file ‘marks up’ the transcribed text, indicating chapters, paragraphs, and other noteworthy features. TEI was founded in 1987 with the first version released in 1990. However, it is version P3 (1994) that most TEI implementers have used. TEI was based on SGML, a precursor to XML, so version P4 (2002) converted the standard into XML. The latest version (2007) is P5. In addition, there is a simplified version of TEI, called TEI Lite.
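
The split between header and marked-up text can be seen in the short Python sketch below, which reads the title from the header and the paragraphs from the body of a small TEI P5 fragment. The sample text is invented and deliberately minimal: a complete TEI header requires more elements than are shown here.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, cut-down sample; not a complete valid TEI document.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Text</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="chapter">
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Metadata about the work lives in the header...
title = root.findtext(
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=TEI
)

# ...while the transcribed text is marked up in the body.
paragraphs = [p.text for p in root.findall("tei:text/tei:body/tei:div/tei:p", TEI)]

print(title)
print(paragraphs)
```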

Main link:

Other useful links:

TV-Anytime

Within the TV-Anytime metadata specification, the most visible parts of the metadata are the attractors/descriptors (used in Electronic Program Guides (EPG) or in web pages to describe multimedia content). This is the information that the consumer uses to search and select content available from a provider. Another set of TV-Anytime metadata describes user preferences, representing consumption habits and defining other information, such as demographic models, for targeting a specific audience. Whilst highlighting practical interactivity, the TV-Anytime schema cannot easily be incorporated into other metadata models.

UK LOM Core

The UK LOM Core provides a metadata schema for describing learning resources. UK LOM Core is a UK version of the international IEEE LOM standard (see above). It specifies a “core” set of LOM elements that should be used, and provides guidance on how data should be entered within the LOM’s categories. The development of the UK LOM Core is being coordinated by CETIS (the Centre for Educational Technology Interoperability Standards) and the standard is still in a draft form (2004).

Main link:

Other useful links:

VideoMD

VideoMD was designed by the Library of Congress and is a schema specifically constructed to describe the technical elements of digital video. The schema uses 36 top level elements and is expressed as XML (eXtensible Mark-up Language). VideoMD does not aim to cover all aspects of a video resource, only technical features. As such, VideoMD is best used in conjunction with other systems, such as within METS records. Although the schema is well defined and detailed, at the time of writing surprisingly few examples are available.

Main link:

Visual Resources Association Core Categories see VRA Core

VRA Core

The VRA Core is a widely used metadata schema for describing art or cultural images, providing 17 core categories. The current version is VRA Core 4.0 (2007). VRA Core was originally based on CDWA, but later versions have been heavily influenced by Dublin Core (see above). The new version (4.0)  draws on the CCO content standard and provides an XML encoding for VRA Core.

Main links:

Other useful links:

XMP (Extensible Metadata Platform)

XMP is an XML-based open Adobe standard, used within Adobe’s imaging software and also by an increasing number of third parties. XMP can incorporate metadata from other schemas (such as Dublin Core and IPTC, see above) and write this data into an image file.

Main link:

Other useful links:

Z39.85 see Dublin Core

Z39.87 see NISO Technical Metadata for Digital Still Images, Data Dictionary
