Last updated: 07 January 2010
Published in:
Managing your digital resources
Tags:
business & community engagement |
metadata
This directory provides details of more than 70 vocabulary sources. It categorises the types of vocabulary available to us as thesauri, subject headings, authority lists and classification schemes. Thesauri, subject headings and, more generally, word lists are used primarily to aid retrieval, whereas classification schemes help us to organise resources, and authority lists help us to standardise the way values are expressed in our metadata, for example, the way we enter names and dates. Although there are overlaps, broadly speaking each serves a different purpose in controlling the terminology used in our schemas and in aiding the search and retrieval of our resources.
There are many vocabulary sources already available and it makes sense to check these out before inventing your own. Depending on your particular needs you might find yourself:
Of course, you could also use a combination of these approaches. It is quite reasonable to use multiple vocabularies, for example, a formal controlled vocabulary plus additional keywords the cataloguer thinks will assist in retrieval.
In choosing a vocabulary, you should bear in mind:
This directory presents a selection of formal vocabularies, most of which are available via the Internet. Brief introductions are given to the different types of vocabularies and their uses.
Thesauri, subject headings and word lists are sources of subject terms and their primary purpose is to aid retrieval.
A thesaurus orders its words hierarchically. If you look up a particular term (e.g. houses), you are likely to find references to Broader Terms (e.g. buildings), Narrower Terms (e.g. cottages), or Related Terms (e.g. palaces - terms which are different, but overlap in meaning). Where there are different words with the same meaning (e.g. houses and dwellings), a thesaurus will also tell you which is the preferred term (e.g. “dwellings, USE houses”). The thesaurus’s hierarchical structure is intended to help you find a suitable subject term at the appropriate level of detail.
dwellings
    USE houses

houses
    BT  buildings
    NT  cottages
    RT  palaces
    USE FOR dwellings
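To make the structure concrete, here is a minimal Python sketch of the same two entries held as a lookup table. The dictionary layout and the function names `resolve` and `relationships` are hypothetical, chosen for illustration rather than taken from any thesaurus standard or product; the point is simply that software can follow a USE reference to the preferred term before reading off its broader, narrower and related terms.

```python
# Hypothetical in-memory thesaurus built from the "houses" example above.
# Non-preferred terms carry only a "USE" pointer to the preferred term.
THESAURUS = {
    "dwellings": {"USE": "houses"},
    "houses": {
        "BT": ["buildings"],
        "NT": ["cottages"],
        "RT": ["palaces"],
        "UF": ["dwellings"],  # UF = "USE FOR": the non-preferred equivalents
    },
}


def resolve(term: str) -> str:
    """Return the preferred form of a term, following a USE reference if present."""
    entry = THESAURUS.get(term.lower(), {})
    return entry.get("USE", term.lower())


def relationships(term: str) -> dict[str, list[str]]:
    """Return the BT/NT/RT relationships recorded for the preferred form of a term."""
    preferred = resolve(term)
    entry = THESAURUS.get(preferred, {})
    return {rel: entry.get(rel, []) for rel in ("BT", "NT", "RT")}


print(resolve("Dwellings"))        # houses
print(relationships("dwellings"))  # {'BT': ['buildings'], 'NT': ['cottages'], 'RT': ['palaces']}
```

A cataloguer who types the non-preferred "dwellings" is silently redirected to "houses", which is exactly the consistency a thesaurus is meant to enforce.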
Subject headings are often arranged like a thesaurus, so the distinction is not always clear. However, instead of giving you a single term or phrase to use, as a thesaurus does, subject headings often let you link or coordinate terms to produce longer phrases or strings of terms (this is sometimes referred to as ‘pre-coordination’). For example, the Library of Congress Subject Headings (LCSH) bring together the concepts “Art” and “War” to form the heading “Art and war”. You can further coordinate this with headings for particular wars, for example “World War, 1939-1945--Art and the war” (this latter example, using the ‘--’ notation, is known as ‘subdivision’: dividing up a main concept with another concept). The published LCSH is very large, containing 270,000 pre-formed headings, but because headings can be coordinated and subdivided, the number of potential headings is far larger still.
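Pre-coordinated headings are, at bottom, ordered strings of concepts. The sketch below (Python; the function names are hypothetical, and it assumes the double-hyphen separator that LCSH uses for subdivisions) builds a heading from a main heading plus subdivisions and splits it back into its parts. It does not check that a heading is actually authorised in LCSH; that still requires the published lists.

```python
# Assumed separator between a main heading and its subdivisions.
SUBDIVISION_SEPARATOR = "--"


def build_heading(main_heading: str, *subdivisions: str) -> str:
    """Join a main heading and any subdivisions into one pre-coordinated string."""
    parts = [main_heading.strip()] + [s.strip() for s in subdivisions]
    return SUBDIVISION_SEPARATOR.join(parts)


def split_heading(heading: str) -> list[str]:
    """Break a pre-coordinated heading back into its component concepts."""
    return [part.strip() for part in heading.split(SUBDIVISION_SEPARATOR)]


heading = build_heading("World War, 1939-1945", "Art and the war")
print(heading)                 # World War, 1939-1945--Art and the war
print(split_heading(heading))  # ['World War, 1939-1945', 'Art and the war']
```

Because the single hyphen in "1939-1945" differs from the double-hyphen separator, the date range survives the round trip intact.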
Sometimes people use thesauri to generate subject headings, for example “buildings - houses - cottages” (from our thesaurus example above). This goes against traditional indexing practice, which insists that you take the thesaurus term at the appropriate level and don’t include any of its broader terms, but it can make good sense in the age of digital retrieval. If we only added “cottages” to a record, a search on “buildings” would not retrieve it (unless the search software was quite sophisticated). So in this example, including the broader terms from the hierarchy would greatly improve the search results. Some cataloguing systems now do this automatically: if you choose a term from their thesaurus, they insert all of its broader terms into the catalogue record. This practice is blurring the distinction between thesauri and subject headings.
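The automatic insertion of broader terms can be sketched in a few lines of Python. Here the thesaurus is reduced to a hypothetical mapping from each term to its broader terms (the names `BROADER` and `with_broader_terms` are illustrative only); a list is used because a thesaurus term may have more than one broader term, and the function skips any term it has already added.

```python
from collections import deque

# Hypothetical thesaurus fragment from the example above: term -> broader terms.
BROADER = {
    "cottages": ["houses"],
    "houses": ["buildings"],
    "buildings": [],
}


def with_broader_terms(term: str) -> list[str]:
    """Return the term followed by all of its broader terms, nearest first."""
    result: list[str] = []
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in result:
            continue  # reached again by another path (polyhierarchy)
        result.append(current)
        queue.extend(BROADER.get(current, []))
    return result


subjects = with_broader_terms("cottages")
print(subjects)                 # ['cottages', 'houses', 'buildings']
print("buildings" in subjects)  # True: a search on "buildings" now finds the record
```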
We’ve included the term “word lists” in our heading for this section to catch the simpler lists of words that are not coordinated like subject headings or organised hierarchically like thesauri. These sorts of vocabularies are, typically, simple alphabetical lists of terms or phrases. They’re also often created locally, for particular projects or institutions. The IEEE 1998 Keyword List (see below) offers an example of such a word list, although this is probably much longer than any list you would produce ‘in-house’.
Classifications are sources of subject categories and their primary purpose is to organise resources.
Traditionally, the main purpose of subject headings and thesaurus terms was retrieval, while classification schemes were more about putting things ‘in their place’: on a shelf, in a box, into a category, etc. Generally (there have always been exceptions), an item would be assigned many different subject terms, but only one classification. This makes perfect sense in a physical world, but in the virtual world there is no reason why something shouldn’t have more than one ‘location’. So the distinction between classifications and subject terms is beginning to break down.
Classifications are usually hierarchical: they start off with broad subject areas and then break them down into increasingly narrower topics. In this way they resemble thesauri, but classifications are generally much more rigid in their structure. While it is entirely feasible for a thesaurus term to have more than one broader term (this is known as ‘polyhierarchy’), a classification scheme will break down its subject domain in just one way. Because of this, classifications offer a single ‘world view’, imposing a structure that is never going to satisfy every user. And, unlike thesauri, classification schemes declare their structural biases openly through the numbers and codes they employ. For example, in the Dewey Decimal Classification, resources on Buddhism are classified at “294.3”. These digits are meaningful: the 200s are for “Religion”; the 290s, “Other and comparative religions” (note that most of the numbers from 200 to 289 are devoted to Christianity); and 294, “Religions of Indic origin”. Here the nineteenth-century Western world view on which the Dewey classification is based becomes apparent.
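Because the hierarchy is carried in the digits, software can work out the broader classes of a Dewey number simply by truncating it. The following Python sketch assumes a three-digit whole number and uses only the few captions mentioned in this section (plus 294.3, the number Dewey uses for Buddhism within 294), following the wording used here; the full schedules are not reproduced, and the function name `broader_classes` is hypothetical.

```python
# Captions for the handful of Dewey numbers discussed above (illustrative only;
# the full Dewey schedules are not reproduced here).
CAPTIONS = {
    "200": "Religion",
    "290": "Other and comparative religions",
    "294": "Religions of Indic origin",
    "294.3": "Buddhism",
}


def broader_classes(number: str) -> list[str]:
    """Return the chain of classes implied by a Dewey number, broadest first.

    Assumes a three-digit whole number, optionally followed by decimals.
    """
    whole, _, decimals = number.partition(".")
    chain = [whole[0] + "00", whole[:2] + "0", whole]
    # Each further decimal digit narrows the class by one more step.
    chain += [f"{whole}.{decimals[:i]}" for i in range(1, len(decimals) + 1)]
    # Remove repeats such as "200" appearing as both class and division.
    return list(dict.fromkeys(chain))


for cls in broader_classes("294.3"):
    print(f"{cls}: {CAPTIONS.get(cls, '(caption not in this sample)')}")
# 200: Religion
# 290: Other and comparative religions
# 294: Religions of Indic origin
# 294.3: Buddhism
```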
The classification scheme’s use of codes or numbers is the other important feature that distinguishes it from other kinds of controlled vocabulary, which are word-based. This coding can be an advantage in a digital context, especially if it is based on a decimal system, like Dewey or the UDC (see below). Because numbers are much more “machine-readable” than words, they lend themselves to searching: for example, searching for all Dewey numbers beginning with “2” would retrieve items relating to religion. They can also be used to generate hierarchical browse interfaces: users might be shown the first 10 subject categories, then choose one of these to view its 10 sub-categories, then one of those to see the next level, and so on. Some of those building online collections are taking advantage of these opportunities.
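Both uses mentioned above, prefix searching and a digit-by-digit browse, can be sketched in Python. The catalogue below is a handful of invented records, the function names are hypothetical, and the decimal point is ignored so that each further digit is treated as one more level down; the browse keys are therefore digit prefixes rather than class numbers as they would be printed.

```python
from collections import Counter

# Invented records: identifier -> Dewey number.
RECORDS = {
    "rec1": "294.3",
    "rec2": "230",
    "rec3": "510",
    "rec4": "294.5",
    "rec5": "720",
}


def digits(number: str) -> str:
    """Strip the decimal point so each digit counts as one level of the hierarchy."""
    return number.replace(".", "")


def search_by_prefix(prefix: str) -> list[str]:
    """Return the records whose class number begins with the given digits."""
    wanted = digits(prefix)
    return sorted(rid for rid, number in RECORDS.items() if digits(number).startswith(wanted))


def browse_next_level(prefix: str) -> dict[str, int]:
    """Count records under each one-digit-longer prefix, for a browse screen."""
    wanted = digits(prefix)
    counts = Counter(
        d[len(wanted)]
        for d in map(digits, RECORDS.values())
        if d.startswith(wanted) and len(d) > len(wanted)
    )
    return {wanted + digit: n for digit, n in sorted(counts.items())}


print(search_by_prefix("2"))   # ['rec1', 'rec2', 'rec4']
print(browse_next_level(""))   # {'2': 3, '5': 1, '7': 1}
print(browse_next_level("2"))  # {'23': 1, '29': 2}
```

A browse interface would call `browse_next_level` again with whichever prefix the user picks, descending one digit at a time.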
Authority lists help you control names.
The other main group of controlled vocabularies is “authority lists” or “authority files”. These are sources of proper nouns (e.g. people, organisations, places). Names and places could be included in general thesauri or subject headings, but it makes sense to keep them in separate lists or databases.
Some institutions catalogue resources relating to internationally known figures, such as authors and artists. In these instances, they can draw on common authority lists like the Library of Congress Authorities (see below), which includes the names of nearly 4 million individuals. Other institutions, like archives and museums, have resources relating to people who are not widely known. They cannot draw these names from a common list, but must create their own authority records, following rules such as ISAAR(CPF) or the UK National Council on Archives Rules (see below).
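An authority file's job of mapping every variant of a name to one authorised form can be sketched as a lookup table. The Python example below uses an invented person and a hypothetical structure (`AUTHORITIES`, `authorised_form`); real authority records, whether in the Library of Congress Authorities or created under ISAAR(CPF), carry much more than this, such as sources, dates and relationships.

```python
from typing import Optional

# Invented example person: one authorised heading, plus variant forms that
# cataloguers or users might enter.
AUTHORITIES = {
    "Hartley, Ellen, 1871-1940": ["Ellen Hartley", "Hartley, E.", "Hartley, Ellen"],
}

# Map every form (authorised or variant), case-folded, to the authorised heading.
LOOKUP = {
    form.casefold(): heading
    for heading, variants in AUTHORITIES.items()
    for form in [heading, *variants]
}


def authorised_form(name: str) -> Optional[str]:
    """Return the authorised heading for a name, or None if the file has no match."""
    normalised = " ".join(name.split()).casefold()  # collapse spaces, ignore case
    return LOOKUP.get(normalised)


print(authorised_form("ellen  HARTLEY"))  # Hartley, Ellen, 1871-1940
print(authorised_form("E. Hartley"))      # None
```

A `None` result is the cue that a name is not yet under control, which for a lesser-known person is the point at which an institution would create its own authority record.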
There are several online sources of place names listed below. Other sources could include atlases or official maps. Increasingly, digitisation projects are adding other place references such as postcodes or geospatial coordinates.
If chosen and managed carefully, controlled vocabularies can make cataloguing easier and improve the retrieval and presentation of items from your collection. Careful choice and management of vocabularies are key as: