Last updated: 21 November 2007
Published in:
Managing your digital resources
Tags:
business & community engagement |
digital collections |
digital preservation |
dvd |
storage |
sustainability
This document looks at the issues involved with storing digital images on optical media, specifically recordable CDs and DVDs.
It has only been in the last few years that many digitisation projects have taken a serious approach to digital preservation. In that time CD-R quickly became the most common optical medium for preserving digital images. More recently there has been a move towards DVD-R due to the additional space that it offers. The popularity of both these media is largely due to their ease of use and low cost. However, during this time there has been a steady flow of worrying reports of ‘lost data’ or ‘unreadable disks’, which has led to the reliability of laser-burnt media being questioned.
Much of this worry has been centred on the choice of disc manufacturer and the construction method used to create the disc. Of course a project should always buy the best and most reliable discs it can find. However, it should be remembered that loss of digital data stored on optical media is far more likely to be caused by bad writing (burning) or bad storage than by badly manufactured discs. Optical media can provide reliable backup at a reasonable cost as long as great care and attention are paid to burning and storing the discs correctly.
To ensure reliability of discs in the long-term, it is imperative to follow a few ‘best practice’ steps.
Remember the choice of preservation media is only one small part of the whole Digital Preservation Strategy; make sure you plan to use all known best practice from the very outset of the project. One of the first and most important issues to consider is “who is going to ‘own’ this valuable resource long after your project has finished?” This ownership will come with responsibilities to protect and look after the data (whichever optical media has been used to preserve it). This will require an ongoing budget - part of which will be needed to cover future migration to new storage media.
See JISC Digital Media advice document Establishing a Digital Preservation Strategy for further advice on preservation strategies.
Whilst CD-R has been the de facto choice for preservation, DVD-R is becoming a preferable alternative. DVD-R discs hold up to 4.7GB of data compared to a CD-R’s 700MB, and there is an increasing number of recordable DVD drives available (which also read/write CD-Rs).
As with most aspects of digitisation, archival media continue to develop. Looking beyond CDs and DVDs, a number of new media are emerging. These include Blu-ray Discs, which offer much more storage than DVDs. A single layer Blu-ray Disc can hold up to 25GB of information while a dual layer Blu-ray Disc can hold up to 50GB.
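To make the capacity comparison concrete, the following is a rough sketch of how many discs a collection of master images might need on each medium. It treats the nominal capacities quoted above as decimal megabytes and leaves 10% of each disc unused; both of those choices, along with the function name `discs_needed` and the 30MB file size in the usage note, are assumptions made for illustration rather than figures from this document.

```python
import math

# Nominal capacities quoted in the text, expressed in megabytes.
# Assumption: 1GB is treated as 1000MB; usable space on a real disc differs slightly.
NOMINAL_CAPACITY_MB = {
    "CD-R": 700,
    "DVD-R": 4700,
    "Blu-ray": 25000,
    "Blu-ray dual layer": 50000,
}


def discs_needed(file_count, file_size_mb, media, fill_percent=90):
    """Estimate how many discs are needed for file_count files of equal size.

    fill_percent is a hypothetical safety margin so that discs are never
    filled to their stated limit.
    """
    usable_mb = NOMINAL_CAPACITY_MB[media] * fill_percent // 100
    files_per_disc = usable_mb // file_size_mb
    if files_per_disc < 1:
        raise ValueError("a single file does not fit on this medium")
    return math.ceil(file_count / files_per_disc)


if __name__ == "__main__":
    for media in NOMINAL_CAPACITY_MB:
        print(media, discs_needed(1000, 30, media))
```

For 1000 files of 30MB each, this estimate gives 48 CD-Rs or 8 DVD-Rs, which illustrates why the extra space on DVD-R is attractive. Remember that the number doubles once a second copy of every disc is made.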
There are also a number of other emerging media that are currently under development. These include:
While it is not possible to predict which media will be adopted in the future, it is safe to assume that technology will move on, so it is essential to be prepared for future migration.
Whatever optical media you decide to use, it is best to steer clear of cheap ‘cake box’ or ‘spindle pack’ bulk buys - not only are the discs themselves likely to be of poorer quality, but you don’t get the jewel cases, which are essential for storage (see below). Stick to one of the big well-known makes, or look at independent reviews. Some of the big manufacturers (Kodak, MAM-A and Verbatim) make special ‘archive’ discs, often under the name of ‘gold’, but at a premium cost. The adage ‘you get what you pay for’ can be considered to be as true with discs as with any other purchase and your master image archive is unlikely to be the best place to try and cut costs.
Do not rely on other people’s disc recommendations unless you are using exactly the same hardware as they are - some brands may work well in one burner but perform badly in another. You should always test-write a few discs and then check them for readability (if possible on a range of disc drives on different platforms) before opting for one brand over another.
Whilst it is important to make sure the discs you buy are of dependable quality, their long-term reliability has as much to do with the writer that recorded them, and the drive that reads them. Equally significant are the actions you take when burning and how well the discs are treated before, during and after burning.
Do not attach labels to the discs; use a ‘xylene-free’ marker to mark the top face. Never touch the bottom face, but be aware that the top surface is equally vulnerable and it is under the top layer where the data is recorded - if scratched the disc can be rendered unusable.
All recordable optical media should be regarded as very fragile; they are much more susceptible to environmental conditions than commercially produced discs. They should be protected from dust, pressure, heat, cold, high humidity and, most importantly, light. It is important to keep them in jewel cases, where nothing is allowed to touch their surfaces, upright, in a dark cool place. You might wish to consider using preservation quality jewel cases, as research has shown that some cheaper cases give off corrosive/acidic gases from the plastic, which can harm the coating of the discs.
Remember that the discs and the burner will both support burning at a range of speeds. It is best practice not to stretch any of the capabilities of the equipment and to work below the maximum speed of both burner and disc.
Some programs may allow you to record data beyond the recommended amount onto the disc. Whilst this can be useful for a one-off, it is not recommended for archival purposes as it increases the likelihood of the disc not being read by other disc drives, even though it is likely that the burner that created it will be able to read it. As good practice, it is always best to stay within the recommended limits of the manufacturer. However, it should be noted that, in the case of CD-Rs, manufacturers are now extending the recordable space from 74 minutes (650MB) to 80 minutes (700MB). So far there has been no recorded problem with using this larger size. That said, as an archive medium, it might be considered wiser to stick to the smaller and universally accepted 74-minute, 650MB discs.
It is best practice to burn using a machine dedicated to the task, but if this is not possible, try to ensure other applications are not being used while it is burning. If the machine is networked, make sure that it does not receive calls from elsewhere on the network – if there is a risk of this, consider temporarily removing it from the network. The recording process must not be interrupted once it has started – any interruption can cause the write process to fail.
To avoid this, CD and DVD recorders have a ‘write buffer’ that stores data as it is read from the hard disk. The data is then pulled from the buffer as needed by the recorder, so allowing for a continuous write process. If the burner runs out of data in the buffer (also called cache) to burn to the disc, then a ‘buffer underrun’ occurs and the recording process aborts. Most new disc burners offer some sort of ‘buffer underrun’ protection, which recognises if the buffer is about to empty, stops recording and restarts only when the buffer is full again. Using a Small Computer System Interface (SCSI) burner and following the other steps relating to burning outlined here can also help avoid buffer underrun.
Burn all the data/files in one go and finalise the disc, rather than adding files ad hoc to the same disc. This method is more reliable, more efficient, faster and leaves more usable space, although you lose the advantage of being able to add to the disc incrementally. Burn direct from a fast disk on the host computer, not from the network, and preferably not from another CD-R or DVD-R, although most CD and DVD drives are now fast enough to reliably support disc-to-disc recording. If you have problems doing this then try again at a slower burning speed.
When naming files to be burnt to disc, there are a number of systems that can be used. The most reliable method is to use the ISO 9660 standard. ISO 9660 is the International Organisation for Standardisation’s file system for CD-ROMs and CD-Rs that specifies the directory format of the disc and allows read-only interoperability across all computer platforms. It uses the 8.3 file naming convention (i.e. an eight-character filename followed by a three-character file extension, such as filename.jpg).
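As a minimal sketch of how a project might check a batch of filenames against the 8.3 convention before burning, the following assumes that a conforming name has one to eight characters, a dot, and one to three extension characters, drawn only from letters, digits and the underscore. Strict ISO 9660 Level 1 names use upper-case letters; case is ignored here because most burning software handles it, and that leniency is an assumption. The function names are hypothetical.

```python
import re

# Assumption: letters, digits and underscore only; 1-8 characters, a dot,
# then 1-3 characters. Case is deliberately ignored.
_EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_]{1,8}\.[A-Za-z0-9_]{1,3}")


def is_8_3_name(filename):
    """Return True if filename fits the 8.3 pattern described above."""
    return _EIGHT_DOT_THREE.fullmatch(filename) is not None


def non_conforming(filenames):
    """Return the filenames that would need renaming before burning."""
    return [name for name in filenames if not is_8_3_name(name)]


if __name__ == "__main__":
    batch = ["filename.jpg", "alongfilename.jpg", "img_0001.tif", "my image.tif"]
    print(non_conforming(batch))
```

Run on the sample batch, the check flags `alongfilename.jpg` (name too long) and `my image.tif` (contains a space), leaving the other two names as safe to burn.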
It is best practice to use the 8.3 system, but if you do need longer file names, the Joliet system can support them. Joliet is a Microsoft extension to ISO 9660 that allows longer filenames up to 64 characters in length. It should provide a disc readable across all platforms, but problems arise where the actual filename is truncated when read on Mac or Unix platforms (e.g. alongfilename.jpg reads alongfi~1.jpg). Software is available to overcome this, but to ensure true cross-platform interoperability ISO 9660 is recommended.
A relatively new file system, the Universal Disk Format (UDF) (http://www.osta.org/specs/index.htm), a subset of the ISO 13346 standard, allows up to 127 characters per filename. UDF is designed to take advantage of packet writing, where files can be written to disc incrementally in small packets (this is not recommended – see Burn ‘disc-at-once’ above). Although companies such as Sony, Philips and the Optical Storage Technology Association (OSTA) have approved UDF as a standard, it is still relatively unsupported and additional software is required to read UDF files.
See also JISC Digital Media’s advice on Choosing a File Name.
Verification of the burnt disc is readily available within most burning software on both the Mac and PC platforms. Although it certainly slows the workflow, it simply cannot be taken out of the process - would you write a letter and then not read it through at the end? Some systems also provide a ‘test’ option that tries the burn out first to make sure the system can cope with the demands made of it. Within a normal workflow this test can generally be skipped - it doubles the time taken to write the disc, and it is more important to check the disc at the end and redo it if necessary. The time taken to ‘test’ before each burn is normally too much to support within a busy workflow.
In addition to verifying your discs, and even if you are making lots of them, it is strongly recommended that each and every one should be tested - and on a different machine/platform to that which created it. As mentioned, you should always burn the disc in one go on a machine dedicated to the task, but despite these precautions problems can occur where early files are readable (and pass verify) but later files, although visible within the file structure, cannot be read. To be sure of the quality of your archive, EVERY disc should be checked by having at least two files pulled and actually opened.
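Opening a couple of files by hand is the check recommended here. As a supplementary sketch, not a replacement for it, a script can read every file on the disc and compare it against the original on the hard disk, which also catches the case where later files are listed but cannot be read. The approach below uses SHA-256 checksums; the function names and the idea of mounting the disc as an ordinary folder are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_dir, disc_dir):
    """Compare every file under source_dir with its copy under disc_dir.

    Returns a list of (relative path, problem) pairs; an empty list means
    every file was found, could be read in full, and matched the original.
    """
    source_dir, disc_dir = Path(source_dir), Path(disc_dir)
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        try:
            if file_digest(disc_dir / relative) != file_digest(source_file):
                problems.append((relative.as_posix(), "content differs"))
        except OSError as error:
            # Covers missing files as well as read errors from a failing disc.
            problems.append((relative.as_posix(), f"unreadable: {error}"))
    return problems
```

Running `compare_trees` with the folder that was burnt and the drive where the new disc is mounted, ideally on a different machine from the one that wrote it, gives a full-read check of the whole disc rather than a sample.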
Once you have burnt and checked an archive disc, it should go directly into storage and remain there. If you need the information on that disc for any reason, it should be copied straight onto another ‘working copy’ of the disc then put straight back into the archive. On no account should the archive disc ever be used for day-to-day use.
However good your system and however dependable your choice of discs, it is a hard and disappointing fact that they are not 100% reliable, and it can only be good practice to make two copies and keep them in different places. Also remember that even if all ‘best practice’ is used when creating and storing the discs, they should be migrated well within their expected lifetime. For CD-R/DVD-R current best practice suggests this should be between 2 and 5 years. Take encouragement that storage is continually getting cheaper and more reliable, so it can only get easier.
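The 2 to 5 year guideline can be turned into dates that are recorded alongside each disc. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and the treatment of discs burnt on 29 February is an arbitrary choice.

```python
from datetime import date


def add_years(day, years):
    """Return the same calendar day a number of years later."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return day.replace(year=day.year + years, day=28)


def migration_window(burn_date, earliest_years=2, latest_years=5):
    """Return the (earliest, latest) dates between which to migrate a disc."""
    return add_years(burn_date, earliest_years), add_years(burn_date, latest_years)


def is_overdue(burn_date, today, latest_years=5):
    """Return True if the disc is past the end of its migration window."""
    return today > add_years(burn_date, latest_years)
```

For example, a disc burnt on 21 November 2007 would fall due for migration between 21 November 2009 and 21 November 2012.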
Remember these discs contain the definitive versions of your image files and are therefore the backbone of your collection - it is imperative that you look after them to the very best of your capability.
Make sure that you keep an independent database that records all the content of your archive discs. As good and reliable as the images and discs might be, they will be useless if at a later date you cannot tell what data is on them or find any images contained within the archive. This indexing metadata should of course all be stored within your Image Management System (IMS); however, it is often worth backing up this index with a simple database or spreadsheet that records the total contents of each and every archive disc, which can then be stored independently of all the other metadata. If at a later date there is any problem with migrating the information within the IMS, then at least the contents of the archive can be searched and extracted. These files should be small and in a non-proprietary, easy to migrate format. At any later migration of the archive, a copy of these indexing files should be included with and stored alongside the image archive.
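One way such a backup index could be produced is sketched below: a plain CSV file with one row per file on each archive disc. CSV is used here as an example of a small, non-proprietary format. The column names, the disc identifier scheme and the `burner` field (recording which drive wrote the disc) are assumptions for illustration, not a prescribed layout.

```python
import csv
from pathlib import Path

# Hypothetical column layout for the backup index.
FIELDS = ["disc_id", "burner", "path", "size_bytes"]


def index_rows(disc_id, burner, disc_dir):
    """Build one index row for every file found under disc_dir."""
    disc_dir = Path(disc_dir)
    rows = []
    for item in sorted(p for p in disc_dir.rglob("*") if p.is_file()):
        rows.append({
            "disc_id": disc_id,
            "burner": burner,
            "path": item.relative_to(disc_dir).as_posix(),
            "size_bytes": item.stat().st_size,
        })
    return rows


def append_to_index(index_path, rows):
    """Append rows to the CSV index, writing the header if the file is new."""
    index_path = Path(index_path)
    is_new = not index_path.exists() or index_path.stat().st_size == 0
    with open(index_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `index_rows` on each disc straight after it has been checked, and appending the result to a single index file, builds up a searchable list that opens in any spreadsheet or text editor.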
Optical media discs have very extensive and complicated internal error-correction. It is quite normal for discs to have errors and faults within them, but these errors can normally be automatically corrected by the disc drive as it reads the disc. However there comes a point where the disc has so many errors on it that the error correction can no longer work out the correct data and the disc fails. Precisely where this point occurs is on something of a sliding scale: it is quite likely that a disc that has become ‘just’ unreadable on one machine might still ‘just’ work on another, or at another time, or later in the day when it is cooler …or one of many other possible variables.
So if you have a disc which you know used to work (it should have been checked immediately after burning) but has now stopped, don’t immediately panic. Give it a gentle clean with a very soft cloth and then try it on another drive and, if possible, on the drive which created it (keep this information with the other indexing metadata in your archive index). You might be lucky!
You might well find that there are some CD drives that are much better at reading error-ridden discs (some older SCSI CD drives) and others that are especially bad (such as many modern DVD-ROM drives). Get to know which these are and copy any suspect discs as soon as you grow suspicious of them.
Be aware that the ‘error correction’ process takes time and an error-free disc will open much faster than an error-ridden one. If you notice that a particular disc is much slower than other similar discs, consider this a warning and copy it as soon as possible to another disc.
If all else fails and you still cannot read the data from the disc, it might be worth considering using some CD-R and/or DVD-R data-reclamation software, such as BadCopy Pro or Data Rescue.
However, experience would suggest that the time taken to undertake any data reclamation (with no guarantee of success) can often be longer than the time needed to recreate the image data on the lost disc (if this is possible).