Generic Image Digitisation Workflow
This document reviews the general concepts of 'best practice' within an image digitisation workflow and then looks at how these concepts can be mapped onto the workflow, to allow the efficient production of all required deliverables.
- Image workflow
- Putting the workflow into action
When starting to plan for digitisation, whether large or small scale, one of the first tasks is to draw up project specifications that set out what format images are to be created in, how they are to be archived and in what form they will be delivered. Digitisation 'best practice' is now sufficiently well established that it should be easy enough to work out the most appropriate image file types for each stage of your project. However, it can still be difficult to establish a workflow that supports your project's requirements.
Before planning any workflow, it is best to start by considering the basic aims and outcomes of the project and then use those to create some general precepts on which the workflow may be based.
It is important to try to create a workflow that is generic enough to provide a level of standardisation for all types of capture planned for use within the project, but also versatile enough to work with any other types of capture that might be necessary in the future.
The following general precepts should be applied when planning any image capture workflow:
- Best Quality Possible - Great care should be taken to establish a quality benchmark that captures as much resolution and quality as is possible within the restrictions of equipment, technology and budget
- Fit-for-Purpose - However, this should be tempered with the knowledge that the key requirement of any quality benchmark is that it should be 'fit-for-purpose'
- Full Size and Uncompressed - All images captured with a camera should be archived in an uncompressed form at both the full size and the highest quality available. All scans should be made at a size that meets the established benchmark
- No interpolation - Interpolation (increasing size of image) should be avoided at both the capture and optimisation stages of workflow. Images should not be interpolated up to any larger size than the original pixel dimensions of the image-receptor in the camera or size of the original scanned image. The only exception to this 'may' be when print use demands a higher resolution
- Archive Original Image Data - Once captured, all images should be archived in the original form as captured by the scanner or camera and before they are optimised. These images will become the 'Master Archive'. Some digital cameras can store the captured data in an unprocessed proprietary Raw format. The Raw format captures at a higher bit depth than 'standard' camera formats and offers the ability to reprocess and retrieve more information from the original Raw at a later date. If it is decided to create a Master Archive of proprietary Raw files it is recommended that high quality surrogates in an open standard format are also created. The original images are then optimised (cropped, colour corrected etc.) and archived separately within a 'Master Optimised Archive'.
- Archive Optimised Image Data - The 'Master Optimised image' can then be used to create all surrogate images needed for delivery purposes (printing, Web etc.)
- Use Standard File Formats - The Master Archive image should be stored in an uncompressed or losslessly compressed format, normally uncompressed 'RGB Baseline TIFF Rev 6' (although PNG or losslessly compressed JPEG 2000 are possible alternative file formats)
- Create Surrogate Images for Delivery - The surrogate delivery images made from the 'Master Optimised Image' should be stored in a format suitable for their use. In most cases this will be the JFIF (JPEG) format. An appropriate level of compression should be used that will provide files of a usable size whilst remaining of an acceptable visual standard
- Only Create Surrogates from Master files - All surrogate delivery images should only be made from the 'Master Optimised Images' (lossless compression) and never from another surrogate delivery file (lossy compression). The golden rule is: 'Never JPEG a JPEG'
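Two of the precepts above ('No interpolation' and 'Only Create Surrogates from Master files') lend themselves to simple automated checks. The following is a minimal sketch only: the function names and the grouping of file extensions are assumptions made for illustration, and JPEG 2000 is deliberately left out because a file extension alone cannot show whether it was compressed losslessly.

```python
from pathlib import Path

# Assumed groupings by file extension; adjust to the project's own specification.
LOSSLESS_MASTER_SUFFIXES = {".tif", ".tiff", ".png"}
LOSSY_SURROGATE_SUFFIXES = {".jpg", ".jpeg"}


def may_derive_surrogate_from(source: str) -> bool:
    """Return True if `source` looks like an acceptable parent for a surrogate."""
    suffix = Path(source).suffix.lower()
    if suffix in LOSSY_SURROGATE_SUFFIXES:
        return False  # the golden rule: 'Never JPEG a JPEG'
    return suffix in LOSSLESS_MASTER_SUFFIXES


def needs_interpolation(capture_px: tuple[int, int], target_px: tuple[int, int]) -> bool:
    """Return True if the target (width, height) exceeds the captured pixel dimensions."""
    return target_px[0] > capture_px[0] or target_px[1] > capture_px[1]
```

A batch script could call both checks before writing any delivery file and refuse to continue if either one fails.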
Generic capture workflow
The general concepts above can be used to create a generic, standardised workflow that supports best practice in all aspects of digitisation:
Images should be captured at a size large enough for all planned uses and to the agreed benchmark that is considered 'fit for purpose'. The original capture format will depend on the capture device. A scanner will normally create an RGB TIFF file, whereas a camera will provide a TIFF, Raw or JPEG format image (TIFF and Raw are preferred). One-shot cameras should always capture at the full size of their image receptor. If possible, JPEG compression should be avoided due to its lossy nature; if JPEG is the only choice (as with some cheaper cameras), the highest quality setting should always be used.
Master Raw Archive
Once images have been captured they should be immediately archived so that they are saved in the original form, exactly as they came out of the camera or scanner. This means that whatever happens to the images at a later time, it is always possible to go back to exactly what was originally captured, knowing that nothing has been lost. This will normally be either TIFF or a proprietary Raw format, although it might have to be JPEG if one is using a camera that only captures directly into JPEG. The Master Archive images should be kept within a wide-gamut colour space such as CIE Lab or Adobe RGB (1998), although leaving them in the native colour space of the device is acceptable as long as the appropriate calibration images are also kept with the archive to enable them to be converted at a later date. The Master Archive should not need to be regularly accessed and is normally only used for long-term emergency backup. It is best practice to back these images up to external 'Write Once Read Many' (WORM) media, for example CD-R or DVD-R. If Raw files are archived, care must be taken to also archive all software necessary to open them and migrate them to an open standard such as TIFF.
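One way to support the aim of 'knowing that nothing has been lost' is to record a checksum for every file when it enters the Master Archive and to re-check it later. This technique is not prescribed by this document; the sketch below is an illustration only, and the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so that large TIFF or Raw files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the archive folder) to its checksum."""
    return {
        p.relative_to(archive_dir).as_posix(): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(archive_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or have different content."""
    current = build_manifest(archive_dir)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

The manifest would be built once, immediately after capture, and stored alongside the archive; `changed_files` can then be run whenever the archive is copied or migrated.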
It will normally be necessary to undertake at least some 'optimisation' work on the captured images. This is skilled work and will include at least some of the following stages, not necessarily in this order:
Master Optimised Archive
Once the image has been optimised it is 'best practice' to archive it again as a Master Optimised image. This image includes all the skilled optimisation work that has been undertaken on it. Master Optimised images must be saved in an uncompressed or losslessly compressed format such as TIFF or possibly PNG to retain image quality. It may be argued that for some workflows (where Adobe Photoshop takes a central role) it is acceptable to save the file in the proprietary Photoshop format, PSD.
Master Optimised images should be kept in the Adobe RGB (1998) colour space and profiled as such, unless there is a specific requirement to use the Web-oriented sRGB space. In either case the file should carry the appropriate embedded profile.
As these images are uncompressed or losslessly compressed, they still require a large amount of storage space. Hard-drive space is getting cheaper all the time and it may be possible to store them online; however, it is more likely that they will be stored offline on DVD-R or CD-R, in a similar fashion to the Master Raw archive.
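The storage demand mentioned above can be estimated with simple arithmetic: an uncompressed image needs roughly width × height × channels × bytes per channel. The sketch below is a rough planning aid only. The 4000 × 3000 pixel image in the usage note is a hypothetical example, the disc capacities are nominal figures, and file headers, embedded profiles and file-system overhead are ignored.

```python
DVD_R_BYTES = 4_700_000_000     # nominal single-layer DVD-R capacity
CD_R_BYTES = 700 * 1024 * 1024  # nominal 700 MiB CD-R capacity


def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int = 3, bits_per_channel: int = 8) -> int:
    """Approximate size of the raw pixel data for one uncompressed image."""
    return width_px * height_px * channels * bits_per_channel // 8


def images_per_disc(image_bytes: int, disc_bytes: int) -> int:
    """Whole number of images of this size that fit on one disc."""
    return disc_bytes // image_bytes
```

For the hypothetical 4000 × 3000 pixel, 8-bit RGB image, `uncompressed_bytes(4000, 3000)` gives 36,000,000 bytes, so about 130 such images fit on a DVD-R and about 20 on a CD-R. Capturing at 16 bits per channel doubles the size and halves those counts.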
Surrogate images should now be created from the 'Master Optimised Archive'. Surrogate creation will include at least some of these stages, not necessarily in this order:
There are four main types of surrogate image that need to be created. Each is used in a different way and has different attributes. All surrogates should be made directly from the 'Master Optimised image' and, once created, should be considered ephemeral. It is always best practice to make new surrogates as required and never to create new surrogates from other surrogates.
There are normally four generic types of surrogate image required:
Generic workflow diagram
A non-graphical text version of this diagram is also available.
Putting the workflow into action
Using this generic image workflow as a basis, it should be possible to develop a customised workflow that fits the exact needs and requirements of your project.
Once developed, the workflow will then need to be tested in practice and possibly modified before it can be finally established. There are many practical elements and tasks that need to be considered during the process, some of which will include:
- Movement of original works in and out of studio
- Capture of works, whether by camera or scanner
- Possibly some conservation work
- Image entered into database
- Collation and recording of metadata - both indexing and technical
- Image optimisation and creation of surrogate delivery images
- Quality control work - 'Sign off'
- Archiving of images and metadata
- IT support for capture equipment
- Progress reporting for project management team
Each of these practical tasks will need to be considered, evaluated and even timed before they can be interwoven into an efficient and high quality workflow. Due to the large number of elements in the workflow, it is necessary to spend some time working out what is the most efficient way for the tasks to be undertaken. It is worth considering that although many of these tasks demand great care and attention, they can also often be quite tedious and efforts taken to stop the operator from getting bored are normally rewarded with a much higher quality result.
It is the nature of some of these tasks that they can have quite long periods of 'waiting time' where the operator is waiting for a computer to complete a task (scanning an image or burning a DVD-R). This can be especially boring for the operator and it is good practice to try to find other tasks from within the workflow that can be done in tandem. This will normally give a happier operator as well as a more efficient workflow.
It is important during digitisation that it is clear to all project staff what stage any image is at within the workflow. The workflow should provide some easy method to ascertain whether an original (or record) has been captured, is within the process (maybe captured, but not yet optimised) or has been completed. There are many ways of doing this and the choice will depend on the practical elements of your project; however, simple pragmatic forms of control are normally best. For instance, originals may be delivered to one table and, once captured, returned to another, or captured images may be placed in a 'raw' folder and moved to an 'optimised' folder once optimised.
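The folder-based control described above can be queried with a few lines of script. This is a sketch under stated assumptions: the project root holds folders named 'raw' and 'optimised', each image is identified by its file name without the extension, and the stage labels are invented for illustration.

```python
from pathlib import Path


def _folder_holds(folder: Path, image_id: str) -> bool:
    """True if the folder exists and contains a file whose stem is `image_id`."""
    return folder.is_dir() and any(
        p.is_file() and p.stem == image_id for p in folder.iterdir()
    )


def workflow_stage(image_id: str, project_root: Path) -> str:
    """Report where an image currently sits in the folder-based workflow."""
    if _folder_holds(project_root / "optimised", image_id):
        return "optimised"
    if _folder_holds(project_root / "raw", image_id):
        return "captured, not yet optimised"
    return "awaiting capture"
```

Because the check relies only on where files are, it stays in step with the manual practice of moving images between folders and needs no separate status list.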
Whatever system is used, it must provide one important piece of information: the 'sign-off', which states that the image/metadata was created, checked and found to be up to the required quality standard.
This is normally best recorded within the Image Management System and should simply record the name of the operator and the time and date at which it was checked.
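A sign-off of this kind needs very little data. The sketch below shows one possible shape for such a record; the class, field and function names are hypothetical and do not describe any particular Image Management System.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SignOff:
    image_id: str         # identifier of the image/metadata record that was checked
    operator: str         # name of the person who carried out the check
    checked_at: datetime  # date and time of the check


def sign_off(image_id: str, operator: str,
             checked_at: Optional[datetime] = None) -> SignOff:
    """Create a sign-off record, defaulting to the current time in UTC."""
    if not operator.strip():
        raise ValueError("a sign-off must name the operator")
    return SignOff(image_id, operator.strip(), checked_at or datetime.now(timezone.utc))
```

Making the record immutable (`frozen=True`) reflects the idea that a sign-off is a statement about a check that has already happened; a later re-check would be recorded as a new sign-off, not an edit.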
Further details of implementing quality assurance within the workflow can be found in the JISC Digital Media advice document Quality Assurance and Digitisation Projects.
Creating an effective and efficient image workflow for a digitisation project is not an easy task, but if approached with some research, care and attention to detail it should be possible to establish one that enables high quality work to be maintained in the most efficient manner possible.
Remember it is nearly impossible to get all this right first time, so expect to have to create prototype workflows and then test them to see how they work and how they might be improved before creating an improved version. Even once you have gone into production, it is necessary to be aware of how the system is working so that changes may be made to improve quality or efficiency.
It is highly likely that the workflow you finally establish will be complicated, with a host of interacting tasks and actions. It is therefore imperative that it is fully documented, down to the smallest detail necessary to train a new team member in current working practice. These 'workflow manuals' are best written by the capture team themselves, who should be encouraged to update them continually so that they remain a good reflection of the working practice that supports the imaging workflow.