Last updated: 03 December 2009
Published in:
Digitising analogue media |
Managing your digital resources |
Finding and using digital media |
Tags:
compression |
digital collections |
file formats |
This document looks at the theory of file formats and the common methods of data compression.
Put simply, file formats are orderly sequences of data used to encode digital information for storage or exchange. They are like written languages, with their own peculiar rules or grammars. Although they are structured in different ways, digital media files generally begin with an introductory 'header' section followed by a 'body', which contains most of the data. In time-based media files this structure is often described in terms of a 'wrapper' format (the envelope that holds together various elements such as video, audio and related metadata) and a 'codec' (COmpression-DECompression algorithm), which is the language used to encode the actual media content.
It is particularly important to keep in mind the distinction between file format and type of compression used within it, because they can sometimes become confused.
File names end with an extension or suffix (generally 3 letters, like .avi, .wav or .tif), which helps computer programs to recognise them. Older Macintosh systems identified files in a different way, using resource forks and 4-letter codes written into the file.
Media files can be impractically large, and compression is a way of encoding a file's data more concisely or efficiently - squeezing or squashing the file, as the term suggests. A compression algorithm is simply a finite series of steps required to perform a given task (in this case reducing file size). A compression algorithm can be either 'lossy', where information is discarded in order to reduce file size and/or bandwidth, or 'lossless', where no information is irreversibly discarded.
There are other distinctions that are important when considering digital media file formats, such as open versus proprietary file formats. These are discussed below.
As mentioned above, digital media files can be very large. It can be useful or necessary to compress them for ease of storage or delivery. However, while compression can save space or assist delivery, it can slow or delay the opening of the file, since it must be decompressed when accessed. Some forms of compression will also compromise the ‘quality' of the file's content.
Digital compression is a complicated science. This section offers a non-technical introduction. If it still seems a little complicated, don't worry. The most important distinction to grasp is that of 'lossless' and 'lossy' (see below).
Compression relies on two main strategies: getting rid of redundant information ('redundancy reduction') and getting rid of irrelevant information ('irrelevancy reduction').
Redundancy reduction is often used during lossless encoding. It looks for patterns and repetitions that can be expressed more efficiently. If, for example, there are 25 values all the same, it is clearly better to record the information once and state that the next 24 values are all the same, than to record each value separately. This particular example is known as run-length encoding (RLE).
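The run-length idea described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `rle_encode` and `rle_decode` are invented for this example, and real formats that use RLE pack their runs into bytes in their own specific ways.

```python
def rle_encode(values):
    """Collapse runs of identical values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    values = []
    for value, count in runs:
        values.extend([value] * count)
    return values


# 25 identical values followed by 3 others (hypothetical pixel data)
pixels = [255] * 25 + [0] * 3
encoded = rle_encode(pixels)
print(encoded)                        # [(255, 25), (0, 3)]
print(rle_decode(encoded) == pixels)  # True: nothing has been lost
```

The 28 original values are stored as just two pairs, and decoding restores them exactly, which is what makes this a lossless technique.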
Irrelevancy reduction aims to remove or alter information that makes little or no difference to the perception of the file's content. This usually happens prior to the encoding and involves an irreversible ‘lossy' transformation of the content. Some of a video's colour information, for example, can be safely simplified without being perceptible to the human eye. However, when carried to extreme this sort of compression becomes obvious and compromises quality.
Lossless compressions are generally based on redundancy reduction and typically concentrate on more efficient ways of encoding data. The key point to grasp about lossless compression is that no information is irretrievably lost in the process. The common .zip format is an example of lossless compression: if it is used to 'zip' a collection of text documents, exactly the same documents will be reconstructed once they are 'unzipped'.
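A lossless round trip can be demonstrated with Python's standard `zlib` module, which implements the DEFLATE method commonly used inside .zip files. The sample text below is invented for illustration, and the amount of saving will vary a great deal with the content being compressed.

```python
import zlib

# Hypothetical, deliberately repetitive sample text
original = ("File formats are orderly sequences of data. " * 50).encode("utf-8")

packed = zlib.compress(original)     # lossless compression
restored = zlib.decompress(packed)   # exact reconstruction

print(len(original), "bytes before compression")
print(len(packed), "bytes after compression")
print(restored == original)          # True: every byte is recovered
```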
Lossy compressions are based on irrelevancy reduction strategies but will usually also employ some redundancy strategies. Lossy compressions transform and simplify the media information in a way that gives much larger reductions in file size than lossless compressions. A typical lossless compression can be expected to cut file sizes down to three quarters or two thirds of the original - perhaps even by half, if very efficient. In contrast, a lossy compression can reduce the file size to as little as 1% of the original, although anything less than 10% is likely to distort the file's content. The trade off, however, is that a lossy compression is by definition irreversible - it permanently disposes of information.
Irrelevancy strategy is based on the characteristics of human perception. Some information is more easily perceived, and therefore more important, than other information (brightness rather than colour in the visual realm, audible frequencies rather than inaudible ones in the audio realm).
Some digital video compressors, such as MPEG-2, have an extra trick up their sleeve. As well as simplifying the visual data in each frame of video (intraframe compression), they also simplify across frames (interframe compression). So if a character remains stationary across several frames, the pixels making up the character will not be 'refreshed' between every frame. Instead they will remain stationary until the character moves again. In effect that portion of the video becomes a static digital photograph until a change is required. This tactic allows lossy video codecs to be more efficient.
Certain audio data compression techniques use intelligent approaches based on models of human listening and the content of the audio file. Data relating to frequencies which are either minimally present (or not present at all) within the audio file, or those that the human ear is not particularly responsive to, are removed using mathematical algorithms. This reduction of audible content becomes increasingly noticeable when listening to highly compressed audio files, such as MP3 files with a low bit rate, where the sonic qualities are drastically altered through loss of more information at more and more frequencies.
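The interframe idea can be illustrated with a toy example that stores only the pixels that change from one frame to the next. This is a deliberately simplified sketch with invented function names; real codecs such as MPEG-2 work on blocks of pixels and use motion prediction rather than a per-pixel list of changes.

```python
def frame_delta(previous, current):
    """Record only the pixels that differ from the previous frame."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current))
            if old != new}


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for index, value in delta.items():
        frame[index] = value
    return frame


# Two tiny hypothetical 'frames' of 8 pixels: a bright object moves one step
frame1 = [10, 10, 10, 200, 200, 10, 10, 10]
frame2 = [10, 10, 10, 10, 200, 200, 10, 10]

delta = frame_delta(frame1, frame2)
print(delta)                                 # {3: 10, 5: 200}
print(apply_delta(frame1, delta) == frame2)  # True
```

Only two of the eight pixels need to be recorded for the second frame; the stationary background is simply carried over.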
Let's look at the relatively simple example of the lossy digital image compressor: JPEG. JPEG makes use of a mathematical transformation known as the Discrete Cosine Transform (DCT) to shift the image's colour values into a mode that can be more efficiently compressed and coded. The Discrete Cosine Transformation is not in itself lossy, but the next step in the compression process, known as quantisation, simplifies and rounds the colour values before they are encoded, throwing away real information. This is where the JPEG quality slider operates - it governs how much simplification occurs.
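The difference between the transform step and the quantisation step can be sketched as follows. This is an illustration under simplifying assumptions, not JPEG itself: it applies a one-dimensional DCT to a single row of 8 hypothetical pixel values, whereas JPEG uses a two-dimensional DCT on 8x8 blocks, and it uses a single quantisation step size where JPEG uses a table of step sizes.

```python
import math

N = 8  # number of samples in the row


def _scale(k):
    """Normalisation factor that makes the transform exactly reversible."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(samples):
    """Forward Discrete Cosine Transform (DCT-II) of N samples."""
    return [_scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(samples))
            for k in range(N)]


def idct(coefficients):
    """Inverse transform: rebuild the samples from the coefficients."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coefficients))
            for n in range(N)]


def quantise(coefficients, step):
    """Round each coefficient to a whole number of steps (the lossy part)."""
    return [round(c / step) for c in coefficients]


def dequantise(levels, step):
    return [level * step for level in levels]


row = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical pixel values

exact = idct(dct(row))                   # transform only: reversible
levels = quantise(dct(row), step=20)     # a larger step means lower quality
lossy = idct(dequantise(levels, step=20))

print([round(v) for v in exact])  # matches the original row
print(levels)                     # small integers, many of them often zero
print([round(v) for v in lossy])  # close to, but not the same as, the original
```

Raising the `step` value plays the role of moving the quality slider towards lower quality: more coefficients collapse to zero, and the rebuilt values drift further from the originals.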
The diagram below shows the key steps in a lossy image compression. These steps are reversed when the image is displayed (decompressed). In contrast, a lossless compression will usually just have the encoding/decoding step.

Figure 1: Steps in a typical lossy image compression
JPEG compresses small 8x8 pixel blocks of the image at a time, working from top left to bottom right. Because the simplification (quantisation) of each 8x8 (64 pixel) block is done independently, at a high compression (i.e. low quality) the boundaries between the blocks will begin to show, causing the 'blockiness' or 'blocking artefacts' often observed in JPEG images. This is illustrated below.

Figure 2: JPEG without compression
These two sets of images illustrate how the processing of 8x8 pixel blocks can visibly distort an image at high JPEG compression. Each image is shown full size and enlarged 8 and 16 times. The image in the upper row is uncompressed, while the one below has had maximum JPEG compression applied (i.e. lowest quality). The blocking is a little difficult to see in the full sized image, below, but becomes quite evident as the image is viewed more closely. While both images begin to 'pixelate' (i.e. display their pixels) as they are enlarged, the JPEG compressed image additionally shows the edges - and the distorting effects - of its 8x8 pixel blocks.

Figure 3: JPEG maximum compression
Because they are prepared to throw information away, lossy approaches will always be able to achieve a much greater compression than lossless approaches. This makes them most suitable for situations where size is more crucial than quality, for example, streaming or downloading via the Internet. Where quality is valued more, or file size is not an issue, lossless compressions - or no compression at all - will be preferable.
Wavelet compression is a special type of compression that has been around for some time, but only in recent years has it been adopted in digital media compression. It is used within proprietary formats like MrSID and in the JPEG 2000 format (potential successor to the current JPEG) and is used for compressing both still images and digital video.
Instead of treating the image as sets of numbers (e.g. pixel values) to be processed, the wavelet transform regards the image as a signal or wave. It organises the image information into a continuous wave (typically with many peaks and dips) and centres this on zero. It records the distances from this zero line to points along the wave and then takes the average between adjacent points to produce a simplified version of the wave - in effect, it reduces the image's resolution or detail by half. The averages are then averaged again, and so on, producing progressively simpler waves. This process is known as 'decomposition'.
The wavelet transform results in simplified versions of the image along with all of the information necessary to reconstruct the original (i.e. to rebuild the complete wave or image). At this point, all of the information can be kept and encoded as a lossless compressed image. Alternatively, the final image can be based on a simplified version, with only the most significant detail added back into the wave. The result of this quantisation process is a much smaller file, but one that has thrown away some of the less important image information (i.e. lossy compression).
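The repeated averaging described above corresponds to the simplest wavelet, usually called the Haar wavelet. The sketch below is a minimal illustration that assumes a signal whose length is a power of two; the function names and the sample values are invented, and formats such as JPEG 2000 use more sophisticated wavelets than this one.

```python
def haar_decompose(signal):
    """Repeatedly average adjacent pairs, keeping the detail lost each time."""
    current = list(signal)
    details_per_level = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details_per_level.append([(a - b) / 2 for a, b in pairs])
        current = [(a + b) / 2 for a, b in pairs]   # half the resolution
    return current[0], details_per_level


def haar_reconstruct(overall_average, details_per_level):
    """Add the stored detail back in, coarsest level first."""
    current = [overall_average]
    for details in reversed(details_per_level):
        rebuilt = []
        for average, detail in zip(current, details):
            rebuilt.extend([average + detail, average - detail])
        current = rebuilt
    return current


signal = [9, 7, 3, 5, 6, 10, 2, 6]     # hypothetical row of pixel values
average, details = haar_decompose(signal)

# Lossless: keep every detail value and the original comes back exactly
print(haar_reconstruct(average, details) == signal)   # True

# Lossy: discard the least significant detail (here, anything smaller than 2)
simplified = [[d if abs(d) >= 2 else 0 for d in level] for level in details]
print(haar_reconstruct(average, simplified))
# [8.0, 8.0, 4.0, 4.0, 6.0, 10.0, 2.0, 6.0]
```

In the lossy version the small differences at the start of the row have been smoothed away, while the larger jumps later in the row survive intact.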
The wavelet compression has several key advantages over other lossy compressions. Wavelet compression is capable of both lossy and lossless compression. Lossy wavelet compression is more discriminating - it can preserve the important detail of the file while simplifying and smoothing over less significant features. It can also operate over a much larger area of the image at once (often the whole image), avoiding the introduction of unwanted 'artefacts'.
The working through of the wavelet transform also produces a 'by-product' that offers interesting display potential. The increasingly simplified waves can be encoded in several different ways so that as the file is decompressed it can grow in size (i.e. spatial resolution) or become increasingly more detailed (i.e. fidelity). Importantly, wavelet compression can also be lossless compression, stopping short of discarding information but still optimising the data's structure.
A fuller explanation, using JPEG 2000 as an example, can be found in JISC Digital Media's document What is Wavelet Compression?
Efficient compression and good functionality are real assets in digital file formats, especially when editing or delivering. But these technologies are often proprietary: controlled by patents, related to commercial imaging products or controlled by copyright.
The digital media community has become more wary of proprietary standards after a succession of formats have been released and failed commercially. Manufacturer support has been withdrawn, making access to media complicated, even impossible.
Although it might seem sensible, then, to always and only use open standards which are not tied to a single manufacturer or group of manufacturers, it is not always that simple. Certain features may only be available within proprietary formats, and open formats may include features in their specification that are not actually supported by any of the applications available to create or access them.
It is very important to keep in mind this distinction between specification and application. Features and functionality described in file format specifications may not be possible to achieve in the real world, due to a lack of support by the software used to encode them or decode them.
This issue of support is often a problem with open standard formats upon initial release. Open standards usually take a long time to develop, with input from many interested parties. When they are eventually released (or even before official release) you can expect to see a number of different applications based on the standard, each favouring a slightly different set of features. With open standards, it is rare for all of the features to receive full community support.
As you might expect, there are generally few initial support issues with a proprietary file format, since it has been created for a specific application. However, since it is tied to the fortunes of a commercial company, the format as a whole is vulnerable to being changed or dropped without notice or regard for its users.
The distinction between open and proprietary standards is not always clear-cut. Proprietary formats which become de facto or industry standards occasionally become open standards (e.g. Kodak's FlashPix format). Sometimes, too, a company will adapt an open format into one of its own - particularly if it can be used to rival the proprietary formats of its competitors. In this case an open format will develop into a proprietary product.
In selecting particular formats for use within your digital imaging project, you will need to consider both the openness of the file format (and sometimes the individual features within that format), and the way the format is supported by applications.
In addition to the way they compress data, different file formats have their own unique features or functionality. Of special interest to digitisation projects are the abilities of file formats to store additional information alongside the main media data. This may include embedded metadata, closed captions, or even several different versions of the media grouped together, ranging from uncompressed (an archival 'master' copy) to highly compressed (perhaps for delivery on mobile devices).
Digital still images fall into two main categories: raster (or 'bit-mapped') images and vector ('object-oriented') images. Raster images take the form of a grid or matrix, with each picture element (pixel) having a unique location and independent colour value. Vector files are really just a set of mathematical instructions that are used by a drawing program to construct an image. There is a third category of formats known as metafiles, which are able to contain both raster and vector images.
A basic understanding of the raster image is essential, since it is the most common category of image created and used within digitisation projects. All scanners and digital cameras produce raster images and most output devices (print and screen) also use them. TIFFs, JPEG/JFIFs, and GIFs are common examples of raster file formats.
Raster images take the form of a grid or matrix. This pattern becomes easily visible as the image is magnified (i.e. viewed at more than 100% - see box below). Each square (pixel) within the matrix occupies a unique position and can be edited separately.

Figure 1: The raster grid. Photo: Standard test image from the University of Waterloo. No copyright restrictions
The image above has been magnified by 800% and 1600% to reveal its grid structure. Each square of colour is a pixel.
Raster images are internally very simple. If you examine their coding, you will typically find some brief header information describing the structure of the file followed by a series of values, each describing the colour of the individual pixels.
Since a raster image records information for each pixel, its file size can be quite large. For an uncompressed raster image, the file size will be directly related to its pixel dimensions (spatial resolution) and the extent of the colour information recorded for each pixel (its colour resolution or 'bit-depth'). A more detailed explanation of spatial and colour resolution can be found in JISC Digital Media advice document: The Digital Still Image.
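The relationship between pixel dimensions, bit-depth and file size can be worked through directly. The sketch below is a rough estimate only: the function name and the example image dimensions are invented, and it ignores the header and any padding that a real file format would add.

```python
def uncompressed_size_bytes(width, height, bits_per_pixel):
    """Approximate size of the pixel data in an uncompressed raster image."""
    return width * height * bits_per_pixel // 8


# A hypothetical 3000 x 2000 pixel image at two different bit-depths
colour = uncompressed_size_bytes(3000, 2000, 24)   # 24-bit colour
grey = uncompressed_size_bytes(3000, 2000, 8)      # 8-bit greyscale

print(colour)   # 18000000 bytes (about 18 MB)
print(grey)     # 6000000 bytes (about 6 MB)
```

Doubling both pixel dimensions quadruples the file size, while moving from 8-bit greyscale to 24-bit colour triples it.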
Although most raster file formats are similar in structure, they can be distinguished by the amount of information they record per pixel (i.e. their bit-depth), the methods used to record their code more efficiently (their compression), and the additional functionality they offer (e.g. transparency layers, colour management or metadata support). They can also be divided into open formats and proprietary formats (see above).
While most raster images result from a digital capture process, vector images are typically created and displayed within drawing programs. Common vector images include 2- and 3-D architectural drawings, flow charts, logos and fonts. They consist of lines, curves and shapes with editable attributes such as colour or fill. Because they are defined by mathematical equations, they are more easily transformed than raster images. Unlike raster images, vectors are 'resolution independent': they can be reshaped or rescaled without losing quality.
Vector images are looked at in greater depth in JISC Digital Media's advice document Introduction to the Vector Image Format.
There is a third category of file formats that can contain or encapsulate raster and vector images. This category includes metafiles and Page Description Languages (PDLs). In addition to holding different types of images and text within the same file, these formats enable, to varying degrees, their contents to be consistently displayed and used across different computer programs and operating systems.
Metafiles contain lists of commands that will draw or display an image when they are run. Vector drawing commands are most common, but metafiles can also include raster information or text. Sometimes they are little more than a kind of envelope, containing an instruction to open up another image file.
Common metafile formats include the Computer Graphics Metafile (CGM), Windows Metafile (WMF), and Enhanced Metafile (EMF). The Computer Graphics Metafile is older and will run on most computer operating systems. The other two were developed specifically for the Windows operating system, but are used more widely.
Page Description Languages (PDLs) are computer languages used to describe information about layout, fonts and graphics to a printer or display device. The classic Page Description Language is PostScript (PS), which was designed to provide detailed instructions to computer printers. PostScript can contain raster or vector images, but was not developed as a graphic file format in the way that metafiles were. However, later incarnations of PostScript, such as EPS and PDF, have been developed very much with the exchange of graphics in mind, blurring the distinction between PDLs and metafiles.
EPS (Encapsulated PostScript) is a file format based on PostScript. Specifically intended to encapsulate graphics, it uses a subset of PostScript commands, allows only one image per file, and ignores page sizes or positioning. EPS became something of an industry standard for sending images to commercial printers because it was able to 'lock' the images and layout so that they could not be altered. It was intended to be completely cross-platform, although it proved more reliable on the Macintosh platform than the PC. EPS has now been largely superseded by the PDF.
Like EPS, PDF (Portable Document Format) is based on PostScript, but it adds, rather than subtracts, functionality. PDF can include text and images, multiple pages, hotspots, links and extensive metadata and, like EPS, can be locked to stop editing. It also supports a range of compressions, both lossy (JPEG) and lossless (ZIP). The PDF has now become a de facto standard for exchanging documents on the Web, is almost universally supported within the print industry, and is increasingly being used simply as a container for exchanging images.
PostScript, EPS and PDF were all developed by Adobe, but are created and used within many other applications and have become industry standards. There are other common proprietary formats with similar functionality, but their use tends to be limited to their own applications. Examples include the Quark file, which is used for page layout, and Photoshop's PSD, which is primarily a raster image format but also able to accommodate vector information, such as type, within its layers.
As the example of Photoshop's PSD format indicates, the distinction between raster and vector images is less clear-cut. Many of the newer formats now act as metafiles or encapsulating formats - capable of holding raster and vector information in different layers. This trend can also be seen in the emerging JPEG 2000 (primarily a raster format) and in the Flash, SWF, and SVG formats (primarily vector).
These formats do some or all of the following:
Enable 'locking' to provide security and assist in rights management
Over the past few years the relatively small number of available video file formats has grown exponentially. An immense number of file formats are now in common use for preservation, re-mastering or for access. Different file formats are suited to different purposes. For instance, the open source-over-proprietary argument may not extend to delivery, if the aim is to reach as many users as possible. File size and required bandwidth (i.e. data rate) are limiting factors when selecting a file format for delivery but perhaps not so much when selecting a format for long-term preservation, where a very large file size may be acceptable.
At the time of writing, for example, the H.264 format (part of MPEG-4) is growing in significance as a delivery format, while archives are beginning to take advantage of the lossless possibilities of JPEG 2000.
Different formats employ different 'tricks' in order to reduce file size and bandwidth and (hopefully) limit the deterioration of the image. Interframe and intraframe compression are mentioned above, but another common tactic is chroma sub-sampling. Chroma sub-sampling relies on the fact that human vision is more sensitive to contrast than to colour. In effect a higher resolution monochrome image is overlaid with a lower resolution coloured image. In chroma sub-sampling the saving in file size is made at the expense of some colour information.
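The saving from chroma sub-sampling can be estimated with a little arithmetic. The sketch below assumes the usual interpretation of the sampling schemes: 4:4:4 keeps three samples per pixel (one brightness and two colour), 4:2:2 averages out at two, and 4:2:0 at one and a half. The function name is invented, the frame size and frame rate are those of standard definition PAL, and the result ignores any padding or packing that a particular codec may add.

```python
# Average number of samples stored per pixel under each scheme
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}


def data_rate_mbps(width, height, fps, bit_depth, scheme):
    """Approximate uncompressed video data rate in megabits per second."""
    bits_per_frame = width * height * SAMPLES_PER_PIXEL[scheme] * bit_depth
    return bits_per_frame * fps / 1_000_000


# Standard definition PAL: 720 x 576 pixels, 25 frames per second, 8-bit
for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, round(data_rate_mbps(720, 576, 25, 8, scheme), 1), "Mbps")
# 4:4:4 248.8 Mbps
# 4:2:2 165.9 Mbps
# 4:2:0 124.4 Mbps
```

The 8-bit 4:2:2 figure of roughly 166 Mbps agrees with the data rate listed for the uncompressed 8-bit 4:2:2 sample later in this document.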
Whether or not these sacrifices in visual precision are acceptable depends very much on the purpose the video will serve. What is acceptable for the oral history project may not be acceptable for the film archive. But just what do the differences between different video file formats actually look like?
Presented below is the same video clip in several popular file formats for comparison purposes. The clip is available for download only, as several of these versions are too large to be streamed over the Internet (and so would be unsuited to web delivery). The clip is purely computer generated, which rules out the variables of in-camera recording format and lens quality, since a purely virtual animation sequence involves no camera and no lens. The correct codec will need to be installed on your machine in order to play back the video files - conduct a web search for the named codec or use an open source player such as VLC player. In addition, your desktop computer may not have sufficient capability to play back uncompressed video in real time due to its high data rate.
The ‘quality' of images which are moving can be difficult to judge so still image examples of each codec are also included for comparison purposes.

Sample 1: Uncompressed, 10bit, 4:4:4 subsampling
Download Video (Right-click and save as...) (177MB)
Standard: Standard definition PAL
Duration: 4.2 seconds
Frame size horizontal: 720 pixels
Frame size vertical: 576 pixels
Frame rate: 25 frames per second
Frame type: progressive
Wrapper file format: .avi
Format/codec: r210 (uncompressed)
Bit depth per channel: 10 bit
Chroma sub sampling: 4:4:4
Data rate: 354 Mbps
File size: 177 MB
Sample 2: Uncompressed, 10bit, 4:2:2 subsampling
Download Video (Right-click and save as...) (110MB)
Standard: Standard definition PAL
Duration: 4.2 seconds
Frame size horizontal: 720 pixels
Frame size vertical: 576 pixels
Frame rate: 25 frames per second
Frame type: progressive
Wrapper file format: .avi
Format/codec: HDYC (uncompressed)
Bit depth per channel: 10 bit
Chroma sub sampling: 4:2:2
Data rate: 221 Mbps
File size: 110 MB

Sample 3: Uncompressed, 8bit, 4:2:2 subsampling
Download Video (Right-click and save as...) (85MB)
Standard: Standard definition PAL
Duration: 4.2 seconds
Frame size horizontal: 720 pixels
Frame size vertical: 576 pixels
Frame rate: 25 frames per second
Frame type: progressive
Wrapper file format: .avi
Format/codec: UYVY
Bit depth per channel: 8 bit
Chroma sub sampling: 4:2:2
Data rate: 166 Mbps
File size: 85 MB

Sample 4: DV
Download Video (Right-click and save as...) (15MB)
Standard: Standard definition PAL
Duration: 4.2 seconds
Frame size horizontal: 720 pixels
Frame size vertical: 576 pixels
Frame rate: 25 frames per second
Frame type: progressive
Wrapper file format: .avi
Format/codec: dvsd
Bit depth per channel: 8 bit
Chroma sub sampling: 4:2:0
Data rate: 30 Mbps
File size: 15 MB

Sample 5: MPEG-4, Profile/level: Main@L3.1*
Download Video (Right-click and save as...) (4.6MB)
Standard: Standard definition PAL
Duration: 4.2 seconds
Frame size horizontal: 720 pixels
Frame size vertical: 576 pixels
Frame rate: 25 frames per second
Frame type: progressive
Wrapper file format: .mp4
Format/codec: AVC1
Bit depth per channel: 8 bit
Chroma sub sampling: 4:2:0
Data rate: 9 Mbps
File size: 4.6 MB

Sample 6: MPEG-2, Profile/level: Main@Main*
Download Video (Right-click and save as...) (3MB)
Standard: Standard definition PAL
Duration: 4.2 seconds
Frame size horizontal: 720 pixels
Frame size vertical: 576 pixels
Frame rate: 25 frames per second
Frame type: progressive
Wrapper file format: .mpg
Format/codec: MPEG2
Bit depth per channel: 8 bit
Chroma sub sampling: 4:2:0
Data rate: 9 Mbps
File size: 3 MB
*MPEG (Moving Picture Experts Group) formats often involve a whole host of variable parameters. Sets of these parameters become popular for different applications. For instance, MPEG-4 is used for both Internet delivery and Blu-ray movie discs. The latter application offers far more visual information per frame. These sets of parameters are defined as 'profiles' and 'levels'.
Until recently digital audio formats were limited to a small standardised number, fit for purpose for the professional broadcast and production industries. Originally, the properties of consumer digital audio were defined by the capabilities of the digital video technology of the time, which led to the popular use of PCM (Pulse Code Modulation) digital audio. With the recent rise of delivery methods, newer audio formats have been adapted to suit the changing trends and capabilities of audio playback and storage systems.
With PCM the raw encoding of an analogue audio signal into digital information is done by sampling the signal at regular discrete intervals. This can be done either at the recording stage (analogue to digital conversion) or when repurposing a digital file. Each sample is rounded to the nearest of a fixed number of evenly spaced levels (quantised). The quantised values are directly proportional to the amplitude of the signal, and this linear recording is often referred to as LPCM (linear PCM). This raw data (called a bitstream) is contained in a wrapper, a structured container which helps the appropriate software or digital system interpret and use the new file. Numerous wrapper formats (including video formats) can contain mono, stereo, standard surround or multi-channel audio files of varying sizes.
Below are some examples of uncompressed recordings contained in WAV format. Note the relative bit rates and size of each file, listed in the corresponding table. Each recording was captured separately so as not to introduce any re-sampling. The recordings were taken with a simple setup of Earthworks QTC-1 microphones straight into a USB interface, with no further analogue or digital processing applied to the signal. The piece being played is an excerpt from the first movement of Beethoven's piano sonata no. 14 (Op 27).
Beethoven's Moonlight Sonata by JISC Digital Media. All files © University of Bristol, 2009
| File | Sample Rate (Hz) | Bit Depth | Bit rate (kbps) | File Size (MB) |
| 1 | 44100 | 16 | 1411 | 20.8 |
| 2 | 48000 | 24 | 2304 | 33.9 |
| 3 | 88200 | 24 | 4233 | 62.4 |
| 4 | 96000 | 24 | 4608 | 67.9 |
| 5 | 192000 | 24 | 9216 | 135 |
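The bit rates in the table follow directly from the sample rate and bit depth. The sketch below assumes two channels (stereo), which the table does not state explicitly but which is consistent with its figures; the function name is invented for this example.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Bit rate of uncompressed linear PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


print(pcm_bit_rate_kbps(44100, 16))    # 1411.2
print(pcm_bit_rate_kbps(96000, 24))    # 4608.0
print(pcm_bit_rate_kbps(192000, 24))   # 9216.0
```

Multiplying the bit rate by the duration in seconds, and dividing by 8, gives the approximate size of the audio data in kilobytes.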
Aside from linear PCM encoding, other techniques exist to deliver audio across a narrow bandwidth or at a faster speed. Logarithmic PCM is a technique where amplitude values are spaced close together at low amplitudes and far apart at high amplitudes. This approach works well for speech and requires fewer bits per sample than linear PCM. A-law and μ-law compression are both logarithmic transformations used in telephony, and can be applied in digital audio systems. These forms of compression reduce samples to 8 bits whilst retaining a dynamic range equivalent to that of 14-bit linearly quantised samples.
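The effect of logarithmic spacing can be seen using the continuous μ-law curve with μ = 255. This is a sketch of the principle only: telephony standards use a piecewise approximation of this curve with their own bit layout, and the 8-bit scaling to the range -127 to 127 used here is an assumption made for illustration.

```python
import math

MU = 255  # the value commonly used for 8-bit mu-law


def mu_law_compress(x):
    """Map a sample in the range -1..1 onto a logarithmic scale (-1..1)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Compress, then round to one of the integer codes -127..127."""
    return round(mu_law_compress(x) * 127)


def decode_8bit(code):
    return mu_law_expand(code / 127)


for sample in (0.001, 0.01, 0.1, 1.0):
    code = encode_8bit(sample)
    print(sample, "->", code, "->", round(decode_8bit(code), 4))
```

Quiet samples are given far more of the available codes than a linear scale would allow, which is why speech survives the reduction to 8 bits so well.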
There are further alternative coding architectures to PCM, more than will be mentioned here, but we will briefly look at a few important ones. Whereas linear PCM encoding records discrete individual values, differential encoding uses techniques to predict the relative difference between samples. As the difference from one sample to the next is smaller than the absolute amplitude values recorded in PCM, fewer bits are needed to record the information. There are two prominent implementations of this practice that are worth mentioning here: Adaptive Differential Pulse Code Modulation (ADPCM) and Delta-Sigma encoding.
The ADPCM encoder makes use of the fact that adjacent audio samples are on the whole similar to one another. It computes the difference between the current input sample and a predicted value for that sample, using components of the decoder to compute the prediction. ADPCM is commonly used in digital audio files including .aif and .wav and in some VOIP (Voice Over Internet Protocol) applications. In theory ADPCM can give results just as good as linear PCM in terms of fidelity, at lower bit rates. Other dedicated codecs for compressing speech exist, such as CELP, used for GSM mobile telephony.
DSD (Direct Stream Digital) is a relatively new audio format which uses an extreme form of differential encoding. Its delta-sigma encoding runs at very high sample rates (between 2.8 MHz and 5.6 MHz) and, like ADPCM, records the difference between amplitude values, but does this at a 1-bit resolution. This means that each new amplitude value is measured to be either higher or lower than the previous value and the output is either 1 for louder or 0 for quieter (i.e. 1-bit). These minute recordings of variation in amplitude can recreate waveforms because of the very high sample rates being used. Using these high sample rates also means that quantisation noise is spread over a very wide range of frequencies at a much reduced intensity. This technique is capable of reproducing frequencies from 0 Hz (DC) to 100 kHz with a dynamic range of 120 dB.
Whilst uncompressed audio files may not be nearly as large as uncompressed video files of the same duration, they can still prove problematic for web delivery. An uncompressed stereo 44.1 kHz, 24-bit WAV file contains just over 1 MB of information for every 4 seconds of audio. The same file has a bit rate of 2,116.8 kbit/s, which is too high for efficient online delivery. Compressing audio files lowers this bit rate so that they can be delivered more easily (whether downloaded directly, streamed, or played via a browser).
A popular lossy technique, the MPEG-1 audio compression algorithm used in MP3, applies psychoacoustic modelling algorithms that analyse the frequency spectrum of an audio signal, and then remove data intelligently at and around frequency bands with relatively little or no content. Although fidelity is better than with other compression techniques, the effects become more audibly pronounced when compression is harshly applied, and signals can sound 'pixelated' as anomalies are introduced. This is a high compression technique aimed at producing high fidelity results, and elements of it have been expanded upon to create new codecs, such as the open source MPC (Musepack) and the LAME codec, due to the proprietary restrictions of the MP3 codec. Lossless compression, as discussed in the section above on compression, is used for decreasing the size of audio files without losing any audio fidelity. Monkey's Audio and FLAC are two codecs that utilise this technique. An example of the FLAC format is given in the embedded player below.
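The basic idea of differential encoding, before any adaptive prediction is added, can be sketched by simply using the previous sample as the prediction for the next one. The function names and sample values are invented for illustration; ADPCM goes further by adapting its step size and quantising the differences, which is not shown here.

```python
def delta_encode(samples):
    """Store each sample as its difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by adding the differences back up."""
    samples = []
    current = 0
    for delta in deltas:
        current += delta
        samples.append(current)
    return samples


samples = [1000, 1004, 1009, 1011, 1008, 1002]   # hypothetical PCM values
deltas = delta_encode(samples)
print(deltas)                            # [1000, 4, 5, 2, -3, -6]
print(delta_decode(deltas) == samples)   # True
```

After the first value, every difference is a small number that needs far fewer bits than the absolute amplitude it replaces.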
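The 'higher or lower' principle can be illustrated with a plain 1-bit delta modulator. Note that this is a simpler technique than the delta-sigma modulators actually used for DSD, which also shape the quantisation noise; the function names, step size and test signal below are all assumptions chosen for illustration.

```python
import math


def one_bit_encode(samples, step):
    """Output 1 if the signal is at or above the running estimate, else 0."""
    estimate = 0.0
    bits = []
    for sample in samples:
        bit = 1 if sample >= estimate else 0
        estimate += step if bit else -step   # the estimate chases the signal
        bits.append(bit)
    return bits


def one_bit_decode(bits, step):
    """Rebuild an approximation of the waveform from the stream of bits."""
    estimate = 0.0
    output = []
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output


# One cycle of a sine wave, heavily oversampled (400 samples per cycle)
signal = [math.sin(2 * math.pi * n / 400) for n in range(400)]
step = 0.02

bits = one_bit_encode(signal, step)
rebuilt = one_bit_decode(bits, step)

worst = max(abs(s - r) for s, r in zip(signal, rebuilt))
print(bits[:16])
print(round(worst, 4))   # the error stays within roughly one step
```

Because there are so many samples per cycle, a single bit per sample is enough for the rebuilt waveform to follow the original closely.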
One of the problems when presenting these formats and types of compression is that in a practical sense there is no one-size-fits-all solution when considering which method to use. Format and compression choice should be based on the objectives of a project. A spoken word podcast of a lecture recording can have its fidelity compromised considerably to improve accessibility via web downloading, whereas the online streaming of a performance may require greater fidelity to give clarity to music. Audio fidelity is a subjective concept, and perhaps the best way to understand the effects of audio data compression is to listen to varying amounts of compression. Although there is a wealth of codecs available for delivering compressed audio files, the audible distinction between them can sometimes be difficult to perceive. As a result a selection of files, of the same format (MP3), is presented below to highlight the audible effects of differing bit rates, a primary concern when delivering compressed audio files.
All of the tracks in the player below are surrogates from an uncompressed PCM 48 kHz 24-bit master WAV file, 33.9 MB in size. Note that the FLAC file retains the sampling rate and bit depth but is only 18.5 MB in comparison.
Compressed Examples by JISC Digital Media. All files © University of Bristol, 2009