Choosing a Digital Audio File Format
The choice of file formats can often prove overwhelming for someone new to the world of digital audio. The aim of this document is to discuss some of the key factors that should be considered before choosing a format and suggest suitable file formats for specific applications.
There are a number of commonly used digital file types within the audio realm. These have been expanded and improved over recent years, as digital audio has become a more popular medium. Different audio software applications may prefer or favour different file types, depending on proprietary alliances, compatibility, and the type of work primarily associated with the software. As a result it is often unclear which file types should be considered for specific delivery and use, and the information surrounding this can often be biased or misleading.
This document therefore intends to offer a basic guide to the most common and adaptable file types, and their properties and uses within the fields of digitisation of analogue resources, creation of new resources and the reformatting of existing digital files. We also examine the suitability of the various file types for delivering sound resources to audiences.
Codecs and Wrappers
It is important to distinguish between codecs and wrapper (or container) formats when working with digital audio files. Put simply, ‘codec’ refers to the algorithm used to encode and decode the audio data in binary form; ‘wrapper’ describes the container format for this raw data, which may include headers describing the encoding settings, as well as other content, such as artwork or even video data (with certain wrappers), and descriptive metadata. Different types of wrapper are denoted by different file extensions, such as .wav, .aiff, .mp3 etc.
The most common codec used in digital audio is Linear Pulse Code Modulation (LPCM), which is ‘lossless’, i.e. all the data from the sample points is kept intact. LPCM is also the standard for CD audio and, unless otherwise mentioned, is recommended for use with the uncompressed wrappers WAV, AIFF and BWF. A basic explanation of audio sampling and of how bit depth is directly related to dynamic range is given in the advice document An Introduction to Digital Audio.
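The short Python sketch below makes the relationship between sampling settings, data size and dynamic range concrete. It computes the raw data rate of an LPCM stream and the commonly quoted theoretical dynamic range of an ideal N-bit quantiser, which is approximately 6.02 × N + 1.76 dB for a full-scale sine wave. The function names are illustrative only. Real converters achieve somewhat less than these theoretical figures.

```python
def lpcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Raw data rate of an uncompressed LPCM stream, in bits per second."""
    return sample_rate_hz * bit_depth * channels


def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Textbook dynamic range of an ideal N-bit quantiser (full-scale sine)."""
    return 6.02 * bit_depth + 1.76


if __name__ == "__main__":
    # CD audio: 44.1 kHz, 16-bit, stereo
    print(lpcm_bit_rate(44_100, 16, 2))                # 1411200 bits per second
    print(round(theoretical_dynamic_range_db(16), 1))  # 98.1 dB
    print(round(theoretical_dynamic_range_db(24), 1))  # 146.2 dB
```

Each extra bit adds roughly 6 dB. This is why 24-bit capture leaves far more room above the noise floor than 16-bit.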
Other codecs simplify or discard some of the audio data to reduce the file size, a process referred to generically as ‘lossy’ compression. One example is the MP3 codec. It uses a psychoacoustic model to discard audio detail that listeners are unlikely to perceive, which reduces the amount of audio data and the resulting file size.
Open ‘standard’ vs Proprietary formats
To establish whether you have a requirement for an open standard wrapper format, you will need to consider the type of project you are undertaking. Any digitisation project which needs to ensure long-term accessibility to the audio files will require a format that is industry standard in terms of both quality and accessibility. A project more focussed on the immediate delivery of audio files, where there are no requirements for preservation, will often not need to consider the longer-term implications of file type choices.
There are only a few non-proprietary lossless formats in common use, so if you choose this route your decision is made easier. A major drawback of open standard formats, however, is that a lot of playback software requires a plug-in decoder. This causes problems when files are delivered to users or listeners who do not already have the specific decoding software. The two main open standard formats are:
- Audio Interchange File Format (AIFF)
- Free Lossless Audio Codec (FLAC)
So unless it is required by the objectives of your project, ‘open standard’ formats are, on the whole, not a particularly viable option.
File formats for capture
By ‘capture’, we are here referring to the digitisation of existing analogue media. File formats for the creation of new media through recording are discussed in the section Creation of new sound resources later in this document.
The sound quality of digitally recorded audio of course depends on the equipment used in the digitisation process, but capturing to a suitably high quality format is equally vital to maintaining audio fidelity. To capture as much information as possible, a high sample rate and sample accuracy (bit depth) should be selected for the target file before capture begins.
Most digital audio capture software can record into WAV, BWF or AIFF wrappers, the three most popular uncompressed (lossless) formats. It is recommended that digitisation is done at a minimum sample rate of 48 kHz and a bit depth of 24 bits, but there are some exceptions to this rule.
Firstly, if there is a limited amount of storage space available, a lower rate and depth may be advisable. Secondly, if the quality of the incoming source material is poor, such high quality settings may not be necessary. Finally, audio with a narrow frequency bandwidth, such as spoken word (discussed later), need not necessarily be captured at a high rate.
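The storage trade-off in the first point can be estimated before a project starts. The sketch below compares the space needed for one hour of stereo audio at a few common settings. It assumes plain LPCM and ignores file headers, and the settings chosen are examples only.

```python
def lpcm_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: float) -> int:
    """Approximate size of LPCM audio data in bytes (file headers ignored)."""
    return int(sample_rate_hz * bit_depth * channels * seconds // 8)


# Example capture settings: label -> (sample rate in Hz, bit depth)
SETTINGS = {
    "96 kHz / 24-bit": (96_000, 24),
    "48 kHz / 24-bit": (48_000, 24),
    "44.1 kHz / 16-bit": (44_100, 16),
}

if __name__ == "__main__":
    for label, (rate, depth) in SETTINGS.items():
        size = lpcm_bytes(rate, depth, channels=2, seconds=3600)
        print(f"{label}: {size / 1e9:.2f} GB per hour of stereo audio")
```

This gives roughly 2.07 GB, 1.04 GB and 0.64 GB per stereo hour respectively. Halving the sample rate halves the storage required.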
Audio capture software will often have a default setting for the capture file format. This format should be checked before capture and adjusted accordingly.
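One way to check the format is to make a short test recording and inspect its header. The sketch below uses Python's standard `wave` module, which reads PCM WAV files. Some older Python versions reject the WAVE_FORMAT_EXTENSIBLE header that certain software writes for 24-bit files. The thresholds mirror the minimums recommended in this document, and the function name is illustrative.

```python
import wave

MIN_RATE_HZ = 48_000  # minimum sample rate recommended in this guide
MIN_BITS = 24         # minimum bit depth recommended in this guide


def check_capture_settings(path: str) -> list:
    """Return a list of problems found in a test capture (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes

    problems = []
    if rate < MIN_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_RATE_HZ} Hz")
    if bits < MIN_BITS:
        problems.append(f"bit depth {bits}-bit is below {MIN_BITS}-bit")
    return problems
```

For example, `check_capture_settings("test_capture.wav")` (a hypothetical file name) returns an empty list if the test recording meets both minimums. Otherwise it returns one message per setting that needs adjusting.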
WAV format, developed by Microsoft and IBM, is recommended by the IASA (International Association of Sound and Audiovisual Archives), in its publication TC-04 Guidelines for the Production and Preservation of Digital Audio Objects, as a master archive wrapper. This is due to its wide use in the professional audio industry, and its acceptance in the archiving community.
BWF (Broadcast WAV) is an extension of the WAV wrapper, standardised by the European Broadcasting Union (EBU). It is also suggested as an archival format because it can embed simple metadata within the file, which sets it apart from the otherwise similar WAV wrapper. Embedded metadata may be particularly useful in a small scale digitisation project, where maintaining complex metadata externally to the audio files may not be wholly feasible.
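BWF stores its metadata in an extra chunk named `bext`, inside the same RIFF structure that WAV uses. A quick way to tell the two apart is therefore to list a file's top-level chunks. The sketch below is a minimal reader written with Python's `struct` module and assumes a standard RIFF/WAVE file. It does not handle RF64 files, and the function names are illustrative.

```python
import struct


def list_riff_chunks(path: str) -> list:
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE file."""
    chunks = []
    with open(path, "rb") as f:
        riff_id, _riff_size, form_type = struct.unpack("<4sI4s", f.read(12))
        if riff_id != b"RIFF" or form_type != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # no more chunks
            chunk_id, size = struct.unpack("<4sI", header)
            chunks.append((chunk_id.decode("ascii", "replace"), size))
            # Skip the chunk body; bodies are padded to an even number of bytes
            f.seek(size + (size & 1), 1)
    return chunks


def is_broadcast_wav(path: str) -> bool:
    """True if the file contains a BWF 'bext' (broadcast extension) chunk."""
    return any(chunk_id == "bext" for chunk_id, _ in list_riff_chunks(path))
```

A plain WAV file typically reports only `fmt ` and `data` chunks, whereas a BWF file also lists `bext`.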
Certain file formats may not be compatible with some audio software. For example, Digidesign's Pro Tools requires a paid-for plug-in to import and export MP3 files in projects. This is worth investigating before you decide on your capture software.
Choose a format which:
- Retains as much information as possible from the recording source, be it a microphone or an open-reel tape machine. A minimum sampling rate of 48 kHz (96 kHz if available) with 24-bit accuracy is recommended.
- Uses a lossless codec.
Suggested format: WAV or BWF
Master Archive File Formats
The master archive file should be an uncompressed copy of the source material, where the objective is to create as true a representation of the original audio as possible. It is important to remember that when sound is captured at the highest available quality, the maximum amount of information is captured. Such files may not sound as good as you intend if they need some restorative or mastering processing, but they are a transparent representation of their analogue counterpart.
There are two possible methodologies for creating a Master Archive audio file.
1. Archive data at the highest quality possible
This gives a template that allows for later alterations for different methods of delivery. Quite simply, a file that has been compressed with a lossy codec cannot regain its lost data. An uncompressed high quality master file, by contrast, can be compressed later if need be, while the master itself is retained at the highest quality possible. This matters for any possible future use of the archive material, where information lost through compression and optimisation may later be required. However, although the content is true to the original analogue source, it may lack the aural qualities gained through optimisation.
Sometimes, when creating archives on a budget, storage space is a concern. Lossless data compression methods, which use advanced encoding algorithms, can create smaller audio files without losing any audio data. Most notably, the FLAC codec offers this feature. However, it is poorly supported by most playback and editing software and is therefore generally unsuitable as an archival format.
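The defining property of lossless compression is that decoding returns exactly the original data. FLAC itself is not part of Python's standard library, so the sketch below uses the general-purpose `zlib` compressor as a stand-in and demonstrates that property on a synthetic test tone. FLAC uses audio-specific prediction and normally compresses real recordings more effectively than `zlib`.

```python
import math
import struct
import zlib


def make_test_tone(seconds: float = 1.0, rate: int = 48_000, freq: float = 440.0) -> bytes:
    """Generate a 16-bit mono LPCM sine tone to use as sample audio data."""
    n = int(seconds * rate)
    samples = [int(12_000 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]
    return struct.pack(f"<{n}h", *samples)  # little-endian signed 16-bit, as in WAV


pcm = make_test_tone()
packed = zlib.compress(pcm, level=9)
restored = zlib.decompress(packed)

assert restored == pcm  # lossless: every byte of the audio data survives
print(f"{len(pcm)} bytes -> {len(packed)} bytes ({len(packed) / len(pcm):.0%} of original)")
```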
When digitising analogue media, it is likely that there is metadata inherent to the content. In small scale projects the option to embed some limited metadata within the digital file may be beneficial, especially given the expense and time of populating a separate database. The BWF format offers this facility, allowing information not contained in the audio data to be stored in the file.
2. Archive an optimised version of the file
When preparing a Master Archive file, it may be desirable to store an optimised file, which is a ‘re-mastered’ version of the original. It may have been edited or processed to enhance its audible qualities, or summed with other files if part of a multi-track recording or mix. Examples of such audio processing include normalisation, filtering and multi-band compression. During optimisation it is inevitable that information is lost or altered from the original capture file. Where anomalies recur across multiple audio files from a similar source, it may be possible to save time by applying the same processes consistently to all of them.
When saving optimised versions of the original files, it is recommended that the details of the optimisation are saved along with the audio file for future reference. An effective way of recording the exact changes made is to use the capabilities of the editing software used for optimisation. Most digital audio editors save the optimisation settings in a project file (Pro Tools - .pts or .ptf, Audacity - .aup or .aup3).
Audio re-mastering is a subjective process, and as a result any optimisation done may need to be altered or undone in the future. Because of this, optimised audio files should be kept separate from the master archive files.
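The sketch below illustrates one of the optimisation processes mentioned above by applying peak normalisation to a list of floating-point samples, assumed to lie between -1.0 and 1.0. It returns a new, scaled copy plus the gain used. The original samples stay untouched, and the gain can be recorded alongside the optimised file. The -1 dBFS default target is an arbitrary example, not a recommendation from this document.

```python
import math


def peak_normalise(samples: list, target_dbfs: float = -1.0) -> tuple:
    """Return (scaled_copy, gain) so that the loudest sample sits at target_dbfs.

    The input list is not modified, mirroring the advice to keep optimised
    audio separate from the master archive file.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples), 1.0  # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak  # convert dBFS to a linear amplitude
    return [s * gain for s in samples], gain


original = [0.1, -0.25, 0.2, -0.05]
optimised, gain = peak_normalise(original)
print(f"gain applied: {gain:.3f} ({20 * math.log10(gain):+.2f} dB)")
```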
The requirements for archiving are the same as the two requirements described in File formats for capture (above), except that it may be prudent, where possible, to ensure compatibility with any software you envisage using to edit or optimise the file in the future.
Suggested formats: WAV or BWF
Some projects may require an open standard format. In these cases the suggested format is AIFF.
For files that exceed 4 GB, the size limit of a standard WAV or BWF file, the RF64 (or MBWF) format can be used. This multichannel-capable format has been standardised by the European Broadcasting Union.
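The 4 GB ceiling comes from the 32-bit size fields in the RIFF structure that WAV and BWF share. The sketch below estimates how long a single file can run at a given setting before reaching that limit, ignoring header overhead. The settings in the example are illustrative.

```python
RIFF_MAX_BYTES = 2**32 - 1  # largest value a 32-bit unsigned size field can hold


def max_hours_in_riff(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Approximate duration, in hours, before LPCM data fills a standard RIFF file."""
    bytes_per_second = sample_rate_hz * bit_depth * channels / 8
    return RIFF_MAX_BYTES / bytes_per_second / 3600


if __name__ == "__main__":
    for label, channels in (("stereo", 2), ("5.1 surround", 6)):
        hours = max_hours_in_riff(96_000, 24, channels)
        print(f"96 kHz / 24-bit {label}: about {hours * 60:.0f} minutes per file")
```

At 96 kHz/24-bit this is roughly two hours of stereo but only about 41 minutes of six-channel 5.1 audio. Multichannel masters reach the limit quickly.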
File formats for re-mastering
As previously mentioned, some proprietary formats cannot be easily read by certain audio software programs and would therefore need to be converted into a different format before use. This process can take a long time when dealing with large numbers of files and is therefore best avoided. You can avoid it either by capturing to a format native to the editing software or by using digital audio software that supports the open standard format you wish to archive to. Lossless formats should always be used when conducting optimisation. Data compression affects the character of the sound and may introduce anomalies that make optimisation far more difficult.
Suggested formats: again, WAV and BWF, the formats most commonly read by digital audio software.
Creation of new sound resources
You may be creating new audio resources for one of many reasons: for teaching purposes, for creative output, or to start or add to an existing institutional collection. Aside from any artistic reasons behind choosing a file format, the same considerations apply as when choosing a file format for capture. Again, it is advisable to capture at high quality and to derive surrogate files for delivery or optimisation from the original.
Your recordings may be intended purely for delivery to your audience. Because of their content, or to maximise accessibility, they may not need to be of a particularly high standard. In these cases you may wish to choose lower quality capture settings and a ‘lossy’ format. Although not recommended, this may prove necessary when working under time and/or resource constraints.
Care should be taken when making this decision and the following questions should be considered:
- Could the files be useful as a long-term resource?
- Will the files need to be accessed at a later date?
- Do the files have any significant value (intellectual or financial)?
- If the files have little or no value at present, could this change in time?
Depending on the objectives of the recordings you are creating, you will need to choose an appropriate wrapper format. Compressed or ‘lossy’ formats, such as MP3, may reduce storage space (which can be beneficial when recording to a small memory card or flash drive, for example) and offer a direct format for delivery, but data is irretrievably lost from the outset. A side effect of lossy audio data compression is the introduction of anomalies in the spectral and time domains. These are:
- Generally impossible, or at the least extremely difficult, to remove.
- Misrepresentative of the original source.
Lossy compressed formats are therefore not suitable for archival purposes or for projects where audio fidelity is required. In these cases a wrapper format such as WAV or BWF is recommended.
Spoken word recordings
Spoken word recordings have a much smaller frequency bandwidth than most musical or other types of recordings, and the audible quality of the recording may sometimes be less important than providing a file of accessible size. It is therefore possible to use a lower sampling rate and bit depth. The fundamental frequency of speech is likely to be around 100 Hz, the main frequency content lies between 200 Hz and 4 kHz (although there is content at higher frequencies), and the dynamic range of the recording may not be very large. Again, it is always recommended to record at the highest quality and compress afterwards for delivery. This compression can generally be greater for spoken word recordings than for musical or reference material, which may have more complex harmonic content.
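A lower sampling rate suits speech because of the Nyquist criterion: to represent content up to a frequency f, the sampling rate must be more than 2f. The sketch below picks the lowest rate that satisfies this for a given highest frequency of interest, from an illustrative list of common sampling rates. In practice converters need some extra margin for their anti-aliasing filters, and the advice above to capture at high quality still applies.

```python
# Illustrative list of commonly used sampling rates, in Hz
STANDARD_RATES_HZ = [8_000, 11_025, 16_000, 22_050, 32_000, 44_100, 48_000, 96_000]


def nyquist_minimum(max_freq_hz: float) -> float:
    """Sampling rate that must be exceeded to represent content up to max_freq_hz."""
    return 2 * max_freq_hz


def lowest_standard_rate(max_freq_hz: float) -> int:
    """Lowest listed sampling rate strictly above the Nyquist minimum."""
    need = nyquist_minimum(max_freq_hz)
    for rate in STANDARD_RATES_HZ:
        if rate > need:
            return rate
    raise ValueError("bandwidth exceeds the listed sampling rates")
```

For content up to 4 kHz the Nyquist minimum is 8 kHz, so the sketch returns 11,025 Hz. Keeping the higher-frequency detail and ambience discussed below requires a correspondingly higher rate.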
It is important to note that many spoken word recordings contain background noise and location ambience, which add content to the frequency spectrum and provide context and an environment around the recorded voice. This information may be just as important as the words being spoken. Oral history recordings are a good example of one area where this may need to be considered.
Location recordings can be made almost anywhere, from a bustling city centre to a hushed, empty church. The variations in audible results (space, levels, timbre etc.) are almost endless and depend primarily on the microphones used, their type, arrangement and position. When selecting a capture format and settings, it is prudent to retain some consistency and to aim to capture the widest dynamic range.
A number of dedicated portable audio recorders offer sampling rates of up to and even beyond 96 kHz, which may suit certain applications. However, a minimum of 48 kHz and 24-bit accuracy is recommended for all serious recording projects, and where feasible the sampling rate can be increased. BWF can be advantageous when making field recordings on location, because metadata can be embedded in the file as each recording is made, offering an efficient way of documenting recordings as they are created.
- At least 48 kHz sampling rate and 24-bit accuracy. 96 kHz, 24-bit is recommended with high quality analogue-to-digital converters
- A format which is compatible with your editing software saves time and avoids conversion issues
Suggested formats: WAV or BWF (if archiving), or AIFF
Surround sound
Although the use of surround-sound in archives is rare at present, it is becoming an increasingly popular format. This is most obvious in film soundtracks, where surround-sound has become the standard.
At present there are no open standard surround-sound formats which offer full uncompressed quality, so there are difficulties associated with archiving surround-sound material.
One option available is to archive the separate channel files as open standard mono files where possible. Failing this, archiving a copy of the multi-track session in which the material was originally created, along with the resultant surround-sound master, may provide a bit more stability for long term preservation. However, for either of these options to be viable, these files will need to be obtained from the owner or composer of the work.
Uncompressed surround-sound takes up a lot of disk space. Currently only relatively new media, such as Blu-ray discs, offer the facility to store full-quality surround sound in a portable, playable format. The two main formats that support this are DTS-HD Master Audio and Dolby TrueHD, both proprietary lossless formats.
File formats for delivery
When choosing a file format for delivery, the main factor is choosing a format and compression (if any) with the widest accessibility to your audience. You may find that you need to compromise on file size and quality (through compression) to deliver your files effectively. Accessibility considerations include:
- Compatibility - The file type needs to be compatible with your users' playback devices
- Internet download speeds - The file size needs to be small enough to cater for the slowest connections your users are likely to have
- The distribution media - Making effective use of discs, memory cards etc. by optimising file sizes
An important factor in selecting a file type for delivery is whether the file can contain embedded metadata. This is particularly important in helping your audience identify files on playback devices, which display simple but essential information such as title, date and artist. The MP3 format uses ID3 tags, which can contain this embedded information.
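The sketch below shows what such embedded metadata looks like at the byte level by reading an ID3v1 tag. This is the simplest ID3 variant: it occupies the last 128 bytes of an MP3 file and begins with the characters `TAG`. Most current software instead writes the more flexible ID3v2 tags at the start of the file, which this minimal reader does not handle. The function name is illustrative.

```python
from typing import Optional


def read_id3v1(path: str) -> Optional[dict]:
    """Return the ID3v1 fields of an MP3 file, or None if no tag is present."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # move to the end of the file to find its length
        if f.tell() < 128:
            return None
        f.seek(-128, 2)  # the tag is the final 128 bytes
        block = f.read(128)
    if block[:3] != b"TAG":
        return None

    def field(raw: bytes) -> str:
        # Fixed-width Latin-1 text, padded with NUL bytes or spaces
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": field(block[3:33]),
        "artist": field(block[33:63]),
        "album": field(block[63:93]),
        "year": field(block[93:97]),
        "comment": field(block[97:127]),
        # block[127] holds a numeric genre code, not decoded here
    }
```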
Web ready (web hosting, e-mail, Blackboard and VLEs)
It is likely that you will wish to make your resources available via the web. This can take more than one form, and the following guidelines discuss the most common forms of web delivery.
A simple method of direct delivery to individuals or groups is via e-mail. Every mail provider, including your institution's, sets a maximum message size that can be sent and received. Your own provider may accept a given size for your account, but your recipients' providers may impose different limits. This makes wide distribution difficult, as you may find e-mails are bounced back to you because they could not be received.
The recommended approach, therefore, is to use e-mail attachments only for sending files to a small number of recipients and, where possible, to send only small files, such as ≤5 MB.
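To judge what fits under such a limit, you can work out the longest recording a given bit rate allows. The sketch below assumes constant bit rate encoding, treats 5 MB as 5,000,000 bytes and ignores tag and container overhead. The bit rates are examples.

```python
def max_minutes_for_size(limit_bytes: int, bit_rate_kbps: float) -> float:
    """Longest constant-bit-rate recording, in minutes, that fits in limit_bytes."""
    bytes_per_second = bit_rate_kbps * 1000 / 8
    return limit_bytes / bytes_per_second / 60


LIMIT_BYTES = 5_000_000  # the ~5 MB attachment guideline, in decimal megabytes

if __name__ == "__main__":
    for kbps in (64, 128, 192, 1411.2):  # 1411.2 kbps = uncompressed CD-quality stereo
        print(f"{kbps} kbps: about {max_minutes_for_size(LIMIT_BYTES, kbps):.1f} minutes")
```

At 128 kbps this is a little over five minutes. Uncompressed CD-quality audio fills the same limit in under 30 seconds.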
It is possible to compress the file size of an audio file for more effective delivery and to get around issues with attachment size restrictions. These methods are discussed in the section: Compressing audio files for the web.
- An optimised file size for the majority of mail provider limits (≤5 MB)
- A format which can be opened by major audio players
Suggested format: MP3, a standardised compressed file type (part of the ISO/IEC MPEG audio standards) which is readable by the majority of audio playback software. For advice on MP3 compression see the section Compressing audio files for the web.
When audio files are hosted online, they can be accessed either by streaming or by downloading. In streaming, the file is downloaded into a (commonly temporary) buffer, which allows playback to begin before the whole file has been downloaded. The buffer is filled with upcoming content, which is overwritten with the next section once it has been played back. In direct downloading, the complete file is downloaded to the local computer's hard drive before it can be played.
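The practical difference between the two is how long a listener waits before playback starts. The sketch below compares the two under simplified assumptions: a constant connection speed, a constant-bit-rate file, and a player that starts once its buffer holds a fixed number of seconds of audio. The 5-second buffer and the connection speed in the example are hypothetical.

```python
def download_wait_s(file_bytes: float, link_bps: float) -> float:
    """Seconds before playback when the whole file must be downloaded first."""
    return file_bytes * 8 / link_bps


def stream_wait_s(bit_rate_bps: float, prebuffer_s: float, link_bps: float) -> float:
    """Seconds needed to fill a buffer holding prebuffer_s seconds of audio.

    Playback can only continue without gaps if the connection is faster than
    the stream's bit rate, so slower links are rejected.
    """
    if link_bps <= bit_rate_bps:
        raise ValueError("connection too slow for uninterrupted streaming")
    return bit_rate_bps * prebuffer_s / link_bps


if __name__ == "__main__":
    bit_rate = 128_000                   # 128 kbps MP3
    file_bytes = bit_rate * 10 * 60 / 8  # a 10-minute recording
    link = 2_000_000                     # hypothetical 2 Mbit/s connection
    print(f"download first: {download_wait_s(file_bytes, link):.1f} s")
    print(f"stream:         {stream_wait_s(bit_rate, 5, link):.2f} s")
```

With these example figures the listener waits about 38 seconds for the download but well under a second for the stream to begin.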
Streaming files are accessed through web browsers, and where proprietary file formats are used a third party browser plug-in may be required to decode the format.
Various file formats can be streamed, but the two most common are the proprietary formats RealAudio and Windows Media Audio, which are typically delivered at low quality.
For further information on hosting audio files within VLEs, please see the advice document Audio via Blackboard.
Compressing audio files for the web
Hosting uncompressed audio online requires more storage space, and uncompressed files take far longer for users to download than compressed ones. Although audio fidelity is inevitably lost with lossy compression, some compression algorithms are designed to minimise this effect while massively reducing the file size.
The size of a compressed audio file depends on the amount of information stored per second of playback, known as the bit rate. The lower the bit rate, the less information there is to be processed, and the smaller the resulting file.
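Bit rate is simply data per second, so file size follows directly from bit rate multiplied by duration. The sketch below compares a three-minute track at a few example MP3 bit rates with uncompressed CD-quality stereo (44.1 kHz × 16 bits × 2 channels = 1411.2 kbps). It assumes constant bit rate and ignores headers and tags.

```python
CD_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps for uncompressed CD-quality stereo


def size_mb(bit_rate_kbps: float, minutes: float) -> float:
    """File size in megabytes (10**6 bytes) for a constant bit rate."""
    return bit_rate_kbps * 1000 * minutes * 60 / 8 / 1e6


if __name__ == "__main__":
    for kbps in (CD_KBPS, 320, 128, 64):
        ratio = CD_KBPS / kbps
        print(f"{kbps:7.1f} kbps: {size_mb(kbps, 3):5.2f} MB for 3 minutes ({ratio:4.1f}:1)")
```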
For web delivery it is advisable to test your files by ear and deliver at the lowest bit rate possible while preserving as much audio quality as you think you need.
Portable media devices
There is an increasing need for audio, and media in general, to be played on portable media devices, such as mobile phones or dedicated MP3 players. These files are commonly downloaded from the internet before being transferred to the device. The same considerations described in the section above therefore apply, along with the need to ensure compatibility with common devices.
Suggested format: MP3, recognised as the most common audio format for mobile devices.
Optical discs
Audio files can be stored on optical discs as either data files or playable audio files.
Both types of file can be read by a computer and can indeed be of the same file type. However, when you write an ‘audio disc’ rather than a ‘data disc’ in your authoring software, the files are written as audio and can be played by dedicated audio playback devices, such as CD and DVD players. Conversely, files written as data cannot be played by these devices, as they have not been written as playable audio.
The amount of data a disc can store depends on its capacity, while the amount of audio depends on its playing time, i.e. how long the disc takes to be read optically at playback speed.
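The CD figures below illustrate this difference. A CD is read at 75 sectors per second. An audio sector carries 2,352 bytes of sound (588 stereo 16-bit samples at 44.1 kHz), whereas a data sector carries only 2,048 bytes of user data. The remaining bytes of a data sector hold synchronisation, header and additional error-correction information. The sketch below turns a disc's playing time into both capacities.

```python
SECTORS_PER_SECOND = 75        # CD read rate at normal (1x) speed
AUDIO_BYTES_PER_SECTOR = 2352  # 588 stereo frames x 2 channels x 2 bytes
DATA_BYTES_PER_SECTOR = 2048   # user data in a CD-ROM Mode 1 sector
MIB = 2**20


def cd_capacity_mib(minutes: int) -> tuple:
    """Return (audio_mib, data_mib) for a disc with the given playing time."""
    sectors = minutes * 60 * SECTORS_PER_SECOND
    return (sectors * AUDIO_BYTES_PER_SECTOR / MIB,
            sectors * DATA_BYTES_PER_SECTOR / MIB)


if __name__ == "__main__":
    for minutes in (74, 80):
        audio, data = cd_capacity_mib(minutes)
        print(f"{minutes}-minute disc: {data:.0f} MiB of data, or {audio:.0f} MiB of audio")
```

This gives about 650 MiB of data for a 74-minute disc and about 703 MiB for an 80-minute disc. These are the familiar 650 MB and 700 MB disc capacities.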
12cm Standard Audio CD
Can hold up to ≈ 650 MB of data or ≈ 74 minutes of audio at 44.1 kHz, 16-bit (≈ 700 MB or ≈ 80 minutes on the common 80-minute discs)
12cm Standard DVD-A (DVD Audio)
Can hold up to ≈ 4.7 GB of data as well as audio with sample rates up to 192 kHz (stereo; up to 96 kHz for multichannel) and 24-bit accuracy. Commonly used for 5.1 surround sound playback.
12cm Standard SACD (Super Audio CD) - proprietary media, developed by Sony and Philips.
Single layer can hold up to 4.7 GB of data or 100 minutes of audio. Double layer can hold up to 8.54 GB of data. Can store audio with sampling rates up to 2822.4 kHz (1-bit Direct Stream Digital).
When distributing discs as playback media an uncompressed common format is recommended unless the size of the content is much greater than the capacity of the storage space available.
Recommended: WAV or BWF