Last updated: 25 August 2010
Published in: Creating new digital media
Tags: audio, audio editing, sound recordings
This paper is aimed at those with a low level of confidence in working with digital audio. Although some basic knowledge and experience is expected, the techniques and concepts are presented for the novice. It introduces some of the main considerations when producing digital audio after the recording stage of a spoken word project. Post-production offers a huge range of technical and creative capabilities: the spoken word can be fine-tuned to sound crisp and clear, blended with music, or made to sound as if it were recorded in a different physical space. This paper aims to provide simple, practical solutions to common problems faced when working with spoken word, as well as tips to enhance recordings and help you achieve the best possible result.
If you wish to skip straight to the techniques, go to the section Post-production tips and techniques.
The term post-production refers to work undertaken after the process of recording (or production). In terms of digital media, post-production exists to iron out any flaws and imperfections accidentally or unknowingly introduced into a recording, and to enhance the qualities that were captured, presenting the material to the listener as well as possible (usually through editing). Even where no particular flaws stand out, a recording may sound inferior when compared to a commercial production: perhaps quieter, less clear, more muffled, or not as ‘bright' sounding.
Post-production work can also exist to present the impossible made possible, seamlessly piecing audio recordings recorded in separate places and times into one, blending spoken word over a musical introduction. An example of this is in commercial films where dialogue is nearly always recorded in a studio separate from the actual filming. This is then synchronised with the pictures to create consistency and realism with sound effects and environmental recordings.
Used on its own, or alongside moving images or still images, audio is the consistent element of time-based media production. Podcasts, vod-casts, PR videos, lecture recordings and screencasts all contain recorded sound.
After the recording stage, commercially produced audio typically passes through two more stages, mixing and mastering. Mixing is the process where multiple audio files are treated individually and then combined to create the overall sound when they are rendered together (rendering being the creation of a final file). Mastering is the process applied to the final rendered audio file to further enhance the quality, and to prepare it for the desired method of delivery. Mastering is also undertaken so that multiple finished files (such as completed podcast recordings) can be heard by the listener one after the other with seamless continuity and a professional sound quality.
There are quite literally hundreds of software and hardware tools available for mixing and mastering audio. Some dedicated packages also offer specially designed templates for applying all sorts of processing to spoken word recordings, to make them ‘podcast' or ‘broadcast' ready. This document is not concerned with specific audio equipment and all of the techniques described can be applied using Audacity, the free open-source audio editing software. However, it should be noted that certain techniques may be far more intuitive to perform and may yield better results when undertaken in higher quality, paid-for audio software.
There may be times when the term ‘sound is just sound' may apply. These can be when quality is not of importance, when time and resources do not allow for any post-production work, and when coherence of the words is adequate for the use.
There is, however, a wealth of techniques available to enhance and improve spoken word recordings in the post-production stage. This paper aims to highlight the most useful and effective of them, alongside practical technical advice with example audio files. In the commercial sector these types of 'mastering' techniques are carried out by a highly skilled mastering engineer, but if your requirements aren't too demanding, and with a little bit of knowledge, it is possible to use even the most basic audio recording software to perform some basic mastering on your recordings.
The following section of this paper presents common problems found when producing spoken word recordings. Often there is not a ‘one-size fits all' solution when addressing the issues outlined, and occasionally a mixture of techniques should be applied for best results. With any type of recording the key to producing good sounding results is to capture the voice as best as possible at the recording stage, taking into account the following:
When attempting to produce digital audio of high quality, a useful tip is to find a sound file whose quality you would like yours to match, and import it into your software project. Place it on a separate track within the project and keep that track muted whilst working on your own production. Using it as a reference (solo the track when you wish to listen to it) allows you to switch directly between the two, helping you make your production sound similar through the techniques introduced in this paper.
Problem - An audio file you have recorded, or your final mix file, is very quiet in comparison to commercially produced CDs or other people's recordings. Is there a way to increase the volume to its maximum capacity without distorting the sound?
Solution - The most straightforward technique is to normalise the audio file. Normalisation is a feature included in digital audio software that analyses the volume of a file and increases the overall volume so that the loudest peak within the file is boosted to the maximum available. This relative process has two main drawbacks worth noting
Unnormalised File
If you cannot see the audio player above, please use this link to download the audio file (824KB)
Normalised File
If you cannot see the audio player above, please use this link to download the audio file (824KB)
Normalising a sound file in Audacity
If you cannot see the video above, please use this link to download the video file (8.1MB)
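As a rough illustration of what peak normalisation does to the underlying numbers, the following is a minimal Python sketch. It assumes the audio has already been loaded as a list of floating-point samples between -1.0 and 1.0; the function name `normalise` and the `target_peak` parameter are hypothetical and are not taken from Audacity.

```python
def normalise(samples, target_peak=1.0):
    """Scale every sample by one gain so the loudest peak equals target_peak.

    Assumption: samples are floats in the range -1.0 to 1.0.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: there is nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# A quiet recording whose loudest sample only reaches a quarter of full scale.
quiet = [0.0, 0.1, -0.25, 0.05]
loud = normalise(quiet)
```

Because a single gain is applied to the whole file, the balance between quiet and loud passages is unchanged; only the overall level moves.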
Problem - The volume of your audio file varies a lot while the presenter is speaking. Sometimes it may be loud, other times difficult to hear. The production lacks the volume consistency of professionally produced recordings.
Solution - Sound level compression (not to be confused with data compression, such as in MP3 files) is another audio processing technique that, when used correctly, can iron out the dynamic changes in a voice, i.e. the differences between quiet and loud talking. Compression is also useful for levelling out an audio file where more than one person is speaking but one voice is a lot louder than the others, a common problem when using one microphone to record multiple people.
Radio broadcasts use heavy compression (a large ratio of compression) to keep the voice of a presenter at the same volume whether they are whispering or shouting, so that the listener doesn't have to strain to hear quiet words or reach for the volume dial when things get too loud.
An in-depth explanation of audio signal compression can be found in our advice document Audio Processing - Dynamics and Compression. It is worth noting that most software compressors included with digital audio software packages have presets for spoken word that can provide a starting point for the inexperienced user.
The following audio files present a speech recorded without compression, and then the same speech with compression applied using the settings shown from Audacity's built-in compressor. The technique used is then shown in the following screencast.
Without Compression
If you cannot see the audio player above, please use this link to download the audio file (326KB)
With Compression
If you cannot see the audio player above, please use this link to download the audio file (326KB)
Compression and Normalisation in Audacity.
If you cannot see the video above, please use this link to download the video file (8.1MB)
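To make the idea of a compression threshold and ratio more concrete, here is a deliberately simplified Python sketch. It is an assumption-laden illustration rather than a description of Audacity's compressor: it works sample by sample with no attack or release smoothing, and the names `compress`, `threshold_db` and `ratio` are hypothetical.

```python
import math


def compress(samples, threshold_db=-12.0, ratio=4.0):
    """Reduce the level of any sample that exceeds the threshold.

    Assumptions: samples are floats in -1.0..1.0, levels are measured in
    dB relative to full scale, and no attack/release smoothing is applied.
    """
    threshold = 10 ** (threshold_db / 20.0)  # threshold as a linear level
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level_db = 20.0 * math.log10(level)
            # Each dB above the threshold is divided by the ratio.
            new_db = threshold_db + (level_db - threshold_db) / ratio
            s = math.copysign(10 ** (new_db / 20.0), s)
        out.append(s)
    return out


# A whisper (0.1) is left alone; a shout (1.0) is pulled down towards it.
levelled = compress([0.1, 1.0, -1.0])
```

After compression the loud and quiet passages sit closer together, so the whole file can then be raised in level (for example by normalising) without the loud parts distorting.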
Possibly the main task undertaken in post-production work is editing, which includes the removal of unwanted regions of sound. Whether it be silence at the beginning or the end of a recording, or spoken mistakes which were then repeated for correction, editing is an important tool when working with spoken word. Making changes to spoken word recordings in the time domain does have one major drawback, however. As we are so used to listening to people's voices, including our own, it is difficult to trick the ear into perceiving an unnatural occurrence as natural. An example of this is editing together two halves of the same sentence where the halves were recorded separately.
Our advice document Basic Audio Editing introduces techniques useful for removing unwanted sounds and blending regions of audio together effectively.
In the following example the sound of rustling paper is edited out of a recording.
Audio with unwanted sound.
If you cannot see the audio player above, please use this link to download the audio file (298KB)
Unwanted sound removed.
If you cannot see the audio player above, please use this link to download the audio file (298KB)
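The following Python sketch illustrates one way an unwanted region might be cut out, with an optional short crossfade so that the join is less abrupt. It is an illustration under stated assumptions, not the method used to produce the example files above: the audio is assumed to be a list of samples, and the function `remove_region` and its parameters are hypothetical.

```python
def remove_region(samples, start, end, crossfade=0):
    """Cut samples[start:end] and join the two remaining pieces.

    If crossfade > 0, the last `crossfade` samples before the cut are
    blended with the first `crossfade` samples after it, which shortens
    the result by that many extra samples.
    """
    head = list(samples[:start])
    tail = list(samples[end:])
    n = min(crossfade, len(head), len(tail))
    if n == 0:
        return head + tail  # plain cut with no blending
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight rises towards the incoming audio
        blended.append(head[len(head) - n + i] * (1 - w) + tail[i] * w)
    return head[:len(head) - n] + blended + tail[n:]


# The two 9.0 values stand in for an unwanted noise between two phrases.
recording = [1.0, 1.0, 1.0, 9.0, 9.0, 2.0, 2.0, 2.0]
hard_cut = remove_region(recording, 3, 5)
smooth_cut = remove_region(recording, 3, 5, crossfade=1)
```

In practice the cut points are usually chosen in pauses between words, where a small change in timing is least likely to be noticed.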
These days digital audio software comes bundled with tens, if not more, of effects and signal processors, such as filters, reverb units, and EQ tools. There can often be so many available that it is confusing just working out what each one is supposed to do.
These should be applied to recordings with great care. In the case of spoken word, if you can't think of a reason to use an effect, then simply don't use it just for the sake of it.
A common problem that people encounter when they first start making recordings is that of background noise unintentionally being picked up by microphones. This is usually due to a combination of a lack of access to soundproof recording spaces and the fact that background noise often goes unnoticed, because we are so used to living with it, until we listen back critically to a recording.
The most effective way to tell whether your recording will pick up background noise is to make a test recording of silence in your recording environment and then listen back carefully on headphones, in a quiet location.
Short impulses of sound, such as coughs or bangs, are easy to remove from spoken word recordings if they don't happen at the same time as the talking. Unfortunately, unless it is contained within a specific frequency area (for example the buzz of a bumble bee, which is not exactly a common problem), background noise can be nearly impossible to remove completely.
The good news is that sometimes unwanted noises picked up on recordings can be reduced, making recordings more enjoyable to listen to. Most background sounds exist within a specific frequency band, such as the bass, middle or treble range. The volume of frequency areas can be reduced using equalisation (EQ) plug-ins, although if the sound of the voice also occupies these areas there will be a trade-off, as both sounds are affected. For further information on digital EQ and its use, please see our advice document Digital Equalisation.
Similar to EQ, a filter is a dedicated tool for reducing the volume of specific frequencies, and can often be simpler to use than EQ tools. Image 1 below shows the settings of a simple high-pass filter plug-in set to reduce traffic noise in a recording. The before and after audio files can be heard below.

Image 1 - Audacity's High-Pass Filter Plug-in
Background Noise.
If you cannot see the audio player above, please use this link to download the audio file (256KB)
Background noise reduced with a filter (note the difference in the sound of the voice compared to the audio file above).
If you cannot see the audio player above, please use this link to download the audio file (256KB)
Reducing background noise.
If you cannot see the video above, please use this link to download the video file (17MB)
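To show in principle how a high-pass filter reduces low-frequency rumble while leaving higher frequencies largely intact, here is a minimal Python sketch of a first-order (one-pole) high-pass filter. This is an assumption made for illustration and is not the algorithm behind Audacity's plug-in; the names `high_pass`, `sine` and `peak`, and the 200 Hz cutoff, are hypothetical choices.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def sine(freq_hz, sample_rate, seconds):
    """Generate a full-scale test tone (used here in place of real audio)."""
    count = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


def peak(samples):
    """Loudest absolute level in the second half, after the filter settles."""
    return max(abs(s) for s in samples[len(samples) // 2:])


rate = 44100
rumble = high_pass(sine(50, rate, 0.2), 200, rate)     # low traffic-like tone
speech = high_pass(sine(2000, rate, 0.2), 200, rate)   # tone in the vocal range
```

Comparing `peak(rumble)` with `peak(speech)` shows the low tone reduced far more than the higher one. A real voice also contains some low-frequency energy, which is why the filtered example above sounds slightly different from the original.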
Problem - How can I effectively blend in sound effects or music with my spoken word recordings?
Solution - Firstly, the volume of the music should be monitored in relation to the spoken word, so that the listener does not have to adjust their playback volume. In areas where you wish to have spoken word over the top of music, leave the volume of the voice alone and lower the volume of the music so that the voice can still be heard clearly and the music provides a background (this level might be lower than you first think!). Fading in or out is a technique commonly used to aid the transition between voice and music (and vice versa), where the music fades in (or out, at the end of the file) as the voice begins.
As previously mentioned, there are many factors that contribute to the sound quality of a recording, including the sound of the recording room and the equipment being used. Dull or poor sounding recordings can often benefit from some equalisation in post-production, a technique where frequency areas can be boosted or cut. Our advice document Digital Equalisation has a section on vocal enhancement to help improve the quality of voice recordings.
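The two ideas above, lowering the music beneath the voice and fading it in, can be sketched in a few lines of Python. This is a hedged illustration only: both tracks are assumed to be lists of samples at the same sample rate, and the function names `fade_in` and `mix_under`, along with the default `music_gain` of 0.25, are hypothetical.

```python
def fade_in(samples, fade_len):
    """Ramp the level linearly from silence over the first fade_len samples."""
    out = list(samples)
    n = min(fade_len, len(out))
    for i in range(n):
        out[i] *= i / n
    return out


def mix_under(voice, music, music_gain=0.25):
    """Add the music beneath the voice, leaving the voice level untouched.

    Note: the sum can exceed full scale if both tracks are loud, so the
    result may need its overall level checked afterwards.
    """
    length = max(len(voice), len(music))
    out = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0
        m = music[i] if i < len(music) else 0.0
        out.append(v + m * music_gain)
    return out


music = fade_in([1.0, 1.0, 1.0, 1.0], 4)
mixed = mix_under([0.5, 0.5], [1.0, 1.0, 1.0], music_gain=0.2)
```

A fade out works the same way in reverse, with the ramp applied to the last samples of the music instead of the first.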
Generally speaking however, a bad recording is a bad recording. Clip distortion (introduced by recording with an input volume set too high) cannot be undone. Background noise and echo can often be impossible to completely remove, and any ‘rescue' procedures to fix bad recordings can be costly in time and expertise.
Problem - I have a recording with multiple tracks of different people speaking where different microphones were used. When I play them back together in my digital audio software it sounds unnatural due to the differences in volume of the tracks and the microphones used.
Solution - Most digital audio software allows you to treat each track independently, meaning you can adjust the volume of one without affecting the volume of another. Mixing is simply the art of blending channels together until you attain the best possible result, which in the case of spoken word means achieving consistency between files and creating a believable environment in which the recording appears to have been made ‘as one'. Adjusting the individual volumes allows you to create the volume consistency needed, whereas pan adjustment allows you to place speakers in the stereo field (between far left and far right on headphones), to create a feeling of space between them. The audio files below demonstrate how this can be done.
If you cannot see the audio player above, please use this link to download the audio file (1.1MB)
If you cannot see the audio player above, please use this link to download the audio file (1.1MB)
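As a sketch of how per-track volume and pan might be applied before tracks are summed, the following Python example uses a constant-power pan law. The pan law is an assumption chosen for illustration (audio software differs in the law it uses), and the names `pan` and `mix_stereo` are hypothetical.

```python
import math


def pan(samples, position, gain=1.0):
    """Place a mono track in the stereo field and return (left, right) pairs.

    position runs from -1.0 (far left) through 0.0 (centre) to 1.0 (far right).
    A constant-power law is assumed, so perceived loudness stays roughly
    even as a voice moves across the field.
    """
    angle = (position + 1.0) * math.pi / 4.0
    left = math.cos(angle) * gain
    right = math.sin(angle) * gain
    return [(s * left, s * right) for s in samples]


def mix_stereo(*tracks):
    """Sum any number of stereo tracks sample by sample."""
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        left = sum(t[i][0] for t in tracks if i < len(t))
        right = sum(t[i][1] for t in tracks if i < len(t))
        mixed.append((left, right))
    return mixed


# Two speakers: the quieter one is raised in level and placed slightly right,
# the louder one is lowered and placed slightly left.
speaker_a = pan([0.2, 0.2], position=0.3, gain=1.5)
speaker_b = pan([0.8, 0.8], position=-0.3, gain=0.5)
conversation = mix_stereo(speaker_a, speaker_b)
```

Modest pan positions such as these tend to sound more natural for conversation than placing each voice at the far left or far right.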
Choosing the right post-production techniques to use will depend entirely on the intended outcome for your recordings, and if there are any problems you wish to remedy. Applying these techniques requires patience and a keen ear due to their subjective nature, and if in doubt, always rely on what you think sounds best. The number of post-production techniques for spoken word far exceeds those highlighted in this paper and if you have any specific needs for performing post-production techniques within your own projects then please contact us directly for advice using our helpdesk.
We provide a FREE enquiry service giving advice to the UK Further and Higher Education community.
You can ask us anything, typical questions include - "What formats should I use?" "How do I...?" "What tools can achieve the result I need?" "What is new and emerging?"