Audio Processing - Dynamics and Compression
How compression and other forms of dynamics processing work, and how they can be used to improve the impact, clarity and subjective quality of your recordings.
[Note: In the context of audio signal processing, 'compression' has a completely different connotation to 'compression' when used to describe file compression in the computer domain. Though they share the same term, the two are entirely unrelated and should not be confused!]
File size compression - 'Compression' is a term familiar to many people in the context of digital file management. It describes the process used to reduce the size of a file by reducing the complexity of its contents, or compressing the binary data used to encode it with mathematical algorithms, or sometimes both.
JPEGs, MP3s and QuickTime movies are all forms of compressed digital media file, and their popularity over the last few years, along with all the myriad other types of compressed data file, has firmly linked the term 'compression' in most people's minds to the computational process used to generate these files.
However, in the context of audio signal processing, 'Compression' has an entirely different and unrelated meaning, which is what we examine here. Thus:
Dynamic Range Compression - When used to refer to an audio process or effect, 'compression' (or more completely 'dynamic range compression') describes a process developed during the 1920s and 30s for automatically reducing the dynamic range (the difference between the loudest and softest volumes of sound) of an audio signal.
It was originally invented to enhance the performance of early radio technology by limiting the signal level to suit the medium's then-narrow dynamic range. Compression is still used heavily in radio and all broadcast media, but has also become a popular effect with audio engineers for artistic and corrective purposes.
The processor used to achieve dynamic range compression is called simply a compressor, and the earliest ones predate the advent of computing and digital audio by many years. The Western Electric 110 limiting amplifier is often considered the first commercially produced compressor, and was first made in 1937.
Simply speaking, a compressor works by monitoring its own input or output volume, and altering its signal gain (the amount of amplification it applies) in response to variations in this reference level. When the level rises too high, the gain is turned down. This has the effect of making the loud parts quieter, and therefore less likely to overload a sensitive circuit, and the signal's dynamic range is literally 'compressed'.
Compression can bring many other benefits to sound quality, and since its invention it has become one of the most widely used audio processes. Ironically, though, because it is also one of the least conspicuous, it is one of the least well understood.
Recorded, broadcast and live audio can exhibit a huge range in volumes, sometimes even with just a single speaker or performer. These variations in level pose significant problems when trying to set recording and playback levels - how to retain intelligibility, audibility and detail during quieter sections while avoiding overload when the volume increases? Dynamic variations need management.
'Dynamics Processing' describes the field of audio signal processing concerned with managing the volume level of audio signals over time. By making alterations to signal level - at speeds often measured in milliseconds - dynamics processors can dramatically alter the perceived loudness, clarity, consistency and power of live or recorded sound. When mixing signals together, correct dynamics processing can help both to integrate and separate elements of a mix - sometimes simultaneously - and an understanding of dynamics is vital to good mix engineering.
Compression and other forms of dynamics processing - limiting, gating and expansion - are used extensively in music production, broadcast and live sound, and often have their most beneficial effect on the voice, both sung and spoken. The exposed nature and wide dynamic range of the solo voice makes it a good candidate for dynamic treatment, but also the most demanding, since most people are sensitive to even subtle changes in the sound of a human voice.
So, as well as generic dynamics treatment, we will look at a couple of techniques aimed specifically at vocal processing, and will suggest some broad guidelines for the beginner, as well as tips for the experienced user.
Finally, as well as the general function of these tools, we will look at a few common scenarios and 'tricks of the trade' for enhancing and maximising your results, and for dealing with common problems, along with some audio examples.
If something sounds louder, it is louder, right? Perhaps surprisingly, this is not necessarily true. The brain processes everything we hear in a multitude of complex ways, based on some very subtle audio cues indeed, to allow us to mentally construct audio pictures of the world around us. The study of these effects is called psychoacoustics, and one of its findings is that perceived loudness depends on much more than the peak level of a sound.
As a familiar example of psychoacoustic treatment, consider television adverts. They always seem very loud in comparison to the programs they interrupt; however, rather than having higher peak levels, they use very heavy compression to push the whole signal up towards the top volume level available, squeezing everything into a very narrow dynamic range so that every word is at volume '10'. Sound quality is of course compromised, and extreme examples lose their dynamic range almost entirely, but though they may not sound good, they do sound loud, which of course is the idea!
Here is a recording of my voice: the first version is the raw recording, and the second has been compressed with Logic Pro's basic compressor plug-in at a ratio of 8:1, to give a noticeable difference in perceived volume. Note also how much 'bigger' the waveform of the compressed version looks; this reflects its higher average level, or extra audio energy:
These two files have identical maximum levels (as they have both been normalised to 0dB), but the second sounds louder and clearer thanks to the compressor. This is how it works:
As noted, both demo files above have been normalised to ensure that they have exactly the same peak level. Normalisation means that the computer has analysed the complete waveform, determined the highest level it reaches, and then changed the amplitude of the whole waveform by the same amount, so that the loudest point in the signal corresponds to the loudest level the system can process. Without altering the shape of the waveform, the computer is making it as loud as it can be without clipping the ends off the waveform at its loudest point. In this way you can make sure that there is no wasted digital 'headroom' when playing back your files, and maximise the resolution of subsequent processing.
Normalisation of recorded files before further digital processing gives the subsequent effects algorithms a greater range of level values to work with; particularly in fixed-point systems, this makes better use of the available bit depth, using more of the available ones and zeros to describe the waveform.
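A minimal Python sketch of peak normalisation may help make the arithmetic concrete. It assumes audio held as a list of floating-point samples where 1.0 corresponds to 0dBFS (the digital maximum), and the function names `peak_normalise` and `rms_dbfs` are hypothetical, not taken from any particular editor. It also measures RMS (average) level, a rough stand-in for perceived loudness, to show how two files with identical peaks can still differ in average energy, which is the effect heard in the demo above.

```python
import math


def peak_normalise(samples, target_dbfs=0.0):
    """Scale a block so its highest absolute sample lands on target_dbfs.

    Every sample is multiplied by the same factor, so the shape of the
    waveform (and its dynamic range) is unchanged.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]


def rms_dbfs(samples):
    """Average (RMS) level in dBFS, a rough proxy for perceived loudness."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


# Two made-up signals: one with a single loud peak, one more even in level.
spiky = [0.05, 0.05, 0.5, 0.05, 0.05, 0.05]
even = [0.30, 0.28, 0.5, 0.30, 0.29, 0.31]

a, b = peak_normalise(spiky), peak_normalise(even)
print(max(map(abs, a)), max(map(abs, b)))  # both peaks now 1.0
print(rms_dbfs(a) < rms_dbfs(b))  # True: 'even' carries more average energy
```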
Similarly, final mixes should be normalised to 0dB to ensure that your final product is using the full available dynamic range of playback equipment.
A compressor is the most common type of dynamics processor, and is the most flexible in both operation and application. Its function is broadly to monitor the louder parts of an input signal and to reduce (or 'compress') their volume quickly and automatically when they occur, while leaving the quieter portions of the signal alone, thus allowing them to be heard more clearly. It will help to 'smooth out' the volume changes in the signal, and give a more consistent output level than the uncompressed version. By making the louder bits quieter, a compressor also makes room for the overall volume level to be turned up, and so is often used to improve perceived 'loudness' as well as making the signal level more even.
To achieve this effect an analogue compressor uses either a voltage controlled amplifier (VCA), vacuum tube (valve) amplifier or photo-resistor to 'squash' the peaks in the input signal by a predetermined amount, and at the selected speed, when prompted to do so by a level sensor built into the compressor. Think of it as being similar to a sound engineer with lightning-fast reactions sitting with his hand on the volume control, ready to turn it down if it gets too loud, then back up once the loud bits are over*.
A digital compressor replicates these functions in software, and many digital models of vintage and high-end compressors are available, which aim to reproduce the particular characteristics of these (often very expensive) hardware units. The digital environment additionally allows for functionality which would be impossible - or at least impractical and prohibitively expensive - to achieve with electronic components: audio can be split into several different frequency bands, each of which can then be individually compressed ('Multi-band compression'); very short delays can be used to allow the processor to 'predict' future level variations ('Look-ahead Limiting'); and very precise control can be offered over parameters, along with saving and recall, and libraries of preset settings.
Before any dynamics processing is applied, the output level of the signal is of course the same as the input level, thus:
As the signal passes through a standard single channel compressor, it will be affected by each of the following controls in turn [Note - not all compressors will give access to all these parameters]:
Input Level allows the user to boost or cut the 'dry' input signal level before compression takes place. Usually accompanied by a level meter.
Threshold defines the level of input signal at which the compression effect will be triggered.
[*To extend the previous analogy, this would be the level at which the hair-trigger engineer decides to turn down the volume control.]
A 'Threshold' level is chosen, and when the level of the signal exceeds this threshold, the compressor reduces the volume. Once the signal drops below the threshold, the compressor allows the level to return to its previous point, thus:
A and B are the points where volume reduction will be applied and then removed.
Attack Time sets the speed at which the volume of the signal will be reduced by the compressor, once the threshold is exceeded. It is usually measured in milliseconds (1ms = 1/1000s). As soon as the Threshold is reached, the compressor begins to reduce the signal level, and takes the Attack Time to reach its full effect, as set by the 'Ratio'.
Ratio defines how strongly the signal level is reduced once compression is triggered. Measured in decibels, at a setting of 2:1 the output signal is allowed to exceed the threshold level by only half the amount that the uncompressed signal would: an input 10dB over the threshold comes out just 5dB over. A setting of 3:1 reduces this excess to a third, 4:1 to a quarter, and so on, up to a maximum setting of ∞:1, at which point the signal is not allowed to exceed the threshold level at all, a process referred to as 'brick wall limiting'.
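The threshold and ratio together define a compressor's static input/output curve, which can be written in a few lines. The sketch below works in decibels, ignores attack and release (it describes only the steady-state level), and uses a hypothetical function name; it illustrates the hard-knee relationship described above rather than any particular product's gain law.

```python
def compressor_output_db(input_db, threshold_db, ratio):
    """Steady-state output level of a hard-knee compressor, in dB.

    Below the threshold the level passes unchanged; above it, the overshoot
    is divided by the ratio. ratio=float('inf') gives brick-wall limiting.
    """
    if input_db <= threshold_db:
        return input_db
    overshoot = input_db - threshold_db
    return threshold_db + overshoot / ratio


# Threshold -20dB, input -10dB (10dB over the threshold):
for r in (2, 4, float('inf')):
    print(r, compressor_output_db(-10, -20, r))
# 2:1 -> -15.0 (5dB over), 4:1 -> -17.5, inf:1 -> -20.0 (held at threshold)
```

Setting `ratio` to `float('inf')` makes the overshoot term zero, which is the brick-wall case.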
This diagram shows the relationship between the input and output levels for three different compression ratios:
Release Time - Once the signal drops back below the threshold level, the gain reduction applied by the compressor is removed, at a speed set by the Release Time (the reverse of the attack procedure described above).
In this diagram - simulating the effect of a compressor on the level of playback for a single word - the signal as before exceeds the threshold between points A and B. The blue lines at the top and bottom of this diagram represent the compressor's control over the signal level, with time A to A' being the 'Attack Time' of the compressor, and B to B' the 'Release Time'.
Occasionally, if compression is removed too suddenly, there can be audible 'breathing' or 'pumping' in the signal, as any background noise is effectively turned back up too fast. For this reason release times are usually much longer than attack times, generally ranging from around 50ms to several seconds.
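One common way to realise attack and release in software is to smooth the compressor's gain reduction with two exponential (one-pole) filters: a fast one while the reduction is increasing and a slow one while it is being released. The sketch below assumes this approach and treats each time as a time constant (about 63% of the way to the target), which is only one of several conventions manufacturers use; the function name and the 1kHz sample rate are illustrative choices. The request for 10dB of reduction over the first 50 samples plays the role of points A to B in the diagram.

```python
import math


def smooth_gain_reduction(target_gr_db, sample_rate, attack_ms, release_ms):
    """Smooth a per-sample gain-reduction target (in dB, 0 = no reduction).

    Attack and release are treated as time constants: after one attack time
    the reduction has covered about 63% of the distance to its target.
    """
    a_att = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    gr = 0.0
    out = []
    for target in target_gr_db:
        # More reduction wanted: use the attack filter; less wanted: release.
        coeff = a_att if target > gr else a_rel
        gr = coeff * gr + (1.0 - coeff) * target
        out.append(gr)
    return out


fs = 1000  # deliberately low sample rate to keep the example short
target = [10.0] * 50 + [0.0] * 450  # 10dB requested for 50ms, then removed
gr = smooth_gain_reduction(target, fs, attack_ms=5, release_ms=100)
print(round(gr[4], 1), round(gr[49], 1), round(gr[149], 1))  # 6.3 10.0 3.7
```

The much slower release is what lets the reduction fade out gradually instead of snapping back, which is the behaviour recommended above for avoiding 'pumping'.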
Hard knee/Soft knee
Some processors allow for a softer form of compression, where compression is introduced gradually as the signal approaches the threshold, with full ratio gain reduction only being applied at a higher volume. This is called 'soft knee' compression, whereas 'hard knee' describes the process where compression at full ratio is applied to all signals exceeding the threshold, and not at all to those below.
This diagram shows the relationship between input and output levels for hard and soft knee settings:
Soft knee compression is often considered more natural sounding.
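A soft knee can be modelled by replacing the sharp corner of the static curve with a smooth transition. The sketch below uses one common quadratic formulation, with a knee of width `knee_db` centred on the threshold; real units may shape their knees differently, and the function name is hypothetical.

```python
def soft_knee_output_db(input_db, threshold_db, ratio, knee_db):
    """Static output level (dB) of a compressor with a quadratic soft knee.

    Compression begins gently at threshold - knee/2 and reaches the full
    ratio at threshold + knee/2. With knee_db = 0 this is the hard knee.
    """
    over = input_db - threshold_db
    if knee_db > 0 and abs(over) <= knee_db / 2:
        # Inside the knee: blend smoothly from 1:1 towards the full ratio.
        return input_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    if over <= 0:
        return input_db  # below the knee: untouched
    return threshold_db + over / ratio  # above the knee: full ratio
```

For example, with a -20dB threshold, 4:1 ratio and 8dB knee, an input at exactly the threshold comes out at -20.75dB rather than -20dB, because gentle compression has already begun.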
Make-Up Gain - As the compressor's effect is to reduce the level of loud portions of the signal, and leave quieter sections alone, its overall effect will be to reduce the level of the signal. To return the signal level to its former glory, 'Make-Up Gain' is applied to compensate for the volume reduction made by the compressor circuits.
Louder portions are returned to their original peak level by make-up gain. However, as the quieter sections also benefit from make-up gain, but have undergone no compression, the overall effect is to make the whole signal sound louder (or at least more even in level).
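The arithmetic of make-up gain can be sketched as follows. The sketch assumes float samples where 1.0 is 0dBFS, compresses each sample instantaneously with a hard knee (no attack or release, to keep the level arithmetic visible), and chooses the make-up gain so that a peak at 0dBFS returns to 0dBFS; the names are hypothetical. With a -20dB threshold and 4:1 ratio the peak is reduced by 15dB, so 15dB of make-up gain restores it, while the quiet samples, which were never compressed, end up 15dB louder.

```python
import math


def db(x):
    """Convert a linear amplitude to decibels (1.0 -> 0dB)."""
    return 20 * math.log10(x)


def compress_static(samples, threshold_db, ratio, makeup_db=0.0):
    """Hard-knee compression of each sample, followed by make-up gain.

    Attack and release are treated as instantaneous, so this shows only the
    level arithmetic, not realistic timing.
    """
    out = []
    for s in samples:
        level = db(abs(s)) if s != 0 else -math.inf
        over = level - threshold_db
        # Gain reduction (negative dB) only for the part above the threshold.
        gain_db = -over * (1 - 1 / ratio) if over > 0 else 0.0
        out.append(s * 10 ** ((gain_db + makeup_db) / 20))
    return out


threshold, ratio = -20.0, 4.0
peak_db = 0.0  # the loudest input sample sits at 0dBFS
makeup = (peak_db - threshold) * (1 - 1 / ratio)  # reduction at the peak: 15dB
signal = [0.1, 1.0, -0.1]  # -20dBFS, 0dBFS, -20dBFS
out = compress_static(signal, threshold, ratio, makeup)
print([round(db(abs(s)), 1) for s in out])  # [-5.0, 0.0, -5.0]
```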
Unlike many other forms of effects processing, dynamics processing is usually intended to be relatively unnoticeable, and the best compressors and limiters are often praised for their 'natural' or 'warm' sound. Sometimes the effect can be so transparent that the best way to gauge or measure a processor's behaviour is with visual metering:
The unit's Input Level meter shows the level of the signal before processing.
Gain Reduction Meter
The Gain Reduction meter shows how much the compressor is reducing the level by, and will therefore usually be orientated in the opposite direction to the input and output meters. This is usually the most useful meter for helping to choose the threshold and compression ratio settings, as variations in their settings will be clearly reflected by this meter's behaviour, with more activity for lower threshold settings and higher ratios.
The Output Level meter indicates the level being output from the unit after compression and make-up gain. If you aim to have this meter hitting the same peak levels as the Input Level meter, then your signal will occupy the same dynamic range, but with an increase in perceived level, as well as a more even and consistent sound.
Auditioning the effect of the processor
As already noted, the effect of a compressor can be quite subtle, and careful auditioning of the 'dry' and affected signal is recommended. Most dynamics processors and software plug-ins include an 'effect enable' or 'bypass' switch, for switching the effect in and out of the chain to allow easy comparison.
Bear in mind also that compression is not easy to 'undo' (except of course while still within a non-destructive audio editor!). Expansion can partially restore overcompressed dynamics, but not completely, so err on the side of caution if you are unsure.
Compression is one of the most widely used vocal effects in almost all audio fields. If applied correctly it can greatly improve the subjective quality of the recorded voice, and can enhance clarity and 'intimacy', particularly with the spoken word.
The key is to be subtle with compression: lower ratios (between about 2:1 and 5:1) tend to work better for sensitive voiceover applications, whereas higher ratios, while giving a greater increase in perceived loudness, can result in a loss of natural dynamics and the voice sounding 'squashed'. If you can consciously hear the compressor altering the level, it's probably doing too much.
Having said that, the use of heavier and heavier compression and limiting in the popular media and the music industry in recent years - in the unending quest to have the 'loudest voice in the crowd' - has made use of much more extreme settings acceptable, and your or your audience's preferences may differ from mine, so experiment to see what suits your material best!
'Breathing' and 'Pumping'
While the compressor's gain changes are driven by level variations in the input signal, any accompanying background sounds with more consistent levels will be attenuated along with it. As the compressor effectively turns the volume control up and down to suit the dynamic portions of the input material, the level of any steady background noise or other high frequency content rises and falls with it, which can be quite audible and distracting.
Within complex musical material, rhythmical loud noises over a consistent 'bed' of mixed frequencies can make a compressor alter the volume in a quite unnatural way, and lead to 'breathing' cymbals or 'pumping' bass.
Slower release times can help alleviate noticeable breathing, as can Noise Gating (see below), but this is another good reason for being careful to minimise background noise in the first place.
Alternatively Multi-band compression (see below) can allow separation and independent compression of the various frequency bands, largely avoiding the interdependency of level control which leads to these problems. This makes it the ideal tool for compressing more harmonically and dynamically complex material.
A limiter is a form of compressor, but its functions are more, well... limited. Its purpose is to prevent a signal from going above its threshold level, and set an absolute limit to its dynamic range.
Some digital audio interfaces incorporate limiters to prevent clipping of the input signal and associated digital distortion. Beware however of relying too heavily on a limiter to control signal volume, and using it as a substitute for following a careful level setting procedure, as its effect can become quite audible and sound rather unnatural if used too liberally. It is, however, a useful failsafe.
A limiter is also an invaluable tool when mastering, squeezing a little more perceived volume from your final mix while not exceeding available headroom. It will usually offer all or some of the same control parameters as a compressor, with the exception of the 'Ratio' control, which is fixed at ∞:1.
Some units - such as the Manley Variable Mu - blur the line between compressor and limiter by introducing a graduated response to input volume, applying softer compression at lower volumes, and hard limiting at high volume. This would be equivalent to a limiter, but with a very 'soft' knee, and the input/output volume response might look something like this:
To function as a perfect 'brick wall' limiter, where not even transients can exceed the threshold, a processor would have to offer true ∞:1 limiting, with an attack time of zero. To achieve the latter, an element of prescience is needed, as the volume level needs to be reduced effectively before the transient arrives to breach the threshold; this sounds logically impossible.
However, some digital processors and plug-in effects elegantly sidestep this seeming impossibility by introducing a very short and inaudible delay to the processed signal, and then analysing the real-time (non-delayed) version to decide when to apply limiting. This allows them to 'predict' when the threshold will be exceeded, and be ready and waiting with appropriate gain reduction when the delayed version arrives.
This process is appropriately called 'look-ahead' limiting, although it does not strictly speaking allow the processor to see into the future.
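The delay-and-predict idea can be sketched on a list of samples. This is a simplified illustration under stated assumptions: the threshold is a linear amplitude rather than dB, the look-ahead is given in samples, the ratio is ∞:1, and the gain for each output sample is the smallest value needed anywhere in the look-ahead window, with none of the gain smoothing a real limiter would add. The function name is hypothetical.

```python
def lookahead_limit(samples, threshold, lookahead):
    """Brick-wall (inf:1) peak limiter with look-ahead.

    threshold: linear amplitude ceiling, e.g. 0.5.
    lookahead: delay in samples. The output is the input delayed by this
    many samples, so the detector sees each peak before it is output.
    """
    # Gain each input sample would need to stay at or under the threshold.
    needed = [min(1.0, threshold / abs(s)) if s != 0 else 1.0 for s in samples]
    delayed = [0.0] * lookahead + list(samples)  # the audio path
    delayed_needed = [1.0] * lookahead + needed
    out = []
    for n in range(len(delayed)):
        # Gains needed from the sample being output now up to `lookahead`
        # samples ahead of it: the 'future' as seen by the detector.
        window = delayed_needed[n:n + lookahead + 1]
        out.append(delayed[n] * min(window))
    return out


print(lookahead_limit([0.1, 0.2, 0.9, 0.2, 0.1], threshold=0.5, lookahead=2))
```

In this run the gain drops two samples before the delayed 0.9 peak reaches the output, and the peak itself comes out at 0.5 (within floating-point rounding), so no transient slips past the threshold.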
An expander is in some ways the opposite of a compressor (as the name might imply). It shares all the same parameters, but applies gain reduction only when the signal level is below the threshold, making the quiet bits quieter still, and increasing dynamic 'contrast'. In this way it expands the dynamic range of the signal, and can be used to partially reverse the effects of over-zealous compression, thus:
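The expander's static curve mirrors the compressor's, acting below the threshold instead of above it. The sketch below expresses the ratio as a single number, where 2 means that each 1dB the input falls below the threshold becomes 2dB at the output (labelling such as 1:2 or 2:1 varies between manufacturers); the function name is hypothetical, and attack and release are ignored.

```python
def expander_output_db(input_db, threshold_db, ratio):
    """Static output level (dB) of a downward expander.

    Above the threshold the level is unchanged; below it, every 1dB the
    input falls produces `ratio` dB of fall at the output.
    """
    if input_db >= threshold_db:
        return input_db
    return threshold_db - (threshold_db - input_db) * ratio


print(expander_output_db(-10, -40, 2))  # -10 (above threshold: untouched)
print(expander_output_db(-50, -40, 2))  # -60 (10dB under becomes 20dB under)
```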
Once again, a gate will share the attack and release times and threshold control of the compressor, but this time will reduce the signal volume to zero once it drops below the threshold. Similarly to the relationship between limiter and compressor, a gate is effectively an expander with a fixed ∞:1 gain reduction ratio.
Gating is often called 'Noise Gating', as it is most commonly used to cut background hiss or other noise when the desired 'master' signal is not present.
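A bare-bones noise gate can be sketched by letting the gain head towards 1.0 while the signal is above the threshold and towards 0.0 while it is below, at the attack and release speeds respectively. This assumes float samples with 1.0 = 0dBFS and reads the level of each sample directly; practical gates usually add a smoothed level detector, a 'hold' time and a range control, all omitted here, and the function name is hypothetical.

```python
import math


def noise_gate(samples, sample_rate, threshold_db, attack_ms=1.0, release_ms=50.0):
    """Open (gain -> 1.0) above the threshold, close (gain -> 0.0) below it."""
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    a_att = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    gain, out = 0.0, []  # start closed
    for s in samples:
        if abs(s) >= threshold:
            gain = a_att * gain + (1.0 - a_att)  # open towards 1.0
        else:
            gain = a_rel * gain  # close towards 0.0 (silence)
        out.append(s * gain)
    return out
```

Low-level hiss that never reaches the threshold leaves the gate closed, while a louder signal opens it and the release time controls how smoothly it closes again afterwards.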
Side-chain & Ducking Compression
Sometimes, rather than a signal's level modulating its own gain, it is desirable to have it affect the level of an entirely different signal.
For example, it is common for a radio presenter's voice to be used as a trigger to automatically reduce the volume of any music playing, which will return to its previous level when the DJ stops talking, as this improves voice separation and intelligibility; similarly for television the voice-over is often used to attenuate the background music in wildlife documentaries etc.
To achieve this, many compressors feature a 'side chain' input, which is used to connect a feed from the signal you want to trigger the compressor. It is this side chain signal whose level is monitored against the 'Threshold' as described above, and compression is then applied to the signal passing through the compressor as before.
In this example the compressor reduces the volume of the background music automatically when its side-chain signal (in this case the voiceover) exceeds the threshold and triggers the gain reduction. A long release time is set so that the music fades back in again smoothly.
[Musical excerpt from 'Time To Pretend' - Stanley/Magee/Johns used with permission]
This effect can be subtle or noticeable - depending on desired results - and is referred to as 'ducking' or 'ducking compression'. One signal 'ducks' when the other is present, then re-emerges upon its departure.
The side chain signal is used purely for monitoring and analysis, and is not output from the audio outputs of the compressor, nor itself processed in any way.
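Ducking can be sketched as an ordinary compressor whose level detector listens to a second input. In this hypothetical `duck` function, the voiceover's level is compared against the threshold, and the resulting gain reduction, smoothed with a fast attack and a long release, is applied only to the music; the voice itself is analysed but never output. Reading the level sample by sample is a simplification, and the parameter defaults are illustrative assumptions.

```python
import math


def duck(music, voice, sample_rate, threshold_db=-30.0, ratio=4.0,
         attack_ms=10.0, release_ms=500.0):
    """Compress `music` using the level of `voice` as the side-chain input.

    Both inputs are equal-length lists of float samples (1.0 = 0dBFS).
    """
    a_att = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    gr_db, out = 0.0, []
    for m, v in zip(music, voice):
        # Side chain: measure the voice only.
        level_db = 20 * math.log10(abs(v)) if v != 0 else -math.inf
        over = level_db - threshold_db
        target = over * (1 - 1 / ratio) if over > 0 else 0.0  # dB of reduction
        coeff = a_att if target > gr_db else a_rel
        gr_db = coeff * gr_db + (1 - coeff) * target
        # Gain reduction is applied to the music, not to the voice.
        out.append(m * 10 ** (-gr_db / 20))
    return out
```

The long release default mirrors the example above, where the music is meant to fade back in smoothly once the voice stops.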
Another incarnation of side-chain compression is the process of 'de-essing', which is used to control excessive sibilance in a speaker's voice. Sibilant consonants - especially the letter 's' - can cause a microphone to emphasise the high frequencies in an unpleasant way, usually in the 5-10kHz region.
In this case the input to the side chain is a boosted and frequency-filtered copy of the signal being processed, centred on the problem frequency, so that the signal is attenuated only when over-sibilant 'esses' occur.
Setting up a de-esser is quite a difficult and specialised use of a compressor, and most audio software packages now include preset de-essers, which simply require 'tuning' to the problem frequency range of the particular speaker.
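A de-esser can be sketched along the same lines: a band-pass filter isolates the sibilant region in the side chain, an envelope follower tracks its level, and any gain reduction is applied to the full-band signal. The filter coefficients follow the band-pass (0dB peak gain) recipe from Robert Bristow-Johnson's 'Audio EQ Cookbook'; the 7kHz centre frequency, Q of 2 and other defaults are illustrative assumptions, and the function name is hypothetical. Lowering the threshold here has the same effect as boosting the side chain.

```python
import math


def de_ess(samples, sample_rate, centre_hz=7000.0, q=2.0,
           threshold_db=-30.0, ratio=4.0, attack_ms=1.0, release_ms=50.0):
    """Compress the whole signal, driven by a band-passed side chain."""
    # Band-pass biquad, constant 0dB peak gain (RBJ cookbook), normalised by a0.
    w0 = 2 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0

    a_att = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    env, out = 0.0, []
    for x in samples:
        # 1. Side chain: isolate the sibilant band.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        # 2. Follow the level of that band with attack/release smoothing.
        r = abs(y)
        coeff = a_att if r > env else a_rel
        env = coeff * env + (1 - coeff) * r
        # 3. Hard-knee gain law, applied to the full-band signal.
        level_db = 20 * math.log10(env) if env > 0 else -math.inf
        over = level_db - threshold_db
        gain_db = -over * (1 - 1 / ratio) if over > 0 else 0.0
        out.append(x * 10 ** (gain_db / 20))
    return out
```

A loud tone near the centre frequency is turned down, while an equally loud low-frequency tone barely registers in the side chain and passes through untouched.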
For more complex material - especially music - having the whole signal respond to its own overall level can make for an intrusive effect. For instance, if a musical piece includes a loud snare drum, then this will become the primary trigger for the compressor threshold, and will modulate the entire audio signal, when perhaps the desired effect is to compress only its own frequency band.
Many software audio environments now offer 'Multi-band' compression, where the signal is divided into two or more separate frequency bands which are then compressed individually, allowing (for example) the string orchestra to be treated independently of the kick drum.
Multi-band compression is usually favoured for mastering and final mix compression, and can be used to alter the balance between frequency bands dynamically, based on their content.
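Below is a minimal two-band sketch of the idea. The band split uses a one-pole low-pass filter, with the high band defined as the input minus the low band so that the two always sum back to the original; commercial multi-band units typically use steeper crossover filters (Linkwitz-Riley designs are common), so treat this as an illustration only. Each band then passes through its own compressor with its own envelope follower. Both function names are hypothetical.

```python
import math


def compress_band(band, sample_rate, threshold_db, ratio,
                  attack_ms=5.0, release_ms=100.0):
    """Single-band compressor: envelope follower plus hard-knee gain law."""
    a_att = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    env, out = 0.0, []
    for x in band:
        r = abs(x)
        coeff = a_att if r > env else a_rel
        env = coeff * env + (1 - coeff) * r
        level_db = 20 * math.log10(env) if env > 0 else -math.inf
        over = level_db - threshold_db
        gain_db = -over * (1 - 1 / ratio) if over > 0 else 0.0
        out.append(x * 10 ** (gain_db / 20))
    return out


def multiband_compress(samples, sample_rate, crossover_hz,
                       low_settings, high_settings):
    """Split into two bands, compress each independently, then recombine.

    low_settings / high_settings are keyword arguments for compress_band,
    e.g. {"threshold_db": -20.0, "ratio": 4.0}.
    """
    # One-pole low-pass for the low band; high band = input minus low band,
    # so the two bands always sum back to the original signal.
    k = 1 - math.exp(-2 * math.pi * crossover_hz / sample_rate)
    low, lp = [], 0.0
    for x in samples:
        lp += k * (x - lp)
        low.append(lp)
    high = [x - l for x, l in zip(samples, low)]
    low_c = compress_band(low, sample_rate, **low_settings)
    high_c = compress_band(high, sample_rate, **high_settings)
    return [l + h for l, h in zip(low_c, high_c)]
```

Because each band has its own detector, a loud bass note pulls down only the low band; quieter high-frequency material that stays below its own band's threshold passes through essentially unchanged.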
The separate channels of a stereo or surround sound signal should not be altered independently of each other, as this will lead to problems of phase, balance and imaging. Multiple monophonic channels of compression can be linked together by use of the 'Stereo Link' connector, where one channel will become the 'master' signal, and the other(s) the 'slave(s)'.
Most software environments now offer dedicated stereo or surround compressors, so nowadays this is less of a pressing issue, but care should still be taken not to apply mono compressors to separate stereo/surround outputs without linking, even if all their settings are identical.
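Linking can be sketched by sharing the detector signal between channels. Rather than a master/slave pair, this hypothetical `compress_stereo` function drives both channels from the louder of the two at each sample, which is one common way of linking. With `linked=False` each channel reacts only to itself, so a loud left channel would be turned down while the right is left alone, shifting the stereo image.

```python
import math


def compress_stereo(left, right, sample_rate, threshold_db, ratio,
                    attack_ms=5.0, release_ms=100.0, linked=True):
    """Two-channel hard-knee compressor with an optional stereo link."""
    a_att = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))

    def follow(env, level):
        coeff = a_att if level > env else a_rel
        return coeff * env + (1 - coeff) * level

    def gain(env):
        level_db = 20 * math.log10(env) if env > 0 else -math.inf
        over = level_db - threshold_db
        return 10 ** (-over * (1 - 1 / ratio) / 20) if over > 0 else 1.0

    env_l = env_r = 0.0
    out_l, out_r = [], []
    for l, r in zip(left, right):
        if linked:
            det_l = det_r = max(abs(l), abs(r))  # shared detector signal
        else:
            det_l, det_r = abs(l), abs(r)  # independent detectors
        env_l, env_r = follow(env_l, det_l), follow(env_r, det_r)
        out_l.append(l * gain(env_l))
        out_r.append(r * gain(env_r))
    return out_l, out_r
```

In linked mode both envelopes receive identical input, so identical gain is applied to left and right and the balance between them is preserved.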