Choosing an Audio Interface - Technical Considerations
Explanation of the various specifications quoted for computer audio interfaces, and their relevance to functionality and sound quality.
When comparing different audio interfaces, many specifications will be listed for their capabilities, limits, and audio performance. These will all to a greater or lesser extent affect the suitability of an interface to a specific project or group of projects.
The quality and performance of the analogue to digital (AD) converters will be the most critical factor in the sound quality of your recordings, and for archiving duties they should meet the recommendations of the International Association of Sound and Audio Visual Archives (IASA).
The capabilities of an interface to provide live monitor mixes to recordists and recorded subjects, as well as outputs suitable for your chosen monitoring system, are also of great importance, especially where an interface will be used for recording performance, voiceover etc.
To assess the implications of your project objectives for your choice of interface, refer to our companion advice document Choosing an Audio Interface - Project Requirements.
Audio converters fall into two types: Analogue to Digital (AD), which convert the analogue electrical version of soundwaves from a microphone or amplifier into digital data, and Digital to Analogue (DA), which do the reverse. Each process presents different problems to designers, including quantisation noise, clock jitter, oversampling etc. Understanding the niceties of converter design is not necessary for making an informed choice of interface (though it may help you make sense of some of the more gnomic performance statistics quoted for high performance units), but it is important to be aware that AD/DA conversion is not a 'standard' procedure. Designers and manufacturers approach its various challenges in different ways, and these differences can sometimes have a significant impact on the audible results of the conversion process.
Most interfaces offer both AD and DA conversion. 'One-way' converters are usually either esoteric high-end units, or built into dedicated recording or playback devices (e.g. USB microphones).
There are many options for the resolution at which a waveform is digitised, and this resolution defines the quality 'ceiling' for its subsequent digital lifetime. For uncompressed PCM (Pulse Code Modulation) capture - such as encoding to wav or aiff, which is what we are interested in here - there are two important user-selectable variables:
1. Sample rate
AD converters work by repeatedly measuring the level of an analogue waveform, and presenting these readings as a succession of levels expressed as binary numbers. The 'sample rate' determines how many times per second the signal is measured, and is expressed in hertz (Hz, times per second) or more commonly kilohertz (kHz, thousands of times per second). The standard sampling rate for CD, for example, is 44.1kHz, meaning that the waveform level is measured 44,100 times per second.
This Audacity screenshot shows the 48kHz sampling points for a 440Hz tone, with the timeline (measured in seconds) at the top - thus in the first .001 seconds, the level is sampled 48 times:
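As the screenshot itself is not reproduced here, the same arithmetic can be checked with a minimal Python sketch. It assumes a unit-amplitude 440Hz sine and simply lists the time and level of every sample a 48kHz converter would take in the first 0.001 seconds; the function name is illustrative only.

```python
import math

SAMPLE_RATE = 48_000   # samples per second (48kHz)
TONE_HZ = 440.0        # test tone frequency, as in the screenshot
WINDOW_S = 0.001       # first 0.001 seconds of the timeline

def sample_points(sample_rate, freq, duration):
    """Return (time, level) pairs for a unit-amplitude sine, one pair per sample."""
    n_samples = round(sample_rate * duration)
    points = []
    for n in range(n_samples):
        t = n / sample_rate                       # time of the nth sample, in seconds
        level = math.sin(2 * math.pi * freq * t)  # instantaneous level at that moment
        points.append((t, level))
    return points

points = sample_points(SAMPLE_RATE, TONE_HZ, WINDOW_S)
print(len(points))  # 48 level readings in the first millisecond
```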
The converter's sampling rate determines the range of frequencies it can represent in the digital bitstream sent to the computer. A signal can only be captured accurately if it is sampled at more than twice its highest frequency, so logic might suggest that a sampling rate of a little over double the highest audible frequency (about 20kHz) would allow reproduction of all audible frequencies. This limit, associated with the Swedish-born engineer Harry Nyquist, was the thinking behind the choice of 44.1kHz as the sample rate for CDs.
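The Nyquist limit is simply half the sample rate. A tone above that limit which reaches the converter unfiltered does not vanish: it 'folds back' and appears as a false lower frequency (aliasing), which is why converters need steep filters. This minimal sketch assumes an ideal sampler and no filtering at all; the 30kHz example tone is hypothetical.

```python
def nyquist(sample_rate_hz):
    """Highest frequency that can be represented without aliasing."""
    return sample_rate_hz / 2

def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency at which an unfiltered input tone would appear after ideal sampling."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

for rate in (44_100, 48_000, 96_000):
    print(rate, "Hz sampling -> Nyquist limit", nyquist(rate), "Hz")

# Hypothetical 30kHz tone reaching a 44.1kHz converter with no anti-aliasing filter
print(alias_frequency(30_000, 44_100))  # 14100: it would be heard as a 14.1kHz tone
```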
In practice, however, other artefacts of the digital conversion process, and the practical limits of filter design, mean that improvements in fidelity are often reported when the sample rate is raised to 96kHz and beyond. 48kHz is the minimum rate advised for archiving and critical music recording, but 96kHz is becoming a de facto standard and is recommended where possible.
Most other educational recordings, however, and spoken word recordings in general, will usually be perfectly well served by a sample rate of 44.1 or 48kHz, especially if the final delivery format is expected to be a compressed one.
2. Bit depth
Whereas sample rate determines how often the level of the input signal is measured (i.e. the sampling frequency), bit depth determines the accuracy of each level reading, by setting the number of discrete subdivisions within the measurement scale.
Again, the original 16-bit depth of CDs - offering 2^16 (65,536) discrete level values - was expected to offer sufficient dynamic range to capture and accurately reproduce the human listening experience. In practice, however, most listeners find that a 24-bit depth - 2^24 (16,777,216) levels - offers significant improvements. The theoretical dynamic range of 24-bit conversion, 144dB, exceeds both the current limits of circuit design and the tolerance of human hearing, whereas the maximum 96dB dynamic range of 16-bit conversion compares unfavourably on both counts.
5-bit sampling, giving 2^5 (i.e. 32) level values
For all critical audio work 24-bit depth is preferred, and for archiving it is essential.
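The figures quoted above follow from simple arithmetic: an N-bit converter has 2^N levels, and its theoretical dynamic range is the ratio of full scale to one quantisation step, 20 × log10(2^N) dB, or roughly 6dB per bit. The minimal Python sketch below checks this and also quantises one cycle of a sine wave to 5 bits, as in the illustration. The simple rounding quantiser is an assumption for illustration; real converters also use techniques such as dither and noise shaping.

```python
import math

def levels(bits):
    """Number of discrete level values available at a given bit depth."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical ratio of full scale to one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)

def quantise(x, bits):
    """Map a level in [-1.0, 1.0] to the nearest of 2**bits evenly spaced integer codes."""
    steps = 2 ** bits - 1
    return round((x + 1.0) / 2.0 * steps)   # integer code in 0 .. steps

for bits in (5, 16, 24):
    print(bits, levels(bits), round(dynamic_range_db(bits), 1))
# 5 32 30.1
# 16 65536 96.3
# 24 16777216 144.5

# One cycle of a sine, finely sampled, collapses onto just 32 distinct 5-bit codes
one_cycle = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
codes = {quantise(x, 5) for x in one_cycle}
print(len(codes))  # 32
```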
[Note: Bit depth of an uncompressed PCM format (wav, aiff etc) should not be confused with the bit rate of an MP3 or compressed file, which denotes data throughput rate, and is not the same thing, though it is related]
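The relationship between the two can be shown with a short calculation: the bit rate of uncompressed PCM is simply sample rate × bit depth × number of channels. This minimal sketch applies that formula to CD audio and to a 96kHz/24-bit stereo file; the function name is illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bits per second of an uncompressed PCM stream."""
    return sample_rate_hz * bit_depth * channels

cd = pcm_bit_rate(44_100, 16, 2)
archive = pcm_bit_rate(96_000, 24, 2)
print(cd / 1000, "kbit/s")       # 1411.2 kbit/s
print(archive / 1000, "kbit/s")  # 4608.0 kbit/s
```

By comparison, a typical MP3 bit rate of 128 or 320kbit/s is achieved by discarding information, not by reducing bit depth.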
The conversion process can produce small amounts of distortion in the signal - obviously an undesirable addition. This distortion is measured at different signal levels and quoted in unit specifications as THD+N (Total Harmonic Distortion + Noise) and IMD (InterModulation Distortion). The IASA's full recommendations for THD+N and IMD performance can be found in their Guidelines on the Production and Preservation of Digital Audio Objects.
All analogue circuitry will inevitably add some noise - no matter how small - to a signal passing through it. However, the signal amplifiers/attenuators and other analogue elements of an audio interface should add as little of this noise as possible. The ratio between the maximum signal capacity of the device and its own self-noise is called the signal to noise ratio, and defines its dynamic range.
The IASA recommend a minimum dynamic range of 115dB unweighted, 117dB A-weighted - full details as above.
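Decibel figures can hide how large these ratios really are. The minimal sketch below converts a dB figure back into a voltage ratio and computes a signal to noise ratio from two RMS voltages. It does not model A-weighting, which would require filtering the noise before measuring it; the function names are illustrative.

```python
import math

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a voltage (amplitude) ratio."""
    return 10 ** (db / 20)

def snr_db(max_signal_rms, noise_rms):
    """Signal to noise ratio in dB from two RMS voltages (unweighted)."""
    return 20 * math.log10(max_signal_rms / noise_rms)

# 115dB means the loudest signal is roughly 562,000 times the noise voltage
print(round(db_to_amplitude_ratio(115)))  # 562341
```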
Clocking and jitter
The stability of the clock which provides the reference timeline to which the sampling rate is locked can also significantly affect the objective accuracy and subjective sound quality of the recorded signal.
The degree by which the clock wanders in and out of absolute timing accuracy is called 'jitter', which results in phase noise. Most high-end interfaces will boast clocks with very low jitter, with resultant improvements to spectral purity.
The IASA recommend internal clock accuracy of better than ±25ppm, and jitter of <5ns.
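To put these figures in context, this minimal sketch converts a clock accuracy in parts per million into a sample-rate error and a drift over a recording, and applies a commonly used approximation for the best signal to noise ratio achievable when a full-scale sine is sampled with random (white) clock jitter: SNR ≈ -20 × log10(2π × f × jitter). That approximation is an idealisation, and real converters differ in how their clock circuitry handles jitter; the function names are illustrative.

```python
import math

def frequency_error_hz(ppm, nominal_hz):
    """Worst-case sample-rate error for a clock accurate to +/- ppm."""
    return nominal_hz * ppm / 1_000_000

def clock_drift_seconds(ppm, duration_s):
    """Worst-case timing error accumulated over duration_s by a clock off by ppm."""
    return duration_s * ppm / 1_000_000

def jitter_limited_snr_db(signal_hz, jitter_rms_s):
    """Approximate SNR ceiling for a full-scale sine sampled with random clock jitter."""
    return -20 * math.log10(2 * math.pi * signal_hz * jitter_rms_s)

print(frequency_error_hz(25, 48_000))   # 1.2 (Hz away from 48,000Hz)
print(clock_drift_seconds(25, 3600))    # 0.09 (seconds over an hour)
print(round(jitter_limited_snr_db(10_000, 1e-9)))  # roughly 84dB for a 10kHz tone, 1ns jitter
```

The jitter approximation also shows why jitter matters most at high frequencies: doubling the signal frequency lowers the ceiling by about 6dB.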
[See Connectivity>Wordclock below for related information]
There will always be a small delay between a sound entering a digital audio system and that sound becoming available at the system's outputs for monitoring. This delay, referred to as latency, arises because the system needs time to convert the signal from analogue to digital data (and vice versa). Ideally latency will be as small as possible.
Latency is not generally a property of a particular computer system or audio software, nor is it largely affected by them (although they can have a marginal effect); rather, it is primarily determined by the audio interface.*
A latency of 10 milliseconds or less is generally considered to be imperceptible by the human brain, even for critical music uses, and will allow a recording artist to monitor their signal in real time through the digitisation system, with no sensation of an 'echo'. This is critical for 'live' recording applications, where a perceptible delay between speaking/playing, and hearing the signal back in headphones can make performance awkward or impossible.
Latency is the time required by the interface to AD/DA convert a specified number of samples, and therefore (possibly counter-intuitively) working at higher sample rates, while placing greater demands on the audio system, will lower latency, as this fixed number of samples will be processed more quickly. For example, at 48kHz a 512-sample buffer takes 10.67ms to process, whereas at 96kHz it takes 5.33ms. This is a further benefit of higher sample rate recording.
Most systems will allow you to choose the buffer size (in samples), which determines the system latency. However, if this value is set too low the workstation will not be able to process the audio quickly enough, and the signal will begin to break up and stutter.
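The buffer arithmetic above can be reproduced with a minimal Python sketch: the time taken to fill one buffer is its size in samples divided by the sample rate. The helper that finds the largest buffer within a latency target is hypothetical, and the list of candidate sizes is an assumption (typical power-of-two options). Actual round-trip latency is usually higher, because input and output buffering and the converters' own filter delays add together.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Time, in milliseconds, to fill one buffer of buffer_samples at the given rate."""
    return 1000 * buffer_samples / sample_rate_hz

def largest_buffer_under(limit_ms, sample_rate_hz,
                         candidates=(32, 64, 128, 256, 512, 1024)):
    """Largest candidate buffer size whose single-buffer latency stays within limit_ms."""
    fitting = [b for b in candidates if buffer_latency_ms(b, sample_rate_hz) <= limit_ms]
    return max(fitting) if fitting else None

for rate in (48_000, 96_000):
    print(rate, round(buffer_latency_ms(512, rate), 2))  # 10.67 at 48kHz, 5.33 at 96kHz

print(largest_buffer_under(10, 48_000))  # 256
print(largest_buffer_under(10, 96_000))  # 512
```

Larger buffers give the computer more time and so reduce the risk of stuttering; the sketch shows that at 96kHz a buffer twice as large fits within the same latency target.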
[* A separate form of latency occurs within the software environment: the delay caused by applying digital processing within the digital realm, which is small compared with the latency of the AD/DA conversion process. While this latency has implications for digital processing and mixing, it is not relevant in the context of choosing an audio interface, and its effects are covered elsewhere, in our guides to Digital Processing]
A recorded subject will very often need to hear their performance through headphones during recording: to judge their use of a microphone or the tone of their voice, to hear their voice superimposed onto an existing backing track over which they are being overdubbed, or simply for reassurance. This is called their 'monitor mix'. Though some inexperienced performers find it disconcerting, headphone monitoring is often unavoidable, so extra care should be taken in setting up monitor mixes.
As explained above, some combinations of hardware and software can result in latency sufficient to introduce a noticeable delay between the 'live' input signal and the version output from the system for monitoring. This delay can be very off-putting for artists and performers, making performance difficult or impossible, especially if their performance includes percussive or rhythmic elements, where the problem is exacerbated.
To avoid these undesirable effects of latency, many interfaces offer a 'direct monitoring' facility, whereby a copy of the input signal can be routed directly to a monitor output, before digital conversion, and mixed with any existing track(s), thereby allowing latency-free monitoring.
Additionally, the monitoring needs of performer and recording engineer are quite different: the sound engineer needs to monitor the recorded signal during recording to ensure (primarily) consistency of signal and correct level adjustment, and additionally to monitor the effects of any pre-recording processing - e.g. compression and equalisation. A performer may however wish for a different balance between themselves and any backing tracks, to enable them to hear themselves better, and direct monitoring is a simple way of achieving this.
This monitoring section (from the M-Audio Fast Track Pro) shows several options:
- Stereo or mono switching
- Headphone output
- Headphone level control
- Switch to select different monitor outputs for audition direct to headphones
- Direct Monitor mix control, crossfading between PlayBack (PB) from the computer and direct signal from the inputs (IN)
- Monitor output level control
The only caveat when using direct monitoring is of course to ensure that the playback coming from the computer does not already include a monitor mix, as this will be superimposed onto the direct monitor mix, leading to phasing or flanging between the two identical but time-delayed signals. Software monitoring should be disabled if direct monitoring is being used.
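The phasing or flanging effect has a simple cause: when a signal is mixed with an equal-level copy of itself delayed by a time τ, frequencies at odd multiples of 1/(2τ) arrive exactly out of phase and cancel, producing a 'comb' of notches across the spectrum. This minimal sketch lists those notch frequencies for a hypothetical 10ms software monitoring delay; it assumes equal levels, whereas unequal levels give dips rather than full cancellation.

```python
def comb_notches_hz(delay_ms, max_hz=20_000):
    """Frequencies cancelled when a signal is summed with an equal-level copy delayed by delay_ms."""
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) * 1000 / (2 * delay_ms)   # odd multiples of 1 / (2 * delay)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches

notches = comb_notches_hz(10)  # hypothetical 10ms delay between direct and software monitor paths
print(notches[:3])   # [50.0, 150.0, 250.0]
print(len(notches))  # 200 notches below 20kHz
```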
When using direct monitoring the sound engineer can of course still route the monitor signal from the digital audio workstation to an alternative audio output, and then to headphones or speakers in a separate room to the performer, to enable real-time monitoring of the recorded digital file, and this is recommended where possible.
Direct monitoring via a software control panel
As a further refinement of direct monitoring, some more powerful interfaces also allow the user to route the input signal via a software mixer and its control panel, directly to the output. By routing the monitor signal before it passes into the DAW recorder this method allows for very low latency monitoring, while still allowing software control and recall of monitoring parameters and monitor mix settings - the best of both worlds.
This is such a control panel for the Apogee Duet:
Output from the Duet interface to headphones consists of a mix of playback from the audio recording software and the live input signals, which can be balanced with this simple mixer to provide a monitor mix.
Duplex refers to the ability of an interface to transfer signals in both directions simultaneously (i.e. input and output). The term originates in the early days of electronic communication, when often only one party could transmit at a time (half duplex), as is still the case with CB radio. By contrast the telephone is a full duplex device, allowing both parties to speak at the same time.
In the context of an audio interface, full or half duplex denotes its ability either to handle input and output signals simultaneously (full duplex), or only one or the other at a time (half duplex).
Almost all interfaces now offer full duplex recording/playback as standard, but some older models of soundcard do not. Any soundcard failing this test should be considered unsuitable for use in a modern digital audio environment.
For more background on the theory behind these and other aspects of Digital Audio, refer to our Introduction to Digital Audio.
The most common connections found on audio interfaces.
Refer to our guide to Audio Visual Signal Types and Interconnects for further details.
- Balanced XLR at +4dBu/-10dBV line level (optionally offering microphone gain and +48V phantom power)
- Balanced/unbalanced TRS 1/4" jack
Balanced connectors are generally considered superior for analogue connections, as the balanced input largely cancels out noise picked up in the cable run. Both 1/4" TRS jack and XLR connectors allow signal balancing, and are used in professional equipment. Balanced TRS inputs will also accept unbalanced mono TS jacks, though the connection is then unbalanced and loses this noise rejection. Some units will additionally allow switching between -10dBV (consumer) and +4dBu (professional) I/O levels.
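Two of the ideas above can be made concrete in a minimal sketch. First, the professional and consumer line levels use different references (0dBu ≈ 0.775V RMS, 0dBV = 1V RMS), so +4dBu and -10dBV are about 11.8dB apart. Second, a balanced line carries the signal in opposite polarity on its two conductors, while interference is picked up equally on both; subtracting one leg from the other at the input therefore cancels the interference. The noise cancellation model here is idealised (perfectly matched conductors), and the sample values are hypothetical.

```python
import math

DBU_REF_V = math.sqrt(0.6)  # 0dBu: 1mW into 600 ohms, about 0.775V RMS
DBV_REF_V = 1.0             # 0dBV: 1V RMS

def dbu_to_volts(dbu):
    return DBU_REF_V * 10 ** (dbu / 20)

def dbv_to_volts(dbv):
    return DBV_REF_V * 10 ** (dbv / 20)

pro = dbu_to_volts(4)         # about 1.228V RMS
consumer = dbv_to_volts(-10)  # about 0.316V RMS
print(round(pro, 3), round(consumer, 3), round(20 * math.log10(pro / consumer), 1))  # 1.228 0.316 11.8

def balanced_receive(hot, cold):
    """Idealised differential input: output is the difference between the two legs."""
    return [h - c for h, c in zip(hot, cold)]

signal = [0.0, 0.5, 1.0, 0.5, 0.0]   # hypothetical audio samples
hum = [0.2, -0.1, 0.3, 0.0, -0.2]    # interference induced equally on both conductors
hot = [s / 2 + n for s, n in zip(signal, hum)]    # half the signal, normal polarity
cold = [-s / 2 + n for s, n in zip(signal, hum)]  # half the signal, inverted polarity
out = balanced_receive(hot, cold)    # recovers the signal; the common hum cancels
```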
Microphone input(s) - if offered - should be three-pin XLR, the standard microphone connector. If use with professional condenser microphones is anticipated, you should ensure that these inputs can also supply +48V phantom power.
- RCA Phono
RCA phono plugs are a consumer connector, and are generally less robust. Having only two terminals, they offer no signal balancing capability, and they normally operate at the consumer line level of -10dBV.
Some interfaces offer the ability to apply RIAA equalisation to their phono inputs, allowing direct connection of a turntable, which otherwise requires an external phono preamp.
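RIAA equalisation reverses the treble boost and bass cut applied when records are cut. Its playback curve is defined by three standard time constants (3180µs, 318µs and 75µs). This minimal Python sketch evaluates that curve relative to 1kHz; it covers only the equalisation, not the extra gain that a dedicated phono stage also provides for low-output cartridges.

```python
import math

# Standard RIAA playback time constants, in seconds
T1 = 3180e-6   # bass turnover, about 50Hz
T2 = 318e-6    # about 500Hz
T3 = 75e-6     # treble turnover, about 2122Hz

def riaa_playback_gain(freq_hz):
    """Complex gain of the RIAA playback (de-emphasis) curve, unnormalised."""
    jw = 1j * 2 * math.pi * freq_hz
    return (1 + jw * T2) / ((1 + jw * T1) * (1 + jw * T3))

def riaa_playback_db(freq_hz, ref_hz=1000.0):
    """Playback gain in dB relative to the gain at ref_hz (conventionally 1kHz)."""
    return 20 * math.log10(abs(riaa_playback_gain(freq_hz)) / abs(riaa_playback_gain(ref_hz)))

for f in (20, 100, 1000, 10_000, 20_000):
    print(f, "Hz:", round(riaa_playback_db(f), 1), "dB")
```

The output shows the shape of the curve: bass frequencies are boosted by close to 20dB and the highest treble frequencies cut by a similar amount.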
- Stereo headphone jack (1/4" or 3.5mm mini-jack)
Headphone outputs are usually also on 1/4" TRS jack, wired in a standard stereo configuration, but may occasionally use a 3.5mm mini TRS jack.
- S/PDIF co-axial stereo (RCA phono)
- S/PDIF optical stereo (TOSLINK)
Many interfaces offer a variety of digital connectors. S/PDIF is the most common type, available on either optical TOSLINK or coaxial RCA phono connectors (the coaxial version uses 75 ohm cable, which differs from typical analogue phono cables). Although the optical and coaxial connections cannot be connected to each other directly, because they use fundamentally different technologies, they carry similar data and can be cross-converted with dedicated S/PDIF converter units. S/PDIF carries a stereo signal at up to 24 bits; many devices support sample rates above 48kHz, but some (particularly older units or optical connections) are limited to 48kHz, so check the specifications of both devices.
- AES/EBU stereo (XLR)
The professional stereo digital standard is AES/EBU, which uses a three-pin XLR connection. AES/EBU can also be converted to SPDIF (and vice versa) with small and relatively inexpensive dedicated units.
- ADAT 8 channel optical
- T-DIF 8 channel D-SUB
The optical TOSLINK connector is also used for multichannel digital connection by Alesis's proprietary ADAT format, while Tascam's 8 channel T-DIF protocol uses a 25-pin D-SUB connector. Each carries 8 channels of up to 48kHz 24-bit digital audio, and can only be connected to other similarly equipped units or converters.
The inputs on the audio interface will determine the number and types of signals which can be recorded into the DAW at the same time. If you are using multitrack audio software (e.g. Logic, Cubase, Reaper etc) you may be able to assign more than the standard two inputs for simultaneous recording, and this will be required whenever multiple channels are recorded at once.
A multi channel interface (combined with suitable software) enables signals to be simultaneously recorded onto separate software tracks, for balancing and mixing at a later point; this enables the recordist to delay the critical balancing of relative levels of various elements until such a time as a considered judgement based on careful and repeated critical listening can be made.
If your interface has fewer inputs than you need for all the sources you want to record, the various signals can be mixed with an external mixing desk and fed to the recorder as a stereo 'submix', but the relative levels of those elements cannot then be altered afterwards. When recording multiple sources simultaneously, multitrack recording is therefore almost always preferable.
If you are relying on the interface inputs to accept direct microphone input, you should also assess the quality of microphone amplification which they offer, which can vary widely. Listening tests, A/B comparison and reading reputable reviews will all assist in making these judgements. For example some units offer valve mic preamps, to add 'warmth' to the signal, whereas others will boast a 'transparent' or uncoloured sound. Your choice of preamplification will be a matter of personal taste.
Many studio-grade condenser microphones require 48V 'phantom power', which is supplied by the microphone preamplifier. If you intend to do serious voice recording you should ensure that your microphone inputs offer phantom power, to allow connection of professional mics.
Level management at all stages of the recording chain is vital to ensuring good recording quality, using the available headroom of your system and minimising background noise. Analogue inputs on your interface should offer sufficient metering for your needs, and those with adjustable input levels particularly so. If connecting microphones and setting their levels, a multi-stage LED meter or a VU meter (one with a needle) will allow much more accurate settings and result in better quality.
The interface's software control panel may offer additional metering, which should be cross-checked with analogue meters to avoid distortion or insufficient level in the input signal(s).
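Software meters typically report levels in dBFS (decibels relative to digital full scale, where 0dBFS is the highest level that can be recorded). This minimal sketch computes the peak and RMS level of a block of floating-point samples, assuming full scale is 1.0. Hardware and software meters also apply their own ballistics and integration times, which are not modelled here.

```python
import math

def peak_dbfs(samples):
    """Peak level of floating-point samples (full scale = 1.0) in dBFS."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)

def rms_dbfs(samples):
    """RMS (average) level of floating-point samples in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return float("-inf") if rms == 0 else 20 * math.log10(rms)

# One cycle of a sine peaking at half of full scale, i.e. leaving about 6dB of headroom
tone = [0.5 * math.sin(2 * math.pi * n / 48) for n in range(48)]
print(round(peak_dbfs(tone), 1), round(rms_dbfs(tone), 1))  # -6.0 -9.0
```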
Multiple outputs (i.e. more than two) from an interface will usually serve one of two purposes:
- To allow multiple signals to be externally routed through 'outboard' processors and/or mixed with an external mixing desk.
This practice is increasingly rare as the quality and power of software mixing increase, and in the majority of scenarios it is all but obsolete. It remains relevant mostly in commercial music recording and large-scale live sound, neither of which we address here.
- To feed multiple sets of stereo monitoring speakers, or to enable...
Surround Sound monitoring
5.1 Surround Sound uses 6 speakers or channels - Left, Right, Centre, Left Surround, Right Surround (rear speakers) and LFE (Low Frequency Effects, or Sub), i.e. 5 surround or 'satellite' channels + 1 sub channel, hence the name 5.1 [note: the less common 7.1 surround sound similarly uses 7 surround channels + 1 sub channel]. Each of these speakers requires an individual, discrete signal from the digital workstation. For monitoring in surround, your interface should have at least 6 outputs; many offer 8, to allow for a pair of stereo monitors in addition to the surround outputs, or for 7.1 surround.
If surround mixing or monitoring figures heavily in your workflow, you should consider an interface designed with surround applications in mind. Some units offer a dedicated set of 5.1 surround outputs in addition to stereo and headphone outputs, and a dedicated master surround sound volume control.
Most DAW applications will have a surround mode where the various channels of the 5.1 or 7.1 mix can be routed to appropriate outputs of the interface.
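Counting the outputs a surround setup needs amounts to assigning each channel of the mix to a numbered interface output. This minimal sketch does so for 5.1 or 7.1 plus optional stereo monitor pairs. The channel orderings shown are an assumption (a common SMPTE/WAV-style convention); check the routing conventions of your own DAW and interface, and note that the function and monitor names are hypothetical.

```python
# Assumed channel orderings; DAWs and interfaces may use different conventions
LAYOUTS = {
    "stereo": ["L", "R"],
    "5.1": ["L", "R", "C", "LFE", "Ls", "Rs"],
    "7.1": ["L", "R", "C", "LFE", "Lb", "Rb", "Ls", "Rs"],
}

def assign_outputs(layout, extra_stereo_pairs=0):
    """Map each channel of the layout (plus any extra stereo pairs) to a 1-based output number."""
    channels = list(LAYOUTS[layout])
    for pair in range(extra_stereo_pairs):
        channels += [f"Mon{pair + 1}L", f"Mon{pair + 1}R"]
    return {name: out for out, name in enumerate(channels, start=1)}

routing = assign_outputs("5.1", extra_stereo_pairs=1)
print(len(routing))  # 8 outputs: 5.1 surround plus one stereo monitor pair
print(routing)
```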
Audio interfaces can be connected to their host computers in a variety of ways, each of which offers advantages and disadvantages in terms of ubiquity, compatibility, and bandwidth:
PCI slots are often found on the motherboard of a 'tower' or 'desktop' computer. These are generally accessed via the rear panel, and may be covered with a blanking plate if not yet used. Consult your computer's documentation or your institution's ICT Support Team to establish how many PCI slots it has, and of what kind (PCI, PCI-X and PCI Express slots are not all compatible with the same cards).
PCI expansion chassis are also available, allowing the connection of PCI peripherals to computers with no PCI slots of their own (e.g. laptops), or with too few to accommodate all the desired devices.
A computer's PCI slots offer wide-bandwidth data transfer to the processor. This bandwidth is required for smooth audio and video streaming, and at one time all audio and video capture cards were of the PCI type. Many video capture cards remain PCI or PCI-X based, as the bandwidth demands of video are still beyond Firewire, but most new audio interfaces exploit the newer Firewire or USB2 protocols and connectors.
Like PCI, Firewire also offers wide bandwidth, and its capacity is sufficient for almost all audio needs. Though Firewire does not quite match the PCI bus for data throughput rate, many manufacturers have moved production away from PCI units towards Firewire, due to its more portable nature and less labour-intensive installation, combined with its ability still to accommodate many simultaneous input and output channels at high sample rates and bit depths.
Firewire is available in two variations - 400 and 800. Firewire 800 offers double the bandwidth of Firewire 400. Many recent high end units (e.g. Apogee Duet and Ensemble, or RME Fireface) suitable for archiving and other highly critical work now use the Firewire bus for connection.
Unlike PCI slots, Firewire and/or USB ports are found on many laptops, and the connectors are small, easily accessible and 'hot swappable' - i.e. they can be connected and disconnected without shutting down the computer.
Many consumer and semi-professional audio interfaces use USB to connect to the host computer. USB is perfectly sufficient for high sample rate/bit depth two-channel audio, and can often offer real-time low latency monitoring of a stereo signal. Furthermore, USB2 has a higher nominal bandwidth than Firewire 400 (480Mbit/s against 400Mbit/s), although differences in the way it structures and transfers data mean that its sustained throughput is usually lower in practice.
If you work with stereo inputs and outputs a USB interface will make a cost effective solution. However, if you require multitrack input or output (including high resolution 5.1 surround monitoring) we would still recommend a Firewire audio device.
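A rough sense of scale can be had by comparing the raw audio data of a multichannel session with each bus's nominal signalling rate. This minimal sketch does that for a hypothetical 8-in/8-out session at 96kHz/24-bit. The nominal rates are headline figures only; the calculation ignores protocol overhead, driver behaviour and how reliably the stream is scheduled, which in practice account for much of the difference between buses.

```python
# Nominal signalling rates in Mbit/s; usable throughput is lower in practice
NOMINAL_MBPS = {"USB 2.0": 480, "FireWire 400": 400, "FireWire 800": 800}

def stream_mbps(channels, sample_rate_hz, bit_depth):
    """Raw audio payload in Mbit/s for the given number of channels."""
    return channels * sample_rate_hz * bit_depth / 1_000_000

def duplex_mbps(inputs, outputs, sample_rate_hz, bit_depth):
    """Combined payload when inputs and outputs run simultaneously (full duplex)."""
    return stream_mbps(inputs + outputs, sample_rate_hz, bit_depth)

need = duplex_mbps(8, 8, 96_000, 24)  # hypothetical 8-in/8-out session
print(need)  # 36.864 (Mbit/s of raw audio)
for bus, rate in NOMINAL_MBPS.items():
    print(bus, f"{need / rate:.1%} of nominal rate")
```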
The PCMCIA card slot - which features on many laptops - has a surprisingly wide bandwidth, and a few good quality interfaces are made which exploit this often underused bus. The RME Cardbus technology is noteworthy in that it was one of the first systems to bring professional audio recording quality to mobile systems.
RME Cardbus PCMCIA interface
USB and Firewire can deliver a limited amount of low voltage power through their cable. Many smaller units can draw their modest power requirement through this connection, obviating the need for a separate mains power supply. If you intend to use this feature in an 'unplugged' location recording situation with a laptop however, consider the effect it may have on the battery life of the host machine, and conduct pre-event tests before committing, to ensure that the chosen computer can provide sufficient power for itself and bus powered peripheral(s) for a sufficient period without connection to the mains.
The use of +48V phantom power for condenser microphones connected to the interface will place still greater demands on the power supply. Some bus-powered interfaces supply phantom power at a reduced voltage or current to conserve power, which may not suit every microphone, so you should conduct 'live' testing with all relevant peripherals in place to gauge both microphone performance and battery life.
More advanced interfaces may include a Wordclock input/output on BNC connectors. The wordclock is the absolute timing reference for the digital audio system, and its accuracy and synchronicity can affect the quality of the recorded signal, as sampling rates will be derived from it.
Reference units - such as those made by RME, Prism and Apogee - are equipped with very accurate clocks, which can serve as the master timing reference for the host computer and for any units with a wordclock slave input, potentially improving quality.
High-end workstations and studios may use a separate wordclock generator (e.g. Apogee Big Ben) as their house master clock, and slave all interfaces, CPUs and other externally clockable digital processors from it.
System compatibility and drivers
An interface's quoted System Requirements will tell you the minimum specification required of the computer to which it will be connected, and often a suggested specification for optimal performance. Check that current software device drivers are available for your combination of operating system and interface or - in the case of Mac OS X - whether compatibility is built in to the operating system, as is often the case.
Intermittent or elusive problems with an existing interface can often be solved by updating old device drivers.
In addition to the standard 'Sounds' or 'Audio MIDI Setup' control panels or preference panes of your operating system, more advanced soundcards will usually provide their own expanded software interface, allowing access to parameters and features not present in the generic system profile. Levels for individual inputs and outputs can often be adjusted and signals re-routed, and some include plug-in effects and mixing facilities.
High end interfaces increasingly access crucial settings through this software interface, so take into account features beyond those immediately apparent from the unit's physical controls, and familiarise yourself with the complete feature sets of your various alternatives.
If you would like some guidance and suggestions for models suitable for your specific project(s), please refer to the attached comparison chart of suggested units, or contact us for specific recommendations based on your exact requirements.