AAC Audio and the MP4 Media Format

Last updated: 12 February 2010
Published in: Creating new digital media | Managing your digital resources |
Tags: audio | bit depth | codec | compression | conversion | delivery | digitisation | dvd | file formats | finding audio | metadata | mobile | music | open source | software | sound recordings | standards | surround sound | video |

Summary

A guide to the creation and use of AAC compressed audio resources. AAC is the successor to the popular MP3 format, and this document explains its advantages over MP3, as well as its place within the wider MPEG-4 media family.

Introduction

Advanced Audio Coding (AAC) is a powerful and flexible standard for compressing, encoding and delivering audio - similar to MP3, but superior to it - and is suitable for use in a wide variety of scenarios in teaching and learning. Forms of AAC file are widely used for online delivery of audio, as well as forming part of several video and multimedia standards. AAC offers significant reduction of audio file size while still retaining good sound quality, and as such has become the current recommended method of lossy audio compression, superseding MP3 - its direct predecessor.

Defined by the Moving Picture Experts Group (MPEG) as part of the MPEG-2 and MPEG-4 standards for compressing digital media, AAC has several distinct advantages over MP3. Though MP3 still has a high profile, a strong presence in the digital audio market and a huge user base, there is a slow market shift to the superior AAC format. With the increased global use of other types of MPEG-4 media across all sectors of education, as well as the internet as a whole, this trend is anticipated to continue and expand.

AAC forms an integral part of the later MPEG standards for digital multimedia items, and though we won't go into too much detail here beyond their audio implementation, much of this information is also relevant within the wider context of video and multimedia.

Overview

This document aims to cover the salient points of AAC which affect the user interested in creating, encoding, tagging and using AAC and MP4 audio files. Some technical knowledge will be useful for later sections.

We'll give a potted history of the format to put it into context, and try to summarise its most important features and common forms before introducing some of the technology behind it. AAC can be used at varying levels of complexity, and in this document we will start with the basics and then progress to more advanced implementations of the standard:

AAC Basics examines the case for AAC audio, and looks at the way that Advanced Audio Coding analyses, simplifies and shrinks the audio file, and the raw output of this process.

Working with AAC will give you the knowledge you need to use and manipulate AAC audio files, encode your own audio as a listenable AAC, and to create AAC audio resources suitable for most commercial and educational uses. It also explains the variations of file type and extension which you may encounter when using AAC audio.

AAC and MPEG-4 multimedia resources looks at more advanced audio options. It also offers an overview of how AAC fits into the wider MPEG-4 multimedia family, and how it is embedded into more complex multimedia resources. This is aimed at those wanting more in-depth knowledge of AAC, and is suitable for users needing more technical detail.

AAC Key Facts

  • Successor to MP3 and Dolby AC-3
  • Audio component of MPEG-2 and MPEG-4 multimedia standards
  • Sample rates 8kHz - 96kHz
  • Bitrates 16kbps - 320kbps (stereo)
  • Up to 48 independent AAC audio channels within MP4 container
  • Common AAC audio file extensions include .aac .mp4 .m4a and .m4p

History

The AAC format was developed by a group including Fraunhofer IIS (the authors of the MP3 format), Dolby, AT&T, Sony and Nokia, and was standardized by MPEG in 1997, as Part 7 of the MPEG-2 standard (ISO/IEC 13818-7). Until that point MPEG-2 had (like MPEG-1) used MP3 as its compressed audio format - albeit in a slightly updated form. AAC improved on MP3 in all areas - sound quality, coding efficiency and audio features - but at the cost of backward compatibility. AAC audio is not readable by MP3 or MPEG-1 codecs.

AAC has since become the default compressed audio format for MPEG-2 and MPEG-4, and is used for compressing the audio elements of multimedia MP4 resources.

Apple's iTunes Store was the first major commercial supplier of online audio to embrace AAC as its default audio format on its inception in 2003, and this has played a significant role in popularising AAC and shaping its development as a consumer audio format.

AAC basics

When to use AAC

If you use audio in teaching and learning in almost any capacity - either on its own, as a podcast, or as a part of video, screencast or other mixed media materials - then you should be aware of AAC as a method of compressing your audio. Reducing the size of your audio files makes them faster to download or stream, and AAC currently offers the best means of doing so. While not yet the ideal tool for all scenarios, it is very flexible and growing in usage.

As part of the MPEG-4 standard, AAC is applicable across a broad range of digital audio environments. It is related to both MP3 and Dolby AC-3 formats, and is envisaged to replace both in time, as well as enabling compressed stereo and multi-channel audio in MP4, M4A and other formats. Video users will find much of relevance here to inform the audio side of their projects, which will often feature AAC encoded sound.

AAC audio requires a compatible codec for the final user to be able to listen to it, and whilst AAC is an integral element of a major international standard, there are one or two developers who still provide only patchy support for MP4 (both audio and multimedia) - most notably Microsoft. See Compatibility.

How Advanced Audio Coding works

Like MP3, AAC encoding reduces the complexity of the audio signal by progressively simplifying or removing peripheral elements of the signal. The audio is split into various frequency bands and psycho-acoustic modelling is used to determine which bands can be least noticeably reduced in complexity, thus reducing the amount of data needed to express them in digital form, and thereby the total size of the file.

This process can be applied with varying strength, depending on the desired 'bitrate' of the target file, measured in kilobits per second (kbps) - ie how many thousand binary bits are used to store each second of audio. Higher compression ratios (ie lower bitrates and smaller files) require more alteration of the original audio, and a corresponding reduction of sound quality.
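The relationship between bitrate and file size is simple arithmetic, and a rough sketch may help when planning storage or download times. The function name and the one-hour example are our own illustration; the figure ignores container overhead and metadata, so real files will be slightly larger.

```python
def estimated_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate audio payload size: bits per second x seconds, divided by 8."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8


# A one-hour recording at two bitrates (illustrative values only).
for kbps in (64, 128):
    size_mb = estimated_size_bytes(kbps, 60 * 60) / 1_000_000
    print(f"{kbps} kbps for 1 hour is about {size_mb:.1f} MB")
```

Halving the bitrate halves the size, so the hour of audio comes to about 28.8 MB at 64kbps and about 57.6 MB at 128kbps.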

Archival Suitability

Any compression method which alters the audio signal to achieve filesize reduction - termed 'lossy' compression - is deemed unsuitable for archiving of sensitive audio material. This includes AAC, which despite its subjective transparency at higher bitrates is still discarding parts of the original signal.

For standards-compliant sound archiving, Broadcast WAV format should be used in accordance with the guidelines laid out by the IASA. However, if standards compliance and absolute fidelity are not required for your archiving needs, or there is insufficient storage available for the much larger uncompressed BWAV files, then you may want to consider AAC as the overall best currently available lossy compression method.

As part of a major open standard, support for AAC audio decoding and MP4 file management is anticipated to be long term.

See Uncompressed Audio File Formats for information about the choice and use of archival formats.

AAC vs MP3

The psycho-acoustic encoding principle used by AAC is similar to that used by MP3, but AAC uses an improved implementation of the psycho-acoustic encoding model and more efficient analysis and encoding, and will therefore yield better sound quality than MP3 at the same bitrate.

Though there is broad consensus that AAC improves on MP3, there is debate about the degree to which it does so. Different encoders will give different comparative results, as will tests using different sample groups and source material, so it is not possible to give concrete figures for respective bitrates at which the two formats give equivalent sound quality. A bitrate of 128kbps is often considered sufficient for a stereo AAC to attain 'transparency' (ie indiscernible from the original by the average listener), and is roughly comparable to MP3 at 160 or 192kbps. In the words of one notable developer:

"AAC is developed by same commitee as MP3 (MPEG) and by 5 most important companies in audio coding field (AT&T, FhG IIS, Sony, Nokia and Dolby) - it solves many of the issues MP3 had as a standard (like bad stereo coding) and, in general, is about 30% more efficient than MP3. This means that 96 kbps AAC file sounds as good as 128 kbps MP3 file. Furthermore, some samples that were almost impossible to code with MP3 sound very good even at medium-bit rate AAC." - Ivan Dimkovic - designer, Nero AAC encoder

AAC does have a few other quantifiable advantages over MP3. Most significantly it is capable of storing up to 48 channels of synchronous audio, compared to MP3's 2 channels. This makes it ideal for compressing surround sound mixes and other multi-channel material. A combined bitrate of 320kbps is sometimes cited as a good 'rule of thumb' for transparent 5.1 surround AAC compression. The potential of AAC for multi-channel support within the MP4 container is however well beyond 5.1 or indeed any current standard surround sound format, and therefore offers significant future scalability.

Also the metadata capabilities of MP4, while perhaps lacking the user-friendliness and simplicity of MP3's ID3 tagging system, are more flexible and standards-compliant. See below for further details.

Raw AAC audio stream - the .aac file

In its simplest form, the AAC encoding process produces an AAC audio stream. This stream consists of a series of 'frames' of compressed audio data, each of which contains the audio data itself, and technical descriptive information to allow correct playback. This stream of data - or 'bitstream' - containing the various frequency bands is decoded, and the different elements of the audio signal are recombined at the time of playback.
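To make the idea of a frame carrying its own 'technical descriptive information' more concrete, here is a minimal sketch that decodes a frame header. It assumes the stream uses ADTS framing (the usual framing for .aac files, though this document does not name it) and reads only a few of the header fields; the function and dictionary key names are our own.

```python
# Sampling rates selected by the 4-bit sampling frequency index in the header.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(header):
    """Decode selected fields from the first 7 bytes of an ADTS frame."""
    if len(header) < 7:
        raise ValueError("an ADTS header is at least 7 bytes long")
    b = header
    # Every frame starts with a 12-bit syncword of all ones.
    if b[0] != 0xFF or (b[1] & 0xF0) != 0xF0:
        raise ValueError("syncword 0xFFF not found")
    sf_index = (b[2] >> 2) & 0x0F
    if sf_index >= len(SAMPLE_RATES):
        raise ValueError("reserved sampling frequency index")
    return {
        # Stored as object type minus one; 2 means AAC-LC.
        "object_type": ((b[2] >> 6) & 0x03) + 1,
        "sample_rate": SAMPLE_RATES[sf_index],
        # Channel configuration: 1 = mono, 2 = stereo, 6 = 5.1.
        "channel_config": ((b[2] & 0x01) << 2) | (b[3] >> 6),
        # Length of the whole frame in bytes, header included.
        "frame_length": ((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5),
    }


# A hand-built header for a 371-byte AAC-LC stereo frame at 44.1kHz.
example = bytes([0xFF, 0xF1, 0x50, 0x80, 0x2E, 0x7F, 0xFC])
print(parse_adts_header(example))
```

Because each header states its own frame length, a player can step from one frame to the next without any external index, which is why a bare .aac file is playable despite carrying no descriptive metadata.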

Many media players can decode this bitstream, and render audio from it. However, a 'raw' AAC file contains no information about the file's contents - other than the file name and the technical playback data contained in the frame headers. Most commonly (and more flexibly) AAC audio is packaged in a 'container' file with descriptive data, and sometimes images, video and/or text. These are all variations of MP4 container, recognisable by the wide range of software and devices capable of MP4 playback.

Working with AAC

Audience

AAC does not yet have the public profile of MP3, and may be seen as less accessible by some less technical users who are comfortable with MP3. You may want to consider this if using AAC, M4A or MP4 for delivery.

However, while AAC is not as ubiquitous as MP3 there are no impediments to its free use by all users, and there are several complete open-source solutions. Though not all operating systems offer MP4 capabilities 'out of the box', there are free and/or open source MP4 players available for all platforms - see Playing and managing AAC and MP4.

The MP4 container

The current AAC standard is a subsection of the MPEG-4 standard. The official MPEG-4 filetype - MP4 - can be seen as the 'parent' of AAC, and in most cases when you purchase or download an 'AAC audio' file from the internet, what you are getting is an MP4 file (albeit sometimes going by another name). This MP4 will contain primarily AAC audio, but also additional data to describe and possibly enrich it, and may have been given a non-standard file extension to denote its audio bias, and to flag it as such for particular player and/or library applications.

Whereas MP3 audio was extracted from the MPEG-1 standard for use in its own right as a .mp3 audio file, and packaged with its own ID3 metadata (not a part of the MPEG-1 standard), AAC is hardly ever delivered to final users in its 'raw' form. Rather, for delivering AAC audio it is recommended that the MP4 container is used, to allow standardized metadata describing the audio to be included, as well as accompanying text (lyrics etc) and images (cover art etc).

Until loaded into a media player application or inspected with a content management system however, the contents of an MP4 file can be rather mysterious, as the generic file extension .mp4 gives no clue as to its exact contents, which can include a very wide range of possibilities. The first large-scale supplier of audio-only MP4/AAC - Apple - therefore gave the audio MP4 files which they supplied new file extensions - .m4a and .m4p - to denote audio-specific resources. These filetypes also benefited from a default metadata schema suitable for audio, within Apple's iTunes software.

Playing and managing AAC and MPEG-4

Any software designed to play back MP4 in any form should be able to decode AAC audio, and there are free and/or even open-source players and encoders available for all platforms. That said, there are inevitable variations in implementation and codecs which may mean that your users need to install or update audiovisual playback software to give full MP4 and/or AAC compatibility. There are clearly too many potential system variations to enumerate each solution here, but these are the broad guidelines:

Browser

In-browser MP4 or AAC playback generally requires Quicktime or Flash plug-in, available free of charge for a wide range of browsers on all platforms.

Some new websites written in the latest version of HTML - HTML5 - are able to embed MP4 content without the need for an embedded player or plug-in. This should not by any means be assumed though, as this technology was first demonstrated only a short while before this document's publication!

Windows

Native Windows support for MP4 is incomplete, but can be expanded with plug-ins etc.

"The MPEG-4 file format as defined by the MPEG-4 specification contains MPEG-4 encoded video and Advanced Audio Coding (AAC)-encoded audio content. It typically uses the .mp4 extension. Windows Media Player does not support the playback of the .mp4 file format. You can play back .mp4 media files in Windows Media Player when you install DirectShow-compatible MPEG-4 decoder packs." - extract from Information about the Multimedia file types that Windows Media Player supports Microsoft 2008

Alternatively the VLC and Foobar open source players both give good MP4 and AAC support on Windows PCs. Of the two, Foobar offers better library management and embedded metadata inspection for a library of files if you are working primarily with M4A or AAC audio, whereas VLC is aimed more at video playback and encoding. Quicktime and iTunes are also available free for Windows, and give good playback and metadata compatibility, particularly with Apple's MP4 variants. iTunes offers excellent AAC encoding and tagging features.

Mac

Given that the MP4 container was based very closely on the existing Apple Quicktime MOV container format, and that M4A and M4V are the default formats of the iTunes store, native Mac support for MPEG-4 is predictably good. Quicktime and iTunes can play all types of MP4 file, and both can encode video and audio in MP4 and AAC. Both are pre-installed on all recent Macs. VLC is available for OSX, and offers features absent from earlier free versions of Quicktime.

All Apple applications offering AAC encoding/decoding will access the Quicktime codec to do so.

Linux

In keeping with the open source ethic of Linux, there are several open source audio and media players available for the platform. VLC and xine offer multimedia playback, and Juk allows flexible organisation and inspection of the audio library.

Variations and implementations

File types using AAC audio

  • .aac - 'raw' AAC audio data stream - contains vital playback information (frame bitrate, sampling frequency, stereo mode etc) but no descriptive metadata
  • .mp4 - the official file extension for MPEG-4 - can contain any combination of video, audio, images, text and/or data
  • .m4a - audio-only version of mp4. Standards compliant and can be played by most mp4 capable media players. Functionally identical to mp4 but usually used for audio-only files.
  • .m4v - MP4 video file. Adds some features in Quicktime (including AC-3 audio support)
  • .m4p - 'Protected AAC Audio' version of mp4 - used only by Apple, and playable only on iTunes and authorised devices (mostly Apple iPod). Included proprietary DRM and therefore not MPEG-4 standards compliant. Phased out in 2007 and now obsolete, though still playable by Apple devices and software
  • .m4r - ringtone format - a size-limited form of MP4 with reduced features
  • .3gp - a form of MP4 optimised for use on mobile devices

Note: This is not an exhaustive list. While all of these types of file can contain AAC audio, it is not a requirement for any except the .aac file itself and Apple's proprietary variation .m4p; video files do not of course have to contain any audio at all, but if they do then MPEG-4 can incorporate audio in formats other than AAC - compressed, uncompressed, stereo or surround - so the presence of AAC should not be assumed.

Metadata

MP4 does not have a standardized metadata schema like the ID3 system used by MP3s, but rather stores its descriptive metadata as MPEG-4 metadata atoms. While this makes it difficult to specify a standard set of descriptive parameters for audio MP4s it gives immense flexibility to the data which the producer or user can choose to package with the audio.

Your choice of encoder/library tool (iTunes, Foobar etc) will influence the viewable and writeable fields offered to you, as each can theoretically implement its own set of tags; in practice, Apple's M4A tag set from iTunes offers a flexible and popular schema readable by the greatest range of devices and software, and iTunes is a good tool for tagging your AAC audio MP4s.

Alternatively, the open-source and cross-platform metadata parser Atomic Parsley offers low-level inspection of MP4 metadata.

The iTunes M4A schema

M4A audio downloaded from or encoded with Apple iTunes has a specific set of metadata values:

4char code    Name                      Class/Flag    Data type     Appearance
©alb          Album                     1             text          iTunes 4.0
©ART          Artist                    1             text          iTunes 4.0
aART          Album Artist              1             text          unknown
©cmt          Comment                   1             text          iTunes 4.0
©day          Year                      1             text          iTunes 4.0
©nam          Title                     1             text          iTunes 4.0
©gen | gnre   Genre                     1 | 0 [1]     text | uint8  iTunes 4.0
trkn          Track number              0             uint8         iTunes 4.0
disk          Disk number               0             uint8         iTunes 4.0
©wrt          Composer                  1             text          iTunes 4.0
©too          Encoder                   1             text          iTunes 4.0
tmpo          BPM                       21            uint8         iTunes 4.0
cprt          Copyright                 1             text          ? iTunes 4.0
cpil          Compilation               21            uint8         iTunes 4.0
covr          Artwork                   13 | 14 [2]   jpeg | png    iTunes 4.0
rtng          Rating/Advisory           21            uint8         iTunes 4.0
©grp          Grouping                  1             text          iTunes 4.2
stik          Media Type (?)            21            uint8         unknown
pcst          Podcast                   21            uint8         iTunes 4.9
catg          Category                  1             text          iTunes 4.9
keyw          Keyword                   1             text          iTunes 4.9
purl          Podcast URL               21 | 0 [4]    uint8         iTunes 4.9
egid          Episode Global Unique ID  21 | 0 [4]    uint8         iTunes 4.9
desc          Description               1             text          iTunes 5.0
©lyr          Lyrics                    1 [3]         text          iTunes 5.0
tvnn          TV Network Name           1             text          iTunes 6.0
tvsh          TV Show Name              1             text          iTunes 6.0
tven          TV Episode Number         1             text          iTunes 6.0
tvsn          TV Season                 21            uint8         iTunes 6.0
tves          TV Episode                21            uint8         iTunes 6.0
purd          Purchase Date             1             text          iTunes 6.0.2
pgap          Gapless Playback          21            uint8         iTunes 7.0

[1] Genre comes on 2 atoms - standard genres are on gnre; custom genres are on ©gen; only 1 is permitted at a time.
[2] Cover art is the only atom that permits more than 1 data child atom. If there is a limit, it's > 16.
[3] Lyrics is the only text atom that doesn't fall under a 255 byte limit.
[4] Apple changed from the original 21 to the current 0 around the release of iTunes 6.0.3

'uint8' = 8-bit unsigned integer (ie a numerical value from 0-255)

Schema courtesy of Atomic Parsley @ sourceforge

This schema has been adopted by some other suppliers of M4A, and is the set of identifiers readable and/or writeable if you encode or manage your AAC audio with iTunes.
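As a small illustration of how a tagging or library tool might use this schema, the sketch below maps a handful of the four-character codes from the table to their readable names. The dictionary holds only a subset, and the function name and sample values are hypothetical; codes not in the subset are passed through unchanged.

```python
# A subset of the four-character codes listed in the iTunes M4A schema.
ITUNES_TAG_NAMES = {
    "©nam": "Title",
    "©alb": "Album",
    "©day": "Year",
    "©wrt": "Composer",
    "trkn": "Track number",
    "covr": "Artwork",
    "pgap": "Gapless Playback",
}


def describe_tags(tags):
    """Return {readable name: value}; unknown codes keep their raw code as the key."""
    return {ITUNES_TAG_NAMES.get(code, code): value for code, value in tags.items()}


# Hypothetical tag values, as a tool might read them from an M4A file.
print(describe_tags({"©nam": "Lecture 1", "©alb": "Physics 101", "xyz1": 3}))
```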

See AAC and MP4 multimedia resources for more information on metadata atoms and structure.

DRM, Apple and AAC

Apple were one of the first commercial suppliers of AAC audio, and remain by far the largest, via their iTunes Store. Their interpretation and use of the standard has therefore in many ways shaped its development as a consumer audio format.

Since its inception in April 2003, Apple's iTunes Store has used AAC as its default audio format. However, between 2003 and 2007 Apple sold audio which, though encoded in AAC form, was packaged in a slightly altered form of MPEG-4 file, incorporating their proprietary 'FairPlay' technology, which was intended to prevent copying and piracy. These files were called "Protected AAC" by Apple, and given the file extension .m4p. While protecting copyright holders' interests, this Digital Rights Management (DRM) also rendered the files incompatible with other manufacturers' software and hardware players, restricting M4P users to Apple's iTunes player and iPod portable devices (or one of only three licensed third party mobile devices made briefly by Motorola).

Since 2007 Apple have phased out FairPlay on the iTunes Store in most countries, and it now supplies AAC audio in unprotected M4A format, which is fully MPEG-4 compliant and compatible with other manufacturers' products. The M4A file extension is now used simply to identify the file as one formatted with Apple's own implementation and schema of MP4 tagging and metadata.

The .m4a file extension has become the de facto label for audio MPEG-4 and consumer AAC files, and is now widely used and accepted beyond Apple. When creating AAC audio resources either an mp4 or m4a extension will give compatibility with most media players.

How to make an M4A

Encoder choice

While many applications offer AAC encoding, most incorporate one of the small number of core AAC codecs available:

  • Apple software uses Quicktime's AAC encoder. When encoding or transcoding to AAC, iTunes offers all of the encoding options described below, plus the ability to tag the resultant M4A file with a wide range of metadata, as well as attaching lyrics, artwork and pictures etc. iTunes is proprietary software whose Macintosh version is built into Mac OSX. The Windows version is available as a free download for Windows XP/Vista/7. There is no Linux version.

For Windows or Mac OSX, iTunes probably offers the best balance of cost (free), good user interface, flexibility, quality of results and standards compliance if working with AAC audio in M4A and MP4 forms.

  • The Nero AAC codec is also available as a free download, and can operate as a stand-alone command-line codec with no graphical user interface, or integrated into a media player such as Foobar. While giving equally excellent quality results in either implementation, the command-line interface with its lack of GUI and inability to transcode or tag its AAC audio makes Nero a more limited solution for handling AAC encoding and decoding when used in isolation, and it is recommended that it is used within a host application.
  • The Freeware Advanced Audio Coder and Decoder (FAAC and FAAD) are free and open-source codecs used by several applications to deal with their AAC encoding and decoding requirements, including VLC and MPlayer. However, the age of the codec and its lack of development to address features of later AAC implementations mean that the quality of results is poor when compared to Quicktime or Nero.
  • Adobe Media Encoder performs AAC encoding duties for all Adobe Creative Suite applications and delivers excellent sound quality.

Types of AAC

There are several different 'profiles' for AAC encoding, which are suited to particular applications - voice compression, high ratio compression, high fidelity etc. These will be automatically selected by most encoders based on your choice of encoding options.

  • AAC-LC - Low Complexity profile, suitable for higher compression ratios.
  • AAC-LD - Low Delay profile to minimise encoding and decoding delay - used for real-time applications: telephony etc.
  • HE-AAC - High Efficiency profile, introduced to the standard in 2003. It builds on the AAC-LC profile, adding Spectral Band Replication (SBR) to improve coding efficiency. Updated in HE-AAC v2 to include Parametric Stereo (PS) technology. SBR and PS both reconstruct parts of the signal from compact parametric descriptions rather than coding them directly.

Different encoding software will offer some or all of these options when encoding an AAC audio file:

Bitrate

The number of binary values which will be used to encode each second of audio - eg 128kbps (kilo-bits per second) = 128,000 bits per second. This bitrate is shared according to the number of simultaneous audio channels. Guide values:

  • <64kbps - average to low quality - some audible degradation - acceptable for non-critical streaming audio etc
  • 64-96kbps - good quality - no noticeable compression artefacts - fine for most vocal uses
  • 128-160kbps - high quality - transparent to most listeners - suitable for music and sensitive audio
  • 160kbps+ - indistinguishable from original to most listeners
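Since the total bitrate is shared between channels, it can be useful to see what a given setting means per channel. The sketch below assumes an even split, which is a simplification: real encoders allocate bits unevenly (for example, a surround mix's low-frequency channel needs far fewer). The function name is our own.

```python
def average_kbps_per_channel(total_kbps, channels):
    """Average share of the total bitrate per channel, assuming an even split."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    return total_kbps / channels


print(average_kbps_per_channel(128, 2))            # stereo at 128kbps
print(round(average_kbps_per_channel(320, 6), 1))  # 5.1 surround counted as six channels
```

On this rough measure, the 320kbps rule of thumb for 5.1 surround mentioned earlier works out at a little over 53kbps per channel, somewhat less than the 64kbps per channel of a 128kbps stereo file.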

Stereo Mode

Rather than recording left and right channels independently, thus doubling the amount of data, a single 'master' channel can be recorded, and the other expressed in terms of its difference from it. This is called 'Joint Stereo' mode. In this way stereo signals exhibiting less stereo 'width' can be more efficiently encoded.

Encoders usually offer the ability to import as either mono or stereo AAC, and if stereo mode is selected then the AAC encoder algorithms will automatically apply whichever joint stereo technique will offer greatest data savings for each frequency band.
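The principle can be sketched with one common joint stereo technique, mid/side coding, in which the 'master' channel is the average of left and right and the second channel is half their difference. This is a simplified illustration on raw sample values, with function names of our own; a real AAC encoder makes this choice per frequency band on the transformed signal, not on the samples themselves.

```python
def encode_mid_side(left, right):
    """Convert left/right samples to a mid (average) and side (half-difference) pair."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Recover the original left/right samples from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A narrow stereo signal: left and right are almost identical.
left = [100, 200, -50, 0]
right = [100, 198, -52, 0]
mid, side = encode_mid_side(left, right)
print("mid: ", mid)
print("side:", side)
```

When the two channels are similar, the side values stay close to zero and so need very little data to encode, which is the saving described above.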

Variable Bitrate (VBR)

Because each frame of AAC audio specifies that frame's bitrate, the bitrate can be varied between frames. While there were inconsistent results and playback support for VBR with MP3, implementation of VBR in AAC encoding has been much improved, and all AAC players will support it. It is therefore recommended where offered.
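With VBR the figure usually quoted for a file is an average across its frames. The sketch below estimates that average from a list of frame sizes. It assumes each frame represents 1024 samples per channel, which is the AAC-LC frame size; the function name and the frame sizes in the example are made up for illustration.

```python
SAMPLES_PER_FRAME = 1024  # assumption: AAC-LC frame size


def average_bitrate_kbps(frame_sizes_bytes, sample_rate):
    """Average bitrate of a run of frames whose sizes vary from frame to frame."""
    if not frame_sizes_bytes:
        raise ValueError("at least one frame is required")
    total_bits = sum(frame_sizes_bytes) * 8
    total_samples = len(frame_sizes_bytes) * SAMPLES_PER_FRAME
    bits_per_second = total_bits * sample_rate / total_samples
    return bits_per_second / 1000


# Four frames of differing size at 48kHz: busier passages get larger frames.
print(average_bitrate_kbps([400, 200, 300, 300], 48000))
```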

High Efficiency (HE)

High Efficiency encoding is an improvement to the original Low Complexity (LC) AAC encoding process, and was introduced to the standard in 2003. Again, unless you have a specific reason not to use HE it should be selected if offered.

Voice Optimised

Some encoders offer an option to 'Optimise for Voice'. This will usually mean that the encoder algorithm will allocate greater resolution to frequency bands in the vocal range. If your material is primarily voice-based, you should select this option when encoding.

AAC and MP4 multimedia resources

MP4 anatomy - boxes in boxes

The MP4 file acts as a container for many different types of media files. As a container, an MP4 is like a big box, into which smaller boxes are placed, which will in turn contain more and more smaller boxes. Each box is labelled according to an agreed system to identify its contents; the label primarily says what kind of information is in the box (including other boxes), how to read it, and how much of it there is. Thus an MP4 is like lots of carefully labelled nested boxes.

One of these boxes contains all of the information about the audio properties of the MP4 file, including details of which version and implementation of AAC codec has been used to encode the audio (if AAC is present), how it is presented in the main multiplexed data stream,  and metadata describing its contents and technical specifications.

Other boxes contain similar information for the visual and other elements of the MP4.

These boxes are also referred to as 'atoms', and there is a list of registered atom identifiers, each of which is a set of four of the first 256 Unicode characters.

Once these atoms have all described their respective areas of the MP4's contents, a large box follows which contains the multiplexed data stream itself, which the player will now know how to decode. This is where the actual audio, video and other media streams are stored as a single multiplexed stream (see 'Multiplexing and boxes' below)

Boxes, atoms, child atoms and metadata

MP4 has open-ended metadata architecture suitable and scalable for all types of media description data.

At the beginning of an MP4 file is a box labelled 'moov'. This box contains detailed descriptions of all the various elements of the multiplexed media datastream - video, audio, image and text items etc. This information is all broken down into a hierarchy and packaged by sections - the aforementioned 'boxes'.

Here the Quicktime heritage of the MP4 container is of particular importance; the Quicktime container's equivalent of 'boxes' are called 'atoms', and their behaviour is detailed within the very comprehensive Quicktime File Format Specification. MP4 boxes are functionally identical to Quicktime atoms, and share the same labelling system. The 'box within a box' in Quicktime parlance is called a 'child atom', which can itself contain further child atoms, in an ever decreasing series, at the end of which is the atom which contains no further child atoms, but only its own data. This is termed a 'leaf atom'.

The MPEG-4 standard stipulates that an atom can contain either more atoms or atom data, but not both. In practice this rule is often broken, but this is contrary to strict MPEG-4 specifications.

There are several good analogies for the hierarchical structure of the MP4 file - nested boxes, branches, stems and leaves, etc. - but one of the best ways to get a working understanding is to download and install the aforementioned Atomic Parsley parser, print out their list of common identifiers, and inspect some of your own MP4 or M4A files.

A simplified visual representation of an MP4

This diagram shows how the boxes or atoms within an MP4 file might fit inside each other. A real MP4 will contain many more boxes, so this is simplified for clarity.

Note that the last 'mdat' box is left open-ended; in an average video MP4 the mdat box will occupy over 99.9% of the space of the entire file!

[Diagram: boxes or atoms nested inside one another within an MP4 file]

Important boxes

Some of the registered atom identifiers used to label some of the more important boxes, and brief descriptions:

ftyp - describes the file type and compatibility of the MP4. Always present - a top level atom which always comes first.

moov - found at the beginning of the MP4 file, after ftyp. Contains all the descriptive and technical metadata for the items in the file, allowing the player to use the appropriate codec(s) for the various elementary streams, identify them correctly etc.

The moov box itself contains:

  • mvhd - the master header describing the movie content
  • trak - a data 'track' or stream - description of one of the elementary streams: video, audio, subtitles
  • udta - user data box (eg the box containing iTunes metadata)

data - data portion of some types of user box (eg the picture data of a covr cover art atom)

mdat - contains the multiplexed media data stream (usually by far the biggest box)
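The nesting described above can be explored with a few lines of code. The sketch below walks the boxes in a block of bytes and lists them as an indented tree. It assumes the standard box layout, in which each box starts with a 4-byte big-endian size (covering the whole box, header included) followed by its 4-character type. The set of container types is a deliberately short, assumed list, and the sample data is synthetic, not a playable file. Some real boxes (such as the 'meta' box used for iTunes tags) carry extra header bytes that this sketch does not handle.

```python
import struct

# Box types treated as pure containers here (an assumption; the full list is longer).
CONTAINER_TYPES = {"moov", "trak", "mdia", "minf", "stbl", "udta"}


def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means the real size follows as a 64-bit value.
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the enclosing space.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop rather than guess
        # Latin-1 maps each byte to one of the first 256 Unicode characters.
        yield raw_type.decode("latin-1"), pos + header, pos + size
        pos += size


def box_tree(data, start=0, end=None, depth=0):
    """Return indented lines describing the boxes, descending into containers."""
    lines = []
    for box_type, p_start, p_end in iter_boxes(data, start, end):
        lines.append("  " * depth + f"{box_type} ({p_end - p_start} bytes)")
        if box_type in CONTAINER_TYPES:
            lines.extend(box_tree(data, p_start, p_end, depth + 1))
    return lines


def make_box(box_type, payload=b""):
    """Build a box for demonstration purposes (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


# Synthetic example: ftyp, then moov holding mvhd and udta, then mdat.
sample = (make_box("ftyp", b"M4A \x00\x00\x00\x00")
          + make_box("moov", make_box("mvhd", bytes(100)) + make_box("udta"))
          + make_box("mdat", bytes(32)))
print("\n".join(box_tree(sample)))
```

To try this on a real file, read its bytes with open(path, "rb").read() and pass them to box_tree. For large video files it would be better to seek past mdat than to load it into memory.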

Multiplexing and boxes

The MP4 container can contain audio, video, text and other elements, which are interleaved - multiplexed or 'muxed' - with each other to allow simultaneous streaming.

Example of simple multiplexing of audio and video
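As a rough sketch of the idea, the code below interleaves two streams of time-stamped chunks into a single sequence ordered by time. The chunk labels, timings and function name are invented for illustration. A real MP4 muxer works on encoded samples and records their positions in the moov box, rather than storing timestamps alongside the data like this.

```python
import heapq


def multiplex(*streams):
    """Merge time-ordered streams of (timestamp, label) chunks into one sequence."""
    return list(heapq.merge(*streams, key=lambda chunk: chunk[0]))


# One video chunk per second, one audio chunk every half second (made-up timings).
video = [(0.0, "V0"), (1.0, "V1")]
audio = [(0.0, "A0"), (0.5, "A1"), (1.0, "A2"), (1.5, "A3")]

for timestamp, label in multiplex(video, audio):
    print(f"{timestamp:4.1f}s  {label}")
```

Because chunks that belong together in time sit next to each other in the merged sequence, a player can begin playback after reading only the start of the stream.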

 

Note - MPEG-7 metadata

As well as the 'static' metadata contained in the MP4 file header, which is used to describe all the elements therein, there is a different form of metadata which an MP4 can contain - MPEG-7 metadata. MPEG-7 is not the simple metadata descriptor of the MP4 file, but rather is an open standard for dynamically describing elements of the file's contents. Information like subtitles or transcription can be time-stamped and synchronised to the other content of the main data stream. In this sense MPEG-7 is a separate elementary stream in its own right, and can exist independently of an MP4.

References

ISO13818-3 - International Organization for Standardization 1998

Various resources - MP4 Registration Authority

Quicktime File Format Specification - Apple Inc 2007

Quicktime Container - Multimedia wiki

Overview of the MPEG-4 Standard - MPEG 2002

MPEG-4 File Structure - Atomic Parsley

Metadata Integration and Querying - A Case of MPEG-7 - Muhammad Javed 2009

MP3 and AAC Explained - Karlheinz Brandenburg
