Introduction to Digital Video

Last updated: 29 January 2009
Published in: Creating new digital media | Finding and using digital media
Tags: codec | compression | standards | video | white balance

Summary

We look at a variety of concepts and issues relating to digital video.  The intention is to give the reader enough knowledge to make informed choices about digital video equipment, to create digital video and to read more advanced documents about digital video on the JISC Digital Media website.

Introduction

This document is intended to serve as an entry-level document for people considering working with digital video for the first time.  We are only concerned here with developing an understanding of what video is and how it works.  For information on the practical aspects of shooting, editing and digitising videos the reader is directed to the documents Basic Guide to Shooting Video, Basic Guide to Video Editing and Deciding to Digitise.
This document looks at a number of terms and concepts, knowledge of which should increase the reader’s understanding of how video works and of what to look for in a camera.

What is Video?

Both film and video are methods of recording moving images and sound.  The fundamental difference between the two is that film captures images by using a chemical process and video does so electronically (film uses a variety of methods of recording sound, some of which are essentially identical to video).  More importantly, the stored information can be viewed through a purely optical process with film - in fact, you can look at a piece of film and see the still images which make up the information - whereas with video the information needs to be transformed electronically into images.
Originally video was stored in an analogue format on magnetic tape, but this is no longer necessarily the case.  Video may be stored on tape in analogue or digital formats, on CDs, DVD, on computer disks or flash media.  It may also be encoded onto these different media using a variety of methods.  In all of these cases it remains video, because it has been captured and stored electronically.
This topic is, understandably, enormous, so to break it down into digestible chunks, we shall look at the creation of video in the two actions which occur in a camcorder: image acquisition and recording/storage.

Image Acquisition

Video is shot, unsurprisingly, with a video camera.  This consists of an optical front end which allows us to control focus and exposure and an electronic rear end which converts the optical information into an electronic signal.
The front end of a video camera is essentially identical to that of a film camera and, indeed, a still camera.  As a result we will not discuss optics, exposure, depth of field, etc. here but rather refer the reader to an introduction to photography for a discussion of these topics.
We will, however, look at topics specific to the electronic nature of video.  We recognise that even in doing this there is some overlap with modern digital still cameras.

1. The Charge-Coupled Device (CCD)

At the heart of a modern video camera are one or more CCDs.  These serve the same purpose that film does in an optical camera: they react to the image being projected upon them by the camera’s optics.  Whereas the amount of detail that film can record is determined by the size of the crystals of silver salt which coat the surface of the film, the amount of detail recorded by a CCD is determined by the number of pixel sensors on its surface.  Thus, while the quality of a film image is determined in part by the type of film used, the quality of a video image is fixed and inherent in the camera.

Cameras can be classified as either one-chip or three-chip devices, depending on how many CCDs they use.  A one-chip camera has a coloured mask over the chip (called a Bayer filter) which results in some pixels being used to record red information, some blue and some green.  As a result the colour resolution of the image will be less than the monochrome resolution.  A similar reduction in colour information occurs when the video signal is recorded as we shall see when discussing chroma subsampling below.
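The loss of colour resolution on a one-chip camera can be illustrated with a short sketch.  It assumes the common RGGB arrangement of the Bayer filter (a repeating 2x2 tile with one red, two green and one blue site); the arrangement and the function name are assumptions made for illustration, not details taken from any particular camera.

```python
def bayer_counts(width, height):
    """Count the red, green and blue pixel sensors on a chip with an RGGB mask."""
    tile = [["R", "G"],
            ["G", "B"]]  # assumed repeating 2x2 Bayer tile
    counts = {"R": 0, "G": 0, "B": 0}
    for y in range(height):
        for x in range(width):
            counts[tile[y % 2][x % 2]] += 1
    return counts


counts = bayer_counts(4, 4)
print(counts)  # {'R': 4, 'G': 8, 'B': 4}
```

Every one of the 16 sites contributes to the monochrome image, but under this assumed layout only a quarter of them record red and a quarter record blue.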

Higher picture quality can be achieved with a three-chip camera, which has dedicated red, green and blue CCDs.  Even if the chips are the same resolution as that on a one-chip camera, the picture quality will tend to appear better because of the higher colour resolution.  For a discussion relevant to this, see the section below on chroma subsampling.

2. Digital vs. Optical Zoom

Often the specifications of a video camera will include both digital and optical zoom figures.  It is important to understand the difference between these.

The optical zoom refers to the amount the image can be magnified by the camera lens.  The sharpness of the image is limited only by the quality of the lens; as an approximation we can consider that the sharpness of the image is unchanged by zooming in optically.

A digital zoom, however, works by taking the information captured by the CCD and using only a part of it.  For example, a 2x digital zoom will take ¼ of the image captured by the CCD (i.e. half the height times half the width) and use that for the whole image.  As a result the sharpness of the image will be decreased by the same amount that the image is magnified by.
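The arithmetic behind this can be sketched as follows.  The function names and the 720 x 576 sensor size are hypothetical and chosen only for the example.

```python
def digital_zoom_fraction(zoom):
    """Fraction of the captured image that a digital zoom of this factor uses."""
    if zoom < 1:
        raise ValueError("zoom factor must be at least 1")
    return 1 / (zoom * zoom)


def pixels_used(sensor_width, sensor_height, zoom):
    """Width and height, in pixels, of the part of the image that is kept."""
    return int(sensor_width / zoom), int(sensor_height / zoom)


print(digital_zoom_fraction(2))   # 0.25
print(pixels_used(720, 576, 2))   # (360, 288)
```

The kept region is then stretched back to full size, which is why no new detail appears however far the digital zoom is pushed.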

The upshot of this is that using a digital zoom seriously degrades the quality of an image and is to be discouraged unless there is a compelling reason to use it.  Furthermore, when acquiring a video camera one should always consider how much optical zoom is provided by the lens, as this is a much more useful feature.

3. White Balance

Video cameras register colour information in much the same way that our eyes do, by measuring the amounts of red, green and blue in light. Simply put, because the colour white is actually a combination of all three of these colours, it can be used as a benchmark with which to calibrate the camera’s colour ‘perception.’  This is particularly useful because the colour information that the camera measures in a scene will vary greatly with the type of light which is illuminating that scene.  For example, a white sheet of paper lit by incandescent bulbs will have a much higher amount of red in it than it would in daylight.  Similarly, the same sheet of paper lit by a fluorescent light will have a much higher amount of green.  This is looked at in greater depth in the paper Colour Theory: Understanding and Modelling Colour.

In order that images we record always look the same regardless of the light source, it is necessary to do a white balance.  Many consumer video cameras automatically do a white balance or have presets from which the user may select the most appropriate one.  However, the best results are achieved when the user does a manual white balance.  Quite simply, we point the camera at a sheet of white paper, a white wall or anything which is pure white and push a button to tell it that this is what it should record as being white.
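One simple way to picture what the camera does with the white reference is as a set of per-channel gains.  This is only a sketch under stated assumptions: real cameras may work differently, the gains here are normalised to the green channel by choice, and the measured values are invented for the example.

```python
def white_balance_gains(white_rgb):
    """Gains that make the measured white reference come out with equal R, G and B."""
    r, g, b = white_rgb
    return (g / r, 1.0, g / b)  # green is left unchanged (an assumption)


def apply_gains(pixel, gains):
    """Scale each channel of a pixel by its gain, clipping to the 8-bit maximum."""
    return tuple(min(255, round(c * k)) for c, k in zip(pixel, gains))


# Hypothetical reading of white paper under incandescent light: too much red.
measured_white = (250, 200, 125)
gains = white_balance_gains(measured_white)
print(apply_gains(measured_white, gains))  # (200, 200, 200)
```

Once the gains are set, the same correction is applied to every pixel in the scene, which is why the balance needs to be redone when the light changes.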

Care should be taken to change the white balance when shooting in a different location with different light - unless, of course, the user wants the illumination to look different.

4. Outputs: Composite, S-video (Y/C), IEEE 1394 (FireWire, i.LINK), USB, Component

Modern camcorders provide the facility to play back the material they record.  This enables the user to make copies of the material or to load it onto a computer in order to convert it into another format or to edit it.

Camcorders can have a number of different connectors on them for the purpose of playing back material; note, however, that it is uncommon for these connectors to accept material for recording onto the camera.  Note, too, that the analogue video connectors provide only picture information, with sound provided through other connectors; only the digital connectors (IEEE 1394 and USB) carry sound as well.

The composite output provides the lowest quality picture, but is the most commonly found.  It is often accompanied by two audio outputs as well.  All three outputs are identical in appearance (RCA phono sockets), but the composite output is normally coloured yellow, while the two audio sockets are white and red for channels 1 and 2 respectively.

Figure: composite sockets.  From left to right: Ch-2 (right) audio, Ch-1 (left) audio, composite video.

Many modern camcorders save space by combining the composite and audio outputs into a single mini-jack socket, properly referred to as a TRRS connector.

Slightly higher picture quality is provided by the S-video output.  This is a 4-pin mini-DIN socket.  S-video is also known as Y/C.

Composite and S-video outputs can be found on older analogue recorders as well as modern digital ones.  The signals which appear at the composite and S-video outputs are independent of the way in which the information is recorded on the tape.  Thus one can use material recorded in an older format in a digital editing system just as easily as one can new material: all that is needed is a means of playing the tape into the system.

Figure: camcorder outputs and a composite cable.
LEFT: From top to bottom: TRRS connector for composite video and 2 channels of audio, IEEE 1394 connector for digital video, USB connector for still images.
CENTRE: Composite cable with TRRS jack at one end and three RCA phono plugs (composite and 2 channels of audio) at the other.
RIGHT: From top to bottom: IEEE 1394 connector for digital video, LANC connector for camera control, TRRS connector for composite video and 2 channels of audio, S-video connector.

The highest quality picture (and sound) information commonly available on camcorders is provided by the IEEE 1394 output.  IEEE 1394 is also commonly known as FireWire or i.LINK and on some camcorders is simply identified as DV.  Unlike composite and S-video, IEEE 1394 is a digital output and so there is no information lost in transfer.  Also unlike the other two the IEEE 1394 output provides audio as well as video information.

A number of less expensive video cameras are now using a USB connector to download video and audio.  As with IEEE 1394, USB is a digital output, so there is no loss in transfer.

Figure: USB output on a compact video camera.

Mention should also be made here of component output.  This is normally only encountered on professional equipment.  It again only provides picture information.  The component output consists of three sockets, either BNCs or RCAs.  Care should be taken not to confuse the two terms ‘component’ and ‘composite.’

How Video is Recorded and Stored

First of all, it should be noted that all tape-based video cameras record audio and video separately.  These cameras usually record two tracks of audio at either 32 kHz or 48 kHz, independent of the way video is recorded.

Many disc-based recorders, however, multiplex the audio information into the video file.  Thus the recording will consist of a single file with both video and audio information.  The audio information can then be extracted in an editing program or dealt with as part of the recording.

1. Analogue vs. Digital

It may be helpful to briefly discuss the difference between analogue and digital recording and why the latter has become the standard method of recording in the audio and video worlds.

Imagine that you are listening to a continuous tone which can change in volume.  You then draw a line on a piece of paper which plots the loudness of the tone against time.  To do this, you drag your pencil from the left side of the paper to the right side, moving it upwards as the tone gets louder and downwards as it gets quieter.  When you are finished you will have a wavy line on a sheet of paper which is a very basic analogue ‘recording’ of the tone.

Next, take a new piece of paper, place it over the first and trace the line you just drew.  You have now made a copy of your recording.

Finally, show this paper to someone who can control the volume of the tone with a knob and they can recreate the changes in volume you heard by ‘following’ the line on the paper with corresponding twists of the volume knob.  They have now done a playback of your recording.

There are certain problems associated with this method of recording, copying and playback.  First of all, since you are drawing this line freehand, the accuracy with which you represent the changes in volume may be less than perfect.  Even if you were able to draw a perfect line, there is no absolute reference which states that this position on the page corresponds to this volume and so on.

Second, when you trace the line on the second paper, you may not have performed an exact trace.  Even if you do, the paper might not be exactly positioned over the original, thus throwing the entire trace out.  The thickness of the line you are drawing may also be enough that there is some uncertainty in the exact positions the line is meant to represent.

Finally, when the other person attempts to recreate the tone by looking at the copy of your original line, they may not interpret the line in exactly the way you drew it.  Their values for the quietest and loudest settings may differ from yours and they may twist the dial differently from the way you would.

Now consider another way of recording this tone.  Suppose you have a display that gives you a number from 1 to 16 depending on how loud the tone is.  Instead of drawing a line on the paper you can simply write down the numbers as they appear on the display.  You have now made a digital recording.

We can now take another piece of paper and place it over the original (or not: it doesn’t really matter) and trace (or copy) the numbers onto the new sheet.  You have again made a copy.

Finally you hand the second sheet of paper to someone who controls a volume knob which is connected to a display like the one which was used to indicate how loud the tone was.  They read the numbers on the sheet of paper and twist the dial so that the display agrees with the numbers.  They have now done a playback.

Certain advantages over the analogue recording, copying and playback are immediately evident.  The numbers you recorded on the sheet aren’t subject to your interpretation or inaccuracy the way the line was.  You simply record a number and that number is correct.

Copying the recording also becomes error-free.  It doesn’t matter if you don’t trace the numbers exactly - or even if you don’t trace them at all.  The numbers can still be read as numbers.  In fact, you can copy the numbers onto a new sheet, then copy the numbers from the new sheet onto a third sheet and so on until you have made a hundredth-generation copy of the original and the numbers will still be exactly the same as the originals.

Finally, error is also eliminated from the playback.  The operator of the knob doesn’t have to interpret your line precisely.  As long as the numbers on the display correspond with the numbers on the paper, playback is accurate.
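The paper-and-pencil comparison can be simulated in a few lines.  This is a loose sketch: the size of the tracing error, the loudness values and the function names are all assumptions made for the example.

```python
import random


def quantise(level, steps=16):
    """Turn a loudness between 0.0 and 1.0 into a whole number from 1 to `steps`."""
    return min(steps, int(level * steps) + 1)


def analogue_copy(trace, error=0.02, rng=random):
    """Trace a line by hand: every point picks up a small random error."""
    return [v + rng.uniform(-error, error) for v in trace]


def digital_copy(numbers):
    """Copy written numbers: the values themselves are reproduced exactly."""
    return list(numbers)


original = [0.10, 0.45, 0.80, 0.30]          # hypothetical loudness readings
digital = [quantise(v) for v in original]    # [2, 8, 13, 5]

analogue_gen, digital_gen = original, digital
rng = random.Random(0)                       # fixed seed so the run is repeatable
for _ in range(100):                         # make a hundredth-generation copy
    analogue_gen = analogue_copy(analogue_gen, rng=rng)
    digital_gen = digital_copy(digital_gen)

print(digital_gen == digital)    # True: the numbers are unchanged
print(analogue_gen == original)  # False: tracing errors have accumulated
```

Note that the digital version has already lost something at the moment of recording, since only 16 loudness values can be written down; what it gains is that nothing further is lost afterwards.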

Of course, even digital recording has its potential drawbacks.  When error is introduced into an analogue recording it is usually minor.  Not so with digital.  If, for example, when you copy your numbers onto the second sheet you misread 2 as 5, the resultant error will be very glaring.  Despite this, the advantages of digital so far outweigh the disadvantages that all media have been moving inexorably towards digital for years now.  CDs and MP3s have replaced LPs, DVDs have replaced VHS tapes and flat-panel LCD monitors have replaced CRTs.

2. Standards, Encodings, Compression, Media and Formats

The recording of video is an area in which it is easy to become very confused.  This can stem in part from an unclear understanding of the difference between standards, encodings, compression, media and formats.  We shall take a simplified look at these five terms in an attempt to avoid this confusion.
Images can be displayed on a TV screen in a number of different ways.  Variables include:
· the number of times a second that the picture is redrawn
· the number of horizontal lines that make up the display
· the ratio of screen width to screen height (the aspect ratio)
· the order in which these lines are redrawn: either all in one drawing cycle (progressive) or in two drawing cycles alternating between all the odd lines and all the even lines (interlaced)
· the way the information contained in these lines is encoded.
Several standards were established which specified sets of values for these variables.  We will only look at three: PAL, NTSC and SECAM.  Furthermore, we will concern ourselves here with the informal use of these terms, for a comprehensive discussion is beyond the scope of this document.

For our purposes, PAL can be understood as referring to a television system where a 625-line image is refreshed 50 times a second.  Of these 625 lines, only 576 will be displayed on a normal TV.  Also, only half of the image is redrawn at each refresh, so a complete new picture occurs only 25 times a second: this is called an interlaced image.  As a result, in modern parlance, PAL is known as 576i/25 (and sometimes, incorrectly, as 576i/50).

Similarly, NTSC is used to refer to a system where the image has 525 lines and is refreshed 60 times a second.  Because this is also interlaced and only 480 lines are normally seen, it is known as 480i/30 (again, sometimes incorrectly as 480i/60).
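The relationship between the refresh rate and the number of complete pictures per second, using the informal figures above, can be sketched like this.  The function names are hypothetical, and the label follows the convention used in this document of quoting complete pictures per second.

```python
def frames_per_second(refresh_rate, interlaced):
    """Complete pictures per second for a given screen refresh rate."""
    return refresh_rate / 2 if interlaced else refresh_rate


def label(visible_lines, refresh_rate, interlaced):
    """Build a label such as 576i/25 from the visible lines and refresh rate."""
    scan = "i" if interlaced else "p"
    rate = frames_per_second(refresh_rate, interlaced)
    return f"{visible_lines}{scan}/{rate:g}"


print(label(576, 50, True))  # 576i/25  (PAL)
print(label(480, 60, True))  # 480i/30  (NTSC)
```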

Clearly the standard used will determine the type of equipment the video or television can be displayed on.  In general, European (i.e. PAL) televisions are capable of displaying NTSC video whilst North American (i.e. NTSC) televisions cannot display PAL video.  This does not, however, apply to computer monitors, which can generally display both standards and HD ones as well.

There are also newer standards which have been developed in recent years, particularly with the advent of high-definition TV.  The most important ones for the UK are 720p/25, 720p/50, 1080i/25, 1080p/25 and 1080i/50.  Here the ‘p’ refers to progressive, rather than interlaced, scan, where the image is drawn all at once rather than in two passes.
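The difference between the two scan types can be made concrete by treating a picture as a list of lines.  In this sketch, with hypothetical function names, an interlaced picture is delivered as two fields which are woven back together, whereas a progressive picture is simply the whole list delivered at once.

```python
def split_into_fields(lines):
    """Split a picture (a list of scan lines) into its two interlaced fields."""
    odd_field = lines[0::2]   # lines 1, 3, 5, ... counting from 1
    even_field = lines[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Rebuild the full picture by interleaving the two fields."""
    picture = []
    for odd, even in zip(odd_field, even_field):
        picture.extend([odd, even])
    picture.extend(odd_field[len(even_field):])  # an odd line count leaves one extra
    return picture


picture = list(range(1, 9))            # a toy 8-line picture
odd, even = split_into_fields(picture)
print(odd, even)                       # [1, 3, 5, 7] [2, 4, 6, 8]
print(weave(odd, even) == picture)     # True
```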

While it is possible to convert between the different standards using programs such as DVFilm Atlantis or VLC Player, this can be a complicated process and the quality of the recording will always suffer at least a little.  If one is shooting video for use in Europe it is definitely not a good idea to do so using a camera purchased in North America.  The problems associated with the different standards far outweigh any financial incentive there may be for purchasing equipment from elsewhere.

In addition to the standard, we must concern ourselves with the encoding used to store the video information onto a medium.  This is the way the video information is represented on the medium.  Any recording can be viewed as an encoding: the pencil line drawn in our analogue vs. digital example is a type of encoding.

Tied up with encodings is the concept of compression.  Video information can be stored on a medium in an uncompressed format, that is, a format in which there is a one-to-one mapping between the picture information and the data stored.  However, it is much more common to have a degree of compression, i.e. reduction in the amount of space needed to store the information on the medium.  Often this compression is transparent to the user: for example, video recorded in the DigiBeta format is done so with a 2:1 compression which is decompressed on playback.  DV is even more extreme, using 5:1 compression.
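As a rough illustration of what these ratios mean for storage, the sketch below divides an uncompressed size by the compression ratio.  The ratios are the ones quoted above; the function name and the 100-unit source size are arbitrary choices for the example.

```python
def stored_size(uncompressed_size, ratio):
    """Size on the medium after compression at `ratio`:1."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return uncompressed_size / ratio


# Ratios quoted in the text; 100 is an arbitrary uncompressed size.
for name, ratio in {"DigiBeta": 2, "DV": 5}.items():
    print(name, stored_size(100, ratio))
```

With these figures, 100 units of picture information occupy 50 units on DigiBeta and 20 on DV.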

Where the issue of compression gets complicated is in the area of video on the Internet.  Whereas videotape formats like DigiBeta and DV, and the standard DVD format, have codecs (i.e. algorithms that perform compression and decompression) associated with them, there is no such universal standard for the Internet.  Devices such as 3G phones and handheld devices have many different codecs associated with them as well.

These different codecs have different characteristics and as a result different strengths and weaknesses.  One may give excellent colour resolution but perform poorly at encoding motion.  Another may work extremely well with motion but take a prohibitively long time to encode.  For any given job, it becomes important to consider and prioritise the various demands which will be made of the codec.

The medium on which video is recorded is generally either tape, disc or solid state.  The format on which video is recorded is a particular specification for a tape, disc or solid state medium.  This specification may include the size of the medium, the way in which it is utilised (e.g. tape speed, disc capacity) and the encoding which is employed.

Tape media have been around since the beginning of video and so it is not surprising that there are many more tape formats than there are of the other two combined.  Currently the most popular tape formats are probably DV, DVCAM and DigiBeta.  An illustration of the difference between medium and format can be made by comparing the two formats DV and DVCAM.  Whilst the two formats write to the tape differently (DVCAM runs the tape faster and lays down wider tracks), they use tapes which are exactly the same size.  Indeed, while Sony has a slightly different specification for the actual coating on the two tapes and insists that the two formats shouldn’t be mixed, one can in fact record DV on a DVCAM tape and vice versa.
Older camcorders, i.e. ones which record in analogue formats such as VHS-C, S-VHS-C, Video8 and Hi8, all use tape media.  Whilst the picture quality of these formats does not compare to that of a modern digital camcorder, recorded material may still be used as long as there is a camera or other means of playing back the tapes.

The most common removable disc media are different versions of DVD, including DVD-R, DVD+R, DVD-RW and DVD+RW, as well as Blu-ray (BD) and Professional Disc (PFD), used in XDCAM and XDCAM HD cameras. In addition, a number of camcorders now contain internal hard disc drives (HDD).

A distinction may be made here between acquisition media and storage media.  Acquisition media are media which are used in association with cameras for the recording of video as it is being shot; obviously these media are also used for storage of the footage which is shot on them.  Storage media are media which are used solely to store the finished products made from video.  In the case of tapes, the distinction between the two media is often simply one of size.  The smaller tape sizes in DV, DVCAM and DigiBeta are all acquisition formats.  In addition, however, there are also larger-sized versions of all three of these formats which are used solely for storage.

In the case of discs, PFD is an acquisition format but both BD and DVD can be either.

Solid state media include SxS cards (used in XDCAM EX cameras), P2, Memory Stick and the SD family of memory cards (SD, miniSD, microSD, SDHC).  All are acquisition formats.

For a detailed look the reader is directed to the article Digital and Analogue Media for Video.

3. Colour Systems

An exhaustive discussion of this complex area is beyond the scope of this document.  It is, however, useful to understand a bit about how colour is encoded and the terminology used to describe these encodings.

As described above, colour is initially recorded by a camera in much the same way that the human eye does, as three sets of values giving the intensity of the red, green and blue components of the light.  The obvious way to store this information digitally is as three numbers giving these values.  This system is known, unsurprisingly, as RGB.

The most common types of RGB system are 24 bit systems which store 8 bits of information each for the R, G and B signals, giving 256 possible values for each and 16,777,216 possible colours (referred to on computer displays as “Millions of Colours”).
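These figures follow directly from the number of bits, as the short sketch below shows.  The function names are hypothetical, and packing the three values into one integer is just one common way of holding a 24 bit colour.

```python
def colour_count(bits_per_channel, channels=3):
    """Values available per channel and total colours for a given bit depth."""
    values_per_channel = 2 ** bits_per_channel
    return values_per_channel, values_per_channel ** channels


def pack_rgb(r, g, b):
    """Pack three 8 bit values into a single 24 bit number."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return (r << 16) | (g << 8) | b


print(colour_count(8))          # (256, 16777216)
print(pack_rgb(255, 255, 255))  # 16777215, the largest 24 bit value
```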

Not only do video cameras capture light using the RGB model, computer monitors and television screens display using this system.  However what happens between the camera and the screen is a bit more complicated.

The RGB system has two characteristics which have rendered it unsuitable for the storage or transmission of colour information.  The first is that it takes up a relatively large amount of space: it is not a terribly compact way of storing this information.  The second is that there is no simple way to extract a black and white signal from it.  It is particularly because of this second characteristic that RGB was not adopted for colour television, as it was necessary to encode colour in a way such that black & white televisions would still work properly.

Each of the three TV systems, NTSC, PAL and SECAM, developed its own colour system, respectively YIQ, YUV and YDbDr.  What they have in common is that they all separate the brightness of the image (called luminance) from the colour content (called chrominance).  In all three cases the “Y” is the name of the luminance signal and the chrominance is encoded in the other two signals, I and Q for NTSC, U and V for PAL and Db and Dr for SECAM.  There are numerous other colour systems used, for example in analogue and digital component video.  These include YPbPr, YCbCr and Y B-Y R-Y.

The separation of colour into luminance and chrominance does more than enable black and white televisions to receive colour signals.  It results in a much more compact signal, since the luminance information is replicated three times in an RGB model.
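To show what separating luminance from chrominance looks like in practice, here is a minimal sketch of an RGB to YUV conversion.  The weights are the commonly quoted ones for standard-definition video and are not taken from this document; inputs are assumed to be in the range 0.0 to 1.0.

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB values (0.0-1.0) to a luminance value and two chrominance values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance: a weighted sum of R, G and B
    u = 0.492 * (b - y)                    # chrominance: how far blue is from the luminance
    v = 0.877 * (r - y)                    # chrominance: how far red is from the luminance
    return y, u, v


print(rgb_to_yuv(0.5, 0.5, 0.5))  # a grey pixel: U and V are (almost exactly) zero
print(rgb_to_yuv(1.0, 0.0, 0.0))  # a pure red pixel
```

A black and white television needs only the Y value, and any grey pixel has essentially no chrominance at all.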

Further reduction in signal size is also possible due to the fact that the eye requires much less colour information than luminance information to see an image sharply: there are approximately 20 times as many rods (the ‘black and white’ receptors) in the human eye as there are cones (the colour receptors).  As a result, we can provide far less colour information than luminance information without any perceived difference in image quality.  This is done through a method called chroma subsampling.

In chroma subsampling, whilst every pixel has its luminance measured, some pixels do not have their chrominance measured.  The degree to which this subsampling occurs is expressed by a ratio of three numbers.  When no chroma subsampling is carried out, the ratio is 4:4:4.  Although there are a number of subsampling schemes, there are two which are of interest to us.  4:2:2 is used in DigiBeta and XDCAM.  4:2:0 is used in PAL DV and DVCAM, HDV and most MPEG-based delivery formats such as DVD.  The fraction of the data which is kept can be found by adding the three numbers together and dividing by 12.  Thus 4:4:4 involves no data reduction, 4:2:2 a reduction of 1/3 and 4:2:0 a reduction of ½.
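The arithmetic can be checked with a short sketch.  Here the sum of the three numbers divided by 12 is treated as the fraction of data kept, and the reduction as whatever is left over; the function names are hypothetical.

```python
from fractions import Fraction


def data_retained(j, a, b):
    """Fraction of the 4:4:4 data that remains for a subsampling ratio j:a:b."""
    return Fraction(j + a + b, 12)


def data_reduction(j, a, b):
    """Fraction of the 4:4:4 data that is saved by subsampling."""
    return 1 - data_retained(j, a, b)


for ratio in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    print(ratio, data_reduction(*ratio))  # reductions of 0, 1/3 and 1/2
```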

It has been claimed that the combination of chroma subsampling and file compression yields far poorer results than these figures might suggest and that with certain types of compression greater space savings may be obtained by not performing subsampling and then using more extreme compression.  Nonetheless, subsampling is a standard part of data reduction in current schemes for recording video.

4. Aspect Ratio

Aspect ratio is the ratio of the width of the image to the height.  In video, the two aspect ratios normally encountered are 4:3 and 16:9.  4:3 is the standard ratio we are used to from normal television screens, whilst 16:9 is the most common widescreen ratio.  Most modern video cameras give the user a choice of which aspect ratio to shoot in.

It is important to consider both the subject matter of your video and where and how it is to be distributed before deciding on an aspect ratio.  The wider aspect ratio is generally used because it makes a video look more cinematic.  However, 16:9 videos are usually recorded by squeezing the horizontal dimension of the image down, with the result that the horizontal resolution of the image will be less than that of a 4:3 image of the same height.  Similarly, if the 16:9 image is recorded by letterboxing, i.e. not using the top and bottom extremes of the frame, the resolution will be reduced.

If your video is being posted on YouTube, a 4:3 image will be smaller than a corresponding 16:9 one.  If your video consists of the contents of a PowerPoint presentation, the extra image space on the sides of a 16:9 frame will be wasted, as PowerPoint is 4:3.

If you wish to make a video in 16:9 but your camera can only handle 4:3 you can shoot what is termed 16:9 safe.  By shooting your video such that none of the action you want to record occurs in the top or bottom extremes of the frame, it is possible to crop the video in post-production, retaining a 16:9 image.  Note, however, that the resolution of the final video will be significantly less than that of the camera it was shot on.
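How much of the frame survives the crop can be worked out as follows.  The sketch assumes square pixels and a centred crop, and uses a hypothetical 640 x 480 frame; many video formats use non-square pixels, so treat the numbers as illustrative only.

```python
def crop_to_16_9(width, height):
    """Height of a centred 16:9 crop and the number of lines trimmed from the top."""
    crop_height = width * 9 // 16
    if crop_height > height:
        raise ValueError("frame is already wider than 16:9")
    top = (height - crop_height) // 2
    return crop_height, top


crop_height, top = crop_to_16_9(640, 480)
print(crop_height, top)    # 360 60
print(crop_height / 480)   # 0.75 of the original lines are kept
```

Under these assumptions the action needs to stay out of the top and bottom 60 lines, and a quarter of the vertical resolution is discarded.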

Figure (good composition): The left-hand photo is shot 16:9 safe so that when it is cropped to 16:9 no important information is lost.

Figure (bad composition): The left-hand photo is not shot 16:9 safe.  As a result, important information is lost when it is cropped.

5. Audio Sample Rate

Many video cameras give you a choice of the audio sampling rate, either 32 kHz (kilohertz) or 48 kHz.  The latter theoretically gives better audio results, although for most purposes the difference is at best minimal.  The important thing to remember is that if the video is intended to be edited it may be advisable to ensure that all cameras are shooting at the same sampling rate to avoid any possible incompatibilities.

Conclusion

We have looked at a variety of concepts and issues relating to digital video with the intention of giving the reader a basic understanding of how digital video works and what aspects of it the user should be concerned with.  It is not possible to produce an encyclopaedic document in the space available, but the reader should have enough information to facilitate both the use of digital video and the studying of more advanced documents on the JISC Digital Media website.

 
