Work in progress

The Foundations of Digital Audio Part 1: Sampling

0 Samples and sampling

This article is intended to show very briefly how and why sampling works in the context of digital audio.

The sampling of signals is a foundation of digital audio. Sampling of an analogue audio waveform takes place just before the analogue-to-digital converter (the ADC), which converts each sample to a digital value. The inverse of sampling - reconstruction (of the original waveform from the samples) - takes place after the digital-to-analogue converter (the DAC) turns each digital value back into an analogue sample.

Insert illustration

Note, however, that sampling remains a separate issue from the quantisation of the sample by the ADC. Similarly, reconstruction is separate from the re-creation of the analogue sample by the DAC. The samples discussed in this article are all in the analogue domain: they are representations of the analogue waveform of a signal at discrete points in time.

Sampling the waveform of an audio signal followed by reconstruction is a process capable of mathematically perfect reproduction of the original signal, provided that signal is band-limited (a condition explained below). Yes, I do mean perfect - no gaps and no loss of audible information. The Shannon-Nyquist sampling theorem formally shows this.

In practice, and in common with all current real-world audio systems, the engineering can't yet be perfect. Sampled audio systems (e.g. digital audio via CD) need to be engineered well enough to make their imperfections inaudible. Thus some of the engineering as well as the mathematics is outlined below.

Digital and analogue systems do have different imperfections. Just how these are audibly different is often a subject of great controversy. There are many myths about digital audio. One of these is to do with the "gaps" in sampled signals and their audible effect. We will show that whatever the actual imperfections in digital audio reproduction, no gaps exist to cause audible effects.

1 Sampling in theory

It is usually accepted that the Sampling Theorem was formulated by Harry Nyquist in 1928 and proved by Claude Shannon in 1949. The proof is a consequence of the equivalence of a Fourier transform and a Fourier series for band-limited signals. These terms are explained below.

We start with the waveform of a continuous one-channel audio signal. This original signal could be represented by a graph of its amplitude versus time. The amplitude is often represented in audio systems as a time-varying voltage, which can be shown, for example, on the display of an oscilloscope. To start off the proof we need to select a portion that starts at some time we call t=0 and continues to some later time t=T (where T is just a time limit we can pick as appropriate).

Insert illustration

It turns out we can prove that this time-limited signal can be represented by the sum of an infinite series of sine waves. These sine waves are at frequencies of 0 (i.e. DC), 1/T, 2/T, 3/T, ... n/T, etc. When time T is in seconds, frequency n/T is in cycles per second or, more formally, Hertz (Hz). As well as its own frequency, each sine wave has its own amplitude and phase, both of which can be calculated from the time-limited signal.

That's a simplified version of what is called the Fourier Series. Any freshman year electronics undergraduate should be familiar with this to the extent of proving it and using it when required.

Let's label the amplitudes of each sine wave in the Fourier series as A0, A1, A2, A3, ... An, etc. If we plot this series of amplitudes on a graph of amplitude versus frequency, we have drawn a discrete spectrum of the time-limited signal. The discrete points are 1/T apart on the frequency axis. The two representations of the time-limited signal - its waveform and its spectrum (amplitudes together with phases) - are exactly equivalent: if you have one you can exactly calculate the other. In fact, requiring the spectrum to be equivalent to the waveform yields the equation for calculating each spectrum component, and that is, in essence, the proof that the Fourier series works.
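As a rough numerical illustration of the amplitudes A0, A1, A2, ..., here is a minimal Python sketch. The function name `fourier_series_amplitudes` and its parameters are hypothetical, chosen for this example only. It assumes the integrals in the Fourier series formulas can be replaced by sums over N equally spaced points, which is exact (up to rounding) when the signal's highest harmonic is below N/2.

```python
import math


def fourier_series_amplitudes(x, T, n_max, N=1024):
    """Hypothetical helper: amplitudes A0..A_n_max of the harmonics 0, 1/T, 2/T, ...

    x is a function of time on [0, T). The Fourier series integrals are
    approximated by sums over N equally spaced points, which is exact (up to
    rounding) for signals whose highest harmonic is below N/2.
    """
    ts = [k * T / N for k in range(N)]
    xs = [x(t) for t in ts]
    amps = []
    for n in range(n_max + 1):
        # Correlate the signal with a cosine and a sine at frequency n/T.
        a = 2 / N * sum(v * math.cos(2 * math.pi * n * t / T) for v, t in zip(xs, ts))
        b = 2 / N * sum(v * math.sin(2 * math.pi * n * t / T) for v, t in zip(xs, ts))
        if n == 0:
            amps.append(a / 2)  # DC term: the mean value over [0, T)
        else:
            amps.append(math.hypot(a, b))  # amplitude, whatever the phase
    return amps


# Example: T = 10 ms, so the harmonics sit at 0, 100 Hz, 200 Hz, 300 Hz, ...
T = 0.01


def signal(t):
    return 0.5 + math.sin(2 * math.pi * 100 * t) + 0.25 * math.cos(2 * math.pi * 300 * t)


print([round(A, 6) for A in fourier_series_amplitudes(signal, T, n_max=4)])
```

The DC term, the 100 Hz sine and the 300 Hz cosine come out with amplitudes 0.5, 1 and 0.25, and the absent 200 Hz and 400 Hz harmonics come out as zero.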

Insert illustration

It turns out the Fourier series can be mathematically generalised into the Fourier transform. By allowing the signal to extend indefinitely (i.e. allowing T to approach infinity), all of the discrete spectrum components get squeezed together (i.e. their separation, 1/T, approaches zero). In the limit they become continuous. So using the Fourier transform it is the original signal's entire waveform (not just a segment over the time interval 0 to T) that is transformed into its equivalent spectrum. But instead of a discrete spectrum the waveform is transformed into a continuous spectrum, such as you would see on the display of a spectrum analyser.

Insert illustration

The formal proof that this generalisation works is mathematically quite difficult and is probably the domain of the mathematics graduate.

However, from the generalisation it does emerge that there is also an inverse Fourier transform which transforms a signal's spectrum back to its equivalent waveform. It also emerges that the Fourier transform and the inverse Fourier transform have exactly the same form, apart from a change of sign (which amounts to reversing the direction of time) and a scaling factor which depends on the units of frequency (e.g. Hertz or radians/s) and units of time (you were using micro-fortnights, weren't you, not seconds?). This looks very like what we already know: frequency = 1/time and time = 1/frequency, give or take a scaling factor that depends on units. That's what the Fourier transform and its inverse do - transform a signal between waveform (amplitude versus time) and spectrum (amplitude versus frequency). So this all seems very reasonable.
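The near-identity of the forward and inverse transforms can be seen in the discrete Fourier transform (DFT), which is swapped in here as a computable stand-in for the continuous transform. In this minimal sketch (the names `dft` and `idft` are illustrative), one function serves both directions: the inverse simply flips the sign of the exponent and applies a 1/N scaling factor, the discrete counterpart of the unit-dependent factor mentioned above.

```python
import cmath


def dft(x, sign=-1):
    """Discrete Fourier transform with exponent sign `sign` and no scaling.

    sign=-1 gives the forward transform; sign=+1 gives the inverse
    transform apart from its 1/N scaling factor.
    """
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT: the same function with the sign flipped, then scaled by 1/N."""
    N = len(X)
    return [v / N for v in dft(X, sign=+1)]


x = [0.0, 1.0, 0.5, -0.25, -1.0, 0.0, 0.75, 0.2]
roundtrip = idft(dft(x))
# Waveform -> spectrum -> waveform: only rounding error remains.
print(max(abs(a - b) for a, b in zip(x, roundtrip)))
```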

Now we have to apply a condition to our original signal: select it so that its frequency spectrum falls entirely below some specified frequency limit, fm. By definition this means the Fourier transform of the original signal's waveform can be non-zero only in the range -fm to +fm, and is zero everywhere else. Equivalently, we can imagine that a perfect low-pass filter has been applied to the original signal, removing all spectrum components above fm.
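A discrete-time analogue of that perfect low-pass filter can be sketched by zeroing every DFT bin whose frequency lies outside -fm to +fm. This is only an illustration: it assumes the block of samples holds a whole number of cycles of each tone (the DFT treats the block as periodic), and the function name `brick_wall_lowpass` is hypothetical.

```python
import cmath
import math


def dft(x, sign=-1):
    """Unscaled DFT; sign=+1 gives the inverse apart from a 1/N factor."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def brick_wall_lowpass(x, fs, fm):
    """Zero every DFT bin whose frequency magnitude exceeds fm, then invert.

    Bin k represents frequency k*fs/N for k < N/2 and the negative
    frequency (k - N)*fs/N otherwise, so both halves are cut together.
    """
    N = len(x)
    X = dft(x)
    for k in range(N):
        f = k * fs / N if k < N / 2 else (k - N) * fs / N
        if abs(f) > fm:
            X[k] = 0
    return [(v / N).real for v in dft(X, sign=+1)]


fs = 16000.0
N = 160  # 100 Hz per bin, so both test tones fall exactly on bins
x = [math.sin(2 * math.pi * 1000 * n / fs) + 0.5 * math.sin(2 * math.pi * 5000 * n / fs)
     for n in range(N)]
y = brick_wall_lowpass(x, fs, fm=3000.0)
# The 5 kHz tone is removed completely; the 1 kHz tone passes untouched.
print(max(abs(a - math.sin(2 * math.pi * 1000 * n / fs)) for n, a in enumerate(y)))
```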

Insert illustration

A question may be asked about the significance of the mathematics' use of negative frequency. In some real examples, such as a rotating drill bit, a negative frequency (rotation in the opposite direction) is something we can perceive in practice, so it is reasonable that the generalised mathematics should distinguish it too. For audio signals, however, we cannot distinguish positive from negative frequency, so the amplitudes of the positive and negative halves of the spectrum simply need to be added back together to give what we perceive. For real-valued signals such as audio waveforms, the amplitude of the Fourier spectrum is symmetrically mirrored around the zero-frequency point.
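The "adding back together" can be checked numerically. A real cosine of peak amplitude A is exactly the sum of two complex components, one at +f and one at -f, each of amplitude A/2. In this small sketch the helper name `plus_minus_components` and the 1 kHz, 0.8 amplitude tone are illustrative choices.

```python
import cmath
import math


def plus_minus_components(A, f, t):
    """Complex components at +f and -f whose sum is the real tone A*cos(2*pi*f*t)."""
    plus = (A / 2) * cmath.exp(2j * math.pi * f * t)
    minus = (A / 2) * cmath.exp(-2j * math.pi * f * t)
    return plus, minus


A, f = 0.8, 1000.0  # illustrative 1 kHz tone with peak amplitude 0.8
for t in (0.0, 1.3e-4, 7.7e-4):
    plus, minus = plus_minus_components(A, f, t)
    total = plus + minus
    # Each half carries A/2; together they give back the real cosine
    # (the imaginary parts cancel).
    print(abs(plus) + abs(minus), total.real, A * math.cos(2 * math.pi * f * t), total.imag)
```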

Notice, however, that we can now apply the Fourier *series* to the continuous but finite spectrum from -fm to +fm, just as we did earlier to the continuous but finite waveform over the time period 0 to T. Remember that the Fourier transform and the inverse Fourier transform are mathematically symmetrical, so applying the Fourier series "backwards" in this way seems very reasonable too.

When we do this, the continuous spectrum over -fm to +fm (which is a total bandwidth of 2fm) is represented exactly by an infinite series of discrete amplitudes along the time axis, spaced apart by 1/(2fm). This is a discrete waveform.

That's interesting. We have a discrete waveform which is equivalent to our band-limited frequency spectrum. And we also have a continuous waveform which is equivalent to the same band-limited frequency spectrum. From pure logic, the original signal's two representations in the time domain must therefore be equivalent to each other. The only restriction we applied was that the original waveform had a band-limited spectrum; the discrete points are spaced 1/(2fm) apart, i.e. we sampled at twice the band limit, 2fm.

Insert illustration

So how does the discrete waveform compare to the original continuous waveform? If you compare the Fourier transform equations with the Fourier series equations you will discover that each of the discrete waveform's amplitudes is just the same as the amplitude of the signal's continuous waveform as sampled at the relevant point in time.
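For readers who want to see that comparison, here is a sketch in LaTeX. It assumes the common convention X(f) = \int x(t) e^{-i 2\pi f t} dt with f in hertz; the article itself does not state a convention, and the coefficient symbol c_n is introduced here for illustration.

```latex
% Inverse Fourier transform of a spectrum that is zero outside -f_m .. +f_m
x(t) = \int_{-f_m}^{f_m} X(f)\, e^{i 2\pi f t}\, df

% Fourier series of that spectrum over its total width 2 f_m
X(f) = \sum_{n=-\infty}^{\infty} c_n\, e^{-i 2\pi n f / (2 f_m)}, \qquad -f_m \le f \le f_m

% Coefficient formula, compared with x(t) evaluated at t = n / (2 f_m)
c_n = \frac{1}{2 f_m} \int_{-f_m}^{f_m} X(f)\, e^{i 2\pi f\, n / (2 f_m)}\, df
    = \frac{1}{2 f_m}\, x\!\left(\frac{n}{2 f_m}\right)
```

So each discrete amplitude c_n is, apart from the constant factor 1/(2fm), the value of the continuous waveform at t = n/(2fm): the samples, spaced 1/(2fm) apart.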

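Thus we have shown that sampling a signal loses no information in the band 0 to fm as long as you sample it at a frequency (call this fs) of at least twice fm; the derivation above used exactly fs = 2fm. In practice fs is chosen to be more than twice fm, because a component lying exactly at fm is a borderline case and because real filters need room to work. We have therefore proved the sampling theorem.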
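To see why the sampling rate should exceed 2fm rather than merely equal it, consider a tone lying exactly at fm = fs/2. This short sketch (using the CD rate purely as an example) shows that whether such a tone survives sampling depends entirely on its phase.

```python
import math

fs = 44100.0  # CD sampling rate
f = fs / 2    # a tone exactly at the band limit fm = fs/2

sine_samples = [math.sin(2 * math.pi * f * n / fs) for n in range(8)]
cosine_samples = [math.cos(2 * math.pi * f * n / fs) for n in range(8)]

# The sine is sampled exactly at its zero crossings, so every sample is zero
# (to rounding error) and the tone vanishes; the cosine survives as +1, -1, +1, ...
print([round(s, 12) for s in sine_samples])
print([round(c, 12) for c in cosine_samples])
```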

2 Sampling in practice

The mathematics of sampling may be perfect but how about the engineering?

The sampling theorem requires perfect removal of all frequency components of the original signal above a limit fm which is half of the sampling rate. The filters that do this are analogue filters just before the sampling (which is usually done in conjunction with an ADC).

Is it possible to do this perfectly? No. Brick-wall low-pass filters are not realisable. So the real question is whether it can be done well enough that the errors do not matter.

What are the errors? The errors are called aliases. Take a signal at fm+delta. In mathematics, delta is conventionally used for a small increment, so this is a frequency just outside the desired band. It turns out that, once sampled, this signal cannot be distinguished from one at fm-delta, just inside our band, and thus it contaminates the signal. The filters which prevent this aliasing are called anti-alias filters.
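A quick numerical check of that claim, using the CD rate and an illustrative delta of 1 kHz: after sampling, cosines at fm+delta and fm-delta produce identical numbers. (For sines the alias appears at fm-delta with inverted sign, i.e. a phase change, which is still just an in-band signal.)

```python
import math

fs = 44100.0    # CD sampling rate
fm = fs / 2     # band limit, 22.05 kHz
delta = 1000.0  # illustrative offset

above = [math.cos(2 * math.pi * (fm + delta) * n / fs) for n in range(16)]
below = [math.cos(2 * math.pi * (fm - delta) * n / fs) for n in range(16)]
# Once sampled, the two cosines give the same numbers, so nothing downstream
# can tell them apart.
print(max(abs(a - b) for a, b in zip(above, below)))

sine_above = [math.sin(2 * math.pi * (fm + delta) * n / fs) for n in range(16)]
sine_below = [math.sin(2 * math.pi * (fm - delta) * n / fs) for n in range(16)]
# For sines the alias is the in-band sine with its sign inverted.
print(max(abs(a + b) for a, b in zip(sine_above, sine_below)))
```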

How small an error is small enough? Human hearing has an effective dynamic range of about 120 dB. Aliasing errors below -120 dB referred to peak signal levels will therefore not be perceived. In some situations we do not have to go as far as -120 dB because of other details of human hearing, but we do know there is a limit we can use below which aliasing errors are not audibly significant.

For CD the sampling frequency, fs, is set at 44.1 kHz, so the band limit, fm, is at 22.05 kHz. The upper limit of human hearing is generally taken to be 20 kHz, some 2.05 kHz below fm. Aliases that land at 20 kHz or below must therefore originate at least 2.05 kHz above fm, that is, at 24.1 kHz (fs minus 20 kHz) or higher.
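The folding rule behind these numbers can be written as a small function. This is a minimal sketch (the name `alias_frequency` is illustrative): sampling cannot distinguish f from f plus any multiple of fs, and the upper half of each fs-wide band folds back about fs/2. With fs = 44.1 kHz, a tone at 24.1 kHz comes out at exactly 20 kHz.

```python
def alias_frequency(f, fs):
    """Frequency in [0, fs/2] at which a tone at f appears after sampling at fs."""
    f = f % fs                            # f and f + k*fs give identical samples
    return fs - f if f > fs / 2 else f    # fold the upper half back about fs/2


fs = 44100.0
for f in (20000.0, 24100.0, 30000.0, 45100.0):
    print(f, "->", alias_frequency(f, fs))
```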

The anti-alias filter therefore does not need to look like a brick wall: it has a "dead" band in the frequency spectrum in which its gain can roll off. It uses this space to get as close as possible to perfect at 20 kHz (i.e. 0 dB loss and zero phase shift) while having a loss of 120 dB at 24.1 kHz. However, achieving 120 dB loss at 24.1 kHz while being perfect at 20 kHz is rather hard, and the performance at both 20 kHz and 24.1 kHz may be compromised. If so, the signals we want are damaged both by the filter itself and by residual aliases. Although aliasing damage cannot be fixed once it happens, there are ways to avoid it.

Oversampling the original signal, say x2 at 88.2 kHz, makes the anti-alias filter much easier, and in the digital domain you can then easily filter and downsample back to 44.1 kHz. You now need to get from 0 dB loss at 20 kHz to 120 dB loss by 68.2 kHz. Doing this over a 48.2 kHz band is much easier than over 4.1 kHz. Oversampling at x4 (176.4 kHz) is even better. Non-integer oversampling is also possible (e.g. 96 kHz and 192 kHz), with interpolation in the digital domain to create 44.1 kHz samples.
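The arithmetic above can be tabulated for several oversampling factors. This sketch assumes the 20 kHz passband edge and 120 dB target from the text, and uses the average slope in dB per octave across the transition band as a crude measure of difficulty (real filter responses are not straight lines on that scale). It shows the required average slope dropping from roughly 450 dB per octave without oversampling to under 70 dB per octave at x2.

```python
import math

PASSBAND_EDGE = 20_000.0  # highest frequency to be passed untouched (Hz)
ATTENUATION_DB = 120.0    # required stop-band loss, from the text
BASE_FS = 44_100.0        # CD sampling rate


def transition_band(factor):
    """Return (stop-band edge, transition width, average dB/octave) for an
    analogue anti-alias filter in front of a sampler at factor * 44.1 kHz."""
    fs = factor * BASE_FS
    stop_edge = fs - PASSBAND_EDGE  # lowest frequency that aliases onto 20 kHz
    width = stop_edge - PASSBAND_EDGE
    octaves = math.log2(stop_edge / PASSBAND_EDGE)
    return stop_edge, width, ATTENUATION_DB / octaves


for factor in (1, 2, 4):
    stop, width, slope = transition_band(factor)
    print(f"x{factor}: stop at {stop / 1000:.1f} kHz, width {width / 1000:.1f} kHz, "
          f"about {slope:.0f} dB/octave")
```

Note that the oversampled cases still rely on a digital filter to remove the 20 kHz to 24.1 kHz region before downsampling to 44.1 kHz; the sketch only covers the analogue stage.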

With the use of oversampling, anti-alias filters work well enough to cause substantially no audible problem. Some do claim to be able to hear the difference between different anti-alias filters. This may be possible on at least two reasonable grounds: the filter's interference in amplitude or phase with the audible part of the signal; or the filter's failure to attenuate aliased signals sufficiently to make them inaudible. Another possible mechanism is intermodulation, within the filter, between the signals it is rejecting, creating audible products. The author has seen some claims of this type of imperfection.

3 Reconstruction in theory

Mathematically, reconstruction fills in all of the gaps in the sampled signal and makes it continuous again. In practice, analogue electrical signals are always continuous. For example, the voltage representing the amplitude of a waveform doesn't cease to exist between the samples, regardless of the mathematics of discrete samples. What actually happens is that our real-life samples match the original waveform at regular intervals and have an undefined waveform in between.

Remember, however, that all of the relevant information from the original signal is present in those samples. The existence of undefined waveforms between them (the "gaps") actually leaves nothing out (on the condition the original signal is appropriately band-limited). However we do have to reconstruct the original waveform to follow the theory, so the task is to reconstruct it between the samples but insert no audible extraneous information, particularly nothing in the original bandwidth up to fs/2.

As it happens, the full derivation of the sampling theorem also provides an equation for reconstruction. It tells you to put the train of samples through a perfect brick-wall low-pass filter which cuts off at half the sampling frequency, i.e. at fs/2. Mathematically, this will perfectly reconstruct the original continuous signal.
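The output of that ideal filter is a sum of sin(x)/x ("sinc") pulses, one per sample, each centred on its sampling instant (Whittaker-Shannon interpolation). The sketch below is illustrative: the function names are hypothetical, and because it can only use a finite number of samples (2M+1 here, standing in for an infinite train), the result is a close approximation rather than an exact one. The error shrinks as M grows.

```python
import math


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, first_index, fs, t):
    """Whittaker-Shannon interpolation: the output at time t of an ideal
    brick-wall low-pass filter (cut-off fs/2) driven by the samples.

    samples[k] is the value taken at time (first_index + k) / fs.
    """
    return sum(v * sinc(fs * t - (first_index + k)) for k, v in enumerate(samples))


fs = 44100.0
f = 1000.0  # a 1 kHz test tone, well inside the band
M = 5000    # use samples n = -M .. +M as a finite stand-in for an infinite train
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(-M, M + 1)]

t = 0.3 / fs  # a time between samples n = 0 and n = 1
print(reconstruct(samples, -M, fs, t), math.sin(2 * math.pi * f * t))
```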

This looks perfectly reasonable. We have proved that the series of samples contains all of the information needed to reconstruct the signal, provided the signal contains no information above fs/2.

In fact we can reconstruct the original signal by filtering any waveform that passes through the sample points. To appreciate this, consider that any such waveform substituted in the place of our original signal would have resulted in exactly the same set of samples. So any such waveform must contain all of the information within the original bandwidth (up to fs/2). The only difference between such waveforms is in the band above fs/2. Only the original waveform contains no information above fs/2 so cutting off everything above must perform reconstruction perfectly.
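A concrete pair of such waveforms: a 1 kHz tone, and the same tone plus a component at exactly fs (well above fs/2), which happens to be zero at every sampling instant. In this sketch the names `original` and `impostor` are illustrative. The two give identical samples but differ in between, and the difference lies entirely above fs/2, which is what the reconstruction filter removes.

```python
import math

fs = 44100.0
f = 1000.0


def original(t):
    """A band-limited 1 kHz tone."""
    return math.sin(2 * math.pi * f * t)


def impostor(t):
    """The same tone plus a component at fs, which is zero at every t = n / fs."""
    return original(t) + 0.5 * math.sin(2 * math.pi * fs * t)


# Identical at the sampling instants t = n / fs ...
print(max(abs(original(n / fs) - impostor(n / fs)) for n in range(1000)))
# ... but different in between, e.g. a quarter of the way between two samples.
print(impostor(0.25 / fs) - original(0.25 / fs))
```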

4 Reconstruction in practice

The trouble, again, is that no-one can produce a perfect brick-wall reconstruction filter. You have to ask once more how good the filter has to be for its imperfections to be inaudible.

To consider this, you can ask whether a reconstruction filter is needed at all for audio. Any waveform through the sample points (including the real-world representation of the samples themselves) perfectly reproduces the signal's information in the band 0 to fs/2 (0 to 22.05 kHz for CD). Surely we cannot hear anything inserted above 22.05 kHz anyway, so will our ears not act as perfect reconstruction filters?

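Well, intermodulation in electronic amplifiers is a problem (and perhaps in loudspeakers too): ultrasonic components left above fs/2 can intermodulate and produce products that fall back into the audible band. We do need a reconstruction filter to prevent these high-frequency signals from exceeding the limits of the electronics.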
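To get a feel for how much high-frequency energy is involved, suppose (as an assumption; not every DAC works this way) that the DAC holds each sample value until the next one, producing a staircase, known as a zero-order hold. A tone at f then comes with images at k*fs - f and k*fs + f, shaped by the hold's sin(x)/x envelope. This sketch, with the hypothetical helper `zoh_images`, lists them for a 1 kHz tone at the CD rate; the first images come out only about 33 dB below the tone.

```python
import math


def zoh_images(f, fs, count):
    """Image frequencies k*fs - f and k*fs + f (k = 1..count) of a tone at f
    output as a zero-order-hold staircase at rate fs, each paired with its
    level in dB relative to the tone, from the hold's sin(x)/x envelope."""
    def envelope_db(freq):
        x = math.pi * freq / fs
        return 20 * math.log10(abs(math.sin(x) / x))

    images = []
    for k in range(1, count + 1):
        for image in (k * fs - f, k * fs + f):
            images.append((image, envelope_db(image) - envelope_db(f)))
    return images


for freq, level in zoh_images(1000.0, 44100.0, count=2):
    print(f"{freq / 1000:.1f} kHz at {level:.1f} dB relative to the 1 kHz tone")
```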

Upsampling comes to our rescue ... (to be completed).

John A. Phillips, 25th August 2004.