Performance of audio resampling software

1 Introduction

I have a lot of 16-bit stereo audio files to convert from 48 kHz sample rate to 44.1 kHz. There are many audio resampling programs to choose from, so I have been testing a few packages to see which to use. These are the candidates I found with a quick search:

By the way, I use Linux, so I have been testing software that is available for Linux. I also use Audacity under Linux for audio editing, and it too has a resampler; however, I am currently testing only command-line resamplers.

2 Frequency response and simple aliasing

2.1 Method

The resamplers were all asked to process a 16-bit mono 48 kHz-sampled WAVE file (sweep.wav) containing a linear frequency sweep from 0 Hz to 23.999 kHz. A sweep rate of 1 kHz/s was used for initial evaluation and 100 Hz/s for more detailed testing. This allows the instantaneous frequency to be known from the time (provided the sweep time is long enough). The input files for this test and the others below were generated by a custom C program. The signal level for the sweep was set at -1 dBFS (so that any pass-band ripples will not clip).
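
The sweep input can be reproduced in a few lines. The original files came from a custom C program that is not shown here, so the Python sketch below is an independent reconstruction using only the standard library. Its names (`make_sweep`, `freq_at`, `write_wav`) are hypothetical. It assumes that the sweep starts at 0 Hz and that "-1 dBFS" means a peak of 10^(-1/20) of a 32768 full scale. The phase is the integral of the instantaneous frequency f(t) = rate × t, and that linear relationship is what lets the time axis be read as frequency.

```python
import math
import struct
import wave

FS = 48_000             # input sample rate (Hz)
FULL_SCALE = 32768.0    # 16-bit full scale used as the dBFS reference (assumption)

def freq_at(t: float, rate_hz_per_s: float) -> float:
    """Instantaneous frequency (Hz) of a linear sweep that starts at 0 Hz."""
    return rate_hz_per_s * t

def make_sweep(rate_hz_per_s: float, f_end: float = 23_999.0,
               peak_dbfs: float = -1.0) -> list[int]:
    """16-bit samples of a linear 0 Hz -> f_end sweep with the given peak level."""
    amplitude = FULL_SCALE * 10 ** (peak_dbfs / 20)    # about 29200 for -1 dBFS
    n_samples = round(f_end / rate_hz_per_s * FS)      # duration = f_end / rate seconds
    samples = []
    for n in range(n_samples):
        t = n / FS
        # Phase is the integral of 2*pi*f(t) with f(t) = rate * t, i.e. pi * rate * t^2.
        phase = math.pi * rate_hz_per_s * t * t
        samples.append(round(amplitude * math.sin(phase)))
    return samples

def write_wav(path: str, samples: list[int], fs: int = FS) -> None:
    """Write mono 16-bit little-endian PCM samples to a WAVE file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

For example, `write_wav("sweep.wav", make_sweep(1000.0))` writes a sweep of about 24 s. `freq_at(217.0, 100.0)` returns 21,700 Hz, so with the 100 Hz/s sweep the 21.7 kHz point falls at 217 s. At 100 Hz/s the list holds about 11.5 million samples, so a practical tool would write the file in chunks.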

The following resampling program commands were used:

The above commands all use the default settings for resampling to 44.1 kHz. Note that SoX has three resampling methods, "rate", "resample" and "polyphase", of which only the polyphase method is tested here.

2.2 Results

Figure 1 (below) shows the "frequency responses" of all six resamplers (in the same order as above) with the 1 kHz/s sweep, as displayed in Audacity. The horizontal scale is in seconds but, because the sweep rises at 1 kHz/s, may be read directly as kHz. The vertical scale is linear amplitude.

Figure 1. Frequency response of six resamplers (1 kHz/s sweep)

Figure 2 (below) shows the "frequency responses" of the three best resamplers (see the discussion below) with the 100 Hz/s sweep. The horizontal scale starts at 21.7 kHz on the left, and the cursor is positioned at the Nyquist frequency, 22.05 kHz. This time the vertical scale is logarithmic amplitude (dBFS).

Figure 2. Frequency response of the three best resamplers (100 Hz/s sweep)

The execution times on a 2.4 GHz AMD Athlon XP for resampling a 2,500-second, 16-bit stereo WAVE file from 48 kHz to 44.1 kHz were:

2.3 Discussion

In theory, the frequency responses in figure 1 should be flat up to a little below 22.05 kHz and should then drop as steeply as possible to zero at 22.05 kHz and above. In practice, a response that is flat to 20 kHz is quite sufficient for an audio resampler (or a little less than 20 kHz in my case, as I can no longer hear that high at my age).

There should be nothing significant above 22.05 kHz in the frequency responses, because anything there represents aliasing, which is a fault in a resampler. When resampling from 48 kHz to 44.1 kHz, an input component at a frequency f above 22.05 kHz that the filter fails to remove reappears at 44.1 kHz minus f. Since the sweep stops just below 24 kHz, the aliasing products shown by this test cannot fall below 44.1 - 24 = 20.1 kHz, so they will not be audible. However, this test may not show all of the aliasing a resampler can produce.
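
The 20.1 kHz bound is simple frequency folding. A component at input frequency f that survives the anti-aliasing filter cannot be distinguished, after sampling at 44.1 kHz, from one at 44.1 kHz minus f. A minimal Python sketch of that arithmetic follows. It assumes ideal sampling, and `alias_frequency` is a hypothetical helper name.

```python
def alias_frequency(f_in: float, fs_out: float = 44_100.0) -> float:
    """Frequency (Hz) at which a component at f_in appears after sampling at fs_out.

    Sampling cannot distinguish f from f + k*fs_out or from fs_out - f, so the
    input frequency is folded into the output's Nyquist band 0..fs_out/2.
    """
    f = f_in % fs_out
    return fs_out - f if f > fs_out / 2 else f

# The top of the sweep and the single 23 kHz tone used for the aliasing test.
for f in (23_999.0, 23_000.0):
    print(f"{f:.0f} Hz folds to {alias_frequency(f):.0f} Hz")
```

The loop reports 20101 Hz for the top of the sweep and 21100 Hz for a 23 kHz tone. A real resampler's filter and interpolation stages can also create other spurious products, which is why this test may not reveal all aliasing.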

First of all, I can rule out resample, which has a -3 dB point at 17.9 kHz.
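
The -3 dB points here appear to have been read from the plotted sweeps. If you wanted to measure them numerically, one approach works as follows:

- Compute the RMS level of short blocks of the resampled sweep.
- Map each block's centre time to frequency using the sweep rate.
- Report the first block that falls more than 3 dB below the level near 1 kHz.

This is an assumption, not the method used for this article. The Python sketch below uses hypothetical function names and assumes samples normalised to ±1.0.

```python
import math

def sweep_envelope_db(samples, fs, rate_hz_per_s, block_s=0.05):
    """Return (frequency, RMS level in dB) for consecutive blocks of a resampled sweep.

    Because the sweep is linear and starts at 0 Hz, a block centred at time t
    corresponds to frequency rate_hz_per_s * t.
    """
    n = int(block_s * fs)
    envelope = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        rms = math.sqrt(sum(x * x for x in block) / n)
        t_mid = (start + n / 2) / fs
        envelope.append((rate_hz_per_s * t_mid, 20 * math.log10(max(rms, 1e-20))))
    return envelope

def minus_3db_point(envelope, ref_freq=1_000.0):
    """First frequency whose level is more than 3 dB below the level nearest ref_freq."""
    if not envelope:
        return None
    ref_level = min(envelope, key=lambda fl: abs(fl[0] - ref_freq))[1]
    for freq, level in envelope:
        if level < ref_level - 3.0:
            return freq
    return None  # the response never drops by 3 dB within the sweep
```

For the 1 kHz/s sweep resampled to 44.1 kHz, you would call it as `minus_3db_point(sweep_envelope_db(samples, 44_100.0, 1000.0))`. The 50 ms block length sets the trade-off: 50 Hz of frequency resolution per block at 1 kHz/s, against a more stable RMS estimate for longer blocks.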

Also, resample seems to have a default gain of -1.5 dB for this conversion (not visible in figure 1), rather than the 0 dB achieved by the rest; it seems to lack a frequency-dependent amplitude correction factor. It is possible to generate a better filter for resample to use, but I think I have easier alternatives. To be fair, resample is the quickest of the batch by a long way, although a better filter might slow it down.

On the grounds of aliasing, I can also rule out ResampAudio and sr-convert: their aliasing is obvious enough from the fast frequency sweep that I did not need to look further. (The resample program would also fail here had it not already been eliminated on its frequency response.)

That leaves ssrc, SoX (polyphase) and sndfile-resample, which I have examined using the slower 100 Hz/s sweep (figure 2).

SoX (polyphase) shows a small amount of aliasing in this test, low enough not to matter: -71.2 dBFS peak in the range 22.05 kHz to 22.16 kHz and better than -90 dBFS elsewhere. sndfile-resample and ssrc show no visible aliasing, just noise at levels better than -90 dBFS above 22.05 kHz. All three are excellent for most purposes.

Of these three, ssrc looks like the best at this point. It has an exceptionally wide flat response, a very sharp cutoff very close to the Nyquist limit of 22.05 kHz, and the lowest RMS noise floor (light red is RMS and dark red is peak in the figures above). It also runs quickly, and it has an even quicker "--profile fast" option which, while not as flat as the default profile, is still quite good enough for this job. That makes it more attractive than the slower SoX (polyphase) or sndfile-resample. If I had to choose a runner-up at this stage, it would be sndfile-resample.

3 Intermodulation Distortion

3.1 Method

The intermodulation test was performed on the three best resamplers from section 2.

A 7.5-second, 48 kHz, 16-bit WAVE file was generated containing two tones plus dither. The tones were SMPTE-standard IMD test tones of 60 Hz at -5 dBFS RMS and 7 kHz at -17 dBFS RMS. The dither was TPDF noise of 2 LSBs peak-to-peak (-96.3 dBFS RMS). The file was resampled to 44.1 kHz, and an FFT-based spectrum analysis was performed using Octave.
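
A two-tone test signal of this kind can be sketched in Python as below. This is not the original C generator, and the names are hypothetical. It assumes that dBFS RMS is referred to a full-scale DC level of 32768, so a sine's peak is about 3 dB above its RMS level. That convention is consistent with section 5, where a -4 dBFS RMS tone is described as -1 dBFS peak. The TPDF dither is modelled as the sum of two independent uniform ±0.5 LSB values, giving 2 LSB peak-to-peak.

```python
import math
import random

FS_IN = 48_000
FULL_SCALE = 32768.0   # dBFS reference in LSBs (assumed convention)

def sine_peak(level_dbfs_rms: float) -> float:
    """Peak amplitude (LSBs) of a sine whose RMS level is given in dBFS."""
    return math.sqrt(2) * FULL_SCALE * 10 ** (level_dbfs_rms / 20)

def tpdf_dither(rng: random.Random) -> float:
    """TPDF dither: two independent uniform(-0.5, 0.5) LSB values summed (2 LSB p-p)."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def imd_test_signal(seconds: float = 7.5, seed: int = 1) -> list[int]:
    """60 Hz at -5 dBFS RMS plus 7 kHz at -17 dBFS RMS, TPDF-dithered, as 16-bit ints."""
    rng = random.Random(seed)
    a_low, a_high = sine_peak(-5.0), sine_peak(-17.0)
    out = []
    for n in range(round(seconds * FS_IN)):
        t = n / FS_IN
        x = a_low * math.sin(2 * math.pi * 60 * t) + a_high * math.sin(2 * math.pi * 7000 * t)
        x += tpdf_dither(rng)
        # Round to the nearest integer and clamp to the 16-bit range.
        out.append(max(-32768, min(32767, round(x))))
    return out
```

With these levels, the two tones' peaks add up to just under full scale (about -0.04 dBFS), so the clamp should never act.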

3.2 Results

Figure 3. Intermodulation from ssrc

Figure 3 shows that the output from ssrc is quite clean. There are no obvious intermodulation products down to at least -130 dBFS.

Careful examination reveals what appears to be a scattering of very-low-level lines that are difficult to distinguish from noise. The largest is a line at 20 kHz (-131 dBFS), but this is not an obvious IM product. Other such lines, all below -132 dBFS, occur at 6,840 Hz, 2,460 Hz and elsewhere, although some may just be noise. However, among them are IMD products such as those at 6,940 Hz (-133 dBFS; 7 kHz - 60 Hz) and 7,120 Hz (-135 dBFS; 7 kHz + 2 × 60 Hz).
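
To decide whether a line is an IM product, check whether its frequency can be written as m × 7 kHz + n × 60 Hz for small integers m and n. The Python sketch below enumerates such combinations below the output Nyquist frequency. The function name and the order limit of 3 are illustrative assumptions.

```python
def imd_products(f1: float, f2: float, max_order: int = 3, f_max: float = 22_050.0):
    """List intermodulation frequencies |m*f2 + n*f1| for 1 <= m <= max_order, 0 < |n| <= max_order.

    Returns (frequency, m, n) tuples with 0 < frequency < f_max, sorted by frequency.
    """
    products = []
    for m in range(1, max_order + 1):
        for n in range(-max_order, max_order + 1):
            if n == 0:
                continue  # n == 0 gives harmonics of f2, not IM products
            f = abs(m * f2 + n * f1)
            if 0 < f < f_max:
                products.append((f, m, n))
    return sorted(products)
```

The set `{f for f, _, _ in imd_products(60.0, 7000.0)}` contains 6,940 Hz, 7,120 Hz and 20,940 Hz (the SoX line discussed for figure 5). It contains neither 6,840 Hz nor 20 kHz, which agrees with the classification in the text.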

Figure 4. Intermodulation from sndfile-resample

In figure 4, the output from sndfile-resample shows a number of clearly visible lines which may be intermodulation products. The worst is a pair of lines 120 Hz apart at 19.2 kHz +/- 60 Hz. These are at -103 dBFS (about 86 dB below the -17 dBFS 7 kHz tone). The rest, including a pair of lines at 9.6 kHz +/- 60 Hz, are at about -122 dBFS or better.

The drop in the noise floor above 21 kHz is consistent with the resampler's frequency response (see section 2).

Figure 5. Intermodulation from SoX (polyphase method)

Figure 5 shows that SoX (polyphase method) produces clean output, with no obvious intermodulation products above the noise floor. A careful search does find IM products, such as the one at 20.94 kHz (-136 dBFS; 3 × 7 kHz - 60 Hz). Analysis of a longer data file would lower the noise floor and might reveal more, but the levels are very low.

The shape of the noise floor above 20 kHz is consistent with the frequency response in section 2.

3.3 Discussion

Both ssrc and SoX (polyphase method) emerge cleanly from this test. That means there's nothing here to displace ssrc as the best resampler of the batch, nor anything to dent its superlative performance.

However, we now know that sndfile-resample has a small defect in its intermodulation performance. Nevertheless, this does not seem significant for 16-bit resampling. It is comparable to the small, probably insignificant aliasing defect of SoX (polyphase method), so the choice of runner-up is now less clear.

4 Aliasing

4.1 Method

The aliasing test was also performed on the three best resamplers from section 2.

A 7.5-second, 48 kHz, 16-bit WAVE file was generated containing a 23 kHz tone at -4 dBFS RMS plus dither. The dither was TPDF noise of 2 LSBs peak-to-peak (-96.3 dBFS RMS). The file was resampled to 44.1 kHz, and an FFT-based spectrum analysis was performed using Octave.

In this test, the primary alias of 23 kHz will be at 44.1 - 23 = 21.1 kHz.

Because the frequency response varies across the aliasing region above 22.05 kHz, it might be better to test aliasing with multiple tones or with high levels of broadband noise. This may be possible in future.

4.2 Results

Figure 6. Aliasing from ssrc

In figure 6, there appears to be a primary alias at 21.1 kHz, at -122 dBFS. The worst of the remaining lines (whether due to aliasing or not) is at about -120 dBFS, at 15.0 kHz.

Figure 7. Aliasing from sndfile-resample

Figure 7 shows a primary alias at about -114 dBFS, with the worst of the other lines (whether due to aliasing or not) at about -113 dBFS, at 15.4 kHz.

Figure 8. Aliasing from SoX (polyphase method)

In figure 8, the primary alias is at about -115 dBFS, and the remaining lines (whether due to aliasing or not) are no worse than about -121 dBFS (the line at 9.0 kHz).

4.3 Discussion

With a TPDF dither noise floor of about -96 dBFS, we might expect music to remain audible a further 15 dB or so below it, that is, down to about -111 dBFS. None of these three resamplers produces aliasing (or spurious lines from other effects) above that level. However, a better test than a single tone may be needed.
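
The -96.3 dBFS figure quoted for the dither matches the total noise of TPDF dither plus the rounding error added when the dithered signal is quantised to 16 bits. That total is (2/12 + 1/12) LSB² = 0.25 LSB², or 0.5 LSB RMS, referred to 32768. This is an interpretation, not something the text states; TPDF dither alone would be about 0.41 LSB RMS. The Python sketch below reproduces the arithmetic. It treats the 15 dB audibility margin from the text as a hard threshold (an assumption) and compares it with the worst line for each resampler in section 4.2.

```python
import math

LSB_FULL_SCALE = 32768  # 16-bit full scale in LSBs (assumed dBFS reference)

def uniform_var(width_lsb: float) -> float:
    """Variance (LSB^2) of a uniform distribution of the given width."""
    return width_lsb ** 2 / 12

# TPDF dither: two independent uniform(-0.5, 0.5) LSB sources -> variance 2/12.
dither_var = 2 * uniform_var(1.0)
# Rounding the dithered signal to 16 bits adds a further uniform error 1 LSB wide.
total_var = dither_var + uniform_var(1.0)          # 0.25 LSB^2, i.e. 0.5 LSB RMS
noise_floor_dbfs = 20 * math.log10(math.sqrt(total_var) / LSB_FULL_SCALE)

# Assumption from the text: music stays audible roughly 15 dB below the noise floor.
audibility_threshold = noise_floor_dbfs - 15.0

# Worst single line (alias or other) reported for each resampler in section 4.2, in dBFS.
worst_lines = {"ssrc": -120.0, "sndfile-resample": -113.0, "SoX (polyphase)": -115.0}

for name, level in worst_lines.items():
    verdict = "below" if level < audibility_threshold else "above"
    print(f"{name}: worst line {level:.0f} dBFS is {verdict} {audibility_threshold:.1f} dBFS")
```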

Even the worst performer, sndfile-resample, has its worst line at about -113 dBFS, which is almost certainly not going to be audible. SoX (polyphase method) is slightly better, and ssrc is several dB better still. Thus ssrc retains its place as the best of the batch, and SoX (polyphase method) and sndfile-resample remain inseparable runners-up.

5 Harmonic Distortion

5.1 Method

The harmonic distortion test was performed on the three best resamplers from section 2.

A 7.5-second, 48 kHz, 16-bit WAVE file was generated containing a SMPTE-standard 1 kHz THD test tone at -4 dBFS RMS plus dither. The dither was TPDF noise of 2 LSBs peak-to-peak (-96.3 dBFS RMS). The file was resampled to 44.1 kHz, and an FFT-based spectrum analysis was performed using Octave.
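
The analysis here used full FFT spectra in Octave. The Goertzel algorithm is a lighter-weight alternative for probing individual harmonics, because it evaluates a single DFT bin. It is swapped in here for illustration and is not the method used for this article. The Python sketch below (hypothetical function names) reports the level of each harmonic of 1 kHz below 22.05 kHz. It assumes samples normalised to ±1.0 full scale, and an analysis length that puts each harmonic exactly on a bin. For example, 7.5 s at 44.1 kHz is 330,750 samples, which places 1 kHz on bin 7,500.

```python
import math

def goertzel_level_dbfs(samples: list[float], fs: float, freq: float) -> float:
    """Level (dBFS RMS, full scale = 1.0) of the sinusoid at `freq`, via the Goertzel algorithm.

    Exact only when freq * len(samples) / fs is an integer; otherwise spectral
    leakage lowers the reading.
    """
    w = 2 * math.pi * freq / fs
    coeff = 2 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the DFT at frequency w.
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    amplitude = 2 * math.sqrt(max(power, 0.0)) / len(samples)  # peak amplitude of the tone
    rms = amplitude / math.sqrt(2)
    return 20 * math.log10(max(rms, 1e-20))                     # floor avoids log10(0)

def harmonic_levels(samples: list[float], fs: float = 44_100.0, f0: float = 1_000.0) -> dict:
    """Levels of the 2nd, 3rd, ... harmonics of f0 that lie below the Nyquist frequency."""
    return {k * f0: goertzel_level_dbfs(samples, fs, k * f0)
            for k in range(2, int((fs / 2) // f0) + 1)}
```

For a 44.1 kHz output, this checks the harmonics from 2 kHz to 22 kHz, the same range as the lines listed for figure 9. On real dithered files, each reading also includes the dither noise falling in that one bin.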

5.2 Results

Figure 9. Harmonic distortion from ssrc

Figure 9 shows no obvious harmonically related spectral lines. A closer look reveals harmonically related lines at 5 kHz (-135 dBFS), 7 kHz (-130 dBFS), 8 kHz (-132 dBFS), 9 kHz (-127 dBFS), 12 kHz (-129 dBFS), 20 kHz (-134 dBFS), 21 kHz (-133 dBFS) and 22 kHz (-135 dBFS). There are some other lines at perhaps -128 dBFS which may be due to other effects.

Note that ssrc warned that clipping by 0.12 dB had occurred while processing the file. Presumably this was because of the tone's high level: -4 dBFS RMS is about -1 dBFS peak. The clipping can be avoided with the --twopass parameter, which gave a small but not very significant improvement.
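
The 0.12 dB figure can be read as the ratio of the largest output sample, before conversion to 16 bits, to full scale. A tone already at about -1 dBFS peak leaves only 1 dB of headroom for filter ripple or ringing. The Python sketch below reports that overshoot and shows the second pass of a simple two-pass scheme that scales the output only when it would clip. Whether ssrc's --twopass option works exactly like this is an assumption. The function names are hypothetical, and samples are assumed to be normalised to ±1.0.

```python
import math

def overshoot_db(samples: list[float], full_scale: float = 1.0) -> float:
    """Amount (dB) by which the largest sample magnitude exceeds full scale.

    A positive result means the signal would clip when converted to integers;
    zero or negative means it fits.
    """
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return -math.inf  # silence: infinitely far below full scale
    return 20 * math.log10(peak / full_scale)

def scale_to_fit(samples: list[float], full_scale: float = 1.0) -> list[float]:
    """Second pass of a hypothetical two-pass scheme: attenuate only if the peak exceeds full scale."""
    peak = max(abs(x) for x in samples)
    if peak <= full_scale:
        return list(samples)
    gain = full_scale / peak  # e.g. about -0.12 dB for a 0.12 dB overshoot
    return [x * gain for x in samples]
```

The trade-off is that the whole file is attenuated by the same small gain. That avoids clipping at the cost of a second pass over the data.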

Figure 10. Harmonic distortion from sndfile-resample

Figure 10 shows spurious lines at levels of up to -100 dBFS (at 20.2 kHz), but these are not clearly harmonic distortion and may be due to other effects.

Figure 11. Harmonic distortion from SoX (polyphase method)

In figure 11, the output is free of obvious harmonic content, but a close look shows harmonically related lines across the spectrum at -131 dBFS and below. However, some of these could be noise peaks, since they are so low in level.

There are also a few other identifiable lines at up to -130 dBFS but these are not clearly harmonics of 1 kHz.

5.3 Discussion

We have probably detected only very low levels of harmonic distortion in all three resamplers. The spurious products at higher levels may be due to other numerical effects such as clipping, rounding or truncation.

The worst performer, whether or not its spurious outputs are harmonic distortion, is sndfile-resample. Its worst line, at -100 dBFS, is within normal levels of audio noise but may be above the lowest audible level for music. SoX (polyphase method) and ssrc perform about the same, with their worst spurious products at about -130 dBFS and -127 dBFS respectively.

Yet again, ssrc retains its place as the best of the batch. SoX (polyphase method) perhaps nudges ahead as runner-up because of sndfile-resample's relatively poor showing here.

John A. Phillips, 21st August 2005.