Audio sample-rate conversion (SRC) on Cortex-M.

Problem

Data-rate adaptation is needed in several situations, for example:

- when the data production and consumption rates differ, for example between a voice communication channel sampled at 8 kHz and a multimedia mixer running at 48 kHz.

- when a data stream is exchanged between two systems whose clocks are not synchronised.

- when an audio stream is down-sampled to save recording memory.

Searching for “sampling rate conversion” with your favourite search engine gives more than one million results. The reason is the variety of solutions, which depend on the use case and each focus on one specific problem among many:

1) Out-of-band attenuation, spectral folding and the problem of minimal distortion

2) Delay and delay distortion

3) Non-integer ratios and the problem of drift compensation

1) Out-of-band attenuation, spectral folding and the problem of minimal distortion

Unless you know that your original signal has no high-frequency content, down-sampling starts with a low-pass filter H(z), followed by decimation of the filtered stream (for example, keeping one sample out of every 6 to go from 48 kHz to 8 kHz). During decimation, signal energy above the new Nyquist frequency (half the output sampling rate) is folded back into the low-frequency bands. Any high-frequency energy that is not properly filtered turns into signal distortion.
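As an illustration of the filter-then-decimate structure, here is a minimal C sketch of a streaming decimator. The type and function names, the 64-tap limit and the floating-point arithmetic are assumptions made for illustration (a Cortex-M implementation would typically use fixed point and a circular buffer instead of shifting the delay line); the coefficients of H(z) are supplied by the caller and must come from a proper low-pass design.

```c
#include <stddef.h>

/* Hypothetical maximum filter length for this sketch. */
#define DEC_MAX_TAPS 64

/* Hypothetical decimator state: low-pass FIR H(z), then keep 1 sample in M. */
typedef struct {
    const float *h;            /* FIR coefficients of H(z) */
    size_t ntaps;              /* must satisfy 1 <= ntaps <= DEC_MAX_TAPS */
    size_t factor;             /* decimation factor M, e.g. 6 for 48 kHz -> 8 kHz */
    float hist[DEC_MAX_TAPS];  /* delay line, hist[0] holds the newest sample */
    size_t phase;              /* input samples received since the last output */
} decimator_t;

void dec_init(decimator_t *d, const float *h, size_t ntaps, size_t factor)
{
    d->h = h;
    d->ntaps = (ntaps > DEC_MAX_TAPS) ? DEC_MAX_TAPS : ntaps; /* truncates if too long */
    d->factor = factor;
    d->phase = 0;
    for (size_t j = 0; j < DEC_MAX_TAPS; j++)
        d->hist[j] = 0.0f;
}

/* Consumes n_in input samples and writes about n_in / M outputs to out.
 * The FIR sum is evaluated only for the samples that are kept. */
size_t dec_process(decimator_t *d, const float *in, size_t n_in, float *out)
{
    size_t n_out = 0;
    for (size_t k = 0; k < n_in; k++) {
        /* Shift the delay line and insert the new sample. */
        for (size_t j = d->ntaps - 1; j > 0; j--)
            d->hist[j] = d->hist[j - 1];
        d->hist[0] = in[k];

        if (++d->phase == d->factor) {
            d->phase = 0;
            float acc = 0.0f;
            for (size_t j = 0; j < d->ntaps; j++)
                acc += d->h[j] * d->hist[j];
            out[n_out++] = acc;
        }
    }
    return n_out;
}
```

For 48 kHz to 8 kHz, `dec_init` would be called with a factor of 6 and a low-pass H(z) cutting off below 4 kHz; `dec_process` then returns one output sample for every 6 inputs. Computing the FIR sum only for the kept samples divides the multiply-accumulate count by M compared with filtering every input sample and then discarding 5 out of 6.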

Telephony standards specify maximum distortion levels, which translate into a minimum filter attenuation of 35 dB starting from 3400 Hz. Microsoft Skype Logo certification requires distortion to stay below -65 dB. The Android Compatibility Definition Document (CDD) states that distortion on the microphone path should stay below 1% (THD+N < -40 dB).
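The figures above mix percentages and decibels. Treating them as amplitude ratios (hence the factor 20), they convert as follows:

```latex
\begin{aligned}
L_{\mathrm{dB}} &= 20\log_{10}(r), \qquad r = 10^{L_{\mathrm{dB}}/20} \\
r = 1\% = 0.01 &\;\Rightarrow\; L = 20\log_{10}(0.01) = -40\ \mathrm{dB} \\
L = -65\ \mathrm{dB} &\;\Rightarrow\; r = 10^{-3.25} \approx 5.6\times10^{-4} \approx 0.056\% \\
L = -35\ \mathrm{dB} &\;\Rightarrow\; r = 10^{-1.75} \approx 1.8\times10^{-2} \approx 1.8\%
\end{aligned}
```

So a 35 dB attenuation still lets roughly 1.8% of the out-of-band amplitude fold back, while the -65 dB figure corresponds to about 0.06% distortion.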

For multimedia audio recorders the distortion criterion is much stricter, and the filter attenuation often exceeds 90 dB (even though our ears can hardly detect distortion 70 dB below peak level, but that is another discussion).
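To see why 90 dB costs more than 35 dB, one can use Kaiser's empirical estimate for a Kaiser-window FIR design, M ≈ (A - 8) / (2.285 Δω), where A is the stopband attenuation in dB and Δω the transition width in radians per sample. This is a textbook rule of thumb, not necessarily how the filters discussed here were designed, and the function name and the 3400 Hz / 4000 Hz band edges at 48 kHz used below are hypothetical values chosen for illustration.

```c
/* Kaiser's empirical estimate of the FIR order M (filter length M + 1)
 * for a Kaiser-window design. Assumes atten_db > 21 and f_stop > f_pass. */
unsigned kaiser_fir_order(double atten_db, double f_pass_hz,
                          double f_stop_hz, double fs_hz)
{
    const double pi = 3.14159265358979323846;
    /* Transition width in radians per sample. */
    double dw = 2.0 * pi * (f_stop_hz - f_pass_hz) / fs_hz;
    double m = (atten_db - 8.0) / (2.285 * dw);
    /* Round up without pulling in <math.h>. */
    unsigned mi = (unsigned)m;
    return (m > (double)mi) ? mi + 1u : mi;
}
```

With a 3400 Hz passband edge and a 4000 Hz stopband edge at 48 kHz, this estimate gives an order of 151 for 35 dB and 457 for 90 dB: roughly three times more taps, and therefore three times more multiply-accumulates per output sample, for the multimedia-grade filter.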

On the digital-to-analog conversion side, the interpolator's frequency images must be attenuated so that they are not heard. For example, a 1000 Hz sine wave sampled at 8000 Hz and up-sampled to 48 kHz has images at 7000 Hz, 9000 Hz, 15000 Hz, 17000 Hz, and so on. The other good reason to remove images is power efficiency: the images are added to the loudspeaker signal like a modulation, which limits the dynamic range of the analog output before clipping.
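The image positions follow from the fact that a tone f0 sampled at fs_in is repeated around every multiple of fs_in, at k × fs_in ± f0. The small helper below (a hypothetical name, for illustration only) lists the images that fall below the output Nyquist frequency after up-sampling without filtering.

```c
#include <stddef.h>

/* Lists the image frequencies k*fs_in +/- f0 (k >= 1) of a tone f0 sampled
 * at fs_in that fall below the output Nyquist frequency fs_out / 2 after
 * up-sampling without filtering. Assumes 0 < f0 < fs_in / 2. */
size_t image_freqs(unsigned long f0, unsigned long fs_in,
                   unsigned long fs_out, unsigned long *img, size_t max_img)
{
    size_t n = 0;
    unsigned long nyq_out = fs_out / 2;
    for (unsigned long k = 1; k * fs_in - f0 <= nyq_out; k++) {
        unsigned long lo = k * fs_in - f0;   /* lower image around k*fs_in */
        unsigned long hi = k * fs_in + f0;   /* upper image around k*fs_in */
        if (n < max_img)
            img[n++] = lo;
        if (hi <= nyq_out && n < max_img)
            img[n++] = hi;
    }
    return n;
}
```

For the 1000 Hz tone at 8 kHz up-sampled to 48 kHz, `image_freqs(1000, 8000, 48000, img, 16)` returns 5 images: 7000, 9000, 15000, 17000 and 23000 Hz.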

The audio file below is a concatenation of one second of a 997 Hz tone sampled at 48 kHz, followed by one second of the same signal sampled at 8 kHz and up-sampled to 48 kHz without post-filtering.
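The page does not say how the unfiltered 8 kHz to 48 kHz up-sampling was done. A common method is zero insertion (keep each input sample and insert L - 1 zeros after it), sketched below under that assumption with a hypothetical function name. Without a post-filter, the output contains the baseband tone plus all of its images.

```c
#include <stddef.h>

/* Up-samples by an integer factor L by zero insertion, without any
 * anti-imaging filter. out must hold n_in * factor samples.
 * A complete interpolator would follow this with a low-pass filter
 * (gain L, cut-off at the input Nyquist frequency). */
size_t upsample_zero_stuff(const float *in, size_t n_in,
                           size_t factor, float *out)
{
    for (size_t n = 0; n < n_in; n++) {
        out[n * factor] = in[n];              /* original sample */
        for (size_t j = 1; j < factor; j++)
            out[n * factor + j] = 0.0f;       /* inserted zeros */
    }
    return n_in * factor;
}
```

For the 8 kHz to 48 kHz case the factor is 6, so every input sample is followed by five zeros.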

The picture above shows the spectra of the original and the unfiltered signals. The frequency images can clearly be heard in the file here: (link)

2) Delay and delay distortion

Voice communication degrades with delay: as the delay increases, the conversation effectively becomes half-duplex. The audio converters are allocated a share of the end-to-end delay budget. Standards give two recommendations for delay: the absolute delay (measured at 1600 Hz) must stay below 0.5 ms, and the variation of delay between 1000 Hz and 2600 Hz must stay below 0.13 ms so as not to degrade the performance of the echo cancellers. The absolute delay constraint rules out symmetric (linear-phase) FIR filters, and the delay-distortion requirement makes high-order IIR filters almost unusable.
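The absolute-delay argument against symmetric FIR filters is simple arithmetic: a symmetric FIR with N taps delays all frequencies by (N - 1)/2 samples. The two helpers below (hypothetical names) convert between tap count and delay.

```c
/* Group delay, in milliseconds, of a symmetric (linear-phase) FIR with
 * ntaps taps: (ntaps - 1) / 2 samples at every frequency. ntaps >= 1. */
double fir_group_delay_ms(unsigned ntaps, double fs_hz)
{
    return 1000.0 * ((double)ntaps - 1.0) / 2.0 / fs_hz;
}

/* Longest odd-length symmetric FIR whose group delay fits in budget_ms. */
unsigned fir_max_taps(double budget_ms, double fs_hz)
{
    double delay_samples = budget_ms * fs_hz / 1000.0;
    return 2u * (unsigned)delay_samples + 1u;
}
```

With a 0.5 ms budget, a symmetric FIR is limited to 9 taps at 8 kHz and 49 taps at 48 kHz, which leaves very little room for a sharp anti-aliasing filter, even before the rest of the converter's processing takes its share of the budget.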

Audio delay (latency) is also a critical parameter for live music and professional musical devices. The Android CDD now recommends an analog-input to analog-output round-trip latency below 10 ms.

3) Non-integer ratios and the problem of drift compensation

Other, more specific solutions are needed when the data rate must be adjusted in steps of less than 1%: for example, when the source of audio samples drifts relative to the clock of the D/A converter, or to compensate for the residual frequency error of a 44.1 kHz to 48 kHz converter that uses the 12/11 approximation of 48/44.1.
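The size of that residual error can be checked with integer arithmetic: reduce 48000/44100 to its smallest integer ratio, then compare it with 12/11. The helper names below are hypothetical.

```c
/* Greatest common divisor (Euclid's algorithm). */
unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduces fs_out / fs_in to the smallest up/down integer pair. */
void exact_ratio(unsigned long fs_in, unsigned long fs_out,
                 unsigned long *up, unsigned long *down)
{
    unsigned long g = gcd_ul(fs_in, fs_out);
    *up = fs_out / g;
    *down = fs_in / g;
}

/* Relative error, in parts per million, of using num/den instead of the
 * exact up/down ratio. Positive means too many output samples. */
double ratio_error_ppm(unsigned long up, unsigned long down,
                       unsigned long num, unsigned long den)
{
    return 1e6 * (((double)num * (double)down) /
                  ((double)den * (double)up) - 1.0);
}
```

For 44.1 kHz to 48 kHz the exact ratio is 160/147, and 12/11 produces about 2273 ppm (0.23%) too many output samples. That residual is well inside the +/- 1% range mentioned for drift compensation.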

References

- Android Compatibility Definition Document (CDD) (link)

- Interpolation and Decimation of Digital Signals: A Tutorial Review, R. E. Crochiere, Proceedings of the IEEE, vol. 69, no. 3, March 1981 (link)

- A Digital Signal Processing Approach to Interpolation, R. Schafer, Proceedings of the IEEE, vol. 61, no. 6, June (link)

- Distortion Effects from Aliasing in Digital Audio, Mikael Vest and Peter Scheelke, Digital Audio Denmark (link)

- PC 2001 System Design Guide, chapter 11 (audio), Intel Corporation and Microsoft Corporation (link)

- Microsoft Windows Logo Program System and Device Requirements (link)

- Permissible Value of Group Delay Distortion on Tone Quality due to Low-Pass Filter, Y. Hoshino and T. Takegahara, NHK, ICASSP 86 (link)

- Requirements for Loudspeaker Crossover Networks, J. Robert Ashley, Sperry Corp., ICASSP 84 (link)

- 3GPP TS 26.131, Terminal acoustic characteristics for telephony (link)

- 3GPP TS 26.132, Speech and video telephony terminal acoustic test specification (link)

Solutions

At Firmware-Developments we have accumulated years of expertise in rate conversion, covering signal quality, standards compliance and fixed-point implementation.

Contact us to have the following programs tuned for your application:

- An arbitrary sample-rate converter (ASRC) that converts between any of the usual audio rates {8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48 kHz}, or any other sampling rate. A free executable version of this program is available (link). This demonstrator ASRC has two configurations: one optimised for minimum CPU footprint using 16-bit arithmetic, and one high-resolution configuration (link).
- A drift compensator optimised for drifts in the range of +/- 1%, with very low CPU and memory consumption. The THD+N/SNR depends on the amount of drift being compensated: at 0.01%, 0.1% and 1% drift it is respectively 95 dB, 76 dB and 44 dB. The algorithm complexity is measured at 18 cycles/sample on Cortex-M4 and estimated at 30 cycles/sample on Cortex-M0 (link).
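The drift compensator's algorithm is not described on this page. As a generic illustration only, the simplest way to absorb a small rate mismatch is a fractional resampler with linear interpolation, sketched below in floating point with hypothetical names. Linear interpolation has much poorer image rejection than a production design, so this sketch is not representative of the distortion figures quoted above.

```c
#include <stddef.h>

/* Hypothetical state of a linear-interpolation drift compensator.
 * step > 1 consumes input faster than it produces output (drops samples);
 * step < 1 produces extra output samples. */
typedef struct {
    double step;  /* input samples per output sample, 1.0 + drift */
    double pos;   /* read position; index 0 is the last sample of the previous block */
    float prev;   /* last input sample of the previous block */
} drift_comp_t;

void drift_init(drift_comp_t *s, double drift)
{
    s->step = 1.0 + drift;  /* assumes drift > -1 */
    s->pos = 0.0;
    s->prev = 0.0f;
}

/* Processes one block of n_in samples. out must hold at least
 * n_in / step + 1 samples. Returns the number of output samples. */
size_t drift_process(drift_comp_t *s, const float *in, size_t n_in, float *out)
{
    size_t n_out = 0;
    /* Virtual signal: x[0] = prev, x[k + 1] = in[k] for k < n_in. */
    while ((size_t)s->pos < n_in) {
        size_t i = (size_t)s->pos;
        float frac = (float)(s->pos - (double)i);
        float a = (i == 0) ? s->prev : in[i - 1];   /* x[i]     */
        float b = in[i];                             /* x[i + 1] */
        out[n_out++] = a + frac * (b - a);
        s->pos += s->step;
    }
    /* Re-base the position on the last sample of this block. */
    s->pos -= (double)n_in;
    if (n_in > 0)
        s->prev = in[n_in - 1];
    return n_out;
}
```

For scale, the quoted 18 cycles/sample at 48 kHz amounts to 18 × 48,000 = 864,000 cycles per second, under 1 MHz of Cortex-M4 CPU per channel.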

Tools

We compute SNR in the audio band with this tool: (THDN). Please contact us at contact@firmware-developments.com if you have specific needs for the tool.

The program accepts several input formats and sampling rates. It reports SNR in dB, dB(A) and dB(CCIR), copes with signal distortion caused by jitter and clipping, and has been qualified on real hardware.

Contacts

Websites: firmware-developments.com and twitter.com/fw_devs

Firmware Developments email: contact@firmware-developments.com

Phone number: +33 698 846 090

Address: “Les Alcyons”, 5b Av. de l’Ilette, 06600 Antibes, France.