Multimedia Signal Processing
Digitization of Audio

Thorsten Thormählen
May 29, 2020
Part 4, Chapter 1

This is the print version of the slides.

Notation

Type	Font	Examples
Variables (scalars)	italics	$a, b, x, y$
Functions	upright	$\mathrm{f}, \mathrm{g}(x), \mathrm{max}(x)$
Vectors	bold, elements row-wise	$\mathbf{a}, \mathbf{b}= \begin{pmatrix}x\\y\end{pmatrix} = (x, y)^\top,$ $\mathbf{B}=(x, y, z)^\top$
Matrices	Typewriter	$\mathtt{A}, \mathtt{B}= \begin{bmatrix}a & b\\c & d\end{bmatrix}$
Sets	calligraphic	$\mathcal{A}, \mathcal{B}=\{a, b\}, b \in \mathcal{B}$
Number systems, Coordinate spaces	double-struck	$\mathbb{N}, \mathbb{Z}, \mathbb{R}^2, \mathbb{R}^3$

Digitization of Audio

Sound Propagation

A sound source (such as our voice, an instrument, or a passing car) sets an elastic medium (such as air or water) in motion
This causes a wave to propagate in which the direction of oscillation is parallel to the direction of propagation (longitudinal wave)
Analogy: a stone falling into water
The propagation speed of the sound wave (speed of sound) in air at 20 °C is 343.2 m/s, or about 1236 km/h
Effects such as absorption, reflection, scattering, diffraction, etc. can occur

The Human Ear

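[Figure: anatomy of the human ear (auricle, ear canal, eardrum, hammer, anvil, stirrup, semicircular canals, cochlea, hearing nerve), with the positions along the cochlea that respond to 16 kHz, 6 kHz and 0.5 kHz]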

Image source: based on Chittka L, Brockmann A (2005) Perception Space—The Final Frontier. PLoS Biol 3(4): e137. (CC-BY)

The Human Ear

The sound wave hits the eardrum, is mechanically amplified via the hammer, anvil and stirrup, and is transmitted to the cochlea
Depending on the frequency of the sound wave, a resonance forms in a certain region of the liquid-filled chambers of the cochlea (high frequencies at the beginning and low frequencies at the end of the canal)
The hair cells on the basilar membrane along the cochlear canal are mechanically stimulated and transmit the information via the auditory nerve to the auditory cortex in the brain
The ear therefore performs a kind of frequency transformation: different frequencies stimulate different places along the cochlea
The audible frequency range is roughly 20 Hz to 20 kHz (the upper limit decreases with age)
The hair cells can be irreparably damaged by excessive sound exposure (depending on amplitude and duration of exposure)

Sound Pressure Level

The sound pressure level (SPL) is commonly used to measure the volume of certain sounds
The sound pressure level $L_p$ is calculated from the logarithmic ratio of the RMS value of the sound pressure $\bar{p}_{\mathrm{rms}}$ and the reference value $\bar{p}_0 = 20\,\mu\mathrm{Pa}$, which corresponds to the hearing threshold of the human ear at a frequency of 1 kHz:
$L_p = 20 \log_{10}\left(\frac{\bar{p}_{\mathrm{rms}}}{\bar{p}_0}\right) \mathrm{dB}$
The sound pressure level uses the pseudo unit decibel (dB)
Damage to the ear occurs from approx. 120 dB for short-term exposure and from approx. 85 dB for long-term exposure

Sound	Distance	Sound pressure level
Rifle	1 m	170 dB
Trumpet	0.5 m	130 dB
Nightclub	1 m	100 dB
TV	1 m	60 dB
Conversation	1 m	40-60 dB
Leaf rustling	at the ear	10 dB

Source: Wikipedia
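The SPL formula can be sketched in Python as follows. This is a minimal illustration assuming a sampled pressure signal given in pascal and the standard reference value $\bar{p}_0 = 20\,\mu\mathrm{Pa}$; the helper names `rms` and `sound_pressure_level` and the test signal are hypothetical.

```python
import math

P0 = 20e-6  # reference sound pressure (hearing threshold at 1 kHz) in pascal

def rms(samples):
    """Root mean square of a list of sound pressure samples (in pascal)."""
    return math.sqrt(sum(p * p for p in samples) / len(samples))

def sound_pressure_level(p_rms, p0=P0):
    """Sound pressure level L_p in dB for an RMS sound pressure p_rms."""
    return 20.0 * math.log10(p_rms / p0)

# Hypothetical test signal: one full period of a sine with peak sqrt(2) Pa,
# i.e. an RMS value of 1 Pa, sampled at 100 points
samples = [math.sqrt(2.0) * math.sin(2.0 * math.pi * k / 100) for k in range(100)]
level = sound_pressure_level(rms(samples))
print(f"{level:.1f} dB")
```

An RMS pressure of 1 Pa corresponds to roughly 94 dB SPL; doubling the pressure adds about 6 dB.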

Reference values for decibel specifications

In audio systems, the pseudo-unit decibel is used for various quantities, and its meaning always depends on the reference value. Some examples:
- The signal-to-noise ratio is the ratio between signal power and noise power. Since power is proportional to the square of the amplitude, the factor 20 (instead of 10) is used when the ratio is computed from RMS amplitudes:
  $\mathrm{SNR} = 20 \log_{10}\left(\frac{\bar{p}_{\mathrm{signal}}}{\bar{p}_{\mathrm{noise}}}\right) \mathrm{dB}$
- dBFS (decibels relative to full scale): ratio of the current amplitude $A$ to the maximum amplitude. A digital 16-bit audio signal can take values from -32768 to +32767:
  $20 \log_{10}\left(\frac{|A|}{32768}\right) \mathrm{dBFS}$
  0 dBFS is the maximum amplitude, approx. -6 dBFS half the maximum amplitude, approx. -12 dBFS a quarter of the maximum amplitude, and so on

  For signal levels above 0 dBFS, digital audio signals are clipped (limited to the maximum value). Therefore, 0 dBFS should not be exceeded.
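Both decibel definitions can be checked numerically with a short sketch. The function names are hypothetical; the 16-bit full-scale value 32768 is taken from the dBFS formula, and `snr_db_from_power` only illustrates that the amplitude form (factor 20) and the power form (factor 10) agree.

```python
import math

FULL_SCALE_16BIT = 32768  # magnitude of the most negative 16-bit value

def dbfs(amplitude, full_scale=FULL_SCALE_16BIT):
    """Level of a sample amplitude relative to digital full scale."""
    return 20.0 * math.log10(abs(amplitude) / full_scale)

def snr_db(signal_rms, noise_rms):
    """SNR from RMS amplitudes (factor 20, because power ~ amplitude squared)."""
    return 20.0 * math.log10(signal_rms / noise_rms)

def snr_db_from_power(signal_power, noise_power):
    """Same SNR computed directly from powers (factor 10)."""
    return 10.0 * math.log10(signal_power / noise_power)

for a in (32768, 16384, 8192):
    print(a, f"{dbfs(a):.2f} dBFS")

# An amplitude ratio of 1000:1 is a power ratio of 1,000,000:1, i.e. 60 dB either way
print(snr_db(1.0, 0.001), snr_db_from_power(1.0, 1e-6))
```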

Microphones

A microphone converts sound pressure waves into an analog electrical voltage
Typically, the movement of a thin membrane is measured, which is driven by the sound pressure wave
Important characteristics of a microphone are its frequency response and polar pattern

[Figures: frequency response and polar pattern of a microphone]

Dynamic Microphones

[Figure: moving coil microphone]

With dynamic microphones, it is not the displacement of the diaphragm that is measured, but its rate of change of position (velocity)
A widely used, robust and cost-effective design is the moving coil microphone (see image)
The movement of the coil in the magnetic field of a permanent magnet induces a voltage
An external power supply is not required
Since the diaphragm is attached to the coil, the sound pressure waves must move both; because of this larger moving mass, moving coil microphones do not reproduce high frequencies as well
Particularly suitable for loud sound sources (such as live singing or wind instruments)

Condenser Microphone

In a condenser microphone, a thin conductive diaphragm forms one plate of a parallel-plate capacitor
Changing the position of the diaphragm changes the capacitance of the capacitor
By connecting a voltage source via a high-impedance resistor, the capacitance change can be converted into a voltage change
However, the output voltage cannot be used directly: an impedance converter (amplifier circuit) is required inside the microphone
Since condenser microphones require a power source, they must be powered either by batteries or by 48-volt phantom power from the mixer/audio interface
Due to the low mass of the diaphragm, condenser microphones are very sensitive (well suited for high-frequency signals and distant sound sources)
Because of their good sound quality, they are particularly popular for studio recordings (since external noise can be avoided in a studio)
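Why a capacitance change becomes a voltage change can be sketched under the usual simplifying assumption that the high-impedance resistor keeps the charge $Q$ on the capacitor practically constant during one audio period. With $C = \varepsilon_0 A / d$ and $U = Q / C$, the voltage is then proportional to the plate distance $d$. The capsule dimensions, the bias voltage and the function names below are hypothetical example values.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m (air is very close to this)

def capacitance(area_m2, gap_m):
    """Parallel-plate capacitance C = eps0 * A / d."""
    return EPSILON_0 * area_m2 / gap_m

def capsule_voltage(gap_m, rest_gap_m, area_m2, bias_v):
    """Voltage across the capsule if the charge set at rest stays constant."""
    q = capacitance(area_m2, rest_gap_m) * bias_v  # charge Q = C0 * U0
    return q / capacitance(area_m2, gap_m)        # U = Q / C, proportional to d

# Hypothetical capsule: 1 cm^2 diaphragm, 25 um gap, 40 V bias
area, gap0, bias = 1e-4, 25e-6, 40.0
for displacement in (-0.25e-6, 0.0, 0.25e-6):
    u = capsule_voltage(gap0 + displacement, gap0, area, bias)
    print(f"{displacement * 1e6:+.2f} um -> {u:.2f} V")
```

A 1 % change of the gap produces a 1 % change of the capsule voltage, which is the audio signal that the impedance converter then buffers.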

Unbalanced and balanced signal transmission

When wiring instruments and microphones to the mixing console or audio interface, long cables are often used. Long cables are sensitive to noise and interference.
For unbalanced transmission of a mono audio signal, only two lines (ground and signal) are required
Since the grounds of all devices are connected, ground is the common reference potential. If interference is picked up on a cable, it may become audible.
For balanced transmission of a mono audio signal, three lines are used (hot, cold and ground)
At the audio source, the inverted hot signal is used as the cold signal
At the receiver, the difference between hot and cold is formed, which eliminates additive interference that affects both lines equally

[Figure: unbalanced transmission (signal, ground) vs. balanced transmission (hot, cold, ground) with a differential amplifier at the receiver]
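A minimal numerical sketch of balanced transmission, under the idealized assumption that both lines pick up exactly the same interference. The function names and sample values are hypothetical, and halving the difference at the receiver is only a scaling choice of this sketch.

```python
def send_balanced(signal):
    """Source side: hot carries the signal, cold the inverted signal."""
    return list(signal), [-s for s in signal]

def add_interference(line, interference):
    """The same interference is coupled into a line along the cable."""
    return [v + n for v, n in zip(line, interference)]

def receive_balanced(hot, cold):
    """Differential amplifier: hot - cold, halved to restore the original scale."""
    return [(h - c) / 2.0 for h, c in zip(hot, cold)]

signal = [0.1, -0.2, 0.3, 0.0]
interference = [0.05, 0.05, -0.02, 0.04]  # hypothetical hum/noise samples

hot, cold = send_balanced(signal)
received = receive_balanced(add_interference(hot, interference),
                            add_interference(cold, interference))
unbalanced = add_interference(signal, interference)

print("balanced:  ", [round(v, 6) for v in received])
print("unbalanced:", [round(v, 6) for v in unbalanced])
```

The interference cancels in the difference, while on the unbalanced line it remains added to the signal.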

Audio Cables

[Figure: XLR connector (male and female), 1⁄4 inch jack (tip, sleeve), ⅛ inch jack (tip, ring, sleeve), RCA connector]

Audio Cables

XLR plugs and sockets are often used for balanced audio transmission
Jack plugs are available in different sizes:
- 6.3 mm (1⁄4 Inch Jack)
- 3.5 mm (⅛ Inch Jack)
- 2.5 mm (3⁄32 Inch Jack)
Jack plugs are available in mono (ground and signal) or stereo versions (ground, left and right channel)
Jack plugs and jack sockets are typically used for unbalanced audio transmission
But beware: some audio devices also use stereo jacks for balanced mono audio transmission
RCA plugs and sockets are used for unbalanced audio transmission (typically line level)

Audio Cable and Signal Level

Microphone level: Weakest signal with amplitudes of a few millivolts
- The audio signal is typically transmitted to the audio interface/mixing console using balanced XLR cables
Instrument level: between microphone and line level
- E.g. the output signal of an electric guitar
- Signal is typically transmitted unbalanced via jack cable
Line level: Amplitudes from approx. 0.5 to a maximum of 2.0 volts:
- This is the level typically expected by audio equipment
- Microphone level and instrument level must be increased to line level by a preamplifier
- dBu: Here the reference value is an effective value of 0.775 V, i.e.
  $0 \,\mathrm{dBu}\, \mapsto 0.775\, \mathrm{V} $
- dBV: Here the reference value is an effective value of 1 V, i.e.
  $0 \,\mathrm{dBV}\, \mapsto 1.0\, \mathrm{V} $
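Converting between these levels and RMS voltages follows the same $20 \log_{10}$ rule as the other decibel values. A small sketch with hypothetical helper names, using the reference voltages 0.775 V (dBu) and 1 V (dBV) given above:

```python
import math

DBU_REF_V = 0.775  # RMS reference voltage for dBu
DBV_REF_V = 1.0    # RMS reference voltage for dBV

def level_to_volts(level_db, ref_v):
    """RMS voltage for a level in dBu (ref 0.775 V) or dBV (ref 1 V)."""
    return ref_v * 10.0 ** (level_db / 20.0)

def volts_to_level(v_rms, ref_v):
    """Level in dB relative to ref_v for an RMS voltage."""
    return 20.0 * math.log10(v_rms / ref_v)

print(level_to_volts(0.0, DBU_REF_V))               # 0 dBu
print(level_to_volts(0.0, DBV_REF_V))               # 0 dBV
print(f"{volts_to_level(1.0, DBU_REF_V):.2f} dBu")  # 1 V RMS expressed in dBu
```

Since the two references differ by a factor of 1/0.775, the same voltage reads about 2.21 dB higher in dBu than in dBV.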

Analog-to-Digital Conversion within an Audio Interface

[Figure: block diagram of the A/D conversion: analog signal → preamplifier → anti-aliasing filter → sample & hold → quantizer → digital signal; plot of the signal and the digitized signal over time]

Preamplifier

The task of the preamplifier is to adjust the amplitude of the input signal so that the available amplitude range of the digital signal is utilized as fully as possible
Setting the preamplifier gain ("leveling") is usually done manually in the professional sector, as automatic approaches cannot know in advance what signal levels to expect
Level too high: clipping occurs above 0 dBFS
Level too low: quantization noise becomes noticeable
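The two failure cases can be illustrated with a simple 16-bit quantizer that clips at full scale. This is a sketch assuming the analog value is normalized so that ±1.0 corresponds to full scale; `gain_for_peak` and its -6 dBFS target are a hypothetical leveling rule, not a prescribed procedure.

```python
def quantize_16bit(x):
    """Map an analog value (full scale = +/-1.0) to a signed 16-bit code.

    Values beyond full scale are clipped to the 16-bit range."""
    code = round(x * 32768)
    return max(-32768, min(32767, code))

def gain_for_peak(peak, target_dbfs=-6.0):
    """Hypothetical leveling helper: gain that brings `peak` to target_dbfs."""
    return 10.0 ** (target_dbfs / 20.0) / peak

peak = 0.1  # hypothetical measured peak of the input before amplification
for gain in (50.0, 1e-4, gain_for_peak(peak)):
    print(f"gain {gain:g}: peak code {quantize_16bit(peak * gain)}")
```

With a gain of 50 the peak is clipped to 32767; with a gain of 1e-4 it rounds to 0, so the signal disappears in the quantization; the leveled gain places the peak at about half of full scale.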

Anti-Aliasing-Filter

To comply with the sampling theorem, an analog low-pass filter is used before sampling
The low-pass filter must be chosen so that its cut-off frequency is at most half the sampling frequency $F_a$; otherwise, frequency components above $F_a/2$ are folded back to lower frequencies (aliasing)
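Why the filter is needed can be illustrated by computing where a sinusoidal component ends up after sampling without an anti-aliasing filter. `alias_frequency` is a hypothetical helper, and the 48 kHz sampling frequency is an example value.

```python
def alias_frequency(f_hz, f_a_hz):
    """Apparent frequency of a sinusoid of frequency f_hz after sampling at f_a_hz.

    The sampled spectrum is periodic in f_a, and frequencies above f_a / 2
    are mirrored back into the range 0 ... f_a / 2."""
    f = f_hz % f_a_hz
    return min(f, f_a_hz - f)

F_A = 48000  # hypothetical sampling frequency in Hz
for f in (1000, 20000, 30000, 47000):
    print(f, "Hz ->", alias_frequency(f, F_A), "Hz")
```

An unfiltered 30 kHz component would appear as an audible 18 kHz tone, and a 47 kHz component even as a 1 kHz tone.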

Sample & Hold

The sample-and-hold circuit (see below) keeps the analog input value constant during the subsequent quantization
The electronic switch (transistor) is closed for a short time. The capacitor charges and adopts the current voltage value of the input signal (sample). The switch is then opened again.
When the switch is open, it is the capacitor's job to keep the voltage value constant (hold)
This process is repeated with the sampling frequency $F_a$

[Figure: sample-and-hold circuit: input signal → operational amplifier (impedance converter) → switch (transistor, clocked with $F_a$) → capacitor → operational amplifier → output signal]
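The behavior of an ideal sample-and-hold stage can be modeled in a few lines: at any time $t$, the output equals the input value captured at the most recent sampling instant $n / F_a$. This sketch ignores the finite charging time and leakage of the real capacitor; the input signal, the sampling frequency and the function names are hypothetical.

```python
import math

def sample_and_hold(signal, f_a, t):
    """Idealized S&H output at time t: the input voltage captured at the last
    sampling instant n / f_a (the switch closes only briefly at those instants)."""
    n = math.floor(t * f_a)
    return signal(n / f_a)

def input_signal(t):
    # hypothetical 1 kHz input with an amplitude of 1 V
    return math.sin(2.0 * math.pi * 1000.0 * t)

F_A = 8000.0  # hypothetical sampling frequency in Hz

# Observe the output at four times within the first sampling period:
# the held value stays at the value captured at t = 0
print([sample_and_hold(input_signal, F_A, k / (4 * F_A)) for k in range(4)])

# Within the second sampling period, the value captured at t = 1 / F_A is held
print(sample_and_hold(input_signal, F_A, 1.5 / F_A))
```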

Quantizer

Quantizers with at least 16-bit resolution are typically used for audio processing
Nowadays, A/D conversion in audio signal processing is almost exclusively based on delta-sigma modulation, because a conventional parallel converter with this resolution would be very costly
The idea is to operate not many analog comparator circuits in parallel (a parallel converter with 16 bits would require 2¹⁶ - 1 = 65535 comparators) but only a single comparator running at a multiple of the sampling frequency
The difference between the input signal and the fed-back output is accumulated (integrated) in each time step until the accumulated value crosses the decision threshold of the comparator again
A subsequent digital counter can then determine how often the decision threshold was exceeded or not reached, which yields the quantized digital value
Example: With a sampling rate of 48000 Hz and a 16-bit resolution, the delta-sigma modulator must be operated at 48000 Hz · 2¹⁶ ≈ 3.15 GHz
A first-order delta-sigma modulator is presented in the following. In practice, higher-order delta-sigma modulators (i.e. with several integration stages) are used, which achieve a better signal-to-noise ratio. However, the principle remains the same.

Quantizer: Delta-Sigma-Modulation

The analog comparator is a transistor circuit that implements the following function:
$V_{\mathrm{out}}= \begin{cases} 1 & \,\,:\,\, V_{\mathrm{in}} \ge 0.0 \\ 0 & \,\,:\,\,V_{\mathrm{in}} < 0.0 \\ \end{cases}$
where $1$ represents the value for the digital one.
The D-flip-flop (known from my Technical Computer Science lecture) realizes a delay of one clock cycle
The 1-bit D/A converter outputs:
$V_{\mathrm{out}}= \begin{cases} V_{\mathrm{max}} & \,\,:\,\, V_{\mathrm{in}} \ge 0.5\\ -V_{\mathrm{max}} & \,\,:\,\,V_{\mathrm{in}} < 0.5\\ \end{cases}$
where $0.5$ is the threshold value between the digital one and the digital zero.
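The comparator and the 1-bit D/A converter defined above can be combined into a small simulation. This sketch assumes the usual first-order topology: the 1-bit D/A output is subtracted from the input, the difference is integrated, the comparator decides, and the D flip-flop delays the decision by one clock before it is fed back. The input is assumed to be held constant by the sample-and-hold stage, and all function names are hypothetical.

```python
def comparator(v_in):
    """Analog comparator: digital 1 if v_in >= 0.0, else 0."""
    return 1 if v_in >= 0.0 else 0

def dac_1bit(d, v_max):
    """1-bit D/A converter: +v_max for a digital one (>= 0.5), else -v_max."""
    return v_max if d >= 0.5 else -v_max

def delta_sigma_first_order(x, n_clocks, v_max=1.0):
    """First-order delta-sigma modulation of a constant (held) input x.

    Returns the 1-bit output stream, one bit per clock cycle."""
    integrator = 0.0
    flip_flop = 0  # D flip-flop: comparator output delayed by one clock
    bits = []
    for _ in range(n_clocks):
        integrator += x - dac_1bit(flip_flop, v_max)  # subtract feedback, integrate
        bit = comparator(integrator)
        bits.append(bit)
        flip_flop = bit  # becomes the feedback in the next clock cycle
    return bits

def count_to_value(bits, v_max=1.0):
    """Digital counter: the fraction of ones determines the quantized value."""
    ones = sum(bits)
    return v_max * (2.0 * ones / len(bits) - 1.0)

def required_clock_hz(sample_rate_hz, bits):
    """Clock rate if each sample is determined by counting 2**bits clock cycles."""
    return sample_rate_hz * 2 ** bits

for x in (0.3, -0.5, 0.0):
    bits = delta_sigma_first_order(x, 4096)
    print(x, bits[:12], f"-> {count_to_value(bits):.4f}")

print(required_clock_hz(48000, 16) / 1e9, "GHz")
```

Counting over 4096 clock cycles corresponds to a 12-bit result; the 16-bit example above needs 2¹⁶ cycles per sample, hence the GHz clock rate.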

Quantizer: Delta-Sigma-Modulation Example

Linear Pulse Code Modulation (LPCM)

The output of the quantizer is a so-called pulse-code-modulated (PCM) signal, i.e. a data stream of digital ones and zeros
With a 16-bit quantizer, 2¹⁶ = 65536 different 16-bit binary code words are available, which can be assigned to the individual quantization levels
Signed numbers in two's complement are often used (see my Technical Computer Science lecture)
With 16-bit two's complement, this gives a value range from -32768 to +32767

Linear Pulse Code Modulation (LPCM)

Example for 3-bit LPCM:

[Figure: 3-bit LPCM: signal with quantization levels, code words (two's complement), and the resulting serial PCM signal]
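The mapping between quantization levels and two's complement code words can be sketched as follows. The helper names and sample values are hypothetical, and for the serial PCM signal it is assumed that the code words of consecutive samples are sent one after another, most significant bit first.

```python
def code_word(value, bits):
    """Two's complement code word of an integer quantization level."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} outside [{lo}, {hi}] for {bits} bits")
    return format(value & (2 ** bits - 1), f"0{bits}b")

def decode(word):
    """Integer value of a two's complement code word given as a bit string."""
    value = int(word, 2)
    return value - 2 ** len(word) if word[0] == "1" else value

# All 8 levels of a 3-bit LPCM quantizer
for level in range(-4, 4):
    print(f"{level:+d} -> {code_word(level, 3)}")

# Serial PCM signal: code words of consecutive samples, MSB first
levels = [0, 2, 3, 1, -2, -4]  # hypothetical quantized samples
serial = "".join(code_word(v, 3) for v in levels)
print(serial)
```

The same function covers the 16-bit case, where -32768 maps to `1000000000000000` and +32767 to `0111111111111111`.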

Linear Pulse Code Modulation (LPCM)

Audio CD
- A signed 16-bit LPCM signal with a sampling rate of 44100 Hz is used for an audio CD ("Compact Disc Digital Audio")
- In alternating sequence, 16 bits are written for the left channel and then 16 bits for the right channel
- The effective data rate is therefore 2 · 16 bits · 44100 1/s = 1.41 MBit/s
DVD-Audio
- Also uses a signed LPCM signal
- Various quantizations are possible: 16 / 20 / 24 bits per sample
- Various sampling rates: 44.1 / 48 / 88.2 / 96 / 176.4 / 192 kHz
- 1 channel to 6 channels (maximum data rate: 9.6 MBit/s)
WAV Audio Files
- The widely used WAV format ("Waveform Audio File Format") is a container that can contain various audio formats
- In practice, WAV files usually contain uncompressed LPCM audio data
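The CD data rate and the interleaved LPCM layout inside a WAV file can be checked with Python's standard `wave` and `struct` modules. The helper names and the 440 Hz test tone at about -6 dBFS are hypothetical examples, and the WAV data is written to an in-memory buffer instead of a file.

```python
import io
import math
import struct
import wave

def lpcm_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed LPCM in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

def cd_quality_wav(left, right, sample_rate_hz=44100):
    """Signed 16-bit stereo LPCM, interleaved L, R, L, R, ... in a WAV container."""
    frames = b"".join(struct.pack("<hh", l, r) for l, r in zip(left, right))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(frames)
    return buffer.getvalue()

print(lpcm_data_rate(44100, 16, 2) / 1e6, "Mbit/s")

# 0.1 s test tone: 440 Hz on the left channel, silence on the right
n = 4410
left = [round(16384 * math.sin(2 * math.pi * 440 * k / 44100)) for k in range(n)]
right = [0] * n
data = cd_quality_wav(left, right)

with wave.open(io.BytesIO(data), "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate(), wav.getnframes())
```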

Non-Linear Pulse Code Modulation

PCM also works with non-linear quantization intervals
For this purpose, a compressor is inserted before the linear quantizer
The aim is to adapt to human hearing, which perceives loudness approximately logarithmically and is therefore more sensitive to changes in quiet sounds than in loud ones
The amplitude range is divided into segments, each of which is then quantized linearly
Example: Digital telephone transmission (ISDN)
- 1 bit sign, 3 bit segment, 4 bit quantization value
- Sampling rate: 8 kHz
- Data rate: 8 Bit · 8 kHz = 64 kBit/s
A corresponding expander must then be applied on the receiver side after the D/A conversion (as the counterpart to the compressor)

[Figure: telephone channel: transmitter with anti-aliasing filter (low pass), compressor and A/D converter; receiver with D/A converter, expander and reconstruction filter (low pass)]

Non-Linear Pulse Code Modulation: Compressor

[Figure: compressor characteristic with 8-bit code words: 1 bit for sign, 3 bits for segment code, 4 bits per segment]
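European ISDN uses the A-law characteristic of ITU-T G.711 (A = 87.6), and its segment structure (sign, segment, position within the segment) approximates this logarithmic curve piecewise linearly. The following sketch swaps in the continuous A-law curve instead of the exact segment table, followed by a uniform 8-bit quantizer; it is therefore not bit-exact G.711, and all helper names are hypothetical. It compares the reconstruction error with plain 8-bit linear quantization.

```python
import math

A = 87.6                  # A-law parameter (ITU-T G.711)
NORM = 1.0 + math.log(A)  # 1 + ln(A)

def compress(x):
    """Continuous A-law compressor for x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / NORM                    # linear part near zero
    else:
        y = (1.0 + math.log(A * ax)) / NORM  # logarithmic part
    return math.copysign(y, x)

def expand(y):
    """Expander: exact inverse of compress() for y in [-1, 1]."""
    ay = abs(y)
    if ay < 1.0 / NORM:
        x = ay * NORM / A
    else:
        x = math.exp(ay * NORM - 1.0) / A
    return math.copysign(x, y)

def quantize(v, bits=8):
    """Uniform quantizer: signed integer code for v in [-1, 1]."""
    levels = 2 ** (bits - 1)
    return max(-levels, min(levels - 1, round(v * levels)))

def dequantize(code, bits=8):
    return code / 2 ** (bits - 1)

def linear_8bit(x):
    return dequantize(quantize(x))

def a_law_8bit(x):
    return expand(dequantize(quantize(compress(x))))

for x in (0.9, 0.1, 0.001):
    print(f"x={x}: linear error {abs(linear_8bit(x) - x):.6f}, "
          f"A-law error {abs(a_law_8bit(x) - x):.6f}")
```

For loud values the compander is slightly less accurate than linear quantization, but a quiet value such as 0.001 is lost completely by the linear 8-bit quantizer and preserved with a small relative error by the compander.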

Digital-to-Analog Conversion

[Figure: digital representation, analog step function, and analog signal after low-pass filtering]

As known from the chapter “Sampling Theorem”, for a digital-to-analog conversion we would have to convert the digital signal into analog Dirac pulses of the corresponding height and then apply a reconstruction filter (low-pass)
Of course, this cannot be realized ideally in practice
Instead, we could use a step function with a corresponding number of voltage levels, which is then smoothed with a reconstruction filter
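The step-function approach can be sketched as follows: each digital value is held for several fine time steps (an analog step function), and the steps are then smoothed by a low-pass filter. A simple first-order (RC-like) low-pass is used here as a stand-in for a real reconstruction filter; the sample values, the number of sub-steps, the smoothing constant `alpha` and the function names are hypothetical.

```python
def step_function(samples, substeps):
    """Analog step function: hold each digital value for `substeps` fine time steps."""
    return [float(s) for s in samples for _ in range(substeps)]

def rc_lowpass(signal, alpha):
    """First-order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]), 0 < alpha <= 1."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

digital = [0, 3, 2, -1, -3, 0]  # hypothetical decoded sample values
steps = step_function(digital, 8)
smooth = rc_lowpass(steps, alpha=0.3)

largest_jump_steps = max(abs(b - a) for a, b in zip(steps, steps[1:]))
largest_jump_smooth = max(abs(b - a) for a, b in zip(smooth, smooth[1:]))
print(largest_jump_steps, round(largest_jump_smooth, 3))
```

The filtered signal no longer contains the abrupt jumps of the step function; a real reconstruction filter has a much steeper cut-off than this first-order stand-in.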

Digital-to-Analog Conversion

Such an analog step function could be created with a resistor ladder, but for higher bit depths the resistors must be extremely accurate
Therefore, in practice, delta-sigma modulators are also used for D/A conversion. They can be operated in the GHz range and generate a very high-frequency two-level (high/low) analog signal, which yields the desired analog audio signal after filtering with an analog low-pass filter (e.g., cut-off frequency 20 kHz)

[Figure: output of the delta-sigma modulation → low pass → analog signal]
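The accuracy problem of the resistor ladder can be quantified with a short calculation. For an ideal ladder (for example an R-2R network), the output is $V_{\mathrm{out}} = V_{\mathrm{ref}} \cdot D / 2^N$ for the unsigned code $D$. The most significant bit alone contributes $2^{N-1}$ LSB steps, so as a rough rule of thumb its weight must be accurate to about one LSB, i.e. to a relative error of about $2^{-(N-1)}$, to keep the step function monotonic. The 2 V reference voltage and the helper names are hypothetical.

```python
def ladder_output(code, n_bits, v_ref):
    """Ideal ladder D/A output for an unsigned code in [0, 2**n_bits - 1]."""
    return v_ref * code / 2 ** n_bits

def msb_relative_tolerance(n_bits):
    """Rough tolerance: the MSB weight (2**(n_bits-1) LSB) may deviate by about 1 LSB."""
    return 1.0 / 2 ** (n_bits - 1)

V_REF = 2.0  # hypothetical reference voltage in volts
for n in (8, 16, 24):
    lsb = ladder_output(1, n, V_REF)
    print(f"{n:2d} bit: LSB step {lsb * 1e6:10.4f} uV, "
          f"MSB tolerance {msb_relative_tolerance(n) * 100:.7f} %")
```

At 16 bits, the LSB step is about 30.5 µV and the MSB resistor would have to be accurate to about 0.003 %, which is why delta-sigma modulators are preferred.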

Are there any questions?

Please notify me by e-mail if you have questions, suggestions for improvement, or found typos: Contact
