Adaptive Filtering

Filters that learn: from system identification to noise cancellation

Nature’s adaptive filter

The electric fish Eigenmannia generates a weak electric field and senses distortions caused by nearby objects. When a neighbour’s signal interferes, the fish shifts its own frequency to avoid jamming: a biological frequency-tracking system that adjusts in real time based on detected interference (Heiligenberg 1991). The LMS and NLMS algorithms in this topic implement the same strategy digitally: adjust filter coefficients to minimise an error signal, converging on the optimal solution without knowing the statistics in advance.

Conventional filters have fixed coefficients chosen at design time. But what if the system you need to filter is unknown, or changes over time? Adaptive filters solve this by adjusting their coefficients automatically, driven by an error signal that measures how well the filter is performing.

Adaptive filtering underpins technologies we use daily: noise-cancelling headphones, echo cancellation in phone calls, channel equalisation in wireless communications, and active vibration control. The underlying algorithms are elegant and surprisingly simple.

Prerequisites

This topic assumes familiarity with FIR filters, autocorrelation, and basic z-domain concepts.

The setup

An adaptive filter has four signals:

$x[n]$: the input signal
$d[n]$: the desired (reference) signal
$y[n] = \mathbf{w}^T \mathbf{x}[n]$: the filter output, where $\mathbf{w}$ is the coefficient vector of an $N$-tap filter and $\mathbf{x}[n] = [x[n], x[n-1], \ldots, x[n-N+1]]^T$ is the tap-input vector
$e[n] = d[n] - y[n]$: the error signal

The algorithm adjusts $\mathbf{w}$ to minimise some function of the error. What makes adaptive filtering powerful is that the same algorithm structure serves completely different applications. Only the signal routing changes.

The Wiener solution

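Before diving into algorithms, it helps to know what the optimal solution looks like. For jointly wide-sense stationary input and desired signals, the FIR filter that minimises the mean squared error $E\{|e[n]|^2\}$ satisfies the Wiener-Hopf equations $\mathbf{R}_{xx}\mathbf{w}_\text{opt} = \mathbf{p}_{xd}$, whose solution is: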

\[\mathbf{w}_\text{opt} = \mathbf{R}_{xx}^{-1} \mathbf{p}_{xd}\]

where $\mathbf{R}_{xx} = E\{\mathbf{x}[n]\mathbf{x}[n]^T\}$ is the input autocorrelation matrix and $\mathbf{p}_{xd} = E\{\mathbf{x}[n] d[n]\}$ is the cross-correlation vector between input and desired signal.

In practice, we rarely know $\mathbf{R}_{xx}$ and $\mathbf{p}_{xd}$ in advance, and even if we did, the matrix inverse is expensive to compute and must be recomputed whenever statistics change. Adaptive algorithms approximate this solution iteratively, one sample at a time.
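To make the Wiener solution concrete, the sketch below replaces the expectations with time averages over a recorded block (assuming the signals are ergodic) and solves the Wiener-Hopf equations directly. This is the batch computation the adaptive algorithms avoid. The helper names solve() and wiener_fir(), and the 3-tap test system, are hypothetical illustrations, not part of adaptive.py.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol


def wiener_fir(x, d, n_taps):
    """Estimate R_xx and p_xd by time averages, then solve R_xx w = p_xd."""
    R = [[0.0] * n_taps for _ in range(n_taps)]
    p = [0.0] * n_taps
    count = 0
    for n in range(n_taps - 1, len(x)):
        xv = [x[n - k] for k in range(n_taps)]  # tap vector [x[n], ..., x[n-N+1]]
        for i in range(n_taps):
            p[i] += xv[i] * d[n]
            for j in range(n_taps):
                R[i][j] += xv[i] * xv[j]
        count += 1
    R = [[v / count for v in row] for row in R]
    p = [v / count for v in p]
    return solve(R, p)


# Illustrative test: white-noise input through a hypothetical 3-tap "unknown" FIR.
random.seed(0)
h_true = [0.5, -0.3, 0.2]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h_true[k] * x[n - k] for k in range(3) if n >= k) for n in range(len(x))]
w_opt = wiener_fir(x, d, n_taps=3)
```

Because d[n] is an exact, noise-free FIR output here, the estimate matches h_true to rounding error; with noise in d[n] it would carry a small estimation error.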

LMS: the least mean squares algorithm

The LMS algorithm (Widrow and Hoff 1960) replaces the true gradient of the MSE cost surface with an instantaneous estimate. At each time step:

\[\mathbf{w}[n+1] = \mathbf{w}[n] + \mu \, e[n] \, \mathbf{x}[n]\]

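where $\mu$ is the step size (learning rate). The update rule is remarkably simple: multiply the error by the input vector and take a small step. Because the gradient of the instantaneous squared error is $\nabla_{\mathbf{w}}\, e[n]^2 = -2\,e[n]\,\mathbf{x}[n]$, each update is a steepest-descent step on that single-sample estimate, with the factor of 2 absorbed into $\mu$.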
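A minimal sample-by-sample sketch of the LMS update follows, applied to identifying a hypothetical 3-tap FIR system from white noise. The function name lms() and the test values are illustrative assumptions; the interface of adaptive.py may differ.

```python
import random


def lms(x, d, n_taps, mu):
    """Run the LMS update sample by sample; return the final weights and error history."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps  # tap-input vector [x[n], x[n-1], ..., x[n-N+1]]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                             # shift in the newest sample
        y = sum(wi * xi for wi, xi in zip(w, buf))        # y[n] = w^T x[n]
        e = dn - y                                        # e[n] = d[n] - y[n]
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]  # w[n+1] = w[n] + mu e[n] x[n]
        errors.append(e)
    return w, errors


random.seed(1)
h_true = [0.5, -0.3, 0.2]  # hypothetical unknown system
x = [random.gauss(0.0, 1.0) for _ in range(20000)]
d = [sum(h_true[k] * x[n - k] for k in range(3) if n >= k) for n in range(len(x))]
w, errors = lms(x, d, n_taps=3, mu=0.01)
```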

Convergence. LMS converges in the mean if $0 < \mu < 2 / \lambda_\text{max}$, where $\lambda_\text{max}$ is the largest eigenvalue of $\mathbf{R}_{xx}$. A safe practical bound is:

\[0 < \mu < \frac{2}{N \cdot P_x}\]

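where $N$ is the filter length and $P_x$ is the input signal power. The bound is conservative because $\operatorname{tr}(\mathbf{R}_{xx}) = N P_x$ is the sum of all the (non-negative) eigenvalues and so can never be smaller than $\lambda_\text{max}$. Smaller $\mu$ gives lower steady-state error (misadjustment) but slower adaptation.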
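Because the practical bound depends only on the filter length and the input power, it can be estimated from a block of input samples. A minimal sketch with a hypothetical function name; the result is an upper limit, not a recommended operating point:

```python
def lms_step_bound(x, n_taps):
    """Conservative LMS step-size limit 2 / (N * P_x), with P_x estimated as mean(x^2)."""
    p_x = sum(v * v for v in x) / len(x)
    return 2.0 / (n_taps * p_x)


x = [1.0, -1.0] * 500                                   # unit-power test signal
print(lms_step_bound(x, n_taps=10))                     # 0.2
print(lms_step_bound([3.0 * v for v in x], n_taps=10))  # 9x the power, so a 9x smaller bound
```

Scaling the input by 3 multiplies its power by 9 and shrinks the bound by the same factor: a step size tuned for one input level can become unstable at another.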

Eigenvalue spread. When the eigenvalues of $\mathbf{R}_{xx}$ are spread widely apart (large condition number), LMS converges slowly: the step size must stay below a limit set by the largest eigenvalue, so the modes along small-eigenvalue directions adapt sluggishly. This is a fundamental limitation of LMS.
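The time scales can be made explicit with the standard eigen-decomposition argument (Haykin 2002). Under the usual independence assumption, the mean weight error $\mathbf{c}[n] = E\{\mathbf{w}[n]\} - \mathbf{w}_\text{opt}$ obeys $\mathbf{c}[n+1] = (\mathbf{I} - \mu\mathbf{R}_{xx})\,\mathbf{c}[n]$. Rotating into the eigenvector basis, $\mathbf{v}[n] = \mathbf{Q}^T\mathbf{c}[n]$ with $\mathbf{R}_{xx} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$, decouples it into independent modes:

\[v_k[n] = (1 - \mu\lambda_k)^n\, v_k[0], \qquad \tau_k \approx \frac{1}{\mu\lambda_k} \quad (\mu\lambda_k \ll 1)\]

so the slowest mode takes $\tau_\text{max}/\tau_\text{min} = \lambda_\text{max}/\lambda_\text{min}$ times longer than the fastest. As a worked example, a two-tap filter driven by an input of power $P_x$ whose adjacent samples have correlation coefficient $\rho$ has

\[\mathbf{R}_{xx} = P_x\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \qquad \lambda = P_x(1 \pm \rho), \qquad \frac{\lambda_\text{max}}{\lambda_\text{min}} = \frac{1+\rho}{1-\rho}\]

For $\rho = 0.9$ the slowest mode is 19 times slower than the fastest; for $\rho = 0.99$ it is 199 times slower.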

Note

The gradient descent here works because the MSE surface of an FIR filter is a single convex bowl. For IIR filters the coefficients enter the denominator and the surface becomes non-convex, with local minima that can trap gradient descent. The PSO for filter design topic takes on that case with a gradient-free swarm.

NLMS: normalised LMS

The NLMS algorithm removes LMS’s sensitivity to the input power level by dividing the step size by the squared norm of the tap-input vector, $\|\mathbf{x}[n]\|^2$, a running estimate of $N$ times the input power. It does not cure the slow convergence caused by a large input-correlation eigenvalue spread; that is a separate problem, and the one RLS below addresses:

\[\mathbf{w}[n+1] = \mathbf{w}[n] + \frac{\mu}{\varepsilon + \|\mathbf{x}[n]\|^2} \, e[n] \, \mathbf{x}[n]\]

where $\varepsilon$ is a small constant preventing division by zero. The effective step size adapts to the signal level, giving convergence that is largely independent of the input power. NLMS converges for $0 < \mu < 2$.
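A minimal NLMS sketch, mirroring the LMS update with the normalised step. The hypothetical test runs the same white-noise input at two power levels 40 dB apart with the same $\mu$; both runs should recover the 3-tap system, which is the level independence the normalisation buys. Names and values are illustrative.

```python
import random


def nlms(x, d, n_taps, mu=0.5, eps=1e-8):
    """NLMS: the LMS update with the step divided by eps + ||x[n]||^2."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps  # tap-input vector [x[n], ..., x[n-N+1]]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        energy = sum(v * v for v in buf)                 # ||x[n]||^2: the extra dot product
        e = dn - sum(wi * xi for wi, xi in zip(w, buf))  # e[n] = d[n] - w^T x[n]
        step = mu / (eps + energy)                       # normalised step size
        w = [wi + step * e * xi for wi, xi in zip(w, buf)]
        errors.append(e)
    return w, errors


random.seed(3)
h_true = [0.5, -0.3, 0.2]  # hypothetical unknown system
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
results = {}
for scale in (1.0, 100.0):  # same signal at two input powers, 40 dB apart
    xs = [scale * v for v in x]
    ds = [sum(h_true[k] * xs[n - k] for k in range(3) if n >= k) for n in range(len(xs))]
    results[scale], _ = nlms(xs, ds, n_taps=3, mu=0.5)
    print(scale, [round(v, 4) for v in results[scale]])
```

The norm is computed directly here for clarity; it can also be updated recursively in O(1) per sample by adding the newest squared sample and subtracting the oldest, at the cost of slow round-off drift.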

NLMS is the workhorse of practical adaptive filtering. It is only marginally more expensive than LMS (one extra dot product per sample) but significantly more robust.

RLS: recursive least squares

Where LMS uses instantaneous gradient estimates, RLS (Haykin 2002) minimises a weighted least squares cost over all past data:

\[J[n] = \sum_{i=0}^{n} \lambda^{n-i} |e[i]|^2\]

The forgetting factor $\lambda$ (typically 0.95 to 1.0) discounts older data exponentially, allowing the filter to track non-stationary environments. A filter that weighs every past sample equally cannot follow a world that keeps moving; tracking change means letting the past count for a little less with each step.
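The forgetting factor translates into an effective memory length, the total weight given to past samples relative to the current one:

\[\sum_{k=0}^{\infty} \lambda^k = \frac{1}{1-\lambda}\]

So $\lambda = 0.99$ remembers roughly 100 samples (12.5 ms at 8 kHz), $\lambda = 0.95$ only about 20, and $\lambda = 1$ gives unbounded memory with no tracking.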

The update equations involve a gain vector $\mathbf{k}[n]$ and the matrix $\mathbf{P}[n]$, the inverse of the exponentially weighted input correlation matrix:

\[\mathbf{k}[n] = \frac{\mathbf{P}[n-1]\,\mathbf{x}[n]}{\lambda + \mathbf{x}[n]^T\,\mathbf{P}[n-1]\,\mathbf{x}[n]}\]

\[\mathbf{w}[n] = \mathbf{w}[n-1] + \mathbf{k}[n]\,e[n]\]

\[\mathbf{P}[n] = \frac{1}{\lambda}\left(\mathbf{P}[n-1] - \mathbf{k}[n]\,\mathbf{x}[n]^T\,\mathbf{P}[n-1]\right)\]

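Here $e[n] = d[n] - \mathbf{w}[n-1]^T\mathbf{x}[n]$ is the a priori error, computed with the previous weights, and $\mathbf{P}$ is commonly initialised as $\mathbf{P}[0] = \delta^{-1}\mathbf{I}$ for a small constant $\delta > 0$. RLS converges much faster than LMS/NLMS, often in $N$ to $2N$ samples, and its convergence is largely insensitive to the eigenvalue spread. The cost is $O(N^2)$ computation per sample versus $O(N)$ for LMS, which matters for long filters.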
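The three update equations translate directly into code. This is a minimal sketch, not the adaptive.py implementation: the class name, the initialisation $\mathbf{P}[0] = \delta^{-1}\mathbf{I}$ with $\delta = 0.01$, and the test values are assumptions. It uses the symmetry of $\mathbf{P}$ to reuse $\mathbf{P}[n-1]\mathbf{x}[n]$ in the matrix update; in finite precision that symmetry can slowly erode, which is one reason plain RLS can become numerically unstable.

```python
import random


class RLS:
    """Exponentially weighted RLS, written out directly from the three update equations."""

    def __init__(self, n_taps, lam=0.99, delta=0.01):
        self.n = n_taps
        self.lam = lam                 # forgetting factor
        self.w = [0.0] * n_taps
        self.buf = [0.0] * n_taps      # tap-input vector x[n]
        # P[0] = I / delta: a large initial P expresses low confidence in w = 0.
        self.P = [[1.0 / delta if i == j else 0.0 for j in range(n_taps)]
                  for i in range(n_taps)]

    def update(self, xn, dn):
        self.buf = [xn] + self.buf[:-1]
        x, n, P = self.buf, self.n, self.P
        px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]  # P[n-1] x[n]
        denom = self.lam + sum(x[i] * px[i] for i in range(n))         # lambda + x^T P x
        k = [v / denom for v in px]                                     # gain vector k[n]
        e = dn - sum(wi * xi for wi, xi in zip(self.w, x))              # a priori error
        self.w = [wi + ki * e for wi, ki in zip(self.w, k)]             # w[n] = w[n-1] + k e
        # P[n] = (P[n-1] - k x^T P[n-1]) / lambda, using x^T P[n-1] = (P[n-1] x)^T
        self.P = [[(P[i][j] - k[i] * px[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return e


random.seed(4)
h_true = [0.5, -0.3, 0.2]  # hypothetical unknown system
x = [random.gauss(0.0, 1.0) for _ in range(500)]
d = [sum(h_true[k] * x[n - k] for k in range(3) if n >= k) for n in range(len(x))]
rls = RLS(n_taps=3, lam=0.99)
for xn, dn in zip(x, d):
    rls.update(xn, dn)
```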

Applications

System identification

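The most direct application: estimate the impulse response of an unknown system. Drive both the unknown system and the adaptive filter with the same input signal. Provided the input excites all frequencies (white noise is the usual choice) and the adaptive filter is at least as long as the system’s impulse response, the adaptive filter converges to that impulse response.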

Figure 1: System identification: the unknown system and the adaptive filter share the same input. The filter adapts until its output y[n] matches d[n] and the error e[n] falls to the level of any measurement noise. The dashed path is the adaptation: the error drives the coefficient updates.

This is useful for measuring room impulse responses, characterising communication channels, or calibrating sensors.
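A minimal end-to-end sketch of the scenario, in the spirit of the identify_system() function described under Implementation but not its code: a hypothetical 4-tap system, a white probe signal, measurement noise at 30 dB SNR, and NLMS. Success is scored by the normalised misalignment $10\log_{10}(\|\mathbf{w}-\mathbf{h}\|^2/\|\mathbf{h}\|^2)$, which measures how close the coefficients are rather than how small the error is.

```python
import math
import random


def nlms(x, d, n_taps, mu=0.2, eps=1e-8):
    """NLMS system identification; returns the final coefficient estimate."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        e = dn - sum(wi * xi for wi, xi in zip(w, buf))
        step = mu / (eps + sum(v * v for v in buf))
        w = [wi + step * e * xi for wi, xi in zip(w, buf)]
    return w


def misalignment_db(w, h):
    """Normalised coefficient error 10 log10(||w - h||^2 / ||h||^2)."""
    err = sum((wi - hi) ** 2 for wi, hi in zip(w, h))
    return 10.0 * math.log10(err / sum(hi * hi for hi in h))


random.seed(5)
h_unknown = [0.8, -0.4, 0.2, -0.1]  # hypothetical system to identify
snr_db = 30.0
x = [random.gauss(0.0, 1.0) for _ in range(20000)]  # white probe signal
clean = [sum(h_unknown[k] * x[n - k] for k in range(4) if n >= k) for n in range(len(x))]
p_clean = sum(v * v for v in clean) / len(clean)
noise_std = math.sqrt(p_clean / 10 ** (snr_db / 10))
d = [c + random.gauss(0.0, noise_std) for c in clean]  # measured output d[n]
w = nlms(x, d, n_taps=4)
print(f"misalignment: {misalignment_db(w, h_unknown):.1f} dB")
```

With measurement noise the error cannot vanish: it settles near the noise level, and the coefficients jitter slightly around the true response. A smaller $\mu$ reduces that jitter at the cost of slower convergence.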

Adaptive noise cancellation

Noise-cancelling headphones use this principle. A reference microphone picks up ambient noise. An adaptive filter learns the transfer function from the reference to the primary microphone, and subtracts the estimated noise.

Figure 2: Adaptive noise cancellation: a reference mic captures the noise, the adaptive filter models the acoustic path to the primary mic, and its estimate is subtracted to leave the clean signal. Dashed lines: the acoustic coupling and the error-driven adaptation.

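The error signal $e[n]$ is the cleaned signal: everything in $d[n]$ that is not correlated with the noise reference. This relies on two assumptions: the wanted signal (speech, music) is uncorrelated with the noise source, and the reference microphone picks up little of the wanted signal; any wanted signal leaking into the reference is partly cancelled along with the noise.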
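A synthetic sketch of the canceller under deliberately simple assumptions: a sinusoid stands in for the wanted signal, white noise for the ambient source, a hypothetical 3-tap FIR for the acoustic path to the primary mic, and the reference mic hears the noise source directly with no signal leakage. The printed figures compare the noise power left in the primary signal before and after cancellation.

```python
import math
import random


def nlms_canceller(ref, primary, n_taps, mu=0.02, eps=1e-8):
    """Adaptive noise canceller: filter the reference, subtract from the primary, return e[n]."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps
    out = []
    for xn, dn in zip(ref, primary):
        buf = [xn] + buf[:-1]
        e = dn - sum(wi * xi for wi, xi in zip(w, buf))  # e[n] = d[n] - y[n]: cleaned output
        step = mu / (eps + sum(v * v for v in buf))
        w = [wi + step * e * xi for wi, xi in zip(w, buf)]
        out.append(e)
    return out


random.seed(6)
n_samples = 20000
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]  # stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(n_samples)]             # ambient noise source
path = [0.6, 0.3, -0.1]  # hypothetical acoustic path from noise source to primary mic
primary = [signal[n] + sum(path[k] * noise[n - k] for k in range(3) if n >= k)
           for n in range(n_samples)]
cleaned = nlms_canceller(noise, primary, n_taps=4)

tail = range(n_samples - 5000, n_samples)  # judge only after convergence
before = sum((primary[n] - signal[n]) ** 2 for n in tail) / len(tail)
after = sum((cleaned[n] - signal[n]) ** 2 for n in tail) / len(tail)
print(f"noise power: {10 * math.log10(before):.1f} dB -> {10 * math.log10(after):.1f} dB")
```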

Acoustic echo cancellation

In hands-free telephony, the far-end speaker’s voice plays through the loudspeaker, bounces around the room, and is picked up by the microphone, creating an echo. An adaptive filter estimates the room impulse response (the echo path) and subtracts the predicted echo.

Figure 3: Acoustic echo cancellation: the far-end signal also drives an adaptive filter that estimates the room echo path; subtracting the estimate removes the echo before the signal returns to the far end. The dashed path feeds the error back to adapt the filter.

AEC is challenging because room impulse responses are long (hundreds to thousands of taps at 8 kHz), non-stationary (people move), and complicated by double-talk (both parties speaking simultaneously). These challenges motivate block frequency-domain adaptive filters (BFDAF) and double-talk detectors, topics beyond this introduction.

Audio applications: acoustic echo cancellation

Acoustic echo cancellation (AEC) is a textbook application of adaptive system identification, and one where algorithm choice matters. In a speakerphone or video conferencing system, the far-end voice plays through the loudspeaker, reflects off walls, furniture, and people, and arrives at the microphone as an attenuated, delayed, and coloured copy of the original. The adaptive filter must model the room impulse response (RIR) (the transfer function from loudspeaker to microphone) and subtract the predicted echo in real time.

Why NLMS, rather than LMS or RLS, dominates in audio AEC:

Varying input power across frequencies. Speech and music have strong spectral tilt: much more energy below 1 kHz than above 4 kHz. This gives the input autocorrelation matrix a large eigenvalue spread, which is exactly the condition that makes LMS converge slowly (the step size is limited by the dominant low-frequency components while high-frequency modes adapt sluggishly). Normalisation does not remove this spread, so NLMS is slowed by it too, but NLMS at least removes the additional dependence on the overall input level.
NLMS normalises per sample, making convergence speed largely independent of the input level. This is critical because speech alternates between loud vowels and quiet consonants, so any fixed LMS step size is either too large for the loud passages or too slow for the quiet ones. In silent gaps $\|\mathbf{x}[n]\|^2$ falls towards zero, which is where the regularisation constant $\varepsilon$ keeps the step size bounded.
RIR length. Typical room impulse responses are 50 to 300 ms, requiring 400 to 2400 taps at 8 kHz. At these filter lengths, RLS is prohibitively expensive ($O(N^2)$ per sample), so NLMS remains the practical choice. Frequency-domain variants (BFDAF, partitioned-block FDAF) reduce the cost further, to $O(N \log N)$ per block of $N$ samples, i.e. $O(\log N)$ per sample, for very long filters.
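The tap counts and the gap between NLMS and RLS follow from simple arithmetic. A rough sketch with a hypothetical helper; the multiply counts are orders of magnitude, not exact operation counts for any particular implementation:

```python
def aec_cost(rir_ms, fs_hz):
    """Taps needed to span an RIR, plus rough per-sample multiply counts."""
    n_taps = round(rir_ms * fs_hz / 1000)
    nlms_mults = 2 * n_taps       # N for the output y[n], N for the update (normalisation ignored)
    rls_mults = n_taps * n_taps   # leading N^2 term; a full RLS update is a small multiple of this
    return n_taps, nlms_mults, rls_mults


for rir_ms in (50, 300):
    n_taps, nlms_mults, rls_mults = aec_cost(rir_ms, fs_hz=8000)
    print(f"{rir_ms} ms: {n_taps} taps, NLMS ~{nlms_mults}/sample, RLS ~{rls_mults}/sample")
# 50 ms: 400 taps, NLMS ~800/sample, RLS ~160000/sample
# 300 ms: 2400 taps, NLMS ~4800/sample, RLS ~5760000/sample
```

At 8 kHz, the 2400-tap RLS figure amounts to roughly 46 billion multiplies per second, against about 38 million for NLMS.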

The multichannel extension, multichannel AEC (MCAEC), arises in stereo or surround-sound conferencing, where multiple loudspeakers create multiple echo paths. The core difficulty is that the loudspeaker signals are highly correlated (they carry the same far-end speech), making the combined autocorrelation matrix ill-conditioned. Decorrelation techniques (half-wave rectification, frequency-bin interleaving) are needed so that the adaptive filters converge to the true echo paths rather than to one of many solutions that merely fit the current loudspeaker signals. This connects directly to the system identification setup above, extended to MIMO.
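The non-uniqueness behind this ill-conditioning can be seen in a short derivation. Assume, as an idealisation, that the two loudspeaker signals are filtered versions of one far-end source, $x_1 = g_1 * s$ and $x_2 = g_2 * s$ (with $*$ denoting convolution), and let $\hat{w}_1, \hat{w}_2$ be the two adaptive filters. The echo estimate is

\[\hat{y} = \hat{w}_1 * x_1 + \hat{w}_2 * x_2 = (\hat{w}_1 * g_1 + \hat{w}_2 * g_2) * s\]

Replacing $\hat{w}_1 \to \hat{w}_1 + c * g_2$ and $\hat{w}_2 \to \hat{w}_2 - c * g_1$ for any filter $c$ adds

\[c * g_2 * g_1 - c * g_1 * g_2 = 0\]

because convolution commutes, so $\hat{y}$ and the error are unchanged. The error can therefore be driven to zero by filters that differ from the true echo paths, and such a solution breaks as soon as $g_1$ or $g_2$ changes (for example, when the far-end talker moves). Decorrelation breaks the exact linear relation between $x_1$ and $x_2$.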

Comparison

Property	LMS	NLMS	RLS
Computation	$O(N)$	$O(N)$	$O(N^2)$
Memory	$O(N)$	$O(N)$	$O(N^2)$
Convergence speed	Slow	Moderate	Fast
Sensitivity to input power	High	Low	Low
Tracking ability	Moderate	Good	Excellent
Numerical stability	Good	Good	Can diverge

For most practical applications, NLMS is the default choice. Use RLS when convergence speed matters more than computational cost, or when the filter length is short enough that $O(N^2)$ is acceptable.

Implementation

See adaptive.py for clean implementations of all three algorithms. Each class provides a sample-by-sample update() method for real-time use and a batch run() method for offline processing.

The identify_system() convenience function demonstrates the system identification scenario with configurable algorithm, SNR, and filter length.

Open questions

Step size selection. Optimal $\mu$ depends on signal statistics that are usually unknown. Variable step-size algorithms (VSS-LMS, VSS-NLMS) adapt $\mu$ over time, but add complexity and their own tuning parameters. There is no universally satisfying solution.
Tracking vs steady-state. Fast tracking (small forgetting factor in RLS, large $\mu$ in LMS) gives higher steady-state error. The optimal trade-off depends on how quickly the environment changes, which is itself unknown.
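Long filters. AEC requires hundreds to thousands of taps. Frequency-domain adaptive filters (BFDAF, partitioned-block FDAF) reduce per-sample computation from $O(N)$ to $O(\log N)$ using FFTs. This is standard practice but adds block-processing latency; partitioned-block variants reduce it by splitting the filter into shorter blocks.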
Nonlinear echo paths. Loudspeaker distortion creates nonlinear echo components that linear adaptive filters cannot cancel. Kernel-based and neural network approaches exist but are computationally expensive.
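For LMS, the tracking vs steady-state trade-off can be quantified with the standard small-step approximations (Haykin 2002). The misadjustment $\mathcal{M}$ (excess MSE relative to the Wiener minimum) and the average MSE time constant $\tau_\text{av}$ are

\[\mathcal{M} \approx \frac{\mu}{2}\operatorname{tr}(\mathbf{R}_{xx}) = \frac{\mu N P_x}{2}, \qquad \tau_\text{av} \approx \frac{1}{2\mu P_x}\]

and eliminating $\mu$ gives

\[\mathcal{M} \approx \frac{N}{4\,\tau_\text{av}}\]

For a 400-tap echo canceller, halving the average time constant from 1000 to 500 samples raises the misadjustment from about 10% to about 20%: faster adaptation is paid for in steady-state error.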

For hardware implementations on STM32F4 and ESP32, see Adaptive Filtering on Hardware.

References

Haykin, Simon. 2002. Adaptive Filter Theory. 4th ed. Prentice Hall.

Heiligenberg, Walter. 1991. Neural Nets in Electric Fish. MIT Press.

Widrow, Bernard, and Marcian E. Hoff. 1960. “Adaptive Switching Circuits.” IRE WESCON Convention Record 4: 96–104.
