Adaptive Filtering
Filters that learn: from system identification to noise cancellation
The electric fish Eigenmannia generates a weak electric field and senses distortions caused by nearby objects. When a neighbour’s signal interferes, the fish shifts its own frequency to avoid jamming: a biological frequency-tracking system that adjusts in real time based on detected interference (Heiligenberg 1991). The LMS and NLMS algorithms in this topic implement the same strategy digitally: adjust filter coefficients to minimize an error signal, converging on the optimal solution without knowing the statistics in advance.
Conventional filters have fixed coefficients chosen at design time. But what if the system you need to filter is unknown, or changes over time? Adaptive filters solve this by adjusting their coefficients automatically, driven by an error signal that measures how well the filter is performing.
Adaptive filtering underpins technologies we use daily: noise-cancelling headphones, echo cancellation in phone calls, channel equalisation in wireless communications, and active vibration control. The underlying algorithms are elegant and surprisingly simple.
This topic assumes familiarity with FIR filters, autocorrelation, and basic z-domain concepts.
The setup
An adaptive filter has four signals:
- \(x[n]\): the input signal
- \(d[n]\): the desired (reference) signal
- \(y[n] = \mathbf{w}^T \mathbf{x}[n]\): the filter output, where \(\mathbf{w}\) is the coefficient vector and \(\mathbf{x}[n] = [x[n], x[n-1], \ldots, x[n-N+1]]^T\)
- \(e[n] = d[n] - y[n]\): the error signal
The algorithm adjusts \(\mathbf{w}\) to minimise some function of the error. What makes adaptive filtering powerful is that the same algorithm structure serves completely different applications. Only the signal routing changes.
The Wiener solution
Before diving into algorithms, it helps to know what the optimal solution looks like. For a stationary input, the FIR filter that minimises the mean squared error \(E\{|e[n]|^2\}\) is given by the Wiener–Hopf equation:
\[\mathbf{w}_\text{opt} = \mathbf{R}_{xx}^{-1} \mathbf{p}_{xd}\]
where \(\mathbf{R}_{xx} = E\{\mathbf{x}[n]\mathbf{x}[n]^T\}\) is the input autocorrelation matrix and \(\mathbf{p}_{xd} = E\{\mathbf{x}[n] d[n]\}\) is the cross-correlation vector between input and desired signal.
In practice, we rarely know \(\mathbf{R}_{xx}\) and \(\mathbf{p}_{xd}\) in advance, and even if we did, the matrix inverse is expensive to compute and must be recomputed whenever statistics change. Adaptive algorithms approximate this solution iteratively, one sample at a time.
LMS: the least mean squares algorithm
The LMS algorithm (Widrow and Hoff 1960) replaces the true gradient of the MSE cost surface with an instantaneous estimate. At each time step:
\[\mathbf{w}[n+1] = \mathbf{w}[n] + \mu \, e[n] \, \mathbf{x}[n]\]
where \(\mu\) is the step size (learning rate). The update rule is remarkably simple: multiply the error by the input vector and take a small step.
Convergence. LMS converges in the mean if \(0 < \mu < 2 / \lambda_\text{max}\), where \(\lambda_\text{max}\) is the largest eigenvalue of \(\mathbf{R}_{xx}\). A safe practical bound is:
\[0 < \mu < \frac{2}{N \cdot P_x}\]
where \(N\) is the filter length and \(P_x\) is the input signal power. Smaller \(\mu\) gives more accurate convergence but slower adaptation.
Eigenvalue spread. When the eigenvalues of \(\mathbf{R}_{xx}\) are spread widely apart (large condition number), LMS converges slowly because the step size is limited by the largest eigenvalue but adaptation along the smallest eigenvalue direction is sluggish. This is a fundamental limitation.
The gradient descent here works because the MSE surface of an FIR filter is a single convex bowl. For IIR filters the coefficients enter the denominator and the surface becomes non-convex, with local minima that trap a gradient. The PSO for filter design topic takes on that case with a gradient-free swarm.
NLMS: normalised LMS
The NLMS algorithm fixes the eigenvalue spread problem by normalising the step size by the instantaneous input power:
\[\mathbf{w}[n+1] = \mathbf{w}[n] + \frac{\mu}{\varepsilon + \|\mathbf{x}[n]\|^2} \, e[n] \, \mathbf{x}[n]\]
where \(\varepsilon\) is a small constant preventing division by zero. The effective step size adapts to the signal level, giving convergence that is largely independent of the input power. NLMS converges for \(0 < \mu < 2\).
NLMS is the workhorse of practical adaptive filtering. It is only marginally more expensive than LMS (one extra dot product per sample) but significantly more robust.
RLS: recursive least squares
Where LMS uses instantaneous gradient estimates, RLS (Haykin 2002) minimises a weighted least squares cost over all past data:
\[J[n] = \sum_{i=0}^{n} \lambda^{n-i} |e[i]|^2\]
The forgetting factor \(\lambda\) (typically 0.95–1.0) discounts older data exponentially, allowing the filter to track non-stationary environments.
The update equations involve a gain vector \(\mathbf{k}[n]\) and an inverse correlation matrix \(\mathbf{P}[n]\):
\[\mathbf{k}[n] = \frac{\mathbf{P}[n-1]\,\mathbf{x}[n]}{\lambda + \mathbf{x}[n]^T\,\mathbf{P}[n-1]\,\mathbf{x}[n]}\]
\[\mathbf{w}[n] = \mathbf{w}[n-1] + \mathbf{k}[n]\,e[n]\]
\[\mathbf{P}[n] = \frac{1}{\lambda}\left(\mathbf{P}[n-1] - \mathbf{k}[n]\,\mathbf{x}[n]^T\,\mathbf{P}[n-1]\right)\]
RLS converges much faster than LMS/NLMS, often in \(N\) to \(2N\) samples rather than hundreds. The cost is \(O(N^2)\) computation per sample versus \(O(N)\) for LMS, which matters for long filters.
Applications
System identification
The most direct application: estimate the impulse response of an unknown system. Drive both the unknown system and the adaptive filter with the same input signal. The adaptive filter converges to the unknown system’s impulse response.
This is useful for measuring room impulse responses, characterising communication channels, or calibrating sensors.
Adaptive noise cancellation
Noise-cancelling headphones use this principle. A reference microphone picks up ambient noise. An adaptive filter learns the transfer function from the reference to the primary microphone, and subtracts the estimated noise.
The error signal \(e[n]\) is the cleaned signal: everything in \(d[n]\) that is not correlated with the noise reference. The key assumption is that the desired signal (speech, music) is uncorrelated with the noise source.
Acoustic echo cancellation
In hands-free telephony, the far-end speaker’s voice plays through the loudspeaker, bounces around the room, and is picked up by the microphone, creating an echo. An adaptive filter estimates the room impulse response (the echo path) and subtracts the predicted echo.
AEC is challenging because room impulse responses are long (hundreds of taps at 8 kHz), non-stationary (people move), and complicated by double-talk (both parties speaking simultaneously). These challenges motivate frequency-domain adaptive filters (BFDAF) and double-talk detectors, topics beyond this introduction.
Audio applications: acoustic echo cancellation
Acoustic echo cancellation (AEC) is a textbook application of adaptive system identification, and one where algorithm choice matters. In a speakerphone or video conferencing system, the far-end voice plays through the loudspeaker, reflects off walls, furniture, and people, and arrives at the microphone as an attenuated, delayed, and coloured copy of the original. The adaptive filter must model the room impulse response (RIR) (the transfer function from loudspeaker to microphone) and subtract the predicted echo in real time.
Why NLMS dominates over LMS in audio AEC:
- Varying input power across frequencies. Speech and music have strong spectral tilt: much more energy below 1 kHz than above 4 kHz. This gives the input autocorrelation matrix a large eigenvalue spread, which is exactly the condition that makes LMS converge slowly (the step size is limited by the dominant low-frequency components while high-frequency modes adapt sluggishly).
- NLMS normalises per-sample, making convergence speed independent of the input level. This is critical because speech alternates between loud vowels and quiet consonants, and silence gaps reset the effective SNR.
- RIR length. Typical room impulse responses are 50–300 ms, requiring 400–2400 taps at 8 kHz. At these filter lengths, RLS is prohibitively expensive (\(O(N^2)\)), so NLMS remains the practical choice. Frequency-domain variants (BFDAF, partitioned-block FDAF) extend this to \(O(N \log N)\) for very long filters.
The multichannel extension, multichannel AEC (MCAEC), arises in stereo or surround-sound conferencing, where multiple loudspeakers create multiple echo paths. The core difficulty is that the loudspeaker signals are highly correlated (they carry the same far-end speech), making the combined autocorrelation matrix ill-conditioned. Decorrelation techniques (half-wave rectification, frequency-bin interleaving) are needed to make the adaptive filter converge. This connects directly to the system identification setup above, extended to MIMO.
Comparison
| Property | LMS | NLMS | RLS |
|---|---|---|---|
| Computation | \(O(N)\) | \(O(N)\) | \(O(N^2)\) |
| Memory | \(O(N)\) | \(O(N)\) | \(O(N^2)\) |
| Convergence speed | Slow | Moderate | Fast |
| Sensitivity to input power | High | Low | Low |
| Tracking ability | Moderate | Good | Excellent |
| Numerical stability | Good | Good | Can diverge |
For most practical applications, NLMS is the default choice. Use RLS when convergence speed matters more than computational cost, or when the filter length is short enough that \(O(N^2)\) is acceptable.
Implementation
See adaptive.py for clean implementations of all three algorithms. Each class provides a sample-by-sample update() method for real-time use and a batch run() method for offline processing.
The identify_system() convenience function demonstrates the system identification scenario with configurable algorithm, SNR, and filter length.
Open questions
Step size selection. Optimal \(\mu\) depends on signal statistics that are usually unknown. Variable step-size algorithms (VSS-LMS, VSS-NLMS) adapt \(\mu\) over time, but add complexity and their own tuning parameters. There is no universally satisfying solution.
Tracking vs steady-state. Fast tracking (small forgetting factor in RLS, large \(\mu\) in LMS) gives higher steady-state error. The optimal trade-off depends on how quickly the environment changes, which is itself unknown.
Long filters. AEC requires hundreds of taps. Frequency-domain adaptive filters (BFDAF, partitioned-block FDAF) reduce per-sample computation from \(O(N)\) to \(O(\log N)\) using FFTs. This is standard practice but adds latency.
Nonlinear echo paths. Loudspeaker distortion creates nonlinear echo components that linear adaptive filters cannot cancel. Kernel-based and neural network approaches exist but are computationally expensive.
For hardware implementations on STM32F4 and ESP32, see Adaptive Filtering on Hardware.