Adaptive Filtering on Hardware
LMS/NLMS on STM32F4 and ESP32
Adaptive filters are computationally simple (an \(N\)-tap NLMS filter requires only \(3N\) multiply-accumulates per sample), making them well-suited for microcontroller implementation. This page covers practical considerations for STM32F4 and ESP32-S3 platforms. For the theory, convergence analysis, and Python prototypes, see the main adaptive filtering page.
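For reference, the per-sample NLMS recursion behind the \(3N\) count can be written out as below. The notation is assumed here and may differ from the main page: \(\mathbf{w}(n)\) is the \(N\)-tap coefficient vector, \(\mathbf{x}(n)\) holds the \(N\) most recent input samples, \(d(n)\) is the desired signal, \(\mu\) is the step size and \(\varepsilon\) is a small regularisation constant.

\[
\begin{aligned}
y(n) &= \mathbf{w}^{T}(n)\,\mathbf{x}(n) && \text{(} N \text{ MACs)} \\
e(n) &= d(n) - y(n) \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \frac{\mu\, e(n)}{\varepsilon + \mathbf{x}^{T}(n)\,\mathbf{x}(n)}\,\mathbf{x}(n) && \text{(} N \text{ MACs for the norm, } N \text{ for the update)}
\end{aligned}
\]

Plain LMS drops the normalisation term, which removes the \(N\) MACs spent on \(\mathbf{x}^{T}(n)\,\mathbf{x}(n)\).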
STM32F4: ARM CMSIS-DSP
The STM32F4 series (Cortex-M4F, up to 180 MHz on the NUCLEO-F446RE, single-precision FPU) is a natural target. ARM’s CMSIS-DSP library includes optimised LMS and NLMS implementations:
- arm_lms_f32: floating-point LMS
- arm_lms_norm_f32: floating-point NLMS
- Q15 and Q31 fixed-point variants for applications without an FPU
Using CMSIS-DSP LMS
#include "arm_math.h"
#define NUM_TAPS 32
#define BLOCK_SIZE 1
static float32_t firState[NUM_TAPS + BLOCK_SIZE - 1];
static float32_t coeffs[NUM_TAPS];
static arm_lms_instance_f32 lms;
void init_adaptive_filter(void) {
    // Step size mu = 0.01; coeffs[] is zero-initialised because it is static.
    arm_lms_init_f32(&lms, NUM_TAPS, coeffs, firState, 0.01f, BLOCK_SIZE);
}
void process_sample(float32_t *input, float32_t *desired,
                    float32_t *output, float32_t *error) {
    arm_lms_f32(&lms, input, desired, output, error, BLOCK_SIZE);
}
The CMSIS functions process data in blocks. For sample-by-sample processing, set BLOCK_SIZE = 1. On the Cortex-M4, the fixed-point Q15 variants can use the SIMD instructions of the DSP extension, while the floating-point functions rely on the FPU and loop unrolling.
Performance budget
At 180 MHz (NUCLEO-F446RE), a 32-tap NLMS filter at 8 kHz sample rate uses roughly:
| Operation | Cycles | Time |
|---|---|---|
| NLMS update (32 taps) | ~200 | ~1.1 µs |
| Available per sample (8 kHz) | 22 500 | 125 µs |
| Utilisation | ~1% | |
This leaves ample headroom for ADC/DAC handling, pre/post processing, and communication. Even a 256-tap filter for AEC would use under 10% of the CPU budget.
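The arithmetic behind the table can be sketched as two small helpers. The function names are illustrative, and the ~200-cycle figure is the estimate from the table above, not a measurement; on real hardware, a cycle counter reading around the filter call would replace it.

```c
/* Cycles available between two consecutive samples. */
static double cycles_per_sample(double cpu_hz, double sample_rate_hz) {
    return cpu_hz / sample_rate_hz;
}

/* Fraction of the per-sample cycle budget used by the filter (0.0 to 1.0). */
static double utilisation(double filter_cycles, double cpu_hz,
                          double sample_rate_hz) {
    return filter_cycles / cycles_per_sample(cpu_hz, sample_rate_hz);
}
```

For example, cycles_per_sample(180e6, 8000.0) gives the 22 500 cycles in the table, and utilisation(200.0, 180e6, 8000.0) comes out just under 0.01.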
Hardware setup for system identification
A minimal test setup for system identification on STM32F4:
- DAC output → external analog filter (the “unknown system”) → ADC input
- Generate white noise on the DAC
- Read the filtered signal on the ADC
- Run NLMS to identify the analog filter’s impulse response
- Send coefficients over UART for analysis
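Before wiring up hardware, the steps above can be rehearsed on a host PC. The following is a minimal sketch, not the firmware itself: the "unknown system" is a hypothetical 4-tap FIR (unknown_h) standing in for the analog filter, rand() stands in for the DAC noise source, and the names SYSID_TAPS, white_noise and identify_system are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

#define SYSID_TAPS 8

/* Hypothetical "unknown system": a short FIR standing in for the analog filter. */
static const float unknown_h[SYSID_TAPS] = {
    0.5f, 0.3f, -0.2f, 0.1f, 0.0f, 0.0f, 0.0f, 0.0f
};

/* Uniform white noise in [-1, 1], standing in for the DAC excitation. */
static float white_noise(void) {
    return 2.0f * ((float)rand() / ((float)RAND_MAX + 1.0f)) - 1.0f;
}

/* Run n_samples of NLMS identification; the estimate is written to w. */
static void identify_system(float w[SYSID_TAPS], int n_samples, float mu) {
    float x_buf[SYSID_TAPS] = {0.0f};
    memset(w, 0, SYSID_TAPS * sizeof(float));
    srand(1);  /* fixed seed so runs are repeatable */

    for (int n = 0; n < n_samples; n++) {
        /* Shift in a new excitation sample (the "DAC output"). */
        memmove(x_buf + 1, x_buf, (SYSID_TAPS - 1) * sizeof(float));
        x_buf[0] = white_noise();

        float d = 0.0f;      /* unknown system output (the "ADC input") */
        float y = 0.0f;      /* adaptive filter output */
        float norm = 1e-8f;  /* regularised input energy */
        for (int i = 0; i < SYSID_TAPS; i++) {
            d += unknown_h[i] * x_buf[i];
            y += w[i] * x_buf[i];
            norm += x_buf[i] * x_buf[i];
        }

        /* NLMS coefficient update. */
        float step = mu * (d - y) / norm;
        for (int i = 0; i < SYSID_TAPS; i++) {
            w[i] += step * x_buf[i];
        }
    }
}
```

In this noise-free simulation the estimate should settle close to unknown_h within a few thousand samples at mu = 0.5. A real analog path adds ADC noise, DC offset and latency, so the identified response on hardware will be less clean.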
The STM32F4-Discovery board has a 12-bit DAC and 12-bit ADC, sufficient for proof-of-concept. For audio-quality work, use an I2S codec (e.g., the CS43L22 on the Discovery board, or an external PCM5102).
Hardware setup for noise cancellation
For a noise cancellation demo:
- Reference microphone → ADC channel 1 (noise reference)
- Primary microphone → ADC channel 2 (signal + noise)
- Run NLMS: input = reference, desired = primary
- Output error signal (cleaned audio) on DAC or I2S
Both ADC channels must be sampled synchronously. Use DMA with double buffering to avoid sample drops.
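With double buffering, the CPU processes one half of the DMA buffer while the DMA fills the other. A minimal sketch of the per-half processing step is shown below. It assumes an interleaved layout [ref0, pri0, ref1, pri1, ...] of 12-bit codes with the analog inputs biased at mid-scale; both are assumptions about the setup, and the helper names are made up for this example. With the STM32 HAL, the HAL_ADC_ConvHalfCpltCallback and HAL_ADC_ConvCpltCallback callbacks would typically signal which half is ready.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an unsigned 12-bit ADC code to a float in [-1, 1),
 * assuming the analog input is biased at mid-scale (code 2048). */
static float adc_to_float(uint16_t code) {
    return ((float)code - 2048.0f) / 2048.0f;
}

/* Split one half of an interleaved DMA buffer into separate reference
 * and primary arrays of n_frames samples each. */
static void deinterleave_half(const uint16_t *dma_half, size_t n_frames,
                              float *reference, float *primary) {
    for (size_t i = 0; i < n_frames; i++) {
        reference[i] = adc_to_float(dma_half[2 * i]);      /* channel 1 */
        primary[i]   = adc_to_float(dma_half[2 * i + 1]);  /* channel 2 */
    }
}
```

The two output arrays can then be fed to the NLMS routine one sample pair at a time, with reference as the filter input and primary as the desired signal.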
ESP32: a low-cost alternative
The ESP32-S3 (dual-core Xtensa LX7, 240 MHz, single-precision FPU) offers a different trade-off: cheaper, Wi-Fi/Bluetooth built-in, good I2S support, but no CMSIS-DSP library and a less deterministic real-time environment (due to Wi-Fi interrupts and FreeRTOS).
Why ESP32 for adaptive filtering?
- Built-in I2S: direct connection to MEMS microphones (INMP441, SPH0645) and DAC codecs, no external ADC needed for audio
- Dual core: dedicate one core to audio processing, the other to communication
- Cost: ESP32-DevKitC boards cost under €5, complete audio dev boards (ESP32-LyraT) under €20
- ESP-ADF: Espressif’s Audio Development Framework includes AEC modules
Implementation approach
Use ESP-IDF (C/C++) for real-time processing. The ESP32 lacks CMSIS-DSP, so implement NLMS directly:
#include <stdlib.h>  // calloc
#include <string.h>  // memmove
#include <math.h>
typedef struct {
    float *w;      // filter coefficients
    float *x_buf;  // input buffer
    int n_taps;
    float mu;
    float eps;
} nlms_t;
void nlms_init(nlms_t *f, int n_taps, float mu) {
    f->n_taps = n_taps;
    f->mu = mu;
    f->eps = 1e-8f;
    f->w = calloc(n_taps, sizeof(float));
    f->x_buf = calloc(n_taps, sizeof(float));
    // Production code: check for NULL and handle allocation failure.
    // For a fixed filter length, prefer static arrays over calloc.
}
float nlms_update(nlms_t *f, float x, float d) {
    // Shift input buffer
    memmove(f->x_buf + 1, f->x_buf, (f->n_taps - 1) * sizeof(float));
    f->x_buf[0] = x;
    // Compute output
    float y = 0.0f;
    float norm = f->eps;
    for (int i = 0; i < f->n_taps; i++) {
        y += f->w[i] * f->x_buf[i];
        norm += f->x_buf[i] * f->x_buf[i];
    }
    // Update coefficients
    float e = d - y;
    float step = f->mu * e / norm;
    for (int i = 0; i < f->n_taps; i++) {
        f->w[i] += step * f->x_buf[i];
    }
    return e;
}
ESP32 performance budget
At 240 MHz, single core:
| Operation | Cycles | Time |
|---|---|---|
| NLMS update (32 taps) | ~400 | 1.7 µs |
| Available per sample (16 kHz) | 15 000 | 62.5 µs |
| Utilisation | ~3% | |
The ESP32 is more than capable. The challenge is jitter: Wi-Fi stack interrupts can steal hundreds of microseconds. Mitigate this by running audio processing on core 1 with Wi-Fi pinned to core 0.
ESP32 audio I/O with I2S
The code below uses the legacy ESP-IDF v4.x I2S API (driver/i2s.h), which has been deprecated since ESP-IDF v5.0. For the v5.x API (driver/i2s_std.h), see the pitch detection or beamforming embedded pages.
#include "driver/i2s.h"
// I2S config for INMP441 MEMS microphone
i2s_config_t i2s_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = 16000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .dma_buf_count = 4,
    .dma_buf_len = 64,
    .use_apll = true,
};
For stereo input (reference + primary microphone), use I2S_CHANNEL_FMT_RIGHT_LEFT and wire two INMP441 modules with different L/R select pins.
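Two small calculations follow from the config above and can be sketched in portable C. The first scales a raw 32-bit I2S slot to a float, assuming the microphone's 24-bit sample arrives left-justified in the slot (an assumption worth checking against captured data). The second estimates how much audio the DMA buffers hold, which gives a rough idea of how long the audio task can be delayed before samples are lost. Both helper names are made up for this example.

```c
#include <stdint.h>

/* Scale a 32-bit I2S slot to a float in [-1, 1). With the 24-bit sample
 * in the upper bits of the slot, dividing by 2^31 normalises it. */
static float i2s_slot_to_float(int32_t slot) {
    return (float)slot / 2147483648.0f;
}

/* Audio held by the DMA buffers, in milliseconds, for one channel.
 * buf_len is taken to be in frames (samples per channel). */
static float dma_buffer_ms(int buf_count, int buf_len, int sample_rate) {
    return 1000.0f * (float)(buf_count * buf_len) / (float)sample_rate;
}
```

With dma_buf_count = 4, dma_buf_len = 64 and a 16 kHz sample rate, dma_buffer_ms returns 16 ms. Larger buffers tolerate more scheduling jitter but add latency, which matters for noise cancellation.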
Platform comparison
| Feature | STM32F4 | ESP32 |
|---|---|---|
| Clock | 180 MHz (NUCLEO-F446RE) | 240 MHz |
| FPU | Yes (Cortex-M4F) | Yes (Xtensa LX7) |
| CMSIS-DSP | Yes | No |
| I2S | Via codec | Built-in |
| Wi-Fi/BT | No (needs module) | Built-in |
| Real-time determinism | Excellent | Good (with core pinning) |
| Unit cost | ~€10 | ~€5 |
| Best for | Deterministic DSP, long filters | Audio prototypes, IoT |
Recommendation
- STM32F4 for serious DSP work: deterministic timing, CMSIS-DSP library, established in industrial audio. Use this when filter length or real-time guarantees matter.
- ESP32 for prototyping and demos: cheap, built-in audio I/O, easy to add wireless monitoring. Use this when you want a quick noise cancellation demo with minimal external hardware.
Bill of materials for a noise cancellation demo
A minimal adaptive noise cancellation demo on ESP32:
| Component | Purpose | Approx. cost |
|---|---|---|
| ESP32-DevKitC | Processing | €5 |
| 2× INMP441 breakout | Reference + primary microphone | €4 |
| MAX98357A breakout | I2S DAC + amplifier | €3 |
| Small speaker | Output | €2 |
| Breadboard + wires | | €3 |
| Total | | ~€17 |
The same demo on STM32F4 requires an external I2S codec and more wiring, but gives more headroom for longer filters.