Adaptive Filtering on Hardware

LMS/NLMS on STM32F4 and ESP32

Adaptive filters are computationally simple (an $N$-tap NLMS filter requires only $3N$ multiply-accumulates per sample), making them well-suited for microcontroller implementation. This page covers practical considerations for STM32F4 and ESP32-S3 platforms. For the theory, convergence analysis, and Python prototypes, see the main adaptive filtering page.
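
The $3N$ figure can be read off the per-sample NLMS recursion. As a sketch, write $\mathbf{w}(n)$ for the coefficient vector, $\mathbf{x}(n)$ for the last $N$ input samples, $d(n)$ for the desired signal, $\mu$ for the step size and $\varepsilon$ for a small regularisation constant (notation chosen here for illustration; it mirrors the `mu` and `eps` fields used in the C code on this page):

```latex
$$
\begin{aligned}
y(n) &= \mathbf{w}^{T}(n)\,\mathbf{x}(n) && \text{($N$ MACs)} \\
e(n) &= d(n) - y(n) \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \frac{\mu\, e(n)}{\varepsilon + \lVert \mathbf{x}(n) \rVert^{2}}\,\mathbf{x}(n) && \text{($N$ MACs for the norm, $N$ for the update)}
\end{aligned}
$$
```

If the norm is updated recursively (add the newest squared sample, subtract the oldest), the count drops to roughly $2N$.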

STM32F4: ARM CMSIS-DSP

The STM32F4 series (Cortex-M4F, up to 180 MHz on the NUCLEO-F446RE, single-precision FPU) is a natural target. ARM’s CMSIS-DSP library includes optimised LMS and NLMS implementations:

arm_lms_f32: floating-point LMS
arm_lms_norm_f32: floating-point NLMS
Q15 and Q31 fixed-point variants for applications without FPU

Using CMSIS-DSP NLMS

The examples and the cycle budget on this page assume NLMS, so use the normalised instance type and functions (arm_lms_norm_*). The plain arm_lms_* family implements unnormalised LMS and takes the same arguments.

#include "arm_math.h"

#define NUM_TAPS    32
#define BLOCK_SIZE  1

static float32_t firState[NUM_TAPS + BLOCK_SIZE - 1];
static float32_t coeffs[NUM_TAPS];
static arm_lms_norm_instance_f32 lms;          // normalised LMS (NLMS)

void init_adaptive_filter(void) {
    arm_lms_norm_init_f32(&lms, NUM_TAPS, coeffs, firState, 0.01f, BLOCK_SIZE);
}

void process_sample(float32_t *input, float32_t *desired,
                    float32_t *output, float32_t *error) {
    arm_lms_norm_f32(&lms, input, desired, output, error, BLOCK_SIZE);
}

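The CMSIS functions process data in blocks. For sample-by-sample processing, set BLOCK_SIZE = 1. On the Cortex-M4, the SIMD instructions of the DSP extension operate on packed 16-bit and 8-bit integers, so they can benefit the fixed-point Q15 kernels; the f32 variants run on the scalar FPU and get their speed mainly from loop unrolling.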

Performance budget

At 180 MHz (NUCLEO-F446RE), a 32-tap NLMS filter at 8 kHz sample rate uses roughly:

Operation	Cycles	Time
NLMS update (32 taps)	~200	~1.1 µs
Available per sample (8 kHz)	22 500	125 µs
Utilisation		~1%

This leaves ample headroom for ADC/DAC handling, pre/post processing, and communication. Even a 256-tap filter for AEC would use under 10% of the CPU budget.
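
The arithmetic behind the table can be sketched as a small helper. The name `cpu_budget` and its struct are hypothetical, and the filter cycle count is an estimate you supply, not something the function measures:

```c
#include <stdint.h>

typedef struct {
    uint32_t cycles_per_sample;  /* CPU cycles available between two samples */
    double   filter_time_us;     /* time spent in the filter, in microseconds */
    double   utilisation;        /* fraction of the per-sample budget, 0..1 */
} cpu_budget_t;

/* Hypothetical helper: derive the per-sample budget from the core clock,
 * the sample rate and an estimated filter cost in cycles.
 * The caller must pass a non-zero clock and sample rate. */
cpu_budget_t cpu_budget(uint32_t clock_hz, uint32_t sample_rate_hz,
                        uint32_t filter_cycles) {
    cpu_budget_t b;
    b.cycles_per_sample = clock_hz / sample_rate_hz;
    b.filter_time_us    = 1e6 * (double)filter_cycles / (double)clock_hz;
    b.utilisation       = (double)filter_cycles / (double)b.cycles_per_sample;
    return b;
}
```

With the table's inputs (180 MHz, 8 kHz, about 200 cycles) this gives 22 500 cycles per sample and a utilisation just under 1%.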

Hardware setup for system identification

A minimal test setup for system identification on STM32F4:

DAC output → external analog filter (the “unknown system”) → ADC input
Generate white noise on the DAC
Read the filtered signal on the ADC
Run NLMS to identify the analog filter’s impulse response
Send coefficients over UART for analysis
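
Before wiring up hardware, the steps above can be rehearsed off-target. The following is a minimal simulation sketch, not the on-board code: a linear congruential generator stands in for the DAC noise source, a short FIR `h` stands in for the analog filter, and the names `white_noise`, `sysid_demo` and `SYSID_MAX_TAPS` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SYSID_MAX_TAPS 16   /* upper bound on filter length (assumed) */

/* Uniform white noise in [-1, 1) from a 32-bit LCG (stand-in for the DAC). */
static float white_noise(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return (float)(*state >> 8) / 8388608.0f - 1.0f;  /* top 24 bits / 2^23 - 1 */
}

/* Identify the FIR system h[0..n_taps-1] with NLMS, writing the estimate
 * to w[]. Returns the largest absolute coefficient error after n_samples
 * iterations, or -1 if n_taps is out of range. */
float sysid_demo(const float *h, int n_taps, int n_samples, float mu, float *w) {
    float x_buf[SYSID_MAX_TAPS] = {0};
    uint32_t rng = 12345u;
    if (n_taps < 1 || n_taps > SYSID_MAX_TAPS) return -1.0f;
    memset(w, 0, (size_t)n_taps * sizeof(float));

    for (int n = 0; n < n_samples; n++) {
        /* Shift in a new excitation sample. */
        memmove(x_buf + 1, x_buf, (size_t)(n_taps - 1) * sizeof(float));
        x_buf[0] = white_noise(&rng);

        /* d plays the role of the ADC reading, y is the adaptive filter output. */
        float d = 0.0f, y = 0.0f, norm = 1e-8f;
        for (int i = 0; i < n_taps; i++) {
            d    += h[i] * x_buf[i];
            y    += w[i] * x_buf[i];
            norm += x_buf[i] * x_buf[i];
        }

        /* NLMS coefficient update. */
        float step = mu * (d - y) / norm;
        for (int i = 0; i < n_taps; i++) w[i] += step * x_buf[i];
    }

    float worst = 0.0f;
    for (int i = 0; i < n_taps; i++) {
        float err = w[i] - h[i];
        if (err < 0.0f) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

On hardware, `x_buf[0]` would be the sample written to the DAC and `d` the corresponding ADC reading; any converter latency between the two shows up as leading near-zero coefficients in the identified response.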

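The STM32F4-Discovery board has a 12-bit DAC and 12-bit ADC, sufficient for proof-of-concept. For audio-quality output, use an I2S audio DAC (e.g., the CS43L22 on the Discovery board, or an external PCM5102). Both are playback-only parts, so audio-quality input needs a separate I2S ADC or digital microphone.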

Hardware setup for noise cancellation

For a noise cancellation demo:

Reference microphone → ADC channel 1 (noise reference)
Primary microphone → ADC channel 2 (signal + noise)
Run NLMS: input = reference, desired = primary
Output error signal (cleaned audio) on DAC or I2S

Both ADC channels must be sampled synchronously. Use DMA with double buffering to avoid sample drops.
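
One way to organise double buffering is a ping-pong scheme: the DMA fills one half of a circular buffer while the CPU processes the other. The sketch below is hardware-independent and all names in it are hypothetical. With the STM32 HAL, the two notification functions would typically be called from HAL_ADC_ConvHalfCpltCallback and HAL_ADC_ConvCpltCallback.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAMES_PER_HALF 64   /* frames per half-buffer (assumed) */
#define CHANNELS        2    /* reference + primary, interleaved */

/* Circular DMA target: two halves of FRAMES_PER_HALF interleaved frames. */
static uint16_t dma_buf[2 * FRAMES_PER_HALF * CHANNELS];

/* -1: nothing pending; 0 or 1: index of the half that is ready. */
static volatile int ready_half = -1;

/* Call from the DMA half-transfer and transfer-complete interrupts. */
void on_dma_half_complete(void) { ready_half = 0; }
void on_dma_full_complete(void) { ready_half = 1; }

/* Call from the main loop. Returns the half that is safe to read,
 * or NULL if no new data has arrived since the last call. */
const uint16_t *claim_ready_half(void) {
    int half = ready_half;
    if (half < 0) return NULL;
    ready_half = -1;
    return &dma_buf[(size_t)half * FRAMES_PER_HALF * CHANNELS];
}
```

The processing of one half must finish before the DMA wraps around to it again, which is the per-block version of the cycle budget above. If an interrupt fires between reading and clearing `ready_half`, a notification can be lost, so production code would disable interrupts briefly around those two statements.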

ESP32-S3: a low-cost alternative

The ESP32-S3 (dual-core Xtensa LX7, 240 MHz, single-precision FPU) offers a different trade-off: cheaper, Wi-Fi/Bluetooth built-in, good I2S support, but no CMSIS-DSP library and a less deterministic real-time environment (due to Wi-Fi interrupts and FreeRTOS).

Why ESP32 for adaptive filtering?

Built-in I2S: direct connection to MEMS microphones (INMP441, SPH0645) and DAC codecs, no external ADC needed for audio
Dual core: dedicate one core to audio processing, the other to communication
Cost: ESP32-DevKitC boards cost under €5, complete audio dev boards (ESP32-LyraT) under €20
ESP-ADF: Espressif’s Audio Development Framework includes AEC modules

Implementation approach

Use ESP-IDF (C/C++) for real-time processing. CMSIS-DSP is not available on the ESP32, so implement NLMS directly:

#include <stdlib.h>   // calloc
#include <string.h>   // memmove

typedef struct {
    float *w;           // filter coefficients
    float *x_buf;       // input buffer
    int n_taps;
    float mu;
    float eps;
} nlms_t;

void nlms_init(nlms_t *f, int n_taps, float mu) {
    f->n_taps = n_taps;
    f->mu = mu;
    f->eps = 1e-8f;
    f->w = calloc(n_taps, sizeof(float));
    f->x_buf = calloc(n_taps, sizeof(float));
    // Production code: check for NULL and handle allocation failure.
    // For a fixed filter length, prefer static arrays over calloc.
}

float nlms_update(nlms_t *f, float x, float d) {
    // Shift input buffer
    memmove(f->x_buf + 1, f->x_buf, (f->n_taps - 1) * sizeof(float));
    f->x_buf[0] = x;

    // Compute output
    float y = 0.0f;
    float norm = f->eps;
    for (int i = 0; i < f->n_taps; i++) {
        y += f->w[i] * f->x_buf[i];
        norm += f->x_buf[i] * f->x_buf[i];
    }

    // Update coefficients
    float e = d - y;
    float step = f->mu * e / norm;
    for (int i = 0; i < f->n_taps; i++) {
        f->w[i] += step * f->x_buf[i];
    }
    return e;
}
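
The comment in nlms_init recommends static arrays when the filter length is fixed. A minimal sketch of that variant follows, assuming a compile-time length `NLMS_TAPS`; the names `nlms_static_t`, `nlms_static_init` and `nlms_static_update` are hypothetical. The update is the same algorithm as above, only the storage changes.

```c
#include <string.h>

#define NLMS_TAPS 32   /* fixed filter length (assumed) */

typedef struct {
    float w[NLMS_TAPS];      /* filter coefficients */
    float x_buf[NLMS_TAPS];  /* most recent inputs, newest first */
    float mu;                /* step size */
    float eps;               /* regularisation, avoids division by zero */
} nlms_static_t;

void nlms_static_init(nlms_static_t *f, float mu) {
    memset(f, 0, sizeof *f);   /* all-zero bytes are 0.0f on IEEE 754 targets */
    f->mu = mu;
    f->eps = 1e-8f;
}

/* One NLMS step: x is the new input sample, d the desired sample.
 * Returns the error e = d - y. */
float nlms_static_update(nlms_static_t *f, float x, float d) {
    /* Shift the input history and insert the newest sample. */
    memmove(f->x_buf + 1, f->x_buf, (NLMS_TAPS - 1) * sizeof(float));
    f->x_buf[0] = x;

    /* Filter output and input energy in a single pass. */
    float y = 0.0f, norm = f->eps;
    for (int i = 0; i < NLMS_TAPS; i++) {
        y    += f->w[i] * f->x_buf[i];
        norm += f->x_buf[i] * f->x_buf[i];
    }

    /* Normalised coefficient update. */
    float e = d - y;
    float step = f->mu * e / norm;
    for (int i = 0; i < NLMS_TAPS; i++) f->w[i] += step * f->x_buf[i];
    return e;
}
```

Declaring the instance as `static nlms_static_t f;` places it in .bss, so there is no heap allocation to fail at run time and the memory footprint is known at link time.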

ESP32 performance budget

At 240 MHz, single core:

Operation	Cycles	Time
NLMS update (32 taps)	~400	1.7 µs
Available per sample (16 kHz)	15 000	62.5 µs
Utilisation		~3%

The ESP32 is more than capable. The challenge is jitter: Wi-Fi stack interrupts can steal hundreds of microseconds. Mitigate this by running audio processing on core 1 with Wi-Fi pinned to core 0.

ESP32 audio I/O with I2S

Legacy API

The code below uses the legacy ESP-IDF v4.x I2S API (driver/i2s.h), which has been deprecated since ESP-IDF v5.0 in favour of the channel-based driver. For the v5.x API (driver/i2s_std.h), see the pitch detection or beamforming embedded pages.

#include "driver/i2s.h"

// I2S config for INMP441 MEMS microphone
i2s_config_t i2s_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = 16000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .dma_buf_count = 4,
    .dma_buf_len = 64,
    .use_apll = true,
};

For stereo input (reference + primary microphone), use I2S_CHANNEL_FMT_RIGHT_LEFT and wire two INMP441 modules with different L/R select pins.
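
In stereo mode the samples of both microphones arrive interleaved in one buffer and have to be split before they can be fed to the filter as reference and primary signals. A minimal sketch follows, assuming 32-bit slots with the microphone data left-justified (as the INMP441 delivers its 24-bit samples) and frames stored as {left, right}. The function name is hypothetical, and the channel order should be checked against your driver version.

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved 32-bit stereo frames into two float channels in [-1, 1).
 * Assumption: each frame is stored as {left, right}; swap the output
 * pointers if your driver delivers the opposite order. */
void deinterleave_stereo(const int32_t *frames, size_t n_frames,
                         float *left, float *right) {
    const float scale = 1.0f / 2147483648.0f;   /* 2^-31: full scale maps to 1.0 */
    for (size_t i = 0; i < n_frames; i++) {
        left[i]  = (float)frames[2 * i]     * scale;
        right[i] = (float)frames[2 * i + 1] * scale;
    }
}
```

Scaling by the full 32-bit range keeps left-justified 24-bit data at the correct amplitude without shifting signed values.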

Platform comparison

Feature	STM32F4	ESP32
Clock	180 MHz (NUCLEO-F446RE)	240 MHz
FPU	Yes (Cortex-M4F)	Yes (Xtensa LX7)
CMSIS-DSP	Yes	No
I2S	Via codec	Built-in
Wi-Fi/BT	No (needs module)	Built-in
Real-time determinism	Excellent	Good (with core pinning)
Unit cost	~€10	~€5
Best for	Deterministic DSP, long filters	Audio prototypes, IoT

Recommendation

STM32F4 for serious DSP work: deterministic timing, CMSIS-DSP library, established in industrial audio. Use this when filter length or real-time guarantees matter.
ESP32 for prototyping and demos: cheap, built-in audio I/O, easy to add wireless monitoring. Use this when you want a quick noise cancellation demo with minimal external hardware.

Bill of materials for a noise cancellation demo

A minimal adaptive noise cancellation demo on ESP32:

Component	Purpose	Approx. cost
ESP32-DevKitC	Processing	€5
2× INMP441 breakout	Reference + primary microphone	€4
MAX98357A breakout	I2S DAC + amplifier	€3
Small speaker	Output	€2
Breadboard + wires		€3
Total		~€17

The same demo on STM32F4 requires an external I2S codec and more wiring, but gives more headroom for longer filters.
