Adaptive Filtering on Hardware

LMS/NLMS on STM32F4 and ESP32

Adaptive filters are computationally simple (an \(N\)-tap NLMS filter requires only \(3N\) multiply-accumulates per sample), making them well-suited for microcontroller implementation. This page covers practical considerations for STM32F4 and ESP32-S3 platforms. For the theory, convergence analysis, and Python prototypes, see the main adaptive filtering page.
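The \(3N\) figure can be read off the per-sample NLMS recursion. As a sketch, using the same symbols as the C implementation later on this page (\(\mathbf{w}\) for the coefficient vector, \(\mathbf{x}\) for the \(N\) most recent input samples, \(d\) for the desired sample, \(\mu\) for the step size, \(\varepsilon\) for the small regulariser):

\[
\begin{aligned}
y[n] &= \mathbf{w}^{\top}[n]\,\mathbf{x}[n] && (N \text{ MACs}) \\
e[n] &= d[n] - y[n] \\
\mathbf{w}[n+1] &= \mathbf{w}[n] + \frac{\mu\, e[n]}{\varepsilon + \mathbf{x}^{\top}[n]\,\mathbf{x}[n]}\,\mathbf{x}[n] && (N \text{ MACs for the norm, } N \text{ for the update})
\end{aligned}
\]

Plain LMS omits the normalisation term and therefore needs about \(2N\) multiply-accumulates per sample.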


STM32F4: ARM CMSIS-DSP

The STM32F4 series (Cortex-M4F, up to 180 MHz on the NUCLEO-F446RE, single-precision FPU) is a natural target. ARM’s CMSIS-DSP library includes optimised LMS and NLMS implementations:

  • arm_lms_f32: floating-point LMS
  • arm_lms_norm_f32: floating-point NLMS
  • Q15 and Q31 fixed-point variants for targets without an FPU

Using CMSIS-DSP LMS

#include "arm_math.h"

#define NUM_TAPS    32
#define BLOCK_SIZE  1

static float32_t firState[NUM_TAPS + BLOCK_SIZE - 1];
static float32_t coeffs[NUM_TAPS];
static arm_lms_instance_f32 lms;

void init_adaptive_filter(void) {
    arm_lms_init_f32(&lms, NUM_TAPS, coeffs, firState, 0.01f, BLOCK_SIZE);
}

void process_sample(float32_t *input, float32_t *desired,
                    float32_t *output, float32_t *error) {
    arm_lms_f32(&lms, input, desired, output, error, BLOCK_SIZE);
}

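The CMSIS functions process data in blocks; for sample-by-sample processing, set BLOCK_SIZE to 1. On the Cortex-M4, the float32 routines run on the FPU with unrolled inner loops. The SIMD instructions of the DSP extension operate on packed 8- and 16-bit integers, so they benefit the Q15 variants rather than the floating-point ones.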

Performance budget

At 180 MHz (NUCLEO-F446RE), a 32-tap NLMS filter at 8 kHz sample rate uses roughly:

Operation                     Cycles    Time
NLMS update (32 taps)         ~200      ~1.1 µs
Available per sample (8 kHz)  22 500    125 µs
Utilisation                   ~1%

This leaves ample headroom for ADC/DAC handling, pre/post processing, and communication. Even a 256-tap filter for acoustic echo cancellation (AEC) would use under 10% of the CPU budget.
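The budget arithmetic is easy to reproduce for other clock rates, sample rates, and filter lengths. The helpers below are a minimal sketch; the function names are made up for illustration, and the cycle counts passed in are estimates such as the ~200 cycles quoted above, not measurements.

```c
#include <stdint.h>

/* Cycles available between two consecutive samples. */
uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_rate_hz) {
    return cpu_hz / sample_rate_hz;
}

/* Share of the per-sample cycle budget consumed by the filter, in percent. */
float utilisation_percent(uint32_t cycles_used, uint32_t cpu_hz,
                          uint32_t sample_rate_hz) {
    return 100.0f * (float)cycles_used /
           (float)cycles_per_sample(cpu_hz, sample_rate_hz);
}
```

For the 32-tap case, utilisation_percent(200, 180000000, 8000) gives about 0.9%. If the cycle count is assumed to scale linearly with filter length, a 256-tap filter costs about 1600 cycles, or roughly 7% of the budget.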

Hardware setup for system identification

A minimal test setup for system identification on STM32F4:

  1. DAC output → external analog filter (the “unknown system”) → ADC input
  2. Generate white noise on the DAC
  3. Read the filtered signal on the ADC
  4. Run NLMS to identify the analog filter’s impulse response
  5. Send coefficients over UART for analysis
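Before wiring up hardware, steps 2 to 4 can be checked on a host PC. The sketch below is an assumption-laden stand-in, not board code: the "unknown system" is a hypothetical 4-tap FIR (sim_unknown), the DAC noise source is replaced by a small linear congruential generator, and the names sim_noise and sim_identify are invented for this example. Step 5 (UART output) is left out.

```c
#include <stdint.h>

#define SIM_TAPS 4

/* Hypothetical stand-in for the analog filter: a known 4-tap FIR. */
static const float sim_unknown[SIM_TAPS] = {0.5f, -0.3f, 0.2f, 0.1f};

/* Small LCG producing uniform noise in [-1, 1); replaces the DAC noise source. */
static uint32_t sim_seed = 12345u;
static float sim_noise(void) {
    sim_seed = sim_seed * 1664525u + 1013904223u;
    return (float)(sim_seed >> 8) / 8388608.0f - 1.0f;
}

/* Run n_samples of NLMS and copy the identified coefficients to w_out. */
void sim_identify(float *w_out, int n_samples, float mu) {
    float x[SIM_TAPS] = {0};   /* delay line, x[0] is the newest sample */
    float w[SIM_TAPS] = {0};   /* adaptive coefficients, start at zero  */

    for (int n = 0; n < n_samples; n++) {
        /* Steps 2-3: emit noise and "read back" the filtered signal. */
        for (int i = SIM_TAPS - 1; i > 0; i--) x[i] = x[i - 1];
        x[0] = sim_noise();

        float d = 0.0f, y = 0.0f, norm = 1e-8f;
        for (int i = 0; i < SIM_TAPS; i++) {
            d += sim_unknown[i] * x[i];   /* output of the "unknown system" */
            y += w[i] * x[i];             /* output of the adaptive filter  */
            norm += x[i] * x[i];          /* input energy for normalisation */
        }

        /* Step 4: NLMS coefficient update. */
        float step = mu * (d - y) / norm;
        for (int i = 0; i < SIM_TAPS; i++) w[i] += step * x[i];
    }
    for (int i = 0; i < SIM_TAPS; i++) w_out[i] = w[i];
}
```

Because this simulation has no measurement noise, the coefficients should settle very close to sim_unknown. On real hardware, ADC noise, DC offset, and the converter latency between DAC and ADC will limit how closely the estimate matches.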

The STM32F4-Discovery board has a 12-bit DAC and 12-bit ADC, sufficient for proof-of-concept. For audio-quality output, use an I2S DAC (e.g., the CS43L22 on the Discovery board, or an external PCM5102). Both are playback-only parts, so audio-quality input needs a separate I2S ADC or a full codec.

Hardware setup for noise cancellation

For a noise cancellation demo:

  1. Reference microphone → ADC channel 1 (noise reference)
  2. Primary microphone → ADC channel 2 (signal + noise)
  3. Run NLMS: input = reference, desired = primary
  4. Output error signal (cleaned audio) on DAC or I2S

Both ADC channels must be sampled synchronously. Use DMA with double buffering to avoid sample drops.
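With double buffering, the DMA controller fills one half of a circular buffer while the CPU processes the other half. The sketch below shows only the CPU side and makes several assumptions: the DMA writes interleaved frames (channel 1, then channel 2), samples are 12-bit unsigned codes, and the buffer size and all names are hypothetical. With the STM32 HAL, unpack_half(0, ...) would typically be called from HAL_ADC_ConvHalfCpltCallback and unpack_half(1, ...) from HAL_ADC_ConvCpltCallback.

```c
#include <stdint.h>

#define FRAMES_PER_HALF 4   /* hypothetical; real code would use a larger block */
#define CHANNELS        2

/* DMA target: two halves, each holding FRAMES_PER_HALF interleaved frames. */
uint16_t adc_dma_buf[2 * FRAMES_PER_HALF * CHANNELS];

/* Convert a 12-bit unsigned ADC code (0..4095) to a float in [-1, 1). */
float adc_to_float(uint16_t code) {
    return ((float)code - 2048.0f) / 2048.0f;
}

/* De-interleave one half of the DMA buffer into two mono blocks.
 * half = 0 after the half-transfer event, 1 after the transfer-complete event. */
void unpack_half(int half, float *reference, float *primary) {
    const uint16_t *src = &adc_dma_buf[half * FRAMES_PER_HALF * CHANNELS];
    for (int i = 0; i < FRAMES_PER_HALF; i++) {
        reference[i] = adc_to_float(src[CHANNELS * i]);      /* channel 1 */
        primary[i]   = adc_to_float(src[CHANNELS * i + 1]);  /* channel 2 */
    }
}
```

The processing of one half must finish before the DMA wraps around and starts overwriting it, which sets the real-time deadline at FRAMES_PER_HALF sample periods.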


ESP32: a low-cost alternative

The ESP32-S3 (dual-core Xtensa LX7, 240 MHz, single-precision FPU) offers a different trade-off: cheaper, Wi-Fi/Bluetooth built-in, good I2S support, but no CMSIS-DSP library and a less deterministic real-time environment (due to Wi-Fi interrupts and FreeRTOS).

Why ESP32 for adaptive filtering?

  • Built-in I2S: direct connection to MEMS microphones (INMP441, SPH0645) and DAC codecs, no external ADC needed for audio
  • Dual core: dedicate one core to audio processing, the other to communication
  • Cost: ESP32-DevKitC boards cost under €5, complete audio dev boards (ESP32-LyraT) under €20
  • ESP-ADF: Espressif’s Audio Development Framework includes AEC modules

Implementation approach

Use ESP-IDF (C/C++) for real-time processing. CMSIS-DSP is not available for the Xtensa cores, so implement NLMS directly:

#include <stdlib.h>   // calloc
#include <string.h>   // memmove

typedef struct {
    float *w;           // filter coefficients
    float *x_buf;       // input buffer
    int n_taps;
    float mu;
    float eps;
} nlms_t;

void nlms_init(nlms_t *f, int n_taps, float mu) {
    f->n_taps = n_taps;
    f->mu = mu;
    f->eps = 1e-8f;
    f->w = calloc(n_taps, sizeof(float));
    f->x_buf = calloc(n_taps, sizeof(float));
    // Production code: check for NULL and handle allocation failure.
    // For a fixed filter length, prefer static arrays over calloc.
}

float nlms_update(nlms_t *f, float x, float d) {
    // Shift input buffer
    memmove(f->x_buf + 1, f->x_buf, (f->n_taps - 1) * sizeof(float));
    f->x_buf[0] = x;

    // Compute output
    float y = 0.0f;
    float norm = f->eps;
    for (int i = 0; i < f->n_taps; i++) {
        y += f->w[i] * f->x_buf[i];
        norm += f->x_buf[i] * f->x_buf[i];
    }

    // Update coefficients
    float e = d - y;
    float step = f->mu * e / norm;
    for (int i = 0; i < f->n_taps; i++) {
        f->w[i] += step * f->x_buf[i];
    }
    return e;
}

ESP32 performance budget

At 240 MHz, single core:

Operation                      Cycles    Time
NLMS update (32 taps)          ~400      1.7 µs
Available per sample (16 kHz)  15 000    62.5 µs
Utilisation                    ~3%

The ESP32 is more than capable. The challenge is jitter: Wi-Fi stack interrupts can steal hundreds of microseconds. Mitigate this by running audio processing on core 1 with Wi-Fi pinned to core 0.

ESP32 audio I/O with I2S

Legacy API

The code below uses the legacy ESP-IDF v4.x I2S API (driver/i2s.h), which has been deprecated since ESP-IDF v5.0 in favour of the channel-based driver (driver/i2s_std.h). For the v5.x API, see the pitch detection or beamforming embedded pages.

#include "driver/i2s.h"

// I2S config for INMP441 MEMS microphone
i2s_config_t i2s_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = 16000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .dma_buf_count = 4,
    .dma_buf_len = 64,
    .use_apll = true,
};

For stereo input (reference + primary microphone), use I2S_CHANNEL_FMT_RIGHT_LEFT and wire two INMP441 modules to the same I2S bus, with one module's L/R select pin tied to GND (left slot) and the other's tied to VDD (right slot).
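In stereo mode the driver returns left and right slots interleaved in one buffer, so they have to be split and scaled before they reach the NLMS filter. The helper below is a sketch under stated assumptions: each slot is a signed 32-bit word with the microphone's 24-bit sample left-justified in it, and even indices carry the reference microphone while odd indices carry the primary one. The slot order depends on the wiring and driver version, so verify it on the actual hardware. The function names are invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Scale one signed 32-bit I2S slot to a float in [-1, 1). */
float i2s_slot_to_float(int32_t raw) {
    return (float)raw / 2147483648.0f;
}

/* Split an interleaved stereo buffer into two mono float buffers.
 * ASSUMPTION: even slots = reference mic, odd slots = primary mic. */
void i2s_split_stereo(const int32_t *interleaved, size_t n_frames,
                      float *reference, float *primary) {
    for (size_t i = 0; i < n_frames; i++) {
        reference[i] = i2s_slot_to_float(interleaved[2 * i]);
        primary[i]   = i2s_slot_to_float(interleaved[2 * i + 1]);
    }
}
```

Each resulting pair (reference[i], primary[i]) can then be passed to nlms_update as the input x and the desired signal d.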


Platform comparison

Feature                STM32F4                          ESP32
Clock                  180 MHz (NUCLEO-F446RE)          240 MHz
FPU                    Yes (Cortex-M4F)                 Yes (Xtensa LX7)
CMSIS-DSP              Yes                              No
I2S                    Via codec                        Built-in
Wi-Fi/BT               No (needs module)                Built-in
Real-time determinism  Excellent                        Good (with core pinning)
Unit cost              ~€10                             ~€5
Best for               Deterministic DSP, long filters  Audio prototypes, IoT

Recommendation

  • STM32F4 for serious DSP work: deterministic timing, CMSIS-DSP library, established in industrial audio. Use this when filter length or real-time guarantees matter.
  • ESP32 for prototyping and demos: cheap, built-in audio I/O, easy to add wireless monitoring. Use this when you want a quick noise cancellation demo with minimal external hardware.

Bill of materials for a noise cancellation demo

A minimal adaptive noise cancellation demo on ESP32:

Component            Purpose                         Approx. cost
ESP32-DevKitC        Processing                      €5
2× INMP441 breakout  Reference + primary microphone  €4
MAX98357A breakout   I2S DAC + amplifier             €3
Small speaker        Output                          €2
Breadboard + wires                                   €3
Total                                                ~€17

The same demo on STM32F4 requires an external I2S codec and more wiring, but gives more headroom for longer filters.