Beamforming on Hardware

Multi-microphone direction-of-arrival estimation on ESP32-S3 and STM32F4

A microphone array on a microcontroller turns the delay-and-sum beamformer from a simulation into a real-time direction finder. The challenge is multi-channel synchronous acquisition: all microphones must be sampled at the same instant, with known inter-channel timing. I2S handles this naturally for 2 channels (stereo); more channels require either multiple I2S peripherals or TDM mode.

For the theory (array geometry, beam patterns, TDOA estimation, frequency-domain beamforming), see the main beamforming page.

Array hardware

Microphone array options

Configuration	Mics	Channels needed	I2S requirement	Use case
2-mic stereo	2x INMP441	2 (L+R on one I2S)	1 I2S peripheral	Left/right DOA, noise reduction
4-mic linear	4x INMP441	4 (2 stereo I2S)	2 I2S peripherals	Azimuth DOA, moderate resolution
4-mic square	4x INMP441	4	2 I2S peripherals	2D DOA (azimuth + elevation)

The INMP441 has a left/right channel select pin (L/R): tie it low for left channel, high for right. Two INMP441 modules on one I2S bus give synchronous stereo, the simplest array.

2-mic array on ESP32-S3

    d = 50 mm
  |<-------->|
[MIC0 (L)]  [MIC1 (R)]
  |             |
  |-- I2S_0 ---|
       |
   ESP32-S3

With $d = 50$ mm spacing and sound speed $c = 343$ m/s:

Maximum inter-mic delay: $\tau_\text{max} = d/c = 146$ us
At $f_s = 16$ kHz: $\tau_\text{max} \approx 2.3$ samples
Angular resolution limited by array size, adequate for left/right/centre classification

4-mic linear array on ESP32-S3

The ESP32-S3 has 2 I2S peripherals. Each runs in stereo, giving 4 synchronous channels:

    d = 50 mm
  |<-------->|
[M0(L)] [M1(R)] [M2(L)] [M3(R)]
  |         |      |         |
  |-- I2S0 -|      |-- I2S1 -|
       |                |
       ESP32-S3

Inter-peripheral synchronisation

The ESP32-S3 has no audio PLL (APLL), so both I2S peripherals derive their clocks from a common source such as the 160 MHz PLL. Two identically configured masters therefore run at the same rate, but nothing aligns their word-select edges, and the DMA transfers may not start at exactly the same instant. For precise TDOA estimation, either share one BCLK/WS pair between both peripherals (one configured as master, the other as slave on the same lines) or calibrate the inter-peripheral offset at startup by correlating a known reference signal.
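
Once a calibration run has produced an integer offset between the two peripherals (for example the peak lag of a cross-correlation against a reference click), compensating for it is a plain shift of one channel pair. The following is a minimal sketch; `shift_channel` is a hypothetical helper, and it assumes that zero-filling the vacated samples at the frame edge is acceptable.

```c
#include <stddef.h>
#include <string.h>

/* Delay `ch` by `offset` samples in place (offset > 0) or advance it
 * (offset < 0). Samples shifted in from outside the frame are set to zero. */
static void shift_channel(float *ch, int n, int offset) {
    if (n <= 0) return;
    if (offset >= n || offset <= -n) {
        memset(ch, 0, (size_t)n * sizeof(float));   /* nothing survives */
        return;
    }
    if (offset > 0) {
        memmove(ch + offset, ch, (size_t)(n - offset) * sizeof(float));
        memset(ch, 0, (size_t)offset * sizeof(float));
    } else if (offset < 0) {
        int k = -offset;
        memmove(ch, ch + k, (size_t)(n - k) * sizeof(float));
        memset(ch + n - k, 0, (size_t)k * sizeof(float));
    }
}
```

A streaming implementation would more likely keep a few samples of history from the previous frame instead of zero-filling, so that the frame edges stay valid.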

ESP32-S3: delay-and-sum beamformer

I2S stereo microphone setup (2-mic)

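#include "driver/i2s_std.h"
#include "esp_err.h"

#define FS          16000
#define FRAME_SIZE  256

// Example pins only: pick free GPIOs for your board. GPIO22-25 do not exist
// on the ESP32-S3, and GPIO26-32 are normally reserved for flash/PSRAM.
#define MIC_BCLK_GPIO  GPIO_NUM_4
#define MIC_WS_GPIO    GPIO_NUM_5
#define MIC_DIN_GPIO   GPIO_NUM_6

static i2s_chan_handle_t rx_chan;

void i2s_mic_array_init(void) {
    i2s_chan_config_t chan_cfg = I2S_CHANNEL_DEFAULT_CONFIG(
        I2S_NUM_0, I2S_ROLE_MASTER);
    // RX only: pass NULL for the TX handle
    ESP_ERROR_CHECK(i2s_new_channel(&chan_cfg, NULL, &rx_chan));

    i2s_std_config_t std_cfg = {
        .clk_cfg = I2S_STD_CLK_DEFAULT_CONFIG(FS),
        // INMP441 sends 24-bit samples left-justified in 32-bit slots
        .slot_cfg = I2S_STD_PHILIPS_SLOT_DEFAULT_CONFIG(
            I2S_DATA_BIT_WIDTH_32BIT, I2S_SLOT_MODE_STEREO),
        .gpio_cfg = {
            .mclk = I2S_GPIO_UNUSED,  // left unset, this would default to GPIO0
            .bclk = MIC_BCLK_GPIO,
            .ws   = MIC_WS_GPIO,
            .dout = I2S_GPIO_UNUSED,
            .din  = MIC_DIN_GPIO,
        },
    };
    ESP_ERROR_CHECK(i2s_channel_init_std_mode(rx_chan, &std_cfg));
    ESP_ERROR_CHECK(i2s_channel_enable(rx_chan));
}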

Deinterleave stereo to two channels

I2S stereo data arrives interleaved: L, R, L, R, … Deinterleave into separate buffers:

void deinterleave(int32_t *interleaved, float *ch0, float *ch1, int n_frames) {
    for (int i = 0; i < n_frames; i++) {
        ch0[i] = (float)(interleaved[2*i]     >> 8) / 8388608.0f;
        ch1[i] = (float)(interleaved[2*i + 1] >> 8) / 8388608.0f;
    }
}
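
The scaling in the loop above is easier to follow for a single slot. The sketch below assumes the microphone delivers a 24-bit sample left-justified in a 32-bit slot with the low 8 bits at zero; `i2s_slot_to_float` is an illustrative name. It uses a division rather than a right shift, because right-shifting a negative value is implementation-defined in C, and the two give the same result when the low byte is zero.

```c
#include <stdint.h>

/* Convert one 32-bit I2S slot holding a left-justified 24-bit sample
 * to a float in [-1.0, 1.0). */
static float i2s_slot_to_float(int32_t slot) {
    int32_t s24 = slot / 256;        /* drop the 8 padding bits */
    return (float)s24 / 8388608.0f;  /* 8388608 = 2^23, full scale of 24-bit signed */
}
```

Under that assumption, a slot value of `0x40000000` maps to 0.5 and the most negative slot value maps to -1.0.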

TDOA estimation via cross-correlation

The time-difference of arrival between two microphones is the lag of the cross-correlation peak. For a 2-mic array, one TDOA value gives the angle of arrival:

#include <math.h>

#define MAX_LAG  4   // max lag in samples (~250 us at 16 kHz)

// Cross-correlation for small lag range (brute-force, fast for small MAX_LAG)
float estimate_tdoa(float *ch0, float *ch1, int n) {
    float best_corr = -1e30f;
    int best_lag = 0;

    for (int lag = -MAX_LAG; lag <= MAX_LAG; lag++) {
        float sum = 0;
        int start = (lag > 0) ? lag : 0;
        int end   = (lag > 0) ? n : n + lag;
        for (int i = start; i < end; i++) {
            sum += ch0[i] * ch1[i - lag];
        }
        if (sum > best_corr) {
            best_corr = sum;
            best_lag = lag;
        }
    }

    return (float)best_lag / FS;  // TDOA in seconds
}

// Convert TDOA to angle of arrival
float tdoa_to_angle(float tdoa, float mic_spacing) {
    float sin_theta = tdoa * 343.0f / mic_spacing;
    // Clamp to valid range: lags beyond d/c (|lag| >= 3 at 50 mm, 16 kHz) give |sin_theta| > 1
    if (sin_theta > 1.0f) sin_theta = 1.0f;
    if (sin_theta < -1.0f) sin_theta = -1.0f;
    return asinf(sin_theta) * 180.0f / M_PI;  // degrees from broadside
}
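
The integer-lag estimate can be refined without a finer search by fitting a parabola through the correlation value at the peak and its two neighbours. This is a common add-on, not part of the code above, and it assumes the three correlation sums around the best lag have been kept (the function above would need to store `sum` per lag). `parabolic_offset` is a hypothetical helper name.

```c
/* Sub-sample peak position from three correlation values:
 * y_m1 = R[k-1], y_0 = R[k], y_p1 = R[k+1], where k is the integer peak.
 * Returns an offset in [-0.5, 0.5] to add to k, or 0 when the three
 * points do not form a concave (peak-shaped) parabola. */
static float parabolic_offset(float y_m1, float y_0, float y_p1) {
    float denom = y_m1 - 2.0f * y_0 + y_p1;
    if (denom >= 0.0f) return 0.0f;            /* flat or convex: no refinement */
    float delta = 0.5f * (y_m1 - y_p1) / denom; /* vertex of the fitted parabola */
    if (delta > 0.5f)  delta = 0.5f;
    if (delta < -0.5f) delta = -0.5f;
    return delta;
}
```

The refined TDOA would then be `(best_lag + delta) / FS`. A peak at either end of the searched lag range has only one neighbour and cannot be refined this way.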

Delay-and-sum with fractional delay

Integer-sample delay is simple (index offset), but for angles that produce fractional-sample TDOA, linear interpolation improves accuracy:

// Apply fractional delay using linear interpolation.
// delay_samples can be positive or negative.
void apply_fractional_delay(float *in, float *out, int n, float delay_samples) {
    int d_int = (int)floorf(delay_samples);
    float frac = delay_samples - d_int;

    for (int i = 0; i < n; i++) {
        int idx = i - d_int;
        if (idx >= 1 && idx < n) {
            out[i] = (1.0f - frac) * in[idx] + frac * in[idx - 1];
        } else if (idx >= 0 && idx < n) {
            out[i] = (1.0f - frac) * in[idx];  // no past sample available
        } else {
            out[i] = 0.0f;
        }
    }
}

// Delay-and-sum beamformer for 2 channels
void beam_steer(float *ch0, float *ch1, float *output, int n,
                float angle_deg, float mic_spacing) {
    if (n > FRAME_SIZE) n = FRAME_SIZE;  // bounds guard

    float theta = angle_deg * M_PI / 180.0f;
    float delay_s = mic_spacing * sinf(theta) / 343.0f;
    float delay_samples = delay_s * FS;

    static float ch1_delayed[FRAME_SIZE];  // static: avoid stack overflow in FreeRTOS task
    apply_fractional_delay(ch1, ch1_delayed, n, delay_samples);

    for (int i = 0; i < n; i++)
        output[i] = 0.5f * (ch0[i] + ch1_delayed[i]);
}

Beam scanning for DOA estimation

Scan across angles and find the steering direction that maximises output power:

float estimate_doa(float *ch0, float *ch1, int n, float mic_spacing) {
    float best_power = 0;
    float best_angle = 0;
    static float output[FRAME_SIZE];  // static: avoid stack overflow

    // Scan from -90 to +90 degrees in 5-degree steps
    for (int angle = -90; angle <= 90; angle += 5) {
        beam_steer(ch0, ch1, output, n, (float)angle, mic_spacing);

        // Compute output power
        float power = 0;
        for (int i = 0; i < n; i++)
            power += output[i] * output[i];

        if (power > best_power) {
            best_power = power;
            best_angle = (float)angle;
        }
    }
    return best_angle;
}

Main task

#include <stdbool.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"

static const char *TAG = "beam";

void beamforming_task(void *param) {
    // static: keep ~4 KB of frame buffers off the task stack
    static int32_t raw[FRAME_SIZE * 2];  // stereo interleaved
    static float ch0[FRAME_SIZE], ch1[FRAME_SIZE];
    size_t bytes_read;

    const float mic_spacing = 0.05f;  // 50 mm

    while (true) {
        if (i2s_channel_read(rx_chan, raw, sizeof(raw),
                             &bytes_read, portMAX_DELAY) != ESP_OK) {
            continue;  // skip the frame on a read error
        }
        int n = bytes_read / (2 * sizeof(int32_t));
        if (n > FRAME_SIZE) n = FRAME_SIZE;  // clamp to buffer size

        deinterleave(raw, ch0, ch1, n);

        // Choose one method; both shown for comparison
        // Method 1: TDOA via cross-correlation (fast, integer-sample resolution)
        float tdoa = estimate_tdoa(ch0, ch1, n);
        float angle_xcorr = tdoa_to_angle(tdoa, mic_spacing);

        // Method 2: Beam scan (more robust, fractional-sample, but slower)
        float angle_scan = estimate_doa(ch0, ch1, n, mic_spacing);

        // Log over UART; swap in OLED or BLE output as needed
        ESP_LOGI(TAG, "xcorr %.1f deg, scan %.1f deg", angle_xcorr, angle_scan);
    }
}

void app_main(void) {
    i2s_mic_array_init();
    xTaskCreatePinnedToCore(beamforming_task, "beam", 8192,
                            NULL, 5, NULL, 1);
}

STM32F4 (NUCLEO-F446RE): CMSIS-DSP cross-correlation

The STM32F4 approach uses arm_correlate_f32 for the TDOA estimation and multi-channel ADC with DMA for sensor acquisition.

Multi-channel ADC setup

For non-audio sensor arrays (vibration sensors, ultrasonic transducers), use the on-chip ADC in scan mode with DMA:

// ADC1 scans 4 channels in sequence, with DMA moving each result to memory.
// Scan mode is NOT simultaneous: each conversion takes ~1 us at 12-bit
// resolution, so channel k is sampled about k us after channel 0. Correct for
// this skew (or confirm it is negligible) before trusting small TDOA values.
// Scan rate: configure a timer trigger for the desired sample rate.

#include <stdint.h>

#define N_CHANNELS  4
#define FRAME_SIZE  256

static uint16_t adc_dma_buf[2][FRAME_SIZE * N_CHANNELS];  // circular DMA double buffer
static float channels[N_CHANNELS][FRAME_SIZE];

// Deinterleave one scan-ordered half-buffer (ch0,ch1,ch2,ch3, ch0,ch1,...) to channels[].
// Assumes each sensor is biased at mid-scale (2048 counts); removing that offset
// keeps DC from dominating the cross-correlation.
static void deinterleave(const uint16_t *buf) {
    for (int i = 0; i < FRAME_SIZE; i++) {
        for (int ch = 0; ch < N_CHANNELS; ch++) {
            channels[ch][i] = ((float)buf[i * N_CHANNELS + ch] - 2048.0f) / 2048.0f;
        }
    }
}

// With a circular DMA the half-complete callback fires when adc_dma_buf[0] is full
// and the complete callback when adc_dma_buf[1] is. Handle BOTH, or half the frames
// are silently dropped.
void HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef *hadc) { deinterleave(adc_dma_buf[0]); }
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)     { deinterleave(adc_dma_buf[1]); }

TDOA via CMSIS-DSP

#include "arm_math.h"

// Match the ESP32 section and the ADC timer configuration above.
#define FRAME_SIZE  256
#define FS          16000
#define MAX_LAG     4
#define CORR_LEN    (2 * FRAME_SIZE - 1)

static float32_t corr_output[CORR_LEN];

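float estimate_tdoa_cmsis(float32_t *ch_ref, float32_t *ch_test, int n) {
    // corr_output holds 2*FRAME_SIZE - 1 values, and the search window needs
    // MAX_LAG samples on either side of zero lag
    if (n <= MAX_LAG || n > FRAME_SIZE) return 0.0f;

    arm_correlate_f32(ch_ref, (uint32_t)n, ch_test, (uint32_t)n, corr_output);

    // With equal-length inputs, zero lag is at index (n - 1)
    // Search around zero lag within MAX_LAG
    int centre = n - 1;
    float32_t max_val;
    uint32_t max_idx;
    arm_max_f32(&corr_output[centre - MAX_LAG],
                2 * MAX_LAG + 1, &max_val, &max_idx);

    int lag = (int)max_idx - MAX_LAG;
    return (float)lag / FS;
}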

Tip

For a 2-mic system where you only need lags $\pm 4$, the brute-force loop (9 dot products of 256 samples each) is faster than arm_correlate_f32 (which computes all 511 lags). Use arm_dot_prod_f32 in a loop, similar to the pitch detection restricted-lag approach.
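
The restricted-lag loop described in the tip can be sketched as follows. To keep the example buildable off-target, `dot_prod` is a plain-C stand-in; on the STM32 it would be replaced by a call to `arm_dot_prod_f32(a, b, len, &result)`. The name `best_lag_restricted` is hypothetical, and the sign convention (peak of the sum of `ref[i] * test[i - lag]`) is chosen to match the brute-force loop in the ESP32 section.

```c
#include <stdint.h>

/* Stand-in for arm_dot_prod_f32 so the sketch compiles without CMSIS-DSP. */
static float dot_prod(const float *a, const float *b, uint32_t len) {
    float acc = 0.0f;
    for (uint32_t i = 0; i < len; i++) acc += a[i] * b[i];
    return acc;
}

/* Integer lag in [-max_lag, max_lag] that maximises
 * sum over i of ref[i] * test[i - lag], using one dot product per lag. */
static int best_lag_restricted(const float *ref, const float *test,
                               int n, int max_lag) {
    float best = -1e30f;
    int best_lag = 0;
    for (int lag = -max_lag; lag <= max_lag; lag++) {
        /* Offset whichever buffer starts later so both overlap */
        const float *a = (lag > 0) ? ref + lag : ref;
        const float *b = (lag > 0) ? test : test - lag;
        int len = n - ((lag > 0) ? lag : -lag);
        if (len <= 0) continue;
        float r = dot_prod(a, b, (uint32_t)len);
        if (r > best) { best = r; best_lag = lag; }
    }
    return best_lag;
}
```

Dividing the returned lag by the sample rate gives the TDOA in seconds, as in `estimate_tdoa_cmsis`.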

Performance budget (NUCLEO-F446RE, 180 MHz)

Stage	Operation	Est. cycles	Time
ADC DMA deinterleave (4 ch)	1024 conversions + copy	~3K	17 us
TDOA: 9 dot products (256 samples)	2304 MACs	~3K	17 us
Angle computation	`asinf`	~100	0.6 us
Total per 16 ms frame		~6K	~35 us
Available per frame		2,880K	16 ms
Utilisation			~0.2%

The TDOA row budgets the restricted 9-lag dot-product loop recommended above. The estimate_tdoa_cmsis function shown calls the full arm_correlate_f32, which computes all 511 lags at roughly 65K MACs (about 28x more); a real build should use the lag-restricted loop, which is what this budget assumes.

For beam scanning (37 angles at 5-degree steps), a rough estimate charges each steering angle the same ~3K cycles as the TDOA loop: 37 × 3K ≈ 111K cycles per frame, about 4% of the 2,880K-cycle budget.
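
The budget arithmetic is simple enough to keep as a helper when the frame size, sample rate or clock changes. This is an illustrative sketch (the function names are not from any library), and the cycle counts fed into it remain estimates.

```c
/* CPU cycles available while one frame of audio is being captured. */
static double cycles_per_frame(double cpu_hz, double fs_hz, int frame_size) {
    return cpu_hz * (double)frame_size / fs_hz;
}

/* Percentage of the per-frame cycle budget consumed by `used_cycles`. */
static double utilisation_pct(double used_cycles, double cpu_hz,
                              double fs_hz, int frame_size) {
    return 100.0 * used_cycles / cycles_per_frame(cpu_hz, fs_hz, frame_size);
}
```

For the NUCLEO-F446RE figures in the table, `cycles_per_frame(180e6, 16000, 256)` gives 2,880,000 cycles, so ~6K cycles is about 0.2% and ~111K cycles is just under 4%.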

Platform comparison

Feature	STM32F4 (NUCLEO-F446RE)	ESP32-S3
Multi-channel input	ADC scan mode (4+ channels, DMA, sampled sequentially)	2x I2S stereo (4 channels; more requires TDM)
Correlation library	`arm_correlate_f32`, `arm_dot_prod_f32`	Manual or ESP-DSP dot product
Max channels (practical)	8+ (ADC scan)	4 (2 stereo I2S with INMP441)
Sample rate	Up to 2.4 Msps per ADC, shared across scanned channels (ultrasonic arrays)	16 to 48 kHz (audio arrays)
Wireless output	External module	Built-in WiFi/BLE
Best for	Ultrasonic/vibration arrays, high channel count	Audio mic arrays, smart speaker prototypes

Recommendation

ESP32-S3 for audio microphone arrays: voice direction detection, smart speaker prototyping, noise reduction. The I2S interface handles MEMS microphones directly, and BLE/WiFi enables wireless DOA output.
STM32F4 for non-audio sensor arrays: ultrasonic transducer arrays (ranging), vibration or seismic sensors, or any application needing more than 4 channels or MHz-rate sampling.

Bill of materials

ESP32-S3 stereo mic array (2-mic DOA)

Component	Purpose	Approx. cost
ESP32-S3-DevKitC	Processing + BLE/WiFi	EUR 8
2x INMP441 breakout	I2S MEMS stereo mic pair	EUR 4
SSD1306 OLED (128x64)	DOA angle display	EUR 3
3D-printed mic mount	Fixed 50 mm spacing	EUR 1
Breadboard + wires		EUR 3
Total		~EUR 19

ESP32-S3 quad mic array (4-mic DOA)

Component	Purpose	Approx. cost
ESP32-S3-DevKitC	Processing + BLE/WiFi	EUR 8
4x INMP441 breakout	2x stereo I2S pairs	EUR 8
SSD1306 OLED (128x64)	DOA display	EUR 3
3D-printed linear mount	Fixed 50 mm spacing	EUR 2
Breadboard + wires		EUR 3
Total		~EUR 24
