Power Law Noise Whitening

Preprocessing coloured noise for robust signal processing

Nature’s whitening filter

The barn owl localises sounds with remarkable precision using two separate neural pathways (one for interaural time differences, one for interaural level differences) that process spatial information independently before their results are combined (Konishi 1993). The whitening filters in this topic implement an analogous preprocessing strategy: they remove correlated structure from the noise so that downstream algorithms can work with cleaner, more nearly independent samples.

This is part two of a two-part series. It assumes familiarity with the outlier detection methods described in part one. Here we examine why coloured noise causes those methods to break down, and how to whiten the signal before detection.

Prerequisites

This topic assumes familiarity with noise and SNR and the frequency domain.

Many signal processing methods (including the outlier detectors from part one) assume that the noise in the signal consists of uncorrelated samples: so-called white noise. White noise has a flat power spectral density (PSD), meaning all frequencies contribute equally. The term is borrowed from optics: white light contains all visible frequencies in roughly equal measure.

In practice, noise is often coloured: its samples are correlated, and its power spectrum is not flat. One of the most commonly encountered forms is power law noise, also written as $1/f^\alpha$ noise, where the power spectral density falls off as a power of frequency. Two canonical cases are:

Pink noise ($\alpha = 1$): found in an extraordinary range of physical and biological systems, from electronic components to heartbeat intervals to music.
Brownian noise ($\alpha = 2$): the noise produced by Brownian motion, equivalent to the integral of white noise, also known as a random walk.

When an outlier detector designed for white noise is applied to a coloured noise signal, the correlated samples bias its estimate of the noise spread up or down. Its detection threshold is then miscalibrated, which leads to missed detections or false positives. Whitening the signal first (transforming it into something closer to white noise) is a practical solution.

Generating power law noise

Before we can whiten power law noise, it helps to be able to generate it. Two approaches are practical. See whitening.py for the implementations.

Frequency domain shaping

The simplest approach shapes the PSD of a white noise process to match the desired power law. Given a white noise sequence $n(t)$ with Fourier transform $N(f)$, a power law noise process $x(t)$ with exponent $\alpha$ is produced by:

\[X(f) = f^{-\alpha/2} \cdot N(f) \quad \text{for } f > 0\]

so that the PSD of $x(t)$ is proportional to $f^{-\alpha}$.
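A minimal NumPy sketch of this recipe follows. It is not the implementation in whitening.py. The function name `power_law_noise_fft` and the unit-variance normalisation are illustrative assumptions. So is the choice to zero the DC bin, where $f^{-\alpha/2}$ is undefined.

```python
import numpy as np

def power_law_noise_fft(n_samples, alpha, rng=None):
    """Colour white Gaussian noise so that its PSD is proportional to 1/f**alpha."""
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal(n_samples)          # n(t)
    spectrum = np.fft.rfft(white)                   # N(f) for 0 <= f <= 0.5 cycles/sample
    freqs = np.fft.rfftfreq(n_samples)
    gain = np.zeros_like(freqs)
    gain[1:] = freqs[1:] ** (-alpha / 2.0)          # amplitude f^(-alpha/2) gives PSD f^(-alpha)
    # gain[0] stays 0: the DC bin is dropped because f^(-alpha/2) is undefined at f = 0
    shaped = np.fft.irfft(spectrum * gain, n=n_samples)   # x(t)
    return shaped / shaped.std()                    # scale to unit variance

# Example: 16384 samples of pink noise (alpha = 1)
x = power_law_noise_fft(2**14, alpha=1.0, rng=np.random.default_rng(0))
```

FFT-based synthesis produces a block that is circularly periodic over its length. Consecutive blocks generated this way should therefore not simply be concatenated.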

Autoregressive model (all-pole filter)

A more flexible approach models power law noise as the output of an all-pole (autoregressive, AR) filter driven by white noise. An AR($p$) process is defined by:

\[x[k] = n[k] - \sum_{i=1}^{p} \theta_i \cdot x[k-i]\]

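where $n[k]$ is white noise. Kasdin (1995) showed that the AR coefficients for a $1/f^\alpha$ process are given by the following recurrence, seeded with $\theta_0 = 1$. For general $\alpha$ the sequence of coefficients is infinite, so in practice it is truncated at order $p$: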

\[\theta_i = \frac{(i - 1 - \alpha/2) \cdot \theta_{i-1}}{i} \quad \text{for } i > 0\]

For Brownian noise ($\alpha = 2$) this reduces to $\theta_1 = -1$ and $\theta_i = 0$ for $i > 1$, a simple integrator, which makes sense: Brownian motion is the cumulative sum of white noise steps.
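The recurrence and the AR synthesis can be sketched directly in NumPy. The helper names `kasdin_coefficients` and `ar_power_law_noise` and the default truncation order `p=64` are illustrative assumptions, not the code in whitening.py. The explicit loop mirrors the difference equation term by term.

```python
import numpy as np

def kasdin_coefficients(alpha, p):
    """theta_0..theta_p from the Kasdin recurrence, seeded with theta_0 = 1."""
    theta = np.empty(p + 1)
    theta[0] = 1.0
    for i in range(1, p + 1):
        theta[i] = (i - 1 - alpha / 2.0) * theta[i - 1] / i
    return theta

def ar_power_law_noise(n_samples, alpha, p=64, rng=None):
    """Drive the AR(p) recursion x[k] = n[k] - sum_i theta_i * x[k-i] with white noise."""
    rng = np.random.default_rng() if rng is None else rng
    theta = kasdin_coefficients(alpha, p)
    white = rng.standard_normal(n_samples)          # n[k]
    x = np.zeros(n_samples)
    for k in range(n_samples):
        m = min(k, p)                               # samples before k = 0 are taken as zero
        past = x[k - m:k][::-1]                     # x[k-1], x[k-2], ..., x[k-m]
        x[k] = white[k] - np.dot(theta[1:m + 1], past)
    return x

theta_brown = kasdin_coefficients(2.0, 3)   # theta_1 = -1, theta_2 = theta_3 = 0
theta_pink = kasdin_coefficients(1.0, 3)    # -0.5, -0.125, -0.0625 after theta_0 = 1
```

With SciPy available, the loop is equivalent to `scipy.signal.lfilter([1.0], theta, white)`, because `lfilter` treats its second argument as the feedback (denominator) coefficients with a leading 1.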

Autocorrelation of power law noise

Whitening is fundamentally about removing correlations between samples. The autocorrelation function (ACF) at lag $\tau$ is:

\[r_{xx}(\tau) = E[x(t + \tau) \cdot x(t)]\]

For power law noise, the ACF decays very slowly; in theory it has no finite cutoff lag. White noise, by contrast, has an ACF that is zero for all lags $\tau > 0$, and its sample estimate is close to zero. Pink noise ($\alpha = 1$) has long-range dependence: its ACF decays so slowly that the integral of the ACF diverges. The ACF of Brownian noise decays more slowly still. This long-range dependence means that naive statistics estimated from a short window will be substantially biased.
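To see the difference numerically, the sketch below estimates the normalised sample ACF, $r_{xx}(\tau)/r_{xx}(0)$, after removing the mean. Its value at $\tau = 1$ is the coefficient $r_1$ used later. The function name `sample_acf` and the biased (divide by $N$) estimator are choices made for this example.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Normalised sample ACF r(tau)/r(0), tau = 0..max_lag, after removing the mean."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([np.dot(x[tau:], x[:n - tau]) / n for tau in range(max_lag + 1)])
    return r / r[0]

rng = np.random.default_rng(2)
white = rng.standard_normal(10_000)
brown = np.cumsum(white)              # random walk: alpha = 2
acf_white = sample_acf(white, 5)      # lags 1..5 close to 0
acf_brown = sample_acf(brown, 5)      # lags 1..5 close to 1
```

For the white sequence the lags beyond zero hover near zero. For the random walk built from the same samples they stay close to one, which is the structure whitening has to remove.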

Whitening by AR model inversion

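Since power law noise can be generated by filtering white noise through the all-pole filter $H(z) = 1 / \left(1 + \sum_{i=1}^{p} \theta_i z^{-i}\right)$, the natural whitening filter is its inverse. This is an all-zero (FIR) filter built from the same truncated Kasdin coefficients:

\[H_{\text{whitening}}(z) = 1 + \sum_{i=1}^{p} \theta_i \cdot z^{-i}\]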

For Brownian noise ($\alpha = 2$), the whitening filter is simply first-order differencing:

\[y[k] = x[k] - x[k-1]\]

This is intuitively satisfying: if the signal is a random walk, taking differences recovers the underlying white noise steps.
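The inverse filter is a plain FIR convolution. This sketch includes its own copy of a hypothetical `kasdin_coefficients` helper so that it stands alone. It assumes the signal is zero before its first sample, i.e. a causal filter started from rest.

```python
import numpy as np

def kasdin_coefficients(alpha, p):
    """theta_0..theta_p from the Kasdin recurrence, seeded with theta_0 = 1."""
    theta = np.empty(p + 1)
    theta[0] = 1.0
    for i in range(1, p + 1):
        theta[i] = (i - 1 - alpha / 2.0) * theta[i - 1] / i
    return theta

def whiten(x, alpha, p=64):
    """FIR whitening y[k] = x[k] + sum_{i=1}^{p} theta_i * x[k-i], starting from rest."""
    theta = kasdin_coefficients(alpha, p)
    # 'full' convolution truncated to len(x) is causal and treats x[k < 0] as zero
    return np.convolve(np.asarray(x, dtype=float), theta)[: len(x)]

rng = np.random.default_rng(3)
steps = rng.standard_normal(1000)
walk = np.cumsum(steps)                     # Brownian noise
recovered = whiten(walk, alpha=2.0, p=4)    # reduces to y[k] = x[k] - x[k-1]
```

This is equivalent to `scipy.signal.lfilter(theta, [1.0], x)`. When the same truncated coefficients were used to synthesise the noise, the whitened output reproduces the driving white noise up to rounding. On real data the match is only approximate, because $\alpha$ must be estimated and the true process is not exactly AR($p$).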

Identifying $\alpha$ from the lag-1 autocorrelation

One approach to estimating $\alpha$ is to fit a line to the log-log PSD. This works but requires a block of samples and a Fourier transform. A much cheaper approach, proposed by Riley and Greenhall (2004), uses only the lag-1 autocorrelation coefficient $r_1$.

Plotting $r_1$ as a function of $\alpha$ for simulated power law sequences reveals a sigmoidal relationship on the interval $0 \leq \alpha \leq 2$. A logistic function fits it well:

\[r_1 \approx \frac{1}{1 + e^{-4\alpha + 3}}\]

This logistic fit is an empirical approximation to the theoretical relationship. Inverting it gives an estimator for $\alpha$:

\[\alpha \approx \frac{3 - \ln(1/r_1 - 1)}{4}\]

This is remarkably simple: compute one autocorrelation coefficient, apply a closed-form expression, and you have an estimate of the noise exponent.
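The estimator itself is a few lines of code. The inversion is only defined for $0 < r_1 < 1$, so this sketch clips $r_1$ into that range. It also clips the resulting $\alpha$ to $[0, 2]$, the interval on which the logistic fit was made. Both clipping choices, and the names `lag1_autocorrelation` and `alpha_from_r1`, are assumptions of the example.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation coefficient r_1 of x, after removing the mean."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[1:], x[:-1]) / np.dot(x, x)

def alpha_from_r1(r1, eps=1e-6):
    """Invert the logistic fit r_1 = 1 / (1 + exp(-4*alpha + 3))."""
    r1 = np.clip(r1, eps, 1.0 - eps)              # ln(1/r_1 - 1) needs 0 < r_1 < 1
    alpha = (3.0 - np.log(1.0 / r1 - 1.0)) / 4.0
    return np.clip(alpha, 0.0, 2.0)               # the fit only covers 0 <= alpha <= 2

rng = np.random.default_rng(5)
white = rng.standard_normal(20_000)
alpha_white = alpha_from_r1(lag1_autocorrelation(white))             # clips to 0
alpha_brown = alpha_from_r1(lag1_autocorrelation(np.cumsum(white)))  # saturates at 2
```

For white noise, $r_1$ is near zero and the clipped estimate is 0. For a long random walk, $r_1$ is very close to 1 and the estimate saturates at the upper limit of 2.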

Open question: recursive estimation for large $\alpha$

The recursive lag-1 estimator works reasonably well for $\alpha \approx 0.5$ but produces biased estimates for larger values. The most likely explanation is that power law noise with high $\alpha$ exhibits very long-range dependence: the correlation between samples persists over hundreds or thousands of lags. The frugal estimator updates using a step size derived from the local MAD, effectively assuming local stationarity. For strongly correlated noise this assumption fails.

A larger forgetting factor (that is, a longer effective memory) might help, but choosing it appropriately would itself require knowledge of $\alpha$, a circular dependency. A sliding window estimator of $r_1$ is probably more practical for $\alpha > 1$.
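To make the trade-off concrete, the sketch below compares two estimators of $r_1$. It does not reproduce the frugal, MAD-based estimator from part one. As a stand-in, it uses a generic exponentially weighted recursion, whose forgetting factor `lam` gives an effective memory of roughly $1/(1 - \lambda)$ samples, alongside a sliding-window estimator. The function names and default parameters are hypothetical.

```python
import numpy as np

def recursive_r1(x, lam=0.99):
    """Exponentially weighted lag-1 autocorrelation; lam is the forgetting factor."""
    x = np.asarray(x, dtype=float)
    mean, c0, c1, prev_d = x[0], 1e-12, 0.0, 0.0   # tiny c0 avoids 0/0 at the start
    out = np.empty(len(x))
    for k in range(len(x)):
        mean = lam * mean + (1.0 - lam) * x[k]      # weighted running mean
        d = x[k] - mean
        c0 = lam * c0 + (1.0 - lam) * d * d         # weighted variance
        c1 = lam * c1 + (1.0 - lam) * d * prev_d    # weighted lag-1 covariance
        prev_d = d
        out[k] = c1 / c0
    return out

def sliding_r1(x, window=256):
    """Lag-1 autocorrelation over the most recent `window` samples (NaN until full)."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for k in range(window - 1, len(x)):
        seg = x[k - window + 1:k + 1]
        seg = seg - seg.mean()
        out[k] = np.dot(seg[1:], seg[:-1]) / np.dot(seg, seg)
    return out

rng = np.random.default_rng(6)
white = rng.standard_normal(10_000)
brown = np.cumsum(white)
tracks = {name: (recursive_r1(sig), sliding_r1(sig))
          for name, sig in (("white", white), ("brown", brown))}
```

Raising `lam` or `window` on the same signal should show the tension described above. A longer memory steadies the estimate for strongly correlated noise but responds more slowly when the noise statistics change.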

Summary and practical guidance

1. Generate or acquire the signal. If the noise source is known to follow a power law, the AR model is a convenient way to simulate it.
2. Estimate $\alpha$ from the lag-1 autocorrelation, either using a recursive estimator (works well for $\alpha < 1$) or a sliding window estimator (more reliable for larger $\alpha$).
3. Construct a whitening filter using the Kasdin AR coefficients for the estimated $\alpha$.
4. Apply the whitening filter to the signal before running the outlier detector.

For Brownian noise specifically, the whitening step is a single first-order difference, which is cheap enough to apply unconditionally even without estimating $\alpha$ first.
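The steps above can be strung together for a single block of samples. This is a sketch under simplifying assumptions. $\alpha$ is estimated once per block with the plain lag-1 coefficient and the logistic inversion, rather than with a recursive or sliding-window tracker. The helper functions are included inline so the block stands alone, and their names are hypothetical.

```python
import numpy as np

def kasdin_coefficients(alpha, p):
    """theta_0..theta_p from the Kasdin recurrence, seeded with theta_0 = 1."""
    theta = np.empty(p + 1)
    theta[0] = 1.0
    for i in range(1, p + 1):
        theta[i] = (i - 1 - alpha / 2.0) * theta[i - 1] / i
    return theta

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation coefficient after removing the mean."""
    x = x - np.mean(x)
    return np.dot(x[1:], x[:-1]) / np.dot(x, x)

def alpha_from_r1(r1, eps=1e-6):
    """Logistic-fit inversion, clipped to the fitted range 0 <= alpha <= 2."""
    r1 = np.clip(r1, eps, 1.0 - eps)
    return float(np.clip((3.0 - np.log(1.0 / r1 - 1.0)) / 4.0, 0.0, 2.0))

def whiten_block(x, p=64):
    """Estimate alpha for this block, build the Kasdin whitening filter, apply it."""
    x = np.asarray(x, dtype=float)
    alpha = alpha_from_r1(lag1_autocorrelation(x))     # estimate alpha from r_1
    theta = kasdin_coefficients(alpha, p)              # whitening filter coefficients
    y = np.convolve(x, theta)[: len(x)]                # causal FIR filter from rest
    return y, alpha

rng = np.random.default_rng(7)
steps = rng.standard_normal(20_000)
y_brown, a_brown = whiten_block(np.cumsum(steps))   # alpha saturates at 2: first difference
y_white, a_white = whiten_block(steps)              # alpha clips to 0: identity filter
```

A useful sanity check is that $\alpha = 0$ yields $\theta_i = 0$ for all $i > 0$, so the filter leaves white noise unchanged. Meanwhile $\alpha = 2$ reduces to the first-order difference.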

Going further

Random processes: the AR(p) process class is the engine behind the Kasdin coefficients, and understanding the Markov/AR process explains why the recurrence produces $1/f^\alpha$ spectra.
PSD estimation: the spectral slope $\alpha$ can be estimated from the PSD; Welch’s method gives a more reliable slope estimate than the lag-1 logistic fit for larger $\alpha$.
Noise generation: the synthesis counterpart. The Kasdin AR filter generates coloured noise, and whitening is the inverse filter that removes it.
The noise & stochastic processing arc: the guided overview of all seven noise pages.

References

Kasdin, N. Jeremy. 1995. “Discrete Simulation of Colored Noise and Stochastic Processes and 1/f^α Power Law Noise Generation.” Proceedings of the IEEE 83 (5): 802–27. https://doi.org/10.1109/5.381848.

Konishi, Masakazu. 1993. “Listening with Two Ears.” Scientific American 268 (4): 66–73.

Riley, William, and Charles Greenhall. 2004. “Power Law Noise Identification Using the Lag 1 Autocorrelation.” In Proceedings of the 18th European Frequency and Time Forum (EFTF 2004). https://doi.org/10.1049/cp:20040932.
