# `aom_preprocess` — AOM (Adaptive Operator Mixture) preprocessing bank
_Group_: **Diagnostic** · _Registry tolerance_: `5.0`
## Description
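AOM operator-bank preprocessing primitive (`aom_pop.aom_preprocessing`): it applies a bank of strict-linear operators to a spectral matrix and gates the resulting views into a single transformed matrix.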
> **Registry note** — `nirs4all-methods` exposes this primitive directly as
> `n4m.aom_preprocess` and `n4m.aom.aom_preprocess`. In-tree
> `nirs4all.operators.models.sklearn.aom_pls` remains the sanctioned reference
> provider for qualitative AOM parity.
### Parameters
| Name | Type | Default | Notes |
|------|------|---------|-------|
| `operators` | sequence of AOM operator specs | compact strict-linear bank | Same syntax as `n4m.aom_pls` / `n4m.aom_chain_sweep_run`: strings, integer enum ids, `(kind, params)` tuples or `{"kind": ..., "params": ...}` mappings. |
| `gating_mode` | `{"soft", "hard", 1, 0}` | `"soft"` | `soft` averages all operator outputs; `hard` selects the first operator deterministically. |
| `y` | array-like or `None` | `None` | Optional response matrix passed to the native fit path for operators that need supervised fit state. |
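The spec forms accepted by `operators` can be pictured as all reducing to a `(kind, params)` pair. The sketch below is a hypothetical normalizer written only to illustrate that idea: `normalize_operator_spec` is not part of `n4m`, and the library's own parsing rules may differ. Integer enum ids are passed through untouched because their mapping is not documented on this page.

```python
from collections.abc import Mapping


def normalize_operator_spec(spec):
    """Hypothetical helper: reduce one operator spec to (kind, params)."""
    if isinstance(spec, (str, int)):
        # Bare name or integer enum id: no explicit parameters.
        return spec, []
    if isinstance(spec, Mapping):
        # {"kind": ..., "params": ...} mapping form.
        return spec["kind"], list(spec.get("params", []))
    if isinstance(spec, tuple) and len(spec) == 2:
        # (kind, params) tuple form.
        kind, params = spec
        return kind, list(params)
    raise TypeError(f"unsupported operator spec: {spec!r}")


specs = [
    "identity",
    ("savgol_smooth", [5, 2]),
    {"kind": "gaussian", "params": [1.0]},
]
normalized = [normalize_operator_spec(s) for s in specs]
```

Every entry of `normalized` has the same two-element shape, which is the only point of the sketch; how the native layer interprets each `params` list is left to the library.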
## Explanations
### Bibliographic source
Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). *Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models*. arXiv:2605.13587. https://arxiv.org/abs/2605.13587. Introduces operator-adaptive PLS (AOM-PLS / POP-PLS) and the benchmark on 50+ NIRS datasets against which the git-pinned oracle `nirs4all.operators.models.sklearn.aom_pls` is calibrated.
### Mathematical principle
`aom_preprocess` is the **operator-bank primitive** that AOM-PLS and POP-PLS build on. Given the centered spectral matrix $\mathbf{X} \in \mathbb{R}^{n\times p}$ and a finite bank of strict-linear operators $\{\mathbf{A}_b\}_{b=1}^{M} \subset \mathbb{R}^{p\times p}$ (matrices fully determined by the wavelength grid), `aom_preprocess` materializes the $M$ preprocessed views $\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}$ and gates them.

The direct `n4m_aom_preprocess_fit` surface currently supports the reusable strict-linear operator subset covered by the smoke tests: identity, first-degree polynomial detrending, Savitzky-Golay smoothing, Savitzky-Golay first derivative, Norris-Williams, finite difference, Gaussian smoothing, Whittaker smoothing and FCK. Strict chains and model-scoring diversity are handled by the AOM sweep/campaign operator-moment paths.

SNV / MSC / EMSC / ASLS / OSC remain excluded from the moment contract because they depend on per-sample normalization, $\mathbf{y}$, or a reference spectrum.
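To make "matrices fully determined by the wavelength grid" concrete, here is a minimal NumPy sketch that builds three such operators by hand and forms the views $\mathbf{X}\mathbf{A}_b^{\top}$. It does not call `n4m`. The boundary convention of the finite-difference matrix and the centered grid `t` used for detrending are assumptions made for the illustration, and the native kernels may define these operators differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 10
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)  # column-center, as the text assumes

# Identity operator.
A_identity = np.eye(p)

# First-order finite difference: row i computes x[i+1] - x[i].
# The last row is left at zero (one possible boundary convention).
A_diff = np.zeros((p, p))
idx = np.arange(p - 1)
A_diff[idx, idx] = -1.0
A_diff[idx, idx + 1] = 1.0

# First-degree polynomial detrending: remove the projection onto span{1, t}.
t = np.linspace(-1.0, 1.0, p)
V = np.column_stack([np.ones(p), t])
A_detrend = np.eye(p) - V @ np.linalg.pinv(V)

# None of the matrices above depends on X or y, only on the grid size p.
bank = [A_identity, A_diff, A_detrend]
views = [X @ A.T for A in bank]
```

Each view keeps the shape `(n, p)`. By contrast, a transform such as SNV divides every row by that row's own standard deviation, so it cannot be written as a single fixed $\mathbf{A}_b$.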
Two gating modes are supported:
* **soft** ($\texttt{gating\_mode}=1$): equal-weight average
$$\mathbf{X}_{\text{AOM}}^{\text{soft}} \;=\; \frac{1}{M}\sum_{b=1}^{M}\mathbf{X}\mathbf{A}_b^{\top}.$$
* **hard** ($\texttt{gating\_mode}=0$): deterministic first-operator selection,
$$\mathbf{X}_{\text{AOM}}^{\text{hard}} \;=\; \mathbf{X}\mathbf{A}_{1}^{\top}.$$
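The two gating formulas can be sketched in a few lines of NumPy. `gate_views` is a hypothetical helper, not the `n4m` implementation. Representing hard gating as a one-hot weight vector is an assumption about how the returned `weights` might look.

```python
import numpy as np


def gate_views(X, bank, gating_mode="soft"):
    """Illustrative gating over a bank of p x p operator matrices."""
    views = np.stack([X @ A.T for A in bank])  # shape (M, n, p)
    M = len(bank)
    if gating_mode in ("soft", 1):
        weights = np.full(M, 1.0 / M)  # equal-weight average
    elif gating_mode in ("hard", 0):
        weights = np.zeros(M)
        weights[0] = 1.0  # deterministic first-operator selection
    else:
        raise ValueError(f"unknown gating_mode: {gating_mode!r}")
    transformed = np.tensordot(weights, views, axes=1)  # shape (n, p)
    return transformed, views, weights


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
bank = [np.eye(5), 2.0 * np.eye(5), rng.normal(size=(5, 5))]

soft, views, w_soft = gate_views(X, bank, "soft")
hard, _, w_hard = gate_views(X, bank, "hard")
```

Because every operator is linear, the soft output equals $\mathbf{X}$ times the transpose of the averaged operator $\frac{1}{M}\sum_b \mathbf{A}_b$, so soft gating is itself a single strict-linear operator.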
Both modes preserve the **cross-covariance identity** exploited by the AOM/POP selectors: with $\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}$ and any $\mathbf{A}_b$ in the bank,
$$\bigl(\mathbf{X}\mathbf{A}_b^{\top}\bigr)^{\top}\mathbf{Y} \;=\; \mathbf{A}_b\,\mathbf{S},$$
so a downstream PLS step can score the whole bank with $M$ cheap left actions on the $p\times q$ matrix $\mathbf{S}$ (roughly $O(pq)$ each for banded operators of fixed width) instead of materializing $M$ preprocessed copies of the $n\times p$ matrix $\mathbf{X}$ (at least $O(np)$ each).

The motivation is that **no single preprocessing is best on all calibrations**: different wavelength regions favour different transforms. The AOM-PLS and POP-PLS selectors exploit this by picking, respectively, a global operator (one $b^{\star}$ for the whole model) or a per-component operator (one $b_a$ for each latent direction). Predictions on new spectra reuse the absorbed operator(s) through the recovered original-space coefficients, so there is **no preprocessing replay at predict time**.
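Both the cross-covariance identity and the "no replay at predict time" claim can be checked numerically. The sketch below uses a random dense matrix as a stand-in for one bank operator and ordinary least squares as a stand-in for the PLS fit. Both are simplifications chosen for brevity, not the AOM-PLS algorithm. New spectra are assumed to be centered with the training mean already.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 30, 8, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

A = rng.normal(size=(p, p))  # stand-in for one bank operator A_b

# Cross-covariance identity: (X A^T)^T Y == A (X^T Y).
S = X.T @ Y
lhs = (X @ A.T).T @ Y  # touches the full n x p view
rhs = A @ S            # touches only the p x q matrix S

# Coefficient folding: fit on the preprocessed view ...
X_b = X @ A.T
B_view, *_ = np.linalg.lstsq(X_b, Y, rcond=None)
# ... then express the same model in the original spectral space.
B_orig = A.T @ B_view

X_new = rng.normal(size=(5, p))
pred_replay = (X_new @ A.T) @ B_view  # preprocess, then predict
pred_direct = X_new @ B_orig          # predict from raw spectra directly
```

The two predictions coincide because $(\mathbf{X}_{\text{new}}\mathbf{A}^{\top})\mathbf{B} = \mathbf{X}_{\text{new}}(\mathbf{A}^{\top}\mathbf{B})$. The folded coefficients $\mathbf{A}^{\top}\mathbf{B}$ are what lets a fitted model skip the operator at predict time.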
### Implementation
The kernel is `n4m_aom_preprocess_fit`, called through the native C ABI. Python exposes this as
`n4m.aom_preprocess` and `n4m.aom.aom_preprocess`; both return the native
MethodResult fields as NumPy arrays/scalars:
- `transformed`: final gated transform, shape `(n_samples, n_features)`;
- `operator_outputs`: operator-major matrix of per-operator transformed views;
- `weights`: gating weights;
- `operator_kinds`: integer AOM operator ids;
- `n_operators`, `n_samples`, `n_features`, `mode`.
Reference: git-pinned oracle `nirs4all.operators.models.sklearn.aom_pls`
(sanctioned exception).
MATLAB header (`bindings/matlab/+pls4all/aom_preprocess.m`):
```text
pls4all.aom_preprocess AOM preprocessing fit/transform.
```
### Usage
The `nirs4all-methods` Python package exposes the product surface directly.
The lower-level C ABI and legacy `pls4all` examples below dispatch into the
same native kernel.
**nirs4all-methods Python**
```python
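import numpy as np
import n4m
import n4m.aom as aom

# Synthetic stand-in data; replace with real spectra and responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 120))  # (n_samples, n_features)
y = rng.normal(size=(40, 1))    # optional response matrix

res = n4m.aom_preprocess(
    X,
    y,
    operators=[
        "identity",
        ("savgol_smooth", [5, 2]),
        ("detrend_poly", [1]),
        ("savgol_derivative", [5, 2, 1]),
        ("norris_williams", [5, 5, 1]),
        ("finite_difference", [1]),
        ("gaussian", [1.0]),
        ("whittaker", [100.0]),
        ("fck", [1.0]),
    ],
    gating_mode="soft",
)
X_aom = res["transformed"]                # gated output, (n_samples, n_features)
operator_views = res["operator_outputs"]  # per-operator transformed views
weights = res["weights"]                  # gating weights

# Both import paths resolve to the same callable.
assert aom.aom_preprocess is n4m.aom_preprocess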
```
For model selection rather than standalone preprocessing, prefer
`n4m.aom_pls`, `n4m.pop_pls`, `n4m.aom_sweep_run` or
`n4m.aom_chain_sweep_run`; those surfaces fold selected operators back into
input-space coefficients for direct prediction reuse.
**Native and compatibility bindings**
::::{tab-set}
:class: pls4all-bindings
:::{tab-item} C ABI · libn4m
:sync: c
:class-label: lang-c
```c
/* C ABI — libn4m */
n4m_context_t* ctx = NULL;
n4m_operator_bank_t* bank = NULL;
n4m_gating_strategy_t* gate = NULL;
n4m_method_result_t* res = NULL;
n4m_context_create(&ctx);
n4m_operator_bank_create(&bank);
n4m_gating_strategy_create(&gate, N4M_GATING_SOFT);
/* add operators to bank with n4m_operator_bank_add */
n4m_aom_preprocess_fit(ctx, bank, gate, &x_view, &y_view, &res);
/* read transformed/operator_outputs/weights via double-matrix getters */
/* read operator_kinds via n4m_method_result_get_int64_vector */
n4m_method_result_destroy(res);
n4m_gating_strategy_destroy(gate);
n4m_operator_bank_destroy(bank);
n4m_context_destroy(ctx);
```
:::
:::{tab-item} Python · n4m
:sync: python-raw
:class-label: lang-python
```python
import n4m
res = n4m.aom_preprocess(X, y, operators=["identity"], gating_mode="soft")
X_aom = res["transformed"]
operator_kinds = res["operator_kinds"]
```
:::
:::{tab-item} Python · n4m.sklearn
:sync: python-sklearn
:class-label: lang-python
```python
from n4m.sklearn import NativeAOMPLSRegressor
model = NativeAOMPLSRegressor(max_components=2, cv=4).fit(X, y)
yhat = model.predict(X)
```
:::
:::{tab-item} R · pls4all_method()
:sync: r-dispatcher
:class-label: lang-r
```r
library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("aom_preprocess", X, y,
n_components = 2L, params = list(n_operators = 3L, gating_mode = 0L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.
```
:::
:::{tab-item} MATLAB · pls4all (MEX)
:sync: matlab-mex
:class-label: lang-matlab
```matlab
res = pls4all.aom_preprocess(X, y, 2);
% see header of bindings/matlab/+pls4all/aom_preprocess.m for full
% parameter surface:
% res = aom_preprocess(X, Y, n_operators, gating_mode)
yhat = predict(res, Xtest);
```
:::
:::{tab-item} MATLAB · pls4all (classdef)
:sync: matlab-classdef
:class-label: lang-matlab
_No idiomatic classdef wrapper — invoke `pls4all.fit("aom_preprocess", X, y, …)` directly from the unified MEX factory._
:::
::::
**Registry parity references** 📐
:::{card}
:class-card: external-refs
- 📐 **`nirs4all`** (python · python) — `nirs4all` in-tree · qualitative (rmse_rel ≤ 5e+00) — In-tree nirs4all AOM provider (sanctioned external reference). pls4all's current primitive exposes a small operator-bank preprocessing kernel, while nirs4all exposes the full AOM/POP estimator stack; the parity remains qualitative.
:::
### Benchmarks
Adaptive wall-clock per cell measured against [`full_matrix.csv`](../benchmarks/overview.md). Only backends that implement this method are listed; libraries without the method are omitted.
**Verdict** · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.
**Reference gate**: qualitative — shape/smoke comparison only. The external library and pls4all do not produce numerically equivalent output for this method (see the MethodSpec notes); the `rmse_rel_tol ≤ 5e+00` budget is set wide on purpose. Treat ~ shape as *“we ran both, both finished”*, not as numerical agreement.
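To see why a `5e+00` budget amounts to a smoke test, consider the sketch below. The formula is an assumption: this page does not define `rmse_rel`, so the helper uses a common convention (RMSE of the difference divided by the RMS of the reference), which may not match the registry's definition.

```python
import numpy as np


def rmse_rel(candidate, reference):
    """Assumed definition: RMSE of the difference over RMS of the reference."""
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    err = np.sqrt(np.mean((candidate - reference) ** 2))
    scale = np.sqrt(np.mean(reference ** 2))
    return err / scale


reference = np.array([1.0, 2.0, 3.0, 4.0])
candidate = -reference  # sign-flipped output: clearly not equivalent

score = rmse_rel(candidate, reference)
passes_gate = score <= 5.0  # the qualitative tolerance quoted above
```

Under this assumed definition a sign-flipped output scores about 2 and still passes, which is consistent with reading ~ shape as a completion check and not as numerical agreement.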
Rows tagged with **📐** are the canonical parity references for this method (declared in [`parity_timing.registry`](../benchmarks/methodology.md)). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.
::::{tab-set}
:class: parity-tabs
:::{tab-item} 1 thread
:sync: threads-1
| Backend | Parity | 50×250 | 100×50 | 100×500 | 100×2500 | 200×40 | 250×50 | 500×50 | 500×500 | 500×2500 | 2500×50 | 2500×500 | 2500×2500 | 10000×50 | 10000×500 |
|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| **C++ native · libn4m** | | | | | | | | | | | | | | | |
| pls4all.cpp.blas | ⚠ | — | 2.49 ms | 14.6 ms🏆 | 75.4 ms | 1.79 ms | — | 7.75 ms | 75.8 ms | 390.8 ms | 38.9 ms | 401.3 ms | 2.3 s | 148.1 ms | 1.8 s |
| pls4all.cpp.blas+omp | ⚠ | — | 2.44 ms | 15.7 ms | 74.2 ms | 1.77 ms | — | 9.04 ms | 72.9 ms🏆 | 381.1 ms | 39.5 ms | 395.5 ms | 2.3 s | 146.4 ms🏆 | 1.8 s🏆 |
| pls4all.cpp.omp | ⚠ | — | 2.58 ms | 15.5 ms | 74.5 ms | 1.67 ms | — | 7.13 ms🏆 | 76.3 ms | 378.3 ms🏆 | 38.6 ms | 391.1 ms🏆 | 2.3 s | 155.5 ms | 1.8 s |
| pls4all.cpp.ref | ⚠ | — | 1.81 ms🏆 | 14.8 ms | 71.4 ms🏆 | 1.66 ms🏆 | — | 7.79 ms | 82.6 ms | 381.9 ms | 37.9 ms🏆 | 393.4 ms | 2.2 s🏆 | 148.9 ms | 1.8 s |
| **Python · pls4all** | | | | | | | | | | | | | | | |
| pls4all.python | ⚠ | — | — | — | — | 1.77 ms | — | — | — | — | — | — | — | — | — |
| pls4all.sklearn | ✓ bind | 2.89 ms🏆 | — | — | — | 1.86 ms | 2.60 ms🏆 | — | — | — | — | — | — | — | — |
| **R · pls4all** | | | | | | | | | | | | | | | |
| pls4all.R | ✓ bind | 13.8 ms | — | — | — | 5.56 ms | 15.2 ms | — | — | — | — | — | — | — | — |
| pls4all.R.formula | ✓ bind | 24.0 ms | — | — | — | 6.46 ms | 13.0 ms | — | — | — | — | — | — | — | — |
| pls4all.R.mdatools | ✓ bind | 22.7 ms | — | — | — | 7.29 ms | 11.0 ms | — | — | — | — | — | — | — | — |
| pls4all.R.pls | ✓ bind | 24.9 ms | — | — | — | 6.75 ms | 12.0 ms | — | — | — | — | — | — | — | — |
| **MATLAB · pls4all** | | | | | | | | | | | | | | | |
| pls4all.matlab | ✗ +6e+00 | 4.67 ms | — | — | — | 2.76 ms | 5.96 ms | — | — | — | — | — | — | — | — |
| pls4all.matlab.classdef | ✗ +6e+00 | 10.1 ms | — | — | — | 3.25 ms | 7.83 ms | — | — | — | — | — | — | — | — |
| **Python · external** | | | | | | | | | | | | | | | |
| 📐nirs4all | ⚠ | — | — | — | — | 1.95 ms | — | — | — | — | — | — | — | — | — |
:::
:::{tab-item} 3 threads
:sync: threads-3
| Backend | Parity | 50×250 | 100×50 | 100×500 | 100×2500 | 200×40 | 250×50 | 500×50 | 500×500 | 500×2500 | 2500×50 | 2500×500 | 2500×2500 | 10000×50 | 10000×500 |
|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| **C++ native · libn4m** | | | | | | | | | | | | | | | |
| pls4all.cpp.blas | ~ shape | — | — | — | — | 2.79 ms | — | — | — | — | — | — | — | — | — |
| pls4all.cpp.blas+omp | ~ shape | — | — | — | — | 1.53 ms🏆 | — | — | — | — | — | — | — | — | — |
| pls4all.cpp.omp | ~ shape | — | — | — | — | 1.60 ms | — | — | — | — | — | — | — | — | — |
| pls4all.cpp.ref | ~ shape | — | — | — | — | 1.69 ms | — | — | — | — | — | — | — | — | — |
| **Python · pls4all** | | | | | | | | | | | | | | | |
| pls4all.python | ✓ bind | — | — | — | — | 1.69 ms | — | — | — | — | — | — | — | — | — |
| pls4all.sklearn | ✓ bind | — | — | — | — | 1.81 ms | — | — | — | — | — | — | — | — | — |
| **R · pls4all** | | | | | | | | | | | | | | | |
| pls4all.R | ✓ bind | — | — | — | — | 5.24 ms | — | — | — | — | — | — | — | — | — |
| pls4all.R.formula | ✓ bind | — | — | — | — | 8.08 ms | — | — | — | — | — | — | — | — | — |
| pls4all.R.mdatools | ✓ bind | — | — | — | — | 6.64 ms | — | — | — | — | — | — | — | — | — |
| pls4all.R.pls | ✓ bind | — | — | — | — | 6.62 ms | — | — | — | — | — | — | — | — | — |
| **MATLAB · pls4all** | | | | | | | | | | | | | | | |
| pls4all.matlab | ✗ +6e+00 | — | — | — | — | 4.16 ms | — | — | — | — | — | — | — | — | — |
| pls4all.matlab.classdef | ✗ +6e+00 | — | — | — | — | 4.04 ms | — | — | — | — | — | — | — | — | — |
| **Python · external** | | | | | | | | | | | | | | | |
| 📐nirs4all | source | — | — | — | — | 1.97 ms | — | — | — | — | — | — | — | — | — |
:::
:::{tab-item} 10 threads
:sync: threads-10
| Backend | Parity | 50×250 | 100×50 | 100×500 | 100×2500 | 200×40 | 250×50 | 500×50 | 500×500 | 500×2500 | 2500×50 | 2500×500 | 2500×2500 | 10000×50 | 10000×500 |
|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| **C++ native · libn4m** | | | | | | | | | | | | | | | |
| pls4all.cpp.blas | ~ shape | — | — | — | — | 1.45 ms | — | — | — | — | — | — | — | — | — |
| pls4all.cpp.blas+omp | ~ shape | — | — | — | — | 1.50 ms | — | — | — | — | — | — | — | — | — |
| pls4all.cpp.omp | ~ shape | — | — | — | — | 1.47 ms | — | — | — | — | — | — | — | — | — |
| pls4all.cpp.ref | ~ shape | — | — | — | — | 1.49 ms | — | — | — | — | — | — | — | — | — |
| **Python · pls4all** | | | | | | | | | | | | | | | |
| pls4all.python | ✓ bind | — | — | — | — | 1.48 ms | — | — | — | — | — | — | — | — | — |
| pls4all.sklearn | ✓ bind | — | — | — | — | 1.43 ms🏆 | — | — | — | — | — | — | — | — | — |
| **R · pls4all** | | | | | | | | | | | | | | | |
| pls4all.R | ✓ bind | — | — | — | — | 3.96 ms | — | — | — | — | — | — | — | — | — |
| pls4all.R.formula | ✓ bind | — | — | — | — | 4.71 ms | — | — | — | — | — | — | — | — | — |
| pls4all.R.mdatools | ✓ bind | — | — | — | — | 4.81 ms | — | — | — | — | — | — | — | — | — |
| pls4all.R.pls | ✓ bind | — | — | — | — | 5.05 ms | — | — | — | — | — | — | — | — | — |
| **MATLAB · pls4all** | | | | | | | | | | | | | | | |
| pls4all.matlab | ✗ +6e+00 | — | — | — | — | 2.43 ms | — | — | — | — | — | — | — | — | — |
| pls4all.matlab.classdef | ✗ +6e+00 | — | — | — | — | 2.67 ms | — | — | — | — | — | — | — | — | — |
| **Python · external** | | | | | | | | | | | | | | | |
| 📐nirs4all | source | — | — | — | — | 1.74 ms | — | — | — | — | — | — | — | — | — |
:::
::::
---
_See also_: [benchmark overview](../benchmarks/overview.md) · [methods index](index.md) · [interactive dashboard](../landing/dashboard.md)