aom_preprocess: AOM (Adaptive Operator Mixture) preprocessing bank
Group: Diagnostic · Registry tolerance: 5.0
Description
AOM operator-bank preprocessing primitive (aom_pop.aom_preprocessing)
Registry note: nirs4all-methods exposes this primitive directly as n4m.aom_preprocess and n4m.aom.aom_preprocess. In-tree nirs4all.operators.models.sklearn.aom_pls remains the sanctioned reference provider for qualitative AOM parity.
Parameters
| Name | Type | Default | Notes |
|---|---|---|---|
| | sequence of AOM operator specs | compact strict-linear bank | Same syntax as |
| | | | |
| | array-like or | | Optional response matrix passed to the native fit path for operators that need supervised fit state. |
Explanations
Bibliographic source
Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models. arXiv:2605.13587. https://arxiv.org/abs/2605.13587 — introduces operator-adaptive PLS (AOM-PLS / POP-PLS) and the bench against 50+ NIRS datasets that the git-pinned oracle nirs4all.operators.models.sklearn.aom_pls is calibrated against.
Mathematical principle
aom_preprocess is the operator-bank primitive that AOM-PLS and POP-PLS build on. Given the centered spectral matrix \(\mathbf{X} \in \mathbb{R}^{n\times p}\) and a finite bank of strict-linear operators \(\{\mathbf{A}_b\}_{b=1}^{M} \subset \mathbb{R}^{p\times p}\) — matrices fully determined by the wavelength grid — aom_preprocess materializes the \(M\) preprocessed views \(\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}\) and gates them. The direct n4m_aom_preprocess_fit surface currently supports the reusable strict-linear operator subset covered by the smoke tests: identity, first-degree polynomial detrending, Savitzky-Golay smoothing, Savitzky-Golay first derivative, Norris-Williams, finite difference, Gaussian smoothing, Whittaker smoothing and FCK. Strict chains and model-scoring diversity are handled by the AOM sweep/campaign operator-moment paths. SNV / MSC / EMSC / ASLS / OSC remain excluded from the moment contract because they depend on per-sample normalization, \(\mathbf{y}\), or a reference spectrum.
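As a rough illustration of what "strict-linear operator" means here, the following NumPy sketch builds three \(p\times p\) matrices from the wavelength grid alone and materializes the views \(\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}\). The helper names (`identity_op`, `finite_difference_op`, `detrend_linear_op`) and the zero-padded last row of the difference operator are assumptions made for this example; they are not the n4m implementation and may differ from its boundary conventions.

```python
import numpy as np

def identity_op(p):
    """Identity operator: leaves every spectrum unchanged."""
    return np.eye(p)

def finite_difference_op(p):
    """First difference, (A x)[j] = x[j+1] - x[j].
    Assumption: the last row is left at zero so that A stays square."""
    A = np.zeros((p, p))
    idx = np.arange(p - 1)
    A[idx, idx] = -1.0
    A[idx, idx + 1] = 1.0
    return A

def detrend_linear_op(wavelengths):
    """First-degree polynomial detrending as a residual projector I - P,
    where P projects onto span{1, wavelength}."""
    V = np.column_stack([np.ones_like(wavelengths), wavelengths])
    P = V @ np.linalg.pinv(V)
    return np.eye(len(wavelengths)) - P

rng = np.random.default_rng(0)
n, p = 6, 12
wavelengths = np.linspace(1100.0, 2500.0, p)  # hypothetical NIR grid (nm)
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                           # centered spectral matrix

bank = [identity_op(p), finite_difference_op(p), detrend_linear_op(wavelengths)]

# Each view applies A_b to every row of X: X_b = X A_b^T.
views = np.stack([X @ A.T for A in bank])     # shape (M, n, p)
print(views.shape)
```

Every operator above depends only on `p` or on `wavelengths`, never on the samples or on \(\mathbf{y}\), which is the property that separates this subset from SNV, MSC and the other excluded transforms.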
Two gating modes are supported:
- soft (\(\texttt{gating\_mode}=1\)): equal-weight average of the \(M\) views, \(\frac{1}{M}\sum_{b=1}^{M}\mathbf{X}_b\);
- hard (\(\texttt{gating\_mode}=0\)): deterministic first-operator selection, \(\mathbf{X}_1\).
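Both modes preserve the cross-covariance identity exploited by the AOM/POP selectors: with \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) and any \(\mathbf{A}_b\) in the bank,
\[\mathbf{X}_b^{\top}\mathbf{Y} = (\mathbf{X}\mathbf{A}_b^{\top})^{\top}\mathbf{Y} = \mathbf{A}_b\mathbf{S},\]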
so a downstream PLS step can score the whole bank by \(M\) cheap \(O(pq)\) left actions instead of \(M\) full \(O(np)\) matrix products. The motivation is that no single preprocessing is best on all calibrations — different wavelength regions favour different transforms — and the AOM-PLS / POP-PLS selectors exploit that by picking, respectively, a global operator (one \(b^{\star}\) for the whole model) or a per-component operator (one \(b_a\) for each latent direction). Predictions on new spectra reuse the absorbed operator(s) through the recovered original-space coefficients — no preprocessing replay at predict time.
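The two gating modes and the cross-covariance shortcut can be checked numerically with a small NumPy sketch. It assumes the views are \(\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}\) as defined above, so that \(\mathbf{X}_b^{\top}\mathbf{Y} = \mathbf{A}_b\mathbf{S}\). The `gate` helper, the one-hot weight vector for hard gating, and the random stand-in operators are illustrative assumptions, not the native kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, M = 50, 8, 2, 3

X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
Y = rng.normal(size=(n, q))

# Stand-in bank: identity plus random p x p matrices (illustrative only).
bank = [np.eye(p)] + [rng.normal(size=(p, p)) for _ in range(M - 1)]
views = [X @ A.T for A in bank]

def gate(views, gating_mode):
    """Hypothetical gating helper: 1 = soft equal-weight average,
    0 = hard selection of the first operator."""
    n_ops = len(views)
    if gating_mode == 1:
        weights = np.full(n_ops, 1.0 / n_ops)
    elif gating_mode == 0:
        weights = np.zeros(n_ops)
        weights[0] = 1.0
    else:
        raise ValueError("gating_mode must be 0 (hard) or 1 (soft)")
    # Weighted sum over the operator axis gives an (n, p) matrix.
    transformed = np.tensordot(weights, np.stack(views), axes=1)
    return transformed, weights

X_soft, w_soft = gate(views, 1)
X_hard, w_hard = gate(views, 0)

# Cross-covariance of every view, computed two ways.
S = X.T @ Y                               # computed once, shape (p, q)
direct = [Xb.T @ Y for Xb in views]       # touches all n samples per operator
via_S = [A @ S for A in bank]             # left action on S only
identity_holds = all(np.allclose(d, v) for d, v in zip(direct, via_S))
print(identity_holds)
```

The `via_S` path never revisits the \(n\) samples, which is why scoring a whole bank stays cheap once \(\mathbf{S}\) is available.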
Implementation
n4m_aom_preprocess_fit via the native C ABI. Python exposes this as
n4m.aom_preprocess and n4m.aom.aom_preprocess; both return the native
MethodResult fields as NumPy arrays/scalars:
- transformed: final gated transform, shape (n_samples, n_features);
- operator_outputs: operator-major matrix of per-operator transformed views;
- weights: gating weights;
- operator_kinds: integer AOM operator ids;
- n_operators, n_samples, n_features, mode.
Reference: git-pinned oracle nirs4all.operators.models.sklearn.aom_pls
(sanctioned exception).
MATLAB header (bindings/matlab/+pls4all/aom_preprocess.m):
pls4all.aom_preprocess AOM preprocessing fit/transform.
Usage
The nirs4all-methods Python package exposes the product surface directly.
The lower-level C ABI and legacy pls4all examples below dispatch into the
same native kernel.
nirs4all-methods Python
```python
import n4m
import n4m.aom as aom

# X: spectra (n_samples, n_features); y: optional response matrix.
res = n4m.aom_preprocess(
    X,
    y,
    operators=[
        "identity",
        ("savgol_smooth", [5, 2]),
        ("detrend_poly", [1]),
        ("savgol_derivative", [5, 2, 1]),
        ("norris_williams", [5, 5, 1]),
        ("finite_difference", [1]),
        ("gaussian", [1.0]),
        ("whittaker", [100.0]),
        ("fck", [1.0]),
    ],
    gating_mode="soft",
)

X_aom = res["transformed"]
operator_views = res["operator_outputs"]
weights = res["weights"]

# Both entry points refer to the same callable.
assert aom.aom_preprocess is n4m.aom_preprocess
```
For model selection rather than standalone preprocessing, prefer
n4m.aom_pls, n4m.pop_pls, n4m.aom_sweep_run or
n4m.aom_chain_sweep_run; those surfaces fold selected operators back into
input-space coefficients for direct prediction reuse.
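The idea of folding an operator back into input-space coefficients can be sketched in plain NumPy. The example below uses a ridge fit as a stand-in for the PLS/POP estimators and a first-difference matrix as the selected operator; both choices, and every variable name, are assumptions for illustration only. If a model with coefficients \(\boldsymbol{\beta}_b\) is fitted on \(\mathbf{X}\mathbf{A}^{\top}\), then \(\boldsymbol{\beta} = \mathbf{A}^{\top}\boldsymbol{\beta}_b\) gives the same predictions on raw spectra.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 10
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Center the training data and remember the means for prediction.
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean

# Stand-in "selected" operator: first difference with a zero last row.
A = np.zeros((p, p))
idx = np.arange(p - 1)
A[idx, idx] = -1.0
A[idx, idx + 1] = 1.0

# Fit a ridge model in the preprocessed space (stand-in for PLS).
Xb = Xc @ A.T
lam = 1e-2
beta_b = np.linalg.solve(Xb.T @ Xb + lam * np.eye(p), Xb.T @ yc)

# Fold the operator into original-space coefficients and an intercept.
beta = A.T @ beta_b
intercept = y_mean - x_mean @ beta

X_new = rng.normal(size=(5, p))
pred_replay = ((X_new - x_mean) @ A.T) @ beta_b + y_mean  # replays preprocessing
pred_folded = X_new @ beta + intercept                    # no preprocessing needed
print(np.allclose(pred_replay, pred_folded))
```

The two prediction paths agree because the operator is linear and fixed, so it commutes with centering and can be absorbed into the coefficient vector once at fit time.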
Native and compatibility bindings
```c
/* C ABI — libn4m */
n4m_context_t* ctx = NULL;
n4m_operator_bank_t* bank = NULL;
n4m_gating_strategy_t* gate = NULL;
n4m_method_result_t* res = NULL;

n4m_context_create(&ctx);
n4m_operator_bank_create(&bank);
n4m_gating_strategy_create(&gate, N4M_GATING_SOFT);
/* add operators to bank with n4m_operator_bank_add */

n4m_aom_preprocess_fit(ctx, bank, gate, &x_view, &y_view, &res);
/* read transformed/operator_outputs/weights via double-matrix getters */
/* read operator_kinds via n4m_method_result_get_int64_vector */

n4m_method_result_destroy(res);
n4m_gating_strategy_destroy(gate);
n4m_operator_bank_destroy(bank);
n4m_context_destroy(ctx);
```
```python
import n4m

res = n4m.aom_preprocess(X, y, operators=["identity"], gating_mode="soft")
X_aom = res["transformed"]
operator_kinds = res["operator_kinds"]
```
```python
from n4m.sklearn import NativeAOMPLSRegressor

model = NativeAOMPLSRegressor(max_components=2, cv=4).fit(X, y)
yhat = model.predict(X)
```
```r
library(pls4all)

# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("aom_preprocess", X, y,
                      n_components = 2L,
                      params = list(n_operators = 3L, gating_mode = 0L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.
```
```matlab
res = pls4all.aom_preprocess(X, y, 2);
% see header of bindings/matlab/+pls4all/aom_preprocess.m for full
% parameter surface:
%   res = aom_preprocess(X, Y, n_operators, gating_mode)
yhat = predict(res, Xtest);
```
No idiomatic classdef wrapper — invoke pls4all.fit("aom_preprocess", X, y, …) directly from the unified MEX factory.
Registry parity references 📐
📐 nirs4all (python · python), nirs4all in-tree · qualitative (rmse_rel ≤ 5e+00): In-tree nirs4all AOM provider (sanctioned external reference). pls4all's current primitive exposes a small operator-bank preprocessing kernel, while nirs4all exposes the full AOM/POP estimator stack; the parity remains qualitative.
Benchmarks
Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.
Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.
Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-08).
Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend.
| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref | 1.85 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 1.76 ms |
| pls4all.sklearn | ✓ bind | 1.75 ms 🏆 |
| R · pls4all | | |
| pls4all.R | ✓ bind | 6.07 ms |
| pls4all.R.formula | ✓ bind | 7.96 ms |
| pls4all.R.mdatools | ✓ bind | 6.56 ms |
| pls4all.R.pls | ✓ bind | 7.33 ms |
| Python · external | | |
| 📐 nirs4all | source | 2.13 ms |

| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref | 2.95 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 1.74 ms 🏆 |
| pls4all.sklearn | ✓ bind | 3.37 ms |
| R · pls4all | | |
| pls4all.R | ✓ bind | 10.4 ms |
| pls4all.R.formula | ✓ bind | 35.7 ms |
| pls4all.R.mdatools | ✓ bind | 29.5 ms |
| pls4all.R.pls | ✓ bind | 32.9 ms |
| Python · external | | |
| 📐 nirs4all | source | 11.5 ms |

| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref | 3.32 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 7.59 ms |
| pls4all.sklearn | ✓ bind | 4.10 ms |
| R · pls4all | | |
| pls4all.R | ✓ bind | 9.94 ms |
| pls4all.R.formula | ✓ bind | 9.00 ms |
| pls4all.R.mdatools | ✓ bind | 8.97 ms |
| pls4all.R.pls | ✓ bind | 8.32 ms |
| Python · external | | |
| 📐 nirs4all | source | 1.87 ms 🏆 |
See also: benchmark overview · methods index · interactive dashboard