aom_preprocess — AOM (Adaptive Operator Mixture) preprocessing bank

Group: Diagnostic · Registry tolerance: 5.0

Description

AOM operator-bank preprocessing primitive (aom_pop.aom_preprocessing)

Registry note: nirs4all-methods exposes this primitive directly as n4m.aom_preprocess and n4m.aom.aom_preprocess. The in-tree nirs4all.operators.models.sklearn.aom_pls remains the sanctioned reference provider for qualitative AOM parity.

Parameters

| Name | Type | Default | Notes |
|---|---|---|---|
| operators | sequence of AOM operator specs | compact strict-linear bank | Same syntax as n4m.aom_pls / n4m.aom_chain_sweep_run: strings, integer enum ids, (kind, params) tuples or {"kind": ..., "params": ...} mappings. |
| gating_mode | {"soft", "hard", 1, 0} | "soft" | soft averages all operator outputs; hard selects the first operator deterministically. |
| y | array-like or None | None | Optional response matrix passed to the native fit path for operators that need supervised fit state. |
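
The accepted spec forms can be illustrated with a small normalizer. This is a sketch only: normalize_operator_spec is a hypothetical helper written for this page, not part of n4m, and the real parser may validate kinds and parameters differently. It maps strings, integer ids, (kind, params) tuples and {"kind": ..., "params": ...} mappings to one (kind, params) shape.

```python
from collections.abc import Mapping


def normalize_operator_spec(spec):
    """Return a (kind, params) pair for one operator spec.

    Hypothetical helper for illustration; not the n4m implementation.
    """
    # Bare string names and integer enum ids carry no parameters.
    if isinstance(spec, (str, int)):
        return spec, []
    # Mapping form: {"kind": ..., "params": ...}; params assumed optional here.
    if isinstance(spec, Mapping):
        return spec["kind"], list(spec.get("params", []))
    # Tuple form: (kind, params).
    if isinstance(spec, tuple) and len(spec) == 2:
        kind, params = spec
        return kind, list(params)
    raise TypeError(f"unsupported operator spec: {spec!r}")


bank = [
    "identity",
    ("savgol_smooth", [5, 2]),
    {"kind": "gaussian", "params": [1.0]},
]
normalized = [normalize_operator_spec(s) for s in bank]
print(normalized)
```

All three entries end up in the same shape, which is why the forms can be mixed freely within one operators list.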

Explanations

Bibliographic source

Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models. arXiv:2605.13587. https://arxiv.org/abs/2605.13587

The paper introduces operator-adaptive PLS (AOM-PLS / POP-PLS) and the benchmark on 50+ NIRS datasets against which the git-pinned oracle nirs4all.operators.models.sklearn.aom_pls is calibrated.

Mathematical principle

aom_preprocess is the operator-bank primitive that AOM-PLS and POP-PLS build on. Given the centered spectral matrix \(\mathbf{X} \in \mathbb{R}^{n\times p}\) and a finite bank of strict-linear operators \(\{\mathbf{A}_b\}_{b=1}^{M} \subset \mathbb{R}^{p\times p}\) — matrices fully determined by the wavelength grid — aom_preprocess materializes the \(M\) preprocessed views \(\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}\) and gates them. The direct n4m_aom_preprocess_fit surface currently supports the reusable strict-linear operator subset covered by the smoke tests: identity, first-degree polynomial detrending, Savitzky-Golay smoothing, Savitzky-Golay first derivative, Norris-Williams, finite difference, Gaussian smoothing, Whittaker smoothing and FCK. Strict chains and model-scoring diversity are handled by the AOM sweep/campaign operator-moment paths. SNV / MSC / EMSC / ASLS / OSC remain excluded from the moment contract because they depend on per-sample normalization, \(\mathbf{y}\), or a reference spectrum.
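
To make "strict-linear" concrete, the following NumPy sketch builds two such operators as explicit \(p\times p\) matrices (first-degree detrending and a forward finite difference) and applies them as \(\mathbf{X}\mathbf{A}_b^{\top}\). It does not call n4m; the grid t, the zero-padded last row of the difference operator, and the snv helper are assumptions made for illustration and need not match the native kernels.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 20
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)  # column-centre, as the text assumes

t = np.linspace(-1.0, 1.0, p)  # stand-in wavelength grid (assumption)

# First-degree polynomial detrending: residual projector I - V (V^T V)^-1 V^T.
V = np.column_stack([np.ones(p), t])
A_detrend = np.eye(p) - V @ np.linalg.solve(V.T @ V, V.T)

# Forward finite difference; the last row is left at zero to keep A square
# (boundary handling is an assumption of this sketch).
A_diff = np.zeros((p, p))
idx = np.arange(p - 1)
A_diff[idx, idx] = -1.0
A_diff[idx, idx + 1] = 1.0

# Preprocessed views X_b = X A_b^T.
X_detrend = X @ A_detrend.T
X_diff = X @ A_diff.T

# Both agree with the usual row-wise computations.
direct_detrend = np.vstack(
    [row - np.polyval(np.polyfit(t, row, 1), t) for row in X]
)
detrend_matches = np.allclose(X_detrend, direct_detrend)
diff_matches = np.allclose(X_diff[:, :-1], np.diff(X, axis=1))


def snv(Z):
    """Standard normal variate: per-sample centring and scaling."""
    return (Z - Z.mean(axis=1, keepdims=True)) / Z.std(axis=1, keepdims=True)


# SNV is not additive, so no fixed matrix A can represent it.
snv_is_additive = np.allclose(snv(X[:1] + X[1:2]), snv(X[:1]) + snv(X[1:2]))
print(detrend_matches, diff_matches, snv_is_additive)
```

Both matrices depend only on the grid, never on the spectra themselves. SNV divides by a per-sample statistic, so it fails that test.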

Two gating modes are supported:

  • soft (\(\texttt{gating\_mode}=1\)): equal-weight average,

\[\mathbf{X}_{\text{AOM}}^{\text{soft}} \;=\; \frac{1}{M}\sum_{b=1}^{M}\mathbf{X}\mathbf{A}_b^{\top}.\]

  • hard (\(\texttt{gating\_mode}=0\)): deterministic first-operator selection,

\[\mathbf{X}_{\text{AOM}}^{\text{hard}} \;=\; \mathbf{X}\mathbf{A}_{1}^{\top}.\]
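
The two gating formulas can be sketched in NumPy as follows. gate_views is a hypothetical reference function, not the native kernel; in particular, representing hard gating as a one-hot weight vector is an assumption of this sketch, since the source only states that the first operator is selected.

```python
import numpy as np


def gate_views(X, operators, gating_mode="soft"):
    """Gate the views X A_b^T of a bank of p x p operator matrices.

    Illustrative only; mirrors the soft/hard formulas, not the n4m code path.
    """
    views = np.stack([X @ A.T for A in operators])  # shape (M, n, p)
    M = len(operators)
    if gating_mode in ("soft", 1):
        weights = np.full(M, 1.0 / M)  # equal-weight average
    elif gating_mode in ("hard", 0):
        weights = np.zeros(M)
        weights[0] = 1.0  # first operator only (one-hot is an assumption)
    else:
        raise ValueError(f"unknown gating_mode: {gating_mode!r}")
    # Weighted sum over the operator axis gives an (n, p) matrix.
    transformed = np.tensordot(weights, views, axes=1)
    return transformed, views, weights


rng = np.random.default_rng(0)
n, p = 5, 8
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)

# Toy bank: identity plus a difference-like operator.
bank = [np.eye(p), np.eye(p) - np.eye(p, k=1)]

X_soft, views, w_soft = gate_views(X, bank, "soft")
X_hard, _, w_hard = gate_views(X, bank, "hard")
```

With this bank, X_soft is the mean of the two views and X_hard equals X itself, because the first operator is the identity. The order of the bank therefore matters in hard mode.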

Both modes preserve the cross-covariance identity exploited by the AOM/POP selectors: with \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) and any \(\mathbf{A}_b\) in the bank,

\[\bigl(\mathbf{X}\mathbf{A}_b^{\top}\bigr)^{\top}\mathbf{Y} \;=\; \mathbf{A}_b\,\mathbf{S},\]

so a downstream PLS step can score the whole bank by \(M\) cheap \(O(pq)\) left actions instead of \(M\) full \(O(np)\) matrix products. The motivation is that no single preprocessing is best on all calibrations — different wavelength regions favour different transforms — and the AOM-PLS / POP-PLS selectors exploit that by picking, respectively, a global operator (one \(b^{\star}\) for the whole model) or a per-component operator (one \(b_a\) for each latent direction). Predictions on new spectra reuse the absorbed operator(s) through the recovered original-space coefficients — no preprocessing replay at predict time.
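
A short numerical check of the identity, and of why no preprocessing replay is needed at predict time, is sketched below. Ordinary least squares stands in for the PLS step purely to keep the example short; the operator A, the shapes and the variable names are assumptions for illustration. The fold-back itself is just associativity: \((\mathbf{X}\mathbf{A}_b^{\top})\mathbf{B}_b = \mathbf{X}(\mathbf{A}_b^{\top}\mathbf{B}_b)\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 50, 12, 2
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
Y = rng.normal(size=(n, q))
Y -= Y.mean(axis=0)

# One strict-linear operator (difference-like, chosen for illustration).
A = np.eye(p) - np.eye(p, k=1)

# Cross-covariance identity: (X A^T)^T Y == A S with S = X^T Y.
S = X.T @ Y
lhs = (X @ A.T).T @ Y  # touches all n samples again
rhs = A @ S            # acts only on the p x q matrix S
identity_holds = np.allclose(lhs, rhs)

# Fit in the preprocessed space (least squares as a stand-in for PLS).
X_b = X @ A.T
B_b, *_ = np.linalg.lstsq(X_b, Y, rcond=None)

# Fold the operator into original-space coefficients: B = A^T B_b.
B = A.T @ B_b

# New spectra are predicted directly, without applying A first.
X_new = rng.normal(size=(4, p))
pred_replay = (X_new @ A.T) @ B_b
pred_folded = X_new @ B
foldback_holds = np.allclose(pred_replay, pred_folded)
print(identity_holds, foldback_holds)
```

Scoring every operator in a bank only requires forming S once and then applying each \(\mathbf{A}_b\) to it, which is independent of the number of samples.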

Implementation

The entry point is n4m_aom_preprocess_fit, reached through the native C ABI. Python exposes this as n4m.aom_preprocess and n4m.aom.aom_preprocess; both return the native MethodResult fields as NumPy arrays/scalars:

  • transformed: final gated transform, shape (n_samples, n_features);

  • operator_outputs: operator-major matrix of per-operator transformed views;

  • weights: gating weights;

  • operator_kinds: integer AOM operator ids;

  • n_operators, n_samples, n_features, mode.

Reference: git-pinned oracle nirs4all.operators.models.sklearn.aom_pls (sanctioned exception).

MATLAB header (bindings/matlab/+pls4all/aom_preprocess.m):

pls4all.aom_preprocess  AOM preprocessing fit/transform.

Usage

The nirs4all-methods Python package exposes the product surface directly. The lower-level C ABI and legacy pls4all examples below dispatch into the same native kernel.

nirs4all-methods Python

import n4m
import n4m.aom as aom

res = n4m.aom_preprocess(
    X,
    y,
    operators=[
        "identity",
        ("savgol_smooth", [5, 2]),
        ("detrend_poly", [1]),
        ("savgol_derivative", [5, 2, 1]),
        ("norris_williams", [5, 5, 1]),
        ("finite_difference", [1]),
        ("gaussian", [1.0]),
        ("whittaker", [100.0]),
        ("fck", [1.0]),
    ],
    gating_mode="soft",
)

X_aom = res["transformed"]
operator_views = res["operator_outputs"]
weights = res["weights"]
assert aom.aom_preprocess is n4m.aom_preprocess

For model selection rather than standalone preprocessing, prefer n4m.aom_pls, n4m.pop_pls, n4m.aom_sweep_run or n4m.aom_chain_sweep_run; those surfaces fold selected operators back into input-space coefficients for direct prediction reuse.

Native and compatibility bindings

C ABI (libn4m)

/* C ABI — libn4m */
n4m_context_t* ctx = NULL;
n4m_operator_bank_t* bank = NULL;
n4m_gating_strategy_t* gate = NULL;
n4m_method_result_t* res = NULL;
n4m_context_create(&ctx);
n4m_operator_bank_create(&bank);
n4m_gating_strategy_create(&gate, N4M_GATING_SOFT);
/* add operators to bank with n4m_operator_bank_add */
n4m_aom_preprocess_fit(ctx, bank, gate, &x_view, &y_view, &res);
/* read transformed/operator_outputs/weights via double-matrix getters */
/* read operator_kinds via n4m_method_result_get_int64_vector */
n4m_method_result_destroy(res);
n4m_gating_strategy_destroy(gate);
n4m_operator_bank_destroy(bank);
n4m_context_destroy(ctx);

Python (n4m)

import n4m

res = n4m.aom_preprocess(X, y, operators=["identity"], gating_mode="soft")
X_aom = res["transformed"]
operator_kinds = res["operator_kinds"]

Python (n4m.sklearn)

from n4m.sklearn import NativeAOMPLSRegressor

model = NativeAOMPLSRegressor(max_components=2, cv=4).fit(X, y)
yhat = model.predict(X)

R (pls4all)

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("aom_preprocess", X, y,
                      n_components = 2L, params = list(n_operators = 3L, gating_mode = 0L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

MATLAB (pls4all)

res = pls4all.aom_preprocess(X, y, 2);
% see header of bindings/matlab/+pls4all/aom_preprocess.m for full
% parameter surface:
%   res = aom_preprocess(X, Y, n_operators, gating_mode)
yhat = predict(res, Xtest);

No idiomatic classdef wrapper — invoke pls4all.fit("aom_preprocess", X, y, …) directly from the unified MEX factory.

Registry parity references 📐

  • 📐 nirs4all (python · python) — nirs4all in-tree · qualitative (rmse_rel ≤ 5e+00) — In-tree nirs4all AOM provider (sanctioned external reference). pls4all’s current primitive exposes a small operator-bank preprocessing kernel, while nirs4all exposes the full AOM/POP estimator stack; the parity remains qualitative.

Benchmarks

Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.

Verdict  ·  ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance  ·  ✓ bind = pls4all binding agrees with the C++ baseline  ·  ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle  ·  ✗ divergent  ·  ⚠ error  ·  — not run. The fastest backend per column is marked 🏆.

Reference gate: strict — numeric equivalence (rmse_rel_tol 1e-08).

Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend.

| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref | 1.85 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 1.76 ms |
| pls4all.sklearn | ✓ bind | 1.75 ms 🏆 |
| R · pls4all | | |
| pls4all.R | ✓ bind | 6.07 ms |
| pls4all.R.formula | ✓ bind | 7.96 ms |
| pls4all.R.mdatools | ✓ bind | 6.56 ms |
| pls4all.R.pls | ✓ bind | 7.33 ms |
| Python · external | | |
| 📐 nirs4all | source | 2.13 ms |

| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref | 2.95 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 1.74 ms 🏆 |
| pls4all.sklearn | ✓ bind | 3.37 ms |
| R · pls4all | | |
| pls4all.R | ✓ bind | 10.4 ms |
| pls4all.R.formula | ✓ bind | 35.7 ms |
| pls4all.R.mdatools | ✓ bind | 29.5 ms |
| pls4all.R.pls | ✓ bind | 32.9 ms |
| Python · external | | |
| 📐 nirs4all | source | 11.5 ms |

| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref | 3.32 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 7.59 ms |
| pls4all.sklearn | ✓ bind | 4.10 ms |
| R · pls4all | | |
| pls4all.R | ✓ bind | 9.94 ms |
| pls4all.R.formula | ✓ bind | 9.00 ms |
| pls4all.R.mdatools | ✓ bind | 8.97 ms |
| pls4all.R.pls | ✓ bind | 8.32 ms |
| Python · external | | |
| 📐 nirs4all | source | 1.87 ms 🏆 |
