`aom_preprocess` — AOM (Adaptive Operator Mixture) preprocessing bank¶

Group: Diagnostic · Registry tolerance: 1e-08

Description¶

AOM preprocessing pipeline (§17 Phase 4)

Registry note — In-tree nirs4all.operators.models.sklearn.aom_pls is the sanctioned provider. pls4all currently exposes the preprocessing primitive, while nirs4all exposes the full AOM/POP estimator stack; parity is qualitative.

Parameters¶

Name	Type	Default	Notes
`n_operators`	`int`	`3`	registry benchmark cell value
`gating_mode`	`int`	`0`	registry benchmark cell value

Explanations¶

Bibliographic source¶

Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models. arXiv:2605.13587. https://arxiv.org/abs/2605.13587 (introduces operator-adaptive PLS (AOM-PLS / POP-PLS) and the benchmark on 50+ NIRS datasets against which the git-pinned oracle nirs4all.operators.models.sklearn.aom_pls is calibrated).

Mathematical principle¶

aom_preprocess is the operator-bank primitive that AOM-PLS and POP-PLS build on. It takes the centered spectral matrix \(\mathbf{X} \in \mathbb{R}^{n\times p}\) and a finite bank of strict-linear operators \(\{\mathbf{A}_b\}_{b=1}^{M} \subset \mathbb{R}^{p\times p}\), that is, matrices fully determined by the wavelength grid: identity, Savitzky–Golay smooth/derivative, finite difference, polynomial detrending, Norris–Williams, Whittaker. SNV, MSC, EMSC, ASLS and OSC are excluded because they are not fixed linear maps: they depend on the spectrum being transformed, on a reference spectrum, or on \(\mathbf{y}\). From these inputs, aom_preprocess materializes the \(M\) preprocessed views \(\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}\) and gates them.
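As a minimal sketch of what materializing the views means, the following example builds a toy bank of three grid-determined operators and forms \(\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}\) for each. The helper names (center_columns, first_difference, moving_average3, apply_bank), the boundary handling and the data are illustrative assumptions, not the pls4all kernel. The 3-point moving average stands in for a smoothing operator, and plain lists are used only to keep the sketch dependency-free.

```python
# Toy operator bank: every matrix depends only on the grid length p,
# never on the spectra or on y (the "strict-linear" requirement).

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def center_columns(X):
    # Subtract each column mean so the spectra are centered.
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def first_difference(p):
    # Row i computes x[i+1] - x[i]; the last row is left at zero (an assumed
    # boundary convention) so the operator stays p x p.
    A = [[0.0] * p for _ in range(p)]
    for i in range(p - 1):
        A[i][i], A[i][i + 1] = -1.0, 1.0
    return A

def moving_average3(p):
    # 3-point moving average; edge rows average over the neighbours that exist.
    A = [[0.0] * p for _ in range(p)]
    for i in range(p):
        cols = [j for j in (i - 1, i, i + 1) if 0 <= j < p]
        for j in cols:
            A[i][j] = 1.0 / len(cols)
    return A

def apply_bank(X, bank):
    # One preprocessed view per operator: X_b = X @ A_b^T.
    return [matmul(X, transpose(A)) for A in bank]

X = [[1.0, 2.0, 4.0, 7.0],
     [2.0, 2.0, 3.0, 5.0],
     [3.0, 5.0, 5.0, 6.0]]
Xc = center_columns(X)
bank = [identity(4), first_difference(4), moving_average3(4)]
views = apply_bank(Xc, bank)   # M = 3 views, each n x p = 3 x 4
```

Each view keeps the \(n\times p\) shape of the input, which is what allows the gating step to average views or select one of them.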

Two gating modes are supported:

soft (\(\texttt{gating\_mode}=1\)): equal-weight average

\[\mathbf{X}_{\text{AOM}}^{\text{soft}} \;=\; \frac{1}{M}\sum_{b=1}^{M}\mathbf{X}\mathbf{A}_b^{\top}.\]

hard (\(\texttt{gating\_mode}=0\)): deterministic first-operator selection,

\[\mathbf{X}_{\text{AOM}}^{\text{hard}} \;=\; \mathbf{X}\mathbf{A}_{1}^{\top}.\]
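The two modes can be sketched as follows, assuming the views are already available as equally shaped nested lists. Here gate is a hypothetical helper name; only the 0/1 encoding of gating_mode is taken from this page.

```python
def gate(views, gating_mode=0):
    """Combine M equally shaped views (lists of rows) into one matrix."""
    if not views:
        raise ValueError("the operator bank must contain at least one view")
    if gating_mode == 0:
        # hard: deterministic first-operator selection, X A_1^T
        return [list(row) for row in views[0]]
    if gating_mode == 1:
        # soft: equal-weight average over the M views
        m = len(views)
        return [[sum(vals) / m for vals in zip(*rows)] for rows in zip(*views)]
    raise ValueError(f"unsupported gating_mode: {gating_mode!r}")

views = [
    [[1.0, 2.0], [3.0, 4.0]],   # view produced by A_1
    [[3.0, 2.0], [1.0, 0.0]],   # view produced by A_2
]
hard = gate(views, gating_mode=0)   # [[1.0, 2.0], [3.0, 4.0]]
soft = gate(views, gating_mode=1)   # [[2.0, 2.0], [2.0, 2.0]]
```

Because every operator in the bank is linear, the soft output equals \(\mathbf{X}\) applied to the averaged operator \(\tfrac{1}{M}\sum_b \mathbf{A}_b\), so soft gating is itself a single strict-linear operator.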

Both modes preserve the cross-covariance identity exploited by the AOM/POP selectors: with \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) and any \(\mathbf{A}_b\) in the bank,

\[\bigl(\mathbf{X}\mathbf{A}_b^{\top}\bigr)^{\top}\mathbf{Y} \;=\; \mathbf{A}_b\,\mathbf{S},\]

so a downstream PLS step can score the whole bank by \(M\) cheap \(O(pq)\) left actions on \(\mathbf{S}\) instead of \(M\) full \(O(np)\) matrix products on \(\mathbf{X}\), where \(q\) is the number of response columns. These counts assume operators that apply in time linear in \(p\), such as fixed-width convolution filters; a dense \(\mathbf{A}_b\) costs \(O(p^2 q)\) per left action.

The motivation is that no single preprocessing is best on all calibrations, since different wavelength regions favour different transforms. The AOM-PLS and POP-PLS selectors exploit this by picking, respectively, a global operator (one \(b^{\star}\) for the whole model) or a per-component operator (one \(b_a\) for each latent direction). Predictions on new spectra reuse the absorbed operator(s) through the recovered original-space coefficients, so no preprocessing is replayed at predict time.
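The cross-covariance identity above is easy to check numerically. The sketch below uses small integer matrices so that both sides compare exactly. The data and helper names are made up for illustration, and the first-difference matrix stands in for any operator of the bank.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

# n = 3 samples, p = 4 wavelengths, q = 2 responses (integers for exact arithmetic).
X = [[-1, -1, 0, 1],
     [0, -1, -1, -1],
     [1, 2, 1, 0]]
Y = [[2, 0],
     [-1, 1],
     [-1, -1]]
# First-difference operator on a 4-point grid (last row zero by an assumed convention).
A = [[-1, 1, 0, 0],
     [0, -1, 1, 0],
     [0, 0, -1, 1],
     [0, 0, 0, 0]]

S = matmul(transpose(X), Y)   # p x q cross-covariance, computed once

# Left side: materialize the n x p view, then form its cross-covariance with Y.
lhs = matmul(transpose(matmul(X, transpose(A))), Y)
# Right side: act on S directly; no n x p view is built.
rhs = matmul(A, S)

assert lhs == rhs
```

With a bank of \(M\) operators, \(\mathbf{S}\) is computed once and each candidate only needs its own matmul(A_b, S), which is the saving the selectors rely on.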

Implementation¶

n4m_model_selection_aom_preprocessing_fit. Reference: git-pinned oracle nirs4all.operators.models.sklearn.aom_pls (sanctioned exception).

R roxygen note (methods_extra.R::aom_preprocess):

Adaptive Operator-Mixture preprocessing fit/transform.

MATLAB header (bindings/matlab/+pls4all/aom_preprocess.m):

pls4all.aom_preprocess  AOM preprocessing fit/transform.

Usage¶

Every pls4all binding tab dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in benchmarks.parity_timing.registry. Switch tabs to read the same fit in your language. The R package now ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom — those tabs appear only on the methods that have a meaningful equivalence.

pls4all bindings

C ABI · libn4m

/* C ABI — libn4m */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t*  cfg = n4m_config_create();
n4m_method_result_t* res = NULL;
n4m_model_selection_aom_preprocessing_fit(ctx, cfg, &x_view, &y_view, /* hyperparams */, &res);
/* … read coefficients / mask / scores via */
/* n4m_method_result_get_double_matrix / vector / scalar … */
n4m_method_result_destroy(res);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);

Python · pls4all (raw)

import pls4all
from pls4all._methods import aom_preprocess_fit
with pls4all.Context() as ctx, pls4all.Config() as cfg:
    res = aom_preprocess_fit(ctx, cfg, X, y)
# then: res.matrix("predictions"), res.matrix("coefficients"),
# res.vector("mask"), res.scalar("intercept"), …

Python · pls4all.sklearn

from pls4all.sklearn import aom_preprocess
result = aom_preprocess(X, y, n_components=2)

R · pls4all_method()

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("aom_preprocess", X, y,
                      n_components = 2L, params = list(n_operators = 3L, gating_mode = 0L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

R · pls4all (raw fn)

library(pls4all)
res  <- aom_preprocess(X, Y = NULL, n_operators = 1L, gating_mode = 0L)
yhat <- pls4all_predict(res, X_test)

MATLAB · pls4all (MEX)

res = pls4all.aom_preprocess(X, y, 2);
% see header of bindings/matlab/+pls4all/aom_preprocess.m for full
% parameter surface:
%   res = aom_preprocess(X, Y, n_operators, gating_mode)
yhat = predict(res, Xtest);

MATLAB · pls4all (classdef)

No idiomatic classdef wrapper — invoke pls4all.fit("aom_preprocess", X, y, …) directly from the unified MEX factory.

Registry parity references 📐

📐 nirs4all (python · python) — nirs4all in-tree · strict (rmse_rel ≤ 1e-08) — In-tree nirs4all AOM provider (sanctioned external reference). pls4all’s current primitive exposes a small operator-bank preprocessing kernel, while nirs4all exposes the full AOM/POP estimator stack; the parity remains qualitative.

Benchmarks¶

Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.

Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.

Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-08).

Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.

1 thread

Backend	Parity	200×40 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref	1.85 ms
Python · pls4all
`pls4all.python`	✓ bind	1.76 ms
`pls4all.sklearn`	✓ bind	1.75 ms🏆
R · pls4all
`pls4all.R`	✓ bind	6.07 ms
`pls4all.R.formula`	✓ bind	7.96 ms
`pls4all.R.mdatools`	✓ bind	6.56 ms
`pls4all.R.pls`	✓ bind	7.33 ms
Python · external
📐`nirs4all`	source	2.13 ms

3 threads

Backend	Parity	200×40 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref	2.95 ms
Python · pls4all
`pls4all.python`	✓ bind	1.74 ms🏆
`pls4all.sklearn`	✓ bind	3.37 ms
R · pls4all
`pls4all.R`	✓ bind	10.4 ms
`pls4all.R.formula`	✓ bind	35.7 ms
`pls4all.R.mdatools`	✓ bind	29.5 ms
`pls4all.R.pls`	✓ bind	32.9 ms
Python · external
📐`nirs4all`	source	11.5 ms

10 threads

Backend	Parity	200×40 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref	3.32 ms
Python · pls4all
`pls4all.python`	✓ bind	7.59 ms
`pls4all.sklearn`	✓ bind	4.10 ms
R · pls4all
`pls4all.R`	✓ bind	9.94 ms
`pls4all.R.formula`	✓ bind	9.00 ms
`pls4all.R.mdatools`	✓ bind	8.97 ms
`pls4all.R.pls`	✓ bind	8.32 ms
Python · external
📐`nirs4all`	source	1.87 ms🏆
