aom_pls — AOM-PLS (global adaptive operator selection)¶
Group: Adaptive · Registry tolerance: 5.0
Description¶
AOM-PLS — global adaptive operator selection
Registry note: global AOM-PLS (AOMPLS) selector using the compact strict-linear nirs4all operator bank (identity, Savitzky–Golay smoothing and derivatives, detrending and finite differences). The reference is the in-tree nirs4all estimator stack; parity remains qualitative because selection tie-breaking and CV scoring details differ across implementations.
Parameters¶
| Name | Type | Default | Notes |
|---|---|---|---|
| | | | registry benchmark cell value |
| | | | registry benchmark cell value |
| | | | registry benchmark cell value |
Explanations¶
Bibliographic source¶
Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models. arXiv:2605.13587. https://arxiv.org/abs/2605.13587.
Mathematical principle¶
AOM-PLS treats spectroscopic preprocessing as a learnable step inside the PLS calibration. Let \(\mathbf{X} \in \mathbb{R}^{n\times p}\) be the centered spectral matrix (rows = samples), \(\mathbf{Y} \in \mathbb{R}^{n\times q}\) the centered response, and \(\{\mathbf{A}_b\}_{b=1}^{B} \subset \mathbb{R}^{p\times p}\) a finite bank of strict-linear spectral operators. An operator is strict-linear when its action \(\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}\) depends only on the fixed wavelength grid (identity, Savitzky–Golay smoothing and derivatives, finite differences, polynomial detrending, Norris–Williams gap derivatives, Whittaker smoothing). SNV, MSC, EMSC, ASLS and OSC are not strict-linear, because their action depends on the spectra themselves (or, for OSC, on the response) rather than on the grid alone; nirs4all handles them as fold-local branches.
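Because a strict-linear operator is fixed by the wavelength grid alone, it can be written down once as a \(p\times p\) matrix. The sketch below, which assumes NumPy and SciPy, builds such matrices by applying a linear filter to the \(p\) unit spectra, checks that \(\mathbf{X}\mathbf{A}^{\top}\) reproduces row-wise filtering, and shows that SNV fails additivity. The `operator_matrix` and `snv` helpers and the bank entry names are hypothetical illustrations; SciPy's `savgol_filter` and `detrend` and NumPy's `gradient` stand in for the nirs4all operators, whose edge handling may differ.

```python
import numpy as np
from scipy.signal import detrend, savgol_filter

def operator_matrix(f, p):
    """Return the p x p matrix A with A @ x == f(x) for a linear map f on length-p spectra.

    A linear map is fixed by its action on the p unit spectra e_j, so column j of A is f(e_j).
    """
    return np.column_stack([f(e) for e in np.eye(p)])

def snv(x):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    return (x - x.mean()) / x.std()

p = 40
# Hypothetical bank entries; SciPy/NumPy filters stand in for the nirs4all operators.
bank = {
    "identity": operator_matrix(lambda x: x, p),
    "savgol_smooth_5_2": operator_matrix(lambda x: savgol_filter(x, 5, 2), p),
    "savgol_deriv1_11_2": operator_matrix(lambda x: savgol_filter(x, 11, 2, deriv=1), p),
    "finite_difference_1": operator_matrix(np.gradient, p),
    "detrend_linear": operator_matrix(detrend, p),
}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, p))   # synthetic spectra, rows = samples

# Strict-linear: filtering each row equals the single matrix product X @ A.T.
A = bank["savgol_smooth_5_2"]
print(np.allclose(X @ A.T, savgol_filter(X, 5, 2, axis=1)))   # True

# SNV is not linear: it fails additivity, so no fixed matrix A represents it.
x1, x2 = X[0], X[1]
print(np.allclose(snv(x1 + x2), snv(x1) + snv(x2)))           # False
```

Building each matrix from unit spectra works for any linear filter, which is why the bank can be precomputed once per wavelength grid.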
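Cross-covariance identity (Eq. 2 of the paper). For centered \(\mathbf{X}, \mathbf{Y}\) and any strict-linear \(\mathbf{A}\),
\[
(\mathbf{X}\mathbf{A}^{\top})^{\top}\mathbf{Y} = \mathbf{A}\,\mathbf{X}^{\top}\mathbf{Y},
\]
and \(\mathbf{X}\mathbf{A}^{\top}\) stays column-centered, since \(\mathbf{1}^{\top}\mathbf{X} = \mathbf{0}\) implies \(\mathbf{1}^{\top}\mathbf{X}\mathbf{A}^{\top} = \mathbf{0}\).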
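Writing \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\), every operator can therefore be screened by the cheap left action \(\mathbf{S}_b = \mathbf{A}_b\mathbf{S}\) instead of materializing \(\mathbf{X}_b\) and recomputing its cross-covariance. When \(\mathbf{A}_b\) is banded with a fixed width (Savitzky–Golay filters, finite differences), the left action costs \(O(p q)\) per candidate, whereas materializing \(\mathbf{X}_b\) costs at least \(O(n p)\).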
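The identity can be checked numerically. This NumPy sketch uses synthetic data and two hypothetical bank entries, compares the materialized cross-covariance with the left action \(\mathbf{A}_b\mathbf{S}\), and scores each operator with the covariance proxy \(-\lVert\mathbf{A}_b\mathbf{S}\rVert\) (the Frobenius norm is an assumption). The operators are stored as dense matrices for readability, so this sketch does not reproduce the cost of a banded implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 40, 2
X = rng.normal(size=(n, p))
Y = X[:, 10:12] + 0.1 * rng.normal(size=(n, q))   # synthetic two-column response

Xc = X - X.mean(axis=0)     # column-centred spectra
Yc = Y - Y.mean(axis=0)     # column-centred response
S = Xc.T @ Yc               # p x q cross-covariance, computed once

# Hypothetical bank. Dense matrices are used for clarity; a banded
# implementation is what brings each left action down to O(pq).
bank = {
    "identity": np.eye(p),
    "finite_difference": np.gradient(np.eye(p), axis=0),  # A @ x == np.gradient(x)
}

proxy = {}
for name, A in bank.items():
    Sb_direct = (Xc @ A.T).T @ Yc   # materialise X_b, then its cross-covariance
    Sb_cheap = A @ S                # left action on the shared S
    assert np.allclose(Sb_direct, Sb_cheap)
    proxy[name] = -np.linalg.norm(Sb_cheap)   # covariance criterion: lower is better

print(min(proxy, key=proxy.get))    # operator ranked first by the prescreen
```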
Global selection (the AOM in AOM-PLS). A single operator index \(b^{\star}\) is chosen for all \(K\) components by minimising a selection criterion \(\mathcal{C}\) over \(b\):
\[
b^{\star} = \arg\min_{b \in \{1,\dots,B\}} \mathcal{C}(b).
\]
The default criterion is cross-validated RMSE over the validation folds (criterion='cv'); alternatives include the covariance proxy \(-\lVert\mathbf{A}_b\mathbf{S}\rVert\) (a fast prescreen), a leverage-corrected approximate PRESS, and a hybrid that prescreens by covariance and then refines by CV. When auto_prefix=True, the prefix length \(k \le K\) (the number of components actually used) is selected jointly with \(b\).
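A minimal sketch of global selection with the CV criterion and joint prefix selection, assuming a plain NIPALS PLS1 fitter, contiguous folds and a three-operator bank. The helpers `pls1_coefs` and `cv_rmse_curve`, the bank names and the synthetic data are hypothetical. The sketch materializes \(\mathbf{X}\mathbf{A}_b^{\top}\) in each fold for simplicity, so it illustrates only the argmin over \((b, k)\), not the n4m selector or its covariance engine.

```python
import numpy as np

def pls1_coefs(X, y, K):
    """NIPALS PLS1 on centred X (n x p), y (n,); returns coefficients for 1..K components."""
    Xd = X.copy()
    W, P, q, coefs = [], [], [], []
    for _ in range(K):
        w = Xd.T @ y
        w /= np.linalg.norm(w)
        t = Xd @ w
        tt = t @ t
        p_load = Xd.T @ t / tt
        Xd = Xd - np.outer(t, p_load)       # deflate X (y needs no deflation in PLS1)
        W.append(w)
        P.append(p_load)
        q.append(y @ t / tt)
        Wm, Pm = np.column_stack(W), np.column_stack(P)
        coefs.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))  # W (P^T W)^{-1} q
    return coefs

def cv_rmse_curve(X, y, A, K, n_folds):
    """CV-RMSE of PLS1 on X @ A.T for each prefix k = 1..K, using contiguous folds."""
    n = len(y)
    sq_err = np.zeros(K)
    for test in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), test)
        Xtr, Xte = X[train] @ A.T, X[test] @ A.T
        x_mean, y_mean = Xtr.mean(axis=0), y[train].mean()
        for k, b in enumerate(pls1_coefs(Xtr - x_mean, y[train] - y_mean, K)):
            pred = (Xte - x_mean) @ b + y_mean
            sq_err[k] += np.sum((y[test] - pred) ** 2)
    return np.sqrt(sq_err / n)

# Synthetic spectra: one analyte peak plus a sample-specific offset and slope.
rng = np.random.default_rng(2)
n, p, K = 120, 40, 3
grid = np.linspace(0.0, 1.0, p)
conc = rng.uniform(0.0, 1.0, n)
peak = np.exp(-((grid - 0.5) / 0.05) ** 2)
X = (conc[:, None] * peak
     + rng.normal(0.0, 2.0, (n, 1))            # additive offset
     + rng.normal(0.0, 2.0, (n, 1)) * grid     # linear baseline slope
     + 0.01 * rng.normal(size=(n, p)))
y = conc

G = np.column_stack([np.ones(p), grid])
bank = {
    "identity": np.eye(p),
    "finite_difference": np.gradient(np.eye(p), axis=0),
    "detrend_linear": np.eye(p) - G @ np.linalg.pinv(G),   # remove least-squares line
}

curves = {name: cv_rmse_curve(X, y, A, K, n_folds=4) for name, A in bank.items()}
scores = {(name, k + 1): rmse for name, c in curves.items() for k, rmse in enumerate(c)}
best_op, best_k = min(scores, key=scores.get)   # joint (operator, prefix) argmin
print(best_op, best_k, scores[(best_op, best_k)])
```

Fixing \(k = K\) instead of minimising over all prefixes would correspond to auto_prefix=False.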
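SIMPLS-covariance engine. With \(\mathbf{S}_b = \mathbf{A}_b\mathbf{S}\) precomputed, SIMPLS extracts component \(a\) from the leading left singular vector \(\mathbf{r}_a = \mathbf{u}_1(\mathbf{S}_b)\) (for \(a > 1\), of \(\mathbf{S}_b\) deflated against the previous loadings, as in standard SIMPLS) and maps it back to the original wavelength grid through the operator's adjoint \(\mathbf{A}_b^{\top}\):
\[
\mathbf{z}_a = \mathbf{A}_b^{\top}\mathbf{r}_a, \qquad \mathbf{t}_a = \mathbf{X}_b\mathbf{r}_a = \mathbf{X}\mathbf{z}_a.
\]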
Stacking \(\mathbf{Z} = [\mathbf{z}_1\;\cdots\;\mathbf{z}_K]\) and \(\mathbf{T} = [\mathbf{t}_1\;\cdots\;\mathbf{t}_K]\), with original-space loadings \(\mathbf{P} = \mathbf{X}^{\top}\mathbf{T}\,\operatorname{diag}(1/\lVert\mathbf{t}_a\rVert^{2})\) and \(\mathbf{Q} = \mathbf{Y}^{\top}\mathbf{T}\,\operatorname{diag}(1/\lVert\mathbf{t}_a\rVert^{2})\), the recovered coefficient matrix is
\[
\mathbf{B} = \mathbf{Z}\,(\mathbf{P}^{\top}\mathbf{Z})^{-1}\mathbf{Q}^{\top}.
\]
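The engine can be sketched in NumPy for centred \(\mathbf{X}, \mathbf{Y}\). Only \(\mathbf{S}_b = \mathbf{A}_b\mathbf{S}\) is formed; weights are mapped back through \(\mathbf{A}_b^{\top}\), and \(\mathbf{S}_b\) is deflated against the operator-space loadings \(\mathbf{A}_b\mathbf{p}_a\) as in standard SIMPLS. The function name `simpls_operator` is hypothetical, and the sketch omits the bank loop, cross-validation and any scaling details of the native kernel.

```python
import numpy as np

def simpls_operator(X, Y, A, K):
    """SIMPLS on the implicitly preprocessed spectra X_b = X @ A.T (hypothetical helper).

    X (n x p) and Y (n x q) must be column-centred. Only S_b = A @ (X.T @ Y) is
    formed; X_b is never materialised. Returns the p x q coefficient matrix B
    on the raw wavelength grid.
    """
    Sb = A @ (X.T @ Y)                  # operator-space cross-covariance S_b
    Z, T, P, V = [], [], [], []
    for _ in range(K):
        r = np.linalg.svd(Sb, full_matrices=False)[0][:, 0]  # leading left singular vector
        z = A.T @ r                     # adjoint map to the raw grid: z_a = A^T r_a
        t = X @ z                       # score t_a = X z_a = X_b r_a
        tt = t @ t
        p_raw = X.T @ t / tt            # original-space loading p_a
        v = A @ p_raw                   # same loading in operator space (X_b^T t / tt)
        for v_prev in V:                # Gram-Schmidt against earlier loadings
            v = v - v_prev * (v_prev @ v)
        v = v / np.linalg.norm(v)
        Sb = Sb - np.outer(v, v @ Sb)   # SIMPLS deflation keeps the scores orthogonal
        V.append(v)
        Z.append(z)
        T.append(t)
        P.append(p_raw)
    Z, T, P = np.column_stack(Z), np.column_stack(T), np.column_stack(P)
    Q = (Y.T @ T) / np.sum(T * T, axis=0)      # Q = Y^T T diag(1/||t_a||^2)
    return Z @ np.linalg.solve(P.T @ Z, Q.T)   # B = Z (P^T Z)^{-1} Q^T

rng = np.random.default_rng(3)
n, p, K = 60, 30, 3
X = rng.normal(size=(n, p))
Y = X[:, :3] @ np.array([[1.0], [-2.0], [0.5]]) + 0.1 * rng.normal(size=(n, 1))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
A = np.gradient(np.eye(p), axis=0)      # illustrative finite-difference operator

B = simpls_operator(Xc, Yc, A, K)                  # covariance engine, raw-grid B
B_b = simpls_operator(Xc @ A.T, Yc, np.eye(p), K)  # plain SIMPLS on materialised X_b
print(np.allclose(B, A.T @ B_b))                   # True
```

Fitting on the materialized \(\mathbf{X}_b\) and mapping those coefficients through \(\mathbf{A}_b^{\top}\) gives the same \(\mathbf{B}\). Because the scores are orthogonal, \(\mathbf{P}^{\top}\mathbf{Z}\) is the identity here, so the formula also reduces to \(\mathbf{Z}\mathbf{Q}^{\top}\).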
Because \(\mathbf{B}\) lives in the original feature space, the fitted model is a single linear calibration on the raw wavelength grid: there is no preprocessing stage to replay at predict time, since the operator has been absorbed into the coefficients (paper §3.2). Computationally, exploring the bank costs roughly one SIMPLS fit on \(\mathbf{S}\) plus \(B\) small left actions; this algorithmic gain keeps AOM-PLS comparable in cost to vanilla PLS even with a \(\sim\)77-operator default bank.
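Absorption into the coefficients holds for any linear calibration fitted on \(\mathbf{X}_b\), so the sketch below uses ordinary least squares as a stand-in for the selected PLS model. It maps the operator-space coefficients through \(\mathbf{A}^{\top}\), derives the intercept from the raw training means, and checks that predicting on raw spectra matches preprocess-then-predict. The names `coef` and `intercept` mirror the input_coefficients and intercept buffers described under Implementation, but the computation is an illustrative assumption, not the native code path.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 25
X = rng.normal(size=(n, p)) + 3.0            # raw training spectra with a common offset
y = X[:, 5] - X[:, 6] + 0.05 * rng.normal(size=n)
A = np.gradient(np.eye(p), axis=0)           # illustrative strict-linear operator

# Fit a linear calibration on the preprocessed, centred training spectra.
Xb = X @ A.T
xb_mean, y_mean = Xb.mean(axis=0), y.mean()
beta_b, *_ = np.linalg.lstsq(Xb - xb_mean, y - y_mean, rcond=None)  # stand-in for PLS

# Absorb the operator: raw-grid coefficients and intercept.
coef = A.T @ beta_b                          # B = A^T B_b
intercept = y_mean - X.mean(axis=0) @ coef   # uses A @ mean(X) == mean(X @ A.T)

X_new = rng.normal(size=(10, p)) + 3.0
pred_replay = (X_new @ A.T - xb_mean) @ beta_b + y_mean   # preprocess, then predict
pred_raw = X_new @ coef + intercept                        # single raw-grid model
print(np.allclose(pred_replay, pred_raw))                  # True
```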
Implementation¶
Implemented by n4m_aom_global_select in the native C ABI. Python exposes it as
n4m.aom_global_select and through the catalog alias n4m.aom_pls; the wrapper builds
the compact strict-linear bank by default and also accepts caller-provided
strict-linear operators. The result buffers include input_coefficients and intercept,
so callers can apply the selected model to new spectra as
X_new @ input_coefficients + intercept without replaying the selected
operator. The sklearn-style n4m.sklearn.NativeAOMPLSRegressor wraps the same
native result. Reference: git-pinned oracle
nirs4all.operators.models.sklearn.aom_pls.AOMPLSRegressor (sanctioned
exception).
MATLAB header (bindings/matlab/+pls4all/aom_pls.m):
pls4all.aom_pls AOM-PLS global operator selection.
Usage¶
Every pls4all binding dispatches into the same C kernel; the external libraries listed under Registry parity references are the parity references registered in benchmarks.parity_timing.registry. The examples below show the same fit in each binding language. The R package also ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom; those variants appear only for methods where the equivalence is meaningful.
pls4all bindings
```c
/* C ABI: libn4m AOM/POP selector path.
 * x_view and y_view are caller-prepared views of X (n x p) and y (n x 1). */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t* cfg = n4m_config_create();
n4m_operator_bank_t* bank = NULL;
n4m_validation_plan_t* plan = NULL;
n4m_aom_global_result_t* res = NULL;

n4m_operator_bank_create(&bank);
/* add compact nirs4all-style operators: identity, SG, detrend, FD */
n4m_validation_plan_create(&plan);
/* fill CV folds on plan */

n4m_aom_global_select(ctx, cfg, bank, &x_view, &y_view, plan,
                      /* max_components */ 2, &res);
/* read predictions and selection diagnostics via result getters */

/* release resources in reverse order of creation */
n4m_aom_global_result_destroy(res);
n4m_validation_plan_destroy(plan);
n4m_operator_bank_destroy(bank);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);
```
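```python
import n4m

# X: (n, p) spectra, y: (n,) response, X_new: (m, p) new spectra, as NumPy arrays
res = n4m.aom_pls(
    X,
    y,
    max_components=2,
    cv=4,
    operators=[
        "identity",
        ("savgol_smooth", [5, 2]),
        ("finite_difference", [1]),
    ],
)
yhat = res["predictions"]
rmse_curves = res["rmse_curves"]        # selection diagnostics
coef = res["input_coefficients"]        # coefficients on the raw wavelength grid
intercept = res["intercept"]
yhat_new = X_new @ coef + intercept     # no operator replay needed
```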
```python
from n4m.sklearn import NativeAOMPLSRegressor

model = NativeAOMPLSRegressor(max_components=2, cv=4).fit(X, y)
yhat_new = model.predict(X_new)
```
```r
library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("aom_pls", X, y,
  n_components = 2L,
  params = list(max_components = 3L, n_operators = 9L, cv = 3L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.
```
```matlab
res = pls4all.aom_pls(X, y, 2);
% See the header of bindings/matlab/+pls4all/aom_pls.m for the full
% parameter surface:
%   res = aom_pls(X, Y, max_components, n_operators, cv)
yhat = predict(res, Xtest);
```
No idiomatic classdef wrapper exists; invoke pls4all.fit("aom_pls", X, y, …) directly from the unified MEX factory.
Registry parity references 📐
📐 nirs4all (python · python): nirs4all in-tree · qualitative (rmse_rel ≤ 5e+00). In-tree nirs4all AOM/POP estimator stack (sanctioned reference). The pls4all ABI uses the same compact strict-linear bank and contiguous folds for cross-binding determinism; nirs4all remains the qualitative algorithmic reference.
Benchmarks¶
Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.
Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.
Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-08).
Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.
| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 6e-16 | 4.20 |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 4.16 |
| pls4all.sklearn | ✓ bind | 4.15 🏆 |
| R · pls4all | | |
| pls4all.R | ✓ 6e-15 | 7.84 |
| pls4all.R.formula | ✓ 6e-15 | 9.43 |
| pls4all.R.mdatools | ✓ 6e-15 | 17.6 |
| pls4all.R.pls | ✓ 6e-15 | 9.87 |
| Python · external | | |
| 📐 nirs4all | source | 41.9 |
| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 6e-16 | 14.6 |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 8.48 🏆 |
| pls4all.sklearn | ✓ bind | 14.7 |
| R · pls4all | | |
| pls4all.R | ✓ 6e-15 | 27.0 |
| pls4all.R.formula | ✓ 6e-15 | 40.4 |
| pls4all.R.mdatools | ✓ 6e-15 | 50.0 |
| pls4all.R.pls | ✓ 6e-15 | 47.9 |
| Python · external | | |
| 📐 nirs4all | source | 59.7 |
| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 6e-16 | 23.7 |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 23.7 |
| pls4all.sklearn | ✓ bind | 15.5 |
| R · pls4all | | |
| pls4all.R | ✓ 6e-15 | 13.4 |
| pls4all.R.formula | ✓ 6e-15 | 16.4 |
| pls4all.R.mdatools | ✓ 6e-15 | 12.9 |
| pls4all.R.pls | ✓ 6e-15 | 12.4 🏆 |
| Python · external | | |
| 📐 nirs4all | source | 16.6 |
See also: benchmark overview · methods index · interactive dashboard