# `aom_pls`: AOM-PLS (global adaptive operator selection)

_Group_: **Adaptive** · _Registry tolerance_: `5.0`

## Description

AOM-PLS: global adaptive operator selection.

> **Registry note**: global AOM-PLS (AOMPLS) selector over the compact strict-linear nirs4all bank (identity, Savitzky–Golay smoothing and derivatives, detrending, and finite-difference operators). The reference is the in-tree nirs4all estimator stack. Parity remains qualitative because selection tie-breaking and CV scoring details differ between implementations.

### Parameters

| Name | Type | Default | Notes |
|------|------|---------|-------|
| `max_components` | `int` | `3` | registry benchmark cell value |
| `n_operators` | `int` | `9` | registry benchmark cell value |
| `cv` | `int` | `3` | registry benchmark cell value |

## Explanations

### Bibliographic source

Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). *Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models*. arXiv:2605.13587. https://arxiv.org/abs/2605.13587.

### Mathematical principle

AOM-PLS treats spectroscopic preprocessing as a learnable step *inside* the PLS calibration. Let $\mathbf{X} \in \mathbb{R}^{n\times p}$ be the centered spectral matrix (rows = samples), $\mathbf{Y} \in \mathbb{R}^{n\times q}$ the centered response, and $\{\mathbf{A}_b\}_{b=1}^{B} \subset \mathbb{R}^{p\times p}$ a finite bank of **strict-linear** spectral operators. An operator is strict-linear when its action $\mathbf{X}_b = \mathbf{X}\mathbf{A}_b^{\top}$ uses a fixed matrix that depends only on the wavelength grid, not on the samples or the response. Examples are identity, Savitzky–Golay smoothing and derivatives, finite differences, polynomial detrending, Norris–Williams gap derivatives, and Whittaker smoothing. SNV, MSC, EMSC, ASLS and OSC are *not* strict-linear and are handled as fold-local branches in `nirs4all`.

**Cross-covariance identity** (Eq. 2 of the paper).
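For centered $\mathbf{X}, \mathbf{Y}$ and any strict-linear $\mathbf{A}$,

$$\bigl(\mathbf{X}\mathbf{A}^{\top}\bigr)^{\top}\mathbf{Y} \;=\; \mathbf{A}\,\mathbf{X}^{\top}\mathbf{Y}.$$

Because $\mathbf{A}$ is a fixed linear map, it sends zero column means to zero column means, so $\mathbf{X}\mathbf{A}^{\top}$ is still centered and the left-hand side is still a cross-covariance. Writing $\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}$ ($p\times q$, computed once), every operator can therefore be *screened* by the cheap left action $\mathbf{S}_b = \mathbf{A}_b\mathbf{S}$ instead of materializing $\mathbf{X}_b$. This costs roughly $O(pq)$ per candidate for operators with a fixed stencil, versus $O(np)$ to build $\mathbf{X}_b$, and the cost no longer depends on $n$.

**Global selection (the AOM in AOM-PLS).** A single operator index $b^{\star}$ is chosen for *all* $K$ components by minimizing a selection criterion $\mathcal{C}$ over $b$:

$$b^{\star} \;=\; \operatorname*{arg\,min}_{b\in\{1,\dots,B\}} \; \mathcal{C}\!\bigl(\text{SIMPLS}(\mathbf{X}_b, \mathbf{Y}; K)\bigr).$$

The default criterion is the cross-validated RMSE (`criterion='cv'`). The fold count is the `cv` parameter, which is distinct from the component count $K$. Alternatives include the covariance proxy $-\lVert\mathbf{A}_b\mathbf{S}\rVert$ (a fast prescreen), a leverage-corrected approximate PRESS, and a hybrid that prescreens by covariance and then refines by CV. When `auto_prefix=True`, the number of retained components $k \le K$ (a prefix of the extracted components) is selected jointly with $b^{\star}$.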
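The cross-covariance identity and the covariance prescreen can be checked numerically on small matrices. The NumPy sketch below builds dense $p\times p$ stand-ins for three strict-linear operators: a Savitzky–Golay smoother whose edge rows are left as identity, a first-order forward difference, and a linear detrend. For each operator it verifies that centered data stay centered and that $(\mathbf{X}\mathbf{A}^{\top})^{\top}\mathbf{Y} = \mathbf{A}\mathbf{S}$. It then ranks the bank by the proxy $-\lVert\mathbf{A}_b\mathbf{S}\rVert$. The helper names, the edge handling, and the synthetic data are illustrative assumptions. They are not the pls4all or nirs4all operator definitions.

```python
import numpy as np

# Hypothetical helpers: dense stand-ins for a few strict-linear operators,
# NOT the pls4all/nirs4all implementations (which apply operators natively).

def savgol_smooth_matrix(p, window=5, polyorder=2):
    """Savitzky-Golay smoothing as a p x p matrix; edge rows stay identity."""
    m = window // 2
    offsets = np.arange(-m, m + 1, dtype=float)
    V = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(V)[0]          # fitted polynomial evaluated at offset 0
    A = np.eye(p)
    for i in range(m, p - m):
        A[i, :] = 0.0
        A[i, i - m:i + m + 1] = weights
    return A

def finite_difference_matrix(p):
    """First-order forward difference; the last row is zero."""
    A = np.zeros((p, p))
    A[: p - 1] = np.diff(np.eye(p), n=1, axis=0)
    return A

def linear_detrend_matrix(p):
    """I - H, where H projects onto span{1, wavelength index}."""
    V = np.vander(np.arange(p, dtype=float), 2, increasing=True)
    H = V @ np.linalg.solve(V.T @ V, V.T)
    return np.eye(p) - H

rng = np.random.default_rng(0)
n, p, q = 40, 60, 1
X = rng.normal(size=(n, p)).cumsum(axis=1)     # synthetic smooth-ish "spectra"
Y = X[:, 20:21] + 0.1 * rng.normal(size=(n, q))

Xc = X - X.mean(axis=0)   # centered spectra
Yc = Y - Y.mean(axis=0)   # centered response
S = Xc.T @ Yc             # p x q cross-covariance, computed once

bank = {
    "identity": np.eye(p),
    "savgol_smooth": savgol_smooth_matrix(p),
    "finite_difference": finite_difference_matrix(p),
    "detrend": linear_detrend_matrix(p),
}

for name, A in bank.items():
    Xb = Xc @ A.T
    # A fixed linear map keeps centered data centered...
    assert np.allclose(Xb.mean(axis=0), 0.0)
    # ...so the operator can be screened through S alone (Eq. 2).
    assert np.allclose(Xb.T @ Yc, A @ S)

# Covariance-proxy prescreen: criterion -||A_b S||, smaller is better.
proxy = {name: -np.linalg.norm(A @ S) for name, A in bank.items()}
ranking = sorted(proxy, key=proxy.get)
print(ranking)
```

Only the $p\times q$ matrix `S` is reused across candidates. No $n\times p$ transformed copy of the spectra is needed to score an operator with the proxy.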

**SIMPLS-covariance engine.** With $\mathbf{S}_b = \mathbf{A}_b\mathbf{S}$ precomputed, SIMPLS extracts component $a$ from the leading left singular vector $\mathbf{r}_a = \mathbf{u}_1(\mathbf{S}_b)$ of the current $\mathbf{S}_b$. As in standard SIMPLS, $\mathbf{S}_b$ is deflated against the loadings already extracted before the next component is computed. Each $\mathbf{r}_a$ is mapped back to the original wavelength grid through the operator's adjoint:

$$\mathbf{z}_a \;=\; \mathbf{A}_{b^{\star}}^{\top}\,\mathbf{r}_a, \qquad \mathbf{t}_a = \mathbf{X}\mathbf{z}_a.$$

This mapping is exact, since $\mathbf{X}_{b^{\star}}\mathbf{r}_a = \mathbf{X}\mathbf{A}_{b^{\star}}^{\top}\mathbf{r}_a = \mathbf{X}\mathbf{z}_a$. Stack $\mathbf{Z} = [\mathbf{z}_1\;\cdots\;\mathbf{z}_K]$ and $\mathbf{T} = [\mathbf{t}_1\;\cdots\;\mathbf{t}_K]$, and define the original-space loadings $\mathbf{P} = \mathbf{X}^{\top}\mathbf{T}\,\operatorname{diag}(1/\lVert\mathbf{t}_a\rVert^{2})$ and $\mathbf{Q} = \mathbf{Y}^{\top}\mathbf{T}\,\operatorname{diag}(1/\lVert\mathbf{t}_a\rVert^{2})$. The recovered coefficient matrix is then

$$\mathbf{B} \;=\; \mathbf{Z}\,\bigl(\mathbf{P}^{\top}\mathbf{Z}\bigr)^{+}\mathbf{Q}^{\top}, \qquad \hat{\mathbf{Y}}(\mathbf{X}^{\star}) = \mathbf{X}^{\star}\mathbf{B}.$$

Here $\mathbf{X}^{\star}$ is centered with the training mean spectrum $\bar{\mathbf{x}}$. On raw spectra the prediction reads $\mathbf{X}^{\star}\mathbf{B} + \mathbf{1}\mathbf{b}_0^{\top}$, with intercept $\mathbf{b}_0^{\top} = \bar{\mathbf{y}}^{\top} - \bar{\mathbf{x}}^{\top}\mathbf{B}$, where $\bar{\mathbf{y}}$ is the training response mean. Because $\mathbf{B}$ lives in the *original* feature space, **the fitted model is a single linear calibration on the raw wavelength grid: there is no preprocessing stage to replay at predict time**. The operator has been absorbed into the coefficients (paper §3.2). Computationally, exploring the bank costs roughly one SIMPLS fit on $\mathbf{S}$ plus $B$ small left actions. This algorithmic gain keeps AOM-PLS comparable in cost to vanilla PLS even with a default bank of $\sim 77$ operators.

### Implementation

The method is implemented natively as `n4m_aom_global_select` in the C ABI. Python exposes it as `n4m.aom_global_select` and under the catalog alias `n4m.aom_pls`. The wrapper builds the compact strict-linear bank by default and also accepts caller-provided strict-linear operators. The result buffers include `input_coefficients` and `intercept`, so callers can apply the selected model to new spectra as `X_new @ input_coefficients + intercept` without replaying the selected operator. The sklearn-style `n4m.sklearn.NativeAOMPLSRegressor` wraps the same native result.
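The SIMPLS-covariance engine, global CV selection, and the absorbed intercept can be sketched end to end in NumPy. This is a minimal sketch under stated assumptions, not the libn4m kernel:

- operators are dense matrices;
- the bank holds only identity and two finite differences;
- folds are contiguous, as the parity reference note states for the pls4all ABI;
- the criterion is plain CV-RMSE, with no `auto_prefix`;
- `simpls_operator` and `cv_rmse` are hypothetical helper names.

The final assertion checks that $\mathbf{X}\mathbf{B}$ on the raw grid reproduces a SIMPLS fit on the materialized $\mathbf{X}_{b^{\star}}$.

```python
import numpy as np

# Hypothetical sketch of AOM-PLS global selection; not the libn4m kernel.

def finite_difference_matrix(p, order=1):
    """Forward difference of the given order as a p x p matrix (trailing rows zero)."""
    A = np.zeros((p, p))
    A[: p - order] = np.diff(np.eye(p), n=order, axis=0)
    return A

def simpls_operator(Xc, Yc, A, K):
    """SIMPLS driven by S_b = A S; returns p x q coefficients on the raw grid."""
    S_b = A @ (Xc.T @ Yc)                 # screened cross-covariance
    Z, T, V = [], [], []
    for _ in range(K):
        r = np.linalg.svd(S_b, full_matrices=False)[0][:, 0]  # leading left singular vector
        z = A.T @ r                       # adjoint maps the weight to raw wavelengths
        t = Xc @ z                        # score, equal to X_b r
        p_b = A @ (Xc.T @ t) / (t @ t)    # loading of X_b = X A^T
        v = p_b - sum((u @ p_b) * u for u in V)  # Gram-Schmidt vs earlier loadings
        v /= np.linalg.norm(v)
        S_b = S_b - np.outer(v, v @ S_b)  # SIMPLS deflation
        Z.append(z)
        T.append(t)
        V.append(v)
    Z, T = np.column_stack(Z), np.column_stack(T)
    inv_tt = 1.0 / np.sum(T * T, axis=0)  # 1 / ||t_a||^2
    P = (Xc.T @ T) * inv_tt               # original-space loadings
    Q = (Yc.T @ T) * inv_tt
    return Z @ np.linalg.pinv(P.T @ Z) @ Q.T

def cv_rmse(X, Y, A, K, n_folds):
    """CV-RMSE over contiguous folds, centering with training-fold means."""
    idx = np.arange(X.shape[0])
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        x_mean, y_mean = X[train].mean(axis=0), Y[train].mean(axis=0)
        B = simpls_operator(X[train] - x_mean, Y[train] - y_mean, A, K)
        pred = (X[test] - x_mean) @ B + y_mean
        sq_err += np.sum((Y[test] - pred) ** 2)
    return np.sqrt(sq_err / Y.size)

rng = np.random.default_rng(1)
n, p, K = 60, 80, 3
X = rng.normal(size=(n, p)).cumsum(axis=1)     # synthetic random-walk "spectra"
Y = X[:, 35:36] - X[:, 30:31] + 0.05 * rng.normal(size=(n, 1))

bank = {
    "identity": np.eye(p),
    "fd1": finite_difference_matrix(p, 1),
    "fd2": finite_difference_matrix(p, 2),
}

# Global selection: one operator for all K components, chosen by CV-RMSE.
scores = {name: cv_rmse(X, Y, A, K, n_folds=3) for name, A in bank.items()}
best = min(scores, key=scores.get)

# Refit on all samples; the selected operator ends up inside B.
x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
B = simpls_operator(X - x_mean, Y - y_mean, bank[best], K)
intercept = y_mean - x_mean @ B
Y_hat = X @ B + intercept                      # raw spectra, no operator replay

# Sanity check: same fit as SIMPLS on the materialized X_b = X A^T.
A = bank[best]
Xb = (X - x_mean) @ A.T
B_b = simpls_operator(Xb, Y - y_mean, np.eye(p), K)
assert np.allclose(Xb @ B_b, (X - x_mean) @ B)
```

`B` and `intercept` already carry the selected operator, so scoring new spectra needs only `X_new @ B + intercept`. This mirrors the `input_coefficients` and `intercept` buffers of the native result.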

Reference: git-pinned oracle `nirs4all.operators.models.sklearn.aom_pls.AOMPLSRegressor` (sanctioned exception).

MATLAB header (`bindings/matlab/+pls4all/aom_pls.m`):

```text
pls4all.aom_pls AOM-PLS global operator selection.
```

### Usage

Every pls4all binding tab dispatches into the same C kernel. The external libraries listed at the bottom of the page are the parity references registered in `benchmarks.parity_timing.registry`. Switch tabs to read the same fit in your language. The R package also ships drop-in-compatible facades for the CRAN `pls` package (`plsr`, `pcr`, `mvr`) and for the `mdatools::pls(x, y, ...)` matrix idiom. Those tabs appear only on methods that have a meaningful equivalence.

**pls4all bindings**

::::{tab-set}
:class: pls4all-bindings

:::{tab-item} C ABI · libn4m
:sync: c
:class-label: lang-c

```c
/* C ABI: libn4m AOM/POP selector path.
 * x_view / y_view are matrix views over caller-owned X and y;
 * their construction is not shown here. */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t* cfg = n4m_config_create();
n4m_operator_bank_t* bank = NULL;
n4m_validation_plan_t* plan = NULL;
n4m_aom_global_result_t* res = NULL;

n4m_operator_bank_create(&bank);
/* add compact nirs4all-style operators: identity, SG, detrend, FD */

n4m_validation_plan_create(&plan);
/* fill CV folds on plan */

n4m_aom_global_select(ctx, cfg, bank, &x_view, &y_view, plan,
                      /* max_components */ 2, &res);
/* read predictions and selection diagnostics via result getters */

/* release handles in reverse order of creation */
n4m_aom_global_result_destroy(res);
n4m_validation_plan_destroy(plan);
n4m_operator_bank_destroy(bank);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);
```

:::

:::{tab-item} Python · pls4all (raw)
:sync: python-raw
:class-label: lang-python

```python
import n4m

# X: (n, p) training spectra, y: response, X_new: spectra to predict.
res = n4m.aom_pls(
    X, y,
    max_components=2,
    cv=4,
    operators=[
        "identity",
        ("savgol_smooth", [5, 2]),
        ("finite_difference", [1]),
    ],
)
yhat = res["predictions"]            # predictions returned by the native result
rmse_curves = res["rmse_curves"]     # RMSE curves from the selection
coef = res["input_coefficients"]     # coefficients on the raw wavelength grid
intercept = res["intercept"]
yhat_new = X_new @ coef + intercept  # no operator replay needed
```

:::

:::{tab-item} Python · pls4all.sklearn
:sync: python-sklearn
:class-label: lang-python

```python
from n4m.sklearn import NativeAOMPLSRegressor

# Same native kernel behind a scikit-learn style estimator.
model = NativeAOMPLSRegressor(max_components=2, cv=4).fit(X, y)
yhat_new = model.predict(X_new)
```

:::

:::{tab-item} R · pls4all_method()
:sync: r-dispatcher
:class-label: lang-r

```r
library(pls4all)

# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("aom_pls", X, y, n_components = 2L,
                      params = list(max_components = 3L,
                                    n_operators = 9L,
                                    cv = 3L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.
```

:::

:::{tab-item} MATLAB · pls4all (MEX)
:sync: matlab-mex
:class-label: lang-matlab

```matlab
res = pls4all.aom_pls(X, y, 2);
% See the header of bindings/matlab/+pls4all/aom_pls.m for the full
% parameter surface:
%   res = pls4all.aom_pls(X, Y, max_components, n_operators, cv)
yhat = predict(res, Xtest);
```

:::

:::{tab-item} MATLAB · pls4all (classdef)
:sync: matlab-classdef
:class-label: lang-matlab

_No idiomatic classdef wrapper; invoke `pls4all.fit("aom_pls", X, y, …)` directly from the unified MEX factory._

:::

::::

**Registry parity references** 📐

:::{card}
:class-card: external-refs

- 📐 **`nirs4all`** (python · python): `nirs4all` in-tree · qualitative (rmse_rel ≤ 5e+00). In-tree nirs4all AOM/POP estimator stack (sanctioned reference). The pls4all ABI uses the same compact strict-linear bank and contiguous folds for cross-binding determinism; nirs4all remains the qualitative algorithmic reference.

:::

### Benchmarks

Adaptive wall-clock per cell measured against [`full_matrix.csv`](../benchmarks/overview.md). Only backends that implement this method are listed; libraries without the method are omitted.

**Verdict**  ·  ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance  ·  ✓ bind = pls4all binding agrees with the C++ baseline  ·  ✗ divergent  ·  ⚠ error  ·  — not run. The fastest backend per column is marked 🏆.

**Reference gate**: qualitative, i.e. a shape/smoke comparison only. The external library and pls4all do not produce numerically equivalent output for this method (see the MethodSpec notes), so the `rmse_rel_tol ≤ 5e+00` budget is set wide on purpose. Treat ~ shape as *“we ran both, both finished”*, not as numerical agreement.

Rows tagged with **📐** are the canonical parity references for this method (declared in [`parity_timing.registry`](../benchmarks/methodology.md)). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.

::::{tab-set}
:class: parity-tabs

:::{tab-item} 1 thread
:sync: threads-1

BackendParity50×250 (ms)100×50 (ms)100×500 (ms)100×2500 (ms)200×40 (ms)250×50 (ms)500×50 (ms)500×500 (ms)500×2500 (ms)2500×50 (ms)2500×500 (ms)2500×2500 (ms)10000×50 (ms)10000×500 (ms)
C++ native · libn4m
pls4all.cpp.blas≈ +6e-166.91 ms4.67 ms41.9 ms266.5 ms4.42 ms8.21 ms17.0 ms210.2 ms🏆1.2 s116.0 ms1.2 s7.7 s490.2 ms6.0 s
pls4all.cpp.blas+omp≈ +6e-167.00 ms4.09 ms🏆31.9 ms🏆241.5 ms🏆4.16 ms🏆6.55 ms🏆16.2 ms🏆228.9 ms1.2 s🏆109.5 ms🏆1.2 s🏆7.7 s456.2 ms🏆5.9 s🏆
pls4all.cpp.omp≈ +1e-157.22 ms4.21 ms40.5 ms251.8 ms4.41 ms7.75 ms20.1 ms224.8 ms1.3 s123.3 ms1.3 s7.7 s🏆473.2 ms6.1 s
pls4all.cpp.ref≈ +1e-157.04 ms4.74 ms39.3 ms259.6 ms4.39 ms7.39 ms19.0 ms222.0 ms1.3 s116.1 ms1.3 s7.9 s489.6 ms6.1 s
Python · pls4all
pls4all.python✓ bind6.54 ms🏆4.59 ms7.17 ms
pls4all.sklearn✓ bind6.75 ms6.81 ms6.91 ms
R · pls4all
pls4all.R✓ 6e-1515.2 ms7.80 ms19.1 ms
pls4all.R.formula✓ 6e-1529.3 ms10.5 ms17.0 ms
pls4all.R.mdatools✓ 6e-1528.6 ms8.82 ms13.6 ms
pls4all.R.pls✓ 6e-1527.6 ms10.5 ms16.9 ms
MATLAB · pls4all
pls4all.matlab✗ +1e+0110.5 ms4.58 ms9.08 ms
pls4all.matlab.classdef✗ +1e+0111.7 ms5.10 ms9.72 ms
Python · external
📐nirs4all12.4 ms
:::

:::{tab-item} 3 threads
:sync: threads-3

BackendParity50×250 (ms)100×50 (ms)100×500 (ms)100×2500 (ms)200×40 (ms)250×50 (ms)500×50 (ms)500×500 (ms)500×2500 (ms)2500×50 (ms)2500×500 (ms)2500×2500 (ms)10000×50 (ms)10000×500 (ms)
C++ native · libn4m
pls4all.cpp.blas~ shape 6e-166.31 ms
pls4all.cpp.blas+omp~ shape 6e-165.93 ms
pls4all.cpp.omp~ shape 1e-154.28 ms
pls4all.cpp.ref~ shape 1e-155.40 ms
Python · pls4all
pls4all.python✓ 6e-154.15 ms🏆
pls4all.sklearn✓ 6e-154.25 ms
R · pls4all
pls4all.R✓ bind7.68 ms
pls4all.R.formula✓ bind10.4 ms
pls4all.R.mdatools✓ bind9.05 ms
pls4all.R.pls✓ bind8.40 ms
MATLAB · pls4all
pls4all.matlab✗ +1e+014.60 ms
pls4all.matlab.classdef✗ +1e+015.65 ms
Python · external
📐nirs4allsource13.7 ms
:::

:::{tab-item} 10 threads
:sync: threads-10

BackendParity50×250 (ms)100×50 (ms)100×500 (ms)100×2500 (ms)200×40 (ms)250×50 (ms)500×50 (ms)500×500 (ms)500×2500 (ms)2500×50 (ms)2500×500 (ms)2500×2500 (ms)10000×50 (ms)10000×500 (ms)
C++ native · libn4m
pls4all.cpp.blas~ shape 6e-163.97 ms
pls4all.cpp.blas+omp~ shape 6e-163.93 ms
pls4all.cpp.omp~ shape 1e-154.11 ms
pls4all.cpp.ref~ shape 1e-154.11 ms
Python · pls4all
pls4all.python✓ 6e-153.97 ms
pls4all.sklearn✓ 6e-153.85 ms🏆
R · pls4all
pls4all.R✓ bind6.42 ms
pls4all.R.formula✓ bind7.29 ms
pls4all.R.mdatools✓ bind7.37 ms
pls4all.R.pls✓ bind7.14 ms
MATLAB · pls4all
pls4all.matlab✗ +1e+014.25 ms
pls4all.matlab.classdef✗ +1e+014.51 ms
Python · external
📐nirs4allsource11.1 ms
:::
::::

---

_See also_: [benchmark overview](../benchmarks/overview.md) · [methods index](index.md) · [interactive dashboard](../landing/dashboard.md)