pop_pls — POP-PLS (per-component operator selection)¶
Group: Adaptive · Registry tolerance: 5.0
Description¶
POP-PLS — per-component adaptive operator selection
Registry note — POPPLS/POP-PLS uses per-component operator selection over the same compact nirs4all bank. Reference is the in-tree nirs4all POPPLSRegressor; parity is qualitative.
Parameters¶
| Name | Type | Default | Notes |
|---|---|---|---|
| | | | registry benchmark cell value |
| | | | registry benchmark cell value |
| | | | registry benchmark cell value |
Explanations¶
Bibliographic source¶
Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models. arXiv:2605.13587. https://arxiv.org/abs/2605.13587.
Mathematical principle¶
POP-PLS (Per-Operator PLS) is the per-component ablation of AOM-PLS: each latent component may pick a different operator from the bank, rather than committing to one global operator. The setting is the same — centered \(\mathbf{X} \in \mathbb{R}^{n\times p}\), response \(\mathbf{Y}\), strict-linear bank \(\{\mathbf{A}_b\}_{b=1}^{B}\), cross-covariance matrix \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) — but the selection rule is local to each component.
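The reason a linear operator can be applied to \(\mathbf{S}\) instead of to the spectra can be checked numerically. The sketch below assumes that an operator acts along the wavelength axis, i.e. the preprocessed spectra are \(\mathbf{X}\mathbf{A}_b^{\top}\); the first-difference operator and the random data are illustrative choices, not part of the registered bank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 12
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # centre the spectra
Y = rng.normal(size=(n, 1))
Y -= Y.mean(axis=0)                      # centre the response

# First-difference operator: row i is e_{i+1} - e_i, shape (p - 1, p).
A = np.diff(np.eye(p), axis=0)

S = X.T @ Y                              # cross-covariance in the original space
X_pre = np.diff(X, axis=1)               # preprocessing applied to the spectra
assert np.allclose(X_pre, X @ A.T)       # identical to a right action by A^T

# Cross-covariance of the preprocessed spectra equals the left action A S.
S_pre = X_pre.T @ Y
left_action_matches = np.allclose(S_pre, A @ S)
```

Because the two routes agree, scoring an operator only needs the small product \(\mathbf{A}_b\mathbf{S}\) rather than a full pass over the preprocessed spectra.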
Per-component greedy selection. Initialise \(\mathbf{S}^{(0)} \leftarrow \mathbf{S}\). For \(a = 1, \dots, K\):
1. Score the bank on the current deflated cross-covariance: for every \(b\) evaluate the criterion \(\mathcal{C}_a(b)\) of the SIMPLS-covariance step that would result from picking operator \(b\) at component \(a\) (covariance proxy \(\lVert\mathbf{A}_b\mathbf{S}^{(a-1)}\rVert\), K-fold CV-RMSE on the resulting prefix, or approximate PRESS; the same family of criteria as AOM-PLS).
2. Pick the local minimiser \(b_a = \operatorname*{arg\,min}_b \mathcal{C}_a(b)\).
3. Extract the component \(\mathbf{r}_a = \mathbf{u}_1\!\bigl(\mathbf{A}_{b_a}\mathbf{S}^{(a-1)}\bigr)\) in transformed space and lift it back through the component-specific adjoint:
4. Deflate in the original space so that the next component sees a residual cross-covariance free of \(\mathbf{t}_a\):
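The lift and deflation steps above can be written as follows. This is a sketch under standard SIMPLS conventions: the symbols \(\mathbf{w}_a\), \(\mathbf{p}_a\) and \(\mathbf{v}_a\) are assumptions introduced here, and the reference implementation may normalise or deflate differently.

```latex
\[
\mathbf{w}_a = \mathbf{A}_{b_a}^{\top}\mathbf{r}_a, \qquad
\mathbf{t}_a = \mathbf{X}\mathbf{w}_a, \qquad
\mathbf{p}_a = \frac{\mathbf{X}^{\top}\mathbf{t}_a}{\mathbf{t}_a^{\top}\mathbf{t}_a}
\]
\[
\mathbf{S}^{(a)} = \bigl(\mathbf{I} - \mathbf{v}_a\mathbf{v}_a^{\top}\bigr)\,\mathbf{S}^{(a-1)},
\qquad
\mathbf{v}_a \propto \Bigl(\mathbf{I} - \sum_{j<a}\mathbf{v}_j\mathbf{v}_j^{\top}\Bigr)\mathbf{p}_a,
\quad \lVert\mathbf{v}_a\rVert = 1
\]
```

When the identity operator is selected, \(\mathbf{w}_a = \mathbf{r}_a\) and these expressions reduce to the usual SIMPLS weight, score and deflation.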
Closed-form coefficient. With the selected sequence \((b_1, \dots, b_K)\) the model coefficients use exactly the same SIMPLS recovery formula as AOM-PLS, only with a component-dependent adjoint:
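A hedged reconstruction of that formula, assuming centered \(\mathbf{Y}\) and per-component loadings \(\mathbf{p}_a = \mathbf{X}^{\top}\mathbf{t}_a / (\mathbf{t}_a^{\top}\mathbf{t}_a)\) and \(\mathbf{q}_a = \mathbf{Y}^{\top}\mathbf{t}_a / (\mathbf{t}_a^{\top}\mathbf{t}_a)\); the matrix names \(\mathbf{W}\), \(\mathbf{T}\), \(\mathbf{P}\) and \(\mathbf{Q}\) are notation introduced here rather than taken from the reference:

```latex
\[
\mathbf{W} = \bigl[\mathbf{A}_{b_1}^{\top}\mathbf{r}_1, \dots, \mathbf{A}_{b_K}^{\top}\mathbf{r}_K\bigr],
\qquad
\mathbf{T} = \mathbf{X}\mathbf{W},
\qquad
\mathbf{B} = \mathbf{W}\bigl(\mathbf{P}^{\top}\mathbf{W}\bigr)^{-1}\mathbf{Q}^{\top}
           = \mathbf{W}\bigl(\mathbf{T}^{\top}\mathbf{T}\bigr)^{-1}\mathbf{T}^{\top}\mathbf{Y}
\]
```

The second equality follows from \(\mathbf{P}^{\top}\mathbf{W} = \mathbf{D}^{-1}\mathbf{T}^{\top}\mathbf{T}\) and \(\mathbf{Q}^{\top} = \mathbf{D}^{-1}\mathbf{T}^{\top}\mathbf{Y}\) with \(\mathbf{D} = \operatorname{diag}(\mathbf{t}_a^{\top}\mathbf{t}_a)\), so the formula is a least-squares fit of \(\mathbf{Y}\) on the scores, mapped back to wavelength space.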
\(\mathbf{B}\) lives in the original wavelength space, so — exactly as for AOM-PLS — predictions are a single dot product \(\hat{\mathbf{Y}}(\mathbf{X}^{\star}) = \mathbf{X}^{\star}\mathbf{B}\), with no preprocessing replay at predict time. The relaxation buys wavelength-region adaptivity (the model can pick a smoother for one component and a derivative for the next), at the cost of \(B\) extra cheap left actions per component.
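The whole procedure can be sketched in NumPy. This is an illustrative toy, not the libn4m kernel: it uses only the covariance proxy, negates it so that the arg min picks the largest norm (an assumption about how the criterion is oriented), applies no operator-scale normalisation, and builds a small hypothetical bank (identity, moving average, first difference). The function and helper names are made up for this example.

```python
import numpy as np

def finite_difference(p):
    """First-difference operator as a p x p matrix (last row left at zero)."""
    D = np.zeros((p, p))
    idx = np.arange(p - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def moving_average(p, width=5):
    """Moving-average smoother as a p x p matrix, renormalised at the edges."""
    half = width // 2
    M = np.zeros((p, p))
    for i in range(p):
        lo, hi = max(0, i - half), min(p, i + half + 1)
        M[i, lo:hi] = 1.0 / (hi - lo)
    return M

def pop_pls_sketch(X, Y, bank, n_components):
    """Greedy per-component operator selection with the covariance proxy."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    S = Xc.T @ Yc                          # cross-covariance, shape (p, q)
    W, V, selected = [], [], []
    for _ in range(n_components):
        # Larger norm is better, so minimise the negated proxy.
        criteria = [-np.linalg.norm(A @ S) for A in bank]
        b = int(np.argmin(criteria))
        # Dominant left singular vector of A_b S, in transformed space.
        U, _, _ = np.linalg.svd(bank[b] @ S, full_matrices=False)
        w = bank[b].T @ U[:, 0]            # lift back through the adjoint
        t = Xc @ w
        v = Xc.T @ t / (t @ t)             # loading in the original space
        for v_prev in V:                   # orthonormalise against earlier loadings
            v = v - v_prev * (v_prev @ v)
        v = v / np.linalg.norm(v)
        S = S - np.outer(v, v @ S)         # deflate in the original space
        W.append(w)
        V.append(v)
        selected.append(b)
    W = np.column_stack(W)
    T = Xc @ W
    # Least squares of Y on the scores, expressed in wavelength space.
    B = W @ np.linalg.solve(T.T @ T, T.T @ Yc)
    intercept = y_mean - x_mean @ B
    return B, intercept, selected

rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.normal(size=(n, p)).cumsum(axis=1)             # smooth-ish toy spectra
y = X[:, 10] - X[:, 25] + 0.1 * rng.normal(size=n)
bank = [np.eye(p), moving_average(p), finite_difference(p)]
B, intercept, selected = pop_pls_sketch(X, y, bank, n_components=3)
yhat = X @ B + intercept                                # single dot product at predict time
```

Prediction uses only `B` and `intercept`; none of the selected operators is replayed on `X`, which is the property the paragraph above describes.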
Implementation¶
Implemented by n4m_aom_per_component_select via the native C ABI. Python exposes this as
n4m.aom_per_component_select and the catalog alias n4m.pop_pls; the wrapper
uses the same compact strict-linear default bank as AOM-PLS and accepts
caller-provided strict operators. Result buffers include input_coefficients
and intercept, so callers can reuse the selected per-component model on new
spectra as X_new @ input_coefficients + intercept. The sklearn-style
n4m.sklearn.NativePOPPLSRegressor wraps the same native result. Reference:
git-pinned oracle
nirs4all.operators.models.sklearn.aom_pls.POPPLSRegressor (sanctioned
exception).
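Reusing the stored model on new spectra can be sketched as a small helper. The key names `input_coefficients` and `intercept` come from the result described above; the helper `predict_from_result`, the shape check and the stand-in dictionary are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def predict_from_result(res, X_new):
    """Apply stored coefficients to new spectra (hypothetical helper)."""
    coef = np.asarray(res["input_coefficients"], dtype=float)
    intercept = np.asarray(res["intercept"], dtype=float)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    if X_new.shape[1] != coef.shape[0]:
        raise ValueError(
            f"expected {coef.shape[0]} wavelengths, got {X_new.shape[1]}"
        )
    # Coefficients live in the original wavelength space: no preprocessing here.
    return X_new @ coef + intercept

# Stand-in for a fitted result: 3 wavelengths, one response.
fake_res = {"input_coefficients": np.array([0.5, -1.0, 2.0]), "intercept": 0.25}
yhat_new = predict_from_result(fake_res, [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
```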
MATLAB header (bindings/matlab/+pls4all/pop_pls.m):
pls4all.pop_pls POP-PLS per-component operator selection.
Usage¶
Every pls4all binding tab dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in benchmarks.parity_timing.registry. Switch tabs to read the same fit in your language. The R package now ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom — those tabs appear only on the methods that have a meaningful equivalence.
pls4all bindings
```c
/* C ABI: libn4m AOM/POP selector path */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t* cfg = n4m_config_create();
n4m_operator_bank_t* bank = NULL;
n4m_validation_plan_t* plan = NULL;
n4m_aom_per_component_result_t* res = NULL;

n4m_operator_bank_create(&bank);
/* add compact nirs4all-style operators: identity, SG, detrend, FD */
n4m_validation_plan_create(&plan);
/* fill CV folds on plan */

/* x_view and y_view describe X and y and are prepared by the caller (not shown) */
n4m_aom_per_component_select(ctx, cfg, bank, &x_view, &y_view, plan,
                             /* max_components */ 2, &res);
/* read predictions and selection diagnostics via result getters */

/* release handles in reverse order of creation */
n4m_aom_per_component_result_destroy(res);
n4m_validation_plan_destroy(plan);
n4m_operator_bank_destroy(bank);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);
```
```python
import n4m

res = n4m.pop_pls(
    X,
    y,
    max_components=2,
    cv=4,
    operators=[
        "identity",
        ("savgol_smooth", [5, 2]),
        ("finite_difference", [1]),
    ],
)
yhat = res["predictions"]
selected_ops = res["selected_operator_indices"]
coef = res["input_coefficients"]
intercept = res["intercept"]
yhat_new = X_new @ coef + intercept
```
```python
from n4m.sklearn import NativePOPPLSRegressor

model = NativePOPPLSRegressor(max_components=2, cv=4).fit(X, y)
yhat_new = model.predict(X_new)
```
```r
library(pls4all)

# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("pop_pls", X, y,
                      n_components = 2L,
                      params = list(max_components = 3L, n_operators = 9L, cv = 3L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.
```
```matlab
res = pls4all.pop_pls(X, y, 2);
% see header of bindings/matlab/+pls4all/pop_pls.m for full
% parameter surface:
%   res = pop_pls(X, Y, max_components, n_operators, cv)
yhat = predict(res, Xtest);
```
No idiomatic classdef wrapper — invoke pls4all.fit("pop_pls", X, y, …) directly from the unified MEX factory.
Registry parity references 📐
nirs4all (python · python), nirs4all in-tree · qualitative (rmse_rel ≤ 5e+00): In-tree nirs4all AOM/POP estimator stack (sanctioned reference). The pls4all ABI uses the same compact strict-linear bank and contiguous folds for cross-binding determinism; nirs4all remains the qualitative algorithmic reference.
Benchmarks¶
Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.
Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.
Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-08).
Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.
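A tolerance gate of this kind can be sketched as below. The exact definition of `rmse_rel` is not given on this page, so the one used here (RMSE of the difference divided by the RMS of the reference) is an assumption, as are the function names; only the two tolerance values are taken from the text.

```python
import numpy as np

STRICT_TOL = 1e-08       # strict gate quoted on this page
QUALITATIVE_TOL = 5.0    # registry tolerance quoted for pop_pls

def rmse_rel(candidate, reference):
    """RMSE of the difference relative to the RMS of the reference (assumed definition)."""
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff_rmse = np.sqrt(np.mean((candidate - reference) ** 2))
    scale = np.sqrt(np.mean(reference ** 2))
    return diff_rmse / scale if scale > 0 else diff_rmse

def passes_gate(candidate, reference, tol):
    """True when the candidate predictions sit within tol of the reference."""
    return bool(rmse_rel(candidate, reference) <= tol)

reference = np.array([1.0, 2.0, 3.0, 4.0])
nearly_equal = reference + 1e-12     # differs only at floating-point noise level
rough = reference * 1.5              # same shape, clearly different values
```

With these inputs, `nearly_equal` passes the strict gate, while `rough` fails it and passes only the qualitative one.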
| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 5e-15 | 6.24 ms 🏆 |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 7.23 ms |
| pls4all.sklearn | ✓ bind | 6.51 ms |
| R · pls4all | | |
| pls4all.R | ✓ bind | 12.0 ms |
| pls4all.R.formula | ✓ bind | 12.5 ms |
| pls4all.R.mdatools | ✓ bind | 24.7 ms |
| pls4all.R.pls | ✓ bind | 13.2 ms |
| Python · external | | |
| 📐 nirs4all | source | 47.8 ms |

| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 5e-15 | 7.34 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 7.46 ms |
| pls4all.sklearn | ✓ bind | 6.77 ms 🏆 |
| R · pls4all | | |
| pls4all.R | ✓ bind | 13.2 ms |
| pls4all.R.formula | ✓ bind | 12.9 ms |
| pls4all.R.mdatools | ✓ bind | 10.0 ms |
| pls4all.R.pls | ✓ bind | 12.4 ms |
| Python · external | | |
| 📐 nirs4all | source | 36.4 ms |

| Backend | Parity | 200×40 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 5e-15 | 15.3 ms |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 11.1 ms |
| pls4all.sklearn | ✓ bind | 7.06 ms 🏆 |
| R · pls4all | | |
| pls4all.R | ✓ bind | 34.2 ms |
| pls4all.R.formula | ✓ bind | 22.8 ms |
| pls4all.R.mdatools | ✓ bind | 13.5 ms |
| pls4all.R.pls | ✓ bind | 12.5 ms |
| Python · external | | |
| 📐 nirs4all | source | 38.6 ms |

See also: benchmark overview · methods index · interactive dashboard