`pop_pls` — POP-PLS (per-component operator selection)¶

Group: Adaptive · Registry tolerance: 1e-08

Description¶

POP-PLS — per-component adaptive operator selection

Registry note — POPPLS/POP-PLS uses per-component operator selection over the same compact nirs4all bank. Reference is the in-tree nirs4all POPPLSRegressor; parity is qualitative.

Parameters¶

Name	Type	Default	Notes
`max_components`	`int`	`3`	registry benchmark cell value
`n_operators`	`int`	`9`	registry benchmark cell value
`cv`	`int`	`3`	registry benchmark cell value

Explanations¶

Bibliographic source¶

Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models. arXiv:2605.13587. https://arxiv.org/abs/2605.13587.

Mathematical principle¶

POP-PLS (Per-Operator PLS) is the per-component ablation of AOM-PLS: each latent component may pick a different operator from the bank, rather than committing to one global operator. The setting is the same — centered \(\mathbf{X} \in \mathbb{R}^{n\times p}\), response \(\mathbf{Y}\), strict-linear bank \(\{\mathbf{A}_b\}_{b=1}^{B}\), cross-covariance matrix \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) — but the selection rule is local to each component.
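To make the "strict-linear bank" concrete, here is a minimal NumPy sketch that builds a few operators as explicit \(p \times p\) matrices and checks that applying an operator to the spectra is equivalent to a left action on the cross-covariance. The helper names (`identity_op`, `moving_average_op`, `first_difference_op`, `detrend_op`) and the particular operator definitions are illustrative assumptions, not the contents of the actual nirs4all bank.

```python
import numpy as np

def identity_op(p):
    return np.eye(p)

def first_difference_op(p):
    # (A x)_j = x_{j+1} - x_j for j < p-1; the last row is left at zero
    A = np.zeros((p, p))
    idx = np.arange(p - 1)
    A[idx, idx] = -1.0
    A[idx, idx + 1] = 1.0
    return A

def moving_average_op(p, width=5):
    # Simple boxcar smoother with windows truncated at the edges (hypothetical choice)
    half = width // 2
    A = np.zeros((p, p))
    for j in range(p):
        lo, hi = max(0, j - half), min(p, j + half + 1)
        A[j, lo:hi] = 1.0 / (hi - lo)
    return A

def detrend_op(p, degree=1):
    # Orthogonal projector that removes a polynomial trend of the given degree
    grid = np.linspace(-1.0, 1.0, p)
    V = np.vander(grid, degree + 1, increasing=True)
    Qm, _ = np.linalg.qr(V)
    return np.eye(p) - Qm @ Qm.T

p = 40
bank = [identity_op(p), moving_average_op(p), first_difference_op(p), detrend_op(p)]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, p))
X -= X.mean(axis=0)                      # centered spectra
Y = rng.normal(size=(200, 1))
S = X.T @ Y                              # cross-covariance, p x 1

# Preprocessing every spectrum x -> A_b x (i.e. X A_b^T) and then forming the
# cross-covariance gives the same result as the left action A_b S.
left_action_ok = [np.allclose((X @ A.T).T @ Y, A @ S) for A in bank]
```

Because each operator is a fixed matrix, its adjoint is simply the transpose, which is what allows a component found in transformed space to be lifted back to the original wavelength axis.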

Per-component greedy selection. Initialise \(\mathbf{S}^{(0)} \leftarrow \mathbf{S}\). For \(a = 1, \dots, K\):

1. Score the bank on the current deflated cross-covariance: for every \(b\), evaluate the criterion \(\mathcal{C}_a(b)\) of the SIMPLS-covariance step that would result from picking operator \(b\) at component \(a\). The criterion comes from the same family as in AOM-PLS: the covariance proxy \(\lVert\mathbf{A}_b\mathbf{S}^{(a-1)}\rVert\), K-fold CV-RMSE on the resulting prefix, or approximate PRESS.
2. Pick the local minimiser \(b_a = \operatorname*{arg\,min}_b \mathcal{C}_a(b)\). The covariance proxy is a larger-is-better quantity, so it has to enter \(\mathcal{C}_a\) with its sign flipped for the arg min to select the strongest covariance.
3. Extract the component \(\mathbf{r}_a = \mathbf{u}_1\!\bigl(\mathbf{A}_{b_a}\mathbf{S}^{(a-1)}\bigr)\) in transformed space (with \(\mathbf{u}_1(\cdot)\) read as the leading left singular vector) and lift it back through the component-specific adjoint:

\[\mathbf{z}_a \;=\; \mathbf{A}_{b_a}^{\top}\,\mathbf{r}_a, \qquad \mathbf{t}_a = \mathbf{X}\mathbf{z}_a.\]

4. Deflate in the original space so that the next component sees a residual cross-covariance free of \(\mathbf{t}_a\):

\[\mathbf{S}^{(a)} \;=\; \bigl(\mathbf{I}_p - \mathbf{v}_a\mathbf{v}_a^{\top}\bigr)\mathbf{S}^{(a-1)}, \quad \mathbf{v}_a = \mathbf{p}_a / \lVert\mathbf{p}_a\rVert, \quad \mathbf{p}_a = \mathbf{X}^{\top}\mathbf{t}_a / \lVert\mathbf{t}_a\rVert^{2}.\]
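The per-component loop above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the `select_per_component` implementation: it uses only the covariance proxy, negated so that the arg min picks the largest covariance, and it takes \(\mathbf{u}_1\) to be the leading left singular vector. The function name `pop_pls_greedy` and the three-operator toy bank are hypothetical.

```python
import numpy as np

def pop_pls_greedy(X, Y, bank, n_components):
    """Greedy per-component operator selection with the covariance proxy (sketch)."""
    S = X.T @ Y                                    # p x q cross-covariance
    chosen, Z, T, P = [], [], [], []
    for _ in range(n_components):
        # Assumption: C_a(b) = -||A_b S||_F, so arg min selects the largest covariance
        scores = [-np.linalg.norm(A @ S) for A in bank]
        b = int(np.argmin(scores))
        A = bank[b]
        U, _, _ = np.linalg.svd(A @ S, full_matrices=False)
        r = U[:, 0]                                # direction in transformed space
        z = A.T @ r                                # lift back through the adjoint
        t = X @ z                                  # score vector
        p_load = X.T @ t / (t @ t)                 # X-loading p_a
        v = p_load / np.linalg.norm(p_load)
        S = S - np.outer(v, v @ S)                 # (I - v v^T) S
        chosen.append(b)
        Z.append(z)
        T.append(t)
        P.append(p_load)
    return chosen, np.column_stack(Z), np.column_stack(T), np.column_stack(P)

# Toy data and a toy bank: identity, 3-point smoother, forward difference
rng = np.random.default_rng(0)
n, p = 60, 20
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
Y = X[:, :3].sum(axis=1, keepdims=True) + 0.1 * rng.normal(size=(n, 1))
Y -= Y.mean(axis=0)

smooth = (np.eye(p, k=-1) + np.eye(p) + np.eye(p, k=1)) / 3.0
diff = np.eye(p, k=1) - np.eye(p)
diff[-1] = 0.0                                     # no forward neighbour at the last channel
bank = [np.eye(p), smooth, diff]

chosen, Z, T, P = pop_pls_greedy(X, Y, bank, n_components=3)
```

`chosen` holds the selected bank index for each component. Note that the raw covariance proxy favours operators with larger gain; whether and how the real bank normalises its operators is not covered by this sketch.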

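Closed-form coefficient. With the selected sequence \((b_1, \dots, b_K)\), the model coefficients use exactly the same SIMPLS recovery formula as AOM-PLS, only with a component-dependent adjoint. Here \(\mathbf{P} = [\mathbf{p}_1\;\cdots\;\mathbf{p}_K]\) stacks the X-loadings defined above and \(\mathbf{Q}\) stacks the corresponding Y-loadings: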

\[\mathbf{Z} = \bigl[\mathbf{A}_{b_1}^{\top}\mathbf{r}_1\;\cdots\;\mathbf{A}_{b_K}^{\top}\mathbf{r}_K\bigr], \qquad \mathbf{B} = \mathbf{Z}\bigl(\mathbf{P}^{\top}\mathbf{Z}\bigr)^{+}\mathbf{Q}^{\top}.\]

\(\mathbf{B}\) lives in the original wavelength space, so, exactly as for AOM-PLS, predictions are a single matrix product \(\hat{\mathbf{Y}}(\mathbf{X}^{\star}) = \mathbf{X}^{\star}\mathbf{B}\), with no preprocessing replay at predict time. The relaxation buys wavelength-region adaptivity (the model can pick a smoother for one component and a derivative for the next), at the cost of \(B\) extra cheap left actions \(\mathbf{A}_b\mathbf{S}^{(a-1)}\) per component, where the scalar \(B\) is the bank size and not the coefficient matrix \(\mathbf{B}\).
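The recovery formula can be checked numerically with a short NumPy sketch. It assumes Y-loadings of the form \(\mathbf{q}_a = \mathbf{Y}^{\top}\mathbf{t}_a / \lVert\mathbf{t}_a\rVert^{2}\), by analogy with \(\mathbf{p}_a\); that normalisation is an assumption, as is the function name `pop_pls_coefficients`. A random \(\mathbf{Z}\) stands in for the lifted weights \([\mathbf{A}_{b_1}^{\top}\mathbf{r}_1 \cdots \mathbf{A}_{b_K}^{\top}\mathbf{r}_K]\).

```python
import numpy as np

def pop_pls_coefficients(X, Y, Z):
    """B = Z (P^T Z)^+ Q^T, with loadings computed from the scores T = X Z."""
    T = X @ Z
    tt = np.sum(T * T, axis=0)           # ||t_a||^2 for each component
    P = (X.T @ T) / tt                   # X-loadings, p x K
    Q = (Y.T @ T) / tt                   # Y-loadings, q x K (assumed normalisation)
    return Z @ np.linalg.pinv(P.T @ Z) @ Q.T

rng = np.random.default_rng(1)
n, p, K = 50, 12, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
Y = rng.normal(size=(n, 1))
Y -= Y.mean(axis=0)
Z = rng.normal(size=(p, K))              # stand-in for the lifted weights

B = pop_pls_coefficients(X, Y, Z)

# Prediction is one matrix product; no operator is applied to the new spectra.
# (X_star is assumed to be centered with the training mean already.)
X_star = rng.normal(size=(5, p))
Y_hat = X_star @ B

# Sanity check: with these loadings, B equals Z times the least-squares
# regression of Y on the scores T = X Z.
gamma, *_ = np.linalg.lstsq(X @ Z, Y, rcond=None)
matches_ols_on_scores = np.allclose(B, Z @ gamma)
```

Under these assumptions the pseudo-inverse reduces to an ordinary least-squares fit on the scores, which is why the formula still applies when the scores from different operators are not mutually orthogonal.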

Implementation¶

The C entry point is `n4m_model_selection_pop_pls_select`, reached through the Python/R/MATLAB dispatchers. It uses the same compact strict-linear bank as AOM-PLS; the per-component greedy selection is implemented in `select_per_component` (`aom_nirs/pls/selection.py`). Reference: git-pinned oracle `nirs4all.operators.models.sklearn.aom_pls.POPPLSRegressor` (sanctioned exception).

R roxygen note (methods_extra.R::pop_pls):

POP-PLS with per-component operator selection.

MATLAB header (bindings/matlab/+pls4all/pop_pls.m):

pls4all.pop_pls  POP-PLS per-component operator selection.

Usage¶

Every pls4all binding tab dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in benchmarks.parity_timing.registry. Switch tabs to read the same fit in your language. The R package now ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom — those tabs appear only on the methods that have a meaningful equivalence.

pls4all bindings

C ABI · libn4m

/* C ABI — libn4m AOM/POP selector path */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t*  cfg = n4m_config_create();
n4m_operator_bank_t* bank = NULL;
n4m_validation_plan_t* plan = NULL;
n4m_aom_per_component_result_t* res = NULL;
n4m_operator_bank_create(&bank);
/* add compact nirs4all-style operators: identity, SG, detrend, FD */
n4m_validation_plan_create(&plan);
/* fill CV folds on plan */
/* x_view / y_view: caller-prepared views over X and y (setup not shown) */
n4m_model_selection_pop_pls_select(ctx, cfg, bank, &x_view, &y_view, plan,
              /* max_components */ 2, &res);
/* read predictions and selection diagnostics via result getters */
n4m_model_selection_pop_pls_result_destroy(res);
n4m_validation_plan_destroy(plan);
n4m_operator_bank_destroy(bank);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);

Python · pls4all (raw)

import pls4all

with pls4all.Context() as ctx, pls4all.Config() as cfg:
    bank = pls4all.OperatorBank()
    plan = pls4all.ValidationPlan()
    # Add compact nirs4all-style operators and CV folds.
    res = pls4all.aom_per_component_select(
        ctx, cfg, bank, X.ravel(), y.ravel(), plan,
        max_components=2,
        x_rows=X.shape[0], x_cols=X.shape[1],
        y_rows=y.shape[0], y_cols=1,
    )
    values, rows, cols = res.predictions

Python · pls4all.sklearn

No tier-2 sklearn-style class yet — exposed via the pls4all.aom_global_select / pls4all.aom_per_component_select low-level ABI.

R · pls4all_method()

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("pop_pls", X, y,
                      n_components = 2L,
                      params = list(max_components = 3L, n_operators = 9L, cv = 3L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

R · pls4all (raw fn)

library(pls4all)
res  <- pop_pls(X, Y, max_components = 3L, n_operators = 9L, cv = 3L)
yhat <- pls4all_predict(res, X_test)

MATLAB · pls4all (MEX)

res = pls4all.pop_pls(X, y, 2);
% see header of bindings/matlab/+pls4all/pop_pls.m for full
% parameter surface:
%   res = pop_pls(X, Y, max_components, n_operators, cv)
yhat = predict(res, Xtest);

MATLAB · pls4all (classdef)

No idiomatic classdef wrapper — invoke pls4all.fit("pop_pls", X, y, …) directly from the unified MEX factory.

Registry parity references 📐

📐 nirs4all (python · python) — nirs4all in-tree · strict (rmse_rel ≤ 1e-08) — In-tree nirs4all AOM/POP estimator stack (sanctioned reference). The pls4all ABI uses the same compact strict-linear bank and contiguous folds for cross-binding determinism; nirs4all remains the qualitative algorithmic reference.
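As an illustration of why contiguous folds help cross-binding determinism, the sketch below builds a non-shuffled K-fold split that depends only on the sample count and the number of folds, so every binding can reproduce it without sharing a random generator. The function name `contiguous_folds` and the rule for distributing the remainder (borrowed from `np.array_split`) are assumptions; the actual fold boundaries used by the pls4all ABI are not specified here.

```python
import numpy as np

def contiguous_folds(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs whose test blocks are contiguous and unshuffled."""
    blocks = np.array_split(np.arange(n_samples), n_folds)
    for k, test_idx in enumerate(blocks):
        train_idx = np.concatenate([b for j, b in enumerate(blocks) if j != k])
        yield train_idx, test_idx

# Benchmark cell shape from this page: 200 samples, cv = 3
folds = list(contiguous_folds(200, 3))
sizes = [len(test_idx) for _, test_idx in folds]   # [67, 67, 66]
```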

Benchmarks¶

Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.

Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.

Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-08).
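For readers who want to reproduce a gate of this kind, here is a minimal sketch of a relative-RMSE check. The exact definition of `rmse_rel` used by the registry is not given on this page; the version below (RMSE of the difference divided by the RMS of the reference) is an assumption, as is the helper name.

```python
import numpy as np

def rmse_rel(pred, ref, eps=1e-300):
    """Relative RMSE: RMSE(pred - ref) / RMS(ref). Assumed definition, for illustration."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    rmse = np.sqrt(np.mean((pred - ref) ** 2))
    scale = np.sqrt(np.mean(ref ** 2))
    return rmse / max(scale, eps)        # eps guards against an all-zero reference

STRICT_TOL = 1e-08                       # tolerance quoted for the strict gate

ref = np.linspace(1.0, 2.0, 200)         # stand-in for reference predictions
pred = ref * (1.0 + 1e-12)               # tiny relative perturbation
passes_strict = rmse_rel(pred, ref) <= STRICT_TOL
```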

Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.

1 thread

Backend	Parity	200×40 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref 5e-15	6.24 ms🏆
Python · pls4all
`pls4all.python`	✓ bind	7.23 ms
`pls4all.sklearn`	✓ bind	6.51 ms
R · pls4all
`pls4all.R`	✓ bind	12.0 ms
`pls4all.R.formula`	✓ bind	12.5 ms
`pls4all.R.mdatools`	✓ bind	24.7 ms
`pls4all.R.pls`	✓ bind	13.2 ms
Python · external
📐`nirs4all`	source	47.8 ms

3 threads

Backend	Parity	200×40 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref 5e-15	7.34 ms
Python · pls4all
`pls4all.python`	✓ bind	7.46 ms
`pls4all.sklearn`	✓ bind	6.77 ms🏆
R · pls4all
`pls4all.R`	✓ bind	13.2 ms
`pls4all.R.formula`	✓ bind	12.9 ms
`pls4all.R.mdatools`	✓ bind	10.0 ms
`pls4all.R.pls`	✓ bind	12.4 ms
Python · external
📐`nirs4all`	source	36.4 ms

10 threads

Backend	Parity	200×40 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref 5e-15	15.3 ms
Python · pls4all
`pls4all.python`	✓ bind	11.1 ms
`pls4all.sklearn`	✓ bind	7.06 ms🏆
R · pls4all
`pls4all.R`	✓ bind	34.2 ms
`pls4all.R.formula`	✓ bind	22.8 ms
`pls4all.R.mdatools`	✓ bind	13.5 ms
`pls4all.R.pls`	✓ bind	12.5 ms
Python · external
📐`nirs4all`	source	38.6 ms
