pop_pls — POP-PLS (per-component operator selection)

Group: Adaptive · Registry tolerance: 5.0

Description

POP-PLS — per-component adaptive operator selection

Registry note — POPPLS/POP-PLS uses per-component operator selection over the same compact nirs4all bank. Reference is the in-tree nirs4all POPPLSRegressor; parity is qualitative.

Parameters

| Name | Type | Default | Notes |
|---|---|---|---|
| max_components | int | 3 | registry benchmark cell value |
| n_operators | int | 9 | registry benchmark cell value |
| cv | int | 3 | registry benchmark cell value |

Explanations

Bibliographic source

Beurier, G., Reiter, R., Noûs, C., Rouan, L. & Cornet, D. (2026). Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: a large-scale benchmark of operator-adaptive PLS and Ridge models. arXiv:2605.13587. https://arxiv.org/abs/2605.13587.

Mathematical principle

POP-PLS (Per-Operator PLS) is the per-component ablation of AOM-PLS: each latent component may pick a different operator from the bank, rather than committing to one global operator. The setting is the same — centered \(\mathbf{X} \in \mathbb{R}^{n\times p}\), response \(\mathbf{Y}\), strict-linear bank \(\{\mathbf{A}_b\}_{b=1}^{B}\), cross-covariance matrix \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) — but the selection rule is local to each component.
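To make the setting concrete, the sketch below builds a tiny bank of strict-linear operators as explicit \(p \times p\) matrices. It checks that preprocessing the spectra and then forming the cross-covariance gives the same result as applying the operator directly to \(\mathbf{S}\). The helper names `first_difference` and `moving_average`, and the operator definitions themselves, are illustrative assumptions; they are not the compact nirs4all bank.

```python
import numpy as np


def first_difference(p):
    """p x p first-difference operator; row 0 is left at zero (assumed edge rule)."""
    A = np.zeros((p, p))
    idx = np.arange(1, p)
    A[idx, idx] = 1.0
    A[idx, idx - 1] = -1.0
    return A


def moving_average(p, width=3):
    """p x p moving-average smoother with windows truncated at the edges."""
    half = width // 2
    A = np.zeros((p, p))
    for i in range(p):
        lo, hi = max(0, i - half), min(p, i + half + 1)
        A[i, lo:hi] = 1.0 / (hi - lo)
    return A


rng = np.random.default_rng(0)
n, p = 30, 12
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

# Centre both blocks, then form the cross-covariance S = X^T Y.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
S = Xc.T @ Yc

bank = [np.eye(p), moving_average(p), first_difference(p)]

for A in bank:
    # Each operator acts on the wavelength axis, so a spectrum x becomes A x
    # and the preprocessed matrix is Xc A^T.
    preprocessed = Xc @ A.T
    # Its cross-covariance with Y is just A applied to S on the left.
    assert np.allclose(preprocessed.T @ Yc, A @ S)
```

Because the identity \((\mathbf{X}\mathbf{A}_b^{\top})^{\top}\mathbf{Y} = \mathbf{A}_b\mathbf{S}\) holds for any linear operator, the bank can be scored on the small \(p \times q\) matrix \(\mathbf{S}\) without materialising a preprocessed copy of \(\mathbf{X}\) per operator.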

Per-component greedy selection. Initialise \(\mathbf{S}^{(0)} \leftarrow \mathbf{S}\). For \(a = 1, \dots, K\):

  1. Score the bank on the current deflated cross-covariance: for every \(b\) evaluate the criterion \(\mathcal{C}_a(b)\) of the SIMPLS-covariance step that would result from picking operator \(b\) at component \(a\) (covariance proxy \(\lVert\mathbf{A}_b\mathbf{S}^{(a-1)}\rVert\), K-fold CV-RMSE on the resulting prefix, or approximate PRESS — same family of criteria as AOM-PLS).

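  2. Pick the local minimiser \(b_a = \operatorname*{arg\,min}_b \mathcal{C}_a(b)\), taking \(\mathcal{C}_a(b)\) as the negated norm when the covariance proxy is used, since there a larger value is better.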

  3. Extract the component \(\mathbf{r}_a = \mathbf{u}_1\!\bigl(\mathbf{A}_{b_a}\mathbf{S}^{(a-1)}\bigr)\) in transformed space and lift it back through the component-specific adjoint:

\[\mathbf{z}_a \;=\; \mathbf{A}_{b_a}^{\top}\,\mathbf{r}_a, \qquad \mathbf{t}_a = \mathbf{X}\mathbf{z}_a.\]

  4. Deflate in the original space so that the next component sees a residual cross-covariance free of \(\mathbf{t}_a\):

\[\mathbf{S}^{(a)} \;=\; \bigl(\mathbf{I}_p - \mathbf{v}_a\mathbf{v}_a^{\top}\bigr)\mathbf{S}^{(a-1)}, \quad \mathbf{v}_a = \mathbf{p}_a / \lVert\mathbf{p}_a\rVert, \quad \mathbf{p}_a = \mathbf{X}^{\top}\mathbf{t}_a / \lVert\mathbf{t}_a\rVert^{2}.\]
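The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the libn4m kernel, and it rests on several assumptions: it uses only the covariance proxy as criterion (negated, so that the arg min of step 2 applies), it takes \(\lVert\cdot\rVert\) as the Frobenius norm, it assumes Y-loadings \(\mathbf{q}_a = \mathbf{Y}^{\top}\mathbf{t}_a / \lVert\mathbf{t}_a\rVert^{2}\), and the function name `pop_pls_sketch` and the three-operator bank are hypothetical.

```python
import numpy as np


def pop_pls_sketch(X, Y, bank, n_components):
    """Greedy per-component operator selection (covariance-proxy criterion only)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S = Xc.T @ Yc                                  # S^(0), shape (p, q)
    Z, P, Q, selected = [], [], [], []
    for _ in range(n_components):
        # Step 1: score every operator on the current deflated cross-covariance.
        crit = [-np.linalg.norm(A @ S) for A in bank]
        # Step 2: local minimiser of the (negated) covariance proxy.
        b = int(np.argmin(crit))
        # Step 3: leading left singular vector in transformed space,
        # lifted back through the adjoint of the chosen operator.
        U, _, _ = np.linalg.svd(bank[b] @ S, full_matrices=False)
        r = U[:, 0]
        z = bank[b].T @ r
        t = Xc @ z
        tt = t @ t
        p_a = Xc.T @ t / tt                        # X-loading
        q_a = Yc.T @ t / tt                        # Y-loading (assumed definition)
        # Step 4: deflate S in the original space along v = p_a / ||p_a||.
        v = p_a / np.linalg.norm(p_a)
        S = S - np.outer(v, v @ S)
        Z.append(z)
        P.append(p_a)
        Q.append(q_a)
        selected.append(b)
    return {
        "Z": np.column_stack(Z),
        "P": np.column_stack(P),
        "Q": np.column_stack(Q),
        "S": S,
        "selected": selected,
    }


rng = np.random.default_rng(0)
n, p = 50, 15
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

# Illustrative bank: identity, zero-padded 3-point smoother, first difference.
smooth = (np.eye(p, k=-1) + np.eye(p) + np.eye(p, k=1)) / 3.0
diff = np.eye(p) - np.eye(p, k=-1)
diff[0] = 0.0
bank = [np.eye(p), smooth, diff]

fit = pop_pls_sketch(X, Y, bank, n_components=3)
print("operator index per component:", fit["selected"])
```

The raw norm is scale-dependent, so an operator with a larger gain tends to win under this criterion; a CV-RMSE or PRESS criterion, or normalised operators, would avoid that bias. Which of these the native selector uses by default is not stated here.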

Closed-form coefficient. With the selected sequence \((b_1, \dots, b_K)\) the model coefficients use exactly the same SIMPLS recovery formula as AOM-PLS, only with a component-dependent adjoint:

\[\mathbf{Z} = \bigl[\mathbf{A}_{b_1}^{\top}\mathbf{r}_1\;\cdots\;\mathbf{A}_{b_K}^{\top}\mathbf{r}_K\bigr], \qquad \mathbf{B} = \mathbf{Z}\bigl(\mathbf{P}^{\top}\mathbf{Z}\bigr)^{+}\mathbf{Q}^{\top}.\]

\(\mathbf{B}\) lives in the original wavelength space, so, exactly as for AOM-PLS, predictions are a single dot product \(\hat{\mathbf{Y}}(\mathbf{X}^{\star}) = \mathbf{X}^{\star}\mathbf{B}\), with no preprocessing replay at predict time. The relaxation buys wavelength-region adaptivity (the model can pick a smoother for one component and a derivative for the next), at the cost of \(B\) extra cheap left actions per component, one per bank operator. Here \(B\) is the bank size, not the coefficient matrix \(\mathbf{B}\).
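The recovery formula can be checked numerically. The sketch below uses a random stand-in for the lifted weight matrix \(\mathbf{Z}\) and assumes that \(\mathbf{P}\) stacks the loadings \(\mathbf{p}_a = \mathbf{X}^{\top}\mathbf{t}_a / \lVert\mathbf{t}_a\rVert^{2}\) and that \(\mathbf{Q}\) stacks \(\mathbf{q}_a = \mathbf{Y}^{\top}\mathbf{t}_a / \lVert\mathbf{t}_a\rVert^{2}\); the definition of \(\mathbf{Q}\) is an assumption. Under those definitions, \(\mathbf{X}\mathbf{B}\) coincides with the least-squares fit of \(\mathbf{Y}\) on the scores \(\mathbf{T} = \mathbf{X}\mathbf{Z}\), whether or not the scores are mutually orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 40, 10, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 2))
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Stand-in for [A_{b_1}^T r_1 ... A_{b_K}^T r_K]; any full-rank p x K matrix
# is enough to exercise the formula.
Z = rng.normal(size=(p, K))

T = Xc @ Z                        # scores, one column per component
tt = np.sum(T * T, axis=0)        # squared score norms ||t_a||^2
P = Xc.T @ T / tt                 # X-loadings, shape (p, K)
Q = Yc.T @ T / tt                 # Y-loadings, shape (q, K) (assumed definition)

# Closed-form coefficients in the original wavelength space.
B = Z @ np.linalg.pinv(P.T @ Z) @ Q.T

# Reference: ordinary least squares of Yc on the scores.
G, *_ = np.linalg.lstsq(T, Yc, rcond=None)

assert B.shape == (p, 2)
assert np.allclose(Xc @ B, T @ G)
```

The pseudo-inverse is what absorbs non-orthogonal scores: with these loadings, \(\mathbf{P}^{\top}\mathbf{Z}\) equals \(\operatorname{diag}(\lVert\mathbf{t}_a\rVert^{2})^{-1}\mathbf{T}^{\top}\mathbf{T}\), which is the identity only when the scores are orthogonal.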

Implementation

The method is implemented by n4m_aom_per_component_select, reached via the native C ABI. Python exposes this as n4m.aom_per_component_select and the catalog alias n4m.pop_pls; the wrapper uses the same compact strict-linear default bank as AOM-PLS and accepts caller-provided strict operators. Result buffers include input_coefficients and intercept, so callers can reuse the selected per-component model on new spectra as X_new @ input_coefficients + intercept. The sklearn-style n4m.sklearn.NativePOPPLSRegressor wraps the same native result. Reference: git-pinned oracle nirs4all.operators.models.sklearn.aom_pls.POPPLSRegressor (sanctioned exception).
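The reuse formula works on uncentred spectra because the centring means can be folded into the intercept. The sketch below illustrates that identity with a stand-in coefficient vector from ordinary least squares. Whether the native result computes its intercept exactly this way is an assumption, and `coef` here merely plays the role of `input_coefficients`.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=3.0, size=(25, 6))
y = rng.normal(loc=-1.0, size=25)
X_new = rng.normal(loc=3.0, size=(4, 6))

x_mean = X.mean(axis=0)
y_mean = y.mean()

# Stand-in coefficients fitted on centred data (any centred linear model works).
coef, *_ = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)

# Fold both centring means into a single scalar intercept.
intercept = y_mean - x_mean @ coef

# Explicit centring at predict time versus the folded form on raw spectra.
centred_prediction = (X_new - x_mean) @ coef + y_mean
direct_prediction = X_new @ coef + intercept

assert np.allclose(centred_prediction, direct_prediction)
```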

MATLAB header (bindings/matlab/+pls4all/pop_pls.m):

pls4all.pop_pls  POP-PLS per-component operator selection.

Usage

Every pls4all binding tab dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in benchmarks.parity_timing.registry. Switch tabs to read the same fit in your language. The R package now ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom — those tabs appear only on the methods that have a meaningful equivalence.

pls4all bindings

C:

/* C ABI — libn4m AOM/POP selector path */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t*  cfg = n4m_config_create();
n4m_operator_bank_t* bank = NULL;
n4m_validation_plan_t* plan = NULL;
n4m_aom_per_component_result_t* res = NULL;
n4m_operator_bank_create(&bank);
/* add compact nirs4all-style operators: identity, SG, detrend, FD */
n4m_validation_plan_create(&plan);
/* fill CV folds on plan */
n4m_aom_per_component_select(ctx, cfg, bank, &x_view, &y_view, plan,
              /* max_components */ 2, &res);
/* read predictions and selection diagnostics via result getters */
n4m_aom_per_component_result_destroy(res);
n4m_validation_plan_destroy(plan);
n4m_operator_bank_destroy(bank);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);

Python:

import n4m

res = n4m.pop_pls(
    X,
    y,
    max_components=2,
    cv=4,
    operators=[
        "identity",
        ("savgol_smooth", [5, 2]),
        ("finite_difference", [1]),
    ],
)
yhat = res["predictions"]
selected_ops = res["selected_operator_indices"]
coef = res["input_coefficients"]
intercept = res["intercept"]
yhat_new = X_new @ coef + intercept

Python (scikit-learn wrapper):

from n4m.sklearn import NativePOPPLSRegressor

model = NativePOPPLSRegressor(max_components=2, cv=4).fit(X, y)
yhat_new = model.predict(X_new)

R:

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("pop_pls", X, y,
                      n_components = 2L, params = list(max_components = 3L, n_operators = 9L, cv = 3L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

MATLAB:

res = pls4all.pop_pls(X, y, 2);
% see header of bindings/matlab/+pls4all/pop_pls.m for full
% parameter surface:
%   res = pop_pls(X, Y, max_components, n_operators, cv)
yhat = predict(res, Xtest);

No idiomatic classdef wrapper — invoke pls4all.fit("pop_pls", X, y, …) directly from the unified MEX factory.

Registry parity references 📐

  • 📐 nirs4all (python · python) — nirs4all in-tree · qualitative (rmse_rel ≤ 5e+00) — In-tree nirs4all AOM/POP estimator stack (sanctioned reference). The pls4all ABI uses the same compact strict-linear bank and contiguous folds for cross-binding determinism; nirs4all remains the qualitative algorithmic reference.
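Contiguous folds are deterministic because they depend only on the sample order, not on a random seed. One common way to build them is sketched below; the helper name `contiguous_folds` and the rule for distributing leftover samples are assumptions, not the pls4all validation-plan implementation.

```python
import numpy as np


def contiguous_folds(n_samples, n_folds):
    """Split 0..n_samples-1 into consecutive test blocks, returning (train, test) pairs."""
    indices = np.arange(n_samples)
    # array_split gives the first (n_samples % n_folds) blocks one extra sample.
    test_blocks = np.array_split(indices, n_folds)
    return [(np.setdiff1d(indices, test), test) for test in test_blocks]


folds = contiguous_folds(10, 3)
for train, test in folds:
    print("train:", train, "test:", test)
```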

Benchmarks

Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.

Verdict  ·  ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance  ·  ✓ bind = pls4all binding agrees with the C++ baseline  ·  ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle  ·  ✗ divergent  ·  ⚠ error  ·  — not run. The fastest backend per column is marked 🏆.

Reference gate: strict — numeric equivalence (rmse_rel_tol 1e-08).

Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.

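| Group | Backend | Parity | 200×40 (ms) |
|---|---|---|---|
| C++ native · libn4m | pls4all.cpp.blas+omp | ✓ ref 5e-15 | 6.24 ms 🏆 |
| Python · pls4all | pls4all.python | ✓ bind | 7.23 ms |
| Python · pls4all | pls4all.sklearn | ✓ bind | 6.51 ms |
| R · pls4all | pls4all.R | ✓ bind | 12.0 ms |
| R · pls4all | pls4all.R.formula | ✓ bind | 12.5 ms |
| R · pls4all | pls4all.R.mdatools | ✓ bind | 24.7 ms |
| R · pls4all | pls4all.R.pls | ✓ bind | 13.2 ms |
| Python · external | 📐 nirs4all | source | 47.8 ms |

| Group | Backend | Parity | 200×40 (ms) |
|---|---|---|---|
| C++ native · libn4m | pls4all.cpp.blas+omp | ✓ ref 5e-15 | 7.34 ms |
| Python · pls4all | pls4all.python | ✓ bind | 7.46 ms |
| Python · pls4all | pls4all.sklearn | ✓ bind | 6.77 ms 🏆 |
| R · pls4all | pls4all.R | ✓ bind | 13.2 ms |
| R · pls4all | pls4all.R.formula | ✓ bind | 12.9 ms |
| R · pls4all | pls4all.R.mdatools | ✓ bind | 10.0 ms |
| R · pls4all | pls4all.R.pls | ✓ bind | 12.4 ms |
| Python · external | 📐 nirs4all | source | 36.4 ms |

| Group | Backend | Parity | 200×40 (ms) |
|---|---|---|---|
| C++ native · libn4m | pls4all.cpp.blas+omp | ✓ ref 5e-15 | 15.3 ms |
| Python · pls4all | pls4all.python | ✓ bind | 11.1 ms |
| Python · pls4all | pls4all.sklearn | ✓ bind | 7.06 ms 🏆 |
| R · pls4all | pls4all.R | ✓ bind | 34.2 ms |
| R · pls4all | pls4all.R.formula | ✓ bind | 22.8 ms |
| R · pls4all | pls4all.R.mdatools | ✓ bind | 13.5 ms |
| R · pls4all | pls4all.R.pls | ✓ bind | 12.5 ms |
| Python · external | 📐 nirs4all | source | 38.6 ms |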

See also: benchmark overview · methods index · interactive dashboard