`pls` — PLS regression (SIMPLS)¶

Group: Core PLS · Registry tolerance: 1e-08

Description¶

SIMPLS PLS regression baseline

From the pls4all.sklearn.PLSRegression docstring:

Full Python sklearn-wrapper docstring

Partial Least Squares regression backed by `pls4all`'s C core.

Drop-in replacement for `sklearn.cross_decomposition.PLSRegression`
with two distinguishing knobs:

* `solver` selects the inner algorithm (NIPALS, SIMPLS, SVD, …)
  directly — sklearn only exposes 'nipals' / 'svd'.
* Round-trip via `pickle.dumps` is bit-exact, backed by the raw C ABI
  N4MM fitted-model payload (`n4m_model_export_to_buffer`).

Parameters
----------
n_components : int, default=2
    Number of latent components.
solver : str, default='simpls'
    One of 'nipals', 'simpls', 'orthogonal-scores', 'kernel',
    'wide-kernel', 'svd', 'power', 'randomized-svd'.
center_x, scale_x : bool, default=True
    Standardize X columns to zero mean / unit variance.
center_y : bool, default=True
    Center y to zero mean.
scale_y : bool, default=False
    Standardize y columns to unit variance.
tol : float, default=1e-6
    Convergence tolerance for iterative solvers.
max_iter : int, default=500
    Maximum number of iterations for iterative solvers (e.g. NIPALS).
store_scores : bool, default=False
    Keep the latent score matrices (`x_scores_`) after fit.

Registry note — Baseline SIMPLS cell. sklearn uses NIPALS and ikpls uses improved-kernel PLS, so exact bit parity is not expected; the row exists to anchor timing comparisons.

Parameters¶

Name	Type	Default	Notes
`n_components`	`int`	`2`	Number of latent components extracted (k).
`solver`	`str`	`'simpls'`	Inner algorithm: 'nipals', 'simpls', 'svd', 'kernel', 'orthogonal-scores', 'power', 'randomized-svd', 'wide-kernel'.
`center_x`	`bool`	`True`	Subtract the column mean of X before fitting.
`scale_x`	`bool`	`True`	Standardize X columns to unit variance before fitting.
`center_y`	`bool`	`True`	Subtract the column mean of y before fitting.
`scale_y`	`bool`	`False`	Standardize y columns to unit variance before fitting.
`tol`	`float`	`1e-06`	Convergence tolerance for iterative solvers (NIPALS / power-iteration).
`max_iter`	`int`	`500`	Maximum iterations for iterative solvers.
`store_scores`	`bool`	`False`	If True, keep the latent score matrix (`x_scores_`) after fit.
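
As a rough illustration of what the four centring and scaling flags do to the data before the solver runs, the NumPy sketch below applies them by hand. The helper name `preprocess` is hypothetical, and the sample standard deviation (`ddof=1`) is an assumption: this page does not state which variance convention the C core uses.

```python
import numpy as np

def preprocess(A, center=True, scale=True, ddof=1):
    """Column-wise centring/scaling; hypothetical helper, not a pls4all API.

    ddof=1 (sample standard deviation) is an assumption.
    """
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0) if center else np.zeros(A.shape[1])
    std = A.std(axis=0, ddof=ddof) if scale else np.ones(A.shape[1])
    std = np.where(std == 0.0, 1.0, std)  # leave constant columns unscaled
    return (A - mean) / std, mean, std

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 50))
y = rng.normal(size=(200, 1))

# Defaults from the table: X centred and scaled, y centred only.
Xs, x_mean, x_std = preprocess(X, center=True, scale=True)
ys, y_mean, y_std = preprocess(y, center=True, scale=False)
```

The returned means and scales are what a fitted model would need to keep in order to transform new samples consistently at prediction time.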

Explanations¶

Bibliographic source¶

de Jong, S. (1993). SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems 18(3), 251–263.

Mathematical principle¶

Partial Least Squares regression seeks a set of latent directions in the predictor space that maximise the covariance with the response, in contrast to PCA which maximises only the variance of \(\mathbf{X}\).

Given centred \(\mathbf{X} \in \mathbb{R}^{n\times p}\) and \(\mathbf{Y} \in \mathbb{R}^{n\times q}\), the first PLS component is the unit-norm direction \(\mathbf{w}_1\) maximising the covariance between the score \(\mathbf{X}\mathbf{w}_1\) and the response. For a single response (\(q = 1\)) this has the closed form \(\mathbf{w}_1 \propto \mathbf{X}^{\top}\mathbf{y}\); for \(q>1\), \(\mathbf{w}_1\) is the dominant left singular vector of \(\mathbf{X}^{\top}\mathbf{Y}\). Subsequent components are extracted from the deflated residual matrix, so the resulting score vectors \(\mathbf{t}_a\), collected as the columns of \(\mathbf{T}\), are mutually orthogonal.
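
The closed form for the first direction can be checked numerically. The sketch below uses synthetic data and a single response (all variable names are illustrative), builds \(\mathbf{w}_1\) from \(\mathbf{X}^{\top}\mathbf{y}\), and compares the covariance it achieves with that of random unit-norm directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Centre both blocks, as the text assumes.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Closed form for a single response: X^T y normalised to unit length.
w1 = Xc.T @ yc
w1 /= np.linalg.norm(w1)

def cov_with_y(w):
    """Sample covariance between the score Xc @ w and the response."""
    return (Xc @ w) @ yc / (n - 1)

# Compare against random unit-norm directions: none should beat w1.
candidates = rng.normal(size=(1000, p))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
best_random = max(cov_with_y(w) for w in candidates)

print(cov_with_y(w1), best_random)
```

By the Cauchy-Schwarz inequality no unit-norm direction can exceed the covariance reached by `w1`, so the first printed value is always at least as large as the second.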

SIMPLS (de Jong 1993) computes the weight vectors directly from the cross-product \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) and deflates \(\mathbf{S}\) rather than \(\mathbf{X}\) at each step. For a single response (\(q = 1\)) it yields the same model as NIPALS; for \(q > 1\) the two can differ slightly. Skipping the repeated deflation of \(\mathbf{X}\) limits the accumulation of floating-point error and typically runs in roughly half the time of NIPALS for the same number of components. SIMPLS is the variant exposed by MATLAB’s plsregress.
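
To make the "deflate \(\mathbf{S}\), not \(\mathbf{X}\)" idea concrete, here is a minimal NumPy sketch of textbook SIMPLS following de Jong (1993). It is not the libn4m implementation; the function name `simpls` and the variable names are chosen for illustration, and the inputs are assumed to be column-centred already.

```python
import numpy as np

def simpls(X, Y, n_components):
    """Textbook SIMPLS after de Jong (1993); an illustrative sketch only.

    X (n, p) and Y (n, q) must already be column-centred.
    Returns weights R, scores T, X-loadings P, Y-loadings Q and B = R Q^T.
    """
    n, p = X.shape
    q = Y.shape[1]
    R = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    Q = np.zeros((q, n_components))
    V = np.zeros((p, n_components))  # orthonormal basis of the X-loadings
    S = X.T @ Y                      # cross-product, the only matrix deflated
    for a in range(n_components):
        # Dominant left singular vector of S gives the weight direction.
        r = np.linalg.svd(S, full_matrices=False)[0][:, 0]
        t = X @ r
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, r / norm_t    # unit-norm scores
        p_a = X.T @ t
        q_a = Y.T @ t
        # Orthogonalise the new loading against the previous basis vectors.
        v = p_a - V[:, :a] @ (V[:, :a].T @ p_a)
        v /= np.linalg.norm(v)
        S = S - np.outer(v, v @ S)       # deflate S, never X
        R[:, a], T[:, a], P[:, a], Q[:, a], V[:, a] = r, t, p_a, q_a, v
    return R, T, P, Q, R @ Q.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
Y = X[:, :2] @ np.array([[1.0, 0.5], [-2.0, 0.3]]) + 0.1 * rng.normal(size=(200, 2))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
R, T, P, Q, B = simpls(Xc, Yc, n_components=4)
```

In this formulation the weights `R` act on the original centred `X`, so the scores are simply `Xc @ R` and the coefficients reduce to `R @ Q.T`.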

Once \(k\) latent scores have been extracted, the regression coefficients are reconstructed as \(\mathbf{B} = \mathbf{W}(\mathbf{P}^{\top}\mathbf{W})^{-1}\mathbf{Q}^{\top}\), where \(\mathbf{W}\) holds the weight vectors and \(\mathbf{P}, \mathbf{Q}\) are the X- and Y-loadings. Predictions on new \(\mathbf{X}^{\star}\), centred with the training column means \(\bar{\mathbf{x}}\), follow \(\hat{\mathbf{Y}} = (\mathbf{X}^{\star} - \mathbf{1}\bar{\mathbf{x}}^{\top})\mathbf{B} + \mathbf{1}\bar{\mathbf{y}}^{\top}\). The choice of \(k\) trades bias against variance: use cross-validated PRESS or the one-SE rule of Hastie et al. (2009) to select it.
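
The coefficient formula can be verified with a small single-response NIPALS loop, where \(\mathbf{W}\) holds weights defined on the deflated \(\mathbf{X}\). The sketch below is illustrative only (the function `nipals_pls1` and the synthetic data are assumptions, not pls4all code); it builds \(\mathbf{B}\) from \(\mathbf{W}\), \(\mathbf{P}\) and the y-loadings, then predicts on new rows using the training means.

```python
import numpy as np

def nipals_pls1(X, y, k):
    """Single-response NIPALS on centred X (n, p) and y (n,); sketch only."""
    n, p = X.shape
    W = np.zeros((p, k))   # weights, defined on the deflated X
    P = np.zeros((p, k))   # X-loadings
    T = np.zeros((n, k))   # scores
    q = np.zeros(k)        # y-loadings
    Xa = X.copy()
    for a in range(k):
        w = Xa.T @ y
        w /= np.linalg.norm(w)
        t = Xa @ w
        tt = t @ t
        P[:, a] = Xa.T @ t / tt
        q[a] = y @ t / tt
        W[:, a], T[:, a] = w, t
        Xa = Xa - np.outer(t, P[:, a])   # deflate X
    # B = W (P^T W)^{-1} q, solved without forming the inverse explicitly.
    B = W @ np.linalg.solve(P.T @ W, q)
    return W, P, T, q, B

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)
x_mean, y_mean = X.mean(axis=0), y.mean()

W, P, T, q, B = nipals_pls1(X - x_mean, y - y_mean, k=4)

# New samples are centred with the *training* means before applying B.
X_new = rng.normal(size=(5, 50))
y_hat = (X_new - x_mean) @ B + y_mean
```

On the training data, `(X - x_mean) @ B` reproduces `T @ q`, which is the regression of the response on the orthogonal scores.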

Implementation¶

Dispatched through Algorithm.PLS_REGRESSION + Solver.SIMPLS in libn4m (the n4m_model_fit C entry point). The same Model.fit / Model.predict surface is used by every binding. NIPALS, SVD, power-iteration, randomised-SVD, orthogonal-scores, kernel and wide-kernel solver variants are all available — see the Solver enum.

R roxygen note (sklearn.R::pls):

Formula-based PLS regression wrapper around the n4m C ABI.

MATLAB header (bindings/matlab/+pls4all/Regression.m):

pls4all.Regression — Statistics Toolbox-style class for PLS regression.

 Tier-2 idiomatic MATLAB / Octave wrapper around the tier-1
 pls4all.pls_fit(X, Y, n_components) primitive. Mirrors the shape
 of MATLAB's built-in RegressionPartialLeastSquares: object-oriented
 properties + methods, factory function `pls4all.fitrpls`, and a

Usage¶

Every pls4all binding tab dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in benchmarks.parity_timing.registry. Switch tabs to read the same fit in your language. The R package now ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom — those tabs appear only on the methods that have a meaningful equivalence.

pls4all bindings

C ABI · libn4m

/* C ABI — libn4m (Model.fit path) */
/* x_view, y_view, x_test_view and y_hat_view are assumed to be set up by the caller. */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t*  cfg = n4m_config_create();
n4m_config_set_algorithm(cfg, N4M_ALGORITHM_PLS_REGRESSION);
n4m_config_set_solver   (cfg, N4M_SOLVER_SIMPLS);
n4m_config_set_n_components(cfg, 4);
n4m_model_t* mdl = NULL;
n4m_model_fit(ctx, cfg, &x_view, &y_view, &mdl);
n4m_model_predict(ctx, mdl, &x_test_view, &y_hat_view);
n4m_model_destroy(mdl);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);

Python · pls4all (raw)

import pls4all
from pls4all import Algorithm, Solver
with pls4all.Context() as ctx, pls4all.Config() as cfg:
    cfg.algorithm = Algorithm.PLS_REGRESSION
    cfg.solver = Solver.SIMPLS
    cfg.n_components = 4
    with pls4all.Model.fit(ctx, cfg, X, y) as mdl:
        y_hat = mdl.predict(X_test)

Python · pls4all.sklearn

from pls4all.sklearn import PLSRegression
mdl = PLSRegression(n_components=2, solver='simpls',
                    center_x=True, scale_x=True, center_y=True, scale_y=False,
                    tol=1e-06, max_iter=500, store_scores=False)
mdl.fit(X, y)
y_hat = mdl.predict(X_test)

R · pls4all_method()

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("pls", X, y,
                      n_components = 4L)
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

R · pls4all (formula+S3)

library(pls4all)
fit  <- pls(y ~ ., data = train, ncomp = 4L)
yhat <- predict(fit, newdata = test)
summary(fit)

R · pls package compat

library(pls4all)
# Drop-in for CRAN `pls::plsr` (same signature).
fit  <- plsr(y ~ ., ncomp = 4L, data = train,
             validation = "CV", segments = 10L)
yhat <- predict(fit, newdata = test, ncomp = 4L)
RMSEP(fit)

R · mdatools compat

library(pls4all)
# Drop-in for `mdatools::pls(x, y, ncomp, method = "simpls")`.
fit  <- pls_mdatools(X, y, ncomp = 4L, method = "simpls",
                     center = TRUE, scale = FALSE)
yhat <- predict(fit, newdata = X_test, ncomp = 4L)

MATLAB · pls4all (MEX)

res = pls4all.pls_fit(X, y, 4);
% see header of bindings/matlab/+pls4all/pls_fit.m for full
% parameter surface:
%   [coefs, x_mean, y_mean, predictions] = pls_fit(X, Y, n_components)
yhat = predict(res, Xtest);

MATLAB · pls4all (classdef)

mdl  = pls4all.fit("pls", X, y, "NumComponents", 4);
yhat = predict(mdl, Xtest);

Registry parity references 📐

📐 ref.python_ikpls (python · ikpls) — ikpls MISSING · strict (rmse_rel ≤ 1e-08) — ikpls.numpy_ikpls.PLS algorithm 1.
📐 ref.python_scikit_learn (python · python) — scikit-learn 1.7.2 · strict (rmse_rel ≤ 1e-08) — sklearn.cross_decomposition.PLSRegression(scale=False).
📐 ref.r_mixomics (R · mixOmics) — mixOmics 6.26.0 · strict (rmse_rel ≤ 1e-08) — Bioconductor mixOmics::pls(mode='regression', scale=FALSE).
📐 ref.r_pls (R · r) — pls 2.8.5 · strict (rmse_rel ≤ 1e-08) — R pls::plsr(method='simpls', scale=FALSE).

Benchmarks¶

Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.
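
This page does not spell out how the adaptive timing works. One common standard-library pattern, offered here purely as an assumption about what "adaptive" could mean, is to let `timeit.Timer.autorange` pick the loop count so that a measurement lasts at least 0.2 s, then report the mean time per call. The workload below is a placeholder, not a pls4all call, and `per_call_ms` is a hypothetical helper.

```python
import timeit

def per_call_ms(fn):
    """Mean wall-clock time per call in milliseconds, with an adaptive loop count."""
    timer = timeit.Timer(fn)
    loops, total_s = timer.autorange()   # grows loops until total_s >= 0.2 s
    return 1e3 * total_s / loops, loops

def placeholder_workload():
    # Stand-in for a 200x50 fit; replace with the call under test.
    return sum(i * i for i in range(10_000))

ms, loops = per_call_ms(placeholder_workload)
print(f"{ms:.3f} ms per call over {loops} loops")
```

Letting the loop count adapt keeps the measurement window roughly constant, so fast and slow backends are timed with comparable resolution.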

Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.

Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-08).
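
The strict gate bounds a relative RMSE by 1e-08. The exact normalisation used by the registry is not given on this page; the sketch below assumes the RMSE of the difference divided by the root mean square of the reference, and the function names `rmse_rel` and `passes_strict` are hypothetical.

```python
import math

def rmse_rel(pred, ref):
    """RMSE of (pred - ref) divided by the RMS of ref (assumed definition)."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, ref)) / len(ref))
    scale = math.sqrt(sum(b * b for b in ref) / len(ref))
    return err / scale if scale > 0 else err

def passes_strict(pred, ref, tol=1e-8):
    """True when the relative RMSE is within the strict tolerance."""
    return rmse_rel(pred, ref) <= tol

ref = [1.0, 2.0, 3.0, 4.0]
close = [v * (1 + 1e-12) for v in ref]   # round-off-level disagreement
off = [v * (1 + 1e-3) for v in ref]      # algorithm-level disagreement
print(passes_strict(close, ref), passes_strict(off, ref))
```

Under this definition a backend that differs only by floating-point round-off passes, while one that differs at the level of the algorithm does not.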

Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.

1 thread

Backend	Parity	200×50 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref 6e-16	1.70 ms
Python · pls4all
`pls4all.python`	✓ bind	1.69 ms🏆
`pls4all.sklearn`	✓ bind	1.97 ms
R · pls4all
`pls4all.R`	✓ 7e-15	4.79 ms
`pls4all.R.formula`	✓ 7e-15	5.26 ms
`pls4all.R.mdatools`	✓ 7e-15	5.92 ms
`pls4all.R.pls`	✓ 7e-15	9.99 ms
Python · external
📐`ref.python_ikpls`	⇄ +9e-03	1.92 ms
📐`ref.python_scikit_learn`	source	2.16 ms
R · external
📐`ref.r_mixomics`	⇄ +6e-16	9.72 ms
📐`ref.r_pls`	⇄ +1e-14	8.01 ms

3 threads

Backend	Parity	200×50 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref 6e-16	1.79 ms
Python · pls4all
`pls4all.python`	✓ bind	1.76 ms🏆
`pls4all.sklearn`	✓ bind	1.95 ms
R · pls4all
`pls4all.R`	✓ 7e-15	4.43 ms
`pls4all.R.formula`	✓ 7e-15	5.76 ms
`pls4all.R.mdatools`	✓ 7e-15	6.29 ms
`pls4all.R.pls`	✓ 7e-15	10.3 ms
Python · external
📐`ref.python_ikpls`	⇄ +9e-03	1.90 ms
📐`ref.python_scikit_learn`	source	2.16 ms
R · external
📐`ref.r_mixomics`	⇄ +6e-16	9.82 ms
📐`ref.r_pls`	⇄ +1e-14	7.50 ms

10 threads

Backend	Parity	200×50 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ ref 6e-16	1.79 ms
Python · pls4all
`pls4all.python`	✓ bind	1.78 ms🏆
`pls4all.sklearn`	✓ bind	1.93 ms
R · pls4all
`pls4all.R`	✓ 7e-15	4.69 ms
`pls4all.R.formula`	✓ 7e-15	5.49 ms
`pls4all.R.mdatools`	✓ 7e-15	5.59 ms
`pls4all.R.pls`	✓ 7e-15	10.4 ms
Python · external
📐`ref.python_ikpls`	⇄ +9e-03	2.04 ms
📐`ref.python_scikit_learn`	source	2.17 ms
R · external
📐`ref.r_mixomics`	⇄ +6e-16	9.91 ms
📐`ref.r_pls`	⇄ +1e-14	8.15 ms
