pls — PLS regression (SIMPLS)¶
Group: Core PLS · Registry tolerance: 0.1
Description¶
SIMPLS PLS regression baseline
From the pls4all.sklearn.PLSRegression docstring:
Partial Least Squares regression backed by pls4all's C core.
Full Python sklearn-wrapper docstring
Partial Least Squares regression backed by `pls4all`'s C core.
Drop-in replacement for `sklearn.cross_decomposition.PLSRegression`
with two distinguishing features:

* `solver` selects the inner algorithm (NIPALS, SIMPLS, SVD, …)
  directly; sklearn's `PLSRegression` always runs NIPALS, and only
  `PLSCanonical` exposes an `algorithm` choice ('nipals' / 'svd').
* Round-trip via `pickle.dumps` is bit-exact, backed by the C ABI
  `.n4a` bundle (`n4m_model_export_to_buffer`).

Parameters
----------
n_components : int, default=2
    Number of latent components.
solver : str, default='simpls'
    One of 'nipals', 'simpls', 'orthogonal-scores', 'kernel',
    'wide-kernel', 'svd', 'power', 'randomized-svd'.
center_x, scale_x : bool, default=True
    Center X columns to zero mean / scale them to unit variance.
center_y : bool, default=True
    Center y to zero mean.
scale_y : bool, default=False
    Scale y columns to unit variance.
tol : float, default=1e-6
    Convergence tolerance for iterative solvers.
max_iter : int, default=500
    Maximum number of NIPALS iterations.
store_scores : bool, default=False
    Keep the latent score matrix (`x_scores_`) after fit.
Registry note — Baseline SIMPLS cell. sklearn uses NIPALS and ikpls uses improved-kernel PLS, so exact bit parity is not expected; the row exists to anchor timing comparisons.
Parameters¶
| Name | Type | Default | Notes |
|---|---|---|---|
| n_components | int | 2 | Number of latent components extracted (k). |
| solver | str | 'simpls' | Inner algorithm: 'nipals', 'simpls', 'svd', 'kernel', 'orthogonal-scores', 'power', 'randomized-svd', 'wide-kernel'. |
| center_x | bool | True | Subtract the column mean of X before fitting. |
| scale_x | bool | True | Standardize X columns to unit variance before fitting. |
| center_y | bool | True | Subtract the column mean of y before fitting. |
| scale_y | bool | False | Standardize y columns to unit variance before fitting. |
| tol | float | 1e-6 | Convergence tolerance for iterative solvers (NIPALS / power iteration). |
| max_iter | int | 500 | Maximum iterations for iterative solvers. |
| store_scores | bool | False | If True, keep the latent score matrix (`x_scores_`) after fit. |
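The centring and scaling flags amount to a column-wise transform. The NumPy sketch below is a hypothetical illustration, not pls4all code: `preprocess` and `transform_new` are made-up names, and the sample standard deviation (`ddof=1`) is an assumption, since this page does not say which estimator libn4m uses. The point it shows is that new rows are transformed with the training statistics, never re-estimated ones.

```python
import numpy as np


def preprocess(X, y, center_x=True, scale_x=True, center_y=True, scale_y=False):
    """Hypothetical sketch of the center_*/scale_* flags (defaults as documented).

    ddof=1 is an assumption; zero-variance columns would divide by zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0) if center_x else np.zeros(X.shape[1])
    x_std = X.std(axis=0, ddof=1) if scale_x else np.ones(X.shape[1])
    y_mean = y.mean(axis=0) if center_y else np.zeros(y.shape[1:])
    y_std = y.std(axis=0, ddof=1) if scale_y else np.ones(y.shape[1:])
    stats = {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean, "y_std": y_std}
    return (X - x_mean) / x_std, (y - y_mean) / y_std, stats


def transform_new(X_new, stats):
    # New rows reuse the training mean and standard deviation.
    return (np.asarray(X_new, dtype=float) - stats["x_mean"]) / stats["x_std"]


rng = np.random.default_rng(0)
X = rng.normal(5.0, 3.0, size=(100, 4))
y = rng.normal(2.0, 0.5, size=100)
Xp, yp, stats = preprocess(X, y)      # defaults: X standardized, y centred only
X_test_p = transform_new(X[:10], stats)
```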
Explanations¶
Bibliographic source¶
de Jong, S. (1993). SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems 18(3), 251–263.
Mathematical principle¶
Partial Least Squares regression seeks a set of latent directions in the predictor space that maximise the covariance with the response, in contrast to PCA, which maximises only the variance of \(\mathbf{X}\).
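Given centred \(\mathbf{X} \in \mathbb{R}^{n\times p}\) and \(\mathbf{Y} \in \mathbb{R}^{n\times q}\), the first PLS weight vector is the unit-norm direction \(\mathbf{w}_1\) whose score \(\mathbf{t}_1 = \mathbf{X}\mathbf{w}_1\) has maximal covariance with the response, i.e. it maximises \(\|\mathbf{Y}^{\top}\mathbf{X}\mathbf{w}_1\|\) subject to \(\|\mathbf{w}_1\| = 1\). Closed form: for a single response (\(q=1\)), \(\mathbf{w}_1 \propto \mathbf{X}^{\top}\mathbf{y}\); for \(q>1\), \(\mathbf{w}_1\) is the dominant left singular vector of \(\mathbf{X}^{\top}\mathbf{Y}\). Subsequent components are extracted from the deflated residual matrix so that the scores are mutually orthogonal; expressed against the original \(\mathbf{X}\), the score matrix is \(\mathbf{T} = \mathbf{X}\mathbf{W}(\mathbf{P}^{\top}\mathbf{W})^{-1}\), where \(\mathbf{W}\) collects the weights and \(\mathbf{P}\) the X-loadings.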
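The closed form can be checked numerically. This minimal NumPy sketch (the helpers `first_pls_direction`, `first_pca_direction` and `sq_cov` are hypothetical, not part of pls4all) builds data whose highest-variance column is unrelated to y, and shows that the first PLS weight picks the response-relevant column while the first principal direction picks the high-variance one.

```python
import numpy as np


def first_pls_direction(X, Y):
    """Unit-norm w1 maximising ||Y^T X w|| for column-centred X, Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S = Xc.T @ Yc                      # p x q cross-product X^T Y
    if S.shape[1] == 1:                # single response: w1 is proportional to X^T y
        w = S[:, 0]
    else:                              # multi-response: dominant left singular vector
        U, _, _ = np.linalg.svd(S, full_matrices=False)
        w = U[:, 0]
    return w / np.linalg.norm(w)


def first_pca_direction(X):
    """First principal axis of X (maximal variance, ignores Y)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]


def sq_cov(X, Y, w):
    """Squared norm of Cov(Xw, Y) for one direction w."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    c = Yc.T @ (Xc @ w) / (X.shape[0] - 1)   # length-q covariance vector
    return float(c @ c)


rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    5.0 * rng.standard_normal(n),      # high variance, unrelated to y
    rng.standard_normal(n),            # drives y
    rng.standard_normal(n),            # pure noise column
])
y = (2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)).reshape(-1, 1)

w_pls = first_pls_direction(X, y)
w_pca = first_pca_direction(X)
print(np.round(w_pls, 3), np.round(w_pca, 3))
```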
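SIMPLS (de Jong 1993) computes weight vectors that act directly on the original (centred) \(\mathbf{X}\): at each step it takes the dominant left singular vector of the cross-product \(\mathbf{S} = \mathbf{X}^{\top}\mathbf{Y}\) and then deflates \(\mathbf{S}\), projecting it off the span of the X-loadings found so far, instead of re-deflating \(\mathbf{X}\). For a single response it yields the same model as NIPALS; for multi-response \(\mathbf{Y}\) the two differ slightly. Avoiding repeated deflation of the \(n\times p\) matrix limits accumulated floating-point error and typically makes SIMPLS faster than NIPALS for the same number of components. SIMPLS is the variant used by MATLAB's plsregress.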
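A minimal NumPy sketch of the SIMPLS recursion, written from de Jong (1993) for illustration. `simpls_fit` is a hypothetical helper, not libn4m's implementation, and it omits the rank and convergence checks a production solver needs. With k equal to the number of predictors (and full-rank X) the scores span the whole column space of X, so the coefficients must reproduce ordinary least squares, which makes a convenient sanity check.

```python
import numpy as np


def simpls_fit(X, Y, k):
    """Sketch of SIMPLS (de Jong 1993). Returns (B, intercept, T)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n, p = Xc.shape
    q = Yc.shape[1]
    S = Xc.T @ Yc                    # cross-product S = X^T Y; S is deflated, X is not
    R = np.zeros((p, k))             # weights acting on the original centred X
    T = np.zeros((n, k))             # unit-norm, mutually orthogonal scores
    Q = np.zeros((q, k))             # Y-loadings
    V = np.zeros((p, k))             # orthonormal basis of the X-loadings
    for a in range(k):
        U, _, _ = np.linalg.svd(S, full_matrices=False)
        r = U[:, 0]                  # dominant left singular vector of S
        t = Xc @ r
        norm_t = np.linalg.norm(t)
        r, t = r / norm_t, t / norm_t
        p_a = Xc.T @ t               # X-loading
        v = p_a - V[:, :a] @ (V[:, :a].T @ p_a)   # Gram-Schmidt vs earlier loadings
        v /= np.linalg.norm(v)
        S = S - np.outer(v, v @ S)   # project S off the new loading direction
        R[:, a], T[:, a], Q[:, a], V[:, a] = r, t, Yc.T @ t, v
    B = R @ Q.T                      # B = R Q^T
    intercept = y_mean - x_mean @ B
    return B, intercept, T


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
Y = X @ rng.standard_normal((4, 2)) + 0.1 * rng.standard_normal((50, 2))
B, b0, T = simpls_fit(X, Y, k=4)
Y_hat = X @ B + b0
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
B_ols = np.linalg.lstsq(Xc, Yc, rcond=None)[0]   # with k = p, PLS reproduces OLS
```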
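Once \(k\) latent scores have been extracted, the regression coefficients are reconstructed as \(\mathbf{B} = \mathbf{W}(\mathbf{P}^{\top}\mathbf{W})^{-1}\mathbf{Q}^{\top}\), where \(\mathbf{P}, \mathbf{Q}\) are the X- and Y-loadings (SIMPLS computes weights \(\mathbf{R}\) that already act on the original \(\mathbf{X}\), so it forms \(\mathbf{B} = \mathbf{R}\mathbf{Q}^{\top}\) directly, with \(\mathbf{R}\) playing the role of \(\mathbf{W}(\mathbf{P}^{\top}\mathbf{W})^{-1}\)). Predictions on new \(\mathbf{X}^{\star}\) first centre it with the training means: \(\hat{\mathbf{Y}} = (\mathbf{X}^{\star} - \mathbf{1}\bar{\mathbf{x}}^{\top})\mathbf{B} + \mathbf{1}\bar{\mathbf{y}}^{\top}\) (when scale_x is set, the centred \(\mathbf{X}^{\star}\) is also divided by the training column standard deviations). The choice of \(k\) trades bias against variance: select it by cross-validated PRESS, optionally with the one-SE rule of Hastie et al. (2009).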
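The coefficient and prediction formulas can be checked with a small NIPALS PLS1 sketch (single response). `nipals_pls1`, `pls_coefficients` and `pls_predict` are hypothetical helpers written for illustration. They show the identity \(\mathbf{T} = \mathbf{X}\mathbf{W}(\mathbf{P}^{\top}\mathbf{W})^{-1}\) for centred \(\mathbf{X}\) that underlies \(\mathbf{B}\), and why new rows must be centred with the training means.

```python
import numpy as np


def nipals_pls1(X, y, k):
    """Hypothetical NIPALS PLS1 sketch: returns weights W, X-loadings P,
    y-loadings q, scores T and the training means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa = X - x_mean                      # centred X; deflated after each component
    yc = y - y_mean
    n, p = X.shape
    W, P, T = np.zeros((p, k)), np.zeros((p, k)), np.zeros((n, k))
    q = np.zeros(k)
    for a in range(k):
        w = Xa.T @ yc
        w /= np.linalg.norm(w)           # unit-norm weight vector
        t = Xa @ w                       # score of the deflated X
        tt = t @ t
        P[:, a] = Xa.T @ t / tt          # X-loading
        q[a] = yc @ t / tt               # y-loading
        W[:, a], T[:, a] = w, t
        Xa = Xa - np.outer(t, P[:, a])   # deflate X
    return W, P, q, T, x_mean, y_mean


def pls_coefficients(W, P, q):
    # B = W (P^T W)^{-1} Q^T, with Q^T reduced to the vector q for one response.
    return W @ np.linalg.solve(P.T @ W, q)


def pls_predict(X_new, B, x_mean, y_mean):
    # Centre new rows with the training means, then add back the training y mean.
    return (X_new - x_mean) @ B + y_mean


rng = np.random.default_rng(2)
X = rng.normal(3.0, 1.0, size=(60, 5))
y = X @ np.array([1.0, 0.0, -1.0, 0.5, 2.0]) + 0.1 * rng.standard_normal(60)
W, P, q, T, x_mean, y_mean = nipals_pls1(X, y, k=3)
B = pls_coefficients(W, P, q)
y_hat = pls_predict(X, B, x_mean, y_mean)
```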
Implementation¶
Dispatched through Algorithm.PLS_REGRESSION with Solver.SIMPLS in libn4m (the n4m_pls_fit C entry point). Every binding uses the same Model.fit / Model.predict surface. The NIPALS, SVD, power-iteration, randomised-SVD, orthogonal-scores, kernel and wide-kernel solvers are also available; see the Solver enum.
MATLAB header (bindings/matlab/+pls4all/Regression.m):
pls4all.Regression — Statistics Toolbox-style class for PLS regression.
Tier-2 idiomatic MATLAB / Octave wrapper around the tier-1
pls4all.pls_fit(X, Y, n_components) primitive. Mirrors the shape
of MATLAB's built-in RegressionPartialLeastSquares: object-oriented
properties + methods, factory function `pls4all.fitrpls`, and a
Usage¶
Every pls4all binding dispatches into the same C kernel; the external libraries listed under Registry parity references are the parity references registered in benchmarks.parity_timing.registry. Each snippet below shows the same fit in one language. The R package also ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom; those snippets appear only on methods that have a meaningful equivalence.
pls4all bindings
C ABI (libn4m):
```c
/* C ABI, libn4m (Model.fit path).
 * x_view, y_view, x_test_view and y_hat_view are matrix views prepared by
 * the caller; their construction is not shown here. */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t*  cfg = n4m_config_create();
n4m_config_set_algorithm(cfg, N4M_ALGORITHM_PLS_REGRESSION);
n4m_config_set_solver(cfg, N4M_SOLVER_SIMPLS);
n4m_config_set_n_components(cfg, 4);

n4m_model_t* mdl = NULL;
n4m_model_fit(ctx, cfg, &x_view, &y_view, &mdl);
n4m_model_predict(ctx, mdl, &x_test_view, &y_hat_view);

/* Release in reverse order of creation. */
n4m_model_destroy(mdl);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);
```
Python (pls4all):
```python
import pls4all
from pls4all import Algorithm, Solver

# X, y: training data; X_test: rows to predict.
with pls4all.Context() as ctx, pls4all.Config() as cfg:
    cfg.algorithm = Algorithm.PLS_REGRESSION
    cfg.solver = Solver.SIMPLS
    cfg.n_components = 4
    with pls4all.Model.fit(ctx, cfg, X, y) as mdl:
        y_hat = mdl.predict(X_test)
```
Python (n4m):
```python
import n4m
from n4m.sklearn import NativePLSRegressor

# Fixed component count through the moment sweep path.
res = n4m.pls(X, y, n_components=4, cv=5)
coef = res["coefficients"]

# Reusable estimator; predict() replays X @ coefficients + intercept.
model = NativePLSRegressor(n_components=4, cv=5).fit(X, y)
y_hat = model.predict(X_test)

# Optional train-CV component grid, still train-only and moment-based.
model_grid = NativePLSRegressor(pls_components=(1, 2, 3, 4), cv=5).fit(X, y)
```
Python (scikit-learn wrapper):
```python
from pls4all.sklearn import PLSRegression

mdl = PLSRegression(n_components=2, solver='simpls',
                    center_x=True, scale_x=True,
                    center_y=True, scale_y=False,
                    tol=1e-06, max_iter=500, store_scores=False)
mdl.fit(X, y)
y_hat = mdl.predict(X_test)
```
R (pls4all):
```r
library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("pls", X, y,
                      n_components = 4L)
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.
```
R (pls::plsr facade):
```r
library(pls4all)
# Drop-in for CRAN `pls::plsr` (same signature).
fit <- plsr(y ~ ., ncomp = 4L, data = train,
            validation = "CV", segments = 10L)
yhat <- predict(fit, newdata = test, ncomp = 4L)
RMSEP(fit)
```
R (mdatools::pls facade):
```r
library(pls4all)
# Drop-in for `mdatools::pls(x, y, ncomp, method = "simpls")`.
fit <- pls_mdatools(X, y, ncomp = 4L, method = "simpls",
                    center = TRUE, scale = FALSE)
yhat <- predict(fit, newdata = X_test, ncomp = 4L)
```
MATLAB (pls_fit primitive):
```matlab
res = pls4all.pls_fit(X, y, 4);
% See the header of bindings/matlab/+pls4all/pls_fit.m for the full
% parameter surface:
%   [coefs, x_mean, y_mean, predictions] = pls_fit(X, Y, n_components)
yhat = predict(res, Xtest);
```
MATLAB (pls4all.fit):
```matlab
mdl = pls4all.fit("pls", X, y, "NumComponents", 4);
yhat = predict(mdl, Xtest);
```
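The plsr facade pairs validation = "CV" with RMSEP(fit) to compare component counts. As a language-neutral illustration of that idea, and of the one-SE rule mentioned under Mathematical principle, here is a NumPy sketch of K-fold RMSEP for k = 1, …, max_k followed by the one-SE choice. It is an approximation under stated assumptions, not a reimplementation of pls::RMSEP: the helper names are hypothetical, folds come from one fixed random permutation, RMSEP is computed per fold, and the PLS1 fit repeats a compact NIPALS loop so the block runs on its own.

```python
import numpy as np


def pls1_fit_predict(X_tr, y_tr, X_te, k):
    """Compact NIPALS PLS1: fit k components on a training fold, predict X_te."""
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    Xa, yc = X_tr - x_mean, y_tr - y_mean
    p = X_tr.shape[1]
    W, P, q = np.zeros((p, k)), np.zeros((p, k)), np.zeros(k)
    for a in range(k):
        w = Xa.T @ yc
        w /= np.linalg.norm(w)
        t = Xa @ w
        tt = t @ t
        W[:, a], P[:, a], q[a] = w, Xa.T @ t / tt, yc @ t / tt
        Xa = Xa - np.outer(t, P[:, a])           # deflate X
    B = W @ np.linalg.solve(P.T @ W, q)          # B = W (P^T W)^{-1} q
    return (X_te - x_mean) @ B + y_mean          # centre with training means


def cv_rmsep(X, y, max_k, n_folds=10, seed=0):
    """Per-fold RMSEP for k = 1..max_k, as an (n_folds, max_k) array."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    err = np.zeros((n_folds, max_k))
    for f, test_idx in enumerate(folds):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        for k in range(1, max_k + 1):
            y_hat = pls1_fit_predict(X[train], y[train], X[test_idx], k)
            err[f, k - 1] = np.sqrt(np.mean((y[test_idx] - y_hat) ** 2))
    return err


def one_se_rule(err):
    """Smallest k whose mean CV error is within one standard error of the minimum."""
    mean = err.mean(axis=0)
    se = err.std(axis=0, ddof=1) / np.sqrt(err.shape[0])
    best = int(np.argmin(mean))
    return int(np.argmax(mean <= mean[best] + se[best])) + 1


rng = np.random.default_rng(0)
X = rng.standard_normal((120, 8))
y = X[:, :3] @ np.array([1.5, -2.0, 1.0]) + 0.3 * rng.standard_normal(120)
err = cv_rmsep(X, y, max_k=6)
k_star = one_se_rule(err)
```

The one-SE rule deliberately prefers the most parsimonious model whose error is statistically indistinguishable from the best one, which guards against picking extra components that only fit fold noise.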
Registry parity references 📐
* 📐 ref.python_ikpls (python · ikpls): ikpls 4.0.1.post1 · qualitative (rmse_rel ≤ 1e-01) · ikpls.numpy_ikpls.PLS algorithm 1.
* 📐 ref.python_scikit_learn (python · python): scikit-learn 1.8.0 · qualitative (rmse_rel ≤ 1e-01) · sklearn.cross_decomposition.PLSRegression(scale=False).
* 📐 ref.r_mixomics (R · mixOmics): mixOmics 6.26.0 · qualitative (rmse_rel ≤ 1e-01) · Bioconductor mixOmics::pls(mode='regression', scale=FALSE).
* 📐 ref.r_pls (R · r): pls 2.8.5 · qualitative (rmse_rel ≤ 1e-01) · R pls::plsr(method='simpls', scale=FALSE).
Benchmarks¶
Adaptive wall-clock time per cell, measured against full_matrix.csv. Only backends that implement this method are listed.
Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.
Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-08).
Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend.
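This page does not define how rmse_rel is computed. Assuming it is the root-mean-square difference between two backends' predictions divided by the root-mean-square of the reference predictions, a verdict could be assigned as in the hypothetical sketch below. The strict (1e-08) and qualitative (1e-01) thresholds come from this page; the relaxed band is left as a parameter because its value is not stated here, and `rmse_rel` / `parity_verdict` are illustrative names, not the benchmark harness's API.

```python
import numpy as np

STRICT_TOL = 1e-8        # strict gate stated on this page
QUALITATIVE_TOL = 1e-1   # registry tolerance 0.1 (qualitative)


def rmse_rel(y_hat, y_ref):
    """Assumed definition: RMS(y_hat - y_ref) / RMS(y_ref)."""
    y_hat = np.asarray(y_hat, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean((y_hat - y_ref) ** 2)) / np.sqrt(np.mean(y_ref ** 2)))


def parity_verdict(y_hat, y_ref, relaxed_tol=None):
    """Map rmse_rel onto the verdict marks; relaxed_tol is a hypothetical parameter."""
    err = rmse_rel(y_hat, y_ref)
    if err <= STRICT_TOL:
        return "✓ ref", err
    if relaxed_tol is not None and err <= relaxed_tol:
        return "≈ ref", err
    if err <= QUALITATIVE_TOL:
        return "~ shape", err
    return "✗ divergent", err


y_ref = np.linspace(1.0, 2.0, 50)
mark, err = parity_verdict(y_ref + 1e-10, y_ref)
```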
| Backend | Parity | 200×50 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 6e-16 | 1.70 |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 1.69 🏆 |
| pls4all.sklearn | ✓ bind | 1.97 |
| R · pls4all | | |
| pls4all.R | ✓ 7e-15 | 4.79 |
| pls4all.R.formula | ✓ 7e-15 | 5.26 |
| pls4all.R.mdatools | ✓ 7e-15 | 5.92 |
| pls4all.R.pls | ✓ 7e-15 | 9.99 |
| Python · external | | |
| 📐 ref.python_ikpls | ⇄ +9e-03 | 1.92 |
| 📐 ref.python_scikit_learn | source | 2.16 |
| R · external | | |
| 📐 ref.r_mixomics | ⇄ +6e-16 | 9.72 |
| 📐 ref.r_pls | ⇄ +1e-14 | 8.01 |
| Backend | Parity | 200×50 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 6e-16 | 1.79 |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 1.76 🏆 |
| pls4all.sklearn | ✓ bind | 1.95 |
| R · pls4all | | |
| pls4all.R | ✓ 7e-15 | 4.43 |
| pls4all.R.formula | ✓ 7e-15 | 5.76 |
| pls4all.R.mdatools | ✓ 7e-15 | 6.29 |
| pls4all.R.pls | ✓ 7e-15 | 10.3 |
| Python · external | | |
| 📐 ref.python_ikpls | ⇄ +9e-03 | 1.90 |
| 📐 ref.python_scikit_learn | source | 2.16 |
| R · external | | |
| 📐 ref.r_mixomics | ⇄ +6e-16 | 9.82 |
| 📐 ref.r_pls | ⇄ +1e-14 | 7.50 |
| Backend | Parity | 200×50 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ ref 6e-16 | 1.79 |
| Python · pls4all | | |
| pls4all.python | ✓ bind | 1.78 🏆 |
| pls4all.sklearn | ✓ bind | 1.93 |
| R · pls4all | | |
| pls4all.R | ✓ 7e-15 | 4.69 |
| pls4all.R.formula | ✓ 7e-15 | 5.49 |
| pls4all.R.mdatools | ✓ 7e-15 | 5.59 |
| pls4all.R.pls | ✓ 7e-15 | 10.4 |
| Python · external | | |
| 📐 ref.python_ikpls | ⇄ +9e-03 | 2.04 |
| 📐 ref.python_scikit_learn | source | 2.17 |
| R · external | | |
| 📐 ref.r_mixomics | ⇄ +6e-16 | 9.91 |
| 📐 ref.r_pls | ⇄ +1e-14 | 8.15 |