Core concepts¶
A 5-minute mental model of pls4all. If you are coming from sklearn,
pls::plsr or MATLAB plsregress, the diagram in
the overview is the shortest path —
this page expands the moving parts.
1. Context, Config, Model¶
Every call goes through three handles:
Handle |
Role |
Lifetime |
|---|---|---|
|
RNG seed, thread count, backend (CPU / CUDA), error string |
usually one per program |
|
Algorithm choice, hyperparameters, centring / scaling flags |
one per fit |
|
The fitted artifact — coefficients, x_mean, y_mean, scores |
one per |
In every binding the three are wrapped idiomatically. In Python:
import pls4all
with pls4all.Context() as ctx, pls4all.Config() as cfg:
ctx.seed = 42
cfg.algorithm = pls4all.Algorithm.PLS_REGRESSION
cfg.solver = pls4all.Solver.SIMPLS
cfg.deflation = pls4all.Deflation.REGRESSION
cfg.n_components = 5
model = pls4all.Model.fit(ctx, cfg, X, y)
y_hat = model.predict(ctx, X_new)
model.close()
In R:
fit <- pls(y ~ ., data = df, ncomp = 5, algo = "pls_simpls")
y_hat <- predict(fit, newdata = df_new)
In MATLAB / Octave:
mdl = pls4all.fitrpls(X, y, "NumComponents", 5);
yhat = predict(mdl, Xnew);
The C ABI handles n4m_context_t*, n4m_config_t*, n4m_model_t*
are the same objects underneath.
2. Two API tiers¶
Each binding exposes the same surface in two flavours:
Tier-1 — raw / canonical. A 1:1 mapping to the C ABI. Direct, minimum overhead, accepts NumPy / R matrix / MATLAB double exactly as the C layer wants it. This is what the parity benchmark drives.
Tier-2 — idiomatic. Wraps tier-1 in the host language’s expected estimator shape (sklearn
BaseEstimator, R formula+S3 withpredict()/summary()/coef(), MATLAB classdef withpredict/loss/score).
You can mix the two. Save a tier-2 estimator, reload it in tier-1, or
the reverse — the .n4a bundle is the lingua franca.
3. Five C++ acceleration builds¶
libn4m is built five ways. The build is a property of the dynamic
library you load, not of the API — every binding sees the same
function names regardless.
Build |
What it enables |
When to use |
|---|---|---|
|
Scalar reference loops, no BLAS, no OMP |
Parity anchor; debugging |
|
OpenBLAS GEMM only |
Single-thread tight loops |
|
OpenMP in kernel loops, no BLAS |
Many small cells |
|
OpenBLAS + OpenMP (the production combo) |
Default |
|
cuBLAS GEMM offload |
Very large |
The build axis appears explicitly in the
benchmark dashboard under pls4all.cpp.<suffix>.
4. Determinism and reference policy¶
Every algorithm has a parity reference — the external library whose implementation pls4all reproduces within the method’s numerical tolerance:
Algorithm family |
Reference library |
Language |
|---|---|---|
Plain PLS / PCR |
|
Python / R |
OPLS |
|
R |
Sparse PLS |
|
R |
PLS-DA / sPLS-DA |
|
R |
Kernel PLS |
|
R / paper |
AOM-PLS, POP-PLS |
|
Python |
MB-PLS, LW-PLS |
|
Python |
Calibration |
|
R |
Selectors |
|
R / Python |
The reference is declared per-method in
benchmarks/parity_timing/registry.py. The orchestrator generates
predictions on the same CSV bytes for every backend in a cell so
that R’s set.seed, NumPy’s RNG and Octave’s randn("state",…) do
not introduce cross-language drift.
Tolerances are per-algorithm; see benchmark methodology.
5. The .n4a bundle¶
.n4a is a small, content-addressed binary that captures everything
needed to reproduce a fit:
Algorithm + config (algorithm enum, solver, deflation, all hyperparameters)
Centring / scaling means + stds
Coefficients and (where applicable) loadings / scores
pls4all version + ABI version
A SHA256 of the training X used for fit
Any binding can read any other binding’s .n4a. The export functions
are:
Python:
model.export("file.n4a")R:
pls4all_export(model, "file.n4a")MATLAB:
pls4all.export(mdl, "file.n4a")
6. The algorithm taxonomy¶
Algorithms are grouped for the dashboard and the methods index:
Group |
Representative methods |
|---|---|
Core PLS |
|
Sparse |
|
Ensemble |
|
Robust / weighted |
|
Nonlinear / local |
|
Multi-block |
|
Calibration transfer |
|
Classification |
|
Missing data |
|
Regularised |
|
Selectors (28) |
VIP, coefficient, selectivity ratio, SPA, stability, UVE, CARS, random-frog, SCARS, GA, PSO, VISSA, shaving, BVE, REP, IPW, ST, interval, BiPLS, SiPLS, T², WVC, WVC-threshold, EMCUVE, randomization, IRIV, IRF, VIP-SPA |
Diagnostics |
T², Q, DModX, PRESS approx, one-SE rule, monitoring, AOM bank |
Each method has a dedicated page documenting its parameters, bibliographic source, math principle, every binding’s signature and its parity + timing rows.