What is pls4all?

pls4all is a portable PLS / NIRS engine written in C++17, exposed through a stable C ABI and packaged behind thin, first-class bindings for the current target languages: Python, R, MATLAB / Octave, and JavaScript / WebAssembly. Archived proof-of-concept bindings (Go, Rust, Julia, Ruby, .NET, Lua, Nim, JNI, Android) live under bindings/_archive/ and are not current release targets.

It is built around a single claim: the same numerical PLS result in every language (verified by parity gates rather than promised bit for bit), with timings that match or beat each language’s established library.

Why this project exists

PLS (Partial Least Squares) and the wider chemometrics catalogue (sparse PLS, OPLS, CPPLS, MB-PLS, kernel PLS, AOM/POP, calibration transfer, and variable selection methods such as VIP / CARS / GA / SPA / …) are scattered across many libraries in several languages:

Language	Where the algorithms live
Python	`sklearn.cross_decomposition`, `ikpls`, `diPLSlib`, `hoggorm`, `tensorly`, `pybaselines`, in-tree implementations of papers
R	`pls`, `spls`, `OmicsPLS`, `prospectr`, `mdatools`, `multiway`, `kernlab`, `plsVarSel`, `enpls`, `mixOmics`, `chemometrics`, `ropls`, `sgPLS`, `multiblock`, `plsRglm`, `plsRcox`, `softImpute`, `mboost`, …
MATLAB	`plsregress`, `libPLS`

Each library has its own numerical conventions (NIPALS vs SIMPLS, unit-variance scaling vs centring only, deflation policy, intercept handling). Comparing two methods across two languages quickly becomes a multi-month integration project. pls4all collapses that surface into a single C++ kernel with a single set of conventions, then exposes an idiomatic API for each language on top.
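To see why a single convention matters, here is a minimal pure-Python PLS1 (single-response) NIPALS sketch. It is not pls4all's kernel; the `preprocess` and `pls1_nipals_fitted` helpers and the toy data are hypothetical. It shows that switching between centring only and unit-variance scaling changes the fitted values of a one-component model whenever the columns of X have very different variances.

```python
from statistics import fmean, pstdev


def preprocess(X, scale):
    """Centre each column of X; if scale is True, also divide it by its population std."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    means = [fmean(c) for c in cols]
    sds = [pstdev(c) if scale else 1.0 for c in cols]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def pls1_nipals_fitted(X, y, n_components, scale):
    """Training-set fitted values of a PLS1 model built with NIPALS (hypothetical helper)."""
    Xd = preprocess(X, scale)
    y_mean = fmean(y)
    yd = [v - y_mean for v in y]  # y is always centred, never scaled, in this sketch
    n, p = len(Xd), len(Xd[0])
    fitted = [y_mean] * n
    for _ in range(n_components):
        # Weight vector: X^T y, normalised to unit length.
        w = [sum(Xd[i][j] * yd[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        # Scores t = X w, the X loadings, and the y loading q.
        t = [sum(Xd[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loadings = [sum(Xd[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yd[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the component just extracted and accumulate the fit.
        Xd = [[Xd[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        yd = [yd[i] - q * t[i] for i in range(n)]
        fitted = [fitted[i] + q * t[i] for i in range(n)]
    return fitted


# Toy data: column 0 spans 4 units, column 1 spans 400; y equals column 0.
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 500.0], [5.0, 400.0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

for k in (1, 2):
    centred = pls1_nipals_fitted(X, y, k, scale=False)
    autoscaled = pls1_nipals_fitted(X, y, k, scale=True)
    print(k, [round(v, 3) for v in centred], [round(v, 3) for v in autoscaled])
```

With one component the two conventions disagree, because centring only lets the large-variance column dominate the weight vector. With two components both fits coincide: X has only two columns, so the model reaches the ordinary least-squares solution regardless of column scaling.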

This design brings four benefits:

Determinism across languages: every binding runs the same kernel on the same generated datasets, and numerical parity is checked by explicit gates rather than asserted as byte-identical output.
Performance: BLAS, OpenMP and CUDA accelerated tiers, with a scalar reference tier kept as the parity anchor.
Reproducibility: the C ABI provides N4MM, a versioned format for raw fitted-model state, which the Python pls4all.Model wrapper exposes through to_bytes() / from_bytes(). N4MM has no canonical filename extension and is distinct from the nirs4all .n4a full-pipeline bundle.
Auditability: the parity gate compares pls4all predictions with the external reference library that “owns” each algorithm (sklearn for PLS, ropls for OPLS, spls for sparse PLS, …) and publishes a verdict for every cell.
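This page does not spell out the gate's tolerance rule. As a minimal sketch, assuming an element-wise test with a combined absolute and relative tolerance in the style of `numpy.isclose`, one cell's verdict could be computed as follows; `CellVerdict`, `parity_verdict` and the default `rtol` / `atol` values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CellVerdict:
    """Outcome of one parity cell (hypothetical type, not the gate's real schema)."""
    algorithm: str
    binding: str
    max_abs_diff: float
    passed: bool


def parity_verdict(algorithm, binding, predicted, reference, rtol=1e-6, atol=1e-9):
    """Element-wise check |pred - ref| <= atol + rtol * |ref| (the numpy.isclose rule)."""
    if len(predicted) != len(reference):
        raise ValueError("prediction and reference lengths differ")
    worst = 0.0
    passed = True
    for pred, ref in zip(predicted, reference):
        diff = abs(pred - ref)
        if math.isnan(diff):
            # NaN never counts as agreement; report an infinite discrepancy.
            return CellVerdict(algorithm, binding, math.inf, False)
        worst = max(worst, diff)
        if diff > atol + rtol * abs(ref):
            passed = False
    return CellVerdict(algorithm, binding, worst, passed)


print(parity_verdict("pls", "python", [1.0, 2.0000000001], [1.0, 2.0]))
print(parity_verdict("pls", "r", [1.0, 2.1], [1.0, 2.0]))
```

Scaling the relative term by the reference value rather than by the binding's prediction keeps the reference library as the fixed point of the comparison, matching the idea that one library "owns" each algorithm.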

The three layers

┌──────────────────────────────────────────────────────────────┐
│  Tier-2 idiomatic API                                        │
│    pls4all.sklearn.PLSRegression(...)           (Python)     │
│    pls(y ~ ., data, ncomp=)                     (R)          │
│    n4m.fitrpls(X, y, "NumComponents", k)        (MATLAB)     │
│    new pls4all.PLS({nComponents: k})            (JS)         │
└────────────────────────┬─────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│  Tier-1 raw / canonical API                                  │
│    pls4all._methods.pls_fit(ctx, cfg, X, y, k)  (Python)     │
│    pls4all_method("pls", X, y, n_components=k)  (R)          │
│    n4m.pls_fit(X, y, k)                         (MATLAB)     │
└────────────────────────┬─────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│  Tier-0: C ABI (libn4m)                                      │
│    n4m_*  symbols  (96 of them, frozen at ABI 1.x)           │
└──────────────────────────────────────────────────────────────┘

The C ABI is the only place numerical algorithms live. Every binding above it is a reformatter: it converts inputs and outputs between host-language types and the ABI, and no PLS math is duplicated in Python, R, MATLAB or JavaScript.
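Much of a binding's reformatting work is data layout. R and MATLAB store matrices in column-major order, while NumPy defaults to row-major. This page does not say which layout libn4m expects, so row-major is only an assumption here; whichever it is, the binding has to convert buffers before and after each ABI call. A minimal sketch of that conversion, with hypothetical helper names:

```python
from typing import List, Sequence


def column_major_to_row_major(buf: Sequence[float], n_rows: int, n_cols: int) -> List[float]:
    """Reorder an n_rows x n_cols buffer from column-major to row-major order."""
    if len(buf) != n_rows * n_cols:
        raise ValueError("buffer length does not match n_rows * n_cols")
    # Element (i, j) sits at j * n_rows + i (column-major) and i * n_cols + j (row-major).
    return [buf[j * n_rows + i] for i in range(n_rows) for j in range(n_cols)]


def row_major_to_column_major(buf: Sequence[float], n_rows: int, n_cols: int) -> List[float]:
    """Inverse of column_major_to_row_major, e.g. for handing results back to R or MATLAB."""
    if len(buf) != n_rows * n_cols:
        raise ValueError("buffer length does not match n_rows * n_cols")
    return [buf[i * n_cols + j] for j in range(n_cols) for i in range(n_rows)]


# R's matrix(1:6, nrow = 2) stores [1, 2, 3, 4, 5, 6] column by column,
# i.e. rows (1, 3, 5) and (2, 4, 6).
col_major = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
row_major = column_major_to_row_major(col_major, 2, 3)
assert row_major == [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
assert row_major_to_column_major(row_major, 2, 3) == col_major
```

Keeping this kind of transformation in the binding, and out of the kernel, is what lets the C++ side hold a single set of numerical conventions.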

What’s in the box

About 70 algorithms: every PLS variant in mainstream use plus the full chemometrics variable-selection catalogue. The complete list is in the methods index.
A cross-binding parity gate: for each (algorithm, n, p, threads) cell, where n is the number of samples and p the number of variables, every binding’s predictions are compared element-wise with the reference library for that algorithm. See the benchmark overview and the interactive dashboard.
A stable C ABI: frozen at 1.x, with semantic versioning enforced by a per-PR ABI symbol gate. See abi/reference.
A raw fitted-model format: versioned N4MM serialisation behind the C ABI, protected by an integrity checksum.
Acceleration matrix: five libn4m builds (ref, blas, omp, blas+omp, cuda), so every cell doubles as a benchmark of the acceleration stack itself.
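N4MM's actual byte layout is not documented on this page. To illustrate what a versioned format with an integrity checksum involves, here is a minimal sketch that wraps an opaque payload in a header carrying magic bytes, a version number, the payload length and a CRC-32. The magic value, header layout and choice of CRC-32 are all assumptions, not the N4MM specification.

```python
import struct
import zlib

MAGIC = b"DEMO"                     # hypothetical magic bytes, NOT the real N4MM header
FORMAT_VERSION = 1                  # hypothetical format version
HEADER = struct.Struct("<4sHII")    # magic, version, payload length, CRC-32 (little-endian)


def pack_model(payload: bytes) -> bytes:
    """Prefix a serialised model payload with a versioned, checksummed header."""
    return HEADER.pack(MAGIC, FORMAT_VERSION, len(payload), zlib.crc32(payload)) + payload


def unpack_model(blob: bytes) -> bytes:
    """Validate the header and checksum, then return the payload."""
    if len(blob) < HEADER.size:
        raise ValueError("blob too short for header")
    magic, version, length, crc = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("unrecognised magic bytes")
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported format version {version}")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return payload


# Example payload: three regression coefficients as little-endian doubles.
coefs = struct.pack("<3d", 0.5, -1.25, 2.0)
blob = pack_model(coefs)
assert struct.unpack("<3d", unpack_model(blob)) == (0.5, -1.25, 2.0)

corrupted = bytearray(blob)
corrupted[-1] ^= 0xFF  # flip the bits of the last payload byte
try:
    unpack_model(bytes(corrupted))
except ValueError as err:
    print("rejected:", err)
```

Rejecting unknown versions outright is the simplest policy; a production format may instead accept older versions and migrate them on load.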

What pls4all is not¶

Not a data-loading framework. pls4all assumes you arrive with (X, y) already in memory. Spectroscopy file formats, signal-type detection and dataset versioning live in upstream tooling (e.g. nirs4all).
Not a pipeline DSL. Pipelines are composed in the host language (sklearn Pipeline, R caret / mlr3, MATLAB function chains).
Not a deep-learning library. pls4all is strictly the PLS family plus the chemometrics adjuncts (variable selection, calibration transfer, diagnostics).

Where to go next

If you want to…	Read
Run your first fit in your language	Getting started
Understand the data model and tiers	Core concepts
See what’s measured and how	Benchmark overview
Browse the algorithm catalogue	Methods index
Compare bindings in a live UI	GitHub Pages dashboard
Read pls4all in your language	Python · R · MATLAB / Octave · JS

nirs4all-methods
