nirs4all-methods documentation¶

A portable PLS / NIRS engine in C++17 with a stable C ABI and thin bindings for the current target languages: Python, R, MATLAB / Octave, and JS / WebAssembly.

→ Interactive cross-binding benchmark dashboard — landing page generated from the canonical benchmark registry, filterable / sortable / column-toggleable.

Latest highlights (May 2026)¶

R binding COMPLETE — n4m_method() dispatcher covers all 33 MethodResult fits + 24 selectors + 4 diagnostics ; 16 formula+S3 tier-2 wrappers ; parsnip + mlr3 meta-models that dispatch to all 16 algorithms via the algorithm arg.
MATLAB / Octave binding — shared +n4m package over one MEX dispatcher; Octave is CI-gated, while MATLAB uses the same package but remains a manual runtime/release check because GitHub-hosted runners do not provide a MATLAB license.
Archived PoCs stay archived — Julia, JNI / Android, Go, Rust, .NET, Ruby, Lua, and Nim live under bindings/_archive/ and are not current release targets.
Cross-binding benchmark — generated from the canonical benchmarks.parity_timing.registry method catalog, with registry-driven external reference columns in addition to the language bindings. → benchmarks/cross_binding.md · methodology

Quick links¶

Architecture — overview · memory model · error model · threading · serialization
ABI — reference · stability policy · changes log · ABI 2.0 migration guide
Bindings — Python · R · MATLAB / Octave · JavaScript / WebAssembly
Parity — methodology · tolerances
Benchmarks — index · overview · cross-binding parity + timing · methodology
Development — workflow · build · testing · stabilisation plan · style · release process
Reviews — Codex transcripts under docs/reviews/
Roadmap — ROADMAP.md, per-phase plans under roadmap/

Current Status¶

The C ABI surface is feature-complete (96 n4m_* symbols). Lifecycle for context / config / matrix-view / operator-bank / gating-strategy / pipeline is implemented, and pipeline fit/transform is live for identity, center, autoscale, Pareto scaling, SNV, MSC, EMSC, polynomial detrending and Savitzky-Golay smoothing/derivatives, Norris-Williams gap-segment derivatives, ASLS baseline correction, Haar wavelet denoising and supervised one-component OSC / EPO. Internal regression metric kernels cover RMSE, MAE, bias, R2/Q2, observed-vs-predicted slope/intercept, RPD and RPIQ, validation splitters cover deterministic k-fold, leave-one-out, holdout, external-fold, repeated k-fold, Monte-Carlo, Kennard-Stone and SPXY plans, and the internal CV engine refits fold-local regression models to produce out-of-sample predictions and aggregate metrics. Binary classification metrics cover sensitivity, specificity, balanced accuracy, precision/F1, MCC and average-rank AUC; multiclass extensions cover macro/micro averages with one-vs-rest AUC, and binary calibration curves use fixed probability bins. Variable-importance kernels compute VIP scores and selectivity ratio from fitted models with stored scores, and Phase 5a variable-selection rankers expose deterministic top-k ordering for VIP, original-scale coefficient magnitude and selectivity ratio. Phase 5b interval-selection kernels scan contiguous feature windows with deterministic k-fold PLS CV for moving-window / iPLS-style selection. Phase 5p biPLS kernels perform deterministic backward interval elimination by CV. Phase 5q siPLS kernels exhaustively score fixed-size interval combinations by CV. Phase 5c stability-selection kernels compute Monte-Carlo coefficient mean/std ratios for MCUVE-style feature ranking, Phase 5d UVE kernels threshold real-feature stability against deterministic artificial variables, and Phase 5n EMCUVE kernels ensemble MC-UVE members with a deterministic vote rule. Phase 5o randomization-test kernels compare observed PLS coefficient scores against deterministic Y permutations with empirical p-values. Phase 5e SPA kernels seed from PLS coefficient magnitude and add projection-diverse variables, and Phase 5f CARS kernels run deterministic exponential-retention subset elimination with k-fold CV scoring. Phase 5g Random Frog kernels run deterministic subset walks and rank variables by inclusion frequency. Phase 5h SCARS kernels run deterministic calibration subsampling with stability-weighted CARS retention. Phase 5i GA-PLS kernels use deterministic population search with elitism, crossover, mutation and k-fold CV fitness. Phase 5j shaving kernels recursively eliminate low-score PLS variables while scoring each retained subset by deterministic k-fold CV. Phase 5s REP kernels remove a fixed number of weak coefficient-score variables per recursive step and keep the lowest-CV-error retained subset. Phase 5t IPW kernels iteratively reweight coefficient scores, expose score and weight paths, and keep the lowest-CV-error top-k subset. Phase 5u ST-PLS kernels apply deterministic score thresholds with min-selected fallbacks and keep the lowest-CV-error threshold subset. Phase 5k BVE kernels greedily evaluate every one-variable removal by deterministic k-fold CV RMSE and keep the best backward path/subset. Phase 5l T2-PLS kernels compute Hotelling T2 on PLS loading weights, apply alpha-specific UCL thresholds with top-k fallback, and score subsets by k-fold CV. Phase 5m WVC-PLS kernels compute normalized weighted-variable-contribution scores from SVD PLS components and expose deterministic top-k selection. Phase 5r WVC-threshold kernels apply fixed-threshold and factor-of-mean rules with a minimum-selected fallback. Component-coefficient kernels expose the original-scale regression coefficients for each latent prefix. SIMPLS component prefixes can also be scored by deterministic k-fold CV for component-count selection, and the internal PLS-LDA and PLS-logistic kernels fit classifier scores on PLS score spaces. The internal MB-PLS kernel fits block-autoscaled, block-weighted PLS models and maps coefficients back to original feature space. The internal LW-PLS kernel performs stable k-nearest-neighbor local-window refits and records the neighbor plan for every prediction. Phase 6a adds an internal AOM preprocessing-bank primitive with soft equal weights and hard first-operator gating; full AOM-PLS parity is explicitly anchored on nirs4all/bench/AOM_v0/aompls. Phase 6b adds internal global AOM-SIMPLS CV selection against that bench oracle for the identity/detrend strict-linear tranche. Phase 6c adds bench-parity strict-linear zero-padded Savitzky-Golay, finite-difference and Norris-Williams AOM operators and runs them through the global selector. Phase 6d extends the strict-linear AOM tranche to Whittaker smoothing and FCK operators, with direct transform parity and global AOM-SIMPLS selection parity against the bench oracle. Phase 6e adds internal POP-PLS per-component SIMPLS covariance selection, including bench-compatible CV scoring semantics, selected operator sequences and full-fit prediction parity. The supported fitted-model path is now live for NIPALS, orthogonal-scores, SIMPLS, kernel, wide-kernel, SVD, power-iteration and randomized-SVD PLS regression (PLS1 / PLS2), PLSCanonical with NIPALS/SVD, PLSSVD direct cross-covariance scores, PLS-DA dummy-response scores, OPLS / OPLS-DA common predictive scores with orthogonal corrections, plus PCR: fit, predict, transform, fitted-array accessors and binary import/export.

Build matrix: Linux × {gcc-12, gcc-13, clang-16}, macOS × clang (arm64 + universal2), Windows × {MSVC, MinGW}. ASAN / UBSAN / TSAN green. ABI symbol gate enforced on every PR.