Stabilisation plan: parity, dashboard and releases

Date: 2026-05-19

Scope: parity gates, cross-binding dashboard, slow methods, and PyPI/CRAN readiness for pls4all 0.97.0 / ABI 1.16.0.

Audit summary

The project is close to the intended architecture, but the gates are not yet strict enough to be a release barrier. The main issue is semantic: binding parity and reference parity are both present in parts of the pipeline, but older fields, docs and dashboard filters still collapse them into one verdict.

Local audit results:

Check	Result
`ctest --preset dev-release --output-on-failure`	passed
`python -m benchmarks.parity_timing.lockfile --check`	passed, structural only
full Python binding tests	failed on `UVESelector` pipeline smoke
sklearn wrapper parity script	passed, but narrower than full tests
fixture regeneration check	blocked by missing historical `AOM_v0` oracle
small cross-binding PLS/PCR sample	confirmed external rows can be mislabeled as binding failures
slow-method pls4all smoke	confirmed selector/PCR timing and adapter issues need focused work
`scripts/bump_version.sh --check`	passed
ABI symbol diff	failed: the current library exports additional `n4m_*` symbols absent from `cpp/abi/expected_symbols_linux.txt`

Implementation status

Stabilisation status:

P0 gate semantics implemented in the orchestrator: external rows are no longer binding-parity failures, reference parity compares all successful rows against the canonical oracle, and --only-pls4all consumes stored oracle snapshots instead of skipping Gate 2.
P1 dashboard/static docs updated to render one relevant gate per cell and to merge canonical ref_* rows atomically. C++ and external cells render reference parity; internal bindings render binding parity.
P2 Python selector smoke fixed for UVE, and tier-2 selector wrappers now fail closed on unknown registry parameters. Python/R/MATLAB selector ValidationPlan defaults are aligned to the canonical 3-fold contiguous plan.
P2 dashboard refresh data covers the previously red 100x50 cells for continuum_regression, PCR and the selector smoke set; unavailable formula/classdef selector wrappers are classified as not available rather than failed parity.
P3 first performance pass landed for PCR batch projection and cross-validation fold-buffer reuse.
P4 ABI snapshot refreshed for the public 1.16.0 symbols already exported by the current shared library.

P0: make gates truthful

In benchmarks/cross_binding/orchestrator.py, compute binding parity only for n4m_core and pls4all_binding rows. External rows must get binding_parity_ok = None or an explicit not-applicable code.
Keep reference parity for every successful row, including external libraries, against the canonical registry reference.
When a run intentionally omits canonical external references (--only-pls4all), load the stored oracle snapshot. Missing snapshots are setup failures that must be fixed by running the canonical reference backend.
Make a missing required reference oracle a hard error in release-gate mode; only allowlisted paper_only methods are exempt.
Move workstation-specific reference paths to environment/configuration or pinned packages. The AOM/POP oracle must be reproducible from a clean clone or explicitly excluded from a strict gate.
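
The gate rules above can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: the `Row` fields, the `"external"` kind label, `apply_gates`, `within` and `MissingOracleError` are all hypothetical names, and parity is reduced to an element-wise tolerance check on predictions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Row kinds that take part in binding parity, as named in the plan.
INTERNAL_KINDS = frozenset({"n4m_core", "pls4all_binding"})


class MissingOracleError(RuntimeError):
    """Hypothetical error for a required reference oracle that is absent."""


@dataclass
class Row:
    method: str
    kind: str              # "n4m_core", "pls4all_binding" or "external" (assumed label)
    ok: bool               # True when the run itself succeeded
    pred: Sequence[float]  # predictions produced by this row
    binding_parity_ok: Optional[bool] = None
    reference_parity_ok: Optional[bool] = None


def within(a, b, tol):
    """True when two equally long prediction vectors agree to within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def apply_gates(rows, core_pred, oracle_pred, tol,
                release_gate=False, paper_only=frozenset()):
    """Fill both verdicts in place; oracle_pred is None when no oracle exists."""
    for row in rows:
        if not row.ok:
            continue  # failed runs carry no parity verdict
        # Gate 1: binding parity applies to internal rows only.
        if row.kind in INTERNAL_KINDS:
            row.binding_parity_ok = within(row.pred, core_pred, tol)
        else:
            row.binding_parity_ok = None  # not applicable, never a failure
        # Gate 2: reference parity applies to every successful row.
        if oracle_pred is None:
            if release_gate and row.method not in paper_only:
                raise MissingOracleError(f"no reference oracle for {row.method}")
            row.reference_parity_ok = None
        else:
            row.reference_parity_ok = within(row.pred, oracle_pred, tol)
    return rows
```

With this shape, an external library that disagrees with the oracle fails reference parity only, and its `binding_parity_ok` stays `None`, so it cannot be reported as a binding failure.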

P1: fix dashboard and generated docs

Update docs/_extras/build_landing.py so canonical ref_* rows replace stale legacy cells atomically: ok, reason, both parity verdicts, timings, reference metadata and canonical flags.
Update dashboard filtering to use reference_parity for C++ and external libraries, and binding_parity for internal pls4all bindings.
Propagate method tolerance into CSV/JSON so drift/divergent thresholds use “10x method tolerance” instead of a hardcoded rmse_rel < 10.
Render the relevant gate in static Markdown tables: reference parity for C++/external rows, binding parity for internal rows. Prefer using the existing dual_parity_label() helper instead of ad hoc legacy output.
Exclude the synthetic reference column from timed-cell statistics and preset matching.
Keep sphinx-design enabled and load tab-combo.js; otherwise the generated method pages lose their tabbed content.
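
As a rough sketch of the filtering and threshold rules above, the snippet below picks the relevant gate per row kind and classifies a cell from the method tolerance. The field names (`kind`, `tolerance`, `*_rmse_rel`) and the functions `relevant_gate`, `classify` and `cell_label` are assumptions for illustration; the existing `dual_parity_label()` helper is not reproduced here. The three-way split into ok, drift and divergent is one reading of the "10x method tolerance" rule.

```python
def relevant_gate(kind):
    """Internal pls4all bindings show binding parity; C++ and external rows show reference parity."""
    return "binding_parity" if kind == "pls4all_binding" else "reference_parity"


def classify(rmse_rel, method_tol):
    """Classify a relative RMSE against the method's own tolerance.

    Assumed reading: within tolerance is "ok", up to 10x tolerance is
    "drift", anything beyond that is "divergent".
    """
    if rmse_rel <= method_tol:
        return "ok"
    if rmse_rel <= 10.0 * method_tol:
        return "drift"
    return "divergent"


def cell_label(row):
    """Render exactly one gate per cell from a dict-shaped row (hypothetical schema)."""
    gate = relevant_gate(row["kind"])
    verdict = classify(row[gate + "_rmse_rel"], row["tolerance"])
    return f"{gate}: {verdict}"
```

Because the threshold scales with `tolerance`, a method with a loose tolerance is no longer flagged by a fixed cut-off, and a strict method is no longer excused by one.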

P2: restore binding parity

Fix the UVE sklearn pipeline failure by choosing an explicit policy for empty selections: add a min_features/fallback option or use a fixture parameter set that cannot select zero features in pipeline smoke tests. Done.
Stop silently dropping registry parameters in tier-2 wrappers. Add adapter maps for alias names or fail closed when a registry parameter is unsupported by a wrapper constructor. Done for selector smoke.
Unify selector validation plans across Python registry, sklearn classes, R dispatcher and MATLAB MEX. The cheapest deterministic option is a shared 3-fold contiguous plan; the more flexible option is to serialize fold indices through benchmark parameters. Done with the 3-fold contiguous plan.
Add C++ fixture coverage for selectors currently covered only by registry smoke tests.
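
Two of the items above lend themselves to a short sketch: failing closed on unsupported registry parameters, and a deterministic 3-fold contiguous plan. Both functions are hypothetical. `adapt_params` assumes wrappers expose their parameters through the constructor signature, and `contiguous_folds` assumes that leftover samples go to the earliest folds, which the plan does not specify.

```python
import inspect


def adapt_params(ctor, registry_params, aliases=None):
    """Map registry parameters onto a wrapper constructor, failing closed.

    `aliases` maps registry names to constructor names. Any parameter the
    constructor does not accept raises instead of being dropped silently.
    """
    aliases = aliases or {}
    accepted = {
        name
        for name, p in inspect.signature(ctor).parameters.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    adapted, unsupported = {}, []
    for name, value in registry_params.items():
        target = aliases.get(name, name)
        if target in accepted:
            adapted[target] = value
        else:
            unsupported.append(name)
    if unsupported:
        raise TypeError(f"unsupported registry parameters: {sorted(unsupported)}")
    return adapted


def contiguous_folds(n_samples, n_folds=3):
    """Return (train, test) index lists for unshuffled, contiguous folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        stop = start + base + (1 if i < extra else 0)  # early folds absorb the remainder
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds
```

A plan like this depends only on `n_samples`, so every binding can rebuild it independently without exchanging fold indices. Serialising indices through benchmark parameters remains the more flexible alternative.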

P3: performance work

PCR: replace full p x p Jacobi eigensolve with a deterministic SVD/LAPACK or partial top-component solver, and use an n x n path when p >> n. Partially done: PCR now batches component projections and avoids score storage when not requested.
R vendoring: regenerate the vendored libn4m copy instead of manually carrying divergent model.cpp code.
Selectors: introduce a shared fitness evaluator that reuses buffers, validation folds and prediction arrays instead of reallocating for every candidate. Started: cross-validation fold buffers are reused across candidate evaluations.
Parallelize independent candidate evaluations for PSO, VISSA, BVE and IRIV while reducing results in deterministic order to preserve tie-breaks and RNG behavior.
Replace repeated full sorts with nth_element where only top-k masks are needed.
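
The parallel-evaluation item above hinges on keeping the reduction independent of thread scheduling. A minimal sketch of that pattern in Python follows; the real work would live in the C++ core, and `best_candidate`, the seed derivation and the lower-is-better convention are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def best_candidate(masks, fitness_fn, base_seed=0, max_workers=4):
    """Evaluate candidates concurrently and reduce in input order.

    fitness_fn(mask, rng) returns a score where lower is better. Each
    candidate gets its own RNG derived from its index, so its random
    stream does not depend on which worker runs it or when.
    """
    def run(item):
        index, mask = item
        rng = random.Random(base_seed * 1_000_003 + index)
        return fitness_fn(mask, rng)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        scores = list(pool.map(run, enumerate(masks)))

    best = 0
    for i in range(1, len(scores)):
        if scores[i] < scores[best]:  # strict comparison: ties keep the lowest index
            best = i
    return best, scores
```

Running the same call with `max_workers=1` and `max_workers=4` should return identical indices and scores, which is the property a parity gate needs. Python threads give little CPU speed-up for pure-Python fitness functions; the snippet only illustrates the ordering discipline.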

P4: packaging and release gates

Refresh the ABI snapshot intentionally. The audit saw more exported n4m_* symbols than cpp/abi/expected_symbols_linux.txt records.
Either make the Python sdist a real source build with CMake inputs included, or do not publish sdists until that path is supported.
Keep Python wheels smoke-tested from the built artifact, not from the editable checkout.
Keep R CRAN checks on the built tarball, and remove non-portable flags such as architecture-specific -march=* from CRAN builds.
Add a vendored-core sync check for the R package.
Treat MATLAB packaging as separate from PyPI/CRAN readiness until toolbox.prj, release.m, the complete MEX build and File Exchange workflow exist.
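
The ABI snapshot check can be pictured as a set difference between the symbols the library exports and the recorded snapshot. The sketch below parses text in the layout printed by `nm -D --defined-only` and assumes the snapshot file lists one symbol per line, which may not match the real format of cpp/abi/expected_symbols_linux.txt. The function names and the example symbols in the usage note are hypothetical.

```python
def exported_symbols(nm_output, prefix="n4m_"):
    """Collect global defined symbols with the given prefix from nm-style text."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        sym_type, name = parts[-2], parts[-1]
        # Uppercase type letters mark global symbols; "U" means undefined.
        if not sym_type.isupper() or sym_type == "U":
            continue
        name = name.split("@", 1)[0]  # drop any symbol-version suffix
        if name.startswith(prefix):
            symbols.add(name)
    return symbols


def abi_diff(nm_output, expected_text, prefix="n4m_"):
    """Return (unexpected, missing) symbol lists relative to the snapshot."""
    exported = exported_symbols(nm_output, prefix)
    expected = {ln.strip() for ln in expected_text.splitlines() if ln.strip()}
    return sorted(exported - expected), sorted(expected - exported)
```

For example, if the library exports `n4m_fit` and `n4m_new` while the snapshot lists `n4m_fit` and `n4m_predict`, the result is `(["n4m_new"], ["n4m_predict"])`. A release gate would fail on either list being non-empty, so that refreshing the snapshot stays an intentional act.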

Definition of “green”

The project is ready to resume method additions when:

C++ fixture parity is reproducible from a clean clone;
full Python tests, including sklearn pipeline smoke, are green;
cross-binding Gate 1 (binding parity) is green for every shipped pls4all binding;
cross-binding Gate 2 (reference parity) is green or explicitly relaxed for every shipped method and scheduled external reference;
dashboard cells show the gate relevant to each row, and no legacy alias collapses the two gates into one verdict;
pip install pls4all and R CMD check --as-cran are validated from built artifacts;
slow methods have baseline benchmarks and at least one profiling-backed optimization plan each.
