Stabilisation plan: parity, dashboard and releases
Date: 2026-05-19

Scope: parity gates, cross-binding dashboard, slow methods, and PyPI/CRAN readiness for pls4all 0.97.0 / ABI 1.16.0.
Audit summary
The project is close to the intended architecture, but the gates are not yet strict enough to be a release barrier. The main issue is semantic: binding parity and reference parity are both present in parts of the pipeline, but older fields, docs and dashboard filters still collapse them into one verdict.
Local audit results:
| Check | Result |
|---|---|
| | passed |
| | passed, structural only |
| full Python binding tests | failed on |
| sklearn wrapper parity script | passed, but narrower than full tests |
| fixture regeneration check | blocked by missing historical |
| small cross-binding PLS/PCR sample | confirmed external rows can be mislabeled as binding failures |
| slow-method pls4all smoke | confirmed selector/PCR timing and adapter issues need focused work |
| | passed |
| ABI symbol diff | failed: the current library exports additional |
Implementation status
Stabilization status:
- P0 gate semantics implemented in the orchestrator: external rows are no longer binding-parity failures, reference parity compares all successful rows against the canonical oracle, and `--only-pls4all` consumes stored oracle snapshots instead of skipping Gate 2.
- P1 dashboard/static docs updated to render one relevant gate per cell and to merge canonical `ref_*` rows atomically. C++ and external cells render reference parity; internal bindings render binding parity.
- P2 Python selector smoke fixed for UVE, and tier-2 selector wrappers now fail closed on unknown registry parameters. Python/R/MATLAB selector ValidationPlan defaults are aligned to the canonical 3-fold contiguous plan.
- P2 dashboard refresh data covers the previously red `100x50` cells for `continuum_regression`, PCR and the selector smoke set; unavailable formula/classdef selector wrappers are classified as not available rather than failed parity.
- P3 first performance pass landed for PCR batch projection and cross-validation fold-buffer reuse.
- P4 ABI snapshot refreshed for the public 1.16.0 symbols already exported by the current shared library.
P0: make gates truthful
- In `benchmarks/cross_binding/orchestrator.py`, compute binding parity only for `n4m_core` and `pls4all_binding` rows. External rows must get `binding_parity_ok = None` or an explicit not-applicable code.
- Keep reference parity for every successful row, including external libraries, against the canonical registry reference.
- When a run intentionally omits canonical external references (`--only-pls4all`), load the stored oracle snapshot. Missing snapshots are setup failures that must be fixed by running the canonical reference backend.
- Make missing required reference oracles a hard error in release-gate mode, exempting only allowlisted `paper_only` methods.
- Move workstation-specific reference paths to environment/configuration or pinned packages. The AOM/POP oracle must be reproducible from a clean clone or explicitly excluded from a strict gate.
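The gate rules above can be sketched as two independent verdict functions. This is only an illustration, not the orchestrator's code: the `Row` fields, the `err_vs_*` names, the single scalar tolerance and the treatment of failed runs are all assumptions made for the example, and `"sklearn"` stands in for any external row.

```python
from dataclasses import dataclass
from typing import Optional

# Row kinds named in the plan; modelling them as a "kind" string is an assumption.
INTERNAL_KINDS = {"n4m_core", "pls4all_binding"}


@dataclass
class Row:
    kind: str                          # internal kind, or an external library name
    ok: bool                           # did the run succeed?
    err_vs_core: Optional[float]       # hypothetical: relative error vs. the core
    err_vs_reference: Optional[float]  # hypothetical: relative error vs. the oracle


def binding_parity_ok(row: Row, tol: float) -> Optional[bool]:
    """Binding parity applies to internal rows only; None means not applicable."""
    if row.kind not in INTERNAL_KINDS:
        return None
    if not row.ok or row.err_vs_core is None:
        return False  # assumption: an internal row that cannot be compared fails
    return row.err_vs_core <= tol


def reference_parity_ok(row: Row, tol: float, release_gate: bool = False,
                        paper_only: bool = False) -> Optional[bool]:
    """Reference parity applies to every successful row, external ones included."""
    if not row.ok:
        return None  # assumption: failed runs carry no parity verdict
    if row.err_vs_reference is None:
        # In release-gate mode a missing oracle is a hard error unless the
        # method is allowlisted as paper_only.
        if release_gate and not paper_only:
            raise RuntimeError("missing required reference oracle")
        return None
    return row.err_vs_reference <= tol
```

Keeping the two verdicts in separate functions makes it harder for an external row to be reported as a binding failure, since the not-applicable case is decided before any comparison happens.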
P1: fix dashboard and generated docs
- Update `docs/_extras/build_landing.py` so canonical `ref_*` rows replace stale legacy cells atomically: `ok`, `reason`, both parity verdicts, timings, reference metadata and canonical flags.
- Update dashboard filtering to use `reference_parity` for C++ and external libraries, and `binding_parity` for internal pls4all bindings.
- Propagate method tolerance into CSV/JSON so drift/divergent thresholds use “10x method tolerance” instead of a hardcoded `rmse_rel < 10`.
- Render the relevant gate in static Markdown tables: reference parity for C++/external rows, binding parity for internal rows. Prefer using the existing `dual_parity_label()` helper instead of ad hoc legacy output.
- Exclude the synthetic reference column from timed-cell statistics and preset matching.
- Keep `sphinx-design` enabled and load `tab-combo.js`; otherwise the generated method pages lose their tabbed content.
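A minimal sketch of the per-cell gate choice and the tolerance-scaled thresholds follows. Both functions are hypothetical and are not the existing `dual_parity_label()` helper; the label `"parity"` and the assumption that `n4m_core` is the C++ row are introduced only for this example.

```python
def relevant_gate(kind: str) -> str:
    """Pick the gate a dashboard cell should display.

    Assumption: "pls4all_binding" marks internal bindings, while the C++ core
    and every external library are judged on reference parity.
    """
    return "binding_parity" if kind == "pls4all_binding" else "reference_parity"


def classify(rmse_rel: float, method_tol: float) -> str:
    """Classify a cell from the method's own tolerance, not a fixed constant."""
    if rmse_rel <= method_tol:
        return "parity"
    if rmse_rel <= 10 * method_tol:  # the "10x method tolerance" band
        return "drift"
    return "divergent"
```

With a method tolerance of `1e-6`, an error of `5e-6` lands in the drift band and `1e-4` is divergent, whereas a hardcoded `rmse_rel < 10` would accept both.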
P2: restore binding parity
- Fix the UVE sklearn pipeline failure by choosing an explicit policy for empty selections: add a `min_features`/fallback option or use a fixture parameter set that cannot select zero features in pipeline smoke tests. Done.
- Stop silently dropping registry parameters in tier-2 wrappers. Add adapter maps for alias names or fail closed when a registry parameter is unsupported by a wrapper constructor. Done for selector smoke.
- Unify selector validation plans across Python registry, sklearn classes, R dispatcher and MATLAB MEX. The cheapest deterministic option is a shared 3-fold contiguous plan; the more flexible option is to serialize fold indices through benchmark parameters. Done with the 3-fold contiguous plan.
- Add C++ fixture coverage for selectors currently covered only by registry smoke tests.
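Two of the items above lend themselves to a short sketch: a contiguous fold plan that needs no RNG, and a wrapper factory that fails closed. Everything here is an assumption for illustration: `contiguous_folds`, `build_wrapper` and `DemoSelector` are hypothetical names, and giving the leading folds the remainder samples is one possible convention that may differ from the project's canonical plan.

```python
import inspect
from typing import Any, Dict, List, Optional


def contiguous_folds(n_samples: int, n_folds: int = 3) -> List[range]:
    """Split indices 0..n_samples-1 into contiguous test blocks, no shuffling."""
    if not 1 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 1 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)  # leading folds absorb the remainder
        folds.append(range(start, start + size))
        start += size
    return folds


def build_wrapper(cls: type, registry_params: Dict[str, Any],
                  aliases: Optional[Dict[str, str]] = None) -> Any:
    """Instantiate cls from registry parameters, failing closed on unknown names."""
    aliases = aliases or {}
    accepted = set(inspect.signature(cls).parameters)
    kwargs = {}
    for name, value in registry_params.items():
        target = aliases.get(name, name)  # explicit adapter map for alias names
        if target not in accepted:
            raise TypeError(
                f"{cls.__name__} does not accept registry parameter {name!r}")
        kwargs[target] = value
    return cls(**kwargs)


class DemoSelector:
    """Hypothetical wrapper used only to exercise build_wrapper."""

    def __init__(self, n_components: int = 2, min_features: int = 1):
        self.n_components = n_components
        self.min_features = min_features
```

Because the fold plan depends only on `n_samples` and `n_folds`, the same indices can be reproduced in any binding without passing state across the language boundary.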
P3: performance work
- PCR: replace full `p x p` Jacobi eigensolve with a deterministic SVD/LAPACK or partial top-component solver, and use an `n x n` path when `p >> n`. Partially done: PCR now batches component projections and avoids score storage when not requested.
- R vendoring: regenerate the vendored libn4m copy instead of manually carrying divergent `model.cpp` code.
- Selectors: introduce a shared fitness evaluator that reuses buffers, validation folds and prediction arrays instead of reallocating for every candidate. Started: cross-validation fold buffers are reused across candidate evaluations.
- Parallelize independent candidate evaluations for PSO, VISSA, BVE and IRIV while reducing results in deterministic order to preserve tie-breaks and RNG behavior.
- Replace repeated full sorts with `nth_element` where only top-k masks are needed.
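The "parallel evaluation, deterministic reduction" idea can be illustrated in a few lines. This sketch only shows the ordering discipline, not the C++ implementation: `best_candidate` and its signature are hypothetical, and it assumes lower fitness is better and that the fitness function draws no numbers from a shared RNG.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

Candidate = Tuple[int, ...]  # assumption: a candidate is a tuple of feature indices


def best_candidate(candidates: Sequence[Candidate],
                   fitness: Callable[[Candidate], float],
                   max_workers: int = 4) -> Tuple[int, float]:
    """Score candidates concurrently, then reduce in a fixed order.

    Executor.map yields results in input order, so the winner does not depend
    on which worker finishes first. Ties go to the lowest candidate index.
    """
    if not candidates:
        raise ValueError("no candidates to evaluate")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(fitness, candidates))
    best = min(range(len(scores)), key=lambda i: (scores[i], i))
    return best, scores[best]
```

To keep RNG behavior stable, any randomness a candidate needs would have to be drawn serially before the parallel section, for example as one pre-generated seed per candidate.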
P4: packaging and release gates
- Refresh the ABI snapshot intentionally. The audit saw more exported `n4m_*` symbols than `cpp/abi/expected_symbols_linux.txt` records.
- Either make the Python sdist a real source build with CMake inputs included, or do not publish sdists until that path is supported.
- Keep Python wheels smoke-tested from the built artifact, not from the editable checkout.
- Keep R CRAN checks on the built tarball, and remove non-portable flags such as architecture-specific `-march=*` from CRAN builds.
- Add a vendored-core sync check for the R package.
- Treat MATLAB packaging as separate from PyPI/CRAN readiness until `toolbox.prj`, `release.m`, the complete MEX build and File Exchange workflow exist.
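The ABI symbol diff reduces to a set comparison between the recorded snapshot and the symbols the built library exports. The sketch below assumes both sides are already available as lists of names (for example, snapshot lines on one side and the output of a tool such as `nm` on the other); `diff_symbols` is a hypothetical helper, and the symbol names in the usage note are made up.

```python
from typing import Iterable, List, Tuple


def diff_symbols(expected: Iterable[str], exported: Iterable[str],
                 prefix: str = "n4m_") -> Tuple[List[str], List[str]]:
    """Return (added, removed) public symbols, each sorted for stable output."""
    want = {s.strip() for s in expected if s.strip().startswith(prefix)}
    have = {s.strip() for s in exported if s.strip().startswith(prefix)}
    return sorted(have - want), sorted(want - have)


def abi_gate_ok(expected: Iterable[str], exported: Iterable[str]) -> bool:
    """Strict gate: any added or removed public symbol fails the check."""
    added, removed = diff_symbols(expected, exported)
    return not added and not removed
```

For instance, a snapshot of `n4m_a` and `n4m_b` compared against a library exporting `n4m_a`, `n4m_b` and `n4m_c` reports `n4m_c` as added, which is the situation the audit describes. Treating additions as failures is what forces the snapshot refresh to be an intentional, reviewed change.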
Definition of “green”
The project is ready to resume method additions when:
- C++ fixture parity is reproducible from a clean clone;
- full Python tests, including sklearn pipeline smoke, are green;
- cross-binding Gate 1 is green for every shipped pls4all binding;
- cross-binding Gate 2 is green or explicitly relaxed for every shipped method and scheduled external reference;
- dashboard cells display both gates without legacy alias confusion;
- `pip install pls4all` and `R CMD check --as-cran` are validated from built artifacts;
- slow methods have baseline benchmarks and at least one profiling-backed optimization plan each.