# Stabilisation plan — parity, dashboard and releases Date: 2026-05-19 Scope: parity gates, cross-binding dashboard, slow methods, and PyPI/CRAN readiness for pls4all 0.97.0 / ABI 1.16.0. ## Audit summary The project is close to the intended architecture, but the gates are not yet strict enough to be a release barrier. The main issue is semantic: binding parity and reference parity are both present in parts of the pipeline, but older fields, docs and dashboard filters still collapse them into one verdict. Local audit results: | Check | Result | |---|---| | `ctest --preset dev-release --output-on-failure` | passed | | `python -m benchmarks.parity_timing.lockfile --check` | passed, structural only | | full Python binding tests | failed on `UVESelector` pipeline smoke | | sklearn wrapper parity script | passed, but narrower than full tests | | fixture regeneration check | blocked by missing historical `AOM_v0` oracle | | small cross-binding PLS/PCR sample | confirmed external rows can be mislabeled as binding failures | | slow-method pls4all smoke | confirmed selector/PCR timing and adapter issues need focused work | | `scripts/bump_version.sh --check` | passed | | ABI symbol diff | failed: the current library exports additional `n4m_*` symbols absent from `cpp/abi/expected_symbols_linux.txt` | ## Implementation status Stabilization status: - P0 gate semantics implemented in the orchestrator: external rows are no longer binding-parity failures, reference parity compares all successful rows against the canonical oracle, and `--only-pls4all` consumes stored oracle snapshots instead of skipping Gate 2. - P1 dashboard/static docs updated to render one relevant gate per cell and to merge canonical `ref_*` rows atomically. C++ and external cells render reference parity; internal bindings render binding parity. - P2 Python selector smoke fixed for UVE, and tier-2 selector wrappers now fail closed on unknown registry parameters. Python/R/MATLAB selector ValidationPlan defaults are aligned to the canonical 3-fold contiguous plan. - P2 dashboard refresh data covers the previously red `100x50` cells for `continuum_regression`, PCR and the selector smoke set; unavailable formula/classdef selector wrappers are classified as not available rather than failed parity. - P3 first performance pass landed for PCR batch projection and cross-validation fold-buffer reuse. - P4 ABI snapshot refreshed for the public 1.16.0 symbols already exported by the current shared library. ## P0 — make gates truthful 1. In `benchmarks/cross_binding/orchestrator.py`, compute binding parity only for `n4m_core` and `pls4all_binding` rows. External rows must get `binding_parity_ok = None` or an explicit not-applicable code. 2. Keep reference parity for every successful row, including external libraries, against the canonical registry reference. 3. When a run intentionally omits canonical external references (`--only-pls4all`), load the stored oracle snapshot. Missing snapshots are setup failures that must be fixed by running the canonical reference backend. 4. Make missing required reference oracles a hard error in release-gate mode, with allowlisted `paper_only` methods only. 5. Move workstation-specific reference paths to environment/configuration or pinned packages. The AOM/POP oracle must be reproducible from a clean clone or explicitly excluded from a strict gate. ## P1 — fix dashboard and generated docs 1. Update `docs/_extras/build_landing.py` so canonical `ref_*` rows replace stale legacy cells atomically: `ok`, `reason`, both parity verdicts, timings, reference metadata and canonical flags. 2. Update dashboard filtering to use `reference_parity` for C++ and external libraries, and `binding_parity` for internal pls4all bindings. 3. Propagate method tolerance into CSV/JSON so drift/divergent thresholds use "10x method tolerance" instead of a hardcoded `rmse_rel < 10`. 4. Render the relevant gate in static Markdown tables: reference parity for C++/external rows, binding parity for internal rows. Prefer using the existing `dual_parity_label()` helper instead of ad hoc legacy output. 5. Exclude the synthetic reference column from timed-cell statistics and preset matching. 6. Keep `sphinx-design` enabled and load `tab-combo.js`; otherwise the generated method pages lose their tabbed content. ## P2 — restore binding parity 1. Fix the UVE sklearn pipeline failure by choosing an explicit policy for empty selections: add a `min_features`/fallback option or use a fixture parameter set that cannot select zero features in pipeline smoke tests. **Done.** 2. Stop silently dropping registry parameters in tier-2 wrappers. Add adapter maps for alias names or fail closed when a registry parameter is unsupported by a wrapper constructor. **Done for selector smoke.** 3. Unify selector validation plans across Python registry, sklearn classes, R dispatcher and MATLAB MEX. The cheapest deterministic option is a shared 3-fold contiguous plan; the more flexible option is to serialize fold indices through benchmark parameters. **Done with the 3-fold contiguous plan.** 4. Add C++ fixture coverage for selectors currently covered only by registry smoke tests. ## P3 — performance work 1. PCR: replace full `p x p` Jacobi eigensolve with a deterministic SVD/LAPACK or partial top-component solver, and use an `n x n` path when `p >> n`. **Partially done:** PCR now batches component projections and avoids score storage when not requested. 2. R vendoring: regenerate the vendored libn4m copy instead of manually carrying divergent `model.cpp` code. 3. Selectors: introduce a shared fitness evaluator that reuses buffers, validation folds and prediction arrays instead of reallocating for every candidate. **Started:** cross-validation fold buffers are reused across candidate evaluations. 4. Parallelize independent candidate evaluations for PSO, VISSA, BVE and IRIV while reducing results in deterministic order to preserve tie-breaks and RNG behavior. 5. Replace repeated full sorts with `nth_element` where only top-k masks are needed. ## P4 — packaging and release gates 1. Refresh the ABI snapshot intentionally. The audit saw more exported `n4m_*` symbols than `cpp/abi/expected_symbols_linux.txt` records. 2. Ensure Python sdist is either a real source build with CMake inputs included, or do not publish sdists until that path is supported. 3. Keep Python wheels smoke-tested from the built artifact, not from the editable checkout. 4. Keep R CRAN checks on the built tarball, and remove non-portable flags such as architecture-specific `-march=*` from CRAN builds. 5. Add a vendored-core sync check for the R package. 6. Treat MATLAB packaging as separate from PyPI/CRAN readiness until `toolbox.prj`, `release.m`, the complete MEX build and File Exchange workflow exist. ## Definition of "green" The project is ready to resume method additions when: - C++ fixture parity is reproducible from a clean clone; - full Python tests, including sklearn pipeline smoke, are green; - cross-binding Gate 1 is green for every shipped pls4all binding; - cross-binding Gate 2 is green or explicitly relaxed for every shipped method and scheduled external reference; - dashboard cells display both gates without legacy alias confusion; - `pip install pls4all` and `R CMD check --as-cran` are validated from built artifacts; - slow methods have baseline benchmarks and at least one profiling-backed optimization plan each.