# Stabilisation plan — parity, dashboard and releases

Date: 2026-05-19
Scope: parity gates, cross-binding dashboard, slow methods, and
PyPI/CRAN readiness for pls4all 0.97.0 / ABI 1.16.0.

## Audit summary

The project is close to the intended architecture, but the gates are not
yet strict enough to be a release barrier. The main issue is semantic:
binding parity and reference parity are both present in parts of the
pipeline, but older fields, docs and dashboard filters still collapse them
into one verdict.

Local audit results:

| Check | Result |
|---|---|
| `ctest --preset dev-release --output-on-failure` | passed |
| `python -m benchmarks.parity_timing.lockfile --check` | passed, structural only |
| full Python binding tests | failed on `UVESelector` pipeline smoke |
| sklearn wrapper parity script | passed, but narrower than full tests |
| fixture regeneration check | blocked by missing historical `AOM_v0` oracle |
| small cross-binding PLS/PCR sample | confirmed external rows can be mislabeled as binding failures |
| slow-method pls4all smoke | confirmed selector/PCR timing and adapter issues need focused work |
| `scripts/bump_version.sh --check` | passed |
| ABI symbol diff | failed: the current library exports additional `n4m_*` symbols absent from `cpp/abi/expected_symbols_linux.txt` |
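
The ABI symbol diff in the table is a set comparison between what the shared library exports and what the snapshot records. The following is a minimal sketch, not the project's actual checker. It assumes the export list comes from GNU `nm -D --defined-only` run on the built library, and that `expected_symbols_linux.txt` holds one symbol per line. The snapshot format and the helper names are assumptions.

```python
"""Sketch of an ABI symbol diff; snapshot format and helper names are assumptions."""
from __future__ import annotations


def exported_symbols(nm_output: str, prefix: str = "n4m_") -> set[str]:
    """Collect symbols starting with `prefix` from `nm -D --defined-only` output.

    Each line looks like "<address> <type> <name>"; only the name is kept.
    """
    symbols: set[str] = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Drop symbol-version suffixes such as "@@LIBN4M_1" if present.
        name = fields[-1].split("@", 1)[0]
        if name.startswith(prefix):
            symbols.add(name)
    return symbols


def expected_symbols(snapshot_text: str) -> set[str]:
    """Read one symbol per line, ignoring blank lines and '#' comments."""
    return {
        line.strip()
        for line in snapshot_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }


def abi_diff(exported: set[str], expected: set[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) symbol names, each sorted for stable output."""
    return sorted(exported - expected), sorted(expected - exported)
```

Added symbols are the ones P4 item 1 asks to record intentionally in the snapshot. Removed symbols would indicate an ABI break and should fail the gate outright.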

## Implementation status

Stabilization status:

- P0 gate semantics implemented in the orchestrator: external rows are no
  longer binding-parity failures, reference parity compares all successful
  rows against the canonical oracle, and `--only-pls4all` consumes stored
  oracle snapshots instead of skipping Gate 2.
- P1 dashboard/static docs updated to render one relevant gate per cell and
  to merge canonical `ref_*` rows atomically. C++ and external cells render
  reference parity; internal bindings render binding parity.
- P2 Python selector smoke fixed for UVE, and tier-2 selector wrappers now
  fail closed on unknown registry parameters. Python/R/MATLAB selector
  ValidationPlan defaults are aligned to the canonical 3-fold contiguous
  plan.
- P2 dashboard refresh data covers the previously red `100x50` cells for
  `continuum_regression`, PCR and the selector smoke set; unavailable
  formula/classdef selector wrappers are classified as not available rather
  than failed parity.
- P3 first performance pass landed for PCR batch projection and
  cross-validation fold-buffer reuse.
- P4 ABI snapshot refreshed for the public 1.16.0 symbols already exported
  by the current shared library.

## P0 — make gates truthful

1. In `benchmarks/cross_binding/orchestrator.py`, compute binding parity
   only for `n4m_core` and `pls4all_binding` rows. External rows must
   get `binding_parity_ok = None` or an explicit not-applicable code.
2. Keep reference parity for every successful row, including external
   libraries, against the canonical registry reference.
3. When a run intentionally omits canonical external references
   (`--only-pls4all`), load the stored oracle snapshot. Missing snapshots
   are setup failures that must be fixed by running the canonical reference
   backend.
4. Make a missing required reference oracle a hard error in release-gate
   mode; only allowlisted `paper_only` methods may run without one.
5. Move workstation-specific reference paths to environment/configuration
   or pinned packages. The AOM/POP oracle must be reproducible from a clean
   clone or explicitly excluded from a strict gate.
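
The P0 rules can be sketched as two independent verdict functions. This is an illustration only. The `Row` fields, the tolerance argument and `MissingOracleError` are hypothetical names. `oracle_available` stands for either a live canonical reference run or a stored snapshot loaded under `--only-pls4all`.

```python
"""Hypothetical sketch of the P0 gate rules; field and class names are assumptions."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

INTERNAL_KINDS = {"n4m_core", "pls4all_binding"}  # rows judged by Gate 1 (P0 item 1)


class MissingOracleError(RuntimeError):
    """Raised in release-gate mode when a required reference oracle is absent."""


@dataclass
class Row:
    method: str
    kind: str                                # "n4m_core", "pls4all_binding" or an external library
    ok: bool                                 # did the run itself succeed?
    rmse_vs_core: Optional[float] = None     # distance to the n4m_core result
    rmse_vs_oracle: Optional[float] = None   # distance to the canonical oracle


def binding_parity(row: Row, tol: float) -> Optional[bool]:
    """Gate 1: only internal rows get a verdict; external rows are N/A (None)."""
    if row.kind not in INTERNAL_KINDS:
        return None
    return row.ok and row.rmse_vs_core is not None and row.rmse_vs_core <= tol


def reference_parity(row: Row, tol: float, oracle_available: bool,
                     paper_only: set[str], release_gate: bool) -> Optional[bool]:
    """Gate 2: every successful row, internal or external, vs the canonical oracle."""
    if not row.ok:
        return None  # run failures are reported separately, not as parity verdicts
    if not oracle_available:
        if release_gate and row.method not in paper_only:
            raise MissingOracleError(f"no reference oracle for {row.method}")
        return None
    return row.rmse_vs_oracle is not None and row.rmse_vs_oracle <= tol
```

Returning `None` rather than `False` for external rows is what keeps them out of the binding-parity failure count.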

## P1 — fix dashboard and generated docs

1. Update `docs/_extras/build_landing.py` so canonical `ref_*` rows replace
   stale legacy cells atomically: `ok`, `reason`, both parity verdicts,
   timings, reference metadata and canonical flags.
2. Update dashboard filtering to use `reference_parity` for C++ and external
   libraries, and `binding_parity` for internal pls4all bindings.
3. Propagate method tolerance into CSV/JSON so drift/divergent thresholds
   use "10x method tolerance" instead of a hardcoded `rmse_rel < 10`.
4. Render the relevant gate in static Markdown tables: reference parity for
   C++/external rows, binding parity for internal rows. Prefer using the
   existing `dual_parity_label()` helper instead of ad hoc legacy output.
5. Exclude the synthetic reference column from timed-cell statistics and
   preset matching.
6. Keep `sphinx-design` enabled and load `tab-combo.js`; otherwise the
   generated method pages lose their tabbed content.
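
The following sketches the P1 cell logic and assumes dashboard rows are plain dicts. The field names and `kind` values are assumptions. The exact bands are also an assumption layered on the plan's "10x method tolerance" wording: `ok` within tolerance, `drift` up to 10x tolerance, and `divergent` beyond that.

```python
"""Sketch of P1 cell classification; field names and band edges are assumptions."""
from __future__ import annotations

from typing import Any, Optional


def relevant_gate(row: dict[str, Any]) -> Optional[bool]:
    """C++ core and external rows show reference parity; bindings show binding parity."""
    if row["kind"] == "pls4all_binding":
        return row.get("binding_parity")
    return row.get("reference_parity")


def drift_band(rmse_rel: float, method_tol: float) -> str:
    """Classify a deviation against the method's own tolerance, not a fixed cut-off."""
    if rmse_rel <= method_tol:
        return "ok"
    if rmse_rel <= 10 * method_tol:
        return "drift"
    return "divergent"


# Fields owned by canonical ref_* rows (P1 item 1); the names are hypothetical.
CANONICAL_FIELDS = ("ok", "reason", "binding_parity", "reference_parity",
                    "timings", "reference", "canonical")


def merge_canonical(legacy: dict[str, Any], canonical: dict[str, Any]) -> dict[str, Any]:
    """Replace every canonical-owned field at once so no stale legacy value survives."""
    merged = {k: v for k, v in legacy.items() if k not in CANONICAL_FIELDS}
    merged.update({k: canonical.get(k) for k in CANONICAL_FIELDS})
    return merged
```

Setting absent canonical fields to `None`, rather than keeping the legacy value, is what makes the merge atomic.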

## P2 — restore binding parity

1. Fix the UVE sklearn pipeline failure by choosing an explicit policy for
   empty selections: add a `min_features`/fallback option or use a fixture
   parameter set that cannot select zero features in pipeline smoke tests.
   **Done.**
2. Stop silently dropping registry parameters in tier-2 wrappers. Add
   adapter maps for alias names or fail closed when a registry parameter is
   unsupported by a wrapper constructor. **Done for selector smoke.**
3. Unify selector validation plans across Python registry, sklearn classes,
   R dispatcher and MATLAB MEX. The cheapest deterministic option is a
   shared 3-fold contiguous plan; the more flexible option is to serialize
   fold indices through benchmark parameters. **Done with the 3-fold
   contiguous plan.**
4. Add C++ fixture coverage for selectors currently covered only by
   registry smoke tests.
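
The P2 fail-closed parameter mapping and the shared 3-fold contiguous plan are both small enough to sketch. `map_params` and `contiguous_folds` are hypothetical helpers. The fold sizing mirrors scikit-learn's unshuffled `KFold`, where the first `n % k` folds get one extra sample. That sizing is an assumption about the canonical plan, not a documented contract.

```python
"""Sketch of P2 helpers; names and signatures are hypothetical."""
from __future__ import annotations

import inspect
from typing import Any, Callable


def contiguous_folds(n_samples: int, n_folds: int = 3) -> list[list[int]]:
    """Split 0..n-1 into contiguous blocks; the first n % k folds get one extra sample."""
    if not 1 < n_folds <= n_samples:
        raise ValueError("need 1 < n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def map_params(registry_params: dict[str, Any], ctor: Callable[..., Any],
               aliases: dict[str, str] | None = None) -> dict[str, Any]:
    """Rename registry parameters for a wrapper constructor; never drop one silently."""
    aliases = aliases or {}
    accepted = set(inspect.signature(ctor).parameters)  # excludes `self` for classes
    mapped, unsupported = {}, []
    for name, value in registry_params.items():
        target = aliases.get(name, name)
        if target in accepted:
            mapped[target] = value
        else:
            unsupported.append(name)
    if unsupported:
        raise TypeError(f"{ctor.__name__} does not support: {sorted(unsupported)}")
    return mapped
```

For example, `contiguous_folds(10)` yields folds of sizes 4, 3 and 3. An unknown registry key raises `TypeError` instead of disappearing.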

## P3 — performance work

1. PCR: replace the full `p x p` Jacobi eigensolve with a deterministic
   SVD/LAPACK or partial top-component solver, and use an `n x n` path when
   `p >> n`. **Partially done:** PCR now batches component projections and
   avoids score storage when not requested.
2. R vendoring: regenerate the vendored libn4m copy instead of manually
   carrying divergent `model.cpp` code.
3. Selectors: introduce a shared fitness evaluator that reuses buffers,
   validation folds and prediction arrays instead of reallocating for every
   candidate. **Started:** cross-validation fold buffers are reused across
   candidate evaluations.
4. Parallelize independent candidate evaluations for PSO, VISSA, BVE and
   IRIV while reducing results in deterministic order to preserve tie-breaks
   and RNG behavior.
5. Replace repeated full sorts with `nth_element` where only top-k masks are
   needed.
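
P3 items 4 and 5 share one constraint: speed up candidate selection without changing which candidate wins. The following Python sketch works under that constraint:

- `ThreadPoolExecutor.map` returns results in input order regardless of completion order.
- The reduction breaks ties toward the lowest index, as a serial loop would.
- `heapq.nlargest` stands in for C++ `std::nth_element` as a partial selection.

It assumes each fitness call is independent and draws randomness only from its own per-candidate seed, never from a shared generator. The threads only illustrate ordering; the real speedup would have to come from the C++ core. All function names are hypothetical.

```python
"""Sketch of deterministic parallel candidate scoring and top-k selection."""
from __future__ import annotations

import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def evaluate_all(candidates: Sequence[object], fitness: Callable[[object], float],
                 max_workers: int = 4) -> list[float]:
    """Score candidates concurrently; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness, candidates))


def best_index(scores: Sequence[float]) -> int:
    """Return the highest-scoring index; strict '>' keeps the earlier index on ties."""
    if not scores:
        raise ValueError("no candidates")
    best = 0
    for i in range(1, len(scores)):
        if scores[i] > scores[best]:
            best = i
    return best


def top_k_mask(scores: Sequence[float], k: int) -> list[bool]:
    """Mark the k largest scores without a full sort (O(n log k)).

    The (score, -index) key sends ties to the lower index, matching best_index.
    """
    keep = set(heapq.nlargest(k, range(len(scores)), key=lambda i: (scores[i], -i)))
    return [i in keep for i in range(len(scores))]
```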

## P4 — packaging and release gates

1. Refresh the ABI snapshot intentionally. The audit saw more exported
   `n4m_*` symbols than `cpp/abi/expected_symbols_linux.txt` records.
2. Either make the Python sdist a real source build with the CMake inputs
   included, or stop publishing sdists until that path is supported.
3. Keep Python wheels smoke-tested from the built artifact, not from the
   editable checkout.
4. Keep R CRAN checks on the built tarball, and remove non-portable flags
   such as architecture-specific `-march=*` from CRAN builds.
5. Add a vendored-core sync check for the R package.
6. Treat MATLAB packaging as separate from PyPI/CRAN readiness until
   `toolbox.prj`, `release.m`, the complete MEX build and File Exchange
   workflow exist.
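
A vendored-core sync check can be as simple as comparing content hashes between the canonical sources and the R package's copy. This sketch assumes hypothetical paths, such as `cpp/src` for the core and a vendored directory inside the R package. It treats any difference, including a hand-edited `model.cpp`, as a failure.

```python
"""Sketch of a vendored-core sync check; directory layout is hypothetical."""
from __future__ import annotations

import hashlib
from pathlib import Path

SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".h", ".hpp"}


def digest_tree(root: Path) -> dict[str, str]:
    """Map each source file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix in SOURCE_SUFFIXES
    }


def vendored_drift(canonical: Path, vendored: Path) -> list[str]:
    """List files that are missing, extra, or different in the vendored copy."""
    want, have = digest_tree(canonical), digest_tree(vendored)
    problems = [f"missing: {p}" for p in sorted(want.keys() - have.keys())]
    problems += [f"extra: {p}" for p in sorted(have.keys() - want.keys())]
    problems += [f"differs: {p}" for p in sorted(want.keys() & have.keys())
                 if want[p] != have[p]]
    return problems
```

In CI the check would fail whenever the returned list is non-empty. That enforces P3 item 2's rule of regenerating the vendored copy instead of carrying manual edits.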

## Definition of "green"

The project is ready to resume method additions when:

- C++ fixture parity is reproducible from a clean clone;
- full Python tests, including sklearn pipeline smoke, are green;
- cross-binding Gate 1 is green for every shipped pls4all binding;
- cross-binding Gate 2 is green or explicitly relaxed for every shipped
  method and scheduled external reference;
- each dashboard cell displays its relevant gate (reference parity for
  C++/external rows, binding parity for internal bindings) without legacy
  alias confusion;
- `pip install pls4all` and `R CMD check --as-cran` are validated from
  built artifacts;
- slow methods have baseline benchmarks and at least one profiling-backed
  optimization plan each.