# ABI — Changes Log

## 2026-06-06 — ABI 1.22.0: PLS CV reference surface

One additive public symbol:

- `n4m_pls_cross_validate`

This is a C/Python ABI entry point for exact PLS-only cross-validation over one
input matrix. The current implementation delegates to the PLS branch of
`n4m_sweep_run`, so candidate scores and CPU/CUDA route counters match the
existing sweep path. It is intentionally catalogued as ABI infrastructure, not
as a production method. The future fused/batched IKPLS-style multi-chain
executor can replace the internals without changing this signature.

## 2026-06-05 — ABI 1.21.0: CUDA PLS many-design batching config

Two additive public config helpers:

- `n4m_config_set_cuda_pls_many_batched`
- `n4m_config_get_cuda_pls_many_batched`

The default remains off. When enabled on a CUDA build, eligible PLS1 moment
many-design jobs may use the experimental tiled/strided-batched route that
also remains reachable through the `N4M_CUDA_PLS_MANY_BATCHED` environment
fallback. This changes only GPU scheduling and timings; candidate scores remain
fold-level exact for the selected scoring path.

## 2026-06-05 — ABI 1.20.0: CUDA PLS device threshold config

Two additive public config helpers:

- `n4m_config_set_cuda_pls_min_device_features`
- `n4m_config_get_cuda_pls_min_device_features`

The default threshold remains 1024 features, matching the conservative
historical CUDA PLS1 moment guard. Lower positive values let CPU/CUDA
crossover campaigns explicitly test medium-width PLS moment screens on the
selected single GPU without recompiling. This changes only route eligibility
and timing; candidate scores are unchanged for a given exact scoring path.

## 2026-06-05 — ABI 1.19.0: CUDA PLS fold scheduling config

Two additive public config helpers:

- `n4m_config_set_cuda_pls_parallel_folds`
- `n4m_config_get_cuda_pls_parallel_folds`

When enabled on a CUDA build, eligible exact PLS1 moment CV jobs may run in
bounded stream-parallel batches on the single selected GPU. This changes only
scheduling and timings; candidate scores are unchanged. Sweep and AOM
method results (`n4m_method_result_t`) also expose the additive scalar counters
`n_pls_moment_cuda_parallel_fold_batches` and
`n_pls_moment_cuda_parallel_fold_jobs` for fit-cost auditing.
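
One way to read the two counters, as a minimal sketch: assume eligible fold jobs are grouped into consecutive batches of at most `max_parallel` jobs. This is a hypothetical bound; the native batch size and grouping policy are not specified in this entry. Each batch then adds one to `n_pls_moment_cuda_parallel_fold_batches`, and each fold job in it adds one to `n_pls_moment_cuda_parallel_fold_jobs`.

```python
from typing import List, Tuple


def schedule_fold_jobs(n_jobs: int, max_parallel: int) -> Tuple[List[List[int]], int, int]:
    """Group fold jobs into consecutive batches of at most max_parallel jobs.

    ASSUMPTION: max_parallel and the consecutive grouping are hypothetical;
    they only illustrate how a batch counter and a job counter relate.
    Returns (batches, n_batches, n_jobs_scheduled).
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be positive")
    jobs = list(range(n_jobs))
    batches = [jobs[i:i + max_parallel] for i in range(0, n_jobs, max_parallel)]
    return batches, len(batches), sum(len(b) for b in batches)


# Ten fold jobs with a hypothetical bound of four concurrent streams.
batches, n_batches, n_jobs = schedule_fold_jobs(n_jobs=10, max_parallel=4)
```

Under this reading, `n_jobs / n_batches` approximates the achieved concurrency, which is the quantity a fit-cost audit would care about.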

## 2026-06-05 — ABI 1.18.x: strict AOM Gaussian operator kind

No public symbol or result layout change. The public operator enum gains one
additive value:

- `N4M_OP_GAUSSIAN = 18`

This value is accepted by the strict AOM chain sweep and represents a fixed,
shape-preserving zero-padding Gaussian convolution with a banded
operator-moment descriptor. It is distinct from the full `pp_gaussian`
preprocessing transformer surface.
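
To make "shape-preserving zero-padding Gaussian convolution" concrete, the sketch below builds the dense equivalent of such an operator for one spectrum of `p` points. The `sigma`/`half_width` parameterization, the truncation of the kernel at `half_width` and the normalization of the full kernel to unit sum are assumptions for illustration; the native operator's parameters and edge handling may differ. Because out-of-range taps are simply dropped, interior rows sum to 1 while edge rows sum to less than 1, and every entry further than `half_width` from the diagonal is zero (the banded structure).

```python
import math
from typing import List


def gaussian_kernel(sigma: float, half_width: int) -> List[float]:
    """Gaussian weights for offsets -half_width..half_width, normalized to sum 1.

    ASSUMPTION: sigma/half_width parameterization is hypothetical.
    """
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half_width, half_width + 1)]
    total = sum(w)
    return [v / total for v in w]


def gaussian_operator(p: int, sigma: float, half_width: int) -> List[List[float]]:
    """Dense p x p matrix of a zero-padded, shape-preserving Gaussian convolution.

    Row i holds kernel weights at columns i-half_width..i+half_width; taps that
    fall outside 0..p-1 are dropped (zero padding), so the matrix is banded.
    """
    kern = gaussian_kernel(sigma, half_width)
    op = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for k, w in enumerate(kern):
            j = i + k - half_width
            if 0 <= j < p:
                op[i][j] = w
    return op


def apply_operator(op: List[List[float]], x: List[float]) -> List[float]:
    """Apply the operator to one spectrum: y = op @ x (output length == input length)."""
    return [sum(a * b for a, b in zip(row, x)) for row in op]


A = gaussian_operator(p=6, sigma=1.0, half_width=2)
y = apply_operator(A, [1.0] * 6)  # a flat spectrum exposes the edge attenuation
```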

## 2026-06-05 — ABI 1.18.x: AOM chain fixed final fit

One additive public symbol:

- `n4m_aom_chain_fixed_fit_run(n4m_context_t*, const n4m_config_t*,
  const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y,
  const int32_t* chain_offsets, int64_t n_chain_offsets,
  const int32_t* op_kinds, int64_t n_op_kinds,
  const int32_t* param_offsets, int64_t n_param_offsets,
  const double* params, int64_t n_params, int32_t head_id, double param,
  n4m_method_result_t** out_result)`

This fits one already-selected caller-provided strict-linear AOM
chain/head/parameter on all rows without running CV. It is a model-building
endpoint, not a ranking endpoint: CV score fields are NaN unless a higher-level
wrapper injects an externally verified exact-CV score. Python uses this in
`NativeAOMScreenRefitRegressor` after the exact-CV refit, so building a reusable
model no longer pays again for a one-candidate CV run.

## 2026-06-05 — ABI 1.18.x: AOM score-only screen output mode

Two additive public config helpers:

- `n4m_config_set_aom_score_only`
- `n4m_config_get_aom_score_only`

When enabled for `n4m_aom_sweep_run` or `n4m_aom_chain_sweep_run`, the result
keeps the candidate-score table, selected identifiers, route counters and fold
ids, but omits selected-model matrices by returning them as `0 x 0`. This is
an additive output/cost-control knob for large preprocessing ranking passes.

## 2026-06-04 — ABI 1.18.0: native AOM operator PLS score stack

One additive public symbol (ABI MINOR bump 1.17.0 -> 1.18.0),
backward-compatible (no signature/layout change, nothing removed):

- `n4m_aom_operator_pls_stack_fit(n4m_context_t*, const n4m_config_t*,
  const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile,
  int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids,
  const int32_t* components, int64_t n_components, const double* alphas,
  int64_t n_alphas, double std_penalty, double gap_penalty,
  n4m_method_result_t** out_result)`

This exposes a native strict-linear AOM operator PLS1 score stack. The method
builds compact or wide AOM operator banks, fits fold-local PLS1 score
projectors per operator, concatenates the scores, selects `(n_components,
alpha)` by train-only CV criterion, and refits the selected stack on all rows
with a Ridge head.

The returned `n4m_method_result_t` carries `candidate_scores`, `fold_scores`,
`oof_predictions`, `predictions`, `stack_features`, `coefficients`,
`intercept`, `fold_ids` and `operator_feature_offsets`. `candidate_scores`
columns are `spec_id`, `n_components`, `alpha`, `mean_oof_rmse`,
`std_oof_rmse`, `mean_train_rmse`, `criterion`.

Native v1 is single-target (`Y.cols == 1`) and not yet a fused batched GPU
stack. Custom Python operator matrices, shuffled/both CV and baseline admission
gating remain in the Python `AOMOperatorPLSStack` estimator.

The implementation lives in `cpp/src/core/aom_operator_pls_stack.cpp` and is
dispatched from `cpp/src/c_api/c_api_method_result.cpp`. The symbol is declared
in `cpp/include/n4m/pls.h`, exported in all ABI snapshots, catalogued as
`aom_pop.operator_pls_stack`, wrapped in Python as
`n4m.aom_operator_pls_stack`, and documented in
`docs/methods/aom_operator_pls_stack.md`.

## 2026-06-04 — ABI 1.17.0: native AOM Ridge OOF simplex blender

One additive public symbol (ABI MINOR bump 1.16.0 -> 1.17.0),
backward-compatible (no signature/layout change, nothing removed):

- `n4m_aom_ridge_blender_fit(n4m_context_t*, const n4m_config_t*,
  const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile,
  int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids,
  const double* ridge_lambdas, int64_t n_ridge_lambdas, double regularizer,
  n4m_method_result_t** out_result)`

This exposes a native strict-linear AOM Ridge candidate blender. The method
builds compact or wide AOM chain banks, scores each chain/lambda candidate by
fold-local OOF Ridge predictions, solves a regularized non-negative simplex
blend, and refits all candidates on the full training data for final blended
predictions.
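
A regularized non-negative simplex blend can be sketched as a small constrained least-squares problem over the OOF candidate predictions. The objective below (mean squared OOF error plus `regularizer * ||w||^2`) and the solver (projected gradient with the standard sort-based Euclidean projection onto the simplex) are assumptions chosen for a short, dependency-free example; they are not a description of the native solver in `cpp/src/core/aom_ridge_blender.cpp`.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for j = 1..rho; keep the threshold at the largest such j
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def blend_weights(oof: List[List[float]], y: List[float], reg: float,
                  n_iter: int = 2000) -> List[float]:
    """oof[i][c] is the OOF prediction of candidate c for row i.

    ASSUMED objective: mean((oof @ w - y)^2) + reg * ||w||^2 on the simplex.
    """
    n, m = len(oof), len(oof[0])
    # Frobenius norm bounds the spectral norm, giving a safe gradient Lipschitz bound.
    lip = 2.0 / n * sum(v * v for row in oof for v in row) + 2.0 * reg
    step = 1.0 / lip
    w = [1.0 / m] * m  # start from the uniform blend
    for _ in range(n_iter):
        resid = [sum(row[c] * w[c] for c in range(m)) - y[i] for i, row in enumerate(oof)]
        grad = [2.0 / n * sum(oof[i][c] * resid[i] for i in range(n)) + 2.0 * reg * w[c]
                for c in range(m)]
        w = project_to_simplex([w[c] - step * grad[c] for c in range(m)])
    return w


y = [1.0, 2.0, 3.0, 4.0]
# Columns: a perfect candidate, a constant candidate, a reversed candidate.
P = [[1.0, 2.5, 4.0], [2.0, 2.5, 3.0], [3.0, 2.5, 2.0], [4.0, 2.5, 1.0]]
w = blend_weights(P, y, reg=1e-6)
```

In this toy data the second candidate is the average of the first and third, so several weight vectors fit `y` equally well without constraints; on the simplex only `[1, 0, 0]` does, and the iterates converge toward it.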

The returned `n4m_method_result_t` carries `candidate_scores`, `weights`,
`oof_predictions`, `predictions`, `oof_candidate_predictions`,
`candidate_predictions` and `fold_ids`. `candidate_scores` columns are
`candidate_id`, `chain_id`, `lambda`, `cv_rmse`, `weight`.

Native v1 requires strictly positive Ridge lambdas and is not yet a fused
batched GPU blender. It builds in CUDA-enabled configurations, but the
candidate loop still uses the existing native Ridge path per fold/candidate.

The implementation lives in `cpp/src/core/aom_ridge_blender.cpp` and is
dispatched from `cpp/src/c_api/c_api_method_result.cpp`. The symbol is declared
in `cpp/include/n4m/pls.h`, exported in all ABI snapshots, catalogued as
`aom_pop.ridge_blender`, wrapped in Python as `n4m.aom_ridge_blender`, and
documented in `docs/methods/aom_ridge_blender.md`.

## 2026-06-04 — ABI 1.16.0: user-defined AOM chain sweep

One additive public symbol (ABI MINOR bump 1.15.0 -> 1.16.0),
backward-compatible (no signature/layout change, nothing removed):

- `n4m_aom_chain_sweep_run(n4m_context_t*, const n4m_config_t*,
  const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t cv,
  const int32_t* fold_ids, int64_t n_fold_ids,
  const int32_t* chain_offsets, int64_t n_chain_offsets,
  const int32_t* op_kinds, int64_t n_op_kinds,
  const int32_t* param_offsets, int64_t n_param_offsets,
  const double* params, int64_t n_params,
  const double* ridge_lambdas, int64_t n_ridge_lambdas,
  const int32_t* pls_components, int64_t n_pls_components,
  int32_t heads_mask, n4m_method_result_t** out_result)`

This exposes a flat descriptor for caller-provided strict-linear preprocessing
chains. `chain_offsets` partitions `op_kinds`; `param_offsets` partitions the
flat `params` payload. Empty chains are rejected; callers use an explicit
identity operator for raw spectra. Supported operators are identity, polynomial
detrend, Savitzky-Golay smooth/derivative, Norris-Williams, finite difference,
Whittaker, FCK and Gaussian.
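
The flat descriptor is a CSR-style encoding: one offsets array slices `op_kinds` into chains, another slices `params` into per-operator parameter lists. The sketch assumes that each offsets array starts at 0 and carries one trailing terminator (so `n_chain_offsets == n_chains + 1` and `n_param_offsets == n_op_kinds + 1`). Apart from `N4M_OP_GAUSSIAN = 18`, the op-kind codes and the parameter layouts are hypothetical placeholders; take the real values from the public header.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL op-kind codes for illustration; only N4M_OP_GAUSSIAN = 18 is
# documented in this log. Use the header's enum values in real code.
OP_IDENTITY = 0
OP_SAVGOL = 3
OP_GAUSSIAN = 18

Op = Tuple[int, Sequence[float]]  # (op_kind, parameters)


def encode_chains(chains: List[List[Op]]) -> Dict[str, list]:
    """Flatten chains into CSR-style offset arrays (assumed convention)."""
    chain_offsets: List[int] = [0]
    op_kinds: List[int] = []
    param_offsets: List[int] = [0]
    params: List[float] = []
    for chain in chains:
        if not chain:
            raise ValueError("empty chains are rejected; use an identity operator")
        for kind, op_params in chain:
            op_kinds.append(kind)
            params.extend(float(v) for v in op_params)
            param_offsets.append(len(params))  # end of this operator's params
        chain_offsets.append(len(op_kinds))  # end of this chain's operators
    return {"chain_offsets": chain_offsets, "op_kinds": op_kinds,
            "param_offsets": param_offsets, "params": params}


desc = encode_chains([
    [(OP_IDENTITY, [])],                              # raw spectra
    [(OP_SAVGOL, [11, 2, 1]), (OP_GAUSSIAN, [1.5])],  # hypothetical parameter layouts
])
```

Reading it back, chain `c` is `op_kinds[chain_offsets[c]:chain_offsets[c + 1]]`, and operator `k` owns `params[param_offsets[k]:param_offsets[k + 1]]`; the identity operator simply has an empty parameter slice.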

The result shape matches `n4m_aom_sweep_run`; `candidate_scores` columns are
`candidate_id`, `chain_id`, `head_id`, `param`, `cv_rmse`, and scalar
`profile` is `-1` for caller-provided chains.

This is the first ABI-stable arbitrary strict-linear preprocessing-chain
surface. It still materializes transformed matrices per chain and uses
materialized PLS CV; fused operator-moment updates, batched IKPLS and CUDA
kernels remain later acceleration work.

The implementation lives in `cpp/src/core/aom_sweep.cpp` and is dispatched from
`cpp/src/c_api/c_api_method_result.cpp`. The symbol is declared in
`cpp/include/n4m/pls.h`, catalogued as `aom_pop.aom_chain_sweep`, wrapped in
Python as `n4m.aom_chain_sweep_run`, and documented in
`docs/methods/aom_chain_sweep_run.md`.

## 2026-06-04 — ABI 1.15.0: configurable native AOM preprocessing sweep

One additive public symbol (ABI MINOR bump 1.14.0 -> 1.15.0),
backward-compatible (no signature/layout change, nothing removed):

- `n4m_aom_sweep_run(n4m_context_t*, const n4m_config_t*,
  const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile,
  int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids,
  const double* ridge_lambdas, int64_t n_ridge_lambdas,
  const int32_t* pls_components, int64_t n_pls_components,
  int32_t heads_mask, n4m_method_result_t** out_result)`

The symbol applies the native strict-linear AOM compact/wide preprocessing
chain bank, then delegates candidate scoring to `n4m_sweep_run` over Ridge
lambdas and/or PLS component counts. It returns `candidate_scores`,
`oof_predictions`, final `predictions`, coefficients/intercept and fold ids.
`candidate_scores` has columns `candidate_id`, `chain_id`, `head_id`, `param`,
`cv_rmse`; `head_id` is `0` for Ridge and `1` for PLS.
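
Rows of `candidate_scores` can be decoded with the column order listed above. The sketch picks the row with the lowest `cv_rmse` and interprets `param` by head (Ridge lambda for `head_id == 0`, PLS component count for `head_id == 1`, as described for `n4m_sweep_run`). The table values are made-up placeholders, and first-minimum tie-breaking is an assumption; the native selection rule may treat ties or NaN scores differently.

```python
from typing import Sequence

HEAD_NAMES = {0: "ridge", 1: "pls"}


def best_candidate(rows: Sequence[Sequence[float]]) -> dict:
    """Pick the lowest-cv_rmse row of a candidate_scores table.

    Columns: candidate_id, chain_id, head_id, param, cv_rmse.
    ASSUMPTION: ties resolve to the first row; NaN handling is not modelled.
    """
    best = min(rows, key=lambda r: r[4])
    cand, chain, head, param, rmse = best
    head = int(head)
    # param is a Ridge lambda for head 0 and a PLS component count for head 1.
    param_value = float(param) if head == 0 else int(param)
    return {"candidate_id": int(cand), "chain_id": int(chain),
            "head": HEAD_NAMES[head], "param": param_value, "cv_rmse": rmse}


# Placeholder rows, not real results.
table = [
    [0, 0, 0, 1e-3, 0.42],
    [1, 0, 1, 5, 0.37],
    [2, 1, 1, 8, 0.31],
]
result = best_candidate(table)
```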

This is a configurable product sweep over the fixed AOM strict-linear banks. It
is not yet the arbitrary operator-descriptor layer or fused batched IKPLS/CUDA
grinder.

The implementation lives in `cpp/src/core/aom_sweep.cpp` and is dispatched from
`cpp/src/c_api/c_api_method_result.cpp`. The symbol is declared in
`cpp/include/n4m/pls.h`, exported in all ABI snapshots, catalogued as
`aom_pop.aom_sweep`, wrapped in Python as `n4m.aom_sweep_run`, and documented
in `docs/methods/aom_sweep_run.md`.

## 2026-06-04 — ABI 1.14.0: native Ridge/PLS sweep

One additive public symbol (ABI MINOR bump 1.13.0 -> 1.14.0),
backward-compatible (no signature/layout change, nothing removed):

- `n4m_sweep_run(n4m_context_t*, const n4m_config_t*,
  const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t cv,
  const int32_t* fold_ids, int64_t n_fold_ids, const double* ridge_lambdas,
  int64_t n_ridge_lambdas, const int32_t* pls_components,
  int64_t n_pls_components, int32_t heads_mask,
  n4m_method_result_t** out_result)`

ABI v1 supports exact Ridge CV over row-additive moments where efficient, with
a precomputed dual Ridge path when `p > n_train`. It also supports fold-local
materialized PLS component screening through the existing native PLS model path.
The returned `n4m_method_result_t` carries `candidate_scores`,
`oof_predictions`, final `predictions`, coefficients/intercept and fold ids.
`candidate_scores[:,1]` is `0` for Ridge and `1` for PLS; `param` is lambda for
Ridge and `n_components` for PLS.

The fused batched IKPLS/operator-descriptor grinder is not part of ABI v1.

The implementation lives in `cpp/src/core/sweep.cpp` and is dispatched from
`cpp/src/c_api/c_api_method_result.cpp`. The symbol is declared in
`cpp/include/n4m/pls.h`, exported in all ABI snapshots, catalogued as
`utilities.sweep`, wrapped in Python as `n4m.sweep_run`, and documented in
`docs/methods/sweep_run.md`.

## 2026-06-04 — ABI 1.13.0: native row-additive moment substrate

Three additive public symbols (ABI MINOR bump 1.12.0 -> 1.13.0),
backward-compatible (no signature/layout change, nothing removed):

- `n4m_moments_compute(n4m_context_t*, const n4m_matrix_view_t* X,
  const n4m_matrix_view_t* Y, n4m_method_result_t** out_result)`
- `n4m_moments_subset_compute(n4m_context_t*, const n4m_matrix_view_t* X,
  const n4m_matrix_view_t* Y, const int64_t* row_indices, int64_t n_indices,
  n4m_method_result_t** out_result)`
- `n4m_moments_subtract(n4m_context_t*, const n4m_method_result_t* lhs,
  const n4m_method_result_t* rhs, n4m_method_result_t** out_result)`

The result is an `n4m_method_result_t` carrying raw additive moments
(`x_sum`, `y_sum`, `xtx`, `xty`, `yty`) and centered moments recomputed from
the raw sums (`x_mean`, `y_mean`, `cxx`, `cxy`, `cyy`). This gives an exact
fold-subtraction primitive for PLS/Ridge screens: compute all rows, compute the
held-out rows, subtract raw moments, then recenter on the remaining train rows.
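
The fold-subtraction recipe can be checked in a few lines of plain Python. This is a single-target sketch whose dictionary keys mirror the result keys listed above; it is not the `n4m.moments` wrapper. It computes raw moments over all rows and over the held-out rows, subtracts them, recenters with `cxx = xtx - n * x_mean x_mean'` (and likewise for `cxy`, `cyy`), and compares against moments computed directly on the train rows.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def raw_moments(X: Matrix, y: Sequence[float], rows: Sequence[int]) -> Dict:
    """Additive (raw) moments over the given rows, single target for brevity."""
    p = len(X[0])
    x_sum, xty = [0.0] * p, [0.0] * p
    xtx = [[0.0] * p for _ in range(p)]
    y_sum = yty = 0.0
    for i in rows:
        xi, yi = X[i], y[i]
        y_sum += yi
        yty += yi * yi
        for a in range(p):
            x_sum[a] += xi[a]
            xty[a] += xi[a] * yi
            for b in range(p):
                xtx[a][b] += xi[a] * xi[b]
    return {"n": len(rows), "x_sum": x_sum, "y_sum": y_sum,
            "xtx": xtx, "xty": xty, "yty": yty}


def subtract(lhs: Dict, rhs: Dict) -> Dict:
    """Raw moments are additive over rows, so train = all - held-out, entry by entry."""
    p = len(lhs["x_sum"])
    return {"n": lhs["n"] - rhs["n"],
            "x_sum": [lhs["x_sum"][a] - rhs["x_sum"][a] for a in range(p)],
            "y_sum": lhs["y_sum"] - rhs["y_sum"],
            "xtx": [[lhs["xtx"][a][b] - rhs["xtx"][a][b] for b in range(p)] for a in range(p)],
            "xty": [lhs["xty"][a] - rhs["xty"][a] for a in range(p)],
            "yty": lhs["yty"] - rhs["yty"]}


def centered(m: Dict) -> Dict:
    """Recenter from raw sums on the rows the moments describe."""
    n, p = m["n"], len(m["x_sum"])
    xbar = [s / n for s in m["x_sum"]]
    ybar = m["y_sum"] / n
    cxx = [[m["xtx"][a][b] - n * xbar[a] * xbar[b] for b in range(p)] for a in range(p)]
    cxy = [m["xty"][a] - n * xbar[a] * ybar for a in range(p)]
    cyy = m["yty"] - n * ybar * ybar
    return {"x_mean": xbar, "y_mean": ybar, "cxx": cxx, "cxy": cxy, "cyy": cyy}


X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.5], [5.0, 2.0]]
y = [1.0, 0.0, 2.0, 3.0, 2.5]
held_out, train = [1, 3], [0, 2, 4]
via_subtraction = centered(subtract(raw_moments(X, y, range(5)),
                                    raw_moments(X, y, held_out)))
direct = centered(raw_moments(X, y, train))
```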

The implementation lives in `cpp/src/core/moments.cpp` and is dispatched from
`cpp/src/c_api/c_api_method_result.cpp`. The symbols are declared in
`cpp/include/n4m/pls.h`, exported in all ABI snapshots, catalogued as
`utilities.moments`, wrapped in Python as `n4m.moments` /
`n4m.moments_train_from_heldout`, and documented in `docs/methods/moments.md`.

## 2026-06-04 — ABI 1.12.0: native AOM robust-HPO screen

One additive public symbol (ABI MINOR bump 1.11.0 -> 1.12.0),
backward-compatible (no signature/layout change, nothing removed):

- `n4m_aom_robust_hpo_fit(n4m_context_t*, const n4m_config_t*,
  const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile,
  int32_t cv, int32_t heads_mask, n4m_method_result_t** out_result)`

This exposes the product AOM robust-HPO preprocessing screen through the public
C ABI. Native v1 screens compact/wide banks of strict-linear, shape-preserving
AOM preprocessing chains and Ridge/PLS heads by contiguous K-fold CV RMSE. It
returns an `n4m_method_result_t` carrying in-sample predictions after refitting
the selected candidate, transformed-space coefficients, intercept, scalar
selection diagnostics and the full `candidate_scores` matrix
(`chain_id`, `head_id`, `param`, `mean_cv_rmse`).
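
For reference, one common way to assign contiguous K-fold ids is sketched below. It follows the scikit-learn `KFold(shuffle=False)` convention, where the first `n_rows % k` folds receive one extra row; whether the native screen distributes the remainder the same way is an assumption to verify before comparing fold-level scores.

```python
from typing import List


def contiguous_fold_ids(n_rows: int, k: int) -> List[int]:
    """Assign rows to k contiguous, near-equal folds.

    ASSUMPTION: the first n_rows % k folds get one extra row (sklearn KFold style).
    """
    if not 2 <= k <= n_rows:
        raise ValueError("need 2 <= k <= n_rows")
    base, extra = divmod(n_rows, k)
    fold_ids: List[int] = []
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        fold_ids.extend([fold] * size)
    return fold_ids


ids = contiguous_fold_ids(10, 3)  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```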

The implementation lives in `cpp/src/core/aom_robust_hpo.cpp` and is dispatched
from `cpp/src/c_api/c_api_method_result.cpp`. The symbol is declared in
`cpp/include/n4m/pls.h`, exported in all ABI snapshots, catalogued as
`aom_pop.robust_hpo`, wrapped in Python as `n4m.aom_robust_hpo`, and documented
in `docs/methods/aom_robust_hpo.md`.

## 2026-06-03 — ABI 1.11.0: direct (closed-form) Ridge regression

One additive public symbol (ABI MINOR bump 1.10.0 → 1.11.0), backward-compatible
(no signature/layout change, nothing removed):

- `n4m_ridge_fit(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X,
  const n4m_matrix_view_t* Y, const double* lambdas, int64_t n_lambdas,
  n4m_method_result_t** out_result)`

This is a **genuine closed-form** multi-output Ridge — `beta = (Xc'Xc + lambda I)^-1
Xc'Yc` on column-centered X/Y with `intercept = y_mean - x_mean.beta` (the penalty is
not applied to the intercept, for `sklearn.linear_model.Ridge` parity). It is distinct
from the pre-existing `n4m_ridge_pls_fit` (ridge-augmented SIMPLS, rank-truncated by
`n_components`). The solver is chosen automatically by shape (PRIMAL augmented-QR for
p ≤ n, DUAL Gram-on-samples for p > n; identical coefficients up to round-off).
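
The primal and dual forms agree because of the push-through identity `(Xc'Xc + lambda I)^-1 Xc' = Xc'(Xc Xc' + lambda I)^-1`; the dual form only needs an `n x n` solve, which is cheaper when `p > n`. The sketch below checks this on a tiny single-target problem with `p > n`. It uses plain normal equations and Gaussian elimination rather than the augmented-QR primal solver mentioned above, so it illustrates the algebra, not the native numerics.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (A square and nonsingular)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge(X: Matrix, y: List[float], lam: float, dual: bool) -> Tuple[List[float], float]:
    """Closed-form Ridge on centered data; the intercept is not penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    yc = [v - y_mean for v in y]
    if dual:  # beta = Xc' (Xc Xc' + lam I)^-1 yc: an n x n solve
        G = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) + (lam if i == k else 0.0)
              for k in range(n)] for i in range(n)]
        alpha = solve(G, yc)
        beta = [sum(Xc[i][j] * alpha[i] for i in range(n)) for j in range(p)]
    else:  # beta = (Xc'Xc + lam I)^-1 Xc' yc: a p x p solve
        A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
              for b in range(p)] for a in range(p)]
        beta = solve(A, [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)])
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return beta, intercept


X = [[1.0, 0.5, 2.0, 1.0], [2.0, 1.5, 0.0, 3.0], [0.5, 2.5, 1.0, 1.0]]  # n=3, p=4
y = [1.0, 2.0, 0.5]
b_primal, c_primal = ridge(X, y, lam=0.3, dual=False)
b_dual, c_dual = ridge(X, y, lam=0.3, dual=True)
```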

Declared with `N4M_API` in `cpp/include/n4m/pls.h` (after `n4m_continuum_regression_fit`),
implemented in `cpp/src/c_api/c_api_method_result.cpp` over the new core kernel
`cpp/src/core/ridge.cpp`. Result keys: `coefficients` (p×q), `intercept` (1×q),
`x_mean`, `x_scale` (1×p), `y_mean` (1×q), `predictions` (n×q), scalar `rmse`,
scalar `lambda`.

Snapshots regenerated for all three platforms via
`scripts/regen_abi_snapshots.sh --derive` (linux from the lib; macos/windows derived
= linux minus the `N4M_1` version node). `n4m_ridge_fit` is present in
`cpp/abi/expected_symbols_{linux,macos,windows}.txt`. Header
`N4M_ABI_VERSION_MINOR` and `bindings/python/src/n4m/_ffi.py:ABI_VERSION_MINOR`
both bumped 10 → 11; `bump_version.sh --check` is green (project version unchanged
at 0.98.0).

## 2026-06-03 — macOS/Windows snapshot correction + cross-platform gate enforced

No ABI surface change (still ABI 1.10.0). This is an **audit-trail and CI
correction**: the 2026-05-30 entry below claimed "Snapshots regenerated for all
three platforms", but `expected_symbols_{macos,windows}.txt` were in fact a stale,
truncated copy of an old Linux `nm -D` dump — 500 lines, carrying the Linux-only
`@@N4M_1` version tag (which macOS `nm -gU` / Windows `dumpbin` never emit), and
missing ~171 symbols (the whole selection / method-result / aom / config family).
They were also **not diffed by CI** on macOS/Windows (only Linux was fail-closed).

Corrected here:

- `expected_symbols_{macos,windows}.txt` regenerated to the real 671-symbol set
  — identical to the Linux `n4m_*` names minus the Linux-only `N4M_1` version
  node (the only legitimate cross-platform difference).
- `.github/workflows/abi-check.yml` now diffs the committed snapshot **fail-closed
  on all three platforms** (macOS `diff`, Windows `Compare-Object` set comparison),
  with `LC_ALL=C`-pinned sorts so ordering is reproducible.
- Added a SONAME / RPATH-RUNPATH linkage gate to the Linux job (asserts
  `SONAME == libn4m.so.1` and no baked-in absolute search path).
- Added `scripts/regen_abi_snapshots.sh` — the single canonical regenerator
  (`--check` for CI/pre-commit, `--derive` to produce the macOS/Windows files
  from the Linux snapshot when only a Linux box is available).
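
The `--derive` rule (macOS/Windows = Linux names minus the `N4M_1` version node) amounts to a small text transformation. The sketch assumes a snapshot format of one symbol per line, where Linux entries may carry an `@@N4M_1` (or `@N4M_1`) suffix and the version node may appear as a bare `N4M_1` line; it illustrates the rule and is not a port of `scripts/regen_abi_snapshots.sh`.

```python
import re
from typing import Iterable, List

VERSION_NODE = "N4M_1"


def derive_non_linux_snapshot(linux_lines: Iterable[str]) -> List[str]:
    """Strip the Linux-only version node and return a sorted, de-duplicated list.

    ASSUMPTION: one symbol per line, optional @@N4M_1 / @N4M_1 suffix.
    """
    names = set()
    for line in linux_lines:
        name = re.sub(r"@@?" + VERSION_NODE + r"$", "", line.strip())
        if not name or name == VERSION_NODE:
            continue  # drop blank lines and the bare version-node entry
        names.add(name)
    # Python compares str by code point, matching LC_ALL=C order for ASCII names.
    return sorted(names)


derived = derive_non_linux_snapshot(
    ["N4M_1", "n4m_sweep_run@@N4M_1", "n4m_ridge_fit@@N4M_1", ""]
)
```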

## 2026-05-18 — Linux export baseline for ABI 1.16.0

`build/dev-release/cpp/src/libn4m.so.1.16.0` exports 27 additional
`n4m_*` symbols compared with the previous Linux baseline. Each added symbol is
declared with `N4M_API` in the public header `cpp/include/pls4all/p4a.h`, so the
Linux ABI gate now treats them as intentional public additions:

- `n4m_method_result_get_int64_vector`
- `n4m_mb_pls_fit`, `n4m_lw_pls_fit`, `n4m_pls_lda_fit`,
  `n4m_pls_logistic_fit`, `n4m_aom_preprocess_fit`
- `n4m_variable_select_rank`, `n4m_interval_select`,
  `n4m_stability_select`, `n4m_uve_select`, `n4m_spa_select`,
  `n4m_cars_select`, `n4m_random_frog_select`, `n4m_scars_select`,
  `n4m_ga_select`, `n4m_shaving_select`, `n4m_bve_select`,
  `n4m_t2_select`, `n4m_wvc_select`, `n4m_wvc_threshold_select`,
  `n4m_emcuve_select`, `n4m_randomization_select`, `n4m_bipls_select`,
  `n4m_sipls_select`, `n4m_rep_select`, `n4m_ipw_select`, `n4m_st_select`

## 2026-05-30 — ABI 1.10.0: additive RNG-kind config selector

Two additive public symbols (ABI MINOR bump 1.9.0 → 1.10.0), backward-compatible
(no signature/layout change, nothing removed):

- `n4m_config_set_rng_kind(n4m_config_t*, n4m_rng_kind_t)`
- `n4m_config_get_rng_kind(const n4m_config_t*, n4m_rng_kind_t*)`

New enum `n4m_rng_kind_t` { `N4M_RNG_SPLITMIX64`=0 (default), `N4M_RNG_PCG64`=1,
`N4M_RNG_MT_R`=2, `N4M_RNG_NUMPY_MT`=3 } selects the RNG engine a stochastic
method draws from, so its output can match an external reference library's exact
RNG (numpy `default_rng` / base R / numpy `RandomState`) for parity. The default
SPLITMIX64 reproduces n4m's historical streams bit-for-bit; leaving it unset
changes nothing. Snapshots regenerated for all three platforms
(`expected_symbols_{linux,macos,windows}.txt`). Engines verified bit-exact:
`docs/dev/RNG_TIER0_INVENTORY.md`, `cpp/tests/test_rng_engine.cpp`.
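
For orientation, the textbook SplitMix64 step that the default engine is named after is shown below. How n4m seeds the engine and maps 64-bit outputs to doubles is not described in this entry, so any claim that this sketch reproduces n4m's streams for a given seed is an assumption to check against `cpp/tests/test_rng_engine.cpp`.

```python
MASK64 = (1 << 64) - 1


class SplitMix64:
    """Standard SplitMix64 generator (Weyl-sequence counter plus a bijective mixer).

    ASSUMPTION: n4m's seeding and output-to-double mapping are not modelled here.
    """

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        # Advance the counter by the 64-bit golden-ratio increment.
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        # Finalizer: xor-shifts and odd multiplies, all modulo 2**64.
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)


rng_a, rng_b = SplitMix64(42), SplitMix64(42)
xs = [rng_a.next_u64() for _ in range(5)]
```

Because the counter never repeats within a period and the finalizer is a bijection, equal seeds give equal streams and consecutive outputs are distinct.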