ABI — Changes Log

2026-06-06 — ABI 1.22.0: PLS CV reference surface

One additive public symbol:

  • n4m_pls_cross_validate

This is a C/Python ABI entry point for exact PLS-only cross-validation over one input matrix. The current implementation delegates to the PLS branch of n4m_sweep_run, so candidate scores and CPU/CUDA route counters match the existing sweep path. It is intentionally catalogued as ABI infrastructure, not as a production method. The future fused/batched IKPLS-style multi-chain executor can replace the internals without changing this signature.

2026-06-05 — ABI 1.21.0: CUDA PLS many-design batching config

Two additive public config helpers:

  • n4m_config_set_cuda_pls_many_batched

  • n4m_config_get_cuda_pls_many_batched

The default remains off. When enabled on a CUDA build, eligible PLS1 moment many-design jobs may use the experimental tiled/strided-batched route that also remains reachable through the N4M_CUDA_PLS_MANY_BATCHED environment fallback. This changes only GPU scheduling and timings; candidate scores remain fold-level exact for the selected scoring path.

2026-06-05 — ABI 1.20.0: CUDA PLS device threshold config

Two additive public config helpers:

  • n4m_config_set_cuda_pls_min_device_features

  • n4m_config_get_cuda_pls_min_device_features

The default threshold remains 1024 features, matching the conservative historical CUDA PLS1 moment guard. Lower positive values let CPU/CUDA crossover campaigns explicitly test medium-width PLS moment screens on the selected single GPU without recompiling. This changes only route eligibility and timing; candidate scores are unchanged for a given exact scoring path.

2026-06-05 — ABI 1.19.0: CUDA PLS fold scheduling config

Two additive public config helpers:

  • n4m_config_set_cuda_pls_parallel_folds

  • n4m_config_get_cuda_pls_parallel_folds

When enabled on a CUDA build, eligible exact PLS1 moment CV jobs may run in bounded stream-parallel batches on the single selected GPU. This changes only scheduling and timings; candidate scores are unchanged. Sweep and AOM MethodResults also expose additive scalar counters n_pls_moment_cuda_parallel_fold_batches and n_pls_moment_cuda_parallel_fold_jobs for fit-cost auditing.

2026-06-05 — ABI 1.18.x: strict AOM Gaussian operator kind

No public symbol or result layout change. The public operator enum gains one additive value:

  • N4M_OP_GAUSSIAN = 18

This value is accepted by the strict AOM chain sweep and represents a fixed, shape-preserving zero-padding Gaussian convolution with a banded operator-moment descriptor. It is distinct from the full pp_gaussian preprocessing transformer surface.
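As a rough illustration of what a fixed, shape-preserving zero-padding Gaussian convolution looks like when written as a banded linear operator, the NumPy sketch below builds the band matrix and checks it against a direct zero-padded convolution. The sigma, truncation radius and unit-sum normalization are assumptions made for the example; this log does not state the parameters N4M_OP_GAUSSIAN uses.

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Symmetric Gaussian taps on [-radius, radius], normalized to sum to 1."""
    offsets = np.arange(-radius, radius + 1)
    taps = np.exp(-0.5 * (offsets / sigma) ** 2)
    return taps / taps.sum()

def banded_gaussian_operator(p: int, sigma: float, radius: int) -> np.ndarray:
    """p x p band matrix G such that X @ G.T smooths every row of X.

    Entries outside the band |i - j| <= radius stay zero, which is
    equivalent to zero-padding each spectrum at both ends.
    """
    taps = gaussian_kernel(sigma, radius)
    G = np.zeros((p, p))
    for i in range(p):
        lo, hi = max(0, i - radius), min(p, i + radius + 1)
        G[i, lo:hi] = taps[lo - i + radius : hi - i + radius]
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 32))                      # 4 toy spectra, 32 features
G = banded_gaussian_operator(p=32, sigma=2.0, radius=5)
X_smooth = X @ G.T                                # same shape as X

# Reference: direct zero-padded convolution of each row.
taps = gaussian_kernel(2.0, 5)
reference = np.stack([np.convolve(row, taps, mode="same") for row in X])
```

Because the operator is a fixed band matrix, moments of the transformed data can in principle be obtained from moments of the raw data (for example G @ XtX @ G.T), which is one plausible reading of "banded operator-moment descriptor".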

2026-06-05 — ABI 1.18.x: AOM chain fixed final fit

One additive public symbol:

  • n4m_aom_chain_fixed_fit_run(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, const int32_t* chain_offsets, int64_t n_chain_offsets, const int32_t* op_kinds, int64_t n_op_kinds, const int32_t* param_offsets, int64_t n_param_offsets, const double* params, int64_t n_params, int32_t head_id, double param, n4m_method_result_t** out_result)

This fits one already-selected caller-provided strict-linear AOM chain/head/parameter on all rows without running CV. It is a model-building endpoint, not a ranking endpoint: CV score fields are NaN unless a higher-level wrapper injects an externally verified exact-CV score. Python uses this in NativeAOMScreenRefitRegressor after the exact-CV refit, so building the reusable model no longer pays for a redundant one-candidate CV pass.

2026-06-05 — ABI 1.18.x: AOM score-only screen output mode

Two additive public config helpers:

  • n4m_config_set_aom_score_only

  • n4m_config_get_aom_score_only

When enabled for n4m_aom_sweep_run or n4m_aom_chain_sweep_run, the result keeps the candidate-score table, selected identifiers, route counters and fold ids, but omits selected-model matrices by returning them as 0 x 0. This is an additive output/cost-control knob for large preprocessing ranking passes.

2026-06-04 — ABI 1.18.0: native AOM operator PLS score stack

One additive public symbol (ABI MINOR bump 1.17.0 -> 1.18.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_aom_operator_pls_stack_fit(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile, int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids, const int32_t* components, int64_t n_components, const double* alphas, int64_t n_alphas, double std_penalty, double gap_penalty, n4m_method_result_t** out_result)

This exposes a native strict-linear AOM operator PLS1 score stack. The method builds compact or wide AOM operator banks, fits fold-local PLS1 score projectors per operator, concatenates the scores, selects (n_components, alpha) by train-only CV criterion, and refits the selected stack on all rows with a Ridge head.
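The fold-local part of this pipeline can be sketched in NumPy as follows. This is a minimal illustration for one fold and one (n_components, alpha) pair, not the native implementation: the two-operator bank, the NIPALS formulation and the helper names are assumptions, and the grid selection with std_penalty/gap_penalty is left out.

```python
import numpy as np

def pls1_projector(X, y, n_components):
    """NIPALS PLS1 on centered data; scores are T = (X - x_mean) @ R."""
    x_mean = X.mean(axis=0)
    Xk, yc = X - x_mean, y - y.mean()
    W = np.zeros((X.shape[1], n_components))
    P = np.zeros_like(W)
    for a in range(n_components):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)
        t = Xk @ w
        load = Xk.T @ t / (t @ t)
        Xk = Xk - np.outer(t, load)  # deflate X only; enough for a single target
        W[:, a], P[:, a] = w, load
    return x_mean, W @ np.linalg.inv(P.T @ W)

def stack_features(X, operators, projectors):
    """Concatenate the per-operator PLS scores column-wise."""
    return np.hstack([(X @ A.T - mu) @ R for A, (mu, R) in zip(operators, projectors)])

def ridge_head(T, y, alpha):
    """Centered Ridge on the stacked scores; returns (coefficients, intercept)."""
    t_mean, y_mean = T.mean(axis=0), y.mean()
    Tc = T - t_mean
    coef = np.linalg.solve(Tc.T @ Tc + alpha * np.eye(T.shape[1]), Tc.T @ (y - y_mean))
    return coef, y_mean - t_mean @ coef

rng = np.random.default_rng(0)
n, p, n_comp = 40, 12, 3
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Hypothetical two-operator bank: identity and a zero-padded first difference.
operators = [np.eye(p), np.eye(p) - np.eye(p, k=-1)]

train, test = np.arange(0, 30), np.arange(30, 40)   # one contiguous fold
# Projectors are fitted on the training rows only (fold-local).
projectors = [pls1_projector(X[train] @ A.T, y[train], n_comp) for A in operators]
T_train = stack_features(X[train], operators, projectors)
T_test = stack_features(X[test], operators, projectors)
operator_feature_offsets = np.arange(len(operators) + 1) * n_comp

coef, intercept = ridge_head(T_train, y[train], alpha=1.0)
fold_rmse = float(np.sqrt(np.mean((T_test @ coef + intercept - y[test]) ** 2)))
```

Repeating this per fold and per grid point gives the out-of-fold RMSE table; the final model would repeat the fit once on all rows with the selected pair.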

The returned n4m_method_result_t carries candidate_scores, fold_scores, oof_predictions, predictions, stack_features, coefficients, intercept, fold_ids and operator_feature_offsets. candidate_scores columns are spec_id, n_components, alpha, mean_oof_rmse, std_oof_rmse, mean_train_rmse, criterion.

Native v1 is single-target (Y.cols == 1) and not yet a fused batched GPU stack. Custom Python operator matrices, shuffled/both CV and baseline admission gating remain in the Python AOMOperatorPLSStack estimator.

The implementation lives in cpp/src/core/aom_operator_pls_stack.cpp and is dispatched from cpp/src/c_api/c_api_method_result.cpp. The symbol is declared in cpp/include/n4m/pls.h, exported in all ABI snapshots, catalogued as aom_pop.operator_pls_stack, wrapped in Python as n4m.aom_operator_pls_stack, and documented in docs/methods/aom_operator_pls_stack.md.

2026-06-04 — ABI 1.17.0: native AOM Ridge OOF simplex blender

One additive public symbol (ABI MINOR bump 1.16.0 -> 1.17.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_aom_ridge_blender_fit(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile, int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids, const double* ridge_lambdas, int64_t n_ridge_lambdas, double regularizer, n4m_method_result_t** out_result)

This exposes a native strict-linear AOM Ridge candidate blender. The method builds compact or wide AOM chain banks, scores each chain/lambda candidate by fold-local OOF Ridge predictions, solves a regularized non-negative simplex blend, and refits all candidates on the full training data for final blended predictions.
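The blending step can be illustrated with a small NumPy sketch. It assumes the objective is mean squared OOF error plus regularizer times the squared weight norm, solved by projected gradient descent; the native solver and the exact form of the regularizer are not described in this log, so both are assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def simplex_blend(oof, y, regularizer, n_iter=2000):
    """Minimize mean((oof @ w - y)**2) + regularizer * ||w||^2 on the simplex."""
    n, m = oof.shape
    H = oof.T @ oof / n + regularizer * np.eye(m)    # half of the Hessian
    g0 = oof.T @ y / n
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H)[-1])   # 1 / Lipschitz constant
    w = np.full(m, 1.0 / m)                          # start from the uniform blend
    for _ in range(n_iter):
        grad = 2.0 * (H @ w - g0)
        w = project_to_simplex(w - step * grad)
    return w

rng = np.random.default_rng(0)
y = rng.normal(size=200)
# Columns play the role of out-of-fold predictions from three candidates.
oof = np.column_stack([
    y + 0.1 * rng.normal(size=200),   # strong candidate
    y + 1.0 * rng.normal(size=200),   # noisy candidate
    rng.normal(size=200),             # uninformative candidate
])
weights = simplex_blend(oof, y, regularizer=1e-3)
blended_oof = oof @ weights
```

Fitting the weights on OOF predictions rather than in-sample predictions keeps the blend from rewarding candidates that merely overfit the training rows.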

The returned n4m_method_result_t carries candidate_scores, weights, oof_predictions, predictions, oof_candidate_predictions, candidate_predictions and fold_ids. candidate_scores columns are candidate_id, chain_id, lambda, cv_rmse, weight.

Native v1 requires strictly positive Ridge lambdas and is not yet a fused batched GPU blender. It builds in CUDA-enabled configurations, but the candidate loop still uses the existing native Ridge path per fold/candidate.

The implementation lives in cpp/src/core/aom_ridge_blender.cpp and is dispatched from cpp/src/c_api/c_api_method_result.cpp. The symbol is declared in cpp/include/n4m/pls.h, exported in all ABI snapshots, catalogued as aom_pop.ridge_blender, wrapped in Python as n4m.aom_ridge_blender, and documented in docs/methods/aom_ridge_blender.md.

2026-06-04 — ABI 1.16.0: user-defined AOM chain sweep

One additive public symbol (ABI MINOR bump 1.15.0 -> 1.16.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_aom_chain_sweep_run(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids, const int32_t* chain_offsets, int64_t n_chain_offsets, const int32_t* op_kinds, int64_t n_op_kinds, const int32_t* param_offsets, int64_t n_param_offsets, const double* params, int64_t n_params, const double* ridge_lambdas, int64_t n_ridge_lambdas, const int32_t* pls_components, int64_t n_pls_components, int32_t heads_mask, n4m_method_result_t** out_result)

This exposes a flat descriptor for caller-provided strict-linear preprocessing chains. chain_offsets partitions op_kinds; param_offsets partitions the flat params payload. Empty chains are rejected; callers use an explicit identity operator for raw spectra. Supported operators are identity, polynomial detrend, Savitzky-Golay smooth/derivative, Norris-Williams, finite difference, Whittaker, FCK and Gaussian.
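A possible way to build and read back such a flat descriptor is sketched below. It assumes CSR-style offsets (a leading 0 and one more entry than the number of chains or operators), which this log does not confirm. Only N4M_OP_GAUSSIAN = 18 comes from this log; the other operator codes and the per-operator parameter layouts are placeholders.

```python
import numpy as np

OP_IDENTITY = 0        # hypothetical value
OP_SAVGOL = 3          # hypothetical value
N4M_OP_GAUSSIAN = 18   # value given in the 1.18.x entry

def pack_chains(chains):
    """Flatten [[(op_kind, [params...]), ...], ...] into four flat arrays."""
    chain_offsets, op_kinds, param_offsets, params = [0], [], [0], []
    for chain in chains:
        if not chain:
            raise ValueError("empty chains are rejected; use an explicit identity operator")
        for kind, op_params in chain:
            op_kinds.append(kind)
            params.extend(op_params)
            param_offsets.append(len(params))   # end of this operator's parameters
        chain_offsets.append(len(op_kinds))     # end of this chain's operators
    return (np.asarray(chain_offsets, dtype=np.int32),
            np.asarray(op_kinds, dtype=np.int32),
            np.asarray(param_offsets, dtype=np.int32),
            np.asarray(params, dtype=np.float64))

def unpack_chains(chain_offsets, op_kinds, param_offsets, params):
    """Inverse of pack_chains: recover the nested chain description."""
    chains = []
    for c in range(len(chain_offsets) - 1):
        chain = []
        for op in range(chain_offsets[c], chain_offsets[c + 1]):
            lo, hi = param_offsets[op], param_offsets[op + 1]
            chain.append((int(op_kinds[op]), params[lo:hi].tolist()))
        chains.append(chain)
    return chains

chains = [
    [(OP_IDENTITY, [])],                                        # raw spectra
    [(OP_SAVGOL, [11.0, 2.0, 1.0]), (N4M_OP_GAUSSIAN, [2.0])],  # made-up parameters
]
flat = pack_chains(chains)
```

The int32 and float64 dtypes mirror the const int32_t* and const double* arguments in the signature above; whether the Python wrapper expects exactly these arrays is not stated here.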

The result shape matches n4m_aom_sweep_run; candidate_scores columns are candidate_id, chain_id, head_id, param, cv_rmse, and scalar profile is -1 for caller-provided chains.

This is the first ABI-stable arbitrary strict-linear preprocessing-chain surface. It still materializes transformed matrices per chain and uses materialized PLS CV; fused operator-moment updates, batched IKPLS and CUDA kernels remain later acceleration work.

The implementation lives in cpp/src/core/aom_sweep.cpp and is dispatched from cpp/src/c_api/c_api_method_result.cpp. The symbol is declared in cpp/include/n4m/pls.h, catalogued as aom_pop.aom_chain_sweep, wrapped in Python as n4m.aom_chain_sweep_run, and documented in docs/methods/aom_chain_sweep_run.md.

2026-06-04 — ABI 1.15.0: configurable native AOM preprocessing sweep

One additive public symbol (ABI MINOR bump 1.14.0 -> 1.15.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_aom_sweep_run(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile, int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids, const double* ridge_lambdas, int64_t n_ridge_lambdas, const int32_t* pls_components, int64_t n_pls_components, int32_t heads_mask, n4m_method_result_t** out_result)

The symbol applies the native strict-linear AOM compact/wide preprocessing chain bank, then delegates candidate scoring to n4m_sweep_run over Ridge lambdas and/or PLS component counts. It returns candidate_scores, oof_predictions, final predictions, coefficients/intercept and fold ids. candidate_scores has columns candidate_id, chain_id, head_id, param, cv_rmse; head_id is 0 for Ridge and 1 for PLS.
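For orientation, reading such a candidate_scores table could look like the sketch below. The rows are made up for illustration and are not real output, and picking the row with the lowest cv_rmse is an assumed selection rule. The column order and the head_id coding come from the paragraph above; the reading of param as lambda or n_components follows the n4m_sweep_run entry.

```python
import numpy as np

COLUMNS = ("candidate_id", "chain_id", "head_id", "param", "cv_rmse")
HEADS = {0: ("ridge", "lambda"), 1: ("pls", "n_components")}

def best_candidate(candidate_scores):
    """Return the lowest-cv_rmse row as a labelled dict."""
    col = {name: i for i, name in enumerate(COLUMNS)}
    row = candidate_scores[np.argmin(candidate_scores[:, col["cv_rmse"]])]
    head, param_name = HEADS[int(row[col["head_id"]])]
    return {
        "candidate_id": int(row[col["candidate_id"]]),
        "chain_id": int(row[col["chain_id"]]),
        "head": head,
        param_name: float(row[col["param"]]),
        "cv_rmse": float(row[col["cv_rmse"]]),
    }

# Made-up rows: two chains crossed with one Ridge lambda and one PLS size.
toy_scores = np.array([
    [0, 0, 0, 0.1, 0.52],
    [1, 0, 1, 5.0, 0.47],
    [2, 1, 0, 0.1, 0.49],
    [3, 1, 1, 5.0, 0.44],
])
best = best_candidate(toy_scores)
```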

This is a configurable product sweep over the fixed AOM strict-linear banks. It is not yet the arbitrary operator-descriptor layer or fused batched IKPLS/CUDA grinder.

The implementation lives in cpp/src/core/aom_sweep.cpp and is dispatched from cpp/src/c_api/c_api_method_result.cpp. The symbol is declared in cpp/include/n4m/pls.h, exported in all ABI snapshots, catalogued as aom_pop.aom_sweep, wrapped in Python as n4m.aom_sweep_run, and documented in docs/methods/aom_sweep_run.md.

2026-06-04 — ABI 1.14.0: native Ridge/PLS sweep

One additive public symbol (ABI MINOR bump 1.13.0 -> 1.14.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_sweep_run(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t cv, const int32_t* fold_ids, int64_t n_fold_ids, const double* ridge_lambdas, int64_t n_ridge_lambdas, const int32_t* pls_components, int64_t n_pls_components, int32_t heads_mask, n4m_method_result_t** out_result)

ABI v1 supports exact Ridge CV over row-additive moments where efficient, with a precomputed dual Ridge path when p > n_train. It also supports fold-local materialized PLS component screening through the existing native PLS model path. The returned n4m_method_result_t carries candidate_scores, oof_predictions, final predictions, coefficients/intercept and fold ids. candidate_scores[:,1] is 0 for Ridge and 1 for PLS; param is lambda for Ridge and n_components for PLS.

The fused batched IKPLS/operator-descriptor grinder is not part of ABI v1.

The implementation lives in cpp/src/core/sweep.cpp and is dispatched from cpp/src/c_api/c_api_method_result.cpp. The symbol is declared in cpp/include/n4m/pls.h, exported in all ABI snapshots, catalogued as utilities.sweep, wrapped in Python as n4m.sweep_run, and documented in docs/methods/sweep_run.md.

2026-06-04 — ABI 1.13.0: native row-additive moment substrate

Three additive public symbols (ABI MINOR bump 1.12.0 -> 1.13.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_moments_compute(n4m_context_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, n4m_method_result_t** out_result)

  • n4m_moments_subset_compute(n4m_context_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, const int64_t* row_indices, int64_t n_indices, n4m_method_result_t** out_result)

  • n4m_moments_subtract(n4m_context_t*, const n4m_method_result_t* lhs, const n4m_method_result_t* rhs, n4m_method_result_t** out_result)

The result is an n4m_method_result_t carrying raw additive moments (x_sum, y_sum, xtx, xty, yty) and centered moments recomputed from the raw sums (x_mean, y_mean, cxx, cxy, cyy). This gives an exact fold-subtraction primitive for PLS/Ridge screens: compute all rows, compute the held-out rows, subtract raw moments, then recenter on the remaining train rows.
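The fold-subtraction recipe can be checked numerically with a short NumPy sketch. The dict layout, the explicit row count n and the unnormalized centered cross-products are assumptions for the example; the native result layout may differ.

```python
import numpy as np

def raw_moments(X, Y):
    """Row-additive sums: each entry is a plain sum over rows."""
    return {"n": X.shape[0], "x_sum": X.sum(axis=0), "y_sum": Y.sum(axis=0),
            "xtx": X.T @ X, "xty": X.T @ Y, "yty": Y.T @ Y}

def subtract(lhs, rhs):
    """Moments of the rows in lhs that are not in rhs (rhs must be a subset)."""
    return {key: lhs[key] - rhs[key] for key in lhs}

def centered(m):
    """Recenter using only the raw sums; no second pass over the rows."""
    n = m["n"]
    x_mean, y_mean = m["x_sum"] / n, m["y_sum"] / n
    return {"x_mean": x_mean, "y_mean": y_mean,
            "cxx": m["xtx"] - n * np.outer(x_mean, x_mean),
            "cxy": m["xty"] - n * np.outer(x_mean, y_mean),
            "cyy": m["yty"] - n * np.outer(y_mean, y_mean)}

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 5)), rng.normal(size=(20, 2))
held_out = np.arange(5, 10)
train = np.setdiff1d(np.arange(20), held_out)

# All rows minus held-out rows, then recenter on what remains.
train_moments = centered(subtract(raw_moments(X, Y), raw_moments(X[held_out], Y[held_out])))

# Direct computation on the training rows, for comparison.
Xc = X[train] - X[train].mean(axis=0)
Yc = Y[train] - Y[train].mean(axis=0)
```

The all-rows moments are computed once; each fold then costs only the moments of its held-out rows plus a subtraction.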

The implementation lives in cpp/src/core/moments.cpp and is dispatched from cpp/src/c_api/c_api_method_result.cpp. The symbols are declared in cpp/include/n4m/pls.h, exported in all ABI snapshots, catalogued as utilities.moments, wrapped in Python as n4m.moments / n4m.moments_train_from_heldout, and documented in docs/methods/moments.md.

2026-06-04 — ABI 1.12.0: native AOM robust-HPO screen

One additive public symbol (ABI MINOR bump 1.11.0 -> 1.12.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_aom_robust_hpo_fit(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, int32_t profile, int32_t cv, int32_t heads_mask, n4m_method_result_t** out_result)

This exposes the product AOM robust-HPO preprocessing screen through the public C ABI. Native v1 screens compact/wide banks of strict-linear, shape-preserving AOM preprocessing chains and Ridge/PLS heads by contiguous K-fold CV RMSE. It returns an n4m_method_result_t carrying in-sample predictions after refitting the selected candidate, transformed-space coefficients, intercept, scalar selection diagnostics and the full candidate_scores matrix (chain_id, head_id, param, mean_cv_rmse).
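A contiguous K-fold score of this kind might be computed as in the sketch below. How the native code distributes leftover rows across folds, and whether mean_cv_rmse averages per-fold RMSEs or pools residuals, are not stated in this log; both choices here are assumptions, and ordinary least squares stands in for a Ridge/PLS head.

```python
import numpy as np

def contiguous_fold_ids(n_rows, n_folds):
    """Assign rows to n_folds contiguous blocks whose sizes differ by at most 1."""
    sizes = np.full(n_folds, n_rows // n_folds)
    sizes[: n_rows % n_folds] += 1              # earlier folds absorb the remainder
    return np.repeat(np.arange(n_folds, dtype=np.int32), sizes)

def mean_cv_rmse(fit, predict, X, y, fold_ids):
    """Average of the per-fold held-out RMSEs."""
    rmses = []
    for fold in np.unique(fold_ids):
        held = fold_ids == fold
        model = fit(X[~held], y[~held])
        rmses.append(np.sqrt(np.mean((predict(model, X[held]) - y[held]) ** 2)))
    return float(np.mean(rmses))

def fit_ols(X, y):
    """Least squares with an intercept column; a stand-in head for the sketch."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.05 * rng.normal(size=30)
fold_ids = contiguous_fold_ids(30, 5)
score = mean_cv_rmse(fit_ols, predict_ols, X, y, fold_ids)
```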

The implementation lives in cpp/src/core/aom_robust_hpo.cpp and is dispatched from cpp/src/c_api/c_api_method_result.cpp. The symbol is declared in cpp/include/n4m/pls.h, exported in all ABI snapshots, catalogued as aom_pop.robust_hpo, wrapped in Python as n4m.aom_robust_hpo, and documented in docs/methods/aom_robust_hpo.md.

2026-06-03 — ABI 1.11.0: direct (closed-form) Ridge regression

One additive public symbol (ABI MINOR bump 1.10.0 → 1.11.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_ridge_fit(n4m_context_t*, const n4m_config_t*, const n4m_matrix_view_t* X, const n4m_matrix_view_t* Y, const double* lambdas, int64_t n_lambdas, n4m_method_result_t** out_result)

This is a genuine closed-form multi-output Ridge: beta = (Xc'Xc + lambda I)^-1 Xc'Yc on column-centered X/Y, with intercept = y_mean - x_mean · beta (the penalty is not applied to the intercept, for sklearn.linear_model.Ridge parity). It is distinct from the pre-existing n4m_ridge_pls_fit (ridge-augmented SIMPLS, rank-truncated by n_components). The solver is chosen automatically by shape (PRIMAL augmented-QR for p ≤ n, DUAL Gram-on-samples for p > n; identical coefficients up to round-off).
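The equivalence of the two solver shapes can be checked with a small NumPy sketch. It solves the primal system through the normal equations rather than the augmented-QR route named above, purely for brevity; the function names are illustrative and not part of the n4m API.

```python
import numpy as np

def _center(X, Y):
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    return X - x_mean, Y - y_mean, x_mean, y_mean

def ridge_primal(X, Y, lam):
    """Solve the p x p system (Xc'Xc + lam I) beta = Xc'Yc."""
    Xc, Yc, x_mean, y_mean = _center(X, Y)
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)
    return beta, y_mean - x_mean @ beta          # intercept is not penalized

def ridge_dual(X, Y, lam):
    """Solve the n x n system (Xc Xc' + lam I) a = Yc, then beta = Xc' a."""
    Xc, Yc, x_mean, y_mean = _center(X, Y)
    beta = Xc.T @ np.linalg.solve(Xc @ Xc.T + lam * np.eye(X.shape[0]), Yc)
    return beta, y_mean - x_mean @ beta

def ridge_fit(X, Y, lam):
    """Pick the smaller linear system by shape."""
    n, p = X.shape
    return ridge_primal(X, Y, lam) if p <= n else ridge_dual(X, Y, lam)

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(8, 20)), rng.normal(size=(8, 2))   # wide case: p > n
beta_primal, icpt_primal = ridge_primal(X, Y, lam=0.5)
beta_dual, icpt_dual = ridge_dual(X, Y, lam=0.5)
```

Both routes rest on the identity (Xc'Xc + lam I)^-1 Xc' = Xc'(Xc Xc' + lam I)^-1, which holds for any lam > 0, so the choice affects only cost and conditioning.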

Declared with N4M_API in cpp/include/n4m/pls.h (after n4m_continuum_regression_fit), implemented in cpp/src/c_api/c_api_method_result.cpp over the new core kernel cpp/src/core/ridge.cpp. Result keys: coefficients (p×q), intercept (1×q), x_mean, x_scale (1×p), y_mean (1×q), predictions (n×q), scalar rmse, scalar lambda.

Snapshots regenerated for all three platforms via scripts/regen_abi_snapshots.sh --derive (linux from the lib; macos/windows derived = linux minus the N4M_1 version node). n4m_ridge_fit is present in cpp/abi/expected_symbols_{linux,macos,windows}.txt. Header N4M_ABI_VERSION_MINOR and bindings/python/src/n4m/_ffi.py:ABI_VERSION_MINOR both bumped 10 → 11; bump_version.sh --check is green (project version unchanged at 0.98.0).

2026-06-03 — macOS/Windows snapshot correction + cross-platform gate enforced

No ABI surface change (still ABI 1.10.0). This is an audit-trail and CI correction. The 2026-05-30 entry below claimed “Snapshots regenerated for all three platforms”, but expected_symbols_{macos,windows}.txt were in fact stale, truncated copies of an old Linux nm -D dump: 500 lines, carrying the Linux-only @@N4M_1 version tag (which macOS nm -gU and Windows dumpbin never emit) and missing ~171 symbols (the whole selection / method-result / aom / config family). CI also did not diff them on macOS/Windows; only the Linux check was fail-closed.

Corrected here:

  • expected_symbols_{macos,windows}.txt regenerated to the real 671-symbol set — identical to the Linux n4m_* names minus the Linux-only N4M_1 version node (the only legitimate cross-platform difference).

  • .github/workflows/abi-check.yml now diffs the committed snapshot fail-closed on all three platforms (macOS diff, Windows Compare-Object set comparison), with LC_ALL=C-pinned sorts so ordering is reproducible.

  • Added a SONAME / RPATH-RUNPATH linkage gate to the Linux job (asserts SONAME == libn4m.so.1 and no baked-in absolute search path).

  • Added scripts/regen_abi_snapshots.sh — the single canonical regenerator (--check for CI/pre-commit, --derive to produce the macOS/Windows files from the Linux snapshot when only a Linux box is available).
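The derivation rule (Linux names minus the Linux-only version node) can be expressed as a few lines of Python. This is an illustrative sketch, not the contents of scripts/regen_abi_snapshots.sh; it assumes one symbol per line, optionally suffixed with a version tag such as @@N4M_1.

```python
def derive_portable_snapshot(linux_lines, version_node="N4M_1"):
    """Derive the macOS/Windows symbol list from Linux snapshot lines."""
    names = set()
    for line in linux_lines:
        symbol = line.strip()
        if not symbol:
            continue
        name = symbol.split("@", 1)[0]      # drop a trailing @@N4M_1 / @N4M_1 tag
        if name == version_node:
            continue                        # the version node itself is Linux-only
        names.add(name)
    # Byte-order sort, matching what an LC_ALL=C sort would produce.
    return sorted(names, key=lambda s: s.encode("ascii"))

linux_snapshot = [
    "n4m_ridge_fit@@N4M_1",
    "N4M_1",
    "n4m_config_set_rng_kind@@N4M_1",
    "",
]
portable = derive_portable_snapshot(linux_snapshot)
```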

2026-05-18 — Linux export baseline for ABI 1.16.0

build/dev-release/cpp/src/libn4m.so.1.16.0 exports 27 additional n4m_* symbols compared with the previous Linux baseline. Each added symbol is declared with N4M_API in the public header cpp/include/pls4all/p4a.h, so the Linux ABI gate now treats them as intentional public additions:

  • n4m_method_result_get_int64_vector

  • n4m_mb_pls_fit, n4m_lw_pls_fit, n4m_pls_lda_fit, n4m_pls_logistic_fit, n4m_aom_preprocess_fit

  • n4m_variable_select_rank, n4m_interval_select, n4m_stability_select, n4m_uve_select, n4m_spa_select, n4m_cars_select, n4m_random_frog_select, n4m_scars_select, n4m_ga_select, n4m_shaving_select, n4m_bve_select, n4m_t2_select, n4m_wvc_select, n4m_wvc_threshold_select, n4m_emcuve_select, n4m_randomization_select, n4m_bipls_select, n4m_sipls_select, n4m_rep_select, n4m_ipw_select, n4m_st_select

2026-05-30 — ABI 1.10.0: additive RNG-kind config selector

Two additive public symbols (ABI MINOR bump 1.9.0 → 1.10.0), backward-compatible (no signature/layout change, nothing removed):

  • n4m_config_set_rng_kind(n4m_config_t*, n4m_rng_kind_t)

  • n4m_config_get_rng_kind(const n4m_config_t*, n4m_rng_kind_t*)

New enum n4m_rng_kind_t { N4M_RNG_SPLITMIX64=0 (default), N4M_RNG_PCG64=1, N4M_RNG_MT_R=2, N4M_RNG_NUMPY_MT=3 } selects the RNG engine a stochastic method draws from, so its output can match an external reference library’s exact RNG (numpy default_rng / base R / numpy RandomState) for parity. Default SPLITMIX64 reproduces n4m’s historical streams bit-for-bit — leaving it unset changes nothing. Snapshots regenerated for all three platforms (expected_symbols_{linux,macos,windows}.txt). Engines verified bit-exact: docs/dev/RNG_TIER0_INVENTORY.md, cpp/tests/test_rng_engine.cpp.
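On the reference side, the engines named above map onto NumPy roughly as in the sketch below. The enum values are the ones listed in this entry; the Python class, the helper names and the textbook SplitMix64 seeding are assumptions and may not match how n4m seeds or consumes its streams. Base R's generator has no NumPy counterpart, so it is left unimplemented.

```python
from enum import IntEnum
import numpy as np

class RngKind(IntEnum):
    """Python mirror of n4m_rng_kind_t (values as listed in this entry)."""
    SPLITMIX64 = 0
    PCG64 = 1
    MT_R = 2
    NUMPY_MT = 3

MASK64 = (1 << 64) - 1

def splitmix64_stream(seed, count):
    """Textbook SplitMix64 outputs; n4m's seeding convention may differ."""
    state = seed & MASK64
    out = []
    for _ in range(count):
        state = (state + 0x9E3779B97F4A7C15) & MASK64
        z = state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        out.append(z ^ (z >> 31))
    return out

def reference_generator(kind, seed):
    """Return the external reference generator a given kind is meant to match."""
    kind = RngKind(kind)
    if kind == RngKind.PCG64:
        return np.random.default_rng(seed)     # NumPy's default bit generator is PCG64
    if kind == RngKind.NUMPY_MT:
        return np.random.RandomState(seed)     # legacy MT19937-based generator
    raise NotImplementedError(f"no NumPy reference for {kind.name}")

pcg = reference_generator(RngKind.PCG64, seed=123)
legacy = reference_generator(RngKind.NUMPY_MT, seed=123)
stream = splitmix64_stream(seed=0, count=4)
```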