aom_staged_chain_campaign - staged strict-chain cartesian screen/refit
Group: AOM / moment campaign · Catalog: aom_pop.aom_staged_chain_campaign
· Backed by: aom_chain_screen_refit_campaign (pure Python orchestration over
the single libn4m runtime)
Description
aom_staged_chain_campaign is the first-class staged-cartesian workflow for AOM /
moment preprocessing selection. It runs several score-only strict-linear
preprocessing screens in sequence — the compact, wide and lab profiles,
focused family plans such as savgol_focus / strict_family_focus, or an
explicit stages list mixing profiles and Ridge / PLS / mixed head plans —
then:
merges the per-stage retained candidates (deduplicated by chain/head/param, keeping the best screen score and recording every stage a candidate came from);
keeps the top global and top per-head rows across all stages;
exact-CV refits that retained union exactly once; and
attaches preprocessing-impact and screen-vs-refit rank diagnostics, and an optional offline holdout audit.
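The merge and retention steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `merge_stage_candidates` and `retain`, the row fields `chain` and `screen_score`, and the chain labels are hypothetical, and the sketch assumes that a lower screen score is better and that per-head rows are added on top of the global top rows.

```python
from collections import defaultdict


def merge_stage_candidates(stage_rows):
    """Merge per-stage rows, deduplicating on (chain, head, param)."""
    merged = {}
    for stage_name, rows in stage_rows.items():
        for row in rows:
            key = (row["chain"], row["head"], row["param"])
            kept = merged.get(key)
            if kept is None:
                merged[key] = {**row, "stages": [stage_name]}
                continue
            # Same candidate seen again: record the stage, keep the best score.
            kept["stages"].append(stage_name)
            kept["screen_score"] = min(kept["screen_score"], row["screen_score"])
    return sorted(merged.values(), key=lambda r: r["screen_score"])


def retain(pool, top_k, per_head_top_k):
    """Keep the top_k global rows plus up to per_head_top_k extra rows per head."""
    kept = list(pool[:top_k])
    extra = defaultdict(int)
    for row in pool[top_k:]:
        if extra[row["head"]] < per_head_top_k:
            kept.append(row)
            extra[row["head"]] += 1
    return kept


# Hypothetical screen output from two stages (assumed: lower score is better).
stage_rows = {
    "compact": [
        {"chain": "chain_a", "head": "ridge", "param": 0.1, "screen_score": 0.42},
        {"chain": "chain_b", "head": "pls", "param": 3, "screen_score": 0.50},
    ],
    "wide": [
        {"chain": "chain_a", "head": "ridge", "param": 0.1, "screen_score": 0.40},
        {"chain": "chain_c", "head": "pls", "param": 5, "screen_score": 0.55},
        {"chain": "chain_c", "head": "ridge", "param": 1.0, "screen_score": 0.60},
    ],
}
pool = merge_stage_candidates(stage_rows)
retained = retain(pool, top_k=1, per_head_top_k=1)
```

Only the `retained` rows would go on to the single exact-CV refit, so a candidate that appears in several stages is refit once.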
It does not add any new numerical kernel: every fit flows through the existing
aom_chain_score_campaign (screen), aom_refit_candidates (exact-CV refit),
aom_candidate_preprocessing_impact, and aom_candidate_rank_diagnostics
helpers, which all call the single libn4m C/CUDA runtime. The orchestrator is
the staged equivalent of the lab proto cartesian.py / impact.py, expressed
over the catalogued n4m moment routes.
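To make the screen-vs-refit rank diagnostic concrete, the sketch below computes a Spearman rank correlation between screen scores and exact-CV refit scores for the same candidates. It is a standalone illustration under stated assumptions: the function names and the sample scores are hypothetical, and whether aom_candidate_rank_diagnostics handles ties with average ranks is not confirmed by this page.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Pearson correlation of the rank vectors of a and b."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0.0 or vb == 0.0:
        return float("nan")  # rank correlation is undefined for constant input
    return cov / (va * vb) ** 0.5


# Hypothetical scores for four retained candidates.
screen_scores = [0.40, 0.50, 0.55, 0.60]
refit_scores = [0.41, 0.52, 0.50, 0.63]
rho = spearman(screen_scores, refit_scores)
```

A value near 1 suggests the cheap screen orders candidates much like the exact refit does; a low value suggests that retaining more rows before the refit may be worthwhile.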
Constraints (load-bearing)
Strict-linear only. Stages screen the strict-linear AOM chain grids (build_aom_strict_chain_grid). No hors-moment nonlinear or supervised lifts are introduced.
No identity selection. The campaign consumes only X / y arrays. Stage name values are cosmetic labels; no dataset, source, id or name is ever read or used to pick chains, heads or the winner. Renaming stages cannot change the selected model.
Train-only production selection. The production winner is the minimum exact-CV refit row (selection_metric="refit_cv_rmse", selection_uses_test_set=False). A held-out set, when supplied, is scored only for the offline audit and is never used for selection.
Single infrastructure. This is Python orchestration; libn4m stays the one C/binding engine.
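The train-only and no-identity constraints can be read as an invariance property of the selection step. The sketch below is illustrative only: `select_production_winner`, the `audit_rmse` field and the sample rows are hypothetical, and the real campaign selects from its own refit rows.

```python
def select_production_winner(refit_rows):
    """Pick the row with the minimum train exact-CV RMSE; nothing else is read."""
    return min(refit_rows, key=lambda row: row["refit_cv_rmse"])


# Hypothetical exact-CV refit rows.
rows = [
    {"head": "ridge", "param": 0.1, "refit_cv_rmse": 0.31, "stages": ["compact"]},
    {"head": "pls", "param": 4, "refit_cv_rmse": 0.29, "stages": ["wide"]},
]
winner = select_production_winner(rows)

# Attaching held-out scores or renaming stages must not move the winner.
audited = [
    {**row, "audit_rmse": score, "stages": ["renamed_stage"]}
    for row, score in zip(rows, (0.10, 0.90))
]
audited_winner = select_production_winner(audited)
```

Here the ridge row has the better hypothetical audit score, yet the PLS row stays selected because only refit_cv_rmse is consulted.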
Parameters
| Name | Type | Default | Notes |
|---|---|---|---|
| | arrays | - | Training spectra and target(s); |
| | list \| None | | Explicit stage list (profile name or override dict). |
| | str | | One of |
| | int | | Exact CV folds (screen + refit) |
| | seq | small grids | Default head grids (per-stage overridable) |
| | seq | | Default heads (per-stage overridable) |
| | int | | Rows each stage keeps in its own screen |
| | int \| None | | Global retained rows refit with exact CV |
| | int \| None | | Extra per-head retained rows ( |
| | path \| None | | Directory for one resumable score checkpoint per stage |
| | bool | | Resume matching stage checkpoints when present |
| | int \| None | | Limit new chunks processed per stage in this call |
| | seq \| None | | Optional grid such as |
| | str | | |
| | str | | Screen / refit moment routing |
| | bool | | Toggle the post-hoc audit reports |
| | arrays \| None | | Optional offline held-out audit |
| | bool | | Include the raw per-stage screen reports |
Stage override dict keys: name, profile, heads, ridge_lambdas,
pls_components, top_k, max_chains, families, templates,
pls_score_mode, moment_policy, chain_ordering, split_head_scoring.
Missing keys fall back to the campaign defaults. Unknown keys raise.
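The fallback and validation rule for stage overrides can be sketched as follows. This is an assumption-laden illustration rather than the campaign's code: `resolve_stage` is a hypothetical helper, ValueError is an assumed exception type (the page only says that unknown keys raise), the default values are made up, and defaulting `name` to the profile is a guess.

```python
ALLOWED_STAGE_KEYS = frozenset({
    "name", "profile", "heads", "ridge_lambdas", "pls_components", "top_k",
    "max_chains", "families", "templates", "pls_score_mode", "moment_policy",
    "chain_ordering", "split_head_scoring",
})


def resolve_stage(stage, defaults):
    """Expand a profile name or override dict into a full stage config."""
    if isinstance(stage, str):
        stage = {"profile": stage}
    unknown = sorted(set(stage) - ALLOWED_STAGE_KEYS)
    if unknown:
        # Assumed exception type; the documentation only states that this raises.
        raise ValueError(f"unknown stage override keys: {unknown}")
    resolved = dict(defaults)   # campaign-level defaults first
    resolved.update(stage)      # per-stage overrides win
    resolved.setdefault("name", resolved.get("profile", "stage"))
    return resolved


# Hypothetical campaign defaults.
defaults = {"heads": ("ridge", "pls"), "top_k": 10}

compact = resolve_stage("compact", defaults)
pls_wide = resolve_stage({"name": "pls_wide", "profile": "wide", "heads": ("pls",)}, defaults)

try:
    resolve_stage({"profile": "wide", "top_n": 3}, defaults)
    unknown_key_raised = False
except ValueError:
    unknown_key_raised = True
```

Rejecting unknown keys up front means a misspelled override such as `top_n` fails loudly instead of silently running with the campaign default.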
Returned report
A JSON-friendly dict keyed by report_schema = "n4m.aom_staged_chain_campaign.v1":
| Key | Meaning |
|---|---|
| | Per-stage summaries ( |
| | Exact-CV refit rows (the retained union), each with |
| | Production winner = minimum |
| | Per-head best refit row |
| | Cross-stage deduplicated global screen pool |
| | |
| | Staged resume state; partial screens remain exact-refit-able over currently retained rows |
| | |
| | |
| | Offline holdout report ( |
| | The full |
| | Selection provenance ( |
| | Present when |
rows is directly consumable by NativeAOMFixedCandidateRegressor.from_refit_report.
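Because the report is described as JSON-friendly, it can in principle be round-tripped with the standard library and checked against the schema string before use. The sketch below uses a hypothetical, heavily trimmed report dict and a hypothetical `load_report` helper; it is not the library's own report save/load helper.

```python
import json

EXPECTED_SCHEMA = "n4m.aom_staged_chain_campaign.v1"


def load_report(text):
    """Parse a serialized report and reject anything with another schema."""
    report = json.loads(text)
    if report.get("report_schema") != EXPECTED_SCHEMA:
        raise ValueError(f"unexpected report_schema: {report.get('report_schema')!r}")
    return report


# Hypothetical minimal report; a real report carries many more keys.
report = {
    "report_schema": EXPECTED_SCHEMA,
    "selection_uses_test_set": False,
    "best": {"head": "pls", "param": 4, "refit_cv_rmse": 0.29},
    "rows": [
        {"head": "pls", "param": 4, "refit_cv_rmse": 0.29},
        {"head": "ridge", "param": 0.1, "refit_cv_rmse": 0.31},
    ],
}
restored = load_report(json.dumps(report))
```

Checking the schema string on load is a cheap guard against feeding a report from a different workflow into downstream code.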
Python usage
import numpy as np
import n4m

rng = np.random.default_rng(7)
X = rng.standard_normal((64, 256))
y = X[:, 8] - 0.4 * X[:, 19] + 0.05 * rng.standard_normal(64)

# Staged compact -> wide -> lab screen over mixed Ridge/PLS heads.
report = n4m.aom_staged_chain_campaign(
    X, y,
    plan="compact_wide_lab",
    cv=5,
    refit_top_k=20,            # global retained rows
    refit_per_head_top_k=5,    # extra per-head rows
    checkpoint_dir="artifacts/aom_staged_checkpoints",
)
best = report["best"]  # production winner, exact-CV on train
print(best["head"], best["param"], best["refit_cv_rmse"])
print(report["retention"])                   # how many rows were exact-refit
print(report["impact"]["by_operator"][:3])   # which preprocessing families paid off
print(report["rank_diagnostics"]["spearman_rank_correlation"])
print(report["screen_complete"], report["n_remaining_stage_chunks_total"])

# Materialize the winner with the existing reusable estimator.
model = n4m.NativeAOMFixedCandidateRegressor.from_refit_report(report).fit(X, y)
y_hat = model.predict(X)
Model-configuration grid selection, still train-only:
report = n4m.aom_staged_chain_campaign(
    X, y,
    plan="compact",
    scale_x_values=[False, True],
    refit_top_k=12,
    refit_per_head_top_k=2,
)
assert report["selection_uses_test_set"] is False
print(report["selected_model_config"])  # {"scale_x": True/False, ...}
print(report["model_config_summaries"])
When a model-config grid is used, rows and best come from the selected
config, while route/candidate counters such as n_screen_pls_moment_cv_fits
sum the work paid across every config.
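The split between "selected config" fields and "summed" counters can be pictured with a small sketch. Everything here is an assumption made for illustration: the per-config report layout (`model_config`, `counters`), the helper name `select_model_config`, the sample numbers, and the rule that the config with the lowest train exact-CV RMSE wins.

```python
def select_model_config(config_reports):
    """Pick one config by train exact-CV RMSE and sum counters over all configs."""
    # Assumed rule: lowest best-row refit_cv_rmse wins (train-only).
    selected = min(config_reports, key=lambda r: r["best"]["refit_cv_rmse"])
    counter_names = sorted({name for r in config_reports for name in r["counters"]})
    counters = {
        name: sum(r["counters"].get(name, 0) for r in config_reports)
        for name in counter_names
    }
    return {
        "selected_model_config": selected["model_config"],
        "best": selected["best"],
        "rows": selected["rows"],
        **counters,
    }


# Hypothetical per-config results for scale_x_values=[False, True].
config_reports = [
    {
        "model_config": {"scale_x": False},
        "best": {"head": "pls", "param": 4, "refit_cv_rmse": 0.30},
        "rows": [{"head": "pls", "param": 4, "refit_cv_rmse": 0.30}],
        "counters": {"n_screen_pls_moment_cv_fits": 12},
    },
    {
        "model_config": {"scale_x": True},
        "best": {"head": "pls", "param": 3, "refit_cv_rmse": 0.27},
        "rows": [{"head": "pls", "param": 3, "refit_cv_rmse": 0.27}],
        "counters": {"n_screen_pls_moment_cv_fits": 12},
    },
]
combined = select_model_config(config_reports)
```

Summing the counters keeps the cost accounting honest: the work spent on the losing configs still shows up in the final report.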
Focused preprocessing-family plans are useful when max_chains is intentionally
small. Start with savgol_focus for the fast incremental campaign; use
strict_family_focus as a heavier family-audit profile, because
Gaussian/FCK/Whittaker stages can dominate wall time on some datasets.
# Prioritize SavGol diversity instead of waiting for late lab-profile entries.
report = n4m.aom_staged_chain_campaign(
    X, y,
    plan="savgol_focus",
    max_chains=8,  # applied per focused stage
    refit_top_k=12,
    scale_x_values=[False, True],
)

# Also force strict Gaussian/FCK/Whittaker stages to be screened early.
report = n4m.aom_staged_chain_campaign(
    X, y,
    plan="strict_family_focus",
    max_chains=8,
    refit_top_k=12,
)
These plans are fixed source-free stage recipes over the existing strict-linear lab families. They do not read dataset/source/name/id metadata and they still select the final row only by train exact-CV refit.
Sklearn estimator form (same train-CV selection, no held-out audit inputs):
model = n4m.NativeAOMStagedChainCampaignRegressor(
    plan="compact_wide_lab",
    cv=5,
    refit_top_k=20,
    refit_per_head_top_k=5,
    checkpoint_dir="artifacts/aom_staged_checkpoints",
).fit(X, y)
y_hat = model.predict(X)
diag = model.get_diagnostics()
assert diag["selection_uses_test_set"] is False
print(model.selected_head_, model.selected_param_, model.selected_cv_rmse_)
Fast SavGol-focused reusable preset:
model = n4m.NativeAOMSavgolFocusRegressor(
    cv=5,
    checkpoint_dir="artifacts/aom_savgol_focus_checkpoints",
).fit(X, y)
diag = model.get_diagnostics()
assert diag["plan"] == "savgol_focus"
assert diag["selection_uses_test_set"] is False
print(diag["selected_stage"], diag["selected_model_config"])
The preset delegates to the same staged campaign engine with
plan="savgol_focus", max_chains=6, top_k=10, refit_top_k=8,
refit_per_head_top_k=2, scale_x_values=[False, True] and
split_head_scoring="auto" by default. On a
CUDA build it also defaults to the one-GPU PLS route knobs used in the local
benchmark (cuda_pls_parallel_folds=True, cuda_pls_min_device_features=1,
backend_min_cuda_product=1). Use the generic staged estimator when you need
custom plans or family templates.
Cost-safe strict-family audit preset:
model = n4m.NativeAOMStrictFamilyLiteRegressor(
    cv=5,
    checkpoint_dir="artifacts/aom_strict_family_lite_checkpoints",
).fit(X, y)
diag = model.get_diagnostics()
assert diag["plan"] == "strict_family_focus"
assert diag["selection_uses_test_set"] is False
print(diag["selected_stage"], diag["n_refit_candidates"])
NativeAOMStrictFamilyLiteRegressor exercises the broader
strict_family_focus stage recipe, but defaults to a small audit budget:
max_chains=2, top_k=6, refit_top_k=4, refit_per_head_top_k=1,
scale_x=False, no scale_x_values grid and split_head_scoring="auto".
It is meant to sample SavGol,
Norris-Williams, finite-difference, Gaussian, FCK and Whittaker behavior without
paying for the heavier strict-family benchmark profile.
Custom stages (e.g. a Ridge-only compact pass then a PLS-only wide pass):
report = n4m.aom_staged_chain_campaign(
    X, y,
    stages=[
        {"name": "ridge_compact", "profile": "compact", "heads": ("ridge",)},
        {"name": "pls_wide", "profile": "wide", "heads": ("pls",),
         "pls_score_mode": "gcv_proxy"},
    ],
    refit_per_head_top_k=5,
)
Offline audit (test ranking only — never used for selection):
report = n4m.aom_staged_chain_campaign(
    X_train, y_train,
    plan="compact_wide",
    X_audit=X_test, y_audit=y_test,  # offline audit only
)
assert report["audit"]["audit_only"] is True
assert report["selection_uses_test_set"] is False
# report["best"] is identical whether or not the audit set is supplied.
The same callable is exported from n4m, n4m.aom and n4m.moment.
The reusable sklearn estimator is exported as
NativeAOMStagedChainCampaignRegressor from n4m, n4m.sklearn, n4m.aom
and n4m.moment.
Resume is stage-local: each stage delegates to
aom_chain_score_campaign(..., checkpoint_path=...). With
max_chunks_per_run, the report may have screen_complete=False; the retained
rows available so far are still exact-CV refit on train and can be reused as a
partial audit. A later call with the same data/configuration and
checkpoint_dir resumes the remaining chunks.
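The stage-local resume behaviour can be illustrated with a toy chunked screen that writes a small JSON checkpoint. This is a sketch under stated assumptions, not the n4m checkpoint format: `run_stage_chunks`, the `score_fn` stand-in, the checkpoint layout and the `n_remaining_chunks` key are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_stage_chunks(chunks, score_fn, checkpoint_path, max_chunks_per_run=None):
    """Score up to max_chunks_per_run new chunks, resuming from a checkpoint."""
    path = Path(checkpoint_path)
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"done": 0, "scores": []}
    budget = len(chunks) if max_chunks_per_run is None else max_chunks_per_run
    stop = min(len(chunks), state["done"] + budget)
    for chunk in chunks[state["done"]:stop]:
        state["scores"].extend(score_fn(candidate) for candidate in chunk)
        state["done"] += 1
        path.write_text(json.dumps(state))  # persist after every finished chunk
    return {
        "screen_complete": state["done"] == len(chunks),
        "n_remaining_chunks": len(chunks) - state["done"],
        "scores": list(state["scores"]),
    }


# Hypothetical candidates split into three chunks; the score is a stand-in.
chunks = [[1, 2], [3, 4], [5]]
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "stage_compact.json"
    first = run_stage_chunks(chunks, lambda c: c * 0.5, checkpoint, max_chunks_per_run=2)
    second = run_stage_chunks(chunks, lambda c: c * 0.5, checkpoint)
```

The first call stops early with a usable partial result, and the second call picks up at the third chunk without rescoring the first two.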
Reused building blocks
| Step | Helper |
|---|---|
| Strict-chain grids | |
| Per-stage screen | |
| Global + per-head retention | |
| Exact-CV refit | |
| Preprocessing impact | |
| Screen-vs-refit rank | |
| Offline audit | |
| Report save/load | |
Workflow / benchmark note
This is the staged runner the AOM benchmark campaign calls for: run
compact/wide/lab strict-chain screens with Ridge, PLS and mixed heads, retain the
top global and per-head candidates, exact-refit the retained rows, and read
impact / rank_diagnostics to decide whether a wider cartesian budget is
justified before comparing against the robust AOM / TabPFN baselines. Because
each stage screens in chunks of chain_chunk_size and only keeps its own
top_k, large lab cartesians stream rather than materialize every scored
candidate.
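The streaming behaviour described above amounts to a bounded top-k selection over chunks. The sketch below shows that pattern with the standard library; `scored_chunks` and `stream_top_k` are hypothetical names, the scores are made up, and lower scores are assumed to be better.

```python
import heapq


def scored_chunks(scores, chain_chunk_size):
    """Yield (score, chain_index) rows one chunk at a time."""
    for start in range(0, len(scores), chain_chunk_size):
        chunk = scores[start:start + chain_chunk_size]
        yield [(score, start + offset) for offset, score in enumerate(chunk)]


def stream_top_k(chunks, top_k):
    """Keep only the best top_k rows seen so far (lower score is better)."""
    kept = []
    for chunk in chunks:
        # At most top_k + len(chunk) rows are alive at any point.
        kept = heapq.nsmallest(top_k, kept + list(chunk), key=lambda row: row[0])
    return kept


# Hypothetical screen scores for seven chains.
scores = [0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6]
best = stream_top_k(scored_chunks(scores, chain_chunk_size=3), top_k=2)
```

Peak memory in this pattern scales with top_k plus the chunk size rather than with the total number of screened chains.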
Timing smoke
benchmarks/cross_binding/bench_aom_staged_chain_campaign_timing.py records the
end-to-end wall-clock cost of the staged campaign on synthetic data, one CSV row
per --plans entry, with the retention / selection / impact / rank-diagnostics
and refit route / CUDA counters pulled from the report. It accepts the small
screen controls (--plans, --max-chains, --top-k, --refit-top-k,
--refit-per-head-top-k, --chain-chunk-size, --heads, --components,
--ridge-lambdas, --repeats) and the campaign’s CPU/GPU knobs
(--cuda-pls-parallel-folds, --cuda-pls-min-device-features,
--cuda-pls-many-batched, --backend-min-cuda-product, --moment-policy).
This measures orchestration and exact-refit timing only — the staged screen, cross-stage retention, the single exact-CV refit of the retained union, and the post-hoc diagnostics. It is not a benchmark of the future fused IKPLS grinder; the per-candidate exact CV it times is the target that grinder must beat, not the grinder itself.
PYTHONPATH=bindings/python/src N4M_LIB_PATH=build/dev-release/cpp/src/libn4m.so \
  python benchmarks/cross_binding/bench_aom_staged_chain_campaign_timing.py \
    --plans compact,compact_wide --max-chains 16 --top-k 24 --refit-top-k 8 \
    --output /tmp/aom_staged_chain_campaign_timing.csv
Tiny one-GPU CUDA smoke used for release readiness:
CUDA_VISIBLE_DEVICES=0 \
PYTHONPATH=bindings/python/src \
N4M_LIB_PATH=build/cuda-on/cpp/src/libn4m.so \
  python benchmarks/cross_binding/bench_aom_staged_chain_campaign_timing.py \
    --output benchmarks/cross_binding/aom_staged_chain_campaign_timing_cuda_smoke.csv \
    --repeats 1 --plans compact --n-samples 96 --n-features 128 --cv 3 \
    --heads pls --components 1 --ridge-lambdas 0.1 --max-chains 4 \
    --chain-chunk-size 2 --top-k 4 --refit-top-k 3 \
    --refit-per-head-top-k 1 --moment-policy auto \
    --cuda-pls-min-device-features 1 --cuda-pls-parallel-folds
The smoke is expected to keep selection_uses_test_set=False and show nonzero
n_pls_moment_cuda_device_cv_fits with zero host PLS CV fits.