# `aom_robust_hpo` - native AOM robust-HPO preprocessing screen

_Group_: **Diagnostic / AOM** · _ABI_: `n4m_aom_robust_hpo_fit`

## Description

`aom_robust_hpo` screens a fixed bank of strict-linear spectral preprocessing chains. It then selects the best Ridge or PLS head by contiguous K-fold CV RMSE. The method is intended for fast, reproducible preprocessing-candidate campaigns where the user wants three things: the candidate-score table, the selected chain/head/parameter, and a reusable linear prediction surface in the original input feature space.

Native v1 deliberately excludes stateful or sample-fitted preprocessing steps (`SNV`, `MSC`, `EMSC`, `ASLS`, etc.). These remain available in the Python sklearn estimator `AOMRobustHPORegressor`, which fits each chain fold-locally.

## Backend Status

The public method is a native C ABI method. It builds in both the regular CPU and the CUDA-enabled `libn4m` configurations. The preprocessing bank and the Ridge head are strict CPU kernels. The PLS head goes through the existing native PLS model path, so a CUDA build can use the library's configured accelerated linear-algebra path where available.

This is not yet the lab-scale batched 200k-chain GPU grinder. It is the catalogued product method: deterministic, source-free, ABI-stable, and suitable for compact or wide preprocessing selection from Python or C.
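The selection rule from the Description can be sketched in plain NumPy. This is a minimal illustration, not the native implementation, and it rests on several assumptions. Folds are contiguous and unshuffled, with near-equal sizes from `np.array_split`. The Ridge head is closed-form with an unpenalized intercept. `mean_cv_rmse` is taken as the mean of the per-fold RMSEs. The helper names, the two chain matrices, the alpha grid, and `head_id = 0` for Ridge are all hypothetical, and PLS is omitted.

```python
import numpy as np


def contiguous_folds(n_samples, cv):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    k = min(cv, n_samples)  # clip the fold count to n_samples
    if k < 2:
        raise ValueError("need at least two folds")
    idx = np.arange(n_samples)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)  # every sample outside the contiguous block
        yield train, test


def ridge_fit(Z, y, alpha):
    """Closed-form Ridge with an unpenalized intercept (via centering)."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    A = Zc.T @ Zc + alpha * np.eye(Z.shape[1])
    coef = np.linalg.solve(A, Zc.T @ (y - y_mean))
    return coef, y_mean - z_mean @ coef


def screen(X, y, chains, alphas, cv=5):
    """Score every (chain, alpha) pair; return the score table and the best row."""
    rows = []
    for chain_id, T in enumerate(chains):
        Z = X @ T  # fixed linear chain: safe to apply once, outside the folds
        for alpha in alphas:
            fold_rmse = []
            for train, test in contiguous_folds(len(y), cv):
                coef, b0 = ridge_fit(Z[train], y[train], alpha)
                resid = Z[test] @ coef + b0 - y[test]
                fold_rmse.append(np.sqrt(np.mean(resid ** 2)))
            # Columns mirror candidate_scores: chain_id, head_id, param, mean_cv_rmse.
            # head_id 0 standing for Ridge is a hypothetical encoding.
            rows.append((chain_id, 0, alpha, np.mean(fold_rmse)))
    scores = np.array(rows, dtype=float)
    return scores, scores[np.argmin(scores[:, 3])]


rng = np.random.default_rng(7)
X = rng.standard_normal((64, 32))
y = X[:, 8] - 0.4 * X[:, 19] + 0.05 * rng.standard_normal(64)

p = X.shape[1]
raw = np.eye(p)
diff1 = np.zeros((p, p - 1))  # first finite difference: Z[:, j] = X[:, j+1] - X[:, j]
j = np.arange(p - 1)
diff1[j, j] = -1.0
diff1[j + 1, j] = 1.0

scores, best = screen(X, y, [raw, diff1], alphas=[1e-3, 1e-1, 10.0], cv=5)
print(scores.shape)  # (6, 4)
print(best)
```

Because each chain here is a fixed matrix that never looks at the data, applying it once before splitting leaks nothing across folds. Sample-fitted steps such as `MSC` would instead need to be refit inside every fold, which is what the Python estimator does.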
## Parameters

| Name | Type | Default | Notes |
|------|------|---------|-------|
| `profile` | `int` | `0` | `0=compact`, `1=wide` |
| `cv` | `int` | `5` | Contiguous folds, clipped to `n_samples` |
| `heads_mask` | `int` | `3` | Bitmask: `1=Ridge`, `2=PLS`, `3=both` |

## Result

The C ABI returns `n4m_method_result_t` with:

| Key | Shape | Meaning |
|-----|-------|---------|
| `predictions` | `n_samples x 1` | In-sample predictions after refitting the selected candidate |
| `coefficients_transformed` | `n_features_transformed x 1` | Linear coefficients in the selected transformed feature space |
| `input_coefficients` | `n_features x 1` | Selected transformed-space coefficients folded back into the original input feature space |
| `intercept` | `1 x 1` | Fitted intercept |
| `candidate_scores` | `n_candidates x 4` | Columns: `chain_id`, `head_id`, `param`, `mean_cv_rmse` |

Scalar diagnostics: `selected_chain_id`, `selected_head_id`, `selected_param`, `selected_cv_rmse`, `n_chains`, `n_candidates`, `profile`, `cv`, `n_samples`, `n_features`, `n_features_transformed`, `n_targets`.
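`input_coefficients` is well defined because every native chain is a fixed linear map. Suppose a chain can be written as a matrix `T` of shape `n_features x n_features_transformed`, so that `Z = X @ T`. Then a head fit on `Z` with coefficients `b` and intercept `c` satisfies `X @ (T @ b) + c == Z @ b + c`. In other words, `T @ b` is an input-space coefficient vector and the intercept is unchanged. The sketch below checks that identity. The helpers `detrend_matrix` and `diff1_matrix`, the `detrend1` then finite-difference composition, and the closed-form Ridge head are illustrative assumptions, not the native kernels.

```python
import numpy as np


def detrend_matrix(p, degree):
    """Fixed p x p projector removing a degree-`degree` polynomial baseline per row."""
    t = np.linspace(-1.0, 1.0, p)
    V = np.vander(t, degree + 1, increasing=True)  # p x (degree + 1) polynomial basis
    return np.eye(p) - V @ np.linalg.pinv(V)        # symmetric, so X @ P detrends rows


def diff1_matrix(p):
    """p x (p - 1) matrix so that (X @ D)[:, j] == X[:, j + 1] - X[:, j]."""
    D = np.zeros((p, p - 1))
    j = np.arange(p - 1)
    D[j, j] = -1.0
    D[j + 1, j] = 1.0
    return D


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 50))
y = X[:, 10] - X[:, 11] + 0.05 * rng.standard_normal(40)

# Hypothetical chain "detrend1 then first difference", collapsed into one matrix.
T = detrend_matrix(X.shape[1], degree=1) @ diff1_matrix(X.shape[1])
Z = X @ T  # transformed features: 40 x 49

# Ridge head fit in the transformed space (centered, unpenalized intercept).
alpha = 1.0
z_mean, y_mean = Z.mean(axis=0), y.mean()
Zc = Z - z_mean
coef_t = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]), Zc.T @ (y - y_mean))
intercept = y_mean - z_mean @ coef_t

# Fold back: X @ (T @ coef_t) == (X @ T) @ coef_t, with the intercept unchanged.
coef_in = T @ coef_t
pred_transformed = Z @ coef_t + intercept
pred_input = X @ coef_in + intercept
print(coef_t.shape, coef_in.shape)  # (49,) (50,)
```

The two printed shapes show why the transformed-space and input-space coefficient vectors can differ in length: a finite difference drops one column, but folding always returns to the input width.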
The selected model can be replayed on any input matrix that has the same input features (columns) as the training data:

```python
y_hat = X @ res["input_coefficients"] + res["intercept"]
```

## Python Usage

```python
import numpy as np
import n4m

rng = np.random.default_rng(7)
X = rng.standard_normal((64, 256))
y = X[:, 8] - 0.4 * X[:, 19] + 0.05 * rng.standard_normal(64)

res = n4m.aom_robust_hpo(X, y, profile="compact", cv=5, heads=("ridge", "pls"))

print(res["selected_chain_id"], res["selected_head_id"], res["selected_cv_rmse"])
print(res["candidate_scores"][:5])

# The folded input-space coefficients reproduce the in-sample predictions.
np.testing.assert_allclose(
    X @ res["input_coefficients"] + res["intercept"],
    res["predictions"],
)
```

The native sklearn wrapper uses the same folded coefficients:

```python
X_new = rng.standard_normal((8, 256))  # any matrix with the same 256 input features

model = n4m.NativeAOMRobustHPORegressor(profile="compact", cv=5).fit(X, y)
pred = model.predict(X_new)
diag = model.get_diagnostics()
```

## C ABI Usage

```c
/* x_view / y_view: caller-built views over X (n_samples x n_features) and
 * y (n_samples x 1); their construction is not shown here.
 * Error checking is omitted for brevity. */
n4m_context_t* ctx = NULL;
n4m_config_t* cfg = NULL;
n4m_method_result_t* res = NULL;

n4m_context_create(&ctx);
n4m_config_create(&cfg);

n4m_aom_robust_hpo_fit(ctx, cfg, &x_view, &y_view,
                       /*profile=*/0, /*cv=*/5, /*heads_mask=*/3, &res);

/* Read the n_candidates x 4 score table (chain_id, head_id, param, mean_cv_rmse). */
const double* scores = NULL;
int64_t rows = 0, cols = 0;
n4m_method_result_get_double_matrix(res, "candidate_scores", &scores, &rows, &cols);

/* Release in reverse order of creation. */
n4m_method_result_destroy(res);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);
```

## Native Profiles

`compact` has 12 chains:

- raw
- detrend of degree 1 and 2
- four standalone Savitzky-Golay smoothing/derivative filters
- a Norris-Williams derivative
- a first finite difference
- three strict-linear compositions, two of which include a Savitzky-Golay step

Compact `chain_id` mapping:

| ID | Chain |
|----|-------|
| 0 | `raw` |
| 1 | `detrend1` |
| 2 | `detrend2` |
| 3 | `savgol_w5_p2_d0` |
| 4 | `savgol_w7_p2_d0` |
| 5 | `savgol_w7_p2_d1` |
| 6 | `savgol_w11_p2_d2` |
| 7 | `nw_s5_g5_d1` |
| 8 | `finite_diff1` |
| 9 | `detrend1_savgol_w7_p2_d1` |
| 10 | `detrend1_nw_s5_g5_d1` |
| 11 | `savgol_w5_p2_d0_finite_diff1` |

`wide` has 31 chains.
Relative to `compact`, it adds larger Savitzky-Golay windows, more Norris-Williams variants, a second finite difference, Gaussian/FCK variants, Whittaker smoothing, and additional strict-linear compositions.

## Benchmarks

Timing script: `benchmarks/cross_binding/bench_aom_robust_hpo_timing.py`. Latest checked-in CSV: `benchmarks/cross_binding/aom_robust_hpo_timing.csv`.

CUDA-build native smoke timing can be regenerated with:

```bash
PYTHONPATH=bindings/python/src \
N4M_LIB_PATH=build/cuda-on/cpp/src/libn4m.so \
python benchmarks/cross_binding/bench_aom_robust_hpo_timing.py \
  --native-only \
  --output benchmarks/cross_binding/aom_robust_hpo_timing_cuda_smoke.csv
```

The checked-in compact smoke timing on ABI `1.16.0` has the native ABI path and the Python sklearn preset selecting from the same 84 compact candidates:

| Cell | Native, CPU build (median, ms) | Python sklearn reference (ms) | Native, CUDA build (median, ms) |
|------|-------------------------------:|------------------------------:|--------------------------------:|
| 32 x 64 | 3.14 | 29.03 | 506.16 |
| 64 x 128 | 16.33 | 49.74 | 313.89 |
| 96 x 256 | 60.47 | 263.59 | 245.42 |

The CUDA-build numbers validate the CUDA-enabled build path. They are not evidence of fused GPU acceleration for compact AOM robust-HPO.
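The native profiles keep only strict-linear chains, while `SNV`, `MSC`, `EMSC`, and `ASLS` stay in the Python estimator. A small numerical probe makes that distinction concrete. It assumes that a row-wise preprocessing `f` counts as strict-linear when `f(X) == X @ T` for one fixed matrix `T`, which can then be recovered as `f(I)`. `detrend_rows`, `finite_diff1`, `snv`, and `is_strict_linear` are textbook versions written for this illustration, not the native or Python implementations. Passing the probe on random data is evidence, not proof.

```python
import numpy as np


def detrend_rows(X, degree=1):
    """Subtract a per-row least-squares polynomial baseline (written procedurally)."""
    t = np.linspace(-1.0, 1.0, X.shape[1])
    coefs = np.polynomial.polynomial.polyfit(t, X.T, degree)  # one fit per row of X
    return X - np.polynomial.polynomial.polyval(t, coefs)     # shape (n_samples, n_features)


def finite_diff1(X):
    """First finite difference along the feature axis."""
    return np.diff(X, axis=1)


def snv(X):
    """Textbook SNV: center and scale each row by its own mean and std."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / sd


def is_strict_linear(f, n_features, n_samples=6, seed=0):
    """Probe whether f(X) == X @ T for one fixed T (evidence, not a proof)."""
    T = f(np.eye(n_features))  # if f is X -> X @ T, then f(I) recovers T itself
    X = np.random.default_rng(seed).standard_normal((n_samples, n_features))
    out = f(X)
    return out.shape == (n_samples, T.shape[1]) and np.allclose(out, X @ T)


for name, f in [("detrend1", detrend_rows), ("finite_diff1", finite_diff1), ("snv", snv)]:
    print(name, is_strict_linear(f, 40))
# detrend1 True
# finite_diff1 True
# snv False
```

The identity-matrix trick works because a map of the form `X -> X @ T` sends each unit row `e_i` to row `i` of `T`. Least-squares detrending passes even though it is written as a per-row fit, because the fitted baseline is linear in the row. SNV fails because its per-row mean and standard deviation depend on the data being transformed.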