# `moment_stack` — OOF stack over native moment heads

_Group_: **Ensemble** · _Registry tolerance_: `n/a`

## Description

`NativeMomentStackRegressor` builds a small linear meta-model over out-of-fold predictions from native moment-compatible base models:

- Ridge (`NativeRidgeRegressor`)
- PLS via `NativeMomentSweepRegressor(heads=("pls", ...))`
- PCR (`NativePCRRegressor`)
- Continuum regression (`NativeContinuumRegressionRegressor`)
- ECR (`NativeECRRegressor`)
- CPPLS (`NativeCPPLSRegressor`)

The base models are fit only on training folds when producing OOF predictions. The final reusable estimator refits the same base models on all training rows, then applies a small Ridge meta-model on their predictions.

This is intentionally still a strict moment-model composition. It does not add RFF/RBF lifts, trees, neural models, TabPFN, transformed-spectrum stacking, or dataset-name routing.

## Parameters

| Name | Type | Default | Notes |
|------|------|---------|-------|
| `base_models` | sequence[str] | `("ridge", "pls", "pcr", "continuum", "ecr", "cppls")` | Allowed bases. `continuum_regression` aliases `continuum`. |
| `cv` | `int` | `5` | Outer OOF folds used for the meta-model. |
| `fold_ids` | array-like or `None` | `None` | Explicit train-only fold ids. |
| `inner_cv` | `int` | `3` | Inner CV for the PLS sweep base. |
| `meta_alpha` | `float` | `1e-6` | Ridge penalty for the meta-model. |
| `ridge_alpha` | `float` | `0.1` | Ridge base penalty. |
| `n_components` | `int` | `3` | Component count for PCR/continuum/ECR/CPPLS and max PLS grid when `pls_components=None`. |
| `pls_components` | sequence[int] or `None` | `None` | Explicit PLS component grid. |
| `cppls_gamma` | `float` | `0.5` | CPPLS gamma. |
| `continuum_tau` | `float` | `0.5` | Continuum tau. |
| `ecr_alpha` | `float` | `0.5` | ECR alpha. |
| `center_x`, `scale_x`, `center_y`, `scale_y` | bool or `None` | mixed | Forwarded to bases that expose those options. |
| `cuda_pls_parallel_folds`, `cuda_pls_min_device_features`, `cuda_pls_many_batched` | optional | `None` | Forwarded to the PLS sweep base. |

## Usage

```python
import n4m
from n4m.sklearn import NativeMomentStackRegressor

model = n4m.moment_stack(
    X_train,
    y_train,
    base_models=("ridge", "pcr", "pls"),
    cv=5,
    inner_cv=3,
    n_components=3,
    scale_x=False,
)

same_model_type = NativeMomentStackRegressor(
    base_models=("ridge", "pcr", "pls"),
    cv=5,
    inner_cv=3,
    n_components=3,
    scale_x=False,
).fit(X_train, y_train)

y_pred = model.predict(X_test)
diagnostics = model.get_diagnostics()
```

Key fitted attributes:

- `base_model_names_`
- `base_models_`
- `base_oof_predictions_`
- `oof_predictions_`
- `meta_coefficients_`
- `intercept_`
- `oof_rmse_`
- `rmse_`
- `base_oof_diagnostics_`
- `base_final_diagnostics_`

`get_diagnostics()` includes aggregate PLS route counters for both phases, for example `n_base_oof_pls_moment_cuda_device_cv_fits` and `n_base_final_pls_moment_cuda_device_cv_fits`. These counters are audit-only; they do not affect the OOF split, meta-model fit, or production selection.

CUDA smoke used for release readiness:

```bash
CUDA_VISIBLE_DEVICES=0 \
PYTHONPATH=bindings/python/src \
N4M_LIB_PATH=build/cuda-on/cpp/src/libn4m.so \
/home/delete/.venv/bin/python benchmarks/cross_binding/bench_moment_stack_timing.py \
  --output benchmarks/cross_binding/moment_stack_timing_cuda_smoke.csv \
  --repeats 1 --shapes 80x1024 --base-models pls --cv 4 --inner-cv 4 \
  --n-components 1 --cuda-pls-min-device-features 1 \
  --cuda-pls-parallel-folds
```

The smoke should show nonzero OOF and final `n_base_*_pls_moment_cuda_device_cv_fits` and zero corresponding host PLS CV fits.

## Validation

Covered by `bindings/python/tests/test_moment_model_wrappers.py` and `benchmarks/cross_binding/bench_moment_stack_timing.py`.
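
For readers who want to sanity-check the stacking flow from the Description independently of the native library, the following is a minimal NumPy sketch of the same idea: base models are fit on training folds only, their held-out predictions form the OOF matrix, a small ridge meta-model is fit on that matrix, and the bases are then refit on all rows. It is not the `n4m` implementation. Every function name here (`fit_ridge`, `fit_pcr`, `oof_stack`, `stack_predict`) is hypothetical, only ridge and PCR bases are shown, and the shuffled round-robin fold assignment and unpenalized intercepts are assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge with an unpenalized intercept; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def fit_pcr(X, y, n_components):
    """Principal component regression via SVD; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    v = vt[:n_components].T                      # loadings, shape (p, k)
    scores = Xc @ v                              # scores, shape (n, k)
    gamma = (scores.T @ yc) / (s[:n_components] ** 2)
    coef = v @ gamma                             # back to feature space
    return coef, y_mean - x_mean @ coef


def predict(model, X):
    coef, intercept = model
    return X @ coef + intercept


def oof_stack(X, y, bases, cv=5, meta_alpha=1e-6, seed=0):
    """Fit bases out-of-fold, fit a ridge meta-model, then refit bases on all rows."""
    n = X.shape[0]
    # Assumption: shuffled round-robin folds; the real estimator may split differently.
    fold_ids = np.random.default_rng(seed).permutation(n) % cv
    oof = np.zeros((n, len(bases)))
    for fold in range(cv):
        held_out = fold_ids == fold
        for j, fit in enumerate(bases):
            model = fit(X[~held_out], y[~held_out])   # training folds only
            oof[held_out, j] = predict(model, X[held_out])
    meta = fit_ridge(oof, y, meta_alpha)              # meta-model on OOF predictions
    final_models = [fit(X, y) for fit in bases]       # refit on all training rows
    return final_models, meta, oof


def stack_predict(final_models, meta, X):
    base_preds = np.column_stack([predict(m, X) for m in final_models])
    return predict(meta, base_preds)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=120)

bases = [
    lambda X, y: fit_ridge(X, y, alpha=0.1),
    lambda X, y: fit_pcr(X, y, n_components=3),
]
final_models, meta, oof = oof_stack(X, y, bases, cv=5, meta_alpha=1e-6)
y_pred = stack_predict(final_models, meta, X)
```

In this sketch `oof` plays the role that `base_oof_predictions_` appears to play on the fitted estimator, and `meta` holds the analogue of `meta_coefficients_` and `intercept_`; that mapping is an inference from the attribute names, not something verified against the library.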