iriv_select — IRIV — Iteratively Retaining Informative Variables
Group: Variable selector · Registry tolerance: 1e-06
Description
Iteratively Retains Informative Variables (Phase 51)
From the pls4all.sklearn.IRIVSelector docstring:
IRIV — Iteratively Retains Informative Variables (Yun 2014).
Registry note: NumPy port of libPLS iriv (Yun 2014). Mann-Whitney U test via scipy.stats.mannwhitneyu. The default _iriv_select_pls4all path invokes the same NumPy function with np.random.default_rng(seed), giving bit-exact mask parity. The C++ splitmix64 kernel is opt-in via legacy=True.
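The bit-exact parity described above depends on both code paths drawing from an identically seeded generator. As a minimal sketch (not the registered port itself), the helper below builds a binary inclusion matrix from np.random.default_rng(seed) and checks that the same seed reproduces the same matrix. The name sample_inclusion_matrix and the balanced-column layout (half ones, half zeros per column) are assumptions made for illustration.

```python
import numpy as np


def sample_inclusion_matrix(n_rows, n_vars, seed):
    """Hypothetical helper: 0/1 matrix where entry (i, j) == 1 means
    variable j is included in sub-model i.

    Assumption: every column holds n_rows // 2 ones, shuffled independently.
    """
    rng = np.random.default_rng(seed)
    column = np.zeros(n_rows, dtype=np.int8)
    column[: n_rows // 2] = 1
    # One independent permutation of the balanced column per variable.
    return np.column_stack([rng.permutation(column) for _ in range(n_vars)])


a = sample_inclusion_matrix(6, 4, seed=0)
b = sample_inclusion_matrix(6, 4, seed=0)

# Same seed, same draw order: the two matrices match element for element.
print(np.array_equal(a, b))
```

Any change to the order or number of draws from the generator would break this equality, which is presumably why the registry note stresses that both paths invoke the same NumPy function.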
Parameters
| Name | Type | Default | Notes |
|---|---|---|---|
| | | | Number of latent components extracted (k). |
| | | | Maximum rounds of strongly/weakly informative reclassification. |
| | | | Number of cross-validation folds used inside the selector. |
| | | | Random seed for reproducible sampling/initialization. |
| | | | registry benchmark cell value |
Explanations
Bibliographic source
Yun, Y. H., Wang, W. T., Tan, M. L., Liang, Y. Z., Li, H. D., Cao, D. S., Lu, H. M. & Xu, Q. S. (2014). A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Analytica Chimica Acta 807, 36–43.
Mathematical principle
IRIV classifies each variable into one of four categories at each iteration: strongly informative, weakly informative, uninformative, or interfering. The first two are retained and the last two are eliminated. Iteration continues until no uninformative or interfering variables remain.
Categories are determined by Monte-Carlo subset analysis with a permutation-based reference distribution: each variable’s CV-RMSE contribution distribution is compared against the distribution under random subset inclusion. This four-way classification is more nuanced than single-threshold methods and tends to handle correlated predictors well (correlated features can both be ‘weakly informative’).
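The four-way rule can be sketched in a few lines. The version below follows the decision table as commonly summarized from Yun et al. (2014): compare the mean CV-RMSE of sub-models that exclude a variable with those that include it, then use the significance of a Mann-Whitney U test to split each side in two. The function name classify_variable, the 0.05 threshold, and the choice to pass the p-value in as an argument (rather than compute it, e.g. with scipy.stats.mannwhitneyu) are assumptions for illustration, not the library's API.

```python
from statistics import mean


def classify_variable(rmse_included, rmse_excluded, p_value, alpha=0.05):
    """Hypothetical helper: assign one of the four IRIV categories.

    rmse_included / rmse_excluded: CV-RMSE values of sub-models that
    contain / omit the variable. p_value: result of a two-sample test
    (assumed Mann-Whitney U) comparing the two distributions.
    """
    # Positive difference: models are better (lower RMSE) with the variable.
    dmean = mean(rmse_excluded) - mean(rmse_included)
    significant = p_value < alpha
    if dmean > 0:
        return "strongly informative" if significant else "weakly informative"
    # A zero or negative difference means the variable does not help.
    return "interfering" if significant else "uninformative"


def is_retained(category):
    """Strongly and weakly informative variables survive the round."""
    return category in ("strongly informative", "weakly informative")


label = classify_variable([0.40, 0.42, 0.41], [0.55, 0.57, 0.56], p_value=0.01)
print(label, is_retained(label))  # strongly informative True
```

Two correlated predictors can both land in the "weakly informative" branch here: each lowers the error a little on average, but neither difference is significant on its own, so both are kept.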
Implementation
n4m_iriv_select.
Usage
Every pls4all binding tab dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in benchmarks.parity_timing.registry. Switch tabs to read the same fit in your language. The R package now ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom — those tabs appear only on the methods that have a meaningful equivalence.
pls4all bindings
/* C ABI — libn4m */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t* cfg = n4m_config_create();
n4m_method_result_t* res = NULL;
n4m_iriv_select_fit(ctx, cfg, &x_view, &y_view, /* hyperparams */, &res);
/* … read coefficients / mask / scores via */
/* n4m_method_result_get_double_matrix / vector / scalar … */
n4m_method_result_destroy(res);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);

import pls4all
from pls4all._methods import iriv_select_fit
with pls4all.Context() as ctx, pls4all.Config() as cfg:
    res = iriv_select_fit(ctx, cfg, X, y, n_components=4)
    # then: res.matrix("predictions"), res.matrix("coefficients"),
    # res.vector("mask"), res.scalar("intercept"), …

from pls4all.sklearn import IRIVSelector
mdl = IRIVSelector(n_components=2, max_rounds=5, n_folds=5, seed=0)
mdl.fit(X, y)
y_hat = mdl.predict(X_test)

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("iriv_select", X, y,
                      n_components = 4L,
                      params = list(max_rounds = 3L, fold = 3L, seed = 11L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

res = pls4all.iriv_select(X, y, 4);
% see header of bindings/matlab/+pls4all/iriv_select.m for full
% parameter surface:
% res = iriv_select(X, Y, n_components, max_rounds, seed)
yhat = predict(res, Xtest);
No idiomatic classdef wrapper — invoke pls4all.fit("iriv_select", X, y, …) directly from the unified MEX factory.
Registry parity references 📐
📐 ref.python_iriv_numpy_port (python · python): iriv_numpy_port 1.0.0 · strict (rmse_rel ≤ 1e-06). NumPy port of libPLS iriv (Yun 2014). Mann-Whitney U test via scipy.stats.mannwhitneyu; binary-matrix sampler keyed to np.random.default_rng(seed) for bit-exact reproducibility.
Benchmarks
Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.
Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.
Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-06).
Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.
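The Parity column reports values such as "J 1.00" and "J 0.60". This page does not define J; the sketch below assumes it is the Jaccard index between the set of variables selected by a backend and the set selected by its reference. The function name jaccard and the index lists are hypothetical and not taken from the benchmark data.

```python
def jaccard(selected_a, selected_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two selections.

    Assumption: two empty selections count as identical (1.0).
    """
    a, b = set(selected_a), set(selected_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


reference = [0, 3, 7, 9]

# Identical masks: full agreement.
print(jaccard(reference, [0, 3, 7, 9]))   # 1.0

# Three shared variables out of five distinct ones: 3 / 5.
print(jaccard(reference, [0, 3, 7, 12]))  # 0.6
```

Under this reading, a value below 1.00 means the two backends picked overlapping but not identical variable subsets, which is consistent with the ⇄ cross-check verdict for paths that differ by design in selector or RNG.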
| Backend | Parity | 80×25 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ J 1.00 | 268.0 ms |
| Python · pls4all | | |
| pls4all.python | ✓ J 1.00 | 270.2 ms |
| pls4all.sklearn | ⇄ J 0.60 | 25.3 ms 🏆 |
| R · pls4all | | |
| pls4all.R | ⇄ J 0.60 | 29.4 ms |
| pls4all.R.formula | ⇄ J 0.60 | 30.0 ms |
| pls4all.R.mdatools | ⇄ J 0.60 | 30.8 ms |
| pls4all.R.pls | ⇄ J 0.60 | 30.5 ms |
| Python · external | | |
| 📐 ref.python_iriv_numpy_port | source | 274.8 ms |

| Backend | Parity | 80×25 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ J 1.00 | 273.2 ms |
| Python · pls4all | | |
| pls4all.python | ✓ J 1.00 | 267.2 ms |
| pls4all.sklearn | ⇄ J 0.60 | 25.1 ms 🏆 |
| R · pls4all | | |
| pls4all.R | ⇄ J 0.60 | 29.8 ms |
| pls4all.R.formula | ⇄ J 0.60 | 30.8 ms |
| pls4all.R.mdatools | ⇄ J 0.60 | 30.6 ms |
| pls4all.R.pls | ⇄ J 0.60 | 30.3 ms |
| Python · external | | |
| 📐 ref.python_iriv_numpy_port | source | 275.1 ms |

| Backend | Parity | 80×25 (ms) |
|---|---|---|
| C++ native · libn4m | | |
| pls4all.cpp.blas+omp | ✓ J 1.00 | 274.2 ms |
| Python · pls4all | | |
| pls4all.python | ✓ J 1.00 | 275.4 ms |
| pls4all.sklearn | ⇄ J 0.60 | 25.4 ms 🏆 |
| R · pls4all | | |
| pls4all.R | ⇄ J 0.60 | 29.9 ms |
| pls4all.R.formula | ⇄ J 0.60 | 30.7 ms |
| pls4all.R.mdatools | ⇄ J 0.60 | 31.0 ms |
| pls4all.R.pls | ⇄ J 0.60 | 30.7 ms |
| Python · external | | |
| 📐 ref.python_iriv_numpy_port | source | 270.7 ms |
See also: benchmark overview · methods index · interactive dashboard