iriv_select — IRIV — Iteratively Retaining Informative Variables

Group: Variable selector · Registry tolerance: 1e-06

Description

Iteratively Retains Informative Variables (Phase 51)

From the pls4all.sklearn.IRIVSelector docstring:

IRIV — Iteratively Retains Informative Variables (Yun 2014).

Registry note — NumPy port of libPLS iriv (Yun 2014). Mann-Whitney U test via scipy.stats.mannwhitneyu. Default _iriv_select_pls4all path invokes the same NumPy function with np.random.default_rng(seed), giving bit-exact mask parity. The C++ splitmix64 kernel is opt-in via legacy=True.
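The reproducibility claim in the registry note rests on seeding the sampler. As a rough illustration only, the sketch below builds a binary sampling matrix from a seeded generator and checks that the same seed yields the same matrix. It uses the standard-library random.Random as a stand-in for np.random.default_rng so that it runs without NumPy. The helper name binary_sampling_matrix, the half-ones-per-column layout, and the matrix size are assumptions made for illustration, not the pls4all or libPLS code.

```python
import random


def binary_sampling_matrix(n_rows, n_vars, seed):
    """Hypothetical helper: n_rows x n_vars matrix of 0/1 inclusion flags.

    Each column holds n_rows // 2 ones in a seed-determined order
    (an assumed layout, chosen for illustration).
    """
    rng = random.Random(seed)  # stand-in for np.random.default_rng(seed)
    half = n_rows // 2
    columns = []
    for _ in range(n_vars):
        column = [1] * half + [0] * (n_rows - half)
        rng.shuffle(column)
        columns.append(column)
    # Transpose so that each row describes one random variable subset.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    a = binary_sampling_matrix(n_rows=8, n_vars=5, seed=0)
    b = binary_sampling_matrix(n_rows=8, n_vars=5, seed=0)
    print(a == b)  # same seed, same matrix
```

Two implementations that share the sampler and the seed see identical subsets, which is a precondition for the bit-exact mask parity described above.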

Parameters

Name          Type  Default  Notes
n_components  int   2        Number of latent components extracted (k).
max_rounds    int   5        Maximum rounds of strongly/weakly informative reclassification.
n_folds       int   5        Number of cross-validation folds used inside the selector.
seed          int   0        Random seed for reproducible sampling/initialization.
fold          int   3        Registry benchmark cell value.

Explanations

Bibliographic source

Yun, Y. H., Wang, W. T., Tan, M. L., Liang, Y. Z., Li, H. D., Cao, D. S., Lu, H. M. & Xu, Q. S. (2014). A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Analytica Chimica Acta 807, 36–43.

Mathematical principle

At each iteration, IRIV classifies every variable into one of four categories: strongly informative, weakly informative, uninformative, or interfering. Variables in the first two categories are retained; those in the last two are eliminated. Iteration continues until a round eliminates nothing, that is, until no uninformative or interfering variables remain.
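The retain-and-eliminate loop can be sketched as follows. This is a minimal outline under stated assumptions, not the n4m_iriv_select kernel: iterate_retention and classify_round are hypothetical names, the per-round classifier is passed in as a callable, and the stopping rule (stop when a round eliminates nothing, or after max_rounds) is one plausible reading of the description and of the max_rounds parameter.

```python
RETAINED = ("strongly informative", "weakly informative")


def iterate_retention(variables, classify_round, max_rounds=5):
    """Hypothetical outer loop: drop non-retained variables round by round.

    classify_round(kept) must return a dict mapping each variable in
    `kept` to one of the four category names.
    """
    kept = list(variables)
    for _ in range(max_rounds):
        if not kept:
            break
        labels = classify_round(kept)
        survivors = [v for v in kept if labels[v] in RETAINED]
        if len(survivors) == len(kept):
            break  # nothing was eliminated in this round
        kept = survivors
    return kept


def toy_round(kept):
    """Toy stand-in classifier: flags the largest index above 2 as interfering."""
    worst = max(kept)
    return {
        v: "interfering" if (v == worst and v > 2) else "strongly informative"
        for v in kept
    }


if __name__ == "__main__":
    print(iterate_retention(range(6), toy_round, max_rounds=5))
```

With the toy classifier, one variable is removed per round until only indices 0 to 2 remain; lowering max_rounds stops the loop earlier and leaves more variables in place.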

Categories are determined by Monte-Carlo subset analysis: for each variable, the distribution of CV-RMSE values over random subsets that include the variable is compared against the distribution over subsets that exclude it, using the Mann-Whitney U test named in the registry note. This four-way classification is more nuanced than single-threshold methods and tends to handle correlated predictors well (correlated features can both be ‘weakly informative’).
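A minimal sketch of the four-way decision for a single variable is given below. It assumes the rule as it is commonly stated for IRIV: the sign of the difference in mean CV-RMSE (with the variable minus without it) separates helpful from harmful variables, and a Mann-Whitney U p-value against a 0.05 threshold separates strong from weak effects. The threshold, the sign convention, and the function names are assumptions not stated on this page. The p-value uses a plain normal approximation without tie or continuity correction so that the example needs only the standard library; the registry implementation uses scipy.stats.mannwhitneyu instead.

```python
from statistics import NormalDist, mean


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Simplified on purpose: no tie or continuity correction.
    """
    n1, n2 = len(a), len(b)
    # U counts pairs (x, y) with x > y; ties contribute one half.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    mu = n1 * n2 / 2.0
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    z = (u - mu) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def classify_variable(rmse_with, rmse_without, alpha=0.05):
    """Assumed IRIV rule: sign of the mean difference plus significance."""
    dmean = mean(rmse_with) - mean(rmse_without)
    significant = mann_whitney_p(rmse_with, rmse_without) < alpha
    if dmean < 0:  # including the variable lowers the error on average
        return "strongly informative" if significant else "weakly informative"
    return "interfering" if significant else "uninformative"


if __name__ == "__main__":
    low = [0.10 + 0.001 * i for i in range(10)]
    high = [0.20 + 0.001 * i for i in range(10)]
    print(classify_variable(low, high))
    print(classify_variable(high, low))
```

In the example, a variable whose inclusion clearly lowers the error lands in the strongly informative category, and swapping the two samples moves it to interfering.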

Implementation

n4m_iriv_select.

Usage

Every pls4all binding dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in benchmarks.parity_timing.registry. The snippets below show the same fit in each supported language. The R package also ships drop-in-compatible facades for the CRAN pls package (plsr, pcr, mvr) and for the mdatools::pls(x, y, ...) matrix idiom; those facades are listed only for methods that have a meaningful equivalence.

pls4all bindings

C (libn4m C ABI)

/* C ABI — libn4m */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t*  cfg = n4m_config_create();
n4m_method_result_t* res = NULL;
n4m_iriv_select_fit(ctx, cfg, &x_view, &y_view, /* hyperparams */, &res);
/* … read coefficients / mask / scores via */
/* n4m_method_result_get_double_matrix / vector / scalar … */
n4m_method_result_destroy(res);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);

Python (low-level API)

import pls4all
from pls4all._methods import iriv_select_fit
with pls4all.Context() as ctx, pls4all.Config() as cfg:
    res = iriv_select_fit(ctx, cfg, X, y, n_components=4)
# then: res.matrix("predictions"), res.matrix("coefficients"),
# res.vector("mask"), res.scalar("intercept"), …

Python (scikit-learn wrapper)

from pls4all.sklearn import IRIVSelector
mdl = IRIVSelector(n_components=2, max_rounds=5, n_folds=5, seed=0)
mdl.fit(X, y)
y_hat = mdl.predict(X_test)

R

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("iriv_select", X, y,
                      n_components = 4L, params = list(max_rounds = 3L, fold = 3L, seed = 11L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

MATLAB

res = pls4all.iriv_select(X, y, 4);
% see header of bindings/matlab/+pls4all/iriv_select.m for full
% parameter surface:
%   res = iriv_select(X, Y, n_components, max_rounds, seed)
yhat = predict(res, Xtest);

No idiomatic classdef wrapper — invoke pls4all.fit("iriv_select", X, y, …) directly from the unified MEX factory.

Registry parity references 📐

  • 📐 ref.python_iriv_numpy_port (python · python) — iriv_numpy_port 1.0.0 · strict (rmse_rel ≤ 1e-06) — NumPy port of libPLS iriv (Yun 2014). Mann-Whitney U test via scipy.stats.mannwhitneyu; binary-matrix sampler keyed to np.random.default_rng(seed) for bit-exact reproducibility.

Benchmarks

Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.

Verdict  ·  ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance  ·  ✓ bind = pls4all binding agrees with the C++ baseline  ·  ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle  ·  ✗ divergent  ·  ⚠ error  ·  — not run. The fastest backend per column is marked 🏆.

Reference gate: strict — numeric equivalence (rmse_rel_tol 1e-06).

Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.
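The two numbers that appear in the verdict cells can be illustrated with a short sketch. Both definitions here are assumptions: rmse_rel is taken to be the RMSE of the difference divided by the RMS of the reference, and the J value is read as a Jaccard overlap between two selection masks. Neither is spelled out on this page, and the function names are hypothetical.

```python
import math


def rmse_rel(candidate, reference):
    """Assumed definition: RMSE of the difference over the RMS of the reference."""
    n = len(reference)
    diff = math.sqrt(sum((c - r) ** 2 for c, r in zip(candidate, reference)) / n)
    scale = math.sqrt(sum(r ** 2 for r in reference) / n)
    return diff / scale if scale > 0 else diff


def passes_strict_gate(candidate, reference, tol=1e-6):
    """Strict gate as quoted above: rmse_rel at or below the tolerance."""
    return rmse_rel(candidate, reference) <= tol


def jaccard(mask_a, mask_b):
    """Assumed reading of J: |A and B| / |A or B| over selected variables."""
    both = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    either = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return both / either if either else 1.0


if __name__ == "__main__":
    print(passes_strict_gate([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(jaccard([1, 1, 1, 0, 0], [1, 1, 1, 1, 1]))
```

Under this reading, a backend that selects exactly the same variables as the baseline scores J 1.00, while a mask sharing three of five selected variables scores 0.60.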

Backend                          Parity     80×25 (ms)
C++ native · libn4m
  pls4all.cpp.blas+omp           ✓ J 1.00   268.0 ms
Python · pls4all
  pls4all.python                 ✓ J 1.00   270.2 ms
  pls4all.sklearn                ⇄ J 0.60   25.3 ms 🏆
R · pls4all
  pls4all.R                      ⇄ J 0.60   29.4 ms
  pls4all.R.formula              ⇄ J 0.60   30.0 ms
  pls4all.R.mdatools             ⇄ J 0.60   30.8 ms
  pls4all.R.pls                  ⇄ J 0.60   30.5 ms
Python · external
  📐 ref.python_iriv_numpy_port  source     274.8 ms

Backend                          Parity     80×25 (ms)
C++ native · libn4m
  pls4all.cpp.blas+omp           ✓ J 1.00   273.2 ms
Python · pls4all
  pls4all.python                 ✓ J 1.00   267.2 ms
  pls4all.sklearn                ⇄ J 0.60   25.1 ms 🏆
R · pls4all
  pls4all.R                      ⇄ J 0.60   29.8 ms
  pls4all.R.formula              ⇄ J 0.60   30.8 ms
  pls4all.R.mdatools             ⇄ J 0.60   30.6 ms
  pls4all.R.pls                  ⇄ J 0.60   30.3 ms
Python · external
  📐 ref.python_iriv_numpy_port  source     275.1 ms

Backend                          Parity     80×25 (ms)
C++ native · libn4m
  pls4all.cpp.blas+omp           ✓ J 1.00   274.2 ms
Python · pls4all
  pls4all.python                 ✓ J 1.00   275.4 ms
  pls4all.sklearn                ⇄ J 0.60   25.4 ms 🏆
R · pls4all
  pls4all.R                      ⇄ J 0.60   29.9 ms
  pls4all.R.formula              ⇄ J 0.60   30.7 ms
  pls4all.R.mdatools             ⇄ J 0.60   31.0 ms
  pls4all.R.pls                  ⇄ J 0.60   30.7 ms
Python · external
  📐 ref.python_iriv_numpy_port  source     270.7 ms
