`iriv_select`: IRIV, Iteratively Retaining Informative Variables

Group: Variable selector · Registry tolerance: 1e-06

Description

Iteratively Retains Informative Variables (Phase 51)

From the pls4all.sklearn.IRIVSelector docstring:

IRIV — Iteratively Retains Informative Variables (Yun 2014).

Registry note: the parity reference is a NumPy port of the libPLS `iriv` routine (Yun 2014) that runs its Mann-Whitney U test through `scipy.stats.mannwhitneyu`. The default `_iriv_select_pls4all` path invokes the same NumPy function with `np.random.default_rng(seed)`, which gives bit-exact mask parity. The C++ splitmix64 kernel is opt-in via `legacy=True`.

Parameters

Name	Type	Default	Notes
`n_components`	`int`	`2`	Number of latent components extracted (k).
`max_rounds`	`int`	`5`	Maximum rounds of strongly/weakly informative reclassification.
`n_folds`	`int`	`5`	Number of cross-validation folds used inside the selector.
`seed`	`int`	`0`	Random seed for reproducible sampling/initialization.
`fold`	`int`	`3`	Value used for the registry benchmark cell.

Explanations

Bibliographic source

Yun, Y. H., Wang, W. T., Tan, M. L., Liang, Y. Z., Li, H. D., Cao, D. S., Lu, H. M. & Xu, Q. S. (2014). A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Analytica Chimica Acta 807, 36–43.

Mathematical principle

In each round, IRIV sorts every variable into one of four categories: strongly informative, weakly informative, uninformative, or interfering. Variables in the first two categories are retained and those in the last two are eliminated. Rounds repeat until no uninformative or interfering variables remain, or until `max_rounds` is reached.

Categories are determined by Monte-Carlo subset analysis: for each variable, the distribution of cross-validated RMSE over random subsets that include the variable is compared with the distribution over subsets that exclude it, and the registry note above names the Mann-Whitney U test as the significance test for that comparison. This four-way classification is more nuanced than single-threshold methods and tends to handle correlated predictors well (two correlated features can both be ‘weakly informative’).
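The four-way rule can be sketched as a small function. This is a minimal illustration, not the pls4all kernel: the name `classify_variable`, the `alpha=0.05` default, and the sign convention (mean RMSE with the variable minus mean RMSE without it) are assumptions based on how the rule is usually described for Yun et al. (2014). The p-value is taken as an input and is assumed to come from a two-sample test such as the Mann-Whitney U test named in the registry note.

```python
from statistics import fmean


def classify_variable(rmse_included, rmse_excluded, p_value, alpha=0.05):
    """Hypothetical helper: assign one of the four IRIV categories.

    rmse_included: CV-RMSE values from subsets that contain the variable.
    rmse_excluded: CV-RMSE values from subsets that omit the variable.
    p_value: result of a two-sample test comparing the two distributions.
    """
    # Assumed convention: a negative difference means the variable lowers the error.
    mean_difference = fmean(rmse_included) - fmean(rmse_excluded)
    helps = mean_difference < 0  # a tie counts as "does not help"
    significant = p_value < alpha

    if helps:
        return "strongly informative" if significant else "weakly informative"
    return "interfering" if significant else "uninformative"


# Including the variable clearly lowers the error, and the test is significant.
label = classify_variable([1.0, 1.1, 0.9], [2.0, 2.1, 1.9], p_value=0.01)
print(label)
```

With SciPy available, the p-value could be obtained as `scipy.stats.mannwhitneyu(rmse_included, rmse_excluded, alternative="two-sided").pvalue`; whether the library uses a two-sided test is not stated on this page.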

Implementation

C ABI entry point: `n4m_feature_selection_iriv_select`.

R roxygen note (methods_extra.R::iriv_select):

IRIV: Iteratively Retains Informative Variables.
@param n_components Integer. Number of latent components.
@param max_rounds Method-specific parameter. See the underlying *_fit() function for the exact semantics.
@param seed Integer. Random seed for reproducibility.
@param X Numeric matrix of predictors (rows = samples, cols = features).
@param Y Numeric matrix or vector of responses, with one row per sample.
@export

Usage

Every pls4all binding dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in `benchmarks.parity_timing.registry`. The snippets below show the same fit in each language. The R package now ships drop-in-compatible facades for the CRAN pls package (`plsr`, `pcr`, `mvr`) and for the `mdatools::pls(x, y, ...)` matrix idiom; those tabs appear only on methods that have a meaningful equivalence.

pls4all bindings

C ABI · libn4m

/* C ABI — libn4m */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t*  cfg = n4m_config_create();
n4m_method_result_t* res = NULL;
n4m_feature_selection_iriv_select(ctx, cfg, &x_view, &y_view, /* hyperparams */, &res);
/* … read coefficients / mask / scores via */
/* n4m_method_result_get_double_matrix / vector / scalar … */
n4m_method_result_destroy(res);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);

Python · pls4all (raw)

import pls4all
from pls4all._methods import iriv_select_fit
with pls4all.Context() as ctx, pls4all.Config() as cfg:
    res = iriv_select_fit(ctx, cfg, X, y, n_components=4)
# then: res.matrix("predictions"), res.matrix("coefficients"),
# res.vector("mask"), res.scalar("intercept"), …

Python · pls4all.sklearn

from pls4all.sklearn import IRIVSelector
mdl = IRIVSelector(n_components=2, max_rounds=5, n_folds=5, seed=0)
mdl.fit(X, y)
y_hat = mdl.predict(X_test)

R · pls4all_method()

library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("iriv_select", X, y,
                      n_components = 4L, params = list(max_rounds = 3L, fold = 3L, seed = 11L))
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.

R · pls4all (raw fn)

library(pls4all)
res  <- iriv_select(X, Y, n_components, max_rounds = 20L, seed = 0L)
yhat <- pls4all_predict(res, X_test)

MATLAB · pls4all (MEX)

res = pls4all.iriv_select(X, y, 4);
% see header of bindings/matlab/+pls4all/iriv_select.m for full
% parameter surface:
%   res = iriv_select(X, Y, n_components, max_rounds, seed)
yhat = predict(res, Xtest);

MATLAB · pls4all (classdef)

No idiomatic classdef wrapper exists; invoke `pls4all.fit("iriv_select", X, y, …)` directly from the unified MEX factory.

Registry parity references 📐

📐 ref.python_iriv_numpy_port (python · python) — iriv_numpy_port 1.0.0 · strict (rmse_rel ≤ 1e-06) — NumPy port of libPLS iriv (Yun 2014). Mann-Whitney U test via scipy.stats.mannwhitneyu; binary-matrix sampler keyed to np.random.default_rng(seed) for bit-exact reproducibility.
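To illustrate why keying the sampler to `np.random.default_rng(seed)` makes runs reproducible, here is a minimal sketch of a seeded binary sampling matrix. The helper name `binary_sampling_matrix` and the half-ones-per-column design are assumptions made for illustration; the reference port's actual sampler may differ in layout and row count.

```python
import numpy as np


def binary_sampling_matrix(n_rows: int, n_vars: int, seed: int) -> np.ndarray:
    """Hypothetical helper: 0/1 matrix whose columns are shuffled independently.

    Row r describes one random subset: variable j is included when entry (r, j) is 1.
    """
    rng = np.random.default_rng(seed)  # same seed -> same stream -> same matrix
    column = np.zeros(n_rows, dtype=np.uint8)
    column[: n_rows // 2] = 1  # assumed design: half of the rows include each variable
    matrix = np.empty((n_rows, n_vars), dtype=np.uint8)
    for j in range(n_vars):
        # rng.permutation returns a shuffled copy, so `column` itself is unchanged.
        matrix[:, j] = rng.permutation(column)
    return matrix


first = binary_sampling_matrix(n_rows=10, n_vars=4, seed=0)
second = binary_sampling_matrix(n_rows=10, n_vars=4, seed=0)
print(np.array_equal(first, second))  # the two draws match element for element
```

Because every random draw comes from one generator created from `seed`, two implementations that consume the stream in the same order produce the same subsets, which is the precondition for comparing selection masks bit for bit.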

Benchmarks

Adaptive wall-clock per cell measured against full_matrix.csv. Only backends that implement this method are listed; libraries without the method are omitted.

Verdict · ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance · ✓ bind = pls4all binding agrees with the C++ baseline · ⇄ cross-check = documented by-design selector/RNG/model, noncanonical API/facade convention, or secondary oracle · ✗ divergent · ⚠ error · — not run. The fastest backend per column is marked 🏆.

Reference gate: strict — numeric equivalence (rmse_rel_tol ≤ 1e-06).

Rows tagged with 📐 are the canonical parity references for this method (declared in parity_timing.registry). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.
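The parity column in the tables below reports values such as J 1.00 and J 0.60. This page does not define J; assuming it denotes the Jaccard index between the variable mask selected by a backend and the mask selected by the reference, the score could be computed as in the following sketch. The function name `jaccard` and the example masks are illustrative only.

```python
def jaccard(mask_a, mask_b):
    """Hypothetical helper: Jaccard index of two equal-length selection masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same variables")
    selected_a = {i for i, keep in enumerate(mask_a) if keep}
    selected_b = {i for i, keep in enumerate(mask_b) if keep}
    union = selected_a | selected_b
    if not union:
        return 1.0  # convention chosen here: two empty selections agree fully
    return len(selected_a & selected_b) / len(union)


reference_mask = [1, 1, 1, 1, 0]  # made-up example, not benchmark data
backend_mask = [0, 1, 1, 1, 1]
score = jaccard(reference_mask, backend_mask)  # 3 shared variables out of 5 selected overall
print(f"J {score:.2f}")
```

Under this reading, J 1.00 means the two backends select exactly the same variables, while a lower value means the selections only partly overlap.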

1 thread

Backend	Parity	80×25 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ J 1.00	268.0 ms
Python · pls4all
`pls4all.python`	✓ J 1.00	270.2 ms
`pls4all.sklearn`	⇄ J 0.60	25.3 ms🏆
R · pls4all
`pls4all.R`	⇄ J 0.60	29.4 ms
`pls4all.R.formula`	⇄ J 0.60	30.0 ms
`pls4all.R.mdatools`	⇄ J 0.60	30.8 ms
`pls4all.R.pls`	⇄ J 0.60	30.5 ms
Python · external
📐`ref.python_iriv_numpy_port`	source	274.8 ms

3 threads

Backend	Parity	80×25 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ J 1.00	273.2 ms
Python · pls4all
`pls4all.python`	✓ J 1.00	267.2 ms
`pls4all.sklearn`	⇄ J 0.60	25.1 ms🏆
R · pls4all
`pls4all.R`	⇄ J 0.60	29.8 ms
`pls4all.R.formula`	⇄ J 0.60	30.8 ms
`pls4all.R.mdatools`	⇄ J 0.60	30.6 ms
`pls4all.R.pls`	⇄ J 0.60	30.3 ms
Python · external
📐`ref.python_iriv_numpy_port`	source	275.1 ms

10 threads

Backend	Parity	80×25 (ms)
C++ native · libn4m
`pls4all.cpp.blas+omp`	✓ J 1.00	274.2 ms
Python · pls4all
`pls4all.python`	✓ J 1.00	275.4 ms
`pls4all.sklearn`	⇄ J 0.60	25.4 ms🏆
R · pls4all
`pls4all.R`	⇄ J 0.60	29.9 ms
`pls4all.R.formula`	⇄ J 0.60	30.7 ms
`pls4all.R.mdatools`	⇄ J 0.60	31.0 ms
`pls4all.R.pls`	⇄ J 0.60	30.7 ms
Python · external
📐`ref.python_iriv_numpy_port`	source	270.7 ms
