# `pcr` — Principal Components Regression

_Group_: **Core PLS** · _Registry tolerance_: `1e-06`

## Description

Principal Components Regression

From the `pls4all.sklearn.PCR` docstring:

> Principal Components Regression — fits a least-squares regression on the SVD of X.

> **Registry note** — PCR via SVD on X then linear regression; references are sklearn Pipeline(PCA(svd_solver='full') + LinearRegression) and R `pls::pcr`.

### Parameters

| Name | Type | Default | Notes |
|------|------|---------|-------|
| `n_components` | `int` | `2` | Number of latent components extracted (k). |
| `center_x` | `bool` | `True` | Subtract the column mean of X before fitting. |
| `scale_x` | `bool` | `True` | Standardize X columns to unit variance before fitting. |
| `center_y` | `bool` | `True` | Subtract the column mean of y before fitting. |
| `scale_y` | `bool` | `False` | Standardize y columns to unit variance before fitting. |
| `tol` | `float` | `1e-06` | Convergence tolerance for iterative solvers (NIPALS / power-iteration). |
| `max_iter` | `int` | `500` | Maximum iterations for iterative solvers. |
| `store_scores` | `bool` | `False` | If True, keep the latent score matrix (`x_scores_`) after fit. |

## Explanations

### Bibliographic source

Massy, W. F. (1965). *Principal Components Regression in Exploratory Statistical Research*. JASA 60(309), 234–256.

### Mathematical principle

PCR sidesteps the multicollinearity of $\mathbf{X}$ by regressing on its orthogonal principal-component scores rather than on the raw columns. The factorisation $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}$ (SVD) yields scores $\mathbf{T}_k = \mathbf{U}_k\boldsymbol{\Sigma}_k$ for the top $k$ components, and the regression $\mathbf{Y} = \mathbf{T}_k\mathbf{Q}_k + \mathbf{E}$ is fit by ordinary least squares.
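The factorisation above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the libn4m kernel: `pcr_fit` is a hypothetical helper, $\mathbf{X}$ and $\mathbf{Y}$ are only centred (no scaling), and the data are synthetic. It builds the scores $\mathbf{T}_k = \mathbf{U}_k\boldsymbol{\Sigma}_k$, solves the least-squares problem on them, and maps the result back to coefficients on the centred columns of $\mathbf{X}$.

```python
import numpy as np


def pcr_fit(X, Y, k):
    """Illustrative PCR fit with k components (hypothetical helper, centring only)."""
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean

    # Thin SVD of the centred predictors: Xc = U diag(s) Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T

    # Scores T_k = U_k Sigma_k (columns are mutually orthogonal)
    T_k = U_k * s_k

    # OLS on orthogonal scores has a closed form: Q_k = Sigma_k^{-1} U_k^T Yc
    Q_k = (U_k.T @ Yc) / s_k[:, None]

    # Coefficients on the centred columns of X: B = V_k Q_k
    B = V_k @ Q_k
    return B, x_mean, y_mean, T_k


# Synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
Y = X @ rng.normal(size=(6, 1)) + 0.01 * rng.normal(size=(40, 1))

B, x_mean, y_mean, T_k = pcr_fit(X, Y, k=3)
Y_hat = (X - x_mean) @ B + y_mean

# Cross-check: a generic least-squares solve on the scores gives the same fit
Q_ref, *_ = np.linalg.lstsq(T_k, Y - y_mean, rcond=None)
fit_gap = float(np.max(np.abs(T_k @ Q_ref + y_mean - Y_hat)))
print(f"max gap between closed form and lstsq on scores: {fit_gap:.2e}")
```

When $k$ equals the rank of the centred $\mathbf{X}$, the coefficients coincide with ordinary least squares; a smaller $k$ discards the low-variance directions.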

Unlike PLS, PCR is **unsupervised in its dimensionality reduction**: the first $k$ directions maximise the variance of $\mathbf{X}$ regardless of how relevant they are to $\mathbf{Y}$. This makes PCR a useful baseline for diagnosing whether the predictive directions in a calibration set coincide with the high-variance directions (in which case PCR ≈ PLS) or not (in which case PLS typically predicts better at the same $k$).

Because the columns of $\mathbf{U}_k$ are orthonormal, $\mathbf{T}_k^{\top}\mathbf{T}_k = \boldsymbol{\Sigma}_k^{2}$, so the least-squares solution is $\mathbf{Q}_k = \boldsymbol{\Sigma}_k^{-2}\mathbf{T}_k^{\top}\mathbf{Y} = \boldsymbol{\Sigma}_k^{-1}\mathbf{U}_k^{\top}\mathbf{Y}$. Coefficients in the original feature space are recovered as $\mathbf{B} = \mathbf{V}_k\mathbf{Q}_k = \mathbf{V}_k \boldsymbol{\Sigma}_k^{-1} \mathbf{U}_k^{\top}\mathbf{Y} = \mathbf{V}_k \boldsymbol{\Sigma}_k^{-2} \mathbf{T}_k^{\top}\mathbf{Y}$.

Total cost is dominated by the SVD: $O(np\min(n,p))$ for a full decomposition, or roughly $O(npk)$ with a truncated method (Lanczos, randomised SVD).

### Implementation

`Algorithm.PCR` + `Solver.SVD` in libn4m. Reference implementations are scikit-learn's `make_pipeline(PCA(n_components=k, svd_solver='full'), LinearRegression())` and R `pls::pcr`.

MATLAB header (`bindings/matlab/+pls4all/PcrRegression.m`):

```text
pls4all.PcrRegression — Principal Component Regression model.

Example:
    mdl = pls4all.PcrRegression(X, y, 5);
    yhat = predict(mdl, Xnew);
```

### Usage

Every pls4all binding tab dispatches into the same C kernel; the external libraries listed at the bottom of the page are the parity references registered in `benchmarks.parity_timing.registry`. Switch tabs to read the same fit in your language. The R package now ships drop-in-compatible facades for the CRAN `pls` package (`plsr`, `pcr`, `mvr`) and for the `mdatools::pls(x, y, ...)` matrix idiom; those tabs appear only on the methods that have a meaningful equivalence.
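
For orientation before the binding tabs, the scikit-learn parity reference named under Implementation can be written out as a short, self-contained script. This is a sketch on synthetic data, not the registry's benchmark harness. It assumes only public scikit-learn and NumPy calls, and it compares the pipeline against the same fit written with a thin SVD. Note that `PCA` centres X but does not standardise it, so this presumably corresponds to `center_x=True, scale_x=False` rather than the pls4all defaults.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic calibration data, purely for illustration
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + 0.05 * rng.normal(size=60)
X_train, X_test = X[:45], X[45:]
y_train = y[:45]
k = 4

# Reference pipeline: PCA (centres X, no scaling) followed by OLS on the k scores
ref = make_pipeline(PCA(n_components=k, svd_solver="full"), LinearRegression())
ref.fit(X_train, y_train)
y_ref = ref.predict(X_test)

# The same fit written out with a thin SVD of the centred training matrix
x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
U, s, Vt = np.linalg.svd(X_train - x_mean, full_matrices=False)
coef = Vt[:k].T @ ((U[:, :k].T @ (y_train - y_mean)) / s[:k])
y_svd = (X_test - x_mean) @ coef + y_mean

# Component signs may differ between the two routes, but predictions do not
max_abs_diff = float(np.max(np.abs(y_ref - y_svd)))
print(f"max |pipeline - SVD| on the test rows: {max_abs_diff:.2e}")
```

Comparing predictions rather than loadings avoids the sign ambiguity of singular vectors, which is why a parity check of this kind is usually phrased in terms of predicted values.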

**pls4all bindings**

::::{tab-set}
:class: pls4all-bindings

:::{tab-item} C ABI · libn4m
:sync: c
:class-label: lang-c

```c
/* C ABI — libn4m direct MethodResult path */
n4m_context_t* ctx = n4m_context_create();
n4m_config_t* cfg = n4m_config_create();
n4m_config_set_n_components(cfg, 4);

n4m_method_result_t* res = NULL;
n4m_pcr_fit(ctx, cfg, &x_view, &y_view, &res);
/* res contains coefficients, predictions, x_mean/x_scale, y_mean/y_scale,
 * weights_w, loadings_p, rotations_r, rmse and n_components. */

n4m_method_result_destroy(res);
n4m_config_destroy(cfg);
n4m_context_destroy(ctx);
```
:::

:::{tab-item} Python · pls4all (raw)
:sync: python-raw
:class-label: lang-python

```python
import pls4all
from pls4all import Algorithm, Solver

with pls4all.Context() as ctx, pls4all.Config() as cfg:
    cfg.algorithm = Algorithm.PCR
    cfg.solver = Solver.SVD
    cfg.n_components = 4
    with pls4all.Model.fit(ctx, cfg, X, y) as mdl:
        y_hat = mdl.predict(X_test)
```
:::

:::{tab-item} Python · n4m direct
:sync: python-n4m
:class-label: lang-python

```python
import n4m
from n4m.sklearn import NativePCRRegressor

res = n4m.pcr(X, y, n_components=4, scale_x=True)
mdl = NativePCRRegressor(n_components=4, scale_x=True).fit(X, y)
y_hat = mdl.predict(X_test)
```
:::

:::{tab-item} Python · pls4all.sklearn
:sync: python-sklearn
:class-label: lang-python

```python
from pls4all.sklearn import PCR

mdl = PCR(
    n_components=2,
    center_x=True,
    scale_x=True,
    center_y=True,
    scale_y=False,
    tol=1e-06,
    max_iter=500,
    store_scores=False,
)
mdl.fit(X, y)
y_hat = mdl.predict(X_test)
```
:::

:::{tab-item} R · pls4all_method()
:sync: r-dispatcher
:class-label: lang-r

```r
library(pls4all)
# Unified low-level dispatcher (May 2026 R cleanup):
res <- pls4all_method("pcr", X, y, n_components = 4L)
# res is a named list with MethodResult arrays/scalars.
# selected_indices / top_k_intervals are 1-based.
```
:::

:::{tab-item} R · `pls` package compat
:sync: r-pls-compat
:class-label: lang-r

```r
library(pls4all)
# Drop-in for CRAN `pls::pcr` (same signature).
fit <- pcr(y ~ ., ncomp = 4L, data = train, validation = "CV", segments = 10L)
yhat <- predict(fit, newdata = test, ncomp = 4L)
RMSEP(fit)
```
:::

:::{tab-item} R · `mdatools` compat
:sync: r-mdatools
:class-label: lang-r

```r
library(pls4all)
# Drop-in for `mdatools::pls(x, y, ncomp, method = "pcr")`.
fit <- pls_mdatools(X, y, ncomp = 4L, method = "pcr", center = TRUE, scale = FALSE)
yhat <- predict(fit, newdata = X_test, ncomp = 4L)
```
:::

:::{tab-item} MATLAB · pls4all (MEX)
:sync: matlab-mex
:class-label: lang-matlab

```matlab
res = pls4all.pcr(X, y, 4);
% see header of bindings/matlab/+pls4all/pcr.m for full
% parameter surface:
%   [coefs, x_mean, y_mean, predictions] = pcr(X, Y, n_components)
yhat = predict(res, Xtest);
```
:::

:::{tab-item} MATLAB · pls4all (classdef)
:sync: matlab-classdef
:class-label: lang-matlab

```matlab
mdl = pls4all.fit("pcr", X, y, "NumComponents", 4);
yhat = predict(mdl, Xtest);
```
:::

::::

**Registry parity references** 📐

:::{card}
:class-card: external-refs

- 📐 **`ref.python_scikit_learn`** (python · python) — `scikit-learn` 1.8.0 · strict (rmse_rel ≤ 1e-06) — sklearn Pipeline(PCA(svd_solver='full') + LinearRegression).
- 📐 **`ref.r_pls`** (R · r) — `pls` 2.8.5 · strict (rmse_rel ≤ 1e-06) — R pls::pcr(scale=FALSE).
:::

### Benchmarks

Adaptive wall-clock per cell measured against [`full_matrix.csv`](../benchmarks/overview.md). Only backends that implement this method are listed; libraries without the method are omitted.

**Verdict**  ·  ✓ ref / ≈ ref / ~ shape mark a reference-gate pass at strict / relaxed / qualitative tolerance  ·  ✓ bind = pls4all binding agrees with the C++ baseline  ·  ✗ divergent  ·  ⚠ error  ·  — not run. The fastest backend per column is marked 🏆.

**Reference gate**: strict — numeric equivalence (`rmse_rel_tol ≤ 1e-06`).
Rows tagged with **📐** are the canonical parity references for this method (declared in [`parity_timing.registry`](../benchmarks/methodology.md)). C++ and external rows show reference parity; pls4all language bindings show binding parity against the C++ backend. Hover the icon for role and tolerance band.

::::{tab-set}
:class: parity-tabs

:::{tab-item} 1 thread
:sync: threads-1

Size columns, left to right (each headed in ms): 50×250, 100×50, 100×500, 100×2500, 200×50, 250×50, 500×50, 500×500, 500×2500, 2500×50, 2500×500, 2500×2500, 10000×50, 10000×500.

| Backend | Parity | Timings (non-empty cells, left to right) |
|---------|--------|------------------------------------------|
| **C++ native · libn4m** | | |
| pls4all.cpp.blas | ≈ +9e-12 | 108.6 ms · 1.86 ms · 1.2 s · 195.9 s 🏆 · 2.60 ms 🏆 · 3.60 ms · 10.3 ms · 2.9 s · 709.4 s 🏆 · 46.6 ms · 3.0 s 🏆 · 171.2 ms · 4.1 s |
| pls4all.cpp.blas+omp | ≈ +9e-12 | 107.0 ms · 1.82 ms 🏆 · 1.4 s · 209.9 s · 2.77 ms · 3.34 ms · 9.78 ms 🏆 · 2.8 s 🏆 · 742.2 s · 42.7 ms 🏆 · 3.2 s · 169.0 ms 🏆 · 4.0 s 🏆 |
| pls4all.cpp.omp | ≈ +9e-12 | 109.0 ms · 1.90 ms · 1.4 s · 205.8 s · 2.70 ms · 3.55 ms · 10.9 ms · 2.9 s · 713.8 s · 43.0 ms · 3.3 s · 173.2 ms · 5.2 s |
| pls4all.cpp.ref | ≈ +9e-12 | 109.0 ms · 1.97 ms · 1.2 s 🏆 · 212.3 s · 2.68 ms · 3.69 ms · 11.0 ms · 2.9 s · 719.9 s · 49.8 ms · 3.3 s · 178.3 ms · 5.4 s |
| **Python · pls4all** | | |
| pls4all.python | ✓ bind | 108.1 ms · 2.71 ms · 3.33 ms 🏆 |
| pls4all.sklearn | ✓ bind | 108.7 ms · 3.48 ms · 4.05 ms |
| **R · pls4all** | | |
| pls4all.R | ✓ 2e-11 | 2.7 s · 7.98 ms · 26.2 ms |
| pls4all.R.formula | ✓ 2e-11 | 2.7 s · 12.6 ms · 28.0 ms |
| pls4all.R.mdatools | ✓ 2e-11 | 2.7 s · 10.3 ms · 28.8 ms |
| pls4all.R.pls | ✓ 2e-11 | 2.7 s · 16.4 ms · 34.9 ms |
| **MATLAB · pls4all** | | |
| pls4all.matlab | ✗ +3e+00 | 117.3 ms · 4.23 ms · 4.85 ms |
| pls4all.matlab.classdef | ✗ +3e+00 | 119.2 ms · 4.83 ms · 5.90 ms |
| **Python · external** | | |
| 📐 ref.python_scikit_learn | source | 4.09 ms 🏆 · 3.28 ms · 4.69 ms |
| **R · external** | | |
| 📐 ref.r_pls | ✗ +1e+00 | 25.9 ms · 12.6 ms · 15.8 ms |

:::

:::{tab-item} 3 threads
:sync: threads-3

Size columns, left to right (each headed in ms): 50×250, 100×50, 100×500, 100×2500, 200×50, 250×50, 500×50, 500×500, 500×2500, 2500×50, 2500×500, 2500×2500, 10000×50, 10000×500.

| Backend | Parity | Timings (non-empty cells, left to right) |
|---------|--------|------------------------------------------|
| **C++ native · libn4m** | | |
| pls4all.cpp.blas | ✓ ref 3e-12 | 3.46 ms |
| pls4all.cpp.blas+omp | ✓ ref 3e-12 | 3.29 ms |
| pls4all.cpp.omp | ✓ ref 3e-12 | 3.39 ms |
| pls4all.cpp.ref | ✓ ref 3e-12 | 3.09 ms |
| **Python · pls4all** | | |
| pls4all.python | ✓ 2e-15 | 2.76 ms 🏆 |
| pls4all.sklearn | ✓ 2e-15 | 3.29 ms |
| **R · pls4all** | | |
| pls4all.R | ✓ bind | 6.81 ms |
| pls4all.R.formula | ✓ bind | 9.69 ms |
| pls4all.R.mdatools | ✓ bind | 9.28 ms |
| pls4all.R.pls | ✓ bind | 15.7 ms |
| **MATLAB · pls4all** | | |
| pls4all.matlab | ✗ +3e+00 | 3.87 ms |
| pls4all.matlab.classdef | ✗ +3e+00 | 5.14 ms |
| **Python · external** | | |
| 📐 ref.python_scikit_learn | source | 3.45 ms |
| **R · external** | | |
| 📐 ref.r_pls | ✗ +1e+00 | 12.6 ms |

:::

:::{tab-item} 10 threads
:sync: threads-10

Size columns, left to right (each headed in ms): 50×250, 100×50, 100×500, 100×2500, 200×50, 250×50, 500×50, 500×500, 500×2500, 2500×50, 2500×500, 2500×2500, 10000×50, 10000×500.

| Backend | Parity | Timings (non-empty cells, left to right) |
|---------|--------|------------------------------------------|
| **C++ native · libn4m** | | |
| pls4all.cpp.blas | ✓ ref 3e-12 | 2.55 ms |
| pls4all.cpp.blas+omp | ✓ ref 3e-12 | 2.47 ms 🏆 |
| pls4all.cpp.omp | ✓ ref 3e-12 | 2.62 ms |
| pls4all.cpp.ref | ✓ ref 3e-12 | 2.54 ms |
| **Python · pls4all** | | |
| pls4all.python | ✓ 2e-15 | 2.53 ms |
| pls4all.sklearn | ✓ 2e-15 | 2.76 ms |
| **R · pls4all** | | |
| pls4all.R | ✓ bind | 5.92 ms |
| pls4all.R.formula | ✓ bind | 7.09 ms |
| pls4all.R.mdatools | ✓ bind | 7.52 ms |
| pls4all.R.pls | ✓ bind | 12.5 ms |
| **MATLAB · pls4all** | | |
| pls4all.matlab | ✗ +3e+00 | 3.72 ms |
| pls4all.matlab.classdef | ✗ +3e+00 | 3.97 ms |
| **Python · external** | | |
| 📐 ref.python_scikit_learn | source | 5.56 ms |
| **R · external** | | |
| 📐 ref.r_pls | ✗ +1e+00 | 13.6 ms |

:::

::::

---

_See also_: [benchmark overview](../benchmarks/overview.md) · [methods index](index.md) · [interactive dashboard](../landing/dashboard.md)