collinear 3.0.0

Breaking Changes

API Changes

Renamed Functions

| Old Name (v2.0) | New Name (v3.0) |
|---|---|
| identify_predictors() | Split into identify_valid_variables(), identify_numeric_variables(), identify_categorical_variables(), identify_logical_variables() |
| identify_predictors_categorical() | identify_categorical_variables() |
| identify_predictors_numeric() | identify_numeric_variables() |
| identify_predictors_zero_variance() | identify_zero_variance_variables() |
| identify_predictors_type() | Removed (merged into identify_valid_variables()) |

Renamed f_ Functions for Preference Order

| Old Name (v2.0) | New Name (v3.0) |
|---|---|
| f_r2_glm_gaussian() | f_numeric_glm() |
| f_r2_gam_gaussian() | f_numeric_gam() |
| f_r2_rf() | f_numeric_rf() |
| f_r2_glm_poisson() | f_count_glm() |
| f_r2_gam_poisson() | f_count_gam() |
| f_auc_glm_binomial() | f_binomial_glm() |
| f_auc_gam_binomial() | f_binomial_gam() |
| f_auc_rf_binomial() | f_binomial_rf() |
| f_v_rf() | f_categorical_rf() |
| (none) | f_count_rf() (new) |
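
The one-to-one renames listed above can be applied to existing scripts mechanically. The following is a minimal base R sketch of that idea, not a tool shipped with the package: `migrate_names()` and the `renames` vector are hypothetical names introduced here for illustration. `identify_predictors()` (split into four functions) and `identify_predictors_type()` (removed) are deliberately left out, since they have no single replacement and need manual review.

```r
# One-to-one renames only, copied from the rename tables in this changelog.
renames <- c(
  identify_predictors_categorical   = "identify_categorical_variables",
  identify_predictors_numeric       = "identify_numeric_variables",
  identify_predictors_zero_variance = "identify_zero_variance_variables",
  f_r2_glm_gaussian  = "f_numeric_glm",
  f_r2_gam_gaussian  = "f_numeric_gam",
  f_r2_rf            = "f_numeric_rf",
  f_r2_glm_poisson   = "f_count_glm",
  f_r2_gam_poisson   = "f_count_gam",
  f_auc_glm_binomial = "f_binomial_glm",
  f_auc_gam_binomial = "f_binomial_gam",
  f_auc_rf_binomial  = "f_binomial_rf",
  f_v_rf             = "f_categorical_rf"
)

# Hypothetical helper: rewrite old function names in lines of R code.
# Word boundaries (\\b) keep identify_predictors_numeric from matching
# inside a longer identifier, and leave identify_predictors() untouched.
migrate_names <- function(code, map = renames) {
  for (old in names(map)) {
    pattern <- paste0("\\b", old, "\\b")
    code <- gsub(pattern, map[[old]], code, perl = TRUE)
  }
  code
}

# Example input: lines of v2.0 code as text (arguments elided with "...").
old_code <- c(
  "numeric_vars <- identify_predictors_numeric(...)",
  "f <- f_r2_rf"
)

new_code <- migrate_names(old_code)
print(new_code)
```

In practice the input would come from `readLines()` on a script file, and the result should be reviewed before it is written back, because a textual replacement cannot tell whether arguments or return values also changed between versions.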

Major New Features

Adaptive Multicollinearity Thresholds

When both max_cor = NULL and max_vif = NULL, the function now determines the filtering thresholds automatically from the data.

This data-driven approach adapts to each dataset’s correlation structure, preventing over-filtering while maintaining statistically meaningful bounds.
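
As background for why a correlation threshold and a VIF threshold can be related at all, the sketch below uses two standard identities: the VIF values of a set of predictors are the diagonal of the inverse of their correlation matrix, and for exactly two predictors with correlation r the VIF is 1 / (1 - r^2). This is a base R illustration on simulated data, not the package's adaptive threshold algorithm; `vif_from_cor_pair()` and the variable names are hypothetical.

```r
set.seed(1)
n <- 500

# Simulated predictors: x2 is built to correlate with x1, x3 is independent.
x1 <- rnorm(n)
x2 <- 0.8 * x1 + sqrt(1 - 0.8^2) * rnorm(n)
x3 <- rnorm(n)
df <- data.frame(x1 = x1, x2 = x2, x3 = x3)

# VIF for every predictor at once: diagonal of the inverse correlation matrix.
cor_matrix <- cor(df)
vif <- diag(solve(cor_matrix))

# Same quantity from its regression definition, VIF_j = 1 / (1 - R^2_j),
# where R^2_j comes from regressing predictor j on the other predictors.
r2_x1 <- summary(lm(x1 ~ x2 + x3, data = df))$r.squared
vif_x1_lm <- 1 / (1 - r2_x1)

# Hypothetical helper for the two-predictor case only.
vif_from_cor_pair <- function(r) 1 / (1 - r^2)

print(round(vif, 3))
print(round(vif_x1_lm, 3))
print(round(vif_from_cor_pair(c(0.5, 0.75, 0.9)), 2))  # 1.33 2.29 5.26
```

With more than two predictors the pairwise formula is only a lower bound for the VIF, because several weak correlations can add up to a high R^2, which is one reason a fixed pair of thresholds may not suit every dataset.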

Tidymodels Integration

Cross-Validation Support in Preference Order

Rich Output Structure

collinear() now returns a comprehensive, structured result object.

S3 methods print() and summary() for the collinear_output and collinear_selection classes provide clean output formatting.

Correlation Matrix Improvements


New Functions

Multicollinearity Assessment

Preference Order

S3 Methods

New Datasets and Models

| Name | Description |
|---|---|
| experiment_adaptive_thresholds | Validation experiment results (10,000 iterations) |
| experiment_cor_vs_vif | Correlation vs VIF equivalence experiment results |
| gam_cor_to_vif | Fitted GAM for mapping max_cor to max_vif |
| prediction_cor_to_vif | Look-up table for threshold equivalence |
| toy | Simple dataset illustrating multicollinearity concepts |
| vi_smol | Smaller version of the vi dataset (610 rows) for faster examples |
| vi_responses | Character vector of response variable names |

Improvements

VIF Computation

Validation

Documentation


Bug Fixes


Deprecated


collinear 2.0.0

Main Improvements

  1. Expanded Functionality: Functions collinear() and preference_order() support both categorical and numeric responses and predictors, and can handle several responses at once.

  2. Robust Selection Algorithms: Enhanced selection in vif_select() and cor_select().

  3. Enhanced Functionality to Rank Predictors: New functions compute the association between response and predictors for most use cases, and the appropriate function is selected automatically from the features of the data.

  4. Simplified Target Encoding: Streamlined and parallelized for better efficiency; the new default method is "loo" (leave-one-out).

  5. Parallelization and Progress Bars: Uses the future and progressr packages for parallel execution and progress reporting.


collinear 1.1.1