Upcoming Features in corrselect 2.1.0

This vignette introduces features that will be available in version 2.1.0 of the corrselect package. These enhancements aim to provide more flexibility and alternative strategies for variable subset selection.

Spectral Method (Prototype)

A new selection strategy based on spectral clustering is currently in development. This approach applies normalized spectral clustering to the correlation matrix to identify sets of weakly correlated variables.

Rationale

Unlike local or exhaustive search algorithms, spectral clustering works from the eigenstructure of the whole correlation matrix, so it provides a global approximation that can quickly identify candidate subsets with low internal correlation.

Overview of Steps

The algorithm follows these steps:

  1. Similarity matrix from absolute correlations: \(S_{ij} = 1 - |r_{ij}|\), so weakly correlated pairs receive high similarity
  2. Degree matrix: \(D = \mathrm{diag}(d_1, \dots, d_p)\), where \(d_i = \sum_j S_{ij}\)
  3. Normalized Laplacian: \(L = I - D^{-1/2} S D^{-1/2}\)
  4. Eigendecomposition of \(L\)
  5. K-means clustering in the reduced eigenvector space
  6. Validation of each cluster based on correlation threshold and forced variables
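
The steps above can be sketched in base R. This is only an illustration of the listed formulas, not the package's implementation: the number of clusters `k`, the choice of the eigenvectors belonging to the k smallest eigenvalues, the strict `<` comparison against the threshold, and the helper `max_abs_r()` are all assumptions made for the example. Forced variables are not handled here.

```r
# Illustrative base-R sketch of steps 1-6 (assumptions noted inline);
# this is NOT the corrselect implementation.
set.seed(1)
mat <- matrix(rnorm(100), ncol = 10)
colnames(mat) <- paste0("V", 1:10)
cmat <- cor(mat)

k <- 3            # assumed number of clusters, chosen only for illustration
threshold <- 0.5  # same threshold as in the vignette examples

# 1. Similarity: weakly correlated pairs get values close to 1
S <- 1 - abs(cmat)   # diagonal is 0 because |r_ii| = 1

# 2. Degrees: row sums of the similarity matrix
d <- rowSums(S)

# 3. Normalized Laplacian L = I - D^{-1/2} S D^{-1/2}
D_inv_sqrt <- diag(1 / sqrt(d))
L <- diag(nrow(S)) - D_inv_sqrt %*% S %*% D_inv_sqrt

# 4. Eigendecomposition; eigen() sorts eigenvalues in decreasing order,
#    so the k smallest are in the last k columns (assumed embedding)
eig <- eigen(L, symmetric = TRUE)
p <- ncol(L)
U <- eig$vectors[, (p - k + 1):p, drop = FALSE]

# 5. K-means on the rows of the reduced eigenvector matrix
cl <- kmeans(U, centers = k, nstart = 10)$cluster
clusters <- split(colnames(cmat), cl)

# 6. Threshold check only: keep clusters whose largest pairwise |r|
#    is below the threshold (hypothetical helper, strict "<" assumed)
max_abs_r <- function(vars) {
  if (length(vars) < 2) return(0)   # a single variable has no pairs
  sub <- abs(cmat[vars, vars])
  max(sub[upper.tri(sub)])
}
valid <- Filter(function(vars) max_abs_r(vars) < threshold, clusters)
```

Here `clusters` is a list of variable names per cluster, and `valid` keeps only the clusters that pass the threshold check. Because k-means is randomly initialized, cluster membership depends on the seed.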

Basic Example

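# Simulate 10 observations of 10 variables and compute their correlation matrix
set.seed(1)
mat <- matrix(rnorm(100), ncol = 10)
colnames(mat) <- paste0("V", 1:10)
cmat <- cor(mat)

# Run the prototype spectral method with a correlation threshold of 0.5
res <- MatSelect(cmat, threshold = 0.5, method = "spectral")
res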

Customizing the Number of Clusters

You can pass an integer k to override the default number of clusters:

res <- MatSelect(cmat, threshold = 0.5, method = "spectral", k = 4)

Note that this method is still under testing and might change before release.

Availability

This feature will be available in version 2.1.0. If you’re interested in testing it early, you can install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("gcol33/corrselect")

I welcome feedback and suggestions via GitHub issues or direct contact.