Title: Model Wrappers for Discriminant Analysis
Version: 1.0.1
Description: Bindings for additional classification models for use with the 'parsnip' package. Models include flavors of discriminant analysis, such as linear (Fisher (1936) <doi:10.1111/j.1469-1809.1936.tb02137.x>), regularized (Friedman (1989) <doi:10.1080/01621459.1989.10478752>), and flexible (Hastie, Tibshirani, and Buja (1994) <doi:10.1080/01621459.1994.10476866>), as well as naive Bayes classifiers (Hand and Yu (2001) <doi:10.1111/j.1751-5823.2001.tb00465.x>).
License: MIT + file LICENSE
URL: https://github.com/tidymodels/discrim, https://discrim.tidymodels.org/
BugReports: https://github.com/tidymodels/discrim/issues
Depends: parsnip (≥ 0.2.0), R (≥ 3.4)
Imports: dials, rlang, stats, tibble, withr
Suggests: covr, dplyr, earth, ggplot2, klaR, knitr, MASS, mda, mlbench, modeldata, naivebayes, rmarkdown, sda, sparsediscrim (≥ 0.3.0), spelling, testthat (≥ 3.0.0), xml2
Config/Needs/website: tidymodels/tidymodels, tidyverse/tidytemplate
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.2.3
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2023-03-08 20:38:54 UTC; emilhvitfeldt
Author: Emil Hvitfeldt [aut, cre], Max Kuhn [aut], Posit Software, PBC [cph, fnd]
Maintainer: Emil Hvitfeldt <emil.hvitfeldt@posit.co>
Repository: CRAN
Date/Publication: 2023-03-08 22:00:15 UTC

parsnip methods for discriminant analysis

Description

discrim offers functions to fit classification models via discriminant analysis.

Details

The model functions work with the tidymodels infrastructure so that the models can be resampled, tuned, tidied, etc.

Example

As an example, we’ll use the flexible discriminant analysis model of Hastie, Tibshirani, and Buja (1994). This fits a model that uses features generated by the multivariate adaptive regression spline (MARS) model of Friedman (1991). It can create class boundaries that are polygons and has built-in feature selection.

The parabolic data from the modeldata package will be used to illustrate:

library(tidymodels)
library(discrim)
tidymodels_prefer()
theme_set(theme_bw())

data(parabolic, package = "modeldata")

To create the model, the discrim_flexible() function is used along with an engine of "earth" (which contains the methods to use the MARS model). We’ll set the number of MARS terms to use, but this can also be tuned via the methods in the tune package.

The fit() function estimates the model. fit_xy() can be used if one does not wish to use the formula method.

fda_mod <-
  discrim_flexible(num_terms = 3) %>%
  # increase `num_terms` to find smoother boundaries
  set_engine("earth") %>%
  fit(class ~ ., data = parabolic)
fda_mod
## parsnip model object
## 
## Call:
## mda::fda(formula = class ~ ., data = data, method = earth::earth, 
##     nprune = ~3)
## 
## Dimension: 1 
## 
## Percent Between-Group Variance Explained:
##  v1 
## 100 
## 
## Training Misclassification Error: 0.136 ( N = 500 )
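The same specification can also be fit without a formula. The following is a minimal sketch, assuming the discrim, earth, mda, and modeldata packages are installed and that the predictors are the X1 and X2 columns used elsewhere in this example; fda_xy is an illustrative name, not part of the package.

```r
library(parsnip)
library(discrim)

data(parabolic, package = "modeldata")

# Same model specification as above, fit through the x/y interface
fda_xy <-
  discrim_flexible(num_terms = 3) %>%
  set_engine("earth") %>%
  fit_xy(x = parabolic[, c("X1", "X2")], y = parabolic$class)

# Hard class predictions for the first few rows
fda_xy_pred <- predict(fda_xy, head(parabolic))
fda_xy_pred
```

The result is a parsnip model object, so predict() and the other tidymodels tools apply to it in the same way as to the formula-based fit.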

Now let’s plot the class boundary by predicting on a grid of points, then drawing a contour line at the 50% probability cutoff.

parabolic_grid <-
  expand.grid(X1 = seq(-5, 5, length.out = 100),
              X2 = seq(-5, 5, length.out = 100))

parabolic_grid <- 
  parabolic_grid %>% 
  bind_cols(
    predict(fda_mod, parabolic_grid, type = "prob")
  )

ggplot(parabolic, aes(x = X1, y = X2)) +
  geom_point(aes(col = class), alpha = .5) +
  geom_contour(data = parabolic_grid, aes(z = .pred_Class1), col = "black", breaks = .5) +
  coord_equal()

Author(s)

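Maintainer: Emil Hvitfeldt <emil.hvitfeldt@posit.co>

Authors:

Max Kuhn

Other contributors:

Posit Software, PBC [copyright holder, funder]

See Also

Useful links:

https://github.com/tidymodels/discrim

https://discrim.tidymodels.org/

Report bugs at https://github.com/tidymodels/discrim/issues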

Wrapper for sparsediscrim models

Description

Wrapper for sparsediscrim models

Usage

fit_regularized_linear(x, y, method = "diagonal", ...)

fit_regularized_quad(x, y, method = "diagonal", ...)

Arguments

x

A matrix or data frame.

y

A factor column.

method

A character string.

...

Not currently used.

Value

A sparsediscrim object
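As an illustration of the usage above, here is a minimal sketch of calling the wrapper directly. It assumes the sparsediscrim package is installed and uses the built-in iris data purely as a stand-in; diag_fit is an illustrative name. In ordinary use these wrappers are reached through a parsnip model specification rather than called by hand.

```r
library(discrim)

# iris ships with R; its numeric columns serve as stand-in predictors
x <- as.matrix(iris[, 1:4])
y <- iris$Species

# "diagonal" is the documented default value of `method`
diag_fit <- fit_regularized_linear(x, y, method = "diagonal")

# Inspect the class of the fitted object returned by the wrapper
class(diag_fit)
```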


Parameter objects for Regularized Discriminant Models

Description

The documentation for discrim_regularized() describes the effect of frac_common_cov() and frac_identity(). smoothness() is an alias for the adjust parameter in stats::density().

Usage

frac_common_cov(range = c(0, 1), trans = NULL)

frac_identity(range = c(0, 1), trans = NULL)

smoothness(range = c(0.5, 1.5), trans = NULL)

Arguments

range

A two-element vector holding the defaults for the smallest and largest possible values, respectively.

trans

A trans object from the scales package, such as scales::log10_trans() or scales::reciprocal_trans(). If not provided, the default is used, which matches the units used in range. Use NULL for no transformation.

Details

These parameters can modulate an RDA model so that its class boundaries range between linear and quadratic.
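To make the linear-to-quadratic idea concrete, the sketch below blends a class-specific covariance matrix with the pooled one and then shrinks the result toward a scaled identity matrix, in the style of Friedman (1989). It uses base R only. The helper blend_cov() is hypothetical, and the mapping of frac_common_cov and frac_identity onto the two blending weights is an assumption for illustration; the underlying engine may compute its estimates differently.

```r
# Hypothetical helper: blend class-specific and pooled covariance estimates,
# then shrink toward an identity matrix scaled by the average variance.
blend_cov <- function(cov_class, cov_pooled, frac_common_cov, frac_identity) {
  p <- ncol(cov_class)
  mixed <- (1 - frac_common_cov) * cov_class + frac_common_cov * cov_pooled
  (1 - frac_identity) * mixed + frac_identity * (sum(diag(mixed)) / p) * diag(p)
}

x <- as.matrix(iris[, 1:4])
cov_setosa <- cov(x[iris$Species == "setosa", ])

# Pooled within-class covariance across the three species
groups <- split(as.data.frame(x), iris$Species)
cov_pooled <- Reduce(`+`, lapply(groups, function(g) (nrow(g) - 1) * cov(g))) /
  (nrow(x) - length(groups))

# Both weights at zero keep the class-specific matrix (quadratic boundaries)
qda_like <- blend_cov(cov_setosa, cov_pooled, frac_common_cov = 0, frac_identity = 0)

# Full weight on the common covariance gives the pooled matrix (linear boundaries)
lda_like <- blend_cov(cov_setosa, cov_pooled, frac_common_cov = 1, frac_identity = 0)

# Full weight on the identity gives a diagonal matrix with equal variances
id_like <- blend_cov(cov_setosa, cov_pooled, frac_common_cov = 0, frac_identity = 1)
```

Values strictly between 0 and 1 give intermediate estimates, which is the region a tuning grid over these two parameters explores.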

Value

A function with classes "quant_param" and "param"

Examples

frac_common_cov()
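Because smoothness() is described as an alias for the adjust argument of stats::density(), a short base R sketch can show what the ends of its default range do to a kernel density estimate. The simulated data and object names are illustrative only.

```r
set.seed(1)
x <- rnorm(200)

d_default <- density(x)                # adjust = 1
d_narrow  <- density(x, adjust = 0.5)  # lower end of the default smoothness() range
d_wide    <- density(x, adjust = 1.5)  # upper end of the default smoothness() range

# The bandwidth actually used is adjust times the default bandwidth
c(narrow = d_narrow$bw, default = d_default$bw, wide = d_wide$bw)
```

Smaller values follow the data more closely, while larger values give smoother density estimates.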

Internal wrapper functions

Description

Internal wrapper functions

Usage

klar_bayes_wrapper(x, y, ...)

pred_wrapper(object, new_data, ...)

Parabolic class boundary data

Description

Parabolic class boundary data

Details

These data were simulated. There are two correlated predictors and two classes in the factor outcome.

Value

parabolic

a data frame

Examples

data(parabolic)

library(ggplot2)
ggplot(parabolic, aes(x = X1, y = X2, col = class)) +
  geom_point(alpha = .5) +
  theme_bw()
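The description of the data (two correlated predictors and a two-class factor outcome) can be checked numerically. This is a small sketch, assuming the copy of the data in the modeldata package and the column names X1, X2, and class used in the plot above.

```r
data(parabolic, package = "modeldata")

# Class frequencies for the two-level outcome
table(parabolic$class)

# Sample correlation between the two predictors
cor(parabolic$X1, parabolic$X2)
```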