Title: | Fair Data Adaptation with Quantile Preservation |
Description: | An implementation of the fair data adaptation with quantile preservation described in Plecko & Meinshausen (JMLR 2020, 21(242), 1-44). The adaptation procedure uses the specified causal graph to pre-process the given training and testing data in such a way as to remove the bias caused by the protected attribute. The procedure uses tree ensembles for quantile regression. Instructions for using the methods are further elaborated in the corresponding JSS manuscript, see <doi:10.18637/jss.v110.i04>. |
Version: | 1.0.0 |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
URL: | https://github.com/dplecko/fairadapt |
BugReports: | https://github.com/dplecko/fairadapt/issues |
Depends: | R (≥ 3.5.0) |
Imports: | ranger (≥ 0.13.1), assertthat, quantreg, qrnn, igraph, ggplot2, cowplot, scales |
Suggests: | testthat (≥ 3.0.3), knitr, rmarkdown, rticles, mvtnorm, magick, ggraph, pdftools, microbenchmark, xtable, spelling |
RoxygenNote: | 7.3.1 |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-09-05 16:14:37 UTC; pleckod |
Author: | Drago Plecko [aut, cre], Nicolas Bennett [aut] |
Maintainer: | Drago Plecko <www.plecko@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-09-06 12:50:08 UTC |
fairadapt: Fair Data Adaptation with Quantile Preservation
Description
An implementation of the fair data adaptation with quantile preservation described in Plecko & Meinshausen (JMLR 2020, 21(242), 1-44). The adaptation procedure uses the specified causal graph to pre-process the given training and testing data in such a way as to remove the bias caused by the protected attribute. The procedure uses tree ensembles for quantile regression. Instructions for using the methods are further elaborated in the corresponding JSS manuscript, see doi:10.18637/jss.v110.i04.
Author(s)
Maintainer: Drago Plecko www.plecko@gmail.com
Authors:
Nicolas Bennett nicolas.bennett@stat.math.ethz.ch
See Also
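Useful links:
- https://github.com/dplecko/fairadapt
- Report bugs at https://github.com/dplecko/fairadapt/issues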
Convenience function for returning adapted data
Description
Convenience function for returning adapted data
Usage
adaptedData(x, train = TRUE)
## S3 method for class 'fairadapt'
adaptedData(x, train = TRUE)
## S3 method for class 'fairadaptBoot'
adaptedData(x, train = TRUE)
Arguments
x |
Object of class fairadapt or fairadaptBoot. |
train |
A logical indicating whether train data should be returned.
Defaults to TRUE. |
Value
Either a data.frame when called on a fairadapt object, or a list of length n.boot of data.frames with the adapted data when called on a fairadaptBoot object.
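As a brief illustration of the two return modes, the following sketch reuses the university admission setup from the package's own examples and extracts the adapted training and test data. The object names adapted_train and adapted_test are chosen here for illustration only.

```r
library(fairadapt)

# Causal graph from the package examples: entry [i, j] = 1 encodes an edge i -> j
uni_dim <- c("gender", "edu", "test", "score")
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

uni_ada <- fairadapt(score ~ .,
  train.data = head(uni_admission, n = 200),
  test.data = tail(uni_admission, n = 200),
  adj.mat = uni_adj,
  prot.attr = "gender"
)

# train = TRUE is the default and returns the adapted training data
adapted_train <- adaptedData(uni_ada)

# train = FALSE returns the adapted test data instead
adapted_test <- adaptedData(uni_ada, train = FALSE)

head(adapted_train)
```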
Plotting data before and after adaptation
Description
Plotting data before and after adaptation
Usage
## S3 method for class 'fairadapt'
autoplot(object, when = "after", ...)
Arguments
object |
An object of class fairadapt. |
when |
A |
... |
In this case ignored. |
Value
A ggplot for visualizing the distribution of the outcome before/after the adaptation procedure.
COMPAS dataset
Description
A real dataset from Broward County, Florida. Contains information on individuals released on parole, and whether they reoffended within two years.
Usage
compas
Format
A data frame with 1,000 rows and 9 variables:
- sex
sex of the individual
- age
age, measured in years
- race
race, binary with values Non-White and White
- juv_fel_count
count of juvenile felonies
- juv_misd_count
count of juvenile misdemeanors
- juv_other_count
count of other juvenile offenses
- priors_count
count of prior offenses
- c_charge_degree
degree of charge, with two values, F (felony) and M (misdemeanor)
- two_year_recid
a logical TRUE/FALSE indicator of recidivism within two years after parole start
Compute quantiles generic for the quantile learning step
Description
Compute quantiles generic for the quantile learning step
Usage
computeQuants(x, data, newdata, ind, ...)
Arguments
x |
Object with an associated |
data |
|
newdata |
|
ind |
A |
... |
Additional arguments to be passed down to respective method functions. |
Value
A vector of counterfactual values corresponding to newdata.
Fair twin inspection convenience function
Description
Fair twin inspection convenience function
Usage
fairTwins(x, train.id = seq_len(nrow(x$train)), test.id = NULL, cols = NULL)
Arguments
x |
Object of class |
train.id |
A vector of indices specifying which rows of the training data should be displayed. |
test.id |
A vector of indices specifying which rows of the test data should be displayed. |
cols |
A |
Value
A data.frame containing the original and adapted values of the requested individuals. Adapted columns have _adapted appended to their original name.
Examples
n_samp <- 200
uni_dim <- c("gender", "edu", "test", "score")

# Adjacency matrix of the causal graph: entry [i, j] = 1 encodes an edge i -> j
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

uni_ada <- fairadapt(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  test.data = tail(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender"
)

# Compare original and adapted values for the first five training rows
fairTwins(uni_ada, train.id = 1:5)
Fair data adaptation (fairadapt)
Description
Implementation of fair data adaptation with quantile preservation (Plecko & Meinshausen, 2020). Uses only plain R.
Usage
fairadapt(
formula,
prot.attr,
adj.mat,
train.data,
test.data = NULL,
cfd.mat = NULL,
top.ord = NULL,
res.vars = NULL,
quant.method = rangerQuants,
visualize.graph = FALSE,
eval.qfit = NULL,
...
)
## S3 method for class 'fairadapt'
print(x, ...)
Arguments
formula |
Object of class |
prot.attr |
A value of class |
adj.mat |
Matrix of class |
train.data , test.data |
Training data & testing data, both of class
|
cfd.mat |
Symmetric matrix of class |
top.ord |
A vector of class |
res.vars |
A vector of class |
quant.method |
A function choosing the method used for quantile
regression. Default value is rangerQuants. |
visualize.graph |
A |
eval.qfit |
Argument indicating whether the quality of the quantile
regression fit should be computed using cross-validation. Default value is
NULL. |
... |
Additional arguments forwarded to the function passed as
|
x |
Object of class fairadapt. |
Details
The procedure takes the training and testing data as an input, together with the causal graph given by an adjacency matrix and the list of resolving variables, which should be kept fixed during the adaptation procedure. The procedure then calculates a fair representation of the data, after which any classification method can be used. There are, however, several valid training options yielding fair predictions, and the best of them can be chosen with cross-validation. For more details we refer the user to the original paper. Most of the running time is due to the quantile regression step using the ranger package.
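The idea of quantile preservation can be illustrated without the package. The following base R sketch is deliberately simplified and is not the package's implementation: it ignores the causal graph and any parent variables, and it uses empirical quantiles of a single variable in place of a fitted quantile regression. The simulated data, the variable names, and the choice of group 0 as the baseline are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical toy data: one feature whose distribution depends on a
# binary protected attribute (0 is arbitrarily treated as the baseline)
n <- 500
a <- rbinom(n, size = 1, prob = 0.5)
x <- rnorm(n, mean = ifelse(a == 1, 1, 0))  # group 1 is shifted upwards

# Step 1: quantile of each non-baseline unit within its own group
F1 <- ecdf(x[a == 1])
u <- F1(x[a == 1])

# Step 2: replace the value by the same quantile of the baseline distribution
x_adapted <- x
x_adapted[a == 1] <- quantile(x[a == 0], probs = u, names = FALSE)

# The ordering of units within the adapted group is unchanged ...
is_monotone <- !is.unsorted(x_adapted[a == 1][order(x[a == 1])])

# ... while the gap between the group means shrinks
gap_before <- mean(x[a == 1]) - mean(x[a == 0])
gap_after <- mean(x_adapted[a == 1]) - mean(x_adapted[a == 0])
c(before = gap_before, after = gap_after)
```

In the package itself, the quantiles are learned conditionally on the parents of each variable in the causal graph, which is why a quantile regression method is needed.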
Value
An object of class fairadapt, containing the original and adapted training and testing data, together with the causal graph and some additional meta-information.
References
Plecko, D. & Meinshausen, N. (2020). Fair Data Adaptation with Quantile Preservation. Journal of Machine Learning Research, 21(242), 1-44.
Plecko, D., Bennett, N. & Meinshausen, N. (2024). fairadapt: Causal reasoning for fair data pre-processing. Journal of Statistical Software, 110(4). doi:10.18637/jss.v110.i04.
Examples
n_samp <- 200
uni_dim <- c("gender", "edu", "test", "score")

# Adjacency matrix of the causal graph: entry [i, j] = 1 encodes an edge i -> j
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

uni_ada <- fairadapt(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  test.data = tail(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender"
)

# Print the fairadapt object
uni_ada
Fairadapt bootstrap wrapper
Description
The fairadapt() function performs data adaptation, but does so only once. Sometimes it is desirable to repeat this process in order to quantify the uncertainty of the data adaptation that is performed. The wrapper function fairadaptBoot() enables this by running the fairadapt() procedure multiple times and keeping the resulting data transformations in memory. For a worked example of how to use fairadaptBoot() for uncertainty quantification, see the fairadapt vignette.
Usage
fairadaptBoot(
formula,
prot.attr,
adj.mat,
train.data,
test.data = NULL,
cfd.mat = NULL,
top.ord = NULL,
res.vars = NULL,
quant.method = rangerQuants,
keep.object = FALSE,
n.boot = 100,
rand.mode = c("finsamp", "quant", "both"),
test.seed = 2022,
...
)
## S3 method for class 'fairadaptBoot'
print(x, ...)
Arguments
formula |
Object of class |
prot.attr |
A value of class |
adj.mat |
Matrix of class |
train.data , test.data |
Training data & testing data, both of class
|
cfd.mat |
Symmetric matrix of class |
top.ord |
A vector of class |
res.vars |
A vector of class |
quant.method |
A function choosing the method used for quantile
regression. Default value is rangerQuants. |
keep.object |
a |
n.boot |
An integer corresponding to the number of bootstrap iterations. |
rand.mode |
A string, taking values |
test.seed |
a seed for the randomness in breaking quantiles for the
discrete variables. This argument is only relevant when |
... |
Additional arguments forwarded to the function passed as
|
x |
Object of class fairadaptBoot. |
Value
An object of class fairadaptBoot, containing the original and adapted training and testing data, together with the causal graph and some additional meta-information.
References
Plecko, D. & Meinshausen, N. (2020). Fair Data Adaptation with Quantile Preservation. Journal of Machine Learning Research, 21(242), 1-44.
Plecko, D., Bennett, N. & Meinshausen, N. (2024). fairadapt: Causal reasoning for fair data pre-processing. Journal of Statistical Software, 110(4). doi:10.18637/jss.v110.i04.
Examples
n_samp <- 200
uni_dim <- c("gender", "edu", "test", "score")

# Adjacency matrix of the causal graph: entry [i, j] = 1 encodes an edge i -> j
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

# Repeat the adaptation over five bootstrap iterations
uni_ada <- fairadaptBoot(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  test.data = tail(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender",
  n.boot = 5
)

uni_ada
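One way to use the bootstrap repetitions is to check how much the adapted values vary from one repetition to the next. The sketch below is an illustration rather than the workflow of the vignette: it relies on adaptedData() returning one data.frame per repetition, and on the assumption that each adapted test data.frame contains a numeric edu column. The names uni_boot, boot_test, edu_mat and edu_sd are hypothetical.

```r
library(fairadapt)

uni_dim <- c("gender", "edu", "test", "score")
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

uni_boot <- fairadaptBoot(score ~ .,
  train.data = head(uni_admission, n = 200),
  test.data = tail(uni_admission, n = 200),
  adj.mat = uni_adj,
  prot.attr = "gender",
  n.boot = 5
)

# One adapted test data.frame per bootstrap repetition
boot_test <- adaptedData(uni_boot, train = FALSE)

# Adapted edu values: rows are test units, columns are repetitions
# (assumes the adapted data keep the edu column)
edu_mat <- sapply(boot_test, function(d) d$edu)

# Per-unit standard deviation across repetitions as a rough stability measure
edu_sd <- apply(edu_mat, 1, sd)
summary(edu_sd)
```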
Census information of US government employees
Description
The dataset contains various demographic, education and work information of the employees of the US government. The data is taken from the 2018 US Census data.
Usage
gov_census
Format
A data frame with 204,309 rows and 17 variables:
- sex
gender of the employee
- age
employee age in years
- race
race of the employee
- hispanic_origin
indicator of hispanic origin
- citizenship
citizenship of the employee
- nativity
indicator of nativity to the US
- marital
marital status
- family_size
size of the employee's family
- children
number of children of the employee
- education_level
education level measured in years
- english_level
- salary
yearly salary in US dollars
- hours_worked
hours worked every week
- weeks_worked
weeks worked in the given year
- occupation
occupation classification
- industry
industry classification
- economic_region
economic region where the person is employed in the US
Source
https://www.census.gov/programs-surveys/acs/microdata/documentation.html
Obtaining the graphical causal model (GCM)
Description
Obtaining the graphical causal model (GCM)
Usage
graphModel(adj.mat, cfd.mat = NULL, res.vars = NULL)
Arguments
adj.mat |
Matrix of class |
cfd.mat |
Symmetric matrix of class |
res.vars |
A vector of class |
Value
An object of class igraph, containing the causal graphical model, with directed and bidirected edges.
Examples
adj.mat <- cfd.mat <- array(0L, dim = c(3, 3))
colnames(adj.mat) <- rownames(adj.mat) <-
  colnames(cfd.mat) <- rownames(cfd.mat) <- c("A", "X", "Y")

# Directed edges A -> X and X -> Y; bidirected (confounded) edge between X and Y
adj.mat["A", "X"] <- adj.mat["X", "Y"] <-
  cfd.mat["X", "Y"] <- cfd.mat["Y", "X"] <- 1L

gcm <- graphModel(adj.mat, cfd.mat, res.vars = "X")
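Before passing an adjacency matrix to the package, it can be useful to check that it describes an acyclic graph. The following base R helper is a hypothetical sketch and not part of fairadapt. It assumes that entry [i, j] = 1 encodes an edge i -> j, as in the example above, and it uses Kahn's algorithm to return a topological ordering or to stop if a directed cycle is found.

```r
# Hypothetical helper: topological ordering of a named adjacency matrix
topo_order <- function(adj) {
  stopifnot(is.matrix(adj), nrow(adj) == ncol(adj), !is.null(rownames(adj)))
  remaining <- rownames(adj)
  ordering <- character(0)
  while (length(remaining) > 0) {
    sub <- adj[remaining, remaining, drop = FALSE]
    # Nodes with no incoming edge from the remaining nodes
    roots <- remaining[colSums(sub) == 0]
    if (length(roots) == 0) {
      stop("adjacency matrix contains a directed cycle")
    }
    ordering <- c(ordering, roots)
    remaining <- setdiff(remaining, roots)
  }
  ordering
}

nodes <- c("A", "X", "Y")
dag <- array(0L, dim = c(3, 3), dimnames = list(nodes, nodes))
dag["A", "X"] <- dag["X", "Y"] <- 1L

topo_order(dag)
```

Adding an edge from Y back to A would create a cycle, in which case the helper stops with an error instead of returning an ordering.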
Plotting data before and after adaptation
Description
Plotting data before and after adaptation
Usage
## S3 method for class 'fairadapt'
plot(x, when = "after", ...)
Arguments
x |
An object of class fairadapt. |
when |
A |
... |
In this case ignored. |
Value
A base R plot for visualizing the distribution of the outcome before/after the adaptation procedure.
Prediction function for new data from a saved fairadapt object
Description
Prediction function for new data from a saved fairadapt object
Usage
## S3 method for class 'fairadapt'
predict(object, newdata, ...)
Arguments
object |
Object of class fairadapt. |
newdata |
A |
... |
Additional arguments forwarded to |
Details
The newdata argument should be compatible with the adapt.test argument that was used when constructing the fairadapt object. In particular, newdata should contain the column names that appear in the formula argument that was used when calling fairadapt() (apart from the outcome variable on the LHS of the formula).
Value
A data.frame containing the adapted version of the new data.
Examples
n_samp <- 200
uni_dim <- c("gender", "edu", "test", "score")

# Adjacency matrix of the causal graph: entry [i, j] = 1 encodes an edge i -> j
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

# Fit on the training data only; new data are adapted afterwards via predict()
uni_ada <- fairadapt(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender"
)

predict(object = uni_ada, newdata = tail(uni_admission, n = n_samp))
Prediction function for new data from a saved fairadaptBoot object
Description
Prediction function for new data from a saved fairadaptBoot object
Usage
## S3 method for class 'fairadaptBoot'
predict(object, newdata, ...)
Arguments
object |
Object of class fairadaptBoot. |
newdata |
A |
... |
Additional arguments forwarded to |
Details
The newdata argument should be compatible with the adapt.test argument that was used when constructing the fairadaptBoot object. In particular, newdata should contain the column names that appear in the formula argument that was used when calling fairadaptBoot() (apart from the outcome variable on the LHS of the formula).
Value
A data.frame containing the adapted version of the new data.
Examples
n_samp <- 200
uni_dim <- c("gender", "edu", "test", "score")

# Adjacency matrix of the causal graph: entry [i, j] = 1 encodes an edge i -> j
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

# keep.object = TRUE is set here so that predict() can be called afterwards
uni_ada_boot <- fairadaptBoot(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender",
  n.boot = 5,
  keep.object = TRUE
)

predict(object = uni_ada_boot, newdata = tail(uni_admission, n = n_samp))
Quality of quantile fit statistics
Description
Quality of quantile fit statistics
Usage
quantFit(x, ...)
Arguments
x |
Object of class |
... |
Ignored in this case. |
Value
A numeric vector, containing the average empirical loss for the 25%, 50% and 75% quantile loss functions, for each variable.
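For readers unfamiliar with quantile loss functions, the following base R sketch shows the standard pinball loss evaluated at the same three quantile levels. It assumes that the "quantile loss functions" mentioned above refer to this standard loss; the function pinball_loss() and the simulated data are illustrative and not taken from the package.

```r
# Pinball (quantile) loss at level tau; smaller values indicate a better fit
pinball_loss <- function(y, q_hat, tau) {
  err <- y - q_hat
  mean(pmax(tau * err, (tau - 1) * err))
}

set.seed(1)
y <- rnorm(1000)
taus <- c(0.25, 0.5, 0.75)

# Loss of the true standard normal quantiles versus a poor constant guess
loss_true <- sapply(taus, function(tau) pinball_loss(y, qnorm(tau), tau))
loss_bad <- sapply(taus, function(tau) pinball_loss(y, 2, tau))

rbind(tau = taus, loss_true = loss_true, loss_bad = loss_bad)
```

A well-fitting quantile regression should yield losses close to those of the true quantiles, whereas a poor fit yields noticeably larger values.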
Examples
n_samp <- 200
uni_dim <- c("gender", "edu", "test", "score")

# Adjacency matrix of the causal graph: entry [i, j] = 1 encodes an edge i -> j
uni_adj <- matrix(c(0, 1, 1, 0,
                    0, 0, 1, 1,
                    0, 0, 0, 1,
                    0, 0, 0, 0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

# eval.qfit = 3L requests an evaluation of the quantile fit quality
uni_ada <- fairadapt(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  test.data = tail(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender",
  eval.qfit = 3L
)

quantFit(uni_ada)
Quantile engine constructor for the quantile learning step
Description
There are several methods that can be used for the quantile learning step in the fairadapt package. Each of the methods needs a specific constructor. The constructor is a function that takes the data (with some additional meta-information) and returns an object on which the computeQuants() generic can be called.
Usage
rangerQuants(data, A.root, ind, min.node.size = 20, ...)
linearQuants(
data,
A.root,
ind,
tau = c(0.001, seq(0.005, 0.995, by = 0.01), 0.999),
...
)
mcqrnnQuants(
data,
A.root,
ind,
tau = seq(0.005, 0.995, by = 0.01),
iter.max = 500,
...
)
Arguments
data |
A |
A.root |
A |
ind |
A |
min.node.size |
Forwarded to |
... |
Forwarded to further methods. |
tau |
Forwarded to |
iter.max |
Forwarded to |
Details
Within the package, three different methods are implemented, which use quantile regressors based on linear models, random forests and neural networks. However, there is additional flexibility and users can provide their own quantile method. For this, the user needs to write (i) the constructor, which returns an S3 classed object; (ii) a method for the computeQuants() generic for the S3 class returned in (i).
The rangerQuants() function uses random forests (ranger package) for quantile regression.
The linearQuants() function uses linear quantile regression (quantreg package) for the Quantile Learning step.
The mcqrnnQuants() function uses monotone quantile regression neural networks (qrnn package) in the Quantile Learning step.
Value
A ranger or a rangersplit S3 object, depending on the value of the A.root argument, for rangerQuants().
An rqs or a quantregsplit S3 object, depending on the value of the A.root argument, for linearQuants().
An mcqrnn S3 object for mcqrnnQuants().
Summarizing fairadapt fit
Description
summary method for class "fairadapt".
Usage
## S3 method for class 'fairadapt'
summary(object, ...)
## S3 method for class 'summary.fairadapt'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
object |
An object of class fairadapt. |
... |
In this case ignored. |
x |
Object of class summary.fairadapt. |
digits |
Number of digits appearing in the output. |
Value
Summary of the object formula, protected attribute, attribute levels, resolving variables, number of training and test samples, adapted variables, TV measure before adaptation, TV measure after adaptation, and the quantile method that was used.
Summarizing fairadaptBoot fit
Description
summary method for class "fairadaptBoot".
Usage
## S3 method for class 'fairadaptBoot'
summary(object, ...)
## S3 method for class 'summary.fairadaptBoot'
print(x, ...)
Arguments
object |
An object of class fairadaptBoot. |
... |
In this case ignored. |
x |
Object of class summary.fairadaptBoot. |
Value
Summary of the bootstrap wrapper call, protected attribute, attribute levels, resolving variables, number of training and test samples, adapted variables, number of bootstrap repetitions, indicator if the quantileFit objects were saved, randomness mode, and the quantile method that was used.
University admission data of 1,000 students
Description
A simulated dataset containing the evaluation of students' abilities.
Usage
uni_admission
Format
A data frame with 1,000 rows and 4 variables:
- gender
the gender of the student
- edu
educational achievement, for instance GPA
- test
performance on a university admission test
- score
overall final score measuring the quality of a candidate
Visualize graphical causal model
Description
Visualize graphical causal model
Usage
visualizeGraph(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to the graph plotting function. |