Type: | Package |
Title: | Outlier Robust Two-Stage Least Squares Inference and Testing |
Version: | 0.2.3 |
Description: | An implementation of easy tools for outlier robust inference in two-stage least squares (2SLS) models. The user specifies a reference distribution against which observations are classified as outliers or not. After removing the outliers, adjusted standard errors are automatically provided. Furthermore, several statistical tests for the false outlier detection rate can be calculated. The outlier removing algorithm can be iterated a fixed number of times or until the procedure converges. The algorithms and robust inference are described in more detail in Jiao (2019) https://drive.google.com/file/d/1qPxDJnLlzLqdk94X9wwVASptf1MPpI2w/view. |
URL: | https://github.com/jkurle/robust2sls |
BugReports: | https://github.com/jkurle/robust2sls/issues |
License: | GPL-3 |
Encoding: | UTF-8 |
Suggests: | covr, datasets, doFuture, doParallel, doRNG, future, ggplot2, grDevices, ivgets, knitr, parallel, rmarkdown, testthat, utils |
RoxygenNote: | 7.3.2 |
Config/testthat/edition: | 3 |
Imports: | exactci, foreach, ivreg, MASS, mathjaxr, pracma, stats |
VignetteBuilder: | knitr |
Depends: | R (≥ 2.10) |
RdMacros: | mathjaxr |
NeedsCompilation: | no |
Packaged: | 2025-05-20 22:14:47 UTC; jonas |
Author: | Jonas Kurle |
Maintainer: | Jonas Kurle <mail@jonaskurle.com> |
Repository: | CRAN |
Date/Publication: | 2025-05-20 22:30:02 UTC |
robust2sls: A package for outlier robust 2SLS inference and testing
Description
The robust2sls package provides two main functionalities. First, it implements an algorithm that classifies an observation as an outlier based on its standardized residual and then re-estimates the model on the sub-sample excluding all outliers. This procedure is often used in empirical research to show that the results are not driven by outliers. The package implements the algorithm in various forms: the user can choose between different initial estimators and how often the algorithm is iterated. The statistical inference is adapted to account for potential false positives (classifying observations as outliers even though they are not).
Second, the robust2sls package provides easy-to-use statistical tests on whether the difference between the original and the outlier-robust estimates is statistically significant. Furthermore, several different statistical tests are implemented to test whether the sample actually contains outliers.
Author(s)
Maintainer: Jonas Kurle mail@jonaskurle.com (ORCID)
Calculates a Hausman test on the difference between robust and full sample estimates
Description
Calculates a Hausman test on the difference between robust and full sample estimates
Usage
beta_hausman(robust2sls_object, iteration, subset = NULL, fp = FALSE)
Arguments
robust2sls_object |
An object of class |
iteration |
An integer > 0 specifying the iteration step for which parameters to calculate corrected standard errors. |
subset |
A vector of numeric indices or strings indicating which
coefficients to include in the Hausman test. |
fp |
A logical value whether the fixed point asymptotic variance (TRUE) or the exact iteration asymptotic variance should be used (FALSE). |
Details
Argument fp
determines whether the fixed point asymptotic variance
should be used. This argument is only respected if the specified
iteration
is one of the iterations after the algorithm converged.
Value
beta_hausman
returns a matrix with the value of the Hausman
test statistic and its corresponding p-value. The attribute
"type of avar"
records which asymptotic variance has been used (the
specific iteration or the fixed point). The attribute "coefficients"
stores the names of the coefficients that were included in the Hausman test.
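As a rough illustration of the kind of statistic reported here, the sketch below computes a generic textbook Hausman-type quadratic form in base R. The numbers, the object names (diff_est, var_diff), and the chi-squared reference with as many degrees of freedom as tested coefficients are assumptions for illustration; how beta_hausman scales the asymptotic variance internally is not shown in this manual.

```r
# Hypothetical inputs: difference between robust and full sample estimates
# and an estimate of the variance of that difference (not package output).
diff_est <- c(0.10, -0.05)
var_diff <- matrix(c(0.010, 0.002,
                     0.002, 0.008), nrow = 2)

# Quadratic form d' V^{-1} d.
hausman_stat <- drop(t(diff_est) %*% solve(var_diff) %*% diff_est)

# Assumed reference distribution: chi-squared with one degree of freedom
# per coefficient included in the test.
p_value <- pchisq(hausman_stat, df = length(diff_est), lower.tail = FALSE)

# Arrange statistic and p-value in a one-row matrix.
result <- matrix(c(hausman_stat, p_value), nrow = 1,
                 dimnames = list(NULL, c("statistic", "p-value")))
```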
Calculates valid se for coefficients under H0 of no outliers
Description
Calculates valid se for coefficients under H0 of no outliers
Usage
beta_inf(robust2sls_object, iteration = 1, exact = FALSE, fp = FALSE)
Arguments
robust2sls_object |
An object of class |
iteration |
An integer > 0 specifying the iteration step for which parameters to calculate corrected standard errors. |
exact |
A logical value indicating whether the actually detected share of outliers (TRUE) or the theoretical share (FALSE) should be used. |
fp |
A logical value whether the fixed point standard error correction (TRUE) or the exact iteration correction should be computed (FALSE). |
Details
Argument iteration
specifies which iteration of the robust structural
parameter estimates should be calculated. Iteration 1
refers to the
first robust estimate. Iteration 0
is not a valid argument since it
is the baseline estimate, which is not robust.
The parameter exact
does not matter much under the null hypothesis of
no outliers since the detected share will converge to the theoretical share.
Under the alternative, this function should not be used.
Argument fp
determines whether the fixed point standard error
correction should be computed. This argument is only respected if the
specified iteration
is one of the iterations after the algorithm
converged.
Value
beta_inf
returns the corrected standard errors for the
structural parameters. These are valid under the null hypothesis of no
outliers in the sample. For comparison, the uncorrected standard errors are
also reported.
Calculates the correction factor for inference under H0 of no outliers
Description
Calculates the correction factor for inference under H0 of no outliers
Usage
beta_inf_correction(
robust2sls_object,
iteration = 1,
exact = FALSE,
fp = FALSE
)
Arguments
robust2sls_object |
An object of class |
iteration |
An integer > 0 specifying the iteration step for which parameters to calculate corrected standard errors. |
exact |
A logical value indicating whether the actually detected share of outliers (TRUE) or the theoretical share (FALSE) should be used. |
fp |
A logical value whether the fixed point standard error correction (TRUE) or the exact iteration correction should be computed (FALSE). |
Details
Argument iteration
specifies which iteration of the robust structural
parameter estimates should be calculated. Iteration 1
refers to the
first robust estimate. Iteration 0
is not a valid argument since it
is the baseline estimate, which is not robust.
The parameter exact
does not matter much under the null hypothesis of
no outliers since the detected share will converge to the theoretical share.
Under the alternative, this function should not be used.
Argument fp
determines whether the fixed point standard error
correction should be computed. This argument is only respected if the
specified iteration
is one of the iterations after the algorithm
converged.
Value
beta_inf_correction
returns the numeric correction factor.
Conducts a t-test on the difference between robust and full sample estimates
Description
Conducts a t-test on the difference between robust and full sample estimates
Usage
beta_t(robust2sls_object, iteration, element, fp = FALSE)
Arguments
robust2sls_object |
An object of class |
iteration |
An integer > 0 specifying the iteration step for which parameters to calculate corrected standard errors. |
element |
An index or a string to select the coefficient which is to be
tested. The index should refer to the index of coefficients in the
|
fp |
A logical value whether the fixed point asymptotic variance (TRUE) or the exact iteration asymptotic variance should be used (FALSE). |
Details
Argument fp
determines whether the fixed point asymptotic variance
should be used. This argument is only respected if the specified
iteration
is one of the iterations after the algorithm converged.
Value
beta_t
returns a matrix with the robust and full sample
estimates of beta, the t statistic on their difference, the standard error of
the difference, and three p-values (two-sided, both one-sided alternatives).
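The three p-values can be pictured with a small base R sketch. The estimates and the standard error below are hypothetical, and the standard normal reference distribution is an assumption based on the asymptotic nature of the test, not something confirmed by this manual.

```r
# Hypothetical estimates of one coefficient and the standard error of
# their difference (illustrative values only).
beta_robust <- 0.52
beta_full <- 0.47
se_diff <- 0.03

# t statistic on the difference between robust and full sample estimates.
t_stat <- (beta_robust - beta_full) / se_diff

# Assumed standard normal reference distribution.
p_two <- 2 * pnorm(abs(t_stat), lower.tail = FALSE)  # two-sided
p_upper <- pnorm(t_stat, lower.tail = FALSE)         # H1: difference > 0
p_lower <- pnorm(t_stat)                             # H1: difference < 0
```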
Calculates the asymptotic variance of the difference between robust and full sample estimators of the structural parameters
Description
Calculates the asymptotic variance of the difference between robust and full sample estimators of the structural parameters
Usage
beta_test_avar(robust2sls_object, iteration, fp = FALSE)
Arguments
robust2sls_object |
An object of class |
iteration |
An integer > 0 specifying the iteration step for which parameters to calculate corrected standard errors. |
fp |
A logical value whether the fixed point asymptotic variance (TRUE) or the exact iteration asymptotic variance should be computed (FALSE). |
Details
Argument fp
determines whether the fixed point asymptotic variance
should be computed. This argument is only respected if the specified
iteration
is one of the iterations after the algorithm converged.
Value
beta_test_avar
returns a dx by dx variance-covariance matrix of
the difference between the robust and full sample structural parameter
estimates of the 2SLS model.
Uses nonparametric case resampling for standard errors of parameters and gauge
Description
Uses nonparametric case resampling for standard errors of parameters and gauge
Usage
case_resampling(robust2sls_object, R, coef = NULL, m = NULL, parallel = FALSE)
Arguments
robust2sls_object |
An object of class |
R |
An integer specifying the number of resamples. |
coef |
A numeric or character vector specifying which structural
coefficient estimates should be recorded across bootstrap replications.
|
m |
A single numeric or vector of integers specifying for which
iterations the bootstrap statistics should be calculated. |
parallel |
A logical value indicating whether to run the bootstrap sampling in parallel or sequentially. See Details. |
Details
Argument parallel
allows for parallel computing using the
foreach package, so the user has to register a parallel
backend before invoking this command.
Argument coef
is useful if the model includes many controls whose
parameters are not of interest. This can reduce the memory space needed to
store the bootstrap results.
Value
case_resampling returns an object of class "r2sls_boot". This is a list
with three named elements. $boot stores the bootstrap results as a data
frame. The columns record the different test statistics, the iteration m,
and the number of the resample, r. The values corresponding to the original
data are stored as r = 0. $resamples is a list of length R that stores the
indices for each specific resample. $original stores the original
robust2sls_object based on which the bootstrapping was done.
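The idea of nonparametric case resampling can be sketched in base R as follows. This is a simplified stand-in, not the package's implementation: it uses a plain lm() fit on hypothetical toy data instead of the outlier-robust 2SLS estimator, and the names dat, stat, and boot are made up for the example.

```r
# Hypothetical toy data; an OLS slope stands in for the robust 2SLS estimate.
set.seed(1)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

R <- 200  # number of resamples
stat <- function(d) coef(lm(y ~ x, data = d))[["x"]]

# Draw row indices with replacement, one index vector per resample.
resamples <- lapply(seq_len(R), function(r) sample.int(n, n, replace = TRUE))

# Row r = 0 holds the statistic computed on the original data.
boot <- data.frame(
  r = 0:R,
  slope = c(stat(dat),
            vapply(resamples, function(idx) stat(dat[idx, ]), numeric(1)))
)

# Bootstrap standard error based on the resampled statistics only.
boot_se <- sd(boot$slope[boot$r > 0])
```

Keeping the index vectors in a list mirrors the description of $resamples above and makes each resample reproducible.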
Calculate constants across estimation
Description
constants
calculates various values that do not change across the
estimation and records them in a list.
Usage
constants(
call,
formula,
data,
reference = c("normal"),
sign_level,
estimator,
split,
shuffle,
shuffle_seed,
iter,
criterion,
max_iter,
user_model,
verbose
)
Arguments
call |
A record of the original function call. |
formula |
The regression formula specified in the function call. |
data |
The dataframe used in the function call. |
reference |
A character vector of length 1 that denotes a valid reference distribution. |
sign_level |
A numeric value between 0 and 1 that determines the cutoff in the reference distribution against which observations are judged as outliers or not. |
estimator |
A character vector specifying which initial estimator was used. |
split |
A numeric value strictly between 0 and 1 that specifies how the
sample is split in case of saturated 2SLS. |
shuffle |
A logical value whether the sample is re-arranged in random
order before splitting the sample in case of saturated 2SLS. |
shuffle_seed |
A numeric value setting the seed for the shuffling of the
sample. Only used if |
iter |
An integer value setting the number of iterations of the outlier-detection algorithm. |
criterion |
A numeric value that determines when the iterated outlier-detection algorithm stops by comparing it to the sum of squared differences between the m- and (m-1)-step parameter estimates. NULL if convergence criterion should not be used. |
max_iter |
A numeric value that determines after which iteration the algorithm stops in case it does not converge. |
user_model |
A model object of class ivreg. Only
required if argument |
verbose |
A logical value whether progress during estimation should be reported. |
Value
Returns a list that stores values that are constant across the
estimation. It is used to fill parts of the "robust2sls"
class object,
which is returned by outlier_detection.
$call
The captured function call.
$verbose
The verbose argument (TRUE/FALSE).
$formula
The formula argument.
$data
The original data set.
$reference
The chosen reference distribution to classify outliers.
$sign_level
The significance level determining the cutoff.
$psi
The probability that an observation is not classified as an outlier under the null hypothesis of no outliers.
$cutoff
The cutoff used to classify outliers if their standardised residuals are larger than that value.
$bias_corr
A numeric bias correction factor to account for potential false positives (observations classified as outliers even though they are not).
$initial
A list storing settings about the initial estimator:
$estimator is the type of the initial estimator (e.g. robustified or saturated),
$split how the sample is split (NULL if argument not used),
$shuffle whether the sample is shuffled before splitting (NULL if argument not used),
$shuffle_seed the value of the random seed (NULL if argument not used),
$user the user-specified initial model (NULL if not used).
$convergence
A list storing information about the convergence of the outlier-detection algorithm:
$criterion is the user-specified convergence criterion (NULL if argument not used),
$difference is initialised as NULL,
$converged is initialised as NULL,
$iter is initialised as NULL,
$max_iter the maximum number of iterations if the algorithm does not converge (NULL if not used or applicable).
$iterations
A list storing information about the iterations of the algorithm:
$setting stores the user-specified iterations argument,
$actual is initialised as NULL and will store the actual number of iterations done.
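To make $psi, $cutoff, and $bias_corr more concrete, the following base R sketch shows how such constants could be derived for the normal reference distribution. It assumes a two-sided cutoff and takes the bias correction to be the inverse of the variance of a standard normal truncated at the cutoff; this is an assumption about the construction, not a statement of the package's internals.

```r
sign_level <- 0.05  # probability of exceeding the cutoff under the null

# Probability of not being classified as an outlier under the null.
psi <- 1 - sign_level

# Assumed two-sided cutoff for standardised residuals (normal reference).
cutoff <- qnorm(1 - sign_level / 2)

# Variance of a standard normal variable truncated to [-cutoff, cutoff].
trunc_var <- 1 - 2 * cutoff * dnorm(cutoff) / psi

# Assumed correction: rescales the variance estimated on the truncated sample.
bias_corr <- 1 / trunc_var
```

The truncation shrinks the residual variance, so a correction factor of this kind is larger than one.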
L2 norm between two most recent estimates
Description
conv_diff
uses an object of class "robust2sls"
to calculate the
L2 norm (sum of squared differences) between the most recent outlier-robust
iteration and the previous iteration estimates.
Usage
conv_diff(current, counter)
Arguments
current |
A list object of class |
counter |
An integer denoting the number of the current iteration. |
Value
conv_diff
returns a numeric value, which is the L2 norm
of the difference between the most recent and the previous parameter
estimates. The L2 norm is the sum of squared differences of the estimates.
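A minimal base R sketch of this quantity, using hypothetical coefficient vectors for two consecutive iterations. Strictly speaking, the sum of squared differences is the squared Euclidean norm; the comparison against a criterion with a strict inequality is an assumption for illustration.

```r
# Hypothetical coefficient estimates from iterations m - 1 and m.
coef_prev <- c(`(Intercept)` = 1.00, x1 = 0.50, x2 = -0.20)
coef_curr <- c(`(Intercept)` = 1.02, x1 = 0.49, x2 = -0.20)

# Sum of squared differences between the two sets of estimates.
diff_sq <- sum((coef_curr - coef_prev)^2)

# Assumed stopping rule: converged once the difference falls below the criterion.
criterion <- 1e-3
converged <- diff_sq < criterion
```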
Counts the number of times each index was sampled
Description
count_indices
takes a list of indices for resampling and counts how
often each index was sampled in each resample. The result is returned in two
versions of a matrix where each row corresponds to a different resample and
each column to one index.
Usage
count_indices(resamples, indices)
Arguments
resamples |
A list of resamples, as created by nonparametric. |
indices |
The vector of original indices from which the resamples were drawn. |
Value
count_indices returns a list with two named elements. Each element is a
matrix that stores how often each observation/index was resampled (column)
for each resample (row). $count_clean only has columns for observations that
were available in the indices. $count_all counts the occurrence of all
indices in the range of indices that were provided, even if the index was
actually not available in the given indices. These are of course zero since
they were not available for resampling. If the given indices do not skip any
numbers, the two coincide.
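The difference between the two matrices can be illustrated with a small base R sketch. The helper count_one and the toy indices are hypothetical and only mimic the described output; they are not the package's code.

```r
# Hypothetical indices with a gap: index 3 is not available.
indices <- c(1, 2, 4, 5)
set.seed(2)
resamples <- lapply(1:3, function(r) {
  sample(indices, length(indices), replace = TRUE)
})

# Count how often each value appears in one resample.
count_one <- function(draw, values) {
  vapply(values, function(v) sum(draw == v), numeric(1))
}

# One row per resample, one column per available index.
count_clean <- t(vapply(resamples, count_one, numeric(length(indices)),
                        values = indices))
colnames(count_clean) <- indices

# One column per index in the full range, including the unavailable index 3.
all_idx <- seq(min(indices), max(indices))
count_all <- t(vapply(resamples, count_one, numeric(length(all_idx)),
                      values = all_idx))
colnames(count_all) <- all_idx
```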
Count test
Description
counttest()
conducts a test whether the number of detected outliers
deviates significantly from the expected number of outliers under the null
hypothesis that there are no outliers in the sample.
Usage
counttest(
robust2sls_object,
alpha,
iteration,
one_sided = FALSE,
tsmethod = c("central", "minlike", "blaker")
)
Arguments
robust2sls_object |
An object of class |
alpha |
A numeric value between 0 and 1 representing the significance level of the test. |
iteration |
An integer >= 0 or the character "convergence" that determines which iteration is used for the test. |
one_sided |
A logical value whether a two-sided test ( |
tsmethod |
A character specifying the method for calculating two-sided p-values. Ignored for one-sided test. |
Details
See outlier_detection()
and
multi_cutoff()
for creating an object of class
"robust2sls"
or a list thereof.
See exactci::poisson.exact()
for the
different methods of calculating two-sided p-values.
Value
counttest() returns a data frame with the iteration (m) to be tested, the
actual iteration that was tested (generally coincides with the iteration
that was specified to be tested but is the convergent iteration if the fixed
point is tested), the setting of the probability of exceeding the cut-off
(gamma), the number of detected outliers, the expected number of outliers
under the null hypothesis that there are no outliers, the type of test (one-
or two-sided), the p-value, the significance level alpha, the decision, and
which method was used to calculate (two-sided) p-values. The number of rows
of the data frame corresponds to the length of the argument
robust2sls_object.
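The logic of the test can be sketched with stats::poisson.test() as a stand-in for exactci::poisson.exact(). The sketch assumes that the expected number of outliers under the null is the sample size times gamma and that the count is compared against a Poisson distribution with that mean; the sample size and the detected count are hypothetical. Note that stats::poisson.test() offers only one way of computing two-sided p-values, so its result need not match every tsmethod option.

```r
n <- 1000       # hypothetical sample size
gamma <- 0.01   # probability of exceeding the cut-off under the null
detected <- 17  # hypothetical number of detected outliers

# Assumed expected number of outliers under the null of no outliers.
expected <- n * gamma

# Exact Poisson test of the detected count against the expected count.
test <- poisson.test(detected, T = 1, r = expected, alternative = "two.sided")

alpha <- 0.05
reject <- test$p.value < alpha
```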
Estimation of moments of the data
Description
NOTE (12 Apr 2022): probably superseded by estimate_param_null(); this function has been taken out of testing.
Usage
estimate_param(robust2SLS_object, iteration)
Arguments
robust2SLS_object |
An object of class |
iteration |
An integer >= 0 specifying based on which model iteration the moments should be estimated. The model iteration affects which observations are determined to be outliers and these observations will hence be excluded during the estimation of the moments. |
Details
DO NOT USE YET!
estimate_param can be used to estimate certain moments of the data that are
required for calculating the asymptotic variance of the gauge. Such moments
are the covariance between the standardised first stage errors and the
structural error \Omega, the covariance matrix of the first stage errors
\Sigma, the first stage parameter matrix \Pi, and more.
Value
estimate_param
returns a list with a similar structure as the
output of the Monte Carlo functionality generate_param. Hence, the
resulting list can be given to the function gauge_avar as argument
parameters
to return an estimate of the asymptotic variance of the
gauge.
Warning
The function is not yet fully developed. The estimators of the moments are at the moment not guaranteed to be consistent for the population moments. DO NOT USE!
Estimation of moments of the data
Description
estimate_param_null can be used to estimate certain moments of the data that
are required for calculating the asymptotic variance of the gauge. Such
moments are the covariance between the standardised first stage errors and
the structural error \Omega, the covariance matrix of the first stage errors
\Sigma, the first stage parameter matrix \Pi, and more.
Usage
estimate_param_null(robust2SLS_object)
Arguments
robust2SLS_object |
An object of class |
Value
estimate_param_null
returns a list with a similar structure as
the output of the Monte Carlo functionality generate_param. Hence, the
resulting list can be given to the function gauge_avar as argument
parameters
to return an estimate of the asymptotic variance of the
gauge.
Warning
The function uses the full sample to estimate the moments. Therefore, they are only consistent under the null hypothesis of no outliers and estimators are likely to be inconsistent under the alternative.
Evaluate bootstrap results
Description
Evaluate bootstrap results
Usage
evaluate_boot(r2sls_boot, iterations)
Arguments
r2sls_boot |
An object of class |
iterations |
An integer or numeric vector with values >= 0 specifying which bootstrap results to evaluate. |
Value
evaluate_boot
returns a data frame with the bootstrap and the
theoretical standard errors. Each row corresponds to a different iteration
step while each column refers to the parameters whose standard errors are
produced.
Extracts bootstrap results for a specific iteration
Description
Extracts bootstrap results for a specific iteration
Usage
extract_boot(r2sls_boot, iteration)
Arguments
r2sls_boot |
An object of class |
iteration |
An integer >= 0 specifying which bootstrap results to extract. |
Value
extract_boot returns a matrix with the bootstrap results for a specific
iteration.
Extract the elements of ivreg formula
Description
extract_formula takes a formula object for ivreg, i.e. in a format of
y ~ x1 + x2 | x1 + z2, and extracts the different elements in a list. Each
element is a character vector storing the different types of regressors.
Element y_var refers to the dependent variable, x1_var to the exogenous
regressors, x2_var to the endogenous regressors, z1_var to the exogenous
regressors (which have to be included again as instruments and hence
coincide with x1_var), and z2_var refers to the outside instruments.
Usage
extract_formula(formula)
Arguments
formula |
A formula for the |
Value
extract_formula returns a list with five named components, each of which is
a character vector: $y_var refers to the dependent variable, $x1_var to the
exogenous regressors, $x2_var to the endogenous regressors, $z1_var to the
exogenous regressors (which have to be included again as instruments and
hence coincide with $x1_var), and $z2_var refers to the outside instruments.
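The classification of variables can be mimicked in base R by splitting the right-hand side of the formula at the | operator. This is only a sketch of the idea using all.vars(); it is not the package's code and ignores details such as intercepts or transformed terms.

```r
f <- y ~ x1 + x2 | x1 + z2

# The right-hand side is a call to `|` with regressors and instruments.
rhs <- f[[3]]
regressors <- all.vars(rhs[[2]])
instruments <- all.vars(rhs[[3]])

parts <- list(
  y_var = all.vars(f[[2]]),                      # dependent variable
  x1_var = intersect(regressors, instruments),   # exogenous regressors
  x2_var = setdiff(regressors, instruments),     # endogenous regressors
  z1_var = intersect(instruments, regressors),   # coincides with x1_var
  z2_var = setdiff(instruments, regressors)      # outside instruments
)
```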
Asymptotic variance of gauge
Description
gauge_avar
calculates the asymptotic variance of the gauge for a
given iteration using a given set of parameters (true or estimated).
Usage
gauge_avar(
ref_dist = c("normal"),
sign_level,
initial_est = c("robustified", "saturated", "iis"),
iteration,
parameters,
split
)
Arguments
ref_dist |
A character vector that specifies the reference distribution
against which observations are classified as outliers. |
sign_level |
A numeric value between 0 and 1 that determines the cutoff in the reference distribution against which observations are judged as outliers or not. |
initial_est |
A character vector that specifies the initial estimator
for the outlier detection algorithm. |
iteration |
An integer >= 0 or character |
parameters |
A list created by generate_param or
estimate_param_null that stores the parameters (true or estimated).
|
split |
A numeric value strictly between 0 and 1 that determines
in which proportions the sample will be split. Can be |
Details
Initial estimator "iis"
uses the asymptotic variances of
"robustified"
2SLS because there is no formal theory for the
multi-block search.
Value
gauge_avar
returns a numeric value.
Asymptotic covariance of gauge
Description
gauge_covar
calculates the asymptotic covariance between two FODRs
with different cut-off values s and t for a given iteration using a given set
of parameters (true or estimated).
Usage
gauge_covar(
ref_dist = c("normal"),
sign_level1,
sign_level2,
initial_est = c("robustified", "saturated", "iis"),
iteration,
parameters,
split
)
Arguments
ref_dist |
A character vector that specifies the reference distribution
against which observations are classified as outliers. |
sign_level1 |
A numeric value between 0 and 1 that determines the first cutoff in the reference distribution against which observations are judged as outliers or not. |
sign_level2 |
A numeric value between 0 and 1 that determines the second cutoff in the reference distribution against which observations are judged as outliers or not. |
initial_est |
A character vector that specifies the initial estimator
for the outlier detection algorithm. |
iteration |
An integer >= 0 or character |
parameters |
A list created by generate_param or
estimate_param_null that stores the parameters (true or estimated).
|
split |
A numeric value strictly between 0 and 1 that determines
in which proportions the sample will be split. Can be |
Details
Initial estimator "iis"
uses the asymptotic variances of
"robustified"
2SLS because there is no formal theory for the
multi-block search.
Value
gauge_covar
returns a numeric value.
Random data of 2SLS model (Monte Carlo)
Description
generate_data
draws random data for a 2SLS model given the parameters.
Usage
generate_data(parameters, n)
Arguments
parameters |
A list with 2SLS model parameters as created by generate_param. |
n |
Sample size to be drawn. |
Value
generate_data
returns a data frame with n
rows
(observations) and the following variables of the 2SLS model: dependent
variable y, exogenous regressors x1, endogenous regressors x2, structural
error u, outside instruments z2, first stage projection errors r1 (identical
to zero) and r2.
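To show what such a data set looks like, the base R sketch below draws a very small 2SLS model by hand: an intercept as the only exogenous regressor, one endogenous regressor, and one outside instrument. All parameter values are hypothetical, and the sketch does not reproduce how generate_data or generate_param work internally.

```r
set.seed(42)
n <- 500
beta <- c(1, 0.5)  # structural coefficients: intercept and x2 (hypothetical)
Pi <- c(0.2, 0.8)  # first stage coefficients: intercept and z2 (hypothetical)

x1 <- rep(1, n)    # exogenous regressor (intercept)
z2 <- rnorm(n)     # outside instrument
r2 <- rnorm(n)     # first stage error of the endogenous regressor

# Structural error correlated with r2, which makes x2 endogenous.
u <- 0.5 * r2 + rnorm(n, sd = sqrt(0.75))

x2 <- Pi[1] * x1 + Pi[2] * z2 + r2
y <- beta[1] * x1 + beta[2] * x2 + u

# r1 is identically zero because x1 is its own instrument.
dat <- data.frame(y, x1, x2, u, z2, r1 = 0, r2)
```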
Parameters of 2SLS model (Monte Carlo)
Description
By default, generate_param
creates random parameters of a 2SLS model
that satisfy conditions for 2SLS models, such as positive definite
variance-covariance matrices. The user can also specify certain parameters
directly, which are then checked for their validity.
Usage
generate_param(
dx1,
dx2,
dz2,
intercept = TRUE,
beta = NULL,
sigma = 1,
mean_z = NULL,
cov_z = NULL,
Sigma2_half = NULL,
Omega2 = NULL,
Pi = NULL,
seed = 42
)
Arguments
dx1 |
An integer value specifying the number of exogenous regressors.
This should include the intercept if it is present in the model
(see argument |
dx2 |
An integer value specifying the number of endogenous regressors. |
dz2 |
An integer value specifying the number of outside / excluded instruments. |
intercept |
A logical value ( |
beta |
A numeric vector of length |
sigma |
A strictly positive numeric value specifying the standard deviation of the error in the structural model. |
mean_z |
A numeric vector of length |
cov_z |
A numeric positive definite matrix specifying the variance-covariance matrix of the exogenous variables, x1 and z2. |
Sigma2_half |
A numeric positive definite matrix of dimension
|
Omega2 |
A numeric vector of length |
Pi |
A numeric matrix of dimension |
seed |
An integer for setting the seed for the random number generator. |
Value
generate_param returns a list with the (randomly created or user-specified)
parameters that are required for drawing random data. The parameters are
generated to fulfill the 2SLS model assumptions.
$structural
A list with two components storing the mean ($mean) and variance-covariance matrix ($cov) for the structural error (u), the random first stage errors (r2), and all instruments (excluding the intercept since it is not random) (z).
$params
A list storing the parameters of the 2SLS model:
$beta is the coefficient vector (including intercept if present) of the structural equation,
$Pi the coefficient matrix of the first stage projections,
$Omega2 the covariance between the structural error and the endogenous first stage errors,
$Sigma2_half the square root of the variance-covariance matrix of the endogenous first stage errors,
$mean_z the mean of all instruments (excluding the intercept since it is not random),
$cov_z the variance-covariance matrix of the exogenous variables, x1 and z2 (see argument cov_z),
$Ezz the expected value of the squared instruments.
$settings
A list storing the function call ($call), whether an intercept is included in the model ($intercept), a regression formula for the model setup ($formula), and the dimensions of the regressors and instruments ($dx1, $dx2, $dz2).
$names
A list storing generic names for the regressors, instruments, and errors as character vectors ($x1, $x2, $x, $z2, $z, $r, and $u).
Global test correcting for multiple hypothesis testing
Description
globaltest()
uses several proportion or count tests with different
cut-offs to test a global hypothesis of no outliers using the Simes (1986)
procedure to account for multiple testing.
Usage
globaltest(tests, global_alpha)
Arguments
tests |
A data frame that contains a column named |
global_alpha |
A numeric value representing the global significance level. |
Details
See Simes (1986), doi:10.1093/biomet/73.3.751.
Value
A list with three entries. The first entry named $reject contains the global
rejection decision. The second entry named $global_alpha stores the global
significance level. The third entry named $tests returns the input data
frame tests, appended with two columns containing the adjusted significance
level and respective rejection decision.
See Also
proptest(), counttest()
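The Simes (1986) procedure itself is short enough to sketch in base R: the p-values are ordered, the i-th smallest is compared with i times the global level divided by the number of tests, and the global null is rejected if any comparison succeeds. The function name simes and the column names pval, alpha_adj, and reject_adj are hypothetical and need not match the output of globaltest().

```r
# Sketch of the Simes procedure for a vector of p-values.
simes <- function(pvals, global_alpha) {
  k <- length(pvals)
  ord <- order(pvals)

  # The i-th smallest p-value is compared with i * global_alpha / k.
  alpha_adj <- numeric(k)
  alpha_adj[ord] <- seq_len(k) * global_alpha / k
  reject_adj <- pvals <= alpha_adj

  list(
    reject = any(reject_adj),  # global rejection decision
    global_alpha = global_alpha,
    tests = data.frame(pval = pvals, alpha_adj = alpha_adj,
                       reject_adj = reject_adj)
  )
}

# Hypothetical p-values from three tests with different cut-offs.
out <- simes(c(0.030, 0.200, 0.012), global_alpha = 0.05)
```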
Impulse Indicator Saturation (IIS initial estimator)
Description
Impulse Indicator Saturation (IIS initial estimator)
Usage
iis_init(
data,
formula,
gamma,
t.pval = gamma,
do.pet = FALSE,
normality.JarqueB = NULL,
turbo = FALSE,
overid = NULL,
weak = NULL
)
Arguments
data |
A dataframe. |
formula |
A formula in the format |
gamma |
A numeric value between 0 and 1 representing the significance level used for two-sided significance t-test on the impulse indicators. Corresponds to the probability of falsely classifying an observation as an outlier. |
t.pval |
A numeric value between 0 and 1 representing the significance level for the Parsimonious Encompassing Test (PET). |
do.pet |
logical. If |
normality.JarqueB |
|
turbo |
logical. If |
overid |
|
weak |
|
Value
iis_init returns a list with five elements. The first four are vectors whose
length equals the number of observations in the data set. Unlike the
residuals stored in a model object (usually accessible via model$residuals),
these vectors do not ignore observations where any of y, x or z are missing.
Instead, the corresponding values are set to NA.
The first element is a double vector containing the residuals for each
observation based on the model estimates. The second element contains the
standardised residuals, the third one a logical vector with TRUE if the
observation is judged as not outlying, FALSE if it is an outlier, and NA if
any of y, x, or z are missing. The fourth element of the list is an integer
vector with three values: 0 if the observation is judged to be an outlier,
1 if not, and -1 if missing. The fifth and last element stores the ivreg
model object based on which the four vectors were calculated.
Note
IIS runs multiple models, similar to saturated_init but with multiple block
search. These intermediate models are not recorded. For simplicity, the
element $model of the returned list stores the full sample model result,
identical to robustified_init.
Monte Carlo simulations parameter grid
Description
WARNING: not for average user - function not completed yet
Usage
mc_grid(
M,
n,
seed,
parameters,
formula,
ref_dist,
sign_level,
initial_est,
iterations,
convergence_criterion = NULL,
max_iter = NULL,
shuffle = FALSE,
shuffle_seed = 10,
split = 0.5,
path = FALSE,
verbose = FALSE
)
Arguments
M |
Number of replications. |
n |
Sample size for each replication. |
seed |
Random seed for the iterations. |
parameters |
A list as created by generate_param that specifies the true model. |
formula |
A formula that specifies the 2SLS model to be estimated. The
format has to follow |
ref_dist |
A character vector that specifies the reference distribution
against which observations are classified as outliers. |
sign_level |
A numeric value between 0 and 1 that determines the cutoff in the reference distribution against which observations are judged as outliers or not. |
initial_est |
A character vector that specifies the initial estimator
for the outlier detection algorithm. |
iterations |
An integer >= 0 that specifies how often the outlier
detection algorithm is iterated and for which summary statistics will be
calculated. The value |
convergence_criterion |
A numeric value that determines whether the
algorithm has converged as measured by the L2 norm of the difference in
coefficients between the current and the previous iteration. Only used when
argument |
max_iter |
A numeric value >= 1 or NULL. If
|
shuffle |
A logical value or |
shuffle_seed |
An integer value that will set the seed for shuffling the
sample or |
split |
A numeric value strictly between 0 and 1 that determines in which proportions the sample will be split. |
path |
A character string or |
verbose |
A logical value whether any messages should be printed. |
Details
mc_grid
runs Monte Carlo simulations to assess the performance of
the theory of the gauge, simple proportion tests, and count tests.
Value
mc_grid
returns a data frame with the results of the Monte
Carlo experiments. Each row corresponds to a specific simulation setup. The
columns record the simulation setup and its results. Currently, the average
proportion of detected outliers ("mean_gauge") and its variance
("var_gauge") are recorded. Moreover, the theoretical asymptotic
variance ("avar") and the ratio of simulated to theoretical variance -
adjusted by the sample size - are calculated ("var_ratio"). Furthermore,
tentative results of size and power for the tests are calculated.
Details
Requires the package doRNG to be installed, which has been orphaned as of 2022-12-09.
The following arguments can also be supplied as a vector of their type:
n
, sign_level
, initial_est
, and split
. This makes
the function estimate all possible combinations of the arguments. Note that
the initial estimator "robustified"
is not affected by the argument
split
and hence is not varied in this case.
For example, specifying n = c(100, 1000)
and
sign_level = c(0.01, 0.05)
estimates four Monte Carlo experiments with
the four possible combinations of the parameters.
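The grid implied by vector-valued arguments can be pictured with base R's expand.grid(). This is only an illustrative sketch: how mc_grid builds its grid internally is not shown in this documentation, and the object name grid is hypothetical.

```r
# Illustration only: enumerate all combinations of two vector-valued settings.
grid <- expand.grid(n = c(100, 1000), sign_level = c(0.01, 0.05))
print(grid)
nrow(grid)  # one row per Monte Carlo experiment
```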
The path
argument allows users to store the M
replication
results for all of the individual Monte Carlo simulations that are part of
the grid. The results are saved both as .Rds
and .csv
files.
The file name is indicative of the simulation setting.
Multiple models, varying cut-off
Description
multi_cutoff()
runs several outlier detection algorithms that differ
in the value of the cut-off that determines whether an observation is
classified as an outlier or not.
Usage
multi_cutoff(gamma, ...)
Arguments
gamma |
A numeric vector representing the probability of falsely
classifying an observation as an outlier. One setting of the algorithm per
element of |
... |
Arguments for specifying the other settings of the outlier
detection algorithm, |
Details
multi_cutoff
uses the
foreach
and
future
packages to run several models at the
same time in parallel. This means the user has to register a backend and
thereby determine how the code should be executed. The default is
sequential, i.e. not in parallel. See
future::plan()
for details.
Value
A list containing the robust2sls
objects, one per setting of
gamma
. The length of the list therefore corresponds to the length of
the vector gamma
.
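As a rough illustration of how each element of gamma maps to its own cut-off, the sketch below assumes a standard normal reference distribution and a two-sided rule (an observation is flagged when its absolute standardised residual exceeds the cut-off). The object names are hypothetical and the package itself is not called.

```r
# Assumption: the reference distribution is standard normal and an observation
# is flagged when the absolute standardised residual exceeds the cut-off c,
# so that P(|e| > c) = gamma under the null hypothesis of no outliers.
gamma <- c(0.01, 0.025, 0.05)
cutoff <- qnorm(1 - gamma / 2)

# One row per setting of the algorithm.
settings <- data.frame(gamma = gamma, cutoff = cutoff)
print(settings)
```

A larger gamma gives a smaller cut-off, so more observations are classified as outliers.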
Creates a vector of the centered FODR across different cut-offs
Description
multi_cutoff_to_fodr_vec()
takes a list of "robust2sls"
objects
and returns a vector of the centered FODR (sample - expected) for different
values of the cut-off c (equivalently gamma):
\[ \sqrt{n}(\widehat{\gamma}_{c} - \gamma_{c}) \]
Usage
multi_cutoff_to_fodr_vec(robust2sls_object, iteration)
Arguments
robust2sls_object |
A list of |
iteration |
An integer >= 0 or the character "convergence" that determines which iteration is used for the test. |
Details
See outlier_detection()
and
multi_cutoff()
for creating an object of class
"robust2sls"
or a list thereof.
Value
A numeric vector of the centered FODR values.
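The centered FODR can be sketched in base R from classification vectors. The helper centered_fodr and the toy data are hypothetical. The sketch assumes that the sample FODR is the share of detected outliers among non-missing observations and that n counts the non-missing observations.

```r
# Hypothetical helper: centered FODR from a vector of classifications coded as
# in the package (0 = outlier, 1 = not an outlier, -1 = missing).
centered_fodr <- function(type, gamma) {
  n <- sum(type != -1)               # non-missing observations
  gamma_hat <- sum(type == 0) / n    # sample share of detected outliers
  sqrt(n) * (gamma_hat - gamma)
}

# Toy classifications for two cut-offs (made-up values).
type_list <- list(c(1, 1, 0, 1, -1, 1, 1, 1, 1, 1, 1),
                  c(1, 0, 0, 1, -1, 1, 1, 1, 1, 1, 1))
gammas <- c(0.05, 0.10)
fodr_vec <- mapply(centered_fodr, type_list, gammas)
print(fodr_vec)
```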
Multivariate normal supremum simulation
Description
mvn_sup
simulates the distribution of the supremum of the specified
multivariate normal distribution by drawing repeatedly from the multivariate
normal distribution and calculating the maximum of each vector.
Usage
mvn_sup(n, mu, Sigma, seed = NULL)
Arguments
n |
An integer determining the number of draws from the multivariate normal distribution. |
mu |
A numeric vector representing the mean of the multivariate normal distribution. |
Sigma |
A numeric matrix representing the variance-covariance matrix of the multivariate normal distribution. |
seed |
An integer setting the random seed or |
Value
mvn_sup
returns a vector of suprema of length n
.
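The simulation idea can be sketched in base R with a Cholesky factor. The function mvn_sup_sketch is a hypothetical re-implementation for illustration, not the package's code, and it assumes that Sigma is positive definite.

```r
# Hypothetical re-implementation in base R; the package function may differ.
mvn_sup_sketch <- function(n, mu, Sigma, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  k <- length(mu)
  # Upper-triangular Cholesky factor U with t(U) %*% U = Sigma.
  U <- chol(Sigma)
  # Each row of z %*% U is a draw from N(0, Sigma); adding mu shifts the mean.
  z <- matrix(rnorm(n * k), nrow = n, ncol = k)
  draws <- sweep(z %*% U, 2, mu, "+")
  # Maximum of each drawn vector.
  apply(draws, 1, max)
}

Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
sup <- mvn_sup_sketch(n = 1000, mu = c(0, 0), Sigma = Sigma, seed = 42)
length(sup)
```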
Constructor of robust2sls class
Description
new_robust2sls
turns a list into an object of class
"robust2sls"
Usage
new_robust2sls(x = list())
Arguments
x |
A list with components of the |
Value
new_robust2sls
returns an object of class "robust2sls"
,
which is a list with a special structure of named components.
Warning
Only checks that the input is a list but not that its components match the
requirements of the "robust2sls"
class. Use the validator function
validate_robust2sls
for that purpose.
Determine which observations can be used for estimation
Description
nonmissing
takes a dataframe and a formula and determines which
observations can in principle be used for the estimation of the 2SLS model
that is specified by the formula. Observations where any of the y, x, or z
variables are missing will be set to FALSE. Technically, fitted values
and residuals could be calculated for observations where only outside
instruments are missing, but this is often not desirable because
the sample on which the model is estimated would then differ from the sample
on which the outliers are determined.
Usage
nonmissing(data, formula)
Arguments
data |
A dataframe. |
formula |
A formula for the |
Value
Returns a logical vector with the same length as the number of observations in the data set that specifies whether an observation is complete in all of the y, x, and z variables. TRUE means no value is missing; FALSE means at least one of the variables necessary for estimation is missing.
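A minimal base-R sketch of the same idea follows. The helper nonmissing_sketch, the data, and the two-part formula are hypothetical and only illustrate the logic. The formula format actually expected by the package is not assumed here.

```r
# Hypothetical base-R sketch; variable names and data are made up.
nonmissing_sketch <- function(data, formula) {
  vars <- all.vars(formula)  # all y, x, and z variable names in the formula
  stats::complete.cases(data[, vars, drop = FALSE])
}

df <- data.frame(y  = c(1.2, NA, 0.7, 2.1),
                 x1 = c(0.5, 0.1, 0.9, 0.3),
                 x2 = c(1.0, 2.0, NA, 4.0),
                 z  = c(0.2, 0.4, 0.6, NA),
                 unused = c(NA, 1, 1, 1))
complete <- nonmissing_sketch(df, y ~ x1 + x2 | x1 + z)
print(complete)
```

Missing values in columns that do not appear in the formula (here unused) do not affect the result.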
Create indices for nonparametric bootstrap
Description
nonparametric
is used for nonparametric resampling, for example
nonparametric case or error/residual resampling. The function takes a vector
of indices that correspond to the indices of observations that should be used
in the resampling procedure.
Usage
nonparametric(
indices,
R,
size = length(indices),
replacement = TRUE,
seed = NULL
)
Arguments
indices |
A vector of indices (integer) from which to sample. |
R |
An integer specifying the number of resamples. |
size |
An integer specifying the size of the resample. The standard bootstrap resamples as many data points as there are in the original sample, which is the default. |
replacement |
A logical value whether to sample with (TRUE) or without (FALSE) replacement. The standard bootstrap resamples with replacement, which is the default. |
seed |
|
Value
nonparametric
returns a list of length R
containing
vectors with the resampled indices.
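Drawing the resampled indices can be sketched in base R as follows. The function resample_indices is hypothetical and merely mirrors the arguments described above. It is not the package's implementation.

```r
# Hypothetical base-R sketch of drawing R sets of resampled indices.
resample_indices <- function(indices, R, size = length(indices),
                             replacement = TRUE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Index into `indices` rather than passing it to sample() directly, which
  # would misbehave for a single index (sample(5, ...) samples from 1:5).
  lapply(seq_len(R), function(r) {
    indices[sample.int(length(indices), size = size, replace = replacement)]
  })
}

idx <- resample_indices(indices = c(2, 5, 7, 11), R = 3, seed = 1)
str(idx)
```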
Nonparametric resampling from a data frame
Description
Nonparametric resampling from a data frame
Usage
nonparametric_resampling(df, resample)
Arguments
df |
Data frame containing observations to be sampled from. |
resample |
A vector of indices that extract the observations from the data frame. |
Details
The input to the resample
argument could for example be generated as
one of the elements in the list generated by the command
nonparametric.
The input to the df
argument would be the original data frame for case
resampling. For error/residual resampling, it would be a data frame
containing the residuals from the model.
Value
nonparametric_resampling
returns a data frame containing the
observations of the resample.
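Case resampling from a data frame amounts to row indexing, as the following base-R sketch with made-up data shows. The object names are hypothetical.

```r
# Made-up data frame; row names keep track of the original observations.
df <- data.frame(y = c(1.5, 2.0, 0.3, 4.1), x = c(10, 20, 30, 40))

# One vector of resampled indices, e.g. one element of a list of resamples.
resample <- c(3, 1, 3, 4)

# Case resampling: pick the rows by index; repeated indices repeat the row.
boot_df <- df[resample, , drop = FALSE]
print(boot_df)
```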
Outlier history of single observation
Description
outlier
takes a "robust2sls"
object and the index of a specific
observation and returns its history of classification across the different
iterations contained in the "robust2sls"
object.
Usage
outlier(robust2sls_object, obs)
Arguments
robust2sls_object |
An object of class |
obs |
An index (row number) of an observation |
Value
outlier
returns a vector that contains the 'type' value for
the given observation across the different iterations. There are three
possible values: 0 if the observation is judged to be an outlier, 1 if not,
and -1 if any of its x, y, or z values required for estimation is missing.
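The lookup can be pictured with a made-up list of 'type' vectors stored per iteration. This is a sketch of the idea only. The object names are hypothetical and no "robust2sls" object is constructed.

```r
# Made-up 'type' vectors for three iterations and five observations,
# stored under m0, m1, ... with the coding 0 / 1 / -1 described above.
type <- list(m0 = c(1L, 0L, 1L, -1L, 1L),
             m1 = c(1L, 0L, 0L, -1L, 1L),
             m2 = c(1L, 1L, 0L, -1L, 1L))

# History of observation 2 across iterations.
obs <- 2
history <- vapply(type, function(v) v[obs], integer(1))
print(history)
```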
Outlier detection algorithms
Description
outlier_detection
provides different types of outlier detection
algorithms depending on the arguments provided. The decision whether to
classify an observation as an outlier or not is based on its standardised
residual in comparison to some user-specified reference distribution.
The algorithms differ mainly in two ways. First, they can differ by the use
of initial estimator, i.e. the estimator based on which the first
classification as outliers is made. Second, the algorithm can either be
iterated a fixed number of times or until the difference in coefficient
estimates between the most recent model and the previous one is smaller than
some user-specified convergence criterion. The difference is measured by
the L2 norm.
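The iteration logic can be sketched in base R. To keep the example self-contained, ordinary least squares via lm() stands in for 2SLS, so this is not the package's estimator. The bias correction shown is the standard variance formula for a truncated standard normal and is an assumption about what such a correction could look like. All object names are hypothetical.

```r
# Sketch only: ordinary least squares (lm) stands in for 2SLS, and all names
# below are hypothetical. Reference distribution: standard normal.
set.seed(123)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[1:3] <- y[1:3] + 15            # plant three gross outliers
dat <- data.frame(y = y, x = x)

gamma  <- 0.05                   # probability of falsely flagging an observation
cutoff <- qnorm(1 - gamma / 2)   # two-sided cut-off
psi    <- 1 - gamma
# Assumption: variance of a standard normal truncated to [-c, c] is
# (psi - 2 * c * dnorm(c)) / psi; its inverse corrects sigma^2 upwards.
bias_corr <- psi / (psi - 2 * cutoff * dnorm(cutoff))

# Initial estimator: full sample, no correction needed.
fit   <- lm(y ~ x, data = dat)
res   <- dat$y - predict(fit, newdata = dat)
sigma <- sqrt(mean(res^2))       # not adjusted for degrees of freedom
sel   <- abs(res / sigma) <= cutoff

crit <- 1e-6                     # convergence criterion on the L2 norm
for (m in 1:20) {
  old   <- coef(fit)
  fit   <- lm(y ~ x, data = dat[sel, ])          # re-estimate without outliers
  res   <- dat$y - predict(fit, newdata = dat)   # residuals for all observations
  sigma <- sqrt(bias_corr * mean(res[sel]^2))    # corrected sub-sample sigma
  sel   <- abs(res / sigma) <= cutoff            # update the classification
  if (sqrt(sum((coef(fit) - old)^2)) < crit) break
}
which(!sel)                      # indices classified as outliers
```

The loop stops either when the L2 norm of the coefficient change falls below crit or after 20 iterations, mirroring the two stopping rules described above.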
Usage
outlier_detection(
data,
formula,
ref_dist = c("normal"),
sign_level,
initial_est = c("robustified", "saturated", "user", "iis"),
user_model = NULL,
iterations = 1,
convergence_criterion = NULL,
max_iter = NULL,
shuffle = FALSE,
shuffle_seed = NULL,
split = 0.5,
verbose = FALSE,
iis_args = NULL
)
Arguments
data |
A dataframe. |
formula |
A formula for the |
ref_dist |
A character vector that specifies the reference distribution
against which observations are classified as outliers. |
sign_level |
A numeric value between 0 and 1 that determines the cutoff in the reference distribution against which observations are judged as outliers or not. |
initial_est |
A character vector that specifies the initial estimator
for the outlier detection algorithm. |
user_model |
A model object of class ivreg. Only
required if argument |
iterations |
Either an integer >= 0 that specifies how often the outlier
detection algorithm is iterated, or the character vector
|
convergence_criterion |
A numeric value or NULL. The algorithm stops as
soon as the difference in coefficient estimates between the most recent model
and the previous one is smaller than |
max_iter |
A numeric value >= 1 or NULL. If
|
shuffle |
A logical value or |
shuffle_seed |
An integer value that will set the seed for shuffling the
sample or |
split |
A numeric value strictly between 0 and 1 that determines in which proportions the sample will be split. |
verbose |
A logical value whether progress during estimation should be reported. |
iis_args |
A list with named entries corresponding to the arguments for
|
Value
outlier_detection
returns an object of class
"robust2sls"
, which is a list with the following components:
$cons
A list which stores high-level information about the function call and some results.
$call is the captured function call, $formula the formula argument, $data the original data set, $reference the chosen reference distribution to classify outliers, $sign_level the significance level, $psi the probability that an observation is not classified as an outlier under the null hypothesis of no outliers, $cutoff the cutoff used to classify outliers if their standardised residuals are larger than that value, $bias_corr a bias correction factor to account for potential false positives (observations classified as outliers even though they are not). There are three further elements that are lists themselves.
$initial stores settings about the initial estimator: $estimator is the type of the initial estimator (e.g. robustified or saturated), $split how the sample is split (NULL if argument not used), $shuffle whether the sample is shuffled before splitting (NULL if argument not used), $shuffle_seed the value of the random seed (NULL if argument not used).
$convergence stores information about the convergence of the outlier-detection algorithm: $criterion is the user-specified convergence criterion (NULL if argument not used), $difference is the L2 norm between the last coefficient estimates and the previous ones (NULL if argument not used or only initial estimator calculated), $converged is a logical value indicating whether the algorithm has converged, i.e. whether the difference is smaller than the convergence criterion (NULL if argument not used), $max_iter is the maximum iteration set by the user (NULL if argument not used or not set).
$iterations contains information about the user-specified iterations argument ($setting) and the actual number of iterations that were done ($actual). The actual number can be lower if the algorithm converged already before the user-specified number of iterations were reached.
$model
A list storing the model objects of class ivreg for each iteration. Each model is stored under $m0, $m1, ...
$res
A list storing the residuals of all observations for each iteration. Residuals of observations where any of the y, x, or z variables used in the 2SLS model are missing are set to NA. Each vector is stored under $m0, $m1, ...
$stdres
A list storing the standardised residuals of all observations for each iteration. Standardised residuals of observations where any of the y, x, or z variables used in the 2SLS model are missing are set to NA. Standardisation is done by dividing by sigma, which is not adjusted for degrees of freedom. Each vector is stored under $m0, $m1, ...
$sel
A list of logical vectors storing whether an observation is included in the estimation or not. Observations are excluded (FALSE) if they either have missing values in any of the x, y, or z variables needed in the model or when they are classified as outliers based on the model. Each vector is stored under $m0, $m1, ...
$type
A list of integer vectors indicating whether an observation has any missing values in x, y, or z (-1), whether it is classified as an outlier (0) or not (1). Each vector is stored under $m0, $m1, ...
Warning
See Jiao (2019)
(and a forthcoming working paper) for the conditions that the initial
estimator must satisfy when
using initial_est == "user"
(e.g. it has to be Op(1)).
IIS is a generalisation of Saturated 2SLS
with
multiple block search, but no asymptotic theory exists for IIS.
Number of outliers
Description
outliers
calculates the number of outliers from a "robust2sls"
object for a given iteration.
Usage
outliers(robust2sls_object, iteration)
Arguments
robust2sls_object |
An object of class |
iteration |
An integer >= 0 representing the iteration for which the outliers are calculated. |
Value
outliers
returns the number of outliers for a given iteration
as determined by the outlier-detection algorithm.
Proportion of outliers
Description
outliers_prop
calculates the proportion of outliers relative to all
non-missing observations in the full sample from a "robust2sls"
object
for a given iteration.
Usage
outliers_prop(robust2sls_object, iteration)
Arguments
robust2sls_object |
An object of class |
iteration |
An integer >= 0 representing the iteration for which the outliers are calculated. |
Value
outliers_prop
returns the proportion of outliers for a given
iteration as determined by the outlier-detection algorithm.
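Both quantities follow directly from the coding of the 'type' vector (0 = outlier, 1 = not an outlier, -1 = missing). The base-R sketch below uses a made-up vector and hypothetical object names.

```r
# Made-up 'type' vector: 0 = outlier, 1 = not an outlier, -1 = missing.
type <- c(1L, 0L, 1L, -1L, 1L, 0L, 1L, 1L, -1L, 1L)

n_outliers    <- sum(type == 0)    # number of detected outliers
n_nonmissing  <- sum(type != -1)   # observations usable for estimation
prop_outliers <- n_outliers / n_nonmissing

c(count = n_outliers, proportion = prop_outliers)
```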
Plotting of standardised residuals and outliers
Description
Plot method for objects of class "robust2sls"
. Plots the
standardised residuals of non-missing observations for a given iteration of
the outlier-detection algorithm and distinguishes whether an observation is
classified as an outlier by colour.
Usage
## S3 method for class 'robust2sls'
plot(x, iteration = NULL, ...)
Arguments
x |
An object of class |
iteration |
Either |
... |
Arguments to be passed to methods, see plot. |
Value
plot.robust2sls
returns a graph of class
ggplot.
Helper of robust2sls class
Description
robust2sls
allows the user to create an object of class
"robust2sls"
by specifying the different components of the list. The
validator function validate_robust2sls
is called at the end to ensure
that the resulting object is a valid object of class
"robust2sls"
.
Usage
## S3 method for class 'robust2sls'
print(x, verbose = FALSE, ...)
Arguments
x |
An object of class |
verbose |
A logical value, |
... |
Further arguments passed to or from other methods, see print. |
Details
Printing summary output
Print method for objects of class "robust2sls"
. Prints a
high-level summary of the settings and results of the outlier-detection
algorithm.
Value
No return value, prints model summary.
Proportion test
Description
proptest()
conducts a test whether the false outlier detection rate
(FODR) in the sample deviates significantly from its expected value
(population FODR) under the null hypothesis that there are no outliers in the
sample.
Usage
proptest(robust2sls_object, alpha, iteration, one_sided = FALSE)
Arguments
robust2sls_object |
An object of class |
alpha |
A numeric value between 0 and 1 representing the significance level of the test. |
iteration |
An integer >= 0 or the character "convergence" that determines which iteration is used for the test. |
one_sided |
A logical value whether a two-sided test ( |
Details
See outlier_detection()
and
multi_cutoff()
for creating an object of class
"robust2sls"
or a list thereof.
Value
proptest()
returns a data frame with the iteration (m) to be
tested, the actual iteration that was tested (generally coincides with the
iteration that was specified to be tested but is the convergent iteration if
the fixed point is tested), the setting of the probability of exceeding the
cut-off (gamma), the type of t-test (one- or two-sided), the value of the
test statistic, its p-value, the significance level alpha
, and the
decision. The number of rows of the data frame corresponds to the length of
the argument robust2sls_object
.
Robustified 2SLS (full sample initial estimator)
Description
robustified_init
estimates the full sample 2SLS model, which is used
as the initial estimator for the iterative procedure.
Usage
robustified_init(data, formula, cutoff)
Arguments
data |
A dataframe. |
formula |
A formula in the format |
cutoff |
A numeric cutoff value used to judge whether an observation is an outlier or not. If the absolute value of its standardised residual is larger than the cutoff value, the observation is classified as an outlier. |
Value
robustified_init
returns a list with five elements. The first
four are vectors whose length equals the number of observations in the data
set. Unlike the residuals stored in a model object (usually accessible via
model$residuals
), it does not ignore observations where any of y, x
or z are missing. It instead sets their values to NA
.
The first element is a double vector containing the residuals for each
observation based on the model estimates. The second element contains the
standardised residuals, the third one a logical vector with TRUE
if
the observation is judged as not outlying, FALSE
if it is an outlier,
and NA
if any of y, x, or z are missing. The fourth element of the
list is an integer vector with three values: 0 if the observation is judged
to be an outlier, 1 if not, and -1 if missing. The fifth and last element
stores the ivreg
model object based on which the four
vectors were calculated.
Saturated 2SLS (split-sample initial estimator)
Description
saturated_init
splits the sample into two sub-samples. The 2SLS model
is estimated on both sub-samples and the estimates of one sub-sample are
used to calculate the residuals and hence outliers from the other sub-sample.
Usage
saturated_init(data, formula, cutoff, shuffle, shuffle_seed, split = 0.5)
Arguments
data |
A dataframe. |
formula |
A formula in the format |
cutoff |
A numeric cutoff value used to judge whether an observation is an outlier or not. If the absolute value of its standardised residual is larger than the cutoff value, the observation is classified as an outlier. |
shuffle |
A logical value ( |
shuffle_seed |
A numeric value that sets the seed for shuffling the
data set before splitting it. Only used if |
split |
A numeric value strictly between 0 and 1 that determines in which proportions the sample will be split. |
Value
saturated_init
returns a list with five elements. The first
four are vectors whose length equals the number of observations in the data
set. Unlike the residuals stored in a model object (usually accessible via
model$residuals
), it does not ignore observations where any of y, x
or z are missing. It instead sets their values to NA
.
The first element is a double vector containing the residuals for each
observation based on the model estimates. The second element contains the
standardised residuals, the third one a logical vector with TRUE
if
the observation is judged as not outlying, FALSE
if it is an outlier,
and NA
if any of y, x, or z are missing. The fourth element of the
list is an integer vector with three values: 0 if the observation is judged
to be an outlier, 1 if not, and -1 if missing. The fifth and last element
is a list with the two initial ivreg
model objects based
on the two different sub-samples.
Warning
The estimator may have bad properties if the split
is too unequal and
the sample size is not large enough.
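The split-sample idea can be sketched in base R. As a stand-in for 2SLS the sketch uses lm(), and it standardises each residual by the sigma of the model that produced it. Both are assumptions made for illustration and do not describe the package's internals. All object names and data are hypothetical.

```r
# Sketch only: lm stands in for 2SLS; all names and data are hypothetical.
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)
dat$y[c(10, 60)] <- dat$y[c(10, 60)] + 12   # one planted outlier per half

split  <- 0.5
cutoff <- qnorm(1 - 0.05 / 2)               # assumed normal reference, gamma = 0.05
n1 <- floor(n * split)
in_first <- seq_len(n) <= n1                # no shuffling in this sketch

fit1 <- lm(y ~ x, data = dat[in_first, ])   # estimated on the first sub-sample
fit2 <- lm(y ~ x, data = dat[!in_first, ])  # estimated on the second sub-sample

# Residuals of each sub-sample come from the model of the *other* sub-sample.
res <- numeric(n)
res[in_first]  <- dat$y[in_first]  - predict(fit2, newdata = dat[in_first, ])
res[!in_first] <- dat$y[!in_first] - predict(fit1, newdata = dat[!in_first, ])

# Standardise with the sigma of the model that produced the residual.
sigma1 <- sqrt(mean(residuals(fit1)^2))
sigma2 <- sqrt(mean(residuals(fit2)^2))
stdres <- res / ifelse(in_first, sigma2, sigma1)

sel <- abs(stdres) <= cutoff                # TRUE = not classified as outlier
which(!sel)
```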
Create selection (non-outlying) vector from model
Description
selection
uses the data and model objects to create a list with five
elements that are used to determine whether the observations are judged as
outliers or not.
Usage
selection(data, yvar, model, cutoff, bias_correction = NULL)
Arguments
data |
A dataframe. |
yvar |
A character vector of length 1 that refers to the name of the dependent variable in the data set. |
model |
A model object of class |
cutoff |
A numeric cutoff value used to judge whether an observation is an outlier or not. If the absolute value of its standardised residual is larger than the cutoff value, the observation is classified as an outlier. |
bias_correction |
A numeric factor used to correct the estimate of
sigma under the null hypothesis of no outliers or |
Value
A list with five elements. The first four are vectors whose length
equals the number of observations in the data set. Unlike the residuals
stored in a model object (usually accessible via model$residuals
), it
does not ignore observations where any of y, x or z are missing. It instead
sets their values to NA
.
The first element is a double vector containing the residuals for each
observation based on the model estimates. The second element contains the
standardised residuals, the third one a logical vector with TRUE
if
the observation is judged as not outlying, FALSE
if it is an outlier,
and NA
if any of y, x, or z are missing. The fourth element of the
list is an integer vector with three values: 0 if the observation is judged
to be an outlier, 1 if not, and -1 if missing. The fifth and last element
stores the ivreg
model object based on which the four
vectors were calculated.
Warning
Unlike the residuals stored in a model object (usually accessible via
model$residuals
), this function returns vectors of the same length as
the original data set even if any of the y, x, or z variables are missing.
The residuals for those observations are set to NA
.
Create selection (non-outlying) vector from IIS model
Description
selection_iis
uses the data and isat model object to create a list
with five elements that are used to determine whether the observations are
judged as outliers or not.
Usage
selection_iis(x, data, yvar, complete, rownames_orig, refmodel)
Arguments
x |
An object of class |
data |
A dataframe. |
yvar |
A character vector of length 1 that refers to the name of the dependent variable in the data set. |
complete |
A logical vector with the same length as the number of observations in the data set that specifies whether an observation has any missing values in any of y, x, or z variables. |
rownames_orig |
A character vector storing the original rownames of the dataframe. |
refmodel |
A model object that will be stored in |
Value
A list with five elements. The first four are vectors whose length
equals the number of observations in the data set. Unlike the residuals
stored in a model object (usually accessible via model$residuals
), it
does not ignore observations where any of y, x or z are missing. It instead
sets their values to NA
.
The first element is a double vector containing the residuals for each
observation based on the model estimates. The second element contains the
standardised residuals, the third one a logical vector with TRUE
if
the observation is judged as not outlying, FALSE
if it is an outlier,
and NA
if any of y, x, or z are missing. The fourth element of the
list is an integer vector with three values: 0 if the observation is judged
to be an outlier, 1 if not, and -1 if missing. The fifth and last element
stores the ivreg
model object based on which the four
vectors were calculated.
Note
IIS runs multiple models, similar to saturated_init
but with
multiple block search. These intermediate models are not recorded. For
simplicity, the element $model
of the returned list stores the full
sample model result, identical to robustified_init
.
Warning
Unlike the residuals stored in a model object (usually accessible via
model$residuals
), this function returns vectors of the same length as
the original data set even if any of the y, x, or z variables are missing.
The residuals for those observations are set to NA
.
Simes (1986) procedure for multiple testing
Description
simes()
takes a vector of p-values corresponding to individual null
hypotheses and performs the Simes (1986) procedure for the global null
hypothesis. The global null hypothesis is the intersection of all individual
null hypotheses.
Usage
simes(pvals, alpha)
Arguments
pvals |
A numeric vector of p-values corresponding to the p-values of the individual null hypotheses. |
alpha |
A numeric value representing the global significance level. |
Details
See Simes (1986), doi:10.1093/biomet/73.3.751.
Value
simes()
returns a list with three named elements.
$reject
stores a logical value whether the global null hypothesis has
been rejected. $alpha
stores the significance level that was chosen.
$details
stores a matrix of the individual null hypothesis p-values,
the adjusted significance level according to Simes' procedure, and the
rejection decision for each individual hypothesis test.
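The Simes (1986) rule itself is short: with the ordered p-values p_(1) <= ... <= p_(n), the global null is rejected if p_(i) <= i * alpha / n for at least one i. The base-R sketch below, with the hypothetical name simes_sketch, illustrates the rule. Its return structure only loosely mirrors the one described above.

```r
# Hypothetical base-R sketch of the Simes (1986) global test.
simes_sketch <- function(pvals, alpha) {
  n <- length(pvals)
  p_sorted <- sort(pvals)
  alpha_adj <- seq_len(n) * alpha / n        # i * alpha / n for the i-th smallest
  reject_each <- p_sorted <= alpha_adj
  list(reject = any(reject_each),            # reject global null if any holds
       alpha = alpha,
       details = cbind(pvals = p_sorted, alpha_adj = alpha_adj,
                       reject = reject_each))
}

out <- simes_sketch(pvals = c(0.04, 0.01, 0.03), alpha = 0.05)
out$reject
out$details
```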
Scaling sum proportion test across different cut-offs
Description
sumtest()
uses the estimations across several cut-offs to test whether
the sum of the deviations between sample and population FODR differs
significantly from its expected value.
\[\sum_{k = 1}^{K} \sqrt{n}(\widehat{\gamma}_{c_{k}} - \gamma_{c_{k}}) \]
Usage
sumtest(robust2sls_object, alpha, iteration, one_sided = FALSE)
Arguments
robust2sls_object |
A list of |
alpha |
A numeric value between 0 and 1 representing the significance level of the test. |
iteration |
An integer >= 0 or the character "convergence" that determines which iteration is used for the test. |
one_sided |
A logical value whether a two-sided test ( |
Value
sumtest()
returns a data frame with one row storing the
iteration that was tested, the value of the test statistic (t-test), the
type of the test (one- or two-sided), the corresponding p-value, the
significance level, and whether the null hypothesis is rejected. The data
frame also contains an attribute named "gammas"
that records which
gammas determining the different cut-offs were used in the scaling sum test.
Supremum proportion test across different cut-offs
Description
suptest()
uses the estimations across several cut-offs to test whether
the supremum/maximum of the deviations between sample and population FODR
differs significantly from its expected value.
\[ \sup_{c} |\sqrt{n}(\widehat{\gamma}_{c} - \gamma_{c})| \]
Usage
suptest(robust2sls_object, alpha, iteration, p = c(0.9, 0.95, 0.99), R = 50000)
Arguments
robust2sls_object |
A list of |
alpha |
A numeric value between 0 and 1 representing the significance level of the test. |
iteration |
An integer >= 0 or the character "convergence" that determines which iteration is used for the test. |
p |
A numeric vector of probabilities with values in [0,1] for which the corresponding quantiles are calculated. |
R |
An integer specifying the number of replications for simulating the distribution of the test statistic. |
Value
suptest()
returns a data frame with one row storing the
iteration that was tested, the value of the test statistic, the corresponding
p-value, the significance level, and whether the null hypothesis is rejected.
The data frame also contains two named attributes. The first attribute is
named "gammas"
and records which gammas determining the different
cut-offs were used in the sup test. The second attribute is named
"critical"
and records the critical values corresponding to the
different quantiles in the limiting distribution that were specified in
p
.
Critical and p-value for test statistic relative to simulated distribution
Description
test_cpv
returns the critical value corresponding to a given
quantile of the simulated distribution and the p-value of the test statistic.
Usage
test_cpv(dist, teststat, p)
Arguments
dist |
A numeric vector of simulated values approximating the
distribution of the test statistic, e.g. generated as in |
teststat |
A numeric value of the test statistic. |
p |
A numeric vector of probabilities with values in [0,1] for which the corresponding quantiles are calculated. |
Value
A list with two named entries. $pval
is the p-value of the
test statistic with respect to the distribution dist
. $q
is the
vector of sample quantiles in the distribution dist
corresponding to
the probabilities specified in p
.
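A simulated distribution yields both quantities directly, as the following base-R sketch shows. The function test_cpv_sketch is hypothetical. Defining the p-value as the share of simulated values strictly above the test statistic is an assumption, and the package's exact convention may differ.

```r
# Hypothetical base-R sketch; the package's exact conventions may differ.
test_cpv_sketch <- function(dist, teststat, p) {
  list(pval = mean(dist > teststat),   # share of draws above the statistic
       q = stats::quantile(dist, probs = p, names = FALSE))
}

set.seed(2024)
dist <- abs(rnorm(10000))              # stand-in for a simulated null distribution
out <- test_cpv_sketch(dist, teststat = 2.5, p = c(0.9, 0.95, 0.99))
out$pval
out$q
```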
Append new iteration results to "robust2sls"
object
Description
update_list
takes an existing "robust2sls"
object and appends
the estimation results (ivreg model object, residuals,
standardised residuals, selection and type vectors) of a new iteration.
Usage
update_list(current_list, new_info, name)
Arguments
current_list |
A list object of class |
new_info |
A list with named components |
name |
A character vector of length one naming the appended iteration
results. Convention is |
Value
An object of class "robust2sls"
whose components
$model
, $res
, $stdres
, $sel
, and $type
are
now appended with the new iteration results.
User-specified initial estimator
Description
user_init
uses a model supplied by the user as the initial estimator.
Based on this estimator, observations are classified as outliers or not.
Usage
user_init(data, formula, cutoff, user_model)
Arguments
data |
A dataframe. |
formula |
A formula in the format |
cutoff |
A numeric cutoff value used to judge whether an observation is an outlier or not. If the absolute value of its standardised residual is larger than the cutoff value, the observation is classified as an outlier. |
user_model |
A model object of class ivreg whose parameters are used to calculate the residuals. |
Value
user_init
returns a list with five elements. The first
four are vectors whose length equals the number of observations in the data
set. Unlike the residuals stored in a model object (usually accessible via
model$residuals
), it does not ignore observations where any of y, x
or z are missing. It instead sets their values to NA
.
The first element is a double vector containing the residuals for each
observation based on the model estimates. The second element contains the
standardised residuals, the third one a logical vector with TRUE
if
the observation is judged as not outlying, FALSE
if it is an outlier,
and NA
if any of y, x, or z are missing. The fourth element of the
list is an integer vector with three values: 0 if the observation is judged
to be an outlier, 1 if not, and -1 if missing. The fifth and last element
stores the ivreg
user-specified model object based on
which the four vectors were calculated.
Warning
See Jiao (2019) for the conditions that the initial estimator must satisfy (e.g. it has to be Op(1)).
Validator of robust2sls class
Description
validate_robust2sls
checks that the input is a valid object of
class "robust2sls"
.
Usage
validate_robust2sls(x)
Arguments
x |
An object whose validity of class |
Value
If the object is a valid "robust2sls"
object then the function
returns the object. No return value otherwise.
Calculate varrho coefficients
Description
varrho
calculates the coefficients for the asymptotic variance of the
gauge (false outlier detection rate) for a specific iteration m >= 1.
Usage
varrho(sign_level, ref_dist = c("normal"), iteration)
Arguments
sign_level |
A numeric value between 0 and 1 that determines the cutoff in the reference distribution against which observations are judged as outliers or not. |
ref_dist |
A character vector that specifies the reference distribution
against which observations are classified as outliers. |
iteration |
An integer >= 1 that specifies the iteration of the outlier detection algorithm. |
Value
varrho
returns a list with four components, all of which are
lists themselves. $setting
stores the arguments with which the
function was called. $c
stores the values of the six different
coefficients for the specified iteration. $fp
contains the fixed point
versions of the six coefficients. $aux
stores intermediate values
required for calculating the coefficients.