Type: | Package |
Title: | Robust Survey Statistics Estimation |
Version: | 0.7 |
Description: | Robust (outlier-resistant) estimators of finite population characteristics like of means, totals, ratios, regression, etc. Available methods are M- and GM-estimators of regression, weight reduction, trimming, and winsorization. The package extends the 'survey' https://CRAN.R-project.org/package=survey package. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Classification/MSC-2010: | 62D05, 62F35 |
URL: | https://github.com/tobiasschoch/robsurvey |
BugReports: | https://github.com/tobiasschoch/robsurvey/issues |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
LazyData: | true |
Depends: | R (≥ 3.5.0) |
Imports: | grDevices, KernSmooth, stats, survey (≥ 3.35-1), utils |
Suggests: | hexbin, knitr, MASS, rmarkdown, wbacon |
VignetteBuilder: | knitr, rmarkdown |
Packaged: | 2024-08-21 17:54:28 UTC; tobias |
Author: | Beat Hulliger |
Maintainer: | Tobias Schoch <tobias.schoch@fhnw.ch> |
Repository: | CRAN |
Date/Publication: | 2024-08-22 07:10:02 UTC |
Package Overview
Description
A key design pattern of the package is that the majority of the estimating methods is available in two "flavors":
bare-bone methods
survey methods
Bare-bone methods are stripped-down versions of the survey methods in terms of functionality and informativeness. These functions may serve users and package developers as building blocks. In particular, bare-bone functions cannot compute variances.
The survey methods are much more capable and depend, for variance estimation, on the survey package.
Basic Robust Estimators
Trimming
Bare-bone methods:
weighted_mean_trimmed
andweighted_total_trimmed
Survey methods:
svymean_trimmed
andsvytotal_trimmed
Winsorization
Bare-bone methods:
Survey methods:
Dalen's estimators (weight reduction methods)
Bare-bone methods:
weighted_mean_dalen
andweighted_total_dalen
Survey methods:
svymean_dalen
andsvytotal_dalen
M-estimators
Bare-bone methods:
-
huber2
(weighted Huber Proposal 2 estimator)
Survey methods:
-
mer
(minimum estimated risk estimator)
Survey Regression (weighted least squares)
Robust Regression and Ratio Estimation (weighted)
Regression M-estimators:
svyreg_huberM
andsvyreg_tukeyM
Regression GM-estimators (Mallows and Schweppe):
svyreg_huberGM
andsvyreg_tukeyGM
Ratio M-estimators:
svyratio_huber
andsvyratio_tukey
Note: The functions svyreg_huber
and
svyreg_tukey
are deprecated, use instead
svyreg_huberM
and svyreg_tukeyM
, respectively;
see also robsurvey-deprecated.
Robust Generalized Regression (GREG) and Ratio Prediction of the Population Mean and Total
Regression predictors:
svymean_reg
andsvytotal_reg
Ratio predictors:
svymean_ratio
andsvytotal_ratio
Utility functions
PPS Sample From the MU284 Population
Description
Probability-proportional-to-size sample (PPS) without replacement of municipalities from the MU284 population in Särndal et al. (1992). The sample inclusion probabilities are proportional to the population size in 1975 (variable P75).
Usage
data(MU284pps)
Format
A data.frame
with 60 observations on the following variables:
LABEL
identifier variable,
[integer]
.P85
1985 population size (in thousands),
[double]
.P75
1975 population size (in thousands),
[double]
.RMT85
Revenues from the 1985 municipal taxation (in millions of kronor),
[double]
.CS82
number of Conservative seats in municipal council,
[double]
.SS82
number of Social-Democrat seats in municipal council (1982),
[double]
.S82
total number of seats in municipal council (1982),
[double]
.ME84
number of municipal employees in 1984,
[double]
.REV84
real estate values according to 1984 assessment (in millions of kronor),
[double]
.REG
geographic region indicator,
[integer]
.CL
cluster indicator (a cluster consists of a set of neighbouring municipalities),
[integer]
.weights
sampling weights,
[double]
.pi
sample inclusion probability,
[double]
.
Details
The MU284 population of Särndal et al. (1992, Appendix B) is a
dataset with observations on the 284 municipalities in Sweden in the
late 1970s and early 1980s. The MU284
population data
are available in the sampling package of Tillé and Matei (2021).
The data frame MU284pps
is a probability-proportional-to-size
sample (PPS) without replacement from the MU284 population.
The sample inclusion probabilities are proportional to the
population size in 1975 (variable P75). The sample has been
selected by Brewer’s method; see Tillé (2006, Chap. 7).
The sampling weight (inclusion probabilities) are calibrated to
the population size and the population total of P75.
Source
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Tillé, Y. and Matei, A. (2021). sampling: Survey Sampling. R package version 2.9. https://CRAN.R-project.org/package=sampling
Tillé, Y. (2006). Sampling Algorithms. New York: Springer-Verlag.
See Also
Examples
head(MU284pps)
library(survey)
# Survey design with inclusion probabilities proportional to size
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer",
calibrate.formula = ~1)
} else {
# legacy mode
svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer")
}
Stratified Sample from the MU284 Population
Description
Stratified simple random sample (without replacement) of municipalities from the MU284 population in Särndal et al. (1992). Stratification is by geographic region and a take-all stratum (by 1975 population size), which includes the big cities Stockholm, Göteborg, and Malmö.
Usage
data(MU284strat)
Format
A data.frame
with 60 observations on the following variables:
LABEL
identifier variable,
[integer]
.P85
1985 population size (in thousands),
[double]
.P75
1975 population size (in thousands),
[double]
.RMT85
Revenues from the 1985 municipal taxation (in millions of kronor),
[double]
.CS82
number of Conservative seats in municipal council,
[double]
.SS82
number of Social-Democrat seats in municipal council (1982),
[double]
.S82
total number of seats in municipal council (1982),
[double]
.ME84
number of municipal employees in 1984,
[double]
.REV84
real estate values according to 1984 assessment (in millions of kronor),
[double]
.CL
cluster indicator (a cluster consists of a set of neighbouring municipalities),
[integer]
.REG
geographic region indicator,
[integer]
.Stratum
stratum indicator,
[integer]
.weights
sampling weights,
[double]
.fpc
finite population correction,
[double]
.
Details
The MU284 population of Särndal et al. (1992, Appendix B) is a
dataset with observations on the 284 municipalities in Sweden in the
late 1970s and early 1980s. The MU284
population data
are available in the sampling package of Tillé and Matei (2021).
The population is divided into two parts based on 1975 population
size (P75
):
the MU281 population, which consists of the 281 smallest municipalities;
the MU3 population of the three biggest municipalities/ cities in Sweden (Stockholm, Göteborg, and Malmö).
The three biggest cities take exceedingly large values (representative
outliers) on almost all of the variables. To account for this, a stratified
sample has been drawn from the MU284 population using a take-all stratum.
The sample data, MU284strat
, (of size n=60
) consists of
a stratified simple random sample (without replacement) from the MU281 population, where stratification is by geographic region (
REG
) with proportional sample size allocation;a take-all stratum that includes the three biggest cities/ municipalities (population M3).
Source
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Tillé, Y. and Matei, A. (2021). sampling: Survey Sampling. R package version 2.9. https://CRAN.R-project.org/package=sampling
See Also
Examples
head(MU284strat)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc,
weights = ~weights, data = MU284strat,
calibrate.formula = ~-1 + Stratum)
} else {
# legacy mode
svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc,
weights = ~weights, data = MU284strat)
}
Utility Functions for Objects of Class svyreg_rob
Description
Methods and utility functions for objects of class svyreg_rob
.
Usage
## S3 method for class 'svyreg_rob'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svyreg_rob'
summary(object, mode = c("design", "model", "compound"),
digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svyreg_rob'
coef(object, ...)
## S3 method for class 'svyreg_rob'
vcov(object, mode = c("design", "model", "compound"), ...)
## S3 method for class 'svyreg_rob'
SE(object, mode = c("design", "model", "compound"), ...)
## S3 method for class 'svyreg_rob'
residuals(object, ...)
## S3 method for class 'svyreg_rob'
fitted(object, ...)
## S3 method for class 'svyreg_rob'
robweights(object)
## S3 method for class 'svyreg_rob'
plot(x, which = 1L:4L,
hex = FALSE, caption = c("Standardized residuals vs. Fitted Values",
"Normal Q-Q", "Response vs. Fitted values",
"Sqrt of abs(Residuals) vs. Fitted Values"),
panel = if (add.smooth) function(x, y, ...) panel.smooth(x, y,
iter = iter.smooth, ...) else points, sub.caption = NULL, main = "",
ask = prod(par("mfcol")) < length(which) && dev.interactive(), ...,
id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE,
add.smooth = getOption("add.smooth"), iter.smooth = 3,
label.pos = c(4, 2), cex.caption = 1, cex.oma.main = 1.25)
Arguments
x |
object of class |
digits |
|
... |
additional arguments passed to the method. |
object |
object of class |
mode |
|
which |
|
hex |
|
caption |
|
panel |
panel function. The useful alternative to
|
sub.caption |
|
main |
|
ask |
|
id.n |
|
labels.id |
|
cex.id |
|
qqline |
|
add.smooth |
|
iter.smooth |
|
label.pos |
|
cex.caption |
|
cex.oma.main |
|
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
- Variance
-
For variance estimation (
summary
,vcov
, andSE
) three modes are available:-
"design"
: design-based variance estimator using linearization; see Binder (1983) -
"model"
: model-based weighted variance estimator (the sampling design is ignored) -
"compound"
: design-model-based variance estimator; see Rubin-Bleuer and Schiopu-Kratina (2005) and Binder and Roberts (2009)
-
- Utility functions
-
The following utility functions are available:
-
summary
gives a summary of the estimation properties -
plot
shows diagnostic plots for the estimated regression model -
robweights
extracts the robustness weights (if available) -
coef
extracts the estimated regression coefficients -
vcov
extracts the (estimated) covariance matrix -
residuals
extracts the residuals -
fitted
extracts the fitted values
-
References
Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review 51, 279–292. doi:10.2307/1402588
Binder, D. A. and Roberts, G. (2009). Design- and Model-Based Inference for Model Parameters. In: Sample Surveys: Inference and Analysis ed. by Pfeffermann, D. and Rao, C. R. Volume 29B of Handbook of Statistics, Amsterdam: Elsevier, Chap. 24, 33–54 doi:10.1016/S0169-7161(09)00224-7
Rubin-Bleuer, S. and Schiopu-Kratina, I. (2005). On the Two-phase framework for joint model and design-based inference. The Annals of Statistics 33, 2789–2810. doi:10.1214/009053605000000651
See Also
Weighted least squares: svyreg
; robust weighted regression
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
Examples
head(workplace)
library(survey)
# Survey design for simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Compute regression M-estimate with Huber psi-function
m <- svyreg_huberM(payroll ~ employment, dn, k = 14)
# Diagnostic plots (e.g., standardized residuals against fitted values)
plot(m, which = 1L)
# Plot of the robustness weights of the M-estimate against its residuals
plot(residuals(m), robweights(m))
# Utility functions
summary(m)
coef(m)
SE(m)
vcov(m)
residuals(m)
fitted(m)
robweights(m)
Utility Functions for Objects of Class svystat_rob
Description
Methods and utility functions for objects of class svystat_rob
.
Usage
mse(object, ...)
## S3 method for class 'svystat_rob'
mse(object, ...)
## S3 method for class 'svystat'
mse(object, ...)
## S3 method for class 'svystat_rob'
summary(object, digits = max(3L,
getOption("digits") - 3L), ...)
## S3 method for class 'svystat_rob'
coef(object, ...)
## S3 method for class 'svystat_rob'
SE(object, ...)
## S3 method for class 'svystat_rob'
vcov(object, ...)
## S3 method for class 'svystat_rob'
scale(x, ...)
## S3 method for class 'svystat_rob'
residuals(object, ...)
## S3 method for class 'svystat_rob'
fitted(object, ...)
robweights(object)
## S3 method for class 'svystat_rob'
robweights(object)
## S3 method for class 'svystat_rob'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
object |
object of class |
digits |
|
... |
additional arguments passed to the method. |
x |
object of class |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
Utility functions:
-
mse
computes the estimated risk (mean square error) in presence of representative outliers; see alsomer
-
summary
gives a summary of the estimation properties -
robweights
extracts the robustness weights -
coef
extracts the estimate of location -
SE
extracts the (estimated) standard error -
vcov
extracts the (estimated) covariance matrix -
residuals
extracts the residuals -
fitted
extracts the fitted values
See Also
svymean_dalen
, svymean_huber
,
svymean_ratio
, svymean_reg
,
svymean_tukey
, svymean_trimmed
,
svymean_winsorized
svytotal_dalen
, svytotal_huber
,
svytotal_ratio
, svytotal_reg
,
svytotal_tukey
, svytotal_trimmed
,
svytotal_winsorized
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Estimated one-sided k winsorized population total (i.e., k = 2 observations
# are winsorized at the top of the distribution)
wtot <- svytotal_k_winsorized(~employment, dn, k = 2)
# Show summary statistic of the estimated total
summary(wtot)
# Estimated mean square error (MSE)
mse(wtot)
# Estimate, std. err., variance, and the residuals
coef(wtot)
SE(wtot)
vcov(wtot)
residuals(wtot)
# M-estimate of the total (Huber psi-function; tuning constant k = 3)
mtot <- svytotal_huber(~employment, dn, k = 45)
# Plot of the robustness weights of the M-estimate against its residuals
plot(residuals(mtot), robweights(mtot))
Data on a Simple Random Sample of 100 Counties in the U.S.
Description
Data from a simple random sample (without replacement) of 100 of the 3141 counties in the United Stated (U.S. Bureau of the Census, 1994).
Usage
data(counties)
Format
A data.frame
with 100 observations on the following variables:
state
state,
[character]
.county
county,
[character]
.landarea
land area, 1990 (square miles),
[double]
.totpop
population total, 1992,
[double]
.unemp
number of unemployed persons, 1991,
[double]
.farmpop
farm population, 1990,
[double]
.numfarm
number of farms, 1987,
[double]
.farmacre
acreage in farms, 1987,
[double]
.weights
sampling weight,
[double]
.fpc
finite population corretion,
[double]
.
Details
The data (and 10 additional variables) are published in Lohr (1999, Appendix C).
Source
Lohr, S. L. (1999). Sampling: Design and Analysis, Pacific Grove (CA): Duxbury Press.
Examples
head(counties)
library(survey)
# Survey design for simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties,
calibrate.formula = ~1)
} else {
# legacy mode
svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties)
}
Measurement of Copper Content in Wholemeal Flour
Description
Measurement of copper content in wholemeal flour (measured in parts per million).
Usage
data(flour)
Format
A data.frame
with 24 observations (sorted in ascending order)
on the following variables:
copper
copper content
[double]
.weight
weight
[double]
.
Details
The data are published in Maronna et al. (2019, p. 2).
Source
Maronna, R. A., Martin, R. D., Yohai, V. J. and Salibián-Barrera, M. (2019). Robust Statistics: Theory and Methods (with R), Hoboken (NJ): John Wiley and Sons, 2nd edition. doi:10.1002/9781119214656
Examples
head(flour)
Weighted Huber Proposal 2 Estimator
Description
Weighted Huber Proposal 2 estimator of location and scatter.
Usage
huber2(x, w, k = 1.5, na.rm = FALSE, maxit = 50, tol = 1e-04, info = FALSE,
k_Inf = 1e6, df_cor = TRUE)
Arguments
x |
|
w |
|
k |
|
na.rm |
|
maxit |
|
tol |
|
info |
|
k_Inf |
|
df_cor |
|
Details
Function huber2
computes the weighted Huber (1964) Proposal 2
estimates of location and scale.
The method is initialized by the weighted median (location) and the weighted interquartile range (scale).
Value
The return value depends on info
:
info = FALSE
:estimate of mean or total
[double]
info = TRUE
:a
[list]
with items:-
characteristic
[character]
, -
estimator
[character]
, -
estimate
[double]
, -
variance
(default:NA
), -
robust
[list]
, -
residuals
[numeric vector]
, -
model
[list]
, -
design
(default:NA
), -
[call]
-
Comparison
The huber2
estimator is initialized by the weighted median and
the weighted (scaled) interquartile range. For unweighted data, this
estimator differs from hubers
in MASS,
which is initialized by mad
.
The difference between the estimators is usually negligible (for
sufficiently small values of tol
). See examples.
References
Huber, P. J. (1964). Robust Estimation of a Location Parameter. Annals of Mathematical Statistics 35, 73–101. doi:10.1214/aoms/1177703732
Examples
head(workplace)
# Weighted "Proposal 2" estimator of the mean
huber2(workplace$employment, workplace$weight, k = 8)
# More information on the estimate, i.e., info = TRUE
m <- huber2(workplace$employment, workplace$weight, k = 8, info = TRUE)
# Estimate of scale
m$scale
# Comparison with MASS::hubers (without weights). We make a copy of MASS::hubers
library(MASS)
hubers_mod <- hubers
# Then we replace mad by the (scaled) IQR as initial scale estimator
body(hubers_mod)[[7]][[3]][[2]] <- substitute(s0 <- IQR(y, type = 2) * 0.7413)
# Define the numerical tolerance
TOLERANCE <- 1e-8
# Comparison
m1 <- huber2(workplace$payroll, rep(1, 142), tol = TOLERANCE)
m2 <- hubers_mod(workplace$payroll, tol = TOLERANCE)$mu
m1 / m2 - 1
# The absolute relative difference is < 4.0-09 (smaller than TOLERANCE)
Length-of-Stay (LOS) Hospital Data
Description
A simple random sample of 70 patients in inpatient hospital treatment.
Usage
data(losdata)
Format
A data.frame
with data on the following variables:
los
length of stay (days)
[integer]
.weight
sampling weight
[double]
.fpc
finite population correction
[double]
.
Details
The losdata
are a simple random sample without replacement (SRSWOR)
of size n = 70
patients from the (fictive) population of
N = 2479
patients in inpatient hospital treatment. We have
constructed the losdata
as a showcase; though, the LOS
measurements are real data that we have taken from the 201 observations
in Ruffieux et al. (2000). The original LOS data of Ruffieux et al.
(2000) are available in the R package robustbase; see
robustbase::data(los)
. Our losdata
are a SRSWOR of
size n = 70
from the 201 original observations.
Ruffieux et al. (2000) and data.frame los
in the R
package robustbase.
Source
Ruffieux, C., Paccaud, F. and Marazzi, A. (2000). Comparing rules for truncating hospital length of stay. Casemix Quarterly 2.
Examples
head(losdata)
library(survey)
# Survey design for simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata,
calibrate.formula = ~1)
} else {
# legacy mode
svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata)
}
Minimum Estimated Risk (MER) M-Estimator
Description
mer
is an adaptive M-estimator of the weighted mean or total. It
is defined as the estimator that minimizes the estimated mean square error,
mse
, of the estimator under consideration.
Usage
mer(object, verbose = TRUE, max_k = 10, init = 1, method = "Brent",
optim_args = list())
Arguments
object |
an object of class |
verbose |
|
init |
|
method |
|
max_k |
|
optim_args |
|
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
MER-estimators are available for the methods svymean_huber
,
svytotal_huber
, svymean_tukey
and
svytotal_tukey
.
Value
Object of class svystat_rob
References
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
See Also
Overview (of all implemented functions)
Examples
head(losdata)
library(survey)
# Survey design for simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata,
calibrate.formula = ~1)
} else {
# legacy mode
svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata)
}
# M-estimator of the total with tuning constant k = 8
m <- svymean_huber(~los, dn, type = "rhj", k = 8)
# MER estimator
mer(m)
Sampling with probability proportional to size (pps without replacement)
Description
Methods to compute the first-order sample inclusion probabilities (given a measure of size) and sampling mechanisms to draw samples with probabilities proportional to size (pps).
Usage
pps_probabilities(size, n)
pps_draw(x, method = "brewer", sort = TRUE)
## S3 method for class 'prob_pps'
print(x, ...)
Arguments
size |
|
n |
|
x |
object of class |
method |
|
sort |
|
... |
additional arguments. |
Details
Function pps_probabilities
computes the first-order sample inclusion
probabilities for a given sample size n
; see e.g., Särndal et
al., 1992 (p. 90). The probabilities (and additional attributes) are
returned as a vector, more precisely as an object of class prob_pps
.
For an object of class prob_pps
(inclusion probabilities and
additional attributes), function pps_draw
draws a pps sample
without replacement and returns the indexes of the population elements.
Only the method of Brewer (1963, 1975) is currently implemented.
Value
Function pps_probabilities
returns the probabilities (an object
of class (prob_pps
).
Function pps_draw
returns a pps sample of indexes from the
population elements.
References
Brewer, K. W. R. (1963). A Model of Systematic Sampling with Unequal Probabilities. Australian Journal of Statistics 5, 93–105. doi:10.1111/j.1467-842X.1963.tb00132.x
Brewer, K. W. R. (1975). A simple procedure for \pi
pswor,
Australian Journal of Statistics 17, 166–172.
doi:10.1111/j.1467-842X.1975.tb00954.x
Särndal, C.-E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Examples
# We are going to pretend that the workplace sample is our population.
head(workplace)
# The population size is N = 142. We want to draw a pps sample (without
# replacement) of size n = 10, where the variable employment is the measure of
# size. The first-order sample inclusion probabilities are calculated as
# follows
p <- pps_probabilities(workplace$employment, n = 10)
# Now, we draw a pps sample using Brewer's method.
pps_draw(p, method = "brewer")
Deprecated Functions in Package 'robsurvey'
Description
These functions are provided for compatibility with older versions of the package only, and may be defunct as soon as the next release.
Use instead:
See Also
Internal Function for the Regression GM-Estimator
Description
Internal function to call the robust survey regression GM-estimator; this function is only intended for internal use. The function does not check or validate the arguments. In particular, missing values in the data may make the function crash.
Usage
robsvyreg(x, y, w, k, psi, type, xwgt, var = NULL, verbose = TRUE, ...)
svyreg_control(tol = 1e-5, maxit = 100, k_Inf = 1e6, init = NULL,
mad_center = TRUE, ...)
Arguments
x |
|
y |
|
w |
|
k |
|
psi |
|
type |
|
xwgt |
|
var |
|
verbose |
|
tol |
|
maxit |
|
k_Inf |
|
init |
either |
mad_center |
|
... |
additional arguments passed to the method
(see |
Details
Not documented
Value
[list]
Weighted Huber and Tukey Mean and Total (M-Estimator) – Robust Horvitz-Thompson Estimator
Description
Weighted Huber and Tukey M-estimator of the population mean and total (robust Horvitz-Thompson estimator)
Usage
svymean_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE,
verbose = TRUE, ...)
svytotal_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE,
verbose = TRUE, ...)
svymean_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...)
svytotal_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...)
Arguments
x |
a one-sided |
design |
an object of class |
k |
|
type |
|
asym |
|
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
- Methods/ types
type = "rht"
ortype = "rwm"
; seeweighted_mean_huber
orweighted_mean_tukey
for more details.- Variance estimation.
Taylor linearization (residual variance estimator).
- Utility functions
- Bare-bone functions
See
weighted_mean_huber
weighted_mean_tukey
,weighted_total_huber
, andweighted_total_tukey
.
Value
Object of class svystat_rob
Failure of convergence
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
References
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
See Also
Overview (of all implemented functions)
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Robust Horvitz-Thompson M-estimator of the population total
svytotal_huber(~employment, dn, k = 9, type = "rht")
# Robust weighted M-estimator of the population mean
m <- svymean_huber(~employment, dn, k = 12, type = "rwm")
# Summary statistic
summary(m)
# Plot of the robustness weights of the M-estimate against its residuals
plot(residuals(m), robweights(m))
# Extract estimate
coef(m)
# Extract estimate of scale
scale(m)
# Extract estimated standard error
SE(m)
Dalen's Estimators of the Population Mean and Total
Description
Dalen's estimators Z2 and Z3 of the population mean and total; see
weighted_mean_dalen
for further details.
Usage
svymean_dalen(x, design, censoring, type = "Z2", na.rm = FALSE,
verbose = TRUE, ...)
svytotal_dalen(x, design, censoring, type = "Z2", na.rm = FALSE,
verbose = TRUE, ...)
Arguments
x |
a one-sided |
design |
an object of class |
censoring |
|
type |
|
na.rm |
|
verbose |
|
... |
additional arguments (currently not used). |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
- Methods/ types
type = "Z2"
ortype = "Z3"
; seeweighted_mean_dalen
for more details.- Utility functions
- Bare-bone functions
See
weighted_mean_dalen
andweighted_total_dalen
.
Value
Object of class svystat_rob
References
Dalén, J. (1987). Practical Estimators of a Population Total Which Reduce the Impact of Large Observations. R & D Report U/STM 1987:32, Statistics Sweden, Stockholm.
See Also
Overview (of all implemented functions)
svymean_trimmed
, svytotal_trimmed
,
svymean_winsorized
, svytotal_winsorized
,
svymean_huber
and svytotal_huber
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Dalen's estimator Z3 of the population total
svytotal_dalen(~employment, dn, censoring = 20000, type = "Z3")
# Dalen's estimator Z3 of the population mean
m <- svymean_dalen(~employment, dn, censoring = 20000, type = "Z3")
# Summarize
summary(m)
# Extract estimate
coef(m)
# Extract estimated standard error
SE(m)
Robust Ratio Predictor of the Mean and Total
Description
Robust ratio predictor (M-estimator) of the population mean and total with Huber and Tukey biweight (bisquare) psi-function.
Usage
svytotal_ratio(object, total, variance = "wu", keep_object = TRUE, ...)
svymean_ratio(object, total, N = NULL, variance = "wu",
keep_object = TRUE, N_unknown = FALSE, ...)
Arguments
object |
an object of class |
total |
|
N |
|
variance |
|
keep_object |
|
N_unknown |
|
... |
additional arguments (currently not used). |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
The (robust) ratio predictor of the population total or mean is computed in two steps.
Step 1: Fit the ratio model associated with the predictor by one of the functions
svyratio_huber
orsvyratio_tukey
. The fitted model is calledobject
.Step 2: Based on the fitted model obtained in the first step, we predict the population total and mean, respectively, by the predictors
svytotal_ratio
andsvymean_ratio
, whereobject
is the fitted ratio model.
- Auxiliary data
-
Two types of auxiliary variables are distinguished: (1) population size
N
and (2) the population total of the auxiliary variable (denominator) used in the ratio model.The option
N_unknown = TRUE
can be used in the predictor of the population mean ifN
is unknown. - Variance estimation
-
Three variance estimators are implemented (argument
variance
):"base"
,"wu"
, and"hajek"
. These estimators correspond to the estimatorsv0
,v1
, andv2
in Wu (1982). - Utility functions
-
The return value is an object of class
svystat_rob
. Thus, the utility functionssummary
,coef
,SE
,vcov
,residuals
,fitted
, androbweights
are available.
Value
Object of class svystat_rob
References
Wu, C.-F. (1982). Estimation of Variance of the Ratio Estimator. Biometrika 69, 183–189.
See Also
Overview (of all implemented functions)
svymean_reg
and svytotal_reg
for (robust) GREG
regression predictors
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
for robust
regression M
- and GM
-estimators
svymean_huber
, svytotal_huber
,
svymean_tukey
and svytotal_tukey
for
M
-estimators
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Robust ratio M-estimator with Huber psi-function
rat <- svyratio_huber(~payroll, ~ employment, dn, k = 5)
# Summary of the ratio estimate
summary(rat)
# Diagnostic plots of the ration/regression M-estimate (e.g.,
# standardized residuals against fitted values)
plot(rat, which = 1L)
# Plot of the robustness weights of the ratio/regression M-estimate
# against its residuals
plot(residuals(rat), robweights(rat))
# Robust ratio predictor of the population mean
m <- svymean_ratio(rat, total = 1001233, N = 90840)
m
# Summary of the ratio estimate of the population mean
summary(m)
# Extract estimate
coef(m)
# Extract estimate of scale
scale(m)
# Extract estimated standard error
SE(m)
Robust Generalized Regression Predictor (GREG) of the Mean and Total
Description
Generalized regression estimator (GREG) predictor of the mean and total,
and robust GREG M
-estimator predictor
Usage
svytotal_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE,
keep_object = TRUE, ...)
svymean_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE,
keep_object = TRUE, N_unknown = FALSE, ...)
Arguments
object |
an object of class |
totals |
|
N |
|
type |
|
k |
|
check.names |
|
keep_object |
|
N_unknown |
|
... |
additional arguments (currently not used). |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
The (robust) GREG predictor of the population total or mean is computed in two steps.
Step 1: Fit the regression model associated with the GREG predictor by one of the functions
svyreg
,svyreg_huberM
,svyreg_huberGM
,svyreg_tukeyM
orsvyreg_tukeyGM
. The fitted model is calledobject
.Step 2: Based on the fitted model obtained in the first step, we predict the population total and mean, respectively, by the predictors
svytotal_reg
andsvymean_reg
, whereobject
is the fitted regression model.
The following GREG predictors are available:
- GREG (not robust,
k = NULL
) -
The following non-robust GREG predictors are available:
-
type = "projective"
ignores the bias correction term of the GREG predictor; see Särndal and Wright (1984). -
type = "ADU"
is the "standard" GREG, which is an asymptotically design unbiased (ADU) predictor; see Särndal et al.(1992, Chapter 6).
If the fitted regression model (
object
) does include a regression intercept, the predictor types"projective"
and"ADU"
are identical because the bias correction of the GREG is zero by design. -
- Robust GREG
-
The following robust GREG predictors are available:
-
type = "huber"
andtype = "tukey"
are, respectively, the robust GREG predictors with Huber and Tukey bisquare (biweight) psi-function. The tuning constant must satisfy0 < k <= Inf
. We can use the Huber-type GREG predictor although the model has been fitted by the regression estimator with Tukey psi-function (and vice versa). -
type = "BR"
is the bias-corrected robust GREG predictor of Beaumont and Rivest (2009), which is inspired by the bias-corrected robust predictor of Chambers (1986). The tuning constant must satisfy0 < k <= Inf
. -
type = "lee"
is the bias-corrected predictor of Lee (1991; 1992). Tthe tuning constantk
must satisfy0 <= k <= 1
. -
type = "duchesne"
is the bias-corrected, calibration-type estimator/ predictor of Duchesne (1999). The tuning constantk
must be specified as a vectork = c(a, b)
, wherea
andb
are the tuning constants of Duchesne's modified Huber psi-function (default values:a = 9
andb = 0.25
).
-
- Auxiliary data
-
Two types of auxiliary variables are distinguished: (1) population size
N
and (2) population totals of the auxiliary variables used in the regression model (i.e., non-constant explanatory variables).The option
N_unknown = TRUE
can be used in the predictor of the population mean ifN
is unknown.The names of the entries of
totals
are checked against the names of the regression fit (object
), unless we specifycheck.names = FALSE
. - Utility functions
-
The return value is an object of class
svystat_rob
. Thus, the utility functionssummary
,coef
,SE
,vcov
,residuals
,fitted
, androbweights
are available.
Value
Object of class svystat_rob
References
Beaumont, J.-F. and Rivest, L.-P. (2009). Dealing with outliers in survey data. In: Sample Surveys: Theory, Methods and Inference ed. by Pfeffermann, D. and Rao, C. R. Volume 29A of Handbook of Statistics, Amsterdam: Elsevier, Chap. 11, 247–280. doi:10.1016/S0169-7161(08)00011-4
Chambers, R. (1986). Outlier Robust Finite Population Estimation. Journal of the American Statistical Association 81, 1063–1069. doi:10.1080/01621459.1986.10478374
Duchesne, P. (1999). Robust calibration estimators, Survey Methodology 25, 43–56.
Gwet, J.-P. and Rivest, L.-P. (1992). Outlier Resistant Alternatives to the Ratio Estimator. Journal of the American Statistical Association 87, 1174–1182. doi:10.1080/01621459.1992.10476275
Lee, H. (1991). Model-Based Estimators That Are Robust to Outliers, in Proceedings of the 1991 Annual Research Conference, Bureau of the Census, 178–202. Washington, DC, Department of Commerce.
Lee, H. (1995). Outliers in business surveys. In: Business survey methods ed. by Cox, B. G., Binder, D. A., Chinnappa, B. N., Christianson, A., Colledge, M. J. and Kott, P. S. New York: John Wiley and Sons, Chap. 26, 503–526. doi:10.1002/9781118150504.ch26
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer.
Särndal, C.-E. and Wright, R. L. (1984). Cosmetic Form of Estimators in Survey Sampling. Scandinavian Journal of Statistics 11, 146–156.
See Also
Overview (of all implemented functions)
svymean_ratio
and svytotal_ratio
for (robust)
ratio predictors
svymean_huber
, svytotal_huber
,
svymean_tukey
and svytotal_tukey
for
M
-estimators
svyreg
, svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
for robust
regression M
- and GM
-estimators
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Robust regression M-estimator with Huber psi-function
reg <- svyreg_huberM(payroll ~ employment, dn, k = 3)
# Summary of the regression M-estimate
summary(reg)
# Diagnostic plots of the regression M-estimate (e.g., standardized
# residuals against fitted values)
plot(reg, which = 1L)
# Plot of the robustness weights of the regression M-estimate against
# its residuals
plot(residuals(reg), robweights(reg))
# ADU (asymptotically design unbiased) estimator
m <- svytotal_reg(reg, totals = 1001233, 90840, type = "ADU")
m
# Robust GREG estimator of the mean; the population means of the auxiliary
# variables are from a register
m <- svymean_reg(reg, totals = 1001233, 90840, type = "huber", k = 20)
m
# Summary of the robust GREG estimate
summary(m)
# Extract estimate
coef(m)
# Extract estimated standard error
SE(m)
# Approximation of the estimated mean square error
mse(m)
Weighted Trimmed Mean and Total
Description
Weighted trimmed population mean and total.
Usage
svymean_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...)
svytotal_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...)
Arguments
x |
a one-sided |
design |
an object of class |
LB |
|
UB |
|
na.rm |
|
... |
additional arguments (currently not used). |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
- Characteristic.
Population mean or total. Let
\mu
denote the estimated trimmed population mean; then, the estimated trimmed total is given by\hat{N} \mu
with\hat{N} =\sum w_i
, where summation is over all observations in the sample.- Trimming.
The methods trims the
LB
~\cdot 100\%
of the smallest observations and the (1 -UB
)~\cdot 100\%
of the largest observations from the data.- Variance estimation.
Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994).
- Utility functions.
- Bare-bone functions.
Value
Object of class svystat_rob
References
Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697
Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 976–967. doi:10.1214/aos/1176325505
See Also
Overview (of all implemented functions)
weighted_mean_trimmed
and weighted_total_trimmed
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Estimated trimmed population total (5% symmetric trimming)
svytotal_trimmed(~employment, dn, LB = 0.05, UB = 0.95)
# Estimated trimmed population mean (5% trimming at the top of the distr.)
svymean_trimmed(~employment, dn, UB = 0.95)
Weighted Winsorized Mean and Total
Description
Weighted winsorized mean and total
Usage
svymean_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE,
trim_var = FALSE, ...)
svymean_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)
svytotal_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE,
trim_var = FALSE, ...)
svytotal_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)
Arguments
x |
a one-sided |
design |
an object of class |
LB |
|
UB |
|
na.rm |
|
trim_var |
|
k |
|
... |
additional arguments (currently not used). |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
- Characteristic.
Population mean or total. Let
\mu
denote the estimated winsorized population mean; then, the estimated winsorized total is given by\hat{N} \mu
with\hat{N} =\sum w_i
, where summation is over all observations in the sample.- Modes of winsorization.
The amount of winsorization can be specified in relative or absolute terms:
-
Relative: By specifying
LB
andUB
, the method winsorizes theLB
~\cdot 100\%
of the smallest observations and the (1 -UB
)~\cdot 100\%
of the largest observations from the data. -
Absolute: By specifying argument
k
in the functions with the "infix"_k_
in their name (e.g.,svymean_k_winsorized
), the largestk
observations are winsorized,0<k<n
, wheren
denotes the sample size. E.g.,k = 2
implies that the largest and the second largest observation are winsorized.
-
- Variance estimation.
Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994). Two estimators are available:
simple_var = FALSE
Variance estimator of the winsorized mean/ total. The estimator depends on the estimated probability density function evaluated at the winsorization thresholds, which can be – depending on the context – numerically unstable. As a remedy, a simplified variance estimator is available by setting
simple_var = TRUE
.simple_var = TRUE
Variance is approximated using the variance estimator of the trimmed mean/ total.
- Utility functions.
- Bare-bone functions.
See:
Value
Object of class svystat_rob
References
Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697
Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 976–967. doi:10.1214/aos/1176325505
See Also
Overview (of all implemented functions)
weighted_mean_winsorized
,
weighted_mean_k_winsorized
,
weighted_total_winsorized
and
weighted_total_k_winsorized
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Estimated winsorized population mean (5% symmetric winsorization)
svymean_winsorized(~employment, dn, LB = 0.05)
# Estimated one-sided k winsorized population total (2 observations are
# winsorized at the top of the distribution)
svytotal_k_winsorized(~employment, dn, k = 2)
Robust Survey Ratio M-Estimator
Description
svyratio_huber
and svyratio_tukey
compute the robust
M
-estimator of the ratio of two variables with, respectively,
Huber and Tukey biweight (bisquare) psi-function.
Usage
svyratio_huber(numerator, denominator, design, k, var = denominator,
na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)
svyratio_tukey(numerator, denominator, design, k, var = denominator,
na.rm = FALSE, verbose = TRUE, ...)
Arguments
numerator |
a one-sided |
denominator |
a one-sided |
design |
an object of class |
k |
|
var |
a |
na.rm |
|
asym |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
The functions svyratio_huber
and svyratio_tukey
are
implemented as wrapper functions of the regression estimators
svyreg_huberM
and svyreg_tukeyM
. See
the help files of these functions (e.g., on how additional
parameters can be passed via ...
or on the usage of the
var
argument).
Value
Object of class svyreg.rob
and ratio
See Also
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
for robust
regression estimators
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Compute regression M-estimate with Huber psi-function
m <- svyratio_huber(~payroll, ~employment, dn, k = 8)
# Regression inference
summary(m)
# Extract the coefficients
coef(m)
# Extract estimated standard error
SE(m)
# Extract variance/ covariance matrix
vcov(m)
# Diagnostic plots (e.g., standardized residuals against fitted values)
plot(m, which = 1L)
# Plot of the robustness weights of the M-estimate against its residuals
plot(residuals(m), robweights(m))
Survey Regression Estimator – Weighted Least Squares
Description
Weighted least squares estimator of regression
Usage
svyreg(formula, design, var = NULL, na.rm = FALSE, ...)
Arguments
formula |
a |
design |
an object of class |
var |
a one-sided |
na.rm |
|
... |
additional arguments (currently not used). |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
svyreg
computes the regression coefficients by weighted least
squares.
Models for svyreg_rob
are specified symbolically. A typical
model has the form response ~ terms
where response
is
the (numeric) response vector and terms
is a series of terms
which specifies a linear predictor for response
; see
formula
and lm
.
A formula has an implied intercept term. To remove this use either
y ~ x - 1
or y ~ 0 + x
; see formula
for more
details of allowed formulae.
Value
Object of class svyreg_rob
.
See Also
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods
Robust estimating methods svyreg_huberM
,
svyreg_huberGM
, svyreg_tukeyM
and
svyreg_tukeyGM
.
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Compute the regression estimate (weighted least squares)
m <- svyreg(payroll ~ employment, dn)
# Regression inference
summary(m)
# Extract the coefficients
coef(m)
# Extract variance/ covariance matrix
vcov(m)
# Diagnostic plots (e.g., Normal Q-Q-plot)
plot(m, which = 2L)
Deprecated Huber Robust Survey Regression M-Estimator
Description
The function svyreg_huber
is deprecated; use instead
svyreg_huberM
.
Usage
svyreg_huber(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE,
verbose = TRUE, ...)
Arguments
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
asym |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Details
See svyreg_huberM
.
Value
Object of class svyreg.rob
See Also
Huber Robust Survey Regression M- and GM-Estimator
Description
svyreg_huberM
and svyreg_huberGM
compute, respectively,
a survey weighted M- and GM-estimator of regression using
the Huber psi-function.
Usage
svyreg_huberM(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE,
verbose = TRUE, ...)
svyreg_huberGM(formula, design, k, type = c("Mallows", "Schweppe"),
xwgt, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE,
...)
Arguments
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
asym |
|
verbose |
|
type |
|
xwgt |
|
... |
additional arguments passed to the method (e.g., |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
svyreg_huberM
and svyreg_huberGM
compute, respectively,
M- and GM-estimates of regression by iteratively re-weighted
least squares (IRWLS). The estimate of regression scale is (by default)
computed as the (normalized) weighted median of absolute deviations
from the weighted median (MAD; see weighted_mad
) for
each IRWLS iteration. If the weighted MAD is zero (or nearly so),
the scale is computed as the (normalized) weighted interquartile
range (IQR).
- M-estimator
-
The regression M-estimator
svyreg_huberM
is robust against residual outliers (granted that the tuning constantk
is chosen appropriately). - GM-estimator
-
Function
svyreg_huberGM
implements the Mallows and Schweppe regression GM-estimator (see argumenttype
). The regression GM-estimators are robust against residual outliers and outliers in the model's design space (leverage observations; see argumentxwgt
). - Numerical optimization
-
See
svyreg_control
. - Models
-
Models for
svyreg_rob
are specified symbolically. A typical model has the formresponse ~ terms
, whereresponse
is the (numeric) response vector andterms
is a series of terms which specifies a linear predictor forresponse
; seeformula
andlm
.A formula has an implied intercept term. To remove this use either
y ~ x - 1
ory ~ 0 + x
; seeformula
for more details of allowed formulae.
Value
Object of class svyreg.rob
Failure of convergence
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
See Also
Overview (of all implemented functions)
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods
Other robust estimating methods svyreg_tukeyM
and
svyreg_tukeyGM
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Compute regression M-estimate with Huber psi-function
m <- svyreg_huberM(payroll ~ employment, dn, k = 8)
# Regression inference
summary(m)
# Extract the coefficients
coef(m)
# Extract variance/ covariance matrix
vcov(m)
# Diagnostic plots (e.g., standardized residuals against fitted values)
plot(m, which = 1L)
# Plot of the robustness weights of the M-estimate against its residuals
plot(residuals(m), robweights(m))
Deprecated Tukey Biweight Robust Survey Regression M-Estimator
Description
The function svyreg_tukey
is deprecated; use instead
svyreg_tukeyM
.
Usage
svyreg_tukey(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE,
...)
Arguments
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Details
See svyreg_tukeyM
.
Value
Object of class svyreg.rob
See Also
Tukey Biweight Robust Survey Regression M- and GM-Estimator
Description
svyreg_tukeyM
and svyreg_tukeyGM
compute, respectively,
a survey weighted M- and GM-estimator of regression using
the biweight Tukey psi-function.
Usage
svyreg_tukeyM(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE,
...)
svyreg_tukeyGM(formula, design, k, type = c("Mallows", "Schweppe"),
xwgt, var = NULL, na.rm = FALSE, verbose = TRUE, ...)
Arguments
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
verbose |
|
type |
|
xwgt |
|
... |
additional arguments passed to the method (e.g., |
Details
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
svyreg_tukeyM
and svyreg_tukeyGM
compute, respectively,
M- and GM-estimates of regression by iteratively re-weighted least
squares (IRWLS). The estimate of regression scale is (by default)
computed as the (normalized) weighted median of absolute deviations
from the weighted median (MAD; see weighted_mad
) for
each IRWLS iteration. If the weighted MAD is zero (or nearly so),
the scale is computed as the (normalized) weighted
interquartile range (IQR).
- M-estimator
-
The regression M-estimator
svyreg_tukeyM
is robust against residual outliers (granted that the tuning constantk
is chosen appropriately). - GM-estimator
-
Function
svyreg_huberGM
implements the Mallows and Schweppe regression GM-estimator (see argumenttype
). The regression GM-estimators are robust against residual outliers and outliers in the model's design space (leverage observations; see argumentxwgt
). - Numerical optimization
-
See
svyreg_control
. - Models
-
Models for
svyreg_rob
are specified symbolically. A typical model has the formresponse ~ terms
, whereresponse
is the (numeric) response vector andterms
is a series of terms which specifies a linear predictor forresponse
; seeformula
andlm
.A formula has an implied intercept term. To remove this use either
y ~ x - 1
ory ~ 0 + x
; seeformula
for more details of allowed formulae.
Value
Object of class svyreg.rob
Failure of convergence
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
See Also
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods.
Other robust estimating methods svyreg_huberM
and
svyreg_huberGM
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
# Compute regression M-estimate with Tukey bisquare psi-function
m <- svyreg_tukeyM(payroll ~ employment, dn, k = 8)
# Regression inference
summary(m)
# Extract the coefficients
coef(m)
# Extract variance/ covariance matrix
vcov(m)
# Diagnostic plots (e.g., standardized residuals against fitted values)
plot(m, which = 1L)
# Plot of the robustness weights of the M-estimate against its residuals
plot(residuals(m), robweights(m))
Weighted Five-Number Summary of a Variable
Description
Weighted five-number summary used for survey.design
and
survey.design2
objects (similar to base::summary
for [numeric vectors]
).
Usage
svysummary(object, design, na.rm = FALSE, ...)
Arguments
object |
one-sided |
design |
an object of class |
na.rm |
|
... |
additional arguments. |
Value
A weighted five-number summary (numeric variable) or a frequency table (factor variable).
Examples
head(workplace)
library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}
svysummary(~payroll, dn)
Weighted Huber and Tukey Mean and Total (bare-bone functions)
Description
Weighted Huber and Tukey M-estimator of the mean and total
(bare-bone function with limited functionality; see
svymean_huber
, svymean_tukey
,
svytotal_huber
, and svytotal_tukey
for more
capable methods)
Usage
weighted_mean_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE,
na.rm = FALSE, verbose = TRUE, ...)
weighted_total_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE,
na.rm = FALSE, verbose = TRUE, ...)
weighted_mean_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE,
verbose = TRUE, ...)
weighted_total_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE,
verbose = TRUE, ...)
Arguments
x |
|
w |
|
k |
|
type |
|
asym |
|
info |
|
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g.,
|
Details
- Characteristic.
Population mean or total. Let
\mu
denote the estimated population mean; then, the estimated total is given by\hat{N} \mu
with\hat{N} =\sum w_i
, where summation is over all observations in the sample.- Type.
Two methods/types are available for estimating the location
\mu
:type = "rwm" (default)
:robust weighted M-estimator of the population mean and total, respectively. This estimator is recommended for sampling designs whose inclusion probabilities are not proportional to some measure of size. [Legacy note: In an earlier version, the method
type = "rwm"
was called"rhj"
; the type"rhj"
is now silently converted to"rwm"
]type = "rht"
:robust Horvitz-Thompson M-estimator of the population mean and total, respectively. This estimator is recommended for proportional-to-size sampling designs.
- Variance estimation.
See the related but more capable functions:
- Psi-function.
By default, the
Huber
orTukey
psi-function are used in the specification of the M-estimators. For the Huber estimator, an asymmetric version of the Huber psi-function can be used by setting the argumentasym = TRUE
in the function call.
Value
The return value depends on info
:
info = FALSE
:estimate of mean or total
[double]
info = TRUE
:a
[list]
with items:-
characteristic
[character]
, -
estimator
[character]
, -
estimate
[double]
, -
variance
(default:NA
), -
robust
[list]
, -
residuals
[numeric vector]
, -
model
[list]
, -
design
(default:NA
), -
[call]
-
Failure of convergence
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
References
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
See Also
Overview (of all implemented functions)
Examples
head(workplace)
# Robust Horvitz-Thompson M-estimator of the population total
weighted_total_huber(workplace$employment, workplace$weight, k = 9,
type = "rht")
# Robust weighted M-estimator of the population mean
weighted_mean_huber(workplace$employment, workplace$weight, k = 12,
type = "rwm")
Weighted Interquartile Range (IQR)
Description
Weighted (normalized) interquartile range
Usage
weighted_IQR(x, w, na.rm = FALSE, constant = 0.7413)
Arguments
x |
|
w |
|
na.rm |
|
constant |
|
Details
By default, the weighted IQR is normalized to be an unbiased estimate of
scale at the Gaussian core model. If normalization is not wanted, put
constant = 1
.
Value
Weighted IQR
See Also
Overview (of all implemented functions)
Examples
head(workplace)
# normalized weighted IQR (default)
weighted_IQR(workplace$employment, workplace$weight)
# weighted IQR (without normalization)
weighted_IQR(workplace$employment, workplace$weight, constant = 1)
Weighted Robust Line Fitting
Description
weighted_line
fits a robust line and allows weights.
Usage
weighted_line(x, y = NULL, w, na.rm = FALSE, iter = 1)
## S3 method for class 'medline'
print(x, ...)
## S3 method for class 'medline'
coef(object, ...)
## S3 method for class 'medline'
residuals(object, ...)
## S3 method for class 'medline'
fitted(object, ...)
Arguments
x |
|
y |
|
w |
|
na.rm |
|
iter |
|
object |
object of class |
... |
additional arguments passed to the method. |
Details
weighted_line
uses different quantiles for splitting the sample
than stats::line()
.
Value
intercept and slope of the fitted line
See Also
Overview (of all implemented functions)
Examples
head(cars)
# compute weighted line
weighted_line(cars$speed, cars$dist, w = rep(1, length(cars$speed)))
m <- weighted_line(cars$speed, cars$dist, w = rep(1:10, each = 5))
m
coef(m)
residuals(m)
fitted(m)
Weighted Median Absolute Deviation from the Median (MAD)
Description
Weighted median of the absolute deviations from the weighted median
Usage
weighted_mad(x, w, na.rm = FALSE, constant = 1.482602)
Arguments
x |
|
w |
|
na.rm |
|
constant |
|
Details
The weighted MAD is computed as the (normalized) weighted median of the
absolute deviation from the weighted median; see
weighted_median
. The weighted MAD is normalized to be
an unbiased estimate of scale at the Gaussian core model. If
normalization is not wanted, put constant = 1
.
Value
Weighted median absolute deviation from the (weighted) median
See Also
Overview (of all implemented functions)
Examples
head(workplace)
# normalized weighted MAD (default)
weighted_mad(workplace$employment, workplace$weight)
# weighted MAD (without normalization)
weighted_mad(workplace$employment, workplace$weight, constant = 1)
Weighted Total and Mean (Horvitz-Thompson and Hajek Estimators)
Description
Weighted total and mean (Horvitz-Thompson and Hajek estimators)
Usage
weighted_mean(x, w, na.rm = FALSE)
weighted_total(x, w, na.rm = FALSE)
Arguments
x |
|
w |
|
na.rm |
|
Details
weighted_total
and weighted_mean
compute, respectively,
the Horvitz-Thompson estimator of the population total and the Hajek
estimator of the population mean.
Value
Estimated population mean or total
See Also
Overview (of all implemented functions)
Examples
head(workplace)
# Horvitz-Thompson estimator of the total
weighted_total(workplace$employment, workplace$weight)
# Hajek estimator of the mean
weighted_mean(workplace$employment, workplace$weight)
Dalen Estimators of the Mean and Total
Description
Dalén's estimators of the population mean and the population total (bare-bone functions with limited functionality).
Usage
weighted_mean_dalen(x, w, censoring, type = "Z2", info = FALSE,
na.rm = FALSE, verbose = TRUE)
weighted_total_dalen(x, w, censoring, type = "Z2", info = FALSE,
na.rm = FALSE, verbose = TRUE)
Arguments
x |
|
w |
|
censoring |
|
type |
|
info |
|
na.rm |
|
verbose |
|
Details
Let \sum_{i \in s} w_ix_i
denote the expansion
estimator of the x
-total (summation is over all elements i
in sample s
). The estimators Z2 and Z3 of Dalén (1987) are
defined as follows.
- Estimator Z2
-
The estimator Z2 of the population total sums over
\min(c, w_ix_i)
; hence, it censors the productsw_ix_i
to the censoring constantc
(censoring
). The estimator of the populationx
-mean is is defined as the total divided by the population size. - Estimator Z3
-
The estimator Z3 of the population total is defined as the sum over the elements
z_i
, which is equal toz_i = w_ix_i
ifw_iy_i \leq c
andz_i = c + (y_i - c/w_i)
otherwise.
Value
The return value depends on info
:
info = FALSE
:estimate of mean or total
[double]
info = TRUE
:a
[list]
with items:-
characteristic
[character]
, -
estimator
[character]
, -
estimate
[double]
, -
variance
(default:NA
), -
robust
[list]
, -
residuals
[numeric vector]
, -
model
[list]
, -
design
(default:NA
), -
[call]
-
References
Dalén, J. (1987). Practical Estimators of a Population Total Which Reduce the Impact of Large Observations. R & D Report U/STM 1987:32, Statistics Sweden, Stockholm.
See Also
Overview (of all implemented functions)
Examples
head(workplace)
# Dalen's estimator of the total (with censoring threshold: 100000)
weighted_total_dalen(workplace$employment, workplace$weight, 100000)
Weighted Trimmed Mean and Total (bare-bone functions)
Description
Weighted trimmed mean and total (bare-bone functions with limited
functionality; see svymean_trimmed
and
svytotal_trimmed
for more capable methods)
Usage
weighted_mean_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE,
na.rm = FALSE)
weighted_total_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE,
na.rm = FALSE)
Arguments
x |
|
w |
|
LB |
|
UB |
|
info |
|
na.rm |
|
Details
- Characteristic.
Population mean or total. Let
\mu
denote the estimated trimmed population mean; then, the estimated trimmed population total is given by\hat{N} \mu
with\hat{N} =\sum w_i
, where summation is over all observations in the sample.- Trimming.
The methods trims the
LB
~\cdot 100\%
of the smallest observations and the (1 -UB
)~\cdot 100\%
of the largest observations from the data.- Variance estimation.
See survey methods:
Value
The return value depends on info
:
info = FALSE
:estimate of mean or total
[double]
info = TRUE
:a
[list]
with items:-
characteristic
[character]
, -
estimator
[character]
, -
estimate
[double]
, -
variance
(default:NA
), -
robust
[list]
, -
residuals
[numeric vector]
, -
model
[list]
, -
design
(default:NA
), -
[call]
-
See Also
Overview (of all implemented functions)
svymean_trimmed
and svytotal_trimmed
Examples
head(workplace)
# Estimated trimmed population total (5% symmetric trimming)
weighted_total_trimmed(workplace$employment, workplace$weight, LB = 0.05,
UB = 0.95)
# Estimated trimmed population mean (5% trimming at the top of the distr.)
weighted_mean_trimmed(workplace$employment, workplace$weight, UB = 0.95)
Weighted Winsorized Mean and Total (bare-bone functions)
Description
Weighted winsorized mean and total (bare-bone functions with limited
functionality; see svymean_winsorized
and
svytotal_winsorized
for more capable methods)
Usage
weighted_mean_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE,
na.rm = FALSE)
weighted_mean_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE)
weighted_total_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE,
na.rm = FALSE)
weighted_total_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE)
Arguments
x |
|
w |
|
LB |
|
UB |
|
info |
|
na.rm |
|
k |
|
Details
- Characteristic.
Population mean or total. Let
\mu
denote the estimated winsorized population mean; then, the estimated population total is given by\hat{N} \mu
with\hat{N} =\sum w_i
, where summation is over all observations in the sample.- Modes of winsorization.
The amount of winsorization can be specified in relative or absolute terms:
-
Relative: By specifying
LB
andUB
, the methods winsorizes theLB
~\cdot 100\%
of the smallest observations and the (1 -UB
)~\cdot 100\%
of the largest observations from the data. -
Absolute: By specifying argument
k
in the functions with the "infix"_k_
in their name, the largestk
observations are winsorized,0<k<n
, wheren
denotes the sample size. E.g.,k = 2
implies that the largest and the second largest observation are winsorized.
-
- Variance estimation.
See survey methods:
Value
The return value depends on info
:
info = FALSE
:estimate of mean or total
[double]
info = TRUE
:a
[list]
with items:-
characteristic
[character]
, -
estimator
[character]
, -
estimate
[double]
, -
variance
(default:NA
), -
robust
[list]
, -
residuals
[numeric vector]
, -
model
[list]
, -
design
(default:NA
), -
[call]
-
See Also
Overview (of all implemented functions)
svymean_winsorized
, svymean_k_winsorized
,
svytotal_winsorized
and svytotal_k_winsorized
Examples
head(workplace)
# Estimated winsorized population mean (5% symmetric winsorization)
weighted_mean_winsorized(workplace$employment, workplace$weight, LB = 0.05)
# Estimated one-sided k winsorized population total (2 observations are
# winsorized at the top of the distribution)
weighted_total_k_winsorized(workplace$employment, workplace$weight, k = 2)
Weighted Median
Description
Weighted population median.
Usage
weighted_median(x, w, na.rm = FALSE)
Arguments
x |
|
w |
|
na.rm |
|
Details
Weighted sample median; see weighted_quantile
for more
information.
Value
Weighted estimate of the population median
See Also
Overview (of all implemented functions)
Examples
head(workplace)
weighted_median(workplace$employment, workplace$weight)
Robust Simple Linear Regression Based on Medians
Description
Robust simple linear regression based on medians: two methods are available:
"slopes"
and "product"
.
Usage
weighted_median_line(x, y = NULL, w, type = "slopes", na.rm = FALSE)
Arguments
x |
|
y |
|
w |
|
type |
|
na.rm |
|
Details
- Overview.
Robust simple linear regression based on medians
- Type.
Two methods/ types are available. Let
m(x,w)
denote the weighted median of variablex
with weightsw
:type = "slopes"
:The slope is computed as
b1 = m\left( \frac{y - m(y,w)}{x - m(x,w)}, w\right).
type = "products"
:The slope is computed as
b1 = \frac{m\big([y - m(y,w)][x - m(x,w)], w\big)} {m\big([x - m(x,w)]^2, w\big)}.
Value
A vector with two components: intercept and slope
See Also
Overview (of all implemented functions)
line
, weighted_line
and
weighted_median_ratio
Examples
x <- c(1, 2, 4, 5)
y <- c(3, 2, 7, 4)
weighted_line(y ~ x, w = rep(1, length(x)))
weighted_median_line(y ~ x, w = rep(1, length(x)))
m <- weighted_median_line(y ~ x, w = rep(1, length(x)), type = "prod")
m
coef(m)
fitted(m)
residuals(m)
# cars data
head(cars)
with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist))))
with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)),
type = "prod"))
# weighted
w <- c(rep(1,20), rep(2,20), rep(5, 10))
with(cars, weighted_median_line(dist ~ speed, w = w))
with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod"))
# outlier in y
cars$dist[49] <- 360
with(cars, weighted_median_line(dist ~ speed, w = w))
with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod"))
# outlier in x
data(cars)
cars$speed[49] <- 72
with(cars, weighted_median_line(dist ~ speed, w = w))
with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod"))
Weighted Robust Ratio Estimator Based on Median
Description
A weighted median of the ratios y/x determines the slope of a regression through the origin.
Usage
weighted_median_ratio(x, y = NULL, w, na.rm = FALSE)
Arguments
x |
|
y |
|
w |
|
na.rm |
|
Value
A vector with two components: intercept and slope
See Also
Overview (of all implemented functions)
line
, weighted_line
and
weighted_median_line
Examples
x <- c(1,2,4,5)
y <- c(1,0,5,2)
m <- weighted_median_ratio(y ~ x, w = rep(1, length(y)))
m
coef(m)
fitted(m)
residuals(m)
Weighted Quantile
Description
Weighted population quantile.
Usage
weighted_quantile(x, w, probs, na.rm = FALSE)
Arguments
x |
|
w |
|
probs |
|
na.rm |
|
Details
- Overview.
weighted_quantile
computes the weighted sample quantiles; argumentprobs
allows vector inputs.- Implementation.
The function is based on a weighted version of the quickselect/Find algorithm with the Bentley and McIlroy (1993) 3-way partitioning scheme. For very small arrays, we use insertion sort.
- Compatibility.
For equal weighting, i.e., when all elements in
w
are equal,weighted_quantile
is identical withtype = 2
ofstats::quantile
; see also Hyndman and Fan (1996).
Value
Weighted estimate of the population quantiles
References
Bentley, J. L. and McIlroy, D. M. (1993). Engineering a Sort Function, Software - Practice and Experience 23, 1249–1265. doi:10.1002/spe.4380231105
Hyndman, R.J. and Fan, Y. (1996). Sample Quantiles in Statistical Packages, The American Statistician 50, 361–365. doi:10.1080/00031305.1996.10473566
See Also
Overview (of all implemented functions)
Examples
head(workplace)
# Weighted 25% quantile (1st quartile)
weighted_quantile(workplace$employment, workplace$weight, 0.25)
Weight Functions (for the M- and GM-Estimators)
Description
Weight functions associated with the Huber and the Tukey biweight psi-functions; and the weight function of Simpson et al. (1992) for GM-estimators.
Usage
huberWgt(x, k = 1.345)
tukeyWgt(x, k = 4.685)
simpsonWgt(x, a, b)
Arguments
x |
|
k |
|
a |
|
b |
|
Details
The functions huberWgt
and tukeyWgt
return the weights
associated with the respective psi-function.
The function simpsonWgt
is used (in regression GM-estimators)
to downweight leverage observations (i.e., outliers in the model's design
space). Let d_i
denote the (robust) squared Mahalanobis
distance of the i-th observation. The Simpson et al. (1992) type of
weight is defined as
\min \{1, (b/d_i)^{a/2}\}
, where
a
and b
are tuning constants.
By default,
a = 1
; this choice implies that the weights are computed on the basis of the robust Mahalanobis distances. Alternative:a = Inf
implies a weight of zero for all observations whose (robust) squared Mahalanobis is larger thanb
.The tuning constants
b
is a threshold on the distances.
Value
Numerical vector of weights
References
Simpson, D. G., Ruppert, D. and Carroll, R.J. (1992). On One-Step GM Estimates and Stability of Inferences in Linear Regression. Journal of the American Statistical Association 87, 439–450. doi:10.2307/2290275
See Also
Overview (of all implemented functions)
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
Examples
head(flour)
# standardized distance from median (copper content in wholemeal flour)
x <- flour$copper
z <- abs(x - median(x)) / mad(x)
# plot of weight functions vs. distance
plot(z, huberWgt(z, k = 3), ylim = c(0, 1), xlab = "distance",
ylab = "weight")
points(z, tukeyWgt(z, k = 6), pch = 2, col = 2)
points(z, simpsonWgt(z, a = Inf, b = 3), pch = 3, col = 4)
legend("topright", c("huberWgt(k = 3)", "tukeyWgt(k = 6)",
"simpsonWgt(a = Inf, b = 3)"), pch = 1:3, col = c(1, 2, 4))
(Modified) Canadian Workplace and Employee Survey
Description
The workplace
data are from Fuller (2009, pp. 366–367).
Usage
data(workplace)
Format
A data.frame
with a sample of 142 workplaces on the following
variables
ID
identifier variable
[integer]
.weight
sampling weight
[double]
.employment
employment total
[double]
.payroll
payroll total (1000 dollars)
[double]
.strat
stratum identifier
[integer]
.fpc
finite population correction
[integer]
.
Details
The workplace
data represent a sample of workplaces in the
retail sector in a Canadian province. The data are not those
collected by Statistics Canada, but have been generated by Fuller
(2009, Example 3.1.1) to display similar characteristics to the
original 1999 Canadian Workplace and Employee Survey (WES).
Sampling design of the 1999 WES
The WES target population is defined as all workplaces operating in Canada with paid employees. The sampling frame is stratified by industry, geographic region, and size (size is defined using estimated employment). A sample of workplaces has been drawn independently in each stratum using simple random sample without replacement (the stratum-specific sample sizes are determined by Neyman allocation). Several strata containing very large workplaces were sampled exhaustively; see Patak et al (1998). The original sampling weights were adjusted for nonresponse.
Remarks by Fuller (2009, p. 365)
The original weights of WES were about 2200 for the stratum of small workplaces, about 750 for medium-sized, and about 35 for large workspaces.
Source
The data workplace
is from Table 6.3 in Fuller (2009, pp. 366–367).
References
Fuller, W. A. (2009). Sampling Statistics, Hoboken (NJ): John Wiley and Sons. doi:10.1002/9780470523551
Patak, Z., Hidiroglou, M. and Lavallée, P. (1998). The methodology of the Workplace and Employee Survey. Proceedings of the Survey Research Methods Section, American Statistical Association, 83–91.
Examples
head(workplace)
library("survey")
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
# survey design with pre-calibrated weights
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
} else {
# legacy mode
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace)
}