Title: | Fused Extended Two-Way Fixed Effects |
Version: | 1.5.0 |
Maintainer: | Gregory Faletto <gfaletto@gmail.com> |
Depends: | R (≥ 4.1.0) |
Description: | Calculates the fused extended two-way fixed effects (FETWFE) estimator for unbiased and efficient estimation of difference-in-differences in panel data with staggered treatment adoption. This estimator eliminates bias inherent in conventional two-way fixed effects estimators, while also employing a novel bridge regression regularization approach to improve efficiency and yield valid standard errors. Also implements extended TWFE (etwfe) and bridge-penalized ETWFE (betwfe). Provides S3 classes for streamlined workflow and supports flexible tuning (ridge and rank-condition guarantees), automatic covariate centering/scaling, and detailed overall and cohort-specific effect estimates with valid standard errors. Includes simulation and formatting utilities, extensive diagnostic tools, vignettes, and examples. See Faletto (2025) (<doi:10.48550/arXiv.2312.05985>). |
URL: | https://github.com/gregfaletto/fetwfePackage |
BugReports: | https://github.com/gregfaletto/fetwfePackage/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | expm, glmnet, grpreg, Matrix (≥ 1.6-0) |
Suggests: | bacondecomp, knitr, rmarkdown, dplyr, did |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-07-02 06:27:29 UTC; gregfaletto |
Author: | Gregory Faletto |
Repository: | CRAN |
Date/Publication: | 2025-07-02 16:20:12 UTC |
Convert data formatted for att_gt() to a dataframe suitable for fetwfe() / etwfe()
Description
attgtToFetwfeDf() reshapes and renames a panel dataset that is already formatted for did::att_gt() (Callaway and Sant'Anna 2021) so that it can be passed directly to fetwfe() or etwfe() from the fetwfe package. In particular, it
- creates an absorbing-state treatment dummy that equals 1 from the first treated period onward and 0 otherwise,
- (optionally) drops units that are already treated in the very first period of the sample (because fetwfe() removes them internally), and
- returns a tidy dataframe whose column names match the arguments that fetwfe() / etwfe() expect.
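The three steps above can be sketched in base R. This is a minimal illustration on a toy panel, not the package's implementation: the helper `to_fetwfe_sketch()` and the toy column names (`id`, `year`, `g`, `y`) are hypothetical, and only the default output names from the Usage section are taken from this page.

```r
# Toy long-format panel in the style expected by did::att_gt():
# `g` holds the first treated period (0 = never treated).
panel <- data.frame(
  id = rep(c("a", "b", "c"), each = 4),
  year = rep(2001:2004, times = 3),
  g = rep(c(0, 2003, 2001), each = 4),
  y = c(1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.6, 2.7, 3.0, 3.1, 3.2, 3.3)
)

# Hypothetical helper (not part of the package) mirroring the steps above.
to_fetwfe_sketch <- function(df, drop_first_period_treated = TRUE) {
  first_period <- min(df$year)
  if (drop_first_period_treated) {
    # Units whose first treated period is the first sample period
    # are already treated when the panel starts.
    df <- df[!(df$g > 0 & df$g <= first_period), , drop = FALSE]
  }
  data.frame(
    time_var = as.integer(df$year),
    unit_var = as.character(df$id),
    # 1 from the first treated period onward, 0 otherwise
    # (never-treated units have g == 0 and stay at 0).
    treatment = as.integer(df$g > 0 & df$year >= df$g),
    response = as.numeric(df$y),
    stringsAsFactors = FALSE
  )
}

tidy_sketch <- to_fetwfe_sketch(panel)
tidy_sketch
```

In this toy panel, unit "c" is treated from the first period and is dropped, unit "a" is never treated, and unit "b" switches to 1 in 2003 and stays there.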
Usage
attgtToFetwfeDf(
data,
yname,
tname,
idname,
gname,
covars = character(0),
drop_first_period_treated = TRUE,
out_names = list(time = "time_var", unit = "unit_var", treatment = "treatment",
response = "response")
)
Arguments
data |
A |
yname |
Character scalar. Name of the outcome column. |
tname |
Character scalar. Name of the time variable (numeric or
integer). This becomes |
idname |
Character scalar. Name of the unit identifier. Converted to
character and returned as |
gname |
Character scalar. Name of the group variable holding the first period of treatment. Values must be 0 for never‑treated, or a positive integer representing the first treated period. |
covars |
Character vector of additional covariate column names to carry
through (default |
drop_first_period_treated |
Logical. If |
out_names |
A named list giving the column names to use in the
resulting dataframe. Defaults are |
Value
A data.frame with columns time, unit, treatment, y, and any covariates requested in covars, ready to be fed to fetwfe() / etwfe(). All required columns are of the correct type: time is integer, unit is character, treatment is integer 0/1, and y is numeric.
References
Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. doi:10.1016/j.jeconom.2020.12.001, https://arxiv.org/abs/1803.09015.
Examples
## toy example ---------------------------------------------------------------
## Not run:
library(did) # provides the mpdta example dataframe
data(mpdta)
head(mpdta)
tidy_df <- attgtToFetwfeDf(
data = mpdta,
yname = "lemp",
tname = "year",
idname = "countyreal",
gname = "first.treat",
covars = c("lpop"))
head(tidy_df)
## End(Not run)
## Now you can call fetwfe() ------------------------------------------------
# res <- fetwfe(
# pdata = tidy_df,
# time_var = "time_var",
# unit_var = "unit_var",
# treatment = "treatment",
# response = "response",
# covs = c("lpop"))
Bridge-penalized extended two-way fixed effects
Description
Implementation of extended two-way fixed effects with a bridge penalty. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units).
Usage
betwfe(
pdata,
time_var,
unit_var,
treatment,
response,
covs = c(),
indep_counts = NA,
sig_eps_sq = NA,
sig_eps_c_sq = NA,
lambda.max = NA,
lambda.min = NA,
nlambda = 100,
q = 0.5,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended encodings for dates include format YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
unit |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Character; a vector containing the names of the columns for covariates. All of these columns are expected to contain integer, numeric, or factor values, and any categorical values will be automatically encoded as binary indicators. If no covariates are provided, the treatment effect estimation will proceed, but it will only be valid under unconditional versions of the parallel trends and no anticipation assumptions. Default is c(). |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID noise assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID noise (random effects) assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
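Before calling the estimator, it can help to confirm that the treatment column really is an absorbing state. The check below is a sketch in base R; `is_absorbing()` is a hypothetical helper, not a function exported by the package, and the toy column names merely echo those used in the package examples.

```r
# Returns TRUE if, within every unit, the 0/1 treatment column never
# switches back from 1 to 0 once observations are ordered by time.
is_absorbing <- function(pdata, time_var, unit_var, treatment) {
  ord <- order(pdata[[unit_var]], pdata[[time_var]])
  sorted <- pdata[ord, , drop = FALSE]
  by_unit <- split(sorted[[treatment]], sorted[[unit_var]])
  all(vapply(by_unit, function(d) all(diff(d) >= 0), logical(1)))
}

good <- data.frame(
  year = rep(2001:2003, times = 2),
  st = rep(c("x", "y"), each = 3),
  changed = c(0L, 1L, 1L, 0L, 0L, 0L)
)
bad <- good
bad$changed[3] <- 0L  # unit "x" reverts to untreated in 2003

is_absorbing(good, "year", "st", "changed")
is_absorbing(bad, "year", "st", "changed")
```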
Value
A named list with the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names,
average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The size of the selected model corresponding
|
lambda.min |
Either the provided |
lambda.min_model_size |
The
size of the selected model corresponding to |
lambda_star |
The value of |
lambda_star_model_size |
The size of the model that was selected. If
this value is close to |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
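The relationship between catt_hats, cohort_probs, and att_hat can be illustrated with a short sketch. It assumes that the overall ATT is the average of the cohort effects weighted by the probability of belonging to each cohort conditional on being treated, which is how the cohort_probs entry reads; the cohort labels and numbers below are made up for illustration.

```r
# Made-up cohort effects and cohort sizes (not output from the package).
catt_hats <- c("1970" = -0.02, "1973" = 0.01, "1980" = 0.04)
cohort_counts <- c("1970" = 10, "1973" = 25, "1980" = 15)

# Estimated probability of each cohort, conditional on being treated.
cohort_probs <- cohort_counts / sum(cohort_counts)

# Assumed aggregation: probability-weighted average of cohort effects.
att_sketch <- sum(cohort_probs * catt_hats)
att_sketch
```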
Author(s)
Gregory Faletto
References
Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Number 9780198759980 in OUP Catalogue. Oxford University Press. URL https://ideas.repec.org/b/oxp/obooks/9780198759980.html.
Examples
set.seed(23451)
library(bacondecomp)
data(divorce)
# sig_eps_sq and sig_eps_c_sq, calculated in a separate run of `fetwfe()`,
# are provided to speed up the computation of the example
res <- betwfe(
pdata = divorce[divorce$sex == 2, ],
time_var = "year",
unit_var = "st",
treatment = "changed",
covs = c("murderrate", "lnpersinc", "afdcrolls"),
response = "suiciderate_elast_jag",
sig_eps_sq = 0.1025361,
sig_eps_c_sq = 4.227651e-35,
verbose = TRUE)
# Average treatment effect on the treated units (in percentage point
# units)
100 * res$att_hat
# Conservative 95% confidence interval for ATT (in percentage point units)
low_att <- 100 * (res$att_hat - qnorm(1 - 0.05 / 2) * res$att_se)
high_att <- 100 * (res$att_hat + qnorm(1 - 0.05 / 2) * res$att_se)
c(low_att, high_att)
# Cohort average treatment effects and confidence intervals (in percentage
# point units)
catt_df_pct <- res$catt_df
catt_df_pct[["Estimated TE"]] <- 100 * catt_df_pct[["Estimated TE"]]
catt_df_pct[["SE"]] <- 100 * catt_df_pct[["SE"]]
catt_df_pct[["ConfIntLow"]] <- 100 * catt_df_pct[["ConfIntLow"]]
catt_df_pct[["ConfIntHigh"]] <- 100 * catt_df_pct[["ConfIntHigh"]]
catt_df_pct
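The interval arithmetic in the example generalizes to any significance level. The helper below is a sketch with a hypothetical name (`conf_int()`), using the same normal-quantile formula as the example; the estimate and standard error passed to it are placeholder numbers, not results from the divorce data.

```r
# Two-sided (1 - alpha) normal-approximation confidence interval.
conf_int <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  c(lower = est - z * se, upper = est + z * se)
}

# Placeholder inputs for illustration only.
ci95 <- conf_int(est = -0.05, se = 0.02)
ci90 <- conf_int(est = -0.05, se = 0.02, alpha = 0.10)
rbind(ci95, ci90)
```

With real output, `conf_int(res$att_hat, res$att_se, alpha)` would reproduce the `low_att` and `high_att` values above before scaling by 100.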
Run BETWFE on Simulated Data
Description
This function runs the bridge-penalized extended two-way fixed effects estimator (betwfe()) on simulated data. It is simply a wrapper for betwfe(): it accepts an object of class "FETWFE_simulated" (produced by simulateData()) and unpacks the necessary components to pass to betwfe(). So the outputs match betwfe(), and the needed inputs match their counterparts in betwfe().
Usage
betwfeWithSimulatedData(
simulated_obj,
lambda.max = NA,
lambda.min = NA,
nlambda = 100,
q = 0.5,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
simulated_obj |
An object of class |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
Value
A named list with the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names,
average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The size of the selected model corresponding
|
lambda.min |
Either the provided |
lambda.min_model_size |
The
size of the selected model corresponding to |
lambda_star |
The value of |
lambda_star_model_size |
The size of the model that was selected. If
this value is close to |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
Examples
## Not run:
# Generate coefficients
coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)
result <- betwfeWithSimulatedData(sim_data)
## End(Not run)
Extended two-way fixed effects
Description
Implementation of extended two-way fixed effects. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units).
Usage
etwfe(
pdata,
time_var,
unit_var,
treatment,
response,
covs = c(),
indep_counts = NA,
sig_eps_sq = NA,
sig_eps_c_sq = NA,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended encodings for dates include format YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
unit |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Character; a vector containing the names of the columns for covariates. All of these columns are expected to contain integer, numeric, or factor values, and any categorical values will be automatically encoded as binary indicators. If no covariates are provided, the treatment effect estimation will proceed, but it will only be valid under unconditional versions of the parallel trends and no anticipation assumptions. Default is c(). |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID noise assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID noise (random effects) assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
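The sample split mentioned under indep_counts could be prepared along the following lines. This is a sketch under assumptions: the exact format the argument expects is not spelled out here, so the ordering of the counts (never-treated first, then treated cohorts in time order) and the toy cohort labels are assumptions, and all object names are hypothetical.

```r
set.seed(1)

# Toy unit-level table: one row per unit with its cohort label.
units <- data.frame(
  unit = sprintf("u%02d", 1:40),
  cohort = rep(c("never", "2003", "2005", "2007"), times = 10),
  stringsAsFactors = FALSE
)

# Randomly assign exactly half of the units to the estimation sample.
in_estimation <- sample(rep(c(TRUE, FALSE), length.out = nrow(units)))
estimation_units <- units[in_estimation, ]
held_out <- units[!in_estimation, ]

# Count held-out units per cohort (ordering is an assumption).
cohort_levels <- c("never", "2003", "2005", "2007")
indep_counts_sketch <- as.integer(
  table(factor(held_out$cohort, levels = cohort_levels))
)
indep_counts_sketch
```

Only the rows of the panel belonging to `estimation_units` would then be passed as pdata, with the held-out counts supplied separately.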
Value
A named list with the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
A standard error for the ATT. If the Gram matrix is not invertible, this will be NA. |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
A named vector containing the (asymptotically exact) standard errors for the estimated average treatment effects within each cohort. |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names,
average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
Author(s)
Gregory Faletto
References
Wooldridge, J. M. (2021). Two-way fixed effects, the two-way mundlak regression, and difference-in-differences estimators. Available at SSRN 3906345. doi:10.2139/ssrn.3906345.
Extended Two-Way Fixed Effects Output Class
Description
S3 class for the output of etwfe().
Convert data prepared for etwfe::etwfe() to the format required by fetwfe() and fetwfe::etwfe()
Description
etwfeToFetwfeDf() reshapes and renames a panel dataset that is already formatted for etwfe::etwfe() (McDermott 2024) so that it can be passed directly to fetwfe() or etwfe() from the fetwfe package. In particular, it
- creates an absorbing-state treatment dummy that equals 1 from the first treated period onward and 0 otherwise,
- (optionally) drops units that are already treated in the very first period of the sample (because fetwfe() removes them internally), and
- returns a tidy dataframe whose column names match the arguments that fetwfe() / etwfe() expect.
Usage
etwfeToFetwfeDf(
data,
yvar,
tvar,
idvar,
gvar,
covars = character(0),
drop_first_period_treated = TRUE,
out_names = list(time = "time_var", unit = "unit_var", treatment = "treatment",
response = "response")
)
Arguments
data |
A long-format data.frame that you could already feed to |
yvar |
Character. Column name of the outcome (left-hand side in your |
tvar |
Character. Column name of the time variable that you pass to |
idvar |
Character. Column name of the unit identifier (the variable you would
cluster on, or pass to |
gvar |
Character. Column name of the “first treated” cohort variable passed to |
covars |
Character vector of additional covariate columns to keep (default |
drop_first_period_treated |
Logical. Should units already treated in the very first
sample period be removed? ( |
out_names |
Named list giving the column names that the returned dataframe should have.
The default ( |
Value
A tidy data.frame with (in this order)
- time: integer,
- unit: character,
- treatment: integer 0/1 absorbing-state dummy,
- response: numeric outcome,
- any covariates requested in covars.
Ready to pass straight to fetwfe() or fetwfe::etwfe().
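The role of out_names can be illustrated with a small renaming sketch. `rename_columns()` and the `from` mapping are hypothetical and exist only for this illustration; the default out_names list is copied from the Usage section, and the toy data values are made up.

```r
# Rename the columns playing each role (time, unit, treatment, response)
# to the names requested in `out_names`; other columns are left untouched.
rename_columns <- function(df, from, out_names) {
  roles <- names(out_names)
  stopifnot(all(roles %in% names(from)))
  for (role in roles) {
    names(df)[names(df) == from[[role]]] <- out_names[[role]]
  }
  df
}

raw <- data.frame(
  year = 2001:2002,
  countyreal = c("8001", "8001"),
  treat = c(0L, 1L),
  lemp = c(5.1, 5.3),
  lpop = c(2.2, 2.2)
)
out_names <- list(time = "time_var", unit = "unit_var",
                  treatment = "treatment", response = "response")
from <- list(time = "year", unit = "countyreal",
             treatment = "treat", response = "lemp")

renamed <- rename_columns(raw, from, out_names)
names(renamed)
```

Whatever names are chosen in out_names are the ones to pass later as time_var, unit_var, treatment, and response.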
References
McDermott G (2024). etwfe: Extended Two-Way Fixed Effects. doi:10.32614/CRAN.package.etwfe, R package version 0.5.0, https://CRAN.R-project.org/package=etwfe.
Examples
## toy example ---------------------------------------------------------------
## Not run:
library(did) # provides the mpdta example dataframe
data(mpdta)
head(mpdta)
tidy_df <- etwfeToFetwfeDf(
data = mpdta,
yvar = "lemp",
tvar = "year",
idvar = "countyreal",
gvar = "first.treat",
covars = c("lpop"))
head(tidy_df)
## End(Not run)
## Now you can call fetwfe() ------------------------------------------------
# res <- fetwfe(
# pdata = tidy_df,
# time_var = "time_var",
# unit_var = "unit_var",
# treatment = "treatment",
# response = "response",
# covs = c("lpop"))
Run ETWFE on Simulated Data
Description
This function runs the extended two-way fixed effects estimator (etwfe()) on simulated data. It is simply a wrapper for etwfe(): it accepts an object of class "FETWFE_simulated" (produced by simulateData()) and unpacks the necessary components to pass to etwfe(). So the outputs match etwfe(), and the needed inputs match their counterparts in etwfe().
Usage
etwfeWithSimulatedData(
simulated_obj,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
simulated_obj |
An object of class |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
Value
A named list with the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
A standard error for the ATT. If the Gram matrix is not invertible, this will be NA. |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
A named vector containing the (asymptotically exact) standard errors for the estimated average treatment effects within each cohort. |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names,
average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
Examples
## Not run:
# Generate coefficients
coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)
result <- etwfeWithSimulatedData(sim_data)
## End(Not run)
Fused extended two-way fixed effects
Description
Implementation of fused extended two-way fixed effects. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units).
Usage
fetwfe(
pdata,
time_var,
unit_var,
treatment,
response,
covs = c(),
indep_counts = NA,
sig_eps_sq = NA,
sig_eps_c_sq = NA,
lambda.max = NA,
lambda.min = NA,
nlambda = 100,
q = 0.5,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended encodings for dates include format YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
unit |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Character; a vector containing the names of the columns for covariates. All of these columns are expected to contain integer, numeric, or factor values, and any categorical values will be automatically encoded as binary indicators. If no covariates are provided, the treatment effect estimation will proceed, but it will only be valid under unconditional versions of the parallel trends and no anticipation assumptions. Default is c(). |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID noise assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID noise (random effects) assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
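To make the tuning arguments concrete, the sketch below evaluates a generic bridge (L_q) penalty and builds a grid of penalty values between lambda.max and lambda.min. Both helpers are hypothetical: the penalty is the textbook form lambda * sum(|beta_j|^q), shown here on raw coefficients rather than on the transformed coefficients the estimator works with, and the log-spaced grid is an assumption about how nlambda values might be laid out, not a statement of what the package does.

```r
# Textbook bridge penalty: lambda * sum(|beta_j|^q).
# q = 1 gives a lasso-type penalty, q = 2 a ridge-type penalty.
bridge_penalty <- function(beta, lambda, q = 0.5) {
  lambda * sum(abs(beta)^q)
}

beta <- c(0, 0.5, -2)
pen_q1 <- bridge_penalty(beta, lambda = 1, q = 1)
pen_q2 <- bridge_penalty(beta, lambda = 1, q = 2)
pen_q_half <- bridge_penalty(beta, lambda = 1, q = 0.5)

# Assumed layout: nlambda values, log-spaced from lambda.max down to lambda.min.
lambda_grid <- function(lambda.max, lambda.min, nlambda = 100) {
  exp(seq(log(lambda.max), log(lambda.min), length.out = nlambda))
}

grid <- lambda_grid(lambda.max = 10, lambda.min = 0.01, nlambda = 4)
grid
```

Smaller values of q penalize small nonzero coefficients relatively more heavily, which is what lets a bridge penalty with q < 1 shrink some coefficients exactly to zero.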
Value
An object of class fetwfe
containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names, average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The size of the selected model corresponding to |
lambda.min |
Either the provided |
lambda.min_model_size |
The size of the selected model corresponding to |
lambda_star |
The value of |
lambda_star_model_size |
The size of the model that was selected. If this value is close to |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
alpha |
The alpha level used for confidence intervals. |
internal |
A list containing internal outputs that are typically not needed for interpretation:
|
The object has methods for print(), summary(), and coef(). By default, print() and summary() only show the essential outputs. To see internal details, use print(x, show_internal = TRUE) or summary(x, show_internal = TRUE). The coef() method returns the vector of estimated coefficients (beta_hat).
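As a rough illustration of how att_hat, att_se, and alpha relate, the sketch below builds a two-sided (1 - alpha) interval under a normal approximation. The numeric values and the helper name normal_ci() are hypothetical, and the package may construct its intervals differently.

```r
# Hypothetical helper (not part of the package): normal-approximation
# confidence interval from a point estimate and its standard error.
normal_ci <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # two-sided critical value
  c(lower = est - z * se, upper = est + z * se)
}

# Made-up values standing in for res$att_hat and res$att_se
att_hat <- -0.5
att_se <- 0.2
ci <- normal_ci(att_hat, att_se, alpha = 0.05)
print(ci)
```

If att_se is NA, the interval is NA as well, so it is worth checking the standard error before reporting an interval.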
Author(s)
Gregory Faletto
References
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Number 9780198759980 in OUP Catalogue. Oxford University Press. URL https://ideas.repec.org/b/oxp/obooks/9780198759980.html.
Examples
set.seed(23451)
library(bacondecomp)
data(divorce)
# sig_eps_sq and sig_eps_c_sq, calculated in a separate run of `fetwfe()`,
# are provided to speed up the computation of the example
res <- fetwfe(
pdata = divorce[divorce$sex == 2, ],
time_var = "year",
unit_var = "st",
treatment = "changed",
covs = c("murderrate", "lnpersinc", "afdcrolls"),
response = "suiciderate_elast_jag",
sig_eps_sq = 0.1025361,
sig_eps_c_sq = 4.227651e-35,
verbose = TRUE)
# Print results for all cohorts (no limit on the number displayed)
print(res, max_cohorts = Inf)
Fused Extended Two-Way Fixed Effects Output Class
Description
S3 class for the output of fetwfe().
Run FETWFE on Simulated Data
Description
This function runs the fused extended two-way fixed effects estimator (fetwfe()) on simulated data. It is simply a wrapper for fetwfe(): it accepts an object of class "FETWFE_simulated" (produced by simulateData()) and unpacks the necessary components to pass to fetwfe(). The outputs therefore match those of fetwfe(), and the arguments match their counterparts in fetwfe().
Usage
fetwfeWithSimulatedData(
simulated_obj,
lambda.max = NA,
lambda.min = NA,
nlambda = 100,
q = 0.5,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
simulated_obj |
An object of class |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
Value
An object of class fetwfe
containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names, average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The size of the selected model corresponding to |
lambda.min |
Either the provided |
lambda.min_model_size |
The size of the selected model corresponding to |
lambda_star |
The value of |
lambda_star_model_size |
The size of the model that was selected. If this value is close to |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
alpha |
The alpha level used for confidence intervals. |
internal |
A list containing internal outputs that are typically not needed for interpretation:
|
The object has methods for print(), summary(), and coef(). By default, print() and summary() only show the essential outputs. To see internal details, use print(x, show_internal = TRUE) or summary(x, show_internal = TRUE). The coef() method returns the vector of estimated coefficients (beta_hat).
Examples
## Not run:
# Generate coefficients
coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)
result <- fetwfeWithSimulatedData(sim_data)
## End(Not run)
Generate Coefficient Vector for Data Generation
Description
This function generates a coefficient vector beta for simulation studies of the fused extended two-way fixed effects estimator. It returns an S3 object of class "FETWFE_coefs" containing beta along with simulation parameters R, T, and d. See the simulation studies section of Faletto (2025) for details.
Usage
genCoefs(R, T, d, density, eff_size, seed = NULL)
Arguments
R |
Integer. The number of treated cohorts (treatment is assumed to start in periods 2 to
|
T |
Integer. The total number of time periods. |
d |
Integer. The number of time-invariant covariates. If |
density |
Numeric in (0,1). The probability that any given entry in the initial sparse
coefficient vector |
eff_size |
Numeric. The magnitude used to scale nonzero entries in |
seed |
(Optional) Integer. Seed for reproducibility. |
Details
The length of beta is given by p = R + (T - 1) + d + dR + d(T - 1) + \mathit{num\_treats} + (\mathit{num\_treats} \times d), where the number of treatment parameters is defined as \mathit{num\_treats} = T \times R - \frac{R(R+1)}{2}.
The function operates in two steps:
- It first creates a sparse vector theta of length p, with nonzero entries occurring with probability density. Nonzero entries are set to eff_size or -eff_size (with a 60\...).
- The full coefficient vector beta is then computed by applying an inverse fusion transform to theta using internal routines (e.g., genBackwardsInvFusionTransformMat() and genInvTwoWayFusionTransformMat()).
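The length formula above can be checked by hand. The sketch below evaluates it for the settings used in the genCoefs() example; the helper names num_treats_for() and beta_length() are hypothetical and are not functions exported by the package.

```r
# Number of treatment-effect parameters for R cohorts over T periods
num_treats_for <- function(R, T) {
  T * R - R * (R + 1) / 2
}

# Length p of beta, following the formula in the Details section
beta_length <- function(R, T, d) {
  nt <- num_treats_for(R, T)
  R + (T - 1) + d + d * R + d * (T - 1) + nt + nt * d
}

# Settings used in the genCoefs() example: R = 5, T = 30, d = 12
nt <- num_treats_for(R = 5, T = 30)
p <- beta_length(R = 5, T = 30, d = 12)
cat("num_treats:", nt, " p:", p, "\n")
```

With these settings num_treats is 135 and p is 2209, which shows how quickly the treatment-by-covariate interactions (num_treats times d = 1620 columns) come to dominate the parameter count.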
Value
An object of class "FETWFE_coefs", which is a list containing:
- beta
A numeric vector representing the full coefficient vector after the inverse fusion transform.
- theta
A numeric vector representing the coefficient vector in the transformed feature space. theta is a sparse vector, which aligns with an assumption that deviations from the restrictions encoded in the FETWFE model are sparse. beta is derived from theta.
- R
The provided number of treated cohorts.
- T
The provided number of time periods.
- d
The provided number of covariates.
- seed
The provided seed.
References
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Examples
## Not run:
# Generate coefficients
coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)
## End(Not run)
Generate Coefficient Vector for Data Generation
Description
This function generates a coefficient vector beta along with a sparse auxiliary vector theta for simulation studies of the fused extended two-way fixed effects estimator. The returned beta is formatted to align with the design matrix created by genRandomData(), and is a valid input for the beta argument of that function. The vector theta is sparse, with nonzero entries occurring with probability density and scaled by eff_size. See the simulation studies section of Faletto (2025) for details.
Usage
genCoefsCore(R, T, d, density, eff_size, seed = NULL)
Arguments
R |
Integer. The number of treated cohorts (treatment is assumed to start in periods 2 to
|
T |
Integer. The total number of time periods. |
d |
Integer. The number of time-invariant covariates. If |
density |
Numeric in (0,1). The probability that any given entry in the initial sparse
coefficient vector |
eff_size |
Numeric. The magnitude used to scale nonzero entries in |
seed |
(Optional) Integer. Seed for reproducibility. |
Details
The length of beta is given by p = R + (T - 1) + d + dR + d(T - 1) + \mathit{num\_treats} + (\mathit{num\_treats} \times d), where the number of treatment parameters is defined as \mathit{num\_treats} = T \times R - \frac{R(R+1)}{2}.
The function operates in two steps:
- It first creates a sparse vector theta of length p, with nonzero entries occurring with probability density. Nonzero entries are set to eff_size or -eff_size (with a 60\...).
- The full coefficient vector beta is then computed by applying an inverse fusion transform to theta using internal routines (e.g., genBackwardsInvFusionTransformMat() and genInvTwoWayFusionTransformMat()).
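To give some intuition for why a sparse theta yields a structured beta, here is a toy one-dimensional analogue of an inverse fusion transform. It assumes the simplest possible fusion (first differences between adjacent coefficients), so the inverse is a cumulative sum. The package's internal transform matrices are more involved, and nothing below reproduces them.

```r
# Toy one-dimensional analogue (an assumption for illustration; the package's
# internal transform matrices are more involved than this).
k <- 6
theta <- c(2, 0, 0, -2, 0, 0)  # sparse vector in the transformed space

# Inverse fusion transform: lower-triangular matrix of ones (cumulative sums)
D_inv <- matrix(0, k, k)
D_inv[lower.tri(D_inv, diag = TRUE)] <- 1
beta <- as.vector(D_inv %*% theta)
print(beta)

# The forward transform takes first differences and recovers theta
D <- solve(D_inv)
theta_back <- as.vector(D %*% beta)
print(round(theta_back, 10))
```

Each zero in theta forces two adjacent entries of beta to be equal, so a sparse theta corresponds to a piecewise-constant beta. This is the sense in which sparsity in the transformed space encodes "fused" coefficients.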
Value
A list with two elements:
- beta
A numeric vector representing the full coefficient vector after the inverse fusion transform.
- theta
A numeric vector representing the coefficient vector in the transformed feature space. theta is a sparse vector, which aligns with an assumption that deviations from the restrictions encoded in the FETWFE model are sparse. beta is derived from theta.
References
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Examples
## Not run:
# Set parameters for the coefficient generation
R <- 3 # Number of treated cohorts
T <- 6 # Total number of time periods
d <- 2 # Number of covariates
density <- 0.1 # Probability that an entry in the initial vector is nonzero
eff_size <- 1.5 # Scaling factor for nonzero coefficients
seed <- 789 # Seed for reproducibility
# Generate coefficients using genCoefsCore()
coefs_core <- genCoefsCore(R = R, T = T, d = d, density = density,
eff_size = eff_size, seed = seed)
beta <- coefs_core$beta
theta <- coefs_core$theta
# For diagnostic purposes, compute the expected length of beta.
# The length p is defined internally as:
# p = R + (T - 1) + d + d*R + d*(T - 1) + num_treats + num_treats*d,
# where num_treats = T * R - (R*(R+1))/2.
num_treats <- T * R - (R * (R + 1)) / 2
p_expected <- R + (T - 1) + d + d * R + d * (T - 1) + num_treats + num_treats * d
cat("Length of beta:", length(beta), "\nExpected length:", p_expected, "\n")
## End(Not run)
Compute True Treatment Effects
Description
This function extracts the true treatment effects from a full coefficient vector as generated by genCoefs(). It calculates the overall average treatment effect on the treated (ATT) as the equal-weighted average of the cohort-specific treatment effects, and also returns the individual treatment effects for each treated cohort.
Usage
getTes(coefs_obj)
Arguments
coefs_obj |
An object of class |
Details
The function internally uses auxiliary routines getNumTreats(), getP(), getFirstInds(), getTreatInds(), and getActualCohortTes() to determine the correct indices of treatment effect coefficients in beta. The overall treatment effect is computed as the simple average of these cohort-specific effects.
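The two-level averaging described above can be sketched in base R. The layout used here is an assumption made for illustration: cohort r is taken to start treatment in period r + 1, so it contributes T - r treatment-effect coefficients stored contiguously in cohort order. The coefficient values are made up, and the code does not call the package's internal routines.

```r
# Assumed layout (for illustration only): cohort r starts treatment in
# period r + 1, so it contributes T - r treatment-effect coefficients,
# stored contiguously in cohort order.
R <- 3
T <- 6
n_per_cohort <- T - seq_len(R)                 # 5, 4, 3
num_treats <- sum(n_per_cohort)                # equals T * R - R * (R + 1) / 2
first_inds <- cumsum(c(1, n_per_cohort[-R]))   # first index of each block

# Made-up treatment-effect coefficients (not produced by genCoefs())
treat_coefs <- c(rep(1, 5), rep(2, 4), rep(4, 3))

# Cohort effect: mean of that cohort's coefficients
cohort_tes <- sapply(seq_len(R), function(r) {
  idx <- first_inds[r]:(first_inds[r] + n_per_cohort[r] - 1)
  mean(treat_coefs[idx])
})

# Overall ATT: equal-weighted mean of the cohort effects
att_true <- mean(cohort_tes)
print(cohort_tes)
print(att_true)
```

Here the equal-weighted ATT is 7/3, whereas pooling all twelve coefficients would give 25/12. The two differ because earlier cohorts have more post-treatment periods, which is why the weighting scheme matters.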
Value
A named list with two elements:
- att_true
A numeric value representing the overall average treatment effect on the treated. It is computed as the (equal-weighted) mean of the cohort-specific treatment effects.
- actual_cohort_tes
A numeric vector containing the true cohort-specific treatment effects, calculated by averaging the coefficients corresponding to the treatment dummies for each cohort.
Examples
## Not run:
# Generate coefficients
coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Compute the true treatment effects:
te_results <- getTes(coefs)
# Overall average treatment effect on the treated:
print(te_results$att_true)
# Cohort-specific treatment effects:
print(te_results$actual_cohort_tes)
## End(Not run)
Generate Random Panel Data for FETWFE Simulations
Description
Generates a random panel data set for simulation studies of the fused extended two-way fixed effects (FETWFE) estimator by taking an object of class "FETWFE_coefs" (produced by genCoefs()) and using it to simulate data. The function creates a balanced panel with N units over T time periods, assigns treatment status across R treated cohorts (with equal marginal probabilities for treatment and non-treatment), and constructs a design matrix along with the corresponding outcome. The covariates are generated according to the specified distribution: by default, covariates are drawn from a normal distribution; if distribution = "uniform", they are drawn uniformly from [-\sqrt{3}, \sqrt{3}]. When d = 0 (i.e. no covariates), no covariate-related columns or interactions are generated. See the simulation studies section of Faletto (2025) for details.
Usage
simulateData(
coefs_obj,
N,
sig_eps_sq,
sig_eps_c_sq,
distribution = "gaussian",
guarantee_rank_condition = FALSE
)
Arguments
coefs_obj |
An object of class |
N |
Integer. Number of units in the panel. |
sig_eps_sq |
Numeric. Variance of the idiosyncratic (observation-level) noise. |
sig_eps_c_sq |
Numeric. Variance of the unit-level random effects. |
distribution |
Character. Distribution to generate covariates.
Defaults to |
guarantee_rank_condition |
(Optional). Logical. If TRUE, the returned
data set is guaranteed to have at least |
Details
This function extracts simulation parameters from the FETWFE_coefs object and passes them, along with additional simulation parameters, to the internal function simulateDataCore(). It validates that all necessary components are returned and assigns the S3 class "FETWFE_simulated" to the output.
The argument distribution controls the generation of covariates. For "gaussian", covariates are drawn from rnorm; for "uniform", they are drawn from runif on the interval [-\sqrt{3}, \sqrt{3}] (which ensures that the covariates have unit variance regardless of which distribution is chosen).
When d = 0 (i.e. no covariates), the function omits any covariate-related columns and their interactions.
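The unit-variance remark can be verified directly: a Uniform(a, b) variable has variance (b - a)^2 / 12, which equals 1 when a = -\sqrt{3} and b = \sqrt{3}. The short check below assumes the "gaussian" option corresponds to rnorm with its default mean 0 and standard deviation 1; the seed and sample size are arbitrary choices for this illustration.

```r
set.seed(1)  # arbitrary seed for this illustration
n <- 1e5

# Analytic variance of Uniform(a, b) is (b - a)^2 / 12
a <- -sqrt(3)
b <- sqrt(3)
var_uniform_exact <- (b - a)^2 / 12   # equals 1 up to floating point

# Empirical check for both covariate distributions
x_gauss <- rnorm(n)
x_unif <- runif(n, min = a, max = b)
cat("exact uniform variance:", var_uniform_exact, "\n")
cat("sample variances:", var(x_gauss), var(x_unif), "\n")
```

Matching the variances keeps the scale of the covariates, and therefore the effective size of their coefficients, comparable across the two distribution options.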
Value
An object of class "FETWFE_simulated", which is a list containing:
- pdata
A dataframe containing generated data that can be passed to fetwfe().
- X
The design matrix X, with p columns with interactions.
- y
A numeric vector of length N \times T containing the generated responses.
- covs
A character vector containing the names of the generated features (if d > 0), or simply an empty vector (if d = 0).
- time_var
The name of the time variable in pdata.
- unit_var
The name of the unit variable in pdata.
- treatment
The name of the treatment variable in pdata.
- response
The name of the response variable in pdata.
- coefs
The coefficient vector \beta used for data generation.
- first_inds
A vector of indices indicating the first treatment effect for each treated cohort.
- N_UNTREATED
The number of never-treated units.
- assignments
A vector of counts (of length R+1) indicating how many units fall into the never-treated group and each of the R treated cohorts.
- indep_counts
Independent cohort assignments (for auxiliary purposes).
- p
The number of columns in the design matrix X.
- N
Number of units.
- T
Number of time periods.
- R
Number of treated cohorts.
- d
Number of covariates.
- sig_eps_sq
The idiosyncratic noise variance.
- sig_eps_c_sq
The unit-level noise variance.
References
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Examples
## Not run:
# Generate coefficients
coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)
## End(Not run)
Generate Random Panel Data for FETWFE Simulations
Description
Generates a random panel data set for simulation studies of the fused extended two-way fixed effects (FETWFE) estimator. The function creates a balanced panel with N units over T time periods, assigns treatment status across R treated cohorts (with equal marginal probabilities for treatment and non-treatment), and constructs a design matrix along with the corresponding outcome. When gen_ints = TRUE the full design matrix is returned (including interactions between covariates and fixed effects and treatment indicators). When gen_ints = FALSE the design matrix is generated in a simpler format (with no interactions) as expected by fetwfe(). Moreover, the covariates are generated according to the specified distribution: by default, covariates are drawn from a normal distribution; if distribution = "uniform", they are drawn uniformly from [-\sqrt{3}, \sqrt{3}]. When d = 0 (i.e. no covariates), no covariate-related columns or interactions are generated.
See the simulation studies section of Faletto (2025) for details.
Usage
simulateDataCore(
N,
T,
R,
d,
sig_eps_sq,
sig_eps_c_sq,
beta,
seed = NULL,
gen_ints = FALSE,
distribution = "gaussian",
guarantee_rank_condition = FALSE
)
Arguments
N |
Integer. Number of units in the panel. |
T |
Integer. Number of time periods. |
R |
Integer. Number of treated cohorts (with treatment starting in periods 2 to T). |
d |
Integer. Number of time-invariant covariates. |
sig_eps_sq |
Numeric. Variance of the idiosyncratic (observation-level) noise. |
sig_eps_c_sq |
Numeric. Variance of the unit-level random effects. |
beta |
Numeric vector. Coefficient vector for data generation. Its required length depends
on the value of
|
seed |
(Optional) Integer. Seed for reproducibility. |
gen_ints |
Logical. If |
distribution |
Character. Distribution to generate covariates.
Defaults to |
guarantee_rank_condition |
(Optional). Logical. If TRUE, the returned
data set is guaranteed to have at least |
Details
When gen_ints = TRUE, the function constructs the design matrix by first generating base fixed effects and a long-format covariate matrix (via generateBaseEffects()), then appending interactions between the covariates and cohort/time fixed effects (via generateFEInts()) and finally treatment indicator columns and treatment-covariate interactions (via genTreatVarsSim() and genTreatInts()). When gen_ints = FALSE, the design matrix consists only of the base fixed effects, covariates, and treatment indicators.
The argument distribution controls the generation of covariates. For "gaussian", covariates are drawn from rnorm; for "uniform", they are drawn from runif on the interval [-\sqrt{3}, \sqrt{3}].
When d = 0 (i.e. no covariates), the function omits any covariate-related columns and their interactions.
Value
An object of class "FETWFE_simulated", which is a list containing:
- pdata
A dataframe containing generated data that can be passed to fetwfe().
- X
The design matrix. When gen_ints = TRUE, X has p columns with interactions; when gen_ints = FALSE, X has no interactions.
- y
A numeric vector of length N \times T containing the generated responses.
- covs
A character vector containing the names of the generated features (if d > 0), or simply an empty vector (if d = 0).
- time_var
The name of the time variable in pdata.
- unit_var
The name of the unit variable in pdata.
- treatment
The name of the treatment variable in pdata.
- response
The name of the response variable in pdata.
- coefs
The coefficient vector \beta used for data generation.
- first_inds
A vector of indices indicating the first treatment effect for each treated cohort.
- N_UNTREATED
The number of never-treated units.
- assignments
A vector of counts (of length R+1) indicating how many units fall into the never-treated group and each of the R treated cohorts.
- indep_counts
Independent cohort assignments (for auxiliary purposes).
- p
The number of columns in the design matrix X.
- N
Number of units.
- T
Number of time periods.
- R
Number of treated cohorts.
- d
Number of covariates.
- sig_eps_sq
The idiosyncratic noise variance.
- sig_eps_c_sq
The unit-level noise variance.
References
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Examples
## Not run:
# Set simulation parameters
N <- 100 # Number of units in the panel
T <- 5 # Number of time periods
R <- 3 # Number of treated cohorts
d <- 2 # Number of time-invariant covariates
sig_eps_sq <- 1 # Variance of observation-level noise
sig_eps_c_sq <- 0.5 # Variance of unit-level random effects
# Generate coefficient vector using genCoefsCore()
# (Here, density controls sparsity and eff_size scales nonzero entries)
coefs_core <- genCoefsCore(R = R, T = T, d = d, density = 0.2, eff_size = 2, seed = 123)
# Now simulate the data. Setting gen_ints = TRUE generates the full design
# matrix with interactions.
sim_data <- simulateDataCore(
N = N,
T = T,
R = R,
d = d,
sig_eps_sq = sig_eps_sq,
sig_eps_c_sq = sig_eps_c_sq,
beta = coefs_core$beta,
seed = 456,
gen_ints = TRUE,
distribution = "gaussian"
)
# Examine the returned list:
str(sim_data)
## End(Not run)
Two-way fixed effects with covariates and separate treatment effects for each cohort
Description
WARNING: This function should NOT be used for estimation. It is a biased estimator of treatment effects. Implementation of two-way fixed effects with covariates and separate treatment effects for each cohort. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units). It is implemented only for the sake of the simulation studies in Faletto (2025). This estimator is only unbiased under the assumptions that treatment effects are homogeneous across covariates and are identical within cohorts across all times since treatment.
Usage
twfeCovs(
pdata,
time_var,
unit_var,
treatment,
response,
covs = c(),
indep_counts = NA,
sig_eps_sq = NA,
sig_eps_c_sq = NA,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended encodings for dates include format YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
unit |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Character; a vector containing the names of the columns for covariates. All of these columns are expected to contain integer, numeric, or factor values, and any categorical values will be automatically encoded as binary indicators. If no covariates are provided, the treatment effect estimation will proceed, but it will only be valid under unconditional versions of the parallel trends and no anticipation assumptions. Default is c(). |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID noise assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID noise (random effects) assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
Value
A named list with the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
A standard error for the ATT. If the Gram matrix is not invertible, this will be NA. |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
A named vector containing the (asymptotically exact) standard errors for the estimated average treatment effects within each cohort. |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names,
average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
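The X_final and y_final entries above describe left-multiplying each unit's block by the inverse square root of its estimated covariance matrix. A minimal sketch of that whitening step follows, assuming the within-unit covariance is Omega = sig_eps_sq * I_T + sig_eps_c_sq * 1 1' (idiosyncratic noise plus a unit-level random effect). That covariance form, the made-up variance values, and the helper name inv_sqrt_sym() are assumptions of this sketch rather than a description of the package's internals.

```r
# Hypothetical helper: inverse square root of a symmetric positive-definite
# matrix via its eigendecomposition.
inv_sqrt_sym <- function(M) {
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = nrow(M)) %*% t(e$vectors)
}

T <- 4
sig_eps_sq <- 1.0     # made-up idiosyncratic variance
sig_eps_c_sq <- 0.5   # made-up unit-level (random effect) variance

# Assumed within-unit covariance: sig_eps_sq * I + sig_eps_c_sq * 1 1'
Omega <- sig_eps_sq * diag(T) + sig_eps_c_sq * matrix(1, T, T)
W <- inv_sqrt_sym(Omega)

# Whitening one unit's T x k design block and T responses
set.seed(2)
X_unit <- matrix(rnorm(T * 3), nrow = T)
y_unit <- rnorm(T)
X_white <- W %*% X_unit
y_white <- as.vector(W %*% y_unit)

# After whitening, the implied noise covariance is the identity
print(round(W %*% Omega %*% W, 10))
```

Because W Omega W is the identity, ordinary least squares on the whitened data corresponds to generalized least squares on the original data under the assumed covariance.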
Author(s)
Gregory Faletto
References
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Run twfeCovs on Simulated Data
Description
This function runs the two-way fixed effects estimator with covariates and separate treatment effects for each cohort (twfeCovs()) on simulated data. It is simply a wrapper for twfeCovs(): it accepts an object of class "FETWFE_simulated" (produced by simulateData()) and unpacks the necessary components to pass to twfeCovs(). The outputs therefore match those of twfeCovs(), and the arguments match their counterparts in twfeCovs().
Usage
twfeCovsWithSimulatedData(
simulated_obj,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
Arguments
simulated_obj |
An object of class |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
Value
A named list with the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names,
average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The final number of treated cohorts that appear in the final data set. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
Examples
## Not run:
# Generate coefficients
coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)
result <- twfeCovsWithSimulatedData(sim_data)
## End(Not run)