Help for package simITS

Type:

Package

Title:

Analysis via Simulation of Interrupted Time Series (ITS) Data

Version:

0.1.1

Description:

Uses simulation to create prediction intervals for post-policy outcomes in interrupted time series (ITS) designs, following Miratrix (2020) <doi:10.48550/arXiv.2002.05746>. This package provides methods for fitting ITS models with lagged outcomes and variables to account for temporal dependencies. It then conducts inference via simulation, simulating a set of plausible counterfactual post-policy series to compare to the observed post-policy series. This package also provides methods to visualize such data, and also to incorporate seasonality models and smoothing and aggregation/summarization. This work partially funded by Arnold Ventures in collaboration with MDRC.

License:

GPL-3

Depends:

dplyr, R (≥ 2.10), rlang

Suggests:

arm, ggplot2, knitr, plyr, purrr, rmarkdown, stats, testthat (≥ 2.1.0), tidyr

VignetteBuilder:

knitr

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.1.0

NeedsCompilation:

Packaged:

2020-05-20 13:24:23 UTC; lmiratrix

Author:

Luke Miratrix [aut, cre], Brit Henderson [ctb], Chloe Anderson [ctb], Arnold Ventures [fnd], MDRC [fnd]

Maintainer:

Luke Miratrix <lmiratrix@g.harvard.edu>

Repository:

CRAN

Date/Publication:

2020-05-20 13:50:02 UTC

Augment dataframe with lagged covariates

Description

Take outcome and a list of covariates and add new columns with lagged versions. Assumes rows of dataframe are in time ascending order. Lagged outcome canonically called 'lag.outcome'. Covariates 'lag.XXX'.

Usage

add_lagged_covariates(dat, outcomename, covariates = NULL)

Arguments

dat

The dataframe

outcomename

The outcome of interest (string)

covariates

The covariates to lag along with the outcome. This can be either of two things. First, it can be a list of string names. Covariates can also be a function with a "lags" attribute with the listed covariates (as returned by, e.g., make_fit_season_model) (which is a list of string names). NULL if no covariates other than outcome should be lagged.

Value

Augmented dataframe with lagged covariates as new columns. Will clobber old columns if the names (of form "lag.XXXX") conflict.

Examples

data( "newjersey" )
newjersey = add_lagged_covariates(newjersey, "n.warrant", c("sin.m","cos.m" ) )
head( newjersey[ c( "n.warrant", "sin.m", "lag.outcome", "lag.sin.m" ) ] )

Adjust an outcome time series based on the group weights.

Description

Reweight the components of a series to match target weights for several categories. This is a good preprocessing step to adjust for time-varying covariates such as changing mix of case types.

Usage

adjust_data(
  dat,
  outcomename,
  groupname,
  Nname,
  pi_star,
  is_count = FALSE,
  include_aggregate = FALSE,
  covariates = NULL
)

Arguments

dat

Dataframe of data. Requires an N column of total cases represented in each row.

outcomename

Name of column that has the outcome to calculated adjusted values for.

groupname

Name of categorical covariate that determines the groups.

Nname

Name of column in dat that contains total cases (this is the name of the variable used to generate the weights in pi_star).

pi_star

The target weights. Each month will have its groups re-weighted to match these target weights.

is_count

Indicator of whether outcome is count data or a continuous measure (this impacts how aggregation is done).

include_aggregate

Include aggregated (unadjusted) totals in the output as well.

covariates

Covariates to be passed to aggregation (list of string variable names).

Value

Dataframe of adjusted data.

Examples

data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )

Aggregate grouped data

Description

This will take a dataframe with each row being the outcomes, etc., for a given group for a given month and aggregate those groups for each month.

Usage

aggregate_data(
  dat,
  outcomename,
  groupname,
  Nname,
  is_count = FALSE,
  rich = TRUE,
  covariates = NULL
)

Arguments

dat

Dataframe with one row for each time point and group that we are going to post stratify on. This dataframe should also have an column with passed name "Nname" indicating the number of cases that make up each given row. It should have a 'month' column for the time.

outcomename

String name of the outcome variable in dat.

groupname

Name of the column that has the grouping categorical variable

Nname

Name of variable holding the counts (weight) in each group.

is_count

If TRUE the data are counts, and should be aggregated by sum rather than by mean.

rich

If TRUE, add a bunch of extra columns with proportions of the month that are each group and so forth.

covariates

group-invariant covariates to preserve in the augmented rich dataframe. These are not used in this method for any calculations. Pass as list of column names of dat

Value

Dataframe of aggregated data, one row per month. If rich=TRUE many extra columns with further information.

Examples

data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )

Test a passed test statistic on the simulated data

Description

This method is used to look at summary statistics such as average impact post-policy, and see how the predictive distribution compares to the observed.

Usage

aggregate_simulation_results(
  orig.data,
  predictions,
  outcomename,
  summarizer = calculate_average_outcome,
  ...
)

Arguments

orig.data

The raw data (dataframe)

predictions

The results from process_outcome_model.

outcomename

Outcome to use.

summarizer

A function to calculate some summary quantity, Default: calculate_average_outcome

...

Extra arguments passed to the summarizer function.

Value

List of length two, with first item being the observed value of the test statistic and the second being a numeric vector representing the emperical reference distribution.

Examples

predictions = process_outcome_model( "pbail", mecklenberg,
                                    t0=0, R = 5,
                                    summarize = FALSE, smooth=FALSE )
sstat = aggregate_simulation_results( orig.data = mecklenberg, outcomename = "pbail",
                                     predictions = predictions, months = 1:18 )
sstat$t
sstat$t.obs

Summary function for summarize.simulation.results

Description

Given a set of simulation runs, estimate average impact over range of months.

Usage

calculate_average_outcome(res, outcomename, months = 1:54, ...)

Arguments

res

Dataframe of a single series (simulated or otherwise)

outcomename

Name of outcome in res

months

Which months to average over, Default: 1:18

...

Other parameters (ignored)

Value

Single number (in this case mean of given months)

Examples

data( mecklenberg )
calculate_average_outcome( mecklenberg, "pbail", months=1:24 )
calculate_average_outcome( mecklenberg, "pbail", months = 1:18 )

Calculate proportion of subgroups across time

Description

Calculate overall proportion of cases in each group that lie within a given interval of time defined by t_min and t_max.

Usage

calculate_group_weights(
  groupname,
  dat,
  t_min,
  t_max = max(dat$month),
  Nname = "N"
)

Arguments

groupname

Name of the column that has the grouping categorical variable

dat

t_min

The start month to aggregate cases over.

t_max

The final month (default is last month).

Nname

Name of variable holding the counts (weight) in each group.

Value

Dataframe of each group along with overall average group weight in the specified timespan.

Examples

data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )

Extrapolate pre-policy data to post-policy era

Description

This function takes a fitted model and uses it to make the post-policy predictions by simulating data.

Usage

extrapolate_model(
  M0,
  outcomename,
  dat,
  t0,
  R = 400,
  summarize = FALSE,
  smooth = FALSE,
  smoother = smooth_series,
  full_output = FALSE,
  fix_parameters = FALSE,
  ...
)

Arguments

M0

The fit model

outcomename

Outcome of interest (name of column)

dat

Dataframe with data being analyzed.

t0

Last pre-policy timepoint

R

Number of replications

summarize

Boolean, TRUE means collapse all simulated trajectories into single aggregate. FALSE means return all paths.

smooth

Boolean. TRUE means fit a smoother to the trajectories and look at distribution of smoothed trajectories. FALSE means look at raw data treajectories.

smoother

Function to do smoothing, if smoothing set to TRUE. Default is smooth_series()

full_output

TRUE means smoother returns residuals as well as smoothed series.

fix_parameters

Keep the parameters in the model M0 as fixed; do not add parameter uncertainty.

...

Extra arguments to be passed to smoother (e.g, bandwidth).

Value

Dataframe with columns corresponding to the simulations. If summarize=TRUE, one row per month in original data. If FALSE, all the details of all the runs are returned.

Examples

data("mecklenberg" )
mecklenberg = add_lagged_covariates( mecklenberg, "pbail"  )
mecklenberg.pre = dplyr::filter( mecklenberg, month <= 0 )
M0 = fit_model_default( mecklenberg.pre, "pbail" )
res = extrapolate_model( M0, "pbail", mecklenberg, 0, 1,
                         smooth=TRUE)
tail( res )

Default ITS model

Description

This fits the model 'outcomename ~ lag.outcome + month', with no covariates.

Usage

fit_model_default(dat, outcomename, lagless = FALSE, ...)

Arguments

dat

Dataframe of pre-policy data to fit model to. Needs a "month" column

outcomename

Outcome of interest

lagless

Boolean, include the lagged outcome, or not?

...

Extra arguments passed to the lm() call.

Value

A fit model (a 'lm' object from a 'lm()' call) from fitting a simple regression of outcome onto month and lagged month.

Examples

mecklenberg = add_lagged_covariates(mecklenberg, "pbail")
meck.pre = filter( mecklenberg, month <= 0 )
mod = fit_model_default( meck.pre, "pbail", lagless = TRUE )
summary( mod )
mod = fit_model_default( meck.pre, "pbail", lagless = FALSE )
summary( mod )

Make fake data for testing purposes.

Description

Defaults have heavy seasonality, and an extra bump in impact kicks in at 12 months post-policy.

Usage

generate_fake_data(
  t_min = -40,
  t_max = 9,
  t0 = 0,
  rho = 0.5,
  sd.omega = 1,
  coef_line = c(20, 0.05),
  coef_q = c(1, 0, -1, 0),
  coef_temp = 0.1,
  coef_sin = c(0, 0),
  coef_tx = c(0, 0.25, 5)
)

Arguments

t_min

Index of first month

t_max

Index of last month

t0

Last pre-policy time point

rho

Autocorrelation

sd.omega

Standard deviation of the true residual

coef_line

Intercept and slope of the main trendline (list of 2).

coef_q

Coefficients for the four quarters (list of 4).

coef_temp

Coefficient for temperature.

coef_sin

Coefficents for sin and cos features (list of 2)

coef_tx

Coefficient for treatment post-policy (list of 3, initial offset, initial slope, additional slope past 12 months). Treatment is a piecewise linear function.

Value

A data.frame having month , temperature , sin.m , cos.m , Q1, Q2 , Q3, Q4, post , Ystr0 , Ystr , Y

Examples

fdat = generate_fake_data(-100,100, rho = 0.95, coef_q=c(0,0,0,0), coef_temp = 0)
plot( fdat$month, fdat$Y, type="l" )
fdat2 = generate_fake_data(-100, 100, rho = 0.0, coef_q=c(0,0,0,0), coef_temp = 0)
plot( fdat$month, fdat2$Y, type="l" )

A fake DGP with time varying categorical covariate for illustrating the code.

Description

This code makes synthetic grouped data that can be used to illustrate benefits of post stratification.

Usage

generate_fake_grouped_data(
  t_min,
  t0,
  t_max,
  method = c("complex", "linear", "jersey")
)

Arguments

t_min

Index of first month

t0

last pre-policy timepoint

t_max

Index of last month

method

Type of post-stratification structure to generate (three designs of 'complex', 'linear' and 'jersey' were originally concieved of when designing simulation studies with different types of structure).

Value

Dataframe of fake data, with one row per group per time period.

Examples

fdat = generate_fake_grouped_data(t_min=-5,t_max=10, t0 = 0)
table( fdat$month )
table( fdat$type )

Make envelope style graph with associated smoothed trendlines

Description

This method builds a ggplot object with the trendline and prediction envelope. It can be customized after the fact by adding more ggplot layers via normal ggplot "+" syntax.

Usage

make_envelope_graph(envelope, t0, ylab = "Y", xlab = "month")

Arguments

envelope

The result of a 'process_outcome_model()' call, i.e. dataframe with columns of original data, imputed data and, potentially, smoothed data.

t0

Last pre-policy timepoint. Will draw vertical line here.

ylab

Y label of plot

xlab

X label of plot

Value

Returns (does not yet display) a ggplot plot object containing the time series along with extrapolation and prediction envelope. This plot can be augmented and changed via standard ggplot commands.

Examples

data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")
data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")

Make a fit_model that takes a seasonality component

Description

This method returns a function that will fit a model both with and without lagged outcomes.

Usage

make_fit_season_model(formula, no_lag = NULL)

Arguments

formula

Formula specifying seasonality. No outcome or month needed.

no_lag

Formula specifying additional variables to not lag (usually used due to colinearity of lagged outcomes, such as with a sin and cos component).

Details

You hand it a formula object specifying the seasonality, e.g., " ~ Q2 + Q3 + Q4", if you have quarterly season effects. This method assumes you want models with a linear month component as well, and will add that and an intercept in automatically.

Value

A callable function that takes the arguments of dat, outcomename, and a lagless flag (see, e.g., the parameters listed in 'fit_model_default()').

Examples

data( "newjersey")
modF = make_fit_season_model( ~ temperature )
newjersey = add_lagged_covariates( newjersey, "n.warrant", covariates = c("temperature") )
modF( newjersey, "n.warrant" )

Generate a collection of raw counterfactual trajectories

Description

Given a fit linear model 'fit0', generate R prediction series starting at t0. This takes model uncertainty into account by pulling from the pseudo-posterior of the model parameters (from Gelman and Hill arm package).

Usage

make_many_predictions(fit0, dat, R, outcomename, t0)

make_many_predictions_plug(fit0, dat, R, outcomename, t0)

Arguments

fit0

The fit linear model to simulate from.

dat

A dataframe with the covariates needed by the model fit0 for both pre and post-policy months.

R

Number of series to generate.

outcomename

The name of the column in dat which is our outcome.

t0

Last month of pre-policy. Will start predicting at t0+1.

Value

A data.frame with the collection of predicted series, one row per month per replicate (so will have R*nrow(dat) rows).

Functions

make_many_predictions_plug: This version makes multiple predictions using estimated parameters without additional uncertainty. This takes point estimates from the fit model as fixed parameters. WARNING: This method will not capture true uncertainty as it is not taking parameter uncertainty into account.

References

The 'arm' package, see https://cran.r-project.org/package=arm

Also see Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevelhierarchical models (Vol. 1). New York, NY, USA: Cambridge University Press.

Examples

data("mecklenberg" )
mecklenberg = add_lagged_covariates( mecklenberg, "pbail"  )
mecklenberg.pre = dplyr::filter( mecklenberg, month <= 0 )
M0 = fit_model_default( mecklenberg.pre, "pbail" )
res = make_many_predictions( M0, dat=mecklenberg, outcome="pbail", t0=0, R=2 )
tail( res )

Make a smoother that fits a model and then smooths residuals

Description

This helper function gives back a function that takes the resulting simulation data from a single iteration of the simulation, and fits 'fit_model' to it, smoothes the residuals, and puts the predictions from 'fit_model' back.

Usage

make_model_smoother(fit_model, covariates)

Arguments

fit_model

A function that takes data, fits a linear model, and returns the fit model. This function needs an option to include (or not) lagged covariates.

covariates

A dataframe with all covariates needed in the model fitting defined by fit_model.

Details

This can be used to build smoothers that smooth using models other than the model being used for extrapolation (e.g., a model without temperature).

Resulting functions have the following parameters: 'res' (the data), 't0' (start time), 'outcomename', 'post.only' flag (for smoothing only post data or not), and 'smooth_k', a tuning parameter for degree of smoothing.

Value

A smoother function that can be passed to the smoothing routines. This function is of the form listed above.

Examples

data( "newjersey")
modA = make_fit_season_model( ~ temperature )
modB = make_fit_season_model( ~ sin.m + cos.m )
newjersey = add_lagged_covariates( newjersey, "n.warrant", 
                                   covariates = c("sin.m", "cos.m", "temperature") )
smoother = make_model_smoother( fit_model = modA, covariates = newjersey )
class(smoother)

# Pass made function to process_outcome_model()
envelope = process_outcome_model( "n.warrant", newjersey, t0=-8, R = 1,
                                  summarize = TRUE, smooth=TRUE, 
                                  smoother = smoother, smooth_k = 11,
                                  fit.model = modB )

Mecklenberg data by subgroup of charge type

Description

Mecklenberg data that gives proportion of different charge categories of cases given bail (by month).

Usage

meck_subgroup

Format

A data frame with 144 rows and 5 variables:

month: integer Month, with 0 being month of policy implementation.
n.cases: integer Number of cases of that subgroup for that month
n.bail: interger Total number of cases given bail for that subgroup for that month
pbail: double Proportion of new cases in given subgroup in that month assigned bail
category: character Category of group (charge type).

Mecklenberg PSA Reform Data

Description

Monthly aggregate outcomes of various measures of interest from Mecklenberg. See MDRC Report.

Usage

mecklenberg

Format

A data frame with 54 rows and 10 variables:

month: integer Month, with 0 being month of policy implementation.
karr: integer Total count of arrests.
pbail: double Proportion of cases in a given month assigned bail (or outright detention).
pptrel: double Proportion of cases assigned to pretrial supervised release.
pror: double Proportion of cases released on own recognizance.
pb4c: double Proportion of cases assigned to money bail (alternate coding from pbail, above).
avg_days_initial: double Average number of days spent detained before release due to bail, case resolution, etc.
avg_t2d: double Average time to case resolution (in days).
pstint7: double Proportion detained longer than 7 days.
pstint30: double Proportion detained longer than 30 days.

New Jersey PSA Reform aggregate data

Description

Montly aggregate counts of arrests of different types in New Jersey.

Usage

newjersey

Format

A data frame with 106 rows and 11 variables:

month: integer Index of month.
sin.m: double cos of month number
cos.m: double sin of month number
M12: integer Month number
Q1: integer Indicator of 1st quarter.
Q2: integer Indicator of 2nd quarter.
Q3: integer Indicator of 3rd quarter.
Q4: integer Indicator of 4th quarter.
n.warrant: double Number of warrant arrests
n.summons: double Number of summons arrests
n: double Total number of arrests
temperature: double Average temperature in New Jersey that month.

Generate an ITS extrapolation simulation.

Description

This is the primary function to use to use this approach on a given dataset.

Usage

process_outcome_model(
  outcomename,
  dat,
  t0,
  R = 400,
  summarize = FALSE,
  smooth = FALSE,
  smoother = NULL,
  fit_model = fit_model_default,
  covariates = NULL,
  plug_in = FALSE,
  ...
)

Arguments

outcomename

Name of column in dat containing the time series.

dat

Dataframe with a 'month' column for time. 'month' is assumed to be a sequence of integer values.

t0

Last pre-policy timepoint

R

Number of simulated pre-policy extrapolations to generate.

summarize

Summarise the series? (TRUE/FALSE)

smooth

Smooth the series? (TRUE/FALSE)

smoother

Function to smooth residuals, if smoothing set to TRUE. If NULL, will dynamically make a model smoother based on the fit_model method if covariates are passed. Otherwise it will use the simple smoother on the outcomes.

fit_model

The function used to fit the model to simulate from. (This model could be a seasonality model. Default is simple linear model with no covariates.)

covariates

Vector of covariate names of all covariates used in the passed model function fit_model. If null, will attempt to get list of covariates form the "lags" attribute of the passed 'fit_model'.

plug_in

Use the estimated parameters as fixed and do not include extra uncertainty of parameter estimation in the simulation. (Not recommended as it destroys inference.)

...

Extra arguments to be passed to 'extrapolate_model()'

Details

Take a given outcome variable, fit an ITS model, use it to extrapolate R plusible trajectories, and then using these trajectories, generate final impact results by averaging (if summarize is set to TRUE).

This function is basically a wrapper for 'extrapolate_model()' with some extra calls to 'make_model_smoother()' to prepare, in the case of smoothing, and adding on a summary trend via 'generate_Ybars()' in the case of summarizing.

Value

If summarize=TRUE, A dataframe with several columns of interest and one row per month of data. The columns are Ymin and Ymax, the limits of the envelope, 'range', the range of the envelope, 'SE', the standard deviation of the trajectories at that time point, 'Ysmooth' the median smoothed value at that time point (if smoothing), 'Ystar' the median unsmoothed value at that time point (regardless of smooth flag), 'Y', the observed outcome, 'Ysmooth1', the smoothed observed outcomes, and 'Ybar' the predicted outcome given the model with no autoregressive aspect.

If summarize=FALSE, a dataframe of all the raw series generated.

Examples

data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")

`simITS` package overview

Description

Analysis via Simulation of Interrupted Time Series

Details

This package is based on the backbone analytic code for the analyses in, e.g., Redcross et al. (2019) or Golub et al. (2019). See companion paper Miratrix (2020) for technical discussion of the overall approach.

Broadly, this package provides methods for fitting Interrupted Time Series models with lagged outcomes and variables to account for temporal dependencies. It then conducts inference via simulation, simulating a set of plausible counterfactual post-policy series to compare to the observed post-policy series. This package provides methods to visualize such data, and also to incorporate seasonality models and smoothing and aggregation/summarization. See the vignette for a guide of how to conduct such analyses.

References

Redcross, C., Henderson, B., Valentine, E. & Miratrix, L. (2019). Evaluation of pretrial justice system reforms that use the public safety assessment: Effects in Mecklenburg County, North Carolina. Technical report, MDRC (link)

Golub, C. A., Redcross, C., Valentine, E., & Miratrix, L. (2019). Evaluation of pretrial justice system reforms that use the public safety assessment: Effects of New Jersey’s criminal justice reform. Technical report, MDRC. (link)

Miratrix, L. (2020). Using Simulation to Analyze Interrupted Time Series Designs (link)

Smooth residuals after model fit

Description

Smooth a series by fitting the model to the data, smoothing the residuals, and then putting the model predictions back.

Usage

smooth_residuals(
  res,
  t0,
  outcomename,
  post.only = TRUE,
  smooth_k = SMOOTH_K,
  fit_model = fit_model_default,
  covariates = res,
  full_output = FALSE
)

Arguments

res

A dataframe with a month column and an 'outcomename' column (which is the column that will be smoothed).

t0

last pre-policy timepoint

outcomename

String name of the outcome variable in dat.

post.only

If TRUE fit model and smooth post-policy only. WHY fit model on post-policy data only? Because this will make sure the fixed pre-policy does not dominate too much? We are focusing on post-policy so we want a good fitting model for that so we can get our residuals as "white noise" as possible before smoothing.

smooth_k

A rough proxy for the number of observations to primarily consider to kernal weight in the neighborhood of each timepoint (this is a bandwidth, and the loess smoother gets smooth_k / n as a span value). We want to smooth with an absolute bandwidth, not one as function of how long the time series is.

fit_model

A function that takes data, fits a linear model, and returns the fit model. This function needs an option to include (or not) lagged covariates.

covariates

A dataframe with all covariates needed in the model fitting defined by fit_model.

full_output

If TRUE give back pieces for diagnostics of smoothing process.

Details

Use loess smoother on complete series of residuals including original data pre-policy and synthetic data post policy (i.e., smooth the entire plausible series).

Value

A numeric vector of the smoothed residuals. If full_output=TRUE return a dataframe with several other columns: 'resid', the residuals based on Ystar and the model, ‘residStar' the smoothed residuals, ’Ybar.sm' the structural predictions of the model used for smoothing. Here the smoothed values will be 'Ysmooth'.

Examples


data( "newjersey" )
smooth = smooth_series( newjersey, outcomename = "n.warrant", t0= -8,
                        smooth_k = 30,
                        post.only = FALSE)
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  

mod =  make_fit_season_model( ~ temperature )
newjersey = add_lagged_covariates( newjersey, outcomename = "n.warrant", 
                                   covariates = c("temperature") )

smooth = smooth_residuals( newjersey, outcomename = "n.warrant", t0=-8,
                           smooth_k = 30,
                           post.only = FALSE,
                           fit_model = mod )
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )

Smooth a series using a static loess smoother

Description

Use loess smoother on complete series of residuals including original data pre-policy and synthetic data post policy (i.e., smooth the entire plausible series).

Usage

smooth_series(res, outcomename, t0, smooth_k = SMOOTH_K, post.only = TRUE, ...)

Arguments

res

A dataframe with a month column and an 'outcomename' column (which is the column that will be smoothed).

outcomename

String name of the outcome variable in dat.

t0

last pre-policy timepoint

smooth_k

post.only

...

Extra arguments (not used in this function).

Details

This method takes several parameters it does not use, to maintain compatability with smooth_residuals.

Value

An updated version of the 'res' dataframe with 'Ysmooth', the smoothed predictions of the original Ystar outcome. Also includes 'Ystar' the original sequence to be smoothed.

Examples


data( "newjersey" )
smooth = smooth_series( newjersey, outcomename = "n.warrant", t0= -8,
                        smooth_k = 30,
                        post.only = FALSE)
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  

mod =  make_fit_season_model( ~ temperature )
newjersey = add_lagged_covariates( newjersey, outcomename = "n.warrant", 
                                   covariates = c("temperature") )

smooth = smooth_residuals( newjersey, outcomename = "n.warrant", t0=-8,
                           smooth_k = 30,
                           post.only = FALSE,
                           fit_model = mod )
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )

Augment dataframe with lagged covariates

Description

Usage

Arguments

Value

Examples

Adjust an outcome time series based on the group weights.

Description

Usage

Arguments

Value

Examples

Aggregate grouped data

Description

Usage

Arguments

Value

Examples

Test a passed test statistic on the simulated data

Description

Usage

Arguments

Value

Examples

Summary function for summarize.simulation.results

Description

Usage

Arguments

Value

See Also

Examples

Calculate proportion of subgroups across time

Description

Usage

Arguments

Value

Examples

Extrapolate pre-policy data to post-policy era

Description

Usage

Arguments

Value

See Also

Examples

Default ITS model

Description

Usage

Arguments

Value

Examples

Make fake data for testing purposes.

Description

Usage

Arguments

Value

Examples

A fake DGP with time varying categorical covariate for illustrating the code.

Description

Usage

Arguments

Value

Examples

Make envelope style graph with associated smoothed trendlines

Description

Usage

Arguments

Value

See Also

Examples

Make a fit_model that takes a seasonality component

Description

Usage

Arguments

Details

Value

See Also

Examples

Generate a collection of raw counterfactual trajectories

Description

Usage

`simITS` package overview