Title: | Understand Individual-Level Variation in Infectious Disease Transmission |
Version: | 0.4.0 |
Description: | Estimate and understand individual-level variation in transmission. Implements density and cumulative compound Poisson discrete distribution functions (Kremer et al. (2021) <doi:10.1038/s41598-021-93578-x>), as well as functions to calculate infectious disease outbreak statistics given epidemiological parameters on individual-level transmission; including the probability of an outbreak becoming an epidemic/extinct (Kucharski et al. (2020) <doi:10.1016/S1473-3099(20)30144-4>), or the cluster size statistics, e.g. what proportion of cases cause X\% of transmission (Lloyd-Smith et al. (2005) <doi:10.1038/nature04153>). |
License: | MIT + file LICENSE |
URL: | https://github.com/epiverse-trace/superspreading, https://epiverse-trace.github.io/superspreading/ |
BugReports: | https://github.com/epiverse-trace/superspreading/issues |
Imports: | checkmate, rlang, stats |
Suggests: | dplyr, epiparameter (≥ 0.4.0), fitdistrplus, ggplot2, ggtext, knitr, purrr, rmarkdown, scales, spelling, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
Config/Needs/website: | epiverse-trace/epiversetheme |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
Language: | en-GB |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-07-15 16:44:49 UTC; lshjl15 |
Author: | Joshua W. Lambert |
Maintainer: | Joshua W. Lambert <joshua.lambert@lshtm.ac.uk> |
Repository: | CRAN |
Date/Publication: | 2025-07-15 17:00:02 UTC |
superspreading: Understand Individual-Level Variation in Infectious Disease Transmission
Description
Estimate and understand individual-level variation in transmission. Implements density and cumulative compound Poisson discrete distribution functions (Kremer et al. (2021) doi:10.1038/s41598-021-93578-x), as well as functions to calculate infectious disease outbreak statistics given epidemiological parameters on individual-level transmission; including the probability of an outbreak becoming an epidemic/extinct (Kucharski et al. (2020) doi:10.1016/S1473-3099(20)30144-4), or the cluster size statistics, e.g. what proportion of cases cause X% of transmission (Lloyd-Smith et al. (2005) doi:10.1038/nature04153).
Author(s)
Maintainer: Joshua W. Lambert joshua.lambert@lshtm.ac.uk (ORCID) [copyright holder]
Authors:
Adam Kucharski adam.kucharski@lshtm.ac.uk (ORCID) [copyright holder]
Dillon C. Adam dcadam@hku.hk (ORCID)
Other contributors:
Sebastian Funk sebastian.funk@lshtm.ac.uk (ORCID) (.chain_sim uses code from bpmodels::chain_sim) [contributor, copyright holder]
Pratik Gupte pratik.gupte@lshtm.ac.uk (ORCID) [reviewer]
Hugo Gruson hugo@data.org (ORCID) [reviewer]
James M. Azam james.azam@lshtm.ac.uk (ORCID) [reviewer, contributor]
Chris Hartgerink chris@data.org (ORCID) [reviewer]
See Also
Useful links:
Report bugs at https://github.com/epiverse-trace/superspreading/issues
Simulate transmission chains using a stochastic branching process
Description
Code modified from the bpmodels::chain_sim()
function.
The function chain_sim()
function from bpmodels is reused with
permission and licensed under MIT as is bpmodels.
bpmodels is not on CRAN and is retired.
Usage
.chain_sim(
n,
offspring,
stat = c("size", "length"),
stat_threshold = Inf,
generation_time,
tf = Inf,
...
)
Arguments
n |
Number of simulations to run. |
offspring |
Offspring distribution: a character string corresponding to
the R distribution function (e.g., "pois" for Poisson, where
|
stat |
String; Statistic to calculate. Can be one of:
|
stat_threshold |
A size or length above which the simulation results
should be set to |
generation_time |
The generation time generator function; the name of a
user-defined named or anonymous function with only one argument |
tf |
End time (if |
... |
Parameters of the offspring distribution as required by R. |
Value
A <data.frame>
with columns n
(simulation ID), id
(a unique
ID within each simulation for each individual element of the chain),
ancestor
(the ID of the ancestor of each element), and generation
. A
time
column is also appended if the generation_time interval is supplied
to serial
.
Author(s)
Sebastian Funk, James M. Azam, Joshua W. Lambert
Optimise a function using either numerical optimisation or grid search
Description
Optimise a function using either numerical optimisation or grid search
Usage
.fit(func, fit_method = c("optim", "grid"), ...)
Arguments
func |
A |
fit_method |
A |
... |
< |
Details
Arguments passed through dots depend on whether fit_method
is set to
"optim"
or "grid"
. For "optim"
, arguments are passed to optim()
,
for "grid"
, the variable arguments are lower
, upper
(lower and
upper bounds on the grid search for the parameter being optimised, defaults
are lower = 0.001
and upper = 0.999
), and "res"
(the resolution of
grid, default is res = 0.001
).
Value
A single numeric
.
Calculate the reproduction number (R
) for a (heterogeneous)
network
Description
The calculation of the reproduction number adjusting for heterogeneity in number of contacts.
Usage
calc_network_R(
mean_num_contact,
sd_num_contact,
infect_duration,
prob_transmission,
age_range
)
Arguments
mean_num_contact |
A |
sd_num_contact |
A |
infect_duration |
A |
prob_transmission |
A |
age_range |
A |
Value
A named numeric
vector of length 2, the unadjusted (R
)
and network adjusted (R_net
) estimates of R
.
Examples
# example using NATSAL data
calc_network_R(
mean_num_contact = 14.1,
sd_num_contact = 69.6,
infect_duration = 1,
prob_transmission = 1,
age_range = c(16, 74)
)
Constants used in superspreading
Description
FINITE_INF
is a large finite number used to approximate Inf
.
NSIM
is the number of simulations run when generating random samples or
branching process simulation replicates.
Usage
FINITE_INF
NSIM
Format
An object of class numeric
of length 1.
An object of class numeric
of length 1.
Density of the Poisson-lognormal compound distribution
Description
Density of the Poisson-lognormal compound distribution
Usage
dpoislnorm(x, meanlog, sdlog)
Arguments
x |
A |
meanlog |
A |
sdlog |
A |
Details
The function is vectorised so a vector of quantiles can be input and the output will have an equal length.
Value
A numeric
vector of the density of the Poisson-lognormal
distribution.
Examples
dpoislnorm(x = 10, meanlog = 1, sdlog = 2)
dpoislnorm(x = 1:10, meanlog = 1, sdlog = 2)
Density of the Poisson-Weibull compound distribution
Description
Density of the Poisson-Weibull compound distribution
Usage
dpoisweibull(x, shape, scale)
Arguments
x |
A |
shape |
A |
scale |
A |
Details
The function is vectorised so a vector of quantiles can be input and the output will have an equal length.
Value
A numeric
vector of the density of the Poisson-Weibull
distribution.
Examples
dpoisweibull(x = 10, shape = 1, scale = 2)
dpoisweibull(x = 1:10, shape = 1, scale = 2)
Defines the gamma functions in terms of the mean reproductive number (R) and the dispersion parameter (k)
Description
-
dgammaRk()
for the gamma density function -
pgammaRk()
for the gamma distribution function -
fvx()
fore the gamma probability density function (pdf) describing the individual reproductive number\nu
given R0 and k
Usage
dgammaRk(x, R, k)
pgammaRk(x, R, k)
fvx(x, R, k)
Arguments
R |
A |
k |
A |
Helper function to create a model comparison table
Description
This is a helper function for creating a model comparison <data.frame>
primarily for use in the superspreading vignettes. It is designed
specifically for handling fitdistrplus::fitdist()
output and not a
generalised function. See bbmle::ICtab()
for a more general use function
to create information criteria tables.
Usage
ic_tbl(..., sort_by = c("AIC", "BIC", "none"))
Arguments
... |
dots One or more model fit results from
|
sort_by |
A |
Value
A <data.frame>
.
Examples
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
cases <- rnbinom(n = 100, mu = 5, size = 0.7)
pois_fit <- fitdistrplus::fitdist(data = cases, distr = "pois")
geom_fit <- fitdistrplus::fitdist(data = cases, distr = "geom")
nbinom_fit <- fitdistrplus::fitdist(data = cases, distr = "nbinom")
ic_tbl(pois_fit, geom_fit, nbinom_fit)
}
Generates a log scaled sequence of real numbers
Description
Generates a log scaled sequence of real numbers
Usage
lseq(from, to, length.out)
Arguments
from , to |
the starting and (maximal) end values of the
sequence. Of length |
length.out |
desired length of the sequence. A
non-negative number, which for |
Value
seq.int
and the default method of seq
for numeric
arguments return a vector of type "integer"
or "double"
:
programmers should not rely on which.
seq_along
and seq_len
return an integer vector, unless
it is a long vector when it will be double.
Cumulative distribution function of the Poisson-lognormal compound distribution
Description
Cumulative distribution function of the Poisson-lognormal compound distribution
Usage
ppoislnorm(q, meanlog, sdlog)
Arguments
q |
A |
meanlog |
A |
sdlog |
A |
Details
The function is vectorised so a vector of quantiles can be input and the output will have an equal length.
Value
A numeric
vector of the distribution function.
Examples
ppoislnorm(q = 10, meanlog = 1, sdlog = 2)
ppoislnorm(q = 1:10, meanlog = 1, sdlog = 2)
Cumulative distribution function of the Poisson-Weibull compound distribution
Description
Cumulative distribution function of the Poisson-Weibull compound distribution
Usage
ppoisweibull(q, shape, scale)
Arguments
q |
A |
shape |
A |
scale |
A |
Details
The function is vectorised so a vector of quantiles can be input and the output will have an equal length.
Value
A numeric
vector of the distribution function.
Examples
ppoisweibull(q = 10, shape = 1, scale = 2)
ppoisweibull(q = 1:10, shape = 1, scale = 2)
Probability that an outbreak will be contained
Description
Outbreak containment is defined as outbreak extinction when
simulate = FALSE
. When simulate = FALSE
, probability_contain()
is
equivalent to calling probability_extinct()
.
When simulate = TRUE
, outbreak containment is defined by the
case_threshold
(default = 100) and outbreak_time
arguments.
Firstly, case_threshold
sets the size of the transmission chain below
which the outbreak is considered contained. Secondly, outbreak_time
sets
the time duration from the start of the outbreak within which the outbreak
is contained if there is no more onwards transmission beyond this time.
When setting an outbreak_time
, a generation_time
is also required.
case_threshold
and outbreak_time
can be jointly set.
Overall, when simulate = TRUE
, containment is defined as the size and
time duration of a transmission chain not reaching the case_threshold
and outbreak_time
, respectively.
Usage
probability_contain(
R,
k,
num_init_infect,
ind_control = 0,
pop_control = 0,
simulate = FALSE,
...,
case_threshold = 100,
outbreak_time = Inf,
generation_time = NULL,
offspring_dist
)
Arguments
R |
A |
k |
A |
num_init_infect |
An |
ind_control |
A |
pop_control |
A |
simulate |
A |
... |
< |
case_threshold |
A number for the threshold of the number of cases below
which the epidemic is considered contained. |
outbreak_time |
A number for the time since the start of
the outbreak to determine if outbreaks are contained within a given period
of time. |
generation_time |
A |
offspring_dist |
An |
Details
When using simulate = TRUE
, the default arguments to simulate the
transmission chains with .chain_sim()
are 105 replicates,
a negative binomial (nbinom
) offspring distribution, parameterised with
R
(and pop_control
if > 0) and k
.
When setting the outbreak_time
argument, the generation_time
argument is
also required. The generation_time
argument requires a random number
generator function. For example, if we assume the generation time is
lognormally distributed with meanlog = 1
and sdlog = 1.5
, then we can
define the function
to pass to generation_time
as:
function(x) rlnorm(x, meanlog = 1, sdlog = 1.5)
Value
A number
for the probability of containment.
References
Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E., & Getz, W. M. (2005) Superspreading and the effect of individual variation on disease emergence. Nature, 438(7066), 355-359. doi:10.1038/nature04153
See Also
Examples
# population-level control measures
probability_contain(R = 1.5, k = 0.5, num_init_infect = 1, pop_control = 0.1)
# individual-level control measures
probability_contain(R = 1.5, k = 0.5, num_init_infect = 1, ind_control = 0.1)
# both levels of control measures
probability_contain(
R = 1.5,
k = 0.5,
num_init_infect = 1,
ind_control = 0.1,
pop_control = 0.1
)
# multi initial infections with population-level control measures
probability_contain(R = 1.5, k = 0.5, num_init_infect = 5, pop_control = 0.1)
# probability of containment within a certain amount of time
# this requires parameterising a generation time
gt <- function(n) {
rlnorm(n, meanlog = 1, sdlog = 1.5)
}
probability_contain(
R = 1.2,
k = 0.5,
num_init_infect = 1,
simulate = TRUE,
case_threshold = 50,
outbreak_time = 20,
generation_time = gt
)
Calculate the probability a disease will emerge and cause a sustained
outbreak (R > 1
) based on R and initial cases
Description
The method for the evolution of pathogen emergence described in
Antia et al. (2003) (doi:10.1038/nature02104). The model is a multi-type
branching process model with an initial (wild-type) reproduction number,
usually below 1, and a evolved reproduction number that is
greater than 1, and thus can cause a sustained human-to-human epidemic.
The reproduction number for a pathogen changes at the mutation_rate
.
Usage
probability_emergence(
R_wild,
R_mutant,
mutation_rate,
num_init_infect,
tol = 1e-10,
max_iter = 1000
)
Arguments
R_wild |
A |
R_mutant |
A |
mutation_rate |
A |
num_init_infect |
An |
tol |
A |
max_iter |
A |
Details
Following Antia et al. (2003), we assume that the mutation rate for all variants is the same.
Value
A value with the probability of a disease emerging and causing an outbreak.
References
Antia, R., Regoes, R., Koella, J. & Bergstrom, C. T. (2003) The role of evolution in the emergence of infectious diseases. Nature 426, 658–661. doi:10.1038/nature02104
See Also
probability_epidemic()
, probability_extinct()
Examples
probability_emergence(
R_wild = 0.5,
R_mutant = 1.5,
mutation_rate = 0.5,
num_init_infect = 1
)
Calculate the probability a disease will cause an outbreak based on R, k and initial cases
Description
Calculates the probability a branching process will cause an epidemic (i.e. probability will fail to go extinct) based on R, k and initial cases.
Usage
probability_epidemic(
R,
k,
num_init_infect,
ind_control = 0,
pop_control = 0,
...,
offspring_dist
)
Arguments
R |
A |
k |
A |
num_init_infect |
An |
ind_control |
A |
pop_control |
A |
... |
< |
offspring_dist |
An |
Value
A value with the probability of a large epidemic.
References
Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E., & Getz, W. M. (2005) Superspreading and the effect of individual variation on disease emergence. Nature, 438(7066), 355-359. doi:10.1038/nature04153
Kucharski, A. J., Russell, T. W., Diamond, C., Liu, Y., Edmunds, J., Funk, S. & Eggo, R. M. (2020) Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases, 20(5), 553-558. doi:10.1016/S1473-3099(20)30144-4
See Also
Examples
probability_epidemic(R = 1.5, k = 0.1, num_init_infect = 10)
Calculate the probability a branching process will go extinct based on R, k and initial cases
Description
Calculates the probability a branching process will not causes an epidemic
and will go extinct. This is the complement of the probability of a disease
causing an epidemic (probability_epidemic()
).
Usage
probability_extinct(
R,
k,
num_init_infect,
ind_control = 0,
pop_control = 0,
...,
offspring_dist
)
Arguments
R |
A |
k |
A |
num_init_infect |
An |
ind_control |
A |
pop_control |
A |
... |
< |
offspring_dist |
An |
Value
A value with the probability of going extinct.
References
Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E., & Getz, W. M. (2005) Superspreading and the effect of individual variation on disease emergence. Nature, 438(7066), 355-359. doi:10.1038/nature04153
See Also
Examples
probability_extinct(R = 1.5, k = 0.1, num_init_infect = 10)
Estimate what proportion of new cases originated within a transmission event of a given size
Description
Calculates the proportion of new cases that originated with a transmission event of a given size. It can be useful to inform backwards contact tracing efforts, i.e. how many cases are associated with large clusters. Here we define a cluster to as a transmission of a primary case to at least one secondary case.
Usage
proportion_cluster_size(
R,
k,
cluster_size,
...,
offspring_dist,
format_prop = TRUE
)
Arguments
R |
A |
k |
A |
cluster_size |
A |
... |
dots not used, extra arguments supplied will cause a warning. |
offspring_dist |
An |
format_prop |
A |
Details
This function calculates the proportion of secondary cases that are caused by transmission events of a certain size. It does not calculate the proportion of transmission events that cause a cluster of secondary cases of a certain size. In other words it is the number of cases above a threshold divided by the total number of cases, not the number of transmission events above a certain threshold divided by the number of transmission events.
Value
A <data.frame>
with the value for the proportion of new cases
that are part of a transmission event above a threshold for a given value
of R and k.
Examples
R <- 2
k <- 0.1
cluster_size <- 10
proportion_cluster_size(R = R, k = k, cluster_size = cluster_size)
# example with a vector of k
k <- c(0.1, 0.2, 0.3, 0.4, 0.5)
proportion_cluster_size(R = R, k = k, cluster_size = cluster_size)
# example with a vector of cluster sizes
cluster_size <- c(5, 10, 25)
proportion_cluster_size(R = R, k = k, cluster_size = cluster_size)
Estimate what proportion of cases cause a certain proportion of transmission
Description
Calculates the proportion of cases that cause a certain percentage of transmission.
It is commonly estimated what proportion of cases cause 80% of transmission
(i.e. secondary cases).
This can be calculated using proportion_transmission()
at varying values of
R
and for different values of percentage transmission.
There are two methods for calculating the proportion of transmission,
p_{80}
(default) and t_{20}
, see method
argument or details for
more information.
Usage
proportion_transmission(
R,
k,
prop_transmission,
method = c("p_80", "t_20"),
simulate = FALSE,
...,
offspring_dist,
format_prop = TRUE
)
Arguments
R |
A |
k |
A |
prop_transmission |
A |
method |
A |
simulate |
A |
... |
dots not used, extra arguments supplied will cause a warning. |
offspring_dist |
An |
format_prop |
A |
Details
Calculates the expected proportion of transmission from a given
proportion of infectious cases. There are two methods to calculate this with
distinct formulations, p_{80}
and t_{20}
these can be specified
by the method
argument.
method = p_80
calculates relative transmission heterogeneity
from the offspring distribution of secondary cases, Z
, where the upper
proportion of the distribution comprise x\%
of total number of cases
given R0 and k, where x
is typically defined as 0.8 or 80%. e.g. 80%
of all transmissions are generated by the upper 20% of cases, or
p_80 = 0.2
, per the 80/20 rule. In this formulation, changes in R can
have a significant effect on the estimate of p_{80}
even when k
is constant. Importantly, this formulation does not allow for true
homogeneity when k = Inf
i.e. p_{80} = 0.8
.
method = t_20
calculates a similar ratio, instead in terms of
the theoretical individual reproductive number and infectiousness given
R_0
and k
. The individual reproductive number, \nu
, is
described in Lloyd-Smith et al. (2005), "as a random variable representing
the expected number of secondary cases caused by a particular infected
individual. Values for \nu
are drawn from a continuous gamma
probability distribution with population mean R_0
and dispersion
parameter k
, which encodes all variation in infectious histories of
individuals, including properties of the host and pathogen and environmental
circumstances." The value of k
corresponds to the shape parameters of
the gamma distribution which encodes the variation in the gamma-Poisson
mixture a.k.a. the negative binomial.
For method = t_20
, we define the upper proportion of infectiousness,
which is typically 0.2 i.e. the upper 20% most infectious
cases, again per the 80/20 rule. e.g. the most infectious 20% of cases,
are expected to produce 80% of all infections, or t_20 = 0.8
. Unlike
method = p_80
, changes in R have no effect on the estimate
of t_80 when k is constant, but R is still required for the underlying
calculation. This formulation does allow for true homogeneity when
k = Inf
i.e. t_20 = 0.2
, or t_80 = 0.8
.
Multiple values of R
and k
can be supplied and a <data.frame>
of
every combination of these will be returned.
The numerical calculation for method = p_80
uses random number generation
to simulate secondary contacts so the answers may minimally vary between
calls. The number of simulation replicates is fixed to 105.
Value
A <data.frame>
with the value for the proportion of cases for a
given value of R and k.
References
The analytical calculation is from:
Endo, A., Abbott, S., Kucharski, A. J., & Funk, S. (2020) Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China. Wellcome Open Research, 5. doi:10.12688/wellcomeopenres.15842.3
The t_{20}
method follows the formula defined in section 2.2.5 of the
supplementary material for:
Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E., & Getz, W. M. (2005) Superspreading and the effect of individual variation on disease emergence. Nature. Nov;438(7066):355–9. doi:10.1038/nature04153
The original code for the t_{20}
method is from ongoing work
originating from https://github.com/dcadam/kt and:
Adam, D., Gostic, K., Tsang, T., Wu, P., Lim, W. W., Yeung, A., Wong, J., Lau, E., Du, Z., Chen, D., Ho, L.-M., Martín-Sánchez, M., Cauchemez, S., Cobey, S., Leung, G., & Cowling, B. (2022) Time-varying transmission heterogeneity of SARS and COVID-19 in Hong Kong. doi:10.21203/rs.3.rs-1407962/v1
Examples
# example of single values of R and k
prop_transmission <- 0.8 # 80% of transmission
R <- 2
k <- 0.5
proportion_transmission(
R = R,
k = k,
prop_transmission = prop_transmission
)
# example with multiple values of k
k <- c(0.1, 0.2, 0.3, 0.4, 0.5, 1)
proportion_transmission(
R = R,
k = k,
prop_transmission = prop_transmission
)
# example with vectors of R and k
R <- c(1, 2, 3)
proportion_transmission(
R = R,
k = k,
prop_transmission = prop_transmission
)