Title: | Network Analysis of Dependencies of CRAN Packages |
Version: | 0.3.13 |
Description: | The dependencies of CRAN packages can be analysed in a network fashion. For each package we can obtain the packages that it depends, imports, suggests, etc. By iterating this procedure over a number of packages, we can build, visualise, and analyse the dependency network, enabling us to have a bird's-eye view of the CRAN ecosystem. One aspect of interest is the number of reverse dependencies of the packages, or equivalently the in-degree distribution of the dependency network. This can be fitted by the power law and/or an extreme value mixture distribution <doi:10.1111/stan.12355>, of which functions are provided. |
Depends: | R (≥ 3.4) |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://github.com/clement-lee/crandep |
BugReports: | https://github.com/clement-lee/crandep/issues |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | stringr, dplyr, igraph, Rcpp, pracma, gsl, utils, tools, stats |
Suggests: | ggplot2, tibble, visNetwork, knitr, rmarkdown |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | yes |
SystemRequirements: | pandoc (>= 1.12.3) - http://pandoc.org |
Packaged: | 2025-06-16 11:03:03 UTC; ntl34 |
Author: | Clement Lee |
Maintainer: | Clement Lee <clement.lee.tm@outlook.com> |
VignetteBuilder: | knitr |
LinkingTo: | Rcpp, RcppArmadillo |
Repository: | CRAN |
Date/Publication: | 2025-06-16 13:10:11 UTC |
Survival function of 2-component discrete extreme value mixture distribution
Description
Smix2
returns the survival function at x for the 2-component discrete extreme value mixture distribution. The components below and above the threshold u are the (truncated) Zipf-polylog(alpha,theta) and the generalised Pareto(shape, sigma) distributions, respectively.
Usage
Smix2(x, u, alpha, theta, shape, sigma, phiu)
Arguments
x |
Vector of positive integers |
u |
Positive integer representing the threshold |
alpha |
Real number, first parameter of the Zipf-polylog component |
theta |
Real number in (0, 1], second parameter of the Zipf-polylog component |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
Value
A numeric vector of the same length as x
See Also
dmix2
for the corresponding probability mass function, Spol
and Smix3
for the survival functions of the Zipf-polylog and 3-component discrete extreme value mixture distributions, respectively.
Survival function of 3-component discrete extreme value mixture distribution
Description
Smix3
returns the survival function at x for the 3-component discrete extreme value mixture distribution. The component below v is the (truncated) Zipf-polylog(alpha1,theta1) distribution, between v & u the (truncated) Zipf-polylog(alpha2,theta2) distribution, and above u the generalised Pareto(shape, sigma) distribution.
Usage
Smix3(x, v, u, alpha1, theta1, alpha2, theta2, shape, sigma, phi1, phi2, phiu)
Arguments
x |
Vector of positive integers |
v |
Positive integer representing the lower threshold |
u |
Positive integer representing the upper threshold |
alpha1 |
Real number, first parameter of the Zipf-polylog component below v |
theta1 |
Real number in (0, 1], second parameter of the Zipf-polylog component below v |
alpha2 |
Real number, first parameter of the Zipf-polylog component between v & u |
theta2 |
Real number in (0, 1], second parameter of the Zipf-polylog component between v & u |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phi1 |
Real number in (0, 1), proportion of values below v |
phi2 |
Real number in (0, 1), proportion of values between v & u |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
Value
A numeric vector of the same length as x
See Also
dmix3
for the corresponding probability mass function, Spol
and Smix2
for the survival functions of the Zipf-polylog and 2-component discrete extreme value mixture distributions, respectively.
Survival function of Zipf-polylog distribution
Description
Spol
returns the survival function at x for the Zipf-polylog distribution with parameters (alpha, theta). The distribution is reduced to the discrete power law when theta = 1.
Usage
Spol(x, alpha, theta, x_max = 100000L)
Arguments
x |
Vector of positive integers |
alpha |
Real number greater than 1 |
theta |
Real number in (0, 1] |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
Value
A numeric vector of the same length as x
See Also
dpol
for the corresponding probability mass function, Smix2
and Smix3
for the survival functions of the 2-component and 3-component discrete extreme value mixture distributions, respectively.
Examples
Spol(c(1,2,3,4,5), 1.2, 0.5)
Check and convert dependency word(s)
Description
Check and convert dependency word(s)
Usage
check_dep_word(x)
Arguments
x |
A character vector of dependency words |
Value
A character vector of modified dependency words
Citation network of CHI papers
Description
A dataset containing the citations of conference papers of the ACM Conference on Human Factors in Computing Systems (CHI) from 1981 to 2019, obtained from the ACM digital library. The resulting citation network can be compared to the dependencies network of CRAN packages, in terms of network-related characteristics, such as degree distribution and sparsity.
Usage
chi_citations
Format
A data from with 21951 rows and 4 variables:
- from
the unique identifier (in the digital library) of the paper that cites other papers
- to
the unique identifier of the paper that is being cited
- year_from
the publication year of the citing paper
- year_to
the publication year of the cited paper
Source
https://dl.acm.org/conference/chi
See Also
Conditionally change a string
Description
Conditionally change a string
Usage
conditional_change(x, from, to)
Arguments
x |
A character vector |
from |
A character vector of words to change from |
to |
A string to change to |
Value
A string
Dependencies of CRAN packages
Description
A dataset containing the dependencies of various types (Imports, Depends, Suggests, LinkingTo, and their reverse counterparts) of more than 14600 packages available on CRAN as of 2020-05-09.
Usage
cran_dependencies
Format
A data frame with 211408 rows and 4 variables:
- from
the name of the package that introduced the dependencies
- to
the name of the package that the dependency is directed towards
- type
the type of dependency, which can take the follow values (all in lowercase): "depends", "imports", "linking to", "suggests"
- reverse
a boolean representing whether the dependency is a reverse one (TRUE) or a forward one (FALSE)
Source
The CRAN pages of all the packages available on https://cran.r-project.org
See Also
Construct the giant component of the network from two data frames
Description
Construct the giant component of the network from two data frames
Usage
df_to_graph(edgelist, nodelist = NULL, gc = TRUE)
Arguments
edgelist |
A data frame with (at least) two columns: from and to |
nodelist |
NULL, or a data frame with (at least) one column: name, that contains the nodes to include |
gc |
Boolean, if 'TRUE' (default) then the giant component is extracted, if 'FALSE' then the whole graph is returned |
Value
An igraph object & a connected graph if gc is 'TRUE'
Examples
from <- c("1", "2", "4")
to <- c("2", "3", "5")
edges <- data.frame(from = from, to = to, stringsAsFactors = FALSE)
nodes <- data.frame(name = c("1", "2", "3", "4", "5"), stringsAsFactors = FALSE)
df_to_graph(edges, nodes)
Probability mass function (PMF) of 2-component discrete extreme value mixture distribution
Description
dmix2
returns the PMF at x for the 2-component discrete extreme value mixture distribution. The components below and above the threshold u are the (truncated) Zipf-polylog(alpha,theta) and the generalised Pareto(shape, sigma) distributions, respectively.
Usage
dmix2(x, u, alpha, theta, shape, sigma, phiu)
Arguments
x |
Vector of positive integers |
u |
Positive integer representing the threshold |
alpha |
Real number, first parameter of the Zipf-polylog component |
theta |
Real number in (0, 1], second parameter of the Zipf-polylog component |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
Value
A numeric vector of the same length as x
See Also
Smix2
for the corresponding survival function, dpol
and dmix3
for the PMFs of the Zipf-polylog and 3-component discrete extreme value mixture distributions, respectively.
Probability mass function (PMF) of 3-component discrete extreme value mixture distribution
Description
dmix3
returns the PMF at x for the 3-component discrete extreme value mixture distribution. The component below v is the (truncated) Zipf-polylog(alpha1,theta1) distribution, between v & u the (truncated) Zipf-polylog(alpha2,theta2) distribution, and above u the generalised Pareto(shape, sigma) distribution.
Usage
dmix3(x, v, u, alpha1, theta1, alpha2, theta2, shape, sigma, phi1, phi2, phiu)
Arguments
x |
Vector of positive integers |
v |
Positive integer representing the lower threshold |
u |
Positive integer representing the upper threshold |
alpha1 |
Real number, first parameter of the Zipf-polylog component below v |
theta1 |
Real number in (0, 1], second parameter of the Zipf-polylog component below v |
alpha2 |
Real number, first parameter of the Zipf-polylog component between v & u |
theta2 |
Real number in (0, 1], second parameter of the Zipf-polylog component between v & u |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phi1 |
Real number in (0, 1), proportion of values below v |
phi2 |
Real number in (0, 1), proportion of values between v & u |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
Value
A numeric vector of the same length as x
See Also
Smix3
for the corresponding survival function, dpol
and dmix2
for the PMFs of the Zipf-polylog and 2-component discrete extreme value mixture distributions, respectively.
Probability mass function (PMF) of Zipf-polylog distribution
Description
dpol
returns the PMF at x for the Zipf-polylog distribution with parameters (alpha, theta). The distribution is reduced to the discrete power law when theta = 1.
Usage
dpol(x, alpha, theta, x_max = 100000L)
Arguments
x |
Vector of positive integers |
alpha |
Real number greater than 1 |
theta |
Real number in (0, 1] |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
Details
The PMF is proportional to x^(-alpha) * theta^x. It is normalised in order to be a proper PMF.
Value
A numeric vector of the same length as x
See Also
Spol
for the corresponding survival function, dmix2
and dmix3
for the PMFs of the 2-component and 3-component discrete extreme value mixture distributions, respectively.
Examples
dpol(c(1,2,3,4,5), 1.2, 0.5)
Multiple types of dependencies
Description
get_dep
returns a data frame of multiple types of dependencies of a package
Usage
get_dep(name, type, reverse = FALSE)
Arguments
name |
String, name of the package |
type |
A character vector that contains one or more of the following dependency words: "Depends", "Imports", "LinkingTo", "Suggests", "Enhances", up to letter case and space replaced by underscore. Alternatively, if 'type = "all"', all five dependencies will be obtained; if 'type = "strong"', "Depends", "Imports" & "LinkingTo" will be obtained. |
reverse |
Boolean, whether forward (FALSE, default) or reverse (TRUE) dependencies are requested. |
Value
A data frame of dependencies
See Also
get_dep_all_packages
for the dependencies of all CRAN packages, and get_graph_all_packages
for obtaining directly a network of dependencies as an igraph object
Examples
get_dep("dplyr", c("Imports", "Depends"))
get_dep("MASS", c("Suggests", "Depends", "Imports"), TRUE)
Dependencies of all CRAN packages
Description
get_dep_all_packages
returns the data frame of dependencies of all packages currently available on CRAN.
Usage
get_dep_all_packages()
Value
A list of two data frames, one the names of all CRAN packages, the other their dependencies
See Also
get_dep
for multiple types of dependencies, and get_graph_all_packages
for obtaining directly a network of dependencies as an igraph object
Examples
## Not run:
df.cran <- get_dep_all_packages()
## End(Not run)
Split a string to a list of dependencies
Description
Split a string to a list of dependencies
Usage
get_dep_vec(x)
Arguments
x |
A scalar string, possibly an output of get_dep_str() |
Value
A string vector of dependencies
Graph of dependencies of all CRAN packages
Description
get_graph_all_packages
returns an igraph object representing the network of one or more types of dependencies of all CRAN packages.
Usage
get_graph_all_packages(type, gc = TRUE, reverse = FALSE)
Arguments
type |
A character vector that contains one or more of the following dependency words: "Depends", "Imports", "LinkingTo", "Suggests", "Enhances", up to letter case and space replaced by underscore. Alternatively, if 'type = "all"', all five dependencies will be obtained; if 'type = "strong"', "Depends", "Imports" & "LinkingTo" will be obtained. |
gc |
Boolean, if 'TRUE' (default) then the giant component is extracted, if 'FALSE' then the whole graph is returned |
reverse |
Boolean, whether forward (FALSE, default) or reverse (TRUE) dependencies are requested. |
Value
An igraph object & a connected graph if gc is 'TRUE'
See Also
get_dep_all_packages
for the dependencies of all CRAN packages in a data frame, and df_to_graph
for constructing the giant component of the network from two data frames
Examples
## Not run:
g0.cran.depends <- get_graph_all_packages("depends")
g1.cran.imports <- get_graph_all_packages("imports", reverse = TRUE)
## End(Not run)
Wrapper of lpost_bulk, assuming power law (theta = 1.0)
Description
Wrapper of lpost_bulk, assuming power law (theta = 1.0)
Usage
lpost_bulk_wrapper(alpha, ...)
Arguments
alpha |
A scalar, positive |
... |
Other arguments passed to lpost_bulk |
Value
A scalar of the log-posterior density
Wrapper of lpost_mix2, assuming power law (theta = 1.0) & contrained (alpha > 1.0, xi < 1.0 / (alpha - 1.0))
Description
Wrapper of lpost_mix2, assuming power law (theta = 1.0) & contrained (alpha > 1.0, xi < 1.0 / (alpha - 1.0))
Usage
lpost_mix2_constrained(par, ...)
Arguments
par |
parameter vector of length 3, with elements alpha, shape and sigma |
... |
Other arguments passed to lpost_mix2 |
Value
A scalar of the log-posterior density
Wrapper of lpost_pol, assuming power law (theta = 1.0)
Description
Wrapper of lpost_pol, assuming power law (theta = 1.0)
Usage
lpost_pol_wrapper(alpha, x, count, ...)
Arguments
alpha |
A scalar, positive |
... |
Other arguments passed to lpost_pol |
Value
A scalar of the log-posterior density
Unnormalised log-posterior density of discrete power law
Description
Unnormalised log-posterior density of discrete power law
Usage
lpost_pow(alpha, df, m_alpha, s_alpha)
Arguments
alpha |
Real number greater than 1 |
df |
A data frame with at least two columns, x & count |
m_alpha |
Real number, mean of the prior normal distribution for alpha |
s_alpha |
Positive real number, standard deviation of the prior normal distribution for alpha |
Value
A real number
Marginal log-likelihood and posterior density of discrete power law via numerical integration
Description
Marginal log-likelihood and posterior density of discrete power law via numerical integration
Usage
marg_pow(df, lower, upper, m_alpha = 0, s_alpha = 10, by = 0.001)
Arguments
df |
A data frame with at least two columns, x & count |
lower |
Real number greater than 1, lower limit for numerical integration |
upper |
Real number greater than lower, upper limit for numerical integration |
m_alpha |
Real number (default 0.0), mean of the prior normal distribution for alpha |
s_alpha |
Positive real number (default 10.0), standard deviation of the prior normal distribution for alpha |
by |
Positive real number, the width of subintervals between lower and upper, for numerical integration and posterior density evaluation |
Value
A list: log_marginal
is the marginal log-likelihood, posterior
is a data frame of non-zero posterior densities
Markov chain Monte Carlo for TZP-power-law mixture
Description
mcmc_mix1
returns the posterior samples of the parameters, for fitting the TZP-power-law mixture distribution. The samples are obtained using Markov chain Monte Carlo (MCMC).
Usage
mcmc_mix1(
x,
count,
u_set,
u,
alpha1,
theta1,
alpha2,
a_psiu,
b_psiu,
a_alpha1,
b_alpha1,
a_theta1,
b_theta1,
a_alpha2,
b_alpha2,
positive,
iter,
thin,
burn,
freq,
invt,
mc3_or_marg,
x_max
)
Arguments
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
u_set |
Positive integer vector of the values u will be sampled from |
u |
Positive integer, initial value of the threshold |
alpha1 |
Real number, initial value of the parameter |
theta1 |
Real number in (0, 1], initial value of the parameter |
alpha2 |
Real number greater than 1, initial value of the parameter |
a_psiu , b_psiu , a_alpha1 , b_alpha1 , a_theta1 , b_theta1 , a_alpha2 , b_alpha2 |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
positive |
Boolean, is alpha positive (TRUE) or unbounded (FALSE)? |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
x_max |
Scalar, positive integer limit for computing the normalising constant |
Details
In the MCMC, a componentwise Metropolis-Hastings algorithm is used. The threshold u is treated as a parameter and therefore sampled. The hyperparameters are used in the following priors: u is such that the implied unique exceedance probability psiu ~ Uniform(a_psi, b_psi); alpha1 ~ Normal(mean = a_alpha1, sd = b_alpha1); theta1 ~ Beta(a_theta1, b_theta1); alpha2 ~ Normal(mean = a_alpha2, sd = b_alpha2)
Value
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
See Also
mcmc_pol
, mcmc_mix2
and mcmc_mix3
for MCMC for the Zipf-polylog, and 2-component and 3-component discrete extreme value mixture distributions, respectively.
Wrapper of mcmc_mix1
Description
Wrapper of mcmc_mix1
Usage
mcmc_mix1_wrapper(
df,
seed,
u_max = 2000L,
log_diff_max = 11,
a_psiu = 0.1,
b_psiu = 0.9,
m_alpha1 = 0,
s_alpha1 = 10,
a_theta1 = 1,
b_theta1 = 1,
m_alpha2 = 0,
s_alpha2 = 10,
positive = FALSE,
iter = 20000L,
thin = 1L,
burn = 10000L,
freq = 100L,
invts = 1,
mc3_or_marg = TRUE,
x_max = 1e+05
)
Arguments
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
u_max |
Scalar (default 2000), positive integer for the maximum threshold to be passed to |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
a_psiu , b_psiu , m_alpha1 , s_alpha1 , a_theta1 , b_theta1 , m_alpha2 , s_alpha2 |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
positive |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE)? |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
Value
A list returned by mcmc_mix1
Markov chain Monte Carlo for 2-component discrete extreme value mixture distribution
Description
mcmc_mix2
returns the posterior samples of the parameters, for fitting the 2-component discrete extreme value mixture distribution. The samples are obtained using Markov chain Monte Carlo (MCMC).
Usage
mcmc_mix2(
x,
count,
u_set,
u,
alpha,
theta,
shape,
sigma,
a_psiu,
b_psiu,
a_alpha,
b_alpha,
a_theta,
b_theta,
m_shape,
s_shape,
a_sigma,
b_sigma,
positive,
a_pseudo,
b_pseudo,
pr_power,
iter,
thin,
burn,
freq,
invt,
mc3_or_marg = TRUE,
constrained = FALSE
)
Arguments
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
u_set |
Positive integer vector of the values u will be sampled from |
u |
Positive integer, initial value of the threshold |
alpha |
Real number greater than 1, initial value of the parameter |
theta |
Real number in (0, 1], initial value of the parameter |
shape |
Real number, initial value of the parameter |
sigma |
Positive real number, initial value of the parameter |
a_psiu , b_psiu , a_alpha , b_alpha , a_theta , b_theta , m_shape , s_shape , a_sigma , b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
positive |
Boolean, is alpha positive (TRUE) or unbounded (FALSE)? Ignored if constrained is TRUE |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law (below u). Overridden if constrained is TRUE |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
constrained |
Boolean, are alpha & shape constrained such that 1/shape+1 > alpha > 1 with the powerlaw assumed in the body & "continuity" at the threshold u (TRUE), or is there no constraint between alpha & shape, with the former governed by positive, and no powerlaw and continuity enforced (FALSE, default)? |
Details
In the MCMC, a componentwise Metropolis-Hastings algorithm is used. The threshold u is treated as a parameter and therefore sampled. The hyperparameters are used in the following priors: u is such that the implied unique exceedance probability psiu ~ Uniform(a_psi, b_psi); alpha ~ Normal(mean = a_alpha, sd = b_alpha); theta ~ Beta(a_theta, b_theta); shape ~ Normal(mean = m_shape, sd = s_shape); sigma ~ Gamma(a_sigma, scale = b_sigma). If pr_power = 1.0, the discrete power law (below u) is assumed, and the samples of theta will be all 1.0. If pr_power is in (0.0, 1.0), model selection between the polylog distribution and the discrete power law will be performed within the MCMC.
Value
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
See Also
mcmc_pol
and mcmc_mix3
for MCMC for the Zipf-polylog and 3-component discrete extreme value mixture distributions, respectively.
Wrapper of mcmc_mix2
Description
Wrapper of mcmc_mix2
Usage
mcmc_mix2_wrapper(
df,
seed,
u_max = 2000L,
log_diff_max = 11,
a_psiu = 0.001,
b_psiu = 0.9,
m_alpha = 0,
s_alpha = 10,
a_theta = 1,
b_theta = 1,
m_shape = 0,
s_shape = 10,
a_sigma = 1,
b_sigma = 0.01,
a_pseudo = 10,
b_pseudo = 1,
pr_power = 0.5,
positive = FALSE,
iter = 20000L,
thin = 20L,
burn = 100000L,
freq = 1000L,
invts = 1,
mc3_or_marg = TRUE,
constrained = FALSE
)
Arguments
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
u_max |
Scalar (default 2000), positive integer for the maximum threshold to be passed to |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
a_psiu , b_psiu , m_alpha , s_alpha , a_theta , b_theta , m_shape , s_shape , a_sigma , b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law (below u) |
positive |
Boolean, is alpha positive (TRUE) or unbounded (FALSE)? |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
constrained |
Boolean, are alpha & shape constrained such that 1/shape+1 > alpha > 1 with the powerlaw assumed in the body & "continuity" at the threshold u (TRUE), or is there no constraint between alpha & shape, with the former governed by positive, and no powerlaw and continuity enforced (FALSE, default)? |
Value
A list returned by mcmc_mix2
Markov chain Monte Carlo for 3-component discrete extreme value mixture distribution
Description
mcmc_mix3
returns the posterior samples of the parameters, for fitting the 3-component discrete extreme value mixture distribution. The samples are obtained using Markov chain Monte Carlo (MCMC).
Usage
mcmc_mix3(
x,
count,
v_set,
u_set,
v,
u,
alpha1,
theta1,
alpha2,
theta2,
shape,
sigma,
a_psi1,
a_psi2,
a_psiu,
b_psiu,
a_alpha1,
b_alpha1,
a_theta1,
b_theta1,
a_alpha2,
b_alpha2,
a_theta2,
b_theta2,
m_shape,
s_shape,
a_sigma,
b_sigma,
powerlaw1,
positive1,
positive2,
a_pseudo,
b_pseudo,
pr_power2,
iter,
thin,
burn,
freq,
invt,
mc3_or_marg = TRUE
)
Arguments
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
v_set |
Positive integer vector of the values v will be sampled from |
u_set |
Positive integer vector of the values u will be sampled from |
v |
Positive integer, initial value of the lower threshold |
u |
Positive integer, initial value of the upper threshold |
alpha1 |
Real number greater than 1, initial value of the parameter |
theta1 |
Real number in (0, 1], initial value of the parameter |
alpha2 |
Real number greater than 1, initial value of the parameter |
theta2 |
Real number in (0, 1], initial value of the parameter |
shape |
Real number, initial value of the parameter |
sigma |
Positive real number, initial value of the parameter |
a_psi1 , a_psi2 , a_psiu , b_psiu , a_alpha1 , b_alpha1 , a_theta1 , b_theta1 , a_alpha2 , b_alpha2 , a_theta2 , b_theta2 , m_shape , s_shape , a_sigma , b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
powerlaw1 |
Boolean, is the discrete power law assumed for below v? |
positive1 |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE)? |
positive2 |
Boolean, is alpha2 positive (TRUE) or unbounded (FALSE)? |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
pr_power2 |
Real number in [0, 1], prior probability of the discrete power law (between v and u) |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
Details
In the MCMC, a componentwise Metropolis-Hastings algorithm is used. The thresholds v and u are treated as parameters and therefore sampled. The hyperparameters are used in the following priors: psi1 / (1.0 - psiu) ~ Beta(a_psi1, a_psi2); u is such that the implied unique exceedance probability psiu ~ Uniform(a_psi, b_psi); alpha1 ~ Normal(mean = a_alpha1, sd = b_alpha1); theta1 ~ Beta(a_theta1, b_theta1); alpha2 ~ Normal(mean = a_alpha2, sd = b_alpha2); theta2 ~ Beta(a_theta2, b_theta2); shape ~ Normal(mean = m_shape, sd = s_shape); sigma ~ Gamma(a_sigma, scale = b_sigma). If pr_power2 = 1.0, the discrete power law (between v and u) is assumed, and the samples of theta2 will be all 1.0. If pr_power2 is in (0.0, 1.0), model selection between the polylog distribution and the discrete power law will be performed within the MCMC.
Value
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
See Also
mcmc_pol
and mcmc_mix2
for MCMC for the Zipf-polylog and 2-component discrete extreme value mixture distributions, respectively.
Wrapper of mcmc_mix3
Description
Wrapper of mcmc_mix3
Usage
mcmc_mix3_wrapper(
df,
seed,
v_max = 100L,
u_max = 2000L,
log_diff_max = 11,
a_psi1 = 1,
a_psi2 = 1,
a_psiu = 0.001,
b_psiu = 0.9,
m_alpha = 0,
s_alpha = 10,
a_theta = 1,
b_theta = 1,
m_shape = 0,
s_shape = 10,
a_sigma = 1,
b_sigma = 0.01,
a_pseudo = 10,
b_pseudo = 1,
pr_power2 = 0.5,
powerlaw1 = FALSE,
positive1 = FALSE,
positive2 = TRUE,
iter = 20000L,
thin = 20L,
burn = 100000L,
freq = 1000L,
invts = 1,
mc3_or_marg = TRUE
)
Arguments
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
v_max |
Scalar (default 100), positive integer for the maximum lower threshold to be passed to |
u_max |
Scalar (default 2000), positive integer for the maximum upper threshold to be passed to |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
a_psi1 , a_psi2 , a_psiu , b_psiu , m_alpha , s_alpha , a_theta , b_theta , m_shape , s_shape , a_sigma , b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
pr_power2 |
Real number in [0, 1], prior probability of the discrete power law (between v and u) |
powerlaw1 |
Boolean, is the discrete power law assumed for below v? |
positive1 |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE)? |
positive2 |
Boolean, is alpha2 positive (TRUE) or unbounded (FALSE)? |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
Value
A list returned by mcmc_mix3
Markov chain Monte Carlo for Zipf-polylog distribution
Description
mcmc_pol
returns the samples from the posterior of alpha and theta, for fitting the Zipf-polylog distribution to the data x. The samples are obtained using Markov chain Monte Carlo (MCMC). In the MCMC, a Metropolis-Hastings algorithm is used.
Usage
mcmc_pol(
x,
count,
alpha,
theta,
a_alpha,
b_alpha,
a_theta,
b_theta,
a_pseudo,
b_pseudo,
pr_power,
iter,
thin,
burn,
freq,
invt,
mc3_or_marg,
x_max
)
Arguments
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
alpha |
Real number greater than 1, initial value of the parameter |
theta |
Real number in (0, 1], initial value of the parameter |
a_alpha |
Real number, mean of the prior normal distribution for alpha |
b_alpha |
Positive real number, standard deviation of the prior normal distribution for alpha |
a_theta |
Positive real number, first parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
b_theta |
Positive real number, second parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
x_max |
Scalar, positive integer limit for computing the normalising constant |
Value
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
See Also
mcmc_mix2
and mcmc_mix3
for MCMC for the 2-component and 3-component discrete extreme value mixture distributions, respectively.
Wrapper of mcmc_pol
Description
Wrapper of mcmc_pol
Usage
mcmc_pol_wrapper(
df,
seed,
alpha_init = 1.5,
theta_init = 0.5,
m_alpha = 0,
s_alpha = 10,
a_theta = 1,
b_theta = 1,
a_pseudo = 10,
b_pseudo = 1,
pr_power = 0.5,
iter = 20000L,
thin = 20L,
burn = 100000L,
freq = 1000L,
invts = 1,
mc3_or_marg = TRUE,
x_max = 1e+05
)
Arguments
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
alpha_init |
Real number greater than 1, initial value of the parameter |
theta_init |
Real number in (0, 1], initial value of the parameter |
m_alpha |
Real number, mean of the prior normal distribution for alpha |
s_alpha |
Positive real number, standard deviation of the prior normal distribution for alpha |
a_theta |
Positive real number, first parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
b_theta |
Positive real number, second parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
Value
A list returned by mcmc_pol
Obtain set of thresholds with high posterior density for the TZP-power-law mixture model
Description
obtain_u_set_mix1
computes the profile posterior density of the threshold u, and subsets the thresholds (and other parameter values) with high profile values i.e. within a certain value from the maximum posterior density. The set of u can then be used for mcmc_mix1
.
Usage
obtain_u_set_mix1(
df,
positive = FALSE,
u_max = 2000L,
log_diff_max = 11,
alpha1_init = 0.01,
theta1_init = exp(-1),
alpha2_init = 2,
a_psiu = 0.1,
b_psiu = 0.9,
m_alpha1 = 0,
s_alpha1 = 10,
a_theta1 = 1,
b_theta1 = 1,
m_alpha2 = 0,
s_alpha2 = 10,
x_max = 1e+05
)
Arguments
df |
A data frame with at least two columns, x & count |
positive |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE, default)? |
u_max |
Positive integer for the maximum threshold |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
alpha1_init |
Scalar, initial value of alpha1 |
theta1_init |
Scalar, initial value of theta1 |
alpha2_init |
Scalar, initial value of alpha2 |
a_psiu , b_psiu , m_alpha1 , s_alpha1 , a_theta1 , b_theta1 , m_alpha2 , s_alpha2 |
Scalars, hyperparameters of the priors for the parameters |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
Value
A list: u_set
is the vector of thresholds with high posterior density, init
is the data frame with the maximum profile posterior density and associated parameter values, profile
is the data frame with all thresholds with high posterior density and associated parameter values, scalars
is the data frame with all arguments (except df)
See Also
mcmc_mix1_wrapper
that wraps obtain_u_set_mix1
and mcmc_mix1
, obtain_u_set_mix2
for the equivalent function for the 2-component mixture model
Obtain set of thresholds with high posterior density for the 2-component mixture model
Description
obtain_u_set_mix2
computes the profile posterior density of the threshold u, and subsets the thresholds (and other parameter values) with high profile values i.e. within a certain value from the maximum posterior density. The set of u can then be used for mcmc_mix2
.
Usage
obtain_u_set_mix2(
df,
powerlaw = FALSE,
positive = FALSE,
u_max = 2000L,
log_diff_max = 11,
alpha_init = 0.01,
theta_init = exp(-1),
shape_init = 0.1,
sigma_init = 1,
a_psiu = 0.001,
b_psiu = 0.9,
m_alpha = 0,
s_alpha = 10,
a_theta = 1,
b_theta = 1,
m_shape = 0,
s_shape = 10,
a_sigma = 1,
b_sigma = 0.01
)
Arguments
df |
A data frame with at least two columns, x & count |
powerlaw |
Boolean, is the power law (TRUE) or polylogarithm (FALSE, default) assumed? |
positive |
Boolean, is alpha positive (TRUE) or unbounded (FALSE, default)? |
u_max |
Positive integer for the maximum threshold |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
alpha_init |
Scalar, initial value of alpha |
theta_init |
Scalar, initial value of theta |
shape_init |
Scalar, initial value of shape parameter |
sigma_init |
Scalar, initial value of sigma |
a_psiu , b_psiu , m_alpha , s_alpha , a_theta , b_theta , m_shape , s_shape , a_sigma , b_sigma |
Scalars, hyperparameters of the priors for the parameters |
Value
A list: u_set
is the vector of thresholds with high posterior density, init
is the data frame with the maximum profile posterior density and associated parameter values, profile
is the data frame with all thresholds with high posterior density and associated parameter values, scalars
is the data frame with all arguments (except df)
See Also
mcmc_mix2_wrapper
that wraps obtain_u_set_mix2
and mcmc_mix2
, obtain_u_set_mix1
for the equivalent function for the TZP-power-law mixture model
Obtain set of thresholds with high posterior density for the constrained 2-component mixture model
Description
obtain_u_set_mix2_constrained
computes the profile posterior density of the threshold u, and subsets the thresholds (and other parameter values) with high profile values i.e. within a certain value from the maximum posterior density. The set of u can then be used for mcmc_mix2
. Power law is assumed for the body, and alpha is assumed to be greater than 1.0 and smaller than 1.0/shape+1.0
Usage
obtain_u_set_mix2_constrained(
df,
u_max = 2000L,
log_diff_max = 11,
alpha_init = 2,
shape_init = 0.1,
sigma_init = 1,
a_psiu = 0.001,
b_psiu = 0.9,
m_alpha = 0,
s_alpha = 10,
a_theta = 1,
b_theta = 1,
m_shape = 0,
s_shape = 10,
a_sigma = 1,
b_sigma = 0.01
)
Arguments
df |
A data frame with at least two columns, x & count |
u_max |
Positive integer for the maximum threshold |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
alpha_init |
Scalar, initial value of alpha |
shape_init |
Scalar, initial value of shape parameter |
sigma_init |
Scalar, initial value of sigma |
a_psiu , b_psiu , m_alpha , s_alpha , a_theta , b_theta , m_shape , s_shape , a_sigma , b_sigma |
Scalars, hyperparameters of the priors for the parameters |
Value
A list: u_set
is the vector of thresholds with high posterior density, init
is the data frame with the maximum profile posterior density and associated parameter values, profile
is the data frame with all thresholds with high posterior density and associated parameter values, scalars
is the data frame with all arguments (except df)
See Also
obtain_u_set_mix2
for the unconstrained version
Obtain set of thresholds with high posterior density for the 3-component mixture model
Description
obtain_u_set_mix3
computes the profile posterior density of the thresholds v & u, and subsets the thresholds (and other parameter values) with high profile values i.e. within a certain value from the maximum posterior density. The sets of v & u can then be used for mcmc_mix3
.
Usage
obtain_u_set_mix3(
df,
powerlaw1 = FALSE,
powerlaw2 = FALSE,
positive1 = FALSE,
positive2 = TRUE,
log_diff_max = 11,
v_max = 100L,
u_max = 2000L,
alpha_init = 0.01,
theta_init = exp(-1),
shape_init = 1,
sigma_init = 1,
a_psi1 = 1,
a_psi2 = 1,
a_psiu = 0.001,
b_psiu = 0.9,
m_alpha = 0,
s_alpha = 10,
a_theta = 1,
b_theta = 1,
m_shape = 0,
s_shape = 10,
a_sigma = 1,
b_sigma = 0.01
)
Arguments
df |
A data frame with at least two columns, degree & count |
powerlaw1 |
Boolean, is the power law (TRUE) or polylogarithm (FALSE, default) assumed for the left tail? |
powerlaw2 |
Boolean, is the power law (TRUE) or polylogarithm (FALSE, default) assumed for the middle bulk? |
positive1 |
Boolean, is alpha positive (TRUE) or unbounded (FALSE, default) for the left tail? |
positive2 |
Boolean, is alpha positive (TRUE) or unbounded (FALSE, default) for the middle bulk? |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
v_max |
Positive integer for the maximum lower threshold |
u_max |
Positive integer for the maximum upper threshold |
alpha_init |
Scalar, initial value of alpha |
theta_init |
Scalar, initial value of theta |
shape_init |
Scalar, initial value of shape parameter |
sigma_init |
Scalar, initial value of sigma |
a_psi1 , a_psi2 , a_psiu , b_psiu , m_alpha , s_alpha , a_theta , b_theta , m_shape , s_shape , a_sigma , b_sigma |
Scalars, hyperparameters of the priors for the parameters |
Value
A list: v_set
is the vector of lower thresholds with high posterior density, u_set
is the vector of upper thresholds with high posterior density, init
is the data frame with the maximum profile posterior density and associated parameter values, profile
is the data frame with all thresholds with high posterior density and associated parameter values, scalars
is the data frame with all arguments (except df)
See Also
mcmc_mix3_wrapper
that wraps obtain_u_set_mix3
and mcmc_mix3
Reshape the data frame of dependencies
Description
Reshape the data frame of dependencies
Usage
reshape_dep(x, names)
Arguments
x |
A character vector of dependencies, each element of which corresponds to an individual package |
names |
A character vector of package names of the same length as x |
Value
A data frame of dependencies