Help for package gnonadd

Type:

Package

Title:

Various Non-Additive Models for Genetic Associations

Version:

1.0.3

Description:

The goal of 'gnonadd' is to simplify workflows in the analysis of non-additive effects of sequence variants. This includes variance effects (Ivarsdottir et. al (2017) <doi:10.1038/ng.3928>), correlation effects, interaction effects and dominance effects. The package also includes convenience functions for visualization.

URL:

https://github.com/DecodeGenetics/gnonadd

BugReports:

https://github.com/DecodeGenetics/gnonadd/issues

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.2.3

Depends:

R (≥ 2.10)

Imports:

ggplot2

Suggests:

MASS

NeedsCompilation:

Packaged:

2024-12-18 23:28:43 UTC; gudmundureinarsson

Author:

Audunn S. Snaebjarnarson [aut, cre, ctb], Gudmundur Einarsson [aut, ctb], Daniel F. Gudbjartsson [aut, ctb], deCODE Genetics/AMGEN [cph, fnd]

Maintainer:

Audunn S. Snaebjarnarson <audunn.snaebjarnarson@decode.is>

Repository:

CRAN

Date/Publication:

2024-12-19 09:40:02 UTC

Creates poligenic risk scores

Description

This function creates genetic risk scores, with the option of including interactions or dominance effects. The score is automatically shifted to have mean 0.

Usage

PRS_creator(
  g,
  betas,
  dominance_effects = rep(0, length(betas)),
  interaction_effects = matrix(0, nrow = 0, ncol = 0),
  log_scale = FALSE
)

Arguments

g

A matrix, where each colomn represents a variant and each line represents a subject

betas

A numeric vector, representing the (additive) effects of the variants

dominance_effects

A numeric vector, representing dominance effects of the variants

interaction_effects

A matrix with three columns. First two columns are integers that correspond to the variants that are interacting. The third column is the effect size.

log_scale

A Boolian variabe. If true all analysis is done on log-transformed effect values and the resulting score is transformed back to an exponential scale in the end.

Value

A numeric vector with a poligenic risk score for each subject

Examples

g_vec <- matrix(0, nrow = 100000, ncol = 5)
freqs <- runif(ncol(g_vec), min = 0, max = 1)
for(i in 1:ncol(g_vec)){
 g_vec[,i] <- rbinom(100000, 2, freqs[i])
}
beta_vec <- runif(5, min = -0.5, max = 0.5)
dom_vec <- runif(5, min = -0.5, max = 0.5)
int_vec <- matrix(0,nrow = 2, ncol = 3)
int_vec[, 1] <- c(1, 3)
int_vec[, 2] <- c(2, 5)
int_vec[, 3] <- runif(2, min = -0.5, max = 0.5)
res <- PRS_creator(g_vec, beta_vec, dominance_effects = dom_vec, interaction_effects = int_vec)

Uncertanty association

Description

This function finds the association between the predicted uncertanty of some estimates of a trait to the "actual uncertanty" of the estimates. Suppose we have estimates of some trait (this might be a polygenic risk score). Moreover, suppose we have assigned a variance value to each estimate (this might be a variance risk score) to reflect how certain we believe we are about each estimate. Given the true trait values, this function evaluates how well the assigned variance values reflect reality.

We use a likelihood ratio test with 1 degree of freedom * H0: y~N(mu+b*m_score,sigma_sq), * H1: y~N(mu+b*m_score,sigma_sq*(v_score)^a), where y is the trait m_score is the estimate of the trait and v_score is the variance assigned to the estimate. Thus H0 has three degrees of freedom (mu,b,sigma_sq), whereas H1 has four (mu,b,sigma_sq,a)

Usage

Var.assoc(qt, m_score, v_score, iter = 50)

Arguments

qt

A numeric vector.

m_score

A numeric vector

v_score

A numeric vector with positive values

iter

A number of iterations for the Gauss-Newton algorithm

Value

A list with the values:

* a, the association between v_score and the actual variance. * pval, the p-value of the likelihood ratio test

Examples


n_val <- 50000L
trait_vec <- rnorm(n_val,0,1)
var_vec <- exp(rnorm(n_val,0,0.1))
est_vec <- trait_vec+rnorm(n_val,0,var_vec)
res <- Var.assoc(trait_vec,est_vec,var_vec, iter = 20)

Actual variance vs predicted variance plot

Description

This tool creates a line plot that compares the predicted variance of data to its actual variance.

Usage

VarGS.plot(
  qt,
  v_score,
  bins = 10,
  xlab = "Predicted variance",
  ylab = "Variance",
  title = ""
)

Arguments

qt

A numeric vector.

v_score

A numeric vector.

bins

An integer.

xlab

A string.

ylab

A string.

title

A string.

Value

A plot comparing predicted variance to actual variance.

Examples

n_val <- 100000L
v_vec <- exp(rnorm(n_val, 0, 0.1))
qt_vec <- stats::rnorm(n_val, 0, sqrt(v_vec))
VarGS.plot(qt_vec, v_vec)

Violin plot by genotype

Description

This tool creates violin plots corresponding to each genotype.

Usage

Viol.by.gen(qt, g, trait_name = "qt trait", title = "")

Arguments

qt

A numeric vector.

g

An integer vector.

trait_name

A string.

title

A string.

Value

A violin plot

Examples

n_val <- 50000L
geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE)
qt_vec <- rnorm(n_val) * (1.3^geno_vec) + 1 * geno_vec
Viol.by.gen(qt_vec, geno_vec)

Genetic variance effects

Description

This function estimates the variance effect of a genetic variant on a quantitative trait. The genotype is coded as 0 (non-carrier), 1 (single copy of effect allele) and 2 (two copies of effect allele). The variance effect (alpha) is modelled to be multiplicitive. We use a likelihood ratio test with 1 degree of freedom, * H0: y~N(mu_g,sigma^2) * H1: y~N(mu_g,sigma^2\*alpha^g) Under the alternative model, the variance of the trait changes multiplicatively with genotype: Var(y|g=j)=alpha^j*sigma^2, j in 0,1,2

Usage

alpha.calc(qt, g)

Arguments

qt

A numeric vector.

g

An integer vector.

Value

A list with the values:

* alpha, the estimated variance effect * sigma2_alt, The variance estimated for non-carriers * pval, the p-value of the likelihood ratio test * chi2, the chi squared statistics (with one degree of freedom) corrisponding to the p-value

Examples

n_val <- 50000L
geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE)
qt_vec <- rnorm(n_val) * (1.1^geno_vec)
res <- alpha.calc(qt_vec, geno_vec)

Conditional analysis for genetic variance effects

Description

We estimate the variance effect of a primary variant conditioned on one or more secondary variants. We apply a likelyhood ratio test with one degree of freedom

H0: All secondary variants have a variance effect, but not the primary one. H1: All variants have a variance effect, including the primary one

Under both models we assume a full mean effect model. That is, the number of mean-value parameters is 3^(n+1), where n is the number of covariates Thus the null model has 3^(n+1)+n degrees of freedom whereas the alternative model has 3^(n+1)+n+1 Due to the exponential growth of free parameters, the test might have low statistical power if many covariates are used

IMPORTANT NOTE: We use the Gauss-Newton algorithm to estimate many parameters. We do not check if the algorithm converges to the true minimum values, or if it converges at all. Thus, we advise against blindly believing all results

Usage

alpha.cond(qt, g, g_covar, iter_num = 50)

Arguments

qt

A numeric vector.

g

An integer vector.

g_covar

An integer matrix where each column corresponds to a genetic covariate

iter_num

An integer. Represents the number of iterations performed in the Gauss-Newton algorithm

Value

A list with the values: * alpha, the estimated variance effect, conditioned on the covariates * pval, the p-value corresponding to alpha

Examples

n <- 10000
qt <- rnorm(n)
g <- rbinom(n, 2, 0.3)
qt <- qt * 1.2^g
g_covar <- as.data.frame(matrix(0, nrow = n, ncol = 3))
for(i in 1:ncol(g_covar)){
  freq <- runif(1, min = 0, max = 1)
  g_covar[, i] <- rbinom(n, 2, freq)
}
res <- alpha.cond(qt, g, g_covar)

variance effect conditioned on continuous variables

Description

We estimate the variance effect of a variant conditioned on one or more continous variable We apply a likelyhood ratio test with one degree of freedom

H0: All covariates have a variance effect, but not the variant H1: The variant has a variance effect, and the covariates as well

Usage

alpha.continuous.cond(qt, g, x, iter_num = 50, eps_param = 1e-10)

Arguments

qt

A numeric vector.

g

An integer vector.

x

A numeric matrix, each column represents a covariate.

iter_num

An integer. Represents the number of iterations performed in the Gauss-Newton algorithm

eps_param

A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number.

Value

A list with the values: * alpha, the estimated variance effect, conditioned on the covariates * pval, the p-value corresponding to alpha

Examples

n_val <- 50000
x <- matrix(0,nrow = n_val, ncol = 4)
for(i in 1:4) {
x[, i] <- rnorm(n_val)
}
g_vec <- rbinom(n_val,2,0.3)
var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4] + 0.3 * g_vec)
qt_vec <- rnorm(n_val, 0, sqrt(var_vec))
res <- alpha.continuous.cond(qt_vec, g_vec, x)

Variance parameters

Description

This function jointly estimates the variance effect of a set of (continuous) variables on a qt trait. More precisely. It finds the maximum likelyhood estimators.

Usage

alpha.multi.est(
  qt,
  x,
  iter_num = 50,
  eps_param = 1e-10,
  initial_guess = rep(0, ncol(x))
)

Arguments

qt

A numeric vector.

x

A numeric matrix, each column represents a covariate.

iter_num

An integer. Represents the number of iterations performed in the Gauss-Newton algorithm

eps_param

A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number.

initial_guess

A vector of length ncol(x). Represents the initial guess of parameters for the Gauss-Newton algorithm.

Value

A vector with a variance estimate for each variable.

Examples

n_val <- 50000
x <- matrix(0,nrow = n_val, ncol = 4)
for(i in 1:4) {
x[, i] <- rnorm(n_val)
}
var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4])
qt_vec <- rnorm(n_val, 0, sqrt(var_vec))
res <- alpha.multi.est(qt_vec, x)

Calibration for the correlation test

Description

This function finds appropriate scaling_parameters for the kappa_calc function by means of bootstrapping.

Usage

corr.calibration(qt1, qt2, reps = 10000)

Arguments

qt1

A numeric vector.

qt2

A numeric vector.

reps

The number of repeates we want to perform for each sample size

Value

A list with the values:

* bias_scale, a value to determine the bias for the correlation estimators * weight_scale, a value to determine how much weight we assign to each correlation estimator * safe_weight_scale, same as weight scale, but larger. Using this weight decreases the chance of type 1 error, with the cost of statistical power.

Examples

n_val <- 10000
Q <- MASS::mvrnorm(n = n_val, mu = c(0,0), Sigma = matrix(c(1,0.3,0.3,1),
                   nrow = 2, ncol = 2))
qt1_val <- Q[,1]
qt2_val <- Q[,2]
res <- corr.calibration(qt1_val, qt2_val, 10)

Genetic dominance effects

Description

This function estimates the dominance effect of a genetic variant on a quantitatvie trait Nothing fancy here. We apply a simple linear regression model to estimate dominance effects. We include a linear term, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include a dominance term, coded as 1 for homozygous carriers and 0 for others. Effect size and significance is based on the dominance term.

Usage

dominance.calc(
  qt,
  g,
  round_imputed = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0))
)

Arguments

qt

A numeric vector

g

A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2.

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis

covariates

A dataframe containing any covariates that should be used; one column per covariate.

Value

A list with the dominanc effect and corresponding standard error, t statistic and p-value

Examples

g_vec <- rbinom(100000, 2, 0.3)
qt_vec <- rnorm(100000) + 0.2 * g_vec^2
res <- dominance.calc(qt_vec, g_vec)

Genetic dominance effects on a case control variable

Description

This function estimates the dominance effect of a genetic variant on a case-control variable We apply a logistic regression model to estimate dominance effects. We include a linear term, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include a dominance term, coded as 1 for homozygous carriers and 0 for others. Effect size and significance is based on the dominance term.

Usage

dominance_CC.calc(
  cc,
  g,
  yob = rep(-1, length(cc)),
  sex = rep(-1, length(cc)),
  round_imputed = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0))
)

Arguments

cc

A case control vector, containing 0's and 1's

g

A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2.

yob

A numerical vector containing year of birth. If some are unknown they should be marked as -1

sex

A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate.

Value

A list with the dominanc effect (on log-scale) and corresponding standard error, z statistic and p-value

Examples

g_vec <- rbinom(100000, 2, 0.3)
cc_vec <- rbinom(100000, 1, 0.1 * (1.2 ^ (g_vec^2)))
res <- dominance_CC.calc(cc_vec, g_vec)

Ellipse best fit plot

Description

This tool creates a scatter plot along with regression lines. Additionally it finds and plots the best ellipses that fit the data.

Usage

ellipse.by.gen(
  qt1,
  qt2,
  g,
  trait_name1 = "qt trait 1",
  trait_name2 = "qt trait 2",
  title = "",
  sample_size = 500
)

Arguments

qt1

A numeric vector.

qt2

A numeric vector.

g

An integer vector.

trait_name1

A string.

trait_name2

A string.

title

A string.

sample_size

A positive integer.

Value

A scatter plot.

Examples

n_val <- 10000L
geno_vec <- c(rep(0, n_val), rep(1, n_val), rep(2, n_val))
qt_g0 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(0.93, 0.88, 0.88, 0.92), ncol = 2))
qt_g1 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(0.98, 0.88, 0.88, 0.90), ncol = 2))
qt_g2 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(1.57, 0.81, 0.81, 0.59), ncol = 2))
qt_vec <- rbind(qt_g0, qt_g1)
qt_vec <- rbind(qt_vec, qt_g2)
res <- ellipse.by.gen(qt_vec[, 1], qt_vec[, 2], geno_vec)

Variant-Environmental interaction effects

Description

This function estimates the interaction effect of a genetic variant with an environmental factor on a quantitatvie trait We apply a simple linear regression model to estimate interaction effects. We include a linear term for the variant and environmental variable seperately. The variant is coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. The environmental variable is rank-normalized automatically as part of the function. The interaction term is defined as the product of the genetic and the (normalized) environmental variables. Effect size and significance is based on the interaction term.

Usage

env_interaction.calc(
  qt,
  g,
  env,
  round_imputed = FALSE,
  dominance_term = FALSE,
  square_env = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0))
)

Arguments

qt

A numeric vector

g

A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2.

env

A numeric vector with an environmental variable

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_term

A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis

square_env

A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate

Value

A list with the environmental interaction effect and corresponding standard error, t statistic and p-value

Examples

g_vec <- rbinom(100000, 2, 0.1)
env_vec <- round(runif(100000,min=0,max=6))
qt_vec <- rnorm(100000) + 0.1 * g_vec + 0.05 * env_vec + 0.05 * g_vec * env_vec
res <- env_interaction.calc(qt_vec, g_vec, env_vec)

Variant-Environmental interaction effects on a case control variable

Description

This function estimates the interaction effect of a genetic variant with an environmental factor on a case control variable We apply a simple logistic regression model to estimate interaction effects. We include a linear term for the variant and environmental variable seperately. The variant is coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. The environmental variable is rank-normalized automatically as part of the function. The interaction term is defined as the product of the genetic and the (normalized) environmental variable. Effect size and significance is based on the interaction term.

Usage

env_interaction_CC.calc(
  cc,
  g,
  env,
  yob = rep(-1, length(cc)),
  sex = rep(-1, length(cc)),
  round_imputed = FALSE,
  dominance_term = FALSE,
  square_env = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0))
)

Arguments

cc

A numeric vector

g

A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2.

env

A numeric vector with an environmental variable

yob

A numerical vector containing year of birth. If some are unknown they should be marked as -1

sex

A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_term

A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis

square_env

A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate

Value

A list with the environmental interaction effect and corresponding standard error, t statistic and p-value

Examples

g_vec <- rbinom(100000, 2, 0.9)
env_vec <- round(runif(100000, min = 0, max = 6))
cc_vec <- rbinom(100000,1,0.1 * (1.05^g_vec) *
          (1.1^env_vec)* (1.1 ^ (g_vec * env_vec)))
res <- env_interaction_CC.calc(cc_vec, g_vec, env_vec)

Expected variance effect from additive effect

Description

This function interpolates data from a simple simulation to give an estimate of the variance effect induced by an additive effect. The simulation code is stored under inst/raw/. We assume that the trait has been inverse normal transformed. Under the simulation, there is no variance effect, so the variance effect is fully induced by the inverse normal transform.

Usage

expected.variance.effect(maf, beta_add)

Arguments

maf

Minor allele frequency of the variant, should be in the range 0 to 0.5.

beta_add

Additive effect of the variant, should be in the range 0 to 3.5. This variable can be a vector of values.

Value

The expected variance effect for the variant from the given maf, beta combination.

Examples

maf <- 0.1
beta_val <- 0.3
expected_var <- expected.variance.effect(maf, beta_val)

beta_vec <- seq(0.1,2, length.out = 20)
expected_var <- expected.variance.effect(maf, beta_vec)

Histogram by genotype

Description

This tool creates three histogram plots. One per genotype. Additionally, outliers are colored red (by default subjects that are in the top and bottom 2.5 and blue lines are added to indicate the mean and (by default) one standard deviation in each direction.

Usage

hist_by_gen(
  qt,
  g,
  bins = 100,
  trait_name = "qt trait",
  title = "",
  outlier_quantiles = c(0.025, 0.975),
  sd_lines = c(1, 1)
)

Arguments

qt

A numeric vector.

g

An integer vector.

bins

An integer.

trait_name

A string.

title

A string.

outlier_quantiles

A vector with length 2.

sd_lines

A vector with length 2.

Value

A histgram plot

Examples

n_val <- 50000L
geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE)
qt_vec <- rnorm(n_val) * (1.3^geno_vec) + 1 * geno_vec
hist_by_gen(qt_vec, geno_vec)

Variant-Variant interaction effects

Description

This function estimates the interaction effect of a pair of genetic variant on a quantitatvie trait We apply a simple linear regression model to estimate interaction effects. We include a linear term for each variant seperately, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include an interaction term, coded as the product of the two genotype values. Effect size and significance is based on the interaction term.

Usage

interaction.calc(
  qt,
  g1,
  g2,
  round_imputed = FALSE,
  dominance_terms = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0))
)

Arguments

qt

A numeric vector

g1

A vector with (possibly imputed) genotype values. All entries should be larger than or equal to 0 and smaller than or equal to 2.

g2

A vector with (possibly imputed) genotype values. All entries should be larger than or equal to 0 and smaller than or equal to 2.

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_terms

A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate

Value

A list with the interaction effect and corresponding standard error, t statistic and p-value

Examples

g1_vec <- rbinom(100000, 2, 0.9)
g2_vec <- rbinom(100000, 2, 0.1)
qt_vec <- rnorm(100000) + 0.1 * g1_vec + 0.2 * g2_vec +0.4 * g1_vec * g2_vec
res <- interaction.calc(qt_vec, g1_vec, g2_vec)

Variant-Variant interaction effects on a case control variable

Description

This function estimates the interaction effect of a pair of genetic variant on a case-control variable We apply a logistic regression model to estimate interaction effects. We include a linear term for each variant seperately, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include an interaction term, coded as the product of the two genotype values. Effect size and significance is based on the interaction term.

Usage

interaction_CC.calc(
  cc,
  g1,
  g2,
  yob = rep(-1, length(cc)),
  sex = rep(-1, length(cc)),
  round_imputed = FALSE,
  dominance_terms = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0))
)

Arguments

cc

A numeric vector

g1

A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2.

g2

A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2.

yob

A numerical vector containing year of birth. If some are unknown they should be marked as -1

sex

A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_terms

A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate

Value

A list with the interaction effect (on log-scale) and corresponding standard error, z statistic and p-value

Examples

g1_vec <- rbinom(100000, 2, 0.9)
g2_vec <- rbinom(100000, 2, 0.1)
cc_vec <- rbinom(100000,1,0.1 * (1.05^g1_vec) *
          (1.05^g2_vec) * (1.3 ^ (g1_vec * g2_vec)))
res <- interaction_CC.calc(cc_vec, g1_vec, g2_vec)

Genetic correlation effects

Description

This function estimates the correlation effect of a genetic variant on a pair of quantitative traits. The effect (kappa) is measured on Fisher transformed correlation values. The genotype is coded as 0 (non-carrier), 1 (single copy of effect allele) and 2 (two copies of effect allele). We use a likelihood ratio test with 1 degree of freedom: * H0: z_0 = z_1 = z_2, * H1: z_j = z_null +kappa*j, where z_j is the Fisher-transformed correlation between the traits amongst subjects of genotype j.

If the traits follow a joint normal distribution, then (thoretically) the Fisher-transformed sample correlation (z) is approximately normally distributed with mean 0 and standard error 1/sqrt(n-3). Otherwise, we have to correct for possible bias in the estimators and scale the weights we assign to them appropriately

Note, that even though each trait follows a normal distribution individually, that does not necessarily imply that the pair of them follow a joint normal distirbution.

Usage

kappa_calc(qt1, qt2, g, weight_scale = 1, bias_scale = 0)

Arguments

qt1

A numeric vector.

qt2

A numeric vector.

g

An integer vector.

weight_scale

Used to appropriately scale the weight assigned to the correlation estimators

bias_scale

Used to appropriately scale the bias of the correlation estimators

Value

A list with the values:

* kappa, the estimated correlation effect * pval, the p-value of the likelihood ratio test

Examples

Sigma0 <- matrix(c(1,0,0,1),nrow=2,ncol=2)
Sigma1 <- matrix(c(1,0.3,0.3,1),nrow=2,ncol=2)
Sigma2 <- matrix(c(1,0.6,0.6,1),nrow=2,ncol=2)
geno_vec <- c(rep(0,10000),rep(1,1000),rep(2,100))
Q0 <- MASS::mvrnorm(n = 10000, mu = c(0,0), Sigma = Sigma0)
Q1 <- MASS::mvrnorm(n = 1000, mu = c(0,0), Sigma = Sigma1)
Q2 <- MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = Sigma2)
Q <- rbind(Q0,Q1)
Q <- rbind(Q,Q2)
qt1_val <- Q[,1]
qt2_val <- Q[,2]
res <- kappa_calc(qt1_val, qt2_val, geno_vec)

Pairwise environmental interaction effects

Description

Given a set of variants and environmental traits, and a single quantitative trait, this function calculates the interaction effect of all possible variant-environmental pairs

Usage

pairwise_env_int.calc(
  qt,
  g,
  env,
  round_imputed = FALSE,
  dominance_term = FALSE,
  square_env = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)),
  variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"),
  env_names = paste(rep("env", ncol(env)), as.character(1:ncol(env)), sep = "_")
)

Arguments

qt

A numeric vector

g

A matrix, where each colomn represents a variant

env

A matrix, where each row represents an environmental variable

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_term

A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis

square_env

A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate

variant_names

A list of the names of the variants

env_names

A list of the names of the environmental variables

Value

A dataframe with all possible variant-environmental pairs and their estimated interaction effect

Examples

g_vec <- matrix(0, nrow = 100000, ncol = 3)
freqs <- runif(ncol(g_vec), min = 0, max = 1)
env_vec <- matrix(0, nrow = 100000, ncol = 3)
for(i in 1:ncol(g_vec)){
 g_vec[, i] <- rbinom(100000, 2, freqs[i])
}
for( i in 1:ncol(env_vec)){
 env_vec[, i] <- round(runif(100000,min=0,max=6))
}

qt_vec <- rnorm(100000) + 0.1 * g_vec[, 1] + 0.2 *
          g_vec[, 2] -0.1 * env_vec[, 3] + 0.1 *
          env_vec[, 1] + 0.1 * g_vec[, 1] * env_vec[, 1]
res <- pairwise_env_int.calc(qt_vec, g_vec, env_vec)

Pairwise environmental interaction effects for a case control variable

Description

Given a set of variants and environmental traits, and a single case control variable, this function calculates the interaction effect of all possible variant-environmental pairs

Usage

pairwise_env_int_CC.calc(
  cc,
  g,
  env,
  yob = rep(-1, length(cc)),
  sex = rep(-1, length(cc)),
  round_imputed = FALSE,
  dominance_term = FALSE,
  square_env = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)),
  variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"),
  env_names = paste(rep("env", ncol(env)), as.character(1:ncol(env)), sep = "_")
)

Arguments

cc

A numeric vector

g

A matrix, where each colomn represents a variant

env

A matrix, where each row represents an environmental variable

yob

A numerical vector containing year of birth. If some are unknown they should be marked as -1

sex

A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_term

A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis

square_env

A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate

variant_names

A list of the names of the variants

env_names

A list of the names of the environmental variables

Value

A dataframe with all possible variant-environmental pairs and their estimated interaction effect

Examples

N_run <- 25000
g_vec <- matrix(0, nrow = N_run, ncol = 3)
freqs <- runif(ncol(g_vec), min = 0, max = 1)
env_vec <- matrix(0, nrow = N_run, ncol = 3)
for(i in 1:ncol(g_vec)){
 g_vec[, i] <- rbinom(N_run, 2, freqs[i])
}
for( i in 1:ncol(env_vec)){
 env_vec[, i] <- round(runif(N_run,min=0,max=6))
}
cc_vec <- rbinom(N_run,1,0.1 * (1.05 ^ g_vec[, 1]) *
          (1.06 ^ env_vec[,1]) * (0.95 ^ g_vec[, 2]) *
          (1.1^(g_vec[, 1] * env_vec[, 1])))
res <- pairwise_env_int_CC.calc(cc_vec, g_vec, env_vec)

Pairwise interaction effects

Description

Given a set of variants and a quantitative trait, this function calculates the interaction effect of all possible variant-variant pairs

Usage

pairwise_int.calc(
  qt,
  g,
  round_imputed = FALSE,
  dominance_terms = FALSE,
  variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"),
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0))
)

Arguments

qt

A numeric vector

g

A matrix, where each colomn represents a variant

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_terms

A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis

variant_names

A list of the names of the variants

covariates

A dataframe containing any other covariates that should be used; one column per covariate

Value

A dataframe with all possible variant pairs and their estimated interaction effect

Examples

g_vec <- matrix(0, nrow = 100000, ncol = 5)
freqs <- runif(ncol(g_vec), min = 0, max = 1)
for(i in 1:ncol(g_vec)){
 g_vec[,i] <- rbinom(100000, 2, freqs[i])
}

qt_vec <- rnorm(100000) + 0.1 * g_vec[, 1] + 0.2 *
          g_vec[, 2] -0.1 * g_vec[, 3] + 0.2 *
          g_vec[, 1] * g_vec[, 2]
res <- pairwise_int.calc(qt_vec, g_vec)

Pairwise interaction effects for a case control variable

Description

Given a set of variants and a case control variable, this function calculates the interaction effect of all possible variant-variant pairs

Usage

pairwise_int_CC.calc(
  cc,
  g,
  yob = rep(-1, length(cc)),
  sex = rep(-1, length(cc)),
  round_imputed = FALSE,
  dominance_terms = FALSE,
  covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)),
  variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_")
)

Arguments

cc

A numeric vector

g

A matrix, where each colomn represents a variant

yob

A numerical vector containing year of birth. If some are unknown they should be marked as -1

sex

A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown

round_imputed

A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis.

dominance_terms

A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis

covariates

A dataframe containing any other covariates that should be used; one column per covariate

variant_names

A list of the names of the variants

Value

A dataframe with all possible variant pairs and their estimated interaction effect

Examples

N_run <- 25000
g_vec <- matrix(0, nrow = N_run, ncol = 5)
freqs <- runif(ncol(g_vec), min = 0,max = 1)
for(i in 1:ncol(g_vec)){
 g_vec[, i] <- rbinom(N_run, 2, freqs[i])
}
cc_vec <- rbinom(N_run,1,0.1 * (1.05 ^ g_vec[, 1]) *
          (1.06 ^ g_vec[,2]) * (0.95 ^ g_vec[, 3]) *
          (1.5^(g_vec[,1] * g_vec[,2])))
res <- pairwise_int_CC.calc(cc_vec, g_vec)

Trains and imputes a poligenic risk score (PRS)

Description

This function trains a poligenic risk score model on a dataset, and then imputes the risk score into another dataset

Usage

train_and_impute_PRS(
  qt_training,
  g_training,
  g_impute,
  dominance_effects = rep(FALSE, ncol(g_training)),
  interaction_effects = matrix(0, nrow = 0, ncol = 0)
)

Arguments

qt_training

A numeric vector. Represents the qt values of the data we train the model on.

g_training

A matrix, where each colomn represents a variant and each line represents a subject in the training data

g_impute

A matrix, where each column represents a variant and each line represents a subject.

dominance_effects

A Boolian vector, each term determines whether a dominance term for the corresponding variant is used in the model.

interaction_effects

An integer matrix with two columns. Each line represents a pair of interacting variants that should be included in the model.

Value

Returns a list with the following objects * PRS_imputed, the imputed PRS values * PRS_training, the PRS values for the training data * Residuals_training, the residuals from the model in the training data

Examples

g_train_vec <- matrix(0, nrow = 100000, ncol = 5)
freqs <- runif(ncol(g_train_vec), min = 0, max = 1)
for(i in 1:ncol(g_train_vec)){
 g_train_vec[,i] <- rbinom(100000, 2, freqs[i])
}
g_impute_vec <- matrix(0, nrow = 50000, ncol = 5)
for(i in 1:ncol(g_impute_vec)){
 g_impute_vec[,i] <- rbinom(50000, 2, freqs[i])
}
dom_vec <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

int_vec <- matrix(c(1, 2, 4, 5), nrow = 2 , ncol = 2)

qt_vec <- rnorm(100000) + 0.2 * g_train_vec[, 1] + 0.3 * g_train_vec[, 1] * g_train_vec[, 4]

res <- train_and_impute_PRS(qt_vec, g_train_vec, g_impute_vec,
       dominance_effects = dom_vec, interaction_effects = int_vec)

Mean and variance effect adjustments.

Description

Given is a set of (continuous) variables and a qt trait. First, this function adjusts the trait for the mean effects of the variables with a linear model. Next, the variance effect of the variables are estimated and the trait is adjusted further by scaling it in accordance with the results.

Usage

var.adj(qt, x, iter_num = 50, eps_param = 1e-10)

Arguments

qt

A numeric vector.

x

A numeric matrix, each column represents a covariate.

iter_num

An integer. Represents the number of iterations performed in the Gauss-Newton algorithm

eps_param

A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number.

Value

A vector, representing the adjusted trait.

Examples

n_val <- 50000
x <- matrix(0,nrow = n_val, ncol = 4)
for(i in 1:4) {
x[, i] <- rnorm(n_val)
}
var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4])
qt_vec <- rnorm(n_val, 0, sqrt(var_vec))
res <- var.adj(qt_vec, x)

Variance summary statistics

Description

Estimates the variance effect of several continuous variables jointly

Usage

var.summary(qt, x, iter_num = 50, eps_param = 1e-10)

Arguments

qt

A numeric vector.

x

A data frame, each column represents a covariate that should be numeric.

iter_num

An integer. Represents the number of iterations performed in the Gauss-Newton algorithm

eps_param

A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number.

Value

A list with the following objects: * summary, a dataframe with a variance effect estimate for each variable and summary statistics * chi2, the chi2 statistic obtained by considering all parameteres jointly * df, degrees of freedom for the chi2 statistic * pval, p-value of the model * adjusted_values, a vector with qt values that have been adjusted for both mean and variance effects

Examples

n_val <- 50000
x <- as.data.frame(matrix(0,nrow = n_val, ncol = 4))
colnames(x) <- c('A','B','C','D')
for(i in 1:4) {
x[, i] <- rnorm(n_val)
}
var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4])
qt_vec <- rnorm(n_val, 0, sqrt(var_vec))
res <- var.summary(qt_vec, x)