Type: | Package |
Title: | Functions for Discordant Kinship Modeling |
Version: | 1.2.4.1 |
Description: | Functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Contains data restructuring functions and functions for generating biometrically informed data for kin pairs. See [Garrison and Rodgers, 2016 <doi:10.1016/j.intell.2016.08.008>], [Sims, Trattner, and Garrison, 2024 <doi:10.3389/fpsyg.2024.1430978>] for empirical examples, and Garrison and colleagues for theoretical work https://osf.io/zpdwt/. |
URL: | https://github.com/R-Computing-Lab/discord, https://r-computing-lab.github.io/discord/ |
License: | GPL-3 |
LazyData: | TRUE |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
Depends: | R (≥ 3.50) |
Imports: | stats |
Suggests: | NlsyLinks, ggpedigree, BGmisc, broom, dplyr, grid, gridExtra, ggplot2, janitor, kableExtra, knitr, magrittr, rmarkdown, scales, stargazer, snakecase, testthat, tidyverse |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-06-10 16:02:46 UTC; smaso |
Author: | S. Mason Garrison |
Maintainer: | S. Mason Garrison <garrissm@wfu.edu> |
Repository: | CRAN |
Date/Publication: | 2025-06-10 16:30:02 UTC |
discord: Functions for Discordant Kinship Modeling
Description
Functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Contains data restructuring functions and functions for generating biometrically informed data for kin pairs. See [Garrison and Rodgers, 2016 doi:10.1016/j.intell.2016.08.008], [Sims, Trattner, and Garrison, 2024 doi:10.3389/fpsyg.2024.1430978] for empirical examples, and Garrison and colleagues for theoretical work https://osf.io/zpdwt/.
Author(s)
Maintainer: S. Mason Garrison garrissm@wfu.edu (ORCID) [copyright holder]
Authors:
Jonathan Trattner code@jdtrat.com (ORCID) (https://www.jdtrat.com/)
Yoo Ri Hwang yrhwang89@gmail.com
Other contributors:
Cermet Ream [contributor]
See Also
Useful links:
- https://github.com/R-Computing-Lab/discord
- https://r-computing-lab.github.io/discord/
Generate Multivariate Normal Random Variates
Description
Generates random samples from a multivariate normal distribution with a specified covariance structure.
Usage
.rmvn(n, sigma)
Arguments
n |
Integer. Number of samples to generate. |
sigma |
Matrix. Covariance matrix that defines the distribution. |
Value
Matrix of dimension n × ncol(sigma) containing random samples from the multivariate normal distribution.
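A common way to draw such samples is to multiply independent standard normal draws by the Cholesky factor of sigma. The sketch below shows that technique in base R under stated assumptions: the function name rmvn_sketch is hypothetical, the mean is taken to be zero, sigma must be symmetric positive definite, and .rmvn may be implemented differently.

```r
# Hypothetical sketch: zero-mean multivariate normal draws via Cholesky.
# Assumes sigma is symmetric positive definite; not the package's code.
rmvn_sketch <- function(n, sigma) {
  p <- ncol(sigma)
  # n x p matrix of independent standard normal draws
  z <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
  # chol() returns upper-triangular R with t(R) %*% R == sigma,
  # so each row of z %*% R has covariance matrix sigma
  z %*% chol(sigma)
}

set.seed(1)
sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
draws <- rmvn_sketch(10000, sigma)
dim(draws)            # 10000 rows, 2 columns
round(cov(draws), 2)  # sample covariance, close to sigma
```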
Check Discord Errors
Description
This function checks for common errors in the provided data, including the correct specification of identifiers (ID, sex, race) and their existence in the data.
Usage
check_discord_errors(data, id, sex, race, pair_identifiers)
Arguments
data |
The data to perform a discord regression on. |
id |
A unique kinship pair identifier. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair. |
Value
An error message if any of the conditions is met.
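The checks described above can be sketched in base R as follows. The helper check_columns_sketch and its messages are hypothetical, and it assumes that sex and race are stored once per pair member, as columns named by appending each pair identifier (for example sex_s1 and sex_s2); the package's actual checks may differ.

```r
# Hypothetical sketch of identifier checks; not the package's internals.
check_columns_sketch <- function(data, id = NULL, sex = NULL, race = NULL,
                                 pair_identifiers = c("_s1", "_s2")) {
  if (!is.character(pair_identifiers) || length(pair_identifiers) != 2) {
    stop("pair_identifiers must be a character vector of length two.")
  }
  # The pair identifier column, if given, must exist
  if (!is.null(id) && !(id %in% names(data))) {
    stop(sprintf("id column '%s' was not found in the data.", id))
  }
  # Assumption: sex and race have one column per member, e.g. sex_s1, sex_s2
  for (var in c(sex, race)) {
    needed <- paste0(var, pair_identifiers)
    absent <- setdiff(needed, names(data))
    if (length(absent) > 0) {
      stop(sprintf("Missing column(s): %s", paste(absent, collapse = ", ")))
    }
  }
  invisible(TRUE)
}

pairs <- data.frame(pid = 1:2, sex_s1 = c(0, 1), sex_s2 = c(1, 1))
check_columns_sketch(pairs, id = "pid", sex = "sex")          # passes silently
try(check_columns_sketch(pairs, id = "pid", race = "race"))   # error: race_s1, race_s2 missing
```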
Check Sibling Order
Description
This function determines the order of the members of each sibling pair based on an outcome variable. It checks which of the two members of a kinship pair has more of the specified outcome variable and adds a new column named 'order' to the dataset, indicating which sibling (identified as "s1" or "s2") has more of the outcome. If the two siblings have the same amount of the outcome, it randomly assigns one as having more.
Usage
check_sibling_order(..., fast = FALSE)
Arguments
... |
Additional arguments to be passed to the function. |
fast |
Logical. If TRUE, uses a faster method for data processing. |
Value
A one-row data frame with a new column, 'order', indicating which family member (1, 2, or neither) has more of the outcome.
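The ordering rule described above (the member with more of the outcome is flagged, ties are broken at random) can be sketched as a vectorized base R function. The name order_pair_sketch is hypothetical and the sketch returns the "s1"/"s2" labels from the description; it is not the package's implementation.

```r
# Hypothetical sketch of the sibling-ordering rule; not the package's code.
order_pair_sketch <- function(y_s1, y_s2) {
  ifelse(y_s1 > y_s2, "s1",
    ifelse(y_s1 < y_s2, "s2",
      # tied pairs: pick one member at random
      sample(c("s1", "s2"), length(y_s1), replace = TRUE)
    )
  )
}

set.seed(42)
res <- order_pair_sketch(y_s1 = c(70, 64, 66), y_s2 = c(68, 67, 66))
res  # first pair "s1", second pair "s2", third pair (tied) chosen at random
```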
Check Sibling Order RAM Optimized
Description
This function determines the order of the members of each sibling pair based on an outcome variable. It checks which of the two members of a kinship pair has more of the specified outcome variable and adds a new column named 'order' to the dataset, indicating which sibling (identified as "s1" or "s2") has more of the outcome. If the two siblings have the same amount of the outcome, it randomly assigns one as having more.
Usage
check_sibling_order_ram_optimized(data, outcome, pair_identifiers, row)
Arguments
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair |
row |
The row number of the data frame |
Value
A one-row data frame with a new column, 'order', indicating which family member (1, 2, or neither) has more of the outcome.
Flu Vaccination and SES Data
Description
A data frame that accompanies the regression vignette. It contains data on SES and flu vaccination.
Usage
data_flu_ses
Format
A data frame.
Kinship pairs and their relatedness, SES, and flu vaccination information.
Source
NLSY/R Lab
Sample Data from NLSY
Description
A data frame output from the NlsyLinks package that contains data for kinship pairs' height and weight.
Usage
data_sample
Format
A data frame.
Kinship pairs and their relatedness, height, and weight information.
Source
NLSY/R Lab
Perform a Between-Family Linear Regression within the Discordant Kinship Framework
Description
Perform a Between-Family Linear Regression within the Discordant Kinship Framework
Usage
discord_between_model(
data,
outcome,
predictors,
demographics = NULL,
id = NULL,
sex = "sex",
race = "race",
pair_identifiers = c("_s1", "_s2"),
data_processed = FALSE,
coding_method = "none",
fast = TRUE
)
Arguments
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. |
demographics |
Indicates which demographic variables the data contain. If both sex and race are present (the default, and recommended), the value should be "both". Other options are "sex", "race", or "none". |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair |
data_processed |
Logical. Whether the data have already been preprocessed by discord_data; default is FALSE. |
coding_method |
A character string indicating which additional coding scheme to use. Default is "none". Other options include "binary" and "multi". |
fast |
Logical. If TRUE, uses a faster method for data processing. |
Value
Resulting 'lm' object from performing the between-family regression.
Examples
discord_between_model(
data = data_sample,
outcome = "height",
predictors = "weight",
pair_identifiers = c("_s1", "_s2"),
sex = NULL,
race = NULL
)
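As a rough illustration of what a between-family model estimates, the sketch below simulates pairs that share a family-level factor and regresses the pair-mean outcome on the pair-mean predictor with lm(). The data are simulated and the formula y_mean ~ x_mean is an assumed simplification, not the exact model discord_between_model builds (which may also include demographic covariates).

```r
# Hypothetical sketch: between-family OLS on pair means (simulated data).
set.seed(2024)
n_pairs <- 200
family <- rnorm(n_pairs)  # factor shared by both members of a pair
x_s1 <- family + rnorm(n_pairs)
x_s2 <- family + rnorm(n_pairs)
# Direct effect of x on y is 0.5; family also raises y
y_s1 <- 0.5 * x_s1 + family + rnorm(n_pairs)
y_s2 <- 0.5 * x_s2 + family + rnorm(n_pairs)

pairs <- data.frame(
  y_mean = (y_s1 + y_s2) / 2,
  x_mean = (x_s1 + x_s2) / 2
)
between_fit <- lm(y_mean ~ x_mean, data = pairs)
coef(between_fit)
```

Because the simulated family factor raises both x and y, the between-family slope here mixes the direct effect of x (0.5 in the simulation) with family-level confounding, which is the motivation for comparing it against the within-family model.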
Custom Conditions for the discord package
Description
Custom Conditions for the discord package
Usage
discord_cond(type, msg, class = paste0("discord-", type), call = NULL, ...)
Arguments
type |
One of the following conditions: c("error", "warning", "message") |
msg |
Character string. The condition message. |
class |
Default is to prefix the 'type' argument with "discord-" (for example, "discord-error"), but can be more specific to the problem at hand. |
call |
The call that triggered the condition; default is NULL. |
... |
Additional arguments that can be coerced to character or single condition object. |
Value
A condition for discord.
Examples
## Not run:
derr <- function(x) discord_cond("error", x)
dwarn <- function(x) discord_cond("warning", x)
dmess <- function(x) discord_cond("message", x)
return_class <- function(func) {
tryCatch(func,
error = function(cond) class(cond),
warning = function(cond) class(cond),
message = function(cond) class(cond)
)
}
return_class(derr("error-class"))
return_class(dwarn("warning-class"))
return_class(dmess("message-class"))
## End(Not run)
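The same pattern can be reproduced with base R alone: a condition is a list with message and call elements whose class vector ends in the condition type and "condition". The function make_cond below is a hypothetical illustration of that technique, assuming the condition is signalled immediately with stop(), warning(), or message(); it is not the source of discord_cond().

```r
# Hypothetical base R sketch of a classed condition; not discord_cond() itself.
make_cond <- function(type, msg, class = paste0("discord-", type), call = NULL) {
  type <- match.arg(type, c("error", "warning", "message"))
  # A condition is a list with message and call, tagged with classes
  cond <- structure(
    list(message = msg, call = call),
    class = c(class, type, "condition")
  )
  # Signal it with the base function matching its type
  switch(type,
    error = stop(cond),
    warning = warning(cond),
    message = message(cond)
  )
}

# Handle only the package-specific class
caught <- tryCatch(
  make_cond("error", "bad input"),
  `discord-error` = function(cond) conditionMessage(cond)
)
caught  # "bad input"
```

Handling the package-specific class (here discord-error) rather than the generic error class lets calling code react only to conditions raised by the package.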
Restructure Data to Determine Kinship Differences
Description
Restructure Data to Determine Kinship Differences
Usage
discord_data(
data,
outcome,
predictors,
id = NULL,
sex = "sex",
race = "race",
pair_identifiers,
demographics = "both",
coding_method = "none",
fast = TRUE,
...
)
Arguments
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair |
demographics |
Indicates which demographic variables the data contain. If both sex and race are present (the default, and recommended), the value should be "both". Other options are "sex", "race", or "none". |
coding_method |
A character string indicating which additional coding scheme to use. Default is "none". Other options include "binary" and "multi". |
fast |
Logical. If TRUE, uses a faster method for data processing. |
... |
Additional arguments to be passed to the function. |
Value
A data frame that contains analyzable, paired data for performing kinship regressions.
Examples
discord_data(
data = data_sample,
outcome = "height",
predictors = "weight",
pair_identifiers = c("_s1", "_s2"),
sex = NULL,
race = NULL,
demographics = "none"
)
Discord Data Fast
Description
This function restructures data to determine kinship differences.
Usage
discord_data_fast(
data,
outcome,
predictors,
id = NULL,
sex = "sex",
race = "race",
pair_identifiers,
demographics = "both",
coding_method = "none"
)
Arguments
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair |
demographics |
Indicates which demographic variables the data contain. If both sex and race are present (the default, and recommended), the value should be "both". Other options are "sex", "race", or "none". |
coding_method |
A character string indicating which additional coding scheme to use. Default is "none". Other options include "binary" and "multi". |
Legacy Code: Restructure Data
Description
This is legacy code, taken from
https://github.com/R-Computing-Lab/discord/blob/74323b2cdd739355cd4a388251c747f1bcd87eb5/R/discord_data.R
and used to restructure wide-form, double-entered data into analyzable data sorted by outcome. Its output can be used in discord_regression_legacy.
Usage
discord_data_legacy(
outcome,
predictors = NULL,
doubleentered = TRUE,
sep = "",
scale = FALSE,
df = NULL,
id = NULL,
full = TRUE,
...
)
Arguments
outcome |
Name of outcome variable |
predictors |
Names of predictors. |
doubleentered |
Describes whether data are double entered. Default is TRUE. |
sep |
The character in |
scale |
If TRUE, rescale all variables at the individual level to have a mean of 0 and a SD of 1. |
df |
Data frame containing all variables. |
id |
id variable (optional). |
full |
If TRUE, returns kin1 and kin2 scores in addition to diff and mean scores. If FALSE, only returns diff and mean scores. |
... |
Optional additional inputs to pass on. |
Value
Returns a data.frame with the following variables:
id |
id |
outcome_1 |
outcome for kin1; kin1 always has the greater outcome value, except when the pair is tied, in which case kin1 is randomly selected from the pair |
outcome_2 |
outcome for kin2 |
outcome_diff |
difference between outcome of kin1 and kin2 |
outcome_mean |
mean outcome for kin1 and kin2 |
predictor_i_1 |
predictor variable i for kin1 |
predictor_i_2 |
predictor variable i for kin2 |
predictor_i_diff |
difference between predictor i of kin1 and kin2 |
predictor_i_mean |
mean predictor i for kin1 and kin2 |
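The layout above can be sketched in base R for a single outcome y and predictor x stored in wide form as y_s1/y_s2 and x_s1/x_s2. The function restructure_sketch and these column names are hypothetical, and the scale and full options are not reproduced; the point is the ordering rule and the derived _diff and _mean columns.

```r
# Hypothetical sketch of the kin1/kin2 restructuring; not the legacy code.
restructure_sketch <- function(df) {
  # kin1 is the member with the larger outcome; ties are broken at random
  s1_first <- ifelse(df$y_s1 == df$y_s2,
    runif(nrow(df)) < 0.5,
    df$y_s1 > df$y_s2
  )
  out <- data.frame(id = df$id)
  for (v in c("y", "x")) {
    a <- df[[paste0(v, "_s1")]]
    b <- df[[paste0(v, "_s2")]]
    # Predictors follow the ordering set by the outcome
    out[[paste0(v, "_1")]] <- ifelse(s1_first, a, b)
    out[[paste0(v, "_2")]] <- ifelse(s1_first, b, a)
    out[[paste0(v, "_diff")]] <- out[[paste0(v, "_1")]] - out[[paste0(v, "_2")]]
    out[[paste0(v, "_mean")]] <- (out[[paste0(v, "_1")]] + out[[paste0(v, "_2")]]) / 2
  }
  out
}

set.seed(1)
wide <- data.frame(id = 1:3, y_s1 = c(5, 2, 4), y_s2 = c(3, 6, 4),
                   x_s1 = c(1, 0, 2), x_s2 = c(0, 1, 3))
restructure_sketch(wide)  # pair 2 is flipped, so y_1 = 6 and x_1 = 1
```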
Discord Data RAM Optimized
Description
This function restructures data to determine kinship differences.
Usage
discord_data_ram_optimized(
data,
outcome,
predictors,
id = NULL,
sex = "sex",
race = "race",
pair_identifiers,
demographics = "both",
coding_method = "none"
)
Arguments
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair |
demographics |
Indicates which demographic variables the data contain. If both sex and race are present (the default, and recommended), the value should be "both". Other options are "sex", "race", or "none". |
coding_method |
A character string indicating which additional coding scheme to use. Default is "none". Other options include "binary" and "multi". |
Perform a Linear Regression within the Discordant Kinship Framework
Description
Perform a Linear Regression within the Discordant Kinship Framework
Usage
discord_regression(
data,
outcome,
predictors,
demographics = NULL,
id = NULL,
sex = "sex",
race = "race",
pair_identifiers = c("_s1", "_s2"),
data_processed = FALSE,
coding_method = "none",
fast = TRUE
)
discord_within_model(
data,
outcome,
predictors,
demographics = NULL,
id = NULL,
sex = "sex",
race = "race",
pair_identifiers = c("_s1", "_s2"),
data_processed = FALSE,
coding_method = "none",
fast = TRUE
)
Arguments
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. |
demographics |
Indicates which demographic variables the data contain. If both sex and race are present (the default, and recommended), the value should be "both". Other options are "sex", "race", or "none". |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair |
data_processed |
Logical. Whether the data have already been preprocessed by discord_data; default is FALSE. |
coding_method |
A character string indicating which additional coding scheme to use. Default is "none". Other options include "binary" and "multi". |
fast |
Logical. If TRUE, uses a faster method for data processing. |
Value
Resulting 'lm' object from performing the discordant regression.
Examples
discord_regression(
data = data_sample,
outcome = "height",
predictors = "weight",
pair_identifiers = c("_s1", "_s2"),
sex = NULL,
race = NULL
)
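To make the structure of a discordant regression concrete, the sketch below simulates kin pairs, orders each pair by the outcome, and regresses the outcome difference on the outcome mean, predictor mean, and predictor difference. This specification is an assumption based on the common discordant-kinship formulation rather than a transcription of discord_regression; the data and variable names are hypothetical.

```r
# Hypothetical sketch of a within-family (discordant) regression.
set.seed(7)
n_pairs <- 300
family <- rnorm(n_pairs)  # factor shared by both members of a pair
x_s1 <- family + rnorm(n_pairs)
x_s2 <- family + rnorm(n_pairs)
y_s1 <- 0.5 * x_s1 + family + rnorm(n_pairs)
y_s2 <- 0.5 * x_s2 + family + rnorm(n_pairs)

# Order each pair so member 1 has the larger outcome
first <- y_s1 >= y_s2
y1 <- ifelse(first, y_s1, y_s2)
y2 <- ifelse(first, y_s2, y_s1)
x1 <- ifelse(first, x_s1, x_s2)
x2 <- ifelse(first, x_s2, x_s1)

pairs <- data.frame(
  y_diff = y1 - y2, y_mean = (y1 + y2) / 2,
  x_diff = x1 - x2, x_mean = (x1 + x2) / 2
)
# Assumed within-family specification
within_fit <- lm(y_diff ~ y_mean + x_mean + x_diff, data = pairs)
summary(within_fit)$coefficients
```

The shared family term cancels out of the within-pair differences, which is why the predictor-difference coefficient is the term of interest in this design.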
Legacy Code: Discord Regression
Description
This is legacy code, taken from
https://github.com/R-Computing-Lab/discord/blob/74323b2cdd739355cd4a388251c747f1bcd87eb5/R/discord_regression.R
and used to perform the discordant regression on the data output from discord_data_legacy.
Usage
discord_regression_legacy(
df,
outcome,
predictors,
more_args = NULL,
additional_formula = more_args,
...
)
Arguments
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. |
more_args |
Optional string of additional terms to add to the formula. |
additional_formula |
Deprecated |
... |
Additional arguments to be passed to the function. |
Value
Resulting 'lm' object from performing the discordant regression.
Simulate Biometrically Informed Multivariate Data
Description
Generates paired multivariate data for kinship pairs based on specified ACE (Additive genetic, Common environment, unique Environment) parameters with covariance structure.
Usage
kinsim(
r_all = c(1, 0.5),
c_all = 1,
npg_all = 500,
npergroup_all = rep(npg_all, length(r_all)),
mu_all = 0,
variables = 2,
mu_list = rep(mu_all, variables),
r_vector = NULL,
c_vector = NULL,
ace_all = c(1, 1, 1),
ace_list = matrix(rep(ace_all, variables), byrow = TRUE, nrow = variables),
cov_a = 0,
cov_c = 0,
cov_e = 0,
...
)
Arguments
r_all |
Numeric vector. Levels of genetic relatedness for each group; default is c(1, 0.5) representing MZ and DZ twins respectively. |
c_all |
Numeric. Default shared variance for common environment; default is 1. |
npg_all |
Integer. Default sample size per group; default is 500. |
npergroup_all |
Numeric vector. Sample sizes by group; default repeats npg_all once for each level in r_all. |
mu_all |
Numeric. Default mean value for all generated variables; default is 0. |
variables |
Integer. Number of variables to generate; default is 2. Currently limited to a maximum of two variables. |
mu_list |
Numeric vector. Means for each variable; default repeats mu_all for each variable. |
r_vector |
Numeric vector. Alternative specification providing genetic relatedness coefficients for the entire sample; default is NULL. |
c_vector |
Numeric vector. Alternative specification providing shared-environmental relatedness; default is NULL. |
ace_all |
Numeric vector. Default variance components in order c(a, c, e) for all variables; default is c(1, 1, 1). |
ace_list |
Matrix. ACE variance components by variable, where each row represents a variable and the columns are the a, c, e components; default repeats ace_all for each variable. |
cov_a |
Numeric. Shared variance for additive genetics between variables; default is 0. |
cov_c |
Numeric. Shared variance for shared-environment between variables; default is 0. |
cov_e |
Numeric. Shared variance for non-shared-environment between variables; default is 0. |
... |
Additional arguments passed to other methods. |
Details
This function extends the univariate ACE model to multivariate data, allowing simulation of correlated phenotypes across kinship pairs with different levels of genetic relatedness. It supports simulation of up to two phenotypic variables with specified genetic and environmental covariance structures.
Value
A data frame with the following columns:
- Ai_1
genetic component for variable i for kin1
- Ai_2
genetic component for variable i for kin2
- Ci_1
shared-environmental component for variable i for kin1
- Ci_2
shared-environmental component for variable i for kin2
- Ei_1
non-shared-environmental component for variable i for kin1
- Ei_2
non-shared-environmental component for variable i for kin2
- yi_1
generated variable i for kin1
- yi_2
generated variable i for kin2
- r
level of relatedness for the kin pair
- id
Unique identifier for each kinship pair
Examples
# Generate basic multivariate twin data with default parameters
twin_data <- kinsim()
# Generate data with genetic correlation between variables
correlated_data <- kinsim(cov_a = 0.5)
# Generate data for different relatedness groups with custom parameters
family_data <- kinsim(
r_all = c(1, 0.5, 0.25), # MZ twins, DZ twins, and half-siblings
npergroup_all = c(100, 100, 150), # Sample sizes per group
ace_list = matrix(
c(
1.5, 0.5, 1.0, # Variable 1 ACE components
0.8, 1.2, 1.0 # Variable 2 ACE components
),
nrow = 2, byrow = TRUE
),
cov_a = 0.3, # Genetic covariance
cov_c = 0.2 # Shared environment covariance
)
Simulate Kinship-Based Biometrically Informed Univariate Data
Description
Generates paired univariate data for kinship pairs with specified genetic relatedness, following the classical ACE model (Additive genetic, Common environment, unique Environment).
Usage
kinsim_internal(
r = c(1, 0.5),
c_rel = 1,
npg = 100,
npergroup = rep(npg, length(r)),
mu = 0,
ace = c(1, 1, 1),
r_vector = NULL,
c_vector = NULL,
...
)
Arguments
r |
Numeric vector. Levels of genetic relatedness for each group; default is c(1, 0.5) representing MZ and DZ twins respectively. |
npg |
Integer. Default sample size per group; default is 100. |
npergroup |
Numeric vector. Sample sizes by group; default repeats npg once for each level in r. |
mu |
Numeric. Mean value for the generated variable; default is 0. |
ace |
Numeric vector. Variance components in order c(a, c, e) where a = additive genetic, c = shared environment, e = non-shared environment; default is c(1, 1, 1). |
r_vector |
Numeric vector. Alternative specification method providing relatedness coefficients for the entire sample; default is NULL. |
... |
Additional arguments passed to other methods. |
Details
This function simulates data according to the ACE model, where phenotypic variance is decomposed into additive genetic (A), shared environmental (C), and non-shared environmental (E) components. It can generate data for multiple kinship groups with different levels of genetic relatedness (e.g., MZ twins, DZ twins, siblings).
Value
A data frame with the following columns:
- id
Unique identifier for each kinship pair
- A1
Genetic component for first member of pair
- A2
Genetic component for second member of pair
- C1
Shared-environmental component for first member of pair
- C2
Shared-environmental component for second member of pair
- E1
Non-shared-environmental component for first member of pair
- E2
Non-shared-environmental component for second member of pair
- y1
Generated phenotype for first member of pair, with mean mu
- y2
Generated phenotype for second member of pair, with mean mu
- r
Level of genetic relatedness for the kinship pair
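The decomposition described in Details can be sketched in base R for groups of pairs with relatedness r: the additive genetic parts share variance r * a, the shared environment is identical within a pair, and the non-shared environment is independent. The function simulate_ace_sketch is hypothetical, fixes the shared-environment relatedness at 1, and is not the kinsim_internal implementation.

```r
# Hypothetical base R sketch of univariate ACE simulation for one
# relatedness group; ace = c(a, c, e) variances, as in the docs above.
simulate_ace_sketch <- function(n, r, ace = c(1, 1, 1), mu = 0) {
  a <- ace[1]; c_var <- ace[2]; e <- ace[3]
  # Additive genetic: shared piece (variance r * a) plus
  # member-specific pieces (variance (1 - r) * a)
  a_shared <- rnorm(n, sd = sqrt(r * a))
  A1 <- a_shared + rnorm(n, sd = sqrt((1 - r) * a))
  A2 <- a_shared + rnorm(n, sd = sqrt((1 - r) * a))
  # Shared environment: identical for both members (relatedness 1 assumed)
  C1 <- C2 <- rnorm(n, sd = sqrt(c_var))
  # Non-shared environment: independent for each member
  E1 <- rnorm(n, sd = sqrt(e))
  E2 <- rnorm(n, sd = sqrt(e))
  data.frame(id = seq_len(n), A1, A2, C1, C2, E1, E2,
             y1 = mu + A1 + C1 + E1, y2 = mu + A2 + C2 + E2, r = r)
}

set.seed(123)
sim <- rbind(simulate_ace_sketch(5000, r = 1),
             simulate_ace_sketch(5000, r = 0.5))
sim$id <- seq_len(nrow(sim))  # unique pair ids across groups
cors <- sapply(split(sim, sim$r), function(d) cor(d$y1, d$y2))
cors
```

With ace = c(1, 1, 1), the expected correlation between y1 and y2 is (r * a + c) / (a + c + e), that is 2/3 for r = 1 and 1/2 for r = 0.5, so the printed values should fall close to those.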
Make Mean Differences
Description
This function calculates differences and means of a given variable for each kinship pair. The order of subtraction and the variable names in the output data frame depend on the order column set by check_sibling_order(). If the demographics parameter is set to "race", "sex", or "both", it also prepares the demographic information accordingly, swapping the order of the demographic variables according to the order column.
Usage
make_mean_diffs(..., fast = FALSE)
Arguments
... |
Additional arguments to be passed to the function. |
fast |
Logical. If TRUE, uses a faster method for data processing. |