Title: | An Elegant Approach to Summarizing Clinical Data |
Version: | 0.1.0 |
Description: | Streamlines the analysis of clinical data by automatically selecting appropriate statistical descriptions and inference methods based on variable types. For method details see Motulsky H J (2016) https://www.graphpad.com/guides/prism/10/statistics/index.htm and d'Agostino R B (1971) <doi:10.1093/biomet/58.2.341>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | car, cli, dplyr, fBasics, glue, qqplotr, rlang, stats, stringr, tibble, tidyplots, tidyr |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
Depends: | R (≥ 4.1.0) |
NeedsCompilation: | no |
Packaged: | 2025-07-10 07:33:20 UTC; Lixiang |
Author: | Xiang Li [aut, cre] |
Maintainer: | Xiang Li <htqqdd@126.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-15 07:00:02 UTC |
Add statistical test results to summary data
Description
Calculates and appends p-values with optional statistical details to a summary table based on variable types and group comparisons. Handles both continuous and categorical variables with appropriate statistical tests.
Usage
add_p(
summary,
digit = 3,
asterisk = FALSE,
add_method = FALSE,
add_statistic_name = FALSE,
add_statistic_value = FALSE
)
Arguments
summary |
A data frame that has been processed by `add_var()` and `add_summary()`. |
digit |
A numeric value that determines the number of decimal places. Accepts:
|
asterisk |
Logical indicating whether to show asterisk significance markers. |
add_method |
Control parameter for display of statistical methods. Accepts:
|
add_statistic_name |
Logical indicating whether to include test statistic names. |
add_statistic_value |
Logical indicating whether to include test statistic values. |
Value
A data frame merged with statistical test results, containing:
- Variable names
- Summary
- Formatted p-values
- Optional method names/codes
- Optional statistic names/values
Examples
# `summary` is a data frame processed by `add_var()` and `add_summary()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
summary <- add_summary(data)
# Add statistical test results
result <- add_p(summary)
Add summary statistics to an add_var object
Description
This function generates summary statistics for variables from a data frame that has been processed by add_var(), with options to format outputs.
Usage
add_summary(
data,
add_overall = TRUE,
continuous_format = NULL,
norm_continuous_format = "{mean} ± {SD}",
unnorm_continuous_format = "{median} ({Q1}, {Q3})",
categorical_format = "{n} ({pct})",
binary_show = "last",
digit = 2
)
Arguments
data |
A data frame that has been processed by `add_var()`. |
add_overall |
Logical indicating whether to include an "Overall" summary column. |
continuous_format |
Format string that overrides both the normal and non-normal continuous formats. Accepted placeholders are |
norm_continuous_format |
Format string for normally distributed continuous variables. Default is "{mean} ± {SD}". |
unnorm_continuous_format |
Format string for non-normal continuous variables. Default is "{median} ({Q1}, {Q3})". |
categorical_format |
Format string for categorical variables. Default is "{n} ({pct})". |
binary_show |
Display option for binary variables:
|
digit |
A numeric value that determines the number of decimal places. |
Value
A data frame containing summary statistics with the following columns:
- variable: Variable name
- Overall (n=X): Summary statistics for all data, if add_overall = TRUE
- Group-specific columns named [group] (n=X) with summary statistics
Examples
# `data` is a data frame processed by `add_var()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
# Add summary statistics
result <- add_summary(data, add_overall = TRUE)
result <- add_summary(data, continuous_format = "{mean}, ({SD})")
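To illustrate how a format string such as "{mean}, ({SD})" might be filled in for one variable, here is a minimal base R sketch. The helper `fill_format()` is hypothetical and not part of the package. Rounding with `formatC()` and the default `quantile()` method are assumptions, so the package's own output may differ.

```r
# Hypothetical helper: replace each {placeholder} in `fmt` with its value.
fill_format <- function(fmt, values) {
  for (name in names(values)) {
    fmt <- gsub(paste0("{", name, "}"), values[[name]], fmt, fixed = TRUE)
  }
  fmt
}

x <- iris$Sepal.Length
digit <- 2

# Assumption: statistics are rounded to `digit` decimals before substitution.
fmt_num <- function(v) formatC(v, digits = digit, format = "f")
values <- list(
  mean   = fmt_num(mean(x)),
  SD     = fmt_num(sd(x)),
  median = fmt_num(median(x)),
  Q1     = fmt_num(quantile(x, 0.25, names = FALSE)),
  Q3     = fmt_num(quantile(x, 0.75, names = FALSE))
)

cell_a <- fill_format("{mean}, ({SD})", values)
cell_b <- fill_format("{median} ({Q1}, {Q3})", values)
print(c(cell_a, cell_b))
```

Placeholders that do not appear in `values` are left untouched by this sketch, which makes a misspelled placeholder easy to spot in the output.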
Prepare variables for add_summary
Description
This function processes a dataset for statistical analysis by categorizing variables into continuous and categorical types. It automatically handles normality checks, equality of variances checks, and expected frequency assumptions checks.
Usage
add_var(data, var = NULL, group = "group", norm = "auto", center = "median")
Arguments
data |
A data frame containing the variables to analyze, with variables at columns and observations at rows. |
var |
A character vector of variable names to include. If |
group |
A character string specifying the grouping variable in `data`. |
norm |
Control parameter for normality tests. Accepts:
|
center |
A character string specifying the |
Value
A modified data frame with an attribute 'add_var' containing a list of categorized variables and their properties:
- var: List of categorized variables:
  - valid: All valid variable names after checks
  - continuous: Sublist of continuous variables (further divided by normality/equal variance)
  - categorical: Sublist of categorical variables (further divided by ordered/expected frequency)
- group: Grouping variable name
- overall_n: Total number of observations
- group_n: Observation counts per group
- group_nlevels: Number of groups
- group_levels: Group level names
- norm: Normality check method used
Examples
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
Test for Equality of Variances
Description
Performs Levene's test to assess equality of variances between groups.
Usage
equal_test(data, var, group, center = "median")
Arguments
data |
A data frame containing the variables to be tested. |
var |
A character string specifying the numeric variable in `data`. |
group |
A character string specifying the grouping variable in `data`. |
center |
A character string specifying the |
Value
Logical value:
- TRUE: Variances are equal (p-value greater than 0.05)
- FALSE: Variances are unequal, or an error occurred during testing
Methodology for Equality of Variances
Levene's test is the default method in SPSS. The original Levene's test uses center = mean; this function defaults to center = median, which gives a more robust test.
Examples
equal_test(iris, "Sepal.Length", "Species")
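The difference between the mean-centered and median-centered versions of Levene's test can be sketched in base R as a one-way ANOVA on absolute deviations from each group's center. The function `levene_p()` is a hypothetical illustration and not the package's implementation. The 0.05 cutoff follows the Value section above.

```r
# Levene-type test: one-way ANOVA on absolute deviations from each
# group's center (median by default, mean for the original version).
levene_p <- function(x, g, center = median) {
  g <- factor(g)
  deviations <- abs(x - ave(x, g, FUN = center))
  anova(lm(deviations ~ g))[["Pr(>F)"]][1]
}

p_median <- levene_p(iris$Sepal.Length, iris$Species, center = median)
p_mean   <- levene_p(iris$Sepal.Length, iris$Species, center = mean)

# Variances are treated as equal when the p-value is greater than 0.05.
variances_equal <- p_median > 0.05
print(c(median = p_median, mean = p_mean))
```

Centering on the median reduces the influence of skewed data and outliers on the deviations, which is why it is commonly preferred.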
Format p-values with significance markers
Description
Formats p-values as strings with specified precision and optional significance asterisks.
Usage
format_p(p, digit = 3, asterisk = FALSE)
Arguments
p |
A numeric p-value between 0 and 1. |
digit |
A numeric value that determines the number of decimal places. Accepts:
|
asterisk |
Logical indicating whether to return significance asterisks. |
Value
A character string containing the formatted p-value or the significance asterisks.
Examples
format_p(0.00009, 4)
format_p(0.03, 3)
format_p(0.02, asterisk = TRUE)
Perform normality test on a variable
Description
Conducts normality tests for a specified variable, optionally by group. Supports automatic testing and interactive visualization.
Usage
normal_test(data = NULL, var = NULL, group = NULL, norm = "auto")
Arguments
data |
A data frame containing the variables to be tested. |
var |
A character string specifying the numeric variable in `data`. |
group |
A character string specifying the grouping variable in `data`. |
norm |
Control parameter for test behavior. Accepts:
|
Value
A logical value:
- TRUE: data are normally distributed
- FALSE: data are not normally distributed
Methodology for p-values
The test is selected automatically based on the sample size per group:
n < 3: Too small; the data are assumed to be non-normal
[3, 50]: Shapiro-Wilk test
(50, 1000]: D'Agostino Chi2 test, used instead of the Kolmogorov-Smirnov test
n > 1000: Shows p-values, draws Q-Q plots, and prompts for a decision
Examples
normal_test(iris, "Sepal.Length", "Species", norm = "auto")
normal_test(iris, "Sepal.Length", "Species", norm = TRUE)
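The sample-size rule described under "Methodology for p-values" can be sketched in base R. The helper `choose_normality_test()` is hypothetical. The sketch assumes that n = 3 is the smallest size sent to the Shapiro-Wilk test, and that a variable counts as normal only when every group has p > 0.05. Neither assumption is confirmed by the documentation.

```r
# Hypothetical helper: name the test implied by a group's sample size.
choose_normality_test <- function(n) {
  if (n < 3) {
    "none (assume non-normal)"
  } else if (n <= 50) {
    "Shapiro-Wilk"
  } else if (n <= 1000) {
    "D'Agostino"
  } else {
    "manual review (p-values and Q-Q plot)"
  }
}

groups  <- split(iris$Sepal.Length, iris$Species)
group_n <- vapply(groups, length, integer(1))
chosen  <- vapply(group_n, choose_normality_test, character(1))

# Each iris species has 50 observations, so Shapiro-Wilk applies to all groups.
p_values <- vapply(groups, function(x) shapiro.test(x)$p.value, numeric(1))

# Assumption: normal only if no group rejects normality at the 0.05 level.
is_normal <- all(p_values > 0.05)
print(chosen)
print(p_values)
```

Only the Shapiro-Wilk branch is executed here because it is available in base R through `stats::shapiro.test()`. The other branches are shown as labels only.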
Check Sample Size Adequacy for Chi-Squared Test
Description
This function determines if a contingency table meets the expected frequency assumptions for a valid chi-squared test. It categorizes the data into "not_small", "small", or "very_small" based on sample size and expected frequencies.
Usage
small_test(data, var, group)
Arguments
data |
A data frame containing the variables to be tested. |
var |
A character string specifying the factor variable in `data`. |
group |
A character string specifying the grouping variable in `data`. |
Value
A character string with one of three values:
- "not_small": Sample size greater than or equal to 40 and all expected frequencies greater than or equal to 5
- "small": Sample size greater than or equal to 40, all expected frequencies greater than or equal to 1, and at least one expected frequency <5; applies only to 2×2 contingency tables
- "very_small": All other conditions, including sample size <40 or any expected frequency <1
Examples
df <- data.frame(
category = factor(c("A", "B", "A", "B")),
group = factor(c("X", "X", "Y", "Y"))
)
small_test(data = df, var = "category", group = "group")
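The three categories listed under Value can be reproduced with a short base R sketch that computes expected frequencies from the row and column totals. The function `classify_sample_size()` is a hypothetical illustration of the documented rules and not the package's code.

```r
# Hypothetical re-implementation of the documented classification rules.
classify_sample_size <- function(x, g) {
  tab <- table(x, g)
  n <- sum(tab)
  # Expected count per cell = row total * column total / grand total.
  expected <- outer(rowSums(tab), colSums(tab)) / n
  is_2x2 <- identical(dim(tab), c(2L, 2L))

  if (n >= 40 && all(expected >= 5)) {
    "not_small"
  } else if (n >= 40 && is_2x2 && all(expected >= 1)) {
    "small"
  } else {
    "very_small"
  }
}

# Four observations: sample size is below 40.
tiny <- classify_sample_size(c("A", "B", "A", "B"), c("X", "X", "Y", "Y"))

# 100 observations, 25 per cell: every expected count is 25.
balanced <- classify_sample_size(rep(c("A", "B"), each = 50),
                                 rep(c("X", "Y"), times = 50))

# 40 observations in a 2x2 table with expected counts of 18 and 2.
sparse <- classify_sample_size(c(rep("A", 36), rep("B", 4)),
                               c(rep("X", 18), rep("Y", 18),
                                 rep("X", 2), rep("Y", 2)))
print(c(tiny, balanced, sparse))
```

The three calls fall into the "very_small", "not_small", and "small" categories respectively, one for each branch of the rule.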