Title: | Antigenic Mapping and Antigenic Velocity Algorithm |
Version: | 1.0.0 |
Description: | An implementation of the TopoLow algorithm, a novel, physics-inspired method for antigenic cartography. TopoLow addresses significant challenges in mapping antigenic relationships, especially from sparse and noisy experimental data. The package transforms cross-reactivity and binding affinity measurements into accurate spatial representations in a phenotype space. Key features include: * Robust Mapping from Sparse Data: Effectively creates complete and consistent maps even with high proportions of missing data (e.g., >95%). * Physics-Inspired Optimization: Models antigens as particles connected by springs (for measured interactions) and subject to repulsive forces (for missing interactions), reducing the need for complex gradient computations. * Automatic Dimensionality Detection: Employs a likelihood-based approach to determine the optimal number of dimensions for the antigenic map, avoiding distortions common in methods with fixed low dimensions. * Noise and Bias Reduction: Naturally mitigates experimental noise and bias through its network-based, error-dampening mechanism. * Antigenic Velocity Calculation: Introduces and quantifies "antigenic velocity," a vector that describes the rate and direction of antigenic drift for each pathogen isolate. This can help identify cluster transitions and potential lineage replacements. * Broad Applicability: Analyzes data from various pathogens, including influenza, HIV, and Dengue viruses. It can be applied to any continuous and relational phenotype under directional selection pressure. Methods are described in Arhami and Rohani (2025) <doi:10.1093/bioinformatics/btaf372>. |
License: | BSD_3_clause + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | ggplot2 (≥ 3.4.0), dplyr (≥ 1.1.0), data.table (≥ 1.14.0), reshape2 (≥ 1.4.4), stats, utils, plotly (≥ 4.10.0), Racmacs (≥ 1.1.2), parallel (≥ 4.1.0), coda (≥ 0.19-4), MASS, vegan, filelock, igraph, lhs, umap, gridExtra, scales, Rtsne, ggrepel, rgl (≥ 1.0.0), rlang, ape |
Suggests: | covr, knitr, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
URL: | https://github.com/omid-arhami/topolow |
BugReports: | https://github.com/omid-arhami/topolow/issues |
LazyData: | true |
Depends: | R (≥ 4.1.0) |
NeedsCompilation: | no |
Packaged: | 2025-07-07 16:30:10 UTC; aomid |
Author: | Omid Arhami |
Maintainer: | Omid Arhami <omid.arhami@uga.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-11 12:30:02 UTC |
Latin Hypercube and Adaptive Monte Carlo Sampling Functions
Description
This file contains functions for performing Latin Hypercube and adaptive Monte Carlo sampling in parameter space. The AMC sampling adapts based on previous evaluations to focus sampling in high-likelihood regions. The functions run locally using parallel processing.
Functions handle:
A suite of functions to get an initial estimate of the likelihood space through LHS
Core adaptive sampling algorithm
Likelihood calculations with cross-validation
Distribution updating and resampling
Safe evaluation wrappers
Helper functions for running RACMACS and Topolow during algorithm comparisons.
Core implementations of the TopoLow algorithm for mapping distances in high dimensions. This file contains the main optimization functions and their variants.
Functions for standardizing and preprocessing antigenic assay data from various sources into consistent formats. Handles titer and IC50 measurements, threshold values, and produces both long and matrix formats suitable for mapping.
This file contains functions for assessing model convergence, analyzing chains, and performing diagnostic tests. Functions are designed to be general-purpose and usable with any iterative optimization or sampling procedure.
Functions handle:
Convergence testing
Chain analysis
Statistical diagnostics
Parameter distribution analysis
This file contains functions for calculating error metrics and validation statistics between predicted and true distance matrices. Functions handle missing values and special cases like threshold measurements.
Functions for running parameter optimization, comparison experiments, and other computational experiments.
Functions for comparing different map configurations using Procrustes analysis. These functions help assess:
Statistical significance of differences between maps
Quantitative measures of map differences
Stability of mapping solutions
An implementation of the TopoLow algorithm for antigenic cartography mapping and analysis. The package provides tools for optimizing point configurations in high-dimensional spaces, handling missing and thresholded measurements, processing antigenic assay data, and visualizing antigenic maps.
This file contains functions for transforming data between different formats used in antigenic cartography. Functions handle conversion between:
Long and matrix formats
Distance and titer measurements
Handling of threshold measurements (< and >)
This file contains utility functions used throughout the topolow package. Functions include data manipulation, and format conversion.
This file contains functions for visualizing topolow results including dimension reduction plots and cluster visualizations. Supports multiple plotting methods and customization options.
Functions handle:
Temporal mapping visualizations
Cluster mapping visualizations
2D and 3D projections
Multiple dimension reduction methods
Interactive and static plots
Diagnostic visualizations
Monte Carlo analysis visualizations
Details
The package implements a physics-inspired approach combining spring forces and repulsive interactions to find optimal point configurations. Key features include:
Optimization of point configurations in high-dimensional spaces
Handling of missing and thresholded measurements
Processing of antigenic assay data
Interactive visualization of antigenic maps
Cross-validation and error analysis
Network structure analysis
Support for parallel processing
Main Functions
-
create_topolow_map
: Core optimization algorithm -
process_antigenic_data
: Process raw antigenic data -
initial_parameter_optimization
: Optimize algorithm parameters -
plot_temporal_mapping
: Create temporal visualizations -
plot_cluster_mapping
: Create cluster-based visualizations
Output Files
Functions that generate output files (like parameter optimization results) will create subdirectories in either:
The current working directory (if output_dir = NULL)
A user-specified directory (via output_dir parameter)
The following subdirectories may be created:
model_parameters/: Contains optimization results and parameter evaluations
init_param_optimization/: Contains files and outputs when using initial_parameter_optimization
Citation
If you use this package, please cite: Omid Arhami, Pejman Rohani, Topolow: A mapping algorithm for antigenic cross-reactivity and binding affinity assays, Bioinformatics, 2025;, btaf372, https://doi.org/10.1093/bioinformatics/btaf372 doi:10.1093/bioinformatics/btaf372.
Author(s)
Maintainer: Omid Arhami omid.arhami@uga.edu (ORCID) [copyright holder]
See Also
Useful links:
Useful links:
Useful links:
Useful links:
Useful links:
Useful links:
Useful links:
Useful links:
Useful links:
Useful links:
Useful links:
Perform Adaptive Monte Carlo Sampling
Description
Main function implementing adaptive Monte Carlo sampling to explore parameter space.
Updates sampling distribution based on evaluated likelihoods. This is an internal
function called by run_adaptive_sampling
.
Usage
adaptive_MC_sampling(
samples_file,
distance_matrix,
iterations = 1,
batch_size = 1,
mapping_max_iter,
relative_epsilon,
folds = 20,
num_cores = 1,
scenario_name,
verbose = FALSE
)
Arguments
samples_file |
Path to CSV with initial samples for this job. |
distance_matrix |
Distance matrix to fit |
iterations |
Number of sampling iterations per job |
batch_size |
Samples per iteration (fixed to 1) |
mapping_max_iter |
Maximum map optimization iterations |
relative_epsilon |
Convergence threshold |
folds |
Number of CV folds |
num_cores |
Number of cores for parallel processing |
scenario_name |
Name for output files |
verbose |
Logical. Whether to print progress messages. Default: FALSE |
Value
A data.frame
containing all samples (initial and newly generated)
with their parameters and evaluated performance metrics. The data frame includes
columns for the log-transformed parameters, Holdout_MAE
, and NLL
.
Returns NULL
if the results file was not created.
Add Noise and Bias to Matrix Data
Description
Creates noisy versions of a distance matrix by adding random noise and/or systematic bias. Useful for testing robustness of algorithms to measurement errors and systematic biases.
Usage
add_noise_bias(matrix_data)
Arguments
matrix_data |
Numeric matrix to add noise to |
Details
The function generates three variants of the input matrix:
n1: Matrix with random Gaussian noise
n2: Different realization of random noise
nb: Matrix with both random noise and systematic negative bias
The noise level is scaled relative to the data mean to maintain realistic error magnitudes.
Value
A list
containing three noisy matrix
objects:
n1 |
Matrix with the first realization of random Gaussian noise. |
n2 |
Matrix with a second, different realization of random Gaussian noise. |
nb |
Matrix with both random noise and a systematic negative bias. |
Examples
# Create sample distance matrix
dist_mat <- matrix(runif(100), 10, 10)
dist_mat[lower.tri(dist_mat)] <- t(dist_mat)[lower.tri(dist_mat)]
diag(dist_mat) <- 0
# Generate noisy versions
noisy_variants <- add_noise_bias(dist_mat)
Calculate Network Analysis Metrics
Description
Analyzes the connectivity pattern in a distance matrix by converting it to a network representation. Useful for assessing data completeness and structure.
Usage
analyze_network_structure(distance_matrix)
Arguments
distance_matrix |
Square symmetric matrix of distances |
Value
A list containing the network analysis results:
adjacency |
A logical |
connectivity |
A |
summary |
A list of overall network statistics, including |
Examples
# Create a sample distance matrix
dist_mat <- matrix(runif(25), 5, 5)
# Add row and column names
rownames(dist_mat) <- colnames(dist_mat) <- paste0("Point", 1:5)
dist_mat[lower.tri(dist_mat)] <- t(dist_mat)[lower.tri(dist_mat)]
diag(dist_mat) <- 0
dist_mat[1, 3] <- NA; dist_mat[3, 1] <- NA
# Analyze the network structure
metrics <- analyze_network_structure(dist_mat)
print(metrics$summary$completeness)
Calculate Annual Distance Metrics
Description
Calculates year-over-year antigenic distances and statistics. Compares each point to the mean coordinates of the previous year.
Usage
calculate_annual_distances(df_coords, ndim, na.rm = TRUE)
Arguments
df_coords |
Data frame containing: - V1...Vn coordinate columns - year: Numeric years - name: Point identifiers (will use rownames if missing) |
ndim |
Number of coordinate dimensions |
na.rm |
Logical indicating whether to remove NA values |
Value
A list containing year-over-year antigenic distance metrics:
dist_data |
A |
summary |
A list containing the |
Examples
# Create sample coordinate data
coords <- data.frame(V1 = rnorm(10), V2 = rnorm(10), year = rep(2000:2004, 2))
annual_stats <- calculate_annual_distances(coords, ndim=2)
print(annual_stats$summary$overall_mean)
Calculate Cumulative Distance Metrics
Description
Calculates cumulative distance metrics either from a reference point or between all pairs. Handles both seasonal and year-based analyses.
Usage
calculate_cumulative_distances(
df_coords,
ndim,
reference_row = FALSE,
na.rm = TRUE
)
Arguments
df_coords |
Data frame containing: - V1...Vn coordinate columns - year: Numeric years - season: Character season identifiers. - cluster: Factor cluster assignments - color: Character color codes |
ndim |
Number of coordinate dimensions |
reference_row |
Integer index of reference row (or FALSE for all-pairs analysis) |
na.rm |
Logical indicating whether to remove NA values |
Value
A list containing the calculated distance metrics. The content of the list depends
on the reference_row
parameter.
If
reference_row
is specified, the list containssummary_data
: adata.frame
with distances from the specified reference point to all other points, summarized by season and cluster. The columns includeseason_num
,cluster
,color
,avg_euclidean_dist
,count
,total_count
, andfraction
.If
reference_row
isFALSE
, the list containsdist_data
: adata.frame
with all unique pairwise distances. The columns includeyear_diff
,euclidean_dist
, andref_year
.
Examples
# Create sample coordinate data
coords <- data.frame(V1 = rnorm(10), V2 = rnorm(10), year = rep(2000:2004, 2),
season = paste0(rep(2000:2004, 2), "-S"),
cluster = factor(rep(1:2, 5)), color = "blue")
# Calculate distances from reference point
ref_distances <- calculate_cumulative_distances(coords, ndim=2, reference_row=1)
# Calculate all pairwise distances
all_distances <- calculate_cumulative_distances(coords, ndim=2, reference_row=FALSE)
Calculate Adaptive Monte Carlo Sampling Diagnostics
Description
Calculates standard Adaptive Monte Carlo Sampling diagnostics including R-hat (potential scale reduction) and effective sample size for multiple chains. Can be used with any iterative sampling or optimization procedure that produces chain-like output.
Usage
calculate_diagnostics(chain_files, mutual_size = 500)
Arguments
chain_files |
Character vector of paths to CSV files containing chains |
mutual_size |
Integer number of samples to use from end of each chain |
Value
A list object of class topolow_amcs_diagnostics
containing convergence diagnostics for the MCMC chains.
rhat |
A numeric vector of the R-hat (potential scale reduction factor) statistic for each parameter. Values close to 1 indicate convergence. |
ess |
A numeric vector of the effective sample size for each parameter. |
chains |
A list of data frames, where each data frame is a cleaned and trimmed MCMC chain. |
param_names |
A character vector of the parameter names being analyzed. |
mutual_size |
The integer number of samples used from the end of each chain for calculations. |
Examples
# This example demonstrates how to use the function with temporary files,
# Create dummy chain files in a temporary directory
temp_dir <- tempdir()
chain_files <- character(3)
par_names <- c("log_N", "log_k0", "log_cooling_rate", "log_c_repulsion")
sample_data <- data.frame(
log_N = rnorm(100), log_k0 = rnorm(100),
log_cooling_rate = rnorm(100), log_c_repulsion = rnorm(100),
NLL = runif(100), Holdout_MAE = runif(100)
)
for (i in 1:3) {
chain_files[i] <- file.path(temp_dir, paste0("chain", i, ".csv"))
write.csv(sample_data, chain_files[i], row.names = FALSE)
}
# Calculate diagnostics
diag_results <- calculate_diagnostics(chain_files, mutual_size = 50)
print(diag_results)
# Clean up the temporary files and directory
unlink(chain_files)
unlink(temp_dir, recursive = TRUE)
Calculate prediction interval for distance estimates
Description
Computes prediction intervals for the estimated distances based on residual variation between true and predicted values.
Usage
calculate_prediction_interval(
distance_matrix,
p_dist_mat,
confidence_level = 0.95
)
Arguments
distance_matrix |
Matrix of true distances |
p_dist_mat |
Matrix of predicted distances |
confidence_level |
Confidence level for interval (default: 0.95) |
Value
A single numeric value representing the margin of error for the prediction interval.
Calculate Procrustes Difference Between Maps
Description
Computes the quantitative difference between two maps using Procrustes analysis. The difference is calculated as the sum of squared differences after optimal rotation and scaling.
Usage
calculate_procrustes_difference(map1, map2)
Arguments
map1 |
Data frame with coordinates from first map (must have X, X.1 columns) |
map2 |
Data frame with coordinates from second map (must have X, X.1 columns) |
Value
A single numeric value representing the sum of squared differences after Procrustes transformation.
Examples
# Create sample map data
map1 <- data.frame(name = letters[1:10], X = rnorm(10), X.1 = rnorm(10))
map2 <- data.frame(name = letters[1:10], X = rnorm(10), X.1 = rnorm(10))
diff <- calculate_procrustes_difference(map1, map2)
Calculate Statistical Significance Between Maps Using Procrustes Analysis
Description
Performs Procrustes analysis between two maps and calculates statistical significance of their differences using permutation tests. Handles common data cleaning steps like removing missing values and ensuring comparable point sets.
Usage
calculate_procrustes_significance(map1, map2)
Arguments
map1 |
Data frame with coordinates from first map (must have X, X.1 columns) |
map2 |
Data frame with coordinates from second map (must have X, X.1 columns) |
Value
A single numeric p-value from the Procrustes permutation test.
Examples
# Create sample map data
map1 <- data.frame(name = letters[1:10], X = rnorm(10), X.1 = rnorm(10))
map2 <- data.frame(name = letters[1:10], X = rnorm(10), X.1 = rnorm(10))
p_val <- calculate_procrustes_significance(map1, map2)
Calculate Weighted Marginal Distributions
Description
Calculates marginal distributions for each parameter with weights derived from log-likelihoods.
Usage
calculate_weighted_marginals(samples)
Arguments
samples |
Data frame containing: - log_N, log_k0, log_cooling_rate, log_c_repulsion: Parameter columns - NLL: Negative log-likelihood column |
Details
Uses kernel density estimation weighted by normalized likelihoods.
Value
Named list of marginal distributions, each containing:
x |
Vector of parameter values |
y |
Vector of density estimates |
Check Multivariate Gaussian Convergence
Description
Assesses convergence of multivariate samples by monitoring changes in mean vector and covariance matrix over a sliding window. Useful for checking stability of parameter distributions in optimization or sampling.
Usage
check_gaussian_convergence(data, window_size = 300, tolerance = 0.01)
Arguments
data |
Matrix or data frame of samples where columns are parameters |
window_size |
Integer size of sliding window for statistics |
tolerance |
Numeric convergence threshold for relative changes |
Value
An object of class topolow_convergence
containing diagnostics about the convergence of the multivariate samples. This list includes:
converged |
A logical flag, |
mean_converged |
A logical flag, |
cov_converged |
A logical flag, |
final_mean |
The mean vector calculated from the last |
final_cov |
The covariance matrix calculated from the last |
mean_history |
A matrix tracking the history of the running mean of each parameter. |
cov_changes |
A numeric vector of the relative changes in the Frobenius norm of the covariance matrix. |
param_names |
The names of the parameters (columns) from the input data. |
Examples
# Assuming 'chain_data' is a data frame with samples
chain_data <- as.data.frame(matrix(rnorm(500 * 4), ncol = 4))
colnames(chain_data) <- c("log_N", "log_k0", "log_cooling_rate", "log_c_repulsion")
conv_results <- check_gaussian_convergence(chain_data)
print(conv_results) # Shows summary
# The plot method for this object would create convergence plots.
# plot(conv_results)
Clean Data by Removing MAD-based Outliers
Description
Removes outliers from numeric data using the Median Absolute Deviation method. Outliers are replaced with NA values. This function is particularly useful for cleaning parameter tables where each column may contain outliers.
Usage
clean_data(x, k = 3, take_log = FALSE)
Arguments
x |
Numeric vector to clean |
k |
Numeric threshold for outlier detection (default: 3) |
take_log |
Logical. Whether to log transform data before outlier detection (default: FALSE) |
Value
A numeric vector of the same length as x
, where detected outliers have been replaced with NA
.
See Also
detect_outliers_mad
for the underlying outlier detection
Examples
# Clean parameter values
params <- c(0.01, 0.012, 0.011, 0.1, 0.009, 0.011, 0.15)
clean_params <- clean_data(params)
# Clean multiple parameter columns
param_table <- data.frame(
k0 = runif(100),
cooling_rate = runif(100),
c_repulsion = runif(100)
)
clean_table <- as.data.frame(lapply(param_table, clean_data))
Color Palettes
Description
Predefined color palettes optimized for visualization
Usage
c25
c25_claud
c25_old
c25_older
Format
An object of class character
of length 20.
An object of class character
of length 24.
An object of class character
of length 25.
An object of class character
of length 25.
Convert coordinates to distance matrix
Description
Calculates pairwise Euclidean distances between points in coordinate space
Usage
coordinates_to_matrix(positions)
Arguments
positions |
Matrix of coordinates where rows are points and columns are dimensions; It can be a matrix or a data frame. |
Value
A symmetric matrix
of pairwise Euclidean distances between points.
Create and Optimize a RACMACS Map
Description
Creates and optimizes an antigenic map using the RACMACS package and keeps the best optimization result. This function wraps common RACMACS functionality to provide a simplified interface for map creation and optimization.
Usage
create_and_optimize_RACMACS_map(
titer_table,
dim = 2,
optimization_number = 400,
output_file = NULL,
num_cores = 1
)
Arguments
titer_table |
Matrix or data frame of titer measurements. |
dim |
Integer number of dimensions for the map (default: 2). |
optimization_number |
Integer number of optimization runs (default: 400). |
output_file |
Character. An optional, full path (including filename and extension) where the map coordinates will be saved as a CSV file. If NULL (the default), the coordinates are not saved to a file. |
num_cores |
Integer number of cores to use for parallel optimization (default: 1). |
Value
A racmap
object from the Racmacs
package, containing the optimized coordinates and other map data.
Examples
# Create a dummy titer table for the example
ag_names <- paste("V", 1:5)
sr_names <- paste("S", 1:4)
titer_table <- matrix(
sample(c(10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120), 20, replace = TRUE),
nrow = length(ag_names), ncol = length(sr_names),
dimnames = list(ag_names, sr_names)
)
# Create and optimize map without saving coordinates
map_obj <- create_and_optimize_RACMACS_map(titer_table)
# Create map and save coordinates to a temporary file.
# tempfile() creates a path in the session's temporary directory.
temp_coords_file <- tempfile(fileext = ".csv")
map_obj_saved <- create_and_optimize_RACMACS_map(
titer_table,
dim = 3,
optimization_number = 100,
output_file = temp_coords_file
)
# Check that the file was created
file.exists(temp_coords_file)
# Clean up the temporary file
unlink(temp_coords_file)
Create Base Theme
Description
Creates a ggplot2 theme based on aesthetic and layout configurations.
Usage
create_base_theme(aesthetic_config, layout_config)
Arguments
aesthetic_config |
Aesthetic configuration object |
layout_config |
Layout configuration object |
Value
ggplot2 theme object
Create Cross-validation Folds for Distance Matrix
Description
Creates k-fold cross-validation splits of a distance matrix while maintaining symmetry. Each fold has a training matrix with some values masked for validation.
Usage
create_cv_folds(
truth_matrix,
no_noise_truth = NULL,
n_folds = 10,
random_seed = NULL
)
Arguments
truth_matrix |
Matrix of true distances |
no_noise_truth |
Optional matrix of noise-free distances. If provided, used as truth. |
n_folds |
Integer number of folds to create |
random_seed |
Integer random seed for reproducibility |
Value
A list of length n_folds
. Each element is a list containing two matrices:
truth |
The truth matrix for that fold. |
train |
The training matrix with some values replaced by |
Examples
# Create a sample distance matrix
dist_matrix <- matrix(runif(100), 10, 10)
diag(dist_matrix) <- 0
# Create 5-fold CV splits
folds <- create_cv_folds(dist_matrix, n_folds = 5, random_seed = 123)
Create Diagnostic Plots for Multiple Chains
Description
Creates trace and density plots for multiple Adaptive Monte Carlo Sampling or optimization chains to assess convergence and mixing. Displays parameter trajectories and distributions across chains.
Usage
create_diagnostic_plots(
chain_files,
mutual_size = 2000,
output_file = "diagnostic_plots.png",
output_dir,
save_plot = FALSE,
width = 3000,
height = 3000,
res = 300
)
Arguments
chain_files |
Character vector of paths to CSV files containing chain data |
mutual_size |
Integer number of samples to use from end of each chain |
output_file |
Character path for saving plot. Required if |
output_dir |
Character. Directory for output files. Required if |
save_plot |
Logical. Whether to save plots to files. Default: FALSE |
width , height , res |
Plot dimensions and resolution for saving |
Value
A ggplot object of the combined plots.
Examples
# This example uses sample data files included with the package.
chain_files <- c(
system.file("extdata", "diag_chain1.csv", package = "topolow"),
system.file("extdata", "diag_chain2.csv", package = "topolow"),
system.file("extdata", "diag_chain3.csv", package = "topolow")
)
# Only run the example if the files are found
if (all(nzchar(chain_files))) {
# Create diagnostic plot without saving to a file
create_diagnostic_plots(chain_files, mutual_size = 2, save_plot = FALSE)
}
Main TopoLow algorithm implementation
Description
TopoLow (Topological Optimization for Low-Dimensional Mapping) optimizes point positions in n-dimensional space to match a target distance matrix. The algorithm uses a physics-inspired approach with spring and repulsive forces to find optimal point configurations while handling missing and thresholded measurements.
Usage
create_topolow_map(
distance_matrix,
ndim,
mapping_max_iter,
k0,
cooling_rate,
c_repulsion,
relative_epsilon = 1e-04,
convergence_counter = 5,
initial_positions = NULL,
write_positions_to_csv = FALSE,
output_dir,
verbose = FALSE
)
Arguments
distance_matrix |
Matrix. Square, symmetric distance matrix. Can contain NA values for missing measurements and character strings with < or > prefixes for thresholded measurements. |
ndim |
Integer. Number of dimensions for the embedding space. |
mapping_max_iter |
Integer. Maximum number of map optimization iterations. |
k0 |
Numeric. Initial spring constant controlling spring forces. |
cooling_rate |
Numeric. Rate of spring constant decay per iteration (0 < cooling_rate < 1). |
c_repulsion |
Numeric. Repulsion constant controlling repulsive forces. |
relative_epsilon |
Numeric. Convergence threshold for relative change in error. Default is 1e-4. |
convergence_counter |
Integer. Number of iterations below threshold before declaring convergence. Default is 5. |
initial_positions |
Matrix or NULL. Optional starting coordinates. If NULL, random initialization is used. Matrix should have nrow = nrow(distance_matrix) and ncol = ndim. |
write_positions_to_csv |
Logical. Whether to save point positions to CSV file. Default is FALSE |
output_dir |
Character. Directory to save CSV file. Required if |
verbose |
Logical. Whether to print progress messages. Default is TRUE. |
Details
The algorithm iteratively updates point positions using:
Spring forces between points with measured distances
Repulsive forces between points without measurements
Modified forces for thresholded measurements (< or >)
Adaptive spring constant that decays over iterations
Convergence monitoring based on relative error change
Valid parameter ranges and constraints:
ndim: Positive integer, typically 2-20.
k0: Initial spring constant, positive numeric > 0. Typical range: 0.1-30 Controls initial force strength
cooling_rate: Spring and repulsion decay rate, numeric between 0 and 1. Typical range: 0.0001-0.1 Controls how quickly spring forces weaken
c_repulsion: Repulsion constant, positive numeric > 0. Typical range: 0.00001-0.1 Controls strength of repulsive forces
relative_epsilon: Positive numeric, typically 1e-9 to 1e-3 Smaller values require more iterations but give higher precision
convergence_counter: Positive integer, typically 5-20 Higher values do not necessarily lead to a better convergence
Value
A list
object of class topolow
. This list contains the results of the
optimization and includes the following components:
-
positions
: Amatrix
of the optimized point coordinates in the n-dimensional space. -
est_distances
: Amatrix
of the Euclidean distances between points in the final optimized configuration. -
mae
: The final Mean Absolute Error between the target distances and the estimated distances. -
iter
: The total number of iterations performed before the algorithm terminated. -
parameters
: Alist
containing the input parameters used for the optimization run. -
convergence
: Alist
containing the final convergence status, including a logicalachieved
flag and the finalerror
value.
Examples
# Create a simple distance matrix
dist_mat <- matrix(c(0, 2, 3, 2, 0, 4, 3, 4, 0), nrow=3)
# Run TopoLow in 2D without writing to a file
result <- create_topolow_map(dist_mat, ndim=2, mapping_max_iter=100,
k0=1.0, cooling_rate=0.001, c_repulsion=0.01, verbose=FALSE)
# results
head(result$positions)
Dengue Virus (DENV) Titer Data
Description
A dataset containing neutralization titer data for Dengue virus. This data can be used to create antigenic maps and explore the antigenic relationships between different DENV strains.
Usage
denv_data
Format
A data frame with the following columns:
- virus_strain
Character, the name of the virus strain.
- serum_strain
Character, the name of the serum strain.
- titer
Character, the neutralization titer value. May include values like '<10' or '>1280'.
- virusYear
Numeric, the year the virus was isolated.
- serumYear
Numeric, the year the serum was collected.
- cluster
Factor, the cluster or serotype assignment for the strains.
- color
Character, a color associated with the cluster for plotting.
Source
Katzelnick, L.C., et al. (2019). An antigenically diverse, representative panel of dengue viruses for neutralizing antibody discovery and vaccine evaluation. eLife. doi:10.7554/eLife.42496
Detect Outliers Using Median Absolute Deviation
Description
Detects outliers in numeric data using the Median Absolute Deviation (MAD) method. This robust method is less sensitive to extreme values than standard deviation and works well for non-normally distributed data.
Usage
detect_outliers_mad(data, k = 3, take_log = FALSE)
Arguments
data |
Numeric vector of values to analyze |
k |
Numeric threshold for outlier detection (default: 3). Points more than k MADs from median are considered outliers. |
take_log |
Logical. Whether to log transform data before (and only for) outlier detection (default: FALSE) |
Details
The function:
Calculates median and MAD of the data
Uses scaled MAD (constant = 1.4826) for normal distribution consistency
Identifies points > k MADs from median as outliers
Returns both outlier mask and summary statistics
MAD scaling constant 1.4826 is calculated as 1/Phi^(-1)(3/4), where Phi is the standard normal CDF. This makes MAD consistent with standard deviation for normal distributions.
Value
A list containing:
outlier_mask |
Logical vector indicating outliers |
stats |
List containing:
|
Convert Distance Matrix to Titer Panel Format
Description
Converts a distance matrix to a titer panel format, handling threshold measurements and logarithmic transformations common in antigenic cartography. The function identifies reference points (typically antisera) and challenge points (typically antigens) based on row/column name prefixes.
Usage
dist_to_titer_table(input_matrix, base = exp(1), tens = 1)
Arguments
input_matrix |
Matrix of distances, with row/column names prefixed with "V/" for antigens and "S/" for sera |
base |
Numeric. Base for logarithmic transformation. Default exp(1). For HI Assay 2 |
tens |
Numeric. Scaling factor for final titers. Default 1. For HI Assay 10 |
Details
The function:
Identifies antigen and serum entries from matrix row/column names
Creates titer table from antigen-serum pairs
Handles threshold indicators (< and >) in distance values
Applies appropriate transformations to convert distances to titers
Transformation steps:
Extract numeric values from thresholded measurements
Convert distances to titers via logarithmic transformation
Apply scaling factor
Reapply threshold indicators to transformed values
Value
A matrix
of titers with:
Rows corresponding to antigen strains (without "V/" prefix)
Columns corresponding to antisera (without "S/" prefix)
Values as character strings including threshold indicators where applicable
-
NA
values replaced with "*"
Examples
# Create sample distance matrix
dist_mat <- matrix(c(0, 2, ">3", 2, 0, 4, "3", 4, 0), nrow=3)
rownames(dist_mat) <- c("V/strain1", "V/strain2", "S/serum1")
colnames(dist_mat) <- c("V/strain1", "V/strain2", "S/serum1")
# Convert to titer panel
titer_panel <- dist_to_titer_table(dist_mat)
Calculate comprehensive error metrics between predicted and true distances
Description
Computes various error metrics including in-sample and out-of-sample errors, and Completeness statistics for model evaluation.
Usage
error_calculator_comparison(p_dist_mat, truth_matrix, input_matrix)
Arguments
p_dist_mat |
Matrix of predicted distances |
truth_matrix |
Matrix of true distances |
input_matrix |
Matrix of input distances (may contain NAs and is used to find the NAs' pattern) |
Details
Input requirements and constraints:
Matrices must have matching dimensions
Row and column names must be consistent between matrices
NAs are allowed and handled appropriately
Threshold indicators (< or >) in input matrix are processed correctly
Value
A list containing:
report_df |
A |
Completeness |
A single numeric value representing the completeness statistic, which is the fraction of validation points for which a prediction could be made. |
Example Antigenic Mapping Data
Description
HI titers of Influenza antigens and antisera published in Smith et al., 2004 were used to find the antigenic relationships and coordinates of the antigens. It can be used for mapping. The data captures how different influenza virus strains (antigens) react with antisera from infected individuals.
Usage
example_positions
Format
A data frame with 285 rows and 11 variables:
- V1
First dimension coordinate from 5D mapping
- V2
Second dimension coordinate from 5D mapping
- V3
Third dimension coordinate from 5D mapping
- V4
Fourth dimension coordinate from 5D mapping
- V5
Fifth dimension coordinate from 5D mapping
- name
Strain identifier
- antigen
Logical; TRUE if point represents an antigen
- antiserum
Logical; TRUE if point represents an antiserum
- cluster
Factor indicating antigenic cluster assignment (A/H3N2 1968-2003)
- color
Color assignment for visualization
- year
Year of strain isolation
Source
Smith et al., 2004
Find Mode of Density Distribution
Description
Calculates the mode (maximum point) of a kernel density estimate.
Usage
find_mode(density)
Arguments
density |
List containing density estimate with components:
|
Value
Numeric value of the mode
Generate Complex High-Dimensional Data for Testing
Description
Generates synthetic high-dimensional data with clusters and trends for testing dimensionality reduction methods. Creates data with specified properties:
Multiple clusters along a trend line
Variable density regions
Controllable noise levels
Optional visualization
The function generates cluster centers along a trend line, adds points around those centers with specified spread, and incorporates random noise to create high and low density areas. The data is useful for testing dimensionality reduction and visualization methods.
Usage
generate_complex_data(
n_points = 500,
n_dim = 10,
n_clusters = 4,
cluster_spread = 1,
fig_name = NA
)
Arguments
n_points |
Integer number of points to generate |
n_dim |
Integer number of dimensions |
n_clusters |
Integer number of clusters |
cluster_spread |
Numeric controlling cluster variance |
fig_name |
Character path to save visualization (optional) |
Value
A data.frame
with n_points
rows and n_dim
columns. Column names
are "Dim1" through "DimN" where N is n_dim.
Examples
# Generate basic dataset
data <- generate_complex_data(n_points = 500, n_dim = 10,
n_clusters = 4, cluster_spread = 1)
# The function returns a data frame, which can be inspected
head(data)
Generate New Parameter Samples Using KDE
Description
Generates new parameter samples using weighted kernel density estimation for each parameter independently.
Usage
generate_kde_samples(samples, n, epsilon = 0)
Arguments
samples |
Data frame of previous samples with parameters and NLL |
n |
Integer number of samples to generate |
epsilon |
Numeric probability of wider bandwidth sampling |
Value
Data frame of n new samples
Generate Synthetic Distance Matrices with Missing Data
Description
Creates synthetic distance matrices with controlled levels of missingness and noise
for testing and validating mapping algorithms. Generates multiple datasets with
different dimensionalities and missingness patterns. If output_dir
is provided,
the generated datasets are saved as RDS files.
Usage
generate_synthetic_datasets(
n_dims_list,
seeds,
n_points,
missingness_levels = list(S = 0.67, M = 0.77, L = 0.87),
output_dir = NULL,
prefix = "sim",
save_plots = FALSE
)
Arguments
n_dims_list |
Numeric vector of dimensions to generate data for |
seeds |
Integer vector of random seeds (same length as n_dims_list) |
n_points |
Integer number of points to generate |
missingness_levels |
Named list of missingness percentages (default: list(S=0.67, M=0.77, L=0.87)) |
output_dir |
Character path to directory for saving outputs. If NULL (the default), no files are saved. |
prefix |
Character string to prefix output files (optional) |
save_plots |
Logical whether to save network visualization plots. Requires |
Value
A list containing the generated synthetic data and metadata:
matrices |
A list of generated symmetric distance matrices for each dimension. |
panels |
A list of generated assay panels (non-symmetric matrices) for each dimension. |
metadata |
A |
Examples
# Generate datasets without saving to disk
results <- generate_synthetic_datasets(
n_dims_list = c(2, 3),
seeds = c(123, 456),
n_points = 50
)
# Generate datasets and save to a temporary directory
temp_out_dir <- tempdir()
results_saved <- generate_synthetic_datasets(
n_dims_list = c(2),
seeds = c(123),
n_points = 10,
missingness_levels = list(low=0.5, high=0.8),
output_dir = temp_out_dir,
save_plots = TRUE
)
list.files(temp_out_dir)
# Clean up the directory
unlink(temp_out_dir, recursive = TRUE)
Generate unique string identifiers with year suffix
Description
Generate unique string identifiers with year suffix
Usage
generate_unique_string(n, length = 8, lower_bound = 1, upper_bound = 20)
Arguments
n |
Number of strings to generate |
length |
Length of random part of string (default: 8) |
lower_bound |
Lower bound for year suffix (default: 1) |
upper_bound |
Upper bound for year suffix (default: 20) |
Value
A character vector of unique strings with year suffixes
Create Grid Around Maximum Likelihood Estimate
Description
Generates a sequence of values centered on the maximum likelihood estimate (MLE) of a parameter, extending by specified factors in each direction.
Usage
get_grid(samples, param, num_points, start_factor, end_factor)
Arguments
samples |
Data frame of MCMC samples with NLL column |
param |
Character name of parameter column |
num_points |
Integer number of points in grid |
start_factor |
Numeric factor for lower bound relative to MLE |
end_factor |
Numeric factor for upper bound relative to MLE |
Value
Numeric vector of grid points
Save ggplot with white background
Description
Wrapper around ggplot2::ggsave that ensures white background. This function masks ggplot2::ggsave.
Usage
ggsave_white_bg(..., bg = "white")
Arguments
... |
Other arguments passed on to the graphics device function,
as specified by |
bg |
Background colour. If |
Value
No return value, called for side effects.
Sigmoid transform function for threshold handling
Description
Helper function that implements a sigmoid transformation used in the Smith variant for smooth threshold handling.
Usage
gmultiple(x)
Arguments
x |
Numeric. Input value |
Value
Numeric. Transformed value in (0,1) range
H3N2 Influenza HI Assay Data from Smith et al. 2004
Description
Hemagglutination inhibition (HI) assay data for influenza A/H3N2 viruses spanning 35 years of evolution.
Usage
h3n2_data
Format
A data frame with the following variables:
- virusStrain
Character. Virus strain identifier
- serumStrain
Character. Antiserum strain identifier
- titer
Numeric. HI assay titer value
- virusYear
Numeric. Year virus was isolated
- serumYear
Numeric. Year serum was collected
- cluster
Factor. Antigenic cluster assignment
- color
Character. Color code for visualization
Source
Smith et al. (2004) Science, 305(5682), 371-376.
HIV Neutralization Assay Data
Description
IC50 neutralization measurements between HIV viruses and antibodies.
Usage
hiv_titers
Format
A data frame with the following variables:
- Antibody
Character. Antibody identifier
- Virus
Character. Virus strain identifier
- IC50
Numeric. IC50 neutralization value
Source
Los Alamos HIV Database (https://www.hiv.lanl.gov/)
HIV Virus Metadata
Description
Reference information for HIV virus strains used in neutralization assays.
Usage
hiv_viruses
Format
A data frame with the following variables:
- Virus.name
Character. Virus strain identifier
- Country
Character. Country of origin
- Subtype
Character. HIV subtype
- Year
Numeric. Year of isolation
Source
Los Alamos HIV Database (https://www.hiv.lanl.gov/)
Increase Missing Values in a Matrix
Description
Strategically introduces NA values into a distance matrix while maintaining symmetry. New NA values are added preferentially farther from the diagonal to simulate real-world measurement patterns where distant pairs are more likely to be unmeasured.
Usage
increase_na_percentage(mat, target_na_percentage)
Arguments
mat |
Matrix to modify |
target_na_percentage |
Numeric between 0 and 1 specifying desired proportion of NAs |
Details
The function:
Calculates needed additional NAs to reach target percentage
Creates probability matrix favoring off-diagonal elements
Randomly selects positions weighted by distance from diagonal
Maintains matrix symmetry by mirroring NAs
Value
A matrix
with an increased number of NA
values, maintaining symmetry.
Examples
# Create sample distance matrix
dist_mat <- matrix(runif(100), 10, 10)
dist_mat[lower.tri(dist_mat)] <- t(dist_mat)[lower.tri(dist_mat)]
diag(dist_mat) <- 0
# Increase NAs to 70%
sparse_mat <- increase_na_percentage(dist_mat, 0.7)
Run Parameter Optimization Via Latin Hypercube Sampling
Description
Performs parameter optimization using Latin Hypercube Sampling (LHS) combined with k-fold cross-validation. Parameters are sampled from specified ranges using maximin LHS design to ensure good coverage of parameter space. Each parameter set is evaluated using k-fold cross-validation to assess prediction accuracy. To calculate one NLL per set of parameters, the function uses a pooled errors approach which combine all validation errors into one set, then calculate a single NLL. This approach has two main advantages: 1- It treats all validation errors equally, respecting the underlying error distribution assumption 2- It properly accounts for the total number of validation points
Usage
initial_parameter_optimization(
distance_matrix,
mapping_max_iter = 1000,
relative_epsilon,
convergence_counter,
scenario_name,
N_min,
N_max,
k0_min,
k0_max,
c_repulsion_min,
c_repulsion_max,
cooling_rate_min,
cooling_rate_max,
num_samples = 20,
max_cores = NULL,
folds = 20,
verbose = FALSE,
write_files = FALSE,
output_dir
)
Arguments
distance_matrix |
Matrix or data frame. Input distance matrix. Must be square and symmetric. Can contain NA values for missing measurements. |
mapping_max_iter |
Integer. Maximum number of optimization iterations. |
relative_epsilon |
Numeric. Convergence threshold for relative change in error. |
convergence_counter |
Integer. Number of iterations below threshold before declaring convergence. |
scenario_name |
Character. Name for output files and job identification. |
N_min , N_max |
Integer. Range for number of dimensions parameter. |
k0_min , k0_max |
Numeric. Range for initial spring constant parameter. |
c_repulsion_min , c_repulsion_max |
Numeric. Range for repulsion constant parameter. |
cooling_rate_min , cooling_rate_max |
Numeric. Range for spring decay parameter. |
num_samples |
Integer. Number of LHS samples to generate (default: 20). |
max_cores |
Integer. Maximum number of cores to use for parallel processing. If NULL, uses all available cores minus 1 (default: NULL). |
folds |
Integer. Number of cross-validation folds. Default: 20. |
verbose |
Logical. Whether to print progress messages. Default: FALSE. |
write_files |
Logical. Whether to save results to CSV. Default: FALSE. |
output_dir |
Character. Directory where output files will be saved.
Required if |
Details
The function performs these steps:
Generates LHS samples in parameter space
Creates k-fold splits of input data
For each parameter set and fold:
Trains model on training set
Evaluates on validation set
Calculates MAE and negative log likelihood
Computations are run locally in parallel.
Parameters ranges are transformed to log scale where appropriate to handle different scales effectively.
Value
A data.frame
containing the parameter sets and their performance metrics
(Holdout_MAE
and NLL
). The columns of the data frame are N
, k0
,
cooling_rate
, c_repulsion
, Holdout_MAE
, and NLL
.
If write_files
is TRUE
, this data frame is also saved to a CSV file as a side effect.
See Also
create_topolow_map
for the core optimization algorithm
Examples
# This example is wrapped in \donttest{} because it can exceed 5 seconds,
# 1. Create a structured, synthetic dataset for the example
# Generate coordinates for a more realistic test case
synth_coords <- generate_complex_data(n_points = 20, n_dim = 3)
# Convert coordinates to a distance matrix
dist_mat <- coordinates_to_matrix(synth_coords)
# 2. Run the optimization on the synthetic data
# ensuring it passes CRAN's automated checks.
results <- initial_parameter_optimization(
distance_matrix = dist_mat,
mapping_max_iter = 100,
relative_epsilon = 1e-3,
convergence_counter = 2,
scenario_name = "test_opt_synthetic",
N_min = 2, N_max = 5,
k0_min = 1, k0_max = 10,
c_repulsion_min = 0.001, c_repulsion_max = 0.05,
cooling_rate_min = 0.001, cooling_rate_max = 0.02,
num_samples = 4,
max_cores = 2,
verbose = FALSE
)
print(results)
Evaluate Likelihood with Cross-Validation
Description
Calculates cross-validated likelihood for a set of parameters by:
Splitting data into training/validation sets
Fitting model on training data
Evaluating likelihood on validation set
Repeating across folds To calculate one NLL per set of parameters, the function uses a pooled errors approach which combines all validation errors into one set, then calculate a single NLL. This approach has two main advantages: 1- It treats all validation errors equally, respecting the underlying error distribution assumption 2- It properly accounts for the total number of validation points
Usage
likelihood_function(
distance_matrix,
mapping_max_iter,
relative_epsilon,
N,
k0,
cooling_rate,
c_repulsion,
folds = 20,
num_cores = 1
)
Arguments
distance_matrix |
Distance matrix to fit |
mapping_max_iter |
Maximum map optimization iterations |
relative_epsilon |
Convergence threshold |
N |
Number of dimensions |
k0 |
Initial spring constant |
cooling_rate |
Spring constant decay rate |
c_repulsion |
Repulsion constant |
folds |
Number of CV folds |
num_cores |
Number of cores for parallel processing |
Value
List with:
Holdout_MAE |
Mean absolute error on validation data |
NLL |
Negative log likelihood |
Log Transform Parameter Samples
Description
Reads samples from a CSV file and log transforms specific parameters (N, k0, cooling_rate, c_repulsion)
if they exist in the data. If output_file
is specified, the transformed data is saved. Otherwise,
the transformed data frame is returned.
Usage
log_transform_parameters(samples_file, output_file = NULL)
Arguments
samples_file |
Character. Path to CSV file containing samples. |
output_file |
Character. Optional path (including filename) for saving transformed data as a CSV. If NULL (the default), the function returns the transformed data frame without writing a file. |
Value
A data.frame
with log-transformed parameters. If output_file
is specified, the function also
writes the data frame to the specified path and returns it invisibly.
Examples
# This example uses a sample file included with the package.
sample_file <- system.file("extdata", "sample_params.csv", package = "topolow")
# Ensure the file exists before running the example
if (nzchar(sample_file)) {
# Transform the data from the sample file and return as a data frame
transformed_data <- log_transform_parameters(sample_file, output_file = NULL)
# Display the first few rows of the transformed data
print(head(transformed_data))
}
Convert Long Format Data to Distance Matrix
Description
Converts a dataset from long format to a symmetric distance matrix. The function handles antigenic cartography data where measurements may exist between antigens and antisera points. Row and column names can be optionally sorted by a time variable.
Usage
long_to_matrix(
data,
chnames,
chorder = NULL,
rnames,
rorder = NULL,
values_column,
rc = FALSE,
sort = FALSE
)
Arguments
data |
Data frame in long format |
chnames |
Character. Name of column holding the challenge point names. |
chorder |
Character. Optional name of column for challenge point ordering. |
rnames |
Character. Name of column holding reference point names. |
rorder |
Character. Optional name of column for reference point ordering. |
values_column |
Character. Name of column containing distance/difference values. It should be from the nature of "distance" (e.g., antigenic distance or IC50), not "similarity" (e.g., HI Titer.) |
rc |
Logical. If TRUE, reference points are treated as a subset of challenge points. If FALSE, they are treated as distinct sets. Default is FALSE. |
sort |
Logical. Whether to sort rows/columns by chorder/rorder. Default FALSE. |
Details
The function expects data in long format with at least three columns:
A column for challenge point names
A column for reference point names
A column containing the distance/difference values
Optionally, ordering columns can be provided to sort the output matrix. The 'rc' parameter determines how to handle shared names between references and challenges.
Value
A symmetric matrix
of distances with row and column names corresponding
to the unique points in the data. NA
values represent unmeasured pairs.
Examples
data <- data.frame(
antigen = c("A", "B", "A"),
serum = c("X", "X", "Y"),
distance = c(2.5, 1.8, 3.0),
year = c(2000, 2001, 2000)
)
# Basic conversion
mat <- long_to_matrix(data,
chnames = "antigen",
rnames = "serum",
values_column = "distance")
# With sorting by year
mat_sorted <- long_to_matrix(data,
chnames = "antigen",
chorder = "year",
rnames = "serum",
rorder = "year",
values_column = "distance",
sort = TRUE)
Create Interactive Plot
Description
Converts a static ggplot visualization to an interactive plotly visualization with customizable tooltips and interactive features.
Usage
make_interactive(plot, tooltip_vars = NULL)
Arguments
plot |
ggplot object to convert |
tooltip_vars |
Vector of variable names to include in tooltips |
Details
The function enhances static plots by adding:
Hover tooltips with data values
Zoom capabilities
Pan capabilities
Click interactions
Double-click to reset
If tooltip_vars is NULL, the function attempts to automatically determine relevant variables from the plot's mapping.
Value
A plotly
object with interactive features.
Examples
if (interactive() && requireNamespace("plotly", quietly = TRUE)) {
# Create sample data and plot
data <- data.frame(
V1 = rnorm(100), V2 = rnorm(100), name=1:100,
antigen = rep(c(0,1), 50), antiserum = rep(c(1,0), 50),
year = rep(2000:2009, each=10), cluster = rep(1:5, each=20)
)
# Create temporal plot
p1 <- plot_temporal_mapping(data, ndim=2)
# Make interactive with default tooltips
p1_interactive <- make_interactive(p1)
# Create cluster plot with custom tooltips
p2 <- plot_cluster_mapping(data, ndim=2)
p2_interactive <- make_interactive(p2,
tooltip_vars = c("cluster", "year", "antigen")
)
}
Plot Aesthetic Configuration Class
Description
S3 class for configuring plot visual aesthetics including points, colors, labels and text elements.
Usage
new_aesthetic_config(
point_size = 3.5,
point_alpha = 0.8,
point_shapes = c(antigen = 16, antiserum = 0),
color_palette = c25,
gradient_colors = list(low = "blue", high = "red"),
show_labels = FALSE,
show_title = FALSE,
label_size = 3,
title_size = 14,
subtitle_size = 12,
axis_title_size = 12,
axis_text_size = 10,
legend_text_size = 10,
legend_title_size = 12,
show_legend = TRUE,
legend_position = "right",
arrow_head_size = 0.2,
arrow_alpha = 0.6
)
Arguments
point_size |
Base point size |
point_alpha |
Point transparency |
point_shapes |
Named vector of shapes for different point types |
color_palette |
Color palette name or custom palette |
gradient_colors |
List with low and high colors for gradients |
show_labels |
Whether to show point labels |
show_title |
Whether to show plot title (default: FALSE) |
label_size |
Label text size |
title_size |
Title text size |
subtitle_size |
Subtitle text size |
axis_title_size |
Axis title text size |
axis_text_size |
Axis text size |
legend_text_size |
Legend text size |
legend_title_size |
Legend title text size |
show_legend |
Whether to show the legend |
legend_position |
Legend position ("none", "right", "left", "top", "bottom") |
arrow_head_size |
Size of the arrow head for velocity arrows (in cm) |
arrow_alpha |
Transparency of arrows (0 = invisible, 1 = fully opaque) |
Value
An S3 object of class aesthetic_config
, which is a list
containing the specified configuration parameters for plot aesthetics.
Plot Annotation Configuration Class
Description
S3 class for configuring point annotations in plots, including labels, connecting lines, and visual properties.
Usage
new_annotation_config(
notable_points = NULL,
size = 4.9,
color = "black",
alpha = 0.9,
fontface = "plain",
box = FALSE,
segment_size = 0.3,
segment_alpha = 0.6,
min_segment_length = 0,
max_overlaps = Inf,
outline_size = 0.4
)
Arguments
notable_points |
Character vector of notable points to highlight |
size |
Numeric. Size of annotations for notable points |
color |
Character. Color of annotations for notable points |
alpha |
Numeric. Alpha transparency of annotations |
fontface |
Character. Font face of annotations ("plain", "bold", "italic", etc.) |
box |
Logical. Whether to draw a box around annotations |
segment_size |
Numeric. Size of segments connecting annotations to points |
segment_alpha |
Numeric. Alpha transparency of connecting segments |
min_segment_length |
Numeric. Minimum length of connecting segments |
max_overlaps |
Numeric. Maximum number of overlaps allowed for annotations |
outline_size |
Numeric. Size of the outline for annotations |
Value
An S3 object of class annotation_config
, which is a list
containing the specified configuration parameters for plot annotations.
Dimension Reduction Configuration Class
Description
S3 class for configuring dimension reduction parameters including method selection and algorithm-specific parameters.
Usage
new_dim_reduction_config(
method = "pca",
n_components = 2,
scale = TRUE,
center = TRUE,
pca_params = list(tol = sqrt(.Machine$double.eps), rank. = NULL),
umap_params = list(n_neighbors = 15, min_dist = 0.1, metric = "euclidean", n_epochs =
200),
tsne_params = list(perplexity = 30, mapping_max_iter = 1000, theta = 0.5),
compute_loadings = FALSE,
random_state = NULL
)
Arguments
method |
Dimension reduction method ("pca", "umap", "tsne") |
n_components |
Number of components to compute |
scale |
Scale the data before reduction |
center |
Center the data before reduction |
pca_params |
List of PCA-specific parameters |
umap_params |
List of UMAP-specific parameters |
tsne_params |
List of t-SNE-specific parameters |
compute_loadings |
Compute and return loadings |
random_state |
Random seed for reproducibility |
Value
An S3 object of class dim_reduction_config
, which is a list
containing the specified configuration parameters for dimensionality reduction.
Plot Layout Configuration Class
Description
S3 class for configuring plot layout including dimensions, margins, grids and coordinate systems.
Usage
new_layout_config(
width = 8,
height = 8,
dpi = 300,
aspect_ratio = 1,
show_grid = TRUE,
grid_type = "major",
grid_color = "grey80",
grid_linetype = "dashed",
show_axis = TRUE,
axis_lines = TRUE,
plot_margin = margin(1, 1, 1, 1, "cm"),
coord_type = "fixed",
background_color = "white",
panel_background_color = "white",
panel_border = TRUE,
panel_border_color = "black",
save_plot = FALSE,
save_format = "png",
reverse_x = 1,
reverse_y = 1,
x_limits = NULL,
y_limits = NULL,
arrow_plot_threshold = 0.1
)
Arguments
width |
Plot width in inches |
height |
Plot height in inches |
dpi |
Plot resolution |
aspect_ratio |
Plot aspect ratio |
show_grid |
Show plot grid |
grid_type |
Grid type ("none", "major", "minor", "both") |
grid_color |
Grid color |
grid_linetype |
Grid line type |
show_axis |
Show axes |
axis_lines |
Show axis lines |
plot_margin |
Plot margins in cm |
coord_type |
Coordinate type ("fixed", "equal", "flip", "polar") |
background_color |
Plot background color |
panel_background_color |
Panel background color |
panel_border |
Show panel border |
panel_border_color |
Panel border color |
save_plot |
Logical. Whether to save the plot to a file. |
save_format |
Plot save format ("png", "pdf", "svg", "eps") |
reverse_x |
Numeric multiplier for x-axis direction (1 or -1) |
reverse_y |
Numeric multiplier for y-axis direction (1 or -1) |
x_limits |
Numeric vector of length 2 specifying c(min, max) for x-axis. If NULL, limits are set automatically. |
y_limits |
Numeric vector of length 2 specifying c(min, max) for y-axis. If NULL, limits are set automatically. |
arrow_plot_threshold |
Threshold for velocity arrows to be drawn in the same antigenic distance unit (default: 0.10) |
Value
An S3 object of class layout_config
, which is a list containing
the specified configuration parameters for plot layout.
Filter matrix to only virus vs antiserum distances
Description
Filter matrix to only virus vs antiserum distances
Usage
only_virus_vs_as(dist_matrix, selected_names)
Arguments
dist_matrix |
Distance matrix |
selected_names |
Names of selected reference points |
Value
A matrix
of the same dimensions as the input, but with non-virus-vs-antiserum distances set to NA
.
Parameter Sensitivity Analysis
Description
Analyzes the sensitivity of model performance (MAE) to changes in a parameter. Uses binning to identify the minimum MAE across parameter ranges and calculates thresholds for acceptable parameter values.
Usage
parameter_sensitivity_analysis(
param,
samples,
bins = 30,
mae_col = "Holdout_MAE",
threshold_pct = 5,
min_samples = 1
)
Arguments
param |
Character name of parameter to analyze |
samples |
Data frame containing parameter samples and performance metrics |
bins |
Integer number of bins for parameter range (default: 40) |
mae_col |
Character name of column containing MAE values (default: "Holdout_MAE") |
threshold_pct |
Numeric percentage above minimum for threshold calculation (default: 5) |
min_samples |
Integer minimum number of samples required in a bin (default: 1) |
Details
The function performs these steps:
Cleans the input data using MAD-based outlier detection
Bins the parameter values into equal-width bins
Calculates the minimum MAE within each bin. Analogous to "poorman's likelihood" approach, minimum MAE within each bin is an empirical estimate of the performance surface at this parameter value when other parameters are at their optimal values.
Identifies a threshold of acceptable performance (default: Topolow min. +5% MAE)
Returns an object for visualization and further analysis
Value
Object of class "parameter_sensitivity" containing:
param_values |
Vector of parameter bin midpoints |
min_mae |
Vector of minimum MAE values per bin |
param_name |
Name of analyzed parameter |
threshold |
Threshold value (default: Topolow min. +5%) |
min_value |
Minimum MAE value across all bins |
sample_counts |
Number of samples per bin |
Plot Method for Parameter Sensitivity Analysis
Description
Creates a visualization of parameter sensitivity showing minimum MAE values across parameter ranges with threshold indicators.
Usage
## S3 method for class 'parameter_sensitivity'
plot(
x,
width = 3.5,
height = 3.5,
save_plot = FALSE,
output_dir,
y_limit_factor = NULL,
...
)
Arguments
x |
A parameter_sensitivity object |
width |
Numeric width of output plot in inches (default: 3.5) |
height |
Numeric height of output plot in inches (default: 3.5) |
save_plot |
Logical. Whether to save plot to file. Default: FALSE |
output_dir |
Character. Directory for output files. Required if |
y_limit_factor |
Numeric. Factor to set the upper y-axis limit as a percentage above the threshold value (e.g., 1.10 for 10% above). Default: NULL (automatic scaling) |
... |
Additional arguments passed to plot |
Value
A ggplot object
Plot Method for Profile Likelihood Objects
Description
Creates a visualization of profile likelihood for a parameter showing maximum likelihood estimates and confidence intervals. Supports mathematical notation for parameter names and configurable output settings.
Confidence interval is found using the likelihood ratio test:
LR(\theta_{ij}) = -2[log L_{max}(\theta_{ij}) - log L_{max}(\hat{\theta})]
where \hat{\theta}
is the maximum likelihood estimate for all parameters.
The 95% confidence interval is:
\{\theta_{ij} : LR(\theta_{ij}) \leq \chi^2_{1,0.05} = 3.84\}
Usage
## S3 method for class 'profile_likelihood'
plot(x, LL_max, width = 3.5, height = 3.5, save_plot = FALSE, output_dir, ...)
Arguments
x |
A profile_likelihood object |
LL_max |
Numeric maximum log-likelihood value |
width |
Numeric width of output plot in inches (default: 3.5) |
height |
Numeric height of output plot in inches (default: 3.5) |
save_plot |
Logical. Whether to save plot to file. Default: FALSE |
output_dir |
Character. Directory for output files. Required if |
... |
Additional arguments passed to plot |
Value
A ggplot object
Examples
# These examples take more than 5 seconds to run, so they are not run by default.
# Use parallel processing (the default) to speed up.
# Create a sample data frame of MCMC samples
samples <- data.frame(
log_N = log(runif(50, 2, 10)),
log_k0 = log(runif(50, 1, 5)),
log_cooling_rate = log(runif(50, 0.01, 0.1)),
log_c_repulsion = log(runif(50, 0.1, 1)),
NLL = runif(50, 20, 100)
)
# Calculate profile likelihood
pl_result <- profile_likelihood("log_N", samples, grid_size = 10)
# Plot with maximum likelihood from samples
LL_max <- max(-samples$NLL)
# The plot function requires the ggplot2 package
if (requireNamespace("ggplot2", quietly = TRUE)) {
plot(pl_result, LL_max, width = 4, height = 3)
}
Plot Method for Adaptive Monte Carlo Sampling Diagnostics
Description
Creates trace and density plots for multiple chains to assess convergence and mixing.
Usage
## S3 method for class 'topolow_amcs_diagnostics'
plot(
x,
output_dir,
output_file = "mc_diagnostics.png",
save_plot = FALSE,
width = 3000,
height = 3000,
res = 300,
...
)
Arguments
x |
A topolow_amcs_diagnostics object |
output_dir |
Character. Directory for output files. Required if |
output_file |
Character path for saving plot |
save_plot |
Logical. Whether to save the plot. |
width , height , res |
Plot dimensions and resolution |
... |
Additional arguments passed to plot functions |
Value
A ggplot object of the combined plots.
Plot Method for Convergence Diagnostics
Description
Creates visualization of convergence diagnostics from Monte Carlo sampling, including parameter mean trajectories and covariance matrix stability over iterations. Helps assess whether parameter estimation has converged to stable distributions.
Usage
## S3 method for class 'topolow_convergence'
plot(x, param_names = NULL, ...)
Arguments
x |
A topolow_convergence object from check_gaussian_convergence(). |
param_names |
Optional character vector of parameter names to use in plot titles. If NULL (default), uses the param_names from the topolow_convergence object. |
... |
Additional arguments passed to underlying plot functions (currently not used). |
Details
The function generates two types of plots:
Parameter mean plots: Shows how the mean value for each parameter changes over iterations. Stabilization of these plots indicates convergence of parameter distributions.
Covariance change plot: Shows relative changes in the covariance matrix using the Frobenius norm (also called Hilbert-Schmidt norm), which is defined as the square root of the sum of the absolute squares of all matrix elements:
\sqrt{\sum|a_{ij}|^2}
. A decreasing trend approaching zero indicates stable relationships between parameters.
Value
A grid of plots showing convergence metrics.
See Also
check_gaussian_convergence
for generating the convergence object
Examples
# Example with simulated data
chain_data <- data.frame(
log_N = rnorm(1000, mean = 1.5, sd = 0.1),
log_k0 = rnorm(1000, mean = -0.5, sd = 0.2)
)
# Check convergence
results <- check_gaussian_convergence(chain_data)
# Plot diagnostics
plot(results)
# With custom parameter names
plot(results, param_names = c("Dimensions (log)", "Spring constant (log)"))
Create 3D Visualization
Description
Creates an interactive or static 3D visualization using rgl. Supports both temporal and cluster-based coloring schemes with configurable point appearances and viewing options.
Usage
plot_3d_mapping(
df,
ndim,
dim_config = new_dim_reduction_config(),
aesthetic_config = new_aesthetic_config(),
layout_config = new_layout_config(),
interactive = TRUE,
output_dir
)
Arguments
df |
Data frame containing: - V1, V2, ... Vn: Coordinate columns - antigen: Binary indicator for antigen points - antiserum: Binary indicator for antiserum points - cluster: (Optional) Factor or integer cluster assignments - year: (Optional) Numeric year values for temporal coloring |
ndim |
Number of dimensions in input coordinates (must be >= 3) |
dim_config |
Dimension reduction configuration object |
aesthetic_config |
Aesthetic configuration object |
layout_config |
Layout configuration object |
interactive |
Logical; whether to create an interactive plot |
output_dir |
Character. Directory for output files. Required if |
Details
The function supports two main visualization modes:
Interactive mode: Creates a manipulatable 3D plot window
Static mode: Generates a static image from a fixed viewpoint
Color schemes are automatically selected based on available data:
If cluster data is present: Uses discrete colors per cluster
If year data is present: Uses continuous color gradient
Otherwise: Uses default point colors
For data with more than 3 dimensions, dimension reduction is applied first.
Note: This function requires the rgl package and OpenGL support. If rgl is not available, the function will return a 2D plot with a message explaining how to enable 3D visualization.
Value
Invisibly returns the rgl scene ID for further manipulation if rgl is available, or a 2D ggplot object as a fallback.
See Also
plot_temporal_mapping
for 2D temporal visualization
plot_cluster_mapping
for 2D cluster visualization
make_interactive
for converting 2D plots to interactive versions
Examples
# Create sample data
set.seed(123)
data <- data.frame(
V1 = rnorm(100), V2 = rnorm(100), V3 = rnorm(100), V4 = rnorm(100), name = 1:100,
antigen = rep(c(0,1), 50), antiserum = rep(c(1,0), 50),
cluster = rep(1:5, each=20), year = rep(2000:2009, each=10)
)
# Create a static plot and save to a temporary file
# This example requires an interactive session and the 'rgl' package.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
temp_dir <- tempdir()
# Basic interactive plot (will open a new window)
if(interactive()) {
plot_3d_mapping(data, ndim=4)
}
# Custom configuration for temporal visualization
aesthetic_config <- new_aesthetic_config(
point_size = 5,
point_alpha = 0.8,
gradient_colors = list(
low = "blue",
high = "red"
)
)
layout_config <- new_layout_config(
width = 12,
height = 12,
background_color = "black",
show_axis = TRUE
)
# Create customized static plot and save it
plot_3d_mapping(data, ndim=4,
aesthetic_config = aesthetic_config,
layout_config = layout_config,
interactive = FALSE, output_dir = temp_dir
)
list.files(temp_dir)
unlink(temp_dir, recursive = TRUE)
}
Create Clustered Mapping Plots
Description
Creates a visualization of points colored by cluster assignment using dimension reduction, with optional antigenic velocity arrows. Points are colored by cluster with different shapes for antigens and antisera.
Usage
plot_cluster_mapping(
df_coords,
ndim,
dim_config = new_dim_reduction_config(),
aesthetic_config = new_aesthetic_config(),
layout_config = new_layout_config(),
annotation_config = new_annotation_config(),
output_dir,
show_shape_legend = TRUE,
cluster_legend_title = "Cluster",
draw_arrows = FALSE,
annotate_arrows = TRUE,
phylo_tree = NULL,
sigma_t = NULL,
sigma_x = NULL,
clade_node_depth = NULL,
show_one_arrow_per_cluster = FALSE,
cluster_legend_order = NULL
)
Arguments
df_coords |
Data frame containing: - V1, V2, ... Vn: Coordinate columns - antigen: Binary indicator for antigen points - antiserum: Binary indicator for antiserum points - cluster: Factor or integer cluster assignments |
ndim |
Number of dimensions in input coordinates |
dim_config |
Dimension reduction configuration object specifying method and parameters |
aesthetic_config |
Aesthetic configuration object controlling plot appearance |
layout_config |
Layout configuration object controlling plot dimensions and style. Use x_limits and y_limits in layout_config to set axis limits. |
annotation_config |
Annotation configuration object for labeling notable points |
output_dir |
Character. Directory for output files. Required if |
show_shape_legend |
Logical. Whether to show the shape legend (default: TRUE) |
cluster_legend_title |
Character. Custom title for the cluster legend (default: "Cluster") |
draw_arrows |
logical; if TRUE, compute and draw antigenic drift vectors |
annotate_arrows |
logical; if TRUE, show names of the points having arrows |
phylo_tree |
Optional; phylo object in Newick format. Does not need to be rooted. If provided, used to compute antigenic velocity arrows. |
sigma_t |
Optional; numeric; bandwidth for the Gaussian kernel discounting on time in years or the time unit of the data. If NULL, uses Silverman's rule of thumb. |
sigma_x |
Optional; numeric; bandwidth for the Gaussian kernel discounting on antigenic distance in antigenic units. If NULL, uses Silverman's rule of thumb. |
clade_node_depth |
Optional; integer; number of levels of parent nodes to define clades. Antigens from different clades will be excluded from the calculation antigenic velocity arrows. (Default: Automatically calculated mode of leaf-to-backbone distance of the tree) |
show_one_arrow_per_cluster |
Shows only the largest antigenic velocity arrow in each cluster |
cluster_legend_order |
in case you prefer a certain order for clusters in the legend, provide a list with that order here; e.g., c("cluster 2", "cluster 1") |
Details
The function performs these steps:
Validates input data structure and types
Applies dimension reduction if ndim > 2
Creates visualization with cluster-based coloring
Applies specified aesthetic and layout configurations
Applies custom axis limits if specified in layout_config
Different shapes distinguish between antigens and antisera points, while color represents cluster assignment. The color palette can be customized through the aesthetic_config.
Value
A ggplot
object containing the cluster mapping visualization.
See Also
plot_temporal_mapping
for temporal visualization
plot_3d_mapping
for 3D visualization
new_dim_reduction_config
for dimension reduction options
new_aesthetic_config
for aesthetic options
new_layout_config
for layout options
new_annotation_config
for annotation options
Examples
# Basic usage with default configurations
data <- data.frame(
V1 = rnorm(100), V2 = rnorm(100), V3 = rnorm(100), name = 1:100,
antigen = rep(c(0,1), 50), antiserum = rep(c(1,0), 50),
cluster = rep(1:5, each=20)
)
p1 <- plot_cluster_mapping(data, ndim=3)
# Save plot to a temporary directory
temp_dir <- tempdir()
# Custom configurations with specific color palette and axis limits
aesthetic_config <- new_aesthetic_config(
point_size = 4,
point_alpha = 0.7,
color_palette = c("red", "blue", "green", "purple", "orange"),
show_labels = TRUE,
label_size = 3
)
layout_config_save <- new_layout_config(save_plot = TRUE,
width = 10,
height = 8,
coord_type = "fixed",
show_grid = TRUE,
grid_type = "major",
x_limits = c(-10, 10),
y_limits = c(-8, 8)
)
p_saved <- plot_cluster_mapping(data, ndim=3,
layout_config = layout_config_save,
aesthetic_config = aesthetic_config,
output_dir = temp_dir
)
list.files(temp_dir)
unlink(temp_dir, recursive = TRUE)
Plot Distance Matrix Heatmap
Description
Creates heatmap visualization of distance matrix showing patterns and structure in the measurements.
Usage
plot_distance_heatmap(
heatmap_data,
output_file = NULL,
aesthetic_config = new_aesthetic_config(),
layout_config = new_layout_config()
)
Arguments
heatmap_data |
List output from prepare_heatmap_data() |
output_file |
Character. Full path (including filename and extension) where the plot will be saved. If NULL, the plot is not saved. |
aesthetic_config |
Plot aesthetic configuration object |
layout_config |
Plot layout configuration object |
Value
A ggplot object containing:
Heatmap visualization of the distance matrix
Color gradient representing distance values
Title showing matrix completeness percentage
Examples
# Create sample heatmap data
dist_mat <- matrix(rnorm(100), 10, 10)
hmap_data <- prepare_heatmap_data(dist_mat)
# Create and display the plot object
plot_distance_heatmap(hmap_data)
Plot Network Structure Analysis
Description
Creates visualization of distance matrix network structure showing data availability patterns and connectivity.
Usage
plot_network_structure(
network_results,
output_file = NULL,
aesthetic_config = new_aesthetic_config(),
layout_config = new_layout_config()
)
Arguments
network_results |
List output from analyze_network_structure() |
output_file |
Character. Full path (including filename and extension) where the plot will be saved. If NULL, the plot is not saved. |
aesthetic_config |
Plot aesthetic configuration object |
layout_config |
Plot layout configuration object |
Value
A ggplot
object representing the network graph.
Examples
# Create sample network data
adj_mat <- matrix(sample(c(0,1), 25, replace=TRUE), 5, 5)
# Add row and column names, which are required by the analysis function
rownames(adj_mat) <- colnames(adj_mat) <- paste0("Point", 1:5)
# Ensure the matrix is symmetric for the analysis
adj_mat[lower.tri(adj_mat)] <- t(adj_mat)[lower.tri(adj_mat)]
diag(adj_mat) <- 0
net_analysis <- analyze_network_structure(adj_mat)
# Create plot and return the plot object
plot_network_structure(net_analysis)
Create Temporal Mapping Plot
Description
Creates a visualization of points colored by time (year) using dimension reduction, with optional antigenic velocity arrows. Points are colored on a gradient scale based on their temporal values, with different shapes for antigens and antisera.
Usage
plot_temporal_mapping(
df_coords,
ndim,
dim_config = new_dim_reduction_config(),
aesthetic_config = new_aesthetic_config(),
layout_config = new_layout_config(),
annotation_config = new_annotation_config(),
output_dir,
show_shape_legend = TRUE,
draw_arrows = FALSE,
annotate_arrows = TRUE,
phylo_tree = NULL,
sigma_t = NULL,
sigma_x = NULL,
clade_node_depth = NULL
)
Arguments
df_coords |
Data frame containing: - V1, V2, ... Vn: Coordinate columns - antigen: Binary indicator for antigen points - antiserum: Binary indicator for antiserum points - year: Numeric year values for temporal coloring |
ndim |
Number of dimensions in input coordinates |
dim_config |
Dimension reduction configuration object specifying method and parameters |
aesthetic_config |
Aesthetic configuration object controlling plot appearance |
layout_config |
Layout configuration object controlling plot dimensions and style. Use x_limits and y_limits in layout_config to set axis limits. |
annotation_config |
Annotation configuration object for labeling notable points |
output_dir |
Character. Directory for output files. Required if |
show_shape_legend |
Logical. Whether to show the shape legend (default: TRUE) |
draw_arrows |
logical; if TRUE, compute and draw antigenic drift vectors |
annotate_arrows |
logical; if TRUE, show names of the points having arrows |
phylo_tree |
Optional; phylo object in Newick format. Does not need to be rooted. If provided, used to compute antigenic velocity arrows. |
sigma_t |
Optional; numeric; bandwidth for the Gaussian kernel discounting on time in years or the time unit of the data. If NULL, uses Silverman's rule of thumb. |
sigma_x |
Optional; numeric; bandwidth for the Gaussian kernel discounting on antigenic distancein antigenic units. If NULL, uses Silverman's rule of thumb. |
clade_node_depth |
Optional; integer; number of levels of parent nodes to define clades. Antigens from different clades will be excluded from the calculation antigenic velocity arrows. (Default: Automatically calculated mode of leaf-to-backbone distance of the tree) |
Details
The function performs these steps:
Validates input data structure and types
Applies dimension reduction if ndim > 2
Creates visualization with temporal color gradient
Applies specified aesthetic and layout configurations
Applies custom axis limits if specified in layout_config
Different shapes distinguish between antigens and antisera points, while color represents temporal progression.
Value
A ggplot
object containing the temporal mapping visualization.
See Also
plot_cluster_mapping
for cluster-based visualization
plot_3d_mapping
for 3D visualization
new_dim_reduction_config
for dimension reduction options
new_aesthetic_config
for aesthetic options
new_layout_config
for layout options
new_annotation_config
for annotation options
Examples
# Basic usage with default configurations
data <- data.frame(
V1 = rnorm(100), V2 = rnorm(100), V3 = rnorm(100), name = 1:100,
antigen = rep(c(0,1), 50), antiserum = rep(c(1,0), 50),
year = rep(2000:2009, each=10)
)
# Plot without saving
p1 <- plot_temporal_mapping(data, ndim=3)
# Save plot to a temporary directory
temp_dir <- tempdir()
layout_config_save <- new_layout_config(save_plot = TRUE,
x_limits = c(-10, 10),
y_limits = c(-8, 8))
p_saved <- plot_temporal_mapping(data, ndim = 3, layout_config = layout_config_save,
output_dir = temp_dir)
list.files(temp_dir) # Check that file was created
unlink(temp_dir, recursive = TRUE) # Clean up
Generate Distance Matrix Heatmap Data
Description
Prepares distance matrix data for heatmap visualization by handling missing values and calculating relevant statistics.
Usage
prepare_heatmap_data(
distance_matrix,
cluster_rows = FALSE,
cluster_cols = FALSE
)
Arguments
distance_matrix |
Square symmetric matrix of distances |
cluster_rows |
Logical; whether to cluster rows |
cluster_cols |
Logical; whether to cluster columns |
Value
A list of data prepared for generating a heatmap of the distance matrix:
matrix_data |
The distance |
row_order |
An integer vector of the row indices after clustering. If |
col_order |
An integer vector of the column indices after clustering. If |
stats |
A list of summary statistics for the distance matrix, including |
Examples
# Create a sample distance matrix
dist_mat <- matrix(runif(25), 5, 5)
# Prepare data for a heatmap
heatmap_data <- prepare_heatmap_data(dist_mat)
print(heatmap_data$stats$completeness)
Print Method for Parameter Sensitivity Objects
Description
Print Method for Parameter Sensitivity Objects
Usage
## S3 method for class 'parameter_sensitivity'
print(x, ...)
Arguments
x |
A parameter_sensitivity object |
... |
Additional arguments passed to print |
Value
No return value, called for side effects (prints a summary to the console).
Print Method for Profile Likelihood Objects
Description
Print Method for Profile Likelihood Objects
Usage
## S3 method for class 'profile_likelihood'
print(x, ...)
Arguments
x |
Profile likelihood object |
... |
Additional arguments passed to print |
Value
The original profile_likelihood
object (invisibly). Called for side effects (prints a summary to the console).
Print method for topolow objects
Description
Provides a concise display of key optimization results including dimensions, iterations, error metrics and convergence status.
Usage
## S3 method for class 'topolow'
print(x, ...)
Arguments
x |
A topolow object returned by create_topolow_map() |
... |
Additional arguments passed to print (not used) |
Value
The original topolow
object (invisibly). This function is called for its
side effect of printing a summary to the console.
Examples
# Create a simple distance matrix and run the optimization
dist_mat <- matrix(c(0, 2, 3, 2, 0, 4, 3, 4, 0), nrow=3)
result <- create_topolow_map(dist_mat, ndim=2, mapping_max_iter=50,
k0=1.0, cooling_rate=0.001, c_repulsion=0.1,
verbose = FALSE)
# Print the result object
print(result)
Print Method for Adaptive Monte Carlo Sampling Diagnostics
Description
Print Method for Adaptive Monte Carlo Sampling Diagnostics
Usage
## S3 method for class 'topolow_amcs_diagnostics'
print(x, ...)
Arguments
x |
A topolow_amcs_diagnostics object |
... |
Additional arguments passed to print |
Value
No return value, called for side effects (prints a summary to the console).
Print Method for Convergence Diagnostics
Description
Print Method for Convergence Diagnostics
Usage
## S3 method for class 'topolow_convergence'
print(x, ...)
Arguments
x |
A topolow_convergence object |
... |
Additional arguments passed to print |
Value
No return value, called for side effects (prints a summary to the console).
Process Raw Antigenic Assay Data
Description
Processes raw antigenic assay data from CSV files into standardized long and matrix formats. Handles both titer data (which needs conversion to distances) and direct distance measurements like IC50. Preserves threshold indicators (<, >) and handles repeated measurements by averaging.
Usage
process_antigenic_data(
file_path,
antigen_col,
serum_col,
value_col,
is_titer = TRUE,
metadata_cols = NULL,
id_prefix = FALSE,
base = NULL,
scale_factor = 10
)
Arguments
file_path |
Character. Path to CSV file containing raw data. |
antigen_col |
Character. Name of column containing virus/antigen identifiers. |
serum_col |
Character. Name of column containing serum/antibody identifiers. |
value_col |
Character. Name of column containing measurements (titers or distances). |
is_titer |
Logical. Whether values are titers (TRUE) or distances like IC50 (FALSE). |
metadata_cols |
Character vector. Names of additional columns to preserve. |
id_prefix |
Logical. Whether to prefix IDs with V/ and S/ (default: TRUE). |
base |
Numeric. Base for logarithm transformation (default: 2 for titers, e for IC50). |
scale_factor |
Numeric. Scale factor for titers (default: 10). |
Details
The function handles these key steps:
Reads and validates input data
Transforms values to log scale
Converts titers to distances if needed
Averages repeated measurements
Creates standardized long format
Creates distance matrix
Preserves metadata and threshold indicators
Preserves virusYear and serumYear columns if present
Input requirements and constraints:
CSV file must contain required columns
Column names must match specified parameters in the function input
Values can include threshold indicators (< or >)
Metadata columns must exist if specified
Allowed Year-related column names are "virusYear" and "serumYear"
Value
A list containing two elements:
long |
A |
matrix |
A numeric |
Examples
# Locate the example data file included in the package
file_path <- system.file("extdata", "example_titer_data.csv", package = "topolow")
# Check if the file exists before running the example
if (file.exists(file_path)) {
# Process the example titer data
results <- process_antigenic_data(
file_path,
antigen_col = "virusStrain",
serum_col = "serumStrain",
value_col = "titer",
is_titer = TRUE,
metadata_cols = c("cluster", "color")
)
# View the long format data
print(results$long)
# View the distance matrix
print(results$matrix)
}
Process distance matrix for convergence error calculations
Description
Helper function that processes elements of the distance matrix for calculating convergence error. Handles threshold indicators and NA values.
Usage
process_distance_matrix(reported_distance, distance)
Arguments
reported_distance |
Character or numeric. The reported distance value |
distance |
Numeric. The calculated distance to compare against |
Value
Numeric or NA. Returns processed distance if valid, NA otherwise
Profile Likelihood Analysis
Description
Calculates profile likelihood for a parameter by evaluating conditional maximum likelihood across a grid of parameter values. Uses local sample windowing to estimate conditional likelihoods. This implementation is not a classical profile likelihood calculation, but rather an "empirical profile likelihood" which estimates the profile likelihood at each point based on the many observations previously sampled in Monte Carlo simulations.
Usage
profile_likelihood(
param,
samples,
grid_size = 40,
bandwidth_factor = 0.05,
start_factor = 0.5,
end_factor = 1.5,
min_samples = 5
)
Arguments
param |
Character name of parameter to analyze |
samples |
Data frame containing parameter samples and log-likelihoods |
grid_size |
Integer number of grid points (default: 48) |
bandwidth_factor |
Numeric factor for local sample window (default: 0.03) |
start_factor , end_factor |
Numeric range multipliers for parameter grid (default: 0.5, 1.2) |
min_samples |
Integer minimum samples required for reliable estimate (default: 10) |
Details
For each value in the parameter grid, the function:
Identifies nearby samples using bandwidth window
Calculates conditional maximum likelihood from these samples
Tracks sample counts to assess estimate reliability
Handles boundary conditions and sparse regions
Value
Object of class "profile_likelihood" containing:
param |
Vector of parameter values |
ll |
Vector of log-likelihood values |
param_name |
Name of analyzed parameter |
bandwidth |
Bandwidth used for local windows |
sample_counts |
Number of samples per estimate |
See Also
plot.profile_likelihood
for visualization
Examples
# Create a sample data frame of MCMC samples
mcmc_samples <- data.frame(
log_N = log(runif(50, 2, 10)),
log_k0 = log(runif(50, 1, 5)),
log_cooling_rate = log(runif(50, 0.01, 0.1)),
log_c_repulsion = log(runif(50, 0.1, 1)),
NLL = runif(50, 20, 100)
)
# Calculate profile likelihood for parameter "log_N"
pl <- profile_likelihood("log_N", mcmc_samples,
grid_size = 10, # Smaller grid for a quick example
bandwidth_factor = 0.05)
# Print the results
print(pl)
Profile Likelihood Analysis Results Class
Description
S3 class for storing and manipulating profile likelihood analysis results. Includes parameter values, log-likelihoods, and metadata for visualization.
Usage
profile_likelihood_result(
param_values,
ll_values,
param_name,
bandwidth,
sample_counts
)
Prune Distance Data for Network Quality
Description
Iteratively removes viruses and antibodies with insufficient connections to create a well-connected network subset. The pruning continues until all remaining points have at least the specified minimum number of connections.
Usage
prune_distance_network(
data,
virus_col,
antibody_col,
min_connections,
iterations = 100
)
Arguments
data |
Data frame in long format containing: - Column for viruses/antigens - Column for antibodies/antisera - Distance measurements (can contain NAs) - Optional metadata columns |
virus_col |
Character name of virus/antigen column |
antibody_col |
Character name of antibody/antiserum column |
min_connections |
Integer minimum required connections per point |
iterations |
Integer maximum pruning iterations (default 100) |
Value
A list containing two elements:
pruned_data |
A |
stats |
A
|
Examples
# Create a sparse dataset with 12 viruses and 12 antibodies
viruses <- paste0("V", 1:12)
antibodies <- paste0("A", 1:12)
all_pairs <- expand.grid(Virus = viruses, Antibody = antibodies, stringsAsFactors = FALSE)
# Sample 70 pairs to create a sparse matrix
set.seed(42)
assay_data <- all_pairs[sample(nrow(all_pairs), 70), ]
# Ensure some viruses/antibodies are poorly connected for the example
assay_data <- assay_data[!(assay_data$Virus %in% c("V11", "V12")),]
assay_data <- assay_data[!(assay_data$Antibody %in% c("A11", "A12")),]
# Add back single connections for the poorly connected nodes
poor_connections <- data.frame(
Virus = c("V11", "V1", "V12", "V2"),
Antibody = c("A1", "A11", "A2", "A12")
)
assay_data <- rbind(assay_data, poor_connections)
# View connection counts before pruning
# Virus V11 and V12, and Antibody A11 and A12 have only 1 connection
table(assay_data$Virus)
table(assay_data$Antibody)
# Prune the network to keep only nodes with at least 2 connections
pruned_result <- prune_distance_network(
data = assay_data,
virus_col = "Virus",
antibody_col = "Antibody",
min_connections = 2
)
# View connection counts after pruning
# The poorly connected nodes have been removed
table(pruned_result$pruned_data$Virus)
table(pruned_result$pruned_data$Antibody)
# Check the summary statistics
print(pruned_result$stats)
Perform Dimension Reduction
Description
Applies configured dimension reduction method to input data.
Usage
reduce_dimensions(df, config)
Arguments
df |
Data frame containing coordinate data |
config |
Dimension reduction configuration object |
Value
Data frame with reduced dimensions
Run Adaptive Monte Carlo Sampling
Description
Performs adaptive Monte Carlo sampling to explore parameter space, running locally
in parallel. Samples are drawn adaptively based on previous evaluations
to focus sampling in high-likelihood regions. Results from all parallel jobs
accumulate in a single output file. This function always writes to the file system
and therefore requires the output_dir
argument.
Usage
run_adaptive_sampling(
initial_samples_file,
scenario_name,
distance_matrix,
num_parallel_jobs = 5,
max_cores = NULL,
num_samples = 10,
mapping_max_iter = 1000,
relative_epsilon = 1e-04,
folds = 20,
output_dir,
verbose = FALSE
)
Arguments
initial_samples_file |
Character. Path to CSV file containing initial samples. Must contain columns: log_N, log_k0, log_cooling_rate, log_c_repulsion, NLL |
scenario_name |
Character. Name for output files. |
distance_matrix |
Matrix. Distance matrix of the input data. |
num_parallel_jobs |
Integer. Number of parallel local jobs (chains) to run. |
max_cores |
Integer. Maximum number of cores to use for parallel processing across all jobs. If NULL, uses all available cores minus 1 (default: NULL). |
num_samples |
Integer. Number of new samples to be added to the CSV file containing initial samples through Adaptive Monte Carlo sampling (default: 10). |
mapping_max_iter |
Integer. Maximum iterations per map optimization. |
relative_epsilon |
Numeric. Convergence threshold. |
folds |
Integer. Number of CV folds (default: 20). |
output_dir |
Character. Directory for output job files. The project's working directory is a straightforward example. This argument is required. |
verbose |
Logical. Whether to print progress messages. Default: FALSE. |
Value
No return value, called for side effects. The function writes the
results of the adaptive sampling to a CSV file within the specified output_dir
.
Examples
# 1. Locate the example initial samples file included with the package
initial_file <- system.file(
"extdata", "initial_samples_example.csv",
package = "topolow"
)
# 2. Create a temporary directory for the function's output
# This function requires a writable directory for its results.
temp_out_dir <- tempdir()
# 3. Create a sample distance matrix for the function to use
dist_mat <- matrix(runif(100, 1, 10), 10, 10)
diag(dist_mat) <- 0
# 4. Run the adaptive sampling only if the example file is found
if (nzchar(initial_file)) {
run_adaptive_sampling(
initial_samples_file = initial_file,
scenario_name = "adaptive_test_example",
distance_matrix = dist_mat,
output_dir = temp_out_dir,
num_parallel_jobs = 2, # Use small values for a quick example
num_samples = 2,
verbose = FALSE
)
# 5. Verify output files were created
print("Output files from adaptive sampling:")
print(list.files(temp_out_dir, recursive = TRUE))
# 6. Clean up the temporary directory
unlink(temp_out_dir, recursive = TRUE)
}
Sample from Weighted Distribution
Description
Generates new samples from a multivariate normal distribution with mean and covariance determined by weighting previous samples based on their likelihoods.
Usage
sample_from_distribution(dist, n, epsilon)
Arguments
dist |
List with mean and covariance matrix from previous samples |
n |
Integer number of samples to generate |
epsilon |
Numeric probability of sampling from wider distribution |
Value
Data frame of n new samples with columns for each parameter
Save Plot to File
Description
Saves a plot (ggplot or rgl scene) to file with specified configuration. Supports multiple output formats and configurable dimensions.
Usage
save_plot(plot, filename, layout_config = new_layout_config(), output_dir)
Arguments
plot |
ggplot or rgl scene object to save |
filename |
Output filename (with or without extension) |
layout_config |
Layout configuration object controlling output parameters |
output_dir |
Character. Directory for output files. This argument is required. |
Details
Supported file formats:
PNG: Best for web and general use
PDF: Best for publication quality vector graphics
SVG: Best for web vector graphics
EPS: Best for publication quality vector graphics
The function will:
Auto-detect plot type (ggplot or rgl)
Use appropriate saving method
Apply layout configuration settings
Add file extension if not provided
Value
No return value, called for side effects (saves a plot to a file).
Examples
# The sole purpose of save_plot is to write a file, so its example must demonstrate this.
# For CRAN tests we wrap the example in \donttest{} to avoid writing files.
# Create a temporary directory for saving all plots
temp_dir <- tempdir()
# --- Example 1: Basic ggplot save ---
# Create sample data with 3 dimensions to support both 2D and 3D plots
data <- data.frame(
V1 = rnorm(10), V2 = rnorm(10), V3 = rnorm(10), name=1:10,
antigen = rep(c(0,1), 5), antiserum = rep(c(1,0), 5),
year = 2000:2009
)
p <- plot_temporal_mapping(data, ndim=2)
save_plot(p, "temporal_plot.png", output_dir = temp_dir)
# --- Example 2: Save with custom layout ---
layout_config <- new_layout_config(
width = 12,
height = 8,
dpi = 600,
save_format = "pdf"
)
save_plot(p, "high_res_plot.pdf", layout_config, output_dir = temp_dir)
# --- Verify files and clean up ---
list.files(temp_dir)
unlink(temp_dir, recursive = TRUE)
Scale Reduced Dimensions to Match Original Distances
Description
Helper function to scale reduced dimensions to better match original distances.
Usage
scale_to_original_distances(reduced_coords, orig_dist)
Arguments
reduced_coords |
Matrix of reduced coordinates |
orig_dist |
Original distance matrix |
Value
Scaled coordinate matrix
Plot Fitted vs True Distances
Description
Creates diagnostic plots comparing fitted distances from a model against true distances. Generates both a scatter plot with prediction intervals and a residuals plot.
Usage
scatterplot_fitted_vs_true(
distance_matrix,
p_dist_mat,
scenario_name,
ndim,
save_plot = FALSE,
output_dir,
confidence_level = 0.95
)
Arguments
distance_matrix |
Matrix of true distances |
p_dist_mat |
Matrix of predicted/fitted distances |
scenario_name |
Character string for output file naming. Used if |
ndim |
Integer number of dimensions used in the model |
save_plot |
Logical. Whether to save plots to files. Default: TRUE |
output_dir |
Character. Directory for output files. Required if |
confidence_level |
Numeric confidence level for prediction intervals (default: 0.95) |
Value
A list containing the scatter_plot
and residuals_plot
ggplot objects.
Examples
# Create sample data
true_dist <- matrix(runif(100, 1, 10), 10, 10)
pred_dist <- true_dist + rnorm(100)
# Create plots without saving
plots <- scatterplot_fitted_vs_true(true_dist, pred_dist, save_plot = FALSE)
# You can then display a plot, for instance:
# plots$scatter_plot
Summary method for topolow objects
Description
Provides a detailed summary of the optimization results including parameters, convergence and performance metrics.
Usage
## S3 method for class 'topolow'
summary(object, ...)
Arguments
object |
A topolow object returned by create_topolow_map() |
... |
Additional arguments passed to summary (not used) |
Value
No return value. This function is called for its side effect of printing a detailed summary to the console.
Examples
# Create a simple distance matrix and run the optimization
dist_mat <- matrix(c(0, 2, 3, 2, 0, 4, 3, 4, 0), nrow=3)
result <- create_topolow_map(dist_mat, ndim=2, mapping_max_iter=50,
k0=1.0, cooling_rate=0.001, c_repulsion=0.1,
verbose = FALSE)
# Summarize the result object
summary(result)
Convert distance matrix to assay panel format
Description
Convert distance matrix to assay panel format
Usage
symmetric_to_nonsymmetric_matrix(dist_matrix, selected_names)
Arguments
dist_matrix |
Distance matrix |
selected_names |
Names of reference points |
Value
A non-symmetric matrix
in assay panel format, where rows are test antigens and columns are reference antigens.
Unweighted Kernel Density Estimation
Description
Standard kernel density estimation for univariate data with various bandwidth selection rules.
Usage
unweighted_kde(x, n = 512, from = min(x), to = max(x), bw = "nrd0")
Arguments
x |
Numeric vector of samples |
n |
Integer number of evaluation points |
from , to |
Numeric range for evaluation points |
bw |
Bandwidth selection ("nrd0", "nrd", "ucv", "bcv", "sj" or numeric) |
Value
List containing:
x |
Vector of evaluation points |
y |
Vector of density estimates |
bw |
Selected bandwidth |
Validate Antigenic Dataset
Description
Validates the structure and content of an antigenic assay dataset. Checks for required columns, data types, and valid measurements.
Usage
validate_antigenic_data(
data,
antigen_col,
serum_col,
value_col,
metadata_cols = NULL
)
Arguments
data |
Data frame to validate |
antigen_col |
Character. Name of antigen column |
serum_col |
Character. Name of serum column |
value_col |
Character. Name of value column |
metadata_cols |
Character vector. Optional metadata column names |
Value
Invisibly returns TRUE if validation passes, otherwise stops with error
Validate Input Data Frame
Description
Validates input data frame for visualization functions, checking required columns and data types.
Usage
validate_topolow_df(
df,
ndim,
require_clusters = FALSE,
require_temporal = FALSE
)
Arguments
df |
Data frame to validate |
ndim |
Number of dimensions expected in coordinate columns. Names of coordinate columns must start with a "V". |
require_clusters |
Whether cluster column is required |
require_temporal |
Whether year column is required |
Value
Validated data frame or throws error if invalid
Vectorized Processing of Distance Matrix for Convergence Error Calculations
Description
Efficiently processes elements of the distance matrix for calculating convergence error using pre-processed numeric representations of thresholds. This optimized version eliminates expensive string operations during optimization.
Usage
vectorized_process_distance_matrix(
distances_numeric,
threshold_mask,
p_dist_mat
)
Arguments
distances_numeric |
Numeric matrix. The numeric distance values (without threshold indicators) |
threshold_mask |
Integer matrix. Codes representing threshold types: 1 for "greater than" (>), -1 for "less than" (<), or 0 for exact values |
p_dist_mat |
Numeric matrix. The calculated distance matrix to compare against |
Details
This function handles threshold logic for convergence error calculation by using pre-processed numeric matrices:
For "greater than" thresholds (threshold_mask = 1): Returns the numeric value if the calculated distance is less than the threshold, otherwise returns NA
For "less than" thresholds (threshold_mask = -1): Returns the numeric value if the calculated distance is greater than the threshold, otherwise returns NA
For regular values (threshold_mask = 0): Returns the numeric value
This function operates on entire matrices at once using vectorized operations, which is significantly faster than processing each element individually.
Value
Numeric matrix with processed distance values. Elements where threshold conditions are satisfied will contain NA.
Weighted Kernel Density Estimation
Description
Performs weighted kernel density estimation for univariate data. Uses parallel processing for efficiency. Useful for analyzing parameter distributions with importance weights.
Usage
weighted_kde(x, weights, n = 512, from = min(x), to = max(x))
Arguments
x |
Numeric vector of samples |
weights |
Numeric vector of weights |
n |
Integer number of evaluation points |
from , to |
Numeric range for evaluation points |
Value
List containing:
x |
Vector of evaluation points |
y |
Vector of density estimates |
Convert 2-digit to 4-digit year
Description
Convert 2-digit to 4-digit year
Usage
yy_to_yyyy(dataframe, column_name)
Arguments
dataframe |
Data frame containing year column |
column_name |
Name of year column |
Value
Data frame with converted year column