Help for package ogrdbstats

Type:

Package

Title:

Analysis of Adaptive Immune Receptor Repertoire Germ Line Statistics

Version:

0.5.4

URL:

https://github.com/airr-community/ogrdbstats

BugReports:

https://github.com/airr-community/ogrdbstats/issues

Description:

Multiple tools are now available for inferring the personalised germ line set from an adaptive immune receptor repertoire. Output from these tools is converted to a single format and supplemented with rich data such as usage and characterisation of 'novel' germ line alleles. This data can be particularly useful when considering the validity of novel inferences. Use of the analysis provided is described in <doi:10.3389/fimmu.2019.00435>.

License:

CC BY-SA 4.0

Encoding:

UTF-8

Depends:

R (≥ 2.10)

Imports:

dplyr (≥ 0.8.3), ggplot2 (≥ 3.2.1), magrittr, tigger (≥ 0.4.0), alakazam (≥ 0.3.0), stringr (≥ 1.4.0), data.table, gridExtra (≥ 2.3), tidyr (≥ 1.0.0), stringdist (≥ 0.9.5.2), RColorBrewer (≥ 1.1-2), Biostrings (≥ 2.52.0), argparser (≥ 0.4), ComplexHeatmap, bookdown, scales

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

RoxygenNote:

7.3.2

LazyData:

true

NeedsCompilation:

Packaged:

2025-06-16 10:25:42 UTC; William

Author:

William Lees

[aut, cre]

Maintainer:

William Lees <william@lees.org.uk>

Repository:

CRAN

Date/Publication:

2025-07-07 22:00:02 UTC

Example repertoire data

Description

A small example of the analytical datasets created by ogrdbstats from repertoires and reference sets. The dataset is built from example files provided with the package, and can be recreated by running the example shown for the function read_input_files(). The repertoire data is taken from Rubelt et al. 2016, <doi: 10.1038/ncomms11112>.

Usage

example_rep

Format

'example_rep' is a named list containing the following elements:

ref_genes	named list of IMGT-gapped reference genes
inferred_seqs	named list of IMGT-gapped inferred (novel) sequences.
input_sequences	data frame with one row per annotated read, with CHANGEO-style column names. The column SEG_CALL is the gene call for the segment under analysis. Hence if segment is 'V', 'V_CALL' is renamed 'SEG_CALL', whereas if segment is 'J', 'J_CALL' is renamed 'SEG_CALL'. This simplifies downstream processing. Rows in the input file with ambiguous SEG_CALLs, or no call, are removed.
genotype_db	named list of gene sequences referenced in the annotated reads (both reference and novel sequences)
haplo_details	data used for haplotype analysis, showing allelic ratios calculated with various potential haplotyping genes
genotype	data frame containing information provided in the OGRDB genotype csv file
calculated_NC	a boolean that is TRUE if mutation counts were calculated by this library, FALSE if they were read from the annotated read file
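
The SEG_CALL convention described for input_sequences can be illustrated with a small base R sketch. This is not the package's own code: the helper select_segment_calls() and the toy data are hypothetical, and the sketch assumes that an ambiguous call is one listing several alleles separated by commas.

```r
# Toy CHANGEO-style reads; the values are invented for illustration.
reads <- data.frame(
  SEQUENCE_ID = c("r1", "r2", "r3", "r4"),
  V_CALL = c("IGHV1-69*01", "IGHV1-69*01,IGHV1-69*02", "", "IGHV3-23*01"),
  J_CALL = c("IGHJ4*02", "IGHJ4*02", "IGHJ6*02", "IGHJ6*03"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of ogrdbstats.
select_segment_calls <- function(reads, segment) {
  # Rename the call column of the segment under analysis to SEG_CALL.
  call_col <- paste0(segment, "_CALL")
  names(reads)[names(reads) == call_col] <- "SEG_CALL"

  # Drop rows with no call.
  has_call <- !is.na(reads$SEG_CALL) & nzchar(reads$SEG_CALL)
  # Assumption: an ambiguous call lists several alleles separated by commas.
  unambiguous <- !grepl(",", reads$SEG_CALL, fixed = TRUE)

  reads[has_call & unambiguous, , drop = FALSE]
}

v_reads <- select_segment_calls(reads, "V")
print(v_reads$SEG_CALL)
```

Downstream code can then refer to SEG_CALL without needing to know which segment was analysed.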

Source

<doi: 10.1038/ncomms11112>

Generate OGRDB reports from specified files.

Description

This creates the genotype report (suffixed _ogrdb_report.csv) and the plot file (suffixed _ogrdb_plots.pdf). Both are created in the directory holding the annotated read file, and the file names are prefixed by the name of the annotated read file.

Usage

generate_ogrdb_report(
  ref_filename,
  inferred_filename,
  species,
  filename,
  chain,
  hap_gene,
  segment,
  chain_type,
  plot_unmutated,
  all_inferred = FALSE,
  format = "pdf",
  custom_file_prefix = ""
)

Arguments

ref_filename

Name of file containing IMGT-aligned reference genes in FASTA format

inferred_filename

Name of file containing sequences of inferred novel alleles, or '-' if none

species

Species name used in field 3 of the IMGT germline header with spaces omitted, if the reference file is from IMGT. Otherwise ''

filename

Name of file containing annotated reads in AIRR, CHANGEO or IgDiscover format. The format is detected automatically

chain

one of IGHV, IGKV, IGLV, IGHD, IGHJ, IGKJ, IGLJ, TRAV, TRAJ, TRBV, TRBD, TRBJ, TRGV, TRGJ, TRDV, TRDD, TRDJ

hap_gene

The haplotyping columns will be completed based on the usage of the two most frequent alleles of this gene. If NA, the column will be blank

segment

one of V, D, J

chain_type

one of H, L

plot_unmutated

Plot base composition using only unmutated sequences (V-chains only)

all_inferred

Treat all alleles as novel

format

The format for the plot file ('pdf', 'html' or 'none')

custom_file_prefix

custom prefix to use for output files. If not specified, the prefix is taken from the input file name

Value

None

Examples

# prepare files for example
reference_set = system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
inferred_set = system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
repertoire = system.file("extdata/ogrdbstats_example_repertoire.tsv", package = "ogrdbstats")
file.copy(repertoire, tempdir())
repfile = file.path(tempdir(), 'ogrdbstats_example_repertoire.tsv')

generate_ogrdb_report(reference_set, inferred_set, 'Homosapiens',
          repfile, 'IGHV', NA, 'V', 'H', FALSE, format='none')

#clean up
outfile = file.path(tempdir(), 'ogrdbstats_example_repertoire_ogrdb_report.csv')
file.remove(repfile)
file.remove(outfile)
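
The way the report name follows from the annotated read file, as used in the clean-up step above, can be sketched in base R. The helper ogrdb_report_path() is hypothetical and not part of ogrdbstats, and treating custom_file_prefix as a replacement for the file-name stem is an assumption rather than documented behaviour.

```r
# Hypothetical helper, not part of ogrdbstats.
ogrdb_report_path <- function(filename, custom_file_prefix = "") {
  # Default prefix: the annotated read file name without its extension.
  stem <- tools::file_path_sans_ext(basename(filename))
  # Assumption: a non-empty custom prefix replaces the stem.
  prefix <- if (nzchar(custom_file_prefix)) custom_file_prefix else stem
  # The report sits in the directory holding the annotated read file.
  file.path(dirname(filename), paste0(prefix, "_ogrdb_report.csv"))
}

report <- ogrdb_report_path(file.path("data", "run1", "sample.tsv"))
custom <- ogrdb_report_path(file.path("data", "run1", "sample.tsv"), "subjectA")
print(report)
print(custom)
```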

Collect parameters from the command line and use them to create a report and CSV file

Description

Collect parameters from the command line and use them to create a report and CSV file

Usage

genotype_statistics_cmd(args = NULL)

Arguments

args

A string vector containing the command line arguments. If NULL, will take them from the command line

Value

Nothing

Examples

# Prepare files for example
reference_set = system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
inferred_set = system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
repertoire = system.file("extdata/ogrdbstats_example_repertoire.tsv", package = "ogrdbstats")
# copy the repertoire under the name used in the rest of this example
repfile = file.path(tempdir(), 'repertoire.tsv')
file.copy(repertoire, repfile)

genotype_statistics_cmd(c(
              reference_set,
              'Homosapiens',
              repfile,
              'IGHV',
              '--inf_file', inferred_set,
              '--format', 'none'))

# clean up
outfile = file.path(tempdir(), 'repertoire_ogrdb_report.csv')
plotdir = file.path(tempdir(), 'repertoire_ogrdb_plots')
file.remove(repfile)
file.remove(outfile)
unlink(plotdir, recursive=TRUE)
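
The command-line form takes only a chain such as 'IGHV', whereas generate_ogrdb_report() also expects segment and chain_type. One plausible way to derive the latter two from the chain is sketched below. The helper split_chain() is hypothetical, and the rule that loci with D genes (IGH, TRB, TRD) map to chain type 'H' is an assumption based on the list of permitted chains, not a statement of how the package does it.

```r
# Hypothetical helper, not part of ogrdbstats.
split_chain <- function(chain) {
  chain <- toupper(chain)
  valid <- c("IGHV", "IGKV", "IGLV", "IGHD", "IGHJ", "IGKJ", "IGLJ",
             "TRAV", "TRAJ", "TRBV", "TRBD", "TRBJ",
             "TRGV", "TRGJ", "TRDV", "TRDD", "TRDJ")
  if (!chain %in% valid) {
    stop("Unrecognised chain: ", chain)
  }

  locus <- substr(chain, 1, 3)
  list(
    # The fourth character names the segment: V, D or J.
    segment = substr(chain, 4, 4),
    # Assumption: loci that include D genes are treated as heavy-type.
    chain_type = if (locus %in% c("IGH", "TRB", "TRD")) "H" else "L"
  )
}

ighv <- split_chain("IGHV")
traj <- split_chain("TRAJ")
str(ighv)
str(traj)
```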

Create a barplot for each allele, showing number of reads distributed by mutation count

Description

Create a barplot for each allele, showing number of reads distributed by mutation count

Usage

make_barplot_grobs(
  input_sequences,
  genotype_db,
  inferred_seqs,
  genotype,
  segment,
  calculated_NC
)

Arguments

input_sequences

the input_sequences data frame

genotype_db

named list of gene sequences in the personalised genotype

inferred_seqs

named list of novel gene sequences

genotype

data frame created by calc_genotype

segment

one of V, D, J

calculated_NC

a boolean, TRUE if mutation counts had to be calculated, FALSE otherwise

Value

list of grobs

Examples

barplot_grobs = make_barplot_grobs(
                      example_rep$input_sequences,
                      example_rep$genotype_db,
                      example_rep$inferred_seqs,
                      example_rep$genotype,
                      'V',
                      example_rep$calculated_NC
               )
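
The data behind such barplots is a count of reads per allele and mutation count. A minimal base R sketch follows, using invented data. The column name MUT_COUNT is hypothetical, and the plotting uses base graphics rather than the grobs the package returns.

```r
# Toy reads; MUT_COUNT is a hypothetical column name for the mutation count.
reads <- data.frame(
  SEG_CALL = c("IGHV1-69*01", "IGHV1-69*01", "IGHV1-69*01",
               "IGHV3-23*01", "IGHV3-23*01"),
  MUT_COUNT = c(0, 0, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Rows are alleles, columns are mutation counts, cells are read counts.
counts <- table(allele = reads$SEG_CALL, mutations = reads$MUT_COUNT)
print(counts)

# Draw one barplot per allele on a null device, so no file is written.
grDevices::pdf(NULL)
for (allele in rownames(counts)) {
  barplot(counts[allele, ], main = allele,
          xlab = "Mutation count", ylab = "Reads")
}
invisible(grDevices::dev.off())
```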

Create haplotyping plots

Description

Create haplotyping plots

Usage

make_haplo_grobs(segment, haplo_details)

Arguments

segment

one of V, D, J

haplo_details

Data structure created by create_haplo_details

Value

named list containing the following elements:

a_allele_plot	plot showing allele usage for each potential haplotyping gene
haplo_grobs	differential plot of allele usage for each usable haplotyping gene

Examples

haplo_grobs = make_haplo_grobs('V', example_rep$haplo_details)
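
The idea behind the haplotyping data, namely usage of each allele split by the two most frequent alleles of a haplotyping gene, can be sketched with a base R cross-tabulation. The data is invented and the code is an illustration under that reading of the documentation, not the calculation ogrdbstats performs.

```r
# Toy reads with V and J calls; the values are invented for illustration.
reads <- data.frame(
  V_CALL = c("IGHV1-2*02", "IGHV1-2*02", "IGHV1-2*04",
             "IGHV1-2*04", "IGHV1-2*02", "IGHV1-2*04"),
  J_CALL = c("IGHJ6*02", "IGHJ6*02", "IGHJ6*03",
             "IGHJ6*03", "IGHJ6*02", "IGHJ6*02"),
  stringsAsFactors = FALSE
)
hap_gene <- "IGHJ6"

# Keep reads whose J call is an allele of the haplotyping gene.
hap_reads <- reads[startsWith(reads$J_CALL, paste0(hap_gene, "*")), ]

# Find the two most frequent alleles of the haplotyping gene.
top_two <- names(sort(table(hap_reads$J_CALL), decreasing = TRUE))[1:2]
hap_reads <- hap_reads[hap_reads$J_CALL %in% top_two, ]

# Count reads of each V allele against each haplotyping allele.
usage <- table(
  v_allele = hap_reads$V_CALL,
  hap_allele = factor(hap_reads$J_CALL, levels = top_two)
)
print(usage)
```

In this toy data IGHV1-2*02 is seen only with IGHJ6*02, which is the kind of skew a haplotype plot is meant to reveal.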

Create plots showing base usage at selected locations in sequences based on novel alleles

Description

Create plots showing base usage at selected locations in sequences based on novel alleles

Usage

make_novel_base_grobs(inferred_seqs, input_sequences, segment, all_inferred)

Arguments

inferred_seqs

named list of novel gene sequences

input_sequences

the input_sequences data frame

segment

one of V, D, J

all_inferred

TRUE if the user has requested that all alleles in the reference set be plotted; this suppresses some warnings

Value

named list containing the following elements:

cdr3_dist	cdr3 length distribution plots
whole	whole-length usage plots
end	3' end usage plots
conc	3' end consensus composition plots
triplet	3' end triplet usage plots

Examples

base_grobs = make_novel_base_grobs(
                 example_rep$inferred_seqs,
                 example_rep$input_sequences,
                 'V',
                 FALSE
             )

Read input files into memory

Description

Read input files into memory

Usage

read_input_files(
  ref_filename,
  inferred_filename,
  species,
  filename,
  chain,
  hap_gene,
  segment,
  chain_type,
  all_inferred
)

Arguments

ref_filename

Name of file containing IMGT-aligned reference genes in FASTA format

inferred_filename

Name of file containing sequences of inferred novel alleles, or '-' if none

species

Species name used in field 3 of the IMGT germline header with spaces omitted, if the reference file is from IMGT. Otherwise ''

filename

Name of file containing annotated reads in AIRR, CHANGEO or IgDiscover format. The format is detected automatically

chain

one of IGHV, IGKV, IGLV, IGHD, IGHJ, IGKJ, IGLJ, TRAV, TRAJ, TRBV, TRBD, TRBJ, TRGV, TRGJ, TRDV, TRDD, TRDJ

hap_gene

The haplotyping columns will be completed based on the usage of the two most frequent alleles of this gene. If NA, the column will be blank

segment

one of V, D, J

chain_type

one of H, L

all_inferred

Treat all alleles as novel

Value

A named list containing the following elements:

ref_genes	named list of IMGT-gapped reference genes
inferred_seqs	named list of IMGT-gapped inferred (novel) sequences.
input_sequences	data frame with one row per annotated read, with CHANGEO-style column names. One key point: the column SEG_CALL is the gene call for the segment under analysis. Hence if segment is 'V', 'V_CALL' is renamed 'SEG_CALL', whereas if segment is 'J', 'J_CALL' is renamed 'SEG_CALL'. This simplifies downstream processing. Rows in the input file with ambiguous SEG_CALLs, or no call, are removed.
genotype_db	named list of gene sequences referenced in the annotated reads (both reference and novel sequences)
haplo_details	data used for haplotype analysis, showing allelic ratios calculated with various potential haplotyping genes
genotype	data frame containing information provided in the OGRDB genotype csv file
calculated_NC	a boolean that is TRUE if mutation counts were calculated by this library, FALSE if they were read from the annotated read file

Examples

# Create the analysis data set from example files provided with the package
#(this dataset is also provided in the package as example_rep)
reference_set = system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
inferred_set = system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
repertoire = system.file("extdata/ogrdbstats_example_repertoire.tsv", package = "ogrdbstats")

example_data = read_input_files(reference_set, inferred_set, 'Homosapiens',
       repertoire, 'IGHV', NA, 'V', 'H', FALSE)

Write the genotype file required by OGRDB

Description

Write the genotype file required by OGRDB

Usage

write_genotype_file(filename, segment, chain_type, genotype)

Arguments

filename

name of file to create (csv)

segment

one of V, D, J

chain_type

one of H, L

genotype

genotype data frame

Value

None

Examples

genotype_file = tempfile("ogrdb_genotype")
write_genotype_file(genotype_file, 'V', 'H', example_rep$genotype)
file.remove(genotype_file)

Create the OGRDB style plot file

Description

Create the OGRDB style plot file

Usage

write_plot_file(
  filename,
  input_sequences,
  cdr3_dist_grobs,
  end_composition_grobs,
  cons_composition_grobs,
  whole_composition_grobs,
  triplet_composition_grobs,
  barplot_grobs,
  a_allele_plot,
  haplo_grobs,
  message,
  format
)

Arguments

filename

name of file to create (pdf)

input_sequences

the input_sequences data frame

cdr3_dist_grobs

cdr3 length distribution grobs created by make_novel_base_grobs

end_composition_grobs

end composition grobs created by make_novel_base_grobs

cons_composition_grobs

consensus composition grobs created by make_novel_base_grobs

whole_composition_grobs

whole composition grobs created by make_novel_base_grobs

triplet_composition_grobs

triplet composition grobs created by make_novel_base_grobs

barplot_grobs

barplot grobs created by make_barplot_grobs

a_allele_plot

a_allele_plot grob created by make_haplo_grobs

haplo_grobs

haplo_grobs created by make_haplo_grobs

message

text message to display at end of report

format

Format of report ('pdf', 'html' or 'none')

Value

None

Examples

plot_file = tempfile(pattern = 'ogrdb_plots')

base_grobs = make_novel_base_grobs(
                 example_rep$inferred_seqs,
                 example_rep$input_sequences,
                 'V',
                 FALSE
             )
barplot_grobs = make_barplot_grobs(
                      example_rep$input_sequences,
                      example_rep$genotype_db,
                      example_rep$inferred_seqs,
                      example_rep$genotype,
                      'V',
                      example_rep$calculated_NC
               )
haplo_grobs = make_haplo_grobs('V', example_rep$haplo_details)

write_plot_file(
    plot_file,
    example_rep$input_sequences,
    base_grobs$cdr3_dist,
    base_grobs$end,
    base_grobs$conc,
    base_grobs$whole,
    base_grobs$triplet,
    barplot_grobs,
    haplo_grobs$a_allele_plot,
    haplo_grobs$haplo_grobs,
    "Notes on this analysis",
    'none'
)

file.remove(plot_file)