Version: | 2.7.0 |
Date: | 2022-05-20 |
Title: | Tools for Nuclear Magnetic Resonance (NMR) Spectra Alignment, Peak Based Processing, Quantitative Analysis and Visualizations |
Author: | Charlie Beirnaert, Trung Nghia Vu, Pieter Meysman, Kris Laukens and Dirk Valkenborg |
Maintainer: | Charlie Beirnaert <charlie_beirnaert@icloud.com> |
Description: | Makes Nuclear Magnetic Resonance spectroscopy (NMR spectroscopy) data analysis as easy as possible by only requiring a small set of functions to perform an entire analysis. 'speaq' offers the possibility of raw spectra alignment and quantitation but also an analysis based on features whereby the spectra are converted to peaks which are then grouped and turned into features. These features can be processed with any number of statistical tools either included in 'speaq' or available elsewhere on CRAN. More details can be found in Vu et al. (2011) <doi:10.1186/1471-2105-12-405> and Beirnaert et al. (2018) <doi:10.1371/journal.pcbi.1006018>. |
Depends: | R (≥ 3.1.0), |
Imports: | MassSpecWavelet, cluster, parallel, doSNOW, data.table, foreach, stats, Rfast, utils, graphics, grDevices, ggplot2, gridExtra, reshape2, rvest, xml2, missForest, impute |
Suggests: | datasets, knitr, rmarkdown, grid, gridBase |
LazyData: | true |
VignetteBuilder: | knitr |
License: | Apache License 2.0 |
RoxygenNote: | 7.2.0 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2022-05-20 12:31:27 UTC; Charlie |
Repository: | CRAN |
Date/Publication: | 2022-05-23 09:10:02 UTC |
Add plotting variables
Description
This functions adds a few variables which make plotting features easier (and more informative). Since for example every peaks keeps it original ppm value, if you want to plot the groups this function adds the group ppm value. Also sample labels can be added.
Usage
AddPlottingStuff(Y.peaks, X.ppm = NULL, groupLabels = NULL)
Arguments
Y.peaks |
data frame obtained from either of the 'getWaveletPeaks', 'PeakAligner' or 'PeakFilling' function. |
X.ppm |
The vector with the ppm values (numeric vector). |
groupLabels |
The groupLabels (numeric or factor). |
Value
Returns a data frame with added plotting variables (groupPPM for aligned features and labels for plotting).
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
subset <- GetWinedata.subset()
subset.spectra = as.matrix(subset$Spectra)
subset.ppm = as.numeric(subset$PPM)
test.peaks <- getWaveletPeaks(Y.spec=subset.spectra,
X.ppm=subset.ppm,
nCPU = 1) # nCPU set to 2 for the vignette build
test.peaks.plot = AddPlottingStuff(test.peaks, subset.ppm, subset$Color)
#head(test.peaks.plot)
BW ratio calculation
Description
Compute the BW ratios from data groups
Usage
BWR(X, groupLabel)
Arguments
X |
The spectral dataset in the matrix format in which each row contains a single sample. |
groupLabel |
Group label of samples in the dataset. |
Value
Return BW ratio
Author(s)
Trung Nghia Vu
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
resFindRef<- findRef(peakList);
refInd <- resFindRef$refInd;
maxShift = 50;
Y <- dohCluster(X,
peakList = peakList,
refInd = refInd,
maxShift = maxShift,
acceptLostPeak = TRUE, verbose=FALSE);
# find the BW-statistic
BW = BWR(Y, groupLabel);
Build a Feature matrix from the with speaq 2.0 processed data
Description
This function converts the grouped peak data to a matrix. The matrix has features (peaks groups) in the columns and the value of the peak for every sample in the rows.
Usage
BuildFeatureMatrix(
Y.data,
var = "peakValue",
impute = "zero",
imputation_val = NA,
delete.below.threshold = FALSE,
baselineThresh = 500,
snrThres = 3,
thresholds.pass = "any-to-pass"
)
Arguments
Y.data |
The dataset after (at least) peak detection and grouping with speaq 2.0. The dataset after peak filling is recommended. |
var |
The variable to be used in the Featurematrix. This can be any of 'peakIndex', 'peakPPM', 'peakValue' (default), 'peakSNR', 'peakScale', or 'Sample'. |
impute |
What to impute when a certain peak is missing for a certain sample and feature combo. Options are "zero" (or "zeros", the default), "median" (imputation with feature median), "randomForest" (imputation with missForest function from package missForest) or kNN followed by a number indicating the amount of neighbours to use e.g. "kNN5" or "kNN10" (as per the method of Troyanskaya, 2001) or lasty "User_value" (this will allow the use of any value specified with the imputation_val argument e.g. the median of the raw spectra). Any other statement will produce NA's. |
imputation_val |
If the "User_value" imputation option is chosen this value will be used to impute the missing values. |
delete.below.threshold |
Whether to ignore peaks for which the 'var' variable has a value below 'baselineThresh' (default = FALSE). |
baselineThresh |
The threshold for the 'var' variable that peaks have to surpass to be included in the feature matrix. |
snrThres |
The threshold for the signal-to-noise ratio of a peak. |
thresholds.pass |
This variable lets users decide whether a peak has to pass all the thresholds (both snrThres and baselineThresh), or just one. (If the peak does not need to surpass any thresholds set 'delete.below.threshold' to FALSE). |
Value
a matrix, data.matrix, with samples for rows and features for columns. The values in the matrix are those of the 'var' variable.
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
References
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman, Missing value estimation methods for DNA microarrays BIOINFORMATICS Vol. 17 no. 6, 2001 Pages 520-525
Examples
subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra = as.matrix(subset$Spectra)[1:2,]
subset.ppm = as.numeric(subset$PPM)
test.peaks <- getWaveletPeaks(Y.spec=subset.spectra,
X.ppm=subset.ppm,
nCPU = 1) # nCPU set to 2 for the vignette build
test.grouped <- PeakGrouper(Y.peaks = test.peaks)
test.Features <- BuildFeatureMatrix(test.grouped)
Build a raw data matrix (spectra) from spectra of unequal length
Description
This function can be used to build a data matrix from ill aligned spectra or of spectra of unequal length. the result is a matrix whereby the first column matches (approximately) with a single left ppm value and the last column matches (approximately) with a single right ppm value. Crucial is that the sample rates of the machine are the same this should be always the case otherwise comparing intensities becomes meaningless. Note that, as standard in NMR spectra, the highest ppm value is on the left
Usage
BuildRawDataMatrix(spectrum.list, ppm.list = NULL, ppm.edges.matrix = NULL)
Arguments
spectrum.list |
A list of the spectra (y-values). Since by definition some of these differ in length this has to be in list form with a single spectrum per list item. |
ppm.list |
The list of corresponding ppm values (x-values) with the highest ppm-value at the beginning (left) as is the convention for NMR spectra. (This our ppm.edges.matrix has to be provided). |
ppm.edges.matrix |
The list with the starting and ending ppm values (highest ppm-value on the left/in the beginning). This or ppm.list has to be provided. |
Value
SpectraAndPPM A list with 2 elements, the DataMatrix and the ppmMatrix.
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
# this is an example for 3 meaningless spectra
lengths_of_spectra <- c(100,150,120)
measurement_distance <- 0.01
starting_ppm_values <- c(8.7, 9.0, 9.0)
spectra <- list()
ppm_values <- list()
for (k in 1:3) {
spectra[[k]] <- runif(lengths_of_spectra[k], min = 0, max = 10)
# note the minus sign in the 'by' statement
ppm_values[[k]] <- seq(from = starting_ppm_values[k], by = -measurement_distance,
length.out = lengths_of_spectra[k])
}
new.Data <- BuildRawDataMatrix(spectrum.list = spectra, ppm.list = ppm_values)
spectraMatrix <- new.Data$DataMatrix
ppmMatrix <- new.Data$ppmMatrix
Get subset of Winedata for code examples
Description
This functions extracts a small part of the Winedata to be used in code testing and code examples
Usage
GetWinedata.subset()
Value
list of 2: spectra, ppm values, color and origin.
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
subset <- GetWinedata.subset()
subset.spectra = subset$Spectra
subset.ppm = subset$PPM.vector
Submit 1H NMR peaks to HMDB for compound search
Description
This function allows to search HMDB from within R by simply submitting the peaks you want to search for. The function will open a webpage with the query results or provide a link to the HMDB page with the results.
Usage
HMDBsearchR(peakVector, ppmTol = 0.02, returnURL = FALSE)
Arguments
peakVector |
A vector with ppm values of peaks |
ppmTol |
The ppm tolerance for the HMDB search (default = 0.02). |
returnURL |
Return the URL instead of opening a webpage. |
Value
Opens a webpage or returns a URL with the HMDB results
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
## Not run:
HMDBsearchR(peakVector = c(3.2, 3.38), ppmTol = 0.2, returnURL = TRUE)
## End(Not run)
Peak filling of any missed peaks
Description
This functions detects which samples (after grouping) are missing from every peak group and reanalyses the raw data to verify whether this peak is actually non-existent for this sample
Usage
PeakFilling(
Y.grouped,
Y.spec,
max.index.shift = 10,
window.width = "small",
nCPU = -1,
FilMethod = "new"
)
Arguments
Y.grouped |
Peaks groups (output from the 'PeakGrouper' function). |
Y.spec |
The raw NMR spectra in matrix format. |
max.index.shift |
Maximal shift in index between a filled peak and the group it belongs to. |
window.width |
The width of the detection window for the wavelets. Because of the Fourier transform lengths of 512 ( window.width = 'small') of 1024 ( window.width = 'large') are preferable. |
nCPU |
The amount of cpu's to be used for peak detection. If set to '-1' all available cores minus 1 will be used. |
FilMethod |
A more robust method for peak filling has been implemented. This is now the default. The former method can be used by specifying FilMethod == "old" however this will be deprecated. |
Value
Returns a data frame with grouped peaks and possibly extra peaks obtained from the raw data (these peaks have SNR = NA).
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra = as.matrix(subset$Spectra)[1:2,]
subset.ppm = as.numeric(subset$PPM)
test.peaks <- getWaveletPeaks(Y.spec=subset.spectra,
X.ppm=subset.ppm,
nCPU = 1) # nCPU set to 1 for the vignette build
test.grouped <- PeakGrouper(Y.peaks = test.peaks)
test.filled <- PeakFilling(Y.grouped = test.grouped,
Y.spec = subset.spectra,
nCPU = 1) # nCPU set to 1 for the vignette build
Peak grouping with hierarchical clustering
Description
This functions groups the peaks obtained after wavelet based peak detection (with the 'getWaveletPeaks' function).
Usage
PeakGrouper(
Y.peaks,
grouping.window.width = 100,
verbose = FALSE,
min.samp.grp = 1,
max.dupli.prop = 0.25,
maxClust = 10,
Jaccard.regroup.threshold = 0.25,
linkage = "average"
)
Arguments
Y.peaks |
data frame obtained from the 'getWaveletPeaks' function. |
grouping.window.width |
The width of the sliding window (in measurement points). Measurements are taken for when this sliding window is taken too small, but best set this too a value that a normal peak is comfortably in a window. Note if large shifts occur in your dataset (like in the wine dataset) it is best to set this parameter larger. |
verbose |
If set to TRUE the window selection process is documented in real time (default = FALSE). |
min.samp.grp |
The minimal amount of samples needed to form a group, see hclust.grouping. |
max.dupli.prop |
The maximal duplication proportion allowed for a group to be considered a single group, see hclust.grouping. |
maxClust |
The maximum number of clusters (depth of the tree), see hclust.grouping. |
Jaccard.regroup.threshold |
If 2 neighbouring groups have a jaccard index smaller than this 'Jaccard.regroup.threshold' (indicating that they are quite complementary as they have little peaks samples in common), then they are merged and regrouped. This situation can occur if a group is accidentally cut in half by the window approach. |
linkage |
The linkage to be used in the hierarchical clustering. See the 'method' argument in hclust. |
Value
Returns a data frame with grouped peaks. Peaks in a group are indicated with an identical peakIndex
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
See Also
Examples
subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra = as.matrix(subset$Spectra)[1:2,]
subset.ppm = as.numeric(subset$PPM)
test.peaks <- getWaveletPeaks(Y.spec=subset.spectra,
X.ppm=subset.ppm ,
nCPU = 1) # nCPU set to 2 for the vignette build
test.grouped <- PeakGrouper(Y.peaks = test.peaks)
Plot NMR spectra, together with raw and grouped peaks
Description
This function plots NMR spectra, peak plots and grouped peak plots all in figure for easy comparison.
Usage
ROIplot(
Y.spec,
X.ppm,
ungrouped.peaks,
grouped.peaks,
ROI = NULL,
ROI.ppm = NULL,
roiWidth = 100,
roiWidth.ppm = NULL,
groupLabels = NULL,
output = NULL
)
Arguments
Y.spec |
(required) The raw spectra in matrix format (1 sample per row) or numeric vector (in case of 1 spectrum) |
X.ppm |
(required) The vector with the ppm values |
ungrouped.peaks |
(required) The data resulting from peak detection with wavelets |
grouped.peaks |
(required) The data after grouping (with PeakGrouper) |
ROI |
If provided (with an index value, not a ppm value) only this region of interest will be plotted. (supply no ROI or ROI.ppm values, for the full spectrum, or specify only 1, either ROI or ROI.ppm). |
ROI.ppm |
If provided (a ppm value, not an index value) only this region of interest will be plotted. (supply no ROI or ROI.ppm values, for the full spectrum, or specify only 1, either ROI or ROI.ppm). |
roiWidth |
The width of the ROI (region of interest) plot in index points/measurement points. The plot will span from ROI/ROI.ppm - roiWidth to ROI/ROI.ppm + roiWidth. (only supply roiWidth or roiWidth.ppm if needed). |
roiWidth.ppm |
The width of the ROI (region of interest) plot in ppm. The plot will span from ROI/ROI.ppm - roiWidth.ppm to ROI/ROI.ppm + roiWidth.ppm. (only supply roiWidth or roiWidth.ppm if needed). |
groupLabels |
The vector with group labels (as factors) |
output |
Whether to return a plot (default), or the individual ggplot objects (output = "ggObjects") |
Value
a plot
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra = as.matrix(subset$Spectra)[1:2,]
subset.ppm = as.numeric(subset$PPM)
test.peaks <- getWaveletPeaks(Y.spec=subset.spectra,
X.ppm=subset.ppm,
nCPU = 1) # nCPU set to 2 for the vignette build
test.grouped <- PeakGrouper(Y.peaks = test.peaks)
ROI.ppm <- 4.9
roiWidth.ppm <- 0.15
plots <- ROIplot(Y.spec = subset.spectra,
X.ppm =subset.ppm,
ungrouped.peaks = test.peaks,
grouped.peaks = test.grouped ,
ROI.ppm = ROI.ppm,
roiWidth.ppm = roiWidth.ppm ,
output = "ggObjects"
)
SCAle, Normalize and Transform a data matrix
Description
This function allows the column-wise or row-wise scaling, normalization and transformation operations on a data matrix.
Usage
SCANT(data.matrix, type = "unit", feature_orientation = "columns")
Arguments
data.matrix |
the data matrix to be scaled, normalized or transformed. |
type |
the operations to be performed, this can be multiple and are performed sequentially. Any of 'unit', 'pareto', 'log10', 'log2', 'center', 'range', 'vast', or 'max' are accepted. |
feature_orientation |
default = "columns". This corresponds to the default feature matrix with samples as rows and features as columns. The other option is "rows": samples as columns and different features as different rows. |
Value
The scaled, normalized and/or transformed matrix.
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
References
van den Berg RA, Hoefsloot HCJ, Westerhuis JA, et al. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 2006; 7:142.
Examples
Samples <- 10
Features <- 20
data.matrix <- matrix(runif(n=Features*Samples, min=0,max=100),
ncol = Features, nrow = Samples)
changed_matrix = SCANT(data.matrix, type=c('pareto', 'center'), feature_orientation = 'columns')
SilhouetR
Description
This function calculate Silhouette values. The function is generic, as such silhouette values can be calculated between samples of different classes or it can be used to calculate silhouette values between different groups of peaks. This is the way in which it is used for the speaq package (see the example).
Usage
SilhouetR(DataMatrix, GroupIndices, distance = "euclidean")
Arguments
DataMatrix |
a matrix with the raw data, 1 variable per column. |
GroupIndices |
The vector with the group indices (length must be equal to the amount of rows in DataMatrix). |
distance |
The distance metric to be used, "euclidean" or "manhattan". |
Value
Returns the silhouette values. Note if a group contains only 1 no Silhouette value can be calculated (will give NA)
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra = as.matrix(subset$Spectra)[1:2,]
subset.ppm = as.numeric(subset$PPM)
test.peaks <- getWaveletPeaks(Y.spec=subset.spectra,
X.ppm=subset.ppm,
nCPU = 1) # nCPU set to 2 for the vignette build
test.grouped <- PeakGrouper(Y.peaks = test.peaks)
Silhouette.values = SilhouetR(DataMatrix = test.grouped$peakPPM,
test.grouped$peakIndex,
distance = "euclidean")
hist(Silhouette.values$SilhouetteValues)
Wine dataset
Description
1H-NMR data of 40 wines, different origins and colors are included.
Usage
data(Winedata)
Format
A list with the spectra, ppm values, color and origin as list entries.
Source
University of Copenhagen, Dept. of Food Science, Quality & Technology. Available at 'models.life.ku.dk/datasets'
References
Larsen et al. (2006) An exploratory chemometric study of 1H-NMR spectra of table wines. J.Chemom. 20 (2006) 198-208 (Wiley Online Library)
Examples
data(Winedata)
Spectra <- Winedata$spectra
ppm.wine <- Winedata$ppm
wine.color <- Winedata$wine.color
wine.origin <- Winedata$origin
Building a null hypothesis data
Description
Create a null sampling data (N times) and write them to a file
Usage
createNullSampling(X, groupLabel, N = 100, verbose = TRUE)
Arguments
X |
The spectral dataset in the matrix format in which each row contains a single sample |
groupLabel |
Group label of samples in the dataset |
N |
The number of iteration for creating null sample distribution |
verbose |
A boolean value to allow print out process information |
Value
A matrix with N rows containing the null distribution.
Author(s)
Trung Nghia Vu
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
resFindRef<- findRef(peakList);
refInd <- resFindRef$refInd;
maxShift = 50;
Y <- dohCluster(X,
peakList = peakList,
refInd = refInd,
maxShift = maxShift,
acceptLostPeak = TRUE, verbose=FALSE);
# find the BW-statistic
BW = BWR(Y, groupLabel);
H0 = createNullSampling(Y, groupLabel, N = 100,verbose=FALSE)
Peak detection for spectra
Description
Divide the whole spectra into smaller segments and detect peaks by using MassSpecWavelet package. Note that, the peak lists could be found by using other methods, this function is just a choice.
Usage
detectSpecPeaks(
X,
nDivRange = 128,
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose = TRUE
)
Arguments
X |
The spectral dataset in matrix format in which each row contains a single sample |
nDivRange |
The size of a single small segment after division of spectra |
scales |
The parameter of peakDetectionCWT function of MassSpecWavelet package, look it up in the original function. |
baselineThresh |
It will remove all peaks under an intensity set by baselineThresh. |
SNR.Th |
The parameter of peakDetectionCWT function of MassSpecWavelet package, look it up in the original function. If you set -1, the function will itself re-compute this value. |
verbose |
A boolean value to allow print out process information. |
Value
The peak lists of the spectra
Author(s)
Trung Nghia Vu
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
Segment shift
Description
Move a spectral segment of a sample shiftStep points to right or left
Usage
doShift(specSeg, shiftStep)
Arguments
specSeg |
The segment which needs to be shifted |
shiftStep |
The shift step for moving. If it is a negative (positive) value, the segment is moved to left (right). |
Value
The new segment after shifting.
Author(s)
Trung Nghia Vu
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
maxShift=50;
refSpec=X[1,];
tarSpec=X[2,];
adj=findShiftStepFFT(refSpec, tarSpec,maxShift=maxShift);
newTarSpec=doShift(tarSpec,adj$stepAdj);
CluPA function for multiple spectra.
Description
Use CluPA for alignment for multiple spectra.
Usage
dohCluster(
X,
peakList,
refInd = 0,
maxShift = 100,
acceptLostPeak = TRUE,
verbose = TRUE
)
Arguments
X |
The spectral dataset in the matrix format in which each row contains a single sample |
peakList |
The peak lists of the spectra |
refInd |
The index of the reference spectrum. |
maxShift |
The maximum number of the points for a shift step. |
acceptLostPeak |
This is an option for users, TRUE is the default value. If the users believe that all the peaks in the peak list are true positive, change it to FALSE. |
verbose |
A boolean value to allow print out process information. |
Value
The aligned spectra.
Author(s)
Trung Nghia Vu
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
resFindRef<- findRef(peakList);
refInd <- resFindRef$refInd;
maxShift = 50;
Y <- dohCluster(X,
peakList = peakList,
refInd = refInd,
maxShift = maxShift,
acceptLostPeak = TRUE, verbose=FALSE);
Use CluPA for alignment with additional information
Description
This function integrates some additional information from user such as references for each specific segment, segment ignorance, maximum step size.. to align spectra using CluPA.
Usage
dohClusterCustommedSegments(
X,
peakList,
refInd,
segmentInfoMat,
minSegSize = 128,
maxShift = 100,
acceptLostPeak = TRUE,
verbose = TRUE
)
Arguments
X |
The spectral dataset in the matrix format in which each row contains a single sample |
peakList |
The peak lists of the spectra |
refInd |
The index of the reference spectrum. |
segmentInfoMat |
The matrix containing the additional information for segments from the users. This parameter must be a matrix. |
minSegSize |
The minimum size of the segments which could be considered for alignment. |
maxShift |
The maximum number of the points for a shift step. |
acceptLostPeak |
This is an option for users, TRUE is the default value. If the users believe that all the peaks in the peak list are true positive, change it to FALSE. |
verbose |
A boolean value to allow print out process information. |
Details
Each row of the segmentInfoMat matrix includes 5 values. For example, it could be imported from a CSV file consisting of following content: # begin,end,forAlign,ref,maxShift 100,200,0,0,0 450,680,1,0,50 # Each column could be explained as the following: * begin: the starting point of the segment. * end: the end point of the segment. * forAlign: the segment is aligned (1) or not (0). * ref: the index of the reference spectrum. If 0, the algorithm will select the reference found by the reference finding step. * maxShift: the maximum number of points of a shift to left/right. It is worth to note that only segments with forAlign=1 (column 3) will be taken into account for spectral alignment.
Value
The aligned spectral segments.
Author(s)
Trung Nghia Vu
See Also
Examples
cat("\n Please see more examples in the vignettes file.")
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
resFindRef<- findRef(peakList);
refInd <- resFindRef$refInd;
segmentInfoMat=matrix(data=c(100,200,0,0,0,
50,680,1,0,50),nrow=2,ncol=5,byrow=TRUE
)
colnames(segmentInfoMat)=c("begin","end","forAlign","ref","maxShift")
segmentInfoMat
maxShift = 50;
Yc <- dohClusterCustommedSegments(X,
peakList = peakList,
refInd = refInd,
maxShift = maxShift,
acceptLostPeak = TRUE,
segmentInfoMat = segmentInfoMat,
minSegSize = 128,
verbose=FALSE)
BW and percentile ratios plot
Description
This function is used to plot BW and percentile ratios
Usage
drawBW(
BW,
perc,
X,
startP = -1,
endP = -1,
groupLabel = NULL,
highBound = -1,
lowBound = -1,
nAxisPos = 4,
offside = 0
)
Arguments
BW |
An array of the BW ratios. |
perc |
An array of the percentile ratios. |
X |
The spectral dataset in matrix format in which each row contains a single sample. |
startP |
The starting point of the segment. If it is -1, the starting point is from beginning of the spectra. |
endP |
The ending point of the segment. If it is -1, the ending point is the last point of the spectra. |
groupLabel |
The default value is NULL, it means that a single spectrum has a distinct color. Otherwise, the spectra is colored by their label. |
highBound |
Default value is -1, that means the plot covers also the highest intensity peaks in the figure. If the users want to limit the upper height of the figure, set this parameter by the limited value. |
lowBound |
Default value is -1, that means the plot covers also the lowest intensity peaks in the figure. If the users want to limit the under height of the figure, set this parameter by the limited value. |
nAxisPos |
The number of ticks that will be displayed in the horizontal axis. |
offside |
The offside of values in x-axis for display. |
Value
Return a plot containing both the BW and the spectra.
Author(s)
Trung Nghia Vu
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
resFindRef<- findRef(peakList);
refInd <- resFindRef$refInd;
maxShift = 50;
Y <- dohCluster(X,
peakList = peakList,
refInd = refInd,
maxShift = maxShift,
acceptLostPeak = TRUE, verbose=FALSE);
# find the BW-statistic
BW = BWR(Y, groupLabel);
N = 100;
alpha = 0.05;
# create sampled H0 and export to file
H0 = createNullSampling(Y, groupLabel, N = N,verbose=FALSE)
#compute percentile of alpha
perc = double(ncol(Y));
alpha_corr = alpha/sum(returnLocalMaxima(Y[2,])$pkMax>50000);
for (i in seq_along(perc)) {
perc[i] = quantile(H0[,i],1-alpha_corr, type = 3);
}
drawBW(BW, perc,Y, groupLabel = groupLabel)
Spectral plot
Description
This function allows to draw a segment or the whole spectra with limited high/low bounds of intensity.
Usage
drawSpec(
X,
startP = -1,
endP = -1,
groupLabel = NULL,
useLog = -1,
highBound = -1,
lowBound = -1,
xlab = NULL,
ylab = NULL,
main = NULL,
nAxisPos = 4,
offside = 0
)
Arguments
X |
The spectral dataset in matrix format in which each row contains a single sample. |
startP |
The starting point of the segment. If it is -1, the starting point is from beginning of the spectra. |
endP |
The ending point of the segment. If it is -1, the ending point is the last point of the spectra. |
groupLabel |
The default value is NULL, it means that a single spectrum has a distinct color. Otherwise, the spectra is colored by their label. |
useLog |
The default value is -1, that means do not use a log transformation. If users want to transform the intensities to logarit values before plotting, set it to 1. |
highBound |
Default value is -1, that means the plot covers also the highest intensity peaks in the figure. If the users want to limit the upper height of the figure, set this parameter by the limited value. |
lowBound |
Default value is -1, that means the plot covers also the lowest intensity peaks in the figure. If the users want to limit the under height of the figure, set this parameter by the limited value. |
xlab |
The default value is NULL, if so, "index" is displayed at the horizontal axis. |
ylab |
The default value is NULL, if so, "intensity" is displayed at the vertical axis. |
main |
The default value is NULL, if so, the title shows the values of startP and endP. |
nAxisPos |
The number of ticks that you want to display in horizontal axis. |
offside |
The offside of values in x-axis for display. |
Value
Return a plot of the spectra.
Author(s)
Trung Nghia Vu
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
drawSpec(X)
Plot NMR spectra from a spectra data matrix
Description
This function plots NMR spectra (so with the largest ppm values on the left) with a number of plotting options
Usage
drawSpecPPM(
Y.spec,
X.ppm,
LeftIndex = -1,
RightIndex = -1,
groupFactor = NULL,
useLog = FALSE,
maxHeight = -1,
minHeight = -1,
nAxisPos = 4,
xlab = NULL,
ylab = NULL,
title = NULL,
ticks = NULL,
ROI = NULL,
ROI.ppm = NULL,
roiWidth = 100,
roiWidth.ppm = NULL,
legend.extra.x = 2,
legend.extra.y = 2,
legendpos = NULL,
colourstyle = "ggplot",
manual.colours = NULL,
lwd = 1,
noLegend = FALSE
)
Arguments
Y.spec |
(required) The raw spectra in matrix format (1 sample per row) or numeric vector (in case of 1 spectrum) |
X.ppm |
(required) The vector with the ppm values |
LeftIndex |
The starting index of the ppm values for plotting. default = -1 indicates the first ppm (the largest) value is the start of the plot |
RightIndex |
The stopping index for plotting. default = -1 indicates the last ppm value (the smallest) is the end of the plot |
groupFactor |
The groupFactors. If provided different colors will be used for each group. |
useLog |
If set to 'TRUE' the spectra will be log10 transformed (default = FALSE). |
maxHeight |
The maximal height of the plot (default = -1, this indicates no maximal value). |
minHeight |
The minimal height of the plot (default = -1, this indicates no minimal value). |
nAxisPos |
The number of equally spaced tickmarks. |
xlab |
The label on the x axis. |
ylab |
The label on the y axis. |
title |
The title of the plot. |
ticks |
Position tick manually by providing ppm values. |
ROI |
If provided (with an index value, not a ppm value) only this region of interest will be plotted. (supply no ROI or ROI.ppm values, for the full spectrum, or specify only 1, either ROI or ROI.ppm). |
ROI.ppm |
If provided (a ppm value, not an index value) only this region of interest will be plotted. (supply no ROI or ROI.ppm values, for the full spectrum, or specify only 1, either ROI or ROI.ppm). |
roiWidth |
The width of the ROI (region of interest) plot in index points/measurement points. The plot will span from ROI/ROI.ppm - roiWidth to ROI/ROI.ppm + roiWidth. (only supply roiWidth or roiWidth.ppm if needed). |
roiWidth.ppm |
The width of the ROI (region of interest) plot in ppm. The plot will span from ROI/ROI.ppm - roiWidth.ppm to ROI/ROI.ppm + roiWidth.ppm. (only supply roiWidth or roiWidth.ppm if needed). |
legend.extra.x |
Increase (or decrease) the horizontal space in the legend, this is useful for exporting larger figures. |
legend.extra.y |
Increase (or decrease) the vertical space in the legend, this is useful for exporting larger figures. |
legendpos |
The position of the legend (standard R legend positioning, default = 'topleft'). |
colourstyle |
The colours used in the plot, either standard R or ggplot colours (default). |
manual.colours |
Provide specific colours to be used in the plot. |
lwd |
The linewidth. |
noLegend |
If set to TRUE no legend will be plotted (default = FALSE). |
Value
an R plot
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
data(Winedata)
Spectra = Winedata$spectra
ppm.wine = Winedata$ppm
wine.color = Winedata$wine.color
drawSpecPPM(Y.spec=Spectra, X.ppm=ppm.wine, groupFactor = wine.color,
title = 'Raw wine data spectra')
Reference finding
Description
This function is to heuristically detect a reference spectrum.
Usage
findRef(peakList)
Arguments
peakList |
The peak lists of the spectra. |
Value
list of 2: refInd (The index of the reference spectrum found by the algorithm) and orderSpec (A sorted array of the spectra by their goodness values)
Author(s)
Trung Nghia Vu
References
Vu TN, Valkenborg D, Smets K, Verwaest KA, Dommisse R, Lemi\'ere F, Verschoren A, Goethals B, Laukens K. (2011) An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data. BMC Bioinformatics. 2011 Oct 20;12:405.
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
cat("\n Find the spectrum reference...")
resFindRef<- findRef(peakList);
refInd <- resFindRef$refInd;
cat("\n Order of spectrum for reference \n");
for (i in seq_along(resFindRef$orderSpec))
cat(paste(i, ":",resFindRef$orderSpec[i],sep=""), " ");
cat("\n The reference is: ", refInd);
Selecting the peaks in a segment
Description
This function is to find out which peaks belonging to a segment which ranges from startP to endP
Usage
findSegPeakList(peakList, startP, endP)
Arguments
peakList |
The peak lists of the spectra. |
startP |
The starting point of the segment. |
endP |
The ending point of the segment. |
Value
The list of indices of the peaks in the segment.
Author(s)
Trung Nghia Vu
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
cat("\n ", peakList[[1]])
segmentpeakList= findSegPeakList(peakList[[1]],400,600);
cat("\n ", segmentpeakList)
Finding the shift-step by using Fast Fourier Transform cross- correlation
Description
This function uses Fast Fourier Transform cross-correlation to find out the shift step between two spectra.
Usage
findShiftStepFFT(refSpec, tarSpec, maxShift = 0, scale = NULL)
Arguments
refSpec |
The reference spectrum. |
tarSpec |
The target spectrum which needs to be aligned. |
maxShift |
The maximum number of points for a shift step. If this value is zero, the algorithm will check on the whole length of the spectra. |
scale |
Boolean value (TRUE/FALSE) for scaling data before Fast Fourier Transform cross-correlation step. If scale=NULL but mean/median of absolute data is too small (<1), the scaling will be applied. This might happen for very low abundant spectra like chromatograms. For normal NMR spectra, the scaling is usually not applied. |
Value
list of 2: corValue (The best correlation value) and stepAdj (The shift step found by the algorithm)
Author(s)
Trung Nghia Vu
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
maxShift=50;
refSpec=X[1,];
tarSpec=X[2,];
adj=findShiftStepFFT(refSpec, tarSpec,maxShift=maxShift);
Convert raw NMR spectra to peak data by using wavelets
Description
This function converts phase corrected NMR spectra to peak data by using wavelet based peak detection (with the MassSpecWavelet package)
Usage
getWaveletPeaks(
Y.spec,
X.ppm,
sample.labels = NULL,
window.width = "small",
window.split = 4,
scales = seq(1, 16, 1),
baselineThresh = 1000,
SNR.Th = -1,
nCPU = -1,
include_nearbyPeaks = TRUE,
raw_peakheight = FALSE,
duplicate_detection_multiplier = 1
)
Arguments
Y.spec |
The spectra in matrix format (rows = samples, columns = measurement points ). |
X.ppm |
The x/ppm values of the spectra (in single vector or matrix format). |
sample.labels |
The sample labels (optional), if not supplied these will simply be the sample numbers. |
window.width |
The width of the detection window for the wavelets. Because of the Fourier transform lengths of 512 ( window.width = 'small') of 1024 ( window.width = 'large') are preferable. |
window.split |
A positive, even and whole number indicating in how many parts the sliding window is split up. With every iteration the window slides one part further. |
scales |
The scales to be used in the wavelet based peak detection, see peakDetectionCWT. |
baselineThresh |
Peaks with a peakValue lower than this threshold will be removed (default = 1000). |
SNR.Th |
The Signal-to-noise threshold, see peakDetectionCWT. |
nCPU |
The amount of cpu's to be used for peak detection. If set to '-1' all available cores minus 1 will be used. |
include_nearbyPeaks |
If set to TRUE small peaks in the tails of larger ones will be included in the peak data, see peakDetectionCWT. |
raw_peakheight |
(default = FALSE) Whether to use the raw peak height of a peak instead of the optimal CWT coefficient (which is a measure for AUC). |
duplicate_detection_multiplier |
(default 1) In case users want to process other spectra besides NMR, this parameter will increase the limit for two peaks to be considered a duplicate detection. When dealing with more distorted spectra this parameter can be increased (recommended to not increase above 10). |
Value
The peaks detected with the wavelets.
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra = as.matrix(subset$Spectra)[1:2,]
subset.ppm = as.numeric(subset$PPM)
test.peaks <- getWaveletPeaks(Y.spec=subset.spectra,
X.ppm=subset.ppm ,
nCPU = 1) # nCPU set to 2 for the vignette build
CluPA function for two spectra.
Description
This function implements the idea of the CluPA algorithm to align the target spectrum against the reference spectrum.
Usage
hClustAlign(
refSpec,
tarSpec,
peakList,
peakLabel,
startP,
endP,
distanceMethod = "average",
maxShift = 0,
acceptLostPeak = FALSE
)
Arguments
refSpec |
The reference spectrum. |
tarSpec |
The target spectrum. |
peakList |
List of peaks of the both reference and target spectra |
peakLabel |
The list of the labels of the peaks |
startP |
The starting point of the segment. |
endP |
The ending point of the segment. |
distanceMethod |
The distance method for the hierarchical clustering algorithm. |
maxShift |
The maximum number of points for a shift step. |
acceptLostPeak |
This is an option for users, TRUE is the default value. If the users believe that all the peaks in the peak list are true positive, change it to FALSE. |
Value
list of 2: tarSpec (The target spectrum after alignment) and peakList (The peak list after alignment)
Author(s)
Trung Nghia Vu
References
Vu TN, Valkenborg D, Smets K, Verwaest KA, Dommisse R, Lemi\'ere F, Verschoren A, Goethals B, Laukens K. (2011) An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data. BMC Bioinformatics. 2011 Oct 20;12:405.
See Also
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
peakList <- detectSpecPeaks(X,
nDivRange = c(128),
scales = seq(1, 16, 2),
baselineThresh = 50000,
SNR.Th = -1,
verbose=FALSE
);
resFindRef<- findRef(peakList);
refInd <- resFindRef$refInd;
tarInd=1;
refSpec=X[refInd,];
tarSpec=X[tarInd,];
mergedPeakList=c(peakList[[refInd]],peakList[[tarInd]]);
mergedPeakLabel=double(length(mergedPeakList));
for (i in seq_along(peakList[[refInd]]) ) mergedPeakLabel[i]=1;
startP=1;
endP=length(tarSpec);
res=hClustAlign(refSpec,tarSpec,mergedPeakList,mergedPeakLabel,startP,endP,
maxShift=50,acceptLostPeak=TRUE)
Grouping with hierarchical clustering (used in the PeakGrouper function)
Description
Internal function in the PeakGrouper function for generating the hierarchical clustering tree and cutting it.
Usage
hclust.grouping(
current.peaks,
min.samp.grp = 1,
max.dupli.prop = 0.25,
maxClust = 10,
linkage = "average"
)
Arguments
current.peaks |
A number of neighbouring peaks to be grouped. |
min.samp.grp |
The minimal amount of samples needed to form a group. |
max.dupli.prop |
The maximal duplication proportion allowed for a group to be considered a single group. |
maxClust |
The maximum number of clusters (depth of the tree). |
linkage |
The linkage to be used in the hierarchical clustering. See the 'method' argument in hclust. |
Value
Returns a data frame with grouped peaks.
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
See Also
Create a simulated NMR spectral data
Description
Generate an NMR spectral data for testing.
Usage
makeSimulatedData()
Details
We generate a NMR spectral data sets that contains two group A and group B. One at around 300 has a single tip and the other at around 600 has double tips that intentionally contains biological variation. First, a single spectrum is created based on statistic information (mean, standard deviation of intensity) achieved from real NMR spectra. Then, we randomly shift the spectrum to maximum 50 data points and add some biological and technical variations to each point intensity to the spectrum to create a new spectrum. The collection of spectra from each group is the final dataset.
Value
a list with 2 elements: data (The simulated NMR spectral data matrix) and label (Group label of each spectrum)
Author(s)
Trung Nghia Vu
Examples
res <- makeSimulatedData();
X <- res$data;
groupLabel <- res$label;
Regroup faulty grouped peaks
Description
If there are peaks wrongly grouped by the peakGrouper function, they will be regrouped by using the ppm values together with the peak signal to noise ratio.
Usage
regroupR(
grouped.peaks,
list.to.regroup,
min.samp.grp = 1,
max.dupli.prop = 0.1,
maxClust = 10
)
Arguments
grouped.peaks |
The grouped peaks data. |
list.to.regroup |
The peak indices of groups to regroup (the groups, indicated by their peakIndex, in 1 list item will be merged and regrouped). |
min.samp.grp |
The minimal amount of samples needed to form a group. |
max.dupli.prop |
The maximal duplication proportion allowed for a group to be considered a single group. |
maxClust |
The maximum number of clusters (depth of the tree). |
Value
Returns a data frame with regrouped peaks.
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Identify features (columns in the datamatrix) which are significantly associated with the outcome.
Description
This function produces a p-value for every column in the datamatrix, corresponding to the null hypothesis that outcome/response is independent of that feature.
Usage
relevant.features.p(
datamatrix,
response,
p.adj = "BH",
POI = 1,
responsevector = NULL
)
Arguments
datamatrix |
The data matrix with a column for each feature. |
response |
A vector or matrix of outcomes/responses (e.g. class labels). the length of this vector or the amount of rows in this matrix should match the amount of rows in datamatrix. |
p.adj |
The adjustment method for the p-values. Any of 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH' (default), 'BY', 'fdr' or 'none' are accepted. |
POI |
Only if 'response' is a matrix! The p values of interest. This is a number indicating which column of the 'response' matrix you are interested in. POI can range from 1 (default) to the number of columns in 'response'. |
responsevector |
(deprecated), please use the the more general 'response' variable instead. |
Value
data with the features and their (adjusted) p-values, one for every column in the datamatrix .
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
Examples
nSamples <- 10
nFeatures <- 20
data.matrix <- matrix( stats::runif(n=nFeatures*nSamples, min=0,max=100),
ncol = nFeatures, nrow = nSamples)
responseVec <- c( rep(0,nSamples/2), rep(1,nSamples/2) )
p_values <- relevant.features.p(datamatrix = data.matrix, response =
responseVec, p.adj = 'none')
p_values_adjusted <- relevant.features.p( datamatrix = data.matrix,
response = responseVec, p.adj = 'bonferroni')
Local maximum detection
Description
Find and return local maximum of a single spectrum.
Usage
returnLocalMaxima(spectrum)
Arguments
spectrum |
A spectral sample in the vector format. |
Value
list of 2: locMax (Locations of the found local maximum peaks) and pkMax (Intensities of the found local maximum peaks)
Author(s)
Trung Nghia Vu
Examples
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;
returnLocalMaxima(X[2,])