Title: | Modular Leaf Ordering Methods for Dendrogram Nodes |
Version: | 0.3.4 |
Description: | An implementation of functions to optimize ordering of nodes in a dendrogram, without affecting the meaning of the dendrogram. A dendrogram can be sorted based on the average distance of subtrees, or based on the smallest distance value. These sorting methods improve readability and interpretability of tree structure, especially for tasks such as comparison of different distance measures or linkage types and identification of tight clusters and outliers. As a result, it also introduces more meaningful reordering for a coupled heatmap visualization. This method is described in "dendsort: modular leaf ordering methods for dendrogram representations in R", F1000Research 2014, 3: 177 <doi:10.12688/f1000research.4784.1>. |
License: | GPL-2 | GPL-3 |
Encoding: | UTF-8 |
Suggests: | knitr, RColorBrewer, gplots, seriation, gapmap, rmarkdown |
VignetteBuilder: | knitr |
URL: | https://github.com/evanbiederstedt/dendsort |
BugReports: | https://github.com/evanbiederstedt/dendsort/issues |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | no |
Maintainer: | Evan Biederstedt <evan.biederstedt@gmail.com> |
Packaged: | 2021-04-19 19:12:09 UTC; evanbiederstedt |
Author: | Ryo Sakai [aut], Evan Biederstedt [cre, aut] |
Repository: | CRAN |
Date/Publication: | 2021-04-20 11:40:02 UTC |
Modular Leaf Ordering Methods for Dendrogram Nodes
Description
Modular Leaf Ordering Methods for Dendrogram Nodes
Details
This package includes functions to optimize ordering of nodes in a dendrogram, without affecting the meaning of the dendrogram. A dendrogram can be sorted based on the average distance of subtrees, or based on the smallest distance value. These sorting methods improve readability and interpretability of tree structure, especially for tasks such as comparison of different distance measures or linkage types and identification of tight clusters and outliers. As a result, it also introduces more meaningful reordering for a coupled heatmap visualization.
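The claim that reordering leaves the meaning of the dendrogram intact can be checked with base R alone. The sketch below is only an illustration and does not call this package: it flips every branch with rev() from stats and confirms that the cophenetic distances between leaves are unchanged. The toy data and all variable names are assumptions made for the example.

```r
# Toy data (hypothetical): ten labelled points in two dimensions
set.seed(1234)
m <- matrix(rnorm(20), ncol = 2, dimnames = list(as.character(1:10), NULL))

hc <- hclust(dist(m))          # "complete" linkage by default
d  <- as.dendrogram(hc)
d_flipped <- rev(d)            # reverse the left/right order at every node

# Cophenetic distance = height at which two leaves are first merged
lab <- as.character(1:10)
coph_before <- as.matrix(cophenetic(d))[lab, lab]
coph_after  <- as.matrix(cophenetic(d_flipped))[lab, lab]

# The leaf order differs, but the tree encodes the same merges
leaf_order_changed <- !identical(labels(d), labels(d_flipped))
same_meaning <- isTRUE(all.equal(coph_before, coph_after))
```

Any ordering that only swaps the two sides of a merge passes the same check, which is why a sorting method is free to choose the most readable layout.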
Author(s)
Ryo Sakai ryo.sakai@esat.kuleuven.be
Recursive function to calculate the length of branches
Description
cal_length is code modified from plotNode() to calculate the length of the lines needed to draw a branch of a dendrogram. This function was developed to evaluate the use of ink for visualization.
Usage
cal_length(x1, x2, subtree, center, nodePar, edgePar, horiz = FALSE, sum)
Arguments
x1 | An x coordinate. |
x2 | Another x coordinate. |
subtree | A dendrogram object. |
center | A logical indicating whether the dendrogram is centered. |
nodePar | A node parameter. |
edgePar | An edge parameter. |
horiz | A logical indicating whether the dendrogram is drawn horizontally. |
sum | The accumulated sum of lengths. |
Value
output The length.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
total_length <- cal_total_length(as.dendrogram(hc))
Calculate the x coordinates given a branch of dendrogram
Description
cal_node_limit is code modified from plotNodeLimit() to calculate the x coordinates of branches given a branch of a dendrogram.
Usage
cal_node_limit(x1, x2, subtree, center)
Arguments
x1 | An x coordinate. |
x2 | Another x coordinate. |
subtree | A dendrogram object. |
center | A logical indicating whether the dendrogram is centered. |
Value
output A list of parameters.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
total <- cal_total_length(as.dendrogram(hc))
Calculate the total length of lines to draw the dendrogram
Description
cal_total_length is code modified from plot.dendrogram() to calculate the total length of the lines needed to draw a dendrogram. This function was developed to evaluate the use of ink for visualization.
Usage
cal_total_length(x)
Arguments
x | A dendrogram object. |
Value
output The total length.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
total_length <- cal_total_length(as.dendrogram(hc))
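To make the idea of measuring "ink" concrete, here is a minimal base R sketch of one way to sum the line segments of a dendrogram drawing. It is not the code of cal_total_length; the helper name ink_length and the layout rules (leaves at x = 1, 2, ..., n, each inner node midway between its children) are assumptions chosen for illustration.

```r
# Sum the lengths of the line segments in a top-down dendrogram drawing.
# Assumptions (not taken from the package): leaves sit at x = 1, 2, ..., n,
# and each inner node sits midway between its outermost children.
ink_length <- function(d) {
  next_x <- 0   # x position handed to the next leaf, left to right
  total  <- 0   # running sum of segment lengths

  walk <- function(node) {
    if (is.leaf(node)) {
      next_x <<- next_x + 1
      return(next_x)
    }
    child_x <- numeric(length(node))
    child_h <- numeric(length(node))
    for (i in seq_along(node)) {
      child_x[i] <- walk(node[[i]])
      child_h[i] <- attr(node[[i]], "height")
    }
    horizontal <- max(child_x) - min(child_x)          # bar joining the children
    vertical   <- sum(attr(node, "height") - child_h)  # drops down to each child
    total <<- total + horizontal + vertical
    mean(range(child_x))                               # x position of this node
  }

  walk(d)
  total
}

# Three points on a line at 0, 1 and 3
d3 <- as.dendrogram(hclust(dist(c(0, 1, 3))))
ink3 <- ink_length(d3)
```

For the three-point tree the first merge (height 1) contributes 1 + 2 = 3 and the root (height 3) contributes 1.5 + 5 = 6.5, so ink3 is 9.5. Under these layout rules the horizontal part depends on the leaf order, which is what makes such a measure usable for comparing orderings.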
Sorting and reordering dendrogram nodes
Description
dendsort sorts a dendrogram object, which is typically the result of hierarchical clustering (hclust). The subtrees in the resulting dendrogram are sorted at every merging point, either by the smallest distance in each subtree (type = "min", the default) or by the average distance of subtrees (type = "average"). The tighter cluster, in other words the cluster with the smaller distance value, is placed on the left side of the branch. When a leaf merges with a cluster, the leaf is placed on the right side.
Usage
dendsort(d, isReverse = FALSE, type = "min")
Arguments
d | A dendrogram or hclust object. |
isReverse | A logical indicating whether the order should be reversed. Defaults to FALSE. |
type | A character string indicating the type of sorting. Defaults to "min". |
Value
output A sorted dendrogram or hclust.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
#sort dendrogram
dd <- dendsort(as.dendrogram(hc))
hc_sorted <- as.hclust(dd)
#sort in reverse, you can also pass hclust object
plot(dendsort(hc, isReverse=TRUE))
#sort by average distance
plot(dendsort(hc, type="average"))
#plot the result
par(mfrow = c(1, 3), mai=c(0.8,0.8,2,0.8))
plot(x, y, col="gray", pch=19, cex=2)
text(x, y, labels=as.character(1:10), cex=0.9)
plot(hc,main="before sorting", xlab="", sub="")
plot(hc_sorted, main="after sorting", xlab="", sub="")
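The ordering rule described for this function can be sketched in base R. The code below is an illustration under stated assumptions, not the package implementation: the helper names smallest_height and sort_by_smallest are hypothetical, and "smallest distance" is taken to mean the lowest merge height inside a subtree.

```r
# Smallest merge height found inside a subtree; a single leaf has none,
# so it gets Inf and is always ordered last (to the right).
smallest_height <- function(node) {
  if (is.leaf(node)) return(Inf)
  child_min <- vapply(seq_along(node),
                      function(i) smallest_height(node[[i]]), numeric(1))
  min(attr(node, "height"), child_min)
}

# Recursively put the child with the smaller key on the left.
# Note: plotting attributes such as "midpoint" are not recomputed here.
sort_by_smallest <- function(node) {
  if (is.leaf(node)) return(node)
  kids <- lapply(seq_along(node), function(i) sort_by_smallest(node[[i]]))
  ord  <- order(vapply(kids, smallest_height, numeric(1)))
  for (i in seq_along(kids)) node[[i]] <- kids[[ord[i]]]
  node
}

# Hypothetical data: five labelled points on a line
pts <- c(p = 0, q = 0.2, r = 6, s = 20, t = 21)
d <- as.dendrogram(hclust(dist(pts)))
sorted_labels <- labels(sort_by_smallest(d))
```

With these points the tightest pair {p, q} ends up leftmost, the leaf r that joins it sits to its right, and the looser pair {s, t} follows.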
Sample data matrix from the integrated pathway analysis of gastric cancer from the Cancer Genome Atlas (TCGA) study
Description
A multivariate table obtained from the integrated pathway analysis of gastric cancer from The Cancer Genome Atlas (TCGA) study. In this data set, each column represents a pathway consisting of a set of genes, and each row represents a cohort of samples based on specific clinical or genetic features. For each pair of a pathway and a feature, a continuous value between 1 and -1 is assigned to score positive or negative association, respectively.
Usage
data(sample_tcga)
Format
A data frame with 215 rows and 117 variables
Details
We would like to thank Sheila Reynolds and Vesteinn Thorsson from the Institute for Systems Biology for sharing this sample data set.
Sorting and reordering dendrogram nodes by average distances
Description
sort_average sorts a dendrogram object based on the average distance of its subtrees, recursively. The tighter cluster, in other words the cluster with the smaller average distance, is placed on the left side of the branch. When a leaf merges with a cluster, the leaf is placed on the right side.
Usage
sort_average(d)
Arguments
d | A dendrogram object. |
Value
output A sorted dendrogram object.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
#sort dendrogram
dd <- dendsort(as.dendrogram(hc))
hc_sorted <- as.hclust(dd)
#sort in reverse, you can also pass hclust object
plot(dendsort(hc, isReverse=TRUE))
#sort by average distance
plot(dendsort(hc, type="average"))
#plot the result
par(mfrow = c(1, 3), mai=c(0.8,0.8,2,0.8))
plot(x, y, col="gray", pch=19, cex=2)
text(x, y, labels=as.character(1:10), cex=0.9)
plot(hc,main="before sorting", xlab="", sub="")
plot(hc_sorted, main="after sorting", xlab="", sub="")
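A base R sketch of sorting by an average may help to show how it can differ from sorting by the smallest value. This is not the package implementation: the helper names are hypothetical, and the "average distance" of a subtree is assumed here to be the mean of the merge heights inside it, which may not match the definition used by sort_average.

```r
# All merge heights inside a subtree (empty for a single leaf)
merge_heights <- function(node) {
  if (is.leaf(node)) return(numeric(0))
  below <- lapply(seq_along(node), function(i) merge_heights(node[[i]]))
  c(attr(node, "height"), unlist(below))
}

# Assumed key: mean merge height; leaves get Inf so they are ordered last
average_height <- function(node) {
  if (is.leaf(node)) Inf else mean(merge_heights(node))
}

# Recursively put the child with the smaller average on the left.
# Note: plotting attributes such as "midpoint" are not recomputed here.
sort_by_average <- function(node) {
  if (is.leaf(node)) return(node)
  kids <- lapply(seq_along(node), function(i) sort_by_average(node[[i]]))
  ord  <- order(vapply(kids, average_height, numeric(1)))
  for (i in seq_along(kids)) node[[i]] <- kids[[ord[i]]]
  node
}

# Hypothetical data: five labelled points on a line
pts <- c(p = 0, q = 0.2, r = 6, s = 20, t = 21)
d <- as.dendrogram(hclust(dist(pts)))
sorted_labels <- labels(sort_by_average(d))
```

In this data the subtree {p, q, r} contains the tightest pair but has the larger mean merge height (3.1 versus 1 for {s, t}), so {s, t} comes first under this key, whereas a rule based on the smallest height would put {p, q, r} first.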
Sorting and reordering dendrogram nodes by average distances in reverse
Description
sort_average_r sorts a dendrogram object in reverse based on the average distance of its subtrees, recursively. The tighter cluster, in other words the cluster with the smaller average distance, is placed on the right side of the branch. When a leaf merges with a cluster, the leaf is placed on the left side.
Usage
sort_average_r(d)
Arguments
d | A dendrogram object. |
Value
output A sorted dendrogram object.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
#sort dendrogram
dd <- dendsort(as.dendrogram(hc))
hc_sorted <- as.hclust(dd)
#sort in reverse, you can also pass hclust object
plot(dendsort(hc, isReverse=TRUE))
#sort by average distance
plot(dendsort(hc, type="average"))
#plot the result
par(mfrow = c(1, 3), mai=c(0.8,0.8,2,0.8))
plot(x, y, col="gray", pch=19, cex=2)
text(x, y, labels=as.character(1:10), cex=0.9)
plot(hc,main="before sorting", xlab="", sub="")
plot(hc_sorted, main="after sorting", xlab="", sub="")
Sorting and reordering dendrogram nodes by the smallest value
Description
sort_smallest sorts a dendrogram object based on the smallest distance in its subtrees, recursively. The cluster with the smallest distance is placed on the left side of the branch. When a leaf merges with a cluster, the leaf is placed on the right side.
Usage
sort_smallest(d)
Arguments
d | A dendrogram object. |
Value
output A sorted dendrogram object.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
#sort dendrogram
dd <- dendsort(as.dendrogram(hc))
hc_sorted <- as.hclust(dd)
#sort in reverse, you can also pass hclust object
plot(dendsort(hc, isReverse=TRUE))
#sort by average distance
plot(dendsort(hc, type="average"))
#plot the result
par(mfrow = c(1, 3), mai=c(0.8,0.8,2,0.8))
plot(x, y, col="gray", pch=19, cex=2)
text(x, y, labels=as.character(1:10), cex=0.9)
plot(hc,main="before sorting", xlab="", sub="")
plot(hc_sorted, main="after sorting", xlab="", sub="")
Sorting and reordering dendrogram nodes by the smallest value in reverse
Description
sort_smallest_r sorts a dendrogram object in reverse based on the smallest distance in its subtrees, recursively. The cluster with the smallest distance is placed on the right side of the branch. When a leaf merges with a cluster, the leaf is placed on the left side.
Usage
sort_smallest_r(d)
Arguments
d | A dendrogram object. |
Value
output A sorted dendrogram object.
Examples
#generate sample data
set.seed(1234); par(mar=c(0,0,0,0))
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate Euclidean distance
distxy <- dist(dataFrame)
#hierarchical clustering, "complete" linkage by default
hc <- hclust(distxy)
#sort dendrogram
dd <- dendsort(as.dendrogram(hc))
hc_sorted <- as.hclust(dd)
#sort in reverse, you can also pass hclust object
plot(dendsort(hc, isReverse=TRUE))
#sort by average distance
plot(dendsort(hc, type="average"))
#plot the result
par(mfrow = c(1, 3), mai=c(0.8,0.8,2,0.8))
plot(x, y, col="gray", pch=19, cex=2)
text(x, y, labels=as.character(1:10), cex=0.9)
plot(hc,main="before sorting", xlab="", sub="")
plot(hc_sorted, main="after sorting", xlab="", sub="")
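The reversed rule can be sketched in base R by ordering the children of each node in decreasing instead of increasing order of their key. As before, this is an illustration under assumptions rather than the package code: the helper names are hypothetical, and the key is taken to be the lowest merge height inside a subtree, with Inf for a single leaf.

```r
# Smallest merge height inside a subtree; Inf for a single leaf
smallest_height <- function(node) {
  if (is.leaf(node)) return(Inf)
  child_min <- vapply(seq_along(node),
                      function(i) smallest_height(node[[i]]), numeric(1))
  min(attr(node, "height"), child_min)
}

# Recursively put the child with the LARGER key on the left, so that
# leaves (Inf) go left and the tightest cluster goes right.
# Note: plotting attributes such as "midpoint" are not recomputed here.
sort_by_smallest_reversed <- function(node) {
  if (is.leaf(node)) return(node)
  kids <- lapply(seq_along(node),
                 function(i) sort_by_smallest_reversed(node[[i]]))
  keys <- vapply(kids, smallest_height, numeric(1))
  ord  <- order(keys, decreasing = TRUE)
  for (i in seq_along(kids)) node[[i]] <- kids[[ord[i]]]
  node
}

# Hypothetical data: five labelled points on a line
pts <- c(p = 0, q = 0.2, r = 6, s = 20, t = 21)
d <- as.dendrogram(hclust(dist(pts)))
reversed_labels <- labels(sort_by_smallest_reversed(d))
```

Here the looser pair {s, t} comes first, the leaf r is placed to the left of the cluster it joins, and the tightest pair {p, q} ends up on the far right.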