Type: | Package |
Title: | Fuzzy Clustering of Vegetation Data |
Version: | 2.0.3 |
Date: | 2025-05-19 |
Description: | A set of functions to: (1) perform fuzzy clustering of vegetation data (De Caceres et al, 2010) <doi:10.1111/j.1654-1103.2010.01211.x>; (2) to assess ecological community similarity on the basis of structure and composition (De Caceres et al, 2013) <doi:10.1111/2041-210X.12116>. |
Depends: | R (≥ 3.4.0) |
Imports: | vegan |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://emf-creaf.github.io/vegclust/ |
BugReports: | https://github.com/emf-creaf/vegclust/issues |
LazyLoad: | yes |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
VignetteBuilder: | utils, knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-05-19 16:23:29 UTC; miquel |
Author: | Miquel De Cáceres [aut, cre] |
Maintainer: | Miquel De Cáceres <miquelcaceres@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-05-19 17:50:02 UTC |
Fuzzy Clustering of Vegetation Data
Description
A set of functions to: (1) perform fuzzy clustering of vegetation data; (2) to assess ecological community similarity on the basis of structure and composition.
Author(s)
Maintainer: Miquel De Cáceres miquelcaceres@gmail.com [ORCID](https://orcid.org/0000-0001-7132-2080)
References
De Caceres et al, 2010 (doi:10.1111/j.1654-1103.2010.01211.x), De Caceres et al, 2013 (doi:10.1111/2041-210X.12116).
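See Also
Useful links: https://emf-creaf.github.io/vegclust/ (documentation), https://github.com/emf-creaf/vegclust/issues (bug reports)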
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering with 3 clusters. Perform 10 starts from random seeds
## and keep the best solution
wetland.nc <- vegclust(wetland.chord, mobileCenters=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)
Cumulative abundance profile (CAP)
Description
Functions to calculate cumulative abundance profiles (CAPs), to build matrices from them, and to summarize several profiles.
Usage
CAP(x, transform = NULL, verbose = FALSE)
CAP2matrix(CAP, type = "cumulative", classWeights = NULL)
CAPcenters(CAP, y = NULL)
CAPquantile(CAP, q = 0.5, y = NULL)
Arguments
x |
A stratified vegetation data set (see function stratifyvegdata). |
transform |
A function or the name of a function to be applied to each cumulative abundance value. |
verbose |
A logical flag to indicate extra output. |
CAP |
An object of class 'CAP'. |
type |
The type of information that the resulting matrix should contain. Either "cumulative", "total" or "volume" (see Details). |
classWeights |
A numerical vector containing the weight for each size class. If |
y |
A vector used as a factor to calculate average or quantile profiles for each level. Alternatively, an object of class 'vegclust' (see Details). |
q |
Probability value for which the quantile is desired. By default the median is given. |
Details
Function CAP replaces the abundance value of a size class by the sum of abundances in this and larger size classes (strata). Thus, upper classes never contain larger abundance values than lower classes, creating a cumulative abundance profile.
Function CAP2matrix takes an object of class 'CAP' and returns a data matrix, where values differ depending on parameter type: (1) type="cumulative" simply reshapes the 'CAP' object (a list) into a matrix with as many rows as plot records and where columns are organized in blocks (there are as many blocks as species and each block has as many columns as size classes); (2) type="total" returns a plot-by-species matrix where each value is the total abundance of the species in the plot (i.e. the CAP value at the ground level); (3) type="volume" returns a plot-by-species matrix where each value is the sum of CAP values across size classes (a measure of the "volume" occupied by the species in the plot). When provided, classWeights are used to weight size classes of the cumulative abundance profiles (for (1) and (3) only).
Function CAPcenters calculates the average abundance profile for a set of plot records. If y is a factor, it is used to specify groups of samples for which average profiles are to be calculated. If y is an object of class 'vegclust' then the function returns the CAP centroids or medoids corresponding to the clustering result.
Function CAPquantile calculates a quantile profile for a set of CAPs. The usage of y is the same as for CAPcenters.
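The cumulative sum described above can be sketched in base R. This is only an illustration, not the package's internal code: the helper name cap_sketch and the toy abundance values are assumptions, and strata are assumed to be ordered from the lowest (first column) to the highest (last column).

```r
# Toy species-by-stratum abundance matrix for one plot (hypothetical values).
# Columns are assumed to be ordered from the lowest to the highest stratum.
plot1 <- matrix(c(10, 5, 0,
                   2, 4, 6),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("sp1", "sp2"), c("s1", "s2", "s3")))

# Hypothetical helper: each cell becomes the sum of abundances in that
# stratum and in all higher strata (a reverse cumulative sum along rows).
cap_sketch <- function(m) {
  out <- t(apply(m, 1, function(v) rev(cumsum(rev(v)))))
  dimnames(out) <- dimnames(m)
  out
}

cap1 <- cap_sketch(plot1)
print(cap1)

# Per-species summaries analogous to type = "total" and type = "volume"
total  <- cap1[, 1]       # CAP value at the ground level
volume <- rowSums(cap1)   # sum of CAP values across size classes
```

In this toy plot, sp1 has all its abundance in the lower strata, so its profile drops to zero at the top, whereas sp2 keeps a non-zero value in the highest stratum.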
Value
Function CAP returns an object of class 'CAP', similar to objects of class 'stratifiedvegdata' but where abundance values of upper size classes have been added to those of lower size classes. Function CAP2matrix returns a matrix with plot records as rows (columns depend on the value of type). Functions CAPcenters and CAPquantile return an object of class 'CAP'.
Author(s)
Miquel De Cáceres, CREAF.
References
De Cáceres, M., Legendre, P. & He, F. (2013) Dissimilarity measurements and the size structure of ecological communities. Methods in Ecology and Evolution 4: 1167-1177.
De Cáceres, M., Coll, L., Martín-Alcón, S., González-Olabarria, J.R. (submitted) A general method for the classification of forest stands using structure and composition.
See Also
stratifyvegdata, plot.CAP, vegdiststruct
Examples
## Load stratified data
data(medreg)
## Check that 'medreg' has correct class
class(medreg)
## Look at the data for the third plot
medreg[[3]]
## Create cumulative abundance profile (CAP) for each plot
medreg.CAP <- CAP(medreg)
## Look at the profile of the third plot
medreg.CAP[[3]]
## Create matrix with species abundances
medreg.X <- CAP2matrix(medreg.CAP, type="total")
head(medreg.X)
## Generate and plot average profile
average.CAP <- CAPcenters(medreg.CAP)
plot(average.CAP)
## Generate and plot median profile
median.CAP <- CAPquantile(medreg.CAP, q = 0.5)
plot(median.CAP)
Cumulative abundance surface (CAS)
Description
Functions to calculate cumulative abundance surfaces (CASs), to build matrices from them, and to summarize several CASs.
Usage
CAS(x, transform = NULL, verbose = FALSE)
CAS2matrix(CAS, type = "cumulative", classWeights = NULL)
CAScenters(CAS, y = NULL)
CASmargin(CAS, margin = 1, verbose = FALSE)
CASquantile(CAS, q = 0.5, y = NULL)
Arguments
x |
An object of class 'doublestratifiedvegdata' (see function stratifyvegdata). |
transform |
A function or the name of a function to be applied to each cumulative abundance value. |
verbose |
A logical flag to indicate extra output. |
CAS |
An object of class 'CAS'. |
type |
The type of information that the resulting matrix should contain (either "cumulative" or "total"; see Details). |
classWeights |
A numerical matrix containing the weight for each combination of size classes. If |
y |
A vector used as a factor to calculate average or quantile surfaces for each level. Alternatively, an object of class 'vegclust' (see Details). |
margin |
Indicates whether marginalization should be done in primary (margin = 1) or secondary (margin = 2) size classes. |
q |
Probability value for which the quantile is desired. By default the median is given. |
Details
Function CAS replaces the abundance value of each combination of size classes by the sum of abundances in this and larger size classes. This creates a cumulative abundance surface (similar to a bivariate cumulative distribution function).
Function CASmargin takes an object of class 'CAS' and returns an object of class 'CAP' that corresponds to the marginal profile in either the primary or the secondary size classes.
Function CAS2matrix takes an object of class 'CAS' and returns a data matrix, where values differ depending on parameter type: (1) type="cumulative" simply reshapes the 'CAS' object (a list) into a matrix with as many rows as plot records and where columns are organized in blocks (there are as many blocks as species and each block has as many columns as combinations of size classes); (2) type="total" returns a plot-by-species matrix where each value is the total abundance of the species in the plot (i.e. the CAS value at the ground level). When provided, classWeights are used to weight size classes of the cumulative abundance surfaces (for (1) only).
Function CAScenters calculates the average abundance surface for a set of plot records. If y is a factor, it is used to specify groups of samples for which average surfaces are to be calculated. If y is an object of class 'vegclust' then the function returns the CAS centroids or medoids corresponding to the clustering result.
Function CASquantile calculates a quantile surface for a set of CASs. The usage of y is the same as for CAScenters.
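The two-way accumulation can be illustrated with a small base R sketch. The helper names (revcumsum, cas_sketch) and the toy values are assumptions made for illustration; the package's own data structures (lists of per-plot arrays) are not reproduced here. Both size classifications are assumed to be ordered from smallest to largest.

```r
# Toy abundance table for one species in one plot (hypothetical values):
# rows are primary size classes (e.g. height), columns are secondary
# size classes (e.g. diameter).
a <- matrix(c(1, 2,
              3, 4),
            nrow = 2, byrow = TRUE)

# Reverse cumulative sum along a vector
revcumsum <- function(v) rev(cumsum(rev(v)))

# Hypothetical helper: cell (i, j) becomes the sum of a[i2, j2]
# over all i2 >= i and j2 >= j.
cas_sketch <- function(m) {
  by_col <- matrix(apply(m, 2, revcumsum), nrow = nrow(m))  # accumulate over rows
  out <- t(apply(by_col, 1, revcumsum))                     # then over columns
  matrix(out, nrow = nrow(m))
}

cas <- cas_sketch(a)
print(cas)

# Marginal cumulative profiles can be read from the first column (primary
# classes) and the first row (secondary classes) of the surface.
margin1 <- cas[, 1]
margin2 <- cas[1, ]
```

The margins computed this way match the profiles obtained by first collapsing the table over the other size variable and then accumulating, which is the comparison suggested in the Examples below.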
Value
Function CAS returns an object of class 'CAS', similar to objects of class 'doublestratifiedvegdata' but where abundance values of upper size classes have been added to those of lower size classes. Function CAS2matrix returns a matrix with plot records as rows (columns depend on the value of type). Functions CAScenters and CASquantile return an object of class 'CAS'.
Author(s)
Miquel De Cáceres, CREAF.
References
De Cáceres, M., Legendre, P. & He, F. (2013) Dissimilarity measurements and the size structure of ecological communities. Methods in Ecology and Evolution 4: 1167-1177.
De Cáceres, M., Coll, L., Martín-Alcón, S., González-Olabarria, J.R. (submitted) A general method for the classification of forest stands using structure and composition.
See Also
stratifyvegdata, plot.CAS, vegdiststruct
Examples
## Load tree data
data(treedata)
## Define stratum thresholds (4 strata)
heights <- seq(0,4, by=0.5)
diameters <- seq(0,2, by=0.5)
## Stratify tree data using heights and diameters as structural variables
X <- stratifyvegdata(treedata, sizes1=heights, sizes2=diameters, plotColumn="plotID",
speciesColumn="species", size1Column="height", size2Column="diam",
counts=TRUE)
X[[2]]
## Build cumulative abundance surface
Y <- CAS(X)
Y[[2]]
## Extracts the first and second marginal (i.e. CAP on heights or diameters respectively)
Y.M1 <- CASmargin(Y, margin = 1)
Y.M1[[2]]
Y.M2 <- CASmargin(Y, margin = 2)
Y.M2[[2]]
## For comparison we calculate the same profiles using the stratifyvegdata and CAP functions
Y1 <- CAP(stratifyvegdata(treedata, sizes1=heights, plotColumn="plotID",
speciesColumn="species", size1Column="height",
counts=TRUE))
Y1[[2]]
Y2 <- CAP(stratifyvegdata(treedata, sizes1=diameters, plotColumn="plotID",
speciesColumn="species", size1Column="diam",
counts=TRUE))
Y2[[2]]
## Compare Y.M1[[2]] with Y1[[2]] and Y.M2[[2]] with Y2[[2]]
Turns into membership matrix
Description
Attempts to turn its cluster vector argument into a membership matrix
Usage
as.memb(cluster)
Arguments
cluster |
A vector indicating the hard membership of each object in |
Value
A matrix with as many rows as the length of cluster and as many columns as different cluster levels. NA values will have zero membership to all clusters.
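The behaviour described above (one column per cluster level, all-zero rows for NA) can be mimicked in base R as follows. The helper memb_sketch is hypothetical and is not the package's implementation.

```r
# Hypothetical helper: builds a 0/1 membership matrix from a cluster vector.
memb_sketch <- function(cluster) {
  f <- as.factor(cluster)
  # Compare each element with each level; comparisons involving NA yield NA
  m <- outer(as.character(f), levels(f), "==")
  m[is.na(m)] <- FALSE       # NA objects get zero membership everywhere
  m <- m * 1                 # logical -> numeric 0/1
  colnames(m) <- levels(f)
  m
}

m <- memb_sketch(factor(c(1, 2, NA)))
print(m)
```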
Author(s)
Miquel De Cáceres, CREAF.
Examples
as.memb(factor(c(1,2,NA)))
Turns into vegclust objects
Description
Attempts to turn its arguments into a vegclust object.
Usage
as.vegclust(x, y, method = "KM", m = 1, dnoise = NULL, eta = NULL)
Arguments
x |
A site-by-species data matrix (raw mode), or a site-by-site distance matrix (distance mode). |
y |
A vector indicating the cluster that each object in |
method |
A clustering model from which
|
m |
The fuzziness exponent to be used, relevant for all fuzzy models (FCM, FCMdd, NC, NCdd, PCM and PCMdd). |
dnoise |
The distance to the noise cluster, relevant for noise clustering models (NC, HNC, NCdd and HNCdd). |
eta |
A vector of reference distances, relevant for possibilistic models (PCM and PCMdd). |
Details
This function is used to generate vegclust objects which can then be used in vegclass to classify new data. If the input classification is hard (i.e. yes/no membership), cluster centers are calculated as multivariate means, and the method for assigning new data is assumed to be k-means ("KM"), i.e. plots will be assigned to the nearest cluster center. If community data is given as a site-by-species data matrix, the cluster centroids are added as mobileCenters in the vegclust object. Centroids will not be computed if community data is given as a site-by-site dissimilarity matrix. Moreover, the current implementation does not allow y to be a membership matrix when x is a distance matrix.
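The k-means assignment rule mentioned above (each new plot goes to the nearest cluster center) can be sketched in base R. This is a minimal illustration with made-up data and variable names; it does not reproduce the internals of as.vegclust or vegclass.

```r
# Toy training sites with a known hard classification (hypothetical values)
train <- matrix(c(0, 0,
                  0, 1,
                  5, 5,
                  6, 5), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("spA", "spB")))
cl <- c(1, 1, 2, 2)

# Cluster centers as multivariate means (group sums divided by group sizes)
centers <- rowsum(train, cl) / as.vector(table(cl))

# New sites to be classified
new <- matrix(c(0.5, 0.2,
                5.5, 6.0), ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("spA", "spB")))

# Squared Euclidean distance from every new site to every center
d2 <- sapply(seq_len(nrow(centers)), function(k) {
  rowSums(sweep(new, 2, centers[k, ])^2)
})

# Each new site is assigned to the cluster with the nearest center
assigned <- rownames(centers)[apply(d2, 1, which.min)]
print(assigned)
```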
Value
An object of class vegclust.
Author(s)
Miquel De Cáceres, CREAF.
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Splits wetland data into two matrices of 30x27 and 11x22
wetland.30 <- wetland.chord[1:30,]
wetland.30 <- wetland.30[,colSums(wetland.30)>0]
dim(wetland.30)
wetland.11 <- wetland.chord[31:41,]
wetland.11 <- wetland.11[,colSums(wetland.11)>0]
dim(wetland.11)
## Performs a K-means clustering of the data set with 30 sites
wetland.km <- kmeans(wetland.30, centers=3, nstart=10)
## Transforms the 'external' classification of 30 sites into a 'vegclust' object
wetland.30.vc <- as.vegclust(wetland.30, wetland.km$cluster)
## Assigns the second set of sites according to the (k-means) membership rule
## That is, sites are assigned to the cluster whose centroid is nearest.
wetland.11.km <- vegclass(wetland.30.vc, wetland.11)
## A similar 'vegclust' object is obtained when using the distance mode...
wetland.d.vc <- as.vegclust(dist(wetland.30), wetland.km$cluster)
## which can be also used to produce the assignment of the second set of objects
wetland.d.11 <- as.data.frame(as.matrix(dist(wetland.chord)))[31:41,1:30]
wetland.d.11.km <- vegclass(wetland.d.vc,wetland.d.11)
Cluster centers of a classification
Description
Function clustcentroid calculates the centroid (multivariate average) coordinates of a classification. Function clustmedoid determines the medoid (object whose average dissimilarity to all the other objects is minimal) for each cluster in the classification.
Usage
clustcentroid(x, y, m = 1)
clustmedoid(x, y, m = 1)
Arguments
x |
Community data, a site-by-species data frame. In function |
y |
It can be (a) A vector indicating the cluster that each object in |
m |
Fuzziness exponent, only effective when |
Details
In order to assign new plot record data into a predefined set of classes, one should use functions as.vegclust and vegclass instead.
Value
Function clustcentroid returns a group-by-species matrix containing species average abundance values (i.e. the coordinates of each cluster centroid). Function clustmedoid returns a vector of indices (medoids).
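For a hard classification, the two definitions can be written in a few lines of base R. The sketch below uses toy data and Euclidean distances as assumptions; it covers neither fuzzy memberships nor the fuzziness exponent m.

```r
# Toy site-by-species matrix and hard classification (hypothetical values)
x <- matrix(c( 0,  0,
               1,  0,
               4,  0,
              10, 10,
              11, 10), ncol = 2, byrow = TRUE)
cl <- c("a", "a", "a", "b", "b")

# Centroids: multivariate means per cluster
centroids <- rowsum(x, cl) / as.vector(table(cl))

# Medoids: within each cluster, the object whose summed (hence average)
# dissimilarity to the other members is minimal
d <- as.matrix(dist(x))
medoids <- sapply(split(seq_len(nrow(x)), cl), function(idx) {
  idx[which.min(rowSums(d[idx, idx, drop = FALSE]))]
})

print(centroids)
print(medoids)
```

Note that a centroid need not coincide with any observed site, whereas a medoid is always the index of one of them. In cluster "b" both members tie, and which.min simply keeps the first.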
Author(s)
Miquel De Cáceres, CREAF.
See Also
as.vegclust, vegclass, vegclust, kmeans
Examples
## Loads stats
library(stats)
## Loads data
data(wetland)
## This equals the chord transformation
## (see also function decostand in package 'vegan')
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Performs a K-means clustering
wetland.km <- kmeans(wetland.chord, centers=3, nstart=10)
## Gets the coordinates corresponding to the centroids of KM clusters
clustcentroid(wetland.chord, y=wetland.km$cluster)
## Gets the object indices corresponding to the medoids of KM clusters
clustmedoid(wetland.chord, y=wetland.km$cluster)
Constancy table of a classification
Description
Allows studying the constancy table (i.e. the frequency of species in each class) of a classification represented in the form of a membership data matrix.
Usage
clustconst(x, memb)
## S3 method for class 'clustconst'
summary(
object,
mode = "all",
name = NULL,
sort = TRUE,
minconst = 0.5,
digits = 3,
...
)
Arguments
x |
Community data, a site by species data frame. |
memb |
A site-by-group matrix indicating the (hard or fuzzy) membership of each object in |
object |
An object of class 'clustconst'. |
mode |
Use |
name |
A string with the name of a cluster (in |
sort |
A flag to indicate whether constancy table should be sorted in descending order. |
minconst |
A threshold used to limit the values shown. |
digits |
The number of digits for rounding. |
... |
Additional parameters for summary (actually not used). |
Details
The constancy value of a species in a vegetation unit is the relative frequency of occurrence of the species in plot records that belong to the unit. In the case of a fuzzy vegetation unit, the constancy value is the sum of memberships of sites that contain the species divided by the sum of memberships of all sites. Use the 'summary' function to obtain information about: (1) which species are more frequent in a given vegetation unit; (2) which vegetation units have higher frequencies of a given target species. Additionally, the 'summary' function can sort a constancy table if mode="all" and sort=TRUE are indicated.
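The fuzzy constancy definition given above translates directly into matrix algebra. The following base R sketch uses made-up abundances and memberships; it illustrates the formula only and is not the code of clustconst.

```r
# Toy site-by-species abundances (hypothetical values)
x <- matrix(c(3, 0,
              1, 2,
              0, 5), ncol = 2, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), c("spA", "spB")))

# Toy site-by-group fuzzy memberships (rows sum to one)
memb <- matrix(c(1.0, 0.0,
                 0.5, 0.5,
                 0.0, 1.0), ncol = 2, byrow = TRUE,
               dimnames = list(rownames(x), c("G1", "G2")))

# Presence/absence of each species in each site
pres <- (x > 0) * 1

# Sum of memberships of sites containing the species,
# divided by the sum of memberships of all sites (per group)
constancy <- sweep(t(pres) %*% memb, 2, colSums(memb), "/")
print(constancy)
```

With hard (0/1) memberships the same expression reduces to the relative frequency of the species among the plots of each group.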
Value
Function clustconst returns an object of type 'clustconst', in fact a data frame with the constancy value of each species (rows) on each cluster (column).
Author(s)
Miquel De Cáceres, CREAF
Examples
## Loads stats
library(stats)
## Loads data
data(wetland)
## This equals the chord transformation
## (see also function decostand in package 'vegan')
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Performs a K-means clustering
wetland.km <- kmeans(wetland.chord, centers=3, nstart=10)
## Gets constancy table of KM (i.e. hard) clusters
c <- clustconst(wetland.chord, memb=as.memb(wetland.km$cluster))
## Prints constancy values in sorted order and stores the result in d
d <- summary(c, mode="all")
## Prints the most frequent species in the first cluster
summary(c, mode="cluster", name=names(c)[1])
Cluster variance
Description
Computes the variation in community composition (i.e. beta diversity) found within the sites of a set of hard or fuzzy clusters.
Usage
clustvar(x, cluster = NULL, defuzzify = FALSE, ...)
Arguments
x |
Community data. Either a site-by-species matrix or a site-by-site matrix of compositional distances between sites (i.e., an object of class |
cluster |
A vector indicating the hard membership of each object in |
defuzzify |
A flag indicating whether fuzzy memberships should be defuzzified (see function |
... |
Additional parameters for function |
Details
This function can be used in two ways:
- If x is a data matrix (site by species or distances among sites) and cluster is NULL, the function assumes a single cluster of all points in x. When cluster is provided, the function computes cluster variance for each (hard) group, and this computation implies setting the centroid of the group. Cluster variance is defined as the average squared distance to the centroid.
- If x is an object of class vegclust or vegclass, the function uses the information contained there (distances to cluster centers, memberships and exponent of fuzziness) in order to compute cluster variances. Cluster centers do not need to be recomputed, and the distances to cluster centers are used directly. For centroid-based cluster models (KM, FCM, NC, HNC and PCM) the variance is defined as the average squared distance to the centroid. For medoid-based cluster models (KMdd, FCMdd, NCdd, HNCdd and PCMdd) the variance is defined as the average distance to the medoid. The variance for both mobile and fixed clusters is returned. Additionally, membership matrices may be defuzzified if defuzzify=TRUE.
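For the hard, centroid-based case, the definition "average squared distance to the centroid" can be sketched as below. The helper var_sketch is hypothetical, and it assumes a plain mean (division by the number of cluster members); whether clustvar uses that or another denominator is not confirmed by this text.

```r
# Toy site-by-species matrix and hard classification (hypothetical values)
x <- matrix(c(0, 0,
              2, 0,
              0, 4,
              4, 4), ncol = 2, byrow = TRUE)
cl <- c(1, 1, 2, 2)

# Hypothetical helper: per-cluster mean squared Euclidean distance to the centroid
var_sketch <- function(x, cl) {
  sapply(split(as.data.frame(x), cl), function(g) {
    g <- as.matrix(g)
    ctr <- colMeans(g)                   # centroid of the group
    mean(rowSums(sweep(g, 2, ctr)^2))    # average squared distance to it
  })
}

v <- var_sketch(x, cl)
print(v)
```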
Value
A double value (for one cluster) or a vector of values, one per each cluster.
Author(s)
Miquel De Cáceres, CREAF
Examples
## Loads data
data(wetland)
## This equals the chord transformation
## (see also function decostand in package 'vegan')
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering with 3 clusters. Perform 10 starts from random seeds
## and keep the best solution
wetland.nc <- vegclust(wetland.chord, mobileCenters=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)
## Gets cluster variance of fuzzy clusters
clustvar(wetland.nc)
## Gets cluster variance of fuzzy clusters after defuzzification
clustvar(wetland.nc, defuzzify=TRUE)
## Similar to the previous, this gets cluster variance of defuzzified (i.e. hard) clusters
clustvar(wetland.chord, cluster=defuzzify(wetland.nc)$cluster)
## Gets cluster variance of K-means (i.e. hard) clusters
clustvar(wetland.chord, cluster=kmeans(wetland.chord, centers=3, nstart=10)$cluster)
Concordance between two classifications
Description
Computes an index to compare two classifications.
Usage
concordance(x, y, method = "adjustedRand", ...)
Arguments
x , y |
Classification vector or membership matrix. Alternatively, objects of type |
method |
A string vector to indicate the desired indices (see details). |
... |
Additional parameters for function |
Details
Several indices for comparison of partitions are available:
- method="Rand": Rand (1971) index.
- method="adjustedRand": Rand index adjusted for random effects (Hubert & Arabie 1985).
- method="Wallace": Wallace (1983) index (for asymmetrical comparisons). This index (and its adjusted version) is useful to quantify how much x is nested into y.
- method="adjustedWallace": Wallace index adjusted for random effects (Pinto et al. 2008).
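For two hard classification vectors, the Rand, adjusted Rand and Wallace indices can be derived from the contingency table by counting pairs of objects. The helper rand_sketch below is a hypothetical base R illustration using the standard pair-counting formulas; it is not the code of concordance, and the direction used for the Wallace index (proportion of pairs joined in x that are also joined in y) is an assumption consistent with the "x nested into y" interpretation above.

```r
# Hypothetical helper computing pair-counting indices for two hard partitions
rand_sketch <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  npairs <- function(k) k * (k - 1) / 2    # number of pairs among k objects
  both  <- sum(npairs(tab))                # pairs joined in both x and y
  in_x  <- sum(npairs(rowSums(tab)))       # pairs joined in x
  in_y  <- sum(npairs(colSums(tab)))       # pairs joined in y
  total <- npairs(n)
  rand <- (total + 2 * both - in_x - in_y) / total
  expected <- in_x * in_y / total          # expected 'both' under random labelling
  adj <- (both - expected) / ((in_x + in_y) / 2 - expected)
  c(Rand = rand, adjustedRand = adj, Wallace = both / in_x)
}

# Identical partitions: all indices equal 1
r1 <- rand_sketch(c(1, 1, 2, 2), c(1, 1, 2, 2))
# x is nested into a single-cluster y: Wallace is 1, adjusted Rand is 0
r2 <- rand_sketch(c(1, 1, 2, 2), c(1, 1, 1, 1))
print(r1)
print(r2)
```

The second comparison shows why the asymmetric Wallace index is informative for nested classifications: it stays at 1 even though the symmetric indices are low.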
Value
A numeric vector with the desired index values.
Author(s)
Miquel De Cáceres, CREAF
References
Hubert, L. & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Pinto, F.R., Melo-Cristino, J. & Ramirez, M. (2008). A confidence interval for the Wallace coefficient of concordance and its application to microbial typing methods. PLoS ONE, 3.
Rand, W.M. (1971). Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66, 846–850.
Wallace, D.L. (1983). A method for comparing two hierarchical clusterings: Comment. Journal of the American Statistical Association, 78, 569–576.
Conform two community data tables
Description
Conforms two community data tables to have the same set of columns (species)
Usage
conformveg(x, y, fillvalue = 0, verbose = FALSE)
Arguments
x |
Community data, a site-by-species matrix. |
y |
Community data, a site-by-species matrix. |
fillvalue |
The value to be used to fill new entries in inflated matrices. |
verbose |
Displays information about the number of species shared between |
Details
This function adds to x as many new columns as columns of y that are not in x. The same is done for y, so the two tables have the same set of columns when they are returned.
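The column inflation can be sketched in base R as follows. The helper conform_sketch is hypothetical; in particular, the column order it produces (columns of x first, then those found only in y) is an assumption and may differ from what conformveg returns.

```r
# Hypothetical helper: give two site-by-species tables the same set of columns
conform_sketch <- function(x, y, fillvalue = 0) {
  all_sp <- union(colnames(x), colnames(y))
  inflate <- function(m) {
    out <- matrix(fillvalue, nrow = nrow(m), ncol = length(all_sp),
                  dimnames = list(rownames(m), all_sp))
    out[, colnames(m)] <- as.matrix(m)   # copy existing columns by name
    out
  }
  list(x = inflate(x), y = inflate(y))
}

# Toy tables sharing only species "B" (hypothetical values)
x <- matrix(1:4, nrow = 2, dimnames = list(c("s1", "s2"), c("A", "B")))
y <- matrix(5:8, nrow = 2, dimnames = list(c("s3", "s4"), c("B", "C")))
cf <- conform_sketch(x, y)
print(cf$x)
print(cf$y)
```

Once conformed, the two tables can be stacked with rbind because their columns match by name and position.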
Value
A list with the two inflated matrices x and y.
Author(s)
Miquel De Cáceres, CREAF.
Examples
## Loads data (41 sites and 33 species)
data(wetland)
dim(wetland)
## Splits wetland data into two matrices of 30x27 and 11x22
wetland.30 <- wetland[1:30,]
wetland.30 <- wetland.30[,colSums(wetland.30)>0]
dim(wetland.30)
wetland.11 <- wetland[31:41,]
wetland.11 <- wetland.11[,colSums(wetland.11)>0]
dim(wetland.11)
## Conforms the two matrices so they can eventually be merged
wetland.cf <- conformveg(wetland.30, wetland.11)
dim(wetland.cf$x)
dim(wetland.cf$y)
names(wetland.cf$x)==names(wetland.cf$y)
Cross-table of two fuzzy classifications
Description
Calculates a cross-tabulated matrix relating two fuzzy membership matrices
Usage
crossmemb(x, y, relativize = TRUE)
Arguments
x |
A site-by-group fuzzy membership matrix. Alternatively, an object of class 'vegclust' or 'vegclass'. |
y |
A site-by-group fuzzy membership matrix. Alternatively, an object of class 'vegclust' or 'vegclass'. |
relativize |
If |
Value
A cross-tabulated matrix comparing the two classifications. In general, each cell's value is the (fuzzy) number of objects that in x are assigned to the cluster corresponding to the row and in y are assigned to the cluster corresponding to the column. If relativize=TRUE then the values of each row are divided by the (fuzzy) size of the corresponding cluster in x.
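One common way to obtain such a fuzzy cross-table is to multiply the transposed membership matrix of x by that of y, so that each object contributes the product of its two memberships. The sketch below assumes this product rule and uses made-up memberships; it is an illustration and not necessarily how crossmemb is implemented.

```r
# Toy site-by-group fuzzy membership matrices (hypothetical values)
x <- matrix(c(1.0, 0.0,
              0.6, 0.4,
              0.0, 1.0), ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("X1", "X2")))
y <- matrix(c(0.9, 0.1,
              0.5, 0.5,
              0.2, 0.8), ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("Y1", "Y2")))

# Fuzzy counts: rows are clusters of x, columns are clusters of y
cross <- t(x) %*% y

# Relativized version: each row divided by the fuzzy size of the cluster in x
cross_rel <- sweep(cross, 1, colSums(x), "/")

print(cross)
print(cross_rel)
```

Because the rows of y sum to one in this example, each row of the relativized table also sums to one and can be read as the distribution of an x cluster across the clusters of y.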
Author(s)
Miquel De Cáceres, CREAF.
See Also
defuzzify, vegclust, decostand
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create clustering with 3 clusters. Perform 10 starts from random seeds
## and keep the best solution. Try both FCM and NC methods:
wetland.fcm <- vegclust(wetland.chord, mobileCenters=3, m = 1.2, method="FCM", nstart=10)
wetland.nc <- vegclust(wetland.chord, mobileCenters=3, m = 1.2, dnoise=0.75, method="NC",
nstart=10)
## Compare the results
crossmemb(wetland.fcm, wetland.nc, relativize=FALSE)
Defuzzifies a fuzzy partition
Description
Transforms a fuzzy classification into a crisp (hard) classification.
Usage
defuzzify(object, method = "max", alpha = 0.5, na.rm = FALSE)
Arguments
object |
A site-by-group fuzzy membership matrix. Alternatively, an object of class 'vegclust' or 'vegclass'. |
method |
Either |
alpha |
Threshold for the alpha-cut, bounded between 0 and 1. |
na.rm |
If |
Details
Alpha-cut means that memberships lower than alpha are transformed into 0 while memberships higher than alpha are transformed into 1. This means that if alpha values are low (i.e. close to 0), an object may belong to more than one group after defuzzification. Such objects generate a concatenation of cluster names in the output cluster vector and a row with sum greater than one in the memb matrix. Similarly, if alpha is high (i.e. close to 1), some objects may be left unclassified. These get NA in the cluster vector and a row of zeros in the memb matrix.
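The effect of the threshold can be seen with a small base R sketch. The helper cut_sketch is hypothetical: it returns a plain matrix and a character vector rather than the data frame and factor described under Value, and the "+" separator used to concatenate cluster names is an assumption.

```r
# Hypothetical helper implementing an alpha-cut on a membership matrix
cut_sketch <- function(memb, alpha = 0.5) {
  hard <- (memb > alpha) * 1
  cluster <- apply(hard, 1, function(r) {
    hit <- colnames(hard)[r == 1]
    if (length(hit) == 0) NA_character_ else paste(hit, collapse = "+")
  })
  list(memb = hard, cluster = cluster)
}

# Toy fuzzy memberships for two sites and three clusters (hypothetical values)
memb <- matrix(c(0.8, 0.1, 0.1,
                 0.4, 0.4, 0.2), ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("A", "B", "C")))

res_high <- cut_sketch(memb, alpha = 0.5)   # second site left unclassified
res_low  <- cut_sketch(memb, alpha = 0.3)   # second site in two clusters
print(res_high$cluster)
print(res_low$cluster)
```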
Value
A list with the following items:
- memb: A data frame with the hard membership partition.
- cluster: A vector (factor) with the name of the cluster for each object.
Author(s)
Miquel De Cáceres, CREAF.
References
Davé, R. N. and R. Krishnapuram (1997) Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 5, 270-293.
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering with 3 clusters. Perform 10 starts from random seeds
## and keep the best solution
wetland.nc <- vegclust(wetland.chord, mobileCenters=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)
## Defuzzification using an alpha-cut (alpha=0.5)
wetland.nc.df <- defuzzify(wetland.nc$memb, method="cut")
## Cluster vector, with 'N' for objects that are unclassified,
## and 'NA' for objects that are intermediate
print(wetland.nc.df$cluster)
## Hard membership matrix (site 22 does not get any cluster assigned)
print(wetland.nc.df$memb)
Heterogeneity-constrained random resampling (HCR)
Description
Returns a set of indices of the original data set that maximizes the mean and minimizes the variance of the distances between pairs of plot records.
Usage
hcr(d, nout, nsampl = 1000)
Arguments
d |
An object of class |
nout |
The number of sites (plot records) to be chosen among those available in |
nsampl |
The number of resampling trials to be compared. |
Details
Many subsets of the input data are selected randomly. These subsets are sorted by decreasing mean dissimilarity between pairs of plot records, and then sorted again by increasing variance of these dissimilarities. Ranks from both sortings are summed for each subset, and the subset with the lowest summed rank is considered as the most representative.
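The ranking procedure described above can be sketched in base R. The helper hcr_sketch and the random toy data are assumptions made for illustration, and details such as tie handling may differ from the package's hcr function. Subsets need at least three sites so that the variance of pairwise distances is defined.

```r
# Hypothetical helper following the procedure described in Details
hcr_sketch <- function(d, nout, nsampl = 1000) {
  dm <- as.matrix(d)
  n <- nrow(dm)
  # Draw many random subsets of nout sites
  subsets <- replicate(nsampl, sort(sample.int(n, nout)), simplify = FALSE)
  # Mean and variance of pairwise dissimilarities within each subset
  st <- t(sapply(subsets, function(idx) {
    v <- as.vector(as.dist(dm[idx, idx]))
    c(mean = mean(v), var = var(v))
  }))
  # Rank by decreasing mean and by increasing variance, then sum the ranks
  score <- rank(-st[, "mean"]) + rank(st[, "var"])
  subsets[[which.min(score)]]
}

set.seed(42)
pts <- matrix(runif(40), ncol = 2)   # 20 toy sites in two dimensions
sel <- hcr_sketch(dist(pts), nout = 5, nsampl = 200)
print(sel)
```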
Value
Returns a vector containing the indices of the selected sites (plot records) to be used for sub-setting the original table.
Author(s)
Miquel De Cáceres, CREAF
References
Lengyel, A., Chytry, M., Tichy, L. (2011) Heterogeneity-constrained random resampling of phytosociological databases. Journal of Vegetation Science 22: 175-183.
Examples
## Loads data (41 sites and 33 species)
data(wetland)
dim(wetland)
## Constructs the chord distance matrix
wetland.chord <- dist(as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/")))
## Performs HCR resampling. Returns indices of objects
sel <- hcr(wetland.chord, nout=20, nsampl=1000)
## Prints the names of the plot records
print(row.names(wetland)[sel])
## Subset the original distance matrix
sel.chord <- as.dist(as.matrix(wetland.chord)[sel,sel])
Clustering with several number of clusters
Description
Performs several runs of function 'vegclust' (or 'vegclustdist') on a community data matrix (or distance matrix) using different number of clusters
Usage
hier.vegclust(
x,
hclust,
cmin = 2,
cmax = 20,
min.size = NULL,
verbose = TRUE,
...
)
random.vegclust(
x,
cmin = 2,
cmax = 20,
nstart = 10,
min.size = NULL,
verbose = TRUE,
...
)
hier.vegclustdist(
x,
hclust,
cmin = 2,
cmax = 20,
min.size = NULL,
verbose = TRUE,
...
)
random.vegclustdist(
x,
cmin = 2,
cmax = 20,
nstart = 10,
min.size = NULL,
verbose = TRUE,
...
)
Arguments
x |
For |
hclust |
A hierarchical clustering represented in an object of type |
cmin |
Minimum number of mobile clusters. |
cmax |
Maximum number of mobile clusters. |
min.size |
If |
verbose |
Flag used to print which number of clusters is currently running. |
... |
Additional parameters for function |
nstart |
A number indicating how many random trials should be performed for each number of groups |
Details
Function hier.vegclust takes starting cluster configurations from cuts of a dendrogram given by object hclust. Function random.vegclust chooses random objects as cluster centroids and for each number of clusters performs nstart trials. Functions hier.vegclustdist and random.vegclustdist are analogous to hier.vegclust and random.vegclust but accept distance matrices as input.
Value
Returns an object of type 'mvegclust' (multiple vegclust), which contains a list vector with objects of type vegclust.
Author(s)
Miquel De Cáceres, CREAF
See Also
vegclust, vegclustdist, vegclass, defuzzify, hclust
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering from hierarchical clustering at different number of clusters
## ("ward" was renamed to "ward.D" in R 3.1.0; the old name still works but triggers a message)
wetland.hc <- hclust(dist(wetland.chord), method="ward.D")
wetland.nc1 <- hier.vegclust(wetland.chord, wetland.hc, cmin=2, cmax=5,
m = 1.2, dnoise=0.75, method="NC")
## Create noise clustering from random seeds at different levels
wetland.nc2 <- random.vegclust(wetland.chord, cmin=2, cmax=5, nstart=10,
m = 1.2, dnoise=0.75, method="NC")
Noise clustering with increasing number of clusters
Description
Performs several runs of function 'vegclust' on a community data matrix using an increasing number of clusters until some conditions are met.
Usage
incr.vegclust(
x,
method = "NC",
ini.fixed.centers = NULL,
min.size = 10,
max.var = NULL,
alpha = 0.5,
nstart = 100,
fix.previous = TRUE,
dnoise = 0.75,
m = 1,
...
)
Arguments
x |
Community data table. A site (rows) by species (columns) matrix or data frame. |
method |
A clustering model. Current accepted models are of the noise clustering family:
|
ini.fixed.centers |
The coordinates of initial fixed cluster centers. These will be used as |
min.size |
The minimum size (cardinality) of clusters. If any of the current k clusters does not have enough members the algorithm will stop and return the solution with k-1 clusters. |
max.var |
The maximum variance allowed for clusters (see function |
alpha |
Criterion to choose cluster seeds from the noise class. Specifically, an object is considered as cluster seed if the membership to the noise class is larger than |
nstart |
A number indicating how many random trials should be performed for number of groups. Each random trial uses the k-1 cluster centers plus the coordinates of the current cluster seed as initial solution for |
fix.previous |
Flag used to indicate that the cluster centers found when determining k-1 clusters are fixed when determining k clusters. |
dnoise |
The distance to the noise cluster. |
m |
The fuzziness exponent. |
... |
Additional parameters for function |
Details
Function hier.vegclust takes starting cluster configurations from cuts of a dendrogram given by object hclust. Function random.vegclust chooses random objects as cluster centroids and for each number of clusters performs nstart trials.
Value
Returns an object of class vegclust; or NULL if the initial cluster does not contain enough members.
Author(s)
Miquel De Cáceres, CREAF
References
Davé, R. N. and R. Krishnapuram (1997) Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 5, 270-293.
See Also
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Call incremental noise clustering
wetland.nc <- incr.vegclust(wetland.chord, method="NC", m = 1.2, dnoise=0.75,
min.size=5)
## Inspect cluster sizes
print(wetland.nc$size)
Distance between pairs of cluster centroids
Description
Calculates the distance between pairs of cluster centroids, given a distance matrix and a cluster vector.
Usage
interclustdist(x, cluster)
Arguments
x |
A site-by-site data matrix or an object of class |
cluster |
A vector indicating the hard membership of each object in |
Value
An object of class dist containing the distances between cluster centers.
Author(s)
Miquel De Cáceres, CREAF
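The idea behind interclustdist, obtaining distances between cluster centroids from a distance matrix alone, can be illustrated with base R. The sketch below relies on a standard identity for Euclidean distances and does not reproduce the package's code; the helper centroid_d2 and the toy data are hypothetical.

```r
## Toy data: 6 sites in two dimensions, assigned to two hard clusters
x <- matrix(c(0, 0,
              1, 0,
              0, 1,
              4, 4,
              5, 4,
              4, 5), ncol = 2, byrow = TRUE)
cluster <- c("A", "A", "A", "B", "B", "B")
D2 <- as.matrix(dist(x))^2   # squared Euclidean distances between sites

## Hypothetical helper (not part of the package): squared distance between
## the centroids of clusters a and b, using distances only.
## Mean cross-cluster squared distance minus half the mean within-cluster
## squared distance of each cluster (diagonal zeros included in the means).
centroid_d2 <- function(D2, cluster, a, b) {
  ia <- which(cluster == a)
  ib <- which(cluster == b)
  mean(D2[ia, ib]) - 0.5 * mean(D2[ia, ia]) - 0.5 * mean(D2[ib, ib])
}

d_AB <- sqrt(centroid_d2(D2, cluster, "A", "B"))

## Check against centroids computed from the raw coordinates
cA <- colMeans(x[cluster == "A", ])
cB <- colMeans(x[cluster == "B", ])
all.equal(d_AB, sqrt(sum((cA - cB)^2)))
```

The identity only holds exactly for Euclidean distances; for other dissimilarities the result should be read as a distance between centroids in a principal coordinates space, if one exists.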
Regeneration of Mediterranean vegetation data set
Description
A stratified vegetation data set containing several plot records laid out to assess vegetation recovery three years after a wildfire. Collected in 2012 by Miquel De Caceres and Albert Petit in Horta de Sant Joan (Catalonia, Spain).
Format
An object of class stratifiedvegdata with 96 elements (plots), each of them consisting of a data.frame where rows correspond to species groups and columns correspond to vegetation strata. Abundance values are percentage cover.
See Also
CAP, plot.CAP, stratifyvegdata
Draws cumulative abundance profiles
Description
Create plots used to inspect one or more cumulative abundance profiles.
Usage
## S3 method for class 'CAP'
plot(
x,
sizes = NULL,
species = NULL,
plots = NULL,
switchAxes = FALSE,
add = FALSE,
drawAxes = TRUE,
xlab = "",
ylab = "",
type = "s",
...
)
## S3 method for class 'stratifiedvegdata'
plot(
x,
sizes = NULL,
species = NULL,
plots = NULL,
switchAxes = FALSE,
add = FALSE,
drawAxes = TRUE,
xlab = "",
ylab = "",
type = "s",
...
)
Arguments
x |
An object returned from function |
sizes |
A vector containing the size values associated to each size class. If |
species |
A vector of strings indicating the species whose profile is to be drawn. If |
plots |
A vector indicating the plot records whose profile is to be drawn. Can be a |
switchAxes |
A flag indicating whether ordinate and abscissa axes should be interchanged. |
add |
A flag indicating whether profiles should be drawn on top of current drawing area. If |
drawAxes |
A flag indicating whether axes should be drawn. |
xlab |
String label for the x axis. |
ylab |
String label for the y axis. |
type |
Type of plot to be drawn ("p" for points, "l" for lines, "s" for steps, ...). |
... |
Additional plotting parameters. |
Author(s)
Miquel De Cáceres, CREAF
References
De Cáceres, M., Legendre, P. & He, F. (2013) Dissimilarity measurements and the size structure of ecological communities. Methods in Ecology and Evolution 4: 1167-1177.
See Also
Examples
## Load stratified data
data(medreg)
## Check that 'medreg' has correct class
class(medreg)
## Create cumulative abundance profile (CAP) for each plot
medreg.CAP <- CAP(medreg)
## Draw the stratified data and profile corresponding to the third plot
plot(medreg, plots="3")
plot(medreg.CAP, plots="3")
## Look at the plot and CAP of the same plot
medreg[["3"]]
medreg.CAP[["3"]]
Draws a cumulative abundance surface
Description
Create plots used to inspect one or more cumulative abundance surfaces.
Usage
## S3 method for class 'CAS'
plot(
x,
plot = NULL,
species = NULL,
sizes1 = NULL,
sizes2 = NULL,
palette = colorRampPalette(c("light blue", "light green", "white", "yellow", "orange",
"red")),
zlim = NULL,
...
)
Arguments
x |
An object of class |
plot |
A string indicating the plot record whose surface is to be drawn. |
species |
A string indicating the species whose profile is to be drawn. |
sizes1 |
A vector containing the size values associated to each primary size class. If |
sizes2 |
A vector containing the size values associated to each secondary size class. If |
palette |
Color palette for z values. |
zlim |
The limits for the z-axis. |
... |
Additional plotting parameters for function |
Author(s)
Miquel De Cáceres, CREAF
References
De Cáceres, M., Legendre, P. & He, F. (2013) Dissimilarity measurements and the size structure of ecological communities. Methods in Ecology and Evolution 4: 1167-1177.
See Also
Examples
## Create synthetic tree data
pl <- rep(1,100) # All trees in the same plot
sp <- ifelse(runif(100)>0.5,1,2) # Random species identity (species 1 or 2)
h <- rgamma(100,10,2) # Heights (m)
d <- rpois(100, lambda=h^2) # Diameters (cm)
m <- data.frame(plot=pl,species=sp, height=h,diameter=d)
m$ba <- pi*(m$diameter/200)^2
print(head(m))
## Size classes
heights <- seq(0,4, by=.25)^2 # Quadratic classes
diams <- seq(0,130, by=5) # Linear classes
## Stratify tree data
X <- stratifyvegdata(m, sizes1=heights, sizes2=diams,
plotColumn = "plot", speciesColumn = "species",
size1Column = "height", size2Column = "diameter",
abundanceColumn = "ba")
## Build cumulative abundance surface
Y <- CAS(X)
## Plot the surface of species '1' in plot '1' using heights and diameters
plot(Y, species=1, sizes1=heights[-1], xlab="height (m)",
ylab="diameter (cm)", sizes2=diams[-1], zlab="Basal area (m2)",
zlim = c(0,6), main="Species 1")
Plots clustering results
Description
Create plots used to study vegclust clustering results for an increasing number of clusters.
Usage
## S3 method for class 'mvegclust'
plot(
x,
type = "hnc",
excludeFixed = TRUE,
verbose = FALSE,
ylim = NULL,
xlab = NULL,
ylab = NULL,
maxvar = 0.6,
minsize = 20,
...
)
Arguments
x |
An object returned from functions |
type |
A string indicating the type of plot desired. Currently accepted values are "hnc", "hmemb", "var", "hcs" and "valid". |
excludeFixed |
A flag to indicate whether clusters with fixed centroids should be excluded from plots. |
verbose |
A flag to print extra information. |
ylim |
A vector with the limits for the y axis. |
xlab |
String label for the x axis. |
ylab |
String label for the y axis. |
maxvar |
Maximum cluster variance allowed for the |
minsize |
Minimum cluster size allowed for the |
... |
Additional plotting parameters. |
Value
Different information is returned depending on the type of plot chosen.
Author(s)
Miquel De Cáceres, CREAF
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering from hierarchical clustering at different number of clusters
wetland.hc <- hclust(dist(wetland.chord),method="ward.D")
wetland.nc <- hier.vegclust(wetland.chord, wetland.hc, cmin=2, cmax=5, m = 1.2,
dnoise=0.75, method="NC")
## Plot changes in the number of objects falling into the noise cluster
plot(wetland.nc, type="hnc")
## Plots the number of objects falling into "true" clusters,
## the number of objects considered intermediate,
## and the number of objects falling into the noise
plot(wetland.nc, type="hmemb")
## Plot minimum, maximum and average cluster size
plot(wetland.nc, type="hcs")
## Plot minimum, maximum and average cluster variance
plot(wetland.nc, type="var")
## Plot number of groups with high variance, low membership or both
plot(wetland.nc, type="valid")
Relates two clustering level results
Description
Analyzes how lower level clusters are assigned into upper level ones. The analysis is made for several numbers of clusters.
Usage
relate.levels(
lower,
upper,
defuzzify = FALSE,
excludeFixed = FALSE,
verbose = FALSE,
...
)
Arguments
lower |
A list of objects of type |
upper |
A list of objects of type |
defuzzify |
A logical flag used to indicate whether the result of calling |
excludeFixed |
A logical used to indicate whether fixed clusters should be excluded from the comparison of levels. |
verbose |
A flag used to ask for extra screen output. |
... |
Additional parameters for function |
Details
For each pair of vegclust (or vegclass) objects in upper and lower, the function calls function crossmemb and then, if asked, defuzzifies the resulting memberships (by calling function defuzzify) and several quantities are calculated (see 'value' section).
Value
A list with several data frames (see below). In each of them, the rows are items of upper and columns are items of lower. The names of rows and columns are the number of clusters of each vegclust (or vegclass) object.
nnoise: The number of low level clusters that are assigned to the Noise class (for upper objects using Noise clustering).
maxnoise: The maximum membership value of low level clusters to the Noise class (for upper objects using Noise clustering).
minmaxall: The minimum value (across upper level clusters) of the maximum membership value observed among the lower level clusters.
minallsize: The minimum value (across upper level clusters) of the sum of membership values across lower level clusters.
empty: The number of upper level clusters (mobile or fixed) that do not have any member among the lower level clusters.
Author(s)
Miquel De Cáceres, CREAF
See Also
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering from hierarchical clustering at different number of clusters
wetland.hc <- hclust(dist(wetland.chord),method="ward.D")
wetland.nc1 <- hier.vegclust(wetland.chord, wetland.hc, cmin=2, cmax=6, m = 1.2,
dnoise=0.75, method="NC")
wetland.nc2 <- hier.vegclust(wetland.chord, wetland.hc, cmin=2, cmax=4, m = 1.2,
dnoise=0.85, method="NC")
## Studies the assignment of levels
relate.levels(wetland.nc1, wetland.nc2, method="cut")
Reshapes community data from individual into stratified form
Description
Function stratifyvegdata reshapes individual abundance values into species abundance values per size class or combination of size classes. Function as.stratifiedvegdata checks if the input list has appropriate properties and turns it into an object of class 'stratifiedvegdata'.
Usage
stratifyvegdata(
x,
sizes1,
sizes2 = NULL,
treeSel = NULL,
spcodes = NULL,
plotColumn = "plot",
speciesColumn = "species",
abundanceColumn = "abundance",
size1Column = "size",
size2Column = NULL,
cumulative = FALSE,
counts = FALSE,
mergeSpecies = FALSE,
verbose = FALSE
)
as.stratifiedvegdata(X)
Arguments
x |
A data frame containing individual plant data. Individuals are in rows, while measurements are in columns. |
sizes1 |
A numerical vector containing the breaks for primary size classes in ascending order. |
sizes2 |
A numerical vector containing the breaks for secondary size classes in ascending order. |
treeSel |
A logical vector specifying which rows in |
spcodes |
A character vector indicating the codes of species to be used for stratification (species codes beyond those appearing in |
plotColumn |
The name of the column in |
speciesColumn |
The name of the column in |
abundanceColumn |
The name of the column in |
size1Column |
The name of the column in |
size2Column |
The name of the column in |
cumulative |
A flag to indicate that cumulative abundance profiles or surfaces are desired. |
counts |
A flag to indicate that the output should be individual counts instead of added abundance values. |
mergeSpecies |
A flag to indicate that species identity should be ignored. This leads to analyzing the structure of biomass disregarding species identity. |
verbose |
A logical flag to indicate extra output. |
X |
A list with as many elements as plot records. Each element should be of class 'matrix' or 'data.frame' with species in rows and strata in columns. Furthermore, the number of rows (species) and columns (strata) should be the same for all elements. |
Details
For each individual (row) in x, stratifyvegdata assigns it to the size class (stratum) containing its size. The corresponding abundance value (e.g. crown cover) of the individual is added to the abundance of the corresponding species at the size class (stratum). If sizes2 and size2Column are supplied, the function assigns each individual (row) in x to the combination of size classes (e.g. tree height and diameter).
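The assignment step described above can be sketched in base R. This is only an illustration of the idea, not the code used by stratifyvegdata: the toy data frame, its column names and the treatment of class boundaries (right-closed intervals) are assumptions.

```r
## Toy individual-based data (column names are illustrative)
trees <- data.frame(
  plot    = c("P1", "P1", "P1", "P2", "P2"),
  species = c("a",  "a",  "b",  "a",  "b"),
  height  = c(0.4,  1.7,  1.2,  0.9,  2.6),
  cover   = c(5,    20,   10,   8,    30)
)
breaks <- c(0, 1, 2, 3)   # breaks defining three height classes

## Assign each individual to the size class that contains its height.
## Assumption: classes are closed on the right, i.e. (0,1], (1,2], (2,3]
trees$stratum <- cut(trees$height, breaks = breaks,
                     labels = paste0("S", seq_len(length(breaks) - 1)))

## Add up abundance per species, stratum and plot; empty cells become 0
tab <- tapply(trees$cover,
              list(trees$species, trees$stratum, trees$plot), sum)
tab[is.na(tab)] <- 0

## One species-by-stratum matrix per plot record
plots <- dimnames(tab)[[3]]
strat <- lapply(plots, function(p) tab[, , p])
names(strat) <- plots
strat[["P1"]]
```

Setting counts=TRUE in stratifyvegdata would correspond to summing a column of ones instead of the cover values in this sketch.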
Value
Both functions return an object of class 'stratifiedvegdata', which is a list of matrices, one for each plot record. Each element (matrix) has as many rows as species and as many columns as size classes (i.e., as many as elements in vector sizes1). Columns are named starting with 'S' and continuing with the size class (stratum) number. If mergeSpecies=TRUE then all matrices have a single row (whose name is "all"). If sizes2 and size2Column are supplied to stratifyvegdata, the function returns an object of class 'doublestratifiedvegdata', which is a list of arrays, one for each plot record. Each element (array) has three dimensions corresponding to species, primary sizes (number of elements in vector sizes1) and secondary sizes (number of elements in vector sizes2). If cumulative=TRUE then the function returns cumulative abundances (see CAP and CAS).
Author(s)
Miquel De Cáceres, CREAF.
References
De Cáceres, M., Legendre, P. & He, F. (2013) Dissimilarity measurements and the size structure of ecological communities. Methods in Ecology and Evolution 4: 1167-1177.
See Also
Examples
## Load tree data
data(treedata)
## Inspect tree data
head(treedata)
## Define stratum thresholds (4 strata)
heights <- seq(0,4, by=0.5)
diameters <- seq(0,2, by=0.5)
## Stratify tree data using heights as structural variable
X <- stratifyvegdata(treedata, sizes1=heights, plotColumn="plotID",
speciesColumn="species", size1Column="height", counts=TRUE)
## Inspect the second plot record
X[[2]]
## Stratify tree data using heights as structural variable and cover as abundance
Y <- stratifyvegdata(treedata, sizes1=heights, plotColumn="plotID",
speciesColumn="species", size1Column="height",
abundanceColumn="cover")
Y[[2]]
## Stratify tree data using heights and diameters as structural variables
Z <- stratifyvegdata(treedata, sizes1=heights, sizes2=diameters, plotColumn="plotID",
speciesColumn="species", size1Column="height", size2Column="diam",
counts=TRUE)
Z[[2]]
Synthetic vegetation data set with tree data
Description
A synthetic data set used to illustrate the stratification of data originally collected on an individual basis (e.g. forest inventory).
Format
A data frame where each row corresponds to a different tree. Columns are plot code, species identity, tree height, tree diameter and cover value.
See Also
Classifies vegetation communities
Description
Classifies vegetation communities into a previous fuzzy or hard classification.
Usage
vegclass(y, x)
Arguments
y |
An object of class |
x |
Community data to be classified, in form of a site by species matrix (if the vegclust object is in |
Details
This function uses the classification model specified in y to classify the communities (rows) in x. When vegclust is in raw mode, the function first calls conformveg in order to cope with different sets of species. See the help of as.vegclust to see an example of vegclass with distance matrices.
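Coping with different sets of species amounts to placing both tables in the same species space. The base R sketch below shows one way to do this; align_species is a hypothetical helper written for illustration and is not the package's conformveg, whose arguments and return value are not described in this section.

```r
## Hypothetical helper (not the package's conformveg): give two
## site-by-species tables the same columns, filling absent species with zeros
align_species <- function(a, b) {
  all_sp <- union(colnames(a), colnames(b))
  pad <- function(m) {
    out <- matrix(0, nrow = nrow(m), ncol = length(all_sp),
                  dimnames = list(rownames(m), all_sp))
    out[, colnames(m)] <- as.matrix(m)
    out
  }
  list(a = pad(a), b = pad(b))
}

## Sites used to build the classification (species sp1 and sp2)
old <- data.frame(sp1 = c(1, 0), sp2 = c(2, 3), row.names = c("s1", "s2"))
## New site to be classified (species sp2 and sp3)
new <- data.frame(sp2 = 4, sp3 = 1, row.names = "s3")

conf <- align_species(old, new)
conf$a
conf$b
```

After alignment both tables have columns sp1, sp2 and sp3, so distances from the new site to the cluster centers can be computed in a common space.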
Value
Returns an object of type vegclass with the following items:
method: The clustering model used in y.
m: The fuzziness exponent in y.
dnoise: The distance to the noise cluster used for noise clustering (models NC, NCdd, HNC, HNCdd). This is set to NULL for other models.
eta: The reference distance vector used for possibilistic clustering (models PCM and PCMdd). This is set to NULL for other models.
memb: The fuzzy membership matrix.
dist2clusters: The matrix of object distances to cluster centers.
Author(s)
Miquel De Cáceres, CREAF.
References
Davé, R. N. and R. Krishnapuram (1997) Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 5, 270-293.
Bezdek, J. C. (1981) Pattern recognition with fuzzy objective functions. Plenum Press, New York.
Krishnapuram, R. and J. M. Keller. (1993) A possibilistic approach to clustering. IEEE transactions on fuzzy systems 1, 98-110.
De Cáceres, M., Font, X, Oliva, F. (2010) The management of numerical vegetation classifications with fuzzy clustering methods [Related software]. Journal of Vegetation Science 21 (6): 1138-1151.
See Also
vegclust, as.vegclust, kmeans, conformveg
Examples
## Loads data (41 sites and 33 species)
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Splits wetland data into two matrices of 30x27 and 11x22
wetland.30 <- wetland.chord[1:30,]
wetland.30 <- wetland.30[,colSums(wetland.30)>0]
dim(wetland.30)
wetland.11 <- wetland.chord[31:41,]
wetland.11 <- wetland.11[,colSums(wetland.11)>0]
dim(wetland.11)
## Create noise clustering with 3 clusters from the data set with 30 sites.
wetland.30.nc <- vegclust(wetland.30, mobileCenters=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)
## Cardinality of fuzzy clusters (i.e., the number of objects belonging to each cluster)
wetland.30.nc$size
## Classifies the second set of sites according to the clustering of the first set
wetland.11.nc <- vegclass(wetland.30.nc, wetland.11)
## Fuzzy membership matrix
wetland.11.nc$memb
## Obtains hard membership vector, with 'N' for objects that are unclassified
defuzzify(wetland.11.nc$memb)$cluster
Vegetation clustering methods
Description
Performs hard or fuzzy clustering of vegetation data.
Usage
vegclust(
x,
mobileCenters,
fixedCenters = NULL,
method = "NC",
m = 2,
dnoise = NULL,
eta = NULL,
alpha = 0.001,
iter.max = 100,
nstart = 1,
maxminJ = 10,
seeds = NULL,
verbose = FALSE
)
vegclustdist(
x,
mobileMemb,
fixedDistToCenters = NULL,
method = "NC",
m = 2,
dnoise = NULL,
eta = NULL,
alpha = 0.001,
iter.max = 100,
nstart = 1,
seeds = NULL,
verbose = FALSE
)
Arguments
x |
Community data. A site-by-species matrix or data frame (for |
mobileCenters |
A number, a vector of seeds, or coordinates for mobile clusters. |
fixedCenters |
A matrix or data frame with coordinates for fixed (non-mobile) clusters. |
method |
A clustering model. Currently accepted models are:
|
m |
The fuzziness exponent to be used (this is relevant for all models except for kmeans) |
dnoise |
The distance to the noise cluster, relevant for noise clustering (NC). |
eta |
A vector of reference distances, relevant for possibilistic C-means (PCM). |
alpha |
Threshold used to stop iterations. The maximum difference in the membership matrix of the current vs. the previous iteration will be compared to this value. |
iter.max |
The maximum number of iterations allowed. |
nstart |
If |
maxminJ |
When random starts are used, these will stop if at least |
seeds |
If |
verbose |
Flag to print extra output. |
mobileMemb |
A number, a vector of seeds, or starting memberships for mobile clusters. |
fixedDistToCenters |
A matrix or data frame with the distances to fixed cluster centers. |
Details
Functions vegclust and vegclustdist try to generalize the kmeans function in stats in three ways.
Firstly, they allow different clustering models. Clustering models can be divided into (a) fuzzy or hard; (b) centroid-based or medoid-based; (c) partitioning (KM and FCM family), noise clustering (NC family), and possibilistic clustering (PCM and PCMdd). The reader should refer to the original publications to better understand the differences between models.
Secondly, users can specify fixed clusters (that is, centroids that do not change their positions during iterations). Fixed clusters are intended to be used when some clusters were previously defined and new data has been collected. One may allow some of these new data points to form new clusters, while some other points will be assigned to the original clusters. In the case of models with cluster repulsion (such as KM, FCM or NC) the new (mobile) clusters are not allowed to 'push' the fixed ones. As a result, mobile clusters will occupy new regions of the reference space.
Thirdly, vegclustdist implements the distance-based equivalent of vegclust. The results of vegclust and vegclustdist will be the same (if seeds are equal) if the distance matrix is calculated using the Euclidean distance (see function dist). Otherwise, the equivalence holds by resorting to principal coordinates analysis.
Note that all data frames or matrices used as input of vegclust should be defined on the same space of species (see conformveg). Unlike kmeans, which allows different specific algorithms, here updates of prototypes (centroids or medoids) are done after all objects have been reassigned (Forgy 1965). In order to obtain hard cluster definitions, users can apply the function defuzzify to the vegclust object.
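To make the noise clustering (NC) model more concrete, the following base R sketch computes fuzzy memberships from object-to-center distances using the membership update as it is usually written for noise clustering (Davé & Krishnapuram 1997). The helper nc_memberships and the toy distances are hypothetical, and whether the package evaluates the update in exactly this form is an assumption.

```r
## Hypothetical helper (not the package's code): NC memberships from a matrix
## of object-to-center distances `d` (objects in rows, clusters in columns).
## Assumes all distances are strictly positive and m > 1.
nc_memberships <- function(d, dnoise, m) {
  w  <- (1 / d^2)^(1 / (m - 1))        # weight of each "true" cluster
  wn <- (1 / dnoise^2)^(1 / (m - 1))   # weight of the noise class
  ## Each row is divided by its total weight, so memberships sum to one
  cbind(w, N = wn) / (rowSums(w) + wn)
}

d <- matrix(c(0.2, 0.9,
              0.8, 0.3,
              1.5, 1.6), ncol = 2, byrow = TRUE,
            dimnames = list(c("o1", "o2", "o3"), c("M1", "M2")))
u <- nc_memberships(d, dnoise = 0.75, m = 1.2)
round(u, 3)
rowSums(u)
```

Object o3 lies farther than dnoise from both centers and therefore receives most of its membership in the noise class N, whereas o1 and o2 are assigned almost entirely to their closest cluster. In a batch (Forgy-style) scheme, all memberships are recomputed in this way before the centers are updated.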
Value
Returns an object of type vegclust with the following items:
mode: raw for function vegclust and dist for function vegclustdist.
method: The clustering model used.
m: The fuzziness exponent used (m=1 in case of kmeans).
dnoise: The distance to the noise cluster used for noise clustering (NC, HNC, NCdd or HNCdd). This is set to NULL for other models.
eta: The reference distance vector used for possibilistic clustering (PCM or PCMdd). This is set to NULL for other models.
memb: The fuzzy membership matrix. Columns starting with "M" indicate mobile clusters, whereas columns starting with "F" indicate fixed clusters.
mobileCenters: If vegclust is used, this contains a data frame with the coordinates of the mobile centers (centroids or medoids). If vegclustdist is used, it will contain the indices of mobile medoids for models KMdd, FCMdd, HNCdd, NCdd and PCMdd; or NULL otherwise.
fixedCenters: If vegclust is used, this contains a data frame with the coordinates of the fixed centers (centroids or medoids). If vegclustdist is used, it will contain the indices of fixed medoids for models KMdd, FCMdd, HNCdd, NCdd and PCMdd; or NULL otherwise.
dist2clusters: The matrix of object distances to cluster centers. Columns starting with "M" indicate mobile clusters, whereas columns starting with "F" indicate fixed clusters.
withinss: In the case of methods KM, FCM, NC, PCM and HNC it contains the within-cluster sum of squares for each cluster (squared distances to cluster center weighted by membership). In the case of methods KMdd, FCMdd, NCdd, HNCdd and PCMdd it contains the sum of distances to each cluster (weighted by membership).
size: The number of objects belonging to each cluster. In case of fuzzy clusters the sum of memberships is given.
functional: The objective function value (the minimum value attained after all iterations).
Author(s)
Miquel De Cáceres, CREAF
References
Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768-769.
MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam and J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.
Davé, R. N. and R. Krishnapuram (1997) Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 5, 270-293.
Bezdek, J. C. (1981) Pattern recognition with fuzzy objective functions. Plenum Press, New York.
Krishnapuram, R., Joshi, A., & Yi, L. (1999). A Fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering. IEEE International Fuzzy Systems (pp. 1281–1286).
Krishnapuram, R. and J. M. Keller. (1993) A possibilistic approach to clustering. IEEE transactions on fuzzy systems 1, 98-110.
De Cáceres, M., Font, X, Oliva, F. (2010) The management of numerical vegetation classifications with fuzzy clustering methods. Journal of Vegetation Science 21 (6): 1138-1151.
See Also
hier.vegclust, incr.vegclust, kmeans, vegclass, defuzzify, clustvar
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering with 3 clusters. Perform 10 starts from random seeds
## and keep the best solution
wetland.nc <- vegclust(wetland.chord, mobileCenters=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)
## Fuzzy membership matrix
wetland.nc$memb
## Cardinality of fuzzy clusters (i.e., the number of objects belonging to each cluster)
wetland.nc$size
## Obtains hard membership vector, with 'N' for objects that are unclassified
defuzzify(wetland.nc$memb)$cluster
## The same result is obtained with a matrix of chord distances
wetland.d <- dist(wetland.chord)
wetland.d.nc <- vegclustdist(wetland.d, mobileMemb=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)
Reshapes as kmeans object
Description
This function casts an object of class vegclust into an object of class kmeans.
Usage
vegclust2kmeans(x)
Arguments
x |
An object of class |
Value
An object of class kmeans
Author(s)
Miquel De Cáceres, CREAF
See Also
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create k-means (KM) clustering with 3 clusters. Perform 10 starts from random seeds
wetland.vc <- vegclust(wetland.chord, mobileCenters=3,
method="KM", nstart=10)
## Reshapes as kmeans object
wetland.km <- vegclust2kmeans(wetland.vc)
wetland.km
Fuzzy evaluation statistics
Description
Computes several evaluation statistics on the fuzzy clustering results on objects of class vegclust.
Usage
vegclustIndex(y)
Arguments
y |
An object of class |
Details
These statistics were conceived to be computed on fuzzy partitions, such as the ones coming from Fuzzy C-means (Bezdek 1981). Maximum values of PCN or minimum values of PEN can be used as criteria to choose the number of clusters.
Value
Returns a vector of four values: partition coefficient (PC), normalized partition coefficient (PCN), partition entropy (PE) and normalized partition entropy (PEN).
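As an illustration of what these four statistics measure, the sketch below computes them from a membership matrix with base R, using commonly cited definitions following Bezdek (1981). The helper fuzzy_indices is hypothetical, and the normalizations chosen for PCN and PEN are assumptions that may differ from those used by vegclustIndex.

```r
## Hypothetical helper (not the package's code).
## memb: membership matrix, objects in rows and clusters in columns,
## with each row summing to one.
fuzzy_indices <- function(memb) {
  n <- nrow(memb)
  k <- ncol(memb)
  pc  <- sum(memb^2) / n                 # partition coefficient
  pcn <- (pc - 1 / k) / (1 - 1 / k)      # rescaled so the range is [0, 1]
  u   <- memb[memb > 0]                  # skip zeros: 0 * log(0) taken as 0
  pe  <- -sum(u * log(u)) / n            # partition entropy
  pen <- pe / log(k)                     # rescaled so the range is [0, 1]
  c(PC = pc, PCN = pcn, PE = pe, PEN = pen)
}

crisp <- diag(3)                 # each object fully in one cluster
fuzzy <- matrix(1 / 3, 3, 3)     # each object equally shared by 3 clusters

fuzzy_indices(crisp)
fuzzy_indices(fuzzy)
```

Under these definitions a completely crisp partition gives PCN = 1 and PEN = 0, while a completely uninformative partition gives PCN = 0 and PEN = 1, which is why large PCN and small PEN are taken as signs of a well-defined structure.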
Author(s)
Miquel De Cáceres, CREAF.
References
Bezdek, J. C. (1981) Pattern recognition with fuzzy objective functions. Plenum Press, New York.
See Also
Examples
## Loads data
data(wetland)
## This equals the chord transformation
wetland.chord <- as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create fuzzy C-means (FCM) clustering with 2, 3 and 4 clusters. Perform 10 starts from random seeds
## and keep the best solutions
wetland.fcm2 <- vegclust(wetland.chord, mobileCenters=2, m = 1.2, method="FCM", nstart=10)
wetland.fcm3 <- vegclust(wetland.chord, mobileCenters=3, m = 1.2, method="FCM", nstart=10)
wetland.fcm4 <- vegclust(wetland.chord, mobileCenters=4, m = 1.2, method="FCM", nstart=10)
## Compute statistics. Both PCN and PEN indicate that three groups are more advisable
## than 2 or 4.
print(vegclustIndex(wetland.fcm2))
print(vegclustIndex(wetland.fcm3))
print(vegclustIndex(wetland.fcm4))
Structural and compositional dissimilarity
Description
Function to calculate the dissimilarity between ecological communities taking into account both their composition and the size of organisms.
Usage
vegdiststruct(
x,
y = NULL,
paired = FALSE,
type = "cumulative",
method = "bray",
transform = NULL,
classWeights = NULL
)
Arguments
x |
A stratified vegetation data set (see function |
y |
A second stratified vegetation data set (see function |
paired |
Only relevant when |
type |
Whether dissimilarities between pairs of sites should be calculated from differences in cumulative abundance ( |
method |
The dissimilarity coefficient to calculate (see details). |
transform |
A function or the name of a function to be applied to each cumulative abundance value. |
classWeights |
A numerical vector or a matrix containing the weight of each size class or combination of size classes (see functions |
Details
The six different coefficients available are described in De Caceres et al. (2013): (1) method="bray" for percentage difference (alias Bray-Curtis dissimilarity); (2) method="ruzicka" for Ruzicka index (a generalization of Jaccard); (3) method="kulczynski" for the Kulczynski dissimilarity index; (4) method="ochiai" for the complement of a quantitative generalization of Ochiai index of similarity; (5) method="canberra" for the Canberra index (Adkins form); (6) method="relman" for the relativized Manhattan coefficient (Whittaker's index of association). Currently, the function also supports (7) method="manhattan" for the city block metric.
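The effect of comparing cumulative profiles rather than total abundances can be seen in a small base R sketch using the percentage difference. The toy plots and the helpers cap and bray are hypothetical; in particular, accumulating abundance from the tallest stratum downwards is an assumption about how profiles are built, and the sketch does not reproduce the internals of vegdiststruct.

```r
## Two toy plots: species in rows, strata (lowest to tallest) in columns
p1 <- matrix(c(10, 5, 0,
                0, 2, 8), nrow = 2, byrow = TRUE,
             dimnames = list(c("a", "b"), c("S1", "S2", "S3")))
p2 <- matrix(c( 0, 5, 10,
                8, 2,  0), nrow = 2, byrow = TRUE,
             dimnames = dimnames(p1))

## Assumption: a profile accumulates abundance from the tallest stratum down
cap <- function(x) {
  out <- t(apply(x, 1, function(r) rev(cumsum(rev(r)))))
  dimnames(out) <- dimnames(x)
  out
}

## Percentage difference (Bray-Curtis): sum |a - b| / sum (a + b)
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

d_total   <- bray(rowSums(p1), rowSums(p2))  # species totals only
d_profile <- bray(cap(p1), cap(p2))          # cumulative profiles
c(total = d_total, profile = d_profile)
```

Both plots hold the same total abundance of each species, so the comparison of totals yields zero, while the comparison of profiles detects that the species occupy different strata.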
Value
Returns an object of class 'dist'.
References
De Cáceres, M., Legendre, P. & He, F. (2013) Dissimilarity measurements and the size structure of ecological communities. Methods in Ecology and Evolution 4: 1167-1177.
See Also
Examples
## Load stratified data
data(medreg)
## Check that 'medreg' has correct class
class(medreg)
## Create cumulative abundance profile (CAP) for each plot
medreg.CAP <- CAP(medreg)
## Create dissimilarity (percentage difference) matrix using profiles
medreg.D <- vegdiststruct(medreg, method="bray")
## Create dissimilarity (percentage difference) matrix using abundances
medreg.D2 <- vegdiststruct(medreg, method="bray", type="total")
## Calculate correlation
cor(as.vector(medreg.D), as.vector(medreg.D2))
Wetland vegetation data set
Description
Wetland vegetation data set
Vegetation of the Adelaide river alluvial plain (Australia). This data set was published by Bowman & Wilson (1986) and used in Dale (1988) to compare fuzzy classification approaches.
Format
A data frame with 41 sites (rows) and 33 species (columns). Abundance values are represented in abundance classes.
Source
Bowman, D. M. J. S. and B. A. Wilson. 1986. Wetland vegetation pattern on the Adelaide River flood plain, Northern Territory, Australia. Proceedings of the Royal Society of Queensland 97:69-77.
References
Dale, M. B. 1988. Some fuzzy approaches to phytosociology. Ideals and instances. Folia geobotanica et phytotaxonomica 23:239-274.
Examples
data(wetland)