Type: Package
Title: Simultaneous Selection by Trait and WAASB Index
Version: 0.3
Date: 2025-05-22.
Maintainer: Ali Arminian <abeyran@gmail.com>
Description: This tool proposes a new ranking algorithm that utilizes a "Y*WAASB" biplot generated by the 'metan'. The aim of the current package is to effectively distinguish the top-ranked genotypes in MET (Multi-Environmental Trials). For a detailed explanation of the process of obtaining "WAASB", "WAASBY" indices, and a "Y*WAASB" biplot, refer to the manual included in this package as well as the study by Olivoto & Lúcio (2020) <doi:10.1111/2041-210X.13384>. In this context, "WAASB" refers to the "Weighted Average of Absolute Scores" provided by Olivoto et al. (2019) <doi:10.2134/agronj2019.03.0220>, which quantifies the stability of genotypes across different environments using linear mixed-effect models. To run the package, you need to extract the "WAASB" and "WAASBY" coefficients using the 'metan' and apply them. This tool utilizes PCA (Principal Component Analysis) and differentiates the entries which may be genotypes, hybrids, varieties, etc using "WAASB", "WAASBY", and a combination of the specified trait and WAASB index.
License: GPL-3
URL: https://github.com/abeyran/rYWAASB
BugReports: https://github.com/abeyran/rYWAASB/issues
Depends: R (≥ 3.5)
Imports: ggplot2, graphics, factoextra, FactoMineR, lifecycle, mathjaxr
Suggests: car, metan, devtools, usethis, testthat, knitr, rmarkdown, roxygen2, spelling
VignetteBuilder: knitr
RdMacros: mathjaxr
Encoding: UTF-8
Copyright: Ali Arminian
RoxygenNote: 7.3.2
Language: en-US
LazyData: true
LazyLoad: true
NeedsCompilation: no
Author: Ali Arminian ORCID iD [aut, cre, cph]
Packaged: 2025-05-23 10:40:14 UTC; Administrator
Repository: CRAN
Date/Publication: 2025-05-23 11:02:02 UTC

Simultaneous Selection by Trait and WAASB Index

Description

rYWAASB performs a new ranking algorithm based on a "Y*WAASB" biplot generated by the 'metan' package which is used in MET(Multi-Environmental Trials). This package effectively distinguishes the top-ranked genotypes based on a given trait (e.g. grain yield or any other trait in agricultural experiments) and the "WAASB" index in the Genotype-by-environment interaction effect studies. Note: Fortunately, this package can impute missing observations and computes them, eliminating any concerns about their presence in the data set. A complete guide may be found at: https://github.com/abeyran/rYWAASB/issues

Author(s)

Ali Arminian abeyran@gmail.com

See Also

Useful links:


The PCA biplot with loadings

Description

[Stable]

Usage

PCA_biplot(datap, lowt = FALSE)

Arguments

datap

The data set

lowt

A parameter indicating whether lower rates of the trait is preferred or not. For grain yield e.g. Upper values is preferred. For plant height lower values e.g. is preferred.

Details

PCA is a machine learning method and dimension reduction technique. It is utilized to simplify large data sets by extracting a smaller set that preserves significant patterns and trends(1). According to Johnson and Wichern (2007), a PCA explains the var-covar structure of a set of variables \(X_1, X_2, ..., X_p\) with a less linear combinations of such variables. Moreover the common objective of PCA is 1) data reduction and 2) interpretation.

Biplot and PCA: The biplot is a method used to visually represent both the rows and columns of a data table. It involves approximating the table using a two-dimensional matrix product, with the aim of creating a plane that represents the rows and columns. The techniques used in a biplot typically involve an eigen decomposition, similar to the one used in PCA. It is common for the biplot to be conducted using mean-centered and scaled data(2). For scaling variables, the data can be transformed as follow: \[z = \frac{x-\bar{x}}{s(x)}\] where 's(x)' denotes the sample standard deviation of 'x' parameter, calculated as: \[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\] Algebra of PCA: As Johnson and Wichern (2007) stated(3), if the random vector \(\mathbf{X'} = {X_1, X_2,...,X_p }\) have the covariance matrix \(\sum\) with eigenvalues \(\lambda_1 \ge \lambda_2 \ge ... \ge \lambda_p \ge 0\).

Regarding the linear combinations: \[Y_1 = a'_1X = a_{11}X_1 + a_{12}X_2 + ... + a_{1P}X_p \] \[Y_2 = a'_2X = a_{21}X_1 + a_{22}X_2 + ... + a_{2p}X_p\] \[...\] \[Y_p = a'_pX = a_{p1}X_1 + a_{p2}X_2 + ... + a_{pp}X_p\]

where \(Var(Y_i) = \mathbf{a'_i\sum{a_i}}\) , i = 1, 2, ..., p \(Cov(Y_i, Y_k) = \mathbf{a'_i\sum{a_k}}\) , i, k = 1, 2, ..., p

The principal components refer to the uncorrelated linear combinations \(Y_1, Y_2, ..., Y_p\) which aim to have the largest possible variances.

For the random vector \(\mathbf{X'}=\left [ X_1, X_2, ..., X_p \right ]\), if \(\mathbf{\sum}\) be the associated covariance matrix, then \(\mathbf{\sum}\) have the eigenvalue-eigenvector pairs \((\lambda_1, e_1), (\lambda_2, e_2), ..., (\lambda_p, e_p)\), and as said \(\lambda_1 \ge \lambda_2 \ge ... \ge \lambda_p \ge 0\).

Then the \(\it{i}\)th principal component is as follows: \[Y_i = \mathbf{e'_iX} = e_{i1}X_1 + e_{i2}X_2 + ... + e_{ip}X_p, i = 1, 2, ..., p\], where \(Var(Y_i) =\mathbf(e'_i\sum{e_i}) = \lambda_i, i = 1, 2, ..., p\) \(Cov(Y_i, Y_k) = \mathbf{e'_i\sum e_i = 0, i \not\equiv k}\), and: \(\sigma_{11} + \sigma_{22} + ... + \sigma_{pp} = \sum_{i=1}^p{Var(X_i)} = \lambda_1 + \lambda_2 + ... + \lambda_p = \sum_{i=1}^p{Var(Y_i)}\).

Interestingly, Total population variance = \(\sigma_{11} + \sigma_{22} + ... + \sigma_{pp} = \lambda_1 + \lambda_2 + ... + \lambda_{p}\).

Another issues that are significant in PCA analysis are:

  1. The proportion of total variance due to (explained by) the \(\mathit{k}\)th principal component: \[\frac{\lambda_k}{(\lambda_1 + \lambda_2 + ... + \lambda_p)}, k=1, 2, ..., p\]

  2. The correlation coefficients between the components \(Y_i\) and the variables \(X_k\) is as follows: \(\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}\), i,k = 1, 2, ..., p

Please note that PCA can be performed on Covariance or ⁠correlation matrices⁠. And before PCA the data should be centered, generally.

Value

Returns a a list of dataframes

Author(s)

Ali Arminian abeyran@gmail.com

References

(1) https://builtin.com

(2) https://pca4ds.github.io/biplot-and-pca.html.

(3) Johnson, R.A. and Wichern, D.W. 2007. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 773 p.

Examples

# Case 1: for maize dataset, grain yield

data(maize)
PCA_biplot(maize) # or: PCA_biplot(maize, lowt = FALSE)

# Case 2: for days to maturity (dm) trait of chickpea

data(dm)
PCA_biplot(dm, lowt = TRUE)


The first barplot of the ranks of genotypes

Description

[Stable]

Usage

bar_plot1(datap, lowt = FALSE)

Arguments

datap

The data set

lowt

A parameter indicating whether lower rates of the trait is preferred or not. For grain yield e.g. Upper values is preferred. For plant height lower values e.g. is preferred.

Value

Returns an object of class ⁠gg, ggmatrix⁠.

Author(s)

Ali Arminian abeyran@gmail.com

References

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Examples

# Case 1:  Higher trait values are preferred. For instance grain yield
# in cereals is a trait which its higher values are preferred and ranking
# is performed from the higher to lower values i.e. 1st, 2nd, 3rd etc
# in maize dataset.

data(maize)
bar_plot1(maize) # or: bar_plot1(maize, lowt = FALSE)

# Case 2:  In this case, the lower values of the given trait are preferred.
# For instance days to maturity (dm) and plant height are traits where their
# lower values are preferred.

data(dm)
bar_plot1(dm, lowt = TRUE)


The second barplot of the ranks of genotypes

Description

[Stable]

Usage

bar_plot2(datap, lowt = FALSE, verbose = FALSE)

Arguments

datap

The data set

lowt

A parameter indicating whether lower rates of the trait is preferred or not. For grain yield e.g. Upper values is preferred. For plant height lower values e.g. is preferred.

verbose

If verbose = TRUE then some results are printed

Value

Returns an object of class ⁠gg, graphics⁠

Author(s)

Ali Arminian abeyran@gmail.com

Examples

# Case 1:  Higher trait values are preferred. For instance grain yield
# in cereals is a trait which its higher values are preferred and ranking
# is performed from the higher to lower values i.e. 1st, 2nd, 3rd etc
# in maize dataset.

data(maize)
bar_plot2(maize) # or: bar_plot2(maize, lowt = FALSE, verbose = FALSE)

# Case 2:  In this case, the lower values of the given trait are preferred.
# For instance days to maturity (dm) and plant height are traits where their
# lower values are preferred.

data(dm)
bar_plot2(dm, lowt = TRUE, verbose = TRUE)


Dataset3: a tibble containing ENV, GEN, REP factors and GY(grain yield) and HM agronomic traits from the metan package.

Description

Dataset3: a tibble containing ENV, GEN, REP factors and GY(grain yield) and HM agronomic traits from the metan package.

Usage

data(data_ge)

Format

A data.frame with 420 rows in 5 columns.

ENV

a character vector

GEN

a character vector

REP

a character vector

GY

a numeric vector

HM

a numeric vector

References

Olivoto, T., & Lúcio, A.D.C.2020. metan: An R package for multi‐environment trial analysis. Methods in Ecology and Evolution , 11(6), 783-789.

Examples


library(rYWAASB)
data(data_ge)


Dataset2: a tibble containing GEN, Trait (days to maturity), WAASB and WAASBY indexes.

Description

Dataset2: a tibble containing GEN, Trait (days to maturity), WAASB and WAASBY indexes.

Usage

data(dm)

Format

A data.frame with 13 observations (chickpea genotypes) within rows and columns including the trait i.e. days to maturity (named as Y), WAASB and WAASBY indexes values.

GEN

a character vector saved as factor

Y

a numeric vector

WAASB

a numeric vector

WAASBY

a numeric vector

Examples


library(rYWAASB)
data(dm)
ranki(dm)
bar_plot1(dm)
bar_plot2(dm)
PCA_biplot(dm)


Dataset1: a tibble containing GEN, Trait(here grain yield), WAASB and WAASBY indexes.

Description

Dataset1: a tibble containing GEN, Trait(here grain yield), WAASB and WAASBY indexes.

Usage

data(maize)

Format

A data.frame with 20 observations (genotypes) within rows and columns including the trait (named as Y), WAASB and WAASBY indexes values.

GEN

a character vector saved as factor

Y

a numeric vector

WAASB

a numeric vector

WAASBY

a numeric vector

The input format of table of data(NA free), here maize data, should be as follows:

GEN Y WAASB WAASBY
Dracma 262.22 0.81 81.6
DKC6630 284.04 2.20 88.5
NS770 243.48 0.33 71.4
...

Examples


library(rYWAASB)
data(maize)
ranki(maize)
bar_plot1(maize)
bar_plot2(maize)
PCA_biplot(maize)


Data read and estimate the cluster number

Description

[Experimental]

nbclust() reads and prepares the data, and determine the optimum number of clusters using Average Silhouette Method by factoextra package. The average silhouette approach assesses the quality of clustering by evaluating how well each object fits within its cluster. A high average silhouette width signifies effective clustering. This method calculates the average silhouette for different values of k, and the optimal number of clusters (k) is the one that maximizes the average silhouette across a range of potential k values.

Usage

nbclust(datap, verbose = FALSE)

Arguments

datap

The data set

verbose

If verbose = TRUE then some results are

Details

The silhouette coefficient (SC) refers to a criterion to decide number of clusters.It is defined as follows. Though there are numerous methods determining number of clusters such as ⁠the gap statistic⁠ etc.

\[SC = maxK{\bar{S}K}\]

.

In other words, for each observation \(i\), the ⁠silhouette width⁠ \(s(i)\) is defined as follows: Put \(a(i)\) = average dissimilarity between i and all other points of the cluster to which i belongs (if i is the only observation in its cluster, \(s(i):= 0\) without further calculations). For all other clusters C, put \(d(i, C)\) = average dissimilarity of i to all observations of C. The smallest of these \(d(i, C)\) is \(b(i)=min\it(C) d(i, C)\), and can be seen as the dissimilarity between i and its “neighbor” cluster, i.e., the nearest one to which it does not belong. Finally, \[s(i)=\frac{b(i)-a(i)}{max(a(i), b(i))}\]

Value

Returns a data frame

Author(s)

Ali Arminian abeyran@gmail.com

References

Lletı, R., Ortiz, M.C., Sarabia, L.A., Sánchez, M.S. 2004. Selecting variables for k-means cluster analysis by using a genetic algorithm that optimizes the silhouettes, Analytica Chimica Acta, 515(1): 87-100.

Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.

https://uc-r.github.io/

Examples


library(factoextra)
data(maize)
maize <- as.data.frame(maize)
row.names(maize) <- maize[, 1]
maize[, 1] = NULL
GEN <- row.names(maize)
maize <- scale(maize)
nbclust(maize, verbose = FALSE)

# Performing bootstrap or jackknife clustering
# by shipunov package. The examples should be run in the
# console manually due to problems occurs in the ORPHANED
# package `shipunov`.
#
# library(shipunov)
# 1- Bootstrap clustering:
# data.jb <- Jclust(maize,
#   method.d = "euclidean",
#    method.c = "average", n.cl = 2,
#     bootstrap = TRUE)
#
# plot.Jclust(data.jb, top=TRUE, lab.pos=1,
#  lab.offset=1, lab.col=2, lab.font=2)
# Fence(data.jb$hclust, GEN)
#
# data.jb <- Jclust(maize,
#  method.d = "euclidean",
#    method.c = "ward.D", n.cl = 2,
#     bootstrap = TRUE)
#
# plot.Jclust(data.jb, top=TRUE, lab.pos=1,
#            lab.offset=1, lab.col=2, lab.font=2)
# Fence(data.jb$hclust, GEN)
#
# if(verbose = TRUE):
# cat("\nnumber of iterations:\n", data.jb$iter, "\n")
#
# for "bootstrap":
# data.jb$mat <- as.matrix((data.jb$mat))
# data.jb$mat
# cat("\nmatrix of results:\n", data.jb$mat, "\n")
# cat("clustering info, by eucledean distance measure:\n")
# print(data.jb$hclust)
# cat("groups:\n", data.jb$gr, "\n")
# cat("\nsupport values:\n", data.jb$supp, "\n")
# cat("\nnumber of clusters used:\n", data.jb$n.cl, "\n")

# 2- Jackknife clustering:
# data.jb <- Bclust(maize,
#   method.d = "euclidean", method.c = "average",
#    bootstrap = FALSE)
# plot(data.jb)
#
# data.jb <- Bclust(maize,
#   method.d = "euclidean", method.c = "ward.D",
#    bootstrap = FALSE)
# plot(data.jb)
#
# if(verbose = TRUE):
# For"jackknife":
# cat("Consensus:\n", data.jb$consensus, "\n")
# cat("Vlaues:\n", data.jb$values, "\n")


The values and ranks of genotypes

Description

[Stable]

ranki() function ranks the genotypes (or entries) based on a new index utilizing the given trait and "WAASB" index to simultaneous select the top-ranked ones. This can be compared with WAASBY index of Olivoto (2019). We suggest users handle the missing data in inputs before considering analyses, due rank codes dose not implement a widespread algorithm to do this task. WAASB(Weighted Average of Absolute Scores), Computes the Weighted Average of Absolute Scores (Olivoto et al., 2019) for quantifying the stability of g genotypes conducted in e environments using linear mixed-effect models.

Usage

ranki(datap,  lowt = FALSE)

Arguments

datap

The data set

lowt

A parameter indicating whether lower rates of the trait is preferred or not. For grain yield e.g. Upper values is preferred. For plant height lower values e.g. is preferred.

Details

According to Olivoto et al. (2019a), WAASB(The weighted average of absolute scores) is computed considering all Interaction Principal Component Axis (IPCA) from the Singular Value Decomposition (SVD) of the matrix of genotype-environment interaction (GEI) effects generated by a linear mixed-effect model, as follows:

\[ WAASB_i = \sum_{k = 1}^{p} |IPCA_{ik} \times EP_k|/ \sum_{k = 1}^{p}EP_k\]

where \(WAASB_i\) is the weighted average of absolute scores of the ith genotype; \(IPCA_{ik}\) is the score of the ith genotype in the kth Interaction Principal Component Axis (IPCA); and \(EP_k\) is the explained variance of the kth IPCA for k = 1,2,..,p, considering \(p=min(g-1; e-1)\).

Further, \(WAASBY_i\) is a superiority or simultaneous selection index allowing weighting between mean performance and stability \[ WAASBY_i=\frac{\left({rY}_i\times\theta_Y\right)+ \left({rW}_i\times\theta_s\right)}{\theta_Y+\theta_s} \] , where \(WAASBY_i\) is the superiority index for genotype \(\it{i}\) that weights between mean performance and stability; \(\theta_Y\) and \(\theta_s\) are the weights for mean performance and stability, respectively; \({rY_i}\) and \({rW}_i\) are the rescaled values for mean performance \(\bar{Y_i}\) and stability \(W_i\), respectively of the genotype i. For the details of calculations, rescaling and mathematics notations see (Olivoto et al., 2019).

Finally, \(rYWAASB_i\) index is the sum of the ranks (or in fact the rank of sum of ranks of the trait and WAASB index) as follows: (\(rY_i\)) and WAASB index (\(rWAASB_i\)) for each individual:

\[ rYWAASB_i = {rY_i} + {rWAASB_i}\]

or: = \[rank{{rY_i} + {rWAASB_i}}\].

The input format of table of data(NA free), here maize data, should be as follows:

GEN Y WAASB WAASBY
Dracma 262.22 0.81 81.6
DKC6630 284.04 2.20 88.5
NS770 243.48 0.33 71.4
...

Value

Returns a data frame showing numerical rankings

Author(s)

Ali Arminian abeyran@gmail.com

References

Olivoto, T., Lúcio, A., DC, da Silva, J.A.G., Sari, B.G. and Diel, M. 2019. Mean performance and stability in multi-environment trials II: Selection based on multiple traits. Agronomy Journal, 111(6):2961-2969.

Olivoto, T., & Lúcio, A.D.C.2020. metan: An R package for multi‐environment trial analysis. Methods in Ecology and Evolution, 11(6), 783-789.

Kang, M.S. 1988. “A Rank-Sum Method for Selecting High-Yielding, Stable Corn Genotypes.” Cereal Research Communications 16: 113–15.

Examples

# Case 1:  Higher trait values are preferred. For instance grain yield
# in cereals is a trait which its higher values are preferred and ranking
# is performed from the higher to lower values i.e. 1st, 2nd, 3rd etc
# in maize dataset.

data(maize)
ranki(maize) # or: ranki(maize, lowt = FALSE)

# Case 2:  In this case, the lower values of the given trait are preferred.
# For instance days to maturity (dm) and plant height are traits where their
# lower values are preferred.

data(dm)
ranki(dm, lowt = TRUE)