Type: | Package |
Title: | Core Collection |
Version: | 0.9.5 |
Description: | Create a custom-sized Core Collection based on a distance matrix, applying the A-NE (accession nearest entry), E-NE (entry nearest entry) or E-E (entry entry) method as introduced in Jansen and van Hintum (2007) <doi:10.1007/s00122-006-0433-9> and further elaborated on in Odong, T.L. (2012) https://edepot.wur.nl/212422. Optionally, a list of preselected accessions to be included in the core can be set. For each accession in the computed core, nearby accessions that can serve as alternatives are retrievable, if available. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5) |
LinkingTo: | Rcpp |
Imports: | Rcpp (≥ 1.0.0), R6 (≥ 2.4.0), methods |
Collate: | coreCollection.package.R coreCollection.R coreSelection.R RcppExports.R |
Suggests: | testthat (≥ 3.0.0), vcfR, adegenet, ggfortify |
RoxygenNote: | 7.2.2 |
NeedsCompilation: | yes |
Packaged: | 2022-12-20 08:37:25 UTC; matthijs |
Author: | Matthijs Brouwer |
Maintainer: | Matthijs Brouwer <matthijs.brouwer@wur.nl> |
Config/testthat/edition: | 3 |
Repository: | CRAN |
Date/Publication: | 2022-12-20 13:00:02 UTC |
The coreCollection package
Description
This package can be used to create a CoreCollection object.
Author(s)
Matthijs Brouwer <matthijs.brouwer@wur.nl>
References
Odong, T.L. (2012) Quantitative methods for sampling of germplasm collections - Getting the best out of molecular markers when creating core collections. PhD diss., Wageningen University and Research, Wageningen, The Netherlands. http://edepot.wur.nl/212422
Jansen, J. and van Hintum, T. (2007) Genetic distance sampling: A novel sampling method for obtaining core collections using genetic distances with an application to cultivated lettuce. Theoretical and Applied Genetics 114, 421-428. doi:10.1007/s00122-006-0433-9
See Also
- vcfR provides a suite of tools for input and output of variant call format (VCF) files, manipulation of their content and visualization.
- adegenet provides the genlight class for genome-wide SNP data, and includes a method to create a distance matrix.
Other core collection:
CoreCollection()
The CoreCollection Class
Description
The CoreCollection Class
Format
An R6Class generator object.
Methods
Public methods
Method recompute()
Usage
.CoreCollectionClass$recompute()
Method alternativeCore()
Usage
.CoreCollectionClass$alternativeCore(n)
Method new()
Usage
.CoreCollectionClass$new( distanceMatrix, n, preselected, coreSelectMethod, adjustedGroupMethod, algorithm, seed )
Method print()
Usage
.CoreCollectionClass$print(...)
Method summary()
Usage
.CoreCollectionClass$summary(...)
Method measure()
Usage
.CoreCollectionClass$measure(coreSelectMethod)
Method measures()
Usage
.CoreCollectionClass$measures(...)
Method clone()
The objects of this class are cloneable with this method.
Usage
.CoreCollectionClass$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
The CoreCollection class
Description
R6 class for creating a core collection based on the provided distanceMatrix, the required core size n and, optionally, a set of preselected accessions to be included in the core.
Usage
CoreCollection(
distanceMatrix,
n,
preselected = c(),
coreSelectMethod = "A-NE",
adjustedGroupMethod = "split",
algorithm = "randomDescent",
seed = NULL
)
Arguments
distanceMatrix |
The distance matrix of the full collection; its labels or rownames identify the accessions |
n |
The number of items in the core |
preselected |
An optional list of preselected accessions to be included in the core collection; the provided accessions should occur in the labels or rownames of the provided distanceMatrix |
coreSelectMethod |
The method for computing core accessions within the groups: "A-NE", "E-NE" or "E-E" |
adjustedGroupMethod |
The method to handle adjusting groups when multiple preselected accessions occur within a single group: "split" or "recompute" |
algorithm |
Algorithm applied to compute a solution: currently, only "randomDescent" is available |
seed |
The seed used when generating the core collection. If no seed is provided, a random seed is chosen, and each time the recompute() method is called a new random seed is generated |
Details
Based on a provided distanceMatrix and the required number n of accessions within the core, a random set of accessions is created. This implicitly divides the full population into initial groups, with each accession assigned to the nearest randomly chosen accession. If a set of preselected accessions is provided, this initial division is adjusted using the adjustedGroupMethod. Then, using the coreSelectMethod in the algorithm, the core accessions within these groups are calculated, resulting in the final core collection.
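The core-selection step described above can be illustrated with a small random-descent search. This is a minimal sketch under stated assumptions, not the package's "randomDescent" implementation: `random_descent_ane()` is a hypothetical name, `groups` is assumed to be a vector giving a group id for each accession, exactly one core entry is kept per group, and the A-NE measure is minimised by random single-entry swaps until `max_fail` consecutive swaps bring no improvement.

```r
# Hypothetical sketch of a random-descent search for an A-NE core; this is not
# the package's internal randomDescent algorithm.
random_descent_ane <- function(d, groups, max_fail = 100L, seed = NULL) {
  d <- as.matrix(d)
  if (!is.null(seed)) set.seed(seed)     # note: changes the global RNG state
  members <- lapply(unique(groups), function(g) which(groups == g))
  pick <- function(m) m[sample.int(length(m), 1L)]  # safe for length-1 groups
  # A-NE: mean distance from every accession to its nearest core entry
  ane <- function(core) mean(apply(d[, core, drop = FALSE], 1, min))
  core <- vapply(members, pick, integer(1))         # one random entry per group
  best <- ane(core)
  fails <- 0L
  while (fails < max_fail) {
    k <- sample.int(length(members), 1L)            # random group
    candidate <- core
    candidate[k] <- pick(members[[k]])              # random replacement entry
    value <- ane(candidate)
    if (value < best) {                             # keep only improvements
      core <- candidate
      best <- value
      fails <- 0L
    } else {
      fails <- fails + 1L
    }
  }
  list(core = core, measure = best)
}

x <- c(0, 1, 2, 10, 11, 12)
res <- random_descent_ane(dist(x), groups = c(1, 1, 1, 2, 2, 2), seed = 42)
res$core
```

With two well-separated groups, the search should settle on the middle accession of each group (indices 2 and 5), giving an A-NE value of 4/6. Restarting from several seeds is a common way to reduce the risk of ending in a local optimum.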
Fields
adjustedBasedGroups
A list describing the initial random division of all accessions into groups, adjusted for the set of preselected accessions by using the defined adjustedGroupMethod.
adjustedGroupMethod
The method to handle adjusting groups when multiple preselected accessions occur within a single group.
adjustedSelected
A data.frame representing the initial random selection of accessions, adjusted for the set of preselected accessions by using the defined adjustedGroupMethod, with the accession names as labels and the following columns:
- contains: the (positive) number of accessions that have this accession as the closest randomly selected accession
- preselects: the number of these closest accessions that were preselected
- preselected: a boolean indicating whether the randomly selected accession was preselected
- random: a boolean indicating whether the selected accession was initially randomly chosen or introduced later by the applied adjustedGroupMethod
algorithm
The applied algorithm to compute the solution.
core
A data.frame representing the core collection with the accession names as labels and in the first and only column a boolean value indicating whether or not the accession was preselected.
coreSelectMethod
The applied method to select the core accessions based on the computed adjustedBasedGroups.
distanceMatrix
The distance matrix; this will always be a dist object.
n
The required core size
pop
A data.frame representing the whole collection with the accession names as labels and in the first and only column:
- result: a string describing whether the accession is marked as other or as included in the core and, if in the core, whether it was included because it was preselected or because of the applied coreSelectMethod.
preselected
The list of preselected accessions.
randomBasedGroups
A list with the initial division into groups based on the initial random selection of accessions described by randomSelected. Each item describes all accessions that have the randomly selected accession from the label as their nearest neighbour, including the randomly selected accession itself.
randomSelected
A data.frame representing the initial random selection of accessions with the accession names as labels and the following columns:
- contains: the (positive) number of accessions that have this accession as the closest randomly selected accession
- preselects: the number of these closest accessions that were preselected
- preselected: a boolean indicating whether the randomly selected accession was preselected
- random: a boolean indicating whether the selected accession was randomly chosen. This will always be TRUE for this field, but including this column makes the output comparable with adjustedSelected.
seed
The last applied seed for the randomizer. This will only change when the recompute() method is called and no initial seed is defined.
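To make the columns of randomSelected and adjustedSelected concrete, the following hypothetical helper builds a comparable summary table from a grouping. It is a sketch, not the package's code; the name `summarise_groups()` is illustrative, and it assumes that `groups` holds, for each accession, the one-based index of the selected accession it is closest to, with `preselected` and `randomly_chosen` also given as one-based indices.

```r
# Hypothetical helper mimicking the randomSelected/adjustedSelected columns;
# not part of the coreCollection package.
summarise_groups <- function(labels, groups, preselected = integer(0),
                             randomly_chosen = unique(groups)) {
  reps <- sort(unique(groups))                 # the selected accessions
  is_pre <- seq_along(groups) %in% preselected # preselected flag per accession
  data.frame(
    contains    = vapply(reps, function(r) sum(groups == r), integer(1)),
    preselects  = vapply(reps, function(r) sum(groups == r & is_pre), integer(1)),
    preselected = reps %in% preselected,
    random      = reps %in% randomly_chosen,
    row.names   = labels[reps]
  )
}

summarise_groups(c("a", "b", "c", "d", "e"),
                 groups = c(2, 2, 2, 4, 4), preselected = 3)
```

Here accession "b" represents a group of three accessions, one of which ("c") was preselected, while "d" represents a group of two without preselected accessions.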
Methods
alternativeCore(n)
The nth alternative core, with n a positive integer. Provides for each accession in the core, if available, the nth nearest accession from within the same group as an alternative.
clone(deep = FALSE)
The default R6Class clone method.
initialize(distanceMatrix, n, preselected, coreSelectMethod, adjustedGroupMethod, algorithm, seed)
Initialisation of the object; this is called automatically on creation or when recomputing.
measure(coreSelectMethod)
The measure for the provided coreSelectMethod. If no value is provided, the currently selected coreSelectMethod is used. The measure is used by the algorithm to compute the core collection.
measures()
A data.frame with the available coreSelectMethods as labels and, in the first and only column, the measures for these methods.
recompute()
Recompute the core collection: If on initialisation of the object a seed was provided, this same seed will be applied and therefore the same core collection will be created. Otherwise, a new seed is generated, resulting in a new core.
print()
Create a summary of the core collection object, same as summary().
summary()
Create a summary of the core collection object, same as print().
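The idea behind alternativeCore(n) can be sketched as follows. This is an illustration under assumptions, not the method's actual implementation: `nth_alternative()` is a hypothetical name, `groups` is assumed to give a group id per accession, `core` holds one-based indices of core accessions, and NA marks a core accession whose group has fewer than n other members.

```r
# Hypothetical sketch of the idea behind alternativeCore(n); not the package's code.
nth_alternative <- function(d, groups, core, n) {
  d <- as.matrix(d)
  vapply(core, function(k) {
    others <- setdiff(which(groups == groups[k]), k)  # same group, minus k itself
    if (length(others) < n) return(NA_integer_)       # no nth alternative exists
    as.integer(others[order(d[k, others])][n])        # nth nearest group member
  }, integer(1))
}

x <- c(0, 1, 2.5, 10, 11, 13)
groups <- c(1, 1, 1, 2, 2, 2)
nth_alternative(dist(x), groups, core = c(2L, 5L), n = 1L)
```

For n = 1 this returns accessions 1 and 4, the nearest group members of core accessions 2 and 5; for n = 2 it returns 3 and 6.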
See Also
Other core collection:
coreCollection-package
Compute selection - recompute method
Description
The function computeAdjustedSelectionUsingRecomputeMethod is used internally by the CoreCollection object to compute an adjusted selection using the recompute method.
Usage
computeAdjustedSelectionUsingRecomputeMethod(dist, adjustedSelected)
Arguments
dist |
distance matrix, used for distances and implicitly defining the set of entries |
adjustedSelected |
the selected entries defined as a list of zero-based integers referring to the rows/columns of dist |
Details
This function returns a list describing, for each of the row/column entries of dist, the closest selected entry. The entries are implicitly defined by the rows/columns of dist and referred to by a zero-based integer describing their position.
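A minimal R sketch of this assignment step, assuming the selected entries are passed as a vector of zero-based integers and the result is returned as an integer vector (the package itself returns a list from compiled code). The name `closest_selected_entry()` is hypothetical.

```r
# Hypothetical sketch of the recompute idea: every entry is assigned to the
# closest selected entry. Indices are zero-based, as in the documentation.
closest_selected_entry <- function(d, adjusted_selected) {
  d <- as.matrix(d)
  sel1 <- as.integer(adjusted_selected) + 1L   # convert to R's one-based indices
  vapply(seq_len(nrow(d)), function(i) {
    sel1[which.min(d[i, sel1])] - 1L           # closest selected entry, zero-based
  }, integer(1))
}

x <- c(0, 1, 2, 10, 11)
closest_selected_entry(dist(x), c(0, 3))
```

This returns 0 0 0 3 3: the first three entries are closest to entry 0 and the last two to entry 3. Each selected entry maps to itself because its distance to itself is zero.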
Compute selection - split method
Description
The function computeAdjustedSelectionUsingSplitMethod is used internally by the CoreCollection object to compute an adjusted selection using the split method.
Usage
computeAdjustedSelectionUsingSplitMethod(dist, groups, preselected)
Arguments
dist |
distance matrix, used for distances and implicitly defining the set of entries |
groups |
the initial division into groups defined as a list of zero-based integers referring to the rows/columns of dist |
preselected |
the set of preselected entries |
Details
This function returns a list describing, for each of the row/column entries of dist, the corresponding entry referred to in groups. However, groups with one or multiple preselected entries are divided, and the returned list will contain references to the closest preselected entry within the group, implying a split if multiple preselected entries occur within one group. The entries are implicitly defined by the rows/columns of dist and referred to by a zero-based integer describing their position.
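The split rule can be sketched in R as below. This is a hedged illustration, not the package's compiled implementation: `split_adjusted_groups()` is a hypothetical name, `groups` is assumed to be an integer vector giving, for each entry, the zero-based index of its group's selected entry, and `preselected` is assumed to be zero-based as well.

```r
# Hypothetical sketch of the split method; indices are zero-based.
split_adjusted_groups <- function(d, groups, preselected) {
  d <- as.matrix(d)
  groups <- as.integer(groups)
  preselected <- as.integer(preselected)
  vapply(seq_along(groups), function(i) {
    g <- groups[i]
    # preselected entries that fall in the same initial group as entry i
    p <- preselected[groups[preselected + 1L] == g]
    if (length(p) == 0L) {
      g                                   # no preselected entry: keep the group
    } else {
      p[which.min(d[i, p + 1L])]          # closest preselected entry in the group
    }
  }, integer(1))
}

x <- c(0, 1, 2, 3, 10, 11)
split_adjusted_groups(dist(x), groups = c(0, 0, 0, 0, 4, 4), preselected = c(0, 3))
```

The first group contains two preselected entries (0 and 3), so it is split: entries 0 and 1 go to entry 0, entries 2 and 3 to entry 3, while the second group is left unchanged, giving 0 0 3 3 4 4.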
Compute the core
Description
The function computeCore is used internally by the CoreCollection object to compute the core.
Usage
computeCore(algorithm, method, dist, groups)
Arguments
algorithm |
applied algorithm to find a solution with method: currently, only "randomDescent" is available |
method |
required method for choosing the entries within the groups: "A-NE", "E-NE" or "E-E" |
dist |
distance matrix, used for distances and implicitly defining the set of entries |
groups |
the initially created subdivision into groups |
Details
The A-NE method requires the core to minimize the average distance between each accession and the nearest entry within the core. The E-NE method requires the core to maximize the average distance between each core entry and its nearest neighbouring entry within the core. The E-E method requires the core to maximize the average distance between all core entries.
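The three criteria can be written down directly. The following is a minimal sketch, assuming `core` holds one-based indices of at least two core entries; `core_measures()` is a hypothetical name and this is not the package's computeCore. A-NE is to be minimised, while E-NE and E-E are to be maximised.

```r
# Sketch of the three measures for a given core (one-based indices, at least
# two core entries); illustrative only, not the package's computeCore.
core_measures <- function(d, core) {
  d <- as.matrix(d)
  dc <- d[core, core, drop = FALSE]            # distances among core entries
  # A-NE: mean distance from each accession to its nearest core entry (minimise)
  a_ne <- mean(apply(d[, core, drop = FALSE], 1, min))
  # E-E: mean distance over all pairs of core entries (maximise)
  e_e <- mean(dc[upper.tri(dc)])
  # E-NE: mean distance from each core entry to its nearest other core entry (maximise)
  diag(dc) <- Inf                              # exclude self-distances
  e_ne <- mean(apply(dc, 1, min))
  c("A-NE" = a_ne, "E-NE" = e_ne, "E-E" = e_e)
}

x <- c(0, 1, 2, 10, 11)
core_measures(dist(x), core = c(1, 3, 5))
```

For the core {0, 2, 11} this gives A-NE = 0.4, E-NE = 13/3 and E-E = 22/3. E-NE and E-E differ because E-NE only looks at each entry's nearest neighbour, so it penalises core entries that sit close together even when the core as a whole is spread out.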
Compute a random selection
Description
The function computeRandomSelection is used internally by the CoreCollection object to compute a random selection.
Usage
computeRandomSelection(dist, requiredN, preselected, seed)
Arguments
dist |
distance matrix, used for distances and implicitly defining the set of entries |
requiredN |
the required size of the random selection |
preselected |
a list of preselected entries, referring to the rows/columns of dist |
seed |
the applied seed for the randomizer |
Details
This function returns a random selection of approximately size requiredN by choosing entries sequentially and randomly, while excluding all entries within a certain radius of a previously chosen entry, and by iteratively finding the most appropriate radius to end up with a number of selected entries close to requiredN.
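A sketch of the radius-based selection, assuming bisection as the radius search (the documentation does not say how the radius is refined) and one-based indices. The name `random_selection()` and the `iterations` parameter are hypothetical, and the function calls `set.seed()`, which changes the global RNG state.

```r
# Hypothetical sketch of radius-based random selection; not the package's code.
# Indices are one-based; `preselected` entries are always kept.
random_selection <- function(d, required_n, preselected = integer(0),
                             seed = 1L, iterations = 30L) {
  d <- as.matrix(d)
  set.seed(seed)                                 # changes the global RNG state
  preselected <- as.integer(preselected)
  visit <- setdiff(sample.int(nrow(d)), preselected)  # fixed random visiting order
  select_with_radius <- function(r) {
    chosen <- preselected
    for (i in visit) {
      # keep i only if it lies at least r away from every chosen entry
      if (length(chosen) == 0L || all(d[i, chosen] >= r)) chosen <- c(chosen, i)
    }
    chosen
  }
  best <- select_with_radius(0)                  # radius 0 keeps every entry
  lo <- 0
  hi <- max(d) + 1                               # a radius this large keeps very few
  for (step in seq_len(iterations)) {
    r <- (lo + hi) / 2
    s <- select_with_radius(r)
    if (abs(length(s) - required_n) < abs(length(best) - required_n)) best <- s
    if (length(s) == required_n) break
    if (length(s) > required_n) lo <- r else hi <- r   # bisect on the radius
  }
  best
}

x <- c(0, 0.1, 0.2, 10, 10.1, 10.2, 20, 20.1, 20.2)
random_selection(dist(x), required_n = 3)
```

With three tight clusters, a radius between the within-cluster and between-cluster distances keeps exactly one entry per cluster. Because the count is only roughly monotone in the radius, the sketch returns the selection whose size came closest to `required_n`, which matches the documentation's "approximately size requiredN".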