Title: | Prepare and Explore Data for Palaeobiological Analyses |
Version: | 1.4.0 |
Description: | Provides functionality to support data preparation and exploration for palaeobiological analyses, improving code reproducibility and accessibility. The wider aim of 'palaeoverse' is to bring the palaeobiological community together to establish agreed standards. The package currently includes functionality for data cleaning, binning (time and space), exploration, summarisation and visualisation. Reference datasets (i.e. Geological Time Scales https://stratigraphy.org/chart) and auxiliary functions are also provided. Details can be found in: Jones et al., (2023) <doi:10.1111/2041-210X.14099>. |
License: | GPL (≥ 3) |
Language: | en-GB |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 4.0) |
Imports: | stats, utils, graphics, methods, curl, ape, sf, stringdist, geosphere, h3jsr (≥ 1.3.0), httr, pbapply, lifecycle |
Suggests: | rmarkdown, knitr, testthat (≥ 3.0.0), vdiffr (≥ 1.0.0), paleotree, phytools, covr |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
URL: | https://palaeoverse.palaeoverse.org, https://github.com/palaeoverse/palaeoverse, https://palaeoverse.org |
BugReports: | https://github.com/palaeoverse/palaeoverse/issues |
NeedsCompilation: | no |
Packaged: | 2024-10-14 14:00:20 UTC; lewis |
Author: | Lewis A. Jones |
Maintainer: | Lewis A. Jones <LewisA.Jones@outlook.com> |
Repository: | CRAN |
Date/Publication: | 2024-10-14 15:00:02 UTC |
palaeoverse: Prepare and Explore Data for Palaeobiological Analyses
Description
Provides functionality to support data preparation and exploration for palaeobiological analyses, improving code reproducibility and accessibility. The wider aim of 'palaeoverse' is to bring the palaeobiological community together to establish agreed standards. The package currently includes functionality for data cleaning, binning (time and space), exploration, summarisation and visualisation. Reference datasets (i.e. Geological Time Scales https://stratigraphy.org/chart) and auxiliary functions are also provided. Details can be found in: Jones et al., (2023) doi: 10.1111/2041-210X.14099.
Author(s)
Maintainer: Lewis A. Jones LewisA.Jones@outlook.com (ORCID)
Authors:
William Gearty willgearty@gmail.com (ORCID)
Bethany J. Allen Bethany.Allen@bsse.ethz.ch (ORCID)
Kilian Eichenseer kilian.eichenseer@gmail.com (ORCID)
Christopher D. Dean christopherdaviddean@gmail.com (ORCID)
Joseph T. Flannery-Sutherland jf15558@bristol.ac.uk (ORCID)
Other contributors:
Sofia Galvan sofia.galvan@uvigo.es (ORCID) [contributor]
Miranta Kouvari m.kouvari@ucl.ac.uk (ORCID) [contributor]
Pedro L. Godoy pedrolorenagodoy@gmail.com (ORCID) [contributor]
Cecily Nicholl cecily.nicholl@ucl.ac.uk (ORCID) [contributor]
Lucas Buffan lucas.l.buffan@gmail.com (ORCID) [contributor]
Erin M. Dillon emdillon23@gmail.com (ORCID) [contributor]
A. Alessandro Chiarenza a.chiarenza15@gmail.com (ORCID) [contributor]
See Also
Useful links:
Report bugs at https://github.com/palaeoverse/palaeoverse/issues
Geological Timescale 2012
Description
A dataframe of the Geological Timescale 2012. Age data from the International Commission on Stratigraphy. Supplementary information is also included in the dataset for plotting functionality (e.g. GTS2012 colour scheme).
Usage
GTS2012
Format
A data frame with 186 rows and 10 variables:
- interval_number
Index number for the temporal order of all intervals present in the dataset.
- interval_name
Names of intervals in the dataset.
- rank
The temporal rank of intervals in the dataset.
- max_ma
The maximum age of the interval in millions of years before present.
- mid_ma
The midpoint age of the interval in millions of years before present.
- min_ma
The minimum age of the interval in millions of years before present.
- duration_myr
The duration of the interval in millions of years.
- font
Colour of font to use for plotting in conjunction with the colour column.
- colour
Colours of stages based on the ICS timescale.
- abbr
Standard abbreviations of interval names where appropriate.
References
Gradstein, F.M., Ogg, J.G., Schmitz, M.D. and Ogg, G.M. eds. (2012).
Geologic Timescale 2012. Elsevier.
Source
Compiled by Lewis A. Jones (2022-07-02) from the ICS.
Geological Timescale 2020
Description
A dataframe of the Geological Timescale 2020. Age data from the International Commission on Stratigraphy. Supplementary information is included in the dataset for plotting functionality (e.g. GTS2020 colour scheme).
Usage
GTS2020
Format
A data frame with 189 rows and 10 variables:
- interval_number
Index number for the temporal order of all intervals present in the dataset.
- interval_name
Names of intervals in the dataset.
- rank
The temporal rank of intervals in the dataset.
- max_ma
The maximum age of the interval in millions of years before present.
- mid_ma
The midpoint age of the interval in millions of years before present.
- min_ma
The minimum age of the interval in millions of years before present.
- duration_myr
The duration of the interval in millions of years.
- font
Colour of font to use for plotting in conjunction with the colour column.
- colour
Colours of stages based on the ICS timescale.
- abbr
Standard abbreviations of interval names where appropriate.
References
Gradstein, F.M., Ogg, J.G., Schmitz, M.D. and Ogg, G.M. eds. (2020).
Geologic Timescale 2020. Elsevier.
Source
Compiled by Lewis A. Jones (2022-07-02) from the ICS.
Add an axis with a geological timescale
Description
axis_geo behaves similarly to axis in that it adds an axis to the specified side of a base R plot. The main difference is that it also adds a geological timescale between the plot and the axis. The default scale includes international epochs from the Geological Timescale 2020 (GTS2020). However, international stages, periods, eras, and eons are also available. Interval data hosted by Macrostrat are also available (see time_bins). A custom interval dataset can also be used (see Details below). The appearance of the axis is highly customisable (see Usage below), with the intent that plots will be publication-ready.
Usage
axis_geo(
side = 1,
intervals = "epoch",
height = 0.05,
fill = NULL,
lab = TRUE,
lab_col = NULL,
lab_size = 1,
rot = 0,
abbr = TRUE,
center_end_labels = TRUE,
skip = c("Quaternary", "Holocene", "Late Pleistocene"),
bord_col = "black",
lty = par("lty"),
lwd = par("lwd"),
bkgd = "grey90",
neg = FALSE,
exact = FALSE,
round = FALSE,
tick_at = NULL,
tick_labels = TRUE,
phylo = FALSE,
root.time = NULL,
...
)
axis_geo_phylo(...)
Arguments
side |
|
intervals |
The interval information to use to plot the axis: either A)
a |
height |
|
fill |
|
lab |
|
lab_col |
|
lab_size |
|
rot |
|
abbr |
|
center_end_labels |
|
skip |
A |
bord_col |
|
lty |
|
lwd |
|
bkgd |
|
neg |
|
exact |
|
round |
|
tick_at |
A |
tick_labels |
Either a) a |
phylo |
|
root.time |
|
... |
Further arguments that are passed directly to
|
Details
If a custom data.frame is provided (with intervals), it should consist of at least 3 columns of data. See GTS2020 for an example.
- The interval_name column (name is also allowed) lists the names of each time interval. These will be used as labels if no abbreviations are provided.
- The max_ma column (max_age is also allowed) lists the oldest boundary of each time interval. Values should always be positive.
- The min_ma column (min_age is also allowed) lists the youngest boundary of each time interval. Values should always be positive.
- The abbr column is optional and lists abbreviations that may be used as labels.
- The colour column (color is also allowed) is also optional and lists a colour for the background for each time interval (see the Color Specification section of the par help page).
- The font column (lab_color is also allowed) is also optional and lists a colour for the label for each time interval (see the Color Specification section of the par help page).
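The column requirements above can be sketched as a small table with a few sanity checks. This is a minimal illustration only: the interval names, ages and colours are hypothetical, and the checks are not the validation that axis_geo itself performs.

```r
# Hypothetical custom interval table; names, ages and colours are made up
# for illustration only.
custom_intervals <- data.frame(
  interval_name = c("Unit A", "Unit B", "Unit C"),
  max_ma = c(10, 25, 40),   # oldest boundary (positive, in Ma)
  min_ma = c(0, 10, 25),    # youngest boundary (positive, in Ma)
  abbr = c("A", "B", "C"),  # optional: labels used when abbreviating
  colour = c("#FDE0DD", "#FA9FB5", "#C51B8A"),  # optional: background fill
  font = c("black", "black", "white")           # optional: label colour
)

# Simple sanity checks on the three required columns
required <- c("interval_name", "max_ma", "min_ma")
stopifnot(
  all(required %in% colnames(custom_intervals)),
  all(custom_intervals$max_ma >= 0),
  all(custom_intervals$min_ma >= 0),
  all(custom_intervals$max_ma > custom_intervals$min_ma)
)
```

A table of this shape could then be supplied as axis_geo(side = 1, intervals = custom_intervals) after a plot has been drawn.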
intervals may also be a list if multiple time scales should be added to a single side of the plot. In this case, height, fill, lab, lab_col, lab_size, rot, abbr, center_end_labels, skip, bord_col, lty, and lwd can also be lists. If these lists are not as long as intervals, the elements will be recycled. If individual values (or vectors, e.g. for skip) are used for these parameters, they will be applied to all time scales (and recycled as necessary). If multiple scales are requested they will be added sequentially outwards starting from the plot border. The axis will always be placed on the outside of the last scale.
If you would like to use intervals from the Geological Time Scale 2012 (GTS2012), you can use time_bins and supply the returned data.frame to the intervals argument.
axis_geo_phylo(...) is shorthand for axis_geo(..., phylo = TRUE).
Value
No return value. Function is used for its side effect, which is to add an axis of the geological timescale to an already existing plot.
Authors
William Gearty & Kilian Eichenseer
Reviewer
Lewis A. Jones
Examples
# track user par
oldpar <- par(no.readonly = TRUE)
# single scale on bottom
par(mar = c(6.1, 4.1, 4.1, 2.1)) # modify margin
plot(0:100, axes = FALSE, xlim = c(100, 0), ylim = c(100, 0),
xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = "period")
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 4)
# stack multiple scales, abbreviate only one set of labels
par(mar = c(7.1, 4.1, 4.1, 2.1)) # further expand bottom margin
plot(0:100, axes = FALSE, xlim = c(100, 0), ylim = c(100, 0),
xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = list("epoch", "period"),
abbr = list(TRUE, FALSE))
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 6)
# scale with MacroStrat intervals
par(mar = c(6.1, 4.1, 4.1, 2.1)) # modify margin
plot(0:30, axes = FALSE, xlim = c(30, 0), ylim = c(30, 0),
xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = "North American land mammal ages")
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 4)
# scale with custom intervals
intervals <- data.frame(min_ma = c(0, 10, 25, 32),
max_ma = c(10, 25, 32, 40),
interval_name = c("A", "B", "C", "D"))
par(mar = c(6.1, 4.1, 4.1, 2.1)) # modify margin
plot(0:40, axes = FALSE, xlim = c(40, 0), ylim = c(40, 0),
xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = intervals)
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 4)
# scale with phylogeny
library(phytools)
data(mammal.tree)
plot(mammal.tree)
axis_geo_phylo()
title(xlab = "Time (Ma)", line = 4)
# scale with fossil phylogeny
library(paleotree)
data(RaiaCopesRule)
plot(ceratopsianTreeRaia)
axis_geo_phylo()
title(xlab = "Time (Ma)", line = 4)
# reset user par
par(oldpar)
Assign fossil occurrences to latitudinal bins
Description
A function to assign fossil occurrences to user-specified latitudinal bins.
Usage
bin_lat(occdf, bins, lat = "lat", boundary = FALSE)
Arguments
occdf |
|
bins |
|
lat |
|
boundary |
|
Value
A dataframe of the original input occdf with appended columns containing respective latitudinal bin information.
Developer(s)
Lewis A. Jones
Reviewer(s)
Sofia Galvan
Examples
# Load occurrence data
occdf <- tetrapods
# Generate latitudinal bins
bins <- lat_bins_degrees(size = 10)
# Bin data
occdf <- bin_lat(occdf = occdf, bins = bins, lat = "lat")
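The core idea of latitudinal binning can be sketched in base R with findInterval(). This is an illustration under stated assumptions, not the package's implementation: the bins, latitudes and output column names are hypothetical, and the handling of occurrences that fall exactly on a bin boundary (controlled by the boundary argument in bin_lat) is not reproduced here.

```r
# Toy latitudinal bins (30 degrees wide) and toy latitudes; all values are
# hypothetical and the column names are chosen for illustration.
bins <- data.frame(bin = 1:6,
                   min = seq(-90, 60, by = 30),
                   max = seq(-60, 90, by = 30))
lat <- c(-75, -10, 5, 44.5, 89)

# Bin edges from the lowest minimum to the highest maximum
breaks <- c(bins$min, max(bins$max))

# findInterval() returns, for each latitude, the index of the interval
# between consecutive edges that contains it; rightmost.closed = TRUE
# keeps a latitude equal to the final edge inside the last bin
idx <- findInterval(lat, breaks, rightmost.closed = TRUE)

assigned <- data.frame(lat = lat,
                       lat_bin = bins$bin[idx],
                       lat_min = bins$min[idx],
                       lat_max = bins$max[idx])
assigned
```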
Assign fossil occurrences to spatial bins
Description
A function to assign fossil occurrences (or localities) to spatial bins/samples using a hexagonal equal-area grid.
Usage
bin_space(
occdf,
lng = "lng",
lat = "lat",
spacing = 100,
sub_grid = NULL,
return = FALSE,
plot = FALSE
)
Arguments
occdf |
|
lng |
|
lat |
|
spacing |
|
sub_grid |
|
return |
|
plot |
|
Details
This function assigns fossil occurrence data into equal-area grid cells using discrete hexagonal grids via the h3jsr package. This package relies on Uber's H3 library, a geospatial indexing system that partitions the world into hexagonal cells. In H3, 16 different resolutions are available (see the H3 documentation). In the implementation of the bin_space() function, the resolution is defined by the user-input spacing, which represents the distance between the centroids of adjacent cells. Using this distance, the function identifies which resolution is most similar to the input spacing, and uses this resolution.
Additional functionality allows the user to simultaneously assign occurrence data to equal-area grid cells of a finer-scale grid (i.e. a 'sub-grid') within the primary grid via the sub_grid argument. This might be desirable for users to evaluate the differences in the amount of area occupied by occurrences within their primary grid cells. This functionality also allows the user to easily rarefy across sub-grid cells within primary cells to further standardise spatial sampling (see example for basic implementation).
Note: prior to implementation, the coordinate reference system (CRS) for input data is defined as EPSG:4326 (World Geodetic System 1984). The user should transform their data accordingly if this is not appropriate. If you are unfamiliar with working with geographic data, we highly recommend checking out Geocomputation with R.
Value
If the return argument is set to FALSE, a dataframe is returned of the original input occdf with cell information. If return is set to TRUE, a list is returned with both the input occdf and grid information and polygons.
Developer(s)
Lewis A. Jones
Reviewer(s)
Bethany Allen & Kilian Eichenseer
Examples
# Get internal data
data("reefs")
# Reduce data for plotting
occdf <- reefs[1:250, ]
# Bin data using a hexagonal equal-area grid
ex1 <- bin_space(occdf = occdf, spacing = 500, plot = TRUE)
# Bin data using a hexagonal equal-area grid and sub-grid
ex2 <- bin_space(occdf = occdf, spacing = 1000, sub_grid = 250, plot = TRUE)
# EXAMPLE: rarefy
# Load data
occdf <- tetrapods[1:250, ]
# Assign to spatial bin
occdf <- bin_space(occdf = occdf, spacing = 1000, sub_grid = 250)
# Get unique bins
bins <- unique(occdf$cell_ID)
# n reps
n <- 10
# Rarefy data across sub-grid grid cells
# Returns a list with each element a bin with respective mean genus richness
df <- lapply(bins, function(x) {
# subset occdf for respective grid cell
tmp <- occdf[which(occdf$cell_ID == x), ]
# Which sub-grid cells are there within this bin?
sub_bin <- unique(tmp$cell_ID_sub)
# Sample 1 sub-grid cell n times
s <- sample(sub_bin, size = n, replace = TRUE)
# Count the number of unique genera within each sub_grid cell for each rep
counts <- sapply(s, function(i) {
# Number of unique genera within each sample
length(unique(tmp[which(tmp$cell_ID_sub == i), ]$genus))
})
# Mean richness across subsamples
mean(counts)
})
Assign fossil occurrences to time bins
Description
A function to assign fossil occurrences to specified time bins based on different approaches commonly applied in palaeobiology.
Usage
bin_time(
occdf,
min_ma = "min_ma",
max_ma = "max_ma",
bins,
method = "mid",
reps = 100,
fun = dunif,
...
)
Arguments
occdf |
|
min_ma |
|
max_ma |
|
bins |
|
method |
|
reps |
|
fun |
|
... |
Additional arguments available in the called function ( |
Details
Five approaches (methods) exist in the bin_time() function for assigning occurrences to time bins:
- Midpoint: The "mid" method is the simplest approach and uses the midpoint of the fossil occurrence age range to bin the occurrence.
- Majority: The "majority" method bins an occurrence into the bin which it most overlaps with. As part of this implementation, the majority percentage overlap of the occurrence is also calculated and returned as an additional column in occdf. If desired, these percentages can be used to further filter an occurrence dataset.
- All: The "all" method bins an occurrence into every bin its age range covers. For occurrences with age ranges of more than one bin, the occurrence row is duplicated. Each occurrence is assigned an ID in the column occdf$id so that duplicates can be tracked. Additionally, occdf$n_bins records the number of bins each occurrence appears within.
- Random: The "random" method randomly samples bins (with replacement) from the bins that the fossil occurrence age range covers, with equal probability regardless of bin length. The reps argument determines the number of times the sampling process is repeated. All replications are stored as individual elements within the returned list, with bin_assignment and bin_midpoint columns appended to the original input occdf. If desired, users can easily bind this list using do.call(rbind, x).
- Point: The "point" method randomly samples point age estimates from the age range of the fossil occurrence. Sampling follows a user-input probability density function such as dnorm (see example 5). Users should also provide any additional arguments for the probability density function (see ...). However, x (vector of quantiles) values should not be provided as these values are input from the age range of each occurrence. These values range between 0 and 1, and therefore function arguments should be scaled to be within these bounds. The reps argument determines the number of times the sampling process is repeated. All replications are stored as individual elements within the returned list, with bin_assignment and point_estimates columns appended to the original input occdf. If desired, users can easily bind this list using do.call(rbind, x).
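The logic behind the "mid", "majority" and "all" approaches can be illustrated for a single occurrence in base R. This is a rough sketch assuming toy bins and a hypothetical age range; the package's own implementation may differ, particularly for ages that fall exactly on a bin boundary.

```r
# Toy time bins and a single toy occurrence; all ages are hypothetical
bins <- data.frame(bin = 1:3,
                   max_ma = c(30, 20, 10),
                   min_ma = c(20, 10, 0))
occ_max <- 24
occ_min <- 8

# "mid": bin containing the midpoint of the age range
mid_age <- (occ_max + occ_min) / 2
mid_bin <- bins$bin[bins$max_ma >= mid_age & bins$min_ma < mid_age]

# "majority": overlap (in Myr) between the age range and each bin,
# expressed as a percentage of the occurrence's age range
overlap <- pmax(0, pmin(occ_max, bins$max_ma) - pmax(occ_min, bins$min_ma))
overlap_percentage <- 100 * overlap / (occ_max - occ_min)
majority_bin <- bins$bin[which.max(overlap_percentage)]

# "all": every bin that the age range overlaps
all_bins <- bins$bin[overlap > 0]
```

Here the occurrence overlaps all three bins, but most of its age range lies within the second bin, so the midpoint and majority approaches agree.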
Value
For methods "mid", "majority" and "all", a dataframe of the original input occdf with the following appended columns is returned: occurrence id (id), number of bins that the occurrence age range covers (n_bins), bin assignment (bin_assignment), and bin midpoint (bin_midpoint). In the case of the "majority" method, an additional column of the majority percentage overlap (overlap_percentage) is also appended. For the "random" and "point" method, a list is returned (of length reps) with each element a copy of the occdf and appended columns (random: bin_assignment and bin_midpoint; point: bin_assignment and point_estimates).
Developer(s)
Christopher D. Dean & Lewis A. Jones
Reviewer(s)
William Gearty
Examples
#Grab internal tetrapod data
occdf <- tetrapods[1:100, ]
bins <- time_bins()
#Assign via midpoint age of fossil occurrence data
ex1 <- bin_time(occdf = occdf, bins = bins, method = "mid")
#Assign to all bins that age range covers
ex2 <- bin_time(occdf = occdf, bins = bins, method = "all")
#Assign via majority overlap based on fossil occurrence age range
ex3 <- bin_time(occdf = occdf, bins = bins, method = "majority")
#Assign randomly to overlapping bins based on fossil occurrence age range
ex4 <- bin_time(occdf = occdf, bins = bins, method = "random", reps = 5)
#Assign point estimates following a normal distribution
ex5 <- bin_time(occdf = occdf, bins = bins, method = "point", reps = 5,
fun = dnorm, mean = 0.5, sd = 0.25)
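The reason the arguments in the last example are expressed on a 0 to 1 scale (mean = 0.5, sd = 0.25) can be seen in the following base R sketch of density-weighted sampling along an age range. The ages, the grid of candidate positions and the variable names are assumptions made for illustration; this is not the package's internal code.

```r
set.seed(1)    # reproducible toy example
occ_max <- 24  # hypothetical oldest age estimate (Ma)
occ_min <- 8   # hypothetical youngest age estimate (Ma)
reps <- 5

# Candidate positions along the age range, rescaled to lie between 0 and 1
x <- seq(0, 1, by = 0.01)

# Density of each position under the chosen function (here a normal
# distribution centred on the middle of the rescaled age range)
w <- dnorm(x, mean = 0.5, sd = 0.25)

# Draw positions in proportion to their density, then map back to ages
draws <- sample(x, size = reps, replace = TRUE, prob = w)
point_estimates <- occ_min + draws * (occ_max - occ_min)
```

With these settings, ages near the middle of the range are drawn more often than ages near either end.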
Apply a function over grouping(s) of data
Description
A function to apply palaeoverse functionality across subsets (groups) of data, delineated using one or more variables. Functions which receive a data.frame as input (e.g. nrow, ncol, lengths, unique) may also be used.
Usage
group_apply(occdf, group, fun, ...)
Arguments
occdf |
|
group |
|
fun |
|
... |
Additional arguments available in the called function. These arguments may be required for function arguments without default values, or if you wish to overwrite the default argument value (see examples). |
Details
group_apply applies functions to subgroups of data within a supplied dataset, enabling the separate analysis of occurrences or taxa from different time intervals, spatial regions, or trait values. The function serves as a wrapper around palaeoverse functions. Other functions which can be applied to a data.frame (e.g. nrow, ncol, lengths, unique) may also be used.
All palaeoverse functions which require a dataframe input can be used in conjunction with the group_apply function. However, this is unnecessary for many functions (e.g. bin_time) as groups do not need to be partitioned before binning. This list provides users with palaeoverse functions that might be interesting to apply across group(s):
- tax_unique: return the number of unique taxa per grouping variable.
- tax_range_time: return the temporal range of taxa per grouping variable.
- tax_range_space: return the geographic range of taxa per grouping variable.
- tax_check: return potential spelling variations of the same taxon per grouping variable. Note: verbose needs to be set to FALSE.
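The split-apply-combine pattern that this kind of wrapper follows can be sketched in base R. The toy table and the output column names (n_occ, n_genera) are hypothetical, and the sketch is only an approximation of the idea, not the code used by group_apply.

```r
# Toy occurrence table; values are hypothetical
occdf <- data.frame(genus = c("A", "A", "B", "C", "C", "D"),
                    cc = c("US", "US", "US", "FR", "FR", "FR"),
                    stringsAsFactors = FALSE)

# Split the data by the grouping variable, apply a function to each
# subset, then bind the results with a column identifying the group
groups <- split(occdf, occdf$cc)
out <- lapply(names(groups), function(g) {
  data.frame(n_occ = nrow(groups[[g]]),
             n_genera = length(unique(groups[[g]]$genus)),
             cc = g)
})
out <- do.call(rbind, out)
out
```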
Value
A data.frame of the outputs from the selected function, with appended column(s) indicating the user-defined groups. If a single vector is returned via the called function, it will be transformed to a data.frame with the column name equal to the input function.
Developer(s)
Lewis A. Jones & William Gearty
Reviewer(s)
Kilian Eichenseer & Bethany Allen
Examples
# Examples
# Get tetrapods data
occdf <- tetrapods[1:100, ]
# Remove NA data
occdf <- subset(occdf, !is.na(genus))
# Count number of occurrences from each country
ex1 <- group_apply(occdf = occdf, group = "cc", fun = nrow)
# Unique genera per collection with group_apply and input arguments
ex2 <- group_apply(occdf = occdf,
group = c("collection_no"),
fun = tax_unique,
genus = "genus",
family = "family",
order = "order",
class = "class",
resolution = "genus")
# Use multiple variables (number of occurrences per collection and formation)
ex3 <- group_apply(occdf = occdf,
group = c("collection_no", "formation"),
fun = nrow)
# Compute counts of occurrences per latitudinal bin
# Set up lat bins
bins <- lat_bins_degrees()
# bin occurrences
occdf <- bin_lat(occdf = occdf, bins = bins)
# Calculate number of occurrences per bin
ex4 <- group_apply(occdf = occdf, group = "lat_bin", fun = nrow)
Example dataset: Interval key for the look_up function
Description
A table of geological intervals and the earliest and latest corresponding international geological stages from the International Commission on Stratigraphy (ICS). The table was compiled using regional stratigraphies, the GeoWhen Database, temporal information from the Paleobiology Database and the Geological Timescale 2022. Some assignments were made with incomplete information on the stratigraphic provenance of intervals. The assignments in this table should be verified before research use. They are provided here as an example of functionality only.
Usage
interval_key
Format
A data frame with 1323 rows and 3 variables:
- interval_name
Stratigraphic interval
- early_stage
Earliest (oldest) geological stage which overlaps with the interval
- late_stage
Latest (youngest) geological stage which overlaps with the interval
Source
Compiled by Kilian Eichenseer and Lewis Jones for assigning geological stages to occurrences from the Paleobiology Database and the PaleoReefs Database.
Generate equal-width latitudinal bins
Description
lat_bins() was renamed to lat_bins_degrees() to be consistent with lat_bins_area().
Usage
lat_bins(size = 10, min = -90, max = 90, fit = FALSE, plot = FALSE)
Arguments
size |
|
min |
|
max |
|
fit |
|
plot |
|
Generate equal-area latitudinal bins
Description
A function to generate approximately equal-area latitudinal bins for a user-specified number of bins and latitudinal range. This approach is based on calculating the curved surface area of spherical segments bounded by two parallel discs.
Usage
lat_bins_area(n = 12, min = -90, max = 90, r = 6371, plot = FALSE)
Arguments
n |
|
min |
|
max |
|
r |
|
plot |
|
Value
A data.frame of user-defined number of latitudinal bins. The data.frame contains the following columns: bin (bin number), min (minimum latitude of the bin), mid (midpoint latitude of the bin), max (maximum latitude of the bin), area (the area of the bin in km²), area_prop (the proportional area of the bin across all bins).
Developer(s)
Lewis A. Jones & Kilian Eichenseer
Reviewer(s)
Kilian Eichenseer & Bethany Allen
See Also
For bins with unequal area, but equal latitudinal range, see lat_bins_degrees.
Examples
# Generate 12 latitudinal bins
bins <- lat_bins_area(n = 12)
# Generate latitudinal bins for just the (sub-)tropics
bins <- lat_bins_area(n = 6, min = -30, max = 30)
# Generate latitudinal bins and a plot
bins <- lat_bins_area(n = 24, plot = TRUE)
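The geometry behind equal-area latitudinal bins can be sketched in base R. On a sphere of radius r, the curved surface area of the zone between two latitudes is 2 * pi * r^2 * (sin(upper) - sin(lower)), so boundaries that are equally spaced in sin(latitude) enclose equal areas. The sketch below assumes a perfect sphere and is not necessarily how lat_bins_area() computes its (approximately equal-area) bins.

```r
n <- 6        # number of bins
r <- 6371     # sphere radius in km
min_lat <- -90
max_lat <- 90
to_rad <- pi / 180

# Equal steps in sin(latitude) give zones of equal curved surface area
s <- seq(sin(min_lat * to_rad), sin(max_lat * to_rad), length.out = n + 1)
s <- pmin(pmax(s, -1), 1)  # guard against floating-point overshoot
edges <- asin(s) / to_rad

bins <- data.frame(bin = seq_len(n),
                   min = edges[-(n + 1)],
                   max = edges[-1])
bins$mid <- (bins$min + bins$max) / 2
bins$area <- 2 * pi * r^2 *
  (sin(bins$max * to_rad) - sin(bins$min * to_rad))
bins$area_prop <- bins$area / sum(bins$area)
bins
```

Bins built this way are narrow in latitude near the equator and wide towards the poles.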
Generate equal-width latitudinal bins
Description
A function to generate latitudinal bins of a given size for a user-defined latitudinal range. If the desired size of the bins is not compatible with the defined latitudinal range, bin size can be updated to the nearest integer which is divisible into this range.
Usage
lat_bins_degrees(size = 10, min = -90, max = 90, fit = FALSE, plot = FALSE)
Arguments
size |
|
min |
|
max |
|
fit |
|
plot |
|
Value
A dataframe of latitudinal bins of user-defined size. The data.frame contains the following columns: bin (bin number), min (minimum latitude of the bin), mid (midpoint latitude of the bin), max (maximum latitude of the bin).
Developer(s)
Lewis A. Jones
Reviewer(s)
Bethany Allen
See Also
For equal-area latitudinal bins, see lat_bins_area.
Examples
# Generate 20 degrees latitudinal bins
bins <- lat_bins_degrees(size = 20)
# Generate latitudinal bins with closest fit to 13 degrees
bins <- lat_bins_degrees(size = 13, fit = TRUE)
# Generate latitudinal bins for defined latitudinal range
bins <- lat_bins_degrees(size = 10, min = -50, max = 50)
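One way to picture the fitting step is to search for the integer bin size that divides the latitudinal range exactly and lies closest to the requested size. The following base R sketch makes that assumption explicit; the tie-breaking rule and variable names are illustrative choices, and the result is not guaranteed to match what lat_bins_degrees(fit = TRUE) returns.

```r
size <- 13
min_lat <- -90
max_lat <- 90
lat_range <- max_lat - min_lat

# Integer bin sizes that divide the latitudinal range exactly
candidates <- seq_len(lat_range)
candidates <- candidates[lat_range %% candidates == 0]

# Pick the candidate closest to the requested size
# (which.min() keeps the smaller size if two candidates are equally close)
fitted_size <- candidates[which.min(abs(candidates - size))]

# Build the bins from the fitted size
edges <- seq(min_lat, max_lat, by = fitted_size)
bins <- data.frame(bin = seq_len(length(edges) - 1),
                   min = edges[-length(edges)],
                   max = edges[-1])
bins$mid <- (bins$min + bins$max) / 2
```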
Look up geological intervals and assign geological stages
Description
A function that uses interval names to assign either international geological stages and numeric ages from the International Commission on Stratigraphy (ICS), or user-defined intervals, to fossil occurrences.
Usage
look_up(
occdf,
early_interval = "early_interval",
late_interval = "late_interval",
int_key = FALSE,
assign_with_GTS = "GTS2020",
return_unassigned = FALSE
)
Arguments
occdf |
|
early_interval |
|
late_interval |
|
int_key |
Optionally, named
If set to |
assign_with_GTS |
|
return_unassigned |
|
Details
If int_key is set to FALSE (default), this function can be used to assign numerical ages solely based on stages from a GTS table, and to assign stages based on GTS interval names.
Instead of geological stages, the user can supply any names in the early_stage and late_stage columns of int_key. assign_with_GTS should then be set to FALSE.
An exemplary int_key has been included within this package (interval_key). This key works well for assigning geological stages to many of the intervals from the Paleobiology Database and the PaleoReefs Database. palaeoverse cannot guarantee that all of the stage assignments with the exemplary key are accurate. The table corresponding to this key can be loaded with palaeoverse::interval_key.
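The key-matching step can be pictured with match() in base R. This is a simplified sketch using a toy key; it covers only the lookup against int_key and leaves out the assignment of numerical ages and any fallback to a GTS table, both of which look_up handles itself.

```r
# Toy occurrences and toy key; values are illustrative only
occdf <- data.frame(
  interval = c("any Permian", "first Permian stage", "Roadian")
)
int_key <- data.frame(
  interval_name = c("any Permian", "first Permian stage"),
  early_stage = c("Asselian", "Asselian"),
  late_stage = c("Changhsingian", "Asselian")
)

# match() gives the row of the key for each interval name (NA if absent)
idx <- match(occdf$interval, int_key$interval_name)
occdf$early_stage <- int_key$early_stage[idx]
occdf$late_stage <- int_key$late_stage[idx]

# Interval names that the key alone could not resolve
unassigned <- unique(occdf$interval[is.na(idx)])
```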
Value
A dataframe of the original input data with the following appended columns is returned: early_stage and late_stage, corresponding to the earliest and latest international geological stage which could be assigned to the occurrences based on the given interval names. interval_max_ma and interval_min_ma return maximum and minimum interval ages if provided in the interval key, or if they can be fetched from GTS2012 or GTS2020. A column interval_mid_ma is appended to provide the midpoint ages of the intervals.
Developer(s)
Kilian Eichenseer & William Gearty
Reviewer(s)
Lewis A. Jones & Christopher D. Dean
Examples
## Just use GTS2020 (default):
# create exemplary dataframe
taxdf <- data.frame(name = c("A", "B", "C"),
early_interval = c("Maastrichtian", "Campanian", "Sinemurian"),
late_interval = c("Maastrichtian", "Campanian", "Bartonian"))
# assign stages and numerical ages
taxdf <- look_up(taxdf)
## Use exemplary int_key
# Get internal reef data
occdf <- reefs
# assign stages and numerical ages
occdf <- look_up(occdf,
early_interval = "interval",
late_interval = "interval",
int_key = interval_key)
## Use exemplary int_key and return unassigned
# Get internal tetrapod data
occdf <- tetrapods
# assign stages and numerical ages
occdf <- look_up(occdf, int_key = palaeoverse::interval_key)
# return unassigned intervals
unassigned <- look_up(occdf, int_key = palaeoverse::interval_key,
return_unassigned = TRUE)
## Use own key and GTS2012:
# create example data
occdf <- data.frame(
stage = c("any Permian", "first Permian stage",
"any Permian", "Roadian"))
# create example key
interval_key <- data.frame(
interval_name = c("any Permian", "first Permian stage"),
early_stage = c("Asselian", "Asselian"),
late_stage = c("Changhsingian", "Asselian"))
# assign stages and numerical ages:
occdf <- look_up(occdf,
early_interval = "stage", late_interval = "stage",
int_key = interval_key, assign_with_GTS = "GTS2012")
Palaeorotate fossil occurrences
Description
A function to estimate palaeocoordinates for fossil occurrence data (i.e. reconstruct the geographic distribution of organisms' remains at time of deposition). Each occurrence is assigned palaeocoordinates based on its current geographic position and age estimate.
Usage
palaeorotate(
occdf,
lng = "lng",
lat = "lat",
age = "age",
model = "MERDITH2021",
method = "point",
uncertainty = TRUE,
round = 3
)
Arguments
occdf |
|
lng |
|
lat |
|
age |
|
model |
|
method |
|
uncertainty |
|
round |
|
Details
This function can estimate palaeocoordinates using two different approaches (method):
- Reconstruction files: The "grid" method uses reconstruction files from Jones & Domeier (2024) to spatiotemporally link present-day geographic coordinates and age estimates with a discrete global grid rotated at one million-year time steps throughout the Phanerozoic (540–0 Ma). Here, resolution 3 (~119 km spacing) of the reconstruction files is used. All files, and the process used to generate them, are available and documented in Jones & Domeier (2024). If fine-scale spatial analyses are being conducted, use of the "point" method (see GPlates API below) may be preferred (particularly if occurrences are close to plate boundaries). When using the "grid" method, coordinates within the same grid cell will be assigned equivalent palaeocoordinates due to spatial aggregation. However, this approach enables efficient estimation of the past distribution of fossil occurrences. Note: each reconstruction file is ~45 MB in size.
- GPlates API: The "point" method uses the GPlates Web Service to reconstruct palaeocoordinates for point data. The use of this method is slower than the "grid" method if many unique time intervals exist in your dataset. However, it provides palaeocoordinates with higher precision.
Available models and timespan for each method:
- "MERDITH2021" (Merdith et al., 2021): 0–1000 Ma (point), 0–540 Ma (grid)
- "TorsvikCocks2017" (Torsvik and Cocks, 2016): 0–540 Ma (point/grid)
- "PALEOMAP" (Scotese, 2016): 0–1100 Ma (point), 0–540 Ma (grid)
- "MATTHEWS2016_pmag_ref" (Matthews et al., 2016): 0–410 Ma (grid/point)
- "GOLONKA" (Wright et al., 2013): 0–540 Ma (grid/point)
Value
A data.frame containing the original input occurrence data.frame and the reconstructed coordinates (i.e. "p_lng", "p_lat"). The "grid" method also returns the age of rotation ("rot_age") and the reference coordinates rotated ("rot_lng" and "rot_lat"). If only one model is requested, a column containing the rotation model used ("rot_model") is also appended. Otherwise, the name of each model is appended to the name of each column containing palaeocoordinates (e.g. "p_lng_GOLONKA"). If uncertainty is set to TRUE, the palaeolatitudinal range ("range_p_lat") and the maximum geographic distance ("max_dist") in km between palaeocoordinates will also be returned (the latter calculated via distGeo).
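The two uncertainty measures can be illustrated for a single occurrence in base R. The palaeocoordinates and model labels below are hypothetical. The distance is computed with the haversine formula on a sphere, as a simpler stand-in for the ellipsoidal distGeo calculation, so values will differ slightly from those returned by palaeorotate.

```r
# Hypothetical palaeocoordinates for one occurrence from three models
p <- data.frame(model = c("model_a", "model_b", "model_c"),
                p_lng = c(10.2, 12.5, 8.9),
                p_lat = c(35.1, 38.4, 33.0))

# Palaeolatitudinal range across models
range_p_lat <- diff(range(p$p_lat))

# Great-circle distance (km) via the haversine formula on a sphere;
# a stand-in for an ellipsoidal (geodesic) distance
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Maximum pairwise distance between the model estimates
pairs <- utils::combn(nrow(p), 2)
d <- haversine_km(p$p_lng[pairs[1, ]], p$p_lat[pairs[1, ]],
                  p$p_lng[pairs[2, ]], p$p_lat[pairs[2, ]])
max_dist <- max(d)
```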
References
Jones, L.A., Domeier, M. A Phanerozoic gridded dataset for palaeogeographic reconstructions. Sci Data 11, 710 (2024). doi:10.1038/s41597-024-03468-w.
Matthews, K.J., Maloney, K.T., Zahirovic, S., Williams, S.E., Seton, M., and Müller, R.D. (2016). Global plate boundary evolution and kinematics since the late Paleozoic. Global and Planetary Change, 146, 226-250. doi:10.1016/j.gloplacha.2016.10.002.
Merdith, A., Williams, S.E., Collins, A.S., Tetley, M.G., Mulder, J.A., Blades, M.L., Young, A., Armistead, S.E., Cannon, J., Zahirovic, S., Müller, R.D. (2021). Extending full-plate tectonic models into deep time: Linking the Neoproterozoic and the Phanerozoic. Earth-Science Reviews, 214(103477). doi:10.1016/j.earscirev.2020.103477.
Scotese, C., & Wright, N. M. (2018). PALEOMAP Paleodigital Elevation Models (PaleoDEMs) for the Phanerozoic. PALEOMAP Project.
Torsvik, T. H. & Cocks, L. R. M. Earth History and Palaeogeography. Cambridge University Press, 2016.
Wright, N., Zahirovic, S., Müller, R. D., & Seton, M. (2013). Towards community-driven paleogeographic reconstructions: integrating open-access paleogeographic and paleobiology data with plate tectonics. Biogeosciences, 10(3), 1529-1541. doi:10.5194/bg-10-1529-2013.
See GPlates documentation for additional information and details.
Developer(s)
Lewis A. Jones
Reviewer(s)
Kilian Eichenseer, Lucas Buffan & Will Gearty
Examples
## Not run:
#Generic example with a few occurrences
occdf <- data.frame(lng = c(2, -103, -66),
lat = c(46, 35, -7),
age = c(88, 125, 200))
#Calculate palaeocoordinates using reconstruction files
ex1 <- palaeorotate(occdf = occdf, method = "grid")
#Calculate palaeocoordinates using the GPlates API
ex2 <- palaeorotate(occdf = occdf, method = "point")
#Calculate uncertainty in palaeocoordinates from models
ex3 <- palaeorotate(occdf = occdf,
method = "grid",
model = c("MERDITH2021",
"GOLONKA",
"PALEOMAP"),
uncertainty = TRUE)
#Now with some real fossil occurrence data!
#Grab some data from the Paleobiology Database
data(tetrapods)
#Assign midpoint age of fossil occurrence data for reconstruction
tetrapods$age <- (tetrapods$max_ma + tetrapods$min_ma)/2
#Rotate the data
ex4 <- palaeorotate(occdf = tetrapods)
#Calculate uncertainty in palaeocoordinates from models
ex5 <- palaeorotate(occdf = tetrapods,
                    model = c("MERDITH2021",
                              "GOLONKA",
                              "PALEOMAP"),
                    uncertainty = TRUE)
## End(Not run)
Check phylogeny tip names
Description
A function to check the list of tip names in a phylogeny against a vector of taxon names, and if desired, to trim the phylogeny to only include taxon names within the vector.
Usage
phylo_check(tree = NULL, list = NULL, out = "full_table", sort = "presence")
Arguments
tree |
|
list |
|
out |
|
sort |
|
Details
Phylogenies can be read into R from .txt or .tree files containing the Newick formatted tree using ape::read.tree(), and can be saved as files using ape::write.tree(). When out = "tree", tips are trimmed using ape::drop.tip(); if your tree is not ultrametric (i.e. the tip dates are not all the same), we recommend using paleotree::fixRootTime() to readjust your branch lengths following pruning.
Value
If out = "full_table", a dataframe describing whether taxon names are present in the list and/or the tree. If out = "diff_table", a dataframe describing which taxon names are present in the list or the tree, but not both. If out = "counts", a summary table containing the number of taxa in the list but not the tree, in the tree but not the list, and in both. If out = "tree", a phylo object consisting of the input phylogeny trimmed to only include the tips present in the list.
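The comparisons behind these outputs can be sketched with base R set operations. This is a minimal illustration rather than the package's implementation: the tip labels are typed in by hand (in practice they would come from the tip.label element of a phylo object), and the column names present_in_tree and present_in_list are hypothetical.

```r
# Hypothetical tip labels (normally taken from tree$tip.label)
tip_names <- c("Triceratops_horridus", "Triceratops_prorsus",
               "Zuniceratops_christopheri", "Centrosaurus_apertus")
# Taxon names of interest
taxa <- c("Triceratops_horridus", "Zuniceratops_christopheri",
          "Psittacosaurus_major")

# Presence/absence of every name in either source
all_names <- sort(union(tip_names, taxa))
full_table <- data.frame(taxon_name = all_names,
                         present_in_tree = all_names %in% tip_names,
                         present_in_list = all_names %in% taxa)

# Names found in only one of the two sources
in_one_only <- full_table$present_in_tree != full_table$present_in_list
diff_table <- full_table[in_one_only, ]

# Summary counts
counts <- c(only_in_list = length(setdiff(taxa, tip_names)),
            only_in_tree = length(setdiff(tip_names, taxa)),
            in_both = length(intersect(taxa, tip_names)))

# Tips that would need to be dropped to trim the tree to the list
tips_to_drop <- setdiff(tip_names, taxa)
```

With a real tree, a vector like tips_to_drop is the kind of input that ape::drop.tip() expects for its tip argument.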
Developer(s)
Bethany Allen
Reviewer(s)
William Gearty & Pedro Godoy
Examples
# track user par
oldpar <- par(no.readonly = TRUE)
#Read in example tree of ceratopsians from paleotree
library(paleotree)
data(RaiaCopesRule)
#Set smaller margins for plotting
par(mar = rep(0.5, 4))
plot(ceratopsianTreeRaia)
#Specify list of names
dinosaurs <- c("Nasutoceratops_titusi", "Diabloceratops_eatoni",
"Zuniceratops_christopheri", "Psittacosaurus_major",
"Psittacosaurus_sinensis", "Avaceratops_lammersi",
"Xenoceratops_foremostensis", "Leptoceratops_gracilis",
"Triceratops_horridus", "Triceratops_prorsus")
#Table of taxon names in list, tree or both
ex1 <- phylo_check(tree = ceratopsianTreeRaia, list = dinosaurs)
#Counts of taxa in list, tree or both
ex2 <- phylo_check(tree = ceratopsianTreeRaia, list = dinosaurs,
out = "counts")
#Trim tree to tips in the list
my_ceratopsians <- phylo_check(tree = ceratopsianTreeRaia, list = dinosaurs,
out = "tree")
plot(my_ceratopsians)
# reset user par
par(oldpar)
Example dataset: Phanerozoic reefs from the PaleoReefs Database
Description
A dataset of Phanerozoic reef occurrences from the
PaleoReefs Database (PARED).
This example dataset includes a subset of the available data from PARED,
but can be used to demonstrate how the functions in the palaeoverse
package might be applied.
Usage
reefs
Format
A data frame with 4363 rows and 14 variables:
- r_number
Reference number given to the particular fossil reef in PARED
- name
Reference name given to the particular fossil reef in PARED
- formation
The geological formation to which the fossil reef belongs
- system
The stratigraphic system to which the fossil reef belongs
- series
The stratigraphic series to which the fossil reef belongs
- interval
The stratigraphic interval to which the fossil reef belongs
- biota_main
The main biota present within the fossil reef
- biota_sec
The secondary biota present within the fossil reef
- lng
The modern-day longitude of the fossil reef
- lat
The modern-day latitude of the fossil reef
- country
The country or ocean the fossil reef is located in
- authors
The authors of the publication documenting the fossil reef
- title
The title of the publication documenting the fossil reef
- year
The year of the publication documenting the fossil reef
References
Kiessling, W. & Krause, M. C. (2022). PaleoReefs Database (PARED)
(1.0) Data set. doi:10.5281/zenodo.6037852
Source
Compiled by Lewis A. Jones. Downloaded on the 25th July 2022. doi:10.5281/zenodo.6037852
Taxonomic spell check
Description
A function to check for and count potential spelling variations of the same taxon. Spelling variations are checked within alphabetical groups (default), or within higher taxonomic groups if provided.
Usage
tax_check(
taxdf,
name = "genus",
group = NULL,
dis = 0.05,
start = 1,
verbose = TRUE
)
Arguments
taxdf |
|
name |
|
group |
|
dis |
|
start |
|
verbose |
|
Details
When higher taxonomy is provided, but some entries are missing, comparisons will still be made within alphabetical groups of taxa which lack higher taxonomic affiliations. The function also performs a check for non-letter characters which are not expected to be present in correctly-formatted taxon names. This detection may be made available to the user via the verbose argument. Comparisons are performed using the Jaro dissimilarity metric via stringdist::stringdistmatrix().
As all string distance metrics rely on approximate string matching, different metrics can produce different results. This function uses Jaro distance as it was designed with short, typed strings in mind, but good practice should include comparisons using multiple metrics, and ultimately specific taxonomic vetting where possible. A more complete implementation and workflow for cleaning taxonomic occurrence data is available in the fossilbrush R package on CRAN.
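To make the metric concrete, the sketch below is a hand-written base R version of the Jaro dissimilarity (one minus the Jaro similarity) for a single pair of strings. It is an illustration only: the function name jaro_dissimilarity is hypothetical, and the package itself delegates this calculation to stringdist rather than using code like this.

```r
# Jaro dissimilarity between two strings (0 = identical, 1 = nothing shared)
jaro_dissimilarity <- function(a, b) {
  s1 <- strsplit(a, "")[[1]]
  s2 <- strsplit(b, "")[[1]]
  n1 <- length(s1)
  n2 <- length(s2)
  if (n1 == 0 && n2 == 0) return(0)
  if (n1 == 0 || n2 == 0) return(1)
  # Characters only count as matching within this window of positions
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  matched1 <- logical(n1)
  matched2 <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!matched2[j] && s1[i] == s2[j]) {
        matched1[i] <- TRUE
        matched2[j] <- TRUE
        break
      }
    }
  }
  m <- sum(matched1)
  if (m == 0) return(1)
  # Transpositions: matched characters that appear in a different order
  t <- sum(s1[matched1] != s2[matched2]) / 2
  1 - (m / n1 + m / n2 + (m - t) / m) / 3
}

# A likely misspelling falls below the default threshold (dis = 0.05)
d <- jaro_dissimilarity("Tyrannosaurus", "Tyranosaurus")
flagged <- d < 0.05
```

Lower values of dis make the check stricter, so fewer pairs are reported as potential synonyms.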
Value
If verbose = TRUE (default), a list with three elements. The first element in the list (synonyms) is a data.frame with each row reporting a pair of potential synonyms. The first column "group" contains the higher group in which they occur (alphabetical groupings if group is not provided). The second column "greater" contains the more common synonym in each pair. The third column "lesser" contains the less common synonym in each pair. The fourth and fifth columns (count_greater, count_lesser) contain the respective counts of each synonym in a pair. If no matches were found for the filtering arguments, this element is NULL instead. The second element (non_letter_name) is a vector of taxon names which contain non-letter characters, or NULL if none were detected. The third element (non_letter_group) is a vector of taxon groups which contain non-letter characters, or NULL if none were detected. If verbose = FALSE, a data.frame as described above is returned, or NULL if no matches were found.
Reference
van der Loo, M. P. J. (2014). The stringdist package for approximate string matching. The R Journal 6, 111-122.
Developer(s)
Joseph T. Flannery-Sutherland & Lewis A. Jones
Reviewer(s)
Lewis A. Jones, Kilian Eichenseer & Christopher D. Dean
Examples
## Not run:
# load occurrence data
data("tetrapods")
# Check taxon names alphabetically
ex1 <- tax_check(taxdf = tetrapods, name = "genus", dis = 0.1)
# Check taxon names by group
ex2 <- tax_check(taxdf = tetrapods, name = "genus",
group = "family", dis = 0.1)
## End(Not run)
Generate pseudo-occurrences from latitudinal range data
Description
A function to generate pseudo-occurrences for taxa based on latitudinal ranges (e.g. the output of the 'lat' method in tax_range_space). While the resulting pseudo-occurrences should not be treated as equivalent to actual occurrence data (e.g. like that from the Paleobiology Database), such pseudo-occurrences may be useful for performing statistical analyses where the row representing a taxon must be replicated for each latitudinal bin through which the taxon ranges.
Usage
tax_expand_lat(taxdf, bins, max_lat = "max_lat", min_lat = "min_lat")
Arguments
taxdf |
|
bins |
|
max_lat |
|
min_lat |
|
Value
A dataframe where each row represents a latitudinal bin which a taxon ranges through. The columns are identical to those in the user-supplied data with additional columns included to identify bins. Output will be returned in the order of supplied bins.
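The row replication described here can be sketched in base R as follows. This is an assumption-laden illustration, not the package's code: the bins data frame is built by hand with hypothetical columns (bin, min, max), and a taxon is treated as ranging through a bin when the two latitudinal intervals overlap by more than a shared boundary.

```r
# Hypothetical 30-degree latitudinal bins
bins <- data.frame(bin = 1:6,
                   min = seq(-90, 60, by = 30),
                   max = seq(-60, 90, by = 30))

# Latitudinal ranges for three taxa
taxdf <- data.frame(name = c("A", "B", "C"),
                    max_lat = c(60, 20, -10),
                    min_lat = c(20, -40, -60))

# Replicate each taxon's row once per overlapping bin
expand_one <- function(i) {
  hit <- bins$min < taxdf$max_lat[i] & bins$max > taxdf$min_lat[i]
  data.frame(taxdf[rep(i, sum(hit)), ], bins[hit, ], row.names = NULL)
}
expanded <- do.call(rbind, lapply(seq_len(nrow(taxdf)), expand_one))
```

Each row of expanded carries the original taxon columns plus the columns identifying the bin, which is the general shape of output described above.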
Developer(s)
Lewis A. Jones & William Gearty
Reviewer(s)
Christopher D. Dean
Examples
bins <- lat_bins_degrees()
taxdf <- data.frame(name = c("A", "B", "C"),
max_lat = c(60, 20, -10),
min_lat = c(20, -40, -60))
ex <- tax_expand_lat(taxdf = taxdf,
bins = bins,
max_lat = "max_lat",
min_lat = "min_lat")
Generate pseudo-occurrences from temporal range data
Description
A function to generate interval-level pseudo-occurrences for taxa based on temporal ranges (e.g. the output of tax_range_time). While the resulting pseudo-occurrences should not be treated as equivalent to actual occurrence data (e.g. like that from the Paleobiology Database), such pseudo-occurrences may be useful for performing statistical analyses where the row representing a taxon must be replicated for each interval through which the taxon persisted.
Usage
tax_expand_time(
taxdf,
max_ma = "max_ma",
min_ma = "min_ma",
bins = NULL,
scale = "GTS2020",
rank = "stage",
ext_orig = TRUE
)
Arguments
taxdf |
|
max_ma |
|
min_ma |
|
bins |
|
scale |
|
rank |
|
ext_orig |
|
Value
A dataframe where each row represents an interval during which a taxon in the original user-supplied data persisted. The columns are identical to those in the user-supplied data with additional columns included to identify the intervals. If ext_orig is TRUE, two additional columns are added to identify in which intervals taxa originated and went extinct.
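A rough base R sketch of this expansion, including origination and extinction flags, is given below. The bins, the flag column names (orig, ext) and the overlap rule are all assumptions made for illustration; the sketch also keeps only the bin ages rather than the taxon's own range columns, and assumes every taxon overlaps at least one bin.

```r
# Hypothetical time bins (ages in Ma, oldest first)
bins <- data.frame(interval_name = c("b1", "b2", "b3", "b4"),
                   max_ma = c(40, 30, 20, 10),
                   min_ma = c(30, 20, 10, 0))

# Temporal ranges for two taxa
taxdf <- data.frame(name = c("A", "B"),
                    max_ma = c(35, 18),
                    min_ma = c(12, 0))

expand_one <- function(i) {
  # A bin is occupied when it overlaps the taxon's range
  hit <- bins$max_ma > taxdf$min_ma[i] & bins$min_ma < taxdf$max_ma[i]
  out <- data.frame(name = taxdf$name[i], bins[hit, ], row.names = NULL)
  # Flag the oldest occupied bin (origination) and the youngest (extinction)
  out$orig <- out$max_ma == max(out$max_ma)
  out$ext <- out$min_ma == min(out$min_ma)
  out
}
expanded <- do.call(rbind, lapply(seq_len(nrow(taxdf)), expand_one))
```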
Developer(s)
William Gearty & Lewis A. Jones
Reviewer(s)
Lewis A. Jones
Examples
taxdf <- data.frame(name = c("A", "B", "C"),
max_ma = c(150, 60, 30),
min_ma = c(110, 20, 0))
ex <- tax_expand_time(taxdf)
bins <- time_bins(scale = "GTS2012", rank = "stage")
ex2 <- tax_expand_time(taxdf, bins = bins)
Calculate the geographic range of fossil taxa
Description
A function to calculate the geographic range of fossil taxa from occurrence data. The function can calculate geographic range in four ways: convex hull, latitudinal range, maximum Great Circle Distance, and the number of occupied equal-area hexagonal grid cells.
Usage
tax_range_space(
occdf,
name = "genus",
lng = "lng",
lat = "lat",
method = "lat",
spacing = 100,
coords = FALSE
)
Arguments
occdf |
|
name |
|
lng |
|
lat |
|
method |
|
spacing |
|
coords |
|
Details
Four commonly applied approaches (Darroch et al. 2020) are available using the tax_range_space function for calculating ranges:
- Convex hull: the "con" method calculates the geographic range of taxa using a convex hull for each taxon in occdf, and calculates the area of the convex hull (in km²) using geosphere::areaPolygon(). The convex hull method works by creating a polygon that encompasses all occurrence points of the taxon.
- Latitudinal: the "lat" method calculates the palaeolatitudinal range of a taxon. It does so for each taxon in occdf by finding its maximum and minimum latitudinal occurrence (from input lat). The palaeolatitudinal range of each taxon is then the difference between the maximum and minimum latitude.
- Maximum Great Circle Distance: the "gcd" method calculates the maximum Great Circle Distance between occurrences for each taxon in occdf. It does so using geosphere::distHaversine(). This function calculates Great Circle Distance using the Haversine method with the radius of the Earth set to 6378.137 km. Great Circle Distance represents the shortest distance between two points on the surface of a sphere. This is different from Euclidean Distance, which represents the distance between two points on a plane.
- Occupied cells: the "occ" method calculates the number and proportion of occupied equal-area grid cells. It does so using discrete hexagonal grids via the h3jsr package. This package relies on Uber's H3 library, a geospatial indexing system that partitions the world into hexagonal cells. In H3, 16 different resolutions are available. In the implementation of the tax_range_space() function, the resolution is defined by the user-input spacing, which represents the distance between the centroids of adjacent cells. Using this distance, the function identifies which resolution is most similar to the input spacing, and uses this resolution.
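The "lat" and "gcd" calculations can be sketched in base R as shown below. This is a simplified stand-in, not the package's code: the Haversine formula is written out by hand (with the Earth radius of 6378.137 km quoted above) instead of calling geosphere, the occurrence data are invented, and the helper names haversine_km and max_gcd are hypothetical.

```r
# Invented occurrences for two taxa (taxon "B" is a singleton)
occ <- data.frame(genus = c("A", "A", "A", "B"),
                  lng = c(0, 90, 0, 10),
                  lat = c(0, 0, 45, -20))

# Great Circle Distance (km) on a sphere via the Haversine formula
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6378.137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Maximum pairwise distance among a taxon's occurrences (0 for singletons)
max_gcd <- function(lng, lat) {
  if (length(lng) < 2) return(0)
  idx <- combn(length(lng), 2)
  max(haversine_km(lng[idx[1, ]], lat[idx[1, ]],
                   lng[idx[2, ]], lat[idx[2, ]]))
}

by_taxon <- split(occ, occ$genus)
ranges <- data.frame(
  taxon = names(by_taxon),
  range_lat = vapply(by_taxon, function(d) max(d$lat) - min(d$lat), numeric(1)),
  gcd = vapply(by_taxon, function(d) max_gcd(d$lng, d$lat), numeric(1)),
  row.names = NULL
)
```

Singletons return zero for both measures here, matching the convention the function documents for the "con", "lat" and "gcd" methods.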
Value
A dataframe with method-specific columns:
- For the "con" method, a dataframe with each unique taxon (taxon) and taxon ID (taxon_id) by convex hull coordinate (lng & lat) combination, and area (area) in km² is returned.
- For the "lat" method, a dataframe with unique taxa (taxon), taxon ID (taxon_id), maximum latitude of occurrence (max_lat), minimum latitude of occurrence (min_lat), and latitudinal range (range_lat) is returned.
- For the "gcd" method, a dataframe with each unique taxon (taxon) and taxon ID (taxon_id) by coordinate combination (lng & lat) of the two most distant points, and the Great Circle Distance (gcd) between these points in km is returned.
- For the "occ" method, a dataframe with unique taxa (taxon), taxon ID (taxon_id), the number of occupied cells (n_cells), the proportion of occupied cells out of all cells occupied by occurrences (proportional_occ), and the spacing between cells (spacing) in km is returned. Note: the number of occupied cells and proportion of occupied cells is highly dependent on the user-defined spacing.
For the "con", "lat" and "gcd" methods, values of zero indicate that the respective taxon is a singleton (i.e. represented by only one occurrence).
Reference(s)
Darroch, S. A., Casey, M. M., Antell, G. S., Sweeney, A., & Saupe, E. E. (2020). High preservation potential of paleogeographic range size distributions in deep time. The American Naturalist, 196(4), 454-471.
Developer(s)
Lewis A. Jones
Reviewer(s)
Bethany Allen & Christopher D. Dean
Examples
# Grab internal data
occdf <- tetrapods[1:100, ]
# Remove NAs
occdf <- subset(occdf, !is.na(genus))
# Convex hull
ex1 <- tax_range_space(occdf = occdf, name = "genus", method = "con")
# Latitudinal range
ex2 <- tax_range_space(occdf = occdf, name = "genus", method = "lat")
# Great Circle Distance
ex3 <- tax_range_space(occdf = occdf, name = "genus", method = "gcd")
# Occupied grid cells
ex4 <- tax_range_space(occdf = occdf, name = "genus",
method = "occ", spacing = 500)
# Convex hull with coordinates
ex5 <- tax_range_space(occdf = occdf, name = "genus", method = "con",
coords = TRUE)
Generate a stratigraphic section plot
Description
A function to plot the stratigraphic ranges of fossil taxa from occurrence data.
Usage
tax_range_strat(
occdf,
name = "genus",
level = "bed",
certainty = NULL,
by = "FAD",
plot_args = NULL,
x_args = NULL,
y_args = NULL
)
Arguments
occdf |
|
name |
|
level |
|
certainty |
|
by |
|
plot_args |
A list of optional arguments that are passed directly to
|
x_args |
A list of optional arguments that are passed directly to
|
y_args |
A list of optional arguments that are passed directly to
|
Details
Note that the default spacing for the x-axis title may cause it to overlap with the x-axis tick labels. To avoid this, you can call graphics::title() after running tax_range_strat() and specify both xlab and line to add the x-axis title farther from the axis (see examples).
The styling of the points and line segments can be adjusted by supplying named arguments to plot_args. col (segment and point color), lwd (segment width), pch (point symbol), bg (background point color for some values of pch), lty (segment line type), and cex (point size) are supported. In the case of a column being supplied to the certainty argument, these arguments may be vectors of length two, in which case the first value of the vector will be used for the "certain" points and segments, and the second value of the vector will be used for the "uncertain" points and segments. If only a single value is supplied, it will be used for both. The default values for these arguments are as follows:
- col = c("black", "black")
- lwd = c(1.5, 1.5)
- pch = c(19, 21)
- bg = c("black", "white")
- lty = c(1, 2)
- cex = c(1, 1)
Value
Invisibly returns a data.frame of the calculated taxonomic stratigraphic ranges.
The function is usually used for its side effect, which is to create a plot showing the stratigraphic ranges of taxa in a section, with levels at which the taxon was sampled indicated with a point.
Developer(s)
Bethany Allen, William Gearty & Alexander Dunhill
Reviewer(s)
William Gearty & Lewis A. Jones
Examples
# Load tetrapod dataset
data(tetrapods)
# Sample tetrapod occurrences
tetrapod_names <- tetrapods$accepted_name[1:50]
# Simulate bed numbers
beds_sampled <- sample.int(n = 10, size = 50, replace = TRUE)
# Simulate certainty values
certainty_sampled <- sample(x = 0:1, size = 50, replace = TRUE)
# Combine into data frame
occdf <- data.frame(taxon = tetrapod_names,
bed = beds_sampled,
certainty = certainty_sampled)
# Plot stratigraphic ranges
par(mar = c(12, 5, 2, 2))
tax_range_strat(occdf, name = "taxon")
tax_range_strat(occdf, name = "taxon", certainty = "certainty",
plot_args = list(ylab = "Stratigraphic height (m)"))
# Plot stratigraphic ranges with more labelling
tax_range_strat(occdf, name = "taxon", certainty = "certainty", by = "name",
plot_args = list(main = "Section A",
ylab = "Stratigraphic height (m)"))
eras_custom <- data.frame(name = c("Mesozoic", "Cenozoic"),
max_age = c(0.5, 3.5),
min_age = c(3.5, 10.5),
color = c("#67C5CA", "#F2F91D"))
axis_geo(side = 4, intervals = eras_custom, tick_labels = FALSE)
title(xlab = "Taxon", line = 10.5)
Calculate the temporal range of fossil taxa
Description
A function to calculate the temporal range of fossil taxa from occurrence data.
Usage
tax_range_time(
occdf,
name = "genus",
min_ma = "min_ma",
max_ma = "max_ma",
by = "FAD",
plot = FALSE,
plot_args = NULL,
intervals = "periods"
)
Arguments
occdf |
|
name |
|
min_ma |
|
max_ma |
|
by |
|
plot |
|
plot_args |
|
intervals |
|
Details
The temporal range(s) of taxa are calculated by extracting all unique taxa (name column) from the input occdf, and checking their first and last appearance. The temporal duration of each taxon is also calculated. If the input data columns contain NAs, these must be removed prior to function call. A plot of the temporal range of each taxon is also returned if plot = TRUE. Customisable argument options (i.e. graphics::par()) to pass to plot_args as a list (and their defaults) for plotting include:
xlab = "Time (Ma)"
ylab = "Taxon ID"
col = "black"
bg = "black"
pch = 20
cex = 1
lty = 1
lwd = 1
Note: this function provides output based solely on the user input data. The true duration of a taxon is likely confounded by uncertainty in dating occurrences, and incomplete sampling and preservation.
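The range calculation described above can be approximated in a few lines of base R. The sketch below is illustrative only: the occurrences are invented, the output column names follow those listed under Value, and ordering by first appearance is assumed to correspond to by = "FAD".

```r
# Invented occurrences with maximum and minimum age estimates (Ma)
occ <- data.frame(genus = c("A", "A", "B", "B", "B"),
                  max_ma = c(150, 140, 60, 45, 30),
                  min_ma = c(145, 110, 50, 35, 20))

# First appearance = oldest max_ma; last appearance = youngest min_ma
fad <- tapply(occ$max_ma, occ$genus, max)
lad <- tapply(occ$min_ma, occ$genus, min)
n_occ <- table(occ$genus)

ranges <- data.frame(taxon = names(fad),
                     max_ma = as.numeric(fad),
                     min_ma = as.numeric(lad),
                     range_myr = as.numeric(fad - lad),
                     n_occ = as.numeric(n_occ[names(fad)]))

# Order taxa by first appearance, oldest first
ranges <- ranges[order(ranges$max_ma, decreasing = TRUE), ]
```

Because the result depends entirely on the supplied ages, rows with missing max_ma or min_ma would need to be dropped before running code like this.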
Value
A dataframe containing the following columns is returned: unique taxa (taxon), taxon ID (taxon_id), first appearance of taxon (max_ma), last appearance of taxon (min_ma), duration of temporal range (range_myr), and number of occurrences per taxon (n_occ).
Developer(s)
Lewis A. Jones
Reviewer(s)
Bethany Allen, Christopher D. Dean & Kilian Eichenseer
Examples
# Grab internal data
occdf <- tetrapods
# Remove NAs
occdf <- subset(occdf, !is.na(order) & order != "NO_ORDER_SPECIFIED")
# Temporal range
ex <- tax_range_time(occdf = occdf, name = "order", plot = TRUE)
# Customise appearance
ex <- tax_range_time(occdf = occdf, name = "order", plot = TRUE,
plot_args = list(ylab = "Orders",
pch = 21, col = "black", bg = "blue",
lty = 2),
intervals = list("periods", "eras"))
Filter occurrences to unique taxa
Description
A function to filter a list of taxonomic occurrences to unique taxa of a predefined resolution. Occurrences identified to a coarser taxonomic resolution than the desired level are retained if they belong to a clade which is not otherwise represented in the dataset (see details section for further information). This has previously been described as "cryptic diversity" (e.g. Mannion et al. 2011).
Usage
tax_unique(
occdf = NULL,
binomial = NULL,
species = NULL,
genus = NULL,
...,
name = NULL,
resolution = "species",
append = FALSE
)
Arguments
occdf |
|
binomial |
|
species |
|
genus |
|
... |
|
name |
|
resolution |
|
append |
|
Details
Palaeobiologists usually count unique taxa by retaining only unique occurrences identified to a given taxonomic resolution. However, this function also retains occurrences identified to a coarser taxonomic resolution which are not already represented within the dataset. For example, consider the following set of occurrences:
- Albertosaurus sarcophagus
- Ankylosaurus sp.
- Aves indet.
- Ceratopsidae indet.
- Hadrosauridae indet.
- Ornithomimus sp.
- Tyrannosaurus rex
A filter for species-level identifications would reduce the species richness to two. However, none of these clades are nested within one another, so each of the indeterminately identified occurrences represents at least one species not already represented in the dataset. This function is designed to deal with such taxonomic data, and would retain all seven 'species' in this example.
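The retention rule can be illustrated with a deliberately simplified base R sketch covering only three ranks. It is not the package's algorithm: the column names (binomial, genus, family), the "sp." and "indet." labels and the example rows are assumptions chosen to mirror the description above.

```r
# Invented occurrences; NA marks an identification not made at that rank
occ <- data.frame(
  binomial = c("Tyrannosaurus rex", NA, NA, NA, NA),
  genus = c("Tyrannosaurus", "Tyrannosaurus", "Ankylosaurus", NA, NA),
  family = c("Tyrannosauridae", "Tyrannosauridae", "Ankylosauridae",
             "Hadrosauridae", "Tyrannosauridae")
)

unique_name <- rep(NA_character_, nrow(occ))

# 1. Species-level identifications are always retained
has_sp <- !is.na(occ$binomial)
unique_name[has_sp] <- occ$binomial[has_sp]

# 2. Genus-only identifications are retained if no species of that genus exists
genus_only <- !has_sp & !is.na(occ$genus)
keep_genus <- genus_only & !(occ$genus %in% occ$genus[has_sp])
unique_name[keep_genus] <- paste(occ$genus[keep_genus], "sp.")

# 3. Family-only identifications are retained if the family has no finer record
family_only <- !has_sp & is.na(occ$genus) & !is.na(occ$family)
finer <- has_sp | genus_only
keep_family <- family_only & !(occ$family %in% occ$family[finer])
unique_name[keep_family] <- paste(occ$family[keep_family], "indet.")

unique_taxa <- unique(unique_name[!is.na(unique_name)])
```

In this toy dataset the second and fifth rows receive NA because Tyrannosaurus rex already represents their genus and family.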
Taxonomic information is supplied within a dataframe, in which columns provide identifications at different taxonomic levels. Occurrence data can be filtered to retain either unique species, or unique genera. If a species-level filter is desired, the minimum input requires either (1) binomial, (2) species and genus, or (3) name and genus columns to be entered, as well as at least one column of a higher taxonomic level. In a standard Paleobiology Database occurrence dataframe, species names are only captured in the 'accepted_name' column, so a species-level filter should use 'genus = "genus"' and 'name = "accepted_name"' arguments. If a genus-level filter is desired, the minimum input requires either (1) binomial or (2) genus columns to be entered, as well as at least one column of a higher taxonomic level.
Missing data should be indicated with NAs, although the function can handle common labels such as "NO_FAMILY_SPECIFIED" within Paleobiology Database datasets.
The function matches taxonomic names at face value, so homonyms may be falsely filtered out.
Value
A dataframe of taxa, with each row corresponding to a unique "species" or "genus" in the dataset (depending on the chosen resolution). The dataframe will include the taxonomic information provided into the function, as well as a column providing the 'unique' names of each taxon. If append is TRUE, the original dataframe (occdf) will be returned with these 'unique' names appended as a new column. Occurrences that are identified to a coarse taxonomic resolution and belong to a clade which is already represented within the dataset will have their 'unique' names listed as NA.
References
Mannion, P. D., Upchurch, P., Carrano, M. T., and Barrett, P. M. (2011). Testing the effect of the rock record on diversity: a multidisciplinary approach to elucidating the generic richness of sauropodomorph dinosaurs through time. Biological Reviews, 86, 157-181. doi:10.1111/j.1469-185X.2010.00139.x.
Developer(s)
Bethany Allen & William Gearty
Reviewer(s)
Lewis A. Jones & William Gearty
Examples
#Retain unique species
occdf <- tetrapods[1:100, ]
species <- tax_unique(occdf = occdf, genus = "genus", family = "family",
order = "order", class = "class", name = "accepted_name")
#Retain unique genera
genera <- tax_unique(occdf = occdf, genus = "genus", family = "family",
order = "order", class = "class", resolution = "genus")
#Append unique names to the original occurrences
genera_append <- tax_unique(occdf = occdf, genus = "genus", family = "family",
order = "order", class = "class", resolution = "genus", append = TRUE)
#Create dataframe from lists
occdf2 <- data.frame(species = c("rex", "aegyptiacus", NA), genus =
c("Tyrannosaurus", "Spinosaurus", NA), family = c("Tyrannosauridae",
"Spinosauridae", "Diplodocidae"))
dinosaur_species <- tax_unique(occdf = occdf2, species = "species", genus =
"genus", family = "family")
#Retain unique genera per collection with group_apply
genera <- group_apply(occdf = occdf,
group = c("collection_no"),
fun = tax_unique,
genus = "genus",
family = "family",
order = "order",
class = "class",
resolution = "genus")
Example dataset: Early tetrapod data from the Paleobiology Database
Description
A dataset of tetrapod occurrences ranging from the Carboniferous through to the Early Triassic, from the Paleobiology Database. The dataset includes a range of variables relevant to common palaeobiological analyses, relating to identification, geography, environmental context, traits and more. Additional information can be found here. The downloaded data is unaltered, with the exception of removing some superfluous variables, and can be used to demonstrate how the functions in the palaeoverse package might be applied.
Usage
tetrapods
Format
A data frame with 5270 rows and 32 variables:
- occurrence_no
Reference number given to the particular occurrence in the Paleobiology Database
- collection_no
Reference number given to the Paleobiology Database collection (locality) that the occurrence belongs to
- identified_name
Taxon name as it appears in the original publication, which may include expressions of uncertainty (e.g. "cf.", "aff.", "?") or novelty (e.g. "n. gen.", "n. sp.")
- identified_rank
The taxonomic rank, or resolution, of the identified name
- accepted_name
Taxon name once the identified name has passed through the Paleobiology Database's internal taxonomy, which collapses synonyms, amends binomials which have been altered (e.g. species moving to another genus) and updates taxa which are no longer valid (e.g. nomina dubia)
- accepted_rank
The taxonomic rank, or resolution, of the accepted name
- early_interval
The oldest (or only) time interval within which the occurrence is thought to have been deposited
- late_interval
The youngest time interval within which the occurrence is thought to have been deposited
- max_ma, min_ma
The age range given to the occurrence
- phylum, class, order, family, genus
The taxa (of decreasing taxonomic level) which the occurrence is identified as belonging to
- abund_value, abund_unit
The number (and units) of fossils attributed to the occurrence
- lng, lat
The modern-day longitude and latitude of the fossil locality
- collection_name
The name of the Paleobiology Database collection which the occurrence belongs to, typically a spatio-temporally restricted locality
- cc
The country (code) where the fossils were discovered
- formation, stratgroup, member
The geological units from which the fossils were collected
- zone
The biozone which the occurrence is attributed to
- lithology1
The main lithology of the beds in the section where the fossils were collected
- environment
The inferred environmental conditions in the place of deposition
- pres_mode
The mode of preservation of the fossils found in the collection (not necessarily of that specific occurrence), which will include information on whether they are body or trace fossils
- taxon_environment
The environment within which the taxon is thought to have lived, collated within the Paleobiology Database
- motility, life_habit, diet
Various types of trait data for the taxon, collated within the Paleobiology Database
References
Uhen MD et al. (2023). Paleobiology Database User Guide Version 1.0. PaleoBios, 40 (11). doi:10.5070/P9401160531.
Source
Compiled by Bethany Allen, current version downloaded on 14th July 2022. See item descriptions for details.
Generate time bins
Description
A function to generate time bins for a given study interval and geological timescale. This function is flexible in that either stage-level or higher stratigraphic-level (e.g. period) time bins can be called, valid timescales from Macrostrat can be used, or a data.frame of a geological timescale can be provided. In addition, near equal-length time bins can be generated by grouping intervals together. For example, for a target bin size of 10 Myr, the function will generate groups of bins that have a mean bin length close to 10 Myr. However, users may also want to consider grouping stages based on other reasoning e.g. availability of outcrop (see Dean et al. 2020).
Usage
time_bins(
interval = "Phanerozoic",
rank = "stage",
size = NULL,
assign = NULL,
scale = "GTS2020",
plot = FALSE
)
Arguments
interval |
|
rank |
|
size |
|
assign |
|
scale |
|
plot |
|
Details
This function uses either the Geological Time Scale 2020, Geological Time Scale 2012, a valid timescale from Macrostrat, or a user-input data.frame (see scale argument) to generate time bins.
Additional information on included Geological Time Scales and source can
be accessed via:
Available interval names are accessible via the interval_name column in GTS2012 and GTS2020. Data of the Geological Timescale 2020 and 2012 were compiled by Lewis A. Jones (2022-07-02).
Value
A data.frame of time bins for the specified intervals or a list with a data.frame of time bins and a named numeric vector (bin number) of binned age estimates (midpoint of specified bins) if assign is specified. By default, the time bins data.frame contains the following columns: bin, interval_name, rank, max_ma, mid_ma, min_ma, duration_myr, abbr (interval abbreviation), colour and font (colour). If size is specified, the time bins data.frame contains the following columns: bin, max_ma, mid_ma, min_ma, duration_myr, grouping_rank, intervals, colour and font.
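The assignment output can be pictured with a small base R sketch: each age estimate is matched to the bin whose limits contain it, and the result is a numeric vector of bin midpoints named by bin number. The bins below are invented, and the handling of ages that fall exactly on a boundary (first matching bin wins) is an assumption, not documented behaviour.

```r
# Hypothetical time bins (ages in Ma)
bins <- data.frame(bin = 1:3,
                   max_ma = c(30, 20, 10),
                   min_ma = c(20, 10, 0))
bins$mid_ma <- (bins$max_ma + bins$min_ma) / 2

# Age estimates to assign
ages <- c(25, 4, 12)

# Index of the first bin containing each age (NA if none does)
idx <- vapply(ages, function(a) {
  which(a <= bins$max_ma & a >= bins$min_ma)[1]
}, integer(1))

# Midpoint ages named by bin number
assigned <- setNames(bins$mid_ma[idx], bins$bin[idx])
```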
References
Dean, C.D., Chiarenza, A.A. and Maidment, S.C., 2020. Formation binning: a new method for increased temporal resolution in regional studies, applied to the Late Cretaceous dinosaur fossil record of North America. Palaeontology, 63(6), 881-901. doi:10.1111/pala.12492.
Developer(s)
Lewis A. Jones
Reviewer(s)
Kilian Eichenseer & William Gearty
Examples
#Using numeric age
ex1 <- time_bins(interval = 10, plot = TRUE)
#Using numeric age range
ex2 <- time_bins(interval = c(50, 100), plot = TRUE)
#Using a single interval name
ex3 <- time_bins(interval = c("Maastrichtian"), plot = TRUE)
#Using a range of intervals and near-equal duration bins
ex4 <- time_bins(interval = c("Fortunian", "Meghalayan"),
size = 10, plot = TRUE)
#Assign bins based on given age estimates
ex5 <- time_bins(interval = c("Fortunian", "Meghalayan"),
assign = c(232, 167, 33))
#Use user-input data.frame to generate near-equal length bins
scale <- data.frame(interval_name = 1:5,
min_ma = c(0, 18, 32, 38, 45),
max_ma = c(18, 32, 38, 45, 53))
ex6 <- time_bins(scale = scale, size = 20, plot = TRUE)
#Use North American land mammal ages from Macrostrat
ex7 <- time_bins(scale = "North American land mammal ages", size = 10)