Type: | Package |
Title: | Client for Delphi's 'COVIDcast Epidata' API |
Version: | 0.5.3 |
Date: | 2025-07-01 |
URL: | https://cmu-delphi.github.io/covidcast/covidcastR/, https://github.com/cmu-delphi/covidcast |
BugReports: | https://github.com/cmu-delphi/covidcast/issues |
Description: | Tools for Delphi's 'COVIDcast Epidata' API: data access, maps and time series plotting, and basic signal processing. The API includes a collection of numerous indicators relevant to the COVID-19 pandemic in the United States, including official reports, de-identified aggregated medical claims data, large-scale surveys of symptoms and public behavior, and mobility data, typically updated daily and at the county level. All data sources are documented at https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html. |
Depends: | R (≥ 4.0.0) |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | dplyr, ggplot2, grDevices, httr, MMWRweek, purrr, rlang, sf, tidyr, xml2 |
RoxygenNote: | 7.3.2 |
Suggests: | gridExtra, httptest, knitr, mockery, rmarkdown, testthat, tibble, vdiffr |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-07-01 18:44:31 UTC; alexreinhart |
Author: | Taylor Arnold [aut],
Jacob Bien [aut],
Logan Brooks [aut],
Sarah Colquhoun [aut],
David Farrow [aut],
Jed Grabman [ctb],
Pedrito Maynard-Zhang [ctb],
Kathryn Mazaitis [aut],
Alex Reinhart |
Maintainer: | Alex Reinhart <areinhar@stat.cmu.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-01 19:10:02 UTC |
covidcast: Client for Delphi's COVIDcast API
Description
The covidcast package provides access to numerous COVID-19 data streams, updated daily, covering the United States of America. These include publicly reported cases and deaths data, along with data the Delphi research group collects or obtains from partners.
Finding data sources and documentation
The COVIDcast API includes:
publicly reported COVID case and death data
insurance claims data reporting on COVID-related doctor's visits and hospitalizations, obtained from health partners
aggregate results from massive COVID symptom surveys conducted by Delphi
mobility data aggregated from SafeGraph
symptom search trends from Google
and numerous other important signals, most available daily at the county level.
Each data stream is identified by its data source and signal names. These are documented on the COVIDcast API website: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html
Each data stream has a page giving detailed technical documentation on how the data is collected, how it is aggregated, and any limitations or known problems with the data.
Getting started
We recommend browsing the vignettes, which include numerous examples:
browseVignettes(package = "covidcast")
.
See also covidcast_signal()
for details on how to obtain COVIDcast data as
a data frame.
Author(s)
Maintainer: Alex Reinhart areinhar@stat.cmu.edu (ORCID)
Authors:
Taylor Arnold
Jacob Bien
Logan Brooks
Sarah Colquhoun
David Farrow
Kathryn Mazaitis
Ryan Tibshirani
Other contributors:
Jed Grabman [contributor]
Pedrito Maynard-Zhang [contributor]
References
A. Reinhart et al., An open repository of real-time COVID-19 indicators. Proc. Natl. Acad. Sci. U.S.A. 118, e2111452118 (2021). doi:10.1073/pnas.2111452118
See Also
Useful links:
Report bugs at https://github.com/cmu-delphi/covidcast/issues
Get FIPS codes from state abbreviations
Description
Look up FIPS codes by state abbreviations (including District of Columbia and
Puerto Rico); this function is based on grep()
, and hence allows for
regular expressions.
Usage
abbr_to_fips(
abbr,
ignore.case = TRUE,
perl = FALSE,
fixed = FALSE,
ties_method = c("first", "all")
)
Arguments
abbr |
Vector of state abbreviations to look up. |
ignore.case , perl , fixed |
Arguments to pass to |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
Value
A vector of FIPS codes if ties_method
equals "first", and a list of
FIPS codes otherwise. These FIPS codes have five digits (ending in "000").
See Also
Examples
abbr_to_fips("PA")
abbr_to_fips(c("PA", "PR", "DC"))
# Note that name_to_fips() works for state names too:
name_to_fips("^Pennsylvania$")
Get state names from state abbreviations
Description
Look up state names by state abbreviations (including District of Columbia
and Puerto Rico); this function is based on grep()
, and hence allows for
regular expressions.
Usage
abbr_to_name(
abbr,
ignore.case = FALSE,
perl = FALSE,
fixed = FALSE,
ties_method = c("first", "all")
)
Arguments
abbr |
Vector of state abbreviations to look up. |
ignore.case , perl , fixed |
Arguments to pass to |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
Value
A vector of state names if ties_method
equals "first", and a list
of state names otherwise.
See Also
Examples
abbr_to_name("PA")
abbr_to_name(c("PA", "PR", "DC"))
Aggregate covidcast_signal
objects into one data frame
Description
Aggregates covidcast_signal
objects into one data frame, in either "wide"
or "long" format. (In "wide" aggregation, only the latest issue from each
data frame is retained, and several columns, including data_source
and
signal
are dropped; see details below). See vignette("multi-signals", package = "covidcast")
for examples.
Usage
aggregate_signals(x, dt = NULL, format = c("wide", "long"))
Arguments
x |
Single |
dt |
Vector of shifts to apply to the values in the data frame |
format |
One of either "wide" or "long". The default is "wide". |
Details
This function can be thought of having three use cases. In all three
cases, the result will be a new data frame in either "wide" or "long"
format, depending on format
.
The first use case is to apply time-shifts to the values in a given
covidcast_signal
object. In this use case, x
is a covidcast_signal
data frame and dt
is a vector of shifts.
The second use case is to bind together, into one data frame, signals that
are returned by covidcast_signals()
. In this use case, x
is a list of
covidcast_signal
data frames, and dt
is NULL
.
The third use case is a combination of the first two: to bind together
signals returned by covidcast_signals()
, and simultaneously, apply
time-shifts to their values. In this use case, x
is a list of
covidcast_signal
data frames, and dt
is either a vector of shifts—to
apply the same shifts for each signal in x
, or a list of vector of
shifts—to apply different shifts for each signal in x
.
Value
Data frame of aggregated signals in "wide" or "long" form, depending
on format
. In "long" form, an extra column dt
is appended to indicate
the value of the time-shift. In "wide" form, only the latest issue of data
is retained; the returned data frame is formed via full joins of the input
data frames (on geo_value
and time_value
as the join key), and the
columns data_source
, signal
, issue
, lag
, stderr
, sample_size
are all dropped from the output. Each unique signal—defined by a
combination of data source name, signal name, and time-shift—is given its
own column, whose name indicates its defining quantities. For example, the
column name "value+2:usa-facts_confirmed_incidence_num" corresponds to a
signal defined by data_source = "usa-facts"
, signal = "confirmed_incidence_num"
, and dt = 2
.
See Also
covidcast_wider()
, covidcast_longer()
Convert data from an external source into a form compatible with
covidcast_signal
.
Description
Several methods are provided to convert common objects (such as data frames)
into covidcast_signal
objects, which can be used with the various
covidcast_signal
methods (such as plot.covidcast_signal()
or
covidcast_cor()
). See vignette("external-data")
for examples.
Usage
as.covidcast_signal(x, ...)
## S3 method for class 'covidcast_signal'
as.covidcast_signal(x, ...)
## S3 method for class 'data.frame'
as.covidcast_signal(
x,
signal = NULL,
geo_type = c("county", "msa", "hrr", "dma", "state", "hhs", "nation"),
time_type = c("day", "week"),
data_source = "user",
issue = NULL,
metadata = list(),
...
)
Arguments
x |
Object to be converted. See Methods section below for details on formatting of each input type. |
... |
Additional arguments passed to methods. |
signal |
The signal name to use for this data. |
geo_type |
The geography type stored in this object. |
time_type |
The time resolution stored in this object. If "day", the default, each observation covers one day. If "week", each time value is assumed to be the start date of the epiweek (MMWR week) that the data represents. |
data_source |
The name of the data source to use as a label for this data. |
issue |
Issue date to use for this data, if not present in |
metadata |
List of metadata to attach to the |
Value
covidcast_signal
object; see covidcast_signal()
for documentation
of fields and structure.
Methods (by class)
-
as.covidcast_signal(covidcast_signal)
: Simply returns thecovidcast_signal
object unchanged. -
as.covidcast_signal(data.frame)
: The input data framex
must contain the columnstime_value
,value
, andgeo_value
. If anissue
column is present inx
, it will be used as the issue date for each observation; if not, theissue
argument will be used. Other columns will be preserved as-is.
See Also
County census population data
Description
Data set on county populations, from the 2019 US Census.
Usage
county_census
Format
A data frame with 3193 rows, one for each county (along with the 50 states and DC). Columns include:
- SUMLEV
Geographic summary level. Either 40 (state) or 50 (county).
- REGION
Census Region code
- DIVISION
Census Division code
- STATE
State FIPS code.
- COUNTY
County FIPS
- STNAME
Name of the state in which this county belongs.
- CTYNAME
County name, to help find counties by name.
- POPESTIMATE2019
Estimate of the county's resident population as of July 1, 2019.
- FIPS
Five-digit county FIPS codes. These are unique identifiers used, for example, as the
geo_values
argument tocovidcast_signal()
to request data from a specific county.
Source
United States Census Bureau, at https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv
References
Census Bureau documentation of all columns and their meaning: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.pdf, https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-puerto-rico-municipios.html, and https://www.census.gov/data/tables/2010/dec/2010-island-areas.html
See Also
county_fips_to_name()
, name_to_fips()
Compute correlations between two covidcast_signal
data frames
Description
Computes correlations between two covidcast_signal
data frames, allowing
for slicing by geo location, or by time. (Only the latest issue from each
data frame is used for correlations.) See the correlations vignette
for examples: vignette("correlation-utils", package = "covidcast")
.
Usage
covidcast_cor(
x,
y,
dt_x = 0,
dt_y = 0,
by = c("geo_value", "time_value"),
use = "na.or.complete",
method = c("pearson", "kendall", "spearman")
)
Arguments
x , y |
The |
dt_x , dt_y |
Time shifts (in days) to consider for |
by |
If "geo_value", then correlations are computed for each geo location, over all time. Each correlation is measured between two time series at the same location. If "time_value", then correlations are computed for each time, over all geo locations. Each correlation is measured between all locations at one time. Default is "geo_value". |
use , method |
Arguments to pass to |
Value
A data frame with first column geo_value
or time_value
(matching
by
), and second column value
, which gives the correlation.
Examples
## Not run:
# For all these examples, let x and y be two signals measured at the county
# level over several months.
## `by = "geo_value"`
# Correlate each county's time series together, returning one correlation per
# county:
covidcast_cor(x, y, by = "geo_value")
# Correlate x in each county with values of y 14 days later
covidcast_cor(x, y, dt_y = 14, by = "geo_value")
# Equivalently, x can be shifted -14 days:
covidcast_cor(x, y, dt_x = -14, by = "geo_value")
## `by = "time_value"`
# For each date, correlate x's values in every county against y's values in
# the same counties. Returns one correlation per date:
covidcast_cor(x, y, by = "time_value")
# Correlate x values across counties against y values 7 days later
covidcast_cor(x, y, dt_y = 7, by = "time_value")
## End(Not run)
Pivot aggregated signals between "wide" and "long" formats
Description
These functions take signals returned from aggregate_signals()
and convert
between formats. covidcast_longer()
takes the output of
aggregate_signals(..., format = "wide")
and converts it to "long" format,
while covidcast_wider()
takes the output of aggregate_signals(..., format = "long")
and converts it to "wide" format.
Usage
covidcast_longer(x)
covidcast_wider(x)
Arguments
x |
A |
Value
The object pivoted into the opposite form, i.e. as if
aggregate_signals()
had been called in the first place with that
format
argument.
See Also
Obtain COVIDcast metadata
Description
Obtains a data frame of metadata describing all publicly available data streams from the COVIDcast API.
Usage
covidcast_meta()
Value
Data frame containing one row per signal, with the following columns:
data_source |
Data source name. |
signal |
Signal name. |
min_time |
First day for which this signal is available. |
max_time |
Most recent day for which this signal is available. |
geo_type |
Geographic level for which this signal is available, such as county, state, msa, or hrr. Most signals are available at multiple geographic levels and will hence be listed in multiple rows with their own metadata. |
time_type |
Temporal resolution at which this signal is reported. "day", for example, means the signal is reported daily. |
num_locations |
Number of distinct geographic locations available for
this signal. For example, if |
min_value |
Smallest value that has ever been reported. |
max_value |
Largest value that has ever been reported. |
mean_value |
Arithmetic mean of all reported values. |
stdev_value |
Sample standard deviation of all reported values. |
max_issue |
Most recent issue date for this signal. |
min_lag |
Smallest lag from observation to issue, in |
max_lag |
Largest lag from observation to issue, in |
References
COVIDcast API sources and signals documentation: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html
See Also
Obtain a data frame for one COVIDcast signal
Description
Obtains data for selected date ranges for all geographic regions of the
United States. Available data sources and signals are documented in the
COVIDcast signal documentation.
Most (but not all) data sources are available at the county level, but the
API can also return data aggregated to metropolitan statistical areas,
hospital referral regions, or states, as desired, by using the geo_type
argument.
Usage
covidcast_signal(
data_source,
signal,
start_day = NULL,
end_day = NULL,
geo_type = c("county", "hrr", "msa", "dma", "state", "hhs", "nation"),
geo_values = "*",
as_of = NULL,
issues = NULL,
lag = NULL,
time_type = c("day", "week")
)
Arguments
data_source |
String identifying the data source to query. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available data sources. |
signal |
String identifying the signal from that source to query. Again, see https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available signals. |
start_day |
Query data beginning on this date. Date object, or string in
the form "YYYY-MM-DD". If |
end_day |
Query data up to this date, inclusive. Date object or string
in the form "YYYY-MM-DD". If |
geo_type |
The geography type for which to request this data, such as "county" or "state". Defaults to "county". See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on which types are available. |
geo_values |
Which geographies to return. The default, "*", fetches all geographies. To fetch specific geographies, specify their IDs as a vector or list of strings. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on how to specify these IDs. |
as_of |
Fetch only data that was available on or before this date,
provided as a |
issues |
Fetch only data that was published or updated ("issued") on
these dates. Provided as either a single |
lag |
Integer. If, for example, |
time_type |
The temporal resolution to request this data. Most signals are available at the "day" resolution (the default); some are only available at the "week" resolution, representing an MMWR week ("epiweek"). |
Details
For data on counties, metropolitan statistical areas, and states, this
package provides the county_census
, msa_census
, and state_census
datasets. These include each area's unique identifier, used in the
geo_values
argument to select specific areas, and basic information on
population and other Census data.
Downloading large amounts of data may be slow, so this function prints
messages for each chunk of data it downloads. To suppress these, use
base::suppressMessages()
, as in
suppressMessages(covidcast_signal("fb-survey", ...))
.
Value
covidcast_signal
object with matching data. The object is a data
frame with additional metadata attached. Each row is one observation of one
signal on one day in one geographic location. Contains the following
columns:
data_source |
Data source from which this observation was obtained. |
signal |
Signal from which this observation was obtained. |
geo_value |
String identifying the location, such as a state name or county FIPS code. |
time_value |
Date object identifying the date of this observation. For
data with |
issue |
Date object identifying the date this estimate was issued.
For example, an estimate with a |
lag |
Integer giving the difference between |
value |
Signal value being requested. For example, in a query for the
"confirmed_cumulative_num" signal from the "usa-facts" source, this would
be the cumulative number of confirmed cases in the area, as of the given
|
stderr |
Associated standard error of the signal value, if available. |
sample_size |
Integer indicating the sample size available in that
geography on that day; sample size may not be available for all signals,
due to privacy or other constraints, in which case it will be |
Consult the signal documentation for more details on how values and standard errors are calculated for specific signals.
The returned data frame has a metadata
attribute containing metadata
about the signal contained within; see "Metadata" below for details.
Metadata
The returned object has a metadata
attribute attached containing basic
information about the signal. Use attributes(x)$metadata
to access this
metadata. The metadata is stored as a data frame of one row, and contains the
same information that covidcast_meta()
would return for a given signal.
Note that not all covidcast_signal
objects may have all fields of metadata
attached; for example, an object created with as.covidcast_signal()
using
data from another source may only contain the geo_type
variable, along with
data_source
and signal
. Before using the metadata of a covidcast_signal
object, always check for the presence of the attributes you need.
Issue dates and revisions
The COVIDcast API tracks updates and changes to its underlying data, and
records the first date each observation became available. For example, a data
source may report its estimate for a specific state on June 3rd on June 5th,
once records become available. This data is considered "issued" on June 5th.
Later, the data source may update its estimate for June 3rd based on revised
data, creating a new issue on June 8th. By default, covidcast_signal()
returns the most recent issue available for every observation. The as_of
,
issues
, and lag
parameters allow the user to select specific issues
instead, or to see all updates to observations. These options are mutually
exclusive, and you should only specify one; if you specify more than one, you
may get an error or confusing results.
Note that the API only tracks the initial value of an estimate and changes
to that value. If a value was first issued on June 5th and never updated,
asking for data issued on June 6th (using issues
or lag
) would not
return that value, though asking for data as_of
June 6th would. See
vignette("covidcast")
for examples.
Note also that the API enforces a maximum result row limit; results beyond
the maximum limit are truncated. This limit is sufficient to fetch
observations in all counties in the United States on one day. This client
automatically splits queries for multiple days across multiple API calls.
However, if data for one day has been issued many times, using the issues
argument may return more results than the query limit. A warning will be
issued in this case. To see all results, split your query across multiple
calls with different issues
arguments.
API keys
By default, covidcast_signal()
submits queries to the API anonymously. All
the examples in the package documentation are compatible with anonymous use
of the API, but there are some limits on anonymous queries,
including a rate limit. If you regularly query large amounts of data, please
consider registering for a free API key, which lifts
these limits. Even if your usage falls within the anonymous usage limits,
registration helps us understand who and how others are using the Delphi
Epidata API, which may in turn inform future research, data partnerships, and
funding.
If you have an API key, you can use it by setting the covidcast.auth
option once before calling covidcast_signal()
or covidcast_signals()
:
options(covidcast.auth = "your_api_key") cli <- covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli", start_day = "2020-05-01", end_day = "2020-05-07", geo_type = "state")
References
COVIDcast API documentation: https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
Documentation of all COVIDcast sources and signals: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html
COVIDcast public dashboard: https://delphi.cmu.edu/covidcast/
See Also
plot.covidcast_signal()
, covidcast_signals()
,
as.covidcast_signal()
, county_census
, msa_census
,
state_census
Examples
## Not run:
## Fetch all counties from 2020-05-10 to the most recent available data
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10")
## Fetch all counties on just 2020-05-10 and no other days
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10",
end_day = "2020-05-10")
## Fetch all states on 2020-05-10, 2020-05-11, 2020-05-12
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10",
end_day = "2020-05-12", geo_type = "state")
## Fetch all available data for just Pennsylvania and New Jersey
covidcast_signal("fb-survey", "smoothed_cli", geo_type = "state",
geo_values = c("pa", "nj"))
## Fetch all available data in the Pittsburgh metropolitan area
covidcast_signal("fb-survey", "smoothed_cli", geo_type = "msa",
geo_values = name_to_cbsa("Pittsburgh"))
## End(Not run)
Obtain multiple COVIDcast signals at once
Description
This convenience function uses covidcast_signal()
to obtain multiple
signals, potentially from multiple data sources.
Usage
covidcast_signals(
data_source,
signal,
start_day = NULL,
end_day = NULL,
geo_type = c("county", "hrr", "msa", "dma", "state", "hhs", "nation"),
time_type = c("day", "week"),
geo_values = "*",
as_of = NULL,
issues = NULL,
lag = NULL
)
Arguments
data_source |
String identifying the data source to query. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available data sources. |
signal |
String identifying the signal from that source to query. Again, see https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available signals. |
start_day |
Query data beginning on this date. Date object, or string in
the form "YYYY-MM-DD". If |
end_day |
Query data up to this date, inclusive. Date object or string
in the form "YYYY-MM-DD". If |
geo_type |
The geography type for which to request this data, such as "county" or "state". Defaults to "county". See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on which types are available. |
time_type |
The temporal resolution to request this data. Most signals are available at the "day" resolution (the default); some are only available at the "week" resolution, representing an MMWR week ("epiweek"). |
geo_values |
Which geographies to return. The default, "*", fetches all geographies. To fetch specific geographies, specify their IDs as a vector or list of strings. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on how to specify these IDs. |
as_of |
Fetch only data that was available on or before this date,
provided as a |
issues |
Fetch only data that was published or updated ("issued") on
these dates. Provided as either a single |
lag |
Integer. If, for example, |
Details
The argument structure is just as in covidcast_signal()
, except
the first four arguments data_source
, signal
, start_day
, end_day
are permitted to be vectors. The first two arguments data_source
and
signal
are recycled appropriately in the calls to covidcast_signal()
;
see example below. The next two arguments start_day
, end_day
, unless
NULL
, must be either length 1 or N.
See vignette("multi-signals")
for additional examples.
Value
A list of covidcast_signal
data frames, of length N = max(length(data_source), length(signal))
. This list can be aggregated
into a single data frame of either "wide" or "long" format using
aggregate_signals()
.
See Also
covidcast_signal()
, aggregate_signals()
Examples
## Not run:
## Fetch USAFacts confirmed cases and deaths over the same time period
covidcast_signals("usa-facts", signal = c("confirmed_incidence_num",
"deaths_incidence_num"),
start_day = "2020-08-15", end_day = "2020-10-01")
## End(Not run)
Get state abbreviations from FIPS codes
Description
Look up state abbreviations by FIPS codes (including District of Columbia and Puerto Rico). Will match the first two digits of the input codes, so should work for 5-digit county codes, or even longer tract and census block FIPS codes.
Usage
fips_to_abbr(code)
Arguments
code |
Vector of FIPS codes to look up; will match the first two digits of the code. Note that these are treated as strings; the number 1 will not match "01". |
Value
A vector of state abbreviations.
See Also
Examples
fips_to_abbr("42000")
fips_to_abbr(c("42", "72", "11"))
Fetch the latest or earliest issue for each observation
Description
The data returned from covidcast_signal()
or covidcast_signals()
can, if
called with the issues
argument, contain multiple issues for a single
observation in a single location. These functions filter the data frame to
contain only the earliest issue or only the latest issue.
Usage
latest_issue(df)
earliest_issue(df)
Arguments
df |
A |
Value
A data frame in the same form, but with only the earliest or latest issue of every observation. Note that these functions sort the data frame as part of their filtering, so the output data frame rows may be in a different order.
Metro area population data
Description
Data set on metropolitan area populations, from the 2019 US Census. This includes metropolitan and micropolitan statistical areas, although the COVIDcast API only supports fetching data from metropolitan statistical areas.
Usage
msa_census
Format
A data frame with 2797 rows, each representing one core-based statistical area (including metropolitan and micropolitan statistical areas, county or county equivalents, and metropolitan divisions). There are many columns. The most crucial are:
- CBSA
Core Based Statistical Area code. These are unique identifiers used, for example, as the
geo_values
argument tocovidcast_signal()
when requesting data from specific metro areas (withgeo_type = 'msa'
).- MDIV
Metropolitan Division code
- STCOU
State and county code
- NAME
Name or title of the area.
- LSAD
Legal/Statistical Area Description, identifying if this is a metropolitan or micropolitan area, a metropolitan division, or a county.
- STATE
State FIPS code.
- POPESTIMATE2019
Estimate of the area's resident population as of July 1, 2019.
Source
United States Census Bureau, at https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.csv
References
Census Bureau documentation of all columns and their meaning: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.pdf
See Also
cbsa_to_name()
, name_to_cbsa()
Get state abbreviations from state names
Description
Look up state abbreviations by state names (including District of Columbia
and Puerto Rico); this function is based on grep()
, and hence allows for
regular expressions.
Usage
name_to_abbr(
name,
ignore.case = FALSE,
perl = FALSE,
fixed = FALSE,
ties_method = c("first", "all")
)
Arguments
name |
Vector of state names to look up. |
ignore.case , perl , fixed |
Arguments to pass to |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
Value
A vector of state abbreviations if ties_method
equals "first", and
a list of state abbreviations otherwise.
See Also
Examples
name_to_abbr("Penn")
name_to_abbr(c("Penn", "New"), ties_method = "all")
Get FIPS or CBSA codes from county or metropolitan area names
Description
Look up FIPS or CBSA codes by county or metropolitan area names,
respectively; these functions are based on grep()
, and hence allow for
regular expressions.
Usage
name_to_fips(
name,
ignore.case = FALSE,
perl = FALSE,
fixed = FALSE,
ties_method = c("first", "all"),
state = NULL
)
name_to_cbsa(
name,
ignore.case = FALSE,
perl = FALSE,
fixed = FALSE,
ties_method = c("first", "all"),
state = NULL
)
Arguments
name |
Vector of county or metropolitan area names to look up. |
ignore.case , perl , fixed |
Arguments to pass to |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
state |
Two letter state abbreviation (case insensitive) indicating a
parent state used to restrict the search. For example, when |
Value
A vector of FIPS or CBSA codes if ties_method
equals "first", and a
list of FIPS or CBSA codes otherwise.
See Also
state_fips_to_name()
, cbsa_to_name()
Examples
name_to_fips("Allegheny")
name_to_cbsa("Pittsburgh")
name_to_fips("Miami")
name_to_fips("Miami", ties_method = "all")
name_to_fips(c("Allegheny", "Miami", "New "), ties_method = "all")
Plot covidcast_signal
object as choropleths, bubbles, or time series
Description
Several plot types are provided, including choropleth plots (maps), bubble
plots, and time series plots showing the change of signals over time, for a
data frame returned by covidcast_signal()
. (Only the latest issue from the
data frame is used for plotting.) See vignette("plotting-signals", package = "covidcast")
for examples.
Usage
## S3 method for class 'covidcast_signal'
plot(
x,
plot_type = c("choro", "bubble", "line"),
time_value = NULL,
include = c(),
range = NULL,
choro_col = c("#FFFFCC", "#FD893C", "#800026"),
alpha = 0.5,
bubble_col = "purple",
num_bins = 8,
title = NULL,
choro_params = list(),
bubble_params = list(),
line_params = list(),
...
)
Arguments
x |
The |
plot_type |
One of "choro", "bubble", "line" indicating whether to plot a choropleth map, bubble map, or line (time series) graph, respectively. The default is "choro". |
time_value |
Date object (or string in the form "YYYY-MM-DD") specifying
the day to map, for choropleth and bubble maps. If |
include |
Vector of state abbreviations (case insensitive, so "pa" and
"PA" both denote Pennsylvania) indicating which states to include in the
choropleth and bubble maps. Default is |
range |
Vector of two values: min and max, in this order, to use when
defining the color scale for choropleth maps and the size scale for bubble
maps, or the range of the y-axis for the time series plot. If |
choro_col |
Vector of colors, as specified in hex code, to use for the choropleth color scale. Can be arbitrary in length. Default is similar to that from https://delphi.cmu.edu/covidcast/. |
alpha |
Number between 0 and 1, indicating the transparency level to be used in the maps. For choropleth maps, this determines the transparency level for the mega counties. For bubble maps, this determines the transparency level for the bubbles. Default is 0.5. |
bubble_col |
Bubble color for the bubble map. Default is "purple". |
num_bins |
Number of bins for determining the bubble sizes for the
bubble map (here and throughout, to be precise, by bubble size we mean
bubble area). Default is 8. These bins are evenly-spaced in between the min
and max as specified through the |
title |
Title for the plot. If |
choro_params , bubble_params , line_params |
Additional parameter lists for the different plot types, for further customization. See details below. |
... |
Additional arguments, for compatibility with |
Details
The following named arguments are supported through the lists
choro_params
, bubble_params
, and line_params
.
For both choropleth and bubble maps:
subtitle
Subtitle for the map.
missing_col
Color assigned to missing or NA geo locations.
border_col
Border color for geo locations.
border_size
Border size for geo locations.
legend_position
Position for legend; use "none" to hide legend.
legend_height
,legend_width
Height and width of the legend.
breaks
Breaks for a custom (discrete) color or size scale. Note that we must set
breaks
to be a vector of the same length aschoro_col
for choropleth maps. This works as follows: we assign thei
th color for choropleth maps, or thei
th size for bubble maps, if and only if the given value satisfiesbreaks[i] <= value < breaks[i+1]
, where we take by conventionbreaks[0] = -Inf
andbreaks[N+1] = Inf
forN = length(breaks)
.legend_digits
Number of decimal places to show for the legend labels.
For choropleth maps only:
legend_n
Number of values to label on the legend color bar. Ignored for discrete color scales (when
breaks
is set manually).
For bubble maps only:
remove_zero
Should zeros be excluded from the size scale (hence effectively drawn as bubbles of zero size)?
min_size
,max_size
Min size for the size scale.
For line graphs:
xlab
,ylab
Labels for the x-axis and y-axis.
stderr_bands
Should standard error bands be drawn?
stderr_alpha
Transparency level for the standard error bands.
Value
A ggplot
object that can be customized and styled using standard
ggplot2 functions.
Print covidcast_meta
object
Description
Prints a brief summary of the metadata, and then prints the underlying data
frame, for an object returned by covidcast_meta()
.
Usage
## S3 method for class 'covidcast_meta'
print(x, ...)
Arguments
x |
The |
... |
Additional arguments passed to |
Value
The covidcast_meta
object, unchanged.
Print covidcast_signal
object
Description
Prints a brief summary of the data source, signal, and geographic level, and
then prints the underlying data frame, for an object returned by
covidcast_signal()
.
Usage
## S3 method for class 'covidcast_signal'
print(x, ...)
Arguments
x |
The |
... |
Additional arguments passed to |
Value
The covidcast_signal
object, unchanged.
State population data
Description
Data set on state populations, from the 2019 US Census.
Usage
state_census
Format
Data frame with 53 rows (including one for the United States as a whole, plus the District of Columbia and the Puerto Rico Commonwealth). Important columns:
- SUMLEV
Geographic summary level.
- REGION
Census Region code
- DIVISION
Census Division code
- STATE
State FIPS code
- NAME
Name of the state
- POPESTIMATE2019
Estimate of the state's resident population in 2019.
- POPEST18PLUS2019
Estimate of the state's resident population in 2019 that is over 18 years old.
- PCNT_POPEST18PLUS
Estimate of the percent of a state's resident population in 2019 that is over 18.
- ABBR
Postal abbreviation of the state
Source
United States Census Bureau, at https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.pdf, https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-puerto-rico-municipios.html, and https://www.census.gov/data/tables/2010/dec/2010-island-areas.html
See Also
abbr_to_name()
, name_to_abbr()
, abbr_to_fips()
, fips_to_abbr()
Get state, county or metropolitan area names from FIPS or CBSA codes
Description
Look up county or metropolitan area names by FIPS or CBSA codes. Looking up FIPS code is done with the first 2 numbers (state) or 5 numbers (county) and therefore can be called with longer FIPS codes.
Usage
state_fips_to_name(code)
county_fips_to_name(code)
cbsa_to_name(code)
Arguments
code |
Vector of FIPS or CBSA codes to look up. |
Value
A vector of state, county or metro names.
See Also
name_to_fips()
, name_to_cbsa()
Examples
state_fips_to_name("42")
state_fips_to_name("42003") # same as previous
county_fips_to_name("42003")
county_fips_to_name("42000") # the county "000" returns the state name
cbsa_to_name("38300")
Summarize covidcast_meta
object
Description
Prints a tabular summary of the object returned by covidcast_meta()
,
containing each source and signal and a summary of the geographic levels it
is available at.
Usage
## S3 method for class 'covidcast_meta'
summary(object, ...)
Arguments
object |
The |
... |
Additional arguments, for compatibility with |
Value
A data frame with one row per unique signal in the metadata, with the following columns:
data_source |
Data source name |
signal |
Signal name |
county |
"*" if this signal is available at the county level, |
msa |
|
dma |
|
hrr |
|
state |
|
hhs |
|
nation |
|
Summarize covidcast_signal
object
Description
Prints a variety of summary statistics about the underlying data, such as
median values, the date range included, sample sizes, and so on, for an
object returned by covidcast_signal()
.
Usage
## S3 method for class 'covidcast_signal'
summary(object, ...)
Arguments
object |
The |
... |
Additional arguments, for compatibility with |
Value
No return value; called only to print summary statistics.