Title: | Data on the States and Counties of the United States |
Version: | 0.3.1 |
Description: | Demographic data on the United States at the county and state levels spanning multiple years. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.1 |
URL: | https://github.com/OpenIntroStat/usdata, https://openintrostat.github.io/usdata/ |
BugReports: | https://github.com/OpenIntroStat/usdata/issues |
Suggests: | dplyr, ggplot2, maps, lubridate, sf, testthat |
Imports: | tibble |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2024-06-02 01:19:18 UTC; mine |
Author: | Mine Çetinkaya-Rundel |
Maintainer: | Mine Çetinkaya-Rundel <cetinkaya.mine@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-06-02 09:40:02 UTC |
usdata: Data on the States and Counties of the United States
Description
Demographic data on the United States at the county and state levels spanning multiple years.
Author(s)
Maintainer: Mine Çetinkaya-Rundel cetinkaya.mine@gmail.com (ORCID)
Authors:
David Diez david@openintro.org
Leah Dorazio leah.dorazio@sfuhs.org
See Also
Useful links:
Report bugs at https://github.com/OpenIntroStat/usdata/issues
Convert state abbreviations to names
Description
Two utility functions. One converts state names to the state abbreviations, and the second does the opposite.
Usage
abbr2state(abbr)
Arguments
abbr |
A vector of state abbreviations. |
Value
Returns a vector of the same length with the corresponding state names or abbreviations.
Author(s)
David Diez
See Also
state2abbr, county, county_complete
Examples
abbr2state("MN")
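A lookup of this kind can be sketched with base R's built-in `state.abb` and `state.name` vectors. This is only an illustration of the idea, not the package's implementation: the helper name `abbr_to_name` is hypothetical, and the built-in vectors cover only the 50 states, so the District of Columbia and territories would return `NA` here.

```r
# Hypothetical helper (not part of usdata): look up state names by abbreviation.
abbr_to_name <- function(abbr) {
  # match() returns NA for abbreviations that are not in state.abb,
  # so unknown inputs yield NA rather than an error.
  state.name[match(toupper(abbr), state.abb)]
}

# Mixed-case and unknown inputs are handled element by element.
abbr_to_name(c("MN", "ca", "XX"))
```

The result has the same length as the input, which mirrors the behavior described under Value.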
Airline Delays for December 2019 and 2020.
Description
Summary counts of airline delays per carrier per US city.
Usage
airline_delay
Format
A data frame with 3351 rows and 21 variables.
- year
Year data collected
- month
Numeric representation of the month
- carrier
Carrier.
- carrier_name
Carrier Name.
- airport
Airport code.
- airport_name
Name of airport.
- arr_flights
Number of flights arriving at airport
- arr_del15
Number of flights more than 15 minutes late
- carrier_ct
Number of flights delayed due to air carrier. (e.g. no crew)
- weather_ct
Number of flights delayed due to weather.
- nas_ct
Number of flights delayed due to National Aviation System (e.g. heavy air traffic).
- security_ct
Number of flights delayed due to a security breach.
- late_aircraft_ct
Number of flights delayed as a result of another flight on the same aircraft being delayed.
- arr_cancelled
Number of cancelled flights
- arr_diverted
Number of flights that were diverted
- arr_delay
Total time (minutes) of delayed flights.
- carrier_delay
Total time (minutes) of delay due to air carrier
- weather_delay
Total time (minutes) of delay due to inclement weather.
- nas_delay
Total time (minutes) of delay due to National Aviation System.
- security_delay
Total time (minutes) of delay as a result of a security issue.
- late_aircraft_delay
Total time (minutes) of delay as a result of a previous flight on the same airplane being late.
Source
Bureau of Transportation Statistics
Examples
library(ggplot2)
ggplot(airline_delay, aes(arr_flights, arr_del15, color = as.factor(year))) +
  geom_point(alpha = 0.3) +
  labs(
    x = "Total Number of inbound flights",
    y = "Number of flights delayed by more than 15 mins",
    title = "Inbound vs delayed flights by year",
    color = "Year"
  )
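The count columns above can also be combined into a delay rate. The following is a minimal sketch using a toy data frame in place of `airline_delay`; the values are made up for illustration, and only the column names `carrier`, `arr_flights`, and `arr_del15` are taken from the Format section.

```r
# Toy stand-in for airline_delay; the values are invented for illustration.
toy_delay <- data.frame(
  carrier     = c("AA", "AA", "DL", "DL"),
  arr_flights = c(100, 300, 200, 200),
  arr_del15   = c(20, 40, 10, 30)
)

# Sum arriving and delayed flights per carrier.
# Note: the formula interface of aggregate() drops rows with missing values.
by_carrier <- aggregate(
  cbind(arr_flights, arr_del15) ~ carrier,
  data = toy_delay,
  FUN = sum
)

# Percentage of arriving flights that were more than 15 minutes late.
by_carrier$pct_delayed <- 100 * by_carrier$arr_del15 / by_carrier$arr_flights
by_carrier
```

Summing the counts before dividing weights each airport by its traffic, which is usually preferable to averaging per-airport rates.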
United States Counties
Description
Data for 3142 counties in the United States. See the county_complete data set for additional variables.
Usage
county
Format
A data frame with 3142 observations on the following 15 variables.
- name
County names.
- state
State names.
- pop2000
Population in 2000.
- pop2010
Population in 2010.
- pop2017
Population in 2017.
- pop_change
Population change from 2010 to 2017.
- poverty
Percent of population in poverty in 2017.
- homeownership
Home ownership rate, 2006-2010.
- multi_unit
Percent of housing units in multi-unit structures, 2006-2010.
- unemployment_rate
Unemployment rate in 2017.
- metro
Whether the county contains a metropolitan area.
- median_edu
Median education level (2013-2017).
- per_capita_income
Per capita (per person) income (2013-2017).
- median_hh_income
Median household income.
- smoking_ban
Describes the type of county-level smoking ban in place in 2010, taking one of the values "none", "partial", or "comprehensive".
Source
These data were collected from Census Quick Facts (no longer available as of 2020) and its accompanying pages. Smoking ban data were from a variety of sources.
See Also
Examples
library(ggplot2)
ggplot(county, aes(x = median_edu, y = median_hh_income)) +
  geom_boxplot()
American Community Survey 2019
Description
Data for 3142 counties in the United States with many variables of the 2019 American Community Survey.
Usage
county_2019
Format
A data frame with 3142 observations on the following 95 variables.
- state
State.
- name
County name.
- fips
FIPS code.
- median_individual_income
Median individual income (2019).
- median_individual_income_moe
Margin of error for median_individual_income.
- pop
2019 population.
- pop_moe
Margin of error for pop.
- white
Percent of population that is white alone (2015-2019).
- white_moe
Margin of error for white.
- black
Percent of population that is black alone (2015-2019).
- black_moe
Margin of error for black.
- native
Percent of population that is Native American alone (2015-2019).
- native_moe
Margin of error for native.
- asian
Percent of population that is Asian alone (2015-2019).
- asian_moe
Margin of error for asian.
- pac_isl
Percent of population that is Native Hawaiian or other Pacific Islander alone (2015-2019).
- pac_isl_moe
Margin of error for pac_isl.
- other_single_race
Percent of population that is some other race alone (2015-2019).
- other_single_race_moe
Margin of error for other_single_race.
- two_plus_races
Percent of population that is two or more races (2015-2019).
- two_plus_races_moe
Margin of error for two_plus_races.
- hispanic
Percent of population that identifies as Hispanic or Latino (2015-2019).
- hispanic_moe
Margin of error for hispanic.
- white_not_hispanic
Percent of population that is white alone, not Hispanic or Latino (2015-2019).
- white_not_hispanic_moe
Margin of error for white_not_hispanic.
- median_age
Median age (2015-2019).
- median_age_moe
Margin of error for median_age.
- age_under_5
Percent of population under 5 (2015-2019).
- age_under_5_moe
Margin of error for age_under_5.
- age_over_85
Percent of population 85 and over (2015-2019).
- age_over_85_moe
Margin of error for age_over_85.
- age_over_18
Percent of population 18 and over (2015-2019).
- age_over_18_moe
Margin of error for age_over_18.
- age_over_65
Percent of population 65 and over (2015-2019).
- age_over_65_moe
Margin of error for age_over_65.
- mean_work_travel
Mean travel time to work (2015-2019).
- mean_work_travel_moe
Margin of error for mean_work_travel.
- persons_per_household
Persons per household (2015-2019).
- persons_per_household_moe
Margin of error for persons_per_household.
- avg_family_size
Average family size (2015-2019).
- avg_family_size_moe
Margin of error for avg_family_size.
- housing_one_unit_structures
Percent of housing units in 1-unit structures (2015-2019).
- housing_one_unit_structures_moe
Margin of error for housing_one_unit_structures.
- housing_two_unit_structures
Percent of housing units in multi-unit structures (2015-2019).
- housing_two_unit_structures_moe
Margin of error for housing_two_unit_structures.
- housing_mobile_homes
Percent of housing units in mobile homes and other types of units (2015-2019).
- housing_mobile_homes_moe
Margin of error for housing_mobile_homes.
- median_individual_income_age_25plus
Median individual income (2019 dollars, 2015-2019).
- median_individual_income_age_25plus_moe
Margin of error for median_individual_income_age_25plus.
- hs_grad
Percent of population 25 and older that is a high school graduate (2015-2019).
- hs_grad_moe
Margin of error for hs_grad.
- bachelors
Percent of population 25 and older that earned a Bachelor's degree or higher (2015-2019).
- bachelors_moe
Margin of error for bachelors.
- households
Total households (2015-2019).
- households_moe
Margin of error for households.
- households_speak_spanish
Percent of households speaking Spanish (2015-2019).
- households_speak_spanish_moe
Margin of error for households_speak_spanish.
- households_speak_other_indo_euro_lang
Percent of households speaking other Indo-European language (2015-2019).
- households_speak_other_indo_euro_lang_moe
Margin of error for households_speak_other_indo_euro_lang.
- households_speak_asian_or_pac_isl
Percent of households speaking Asian and Pacific Island language (2015-2019).
- households_speak_asian_or_pac_isl_moe
Margin of error for households_speak_asian_or_pac_isl.
- households_speak_other
Percent of households speaking non European or Asian/Pacific Island language (2015-2019).
- households_speak_other_moe
Margin of error for households_speak_other.
- households_speak_limited_english
Percent of limited English-speaking households (2015-2019).
- households_speak_limited_english_moe
Margin of error for households_speak_limited_english.
- poverty
Percent of population below the poverty level (2015-2019).
- poverty_moe
Margin of error for poverty.
- poverty_under_18
Percent of population under 18 below the poverty level (2015-2019).
- poverty_under_18_moe
Margin of error for poverty_under_18.
- poverty_65_and_over
Percent of population 65 and over below the poverty level (2015-2019).
- poverty_65_and_over_moe
Margin of error for poverty_65_and_over.
- mean_household_income
Mean household income (2019 dollars, 2015-2019).
- mean_household_income_moe
Margin of error for mean_household_income.
- per_capita_income
Per capita money income in past 12 months (2019 dollars, 2015-2019).
- per_capita_income_moe
Margin of error for per_capita_income.
- median_household_income
Median household income (2015-2019).
- median_household_income_moe
Margin of error for median_household_income.
- veterans
Percent among civilian population 18 and over that are veterans (2015-2019).
- veterans_moe
Margin of error for veterans.
- unemployment_rate
Unemployment rate among those ages 20-64 (2015-2019).
- unemployment_rate_moe
Margin of error for unemployment_rate.
- uninsured
Percent of civilian noninstitutionalized population that is uninsured (2015-2019).
- uninsured_moe
Margin of error for uninsured.
- uninsured_under_6
Percent of population under 6 years that is uninsured (2015-2019).
- uninsured_under_6_moe
Margin of error for uninsured_under_6.
- uninsured_under_19
Percent of population under 19 that is uninsured (2015-2019).
- uninsured_under_19_moe
Margin of error for uninsured_under_19.
- uninsured_65_and_older
Percent of population 65 and older that is uninsured (2015-2019).
- uninsured_65_and_older_moe
Margin of error for uninsured_65_and_older.
- household_has_computer
Percent of households that have desktop or laptop computer (2015-2019).
- household_has_computer_moe
Margin of error for household_has_computer.
- household_has_smartphone
Percent of households that have smartphone (2015-2019).
- household_has_smartphone_moe
Margin of error for household_has_smartphone.
- household_has_broadband
Percent of households that have broadband internet subscription (2015-2019).
- household_has_broadband_moe
Margin of error for household_has_broadband.
Source
The data were downloaded via the tidycensus R package.
See Also
Examples
library(ggplot2)
ggplot(
  county_2019,
  aes(
    x = hs_grad, y = median_individual_income,
    size = sqrt(pop) / 1000
  )
) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population graduated from high school",
    y = "Median individual income"
  )
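Each `_moe` column in this data set pairs with an estimate column of the same name. As a rough sketch of how the pair might be used, the code below turns an estimate and its margin of error into an interval and an approximate standard error. It assumes the margins are at the 90 percent confidence level that the American Community Survey publishes by default (z of about 1.645); the source does not state which level was used. The helper name `moe_interval` is hypothetical and the input values are made up.

```r
# Hypothetical helper (not part of usdata): interval and standard error from
# an estimate and its margin of error.
# Assumption: the margin of error is at the 90 percent confidence level.
moe_interval <- function(estimate, moe) {
  data.frame(
    lower = estimate - moe,
    upper = estimate + moe,
    se    = moe / 1.645
  )
}

# Made-up values standing in for, e.g., hs_grad and hs_grad_moe.
moe_interval(estimate = c(85.0, 90.5), moe = c(2.0, 1.5))
```

With the real data, the same call could take `county_2019$hs_grad` and `county_2019$hs_grad_moe` as its two arguments.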
United States Counties
Description
Data for 3142 counties in the United States.
Usage
county_complete
Format
A data frame with 3142 observations on the following 188 variables.
- state
State.
- name
County name.
- fips
FIPS code.
- pop2000
2000 population.
- pop2010
2010 population.
- pop2011
2011 population.
- pop2012
2012 population.
- pop2013
2013 population.
- pop2014
2014 population.
- pop2015
2015 population.
- pop2016
2016 population.
- pop2017
2017 population.
- age_under_5_2010
Percent of population under 5 (2010).
- age_under_5_2017
Percent of population under 5 (2017).
- age_under_18_2010
Percent of population under 18 (2010).
- age_over_65_2010
Percent of population over 65 (2010).
- age_over_65_2017
Percent of population over 65 (2017).
- median_age_2017
Median age (2017).
- female_2010
Percent of population that is female (2010).
- white_2010
Percent of population that is white (2010).
- black_2010
Percent of population that is black (2010).
- black_2017
Percent of population that is black (2017).
- native_2010
Percent of population that is Native American (2010).
- native_2017
Percent of population that is Native American (2017).
- asian_2010
Percent of population that is Asian (2010).
- asian_2017
Percent of population that is Asian (2017).
- pac_isl_2010
Percent of population that is Native Hawaiian or Pacific Islander (2010).
- pac_isl_2017
Percent of population that is Native Hawaiian or Pacific Islander (2017).
- other_single_race_2017
Percent of population that identifies as another single race (2017).
- two_plus_races_2010
Percent of population that identifies as two or more races (2010).
- two_plus_races_2017
Percent of population that identifies as two or more races (2017).
- hispanic_2010
Percent of population that is Hispanic (2010).
- hispanic_2017
Percent of population that is Hispanic (2017).
- white_not_hispanic_2010
Percent of population that is white and not Hispanic (2010).
- white_not_hispanic_2017
Percent of population that is white and not Hispanic (2017).
- speak_english_only_2017
Percent of population that speaks English only (2017).
- no_move_in_one_plus_year_2010
Percent of population that has not moved in at least one year (2006-2010).
- foreign_born_2010
Percent of population that is foreign-born (2006-2010).
- foreign_spoken_at_home_2010
Percent of population that speaks a foreign language at home (2006-2010).
- women_16_to_50_birth_rate_2017
Birth rate for women ages 16 to 50 (2017).
- hs_grad_2010
Percent of population that is a high school graduate (2006-2010).
- hs_grad_2016
Percent of population that is a high school graduate (2012-2016).
- hs_grad_2017
Percent of population that is a high school graduate (2017).
- some_college_2016
Percent of population with some college education (2012-2016).
- some_college_2017
Percent of population with some college education (2017).
- bachelors_2010
Percent of population that earned a bachelor's degree (2006-2010).
- bachelors_2016
Percent of population that earned a bachelor's degree (2012-2016).
- bachelors_2017
Percent of population that earned a bachelor's degree (2017).
- veterans_2010
Percent of population that are veterans (2006-2010).
- veterans_2017
Percent of population that are veterans (2017).
- mean_work_travel_2010
Mean travel time to work (2006-2010).
- mean_work_travel_2017
Mean travel time to work (2017).
- broadband_2017
Percent of population who has access to broadband (2017).
- computer_2017
Percent of population who has access to a computer (2017).
- housing_units_2010
Number of housing units (2010).
- homeownership_2010
Home ownership rate (2006-2010).
- housing_multi_unit_2010
Housing units in multi-unit structures (2006-2010).
- median_val_owner_occupied_2010
Median value of owner-occupied housing units (2006-2010).
- households_2010
Households (2006-2010).
- households_2017
Households (2017).
- persons_per_household_2010
Persons per household (2006-2010).
- persons_per_household_2017
Persons per household (2017).
- per_capita_income_2010
Per capita money income in past 12 months (2010 dollars, 2006-2010).
- per_capita_income_2017
Per capita money income in past 12 months (2017 dollars, 2017).
- metro_2013
Whether the county contained a metropolitan area in 2013.
- median_household_income_2010
Median household income (2006-2010).
- median_household_income_2016
Median household income (2012-2016).
- median_household_income_2017
Median household income (2017).
- private_nonfarm_establishments_2009
Private nonfarm establishments (2009).
- private_nonfarm_employment_2009
Private nonfarm employment (2009).
- percent_change_private_nonfarm_employment_2009
Private nonfarm employment, percent change from 2000 to 2009.
- nonemployment_establishments_2009
Nonemployer establishments (2009).
- firms_2007
Total number of firms (2007).
- black_owned_firms_2007
Black-owned firms, percent (2007).
- native_owned_firms_2007
Native American-owned firms, percent (2007).
- asian_owned_firms_2007
Asian-owned firms, percent (2007).
- pac_isl_owned_firms_2007
Native Hawaiian and other Pacific Islander-owned firms, percent (2007).
- hispanic_owned_firms_2007
Hispanic-owned firms, percent (2007).
- women_owned_firms_2007
Women-owned firms, percent (2007).
- manufacturer_shipments_2007
Manufacturer shipments, 2007 ($1000).
- mercent_whole_sales_2007
Merchant wholesaler sales, 2007 ($1000).
- sales_2007
Retail sales, 2007 ($1000).
- sales_per_capita_2007
Retail sales per capita, 2007.
- accommodation_food_service_2007
Accommodation and food services sales, 2007 ($1000).
- building_permits_2010
Building permits (2010).
- fed_spending_2009
Federal spending, in thousands of dollars (2009).
- area_2010
Land area in square miles (2010).
- density_2010
Persons per square mile (2010).
- smoking_ban_2010
Describes the type of county-level smoking ban in place in 2010, taking one of the values "none", "partial", or "comprehensive".
- poverty_2010
Percent of population below poverty level (2006-2010).
- poverty_2016
Percent of population below poverty level (2012-2016).
- poverty_2017
Percent of population below poverty level (2017).
- poverty_age_under_5_2017
Percent of population under age 5 below poverty level (2017).
- poverty_age_under_18_2017
Percent of population under age 18 below poverty level (2017).
- civilian_labor_force_2007
Civilian labor force in 2007.
- employed_2007
Number of civilians employed in 2007.
- unemployed_2007
Number of civilians unemployed in 2007.
- unemployment_rate_2007
Unemployment rate in 2007.
- civilian_labor_force_2008
Civilian labor force in 2008.
- employed_2008
Number of civilians employed in 2008.
- unemployed_2008
Number of civilians unemployed in 2008.
- unemployment_rate_2008
Unemployment rate in 2008.
- civilian_labor_force_2009
Civilian labor force in 2009.
- employed_2009
Number of civilians employed in 2009.
- unemployed_2009
Number of civilians unemployed in 2009.
- unemployment_rate_2009
Unemployment rate in 2009.
- civilian_labor_force_2010
Civilian labor force in 2010.
- employed_2010
Number of civilians employed in 2010.
- unemployed_2010
Number of civilians unemployed in 2010.
- unemployment_rate_2010
Unemployment rate in 2010.
- civilian_labor_force_2011
Civilian labor force in 2011.
- employed_2011
Number of civilians employed in 2011.
- unemployed_2011
Number of civilians unemployed in 2011.
- unemployment_rate_2011
Unemployment rate in 2011.
- civilian_labor_force_2012
Civilian labor force in 2012.
- employed_2012
Number of civilians employed in 2012.
- unemployed_2012
Number of civilians unemployed in 2012.
- unemployment_rate_2012
Unemployment rate in 2012.
- civilian_labor_force_2013
Civilian labor force in 2013.
- employed_2013
Number of civilians employed in 2013.
- unemployed_2013
Number of civilians unemployed in 2013.
- unemployment_rate_2013
Unemployment rate in 2013.
- civilian_labor_force_2014
Civilian labor force in 2014.
- employed_2014
Number of civilians employed in 2014.
- unemployed_2014
Number of civilians unemployed in 2014.
- unemployment_rate_2014
Unemployment rate in 2014.
- civilian_labor_force_2015
Civilian labor force in 2015.
- employed_2015
Number of civilians employed in 2015.
- unemployed_2015
Number of civilians unemployed in 2015.
- unemployment_rate_2015
Unemployment rate in 2015.
- civilian_labor_force_2016
Civilian labor force in 2016.
- employed_2016
Number of civilians employed in 2016.
- unemployed_2016
Number of civilians unemployed in 2016.
- unemployment_rate_2016
Unemployment rate in 2016.
- uninsured_2017
Percent of population who are uninsured (2017).
- uninsured_age_under_6_2017
Percent of population under 6 who are uninsured (2017).
- uninsured_age_under_19_2017
Percent of population under 19 who are uninsured (2017).
- uninsured_age_over_74_2017
Percent of population over 74 who are uninsured (2017).
- civilian_labor_force_2017
Civilian labor force in 2017.
- employed_2017
Number of civilians employed in 2017.
- unemployed_2017
Number of civilians unemployed in 2017.
- unemployment_rate_2017
Unemployment rate in 2017.
- median_individual_income_2019
Median individual income (2019).
- pop_2019
2019 population.
- white_2019
Percent of population that is white alone (2015-2019).
- black_2019
Percent of population that is black alone (2015-2019).
- native_2019
Percent of population that is Native American alone (2015-2019).
- asian_2019
Percent of population that is Asian alone (2015-2019).
- pac_isl_2019
Percent of population that is Native Hawaiian or other Pacific Islander alone (2015-2019).
- other_single_race_2019
Percent of population that is some other race alone (2015-2019).
- two_plus_races_2019
Percent of population that is two or more races (2015-2019).
- hispanic_2019
Percent of population that identifies as Hispanic or Latino (2015-2019).
- white_not_hispanic_2019
Percent of population that is white alone, not Hispanic or Latino (2015-2019).
- median_age_2019
Median age (2015-2019).
- age_under_5_2019
Percent of population under 5 (2015-2019).
- age_over_85_2019
Percent of population 85 and over (2015-2019).
- age_over_18_2019
Percent of population 18 and over (2015-2019).
- age_over_65_2019
Percent of population 65 and over (2015-2019).
- mean_work_travel_2019
Mean travel time to work (2015-2019).
- persons_per_household_2019
Persons per household (2015-2019)
- avg_family_size_2019
Average family size (2015-2019).
- housing_one_unit_structures_2019
Percent of housing units in 1-unit structures (2015-2019).
- housing_two_unit_structures_2019
Percent of housing units in multi-unit structures (2015-2019).
- housing_mobile_homes_2019
Percent of housing units in mobile homes and other types of units (2015-2019).
- median_individual_income_age_25plus_2019
Median individual income (2019 dollars, 2015-2019).
- hs_grad_2019
Percent of population 25 and older that is a high school graduate (2015-2019).
- bachelors_2019
Percent of population 25 and older that earned a Bachelor's degree or higher (2015-2019).
- households_2019
Total households (2015-2019).
- households_speak_spanish_2019
Percent of households speaking Spanish (2015-2019).
- households_speak_other_indo_euro_lang_2019
Percent of households speaking other Indo-European language (2015-2019).
- households_speak_asian_or_pac_isl_2019
Percent of households speaking Asian and Pacific Island language (2015-2019).
- households_speak_other_2019
Percent of households speaking non European or Asian/Pacific Island language (2015-2019).
- households_speak_limited_english_2019
Percent of limited English-speaking households (2015-2019).
- poverty_2019
Percent of population below the poverty level (2015-2019).
- poverty_under_18_2019
Percent of population under 18 below the poverty level (2015-2019).
- poverty_65_and_over_2019
Percent of population 65 and over below the poverty level (2015-2019).
- mean_household_income_2019
Mean household income (2019 dollars, 2015-2019).
- per_capita_income_2019
Per capita money income in past 12 months (2019 dollars, 2015-2019).
- median_household_income_2019
Median household income (2015-2019).
- veterans_2019
Percent among civilian population 18 and over that are veterans (2015-2019).
- unemployment_rate_2019
Unemployment rate among those ages 20-64 (2015-2019).
- uninsured_2019
Percent of civilian noninstitutionalized population that is uninsured (2015-2019).
- uninsured_under_6_2019
Percent of population under 6 years that is uninsured (2015-2019).
- uninsured_under_19_2019
Percent of population under 19 that is uninsured (2015-2019).
- uninsured_65_and_older_2019
Percent of population 65 and older that is uninsured (2015-2019).
- household_has_computer_2019
Percent of households that have desktop or laptop computer (2015-2019).
- household_has_smartphone_2019
Percent of households that have smartphone (2015-2019).
- household_has_broadband_2019
Percent of households that have broadband internet subscription (2015-2019).
Source
The data prior to 2011 was from http://census.gov, though the exact page it came from is no longer available.
More recent data comes from the following sources.
Downloaded via the tidycensus R package.
Download links for spreadsheets were found on https://www.ers.usda.gov/data-products/county-level-data-sets/download-data
Unemployment - Bureau of Labor Statistics - LAUS data - https://www.bls.gov/lau/.
Median Household Income - Census Bureau - Small Area Income and Poverty Estimates (SAIPE) data.
The original data table was prepared by USDA, Economic Research Service.
Census Bureau.
2012-16 American Community Survey 5-yr average.
The original data table was prepared by USDA, Economic Research Service.
Tim Parker (tparker at ers.usda.gov) is the contact for much of the new data incorporated into this data set.
See Also
Examples
library(dplyr)
library(ggplot2)
# Population change and poverty
county_complete |>
  mutate(
    pop_change = 100 * ((pop2017 / pop2013) - 1),
    metro_area = metro_2013 == 1
  ) |>
  ggplot(aes(
    x = poverty_2016,
    y = pop_change,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population in poverty (2016)",
    y = "Percentage population change from 2013 to 2017",
    color = "Metropolitan area",
    title = "Population change and poverty"
  )
# Counties with high population change
county_complete |>
  mutate(pop_change = 100 * ((pop2017 / pop2013) - 1)) |>
  filter(pop_change < -10 | pop_change > 25) |>
  select(state, name, fips, pop_change)
# Population by metro area
county_complete |>
  mutate(metro_area = metro_2013 == 1) |>
  filter(!is.na(metro_area)) |>
  ggplot(aes(x = metro_area, y = log(pop2017))) +
  geom_violin() +
  labs(
    x = "Metro area",
    y = "Log of population in 2017",
    title = "Population by metro area"
  )
# Poverty and median household income
county_complete |>
  mutate(metro_area = metro_2013 == 1) |>
  ggplot(aes(
    x = poverty_2016,
    y = median_household_income_2016,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population in poverty (2016)",
    y = "Median household income (2016)",
    color = "Metropolitan area",
    title = "Poverty and median household income"
  )
# Unemployment rate and poverty
county_complete |>
  mutate(metro_area = metro_2013 == 1) |>
  ggplot(aes(
    x = unemployment_rate_2017,
    y = poverty_2016,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Unemployment rate (2017)",
    y = "Percentage of population in poverty (2016)",
    color = "Metropolitan area",
    title = "Unemployment rate and poverty"
  )
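The population-change expression used in the examples above, `100 * ((pop2017 / pop2013) - 1)`, is a plain percentage change. The sketch below works through it in base R on a toy data frame. The county names and populations are made up, and the helper name `pct_change` is hypothetical.

```r
# Hypothetical helper: percentage change from an old value to a new value.
pct_change <- function(old, new) {
  100 * ((new / old) - 1)
}

# Toy stand-in for two rows of county_complete; values are invented.
toy_counties <- data.frame(
  name    = c("A County", "B County"),
  pop2013 = c(1000, 2000),
  pop2017 = c(1100, 1500)
)

toy_counties$pop_change <- pct_change(toy_counties$pop2013, toy_counties$pop2017)

# Flag large swings using the same thresholds as the filter() call above.
toy_counties$large_change <-
  toy_counties$pop_change < -10 | toy_counties$pop_change > 25

toy_counties
```

A county that grows from 1000 to 1100 residents changes by about 10 percent and is not flagged, while one that shrinks from 2000 to 1500 changes by -25 percent and is.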
Fatal Police Shootings data.
Description
A subset of the Washington Post database. Contains records of every fatal police shooting by an on-duty officer since January 1, 2015.
Usage
fatal_police_shootings
Format
A data frame with 6421 rows and 12 variables.
- date
Date of fatal shooting.
- manner_of_death
Manner of death: shot, or shot and Tasered.
- armed
Indicates whether the victim was armed with some sort of implement that a police officer believed could inflict harm.
- age
The age of the victim.
- gender
The gender of the victim. The Post identifies victims by the gender they identify with if reports indicate that it differs from their biological sex.
- race
W: White, non-Hispanic; B: Black, non-Hispanic; A: Asian; N: Native American; H: Hispanic; O: Other; None: unknown.
- city
The municipality where the fatal shooting took place. Note that in some cases this field may contain a county name if a more specific municipality is unavailable or unknown.
- state
Two-letter postal code abbreviation.
- signs_of_mental_illness
Whether news reports have indicated the victim had a history of mental health issues, expressed suicidal intentions, or was experiencing mental distress at the time of the shooting.
- threat_level
The attack label generally marks the most direct and immediate threat to life, including incidents where officers or others were shot at, threatened with a gun, or attacked with other weapons or physical force. The attack category is meant to flag the highest level of threat; the other and undetermined categories represent all remaining cases. The other category includes many incidents where officers or others faced significant threats.
- flee
Whether news reports have indicated the victim was moving away from officers: Foot, Car, or Not fleeing.
- body_camera
Whether news reports have indicated an officer was wearing a body camera and it may have recorded some portion of the incident.
Source
Examples
library(dplyr)
# List race frequency and percentage
fatal_police_shootings |>
  group_by(race) |>
  summarize(n = n()) |>
  mutate(freq = n / sum(n) * 100)
# List different weapons that victims were armed with
fatal_police_shootings |>
  distinct(armed)
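The frequency-and-percentage summary above can also be sketched without dplyr. The example below uses a short made-up vector of race codes in place of `fatal_police_shootings$race`; it illustrates the calculation only and says nothing about the real distribution.

```r
# Made-up race codes standing in for fatal_police_shootings$race.
toy_race <- c("W", "B", "W", "H", NA, "W", "B", "A")

# Count each code, keeping missing values as their own category.
counts <- table(toy_race, useNA = "ifany")

# Convert counts to percentages of all records, rounded to one decimal place.
pct <- round(100 * prop.table(counts), 1)

counts
pct
```

Keeping `useNA = "ifany"` matters here because unknown race is recorded as a missing value rather than as a code of its own.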
Gerrymander
Description
A dataset on gerrymandering and its influence on House elections. The data set was originally built by Jeff Whitmer.
Usage
gerrymander
Format
A data frame with 435 rows and 12 variables:
- district
Congressional district.
- last_name
Last name of 2016 election winner.
- first_name
First name of 2016 election winner.
- party16
Political party of 2016 election winner.
- clinton16
Percent of vote received by Clinton in 2016 Presidential Election.
- trump16
Percent of vote received by Trump in 2016 Presidential Election.
- dem16
Did a Democrat win the 2016 House election? Levels of 1 (yes) and 0 (no).
- state
State the Representative is from.
- party18
Political Party of the 2018 election winner.
- dem18
Did a Democrat win the 2018 House election? Levels of 1 (yes) and 0 (no).
- flip18
Did a Democrat flip the seat in the 2018 election? Levels of 1 (yes) and 0 (no).
- gerry
Categorical variable for prevalence of gerrymandering with levels of low, mid and high.
Source
Examples
library(ggplot2)
library(dplyr)
ggplot(gerrymander |> filter(gerry != "mid"), aes(clinton16, dem16, color = gerry)) +
  geom_jitter(height = 0.05, size = 3, shape = 1) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE) +
  scale_color_manual(values = c("purple", "orange")) +
  labs(
    title = "Logistic Regression of 2016 House Elections",
    subtitle = "by Congressional District",
    x = "Percent of Presidential Vote Won by Clinton",
    y = "Seat Won by Democrat Candidate",
    color = "Gerrymandering"
  )
Election results for 2010 Governor races in the U.S.
Description
Election results for 2010 Governor races in the U.S.
Usage
govrace10
Format
A data frame with 37 observations on the following 23 variables.
- id
Unique identifier for the race, which does not overlap with other 2010 races (see houserace10 and senaterace10).
- state
State name
- abbr
State name abbreviation
- name1
Name of the winning candidate
- perc1
Percentage of vote for winning candidate (if more than one candidate)
- party1
Party of winning candidate
- votes1
Number of votes for winning candidate
- name2
Name of candidate with second most votes
- perc2
Percentage of vote for candidate who came in second
- party2
Party of candidate with second most votes
- votes2
Number of votes for candidate who came in second
- name3
Name of candidate with third most votes
- perc3
Percentage of vote for candidate who came in third
- party3
Party of candidate with third most votes
- votes3
Number of votes for candidate who came in third
- name4
Name of candidate with fourth most votes
- perc4
Percentage of vote for candidate who came in fourth
- party4
Party of candidate with fourth most votes
- votes4
Number of votes for candidate who came in fourth
- name5
Name of candidate with fifth most votes
- perc5
Percentage of vote for candidate who came in fifth
- party5
Party of candidate with fifth most votes
- votes5
Number of votes for candidate who came in fifth
Source
MSNBC.com, retrieved 2010-11-09.
Examples
table(govrace10$party1, govrace10$party2)
Election results for the 2010 U.S. House of Representatives races
Description
Election results for the 2010 U.S. House of Representatives races
Usage
houserace10
Format
A data frame with 435 observations on the following 24 variables.
- id
Unique identifier for the race, which does not overlap with other 2010 races (see govrace10 and senaterace10).
- state
State name
- abbr
State name abbreviation
- num
District number for the state
- name1
Name of the winning candidate
- perc1
Percentage of vote for winning candidate (if more than one candidate)
- party1
Party of winning candidate
- votes1
Number of votes for winning candidate
- name2
Name of candidate with second most votes
- perc2
Percentage of vote for candidate who came in second
- party2
Party of candidate with second most votes
- votes2
Number of votes for candidate who came in second
- name3
Name of candidate with third most votes
- perc3
Percentage of vote for candidate who came in third
- party3
Party of candidate with third most votes
- votes3
Number of votes for candidate who came in third
- name4
Name of candidate with fourth most votes
- perc4
Percentage of vote for candidate who came in fourth
- party4
Party of candidate with fourth most votes
- votes4
Number of votes for candidate who came in fourth
- name5
Name of candidate with fifth most votes
- perc5
Percentage of vote for candidate who came in fifth
- party5
Party of candidate with fifth most votes
- votes5
Number of votes for candidate who came in fifth
Details
The analysis in the Examples section was inspired by, and is similar to, Nate Silver's district-level analysis on the FiveThirtyEight blog in the New York Times: https://fivethirtyeight.com/features/2010-an-aligning-election/
Source
MSNBC.com, retrieved 2010-11-09.
Examples
# Count House seats won by each party in each state
hr <- table(houserace10[, c("abbr", "party1")])
nr <- apply(hr, 1, sum)
# 2008 presidential results by state, excluding DC
pr <- prrace08[prrace08$state != "DC", c("state", "p_obama")]
hr <- hr[as.character(pr$state), ]
# State-level binomial fit of seats won on Obama vote
(fit <- glm(hr ~ pr$p_obama, family = binomial))
# District-level logistic regression
x1 <- pr$p_obama[match(houserace10$abbr, pr$state)]
y1 <- (houserace10$party1 == "Democrat") + 0
g <- glm(y1 ~ x1, family = binomial)
# Plot the share of seats won by Democrats in each state
x <- pr$p_obama[pr$state != "DC"]
nr <- apply(hr, 1, sum)
plot(x, hr[, "Democrat"] / nr,
  pch = 19, cex = sqrt(nr), col = "#22558844",
  xlim = c(20, 80), ylim = c(0, 1),
  xlab = "Percent vote for Obama in 2008",
  ylab = "Probability of Democrat winning House seat"
)
# Overlay the logistic curve using hard-coded coefficients
X <- seq(0, 100, 0.1)
lo <- -5.6079 + 0.1009 * X
p <- exp(lo) / (1 + exp(lo))
lines(X, p)
abline(h = 0:1, lty = 2, col = "#888888")
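The curve in the example above is drawn from two hard-coded coefficients on the log-odds scale. The following is a minimal sketch, using base R only, of how that log-odds line is converted to a probability. The names pct_obama, log_odds, and crossover are illustrative and the input percentages are arbitrary; plogis() is the base R logistic function.

```r
# Illustrative Obama vote percentages (arbitrary values, not from the data)
pct_obama <- c(40, 50, 60, 70)

# Log-odds using the coefficients hard-coded in the example above
log_odds <- -5.6079 + 0.1009 * pct_obama

# Inverse logit written out explicitly, as in the example
p_explicit <- exp(log_odds) / (1 + exp(log_odds))

# The same transformation using the base R logistic function
p_plogis <- plogis(log_odds)

# Percentage at which the curve crosses 0.5 (where the log-odds equal 0)
crossover <- 5.6079 / 0.1009

print(data.frame(pct_obama, p_explicit, p_plogis))
print(crossover)
```

Using plogis() avoids writing the exponential formula by hand and gives the same values.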
Pierce County House Sales Data for 2020
Description
Real estate sales for Pierce County, WA in 2020.
Usage
pierce_county_house_sales
Format
A data frame with 16814 rows and 19 variables.
- sale_date
Date the legal document (deed) was executed.
- sale_price
Dollar amount recorded for the sale.
- house_square_feet
Sum of the square feet for the building.
- attic_finished_square_feet
Finished living area in the attic.
- basement_square_feet
Total square footage of the basement.
- attached_garage_square_feet
Total square footage of the attached or built in garage(s).
- detached_garage_square_feet
Total detached garage(s) square footage.
- fireplaces
Total count of single, double or PreFab stoves.
- hvac_description
Text description associated with the predominant heating source for the built-as structure, e.g. Forced Air, Electric Baseboard, Steam, etc.
- exterior
Predominant type of construction materials used for the exterior siding on Residential Buildings.
- interior
Predominant type of materials used on the interior walls, e.g. Sheetrock or Paneling.
- stories
Number of floors/building levels above grade. Stories do not include attic or basement areas.
- roof_cover
Material used for the roof, e.g. Composition Shingles, Wood Shake, Concrete Tile, etc.
- year_built
Year the building was built, as stated by the building permit or a historical record.
- bedrooms
Number of bedrooms listed for a residential property.
- bathrooms
Number of baths listed for a residential property. The number is listed as a decimal, e.g. 2.75 = two full and one three-quarter baths. A tub/sink/toilet combination (plus any additional fixtures) is considered 1.0 bath. A shower/sink/toilet combination (plus any additional fixtures) is 0.75 bath. A sink/toilet combination is 0.5 bath.
- waterfront_type
Describes the type of waterfront the property adjoins or has legal access to.
- view_quality
Assigned to reflect the market appeal of the overall view available from the dwelling or property.
- utility_sewer
Identifies if sewer/septic is installed, available or not available or if the property does not support an on site sewage disposal system.
Source
Examples
library(dplyr)
library(lubridate)
# List house sales frequency and average price grouped by month
pierce_county_house_sales |>
  mutate(month_sale = month(sale_date)) |>
  group_by(month_sale) |>
  summarize(freq = n(), mean_price = mean(sale_price)) |>
  arrange(desc(freq))
# List house sales frequency and average price grouped by waterfront type
pierce_county_house_sales |>
  group_by(waterfront_type) |>
  summarize(freq = n(), mean_price = mean(sale_price)) |>
  arrange(desc(mean_price))
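The decimal coding of the bathrooms variable described in the Format section can be illustrated with a little arithmetic. This is a minimal sketch that assumes a hypothetical helper, bath_score(), which is not part of usdata and only combines counts of full, three-quarter, and half baths using the weights stated above.

```r
# Hypothetical helper, not part of usdata: combine bath counts into the
# decimal coding described for the bathrooms variable
bath_score <- function(full = 0, three_quarter = 0, half = 0) {
  full * 1.0 + three_quarter * 0.75 + half * 0.5
}

# Two full baths and one three-quarter bath, as in the 2.75 example
bath_score(full = 2, three_quarter = 1)
```

Note that the coding is not uniquely reversible: different combinations of fixtures can produce the same decimal value.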
Population Age 2019 Data.
Description
State level data on population by age.
Usage
pop_age_2019
Format
A data frame with 2820 rows and 5 variables.
- state
State as 2 letter abbreviation.
- state_name
State name.
- age
Age cohort for population.
- population
Population of age cohort.
- state_total_population
Total estimated state population in 2019.
Source
Centers for Disease Control and Prevention
Examples
library(dplyr)
# List age population for each state with percent of total
pop_age_2019 |>
group_by(state_name, age) |>
mutate(percent = population / state_total_population * 100) |>
select(state_name, age, population, percent)
pop_age_2019 |>
select(state_name, state_total_population) |>
distinct() |>
arrange(desc(state_total_population))
Population Race 2019 Data.
Description
State level data on population by race.
Usage
pop_race_2019
Format
A data frame with 2820 rows and 6 variables.
- state
State as 2 letter abbreviation.
- state_name
State name.
- race
Race cohort for population.
- hispanic
Indicates whether population is Hispanic or Latino.
- population
Population of race cohort.
- state_total_population
Total estimated state population in 2019.
Source
Centers for Disease Control and Prevention
Examples
library(dplyr)
# List race population for each state with percent of total
pop_race_2019 |>
group_by(state_name, race, hispanic) |>
mutate(percent = population / state_total_population * 100) |>
select(state_name, race, hispanic, population, percent)
pop_race_2019 |>
select(state_name, state_total_population) |>
distinct() |>
arrange(desc(state_total_population))
Presidential Power.
Description
Data from a Pew Research Center poll about Presidential power/control over gas prices.
Usage
prez_pwr
Format
A data frame with 365 rows and 3 variables.
- president
Sitting President at time of the poll.
- party
Political party of the respondent with levels d(emocrat) and r(epublican).
- has_pwr
Respondent answer to the question: "Is the price of gasoline something the president can do a lot about, or is that beyond the president's control?"
Source
Pew Research Center, May 2006 & March 2012.
Examples
library(ggplot2)
ggplot(prez_pwr, aes(has_pwr, fill = party)) +
  geom_bar() +
  labs(
    title = "Is the price of gasoline something the president can do a lot about?",
    x = "",
    y = "Number of respondents",
    fill = "Respondent Party"
  ) +
  facet_wrap(~president)
Election results for the 2008 U.S. Presidential race
Description
Election results for the 2008 U.S. Presidential race
Usage
prrace08
Format
A data frame with 51 observations on the following 7 variables.
- state
State name abbreviation
- state_full
Full state name
- n_obama
Number of votes for Barack Obama
- p_obama
Percentage of votes for Barack Obama
- n_mc_cain
Number of votes for John McCain
- p_mc_cain
Percentage of votes for John McCain
- el_votes
Number of electoral votes for a state
Details
In Nebraska, 4 electoral votes went to McCain and 1 to Obama. In all other states, the electoral votes were winner-take-all.
Source
Presidential Election of 2008, Electoral and Popular Vote Summary, retrieved 2011-04-21.
Examples
# ===> Obtain 2010 US House Election Data <===#
hr <- table(houserace10[, c("abbr", "party1")])
nr <- apply(hr, 1, sum)
# ===> Obtain 2008 President Election Data <===#
pr <- prrace08[prrace08$state != "DC", c("state", "p_obama")]
hr <- hr[as.character(pr$state), ]
(fit <- glm(hr ~ pr$p_obama, family = binomial))
# ===> Visualizing Binomial outcomes <===#
x <- pr$p_obama[pr$state != "DC"]
nr <- apply(hr, 1, sum)
plot(x, hr[, "Democrat"] / nr,
  pch = 19, cex = sqrt(nr), col = "#22558844",
  xlim = c(20, 80), ylim = c(0, 1), xlab = "Percent vote for Obama in 2008",
  ylab = "Probability of Democrat winning House seat"
)
# ===> Logistic Regression <===#
x1 <- pr$p_obama[match(houserace10$abbr, pr$state)]
y1 <- (houserace10$party1 == "Democrat") + 0
g <- glm(y1 ~ x1, family = binomial)
X <- seq(0, 100, 0.1)
lo <- -5.6079 + 0.1009 * X
p <- exp(lo) / (1 + exp(lo))
lines(X, p)
abline(h = 0:1, lty = 2, col = "#888888")
Election results for the 2010 U.S. Senate races
Description
Election results for the 2010 U.S. Senate races
Usage
senaterace10
Format
A data frame with 38 observations on the following 23 variables.
- id
Unique identifier for the race, which does not overlap with other 2010 races (see govrace10 and houserace10).
- state
State name
- abbr
State name abbreviation
- name1
Name of the winning candidate
- perc1
Percentage of vote for winning candidate (if more than one candidate)
- party1
Party of winning candidate
- votes1
Number of votes for winning candidate
- name2
Name of candidate with second most votes
- perc2
Percentage of vote for candidate who came in second
- party2
Party of candidate with second most votes
- votes2
Number of votes for candidate who came in second
- name3
Name of candidate with third most votes
- perc3
Percentage of vote for candidate who came in third
- party3
Party of candidate with third most votes
- votes3
Number of votes for candidate who came in third
- name4
Name of candidate with fourth most votes
- perc4
Percentage of vote for candidate who came in fourth
- party4
Party of candidate with fourth most votes
- votes4
Number of votes for candidate who came in fourth
- name5
Name of candidate with fifth most votes
- perc5
Percentage of vote for candidate who came in fifth
- party5
Party of candidate with fifth most votes
- votes5
Number of votes for candidate who came in fifth
Source
MSNBC.com, retrieved 2010-11-09.
Examples
library(ggplot2)
ggplot(senaterace10, aes(x = perc1)) +
geom_histogram(binwidth = 5) +
labs(x = "Winning candidate vote percentage")
Convert state names to abbreviations
Description
Two utility functions. One converts state names to the state abbreviations, and the second does the opposite.
Usage
state2abbr(state)
Arguments
state |
A vector of state names. A small amount of fuzzy matching is applied. |
Value
Returns a vector of the same length with the corresponding state names or abbreviations.
Author(s)
David Diez
See Also
abbr2state, county, county_complete
Examples
state2abbr("Minnesota")
# Some spelling/capitalization errors okay
state2abbr("mINnesta")
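The fuzzy matching mentioned above can be illustrated with edit distances. The following is a minimal sketch and not the usdata implementation: fuzzy_state2abbr() is a hypothetical helper that uses the base R function adist() together with the built-in state.name and state.abb vectors (which cover the 50 states but not DC).

```r
# Hypothetical helper: NOT the usdata implementation, just an illustration
# of name matching that tolerates small spelling and capitalization errors
fuzzy_state2abbr <- function(state) {
  # Edit distance between each input and every built-in state name,
  # compared in lower case so capitalization is ignored
  d <- utils::adist(tolower(state), tolower(state.name))
  # For each input, pick the abbreviation of the closest state name
  state.abb[apply(d, 1, which.min)]
}

fuzzy_state2abbr(c("Minnesota", "mINnesta"))
```

Because this sketch always returns the closest name, it would also return an abbreviation for input that is not a state at all; a real implementation would likely need a distance cutoff.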
State-level data
Description
Information about each state collected from both the official US Census website and from various other sources.
Usage
state_stats
Format
A data frame with 51 observations on the following 24 variables.
- state
State name.
- abbr
State abbreviation (e.g. "MN").
- fips
FIPS code.
- pop2010
Population in 2010.
- pop2000
Population in 2000.
- homeownership
Home ownership rate.
- multiunit
Percent of living units that are in multi-unit structures.
- income
Average income per capita.
- med_income
Median household income.
- poverty
Poverty rate.
- fed_spend
Federal spending per capita.
- land_area
Land area.
- smoke
Percent of population that smokes.
- murder
Murders per 100,000 people.
- robbery
Robberies per 100,000.
- agg_assault
Aggravated assaults per 100,000.
- larceny
Larcenies per 100,000.
- motor_theft
Vehicle theft per 100,000.
- soc_sec
Percent of individuals collecting social security.
- nuclear
Percent of power coming from nuclear sources.
- coal
Percent of power coming from coal sources.
- tr_deaths
Traffic deaths per 100,000.
- tr_deaths_no_alc
Traffic deaths per 100,000 where alcohol was not a factor.
- unempl
Unemployment rate (February 2012, preliminary).
Source
Census Quick Facts (no longer available as of 2020), InfoChimps (also no longer available as of 2020), National Highway Traffic Safety Administration (tr_deaths, tr_deaths_no_alc), Bureau of Labor Statistics (unempl).
Examples
library(ggplot2)
library(dplyr)
library(maps)
states_selected <- state_stats |>
  mutate(region = tolower(state)) |>
  select(region, unempl, murder, nuclear)
# Join on the shared region column explicitly
states_map <- map_data("state") |>
  inner_join(states_selected, by = "region")
# Unemployment map
ggplot(states_map, aes(map_id = region)) +
  geom_map(aes(fill = unempl), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Unemployment\n(%)")
# Murder rate map
states_map |>
  filter(region != "district of columbia") |>
  ggplot(aes(map_id = region)) +
  geom_map(aes(fill = murder), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Murders\nper 100k")
# Nuclear energy map
ggplot(states_map, aes(map_id = region)) +
  geom_map(aes(fill = nuclear), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Nuclear energy\n(%)")
Summary of many state-level variables
Description
Census data for the 50 states plus DC and Puerto Rico.
Usage
urban_owner
Format
A data frame with 52 observations on the following 28 variables.
- state
State
- total_housing_units_2000
Total housing units available in 2000.
- total_housing_units_2010
Total housing units available in 2010.
- pct_vacant
a numeric vector
- occupied
Occupied.
- pct_owner_occupied
a numeric vector
- pop_st
a numeric vector
- area_st
a numeric vector
- pop_urban
a numeric vector
- poppct_urban
a numeric vector
- area_urban
a numeric vector
- areapct_urban
a numeric vector
- popden_urban
a numeric vector
- pop_ua
a numeric vector
- poppct_urban.1
a numeric vector
- area_ua
a numeric vector
- areapct_ua
a numeric vector
- popden_ua
a numeric vector
- pop_uc
a numeric vector
- poppct_uc
a numeric vector
- area_uc
a numeric vector
- areapct_uc
a numeric vector
- popden_uc
a numeric vector
- pop_rural
a numeric vector
- poppct_rural
a numeric vector
- area_rural
a numeric vector
- areapct_rural
a numeric vector
- popden_rural
a numeric vector
Source
US Census.
Examples
urban_owner
State summary info
Description
Census info for the 50 US states plus DC.
Usage
urban_rural_pop
Format
A data frame with 51 observations on the following 5 variables.
- state
US state.
- urban_in
a numeric vector
- urban_out
a numeric vector
- rural_farm
a numeric vector
- rural_nonfarm
a numeric vector
Source
US Census.
Examples
urban_rural_pop
US Crime Rates
Description
National data on the number of crimes committed in the US between 1960 and 2019.
Usage
us_crime_rates
Format
A data frame with 60 rows and 12 variables.
- year
Year data was collected.
- population
Population of the United States the year data was collected.
- total
Total number of violent and property crimes committed.
- violent
Total number of violent crimes committed.
- property
Total number of property crimes committed.
- murder
Number of murders committed. Counted in violent total.
- forcible_rape
Number of forcible rapes committed. Counted in violent total.
- robbery
Number of robberies committed. Counted in violent total.
- aggravated_assault
Number of aggravated assaults committed. Counted in violent total.
- burglary
Number of burglaries committed. Counted in property total.
- larceny_theft
Number of larceny thefts committed. Counted in property total.
- vehicle_theft
Number of vehicle thefts committed. Counted in property total.
Source
Examples
library(ggplot2)
ggplot(us_crime_rates, aes(x = population, y = total)) +
  geom_point() +
  labs(
    title = "Crimes vs. Population",
    x = "Population",
    y = "Total Number of Crimes"
  )
ggplot(us_crime_rates, aes(x = murder)) +
  geom_boxplot() +
  labs(
    title = "US Murders",
    subtitle = "1960 - 2019",
    x = "Number of Murders"
  ) +
  theme(axis.text.y = element_blank())
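The variables in this data set are counts, so comparing years fairly usually means converting them to rates. The following is a minimal sketch of a per-100,000 rate calculation. It uses a hypothetical toy data frame with the same column names as us_crime_rates; the numbers are made up for illustration and are not taken from the data.

```r
# Hypothetical toy rows using us_crime_rates column names;
# the values are made up for illustration only
toy <- data.frame(
  year = c(1, 2),
  population = c(1000000, 2000000),
  murder = c(50, 80)
)

# Murders per 100,000 residents
toy$murder_rate <- toy$murder / toy$population * 100000
toy
```

The same expression applied to us_crime_rates (for example, murder / population * 100000) would give a yearly murder rate that can be plotted against year.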
US Temperature Data
Description
A representative set of monitoring locations was taken from NOAA data that had both years of interest (1950 and 2022). The locations were chosen so as to spread the measurements across the continental United States. Daily high and low temperatures are given for each of 24 weather stations.
Usage
us_temp
Format
A data frame with 17250 observations on the following 9 variables.
- station
Station ID, measurements from 24 stations.
- name
Name of the station.
- latitude
Latitude of the station.
- longitude
Longitude of the station.
- elevation
Elevation of the station.
- date
Date of observed temperature.
- tmax
High temp for the observed day.
- tmin
Low temp for the observed day.
- year
Factor variable for year, levels: 1950 and 2022.
Details
Please keep in mind that these are two annual snapshots from a few dozen arbitrarily selected weather stations. A complete analysis would consider more than two years of data and a more precise random sample uniformly distributed across the United States.
Source
https://www.ncei.noaa.gov/cdo-web/, retrieved 2023-09-23.
Examples
library(ggplot2)
library(maps)
library(sf)
library(dplyr)
# Summarize temperature by station and year for plotting
summarized_temp <- us_temp |>
  group_by(station, year, latitude, longitude) |>
  summarize(tmax_med = median(tmax, na.rm = TRUE), .groups = "drop") |>
  mutate(plot_shift = ifelse(year == "1950", 0, 2))
# Make a map of the US as a baseline
usa <- st_as_sf(maps::map("state", fill = TRUE, plot = FALSE))
# Layer the US map with summarized temperatures
ggplot(data = usa) +
  geom_sf() +
  geom_point(
    data = summarized_temp,
    aes(x = longitude + plot_shift, y = latitude, fill = tmax_med, shape = year),
    color = "black", size = 3
  ) +
  scale_fill_gradient(high = "red", low = "yellow") +
  scale_shape_manual(values = c(21, 24)) +
  labs(
    title = "Median high temperature, 1950 and 2022",
    x = "Longitude",
    y = "Latitude",
    fill = "Median\nhigh temp",
    shape = "Year"
  )
American Time Survey 2009 - 2019
Description
Average Time Spent on Activities by Americans
Usage
us_time_survey
Format
A data frame with 11 rows and 8 variables.
- year
Year data collected
- household_activities
Average hours per day spent on household activities, including travel.
- eating_and_drinking
Average hours per day spent eating and drinking, including travel.
- leisure_and_sports
Average hours per day spent on leisure and sports, including travel.
- sleeping
Average hours spent sleeping.
- caring_children
Average hours spent per day caring for and helping children under 18 years of age.
- working_employed
Average hours spent working for those employed (15 years and older).
- working_employed_days_worked
Average hours per day spent working on days worked (15 years and older).
Source
Examples
library(ggplot2)
us_time_survey$year <- as.factor(us_time_survey$year)
ggplot(us_time_survey, aes(year, sleeping)) +
  geom_point(alpha = 0.3) +
  labs(
    x = "Year",
    y = "Average hours spent sleeping",
    title = "US Average hours spent sleeping, 2009 - 2019"
  )
Predicting who would vote for NSA Mass Surveillance
Description
In 2013, the House of Representatives voted not to stop the National Security Agency's (NSA's) mass surveillance of phone behaviors. We look at two predictors for how a representative voted: their party and how much money they have received from the private defense industry.
Usage
vote_nsa
Format
A data frame with 434 observations on the following 5 variables.
- name
Name of the Congressional representative.
- party
The party of the representative: D for Democrat and R for Republican.
- state
State for the representative.
- money
Money received from the defense industry for their campaigns.
- phone_spy_vote
Voting to rein in the phone dragnet or continue allowing mass surveillance.
Source
MapLight. Available at http://s3.documentcloud.org/documents/741074/amash-amendment-vote-maplight.pdf.
References
Kravets, D., 2020. Lawmakers Who Upheld NSA Phone Spying Received Double The Defense Industry Cash. WIRED. Available at https://www.wired.com/2013/07/money-nsa-vote/.
Examples
table(vote_nsa$party, vote_nsa$phone_spy_vote)
boxplot(vote_nsa$money / 1000 ~ vote_nsa$phone_spy_vote,
ylab = "$1000s Received from Defense Industry"
)
US Voter Turnout Data.
Description
State-level data on federal elections held in November between 1980 and 2014.
Usage
voter_count
Format
A data frame with 936 rows and 7 variables.
- year
Year election was held.
- region
Specifies if data is state or national total.
- voting_eligible_population
Number of citizens eligible to vote; does not count felons.
- total_ballots_counted
Number of ballots cast.
- highest_office
Number of ballots that contained a vote for the highest office of that election.
- percent_total_ballots_counted
Overall voter turnout percentage.
- percent_highest_office
Highest office voter turnout percentage.
Source
United States Election Project
Examples
library(ggplot2)
ggplot(voter_count, aes(x = percent_highest_office, y = percent_total_ballots_counted)) +
  geom_point() +
  labs(
    title = "Total Ballots vs. Highest Office",
    x = "Highest Office",
    y = "Total Ballots"
  )
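The two percentage variables relate the ballot counts to the voting-eligible population. The following is a minimal sketch of that relationship, assuming (this is not stated in the documentation above) that each percentage is the corresponding count divided by voting_eligible_population. The toy data frame uses the voter_count column names with made-up numbers.

```r
# Hypothetical toy rows using voter_count column names;
# the counts are made up for illustration only
toy <- data.frame(
  region = c("State A", "State B"),
  voting_eligible_population = c(1000, 2000),
  total_ballots_counted = c(600, 1000),
  highest_office = c(570, 980)
)

# Turnout as a percentage of the voting-eligible population (assumed formula)
toy$pct_total <- toy$total_ballots_counted / toy$voting_eligible_population * 100
toy$pct_highest <- toy$highest_office / toy$voting_eligible_population * 100

# Ballots cast that did not include a vote for the highest office
toy$skipped_highest <- toy$total_ballots_counted - toy$highest_office
toy
```

Under this assumption, the highest-office percentage can never exceed the total-ballots percentage, which is why points in the plot above would be expected to lie on or above the diagonal.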