Title: | Read Census Privacy Protected Microdata Files |
Version: | 0.1.3 |
Date: | 2021-12-13 |
Description: | Implements data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping race numeric identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
BugReports: | https://github.com/christopherkenny/ppmf/issues |
URL: | https://github.com/christopherkenny/ppmf/, https://www.christophertkenny.com/ppmf/ |
RoxygenNote: | 7.1.2 |
Imports: | censable, dplyr, magrittr, readr, rlang (≥ 0.4.11), stringr, tibble, tidyr, zip |
Suggests: | roxygen2 |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2021-12-13 15:48:56 UTC; chris |
Author: | Christopher T. Kenny |
Maintainer: | Christopher T. Kenny <christopherkenny@fas.harvard.edu> |
Repository: | CRAN |
Date/Publication: | 2021-12-15 08:20:06 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Add Standard GEOID to PPMF Data
Description
Adds the GEOID identifier common to spatial census data sets, such as those loaded by tigris. This allows for easier merging or aggregation by a single variable.
Usage
add_geoid(
ppmf,
state = TABBLKST,
county = TABBLKCOU,
tract = TABTRACT,
block_group = TABBLKGRP,
block = TABBLK,
level = "block"
)
Arguments
ppmf |
tibble of ppmf data |
state |
Column in ppmf with state (FIPS) ID. Default is TABBLKST. |
county |
Column in ppmf with county (FIPS) ID. Default is TABBLKCOU. |
tract |
Column in ppmf with tract ID. Default is TABTRACT. |
block_group |
Column in ppmf with block group ID. Default is TABBLKGRP. |
block |
Column in ppmf with block ID. Default is TABBLK. |
level |
Geographic level to write the GEOID for. Options are block (default), block_group, tract, and county. |
Value
input data ppmf with added column GEOID
Examples
data(ppmf_ex)
ppmf_ex <- ppmf_ex %>% add_geoid()
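As a rough illustration of what the added column contains: in Census Bureau numbering, a block-level GEOID concatenates the 2-digit state, 3-digit county, 6-digit tract, and 4-digit block codes into 15 characters. The base R sketch below rebuilds that layout. `make_geoid` is a hypothetical helper, not the package's implementation, and the toy tract and block values are made up (01105 is the FIPS code for Perry County, Alabama). It assumes the ID columns are already zero-padded character strings.

```r
# Toy rows using the PPMF column names from the Usage section above.
# Tract and block values are invented for illustration only.
toy <- data.frame(
  TABBLKST  = c("01", "01"),
  TABBLKCOU = c("105", "105"),
  TABTRACT  = c("686800", "687000"),
  TABBLKGRP = c("1", "2"),
  TABBLK    = c("1004", "2011"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the components needed for each level.
make_geoid <- function(df, level = "block") {
  base <- paste0(df$TABBLKST, df$TABBLKCOU)
  switch(level,
    county      = base,
    tract       = paste0(base, df$TABTRACT),
    block_group = paste0(base, df$TABTRACT, df$TABBLKGRP),
    block       = paste0(base, df$TABTRACT, df$TABBLK),
    stop("unknown level: ", level)
  )
}

toy$GEOID <- make_geoid(toy, "block")
print(toy$GEOID)
print(nchar(toy$GEOID))
```

Coarser levels simply drop trailing components, which is why a single `level` argument is enough to control the width of the identifier.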
Add ppmf12 path to Renviron
Description
Add ppmf12 path to Renviron
Usage
add_ppmf12_path(path, overwrite = FALSE, install = FALSE)
Arguments
path |
path where ppmf12 data is stored |
overwrite |
Defaults to FALSE. Should existing ppmf12 in Renviron be overwritten? |
install |
Defaults to FALSE. Should ppmf12 be added to '~/.Renviron' file? |
Value
path, invisibly
Examples
## Not run:
tp <- tempfile(fileext = '.csv')
add_ppmf12_path(tp)
path12 <- Sys.getenv('ppmf12')
## End(Not run)
Add ppmf19 path to Renviron
Description
Add ppmf19 path to Renviron
Usage
add_ppmf19_path(path, overwrite = FALSE, install = FALSE)
Arguments
path |
path where ppmf19 data is stored |
overwrite |
Defaults to FALSE. Should existing ppmf19 in Renviron be overwritten? |
install |
Defaults to FALSE. Should ppmf19 be added to '~/.Renviron' file? |
Value
path, invisibly
Examples
## Not run:
tp <- tempfile(fileext = '.csv')
add_ppmf19_path(tp)
path19 <- Sys.getenv('ppmf19')
## End(Not run)
Add ppmf4 path to Renviron
Description
Add ppmf4 path to Renviron
Usage
add_ppmf4_path(path, overwrite = FALSE, install = FALSE)
Arguments
path |
path where ppmf4 data is stored |
overwrite |
Defaults to FALSE. Should existing ppmf4 in Renviron be overwritten? |
install |
Defaults to FALSE. Should ppmf4 be added to '~/.Renviron' file? |
Value
path, invisibly
Examples
## Not run:
tp <- tempfile(fileext = '.csv')
add_ppmf4_path(tp)
path4 <- Sys.getenv('ppmf4')
## End(Not run)
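The three path helpers above store a file location in an environment variable so that later calls can find the data. A minimal base R sketch of that round trip for the current session follows. The variable name `ppmf19` is an assumption inferred from the default `paths = Sys.getenv(paste0("ppmf", versions))` in `read_merge_ppmf()`, and nothing here writes to '~/.Renviron'.

```r
# Stand-in for a real PPMF file location.
tp <- tempfile(fileext = ".csv")

# Session-only setting (assumed variable name: ppmf19).
Sys.setenv(ppmf19 = tp)

# Look the path up the same way the read_merge_ppmf() default does.
versions <- c("19")
paths <- Sys.getenv(paste0("ppmf", versions))
print(paths)
```

A value set this way lasts only for the running session, which is presumably why the helpers offer `install = TRUE` for persistence.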
Aggregate PPMF Data
Description
Aggregate PPMF Data
Usage
agg(ppmf, group = GEOID, age = VOTING_AGE, race = CENRACE, hisp = CENHISP)
Arguments
ppmf |
tibble of ppmf data |
group |
Column in ppmf to group by, typically GEOID |
age |
Column in ppmf containing 1 for not voting age and 2 for voting age |
race |
Column in ppmf containing race codes |
hisp |
Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic |
Value
tibble of ppmf data aggregated by group with race classified with columns:
- group: named by entry group
- pop: total population
- pop_hisp: total population - Hispanic or Latino (of any race)
- pop_white: total population - White alone, not Hispanic or Latino
- pop_black: total population - Black or African American alone, not Hispanic or Latino
- pop_aian: total population - American Indian and Alaska Native alone, not Hispanic or Latino
- pop_asian: total population - Asian alone, not Hispanic or Latino
- pop_nhpi: total population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
- pop_other: total population - Some Other Race alone, not Hispanic or Latino
- pop_two: total population - Population of two or more races, not Hispanic or Latino
- vap: voting age population
- vap_hisp: voting age population - Hispanic or Latino (of any race)
- vap_white: voting age population - White alone, not Hispanic or Latino
- vap_black: voting age population - Black or African American alone, not Hispanic or Latino
- vap_aian: voting age population - American Indian and Alaska Native alone, not Hispanic or Latino
- vap_asian: voting age population - Asian alone, not Hispanic or Latino
- vap_nhpi: voting age population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
- vap_other: voting age population - Some Other Race alone, not Hispanic or Latino
- vap_two: voting age population - Population of two or more races, not Hispanic or Latino
Examples
data(ppmf_ex)
ppmf_ex <- ppmf_ex %>% add_geoid()
blocks <- agg(ppmf_ex)
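Because each PPMF row is one person, aggregation amounts to counting rows per group. The base R sketch below computes four of the columns listed above (pop, pop_hisp, vap, vap_hisp) on invented data, using the codings documented in the Arguments section (2 marks voting age and Hispanic). It is a sketch of the idea only, not the package's code, and the GEOID values "A" and "B" are placeholders.

```r
# Toy person-level rows; GEOID values are placeholders.
toy <- data.frame(
  GEOID      = c("A", "A", "A", "B", "B"),
  VOTING_AGE = c(2, 1, 2, 2, 2),  # 1 = not voting age, 2 = voting age
  CENHISP    = c(1, 1, 2, 2, 1),  # 1 = Not Hispanic, 2 = Hispanic
  stringsAsFactors = FALSE
)

# One indicator column per output count.
toy$pop      <- 1L
toy$pop_hisp <- as.integer(toy$CENHISP == 2)
toy$vap      <- as.integer(toy$VOTING_AGE == 2)
toy$vap_hisp <- toy$vap * toy$pop_hisp

# Sum the indicators within each group.
counts <- aggregate(
  toy[c("pop", "pop_hisp", "vap", "vap_hisp")],
  by = list(GEOID = toy$GEOID),
  FUN = sum
)
print(counts)
```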
Breakdown GEOID into Components
Description
Breakdown GEOID into Components
Usage
breakdown_geoid(ppmf, GEOID = GEOID)
Arguments
ppmf |
tibble of ppmf data |
GEOID |
Column in ppmf with GEOID. Default is GEOID. |
Value
tibble. ppmf with columns added for state, county, tract, block group, and/or block
Examples
data(ppmf_ex)
ppmf_ex <- ppmf_ex %>% add_geoid()
ppmf_ex <- ppmf_ex %>% breakdown_geoid()
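The reverse operation can be sketched with fixed-width substrings, since a 15-character block GEOID is laid out as state (2), county (3), tract (6), and block (4), with the block group being the first digit of the block. The output column names below are assumptions for illustration, and the GEOID values are made up apart from the 01105 county prefix.

```r
# Two illustrative 15-character block GEOIDs.
geoid <- c("011056868001004", "011056870002011")

# Slice each component out by position.
parts <- data.frame(
  state       = substr(geoid, 1, 2),
  county      = substr(geoid, 3, 5),
  tract       = substr(geoid, 6, 11),
  block_group = substr(geoid, 12, 12),
  block       = substr(geoid, 12, 15),
  stringsAsFactors = FALSE
)
print(parts)
```

Shorter GEOIDs (county, tract, or block group level) would yield only the leading components, which matches the "and/or" wording in the Value section.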
Download PPMF Files
Description
Downloads zipped ppmf files from GitHub.
Usage
download_ppmf(dsn, dir = "", version = "19", overwrite = FALSE)
Arguments
dsn |
(data save name) string to unzip the data to |
dir |
the folder or directory to save the file in |
version |
string in '19', '12' or '4' signifying the 19.61, 12.2 or 4.5 versions respectively |
overwrite |
If a file is found at dir/dsn, should it be overwritten? Defaults to FALSE. |
Value
a string path to where the file was downloaded to
Examples
## Not run:
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf(dsn = 'ppmf_12', dir = temp)
## End(Not run)
Get PPMF File Links
Description
Returns the urls for the data. This will be expanded to link to prior or any new releases.
Usage
get_ppmf_links(version = "19", release = "06.08.2021", compressed = TRUE)
Arguments
version |
string in '19', '12' or '4' signifying the 19.61, 12.2, or 4.5 versions respectively |
release |
string. Ignored. Options are '06.08.2021' and '04.28.2021'. |
compressed |
boolean. Return a compressed version (TRUE). FALSE gives the Census Bureau link to the uncompressed data. |
Value
a string with url
Examples
# 06.08.2021 version 19.61 (the default)
get_ppmf_links()
# 04.28.2021 version 4.5
get_ppmf_links(version = '4')
Overwrite Races with Hispanic
Description
Overwrite Races with Hispanic
Usage
overwrite_hisp_race(ppmf, race = CENRACE, hisp = CENHISP)
Arguments
ppmf |
tibble of ppmf data |
race |
Column in ppmf containing race codes |
hisp |
Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic |
Value
tibble with race column entries replaced if the individual is Hispanic
Examples
data(ppmf_ex)
ppmf_ex %>% replace_race() %>% overwrite_hisp_race()
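The overwrite step can be pictured as a conditional replacement: wherever the Hispanic flag is 2, the race entry is swapped for a Hispanic label. The sketch below uses base R on invented rows. The labels "white", "black", "asian", and "hisp" are assumptions for illustration and may differ from what the package writes.

```r
# Toy rows after race codes have been replaced by group labels.
toy <- data.frame(
  CENRACE = c("white", "black", "asian", "white"),
  CENHISP = c(1, 2, 1, 2),  # 1 = Not Hispanic, 2 = Hispanic
  stringsAsFactors = FALSE
)

# Replace the race entry only for Hispanic individuals.
toy$CENRACE <- ifelse(toy$CENHISP == 2, "hisp", toy$CENRACE)
print(toy)
```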
Example PPMF Data
Description
Includes Perry County, Alabama PPMF data from the April 28, 2021 PPMF data release. This is a subset taken from the 12-2 P data.
Because each observation is a person, this data does not cover every block in the county. In addition, due to the Disclosure Avoidance System (DAS), not every block with population appears in this data.
Usage
data('ppmf_ex')
Value
tibble with sample ppmf data
Examples
data('ppmf_ex')
Race Classifications
Description
This data includes the basic race classifications used for redistricting, which reduce the census race codes to an easier to work with set of values. This does not include Hispanic grouping, which the census records separately from race.
Usage
data('races')
Value
tibble with three columns
code: the two digit code used to code races
desc: the description of the races
group: the summary group used
Examples
data('races')
Read PPMF data and Merge with Census 2010 Data
Description
Read PPMF data and Merge with Census 2010 Data
Usage
read_merge_ppmf(
state,
level,
versions = c("19"),
prefixes = paste0("v", versions, "_"),
paths = Sys.getenv(paste0("ppmf", versions))
)
Arguments
state |
state abbreviation |
level |
geography level. One of 'block', 'block group', 'tract', 'county' |
versions |
character vector of ppmf versions. Currently '19', '12', and/or '4' |
prefixes |
prefixes to give pop and vap columns in output. Default is paste0("v", versions, "_"). |
paths |
paths to PPMF data. Default is Sys.getenv(paste0("ppmf", versions)). |
Value
sf tibble of PPMF merged with Census 2010 data
Examples
## Not run:
# Requires Census Bureau API
de_bg <- read_merge_ppmf('DE', 'block group')
## End(Not run)
Read in PPMF Data
Description
This reads in PPMF data from a file. Use download_ppmf() if you do not have a local copy of the ppmf data.
Usage
read_ppmf(state, path)
Arguments
state |
two letter state (+ DC + PR) abbreviation or two digit state fips code |
path |
path where the ppmf data is saved |
Value
tibble of ppmf data
Examples
## Not run:
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf('ppmf_12.csv', dir = temp)
# If you already have it downloaded, point to it with path:
ppmf <- read_ppmf('AL', path)
## End(Not run)
Replace Race Categories
Description
Replaces the Census's numeric categories for race with less specific racial classifications, typically useful for redistricting purposes.
Usage
replace_race(ppmf, race = CENRACE)
Arguments
ppmf |
tibble of ppmf data |
race |
Column in ppmf containing race codes |
Value
tibble with race column replaced by simpler racial classifications
Examples
data(ppmf_ex)
ppmf_ex %>% replace_race()
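Replacing detailed codes with broader groups is essentially a table lookup against something shaped like the `races` data (code, desc, group). The sketch below does this with `match()` in base R. The three-row lookup and its group labels are illustrative assumptions, not the contents of `races`.

```r
# Illustrative lookup; the real codes and groups live in the races data.
lookup <- data.frame(
  code  = c("01", "02", "07"),
  group = c("white", "black", "two"),
  stringsAsFactors = FALSE
)

# Toy person-level race codes.
toy <- data.frame(
  CENRACE = c("02", "01", "07", "01"),
  stringsAsFactors = FALSE
)

# Map each code to its group; codes missing from the lookup become NA.
toy$CENRACE <- lookup$group[match(toy$CENRACE, lookup$code)]
print(toy)
```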
State Rows
Description
This data includes the 52 geographies (50 states plus D.C. and P.R.). Within the 2010 PPMF, skip and n_max indicate the relevant rows for a geography.
Usage
data('states')
Value
tibble with the skip and n_max values for each geography
Examples
data('states')
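One plausible use of skip and n_max is reading only one geography's rows out of a large national file. The base R sketch below shows the idea on a tiny temporary CSV, with `read.csv()` arguments `skip` and `nrows` standing in for the roles of skip and n_max. The `rows` lookup is hypothetical, and whether the packaged skip values count the header line is not stated here, so the `+ 1` is an assumption of this sketch.

```r
# Tiny stand-in for a national file sorted by state FIPS code.
toy <- data.frame(
  TABBLKST = c("01", "01", "01", "02", "02"),
  id       = 1:5,
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Hypothetical row lookup: data rows to skip and to read per state.
rows <- data.frame(
  state = c("AL", "AK"),
  skip  = c(0, 3),
  n_max = c(3, 2),
  stringsAsFactors = FALSE
)
pick <- rows[rows$state == "AK", ]

# Skip the header plus the preceding data rows, then read n_max rows.
slice <- read.csv(
  path,
  skip       = pick$skip + 1,
  nrows      = pick$n_max,
  header     = FALSE,
  col.names  = names(toy),
  colClasses = c("character", "integer")
)
print(slice)
```

Reading a slice this way avoids loading rows for the other 51 geographies into memory.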