Help for package ppmf

Title:

Read Census Privacy Protected Microdata Files

Version:

0.1.3

Date:

2021-12-13

Description:

Implements data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping race numeric identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

BugReports:

https://github.com/christopherkenny/ppmf/issues

URL:

https://github.com/christopherkenny/ppmf/, https://www.christophertkenny.com/ppmf/

RoxygenNote:

7.1.2

Imports:

censable, dplyr, magrittr, readr, rlang (≥ 0.4.11), stringr, tibble, tidyr, zip

Suggests:

roxygen2

Depends:

R (≥ 2.10)

NeedsCompilation:

Packaged:

2021-12-13 15:48:56 UTC; chris

Author:

Christopher T. Kenny [aut, cre]

Maintainer:

Christopher T. Kenny <christopherkenny@fas.harvard.edu>

Repository:

CRAN

Date/Publication:

2021-12-15 08:20:06 UTC

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).

Add Standard GEOID to PPMF Data

Description

Adds the GEOID identifier common to spatial census data sets, such as those loaded by tigris. This allows for easier merging or aggregation by a single variable.

Usage

add_geoid(
  ppmf,
  state = TABBLKST,
  county = TABBLKCOU,
  tract = TABTRACT,
  block_group = TABBLKGRP,
  block = TABBLK,
  level = "block"
)

Arguments

ppmf

tibble of ppmf data

state

Column in ppmf with state (fips) ID. Default is TABBLKST.

county

Column in ppmf with county (fips) ID. Default is TABBLKCOU.

tract

Column in ppmf with tract ID. Default is TABTRACT.

block_group

Column in ppmf with block group ID. Default is TABBLKGRP.

block

Column in ppmf with block ID. Default is TABBLK.

level

Geographic level to write the GEOID for. Options are block (default), block_group, tract, and county.

Value

input data ppmf with added column GEOID

Examples

data(ppmf_ex)
ppmf_ex <- ppmf_ex %>% add_geoid()
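
A standard census GEOID is the concatenation of zero-padded state (2 digits), county (3), tract (6), and block (4) codes, with the block group being the first digit of the block. The base R sketch below illustrates that layout only; build_geoid() is a hypothetical helper, not the implementation of add_geoid(), and the input values are made up.

```r
# Hypothetical helper (not part of ppmf): paste zero-padded components together.
build_geoid <- function(state, county, tract, block, level = "block") {
  # Left-pad each numeric component with zeros to its standard width
  pad <- function(x, width) {
    formatC(as.integer(x), width = width, flag = "0", format = "d")
  }
  st <- pad(state, 2)
  co <- pad(county, 3)
  tr <- pad(tract, 6)
  bl <- pad(block, 4)
  switch(level,
    county      = paste0(st, co),
    tract       = paste0(st, co, tr),
    block_group = paste0(st, co, tr, substr(bl, 1, 1)),
    block       = paste0(st, co, tr, bl),
    stop("unknown level: ", level)
  )
}

# Made-up component values
block_id <- build_geoid(1, 105, 686800, 1004)
tract_id <- build_geoid(1, 105, 686800, 1004, level = "tract")
```

Coarser levels are prefixes of finer ones, which is why one identifier column is enough for merging or aggregating at any level.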

Add ppmf12 path to Renviron

Description

Add ppmf12 path to Renviron

Usage

add_ppmf12_path(path, overwrite = FALSE, install = FALSE)

Arguments

path

path where ppmf12 data is stored

overwrite

Defaults to FALSE. Should existing ppmf12 in Renviron be overwritten?

install

Defaults to FALSE. Should ppmf12 be added to '~/.Renviron' file?

Value

path, invisibly

Examples

## Not run: 
tp <- tempfile(fileext = '.csv')
add_ppmf12_path(tp)
path12 <- Sys.getenv('ppmf12')

## End(Not run)
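
Stored paths are read back as environment variables. The base R sketch below shows only that general mechanism for the current session; the variable name ppmf_demo and the file name are hypothetical, and nothing is written to '~/.Renviron'.

```r
# Hypothetical variable name and path, set for the current session only
Sys.setenv(ppmf_demo = file.path(tempdir(), "ppmf_demo.csv"))

# Read the value back
demo_path <- Sys.getenv("ppmf_demo")

# A variable that was never set comes back as an empty string
missing_path <- Sys.getenv("ppmf_demo_not_set")
```

Checking nzchar() on the result is a simple way to tell whether a path has been configured.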

Add ppmf19 path to Renviron

Description

Add ppmf19 path to Renviron

Usage

add_ppmf19_path(path, overwrite = FALSE, install = FALSE)

Arguments

path

path where ppmf19 data is stored

overwrite

Defaults to FALSE. Should existing ppmf19 in Renviron be overwritten?

install

Defaults to FALSE. Should ppmf19 be added to '~/.Renviron' file?

Value

path, invisibly

Examples

## Not run: 
tp <- tempfile(fileext = '.csv')
add_ppmf19_path(tp)
path19 <- Sys.getenv('ppmf19')

## End(Not run)

Add ppmf4 path to Renviron

Description

Add ppmf4 path to Renviron

Usage

add_ppmf4_path(path, overwrite = FALSE, install = FALSE)

Arguments

path

path where ppmf4 data is stored

overwrite

Defaults to FALSE. Should existing ppmf4 in Renviron be overwritten?

install

Defaults to FALSE. Should ppmf4 be added to '~/.Renviron' file?

Value

path, invisibly

Examples

## Not run: 
tp <- tempfile(fileext = '.csv')
add_ppmf4_path(tp)
path4 <- Sys.getenv('ppmf4')

## End(Not run)

Aggregate PPMF Data

Description

Aggregate PPMF Data

Usage

agg(ppmf, group = GEOID, age = VOTING_AGE, race = CENRACE, hisp = CENHISP)

Arguments

ppmf

tibble of ppmf data

group

Column in ppmf to group by, typically GEOID

age

Column in ppmf containing 1 for not voting age and 2 for voting age

race

Column in ppmf containing race codes

hisp

Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic

Value

tibble of ppmf data aggregated by group with race classified with columns:

group: named by entry group
pop: total population
pop_hisp: total population - Hispanic or Latino (of any race)
pop_white: total population - White alone, not Hispanic or Latino
pop_black: total population - Black or African American alone, not Hispanic or Latino
pop_aian: total population - American Indian and Alaska Native alone, not Hispanic or Latino
pop_asian: total population - Asian alone, not Hispanic or Latino
pop_nhpi: total population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
pop_other: total population - Some Other Race alone, not Hispanic or Latino
pop_two: total population - Population of two or more races, not Hispanic or Latino
vap: voting age population
vap_hisp: voting age population - Hispanic or Latino (of any race)
vap_white: voting age population - White alone, not Hispanic or Latino
vap_black: voting age population - Black or African American alone, not Hispanic or Latino
vap_aian: voting age population - American Indian and Alaska Native alone, not Hispanic or Latino
vap_asian: voting age population - Asian alone, not Hispanic or Latino
vap_nhpi: voting age population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
vap_other: voting age population - Some Other Race alone, not Hispanic or Latino
vap_two: voting age population - Population of two or more races, not Hispanic or Latino

Examples

data(ppmf_ex)
ppmf_ex <- ppmf_ex %>% add_geoid()
blocks <- agg(ppmf_ex)
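
Because each row of PPMF data is one person, aggregated counts are sums of per-person indicators within each group. The base R sketch below shows the idea on made-up records for a few of the output columns; it is not the code used by agg(), and the group labels "A" and "B" are placeholders.

```r
# Toy person-level records; all values are made up for illustration
people <- data.frame(
  GEOID = c("A", "A", "A", "B", "B"),
  VOTING_AGE = c(2, 1, 2, 2, 2),  # 1 = not voting age, 2 = voting age
  CENHISP = c(1, 2, 1, 1, 2),     # 1 = Not Hispanic, 2 = Hispanic
  stringsAsFactors = FALSE
)

# One indicator column per count; every person contributes 1 to pop
ind <- cbind(
  pop = 1L,
  pop_hisp = people$CENHISP == 2,
  vap = people$VOTING_AGE == 2,
  vap_hisp = people$CENHISP == 2 & people$VOTING_AGE == 2
)

# Sum the indicators within each group
totals <- rowsum(ind, group = people$GEOID)
```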

Breakdown GEOID into Components

Description

Breakdown GEOID into Components

Usage

breakdown_geoid(ppmf, GEOID = GEOID)

Arguments

ppmf

tibble of ppmf data

GEOID

Column in ppmf with GEOID. Default is GEOID.

Value

tibble. ppmf with columns added for state, county, tract, block group, and/or block

Examples

data(ppmf_ex)
ppmf_ex <- ppmf_ex %>% add_geoid()
ppmf_ex <- ppmf_ex %>% breakdown_geoid()
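
Splitting a GEOID is the reverse of building one: each component sits at a fixed character position. The base R sketch below assumes a 15-digit block-level GEOID with made-up digits; the element names are illustrative and may not match the column names that breakdown_geoid() adds.

```r
geoid <- "011056868001004"  # made-up 15-digit block GEOID

# Fixed-width positions in a block-level GEOID
parts <- c(
  state = substr(geoid, 1, 2),
  county = substr(geoid, 3, 5),
  tract = substr(geoid, 6, 11),
  block_group = substr(geoid, 12, 12),  # first digit of the block code
  block = substr(geoid, 12, 15)
)
```

Shorter GEOIDs (county, tract, or block group level) contain only the leading components, which is presumably why the returned columns are described as "and/or".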

Download PPMF Files

Description

Downloads zipped ppmf files from GitHub.

Usage

download_ppmf(dsn, dir = "", version = "19", overwrite = FALSE)

Arguments

dsn

(data save name) string to unzip the data to

dir

the folder or directory to save the file in

version

string in '19', '12' or '4' signifying the 19.61, 12.2 or 4.5 versions respectively

overwrite

If a file is found at dir/dsn, should it be overwritten? Defaults to FALSE.

Value

a string path to where the file was downloaded

Examples

## Not run: 
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf(dsn = 'ppmf_12', dir = temp, version = '12')

## End(Not run)

Get PPMF File Links

Description

Returns the URLs for the data. This will be expanded to link to prior or any new releases.

Usage

get_ppmf_links(version = "19", release = "06.08.2021", compressed = TRUE)

Arguments

version

string in '19', '12' or '4' signifying the 19.61, 12.2, or 4.5 versions respectively

release

string. Currently ignored. Options are '06.08.2021' and '04.28.2021'.

compressed

boolean. Return a compressed version (TRUE). FALSE gives the Census Bureau link to the uncompressed data.

Value

a string with url

Examples

# 06.08.2021 version 19.61
get_ppmf_links()
# 04.28.2021 version 4.5
get_ppmf_links(version = '4')

Overwrite Races with Hispanic

Description

Overwrite Races with Hispanic

Usage

overwrite_hisp_race(ppmf, race = CENRACE, hisp = CENHISP)

Arguments

ppmf

tibble of ppmf data

race

Column in ppmf containing race codes

hisp

Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic

Value

tibble with race column entries replaced if the individual is Hispanic

Examples

data(ppmf_ex)
ppmf_ex %>% replace_race() %>% overwrite_hisp_race()
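
The overwrite step amounts to a conditional replacement: wherever the Hispanic column is 2, the race value is replaced by a Hispanic label. The base R sketch below is illustrative only; the label strings, including "hisp", are assumptions and may differ from what overwrite_hisp_race() writes.

```r
race <- c("white", "black", "asian", "white")  # hypothetical group labels
hisp <- c(1, 2, 1, 2)                          # 1 = Not Hispanic, 2 = Hispanic

# Replace the race label wherever the person is Hispanic
race_hisp <- ifelse(hisp == 2, "hisp", race)
```

This matches the aggregated output of agg(), where Hispanic counts are "of any race" and every other category is "not Hispanic or Latino".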

Example PPMF Data

Description

Includes Perry County, Alabama PPMF data from the April 28, 2021 PPMF data release. This is a subset taken from the 12-2 P data.

Because each observation is a person, this data does not cover every block in the county. Because of the Disclosure Avoidance System (DAS), not every block with population appears in this data.

Usage

data('ppmf_ex')

Value

tibble with sample ppmf data

Examples

data('ppmf_ex')

Race Classifications

Description

This data includes the basic race classifications used for redistricting, which map race codes to a smaller, easier to work with set of values. It does not include Hispanic grouping, which the census records separately from race.

Usage

data('races')

Value

tibble with three columns

code: the two digit code used to code races
desc: the description of the races
group: the summary group used

Examples

data('races')

Read PPMF data and Merge with Census 2010 Data

Description

Read PPMF data and Merge with Census 2010 Data

Usage

read_merge_ppmf(
  state,
  level,
  versions = c("19"),
  prefixes = paste0("v", versions, "_"),
  paths = Sys.getenv(paste0("ppmf", versions))
)

Arguments

state

state abbreviation

level

geography level. One of 'block', 'block group', 'tract', 'county'

versions

character vector of ppmf versions. Currently '19', '12', and/or '4'

prefixes

prefixes to give pop and vap columns in output. Default is paste0('v', versions, '_')

paths

paths to PPMF data. Default is Sys.getenv(paste0('ppmf', versions))

Value

sf tibble of PPMF merged with Census 2010 data

Examples

## Not run: 
# Requires Census Bureau API
de_bg <- read_merge_ppmf('DE', 'block group')

## End(Not run)
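
The defaults for prefixes and paths are derived from versions, so requesting several versions yields one prefix and one environment variable name per version. The base R lines below evaluate only the expressions shown in Usage; no data is read.

```r
versions <- c("19", "12")

# Column prefixes, one per requested version
prefixes <- paste0("v", versions, "_")

# Names of the environment variables consulted for the file paths
env_names <- paste0("ppmf", versions)
```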

Read in PPMF Data

Description

This reads in PPMF data from a file. Use download_ppmf() if you do not have a local copy of the ppmf data.

Usage

read_ppmf(state, path)

Arguments

state

two letter state (+ DC + PR) abbreviation or two digit state fips code

path

path to where the data is saved

Value

tibble of ppmf data

Examples

## Not run: 
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf('ppmf_12.csv', dir = temp, version = '12')
# If you already have it downloaded, point to it with path:
ppmf <- read_ppmf('AL', path)

## End(Not run)

Replace Race Categories

Description

Replaces the Census's numeric categories for race with less specific racial classifications, typically useful for redistricting purposes.

Usage

replace_race(ppmf, race = CENRACE)

Arguments

ppmf

tibble of ppmf data

race

Column in ppmf containing race codes

Value

tibble with race column replaced by simpler racial classifications

Examples

data(ppmf_ex)
ppmf_ex %>% replace_race()
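
Replacing codes with groups is a table lookup against a code-to-group mapping such as the races data set, which has code and group columns. The base R sketch below uses a made-up three-row lookup; the code values and group labels are assumptions for illustration and are not copied from the package data.

```r
# Made-up lookup with the same column roles as the races data set
lookup <- data.frame(
  code = c("01", "02", "07"),
  group = c("white", "black", "two"),
  stringsAsFactors = FALSE
)

codes <- c("02", "01", "07", "01")

# Find each code's row in the lookup and take its group
groups <- lookup$group[match(codes, lookup$code)]
```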

State Rows

Description

This data includes the 52 geographies (50 states plus D.C. and P.R.). Within the 2010 PPMF, skip and n_max indicate the relevant rows for a geography.

Usage

data('states')

Value

tibble of state row information, including skip and n_max

Examples

data('states')
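
Row offsets such as skip and n_max make it possible to read one geography's rows without loading the whole file. The sketch below shows the idea with base R's read.csv, where nrows plays the role of n_max. The file contents are made up, and whether the package's skip values count the header line is not stated here, so the offsets are assumptions.

```r
# Write a small made-up file: one header line plus five data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,state", "1,AA", "2,AA", "3,BB", "4,BB", "5,CC"), tmp)

# Column names come from the header line
cols <- names(read.csv(tmp, nrows = 1))

# Skip the header plus the first two data rows, then read two rows
bb <- read.csv(tmp, skip = 3, nrows = 2, header = FALSE,
               col.names = cols, stringsAsFactors = FALSE)
```

Because skipping discards the header, the column names are read once up front and supplied again through col.names.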