Title: Search Data Frames for Personally Identifiable Information
Version: 1.3.0
Maintainer: Jacob Patterson-Stein <jacobpstein@gmail.com>
Description: Check a data frame for personal information, including names, location, disability status, and geo-coordinates.
License: MIT + file LICENSE
Encoding: UTF-8
Depends: R (≥ 2.10), dplyr, stringr, uuid, utils
RoxygenNote: 7.3.2
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
URL: https://github.com/jacobpstein/pii
BugReports: https://github.com/jacobpstein/pii/issues
NeedsCompilation: no
Packaged: 2025-01-11 19:55:50 UTC; jacobpstein
Author: Jacob Patterson-Stein [aut, cre]
Repository: CRAN
Date/Publication: 2025-01-13 15:40:06 UTC

Search Data Frames for Personally Identifiable Information

Description

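Searches a data frame for columns that may contain personally identifiable information (PII), such as names, location, disability status, and geo-coordinates.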

Usage

check_PII(df)

Arguments

df

a data frame object

Value

Returns a data frame of columns that potentially contain PII

Examples

# create a data frame containing various personally identifiable information
pii_df <- data.frame(
 lat = c(40.7128, 34.0522, 41.8781),
 long = c(-74.0060, -118.2437, -87.6298),
 first_name = c("John", "Michael", "Linda"),
 phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
 age = sample(30:60, 3, replace = TRUE),
 email = c("test@example.com", "contact@domain.com", "user@website.org"),
 disabled = c("No", "Yes", "No"),
 stringsAsFactors = FALSE
)

check_PII(pii_df)
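
The kind of check described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper `flag_pii_columns()` and both of its patterns are hypothetical, chosen only to show how column names and values might be screened.

```r
# Hypothetical helper, not part of the pii package: flags columns whose
# names or values look like PII, using illustrative patterns only.
flag_pii_columns <- function(df) {
  # Assumed name fragments; a real check would need a much broader list.
  name_pattern <- "name|phone|email|lat|long|disab"
  # Simple email-like pattern applied to character values.
  email_pattern <- "^[^@[:space:]]+@[^@[:space:]]+\\.[A-Za-z]+$"

  # TRUE where the column name contains one of the assumed fragments.
  name_hit <- grepl(name_pattern, names(df), ignore.case = TRUE)
  # TRUE where a character column holds at least one email-like value.
  value_hit <- vapply(
    df,
    function(col) is.character(col) && any(grepl(email_pattern, col)),
    logical(1)
  )

  # Keep only the columns flagged by either rule.
  data.frame(
    column = names(df),
    name_match = name_hit,
    value_match = unname(value_hit),
    stringsAsFactors = FALSE
  )[name_hit | value_hit, , drop = FALSE]
}

toy_df <- data.frame(
  first_name = c("John", "Linda"),
  contact = c("test@example.com", "user@website.org"),
  score = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

flagged <- flag_pii_columns(toy_df)
print(flagged)
```

Here `first_name` is flagged by its name and `contact` by its values, while `score` is left alone. Pattern matching of this kind produces false positives and misses, so flagged columns are candidates for review rather than confirmed PII.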

Split Data Into PII and Non-PII Columns

Description

Splits a data frame into two data frames: one containing the columns flagged as PII and one containing the remaining columns.

Usage

split_PII_data(df, exclude_columns = NULL)

Arguments

df

a data frame object

exclude_columns

columns to exclude from the data frame split

Value

Assigns two data frames to the global environment: one containing the PII columns and one without them. A unique merge key is created so that the two data frames can be joined back together. The function then prints the columns that were flagged and split to the console.

Examples

# create a data frame containing various personally identifiable information
pii_df <- data.frame(
 lat = c(40.7128, 34.0522, 41.8781),
 long = c(-74.0060, -118.2437, -87.6298),
 first_name = c("John", "Michael", "Linda"),
 phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
 age = sample(30:60, 3, replace = TRUE),
 email = c("test@example.com", "contact@domain.com", "user@website.org"),
 disabled = c("No", "Yes", "No"),
 stringsAsFactors = FALSE
)

split_PII_data(pii_df, exclude_columns = c("phone"))
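
The split-and-rejoin idea described under Value can be sketched in base R. This is an illustration under stated assumptions, not the package's code: `split_by_columns()` is a hypothetical helper, the PII columns are supplied by hand rather than detected, the key is a plain row index rather than a generated identifier, and the two data frames are returned in a list instead of being assigned to the global environment.

```r
# Hypothetical helper, not part of the pii package: splits a data frame
# into a PII part and a non-PII part that share a merge key.
split_by_columns <- function(df, pii_columns, key = "merge_key") {
  # The PII columns must exist and the key name must not clash.
  stopifnot(all(pii_columns %in% names(df)), !key %in% names(df))

  # Assumption: a row index is unique enough to serve as the key here.
  df[[key]] <- seq_len(nrow(df))
  non_pii_columns <- setdiff(names(df), c(pii_columns, key))

  list(
    pii = df[, c(key, pii_columns), drop = FALSE],
    non_pii = df[, c(key, non_pii_columns), drop = FALSE]
  )
}

toy_df <- data.frame(
  first_name = c("John", "Michael", "Linda"),
  phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
  age = c(34, 45, 56),
  stringsAsFactors = FALSE
)

parts <- split_by_columns(toy_df, pii_columns = c("first_name", "phone"))

# Rejoin the two halves on the shared key to recover the original columns.
rejoined <- merge(parts$pii, parts$non_pii, by = "merge_key")
print(rejoined)
```

Keeping the key in both halves means the non-PII part can be shared or analyzed on its own, while anyone holding the PII part can still restore the full records with `merge()`.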