Title: | Measuring Discursive Sophistication in Open-Ended Survey Responses |
Version: | 0.1.1 |
Description: | A simple approach to measure political sophistication based on open-ended survey responses. Discursive sophistication captures the complexity of individual attitude expression by quantifying its relative size, range, and constraint. For more information on the measurement approach see: Kraft, Patrick W. 2023. "Women Also Know Stuff: Challenging the Gender Gap in Political Sophistication." American Political Science Review (forthcoming). |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Depends: | R (≥ 2.10) |
LazyData: | true |
Imports: | SnowballC, stm, stringr, tm, utils |
NeedsCompilation: | no |
Packaged: | 2023-06-11 08:29:00 UTC; patrick |
Author: | Patrick Kraft |
Maintainer: | Patrick Kraft <kraft.pw@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2023-06-11 11:50:05 UTC |
Cooperative Congressional Election Study 2018
Description
A subset of data from the UWM Team Content of the 2018 CCES wave. See Kraft (2023) for details.
Usage
cces
Format
cces
A data frame with 1,000 rows and 15 columns:
- age
Age (in years)
- female
Gender (1 = female)
- educ_cont
Education level (1-6)
- pid_cont
Party identification (1-7)
- educ_pid
Interaction term (educ_cont * pid_cont)
- oe01-oe10
Open-ended responses
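The open-ended items are stored in columns oe01 to oe10, and the examples in this manual select them with c(paste0("oe0", 1:9), "oe10"). As a small base-R aside (not a function of this package), the same vector can be built with zero-padded sprintf():

```r
# Two equivalent ways to build the item names oe01, ..., oe10
openends_a <- c(paste0("oe0", 1:9), "oe10")
openends_b <- sprintf("oe%02d", 1:10)

identical(openends_a, openends_b)
# [1] TRUE
```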
Source
Constraint Dictionary
Description
A sample of terms that signal a higher level of constraint between different considerations (combining conjunctions and exclusive words). See Kraft (2023) for details.
Usage
dict_sample
Format
dict_sample
A character vector with 4 elements:
- conjunctions
also, and
- exclusive
but, without
Compute discursive sophistication for a set of open-ended responses
Description
This function takes a data frame (data) containing a set of open-ended responses (openends) to compute the three components of discursive sophistication (size, range, and constraint) and combines them in a single scale. See Kraft (2023) for details.
Usage
discursive(
data,
openends,
meta,
args_textProcessor = NULL,
args_prepDocuments = NULL,
args_stm = NULL,
keep_stm = TRUE,
dictionary,
remove_duplicates = FALSE,
type = c("scale", "average", "average_scale", "product"),
progress = TRUE
)
Arguments
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
meta |
A character vector containing topic prevalence covariates included in data. |
args_textProcessor |
A named list containing additional arguments passed to stm::textProcessor(). |
args_prepDocuments |
A named list containing additional arguments passed to stm::prepDocuments(). |
args_stm |
A named list containing additional arguments passed to stm::stm(). |
keep_stm |
Logical. If TRUE, the function also returns the output of stm::textProcessor(), stm::prepDocuments(), and stm::stm(). |
dictionary |
A character vector containing dictionary terms to flag conjunctions and exclusive words. May include regular expressions. |
remove_duplicates |
Logical. If TRUE duplicates in |
type |
The method of combining the three components, must be "scale", "average", "average_scale", or "product". The default is "scale", which creates an additive index that is re-scaled to mean 0 and standard deviation 1. Alternatively, "average" creates the same additive index without re-scaling; "average_scale" re-scales each individual component to mean 0 and standard deviation 1 before creating the additive index; "product" creates a multiplicative index. |
progress |
Logical. Shows progress bar if TRUE. |
Value
A list containing the measure of discursive sophistication and the underlying components in a data frame, as well as the output of stm::textProcessor(), stm::prepDocuments(), and stm::stm().
Examples
discursive(data = cces,
openends = c(paste0("oe0", 1:9), "oe10"),
meta = c("age", "educ_cont", "pid_cont", "educ_pid", "female"),
args_prepDocuments = list(lower.thresh = 10),
args_stm = list(K = 25, seed = 12345),
dictionary = dict_sample)
Combine three components of discursive sophistication in a single scale
Description
This function combines the size, range, and constraint components of open-ended responses in a single scale. See Kraft (2023) for details.
Usage
discursive_combine(
size,
range,
constraint,
type = c("scale", "average", "average_scale", "product")
)
Arguments
size |
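A named list containing an element labeled size, which holds the size component of discursive sophistication. Usually created via discursive_size(). |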
range |
A numeric vector containing the range component of discursive sophistication. Usually created via discursive_range(). |
constraint |
A numeric vector containing the constraint component of discursive sophistication. Usually created via discursive_constraint(). |
type |
The method of combining the three components, must be "scale", "average", "average_scale", or "product". The default is "scale", which creates an additive index that is re-scaled to mean 0 and standard deviation 1. Alternatively, "average" creates the same additive index without re-scaling; "average_scale" re-scales each individual component to mean 0 and standard deviation 1 before creating the additive index; "product" creates a multiplicative index. |
Value
A numeric vector with the same length as the component vectors (one value per respondent).
Examples
discursive_combine(size = list(size = runif(100)), range = runif(100), constraint = runif(100))
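To make the four type options concrete, here is a minimal base-R sketch of the combination rules as worded in the type argument above. It is a hypothetical re-implementation, not the package's code: it takes plain numeric vectors (rather than a list for size) and assumes the additive index is the mean of the three components, which may differ from the package's exact choice.

```r
# Hypothetical sketch of the four `type` rules; discursive_combine() may
# differ in details (e.g. sum vs. mean for the additive index).
combine_sketch <- function(size, range, constraint,
                           type = c("scale", "average", "average_scale", "product")) {
  type <- match.arg(type)
  # Standardize a vector to mean 0 and standard deviation 1
  z <- function(v) (v - mean(v)) / stats::sd(v)
  switch(type,
    # additive index, then re-scaled
    scale         = z((size + range + constraint) / 3),
    # additive index without re-scaling
    average       = (size + range + constraint) / 3,
    # standardize each component first, then average
    average_scale = (z(size) + z(range) + z(constraint)) / 3,
    # multiplicative index
    product       = size * range * constraint
  )
}

set.seed(12345)
sz <- runif(100); rg <- runif(100); ct <- runif(100)
scaled <- combine_sketch(sz, rg, ct, type = "scale")
c(mean = mean(scaled), sd = sd(scaled))  # approximately 0 and 1
```

Note that "scale" and "average_scale" differ only in when standardization happens: after summing versus before summing.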
Compute the constraint component of discursive sophistication
Description
This function takes a data frame (data) containing a set of open-ended responses (openends) and a dictionary to identify terms that signal a higher level of constraint between different considerations (usually conjunctions and exclusive words). It returns a numeric vector of dictionary counts re-scaled to range from 0 to 1. See Kraft (2023) for details.
Usage
discursive_constraint(data, openends, dictionary, remove_duplicates = FALSE)
Arguments
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
dictionary |
A character vector containing dictionary terms to flag conjunctions and exclusive words. May include regular expressions. |
remove_duplicates |
Logical. If TRUE duplicates in |
Value
A numeric vector with the same length as the number of rows in data.
Examples
discursive_constraint(data = cces,
openends = c(paste0("oe0", 1:9), "oe10"),
dictionary = dict_sample)
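The description above amounts to counting dictionary matches per respondent and re-scaling the counts. The base-R sketch below illustrates that idea under stated assumptions: whole-word, case-insensitive matching across all items and min-max re-scaling to [0, 1]. The function name constraint_sketch() and the toy responses are invented for illustration; this is not the package's implementation.

```r
# Hypothetical sketch (not the package's code): count dictionary terms per
# respondent and min-max re-scale the counts to the [0, 1] interval.
constraint_sketch <- function(data, openends, dictionary) {
  # Collapse each respondent's answers into one lower-case string
  text <- tolower(apply(data[, openends, drop = FALSE], 1, paste, collapse = " "))
  # Whole-word matching, so that "and" does not match inside "candidate"
  pattern <- paste0("\\b(", paste(dictionary, collapse = "|"), ")\\b")
  hits <- lengths(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
  if (max(hits) == min(hits)) return(rep(0, length(hits)))
  (hits - min(hits)) / (max(hits) - min(hits))
}

# Toy responses (invented for illustration)
toy <- data.frame(
  oe01 = c("taxes are too high", "I like the candidate", "healthcare and jobs"),
  oe02 = c("but schools matter", "no opinion", "also immigration, but not without reform"),
  stringsAsFactors = FALSE
)
res <- constraint_sketch(toy, c("oe01", "oe02"), c("also", "and", "but", "without"))
res
```

The word boundaries matter for short terms such as "and"; because dictionary entries are pasted into a single alternation, regular-expression entries would also work inside this pattern.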
Compute the range component of discursive sophistication
Description
This function takes a data frame (data) containing a set of open-ended responses (openends) to compute the Shannon entropy in individual response lengths across items. The function returns a numeric vector of entropy values re-scaled to range from 0 to 1. See Kraft (2023) for details.
Usage
discursive_range(data, openends)
Arguments
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
Value
A numeric vector with the same length as the number of rows in data.
Examples
discursive_range(data = cces,
openends = c(paste0("oe0", 1:9), "oe10"))
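A minimal base-R sketch of the range idea: count the words each respondent gives per item, turn the counts into shares, and compute their Shannon entropy divided by log(J), where J is the number of items. Dividing by log(J) to reach the 0 to 1 range, treating missing answers as empty, and the names entropy_sketch() and range_sketch() are all assumptions for illustration; the package may re-scale differently.

```r
# Hypothetical sketch (not the package's code): normalized Shannon entropy of
# one respondent's word counts across items (0 = all words in a single item,
# 1 = words spread evenly over all items). Requires at least two items.
entropy_sketch <- function(x) {
  x[is.na(x)] <- ""
  n_words <- lengths(regmatches(x, gregexpr("\\S+", x)))  # words per item
  if (sum(n_words) == 0) return(0)
  p <- n_words / sum(n_words)
  p <- p[p > 0]                        # convention: 0 * log(0) = 0
  -sum(p * log(p)) / log(length(x))    # divide by the maximum entropy log(J)
}

# Apply the entropy row-wise, one value per respondent
range_sketch <- function(data, openends) {
  apply(data[, openends, drop = FALSE], 1, entropy_sketch)
}

# Toy responses (invented for illustration)
toy <- data.frame(
  oe01 = c("lower taxes", "schools roads health care", ""),
  oe02 = c("more jobs", "", ""),
  stringsAsFactors = FALSE
)
res <- range_sketch(toy, c("oe01", "oe02"))
res
```

The first toy respondent splits four words evenly over two items (entropy 1), while the second puts all words into one item (entropy 0).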
Compute the size component of discursive sophistication
Description
This function takes a data frame (data) containing a set of open-ended responses (openends) and additional arguments passed to stm::textProcessor() and stm::prepDocuments() to estimate a structural topic model via stm::stm(). The results of the structural topic model are used to compute the relative number of topics raised in each open-ended response. The size component is a numeric vector of topic counts re-scaled to range from 0 to 1. See Kraft (2023) for details.
Usage
discursive_size(
data,
openends,
meta,
args_textProcessor = NULL,
args_prepDocuments = NULL,
args_stm = NULL,
keep_stm = TRUE,
progress = TRUE
)
Arguments
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
meta |
A character vector containing topic prevalence covariates included in data. |
args_textProcessor |
A named list containing additional arguments passed to stm::textProcessor(). |
args_prepDocuments |
A named list containing additional arguments passed to stm::prepDocuments(). |
args_stm |
A named list containing additional arguments passed to stm::stm(). |
keep_stm |
Logical. If TRUE, the function also returns the output of stm::textProcessor(), stm::prepDocuments(), and stm::stm(). |
progress |
Logical. Shows progress bar if TRUE. |
Value
A list containing the size component of discursive sophistication as well as the output of stm::textProcessor(), stm::prepDocuments(), and stm::stm().
Examples
discursive_size(data = cces,
openends = c(paste0("oe0", 1:9), "oe10"),
meta = c("age", "educ_cont", "pid_cont", "educ_pid", "female"),
args_prepDocuments = list(lower.thresh = 10),
args_stm = list(K = 25, seed = 12345))
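Arguments such as args_stm = list(K = 25, seed = 12345) are named lists of extra arguments. A common R idiom for forwarding such lists is to merge them into the required arguments with utils::modifyList() and call the target function via do.call(). The sketch below shows that idiom with a hypothetical stand-in, toy_fit(), instead of stm::stm(); the helper call_with_extras() is also hypothetical, and whether discursive_size() forwards its arguments exactly this way is an assumption.

```r
# Hypothetical stand-in for a model-fitting function such as stm::stm()
toy_fit <- function(documents, K = 10, seed = NULL, verbose = TRUE) {
  list(n_docs = length(documents), K = K, seed = seed, verbose = verbose)
}

# Hypothetical helper: merge required arguments with an optional user list
# (user entries override required entries of the same name), then call `fun`
call_with_extras <- function(fun, required, extras = NULL) {
  if (is.null(extras)) extras <- list()
  do.call(fun, utils::modifyList(required, extras))
}

fit <- call_with_extras(toy_fit,
                        required = list(documents = list("a", "b", "c"), verbose = FALSE),
                        extras = list(K = 25, seed = 12345))
str(fit)
```

Passing extras as a single list keeps the wrapper's own signature short while still exposing every argument of the underlying function.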
Compute number of topics based on stm results
Description
This function takes a structural topic model output estimated via stm::stm() as well as the underlying set of documents created via stm::prepDocuments() to compute the relative number of topics raised in each open-ended response. The function returns a numeric vector of topic counts re-scaled to range from 0 to 1. See Kraft (2023) for details.
Usage
ntopics(x, docs, progress = TRUE)
Arguments
x |
A structural topic model estimated via stm::stm(). |
docs |
A set of documents used for the structural topic model; created via stm::prepDocuments(). |
progress |
Logical. Shows progress bar if TRUE. |
Value
A numeric vector with the same length as the number of documents in x and docs.
Examples
meta <- c("age", "educ_cont", "pid_cont", "educ_pid", "female")
openends <- c(paste0("oe0", 1:9), "oe10")
cces$resp <- apply(cces[, openends], 1, paste, collapse = " ")
cces <- cces[!apply(cces[, meta], 1, anyNA), ]
processed <- stm::textProcessor(cces$resp, metadata = cces[, meta])
out <- stm::prepDocuments(processed$documents, processed$vocab, processed$meta, lower.thresh = 10)
stm_fit <- stm::stm(out$documents, out$vocab, prevalence = as.matrix(out$meta), K = 25, seed = 12345)
ntopics(stm_fit, out)
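One plausible way to count the topics raised in a response is to assign each word to its most likely topic given the document's topic proportions and the topic-word probabilities, and then count the distinct topics. The base-R sketch below does this on toy matrices; it is an assumption for illustration, not a description of how ntopics() works internally. In an stm fit, theta and beta would roughly correspond to stm_fit$theta and exp(stm_fit$beta$logbeta[[1]]), and each element of out$documents is a two-row matrix of word indices and counts that rep(d[1, ], d[2, ]) would expand into the index vectors used here; treat that mapping as an assumption too. Dividing by the largest count is likewise one assumed way of re-scaling to the 0 to 1 range.

```r
# Hypothetical sketch (not the package's code) of counting distinct topics.
# theta: N x K document-topic proportions; beta: K x V topic-word probabilities;
# docs:  list of integer vectors holding the vocabulary index of each word.
count_topics_sketch <- function(theta, beta, docs) {
  counts <- vapply(seq_along(docs), function(i) {
    w <- docs[[i]]
    if (length(w) == 0) return(0)
    # Unnormalized P(topic k | word, document i) = theta[i, k] * beta[k, w]
    post <- theta[i, ] * beta[, w, drop = FALSE]
    # Assign every word to its most likely topic, then count unique topics
    as.numeric(length(unique(apply(post, 2, which.max))))
  }, numeric(1))
  if (max(counts) == 0) return(counts)
  counts / max(counts)  # re-scale relative to the largest count
}

# Toy inputs: 2 documents, 2 topics, 3 vocabulary words
theta <- matrix(c(0.9, 0.1,
                  0.5, 0.5), nrow = 2, byrow = TRUE)
beta <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.2, 0.7), nrow = 2, byrow = TRUE)
docs <- list(c(1L, 3L), c(1L, 3L))
res <- count_topics_sketch(theta, beta, docs)
res
```

Both toy documents use the same two words, but the second document's balanced topic proportions let the two words land in different topics, so it raises more topics.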
Compute Shannon entropy
Description
Internal function to compute Shannon entropy in relative word counts across a set of elements in a character vector. Entropy is re-scaled to range from 0 to 1. Function used in discursive_range().
Usage
oe_shannon(x)
Arguments
x |
Character vector containing open-ended responses. |
Value
Numeric vector with the same length as x.
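For reference, one common way to write a Shannon entropy re-scaled to range from 0 to 1 is shown below, with w_{ij} the number of words respondent i gives in item j and J the number of items. Dividing by log J (the maximum attainable entropy) is an assumption here; the package may normalize differently.

```latex
p_{ij} = \frac{w_{ij}}{\sum_{l=1}^{J} w_{il}}, \qquad
H_i = -\frac{1}{\log J} \sum_{j=1}^{J} p_{ij} \log p_{ij}, \qquad
0 \le H_i \le 1 \quad (\text{with } 0 \log 0 := 0)
```

A respondent who writes equally long answers to all J items has p_{ij} = 1/J for every item and therefore H_i = 1; a respondent who answers only a single item has H_i = 0.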