Help for package poldis

Type:

Package

Title:

Analyse Political Texts

Version:

0.1.2

Date:

2024-09-03

Maintainer:

Henrique Sposito <henrique.sposito@graduateinstitute.ch>

Description:

Wrangle and annotate different types of political texts. It also introduces Urgency Analysis, a new method for the analysis of urgency in political texts.

URL:

http://henriquesposito.com/poldis/

BugReports:

https://github.com/henriquesposito/poldis/issues

License:

MIT + file LICENSE

Imports:

dplyr, stringr, purrr, stringi, quanteda, spacyr, textstem, tidyr, stringdist

Suggests:

rmarkdown, testthat, tesseract, quanteda.textstats, keyATM, messydates, pdftools, fmsb, ggplot2, tm, cli

RoxygenNote:

7.3.2

Encoding:

UTF-8

LazyData:

True

Depends:

R (≥ 3.5.0)

NeedsCompilation:

Packaged:

2024-09-04 16:00:04 UTC; henriquesposito

Author:

Henrique Sposito [cre, aut, ctb] (IHEID), James Hollway [ctb] (IHEID), Jael Tan [ctb] (IHEID)

Repository:

CRAN

Date/Publication:

2024-09-04 16:30:02 UTC

US News Conferences Data from 1960 to 1980

Description

A dataset containing the news conferences held by US presidents from 1960 to 1980. The data were gathered from the American Presidency Project website.

Usage

data(US_News_Conferences_1960_1980)

Format

A data frame with 353 rows and 3 variables: the president, the date, and the full text.

Annotate text with NLP

Description

This function relies on '{spacyr}' NLP parsing to annotate texts.

Usage

annotate_text(v, level = "words")

Arguments

v

Text vector

level

At which level would you like to parse the text? Options include "words" or "sentences". Defaults to "words".

Value

A data frame with syntax information by words or sentences in text.

Examples

#annotate_text(US_News_Conferences_1960_1980[1:2, 3])
#annotate_text(US_News_Conferences_1960_1980[1:2, 3], level = "sentences")

Extract context for string matches

Description

A function for getting string matches and the context in which they occur.

Usage

extract_context(match, v, level = "sentences", n = 1)

Arguments

match

Character string to be matched. For multiple strings, please use "|" as a separator.

v

Text vector or annotated data frame.

level

At which text level do you want matches to be returned? Options are "sentences", "words", and "paragraph". Defaults to "sentences".

n

Number of sentences or words to return before and after each string match. Defaults to 1, that is, one word or one sentence before and after the match. For paragraphs, n is always set to one.

Value

A list of string matches and their context.

Examples


extract_context(match = "war|weapons of mass destruction|conflict|NATO|peace",
                v = US_News_Conferences_1960_1980$text[100],
                level = "sentences", n = 2)
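The role of `n` can be illustrated with a minimal base R sketch at the word level. This is not the package's implementation; the example text and the windowing logic are assumptions made only to show what "n words before and after a match" means.

```r
# Hypothetical text and pattern, for illustration only.
text <- "We seek peace and not war today"
match <- "peace|war"   # "|" separates alternative strings
n <- 1                 # words of context on each side

# Split into words and locate the words that match the pattern.
words <- strsplit(text, "\\s+")[[1]]
hits <- grep(match, words, ignore.case = TRUE)

# For each hit, keep n words before and after, clipped to the text bounds.
context <- lapply(hits, function(i) {
  window <- seq(max(1, i - n), min(length(words), i + n))
  paste(words[window], collapse = " ")
})
context
```

Each element of `context` holds one match together with its neighbouring words.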

Extract dates from text

Description

Wrapper function for 'messydates::as_messydates'.

Usage

extract_date(v)

Arguments

v

Text vector.

Value

A vector of the dates in text.

Examples

#extract_date("Today is the twenty six of February of two thousand and twenty four")

Extract locations from strings

Description

Extract locations from strings

Usage

extract_locations(v)

Arguments

v

Text vector.

Details

The function relies on geographical entity detection from NLP models.

Value

A data frame of locations and the number of times they appear.

Examples

#extract_locations(c("This is the United States", "This is Sao Paulo",
#"I was in Rio de Janeiro and Sao Paulo, then back to the United States"))

Extract text matches

Description

Get texts in which certain "matches" occur.

Usage

extract_match(v, match, invert = FALSE, ignore.case = TRUE)

Arguments

v

Text vector or annotated data frame.

match

A regex match for a word(s) or expression. For multiple words, please use "|" to divide them.

invert

Do you want texts without certain matches to be returned? By default FALSE.

ignore.case

Should case be ignored? By default, TRUE.

Value

A list of the same length as the text variable.

Examples


extract_match(c("This function was created on the 29 September 2021",
"Today is October 12, 2021"), "October")
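The effect of the `match`, `invert`, and `ignore.case` arguments can be approximated with base R's `grepl()`. The sketch below only illustrates regex alternation and inverted selection; it is an assumption about the general idea, not the function's actual code.

```r
texts <- c("This function was created on the 29 September 2021",
           "Today is October 12, 2021")

# "|" separates alternatives; ignore.case = TRUE makes "october" match "October".
pattern <- "october|november"
hits <- grepl(pattern, texts, ignore.case = TRUE)

texts[hits]    # texts containing a match
texts[!hits]   # the inverted selection: texts without a match
```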

Extract a list of possible names of individuals in texts

Description

Extract a list of possible names of individuals in texts

Usage

extract_names(v)

Arguments

v

A text vector.

Details

The function relies on named entity recognition from NLP models.

Value

A data frame of individual names and the number of times they appear.

Examples

#extract_names(US_News_Conferences_1960_1980[20, 3])

Extract similarities and differences in texts/segments

Description

Extract similarities and differences in texts/segments

Usage

extract_text_similarities(v, comparison = "similarities", method)

Arguments

v

Text vector or annotated data frame.

comparison

How would you like to compare texts? Options are "similarities", for comparing similarities, or "differences", for comparing differences. Defaults to "similarities".

method

A method for checking similarities or differences between texts. For similarities, defaults to the "correlation" method. Other methods for similarities include "cosine", "jaccard", "ejaccard", "dice", "edice", "simple matching", and "hamann". For differences, defaults to "euclidean". Other methods for differences include "manhattan", "maximum", "canberra", and "minkowski". For more information on each of these methods and the implications of selecting one, please see '?quanteda.textstats::textstat_simil()'.

Value

A matrix of similarity scores between texts.

Examples

#extract_text_similarities(US_News_Conferences_1960_1980[1:2,3])
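To give a sense of what a similarity and a difference measure compute, here is a minimal base R sketch of the "cosine" and "euclidean" measures on word counts. The two short texts and the whitespace tokenisation are assumptions for illustration; the function itself may preprocess texts differently.

```r
# Two hypothetical texts that differ in a single word.
texts <- c("we will act now", "we will act later")

# Count how often each vocabulary word occurs in each text.
tokens <- strsplit(tolower(texts), "\\s+")
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(tokens,
                   function(tok) as.integer(table(factor(tok, levels = vocab))),
                   integer(length(vocab))))
colnames(counts) <- vocab

# Cosine similarity: dot product divided by the product of vector lengths.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
# Euclidean distance: square root of the summed squared differences.
euclidean <- function(x, y) sqrt(sum((x - y)^2))

cos_sim <- cosine(counts[1, ], counts[2, ])      # 3 shared words out of 4 each: 0.75
euc_dist <- euclidean(counts[1, ], counts[2, ])  # two differing counts: sqrt(2)
```

Higher cosine values indicate more similar texts, while higher Euclidean values indicate more different texts.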

Extract first sentence from text

Description

A lot of information is contained in the first sentence of a text. In political texts, for example, dates and locations are often contained in the first sentence of the text.

Usage

extract_title(v)

Arguments

v

Text vector.

Value

A list of the first sentences in text.

Examples

extract_title("This is the first sentence. This is the second sentence.")

Gather terms related to subjects

Description

Gather terms related to subjects

Usage

gather_related_terms(.data, dictionary)

Arguments

.data

A data frame, a priorities data frame coded using 'select_priorities()', or a text vector. For data frames, the function searches for a "text" variable. For priorities data frames, the function searches for a "priorities" variable.

dictionary

The dictionary of 20 major political topics from the Comparative Agendas Project (Jones et al., 2023) is used by default. Users can also declare a custom dictionary as a vector or a list. If users declare a vector, each element is treated as an independent topic. If users declare a list of subjects and related terms, the function treats the names as topics and the words as terms.

Details

This function relies on keyword assisted topic models implemented in the '{keyATM}' package to find related words based on the topics provided and texts in which they appear.

Value

A list of terms related to each of the topics declared in the dictionary.

References

Eshima S, Imai K, and Sasaki T. 2024. “Keyword-Assisted Topic Models.” _American Journal of Political Science_, 68(2): 730-750. doi:10.1111/ajps.12779

Examples

#gather_related_terms(US_News_Conferences_1960_1980[1:5, 3], dictionary = "CAP")
#gather_related_terms(US_News_Conferences_1960_1980[1:5, 3],
#                     dictionary = c("military", "development"))
#gather_related_terms(US_News_Conferences_1960_1980[1:5, 3],
#                     dictionary = list("military" = c("military", "gun", "war"),
#                                       "development" = c("development", "interest rate", "banks")))
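The two custom dictionary shapes can be put on a common footing as shown below. `as_topic_list()` is a hypothetical helper written for this illustration, not a function exported by the package; it only shows how a vector of topics relates to a named list of topics and terms.

```r
# Hypothetical helper: represent either dictionary shape as a named list.
as_topic_list <- function(dictionary) {
  if (is.list(dictionary)) return(dictionary)  # names are topics, values are terms
  # Each vector element becomes its own topic, with itself as the only term.
  stats::setNames(as.list(dictionary), dictionary)
}

from_vector <- as_topic_list(c("military", "development"))
from_list <- as_topic_list(list("military" = c("military", "gun", "war"),
                                "development" = c("development", "interest rate", "banks")))

names(from_vector)  # topics taken from the vector elements
names(from_list)    # topics taken from the list names
```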

Gather topic from political discourses

Description

Gather topic from political discourses

Usage

gather_topics(.data, dictionary = "CAP")

Arguments

.data

dictionary

Value

A list of the topics present in each text, separated by commas.

Examples


gather_topics(US_News_Conferences_1960_1980[1:5, 3])
gather_topics(US_News_Conferences_1960_1980[1:5, 3],
              dictionary = c("military", "development"))
gather_topics(US_News_Conferences_1960_1980[1:5, 3],
              dictionary = list("military" = c("military", "gun", "war"),
                                "development" = c("development", "interest rate", "banks")))
#summary(gather_topics(US_News_Conferences_1960_1980[1:5, 3]))
#plot(gather_topics(US_News_Conferences_1960_1980[1:5, 3],
#                   dictionary = c("military", "development")))

Urgency Analysis

Description

Urgency Analysis

Usage

get_urgency(.data, summarise = "sum")

Arguments

.data

summarise

How to handle multiple matches for the same dimension in the same text observation? By default, multiple matches are added together and their "sum" per text observation is returned. Users can, instead, choose the "mean" which returns the average score per dimension per text observation when there are multiple matches. The "mean" can also be used as a form of normalization per dimension and text observation in certain cases.

Details

Urgency in political discourses is an expression of how necessary and/or how soon an action should be undertaken or completed. This is measured along four dimensions, two related to necessity and two related to timing. The first two dimensions, degree of intensity and degree of commitment, relate to the necessity of taking the action, while the next two dimensions, frequency of action and timing of action, relate to when the action is taken. Our dictionary includes terms for each of these dimensions.

The terms included in each dimension were validated and adjusted through an online survey that took place between July and August of 2024. The survey results were recorded as counts of the number of participants who selected an urgency-related word as more urgent than its pair. To analyze the survey results, we employed Bradley-Terry models for paired comparisons. The analysis produced a ranking of the words for each dimension of urgency, which was then used to create the urgency word scores in the dictionaries. For more information on the dimensions, scores, or the survey on urgency, please run 'get_urgency()' to access the urgency codebook.

For priorities (i.e. coded using 'select_priorities()'), urgency scores are calculated by multiplying the commitment scores by all other dimensions. This is done because commitment words are indicative of political priorities. For more information, please refer to the 'select_priorities()' function. For vectors or data frames, urgency scores are calculated by adding the commitment and intensity dimension scores (i.e. how necessary) and multiplying the result by the sum of the timing and frequency dimension scores (i.e. how soon). In both cases, a zero urgency score indicates no urgency, but maximum scores can vary.
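The scoring rule for vectors and data frames, together with the "sum" and "mean" options of `summarise`, can be sketched in base R as follows. The word scores are hypothetical and are not taken from the package dictionaries, and treating a dimension without matches as zero is an assumption of this sketch.

```r
# Hypothetical per-match scores for one text observation.
matches <- list(
  commitment = c(0.6, 0.8),   # two commitment matches
  intensity  = 1.2,
  timing     = 1.4,
  frequency  = numeric(0)     # no frequency match
)

# Collapse multiple matches per dimension by "sum" or "mean".
summarise_dim <- function(x, how = c("sum", "mean")) {
  how <- match.arg(how)
  if (length(x) == 0) return(0)  # assumption: no match counts as zero
  if (how == "sum") sum(x) else mean(x)
}

# (commitment + intensity) * (timing + frequency), as described for vectors.
urgency_score <- function(matches, how = "sum") {
  s <- vapply(matches, summarise_dim, numeric(1), how = how)
  unname((s["commitment"] + s["intensity"]) * (s["timing"] + s["frequency"]))
}

score_sum <- urgency_score(matches, "sum")    # (1.4 + 1.2) * (1.4 + 0)
score_mean <- urgency_score(matches, "mean")  # (0.7 + 1.2) * (1.4 + 0)
```

Because the two sums are multiplied, a text without any timing or frequency match scores zero under this rule, however strong its commitment or intensity terms are.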

Value

A scored data frame for each dimension of urgency.

Examples


get_urgency(US_News_Conferences_1960_1980[1:10, 3])
get_urgency(US_News_Conferences_1960_1980[1:10,])
#get_urgency(select_priorities(US_News_Conferences_1960_1980[1:2, 3]))
#summary(get_urgency(US_News_Conferences_1960_1980[1:10, 3]))
#plot(get_urgency(US_News_Conferences_1960_1980[1:10, 3]))
#get_urgency()

Read text from PDFs

Description

Read text from PDFs

Usage

read_pdf(path)

Arguments

path

The path to a PDF file or a folder containing multiple PDFs.

Value

A list of texts.

Select future priorities from political discourses

Description

Political priorities are statements in which actors express their intent or commitment to take political action in the future.

Usage

select_priorities(.data, na.rm = TRUE)

Arguments

.data

An (annotated) data frame or text vector. For data frames, the function searches for a "text" variable. For annotated data frames, please declare a data frame annotated at the sentence level.

na.rm

Would you like political statements that do not contain a political action to be removed? By default, TRUE.

Value

A data frame with syntax information by sentences and a variable identifying which of these sentences are priorities.

Examples

#select_priorities(US_News_Conferences_1960_1980[1:2,3])

Simulating urgency in priorities

Description

Simulating urgency in priorities

Usage

sim_urgency(urgency, commitment, intensity, timing, frequency, pronoun = "We")

Arguments

urgency

Desired urgency score, optional.

commitment

Desired commitment score, optional.

intensity

Desired intensity score, optional.

timing

Desired timing score, optional.

frequency

Desired frequency score, optional.

pronoun

How would you like the simulated priorities to start? By default, priorities start with the pronoun "We".

Details

Users can declare either an urgency score or a score for one or more of the urgency dimensions, but not both at once. If both are declared, the urgency score is favored.

Value

A sentence that matches the urgency or urgency dimension scores.

Examples


sim_urgency()
sim_urgency(urgency = 0.5)
sim_urgency(urgency = 2.5)
sim_urgency(urgency = -2.5)
sim_urgency(commitment = 0.6)
sim_urgency(commitment = 0.6, intensity = 1.4)
sim_urgency(commitment = 0.6, intensity = 1.4, timing = 1.4)
sim_urgency(commitment = 0.6, intensity = 1.2, timing = 1.4, frequency = 1.8)

Split texts

Description

Split texts into structured lists of lists according to a split sign.

Usage

split_text(v, splitsign = "\\.")

Arguments

v

Text vector or annotated data frame.

splitsign

Where do you want to split? By default, texts are split into sentences (at "."). The split sign can also be a word, sign, or other marker. Special characters must be escaped with a double backslash (e.g. "\\." for a literal period).

Value

A list of lists of the same length as the vector.

Examples


split_text("This is the first sentence. This is the second sentence.")
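Why the default `splitsign` is written as "\\." rather than "." can be seen with base R's `strsplit()`, which also treats the split sign as a regular expression. This sketch uses `strsplit()` only to illustrate escaping and makes no claim about how the function splits texts internally.

```r
text <- "This is the first sentence. This is the second sentence."

# Escaped: "\\." matches a literal period, so the text splits into sentences.
escaped <- strsplit(text, "\\.")[[1]]
trimws(escaped)

# Unescaped: "." matches any character, so every character is a split point
# and only empty strings remain.
unescaped <- strsplit("a.b", ".")[[1]]
```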