Type: | Package |
Title: | Rapid Automatic Keyword Extraction (RAKE) Algorithm |
Version: | 0.1.3 |
Description: | A 'Java' implementation of the RAKE algorithm ('Rose', S., 'Engel', D., 'Cramer', N. and 'Cowley', W. (2010) <doi:10.1002/9780470689646.ch1>), which can be used to extract keywords from documents without any training data. |
URL: | https://crew102.github.io/slowraker/articles/rapidraker.html |
BugReports: | https://github.com/crew102/rapidraker/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Depends: | R (≥ 3.1) |
Imports: | rJava, openNLPdata, slowraker, utils |
Suggests: | knitr, rmarkdown, testthat |
SystemRequirements: | Java (>= 8) |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | no |
Packaged: | 2021-06-01 23:35:45 UTC; cbaker |
Author: | Christopher Baker [aut, cre] |
Maintainer: | Christopher Baker <chriscrewbaker@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2021-06-02 07:20:05 UTC |
Rapid RAKE
Description
A relatively fast version of the Rapid Automatic Keyword Extraction (RAKE) algorithm. See Automatic keyword extraction from individual documents for details on how RAKE works.
Usage
rapidrake(
txt,
stop_words = slowraker::smart_words,
stop_pos = c("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"),
word_min_char = 3,
stem = TRUE,
phrase_delims = "[-,.?():;\"!/]"
)
Arguments
txt |
A character vector, where each element of the vector contains the text for one document. |
stop_words |
A vector of stop words which will be removed from your
documents. The default value ( |
stop_pos |
All words that have a part-of-speech (POS) that appears in
|
word_min_char |
The minimum number of characters that a word must have
to remain in the corpus. Words with fewer than |
stem |
Do you want to stem the words before running RAKE? |
phrase_delims |
A regular expression containing the characters that will be used as phrase delimiters |
Value
An object of class rakelist
, which is just a list of data
frames (one data frame for each element of txt
). Each data frame
will have the following columns:
- keyword
A keyword that was identified by RAKE.
- freq
The number of times the keyword appears in the document.
- score
The keyword's score, as per the RAKE algorithm. Keywords with higher scores are considered to be higher quality than those with lower scores.
- stem
If you specified
stem = TRUE
, you will get the stemmed versions of the keywords in this column. When you choose stemming, the keyword's score (score
) will be based off its stem, but the reported number of times that the keyword appears (freq
) will still be based off of the raw, unstemmed version of the keyword.
Examples
## Not run:
rakelist <- rapidrake(txt = "some text that has great keywords")
slowraker::rbind_rakelist(rakelist)
## End(Not run)