Type: | Package |
Title: | A Stemming Algorithm for the Portuguese Language |
Version: | 0.2.0 |
Maintainer: | Daniel Falbel <dfalbel@gmail.com> |
Description: | Implements the "Stemming Algorithm for the Portuguese Language" <doi:10.1109/SPIRE.2001.10024>. |
URL: | https://github.com/dfalbel/rslp |
License: | MIT + file LICENSE |
LazyData: | TRUE |
Encoding: | UTF-8 |
RoxygenNote: | 7.0.2 |
Imports: | stringr, stringi, plyr, magrittr, tokenizers |
Suggests: | dplyr, testthat, covr |
NeedsCompilation: | no |
Packaged: | 2020-05-11 14:22:26 UTC; dfalbel |
Author: | Daniel Falbel [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2020-05-11 14:40:03 UTC |
Pipe operator
Description
See %>% in the magrittr package for more details.
Usage
lhs %>% rhs
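Examples
A minimal sketch of the pipe in use, assuming the rslp package is installed and attached. The input words are taken from the rslp() example elsewhere in this manual; the variable name stems is illustrative only.

```r
library(rslp)

# Pipe a character vector into rslp(): lhs %>% rhs is equivalent to rhs(lhs).
stems <- c("gostou", "gosto", "gostaram") %>% rslp()
stems
```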
Apply rules
Description
Applies the rules of the named step to a word.
Usage
apply_rules(word, name, steprules)
Arguments
word |
word to which you want to apply the rules |
name |
the rule name. Possible values are: 'Plural', 'Feminine', 'Adverb', 'Augmentative', 'Noun', 'Verb', 'Vowel'. |
steprules |
steprules as obtained from the function extract_rules. |
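Examples
A hedged sketch of applying a single step, assuming the rslp package is installed and that apply_rules is exported. The rules are loaded from the parsed file that the other functions use as their default; the word "casas" and the variable names are illustrative choices, not part of the original documentation.

```r
library(rslp)

# Load the parsed rules shipped with the package (the default used by rslp()).
steprules <- readRDS(system.file("steprules.rds", package = "rslp"))

# Apply only the 'Plural' step to one word.
reduced <- apply_rules("casas", name = "Plural", steprules = steprules)
reduced
```

Running one step at a time like this can help when inspecting how an individual rule set treats a word before the full algorithm is applied.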
Extract raw rules
Description
Separate the seven kinds of rules
Usage
extract_raw_rules(raw_rules)
Arguments
raw_rules |
a character value containing the raw rules. |
Extract replacement rules
Description
Parses the raw replacement rules.
Usage
extract_replacement_rules(raw_repl)
Arguments
raw_repl |
the part with replacement rules for each step rule. |
Extract Rule Info
Description
Extract all info for one rule
Usage
extract_rule_info(rule)
Arguments
rule |
the rule from which you want to extract information |
Extract Rules from file
Description
This function parses the rules that are available in the RSLP C source. The rules file has been downloaded and is installed with the package. Its path can be found using system.file("steprules.txt", package = "rslp"). A parsed version is also installed with the package, and its path can be found using system.file("steprules.rds", package = "rslp").
Usage
extract_rules(path = system.file("steprules.txt", package = "rslp"))
Arguments
path |
path to the raw steprules. Most of the time you do not need to change it. |
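Examples
A short sketch of parsing the raw rules file, assuming the rslp package is installed. The call to names() assumes the result is a named list with one element per step, which is not stated in this manual; the variable names are illustrative.

```r
library(rslp)

# Locate the raw rules file installed with the package.
path <- system.file("steprules.txt", package = "rslp")

# Parse it into the structure expected by the steprules arguments.
steprules <- extract_rules(path)

# Inspect the parsed steps (assumes a named list).
names(steprules)
```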
Extract Rules Info
Description
Extract all info from all rules
Usage
extract_rules_info(rules)
Arguments
rules |
rules parsed before by extract_rule_info |
Remove Accents
Description
A wrapper for the stringi package.
Usage
remove_accents(s)
Arguments
s |
the string from which you want to remove accents |
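Examples
A minimal sketch, assuming the rslp package is installed and the source file is read as UTF-8. The input words are illustrative.

```r
library(rslp)

# Strip diacritics from Portuguese words before further processing.
plain <- remove_accents(c("coração", "maçã"))
plain
```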
RSLP
Description
Apply the Stemming Algorithm for the Portuguese Language to a vector of words.
Usage
rslp(
words,
steprules = readRDS(system.file("steprules.rds", package = "rslp"))
)
Arguments
words |
vector of words that you want to stem. |
steprules |
as obtained from the function extract_rules. Only set this if you are certain about it; the default reads the parsed version of the rules installed with the package. |
References
V. Orengo, C. Huyck, "A Stemming Algorithm for the Portuguese Language", SPIRE 2001, String Processing and Information Retrieval, International Symposium on, 2001, pp. 0186, doi:10.1109/SPIRE.2001.10024
Examples
words <- c("gostou", "gosto", "gostaram")
rslp(words)
RSLP_
Description
Apply the Stemming Algorithm for the Portuguese Language to a word.
Usage
rslp_(
word,
steprules = readRDS(system.file("steprules.rds", package = "rslp"))
)
Arguments
word |
word to be stemmed. |
steprules |
as obtained from the function extract_rules. |
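Examples
A minimal sketch of stemming a single word, assuming the rslp package is installed. The word is taken from the rslp() example in this manual; the variable name stem is illustrative.

```r
library(rslp)

# Stem one word using the default rules installed with the package.
stem <- rslp_("gostaram")
stem
```

For more than one word, rslp() accepts a vector directly.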
RSLP Document
Description
Apply the Stemming Algorithm for the Portuguese Language to a vector of documents. It extracts words using the regex "\b[:alpha:]\b".
Usage
rslp_doc(
docs,
steprules = readRDS(system.file("steprules.rds", package = "rslp"))
)
Arguments
docs |
character vector of documents |
steprules |
as obtained from the function extract_rules. Only set this if you are certain about it; the default reads the parsed version of the rules installed with the package. |
References
V. Orengo, C. Huyck, "A Stemming Algorithm for the Portuguese Language", SPIRE 2001, String Processing and Information Retrieval, International Symposium on, 2001, pp. 0186, doi:10.1109/SPIRE.2001.10024
Examples
docs <- c("coma frutas pois elas fazem bem para.")
rslp_doc(docs)
Verify
Description
Given a list of suffixes, returns a logical vector indicating whether the word has each of the suffixes.
Usage
verify_sufix(word, rep_rules)
Arguments
word |
word against which you wish to verify the replacement rules |
rep_rules |
data.frame of rules as specified in steprules$replacement_rule |
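Examples
A hedged sketch of checking a word against one step's suffixes, assuming the rslp package is installed and that verify_sufix is exported. It also assumes that the parsed rules are a list keyed by step name and that each step carries a replacement_rule data.frame; that layout is inferred from the argument descriptions and is not confirmed by this manual. The word "casas" and the variable names are illustrative.

```r
library(rslp)

# Load the parsed rules shipped with the package.
steprules <- readRDS(system.file("steprules.rds", package = "rslp"))

# Assumed layout: one element per step, each holding a replacement_rule data.frame.
plural_rules <- steprules[["Plural"]]$replacement_rule

# One TRUE/FALSE per suffix, indicating whether the word has that suffix.
has_suffix <- verify_sufix("casas", plural_rules)
has_suffix
```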