Title: Data for Morpheme Tokenization
Version: 1.2.0
Description: Provides data about morphemes, the smallest units of meaning in a language.
License: Apache License (≥ 2)
Encoding: UTF-8
RoxygenNote: 7.1.2
URL: https://github.com/macmillancontentscience/morphemepiece.data
BugReports: https://github.com/macmillancontentscience/morphemepiece.data/issues
Suggests: testthat (≥ 3.0.0)
Depends: R (≥ 3.5.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2022-04-18 17:19:26 UTC; jonth
Author: Jonathan Bratt
Maintainer: Jon Harmon <jonthegeek@gmail.com>
Repository: CRAN
Date/Publication: 2022-04-18 17:42:28 UTC
Generate the inst path
Description
Generate the path to a data file stored in the package's inst directory.
Usage
.get_path(filetype, n_tokens)
Arguments
filetype | Character scalar; the type of file, such as "lookup" or "vocab".
n_tokens | Integer scalar; the number of tokens used for that file.
Value
Character scalar; the path to the file.
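The file-naming scheme behind .get_path() is not documented here. The sketch below shows how filetype and n_tokens could be combined into a single path. It rests on two assumptions. The helper get_path_sketch() and the pattern "mp_<filetype>_<n_tokens>.rds" are hypothetical. A real package would more likely resolve the directory with system.file(..., package = ...) than with a fixed base_dir.

```r
# Hypothetical sketch: the "mp_<filetype>_<n_tokens>.rds" naming pattern and
# the base_dir default are assumptions for illustration, not the package's scheme.
get_path_sketch <- function(filetype, n_tokens, base_dir = "rds") {
  stopifnot(is.character(filetype), length(filetype) == 1L)
  stopifnot(is.numeric(n_tokens), length(n_tokens) == 1L)
  # sprintf("%d") on an integer avoids scientific notation such as "1e+05".
  file_name <- sprintf("mp_%s_%d.rds", filetype, as.integer(n_tokens))
  file.path(base_dir, file_name)
}

get_path_sketch("vocab", 50000L)
#> [1] "rds/mp_vocab_50000.rds"
```

Formatting n_tokens with as.integer() and "%d" keeps a value such as 1e5 from turning into "1e+05" inside the file name.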
Load an RDS File from the inst Directory
Description
Load an R object saved as an RDS file in the package's inst directory.
Usage
.load_inst_rds(filetype, n_tokens)
Arguments
filetype | Character scalar; the type of file, such as "lookup" or "vocab".
n_tokens | Integer scalar; the number of tokens used for that file.
Value
The R object read from the file.
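.load_inst_rds() plausibly resolves a path for filetype and n_tokens and then reads it with base R's readRDS(). That is an assumption, since the body is not shown here. The sketch below covers only the reading step. To stay self-contained, it does a round trip through a temporary file. The helper name load_rds_sketch() is hypothetical.

```r
# Hypothetical sketch of the loading step: read a saved R object from an .rds path.
load_rds_sketch <- function(path) {
  # Fail with a clear message rather than readRDS()'s lower-level connection error.
  if (!file.exists(path)) {
    stop("No file found at: ", path, call. = FALSE)
  }
  readRDS(path)
}

# Round trip: save a small named integer vector, then read it back.
tmp <- tempfile(fileext = ".rds")
saveRDS(c(un = 0L, happy = 1L), tmp)
obj <- load_rds_sketch(tmp)
identical(obj, c(un = 0L, happy = 1L))
#> [1] TRUE
```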
Load a Morphemepiece Lookup
Description
A morphemepiece lookup is a named character vector. The names of the vector are the words, and the values are the space-separated morpheme breakdowns of those words.
Usage
morphemepiece_lookup()
Value
A named character vector.
Examples
head(morphemepiece_lookup())
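The lookup format described above is a named character vector. The names are words, and each value is a space-separated breakdown. The sketch below uses a small toy vector in that shape so it runs without the package. The entries are invented, and the real breakdowns may differ in notation. strsplit() turns each breakdown into a vector of morphemes and keeps the word names.

```r
# Toy stand-in for morphemepiece_lookup(): the entries are invented and
# only mimic the documented shape (names = words, values = breakdowns).
lookup <- c(
  unhappiness = "un happy ness",
  rethinking  = "re think ing"
)

# Look up a single word's breakdown.
lookup[["rethinking"]]
#> [1] "re think ing"

# Split every space-separated breakdown into its morphemes; the result is a
# list named by word.
breakdowns <- strsplit(lookup, " ", fixed = TRUE)
breakdowns[["unhappiness"]]  # c("un", "happy", "ness")
```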
Load a Morphemepiece Vocabulary
Description
A morphemepiece vocabulary is a named integer vector with class "morphemepiece_vocabulary". The names of the vector are the morphemes, and the values are the integer identifiers of those tokens. The vocabulary is 0-indexed for compatibility with Python implementations.
Usage
morphemepiece_vocab()
Value
A named integer vector of class "morphemepiece_vocabulary".
Examples
head(morphemepiece_vocab())
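Because the vocabulary is 0-indexed while R vectors are 1-indexed, converting an id back to a morpheme requires an offset of one. The sketch below builds a toy vocabulary with the documented layout and class. It then converts in both directions. The tokens and ids are invented. The helpers tokens_to_ids() and ids_to_tokens() are hypothetical. ids_to_tokens() assumes the ids run 0, 1, ..., n - 1 in vector order.

```r
# Toy stand-in for morphemepiece_vocab(): invented tokens and 0-based ids,
# following the documented layout (names = morphemes, values = ids).
vocab <- structure(
  c(un = 0L, happy = 1L, ness = 2L, think = 3L),
  class = "morphemepiece_vocabulary"
)

# Morphemes -> ids: index by name on the unclassed vector, then drop names.
tokens_to_ids <- function(tokens, vocab) {
  unname(unclass(vocab)[tokens])
}

# Ids -> morphemes: assuming ids 0..n-1 appear in order, id i sits at
# R position i + 1.
ids_to_tokens <- function(ids, vocab) {
  names(vocab)[ids + 1L]
}

tokens_to_ids(c("happy", "ness"), vocab)
#> [1] 1 2
ids_to_tokens(0L, vocab)
#> [1] "un"
```

If the ids were not stored in order, match(ids, unclass(vocab)) would find the positions without relying on the offset.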