Type: Package
Title: Language Support for 'koRpus' Package: English
Depends: R (>= 3.1),koRpus (>= 0.11-2)
Imports: methods,sylly.en
Description: Adds support for the English language to the 'koRpus' package. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please consider subscribing to the koRpus-dev mailing list (https://korpusml.reaktanz.de).
License: GPL (>= 3)
Encoding: UTF-8
LazyLoad: yes
URL: https://reaktanz.de/?c=hacking&s=koRpus
BugReports: https://github.com/unDocUMeantIt/koRpus.lang.en/issues
Version: 0.1-4
Date: 2020-10-24
RoxygenNote: 7.1.1
NeedsCompilation: no
Packaged: 2020-10-24 13:28:33 UTC; m
Author: Meik Michalke [aut, cre], Elen Le Foll [ctb] (BNC tagset)
Maintainer: Meik Michalke <meik.michalke@hhu.de>
Repository: CRAN
Date/Publication: 2020-10-24 14:10:06 UTC

Language Support for 'koRpus' Package: English

Description

Adds support for the English language to the 'koRpus' package. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please consider subscribing to the koRpus-dev mailing list (<https://korpusml.reaktanz.de>).

Details

The DESCRIPTION file:

Package: koRpus.lang.en
Type: Package
Version: 0.1-4
Date: 2020-10-24
Depends: R (>= 3.1),koRpus (>= 0.11-2)
Encoding: UTF-8
License: GPL (>= 3)
LazyLoad: yes
URL: https://reaktanz.de/?c=hacking&s=koRpus

Author(s)

Meik Michalke [aut, cre], Elen Le Foll [ctb] (BNC tagset)

Maintainer: Meik Michalke <meik.michalke@hhu.de>

See Also

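Useful links:

https://reaktanz.de/?c=hacking&s=koRpus

Report bugs at https://github.com/unDocUMeantIt/koRpus.lang.en/issues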

Language support for English

Description

This function adds support for English to the koRpus package. You should not need to call it manually, because it is called automatically when this package is loaded.

Usage

lang.support.en(...)

Arguments

...

Optional arguments for set.lang.support.

Details

The POS tags combine tag definitions from multiple sources. Note that one tag, "PRP", is defined in both the PENN[3] and BNC[4] tagsets, but with different meanings: the PENN tag marks personal pronouns, whereas the BNC tag marks prepositions (except "of"). Because TreeTagger uses the conflicting tag only in its BNC parameter set, not in its PENN parameter set, koRpus also uses the BNC definition. Keep this in mind if you use this language support package with alternative taggers.
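
The "PRP" conflict can be illustrated with a minimal base R sketch. The helper flag_ambiguous_prp() and the sample tagger output are hypothetical and not part of koRpus; the sketch only assumes that an alternative tagger emits PENN-style tags, in which case every "PRP" token would be read by koRpus as a preposition rather than a personal pronoun.

```r
# Meanings of the conflicting tag, as described in the text above.
prp_meaning <- c(
  PENN = "personal pronoun",
  BNC  = "preposition (except \"of\")"
)

# Hypothetical helper (not part of koRpus): flag tokens whose "PRP" tag
# comes from a tagset other than BNC and would therefore be reinterpreted.
flag_ambiguous_prp <- function(tags, tagset = c("PENN", "BNC")) {
  tagset <- match.arg(tagset)
  tags == "PRP" & tagset != "BNC"
}

# Hypothetical output of a PENN-style tagger for "She sat on it".
tagged <- data.frame(
  token = c("She", "sat", "on", "it"),
  tag   = c("PRP", "VBD", "IN", "PRP"),
  stringsAsFactors = FALSE
)

# Mark the rows that need attention before the tags are handed to koRpus.
tagged$ambiguous <- flag_ambiguous_prp(tagged$tag, tagset = "PENN")
print(tagged)
message("Tagger meant: ", prp_meaning[["PENN"]],
        "; koRpus reads: ", prp_meaning[["BNC"]])
```

Flagged rows could then be reviewed or relabelled before further analysis; how to relabel them depends on the tagger in use and is left open here.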

In particular, this function adds the following:

Hyphenation patterns are provided by means of the sylly.en package.

References

[1] http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

[2] http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Penn-Treebank-Tagset.pdf

[3] https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

[4] http://www.natcorp.ox.ac.uk/docs/c5spec.html

Examples

lang.support.en()
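
Because the language support is registered when the package is loaded, typical use does not involve calling lang.support.en() directly. The following is a minimal sketch of that workflow, assuming koRpus and koRpus.lang.en are installed; the sample sentence is made up, and the block is skipped if the packages are missing.

```r
tokens <- NULL  # stays NULL if the packages are not installed

if (requireNamespace("koRpus.lang.en", quietly = TRUE)) {
  # Attaching the package registers English with koRpus;
  # no explicit lang.support.en() call is made here.
  library(koRpus.lang.en)

  tagged <- koRpus::tokenize(
    "The cat sat on the mat.",
    format = "obj",  # the text is passed as a character vector, not a file path
    lang   = "en"
  )

  # Extract the token table from the tagged text object.
  tokens <- koRpus::taggedText(tagged)
  print(head(tokens))
}
```

tokenize() needs no TreeTagger installation, so this is a quick way to check that English is available before setting up a full tagger.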