Type: | Package |
Title: | Google's Compact Language Detector 3 |
Version: | 1.6.1 |
Description: | Google's Compact Language Detector 3 is a neural network model for language identification and the successor of 'cld2' (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from 'cld2'. See https://github.com/google/cld3#readme for more information. |
License: | Apache License 2.0 |
Encoding: | UTF-8 |
URL: | https://docs.ropensci.org/cld3/ https://ropensci.r-universe.dev/cld3 |
BugReports: | https://github.com/ropensci/cld3/issues |
Imports: | Rcpp |
LinkingTo: | Rcpp |
RoxygenNote: | 6.0.1.9000 |
SystemRequirements: | libprotobuf and protobuf-compiler |
Suggests: | testthat, cld2 |
NeedsCompilation: | yes |
Packaged: | 2024-10-03 14:12:28 UTC; jeroen |
Author: | Jeroen Ooms |
Maintainer: | Jeroen Ooms <jeroenooms@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-10-04 15:30:02 UTC |
Compact Language Detector 3
Description
The function detect_language() is vectorised and guesses the language of each string in text, or returns NA if the language could not be reliably determined. The function detect_language_mixed() is not vectorised and detects all languages inside the entire character vector as a whole.
Usage
detect_language(text)
detect_language_mixed(text, size = 3)
Arguments
text | a string with text to classify or a connection to read from |
size | number of languages to detect |
Examples
# Vectorised best guess
text <- c("To be or not to be?", "Ce n'est pas grave.",
"Hij heeft de klok horen luiden maar weet niet waar de klepel hangt.")
detect_language(text)
# Multiple languages in one text (doesn't seem to work well)
detect_language_mixed(text)
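The package description suggests combining these results with those from 'cld2'. The following is a minimal sketch of one way that might be done, not a method prescribed by either package: agree_or_na() is a hypothetical helper that keeps a guess only when both detectors return the same code. It assumes that cld2::detect_language() with its default arguments returns language codes comparable to those of cld3, and the detection step runs only when both packages are installed.

```r
# Hypothetical helper (not part of cld3 or cld2): keep a guess only when
# both detectors return the same non-missing language code.
agree_or_na <- function(a, b) {
  stopifnot(length(a) == length(b))
  ifelse(!is.na(a) & !is.na(b) & a == b, a, NA_character_)
}

text <- c("To be or not to be?", "Ce n'est pas grave.")

# Run the detectors only when both packages are available.
if (requireNamespace("cld3", quietly = TRUE) &&
    requireNamespace("cld2", quietly = TRUE)) {
  guess3 <- cld3::detect_language(text)  # neural network model
  guess2 <- cld2::detect_language(text)  # Bayesian classifier
  print(data.frame(text, cld3 = guess3, cld2 = guess2,
                   agreed = agree_or_na(guess3, guess2)))
}
```

Requiring agreement trades coverage for precision: strings on which the two models disagree, or which either model cannot classify, come back as NA and can be reviewed separately.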