The text package gives R users access to Hugging Face Transformers through reticulate, which interfaces with Python. It relies on Python packages such as torch and transformers.
To use these features, you need to install:
• The text package in R
• A Python environment with the required packages
The easiest way is to use textrpp_install() to install a preconfigured Conda environment and then initialize it with textrpp_initialize().
library(text)
library(reticulate)
# Install text-required Python packages in a Conda environment
text::textrpp_install()
# Show available Conda environments
reticulate::conda_list()
# Initialize the installed Conda environment
text::textrpp_initialize(save_profile = TRUE)
# Test that textEmbed works
textEmbed("hello")
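If textEmbed() fails at this point, it can help to check whether the Python modules named above can be imported at all. The following is a minimal sketch, not part of the text package: check_python_modules() is a hypothetical helper, and it assumes reticulate is installed and already pointed at the intended environment. By default it relies on reticulate::py_module_available().

```r
# Hypothetical helper (not part of text): report which Python modules are importable.
# The checker is an argument so it can be swapped out; by default it uses reticulate.
check_python_modules <- function(modules,
                                 is_available = reticulate::py_module_available) {
  # Call the checker once per module; the result is a named logical vector
  status <- vapply(modules, is_available, logical(1))
  missing_modules <- names(status)[!status]
  if (length(missing_modules) > 0) {
    message("Not importable: ", paste(missing_modules, collapse = ", "))
  }
  status
}

# Example call, to be run after textrpp_initialize():
# check_python_modules(c("torch", "transformers"))
```

A module reported as not importable usually points to an environment problem rather than a problem in the R code.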
Can you install an R package like dplyr?
Can you install system-level tools like Python/miniconda?
If you do not have these permissions, please contact your administrator for advice.
After restarting R, functions like textEmbed() can stop working again.
Solution: Persist the initialization in your R profile by passing save_profile = TRUE to textrpp_initialize().
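The save_profile = TRUE argument used in the installation steps is what persists the setting. To see whether anything was actually written, you can inspect your R profile. The sketch below rests on two assumptions that this article does not confirm: that the settings go to the user-level profile (~/.Rprofile, or the file named by R_PROFILE_USER), and that the saved lines mention "textrpp" or "reticulate".

```r
# Assumption: the user-level profile is ~/.Rprofile unless R_PROFILE_USER overrides it
profile_path <- Sys.getenv(
  "R_PROFILE_USER",
  unset = path.expand(file.path("~", ".Rprofile"))
)

# Collect profile lines that look related to the Python setup (the pattern is a guess)
matches <- if (file.exists(profile_path)) {
  grep("textrpp|reticulate", readLines(profile_path, warn = FALSE),
       value = TRUE, ignore.case = TRUE)
} else {
  character(0)
}

if (length(matches) == 0) {
  message("No matching lines found in ", profile_path)
} else {
  print(matches)
}
```

If nothing is found, rerunning the initialization with save_profile = TRUE and restarting R is a reasonable next step.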
See the article Installing and Managing Python Environments with reticulate for detailed information.
If something isn’t working right, a good first step is to examine what is installed and running on your system.
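As a starting point, the sketch below gathers a few basic facts using base R only. The diagnostics list and its field names are illustrative and not part of the text package. Package presence is checked with system.file() so that neither package is loaded as a side effect.

```r
# Collect basic facts about the R session without loading text or starting Python
diagnostics <- list(
  r_version            = R.version.string,
  platform             = R.version$platform,
  text_installed       = nzchar(system.file(package = "text")),
  reticulate_installed = nzchar(system.file(package = "reticulate"))
)
str(diagnostics)

# If reticulate is installed, report whether Python is already running in this session
if (diagnostics$reticulate_installed) {
  python_running <- reticulate::py_available(initialize = FALSE)
  message("Python initialized in this session: ", python_running)
}

# For more detail, reticulate::py_config() can be run interactively;
# note that it initializes Python if it is not already running.
```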
Because the text package requires some system-level setup, installation is automatically verified on Windows, macOS, and Ubuntu through our GitHub Actions. If you encounter any issues, please review the tests and check the workflow file for details on system-specific installations.
To view the workflow file, select the three-dot menu on the right side of any GitHub Action run and choose View workflow file. This file specifies the operating systems, R versions, and additional libraries being tested.
It is also possible to use virtual environments (although this is currently tested only on macOS).
# Create a virtual environment with text-required Python packages.
# Note that you have to provide a Python path.
text::textrpp_install_virtualenv(
  rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
  python_path = "/usr/local/bin/python3.9",
  envname = "textrpp_virtualenv"
)
# Initialize the virtual environment.
text::textrpp_initialize(
  virtualenv = "textrpp_virtualenv",
  condaenv = NULL,
  save_profile = TRUE
)
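Before initializing, it can be useful to confirm that the virtual environment was actually created. A minimal sketch, assuming the environment name used above and that reticulate looks in its default virtualenv location. It uses reticulate::virtualenv_exists() and does nothing if reticulate is not installed.

```r
# Name of the environment created above
envname <- "textrpp_virtualenv"

# Only query reticulate if it is installed; otherwise the result stays NA
reticulate_installed <- nzchar(system.file(package = "reticulate"))
env_found <- if (reticulate_installed) {
  reticulate::virtualenv_exists(envname)
} else {
  NA
}

if (isTRUE(env_found)) {
  message("Virtual environment found: ", envname)
} else {
  message("Virtual environment not found (or reticulate is not installed): ", envname)
}
```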
Some macOS users may experience a crash when running functions like textEmbed() from the text package. This is due to a known conflict between multiple OpenMP libraries (e.g., libomp.dylib and libiomp5.dylib) used by Python packages such as torch and transformers.
Workaround (automatically applied): to prevent this crash, the package sets the following environment variables when running on macOS:
Sys.setenv(OMP_NUM_THREADS = "1")
Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1")
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE")
These settings:
- Limit the number of OpenMP threads
- Avoid nested threading issues
- Tell the OpenMP runtime to tolerate duplicate OpenMP libraries instead of aborting
This workaround is safe for most users and enables smooth functionality, but note that:
- It may slightly reduce parallel processing performance.
- It bypasses a system-level issue rather than solving it permanently.
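To see which of these variables are currently set in your R session, you can query them with base R. This sketch only reads the values and changes nothing. The variable names are the three listed above.

```r
# The three OpenMP-related variables mentioned above
omp_vars <- c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS", "KMP_DUPLICATE_LIB_OK")

# Named character vector; variables that are not set come back as NA
current <- Sys.getenv(omp_vars, unset = NA)
print(current)

# Names of the variables that are not set in this session
not_set <- names(current)[is.na(current)]
if (length(not_set) > 0) {
  message("Not set: ", paste(not_set, collapse = ", "))
}
```

Comparing the output before and after library(text) shows whether the workaround was applied in your session.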
The text package sets OpenMP-related environment variables for compatibility with PyTorch to avoid crashes due to libomp.dylib conflicts. You can skip this behavior by setting the following before loading the package:
Sys.setenv(TEXT_SKIP_OMP_PATCH = "TRUE")
library(text)
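The rule described here (apply the settings on macOS unless the opt-out is set) can be illustrated with a small function. This is a sketch of how such a check could look, not the package's actual code. should_apply_omp_patch() is a hypothetical name, and treating only the value "TRUE" as an opt-out is an assumption.

```r
# Hypothetical illustration of the opt-out rule; not taken from the text package
should_apply_omp_patch <- function(sysname = Sys.info()[["sysname"]],
                                   skip_flag = Sys.getenv("TEXT_SKIP_OMP_PATCH")) {
  on_macos  <- identical(sysname, "Darwin")           # macOS reports "Darwin"
  opted_out <- identical(toupper(skip_flag), "TRUE")  # assumption: only "TRUE" opts out
  on_macos && !opted_out
}

# Evaluate the rule for the current session
should_apply_omp_patch()
```

Because the flag is read from the environment, it has to be set before the package is loaded, which is why Sys.setenv() comes before library(text) above.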
The exact way to install these packages may differ across systems.
Please see the installation instructions for Python, torch, and transformers.