Extended Installation Guide

The text package gives R users access to Hugging Face Transformers through reticulate, an R interface to Python. It therefore depends on Python packages such as torch and transformers.

To use these features, you need to install:
• The text package in R
• A Python environment with the required packages

The easiest way is to use textrpp_install() to install a preconfigured Conda environment and then initialize it with textrpp_initialize().
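Before running the setup below, you can check from R whether the R-side packages are already installed. This is a minimal sketch using only base R; the helper name `check_r_packages()` is hypothetical and not part of the text package.

```r
# Hypothetical helper (not part of the text package): report which of the
# required R packages can be loaded on this system.
check_r_packages <- function(pkgs = c("text", "reticulate")) {
  # requireNamespace() returns TRUE/FALSE without attaching the package;
  # vapply() names the result by package because `pkgs` is a character vector.
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_r_packages()
print(status)

missing <- names(status)[!status]
if (length(missing) > 0) {
  message("Missing R packages: ", paste(missing, collapse = ", "),
          ". Install them with install.packages().")
}
```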

Set up a Python environment for the text package

library(text)
library(reticulate)

# Install text-required Python packages in a Conda environment
text::textrpp_install()

# Show available Conda environments
reticulate::conda_list()

# Initialize the installed Conda environment
text::textrpp_initialize(save_profile = TRUE)

# Test that textEmbed works
textEmbed("hello")
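If textEmbed() fails, a quick check is whether reticulate can see the Python modules that text relies on. This is a minimal sketch, assuming the environment has already been initialized as above; the helper name `check_python_modules()` and the module list are assumptions, while `reticulate::py_config()` and `reticulate::py_module_available()` are existing reticulate functions.

```r
# Hypothetical helper (not part of the text package): check whether the
# Python session that reticulate is bound to can import key modules.
check_python_modules <- function(modules = c("torch", "transformers", "numpy")) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("The reticulate package is not installed.")
  }
  # Named logical vector: TRUE if the module can be imported.
  vapply(modules, reticulate::py_module_available, logical(1))
}

# Run only after text::textrpp_initialize():
# reticulate::py_config()   # shows which Python executable is in use
# check_python_modules()
```

Run it after initialization: reticulate binds to a Python installation the first time Python is used in a session and cannot switch afterwards, so calling it earlier may lock the session to a different Python.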

Troubleshooting

1. Check that you have installation permissions

Can you install an R package like dplyr?

install.packages("dplyr")

Can you install system-level tools such as Python or Miniconda?

library(reticulate)
reticulate::install_miniconda()

If you do not have permissions, please contact your administrator for advice.
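You can also test write access directly from R. This is a minimal base-R sketch; the helper name `can_write()` is hypothetical. It checks the first entry of `.libPaths()`, where install.packages() installs by default, and your home directory, which is used here as a rough proxy (an assumption) for the per-user location where reticulate installs Miniconda.

```r
# Hypothetical helper (not part of text or reticulate): TRUE for each directory
# that exists and is writable by the current user.
# file.access(mode = 2) tests write permission and returns 0 on success.
can_write <- function(dirs) {
  dirs <- path.expand(dirs)
  ok <- file.exists(dirs) & file.access(dirs, mode = 2) == 0
  names(ok) <- dirs
  ok
}

can_write(c(.libPaths()[1], "~"))
```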

2. Remember to initialize the Python environment

After R restarts, functions such as textEmbed() may stop working until the Python environment is initialized again.

Solution: Persist the initialization in your R profile:

text::textrpp_initialize(
  condaenv = "textrpp_condaenv",
  refresh_settings = TRUE,
  save_profile = TRUE
)
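To see whether anything was persisted, you can inspect your user R profile. This is a minimal sketch, assuming the settings are written to the `.Rprofile` file in your home directory (check the textrpp_initialize() documentation for where save_profile actually writes); the helper `profile_mentions()` and its search pattern are hypothetical.

```r
# Hypothetical helper (not part of text): return the lines of an R profile
# file that match a pattern, or character(0) if the file does not exist.
profile_mentions <- function(pattern = "textrpp|python",
                             path = file.path(path.expand("~"), ".Rprofile")) {
  if (!file.exists(path)) return(character(0))
  lines <- readLines(path, warn = FALSE)
  lines[grepl(pattern, lines, ignore.case = TRUE)]
}

profile_mentions()
```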

3. Install the development version from GitHub.

# install.packages("devtools")
devtools::install_github("oscarkjell/text")
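After installing, you can confirm which version of text is installed and whether it came from GitHub. This is a minimal sketch: packages installed with devtools or remotes usually record a `RemoteType` field in their DESCRIPTION file, which `utils::packageDescription()` can read. The helper `install_source()` is hypothetical.

```r
# Hypothetical helper (not part of text): report a package's version and
# whether its DESCRIPTION records a remote (e.g. GitHub) installation.
install_source <- function(pkg = "text") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(c(version = NA_character_, source = "not installed"))
  }
  remote <- utils::packageDescription(pkg, fields = "RemoteType")
  c(version = as.character(utils::packageVersion(pkg)),
    source = if (is.null(remote) || is.na(remote)) "no remote recorded" else remote)
}

install_source("text")
```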

4. Force reinstallation of the environment

library(text)
text::textrpp_install(
  update_conda = TRUE,
  force_conda = TRUE
)
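Before and after forcing a reinstall, you may want to check whether the text environment exists. This is a minimal sketch, assuming the default environment name `textrpp_condaenv` (the name used in item 2); `reticulate::conda_list()` returns a data frame with `name` and `python` columns, and the helper `has_conda_env()` is hypothetical.

```r
# Hypothetical helper (not part of text): TRUE if a Conda environment with the
# given name appears in a conda_list()-style data frame.
has_conda_env <- function(envname = "textrpp_condaenv", envs = NULL) {
  if (is.null(envs)) {
    envs <- reticulate::conda_list()  # requires a working Conda installation
  }
  envname %in% envs$name
}

# has_conda_env()  # expected to be TRUE once textrpp_install() has succeeded
```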

5. Install the Python environment using reticulate

See the article "Installing and Managing Python Environments with reticulate" for detailed information.
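As a rough outline of the manual route, reticulate can create a Conda environment with `conda_create()` and install packages into it with `py_install()`. This is a minimal sketch only: the helper name `create_text_env()` is hypothetical, the package list is borrowed from the virtual-environment example further down (torch, transformers, numpy, nltk) and may not match what the current text version requires, and the Python version is an assumption.

```r
# Hypothetical helper (not part of text): create a Conda environment with
# reticulate and pip-install Python packages into it. The package list and
# Python version are assumptions; check the text documentation.
create_text_env <- function(envname = "textrpp_condaenv",
                            packages = c("torch", "transformers", "numpy", "nltk"),
                            python_version = "3.9") {
  reticulate::conda_create(envname, python_version = python_version)
  reticulate::py_install(packages, envname = envname, method = "conda", pip = TRUE)
  invisible(envname)
}

# create_text_env()
# text::textrpp_initialize(condaenv = "textrpp_condaenv", save_profile = TRUE)
```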

6. Inspect diagnostic information

If something isn’t working, a good first step is to examine what is installed and running on your system.

library(text)
# Collect diagnostic information about the installation
diagnostics <- text::textDiagnostics()
diagnostics
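When asking for help, it can be useful to save the diagnostic output to a text file you can attach. This is a minimal sketch using base R's `capture.output()`; the helper `save_diagnostics()` and the file name are hypothetical, and it assumes nothing about the object textDiagnostics() returns beyond that it can be printed.

```r
# Hypothetical helper (not part of text): write the printed form of any R
# object, followed by R session information, to a plain-text file.
save_diagnostics <- function(x, file = "text_diagnostics.txt") {
  out <- c(capture.output(print(x)), "", capture.output(print(sessionInfo())))
  writeLines(out, file)
  invisible(file)
}

# diagnostics <- text::textDiagnostics()
# save_diagnostics(diagnostics)
```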

Because the text package requires some system-level setup, installation is automatically verified on Windows, macOS, and Ubuntu through our GitHub Actions. If you encounter any issues, please review the tests and check the workflow file for details on system-specific installations.

To view the workflow file, select the three-dot menu on the right side of any GitHub Action run and choose View workflow file. This file specifies the operating systems, R versions, and additional libraries being tested.

Virtual environments

It is also possible to use a Python virtual environment instead of Conda, although this has so far only been tested on macOS.

# Create a virtual environment with the Python packages that text requires.
# Note that you must provide the path to a Python executable.
text::textrpp_install_virtualenv(rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
                                 python_path = "/usr/local/bin/python3.9",
                                 envname = "textrpp_virtualenv")

# Initialize the virtual environment.
text::textrpp_initialize(virtualenv = "textrpp_virtualenv",
                         condaenv = NULL,
                         save_profile = TRUE)
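textrpp_install_virtualenv() needs the path to a Python executable, and the path used above (/usr/local/bin/python3.9) will not exist on every machine. This is a minimal base-R sketch for locating candidates on your PATH; the helper name `find_python()` and the candidate command names are assumptions.

```r
# Hypothetical helper (not part of text): list Python executables found on the
# PATH. Sys.which() returns "" for commands it cannot find, so those are dropped.
find_python <- function(candidates = c("python3", "python")) {
  paths <- Sys.which(candidates)
  paths[nzchar(paths)]
}

find_python()
# Use one of the returned paths as python_path in text::textrpp_install_virtualenv().
```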

Solving OMP errors and R/RStudio crashes

Some macOS users may experience a crash when running functions like textEmbed() from the text package. This is due to a known conflict between multiple OpenMP libraries (e.g., libomp.dylib and libiomp5.dylib) used by Python packages such as torch and transformers.

Workaround (automatically applied)

To prevent this crash, the package sets the following environment variables when running on macOS:

Sys.setenv(OMP_NUM_THREADS = "1")
Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1")
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE")

These settings:
- Limit the number of OpenMP threads to one
- Disable nested OpenMP parallelism
- Allow the OpenMP runtime to continue, instead of aborting, when duplicate OpenMP libraries are loaded

This workaround is safe for most users and prevents the crash, but note that:
- It may reduce performance, because multi-threaded CPU computations are limited to a single OpenMP thread.
- It works around a system-level library conflict rather than resolving it.
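To check which values are actually in effect in your session (for example, after loading text on macOS), you can read the variables back with base R's `Sys.getenv()`. A minimal sketch; variables that are not set are shown as empty strings.

```r
# Read back the OpenMP-related environment variables in the current R session.
omp_vars <- c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS", "KMP_DUPLICATE_LIB_OK")
Sys.getenv(omp_vars)
```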

The text package sets these OpenMP-related environment variables for compatibility with PyTorch, to avoid crashes caused by libomp.dylib conflicts. To skip this behavior, set the following before loading the package:

Sys.setenv(TEXT_SKIP_OMP_PATCH = "TRUE")
library(text)
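The opt-out described above can be pictured as a conditional check on the operating system and on TEXT_SKIP_OMP_PATCH. The following is an illustrative sketch of that logic only, not the text package's actual code; the function name `apply_omp_patch()` and its arguments are hypothetical.

```r
# Illustrative sketch (hypothetical, not the text package's implementation):
# set the OpenMP workaround variables only on macOS ("Darwin") and only when
# the user has not opted out via TEXT_SKIP_OMP_PATCH.
apply_omp_patch <- function(sysname = Sys.info()[["sysname"]],
                            skip = Sys.getenv("TEXT_SKIP_OMP_PATCH")) {
  if (sysname != "Darwin" || toupper(skip) == "TRUE") {
    return(invisible(FALSE))
  }
  Sys.setenv(OMP_NUM_THREADS = "1",
             OMP_MAX_ACTIVE_LEVELS = "1",
             KMP_DUPLICATE_LIB_OK = "TRUE")
  invisible(TRUE)
}
```

To opt out in every session, you could instead add the line TEXT_SKIP_OMP_PATCH=TRUE to your ~/.Renviron file, which R reads at startup; this assumes the package reads the variable from the environment when it is loaded.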

The exact way to install Python and these Python packages may differ across systems. Please see the installation instructions for:
• Python
• torch
• transformers

Share advice

If you find a good solution, please feel free to email oscar [dot] kjell [at] psy [dot] lu [dot] se so that we can update the instructions above.