Type: | Package |
Title: | Utilities for Importing and Manipulating Biomedical Data Files |
Version: | 0.2-15 |
Date: | 2025-03-01 |
Author: | Vidal Fey [aut, cre] |
Maintainer: | Vidal Fey <vidal.fey@gmail.com> |
Description: | Tools to read various file types into one list of data structures, usually, but not limited to, data frames. Excel files are read sheet-wise, i.e., all or a selection of sheets can be read. Field delimiters and decimal separators are determined automatically. |
Depends: | R (≥ 3.5.0), R.utils, utils |
Imports: | methods, xml2, readxl, plyr |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-03-01 10:37:56 UTC; fsvife |
Repository: | CRAN |
Date/Publication: | 2025-03-01 11:00:02 UTC |
Determine the number of lines in a (large) text file without importing it.
Description
Determine the number of lines in a (large) text file without importing it.
Usage
get.nlines(file, n = 1, pattern = NULL, incl.header = FALSE)
Arguments
file |
|
n |
|
pattern |
|
incl.header |
|
Value
An integer value.
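The idea of counting lines without importing the file can be illustrated with a minimal base R sketch. The helper `count_lines` and its `chunk_size` parameter are hypothetical and not part of the package; how `get.nlines` counts lines internally is not documented here.

```r
# Hypothetical helper (not part of readmoRe): count lines chunk-wise so
# that the whole file is never held in memory at once.
count_lines <- function(file, chunk_size = 10000L) {
  con <- file(file, open = "r")
  on.exit(close(con))
  total <- 0L
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break   # end of file reached
    total <- total + length(chunk)
  }
  total
}

# Small demonstration file: one header line plus two data lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue", "a\t1", "b\t2"), tmp)
n_lines <- count_lines(tmp)
```

Reading from an open connection in fixed-size chunks keeps memory use bounded by `chunk_size` rather than by the file size.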
Determine field delimiter in text files
Description
Determine field delimiter in text files
Usage
get.sep(file, n = 1, pattern)
Arguments
file |
|
n |
|
pattern |
|
Value
If successful, the field delimiter. If more than one of the possible delimiters is found, an error is returned.
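The behaviour described above (return the delimiter, fail when several candidates are found) can be sketched in base R as follows. The helper `guess_sep`, its candidate set and its detection rule are assumptions made for illustration; the actual candidates and logic used by `get.sep` are not stated in this manual.

```r
# Hypothetical helper (not part of readmoRe): guess the field delimiter
# from the first n lines by checking which candidate occurs in every line.
guess_sep <- function(file, n = 5L, candidates = c("\t", ",", ";")) {
  lines <- readLines(file, n = n, warn = FALSE)
  found <- vapply(candidates, function(s) {
    all(grepl(s, lines, fixed = TRUE))
  }, logical(1))
  hits <- candidates[found]
  if (length(hits) == 0L) stop("no candidate delimiter found")
  if (length(hits) > 1L) stop("more than one candidate delimiter found")
  hits
}

tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue", "a\t1", "b\t2"), tsv)
sep_tsv <- guess_sep(tsv)

csv <- tempfile(fileext = ".csv")
writeLines(c("id,value", "a,1", "b,2"), csv)
sep_csv <- guess_sep(csv)
```

Raising an error on ambiguity, rather than picking one candidate silently, leaves the decision to the caller, who can then pass the delimiter explicitly.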
Determine Number of Rows to be Skipped in Text Files
Description
get.skip attempts to determine the number of rows that could be skipped when reading text files.
Usage
get.skip(file, n = 1, pattern = NULL)
Arguments
file |
( |
n |
( |
pattern |
( |
Value
The skip value. If no value is determined, 0 (zero) is returned.
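A minimal base R sketch of one way such a skip value could be derived is given here. It assumes that `pattern` identifies the line holding the column names, which this manual does not confirm; the helper `find_skip` and its `max_lines` parameter are hypothetical.

```r
# Hypothetical helper (not part of readmoRe): number of lines that precede
# the first line matching 'pattern'; 0 if no line matches.
find_skip <- function(file, pattern, max_lines = 1000L) {
  lines <- readLines(file, n = max_lines, warn = FALSE)
  hit <- grep(pattern, lines)
  if (length(hit) == 0L) return(0L)
  hit[1L] - 1L
}

# VCF-like demonstration file: two meta lines, then the column header.
vcf <- tempfile(fileext = ".vcf")
writeLines(c("##fileformat=VCFv4.2",
             "##source=example",
             "#CHROM\tPOS\tID",
             "1\t100\trs1"), vcf)
skip <- find_skip(vcf, "^#CHROM")

# The value can be passed on to a standard reader.
tab <- read.delim(vcf, skip = skip, check.names = FALSE)
```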
Read various input file formats into a list of data frames. Wrapper function for 'read2list' to automate reading further and avoid errors due to missing folders or files.
Description
read.to.list is meant to act as a universal reading function as it attempts to read a number of different file formats into a list of data frames.
Usage
read.to.list(
dat,
type,
folder,
nsheets = 1,
sheet = NULL,
keep.tibble = FALSE,
skip = 0,
sep = NULL,
lines = FALSE,
dec = NULL,
...,
verbose = TRUE,
x.verbose = FALSE
)
Arguments
dat |
|
type |
|
folder |
|
nsheets |
|
sheet |
|
keep.tibble |
|
skip |
|
sep |
|
lines |
|
dec |
|
... |
Additional arguments passed to functions. |
verbose |
|
x.verbose |
|
Details
Excel files (file extension .xls or .xlsx) will be read by readxl::read_excel. A test is attempted to determine whether the input file is genuinely derived from Excel or only named like an Excel file. If the latter, an attempt is made to read it as a text file.
Text files are read as tables, or by line if lines is TRUE.
For text files, field delimiters and decimal separators are determined automatically if not provided.
Files with the extensions ".txt", ".tsv", ".csv", ".gtf" and ".gff" are treated and read as text files.
VCF files are also treated as text files but can only be read in full (incl. header) if read by line. Otherwise, if skip is 0, the line with the column names will be determined automatically and the file read as a delimited text file.
XML files are read by xml2::read_xml.
".RData" files are loaded and assigned a name.
".rds" and ".rda" files are read by readRDS.
".xdr" files are read by R.utils::loadObject.
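How the package tests whether a file is genuinely derived from Excel is not described above. One common approach, sketched here under that caveat, is to inspect the file signature: .xlsx files are ZIP containers and legacy .xls files are OLE2 compound files. The helper `looks_like_excel` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of readmoRe): check the leading bytes of a
# file against the ZIP (.xlsx) and OLE2 (.xls) signatures.
looks_like_excel <- function(file) {
  sig <- readBin(file, what = "raw", n = 8L)
  zip_sig <- as.raw(c(0x50, 0x4b, 0x03, 0x04))
  ole_sig <- as.raw(c(0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1))
  # A ZIP signature only shows that the file is a ZIP container; it does
  # not prove that the container holds a workbook.
  (length(sig) >= 4L && identical(sig[1:4], zip_sig)) ||
    (length(sig) >= 8L && identical(sig, ole_sig))
}

# A tab-separated text file that is merely named like an Excel file.
fake <- tempfile(fileext = ".xls")
writeLines(c("a\tb", "1\t2"), fake)
is_excel <- looks_like_excel(fake)

# Such a file can still be read as delimited text.
if (!is_excel) tab <- read.delim(fake)
```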
Value
A list of tibbles/data frames.
Examples
# The function readxl::read_excel is used internally to read Excel files.
# The example uses their example data.
readxl_datasets <- readxl::readxl_example("datasets.xlsx")
# A randomly generated data frame was saved to a tab-separated text file
# and two different R object files.
tsv_datasets <- dir(system.file("extdata", package = "readmoRe"), full.names = TRUE)
# All example data are read into a list. From the Excel file, the first
# sheet is read.
dat <- read.to.list(c(readxl_datasets, tsv_datasets))
# All example data are read into a list. From the Excel file, the first
# 3 sheets are read.
dat <- read.to.list(c(readxl_datasets, tsv_datasets), nsheets=3)
# All example data are read into a list. From the Excel file, sheets 1 and
# 3 are read.
dat <- read.to.list(c(readxl_datasets, tsv_datasets), sheet=c(1, 3))
# From two Excel files, different sheets are read: 1 and 3 from the first
# file and 2 and 3 from the second.
# (For simplicity, the same example file is used.)
dat <- read.to.list(c(readxl_datasets, readxl_datasets), sheet=list(c(1, 3), c(2, 3)))
Read various input file formats into a list of data frames
Description
read2list is meant to act as a universal reading function as it attempts to read a number of different file formats into a list of data frames.
Usage
read2list(
dat,
nsheets = 1,
sheet = NULL,
keep.tibble = FALSE,
skip = 0,
sep = NULL,
lines = FALSE,
dec = NULL,
...,
verbose = TRUE,
x.verbose = FALSE
)
Arguments
dat |
|
nsheets |
|
sheet |
|
keep.tibble |
|
skip |
|
sep |
|
lines |
|
dec |
|
... |
Additional arguments passed to functions. |
verbose |
|
x.verbose |
|
Utilities for data import
Description
A collection of utilities for reading and importing data into R by performing (usually small) manipulations of data structures such as data frames, matrices and lists, and by automatically determining import parameters.
Details
Package: | readmoRe |
Type: | Package |
Initial version: | 0.1-0 |
Created: | 2011-01-07 |
License: | GPL-3 |
LazyLoad: | yes |
The main function of the package is read.to.list, which reads a number of different file formats into a list of data objects such as data frames, depending on the source file.
Author(s)
Vidal Fey <vidal.fey@gmail.com>
Remove Empty Columns From an Imported Excel Sheet
Description
rm.empty.cols removes columns that have only NAs AND whose names start with a capital 'X' (unless na.only is TRUE, in which case all NA columns will be removed).
Usage
rm.empty.cols(x, na.only = FALSE)
Arguments
x |
( |
na.only |
( |
Details
Empty columns in Excel sheets are imported as NA columns in the resulting data frame.
If using gdata::read.xls for reading Excel files, columns that did not have a column name in the spreadsheet will result in data frame column names starting with 'X'. rm.empty.cols makes use of these two criteria to identify columns that can safely be removed from the data frame.
Value
A data frame.
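The two criteria described in the Details can be sketched in base R as follows. The helper `drop_empty_cols` is hypothetical and only mirrors the documented behaviour; it is not the package's implementation.

```r
# Hypothetical helper (not part of readmoRe): drop columns that contain only
# NAs and whose names start with 'X'; with na.only = TRUE, drop every
# all-NA column regardless of its name.
drop_empty_cols <- function(x, na.only = FALSE) {
  all_na <- vapply(x, function(col) all(is.na(col)), logical(1))
  unnamed <- grepl("^X", names(x))
  remove <- if (na.only) all_na else all_na & unnamed
  x[, !remove, drop = FALSE]
}

df <- data.frame(id = 1:2,
                 X = c(NA, NA),       # empty and unnamed in the sheet
                 note = c(NA, NA),    # empty but named
                 X.1 = c("a", NA))    # unnamed but not empty
cleaned <- drop_empty_cols(df)
cleaned_all <- drop_empty_cols(df, na.only = TRUE)
```

With the default setting only column X is removed, because note has a real name and X.1 is not entirely NA; with na.only = TRUE the note column is removed as well.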
Remove 'newline' Characters From Imported Excel Sheets
Description
rm.newline.chars removes ‘newline’ characters (\n) from any column of a data frame.
Usage
rm.newline.chars(x, verbose = TRUE)
Arguments
x |
( |
verbose |
( |
Details
‘Newline’ characters in data frame rows are read verbatim and will cause rows in output text files to be distributed across two or more lines. Such characters, entered accidentally or deliberately in the source Excel file, should be avoided. This function removes all ‘newline’ characters found at the end of a line or replaces them when found within the line text.
Value
A data frame.
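The treatment of trailing and embedded newlines described in the Details can be sketched in base R. The helper `strip_newlines` is hypothetical, and the use of a single space as the replacement for embedded newlines is an assumption, since the replacement string is not stated above.

```r
# Hypothetical helper (not part of readmoRe): remove trailing newlines and
# replace embedded ones in all character columns of a data frame.
strip_newlines <- function(x, replacement = " ") {
  is_chr <- vapply(x, is.character, logical(1))
  if (!any(is_chr)) return(x)
  x[is_chr] <- lapply(x[is_chr], function(col) {
    col <- sub("\n+$", "", col)      # drop newlines at the end of the text
    gsub("\n+", replacement, col)    # replace newlines within the text
  })
  x
}

df <- data.frame(id = 1:2,
                 comment = c("first line\nsecond line", "trailing\n"),
                 stringsAsFactors = FALSE)
cleaned <- strip_newlines(df)
```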