Title: An R Interface for Raven DataFrames (Beta0)
Version: 0.2.0
Description: Provides an I/O interface between R data.frames and Raven DataFrames. Defines functions to both read and write DataFrame files, as well as serialize/deserialize data.frames/DataFrames.
License: Apache License (== 2)
URL: https://github.com/raven-computing/rdf
Encoding: UTF-8
Depends: R (≥ 3.5.0)
LazyData: true
RoxygenNote: 7.1.1
Maintainer: Phil Gaiser <phil.gaiser@raven-computing.com>
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2021-03-15 11:12:42 UTC; kilo52
Author: Phil Gaiser [aut, cre], Raven Computing [cph]
Repository: CRAN
Date/Publication: 2021-03-17 13:20:02 UTC

Deserializes the given vector of raw bytes and returns a data.frame object.

Description

The raw vector to be deserialized must represent a Raven DataFrame. That DataFrame is returned as an R data.frame object.

Usage

deserializeDataFrame(bytes)

Arguments

bytes

The vector of raw bytes to deserialize

Details

The column types from Raven DataFrames are mapped to the corresponding R types. More specifically, all integer types (byte, short, int, long) are mapped to the R 'integer' type. The floating point types (float, double) are mapped to the R 'double' type. Both string and char types are mapped to the R 'character' type. Booleans are mapped to the R 'logical' type. Binary columns are represented as R 'list' types containing raw vectors.

Value

A data.frame object from the specified raw vector

See Also

readDataFrame() for reading DataFrame (.df) files directly.

Examples

## Not run: 
# deserialize a raw vector representing a DataFrame
df <- deserializeDataFrame(my.raw.vector)

# get the types for all columns
types <- sapply(df, typeof)

## End(Not run)


Reads a DataFrame from the specified file.

Description

The file to be read must be a DataFrame (.df) file. The content of the file is returned as an R data.frame object.

Usage

readDataFrame(filepath)

Arguments

filepath

The path to the file to read

Details

The column types from Raven DataFrames are mapped to the corresponding R types. More specifically, all integer types (byte, short, int, long) are mapped to the R 'integer' type. The floating point types (float, double) are mapped to the R 'double' type. Both string and char types are mapped to the R 'character' type. Booleans are mapped to the R 'logical' type. Binary columns are represented as R 'list' types containing raw vectors.

Value

A data.frame object

See Also

deserializeDataFrame() for deserializing vectors of raw bytes.
writeDataFrame() for writing DataFrame files which can be read by this function.

Examples

## Not run: 
# read a .df file into memory
df <- readDataFrame("/path/to/my/file.df")

# get the types for all columns
types <- sapply(df, typeof)

## End(Not run)


Serializes the specified data.frame object to a vector of raw bytes.

Description

The R data.frame is serialized as a Raven DataFrame. The concrete column types to use for each individual data.frame column can be specified by the 'types' argument.

Usage

serializeDataFrame(df, types = NULL, compress = FALSE, as.nullable = FALSE)

Arguments

df

The data.frame object to serialize

types

The type names for all column types. Must be a vector of character values. May be NULL

compress

A logical indicating whether to compress the content of the returned raw vector

as.nullable

A logical indicating whether the data.frame should be serialized as a NullableDataFrame, even if it contains no NA values

Details

The column types of the R data.frame object are mapped to the corresponding Raven DataFrame column types. The following types exist:

Type name Description
byte int8
short int16
int int32
long int64
float float32
double float64
string UTF-8 encoded unicode string
char single printable ASCII character
boolean logical value TRUE or FALSE
binary arbitrary length byte array

By default, if the 'types' argument is not explicitly specified, all values are mapped to the corresponding largest possible type in order to avoid possible loss of information. However, users can specify the concrete type for each column in the DataFrame file to be written. This is done by providing a vector of character values denoting the type name of each corresponding data.frame column. The index of each entry corresponds to the index of the column in the underlying data.frame to persist.

If the specified data.frame object contains at least one NA value, then the serialized DataFrame will represent a NullableDataFrame. If the data.frame contains no NA values, then the serialized DataFrame will represent a DefaultDataFrame, unless the 'as.nullable' argument is set to TRUE.

The logical 'compress' argument specifies whether the serialized DataFrame is compressed.

Value

A raw vector representing the serialized date.frame object

See Also

writeDataFrame() for directly persisting data.frame objects to the file system

Examples

## Not run: 
# get a data.frame
df <- cars
# serialize the data.frame to a raw vector
vec <- serializeDataFrame(df)

# specify the concrete types of all columns
coltypes <- c("float", "double")
# serialize the data.frame to a raw vector with concrete types
serializeDataFrame(df, types = coltypes)

## End(Not run)


Writes the specified data.frame to the specified file.

Description

The R data.frame is persisted as a DataFrame (.df) file. The concrete column types to use for each individual data.frame column can be specified by the 'types' argument.

Usage

writeDataFrame(filepath, df, types = NULL, as.nullable = FALSE)

Arguments

filepath

The path to the file to write

df

The data.frame object to write

types

The type names for all column types. Must be a vector of character values. May be NULL

as.nullable

A logical indicating whether the data.frame should be persisted as a NullableDataFrame, even if it contains no NA values

Details

The column types of the R data.frame object are mapped to the corresponding Raven DataFrame column types. The following types exist:

Type name Description
byte int8
short int16
int int32
long int64
float float32
double float64
string UTF-8 encoded unicode string
char single printable ASCII character
boolean logical value TRUE or FALSE
binary arbitrary length byte array

By default, if the 'types' argument is not explicitly specified, all values are mapped to the corresponding largest possible type in order to avoid possible loss of information. However, users can specify the concrete type for each column in the DataFrame file to be written. This is done by providing a vector of character values denoting the type name of each corresponding data.frame column. The index of each entry corresponds to the index of the column in the underlying data.frame to persist.

If the specified data.frame object contains at least one NA value, then the DataFrame file to be persisted will represent a NullableDataFrame. If the data.frame contains no NA values, then the DataFrame file to be persisted will represent a DefaultDataFrame, unless the 'as.nullable' argument is set to TRUE.

Value

The number of bytes written to the specified file

See Also

serializeDataFrame() for serializing data.frame objects to vectors of raw bytes.
readDataFrame() for reading DataFrame files which have been previously persisted by this function.

Examples

## Not run: 
# get a data.frame
df <- cars
# write the data.frame to a .df file
writeDataFrame("cars.df", df)

# specify the concrete types of all columns
coltypes <- c("float", "double")
# write the data.frame to a .df file with concrete types
writeDataFrame("cars.df", df, types = coltypes)

## End(Not run)