Type: Package
Title: 'rquery' for 'data.table'
Version: 1.3.3
Date: 2023-08-19
Maintainer: John Mount <jmount@win-vector.com>
Description: Implements the 'rquery' piped Codd-style query algebra using 'data.table'. This allows for a high-speed, in-memory implementation of Codd-style data manipulation tools.
URL: https://github.com/WinVector/rqdatatable/, https://winvector.github.io/rqdatatable/
BugReports: https://github.com/WinVector/rqdatatable/issues
License: GPL-2 | GPL-3
Encoding: UTF-8
ByteCompile: true
VignetteBuilder: knitr
Depends: R (>= 3.4.0), wrapr (>= 2.0.9), rquery (>= 1.4.9)
Imports: data.table (>= 1.12.2)
RoxygenNote: 7.2.3
Suggests: knitr, rmarkdown, DBI, RSQLite, parallel, tinytest
NeedsCompilation: no
Packaged: 2023-08-20 05:23:02 UTC; johnmount
Author: John Mount [aut, cre], Win-Vector LLC [cph]
Repository: CRAN
Date/Publication: 2023-08-21 08:00:02 UTC

rqdatatable: Relational Query Generator for Data Manipulation Implemented by data.table

Description

Implements the rquery piped query algebra using data.table. This allows for a high-speed, in-memory implementation of Codd-style data manipulation tools.

Author(s)

Maintainer: John Mount jmount@win-vector.com

Other contributors:

Win-Vector LLC [copyright holder]

See Also

Useful links:

https://github.com/WinVector/rqdatatable/
https://winvector.github.io/rqdatatable/
Report bugs at https://github.com/WinVector/rqdatatable/issues


Execute an rquery pipeline with data.table sources.

Description

data.tables are looked for by name in the tables argument and in the execution environment. Main external execution interface.

Usage

ex_data_table(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Value

resulting data.table (intermediate tables can sometimes be mutated, as is practice with data.table).

Examples


  a <- data.table::data.table(x = c(1, 2), y = c(20, 30), z = c(300, 400))
  optree <- local_td(a) %.>%
     select_columns(., c("x", "y")) %.>%
     select_rows_nse(., x<2 & y<30)
  cat(format(optree))
  ex_data_table(optree)

  # other ways to execute the pipeline include
  data.frame(x = 0, y = 4, z = 400) %.>% optree



Execute an rquery pipeline with data.table in parallel.

Description

Execute an rquery pipeline with data.table in parallel, partitioned by a given column. Note: usually the overhead of partitioning and distributing the work will by far overwhelm any parallel speedup. Also data.table itself already seems to exploit some thread-level parallelism (one often sees user time > elapsed time). Requires the parallel package. For a worked example with significant speedup please see https://github.com/WinVector/rqdatatable/blob/master/extras/Parallel_rqdatatable.md.

Usage

ex_data_table_parallel(
  optree,
  partition_column,
  cl = NULL,
  ...,
  tables = list(),
  source_limit = NULL,
  debug = FALSE,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

partition_column

character name of column to partition work by.

cl

a cluster object, created by package parallel or by package snow. If NULL, use the registered default cluster.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

debug

logical if TRUE use lapply instead of parallel::clusterApplyLB.

env

environment to look for values in.

Details

Care must be taken that the calculation partitioning is coarse enough to ensure a correct calculation. For example: anything one is joining on, aggregating over, or ranking over must be grouped so that all elements affecting a given result row are in the same level of the partition.

Value

resulting data.table (intermediate tables can sometimes be mutated as is practice with data.table).
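
A minimal usage sketch (data, cluster size, and the aggregation pipeline are illustrative, not from the package docs). Note the partition column matches the partitionby column, satisfying the coarseness requirement from Details:

```r
library(rqdatatable)
library(parallel)

cl <- parallel::makeCluster(2)

d <- data.frame(x = 1:10,
                g = rep(c("a", "b"), 5),
                stringsAsFactors = FALSE)

# per-group aggregation; partitioning by "g" keeps each
# group's rows in a single partition, so results are correct
optree <- local_td(d) %.>%
  extend_nse(., sum_x %:=% sum(x), partitionby = "g")

res <- ex_data_table_parallel(optree, "g", cl, tables = list(d = d))

parallel::stopCluster(cl)
```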


Execute an rquery pipeline with data.table sources.

Description

data.tables are looked for by name in the tables argument and in the execution environment. Internal execution interface.

Usage

ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Value

resulting data.table (intermediate tables can sometimes be mutated, as is practice with data.table).

Examples


  a <- data.table::data.table(x = c(1, 2), y = c(20, 30), z = c(300, 400))
  optree <- local_td(a) %.>%
     select_columns(., c("x", "y")) %.>%
     select_rows_nse(., x<2 & y<30)
  cat(format(optree))
  ex_data_table_step(optree)

  # other ways to execute the pipeline include
  ex_data_table(optree)
  data.frame(x = 0, y = 4, z = 400) %.>% optree



Default non-implementation.

Description

Throws an error if this method is called, signalling that a specific data.table implementation is needed for this method.

Usage

## Default S3 method:
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.


Implement drop columns.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_drop_columns'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_limit = NULL,
  source_usage = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

source_usage

list mapping source table names to vectors of columns used.

env

environment to work in.

Examples


dL <- data.frame(x = 1, y = 2, z = 3)
rquery_pipeline <- local_td(dL) %.>%
  drop_columns(., "y")
dL %.>% rquery_pipeline


Implement extend/assign operator.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_extend'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Details

Will re-order columns if there are ordering terms.

Examples


dL <- build_frame(
    "subjectID", "surveyCategory"     , "assessmentTotal", "one" |
    1          , "withdrawal behavior", 5                , 1     |
    1          , "positive re-framing", 2                , 1     |
    2          , "withdrawal behavior", 3                , 1     |
    2          , "positive re-framing", 4                , 1     )
rquery_pipeline <- local_td(dL) %.>%
  extend_nse(.,
             probability %:=%
               exp(assessmentTotal * 0.237)/
               sum(exp(assessmentTotal * 0.237)),
             count %:=% sum(one),
             rank %:=% rank(),
             orderby = c("assessmentTotal", "surveyCategory"),
             reverse = c("assessmentTotal"),
             partitionby = 'subjectID') %.>%
  orderby(., c("subjectID", "probability"))
dL %.>% rquery_pipeline


Natural join.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_natural_join'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


d1 <- build_frame(
    "key", "val", "val1" |
      "a"  , 1  ,  10    |
      "b"  , 2  ,  11    |
      "c"  , 3  ,  12    )
d2 <- build_frame(
    "key", "val", "val2" |
      "a"  , 5  ,  13    |
      "b"  , 6  ,  14    |
      "d"  , 7  ,  15    )

# key matching join
optree <- natural_join(local_td(d1), local_td(d2),
                       jointype = "FULL", by = 'key')
ex_data_table(optree)

# full cross-product join
# (usually with jointype = "FULL", but "LEFT" is more
# compatible with rquery field merge semantics).
optree2 <- natural_join(local_td(d1), local_td(d2),
                        jointype = "LEFT", by = NULL)
ex_data_table(optree2)
# notice all non-"by" fields coalesce to the left table.


Direct non-sql (function) node, not implemented for data.table case.

Description

Passes a single table to a function that takes a single data.frame as its argument, and returns a single data.frame.

Usage

## S3 method for class 'relop_non_sql'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

See Also

rq_df_funciton_node, rq_df_grouped_funciton_node

Examples


set.seed(3252)
d <- data.frame(a = rnorm(1000), b = rnorm(1000))

optree <- local_td(d) %.>%
  quantile_node(.)
d %.>% optree

p2 <- local_td(d) %.>%
  rsummary_node(.)
d %.>% p2

summary(d)


Replace NAs.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_null_replace'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
    "x", "y" |
    2L ,  5  |
    NA ,  7  |
    NA , NA )
rquery_pipeline <- local_td(dL) %.>%
  null_replace(., c("x", "y"), 0, note_col = "nna")
dL %.>% rquery_pipeline


Order rows by expression.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_order_expr'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
    "x", "y" |
    2L , "b" |
   -4L , "a" |
    3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
  order_expr(., abs(x))
dL %.>% rquery_pipeline


Reorder rows.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_orderby'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
    "x", "y" |
    2L , "b" |
    1L , "a" |
    3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
  orderby(., "y")
dL %.>% rquery_pipeline


Implement projection operator.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_project'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
  "subjectID", "surveyCategory"     , "assessmentTotal" |
    1          , "withdrawal behavior", 5                 |
    1          , "positive re-framing", 2                 |
    2          , "withdrawal behavior", 3                 |
    2          , "positive re-framing", 4                 )
test_p <- local_td(dL) %.>%
  project(.,
          maxscore := max(assessmentTotal),
          count := n(),
          groupby = 'subjectID')
cat(format(test_p))
dL %.>% test_p


Rename columns.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_rename_columns'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
    "x", "y" |
    2L , "b" |
    1L , "a" |
    3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
  rename_columns(., c("x" = "y", "y" = "x"))
dL %.>% rquery_pipeline


Implement drop columns.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_select_columns'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- data.frame(x = 1, y = 2, z = 3)
rquery_pipeline <- local_td(dL) %.>%
  select_columns(., "y")
dL %.>% rquery_pipeline


Select rows by condition.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_select_rows'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
    "x", "y" |
    2L , "b" |
    1L , "a" |
    3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
  select_rows_nse(., x <= 2)
dL %.>% rquery_pipeline


Implement the set_indicator operator.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_set_indicator'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


d <- data.frame(a = c("1", "2", "1", "3"),
                b = c("1", "1", "3", "2"),
                q = 1,
                stringsAsFactors = FALSE)
set <- c("1", "2")
op_tree <- local_td(d) %.>%
  set_indicator(., "one_two", "a", set) %.>%
  set_indicator(., "z", "a", c())
d %.>% op_tree



Direct sql node.

Description

Execute one step using the rquery.rquery_db_executor SQL supplier. Note: it is not good practice to use SQL nodes in data.table-intended pipelines (loss of class information and cost of data transfer). This implementation is here only for completeness.

Usage

## S3 method for class 'relop_sql'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


# WARNING: example tries to change rquery.rquery_db_executor option to RSQLite and back.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # example database connection
  my_db <- DBI::dbConnect(RSQLite::SQLite(),
                          ":memory:")
  old_o <- options(list("rquery.rquery_db_executor" = list(db = my_db)))

  # example data
  d <- data.frame(v1 = c(1, 2, NA, 3),
                  v2 = c(NA, "b", NA, "c"),
                  v3 = c(NA, NA, 7, 8),
                  stringsAsFactors = FALSE)

  # example xform
  vars <- column_names(d)
  # build a NA/NULLs per-row counting expression.
  # names are "quoted" by wrapping them with as.name().
  # constants can be quoted by an additional list wrapping.
  expr <- lapply(vars,
                 function(vi) {
                   list("+ (CASE WHEN (",
                        as.name(vi),
                        "IS NULL ) THEN 1.0 ELSE 0.0 END)")
                 })
  expr <- unlist(expr, recursive = FALSE)
  expr <- c(list(0.0), expr)

  # instantiate the operator node
  op_tree <- local_td(d) %.>%
    sql_node(., "num_missing" %:=% list(expr))
  cat(format(op_tree))

  d %.>% op_tree

  options(old_o)
  DBI::dbDisconnect(my_db)
}


Build a data source description.

Description

data.table based implementation. Looks for tables first in tables and then in env. Will accept any data.frame that can be converted to data.table.

Usage

## S3 method for class 'relop_table_source'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
    "x", "y" |
    2L , "b" |
    1L , "a" |
    3L , "c" )
rquery_pipeline <- local_td(dL)
dL %.>% rquery_pipeline


Theta join (database implementation).

Description

Limited implementation. All terms must be of the form: "(table1.col CMP table2.col) (, (table1.col CMP table2.col) )".

Usage

## S3 method for class 'relop_theta_join'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


 d1 <- data.frame(AUC = 0.6, R2 = 0.2)
 d2 <- data.frame(AUC2 = 0.4, R2 = 0.3)

 optree <- theta_join_se(local_td(d1), local_td(d2), "AUC >= AUC2")

 ex_data_table(optree, tables = list(d1 = d1, d2 = d2)) %.>%
   print(.)


Bind tables together by rows.

Description

data.table based implementation.

Usage

## S3 method for class 'relop_unionall'
ex_data_table_step(
  optree,
  ...,
  tables = list(),
  source_usage = NULL,
  source_limit = NULL,
  env = parent.frame()
)

Arguments

optree

relop operations tree.

...

not used, force later arguments to bind by name.

tables

named list map from table names used in nodes to data.tables and data.frames.

source_usage

list mapping source table names to vectors of columns used.

source_limit

if not null limit all table sources to no more than this many rows (used for debugging).

env

environment to work in.

Examples


dL <- build_frame(
    "x", "y" |
    2L , "b" |
    1L , "a" |
    3L , "c" )
rquery_pipeline <- unionall(list(local_td(dL), local_td(dL)))
dL %.>% rquery_pipeline


Map data records from row records to block records, with one record row per columnsToTakeFrom value.

Description

Map data records from row records (records that are exactly single rows) to block records (records that may be more than one row). All columns not named in columnsToTakeFrom are copied to each record row in the result.

Usage

layout_to_blocks_data_table(
  data,
  ...,
  nameForNewKeyColumn,
  nameForNewValueColumn,
  columnsToTakeFrom,
  columnsToCopy = setdiff(colnames(data), columnsToTakeFrom)
)

Arguments

data

data.frame to work with.

...

force later arguments to bind by name.

nameForNewKeyColumn

character name of column to write new keys in.

nameForNewValueColumn

character name of column to write new values in.

columnsToTakeFrom

character array names of columns to take values from.

columnsToCopy

character array names of columns to copy.

Value

new data.frame with values moved to rows.

Examples


(d <- wrapr::build_frame(
  "id"  , "id2", "AUC", "R2" |
    1   , "a"  , 0.7  , 0.4  |
    2   , "b"  , 0.8  , 0.5  ))

(layout_to_blocks_data_table(
  d,
  nameForNewKeyColumn = "measure",
  nameForNewValueColumn = "value",
  columnsToTakeFrom = c("AUC", "R2"),
  columnsToCopy = c("id", "id2")))



Map data records from block records that have one row per measurement value to row records.

Description

Map data records from block records (where each record may be more than one row) to row records (where each record is a single row). Values specified in rowKeyColumns determine which sets of rows build up records and are copied into the result.

Usage

layout_to_rowrecs_data_table(
  data,
  ...,
  columnToTakeKeysFrom,
  columnToTakeValuesFrom,
  rowKeyColumns,
  sep = "_"
)

Arguments

data

data.frame to work with (must be local, for remote please try moveValuesToColumns*).

...

force later arguments to bind by name.

columnToTakeKeysFrom

character name of column to build new column names from.

columnToTakeValuesFrom

character name of column to get values from.

rowKeyColumns

character array, names of columns that should be table keys.

sep

character, if not NULL used to build more detailed column names.

Value

new data.frame with values moved to columns.

Examples


(d2 <- wrapr::build_frame(
  "id"  , "id2", "measure", "value" |
    1   , "a"  , "AUC"    , 0.7     |
    2   , "b"  , "AUC"    , 0.8     |
    1   , "a"  , "R2"     , 0.4     |
    2   , "b"  , "R2"     , 0.5     ))

(layout_to_rowrecs_data_table(d2,
                             columnToTakeKeysFrom = "measure",
                             columnToTakeValuesFrom = "value",
                             rowKeyColumns = c("id", "id2")))


Lookup by column function factory.

Description

Build data.table implementation of lookup_by_column. We do this here as rqdatatable is a data.table aware package (and rquery is not).

Usage

make_dt_lookup_by_column(pick, result)

Arguments

pick

character scalar, name of column to control value choices.

result

character scalar, name of column to place values in.

Value

f_dt() function.

Examples


df = data.frame(x = c(1, 2, 3, 4),
                y = c(5, 6, 7, 8),
                choice = c("x", "y", "x", "z"),
                stringsAsFactors = FALSE)
make_dt_lookup_by_column("choice", "derived")(df)

# # base-R implementation
# df %.>% lookup_by_column(., "choice", "derived")
# # # data.table implementation (requires rquery 1.1.0, or newer)
# # df %.>% lookup_by_column(., "choice", "derived",
# #                          f_dt_factory = rqdatatable::make_dt_lookup_by_column)


rbindlist

Description

Note: different argument defaults than data.table::rbindlist.

Usage

rbindlist_data_table(l, use.names = TRUE, fill = TRUE, idcol = NULL)

Arguments

l

list of data.frames to rbind.

use.names

passed to data.table::rbindlist.

fill

passed to data.table::rbindlist.

idcol

passed to data.table::rbindlist.

Value

data.table

Examples


rbindlist_data_table(list(
  data.frame(x = 1, y = 2),
  data.frame(x = c(2, 3), y = c(NA, 4))))


Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

data.table

as.data.table


Helper to build data.table capable non-sql nodes.

Description

Helper to build data.table capable non-sql nodes.

Usage

rq_df_funciton_node(
  .,
  f,
  ...,
  f_db = NULL,
  columns_produced,
  display_form,
  orig_columns = FALSE
)

Arguments

.

relop tree or data.frame input.

f

function that takes a data.table to a data.frame (or data.table).

...

force later arguments to bind by name.

f_db

implementation signature: f_db(db, incoming_table_name, outgoing_table_name, nd, ...) (db being a database handle). NULL defaults to using f.

columns_produced

character, columns produced by f.

display_form

display form for node.

orig_columns

logical, if TRUE assume all input columns are present in the derived table.

Value

relop non-sql node implementation.

See Also

ex_data_table_step.relop_non_sql, rq_df_grouped_funciton_node

Examples


# a node generator is something an expert can
# write and part-time R users can use.
grouped_regression_node <- function(., group_col = "group", xvar = "x", yvar = "y") {
  force(group_col)
  formula_str <- paste(yvar, "~", xvar)
  f <- function(df, nd = NULL) {
    dlist <- split(df, df[[group_col]])
    clist <- lapply(dlist,
                    function(di) {
                      mi <- lm(as.formula(formula_str), data = di)
                      ci <- as.data.frame(summary(mi)$coefficients)
                      ci$Variable <- rownames(ci)
                      rownames(ci) <- NULL
                      ci[[group_col]] <- di[[group_col]][[1]]
                      ci
                    })
    data.table::rbindlist(clist)
  }
  columns_produced =
     c("Variable", "Estimate", "Std. Error", "t value", "Pr(>|t|)", group_col)
  rq_df_funciton_node(
    ., f,
    columns_produced = columns_produced,
    display_form = paste0(yvar, "~", xvar, " grouped by ", group_col))
}

# work an example
set.seed(3265)
d <- data.frame(x = rnorm(1000),
                y = rnorm(1000),
                group = sample(letters[1:5], 1000, replace = TRUE),
                stringsAsFactors = FALSE)

rquery_pipeline <- local_td(d) %.>%
  grouped_regression_node(.)

cat(format(rquery_pipeline))

d %.>% rquery_pipeline


Helper to build data.table capable non-sql nodes.

Description

Helper to build data.table capable non-sql nodes.

Usage

rq_df_grouped_funciton_node(
  .,
  f,
  ...,
  f_db = NULL,
  columns_produced,
  group_col,
  display_form
)

Arguments

.

relop tree or data.frame input.

f

function that takes a data.table to a data.frame (or data.table).

...

force later arguments to bind by name.

f_db

implementation signature: f_db(db, incoming_table_name, outgoing_table_name) (db being a database handle). NULL defaults to using f.

columns_produced

character, columns produced by f.

group_col

character, column to split by.

display_form

display form for node.

Value

relop non-sql node implementation.

See Also

ex_data_table_step.relop_non_sql, rq_df_funciton_node

Examples


# a node generator is something an expert can
# write and part-time R users can use.
grouped_regression_node <- function(., group_col = "group", xvar = "x", yvar = "y") {
  force(group_col)
  formula_str <- paste(yvar, "~", xvar)
  f <- function(di) {
    mi <- lm(as.formula(formula_str), data = di)
    ci <- as.data.frame(summary(mi)$coefficients)
    ci$Variable <- rownames(ci)
    rownames(ci) <- NULL
    colnames(ci) <- c("Estimate", "Std_Error", "t_value", "p_value", "Variable")
    ci
  }
  columns_produced =
    c("Estimate", "Std_Error", "t_value", "p_value", "Variable", group_col)
  rq_df_grouped_funciton_node(
    ., f,
    columns_produced = columns_produced,
    group_col = group_col,
    display_form = paste0(yvar, "~", xvar, " grouped by ", group_col))
}

# work an example
set.seed(3265)
d <- data.frame(x = rnorm(1000),
                y = rnorm(1000),
                group = sample(letters[1:5], 1000, replace = TRUE),
                stringsAsFactors = FALSE)

rquery_pipeline <- local_td(d) %.>%
  grouped_regression_node(.)

cat(format(rquery_pipeline))

d %.>% rquery_pipeline



Set rqdatatable package as default rquery executor

Description

Sets rqdatatable (and hence data.table) as the default executor for rquery.

Usage

set_rqdatatable_as_executor()
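
A short usage sketch (the example data and pipeline are illustrative):

```r
library(rqdatatable)

# register rqdatatable (data.table) as the default rquery executor
set_rqdatatable_as_executor()

d <- data.frame(x = c(1, 2), y = c(3, 4))
optree <- local_td(d) %.>%
  select_rows_nse(., x > 1)

# the pipe now executes via the data.table implementation
d %.>% optree
```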