Type: | Package |
Title: | 'rquery' for 'data.table' |
Version: | 1.3.3 |
Date: | 2023-08-19 |
Maintainer: | John Mount <jmount@win-vector.com> |
Description: | Implements the 'rquery' piped Codd-style query algebra using 'data.table'. This allows for a high-speed in memory implementation of Codd-style data manipulation tools. |
URL: | https://github.com/WinVector/rqdatatable/, https://winvector.github.io/rqdatatable/ |
BugReports: | https://github.com/WinVector/rqdatatable/issues |
License: | GPL-2 | GPL-3 |
Encoding: | UTF-8 |
ByteCompile: | true |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.4.0), wrapr (≥ 2.0.9), rquery (≥ 1.4.9) |
Imports: | data.table (≥ 1.12.2) |
RoxygenNote: | 7.2.3 |
Suggests: | knitr, rmarkdown, DBI, RSQLite, parallel, tinytest |
NeedsCompilation: | no |
Packaged: | 2023-08-20 05:23:02 UTC; johnmount |
Author: | John Mount [aut, cre], Win-Vector LLC [cph] |
Repository: | CRAN |
Date/Publication: | 2023-08-21 08:00:02 UTC |
rqdatatable: Relational Query Generator for Data Manipulation Implemented by data.table
Description
Implements the rquery piped query algebra using data.table. This allows for a high-speed in-memory implementation of Codd-style data manipulation tools.
Author(s)
Maintainer: John Mount jmount@win-vector.com
Other contributors:
Win-Vector LLC [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/WinVector/rqdatatable/issues
Execute an rquery pipeline with data.table sources.
Description
data.tables are looked for by name in the tables argument and in the execution environment.
Main external execution interface.
Usage
ex_data_table(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Details
- ex_data_table_step.relop_drop_columns: implement drop columns
- ex_data_table_step.relop_extend: implement extend/assign operator
- ex_data_table_step.relop_natural_join: implement natural join
- ex_data_table_step.relop_non_sql: direct function (non-sql) operator (not implemented for data.table)
- ex_data_table_step.relop_null_replace: implement NA/NULL replacement
- ex_data_table_step.relop_orderby: implement row ordering
- ex_data_table_step.relop_project: implement projection
- ex_data_table_step.relop_rename_columns: implement column renaming
- ex_data_table_step.relop_select_columns: implement select columns
- ex_data_table_step.relop_select_rows: implement select rows
- ex_data_table_step.relop_sql: direct sql operator (not implemented for data.table)
- ex_data_table_step.relop_table_source: implement data source
- ex_data_table_step.relop_theta_join: implement theta join (not implemented for data.table)
- ex_data_table_step.relop_unionall: implement row binding
Value
resulting data.table (intermediate tables can sometimes be mutated, as is common practice with data.table).
Examples
a <- data.table::data.table(x = c(1, 2) , y = c(20, 30), z = c(300, 400))
optree <- local_td(a) %.>%
select_columns(., c("x", "y")) %.>%
select_rows_nse(., x<2 & y<30)
cat(format(optree))
ex_data_table(optree)
# other ways to execute the pipeline include
data.frame(x = 0, y = 4, z = 400) %.>% optree
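The `tables` and `source_limit` arguments described above can be sketched as follows. This is a minimal illustration, not part of the package's own examples: the data values and the name `other` are assumptions made up for the sketch, and the table is given the explicit name "d" so the binding by name is visible.

```r
library(rqdatatable)  # attaches rquery and wrapr, which are listed under Depends

# Illustrative data; the table source is explicitly named "d" in the pipeline.
d <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
optree <- local_td(d, name = "d") %.>%
  select_rows_nse(., x >= 2)

# Supply a different table under the same name through `tables`;
# entries in `tables` are consulted before the environment.
other <- data.frame(x = c(5, 7, 0), y = c(1, 2, 3))
res <- ex_data_table(optree, tables = list(d = other))

# Debugging aid: cap every table source at one row before the pipeline runs.
limited <- ex_data_table(optree, tables = list(d = other), source_limit = 1)
```

Here `res` is computed from `other` rather than from `d`, and `limited` can contain at most one row because the source is truncated before the row selection is applied.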
Execute an rquery pipeline with data.table in parallel.
Description
Execute an rquery pipeline with data.table in parallel, partitioned by a given column.
Note: usually the overhead of partitioning and distributing the work will by far overwhelm any parallel speedup. Also, data.table itself already seems to exploit some thread-level parallelism (one often sees user time > elapsed time).
Requires the parallel package. For a worked example with significant speedup please see https://github.com/WinVector/rqdatatable/blob/master/extras/Parallel_rqdatatable.md.
Usage
ex_data_table_parallel(
optree,
partition_column,
cl = NULL,
...,
tables = list(),
source_limit = NULL,
debug = FALSE,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
partition_column |
character name of column to partition work by. |
cl |
a cluster object, created by package parallel or by package snow. If NULL, use the registered default cluster. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
debug |
logical if TRUE use lapply instead of parallel::clusterApplyLB. |
env |
environment to look for values in. |
Details
Care must be taken that the calculation partitioning is coarse enough to ensure a correct calculation. For example: anything one is joining on, aggregating over, or ranking over must be grouped so that all elements affecting a given result row are in the same level of the partition.
Value
resulting data.table (intermediate tables can sometimes be mutated as is practice with data.table).
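The partitioning requirement in Details can be illustrated without a cluster. The sketch below does not call ex_data_table_parallel; it imitates the partition, execute, and recombine pattern serially with base R `split()` and `lapply()`, on made-up data, to show why the partition column must keep every group whole.

```r
library(rqdatatable)

# Illustrative data: two groups, aggregated per group.
d <- data.frame(g = c("a", "a", "b", "b"), v = c(1, 2, 3, 4),
                stringsAsFactors = FALSE)
optree <- local_td(d, name = "d") %.>%
  project(., total := sum(v), groupby = "g")

# Run the pipeline on each piece separately, then bind the results.
run_pieces <- function(pieces) {
  parts <- lapply(pieces, function(di) {
    ex_data_table(optree, tables = list(d = di))
  })
  rbindlist_data_table(parts)
}

# Reference result computed on the whole table.
whole <- ex_data_table(optree, tables = list(d = d))

# Coarse enough: every row of a group lands in the same piece.
good <- run_pieces(split(d, d$g))

# Too fine: each group is spread over two pieces, so the sums are partial.
bad <- run_pieces(split(d, c(1, 2, 1, 2)))
```

In this sketch `good` agrees with `whole`, while `bad` has one partial total per group per piece, which is the failure mode the note above warns about.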
Execute an rquery pipeline with data.table sources.
Description
data.tables are looked for by name in the tables argument and in the execution environment.
Internal execution interface.
Usage
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Details
- ex_data_table_step.relop_drop_columns: implement drop columns
- ex_data_table_step.relop_extend: implement extend/assign operator
- ex_data_table_step.relop_natural_join: implement natural join
- ex_data_table_step.relop_non_sql: direct function (non-sql) operator (not implemented for data.table)
- ex_data_table_step.relop_null_replace: implement NA/NULL replacement
- ex_data_table_step.relop_orderby: implement row ordering
- ex_data_table_step.relop_project: implement projection
- ex_data_table_step.relop_rename_columns: implement column renaming
- ex_data_table_step.relop_select_columns: implement select columns
- ex_data_table_step.relop_select_rows: implement select rows
- ex_data_table_step.relop_sql: direct sql operator (not implemented for data.table)
- ex_data_table_step.relop_table_source: implement data source
- ex_data_table_step.relop_theta_join: implement theta join (not implemented for data.table)
- ex_data_table_step.relop_unionall: implement row binding
Value
resulting data.table (intermediate tables can sometimes be mutated, as is common practice with data.table).
Examples
a <- data.table::data.table(x = c(1, 2) , y = c(20, 30), z = c(300, 400))
optree <- local_td(a) %.>%
select_columns(., c("x", "y")) %.>%
select_rows_nse(., x<2 & y<30)
cat(format(optree))
ex_data_table_step(optree)
# other ways to execute the pipeline include
ex_data_table(optree)
data.frame(x = 0, y = 4, z = 400) %.>% optree
Default non-implementation.
Description
Throws an error if this method is called, signalling that a specific data.table implementation is needed for this method.
Usage
## Default S3 method:
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Implement drop columns.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_drop_columns'
ex_data_table_step(
optree,
...,
tables = list(),
source_limit = NULL,
source_usage = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
source_usage |
list mapping source table names to vectors of columns used. |
env |
environment to work in. |
Examples
dL <- data.frame(x = 1, y = 2, z = 3)
rquery_pipeline <- local_td(dL) %.>%
drop_columns(., "y")
dL %.>% rquery_pipeline
Implement extend/assign operator.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_extend'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Details
Will re-order columns if there are ordering terms.
Examples
dL <- build_frame(
"subjectID", "surveyCategory" , "assessmentTotal", "one" |
1 , "withdrawal behavior", 5 , 1 |
1 , "positive re-framing", 2 , 1 |
2 , "withdrawal behavior", 3 , 1 |
2 , "positive re-framing", 4 , 1 )
rquery_pipeline <- local_td(dL) %.>%
extend_nse(.,
probability %:=%
exp(assessmentTotal * 0.237)/
sum(exp(assessmentTotal * 0.237)),
count %:=% sum(one),
rank %:=% rank(),
orderby = c("assessmentTotal", "surveyCategory"),
reverse = c("assessmentTotal"),
partitionby = 'subjectID') %.>%
orderby(., c("subjectID", "probability"))
dL %.>% rquery_pipeline
Natural join.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_natural_join'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
d1 <- build_frame(
"key", "val", "val1" |
"a" , 1 , 10 |
"b" , 2 , 11 |
"c" , 3 , 12 )
d2 <- build_frame(
"key", "val", "val2" |
"a" , 5 , 13 |
"b" , 6 , 14 |
"d" , 7 , 15 )
# key matching join
optree <- natural_join(local_td(d1), local_td(d2),
jointype = "FULL", by = 'key')
ex_data_table(optree)
# full cross-product join
# (usually with jointype = "FULL", but "LEFT" is more
# compatible with rquery field merge semantics).
optree2 <- natural_join(local_td(d1), local_td(d2),
jointype = "LEFT", by = NULL)
ex_data_table(optree2)
# notice ALL non-"by" fields are coalesced, preferring the left table.
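The coalescing behaviour noted in the last comment can be made visible with a smaller pair of tables. This is a hedged sketch on made-up data (the names `d_left` and `d_right` are illustrative): the shared non-key column `val` is missing in one left row, so the result should take the left value where it exists and fall back to the right value otherwise.

```r
library(rqdatatable)

# Both tables share the key column and the non-key column `val`.
d_left  <- data.frame(key = c("a", "b"), val = c(1, NA),
                      stringsAsFactors = FALSE)
d_right <- data.frame(key = c("a", "b"), val = c(5, 6),
                      stringsAsFactors = FALSE)

optree <- natural_join(local_td(d_left, name = "d_left"),
                       local_td(d_right, name = "d_right"),
                       jointype = "LEFT", by = "key")

res <- ex_data_table(optree,
                     tables = list(d_left = d_left, d_right = d_right))

# Sort by key so the rows can be inspected in a fixed order.
res <- res[order(res$key), ]
```

Under the coalescing rule, the row for key "a" keeps the left table's `val`, and the row for key "b" picks up the right table's `val` because the left one is NA.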
Direct non-sql (function) node, not implemented for data.table case.
Description
Passes a single table to a function that takes a single data.frame as its argument, and returns a single data.frame.
Usage
## S3 method for class 'relop_non_sql'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
See Also
rq_df_funciton_node, rq_df_grouped_funciton_node
Examples
set.seed(3252)
d <- data.frame(a = rnorm(1000), b = rnorm(1000))
optree <- local_td(d) %.>%
quantile_node(.)
d %.>% optree
p2 <- local_td(d) %.>%
rsummary_node(.)
d %.>% p2
summary(d)
Replace NAs.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_null_replace'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"x", "y" |
2L , 5 |
NA , 7 |
NA , NA )
rquery_pipeline <- local_td(dL) %.>%
null_replace(., c("x", "y"), 0, note_col = "nna")
dL %.>% rquery_pipeline
Order rows by expression.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_order_expr'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"x", "y" |
2L , "b" |
-4L , "a" |
3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
order_expr(., abs(x))
dL %.>% rquery_pipeline
Reorder rows.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_orderby'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"x", "y" |
2L , "b" |
1L , "a" |
3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
orderby(., "y")
dL %.>% rquery_pipeline
Implement projection operator.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_project'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"subjectID", "surveyCategory" , "assessmentTotal" |
1 , "withdrawal behavior", 5 |
1 , "positive re-framing", 2 |
2 , "withdrawal behavior", 3 |
2 , "positive re-framing", 4 )
test_p <- local_td(dL) %.>%
project(.,
maxscore := max(assessmentTotal),
count := n(),
groupby = 'subjectID')
cat(format(test_p))
dL %.>% test_p
Rename columns.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_rename_columns'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"x", "y" |
2L , "b" |
1L , "a" |
3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
rename_columns(., c("x" = "y", "y" = "x"))
dL %.>% rquery_pipeline
Implement select columns.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_select_columns'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- data.frame(x = 1, y = 2, z = 3)
rquery_pipeline <- local_td(dL) %.>%
select_columns(., "y")
dL %.>% rquery_pipeline
Select rows by condition.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_select_rows'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"x", "y" |
2L , "b" |
1L , "a" |
3L , "c" )
rquery_pipeline <- local_td(dL) %.>%
select_rows_nse(., x <= 2)
dL %.>% rquery_pipeline
Implement set_indicator operator.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_set_indicator'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
d <- data.frame(a = c("1", "2", "1", "3"),
b = c("1", "1", "3", "2"),
q = 1,
stringsAsFactors = FALSE)
set <- c("1", "2")
op_tree <- local_td(d) %.>%
set_indicator(., "one_two", "a", set) %.>%
set_indicator(., "z", "a", c())
d %.>% op_tree
Direct sql node.
Description
Execute one step using the rquery.rquery_db_executor SQL supplier. Note: it is not a good practice to use SQL nodes in data.table intended pipelines (loss of class information and cost of data transfer). This implementation is only here for completeness.
Usage
## S3 method for class 'relop_sql'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
# WARNING: example tries to change rquery.rquery_db_executor option to RSQLite and back.
if (requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)) {
# example database connection
my_db <- DBI::dbConnect(RSQLite::SQLite(),
":memory:")
old_o <- options(list("rquery.rquery_db_executor" = list(db = my_db)))
# example data
d <- data.frame(v1 = c(1, 2, NA, 3),
v2 = c(NA, "b", NA, "c"),
v3 = c(NA, NA, 7, 8),
stringsAsFactors = FALSE)
# example xform
vars <- column_names(d)
# build a NA/NULLs per-row counting expression.
# names are "quoted" by wrapping them with as.name().
# constants can be quoted by an additional list wrapping.
expr <- lapply(vars,
function(vi) {
list("+ (CASE WHEN (",
as.name(vi),
"IS NULL ) THEN 1.0 ELSE 0.0 END)")
})
expr <- unlist(expr, recursive = FALSE)
expr <- c(list(0.0), expr)
# instantiate the operator node
op_tree <- local_td(d) %.>%
sql_node(., "num_missing" %:=% list(expr))
cat(format(op_tree))
d %.>% op_tree
options(old_o)
DBI::dbDisconnect(my_db)
}
Build a data source description.
Description
data.table based implementation.
Looks for tables first in tables and then in env.
Will accept any data.frame that can be converted to data.table.
Usage
## S3 method for class 'relop_table_source'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"x", "y" |
2L , "b" |
1L , "a" |
3L , "c" )
rquery_pipeline <- local_td(dL)
dL %.>% rquery_pipeline
Theta join (database implementation).
Description
Limited implementation. All terms must be of the form: "(table1.col CMP table2.col) (, (table1.col CMP table2.col) )".
Usage
## S3 method for class 'relop_theta_join'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
d1 <- data.frame(AUC = 0.6, R2 = 0.2)
d2 <- data.frame(AUC2 = 0.4, R2 = 0.3)
optree <- theta_join_se(local_td(d1), local_td(d2), "AUC >= AUC2")
ex_data_table(optree, tables = list(d1 = d1, d2 = d2)) %.>%
print(.)
Bind tables together by rows.
Description
data.table based implementation.
Usage
## S3 method for class 'relop_unionall'
ex_data_table_step(
optree,
...,
tables = list(),
source_usage = NULL,
source_limit = NULL,
env = parent.frame()
)
Arguments
optree |
relop operations tree. |
... |
not used, force later arguments to bind by name. |
tables |
named list map from table names used in nodes to data.tables and data.frames. |
source_usage |
list mapping source table names to vectors of columns used. |
source_limit |
if not null limit all table sources to no more than this many rows (used for debugging). |
env |
environment to work in. |
Examples
dL <- build_frame(
"x", "y" |
2L , "b" |
1L , "a" |
3L , "c" )
rquery_pipeline <- unionall(list(local_td(dL), local_td(dL)))
dL %.>% rquery_pipeline
Map data records from row records to block records with one record row per columnsToTakeFrom value.
Description
Map data records from row records (records that are exactly single rows) to block records (records that may be more than one row). All columns not named in columnsToTakeFrom are copied to each record row in the result.
Usage
layout_to_blocks_data_table(
data,
...,
nameForNewKeyColumn,
nameForNewValueColumn,
columnsToTakeFrom,
columnsToCopy = setdiff(colnames(data), columnsToTakeFrom)
)
Arguments
data |
data.frame to work with. |
... |
force later arguments to bind by name. |
nameForNewKeyColumn |
character name of column to write new keys in. |
nameForNewValueColumn |
character name of column to write new values in. |
columnsToTakeFrom |
character array names of columns to take values from. |
columnsToCopy |
character array names of columns to copy. |
Value
new data.frame with values moved to rows.
Examples
(d <- wrapr::build_frame(
"id" , "id2", "AUC", "R2" |
1 , "a" , 0.7 , 0.4 |
2 , "b" , 0.8 , 0.5 ))
(layout_to_blocks_data_table(
d,
nameForNewKeyColumn = "measure",
nameForNewValueColumn = "value",
columnsToTakeFrom = c("AUC", "R2"),
columnsToCopy = c("id", "id2")))
Map data records from block records that have one row per measurement value to row records.
Description
Map data records from block records (where each record may be more than one row) to row records (where each record is a single row). Values specified in rowKeyColumns determine which sets of rows build up records and are copied into the result.
Usage
layout_to_rowrecs_data_table(
data,
...,
columnToTakeKeysFrom,
columnToTakeValuesFrom,
rowKeyColumns,
sep = "_"
)
Arguments
data |
data.frame to work with (must be local, for remote please try |
... |
force later arguments to bind by name. |
columnToTakeKeysFrom |
character name of column to build new column names from. |
columnToTakeValuesFrom |
character name of column to get values from. |
rowKeyColumns |
character array names of columns that should be table keys. |
sep |
character if not null build more detailed column names. |
Value
new data.frame with values moved to columns.
Examples
(d2 <- wrapr::build_frame(
"id" , "id2", "measure", "value" |
1 , "a" , "AUC" , 0.7 |
2 , "b" , "AUC" , 0.8 |
1 , "a" , "R2" , 0.4 |
2 , "b" , "R2" , 0.5 ))
(layout_to_rowrecs_data_table(d2,
columnToTakeKeysFrom = "measure",
columnToTakeValuesFrom = "value",
rowKeyColumns = c("id", "id2")))
Lookup by column function factory.
Description
Build data.table implementation of lookup_by_column. We do this here as rqdatatable is a data.table aware package (and rquery is not).
Usage
make_dt_lookup_by_column(pick, result)
Arguments
pick |
character scalar, name of column to control value choices. |
result |
character scalar, name of column to place values in. |
Value
f_dt() function.
Examples
df = data.frame(x = c(1, 2, 3, 4),
y = c(5, 6, 7, 8),
choice = c("x", "y", "x", "z"),
stringsAsFactors = FALSE)
make_dt_lookup_by_column("choice", "derived")(df)
# # base-R implementation
# df %.>% lookup_by_column(., "choice", "derived")
# # # data.table implementation (requires rquery 1.1.0, or newer)
# # df %.>% lookup_by_column(., "choice", "derived",
# # f_dt_factory = rqdatatable::make_dt_lookup_by_column)
rbindlist
Description
Note: different argument defaults than data.table::rbindlist.
Usage
rbindlist_data_table(l, use.names = TRUE, fill = TRUE, idcol = NULL)
Arguments
l |
list of data.frames to rbind. |
use.names |
passed to data.table |
fill |
passed to data.table |
idcol |
passed to data.table |
Value
data.table
Examples
rbindlist_data_table(list(
data.frame(x = 1, y = 2),
data.frame(x = c(2, 3), y = c(NA, 4))))
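The effect of the different defaults can be sketched with two frames whose columns do not line up. The data below is made up for illustration; data.table::rbindlist leaves `fill` off by default, whereas this wrapper defaults to `use.names = TRUE` and `fill = TRUE` as shown in Usage.

```r
library(rqdatatable)

# Illustrative inputs: `b` has an extra column and a different column order.
a <- data.frame(x = 1)
b <- data.frame(y = 3, x = 2)

# Columns are matched by name; columns missing from an input are filled with NA.
res <- rbindlist_data_table(list(a, b))
```

The result has both `x` and `y` columns, with `y` set to NA in the row that came from `a`.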
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- data.table
Helper to build data.table capable non-sql nodes.
Description
Helper to build data.table capable non-sql nodes.
Usage
rq_df_funciton_node(
.,
f,
...,
f_db = NULL,
columns_produced,
display_form,
orig_columns = FALSE
)
Arguments
. |
or data.frame input. |
f |
function that takes a data.table to a data.frame (or data.table). |
... |
force later arguments to bind by name. |
f_db |
implementation signature: f_db(db, incoming_table_name, outgoing_table_name, nd, ...) (db being a database handle). NULL defaults to using f. |
columns_produced |
character, names of columns produced by f. |
display_form |
display form for node. |
orig_columns |
orig_columns, if TRUE assume all input columns are present in derived table. |
Value
relop non-sql node implementation.
See Also
ex_data_table_step.relop_non_sql, rq_df_grouped_funciton_node
Examples
# a node generator is something an expert can
# write and part-time R users can use.
grouped_regression_node <- function(., group_col = "group", xvar = "x", yvar = "y") {
force(group_col)
formula_str <- paste(yvar, "~", xvar)
f <- function(df, nd = NULL) {
dlist <- split(df, df[[group_col]])
clist <- lapply(dlist,
function(di) {
mi <- lm(as.formula(formula_str), data = di)
ci <- as.data.frame(summary(mi)$coefficients)
ci$Variable <- rownames(ci)
rownames(ci) <- NULL
ci[[group_col]] <- di[[group_col]][[1]]
ci
})
data.table::rbindlist(clist)
}
columns_produced =
c("Variable", "Estimate", "Std. Error", "t value", "Pr(>|t|)", group_col)
rq_df_funciton_node(
., f,
columns_produced = columns_produced,
display_form = paste0(yvar, "~", xvar, " grouped by ", group_col))
}
# work an example
set.seed(3265)
d <- data.frame(x = rnorm(1000),
y = rnorm(1000),
group = sample(letters[1:5], 1000, replace = TRUE),
stringsAsFactors = FALSE)
rquery_pipeline <- local_td(d) %.>%
grouped_regression_node(.)
cat(format(rquery_pipeline))
d %.>% rquery_pipeline
Helper to build data.table capable non-sql nodes.
Description
Helper to build data.table capable non-sql nodes.
Usage
rq_df_grouped_funciton_node(
.,
f,
...,
f_db = NULL,
columns_produced,
group_col,
display_form
)
Arguments
. |
or data.frame input. |
f |
function that takes a data.table to a data.frame (or data.table). |
... |
force later arguments to bind by name. |
f_db |
implementation signature: f_db(db, incoming_table_name, outgoing_table_name) (db being a database handle). NULL defaults to using f. |
columns_produced |
character, names of columns produced by f. |
group_col |
character, column to split by. |
display_form |
display form for node. |
Value
relop non-sql node implementation.
See Also
ex_data_table_step.relop_non_sql, rq_df_funciton_node
Examples
# a node generator is something an expert can
# write and part-time R users can use.
grouped_regression_node <- function(., group_col = "group", xvar = "x", yvar = "y") {
force(group_col)
formula_str <- paste(yvar, "~", xvar)
f <- function(di) {
mi <- lm(as.formula(formula_str), data = di)
ci <- as.data.frame(summary(mi)$coefficients)
ci$Variable <- rownames(ci)
rownames(ci) <- NULL
colnames(ci) <- c("Estimate", "Std_Error", "t_value", "p_value", "Variable")
ci
}
columns_produced =
c("Estimate", "Std_Error", "t_value", "p_value", "Variable", group_col)
rq_df_grouped_funciton_node(
., f,
columns_produced = columns_produced,
group_col = group_col,
display_form = paste0(yvar, "~", xvar, " grouped by ", group_col))
}
# work an example
set.seed(3265)
d <- data.frame(x = rnorm(1000),
y = rnorm(1000),
group = sample(letters[1:5], 1000, replace = TRUE),
stringsAsFactors = FALSE)
rquery_pipeline <- local_td(d) %.>%
grouped_regression_node(.)
cat(format(rquery_pipeline))
d %.>% rquery_pipeline
Set rqdatatable package as default rquery executor
Description
Sets rqdatatable (and hence data.table) as the default executor for rquery.
Usage
set_rqdatatable_as_executor()