Title: | Running Operations for Vectors |
Type: | Package |
Version: | 0.4.4 |
Depends: | R (≥ 3.0) |
Language: | en-US |
Encoding: | UTF-8 |
Maintainer: | Dawid Kałędkowski <dawid.kaledkowski@gmail.com> |
Description: | Lightweight library for rolling window operations. The package gives full control over the window length, window lag and time indices. With runner one can apply any R function on rolling windows. The package eases work with equally and unequally spaced time series. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
BugReports: | https://github.com/gogonzo/runner/issues |
LinkingTo: | Rcpp |
Imports: | methods, parallel, Rcpp |
Suggests: | knitr, rmarkdown, tinytest |
RoxygenNote: | 7.3.1 |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2024-03-03 20:09:02 UTC; gonzo |
Author: | Dawid Kałędkowski |
Repository: | CRAN |
Date/Publication: | 2024-03-03 20:50:02 UTC |
Resolves at argument
Description
Resolves the at argument passed to the runner: checks whether the argument has a valid value. If the argument is a single character string matching a column name in x, it is replaced with the value x[[at]].
Usage
.check_unresolved_at(x, at)
Arguments
x |
( |
at |
( |
Value
resolved at
Resolves time difference argument
Description
Resolves a time difference argument (k or lag) passed to the runner: checks whether the argument has a valid value. If the argument is a single character string matching a column name in x, it is replaced with the value of that column, x[[k]].
Usage
.check_unresolved_difftime(x, k)
Arguments
x |
( |
k |
( |
Value
resolved k (or lag)
Resolves index argument
Description
Resolves the idx argument passed to the runner: checks whether the argument has a valid value. If the argument is a single character string matching a column name in x, it is replaced with the value x[[idx]].
Usage
.check_unresolved_index(x, idx)
Arguments
x |
( |
idx |
( |
Value
resolved idx
Validate date time character
Description
Checks if the character is a valid date time string
Usage
.is_datetime_valid(x)
Arguments
x |
( |
Value
logical(1) denoting whether all elements of the character vector are valid
Converts k and lag from time-unit-interval to int
Description
Converts k and lag from time-unit-interval to int
Usage
.k_by(k, idx, param)
Arguments
k |
( |
idx |
( |
param |
name of the parameter to be printed in error message |
Examples
k <- "1 month"
idx <- seq(
as.POSIXct("2019-01-01 03:02:01"),
as.POSIXct("2020-01-01 03:02:01"),
by = "month"
)
k_difftime <- runner:::.k_by(k, idx, param = "k")
idx - k_difftime
Formats time-unit-interval to valid for runner
Description
Formats a time-unit interval into a form valid for runner. The user specifies k as a positive number, but this interval needs to be subtracted from idx, because the window length extends the window backwards in time. The same applies to lag.
Usage
.reformat_k(k, only_positive = TRUE)
Arguments
k |
(k or lag) object from runner to be formatted |
only_positive |
for |
Examples
runner:::.reformat_k("1 days")
runner:::.reformat_k("day")
runner:::.reformat_k("10 days")
runner:::.reformat_k("-10 days", only_positive = FALSE)
runner:::.reformat_k(c("-10 days", "2 months"), only_positive = FALSE)
Resolves at argument
Description
Resolves an argument passed to the runner: checks whether the argument has a valid value. If the argument is a single character string matching a column name in x, it is replaced with the value x[[arg]].
Usage
.resolve_arg(x, arg)
Arguments
x |
( |
Value
resolved arg
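The column-lookup rule described above can be sketched in base R. This is only an illustration under stated assumptions: resolve_column_arg is a hypothetical helper, not the package's internal code, and it skips the validity checks that the description mentions.

```r
# Hypothetical helper (an assumption, not runner's implementation):
# if `arg` is a single character string naming a column of `x`,
# return that column; otherwise return `arg` unchanged.
resolve_column_arg <- function(x, arg) {
  if (is.character(arg) && length(arg) == 1 && arg %in% names(x)) {
    x[[arg]]
  } else {
    arg
  }
}

df <- data.frame(index = c(2, 3, 5), value = c(10, 20, 30))

# A column name is swapped for the column's values.
resolved_by_name <- resolve_column_arg(df, "index")

# Anything else is passed through as given.
resolved_as_is <- resolve_column_arg(df, c(1, 2, 3))
```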
Creates sequence for at as time-unit-interval
Description
Creates sequence for at as time-unit-interval
Usage
.seq_at(at, idx)
Arguments
at |
object from runner |
idx |
object from runner |
Access group data in mutate
Description
Accesses the data of the current group in dplyr::mutate after dplyr::group_by. The function exists because the data available in a dplyr::group_by %>% mutate pipeline is not filtered by group: inside mutate, the dot (.) is still the initial dataset. This function creates a data.frame using the dplyr::groups information.
Usage
.this_group(x)
Arguments
x |
( |
Value
data.frame filtered by current dplyr::groups()
Fill NA with previous non-NA element
Description
Fill NA with last non-NA element.
Usage
fill_run(x, run_for_first = FALSE, only_within = FALSE)
Arguments
x |
( |
run_for_first |
If first elements are filled with |
only_within |
|
Value
vector - x containing all x elements with NA replaced with previous non-NA element.
Examples
fill_run(c(NA, NA, 1:10, NA, NA), run_for_first = TRUE)
fill_run(c(NA, NA, 1:10, NA, NA), run_for_first = FALSE)
fill_run(c(NA, NA, 1, 2, NA, NA, 2, 2, NA, NA, 1, NA, NA), run_for_first = TRUE, only_within = TRUE)
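To make the idea of "fill NA with the previous non-NA element" concrete, here is a minimal base-R sketch. It is not the package's implementation; fill_forward is a hypothetical name. It leaves leading NA untouched and does not cover the only_within option.

```r
# Hypothetical base-R sketch of forward filling (not runner's code).
fill_forward <- function(x) {
  observed <- !is.na(x)
  last_seen <- cumsum(observed)      # 0 before the first non-NA value
  last_seen[last_seen == 0] <- NA    # leading NA have nothing to copy from
  x[observed][last_seen]             # look up the most recent observed value
}

filled <- fill_forward(c(NA, 1, NA, NA, 3, NA))
print(filled)
```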
Lag dependent on variable
Description
Vector of input lagged along integer vector
Usage
lag_run(x, lag = 1L, idx = integer(0), nearest = FALSE)
Arguments
x |
( |
lag |
( |
idx |
( |
nearest |
|
Examples
lag_run(1:10, lag = 3)
lag_run(letters[1:10], lag = -2, idx = c(1, 1, 1, 2, 3, 4, 6, 7, 8, 10))
lag_run(letters[1:10], lag = 2, idx = c(1, 1, 1, 2, 3, 4, 6, 7, 8, 10), nearest = TRUE)
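For the simplest case, a constant positional lag with no idx, the operation can be pictured with the base-R sketch below. shift_by is a hypothetical helper introduced only for illustration; it does not reproduce the idx-dependent or nearest behaviour of lag_run.

```r
# Hypothetical positional shift (an assumption, not runner's code):
# positive lag looks backwards, negative lag looks forwards.
shift_by <- function(x, lag = 1L) {
  n <- length(x)
  source_pos <- seq_len(n) - lag                      # position each output reads from
  source_pos[source_pos < 1 | source_pos > n] <- NA   # outside the vector gives NA
  x[source_pos]
}

lagged <- shift_by(1:10, lag = 3)   # first three elements have no source
led <- shift_by(1:10, lag = -2)     # last two elements have no source
```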
Length of running windows
Description
Number of elements in a k-long window calculated on the idx vector. If idx is an as.integer(date) vector, then k is the number of days in the window, and the result is the number of observations within each k-day window.
Usage
length_run(k = integer(1), lag = integer(1), idx = integer(0))
Arguments
k |
( |
lag |
( |
idx |
( |
Examples
length_run(k = 3, idx = c(1, 2, 2, 4, 5, 5, 5, 5, 5, 5))
Running maximum
Description
max_run calculates the running maximum of the numeric vector x within the specified window size k.
Usage
max_run(
x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_rm = TRUE,
na_pad = FALSE
)
Arguments
x |
( |
k |
( |
lag |
( |
idx |
( |
at |
( |
na_rm |
|
na_pad |
( |
Value
max (numeric) vector of length equal to the length of x.
Examples
set.seed(11)
x1 <- sample(c(1, 2, 3), 15, replace = TRUE)
x2 <- sample(c(NA, 1, 2, 3), 15, replace = TRUE)
k <- sample(1:4, 15, replace = TRUE)
max_run(x1) # simple cumulative maximum
max_run(x2, na_rm = TRUE) # cumulative maximum with removing NA.
max_run(x2, na_rm = TRUE, k = 4) # maximum in 4-element window
max_run(x2, na_rm = FALSE, k = k) # maximum in varying k window size
Running mean
Description
Running mean in specified window of numeric vector.
Usage
mean_run(
x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_rm = TRUE,
na_pad = FALSE
)
Arguments
x |
|
k |
( |
lag |
( |
idx |
( |
at |
( |
na_rm |
|
na_pad |
( |
Value
mean (numeric) vector of length equal to the length of x.
Examples
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA, 5), rnorm(15)), 15, replace = TRUE)
k <- sample(1:15, 15, replace = TRUE)
mean_run(x1)
mean_run(x2, na_rm = TRUE)
mean_run(x2, na_rm = FALSE)
mean_run(x2, na_rm = TRUE, k = 4)
Running minimum
Description
min_run calculates the running minimum of the numeric vector x within the specified window size k.
Usage
min_run(
x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_rm = TRUE,
na_pad = FALSE
)
Arguments
x |
( |
k |
( |
lag |
( |
idx |
( |
at |
( |
na_rm |
|
na_pad |
( |
Value
min (numeric) vector of length equal to the length of x.
Examples
set.seed(11)
x1 <- sample(c(1, 2, 3), 15, replace = TRUE)
x2 <- sample(c(NA, 1, 2, 3), 15, replace = TRUE)
k <- sample(1:4, 15, replace = TRUE)
min_run(x1)
min_run(x2, na_rm = TRUE)
min_run(x2, na_rm = TRUE, k = 4)
min_run(x2, na_rm = FALSE, k = k)
Running min/max
Description
minmax_run calculates the running minimum-maximum of the numeric vector x.
Usage
minmax_run(x, metric = "min", na_rm = TRUE)
Arguments
x |
( |
metric |
|
na_rm |
|
Value
list.
Set window parameters
Description
Sets window parameters for runner(). This function sets attributes on the x object (data.frame only) and saves the user the effort of specifying window parameters again in multiple subsequent runner() calls.
Usage
run_by(x, idx, k, lag, na_pad, at)
Arguments
x |
( |
idx |
( |
k |
( |
lag |
( |
na_pad |
( |
at |
( |
Value
x object which runner() can be executed on.
Examples
## Not run:
library(dplyr)
data <- data.frame(
  index = c(2, 3, 3, 4, 5, 8, 10, 10, 13, 15),
  a = rep(c("a", "b"), each = 5),
  b = 1:10
)
data %>%
  group_by(a) %>%
  run_by(idx = "index", k = 5) %>%
  mutate(
    c = runner(
      x = .,
      f = function(x) {
        paste(x$b, collapse = ">")
      }
    ),
    d = runner(
      x = .,
      f = function(x) {
        sum(x$b)
      }
    )
  )
## End(Not run)
Apply running function
Description
Applies custom function on running windows.
Usage
runner(
x,
f = function(x) x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_pad = FALSE,
simplify = TRUE,
cl = NULL,
...
)
## Default S3 method:
runner(
x,
f = function(x) x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_pad = FALSE,
simplify = TRUE,
cl = NULL,
...
)
## S3 method for class 'data.frame'
runner(
x,
f = function(x) x,
k = attr(x, "k"),
lag = if (!is.null(attr(x, "lag"))) attr(x, "lag") else integer(1),
idx = attr(x, "idx"),
at = attr(x, "at"),
na_pad = if (!is.null(attr(x, "na_pad"))) attr(x, "na_pad") else FALSE,
simplify = TRUE,
cl = NULL,
...
)
## S3 method for class 'grouped_df'
runner(
x,
f = function(x) x,
k = attr(x, "k"),
lag = if (!is.null(attr(x, "lag"))) attr(x, "lag") else integer(1),
idx = attr(x, "idx"),
at = attr(x, "at"),
na_pad = if (!is.null(attr(x, "na_pad"))) attr(x, "na_pad") else FALSE,
simplify = TRUE,
cl = NULL,
...
)
## S3 method for class 'matrix'
runner(
x,
f = function(x) x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_pad = FALSE,
simplify = TRUE,
cl = NULL,
...
)
## S3 method for class 'xts'
runner(
x,
f = function(x) x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_pad = FALSE,
simplify = TRUE,
cl = NULL,
...
)
Arguments
x |
( |
f |
( |
k |
( |
lag |
( |
idx |
( |
at |
( |
na_pad |
( |
simplify |
( |
cl |
( |
... |
(optional) |
Details
The function can apply any R function on running windows defined by x, k, lag, idx and at. Running windows can be calculated in several ways:
- Cumulative windows: applied when the user doesn't specify the k argument or specifies k = length(x); this means that k is equal to the number of available elements.
- Constant sliding windows: applied when the user specifies k as a constant value, keeping idx and at unspecified. The lag argument shifts windows left (lag > 0) or right (lag < 0).
- Windows depending on date: if one specifies idx, the output windows might vary in size because of unequally spaced indexes. For example, a 5-period window is different from a 5-element window, because a 5-period window might contain any number of observations (a 7-day mean is not the same as a 7-element mean).
- Window at specific indices: runner by default returns a vector of the same size as x unless one specifies the at argument. Each element of at is an index at which runner calculates the function, which means that the output of runner is then of length equal to length(at). Note that one can change the index of x by specifying idx. The illustration shows the output of runner for at = c(18, 27, 45, 31), which gives windows in the ranges enclosed in square brackets. The range for at = 27 is [22, 26], which is not available in the current indices.
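The window types listed above can be pictured with a small base-R sketch. This is an illustration under assumptions, not the package's implementation: element_windows and index_windows are hypothetical helpers, the index-based rule is taken from the range quoted above (for at = 27, k = 5, lag = 1 the range is [22, 26]), and idx is assumed to be strictly increasing.

```r
# Element-based windows (hypothetical helper): window i ends at
# position i - lag and holds up to k elements.
element_windows <- function(x, k = length(x), lag = 0) {
  n <- length(x)
  lapply(seq_len(n), function(i) {
    upper <- min(i - lag, n)
    lower <- max(i - lag - k + 1, 1)
    if (upper < lower) x[0] else x[lower:upper]   # empty window when out of range
  })
}

# Index-based windows (hypothetical helper): window i holds the elements
# whose index lies in (idx[i] - lag - k, idx[i] - lag].
index_windows <- function(x, k, idx, lag = 0) {
  lapply(seq_along(x), function(i) {
    upper <- idx[i] - lag
    x[idx > upper - k & idx <= upper]
  })
}

x <- 1:6
cumulative <- sapply(element_windows(x), sum)               # k defaults to length(x)
sliding <- sapply(element_windows(x, k = 3), sum)           # constant 3-element window
lagged <- sapply(element_windows(x, k = 3, lag = 1), sum)   # same window shifted left

# Unequally spaced indices: a 5-period window holds a varying number of elements.
idx <- c(1, 2, 5, 6, 7, 20)
sizes <- lengths(index_windows(x, k = 5, idx = idx))
print(sizes)
```

The last result is the point of the "windows depending on date" case: the window length is fixed in index units, so the number of elements it contains changes with the spacing of idx.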
Specifying time-intervals
at can also be specified as an interval of the output, defined by at = "<increment>", which results in the sequence of indices defined by seq.POSIXt(min(idx), max(idx), by = "<increment>"). The increment of the sequence is the same as in the base::seq.POSIXt() function. It's worth noting that the increment interval can't be more frequent than the interval of idx: for Date the most frequent time unit is a "day", for POSIXt a "sec".
k and lag can also be specified using a time sequence increment. Available time units are "sec", "min", "hour", "day", "DSTday", "week", "month", "quarter" or "year". To increment by a number of units one can also specify <number> <unit>s, for example lag = "-2 days", k = "5 weeks".
k and lag specified as sequence increments can also be vectors, which allows each window to be stretched and lagged or led freely in time (on indices).
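The increment syntax is the one used by base R's seq() for dates, so its effect can be checked without the package. The sketch below only illustrates that syntax; the variable names and the example dates are assumptions, and it does not show how runner treats window boundaries.

```r
# Monthly index, as in the data.frame example of this help page.
idx <- seq(as.Date("2022-02-22"), as.Date("2023-02-22"), by = "1 month")

# The sequence that at = "6 months" is described as expanding to.
at_points <- seq(min(idx), max(idx), by = "6 months")
print(at_points)

# A negative "<number> <unit>s" increment steps backwards in time, which is
# one way to picture how far a k = "5 weeks" window reaches from a given date.
window_end <- as.Date("2024-03-01")
window_reach <- seq(window_end, by = "-5 weeks", length.out = 2)[2]
print(window_reach)
```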
Parallel computing
Beware that executing an R call in parallel does not always have the edge over single-threaded execution, even if a cluster was created beforehand with cl <- parallel::makeCluster(parallel::detectCores()). Parallel windows are executed in an independent environment, which means that objects other than function arguments need to be copied to the parallel environment using parallel::clusterExport(). For example, using f = function(x) x + y + z will result in an error unless clusterExport(cl, varlist = c("y", "z")) is called first.
Value
vector with aggregated values for each window. Length of the output is the same as length(x), or length(at) if specified. Type of the output depends on the output from the function f.
Examples
# runner returns windows as is by default
runner(1:10)
# mean on k = 3 elements windows
runner(1:10, f = mean, k = 3)
# mean on k = 3 elements windows with different specification
runner(1:10, k = 3, f = function(x) mean(x, na.rm = TRUE))
# concatenate two columns
runner(
  data.frame(
    a = letters[1:10],
    b = 1:10
  ),
  f = function(x) paste(paste0(x$a, x$b), collapse = "+")
)
# concatenate two columns with additional argument
runner(
  data.frame(
    a = letters[1:10],
    b = 1:10
  ),
  f = function(x, xxx) {
    paste(paste0(x$a, xxx, x$b), collapse = " + ")
  },
  xxx = "..."
)
# number of unique values in each window (varying window size)
runner(letters[1:10],
  k = c(1, 2, 2, 4, 5, 5, 5, 5, 5, 5),
  f = function(x) length(unique(x))
)
# concatenate only on selected windows index
runner(letters[1:10],
  f = function(x) paste(x, collapse = "-"),
  at = c(1, 5, 8)
)
# 5 days mean
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner::runner(
  x = idx,
  k = "5 days",
  lag = 1,
  idx = Sys.Date() + idx,
  f = function(x) mean(x)
)
# 5 days mean at 4-indices
runner::runner(
  x = 1:15,
  k = 5,
  lag = 1,
  idx = idx,
  at = c(18, 27, 48, 31),
  f = mean
)
# runner with data.frame
df <- data.frame(
  a = 1:13,
  b = 1:13 + rnorm(13, sd = 5),
  idx = seq(as.Date("2022-02-22"), as.Date("2023-02-22"), by = "1 month")
)
runner(
  x = df,
  idx = "idx",
  at = "6 months",
  f = function(x) {
    cor(x$a, x$b)
  }
)
# parallel computing
library(parallel)
data <- data.frame(
  a = runif(100),
  b = runif(100),
  idx = cumsum(sample(rpois(100, 5)))
)
const <- 0
cl <- makeCluster(1)
clusterExport(cl, "const", envir = environment())
runner(
  x = data,
  k = 10,
  f = function(x) {
    cor(x$a, x$b) + const
  },
  idx = "idx",
  cl = cl
)
stopCluster(cl)
# runner with matrix
data <- matrix(data = runif(100, 0, 1), nrow = 20, ncol = 5)
runner(
  x = data,
  f = function(x) {
    tryCatch(
      cor(x),
      error = function(e) NA
    )
  }
)
Running streak length
Description
Calculates running series of consecutive elements
Usage
streak_run(
x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_rm = TRUE,
na_pad = FALSE
)
Arguments
x |
any type vector which running function is calculated on |
k |
( |
lag |
( |
idx |
( |
at |
( |
na_rm |
|
na_pad |
( |
Value
streak numeric vector of length equal to the length of x, containing the number of consecutive occurrences.
Examples
set.seed(11)
x1 <- sample(c("a", "b"), 15, replace = TRUE)
x2 <- sample(c(NA_character_, "a", "b"), 15, replace = TRUE)
k <- sample(1:4, 15, replace = TRUE)
streak_run(x1) # simple streak run
streak_run(x1, k = 2) # streak run within 2-element window
streak_run(x2, na_pad = TRUE, k = 3) # streak run within k=3 with padding NA
streak_run(x1, k = k) # streak run within varying window size specified by vector k
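As a point of comparison, the cumulative case (no k, no NA values) can be sketched in base R with run-length encoding. This is an assumption-laden illustration of what a streak counter computes, not the package's implementation, and the input vector is made up for the example.

```r
# rle() finds runs of equal consecutive values; sequence() counts 1..n
# inside each run, giving the length of the streak at every position.
x <- c("a", "a", "b", "a", "a", "a")
streak <- sequence(rle(x)$lengths)
print(streak)
```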
Running sum
Description
Running sum in specified window of numeric vector.
Usage
sum_run(
x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_rm = TRUE,
na_pad = FALSE
)
Arguments
x |
|
k |
( |
lag |
( |
idx |
( |
at |
( |
na_rm |
|
na_pad |
( |
Value
sum numeric vector of length equal to the length of x.
Examples
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA, 5), rnorm(15)), 15, replace = TRUE)
k <- sample(1:15, 15, replace = TRUE)
sum_run(x1)
sum_run(x2, na_rm = TRUE)
sum_run(x2, na_rm = FALSE)
sum_run(x2, na_rm = TRUE, k = 4)
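For intuition, a cumulative sum and a constant k-element sliding sum can both be derived from base R's cumsum(). The sketch below is a hypothetical illustration (rolling_sum is not a package function) and assumes no NA values, no lag and no idx.

```r
# Hypothetical sliding sum built from a running total (not runner's code):
# the sum of the last k elements is the difference of two running totals.
rolling_sum <- function(x, k) {
  n <- length(x)
  running_total <- c(0, cumsum(x))   # running_total[i + 1] is the sum of x[1:i]
  i <- seq_len(n)
  running_total[i + 1] - running_total[pmax(i - k, 0) + 1]
}

x <- c(2, 4, 6, 8, 10)
cumulative <- cumsum(x)            # window grows from the first element
sliding <- rolling_sum(x, k = 2)   # window holds at most 2 elements
print(sliding)
```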
Running which
Description
which_run calculates running which - returns index of element where x == TRUE.
Usage
which_run(
x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
which = "last",
na_rm = TRUE,
na_pad = FALSE
)
Arguments
x |
( |
k |
( |
lag |
( |
idx |
( |
at |
( |
which |
|
na_rm |
|
na_pad |
( |
Value
integer vector of indexes of the same length as x.
Examples
set.seed(11)
x1 <- sample(c(1, 2, 3), 15, replace = TRUE)
x2 <- sample(c(NA, 1, 2, 3), 15, replace = TRUE)
k <- sample(1:4, 15, replace = TRUE)
which_run(x1)
which_run(x2, na_rm = TRUE)
which_run(x2, na_rm = TRUE, k = 4)
which_run(x2, na_rm = FALSE, k = k)
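The idea of a running "which" is easiest to see on a logical vector. The base-R sketch below illustrates the cumulative case with the last TRUE position reported; it is an assumption about the intended meaning of which = "last", not the package's implementation, and it ignores k, lag, idx and NA handling.

```r
x <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

# Mark each TRUE with its position and carry the largest position forward.
pos <- ifelse(x, seq_along(x), 0L)
last_true <- cummax(pos)
last_true[last_true == 0] <- NA   # no TRUE seen yet
print(last_true)
```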
List of running windows
Description
Creates a list of windows with the given argument settings. Length of the output list is equal to length(x), or to length(at) if at is specified.
Usage
window_run(
x,
k = integer(0),
lag = integer(1),
idx = integer(0),
at = integer(0),
na_pad = FALSE
)
Arguments
x |
( |
k |
( |
lag |
( |
idx |
( |
at |
( |
na_pad |
( |
Value
list of vectors (windows). Length of the list is the same as length(x), or length(at) if specified, and the length of each window is defined by k (unless the window is out of range).
Examples
window_run(1:10, k = 3, lag = -1)
window_run(letters[1:10], k = c(1, 2, 2, 4, 5, 5, 5, 5, 5, 5))