Help for package synthesis

Title:

Generate Synthetic Data from Statistical Models

Version:

1.2.5

Author:

Ze Jiang

[aut, cre]

Maintainer:

Ze Jiang <ze.jiang@unsw.edu.au>

Description:

Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems. Applications to testing methods can be found in Jiang, Z., Sharma, A., & Johnson, F. (2019) <doi:10.1016/j.advwatres.2019.103430> and Jiang, Z., Sharma, A., & Johnson, F. (2020) <doi:10.1029/2019WR026962> associated with an open-source tool by Jiang, Z., Rashid, M. M., Johnson, F., & Sharma, A. (2020) <doi:10.1016/j.envsoft.2020.104907>.

Depends:

R (≥ 3.5.0)

License:

GPL (≥ 3)

Encoding:

UTF-8

URL:

https://github.com/zejiang-unsw/synthesis#readme

BugReports:

https://github.com/zejiang-unsw/synthesis/issues

Imports:

stats, MASS, graphics

Suggests:

zoo, knitr, rmarkdown, testthat, devtools

RoxygenNote:

7.2.3

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2024-06-28 22:28:48 UTC; ze

Repository:

CRAN

Date/Publication:

2024-07-07 15:10:02 UTC

Generate build-up and wash-off model for water quality modeling

Description

Generate build-up and wash-off model for water quality modeling

Usage

data.gen.BUWO(nobs, k = 0.5, a = 1, m0 = 10, q = 0)

Arguments

nobs

The data length to be generated.

k

build-up coefficient (kg t^-1)

a

wash-off rate constant (m^-3)

m0

threshold at which additional mass does not accumulate on the surface (kg)

q

runoff (m^3 t^-1)

Value

A list of 2 elements: a vector of build-up mass (x), and a vector of wash-off mass (y) per unit time.

References

Wu, X., Marshall, L., & Sharma, A. (2019). The influence of data transformations in simulating Total Suspended Solids using Bayesian inference. Environmental Modelling & Software, 121, 104493. https://doi.org/10.1016/j.envsoft.2019.104493

Shaw, S. B., Stedinger, J. R., & Walter, M. T. (2010). Evaluating Urban Pollutant Buildup/Wash-Off Models Using a Madison, Wisconsin Catchment. Journal of Environmental Engineering, 136(2), 194-203. https://doi.org/10.1061/(ASCE)EE.1943-7870.0000142

Examples

# Build up model
set.seed(101)
sample = 500
#create a gamma shape storm event
q<- seq(0,20, length.out=sample)
p <- pgamma(q, shape=9, rate =2, lower.tail = TRUE)
p <- c(p[1],p[2:sample]-p[1:(sample-1)])

data.tss<-data.gen.BUWO(sample, k=0.5, a=5, m0=10, q=p)
plot.ts(cbind(p, data.tss$x, data.tss$y), ylab=c("Q","Build-up","Wash-off"))

Duffing map

Description

Generates a 2-dimensional time series using the Duffing map.

Usage

data.gen.Duffing(
  nobs = 5000,
  a = 2.75,
  b = 0.2,
  start = runif(n = 2, min = -0.5, max = 0.5),
  s,
  do.plot = TRUE
)

Arguments

nobs

Length of the generated time series. Default: 5000 samples.

a

The a parameter. Default: 2.75.

b

The b parameter. Default: 0.2.

start

A 2-dimensional vector indicating the starting values for the x and y Duffing coordinates. Default: If the starting point is not specified, it is generated randomly.

s

The level of noise, default 0.

do.plot

Logical value. If TRUE (default value), a plot of the generated Duffing system is shown.

Details

The Duffing map is defined as follows:

x_n = y_{n - 1}

y_n = -b \cdot x_{n - 1} + a \cdot y_{n - 1} - y_{n - 1}^3

The default selection for both a and b parameters (a=2.75 and b=0.2) is known to produce a deterministic chaotic time series.
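
As a rough illustration, the recursion above can be iterated directly in base R. This is a minimal sketch rather than the package's internal code: the function name duffing_sketch and the fixed starting point are hypothetical, and no noise is added.

```r
# Base-R sketch of the Duffing map recursion (hypothetical helper, not package code).
duffing_sketch <- function(nobs = 1000, a = 2.75, b = 0.2, start = c(0.1, 0.1)) {
  x <- numeric(nobs)
  y <- numeric(nobs)
  x[1] <- start[1]
  y[1] <- start[2]
  for (n in 2:nobs) {
    x[n] <- y[n - 1]                                   # x_n = y_{n-1}
    y[n] <- -b * x[n - 1] + a * y[n - 1] - y[n - 1]^3  # y_n = -b x_{n-1} + a y_{n-1} - y_{n-1}^3
  }
  list(x = x, y = y)
}

d <- duffing_sketch(nobs = 500)
```

Because x_n is simply the previous y value, the x series is the y series shifted by one step.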

Value

A list with two vectors named x and y containing the x-components and the y-components of the Duffing map, respectively.

Note

Some initial values may lead to an unstable system that will tend to infinity.

References

Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries

Examples

Duffing.map=data.gen.Duffing(nobs = 1000, do.plot=TRUE)

Generate predictor and response data: Hysteresis Loop

Description

Generate predictor and response data: Hysteresis Loop

Usage

data.gen.HL(
  nobs = 512,
  a = 0.8,
  b = 0.6,
  c = 0.2,
  m = 3,
  n = 5,
  fp = 25,
  fd,
  sd.x = 0.1,
  sd.y = 0.1
)

Arguments

nobs

The data length to be generated.

a

The a parameter. Default: 0.8.

b

The b parameter. Default: 0.6.

c

The c parameter. Default: 0.2.

m

Positive integer for the split line parameter. If m=1, split line is linear; If m is even, split line has a u shape; If m is odd and higher than 1, split line has a chair or classical shape.

n

Positive odd integer for the bulging parameter, indicates degree of outward curving (1=highest level of bulging).

fp

The frequency in the generated response. fp = 25 used in the WRR paper.

fd

A vector of frequencies for potential predictors. fd = c(3,5,10,15,25,30,55,70,95) used in the WRR paper.

sd.x

The noise level in the predictor.

sd.y

The noise level in the response.

Details

Hysteresis is a common nonlinear phenomenon in natural systems, and it can be simulated numerically with the following formulas:

x_{t} = a \cdot \cos(2 \pi f t)

y_{t} = b \cdot \cos^m(2 \pi f t) - c \cdot \sin^n(2 \pi f t)

The default selection for the system parameters (a = 0.8, b = 0.6, c = 0.2, m = 3, n = 5) is known to generate a classical hysteresis loop.
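
The two formulas can be evaluated directly in base R, as in the sketch below. The helper name hl_sketch and the time axis (nobs points spread evenly over one unit of time) are assumptions made for illustration; the package may sample time differently and also adds noise and candidate predictors.

```r
# Noise-free hysteresis loop from the formulas above (hypothetical helper).
hl_sketch <- function(nobs = 512, a = 0.8, b = 0.6, c = 0.2, m = 3, n = 5, f = 25) {
  t <- seq(0, 1, length.out = nobs)        # assumed time axis
  angle <- 2 * pi * f * t
  x <- a * cos(angle)                      # x_t = a cos(2 pi f t)
  y <- b * cos(angle)^m - c * sin(angle)^n # y_t = b cos^m(.) - c sin^n(.)
  list(t = t, x = x, y = y)
}

h <- hl_sketch()
# plot(h$x, h$y, type = "l") would draw the loop itself
```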

Value

A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.

References

LAPSHIN, R. V. 1995. Analytical model for the approximation of hysteresis loop and its application to the scanning tunneling microscope. Review of Scientific Instruments, 66, 4718-4730.

Examples

###synthetic example - Hysteresis loop
#frequency, sampled from a given range
fd <- c(3,5,10,15,25,30,55,70,95)

data.HL <- data.gen.HL(m=3,n=5,nobs=512,fp=25,fd=fd)
plot.ts(cbind(data.HL$x,data.HL$dp))

Henon map

Description

Generates a 2-dimensional time series using the Henon map.

Usage

data.gen.Henon(
  nobs = 5000,
  a = 1.4,
  b = 0.3,
  start = runif(n = 2, min = -0.5, max = 0.5),
  s,
  do.plot = TRUE
)

Arguments

nobs

Length of the generated time series. Default: 5000 samples.

a

The a parameter. Default: 1.4.

b

The b parameter. Default: 0.3.

start

A 2-dimensional vector indicating the starting values for the x and y Henon coordinates. Default: If the starting point is not specified, it is generated randomly.

s

The level of noise, default 0.

do.plot

Logical value. If TRUE (default value), a plot of the generated Henon system is shown.

Details

The Henon map is defined as follows:

x_n = 1 - a \cdot x_{n - 1}^2 + y_{n - 1}

y_n = b \cdot x_{n - 1}

The default selection for both a and b parameters (a=1.4 and b=0.3) is known to produce a deterministic chaotic time series.
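
To make the term "deterministic chaotic" concrete, the sketch below iterates the map twice from starting points that differ by 1e-8 and tracks how far the two x series drift apart. The helper henon_sketch is hypothetical and noise-free; it is not the package's implementation.

```r
# Sensitivity to initial conditions for the Henon map (hypothetical helper).
henon_sketch <- function(nobs, a = 1.4, b = 0.3, start = c(0, 0)) {
  x <- numeric(nobs)
  y <- numeric(nobs)
  x[1] <- start[1]
  y[1] <- start[2]
  for (n in 2:nobs) {
    x[n] <- 1 - a * x[n - 1]^2 + y[n - 1]  # x_n = 1 - a x_{n-1}^2 + y_{n-1}
    y[n] <- b * x[n - 1]                   # y_n = b x_{n-1}
  }
  list(x = x, y = y)
}

h1 <- henon_sketch(200, start = c(0, 0))
h2 <- henon_sketch(200, start = c(1e-8, 0))
gap <- abs(h1$x - h2$x)   # distance between the two x series at each step
```

Plotting gap on a log scale shows roughly exponential growth until the two trajectories are effectively unrelated.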

Value

A list with two vectors named x and y containing the x-components and the y-components of the Henon map, respectively.

Note

Some initial values may lead to an unstable system that will tend to infinity.

References

Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries

Examples

Henon.map=data.gen.Henon(nobs = 1000, do.plot=TRUE)

Linear Gaussian state-space model

Description

Generates data from a specific linear Gaussian state space model of the form x_{t} = \phi x_{t-1} + \sigma_v v_t and y_t = x_t + \sigma_e e_t, where v_t and e_t denote independent standard Gaussian random variables, i.e. N(0,1).
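
The two equations translate into a short simulation loop. The sketch below is an illustration under stated assumptions: the helper name lgss_sketch, the element names x and y, and the choice to treat start as the state before the first generated value are all hypothetical.

```r
# Simulate x_t = phi x_{t-1} + sigma_v v_t and y_t = x_t + sigma_e e_t (hypothetical helper).
lgss_sketch <- function(theta, nobs, start = 0) {
  phi <- theta[1]
  sigma_v <- theta[2]
  sigma_e <- theta[3]
  x <- numeric(nobs)
  prev <- start                           # assumed: start is the state at time 0
  for (t in seq_len(nobs)) {
    x[t] <- phi * prev + sigma_v * rnorm(1)
    prev <- x[t]
  }
  y <- x + sigma_e * rnorm(nobs)          # noisy observation of the state
  list(x = x, y = y)
}

set.seed(1)
sim <- lgss_sketch(theta = c(0.75, 1, 0.1), nobs = 500)
```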

Usage

data.gen.LGSS(
  theta,
  nobs,
  start = runif(n = 1, min = -1, max = 1),
  do.plot = TRUE
)

Arguments

theta

The parameters \theta=\{\phi,\sigma_v,\sigma_e\} of the LGSS model.

nobs

The data length to be generated.

start

A numeric value indicating the starting value for the time series. If the starting point is not specified, it is generated randomly.

do.plot

Logical value. If TRUE (default value), a plot of the generated LGSS system is shown.

Value

A list of two variables, state and response.

References

Dahlin, J. & Schon, T. B. 'Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models.' Journal of Statistical Software, Code Snippets, 88(2): 1–41, 2019.

Examples

data.LGSS <- data.gen.LGSS(theta=c(0.75,1.00,0.10), nobs=500, start=0)

Logistic map

Description

Generates a time series using the logistic map.

Usage

data.gen.Logistic(
  nobs = 5000,
  r = 4,
  start = runif(n = 1, min = 0, max = 1),
  s,
  do.plot = TRUE
)

Arguments

nobs

Length of the generated time series. Default: 5000 samples.

r

The r parameter. Default: 4

start

A numeric value indicating the starting value for the time series. If the starting point is not specified, it is generated randomly.

s

The level of noise, default 0.

do.plot

Logical value. If TRUE (default value), a plot of the generated Logistic system is shown.

Details

The logistic map is defined as follows:

x_n = r \cdot x_{n-1} \cdot (1 - x_{n-1})
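
A minimal base-R iteration of this recursion is sketched below. The helper logistic_sketch and its fixed starting value are hypothetical, and the noise term controlled by s is left out.

```r
# Iterate x_n = r x_{n-1} (1 - x_{n-1}) (hypothetical helper, no noise).
logistic_sketch <- function(nobs, r = 4, start = 0.3) {
  x <- numeric(nobs)
  x[1] <- start
  for (n in 2:nobs) {
    x[n] <- r * x[n - 1] * (1 - x[n - 1])
  }
  x
}

x <- logistic_sketch(1000)
```

For r = 4 and a start inside [0, 1], every iterate stays inside [0, 1].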

Value

A vector of time series.

References

Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries

Examples

Logistic.map=data.gen.Logistic(nobs = 1000, do.plot=TRUE)

Lorenz system

Description

Generates a 3-dimensional time series using the Lorenz equations.

Usage

data.gen.Lorenz(
  sigma = 10,
  beta = 8/3,
  rho = 28,
  start = c(-13, -14, 47),
  time = seq(0, 50, length.out = 1000),
  s
)

Arguments

sigma

The \sigma parameter. Default: 10.

beta

The \beta parameter. Default: 8/3.

rho

The \rho parameter. Default: 28.

start

A 3-dimensional numeric vector indicating the starting point for the time series. Default: c(-13, -14, 47).

time

The temporal interval at which the system will be generated. Default: time = seq(0, 50, length.out = 1000).

s

The level of noise, default 0.

Details

The Lorenz system is a system of ordinary differential equations defined as:

\dot{x} = \sigma(y-x)

\dot{y} = \rho x-y-xz

\dot{z} = -\beta z + xy

The default selection for the system parameters (\sigma=10, \rho=28, \beta=8/3) is known to produce a deterministic chaotic time series.
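
One way to turn these equations into a time series is to integrate them numerically. The sketch below uses the classical fourth-order Runge-Kutta scheme in base R. This is an assumption for illustration only: the manual does not say which integrator the package uses, and the helper names lorenz_rhs and rk4_path are hypothetical.

```r
# Right-hand side of the Lorenz equations.
lorenz_rhs <- function(state, sigma = 10, rho = 28, beta = 8 / 3) {
  x <- state[1]; y <- state[2]; z <- state[3]
  c(sigma * (y - x),        # dx/dt
    rho * x - y - x * z,    # dy/dt
    -beta * z + x * y)      # dz/dt
}

# Classical RK4 stepping over a user-supplied time grid (assumed integrator).
rk4_path <- function(rhs, start, time) {
  out <- matrix(NA_real_, nrow = length(time), ncol = length(start))
  out[1, ] <- start
  for (i in seq_along(time)[-1]) {
    h <- time[i] - time[i - 1]
    s <- out[i - 1, ]
    k1 <- rhs(s)
    k2 <- rhs(s + h / 2 * k1)
    k3 <- rhs(s + h / 2 * k2)
    k4 <- rhs(s + h * k3)
    out[i, ] <- s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
  }
  out
}

time <- seq(0, by = 0.01, length.out = 2000)
path <- rk4_path(lorenz_rhs, start = c(-13, -14, 47), time = time)
```

The columns of path hold the x, y and z components; a smaller step in time gives a more accurate trajectory at a higher cost.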

Value

A list with four vectors named time, x, y and z containing the time, the x-components, the y-components and the z-components of the Lorenz system, respectively.

Note

Some initial values may lead to an unstable system that will tend to infinity.

References

Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries

Examples

###Synthetic example - Lorenz
ts.l <- data.gen.Lorenz(sigma = 10, beta = 8/3, rho = 28, start = c(-13, -14, 47),
                        time = seq(0, by=0.05, length.out = 2000))

ts.plot(cbind(ts.l$x,ts.l$y,ts.l$z), col=c('black','red','blue'))

Rössler system

Description

Generates a 3-dimensional time series using the Rössler equations.

Usage

data.gen.Rossler(
  a = 0.2,
  b = 0.2,
  w = 5.7,
  start = c(-2, -10, 0.2),
  time = seq(0, by = 0.05, length.out = 1000),
  s
)

Arguments

a

The a parameter. Default: 0.2.

b

The b parameter. Default: 0.2.

w

The w parameter. Default: 5.7.

start

A 3-dimensional numeric vector indicating the starting point for the time series. Default: c(-2, -10, 0.2).

time

The temporal interval at which the system will be generated. Default: time = seq(0, by = 0.05, length.out = 1000).

s

The level of noise, default 0.

Details

The Rössler system is a system of ordinary differential equations defined as:

\dot{x} = -(y + z)

\dot{y} = x+a \cdot y

\dot{z} = b + z*(x-w)

The default selection for the system parameters (a = 0.2, b = 0.2, w = 5.7) is known to produce a deterministic chaotic time series. However, the values a = 0.1, b = 0.1, and w = 14 are more commonly used (w is often written as c in the literature). The Rössler equations are simpler than the Lorenz equations because only one nonlinear term appears (the product xz in the third equation).

With a = b = 0.1 fixed and w varied, the bifurcation diagram shows that low values of w give periodic orbits, which quickly become chaotic as w increases. This pattern repeats as w increases: sections of periodicity are interspersed with periods of chaos, and the trend is towards higher-period orbits. For example, the period-one orbit appears only for values of w around 4 and is never found again in the bifurcation diagram. The same phenomenon is seen with period three: period-three orbits can be found until w = 12, but not thereafter.
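
Setting the three derivatives above to zero gives two equilibrium points, x = (w ± sqrt(w^2 - 4ab))/2 with y = -x/a and z = x/a. The sketch below computes them and checks that the right-hand side vanishes there. The helper names rossler_rhs and rossler_equilibria are hypothetical and not part of the package.

```r
# Right-hand side of the Rössler equations as written above.
rossler_rhs <- function(state, a = 0.2, b = 0.2, w = 5.7) {
  x <- state[1]; y <- state[2]; z <- state[3]
  c(-(y + z),          # dx/dt
    x + a * y,         # dy/dt
    b + z * (x - w))   # dz/dt
}

# Equilibria: roots of x^2 - w x + a b = 0, then y = -x / a and z = x / a.
rossler_equilibria <- function(a = 0.2, b = 0.2, w = 5.7) {
  x <- (w + c(-1, 1) * sqrt(w^2 - 4 * a * b)) / 2
  cbind(x = x, y = -x / a, z = x / a)
}

eq <- rossler_equilibria()
residual <- t(apply(eq, 1, rossler_rhs))  # one row per equilibrium, near zero
```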

Value

A list with four vectors named time, x, y and z containing the time, the x-components, the y-components and the z-components of the Rössler system, respectively.

Note

Some initial values may lead to an unstable system that will tend to infinity.

References

Rössler, O. E. 1976. An equation for continuous chaos. Physics Letters A, 57, 397-398.

Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries

wikipedia https://en.wikipedia.org/wiki/R

Examples

###synthetic example - Rössler

ts.r <- data.gen.Rossler(a = 0.1, b = 0.1, w = 8.7, start = c(-2, -10, 0.2),
                         time = seq(0, by=0.05, length.out = 10000))

oldpar <- par(no.readonly = TRUE)
par(mfrow=c(1,1), ps=12, cex.lab=1.5)
plot.ts(cbind(ts.r$x,ts.r$y,ts.r$z), col=c('black','red','blue'))

par(mfrow=c(1,2), ps=12, cex.lab=1.5)
plot(ts.r$x,ts.r$y, xlab='x',ylab = 'y', type = 'l')
plot(ts.r$x,ts.r$z, xlab='x',ylab = 'z', type = 'l')
par(oldpar)

Generate predictor and response data: Sinusoidal model

Description

Generate predictor and response data: Sinusoidal model

Usage

data.gen.SW(nobs = 500, freq = 50, A = 2, phi = pi, mu = 0, sd = 1)

Arguments

nobs

The data length to be generated.

freq

The frequencies in the generated response. Default freq=50.

A

The amplitude of the sinusoidal series

phi

The phase of the sinusoidal series

mu

The mean of Gaussian noise in the variable.

sd

The standard deviation of Gaussian noise in the variable.

Value

A list of time and x.
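
A sinusoid with additive Gaussian noise can be sketched in base R as below. The functional form A cos(2 pi freq t + phi) with unit time steps follows the Shumway and Stoffer reference and the freq = delta/s convention in the example; whether the package uses exactly this form is an assumption, and sw_sketch is a hypothetical helper.

```r
# Sinusoid plus Gaussian noise (hypothetical helper; assumed functional form).
sw_sketch <- function(nobs = 500, freq = 1 / 24, A = 2, phi = pi, mu = 0, sd = 1) {
  t <- seq_len(nobs)                                   # assumed unit time steps
  signal <- A * cos(2 * pi * freq * t + phi)
  list(t = t, x = signal + rnorm(nobs, mean = mu, sd = sd))
}

# Noise-free series with a period of 24 samples.
clean <- sw_sketch(nobs = 240, freq = 1 / 24, A = 1, phi = 0, sd = 0)
```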

References

Shumway, R. H., & Stoffer, D. S. (2011). Characteristics of Time Series. In D. S. Stoffer (Ed.), Time series analysis and its applications (pp. 8-14). New York : Springer.

Examples

### Sinusoidal model
delta <- 1/12 # sampling rate, assuming monthly
period.max<- 2^5

N = 6*period.max/delta
scales<- 2^(0:5)[c(2,6)] #pick two scales
scales

### scale, period, and frequency
# freq=1/T; T=s/delta so freq = delta/s

tmp <- NULL
for(s in scales){
  tmp <- cbind(tmp, data.gen.SW(nobs=N, freq = delta/s, A = 1, phi = 0, mu=0, sd = 0)$x)
}
x <- rowSums(data.frame(tmp))
plot.ts(cbind(tmp,x), type = 'l', main=NA)

Generate an affine error model.

Description

Generate an affine error model.

Usage

data.gen.affine(nobs, a = 0, b = 1, ndim = 3, mu = 0, sd = 1)

Arguments

nobs

The data length to be generated.

a

intercept

b

slope

ndim

The number of potential predictors (default is 3).

mu

mean of error term

sd

standard deviation of error term

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

References

McColl, K. A., Vogelzang, J., Konings, A. G., Entekhabi, D., Piles, M., & Stoffelen, A. (2014). Extended triple collocation: Estimating errors and correlation coefficients with respect to an unknown target. Geophysical Research Letters, 41(17), 6229-6236. doi:10.1002/2014gl061322

Examples

# Affine error model from paper with 3 dummy variables
data.affine<-data.gen.affine(500)
plot.ts(cbind(data.affine$x,data.affine$dp))

Generate predictor and response data from AR1 model.

Description

Generate predictor and response data from AR1 model.

Usage

data.gen.ar1(nobs, ndim = 9)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

Examples

# AR1 model from paper with 9 dummy variables
data.ar1<-data.gen.ar1(500)
plot.ts(cbind(data.ar1$x,data.ar1$dp))

Generate predictor and response data from AR4 model.

Description

Generate predictor and response data from AR4 model.

Usage

data.gen.ar4(nobs, ndim = 9)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

Examples

# AR4 model from paper with total 9 dimensions
data.ar4<-data.gen.ar4(500)
plot.ts(cbind(data.ar4$x,data.ar4$dp))

Generate predictor and response data from AR9 model.

Description

Generate predictor and response data from AR9 model.

Usage

data.gen.ar9(nobs, ndim = 9)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

Examples

# AR9 model from paper with total 9 dimensions
data.ar9<-data.gen.ar9(500)
plot.ts(cbind(data.ar9$x,data.ar9$dp))

Gaussian Blobs

Description

Gaussian Blobs

Usage

data.gen.blobs(
  nobs = 100,
  features = 2,
  centers = 3,
  sd = 1,
  bbox = c(-10, 10),
  do.plot = TRUE
)

Arguments

nobs

The data length to be generated.

features

The number of features (columns) in the dataset.

centers

Either the number of centers, or a matrix of the chosen centers.

sd

The level of Gaussian noise, default 1.

bbox

The bounding box of the dataset.

do.plot

Logical value. If TRUE (default value), a plot of the generated Blobs is shown.

Details

This function generates a matrix of features creating multiclass datasets by allocating each class one or more normally-distributed clusters of points. It can control both centers and standard deviations of each cluster. For example, suppose we want to generate a dataset of weight and height (two features) of 500 people (data length), including three groups: babies, children, and adults. Centers are the average weight and height for each group, assuming both weight and height are normally distributed (i.e. follow a Gaussian distribution). The standard deviation (sd) is the sd of the Gaussian distribution, while the bounding box (bbox) is the range for each generated cluster center when only the number of centers is given.
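
The procedure described above can be sketched in a few lines of base R. This is an illustration under assumptions, not the package's code: blobs_sketch is a hypothetical helper, class labels are drawn uniformly at random, and one shared sd is used for every cluster.

```r
# Gaussian blobs around random or user-supplied centers (hypothetical helper).
blobs_sketch <- function(nobs = 100, features = 2, centers = 3, sd = 1,
                         bbox = c(-10, 10)) {
  if (length(centers) == 1) {
    # Only a count was given: draw that many centers uniformly inside bbox.
    centers <- matrix(runif(centers * features, bbox[1], bbox[2]), ncol = features)
  }
  classes <- sample.int(nrow(centers), nobs, replace = TRUE)
  noise <- matrix(rnorm(nobs * ncol(centers), sd = sd), ncol = ncol(centers))
  list(x = centers[classes, , drop = FALSE] + noise, classes = classes)
}

set.seed(1)
b <- blobs_sketch(nobs = 200)
```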

Value

A list of two variables, x and classes.

References

Amos Elberg (2018). clusteringdatasets: Datasets useful for testing clustering algorithms. R package version 0.1.1. https://github.com/elbamos/clusteringdatasets

Examples

Blobs=data.gen.blobs(nobs=1000, features=2, centers=3, sd=1, bbox=c(-10,10), do.plot=TRUE)

Generate a time series of Brownian motion.

Description

This function generates a time series of one-dimensional Brownian motion.

Usage

data.gen.bm(
  x0 = 0,
  w0 = 0,
  time = seq(0, by = 0.01, length.out = 101),
  do.plot = TRUE
)

Arguments

x0

the start value of x, with the default value 0

w0

the start value of w, with the default value 0

time

the temporal interval at which the system will be generated. Default seq(0,by=0.01,len=101).

do.plot

a logical value. If TRUE (default value), a plot of the generated system is shown.

Value

A ts object.

References

Yanping Chen, http://cos.name/wp-content/uploads/2008/12/stochastic-differential-equation-with-r.pdf

Examples

set.seed(123)
x <- data.gen.bm()

Circles

Description

Circles

Usage

data.gen.circles(
  n,
  r_vec = c(1, 2),
  start = runif(1, -1, 1),
  s,
  do.plot = TRUE
)

Arguments

n

The data length to be generated.

r_vec

A vector of circle radii.

start

The center of circles.

s

The level of Gaussian noise, default 0.

do.plot

Logical value. If TRUE (default value), a plot of the generated Circles is shown.

Value

A list of two variables, x and classes.

Examples

Circles=data.gen.circles(n = 1000, r_vec=c(1,2), start=runif(1,-1,1), s=0.1, do.plot=TRUE)

Generate a time series of fractional Brownian motion.

Description

This function generates a time series of one-dimensional fractional Brownian motion.

Usage

data.gen.fbm(
  hurst = 0.95,
  time = seq(0, by = 0.01, length.out = 1000),
  do.plot = TRUE
)

Arguments

hurst

the Hurst index, in the range [0,1], with the default value 0.95.

time

the temporal interval at which the system will be generated. Default seq(0,by=0.01,len=1000).

do.plot

a logical value. If TRUE (default value), a plot of the generated system is shown.

Value

A ts object.

References

Zdravko Botev (2020). Fractional Brownian motion generator (https://www.mathworks.com/matlabcentral/fileexchange/38935-fractional-brownian-motion-generator), MATLAB Central File Exchange. Retrieved August 17, 2020.

Kroese, D. P., & Botev, Z. I. (2015). Spatial Process Simulation. In Stochastic Geometry, Spatial Statistics and Random Fields (pp. 369-404). Springer International Publishing. DOI: 10.1007/978-3-319-10064-7_12

Examples

set.seed(123)
x <- data.gen.fbm()

Friedman with independent uniform variates

Description

Friedman with independent uniform variates

Usage

data.gen.fm1(nobs, ndim = 9, noise = 1)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

noise

The noise level in the time series.

Value

A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.
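
For orientation, the widely used Friedman benchmark function is y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 plus noise, so only the first five predictors matter. The sketch below assumes data.gen.fm1 follows this standard form; the manual does not state the formula, and the helper friedman_sketch and its element name true are hypothetical.

```r
# Standard Friedman benchmark with independent uniform predictors
# (hypothetical helper; assumed to mirror what the package generates).
friedman_sketch <- function(nobs, ndim = 9, noise = 1) {
  dp <- matrix(runif(nobs * ndim), ncol = ndim)
  x <- 10 * sin(pi * dp[, 1] * dp[, 2]) +
    20 * (dp[, 3] - 0.5)^2 +
    10 * dp[, 4] + 5 * dp[, 5] +
    rnorm(nobs, sd = noise)
  list(x = x, dp = dp, true = 1:5)   # columns 6..ndim are pure distractors
}

set.seed(1)
fm <- friedman_sketch(nobs = 100, noise = 0)
```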

Examples

###synthetic example - Friedman
#Friedman with independent uniform variates
data.fm1 <- data.gen.fm1(nobs=1000, ndim = 9, noise = 0)

#Friedman with correlated uniform variates
data.fm2 <- data.gen.fm2(nobs=1000, ndim = 9, r = 0.6, noise = 0)

plot.ts(cbind(data.fm1$x,data.fm2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Friedman with \n independent uniform variates',
        'Friedman with \n correlated uniform variates'))

Friedman with correlated uniform variates

Description

Friedman with correlated uniform variates

Usage

data.gen.fm2(nobs, ndim = 9, r = 0.6, noise = 0)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

r

Target Spearman correlation.

noise

The noise level in the time series.

Value

A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.

Examples

###synthetic example - Friedman
#Friedman with independent uniform variates
data.fm1 <- data.gen.fm1(nobs=1000, ndim = 9, noise = 0)

#Friedman with correlated uniform variates
data.fm2 <- data.gen.fm2(nobs=1000, ndim = 9, r = 0.6, noise = 0)

plot.ts(cbind(data.fm1$x,data.fm2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Friedman with \n independent uniform variates',
        'Friedman with \n correlated uniform variates'))

Generate a time series of geometric Brownian motion.

Description

This function generates a time series of one-dimensional geometric Brownian motion.

Usage

data.gen.gbm(
  x0 = 10,
  w0 = 0,
  mu = 1,
  sigma = 0.5,
  time = seq(0, by = 0.01, length.out = 101),
  do.plot = TRUE
)

Arguments

x0

the start value of x, with the default value 10

w0

the start value of w, with the default value 0

mu

the interest/drifting rate, with the default value 1.

sigma

the diffusion coefficient, with the default value 0.5.

time

the temporal interval at which the system will be generated. Default seq(0,by=0.01,len=101).

do.plot

a logical value. If TRUE (default value), a plot of the generated system is shown.

Value

A ts object.
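
Geometric Brownian motion has the closed-form solution x(t) = x0 exp((mu - sigma^2/2) t + sigma W(t)), where W is a standard Brownian motion. The sketch below builds W from independent Gaussian increments and applies that formula. Whether the package uses the closed form or a step-by-step scheme is not stated, so this is an assumption; gbm_sketch is a hypothetical helper that returns a plain numeric vector, and it takes W(0) = 0.

```r
# Geometric Brownian motion via its closed-form solution (hypothetical helper).
gbm_sketch <- function(x0 = 10, mu = 1, sigma = 0.5,
                       time = seq(0, by = 0.01, length.out = 101)) {
  dw <- sqrt(diff(time)) * rnorm(length(time) - 1)  # Brownian increments
  w <- c(0, cumsum(dw))                             # W(t), starting at 0
  elapsed <- time - time[1]
  x0 * exp((mu - sigma^2 / 2) * elapsed + sigma * w)
}

set.seed(123)
path <- gbm_sketch()
```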

References

Yanping Chen, http://cos.name/wp-content/uploads/2008/12/stochastic-differential-equation-with-r.pdf

Examples

set.seed(123)
x <- data.gen.gbm()

Nonlinear system with independent/correlated covariates

Description

Nonlinear system with independent/correlated covariates

Usage

data.gen.nl1(nobs, ndim = 15, r = 0.6, noise = 1)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 15).

r

Target Spearman correlation among covariates.

noise

The noise level in the time series.

Value

A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.

Examples

###synthetic example - Nonlinear system
#Nonlinear system with independent/correlated covariates
data.nl1 <- data.gen.nl1(nobs=1000)

#Nonlinear system with exogenous covariates
data.nl2 <- data.gen.nl2(nobs=1000)

plot.ts(cbind(data.nl1$x,data.nl2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Nonlinear system with \n independent/correlated covariates',
        'Nonlinear system with \n exogenous covariates'))

Nonlinear system with exogenous covariates

Description

Nonlinear system with exogenous covariates

Usage

data.gen.nl2(nobs, ndim = 7, noise = 1)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 7).

noise

The noise level in the time series.

Value

A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.

References

Sharma, A., & Mehrotra, R. (2014). An information theoretic alternative to model a natural system using observational information alone. Water Resources Research, 50(1), 650-660.

Examples

###synthetic example - Nonlinear system
#Nonlinear system with independent/correlated covariates
data.nl1 <- data.gen.nl1(nobs=1000)

#Nonlinear system with exogenous covariates
data.nl2 <- data.gen.nl2(nobs=1000)

plot.ts(cbind(data.nl1$x,data.nl2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Nonlinear system with \n independent/correlated covariates',
        'Nonlinear system with \n exogenous covariates'))

Generate correlated normal variates

Description

Generate correlated normal variates

Usage

data.gen.norm(n, mu = rep(0, 2), sd = rep(1, 2), r = 0.6, sigma)

Arguments

n

The data length to be generated.

mu

A vector giving the means of the variables.

sd

A vector giving the standard deviation of the variables.

r

The target Pearson correlation, default is 0.6.

sigma

A positive-definite symmetric matrix specifying the covariance matrix of the variables.

Value

A matrix of correlated normal variates
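
One common way to obtain correlated normal variates is to multiply independent standard normals by a Cholesky factor of the target covariance matrix. The sketch below illustrates that technique in base R. It is not necessarily how the package does it (the package imports MASS, so it may rely on MASS::mvrnorm instead), and norm_sketch is a hypothetical helper.

```r
# Correlated normals via a Cholesky factor (hypothetical helper).
norm_sketch <- function(n, mu = rep(0, 2), sd = rep(1, 2), r = 0.6) {
  k <- length(mu)
  corr <- matrix(r, k, k)
  diag(corr) <- 1
  sigma <- diag(sd, nrow = k) %*% corr %*% diag(sd, nrow = k)  # covariance matrix
  z <- matrix(rnorm(n * k), ncol = k)                          # independent N(0,1)
  sweep(z %*% chol(sigma), 2, mu, "+")                         # impose covariance, then shift
}

set.seed(1)
xy <- norm_sketch(20000, mu = c(1, -1), sd = c(2, 0.5), r = 0.6)
```

The sample correlation cor(xy) should sit close to the requested r, with agreement improving as n grows.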

Generate Random walk time series.

Description

Generate Random walk time series.

Usage

data.gen.rw(nobs, drift = 0.2, sd = 1)

Arguments

nobs

the data length to be generated

drift

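the drift added at each time step, with the default value 0.2

sd

the standard deviation of the white noise, with the default value 1

Value

A list of 2 elements: a random walk (x) and a random walk with drift (xd).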
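
Following the Shumway and Stoffer reference, a random walk is the cumulative sum of white noise, and adding a constant drift to every noise term gives the random walk with drift. The base-R sketch below shows both; it is an illustration of that construction, not the package's code.

```r
# Random walk with and without drift from the same white-noise draws.
set.seed(154)
nobs <- 200
drift <- 0.2
w <- rnorm(nobs, sd = 1)   # white noise
x <- cumsum(w)             # random walk: x_t = x_{t-1} + w_t
xd <- cumsum(w + drift)    # with drift:  x_t = drift + x_{t-1} + w_t
```

The two series differ by exactly drift times t, which is the dashed line with slope 0.2 drawn in the example.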

References

Shumway, R. H. and D. S. Stoffer (2011). Time series regression and exploratory data analysis. Time series analysis and its applications, Springer: 47-82.

Examples

set.seed(154)
data.rw <- data.gen.rw(200)
plot.ts(data.rw$xd, ylim=c(-5,55), main='random walk', ylab='')
lines(data.rw$x, col=4); abline(h=0, col=4, lty=2); abline(a=0, b=.2, lty=2)

Spirals

Description

Spirals

Usage

data.gen.spirals(n, cycles = 1, s = 0, do.plot = TRUE)

Arguments

n

The data length to be generated.

cycles

The number of cycles of spirals.

s

The level of Gaussian noise, default 0.

do.plot

Logical value. If TRUE (default value), a plot of the generated Spirals is shown.

Value

A list of two variables, x and classes.

References

Friedrich Leisch & Evgenia Dimitriadou (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.1-1.

Examples

Spirals=data.gen.spirals(n = 2000, cycles=2, s=0.01, do.plot=TRUE)

Generate a two-regime threshold autoregressive (TAR) process.

Description

Generate a two-regime threshold autoregressive (TAR) process.

Usage

data.gen.tar(
  nobs,
  ndim = 9,
  phi1 = c(0.6, -0.1),
  phi2 = c(-1.1, 0),
  theta = 0,
  d = 2,
  p = 2,
  noise = 0.1
)

Arguments

nobs

the data length to be generated

ndim

The number of potential predictors (default is 9)

phi1

the coefficient vector of the lower-regime model

phi2

the coefficient vector of the upper-regime model

theta

threshold

d

delay

p

maximum autoregressive order

noise

the white noise in the data

Details

The two-regime Threshold Autoregressive (TAR) model is given by the following formula:

Y_t = \phi_{1,0}+\phi_{1,1} Y_{t-1} +\ldots+ \phi_{1,p} Y_{t-p}+\sigma_1 e_t, \mbox{ if } Y_{t-d}\le r

Y_t = \phi_{2,0}+\phi_{2,1} Y_{t-1} +\ldots+ \phi_{2,p} Y_{t-p}+\sigma_2 e_t, \mbox{ if } Y_{t-d} > r.

where r is the threshold (the theta argument) and d is the delay.
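
The regime switch can be sketched in base R as follows. Several simplifications are assumptions here and may differ from the package: both intercepts are zero, both regimes share one noise level, phi1 and phi2 hold only the lag coefficients, and a burn-in period is discarded. The helper tar_sketch is hypothetical.

```r
# Two-regime TAR simulation (hypothetical helper; simplified as noted above).
tar_sketch <- function(nobs, phi1 = c(0.6, -0.1), phi2 = c(-1.1, 0),
                       r = 0, d = 2, noise = 0.1, burn = 100) {
  p <- length(phi1)
  first <- max(p, d) + 1                 # first index with enough history
  n <- nobs + burn
  y <- numeric(n)
  y[seq_len(first - 1)] <- rnorm(first - 1, sd = noise)
  for (t in first:n) {
    # Lower regime if the delayed value is at or below the threshold.
    phi <- if (y[t - d] <= r) phi1 else phi2
    y[t] <- sum(phi * y[t - seq_len(p)]) + noise * rnorm(1)
  }
  y[(burn + 1):n]
}

set.seed(1)
y <- tar_sketch(300)
```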

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

References

Cryer, J. D., & Chan, K.-S. (2008). Time Series Analysis: With Applications in R (2nd ed.). Springer Science+Business Media, LLC.

Examples

# TAR model with total 9 dimensions
data.tar<-data.gen.tar(500)
plot.ts(cbind(data.tar$x,data.tar$dp))

Generate predictor and response data from TAR1 model.

Description

Generate predictor and response data from TAR1 model.

Usage

data.gen.tar1(nobs, ndim = 9, noise = 0.1)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

noise

The white noise in the data

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

References

Sharma, A. (2000). Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1 - A strategy for system predictor identification. Journal of Hydrology, 239(1-4), 232-239.

Examples

# TAR1 model from paper with total 9 dimensions
data.tar1<-data.gen.tar1(500)
plot.ts(cbind(data.tar1$x,data.tar1$dp))

Generate predictor and response data from TAR2 model.

Description

Generate predictor and response data from TAR2 model.

Usage

data.gen.tar2(nobs, ndim = 9, noise = 0.1)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

noise

The white noise in the data

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

Examples

# TAR2 model from paper with total 9 dimensions
data.tar2<-data.gen.tar2(500)
plot.ts(cbind(data.tar2$x,data.tar2$dp))

Generate correlated uniform variates

Description

Generate correlated uniform variates

Usage

data.gen.unif(n, ndim = 9, r = 0.6, sigma, method = c("pearson", "spearman"))

Arguments

n

The data length to be generated.

ndim

The number of potential predictors (default is 9).

r

The target correlation, default is 0.6.

sigma

A symmetric matrix of Pearson correlations; its dimension should equal ndim.

method

The target correlation type, including Pearson and Spearman correlation.

Value

A matrix of correlated uniform variates
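
A standard way to build correlated uniform variates is to draw correlated normals and map them through the normal distribution function. For that construction the correlation of the uniforms is (6/pi) asin(rn/2), where rn is the correlation of the normals, so a target r is reached by setting rn = 2 sin(pi r / 6). The sketch below illustrates this; treating it as the package's method is an assumption, and unif_sketch is a hypothetical helper with a single common correlation.

```r
# Correlated uniforms via correlated normals and pnorm (hypothetical helper).
unif_sketch <- function(n, ndim = 3, r = 0.6) {
  rn <- 2 * sin(pi * r / 6)                 # normal correlation that yields target r
  corr <- matrix(rn, ndim, ndim)
  diag(corr) <- 1
  z <- matrix(rnorm(n * ndim), ncol = ndim) %*% chol(corr)
  pnorm(z)                                  # each column is uniform on (0, 1)
}

set.seed(1)
u <- unif_sketch(20000)
```

Because the margins are uniform, the Pearson correlation of the columns of u coincides in the population with their Spearman correlation.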

References

Schumann, E. (2009). Generating correlated uniform variates. COMISEF. http://comisef.wikidot.com/tutorial:correlateduniformvariates