Encoding: | UTF-8 |
Type: | Package |
Title: | Sequential Input Selection Algorithm |
Version: | 0.49 |
Date: | 2024-10-25 |
Author: | Mikko Korpela [aut, cre] |
Maintainer: | Mikko Korpela <mvkorpel@iki.fi> |
Copyright: | Aalto University |
Depends: | R (≥ 4.3.0) |
Imports: | graphics, grDevices, grid, methods, stats, utils, boot, lattice, mgcv, digest, R.matlab, R.methodsS3 |
Suggests: | graph, Rgraphviz, testthat (≥ 0.8) |
Description: | Implements the SISAL algorithm by Tikka and Hollmén. It is a sequential backward selection algorithm which uses a linear model in a cross-validation setting. Starting from the full model, one variable at a time is removed based on the regression coefficients. From this set of models, a parsimonious (sparse) model is found by choosing the model with the smallest number of variables among those models where the validation error is smaller than a threshold. Also implements extensions which explore larger parts of the search space and/or use ridge regression instead of ordinary least squares. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://github.com/mvkorpel/sisal |
BugReports: | https://github.com/mvkorpel/sisal/issues |
LazyData: | yes |
NeedsCompilation: | no |
Packaged: | 2024-10-25 23:23:56 UTC; mikko |
Repository: | CRAN |
Date/Publication: | 2024-10-26 02:20:02 UTC |
sisal: Sequential input selection algorithm
Description
Implements the SISAL algorithm by Tikka and Hollmén. It is a sequential backward selection algorithm which uses a linear model in a cross-validation setting. Starting from the full model, one variable at a time is removed based on the regression coefficients. From this set of models, a parsimonious (sparse) model is found by choosing the model with the smallest number of variables among those models where the validation error is smaller than a threshold. Also implements extensions which explore larger parts of the search space and/or use ridge regression instead of ordinary least squares.
Details
Package: | sisal |
Depends: | R (>= 3.1.2) |
Imports: | graphics, grDevices, grid, methods, stats, utils, boot, lattice, mgcv, digest, R.matlab, R.methodsS3 |
Suggests: | graph, Rgraphviz, testthat (>= 0.8) |
License: | GPL (>= 2) |
LazyData: | yes |
Index:
bootMSE | Bootstrap Estimate of Mean Squared Error Using SISAL Object |
dynTextGrob | Create Text with Changing Size |
laggedData | Create Input Matrix and Output Vector for Time Series Prediction |
plot.sisal | Plotting Sequential Input Selection Results |
plotSelected.sisal | Plotting Sets of Inputs Produced by Sequential Input Selection |
print.sisal | Printing Sequential Input Selection Objects |
sisal | Sequential Input Selection Algorithm (SISAL) |
sisal-package | sisal: Sequential input selection algorithm in R |
sisalData | Download External Datasets for SISAL |
sisalTable | Draw Table with Equally Sized Cells |
summary.sisal | Summarizing Sequential Input Selection Results |
testSisal | Testing the Sequential Input Selection Algorithm |
toy.learn | Toy Data for SISAL (Learning Set) |
toy.test | Toy Data for SISAL (Test Set) |
tsToy.learn | Toy Time Series Data for SISAL (Learning Set) |
tsToy.test | Toy Time Series Data for SISAL (Test Set) |
Run input selection on your own data with sisal. For demo purposes, use testSisal to run the algorithm on example data sets. After input selection, compute bootstrap MSE in test data with bootMSE.
Author(s)
Mikko Korpela mvkorpel@iki.fi
References
Tikka, J. and Hollmén, J. (2008) Sequential input selection algorithm for long-term prediction of time series. Neurocomputing, 71(13–15):2604–2615.
Bootstrap Estimate of Mean Squared Error Using SISAL Object
Description
Using a linear model produced by sisal, computes a bootstrap estimate of MSE in test data.
Usage
bootMSE(object, dataset = NULL, R = 1000,
inputs = c("L.f", "L.v", "full"),
method = c("OLS", "magic"), standardize = "inherit",
stepsAhead = NULL, noiseSd = NULL, verbose = 1, ...)
Arguments
object |
an object of class |
dataset |
dataset to work on. A |
R |
the number of bootstrap replicates. Usually a single
positive integral number. See |
inputs |
a |
method |
a |
standardize |
|
stepsAhead |
If doing time series prediction, this indicates how
many steps ahead to predict. A non-negative integral value or
|
noiseSd |
standard deviation of the noise to be added to the
dependent variable when |
verbose |
verbosity level. A single |
... |
arguments passed to |
Details
Four types of values are supported in dataset.
- Use one of "laser", "poland", "toy" and "tsToy" to work on the test part of a dataset included in or specifically supported by the package. The first two options will load their respective datasets over a network connection. See sisalData, toy.test and tsToy.test.
- Use a numeric vector to work with time series data. The use of the "laser" and "poland" datasets is recognized. Loading the datasets in advance reduces unnecessary network traffic when doing multiple repeats with the same dataset.
- Use a list with a numeric matrix "X" and a numeric vector "y" to supply inputs "X" and output "y". This is appropriate when using your own data for something other than time series prediction based on past values of the same time series.
- Use NULL (the default value) for automatic detection of the dataset. This works if object was created with testSisal.
When using time series data, the names of the inputs used in object must match the regular expression "lag\.\d+", i.e. "lag" followed by a dot and an integer without spaces or any other formatting. This is automatically taken care of by laggedData and testSisal.
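The naming rule for time series inputs can be checked in advance with base R. The following is a minimal sketch: the vector inputNames is a hypothetical example, and anchoring the pattern with ^ and $ is an assumption about how strictly the names are matched.

```r
# Hypothetical input names; the last one breaks the required pattern
inputNames <- c("lag.0", "lag.3", "lag.12", "lag 4")

# Anchored version of the documented pattern "lag\.\d+"
ok <- grepl("^lag\\.\\d+$", inputNames, perl = TRUE)

# Numeric lags extracted from the names that follow the rule
lags <- as.integer(sub("^lag\\.", "", inputNames[ok]))
```

Names produced by laggedData or testSisal should not need this check, since the text above states that they follow the pattern automatically.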
When using other than time series data, the user-supplied dataset must contain all the input variables used in the selected linear model (i.e. full model or a subset of inputs) of object.
Value
An object of class "boot", as returned by boot::boot.
Author(s)
Mikko Korpela
See Also
Examples
foo <- testSisal(dataset="toy", Mtimes=10)
bootMSE(foo)
Create Text with Changing Size
Description
This function creates a text object. When drawn, its size changes automatically according to the space available.
Usage
dynTextGrob(label, x = 0.5, y = 0.5, width = 1, height = 1,
default.units = "npc", just = c(0.5, 0.5),
hjust = NULL, vjust = NULL, rot = 0, rotJust = TRUE,
rotHjust = NULL, rotVjust = NULL, resize = TRUE,
sizingWidth = NULL, sizingHeight = NULL,
adjustJust = TRUE, takeMeasurements = FALSE,
name = NULL, gp = gpar(), vp = NULL)
Arguments
label |
a |
x |
a |
y |
a |
width |
the space available for the labels in the width direction of the viewport. Used for computing the fontsize. |
height |
the space available for the labels in the height direction of the viewport. Used for computing the fontsize. |
default.units |
default unit to use when dimensions or locations
are unitless numbers. See |
just |
a |
hjust |
a |
vjust |
a |
rot |
a |
rotJust |
a |
rotHjust |
a |
rotVjust |
a |
resize |
a |
sizingWidth |
If |
sizingHeight |
See |
adjustJust |
A |
takeMeasurements |
A |
name |
a |
gp |
graphical parameters. See |
vp |
a |
Details
The number of labels created is the maximum of the lengths of x and y. Variables are recycled to that length if necessary.
All labels of one "dynText" grob have the same fontsize.
Value
If takeMeasurements is FALSE (the default), returns a grob of class "dynText". It can be drawn with grid.draw.
If takeMeasurements is TRUE, returns a list containing measurements of the labels.
Author(s)
Mikko Korpela
See Also
See function textGrob in package grid.
Examples
library(grid)
grid.newpage()
grid.draw(dynTextGrob("Hello", vjust = 0, y = 0))
grid.draw(dynTextGrob(list(expression(y==x^2),
"Hello,\ntry resizing me!"),
x = rep(1, 2), y = 1, rot = -45,
hjust = 1, vjust = 1,
rotHjust = c(0, 1), rotVjust = 1))
Create Input Matrix and Output Vector for Time Series Prediction
Description
Given a time series vector, produces the input matrix and output vector for a time series prediction task. The other parameters are the lags to include and the number of steps ahead to predict.
Usage
laggedData(x, lags = 0:9, stepsAhead = 1)
Arguments
x |
an |
lags |
which lags to use for prediction. A |
stepsAhead |
how many steps ahead to predict. A non-negative
integral value ( |
Details
The default parameters correspond to predicting one step ahead (position t+1) using the ten most recent values (positions t ... t-9).
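The layout described above can be illustrated with base R alone. This is a sketch under assumptions, not the implementation of laggedData: the row order and the column names "lag.k" are chosen here for illustration, following the naming rule that bootMSE expects for time series inputs.

```r
# Sketch only: build lagged inputs with base R, not with laggedData itself
x <- 1:20
lags <- 0:9
stepsAhead <- 1

maxLag <- max(lags)
# Positions t for which every lagged value and the target value exist
t <- seq(from = maxLag + 1, to = length(x) - stepsAhead)

# One column per lag: column "lag.k" holds x[t - k]
X <- sapply(lags, function(k) x[t - k])
colnames(X) <- paste("lag", lags, sep = ".")

# Output: the value stepsAhead positions after t
y <- x[t + stepsAhead]
```

With these settings each row of X holds positions t ... t-9 and the matching element of y holds position t+1.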
Value
A list with two components:
X |
The |
y |
The output |
Author(s)
Mikko Korpela
Examples
laggedData(1:20)
Plotting Sequential Input Selection Results
Description
A plot method for class "sisal". Supports 3 plot types: error as a function of the number of variables, search graph, and color key of the search graph.
Usage
## S3 method for class 'sisal'
plot(x, which = 1, standardize = "inherit", ...,
plotArgs = list(list(), list(mai = rep(0.1, 4))),
xlim = c(x[["d"]], 0), ylim = NULL, ask = TRUE,
dev.set = !ask, draw.node.labels = TRUE,
draw.edge.labels = TRUE, draw.selected.labels = TRUE,
rankdir = c("TB", "LR", "BT", "RL"),
fillcolor.normal = "deepskyblue",
fillcolor.pruned = "deeppink",
fillcolor.selected = "chartreuse",
fillcolor.levelbest = "gold",
fillcolor.small = "moccasin", fillcolor.large = "black",
fillcolor.NA = "white",
bordercolor.normal = "black",
bordercolor.special.levelbest = fillcolor.levelbest,
bordercolor.special.selected = fillcolor.selected,
color.by.error = FALSE,
ramp.space = c("Lab", "rgb"), ramp.size = 128,
error.limits = c(NA_real_, NA_real_),
category.labels =
c(normal = gettext("Other", domain="R-sisal"),
pruned = gettext("Pruned", domain="R-sisal"),
levelbest = gettext("Best\nin class", domain="R-sisal"),
selected = gettext("Selected", domain="R-sisal"),
special.levelbest = gettext("Best\n(no branching)",
domain="R-sisal"),
special.selected = gettext("Selected\n(no branching)",
domain="R-sisal"),
shape.normal=gettext("Other", domain="R-sisal"),
shape.highlighted=gettext("Highlighted", domain="R-sisal")),
integrate.colorkey = TRUE, colorkey.gap = 0.1,
colorkey.space = c("right", "bottom", "left", "top"),
colorkey.title.gp = gpar(fontface = "bold"),
nodesep = 0.25, ranksep = 0.5,
graph.attributes = character(0),
node.attributes = character(0),
edge.attributes = character(0))
Arguments
x |
an object of class |
which |
which plots to draw. A
The default is to draw plot number 1. For drawing plot number 2,
Bioconductor packages
Some other arguments of this method only apply to specific plots. |
standardize |
|
... |
arguments passed to |
plotArgs |
arguments passed to graphical functions. A
|
xlim |
the x limits |
ylim |
the y limits |
ask |
a |
dev.set |
a |
draw.node.labels |
a |
draw.edge.labels |
a |
draw.selected.labels |
a |
rankdir |
the drawing direction of plot number 2 (search graph).
A |
fillcolor.normal |
fill color for normal nodes in plot number 2. |
fillcolor.pruned |
fill color for pruned (unevaluated) nodes in
plot 2. If |
fillcolor.selected |
fill color for nodes representing the L.v
and L.f input variable sets of |
fillcolor.levelbest |
fill color for nodes with the smallest
validation error using a given number of input variables in plot 2.
If |
fillcolor.small |
if |
fillcolor.large |
if |
fillcolor.NA |
if |
bordercolor.normal |
border color for normal nodes in plot 2. |
bordercolor.special.levelbest |
border color for special nodes
in plot 2. If branching ( |
bordercolor.special.selected |
border color for another kind of
special nodes in plot 2. The “no branching” L.v or L.f node,
if different from the corresponding node in the solution where
branching is allowed, is marked with this border color. If
|
color.by.error |
a |
ramp.space |
color space to be used in plots number 2 and 3 if
|
ramp.size |
the number of colors to be used in the color
gradient of plot number 3 if |
error.limits |
a |
category.labels |
text labels to be used in plot number 3 if
|
integrate.colorkey |
a |
colorkey.gap |
a |
colorkey.space |
location of the color and shape key (plot 3)
relative to the graph (plot 2). One of |
colorkey.title.gp |
graphical parameters for the titles in plot
3. See |
nodesep |
a Graphviz attribute giving the minimum space in
inches between adjacent nodes representing the same number of input
variables. This |
ranksep |
a Graphviz attribute giving the minimum space in
inches between adjacent rows or columns of nodes, where a row or
column consists of nodes representing the same number of input
variables. This |
graph.attributes |
a named |
node.attributes |
a named |
edge.attributes |
a named |
Details
In argument plotArgs, plotArgs[[1]] is passed to matplot, plotArgs[[2]] to the plot method for class "Ragraph", and plotArgs[[3]] to draw.colorkey$key.
For possible color values, see col2rgb.
Value
When 2 %in% which, the function invisibly returns a graph of class "graphNEL" representing the search graph of a run of sisal. Otherwise NULL.
Author(s)
Mikko Korpela
References
For information about graph, node and edge attributes for plot number 2, see the Graphviz web site: https://www.graphviz.org/.
See Also
Examples
library(graphics)
foo <- testSisal(dataset="toy", Mtimes=10)
## Plotting the search graph requires "Rgraphviz" and "graph"
if (requireNamespace("Rgraphviz", quietly=TRUE) &&
requireNamespace("graph", quietly=TRUE)) {
plot(foo, which=2)
}
## Default output is a mean squared error plot
plot(foo)
Plotting Sets of Inputs Produced by Sequential Input Selection
Description
Draws a table depicting the inputs selected by a number of sisal runs, one row for each run.
Usage
## S3 method for class 'sisal'
plotSelected(x, useAllNames = TRUE,
pickIntPart = FALSE, intTransform = function(x) x,
formatCArgs = list(), xLabels = 1, yLabels = NULL,
L.f.color = "black", L.v.color = "grey50",
other.color = "white", naFill = other.color,
naStripes = L.v.color, selectedLabels = TRUE,
otherLabels = FALSE,
labelPar = gpar(fontface = 1, fontsize = 20, cex = 0.35),
nestedPar = gpar(fontface = 3),
ranking = c("pairwise", "nested"), tableArgs = list(),
...)
## S3 method for class 'list'
plotSelected(x, ...)
Arguments
x |
an object of class |
useAllNames |
a |
pickIntPart |
a |
intTransform |
a |
formatCArgs |
a named |
xLabels |
a |
yLabels |
a |
L.f.color |
fill color for table cells representing an input variable in the L.f set. |
L.v.color |
fill color for table cells representing an input variable in the L.v set. |
other.color |
fill color for table cells representing an input variable outside both L.f and L.v. |
naFill |
background color for table cells representing a missing input variable. |
naStripes |
stripe color for table cells representing a missing input variable. |
selectedLabels |
a |
otherLabels |
a |
labelPar |
graphical parameters for labels of table cells. |
nestedPar |
graphical parameters for labels on rows that
represent input selection runs where the best nodes of each size are
all nested. See ‘Details’. Only used if
|
ranking |
which input ranking method(s) to use. A
|
tableArgs |
a named |
... |
In the |
Details
Currently the "sisal" and "list" methods are the only methods for the generic function plotSelected defined by the sisal package.
Mathematical annotation can be used in text. See plotmath. If the same input is in both the L.f and the L.v sets, L.f.color and L.v.color are mixed in alternating stripes. See col2rgb for a description of possible color values.
The importance rank of input variables is determined using one or both of the following two methods (see ranking):
- "nested"
This method requires that all the nodes with the smallest validation error among the nodes with the same number of input variables are nested. Let's imagine a path through the incrementally smaller best nodes (not necessarily a path in the search graph) where the edges are labeled with the ID of the input removed in order to create the smaller model. In this ranking method, the remaining input variable gets rank 1. Traversing the path in the reverse direction and printing the edge labels produces the rest of the input variables from smaller rank to larger. If hbranches = 1 in sisal, the models are always nested and the method agrees with "pairwise".
- "pairwise"
This is Copeland's pairwise aggregation method. It can be used in all cases, unlike "nested". The score of an input variable is the number of pairwise victories minus the number of pairwise defeats when compared with other inputs. The inputs are ranked by their score. The method may result in ties. Tied nodes are ranked according to ties.method = "min" in rank.
The pairwise comparisons are performed in the following way: In sisal, at each stage of the search, input variables are ordered and inputs are removed starting from one or more (when hbranches > 1) of the worst ones according to that order. A record, let's say C[A, B], is kept of each pair of inputs (A, B) in order to keep track of how many times A was better than B. Let L be the set of inputs to remove at the current stage of the search in one of the branches and M the set of remaining inputs. Then, C[A, B] is incremented by one for all A in M and B in L, but also for all A in L and B in L such that A is better than B according to the order used for picking the inputs to remove. A gets a pairwise victory over B if C[A, B] > C[B, A].
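The scoring step of the "pairwise" method can be sketched in a few lines of base R. The matrix C below is hypothetical example data, not output from sisal, and the code is an illustration of the description above rather than the package's own implementation.

```r
# Hypothetical record of pairwise comparisons for three inputs:
# C[A, B] counts how many times input A was better than input B
C <- matrix(c(0, 5, 4,
              1, 0, 3,
              2, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# A gets a pairwise victory over B when C[A, B] > C[B, A]
wins <- C > t(C)

# Copeland score: pairwise victories minus pairwise defeats
score <- rowSums(wins) - colSums(wins)

# Rank 1 is the most important input; tied inputs share the smallest rank
copelandRank <- rank(-score, ties.method = "min")
```

Here input "a" beats both others, while "b" and "c" are tied against each other, so they receive the same rank.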
For information on setting graphical parameters (labelPar, nestedPar), see gpar.
Value
The function is usually called for the side effect (a plot is drawn), but it also returns a grob representation of the plot.
Author(s)
Mikko Korpela
References
Pomerol, J.-C. and Barba-Romero, S. (2000) Multicriterion decision in management: principles and practice. Springer. p. 122. ISBN: 0-7923-7756-7.
See Also
sisal, sisalTable, plotmath, gpar
Examples
library(grDevices)
library(grid)
toy1.2 <- list(testSisal(Mtimes=10, stepsAhead=1, dataset="tsToy"),
testSisal(Mtimes=10, stepsAhead=2, dataset="tsToy"))
## Resizing enabled:
## - mathematical expressions in titles
## - extracting the integer part of input variable names
grid.newpage()
plotSelected(toy1.2, yLabels = c("+1", "+2"),
main = "Toy time series",
xlab = expression(paste("input variables ",
italic(y[t+l]))),
ylab = expression(paste("output ", italic(y[t+k]))),
pickIntPart = TRUE, intTransform = function(x) -x)
## Fixed size plot:
## - some graphical parameters adjusted
## - cex in labelPar adjusts the space around the text in table cells
## - new device the same size as the plot
grb <- plotSelected(toy1.2, resizeText = FALSE, resizeTable = FALSE,
axesPar = gpar(fontsize = 11, col = "red"),
labelPar = gpar(fontsize = 14/0.25, cex = 0.25),
fg = "wheat", outerRect = FALSE,
linePar = gpar(lty = "dashed"),
xAxisRot = 45, just = c("left", "top"),
tableArgs = list(x = 0, y = 1), draw = FALSE)
devWidth <- convertWidth(grobWidth(grb), unitTo = "inches",
valueOnly = TRUE)
devHeight <- convertHeight(grobHeight(grb), unitTo = "inches",
valueOnly = TRUE)
dev.new(width = devWidth, height = devHeight, units = "in", res = 72)
grid.draw(grb)
if (interactive()) {
dev.set(dev.prev())
} else {
dev.off()
}
Printing Sequential Input Selection Objects
Description
Prints information contained in a sequential input selection object.
Usage
## S3 method for class 'sisal'
print(x, max.warn = 10, ...)
Arguments
x |
an object of class |
max.warn |
a |
... |
additional arguments passed to other |
Details
The following information is printed:
- Parameter values used in the sisal call
- Data dimensions
- Names of the input variables, if available
- Selected inputs, L.v (smallest validation error)
- Selected inputs, L.f (result within error margin)
- Whether L.f is a subset of L.v (nested model) or not
- The removal order and/or rank of the input variables (see plotSelected.sisal)
- The stages of search (if any) at which branching reduced validation error compared to a hbranches = 1 solution. Not printed if branching was not used or if it is possible that the search did not proceed through every set of variables on the hbranches = 1 path, i.e. if pruning.keep.best was FALSE. One must note that these results, like many others, are subject to randomness. Thus the results may differ between successive runs of sisal.
- Any warnings produced by the sisal run (see max.warn)
Value
Invisibly returns x.
Author(s)
Mikko Korpela
See Also
More information can be obtained with summary.sisal.
Examples
foo <- testSisal(dataset="toy", nData = 200, Mtimes = 10,
noiseSd = 0.5, verbose = 0)
print(foo)
Sequential Input Selection Algorithm (SISAL)
Description
Identifies relevant inputs using a backward selection type algorithm with optional branching. Choices are made by assessing linear models estimated with ordinary least squares or ridge regression in a cross-validation setting.
Usage
sisal(X, y, Mtimes = 100, kfold = 10, hbranches = 1,
max.width = hbranches^2, q = 0.165, standardize = TRUE,
pruning.criterion = c("round robin", "random nodes",
"random edges", "greedy"),
pruning.keep.best = TRUE, pruning.reverse = FALSE,
verbose = 1, use.ridge = FALSE,
max.warn = getOption("nwarnings"), sp = -1, ...)
Arguments
X |
a |
y |
a |
Mtimes |
the number of times the cross-validation is repeated,
i.e. the number of predictions made for each data point. An
integral value ( |
kfold |
the number of approximately equally sized parts used for
partitioning the data on each cross-validation round. An integral
value ( |
hbranches |
the number of branches to take when removing a
variable from the model. In Tikka and Hollmén
(2008), the algorithm always removes the “weakest” variable
( |
max.width |
the maximum number of nodes with a given number of
variables allowed in the search graph. The same limit is used for
all search levels. An integral value ( |
q |
a |
standardize |
a |
pruning.criterion |
a If If If If |
pruning.keep.best |
a |
pruning.reverse |
a |
verbose |
a |
use.ridge |
a |
max.warn |
a |
sp |
a |
... |
additional arguments passed to |
Details
When choosing which variable to drop from the model, the importance of a variable is measured by looking at two quantities derived from the sampling distribution of its coefficient in the linear models of the repeated cross-validation runs:
- absolute value of the median and
- width of the distribution (see q).
The importance of an input variable is the ratio of the absolute value of the median to the width: hbranches variables with the smallest ratios are dropped, one variable in each branch. See max.width and pruning.criterion.
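The importance measure can be illustrated with base R. This is a sketch under assumptions: the coefficient matrix B is simulated example data, and the width is taken here to be the distance between the q and 1 - q sample quantiles, which is an assumption about how q is used and is not confirmed by the text above.

```r
set.seed(1)
# Hypothetical coefficient samples: one row per cross-validation fit,
# one column per input variable
B <- cbind(strong = rnorm(200, mean = 3, sd = 0.5),
           weak   = rnorm(200, mean = 0.1, sd = 0.5))

q <- 0.165  # default value of q in the sisal usage

# Assumed width: distance between the q and 1 - q sample quantiles
width <- apply(B, 2, function(b) {
  diff(quantile(b, probs = c(q, 1 - q), names = FALSE))
})

# Importance: absolute value of the median divided by the width
importance <- abs(apply(B, 2, median)) / width

# The input with the smallest ratio is the first candidate for removal
weakest <- names(which.min(importance))
```

With hbranches = 1 only the single weakest input would be dropped at each stage; with larger values the same ordering supplies one candidate per branch.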
The main results of the function are described here. More details are available in ‘Value’.
The function returns two sets of input variables:
- L.v
set corresponding to the smallest validation error.
- L.f
smallest set where validation error is close to the smallest error. The margin is the standard deviation of the training error measured in the node of the smallest validation error.
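The choice of the two model sizes can be sketched with base R. The error vectors below are hypothetical example data, and treating the threshold as the smallest validation error plus the training standard deviation at that node (compared with <=) is an assumption based on the description above.

```r
# Hypothetical error curves; element i belongs to the best model with i inputs
E.v  <- c(2.10, 1.30, 1.05, 1.00, 1.02, 1.04)  # mean validation MSE
s.tr <- c(0.20, 0.15, 0.10, 0.08, 0.08, 0.09)  # sd of training MSE

# L.v: the model size with the smallest validation error
sizeLv <- which.min(E.v)

# Assumed threshold: smallest validation error plus the training sd at that node
threshold <- E.v[sizeLv] + s.tr[sizeLv]

# L.f: the smallest model size whose validation error is within the threshold
sizeLf <- min(which(E.v <= threshold))
```

In this example the validation error is smallest with four inputs, but a three-input model already falls within the margin, so L.f would be the sparser of the two.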
The means of the mean squared errors in the training and validation sets are also returned (E.tr, E.v). For the training set, the standard deviation of MSEs (s.tr) is also returned. The length of these vectors is the number of variables in X. The i-th element in each of the vectors corresponds to the best model with i input variables, where goodness is measured by the mean MSE in the validation set.
Linear models fitted to the whole data set are also returned. Both ordinary least squares regression (lm.L.f, lm.L.v, lm.full) and ridge regression models (magic.L.f, magic.L.v, magic.full) are computed, irrespective of the use.ridge setting. Both fitting methods are used for the L.f set of variables, the L.v set and the full set (all variables).
Value
A list with class "sisal". The items are:
L.f |
a |
L.v |
a |
E.tr |
a |
s.tr |
a |
E.v |
a |
L.f.nobranch |
a |
L.v.nobranch |
like |
E.tr.nobranch |
a |
s.tr.nobranch |
like |
E.v.nobranch |
like |
n.evaluated |
a |
edges |
a |
vertices |
a |
vertices.logical |
a |
vertex.data |
A
|
var.names |
names of the variables (column names of
|
n |
number of observations in the ( |
d |
number of variables (columns) in |
n.missing |
number of samples where either |
n.clean |
number of complete samples in the data set
|
lm.L.f |
|
lm.L.v |
|
lm.full |
|
magic.L.f |
|
magic.L.v |
|
magic.full |
|
mean.y |
mean of |
sd.y |
standard deviation (denominator |
zeroRange.y |
a |
mean.X |
column means of |
sd.X |
standard deviation (denominator |
zeroRange.X |
a |
constant.X |
a |
params |
a named |
pairwise.points |
a |
pairwise.wins |
a |
pairwise.preferences |
a |
pairwise.rank |
an |
path.length |
a |
nested.path |
a |
nested.rank |
an |
branching.useful |
If branching is enabled
( |
warnings |
warnings stored. A |
n.warn |
number of warnings produced. May be higher than number of warnings stored. |
Author(s)
Mikko Korpela
References
Tikka, J. and Hollmén, J. (2008) Sequential input selection algorithm for long-term prediction of time series. Neurocomputing, 71(13–15):2604–2615.
See Also
See magic for information about the algorithm used for estimating the regularization parameter and the corresponding linear model when use.ridge is TRUE.
See summary.sisal for how to extract information from the returned object.
Examples
library(stats)
set.seed(123)
X <- cbind(sine=sin((1:100)/5),
linear=seq(from=-1, to=1, length.out=100),
matrix(rnorm(800), 100, 8,
dimnames=list(NULL, paste("random", 1:8, sep="."))))
y <- drop(X %*% c(3, 10, 1, rep(0, 7)) + rnorm(100))
foo <- sisal(X, y, Mtimes=10, kfold=5)
print(foo) # selected inputs "L.v" are same as
summary(foo$lm.full) # significant coefficients of full model
Download External Datasets for SISAL
Description
Loads external datasets for testing with SISAL. Choices are laser generated data and Poland electricity load data.
Usage
sisalData(dataset = c("poland", "laser", "laser.cont"), verify = TRUE)
Arguments
dataset |
A |
verify |
A |
Details
The laser generated data come in two parts, "laser" and "laser.cont". The Poland electricity load data is also divided in two parts, but they are both returned with dataset="poland".
This function requires an Internet connection. The download may fail due to a problem such as the remote server being unavailable.
Value
With option dataset="laser", returns an integer vector of length 1000.
With option dataset="laser.cont", returns an integer vector of length 9093.
With option dataset="poland", returns a list with two numeric vectors:
learn |
1400 values |
test |
201 values |
Note
Checked on 2020-02-14, the Santa Fe datasets are no longer available at their previous location. Attempting to download them with this function will result in an error.
Author(s)
Mikko Korpela
References
The Santa Fe Time Series Competition Data / Data Set A: Laser generated data. Availability unknown (2020-02-14).
Environmental and Industrial Machine Learning Group / Datasets / Poland Electricity Load. https://research.cs.aalto.fi/aml/datasets.shtml. URL accessed on 2024-10-25.
See Also
Examples
## Not run:
foo <- sisalData("poland")
length(foo$learn) # 1400
length(foo$test) # 201
## End(Not run)
Draw Table with Equally Sized Cells
Description
Draws a resizable or fixed-size table with equally sized cells. Main title, axis (tick) labels and axis titles (left, bottom) are optional. Cells can have individual background and text colors and stripes.
Usage
sisalTable(labels = matrix(seq_len(12), 3, 4),
nRows = NROW(labels), nCols = NCOL(labels),
bg = sample(colors(), nRows * nCols, replace = TRUE),
stripeCol = NULL, fg = NULL, naFill = "white",
naStripes = "grey50", main = NULL, xlab = NULL,
ylab = NULL, xAxisLabels = NULL, yAxisLabels = NULL,
draw = TRUE, outerRect = TRUE, innerLines = TRUE,
nStripes = 7, stripeRot = 45, stripeWidth = 0.2,
stripeScale = 0.95, resizeText = TRUE,
resizeTable = TRUE, resizeMain = resizeText,
resizeLab = resizeText, resizeAxes = resizeText,
resizeLabels = resizeTable && resizeText,
x = unit(0.5, "npc"), y = unit(0.5, "npc"),
width = unit(0.97, "npc"), height = unit(0.97, "npc"),
default.units = "npc", just = "center",
clip = "inherit", xAxisRot = 0, yAxisRot = 0,
xAxisJust = c(0.5, 1), xAxisX = 0.5, xAxisY = 1,
yAxisJust = c(1, 0.5), yAxisX = 1, yAxisY = 0.5,
mainMargin = if (resizeMain) 0.15 else unit(8, "points"),
xlabMargin = if (resizeLab) 0.1 else unit(5, "points"),
ylabMargin = if (resizeLab) 0.1 else unit(5, "points"),
axesMargin = if (resizeAxes) 0.1 else unit(5, "points"),
axesSize = 0.8, forceAxesSize = FALSE,
mainSize = 1, xlabSize = 1, ylabSize = 1,
mainPar = gpar(fontface = "bold", fontsize = 14),
labPar = gpar(fontface = "plain", fontsize = 14),
labelPars = gpar(fontsize = 20, cex = 0.6),
axesPar = gpar(fontsize = 10),
rectPar = gpar(), linePar = gpar(),
name = NULL, gp = NULL, vp = NULL)
Arguments
labels |
the labels to use in the table cells. A
|
nRows |
the number of rows in the table. A positive integral number. |
nCols |
the number of columns in the table. A positive integral number. |
bg |
the background colors of the table cells. One element is used for each cell. |
stripeCol |
an optional |
fg |
the text colors of the table cells. One element is used
for each cell. If |
naFill |
background color to use when the label of a table cell
is |
naStripes |
table cells with an |
main |
the main title of the plot. |
xlab |
a title for the x axis. |
ylab |
a title for the y axis. |
xAxisLabels |
a label for each column of the table. |
yAxisLabels |
a label for each row of the table. |
draw |
a |
outerRect |
a |
innerLines |
a |
nStripes |
a positive integral number giving the number of
stripes to be drawn in table cells. Only applies to those cells
where stripes are used, i.e. when the relevant element of
|
stripeRot |
an integral number giving the rotation angle
(degrees, counterclockwise) of the stripes used in table cells.
Defaults to |
stripeWidth |
a |
stripeScale |
a |
resizeText |
a |
resizeTable |
a |
resizeMain |
a |
resizeLab |
a |
resizeLabels |
a |
resizeAxes |
a |
x |
a |
y |
a |
width |
a |
height |
a |
default.units |
a |
just |
a |
clip |
a |
xAxisRot |
a |
yAxisRot |
a |
xAxisJust |
justification setting for column labels. A
|
xAxisX |
x location of column labels relative to the space
allocated for them. A |
xAxisY |
y location of column labels relative to the space
allocated for them. A |
yAxisJust |
justification setting for row labels. A
|
yAxisX |
x location of row labels relative to the space
allocated for them. A |
yAxisY |
y location of row labels relative to the space
allocated for them. A |
mainMargin |
size of the margin between the main title and the table. |
xlabMargin |
size of the margin between the x axis title and the next graphical object towards the table. |
ylabMargin |
size of the margin between the y axis title and the next graphical object towards the table. |
axesMargin |
size of the margin between the row or column labels and the table. |
axesSize |
a positive |
forceAxesSize |
a |
mainSize |
scale factor for fontsize of main title. A positive
|
xlabSize |
scale factor for fontsize of x axis title. A
positive |
ylabSize |
scale factor for fontsize of y axis title. A
positive |
mainPar |
graphical parameters for the main title. |
labPar |
graphical parameters for x and y axis titles. |
labelPars |
graphical parameters for labels used in table cells. Can also be a list, one element for each table cell, recycled if necessary. |
axesPar |
graphical parameters for row and column labels. |
rectPar |
graphical parameters for the rectangle around the table. |
linePar |
graphical parameters for the line segments between table cells. |
name |
a |
gp |
graphical parameters for the whole object. |
vp |
a |
Details
This function was written to be used with plotSelected but it should be generic enough to be useful for other purposes, too.
The color and text vectors (including matrices and arrays) pointing to table cells (labels, bg, stripeCol, fg) are interpreted in column-major order, like linear indexing of a matrix. Each data.frame argument is collapsed to a list by combining its columns. Finally, values are recycled if needed, also in xAxisLabels and yAxisLabels.
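Column-major order and recycling can be illustrated with base R. This is a sketch that does not call sisalTable: the index k and the two-color vector are hypothetical, and rep_len only stands in for whatever recycling the function performs internally.

```r
# Labels as in the sisalTable default: a 3 by 4 matrix filled by column
labels <- matrix(seq_len(12), nrow = 3, ncol = 4)

# Linear index k addresses the cell in this row and column
k <- 5
cellRow <- (k - 1) %% nrow(labels) + 1
cellCol <- (k - 1) %/% nrow(labels) + 1

# A shorter color vector recycled to one value per table cell
bg <- rep_len(c("white", "grey75"), length(labels))
```

The fifth element of any cell-wise vector therefore refers to the second row of the second column, not to the first row of the fifth column.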
For possible color values, see col2rgb
.
In the various text objects, mathematical annotation (see plotmath) is supported in addition to character values.
For information on setting graphical parameters (gp, mainPar, labPar, ...), see gpar.
The graphical object returned is a gTree which contains a gList of graphical objects and a vpTree of viewports. The child viewports are placed inside the parent using a grid.layout. The size of the whole object is the size of the parent viewport. It will be fixed or depend on the space available to it:
If all graphical elements are non-resizable (but resizeLabels can be TRUE), a suitable fixed size will be computed.
Otherwise, the size is determined by width and height. However, if there are non-resizable elements, the graphical object may be larger than that.
The graphical object will not use any excess space. In other words, the width and height reported by grobWidth and grobHeight are tight. It is possible that some parts of the plot may overflow their assigned space and the bounds computed for the whole graphical object. Examples include using large fixed-size text elements or large values of the gpar graphical parameter "cex". Clipping can be adjusted through clip.
If resizeAxes is TRUE, axesMargin must be a non-negative numeric value giving the size of the margin as a proportion of the side length of a table cell. If resizeAxes is FALSE, axesMargin can also be a unit object. The arguments mainMargin, xlabMargin and ylabMargin are analogous to axesMargin.
Value
The function is usually called for the side effect (a plot is drawn), but it also returns a grob representation of the plot. The returned object is a custom gTree of class "sisalTable".
Author(s)
Mikko Korpela
Examples
library(grDevices)
library(grid)
## Default: 3 by 4 table with labels 1:12 and random background colors
grid.newpage()
sisalTable()
## Four examples in a grid layout
rowCol <- c(1, 18, 2, 18, 1)
lo <- grid.layout(nrow = 5, ncol = 5,
                  widths = rowCol, heights = rowCol)
grid.newpage()
pushViewport(viewport(layout = lo, name = "bgLayout"))
grid.rect(gp=gpar(fill="grey75", col="grey75"))
rNames <- c("topmargin", "top", "hspace", "bottom", "bottommargin")
cNames <- c("leftmargin", "left", "vspace", "right", "rightmargin")
for (Row in c(2, 4)) {
    for (Col in c(2, 4)) {
        pushViewport(viewport(layout.pos.row = Row,
                              layout.pos.col = Col,
                              name = paste(rNames[Row],
                                           cNames[Col], sep="")))
        grid.rect(gp=gpar(fill="cadetblue"))
        upViewport(1)
    }
}
colors1Vec <- terrain.colors(12)
colors1Mat <- matrix(colors1Vec, 3, 4)
labels1Vec <- sample(c(letters, LETTERS), 12)
labels1Mat <- matrix(labels1Vec, 3, 4)
## Column vector, aligned with the right side of the viewport
longText <- rep("", 12)
longText[3] <- "a longish piece of text"
longText[9] <- "and some more"
sisalTable(labels1Vec, bg = colors1Vec, vp = "topleft",
           x = 1, just = "right",
           yAxisLabels = longText, xAxisLabels = "Boo")
## Matrix, zero margin
downViewport("topright")
sisalTable(labels1Mat, bg = colors1Mat,
           width = 1, height = 1, name = "trPlot",
           xAxisLabels = 1:4, yAxisLabels = LETTERS[1:3])
grid.rect(width = grobWidth("trPlot"), height = grobHeight("trPlot"),
          gp = gpar(lty="dashed", col = "white", lwd = 2))
upViewport(1)
## Transpose of matrix, width and height 0.75 "npc" units
downViewport("bottomleft")
sisalTable(t(labels1Mat), bg = t(colors1Mat),
           width = 0.75, height = 0.75, name = "blPlot",
           yAxisLabels = 1:4, xAxisLabels = LETTERS[1:3])
grid.rect(width = grobWidth("blPlot"), height = grobHeight("blPlot"),
          gp = gpar(lty="dashed", col = "white", lwd = 2))
upViewport(1)
## ?plotmath, some cells with no background color
labels2 <- expression(x^{y+x}, sqrt(x), bolditalic(x), NA)
bgCol <- c(rep("white", 3), NA)
sisalTable(labels2, nRows=3, nCols=5, bg = bgCol, naFill = NA,
           naStripes = "darkmagenta", vp="bottomright",
           main = "plotmath text")
Summarizing Sequential Input Selection Results
Description
summary method for class "sisal"
Usage
## S3 method for class 'sisal'
summary(object, ...)
## S3 method for class 'summary.sisal'
print(x, ...)
Arguments
object |
an object of class |
x |
an object of class |
... |
arguments passed to/from other methods. |
Details
The functions compute and print summaries (summary.lm) of the ordinary least squares regression models stored in the object and some additional information.
Value
The function summary.sisal returns a list with class "summary.sisal", currently containing:
summ.full |
summary of the full model. An object of class
|
summ.L.v |
summary of the L.v model. An object of
class |
summ.L.f |
summary of the L.f model. An object of
class |
error.df |
a
|
The function print.summary.sisal invisibly returns x.
Author(s)
Mikko Korpela
See Also
Examples
foo <- testSisal(dataset="toy", Mtimes=10, hbranches=2)
summary(foo)
Testing the Sequential Input Selection Algorithm
Description
Tests sisal with example datasets or time series data. The function uses the training part of an example dataset or user-supplied numeric data interpreted as a time series.
Usage
testSisal(dataset = c("tsToy", "laser", "poland", "toy"), nData = Inf,
          FUN = "sisal", lags = NULL, stepsAhead = 1,
          noiseSd = 0.2, verbose = 1, ...)
Arguments
dataset |
the dataset to use. A |
nData |
a |
FUN |
which function to call. By default, acts as a front end
to |
lags |
a |
stepsAhead |
an integral value specifying how many steps ahead to predict in a time series setting. The default is 1. |
noiseSd |
standard deviation of noise to be used with the
|
verbose |
a |
... |
arguments passed to |
Details
The function recognizes if a numeric dataset is the "laser" or "poland" dataset. If repeated experiments are to be performed on those datasets, it is best to explicitly fetch them with sisalData before using this function. Doing so reduces the amount of network traffic and makes offline work possible.
Value
The value returned by function FUN, when called with the given dataset (processed by this function) and parameters. See the help page of the relevant function, e.g. sisal.
Author(s)
Mikko Korpela
See Also
See sisalData, toy.learn and tsToy.learn for documentation on the datasets.
The performance of the models returned by this function can be evaluated using bootMSE, which uses a separate test part of the dataset.
Examples
foo <- testSisal(dataset="toy", hbranches=2, max.width=2, Mtimes=5,
                 use.ridge=TRUE)
print(foo)
names(foo)
Toy Data for SISAL (Learning Set)
Description
Numeric matrix with independent and dependent variables and noise
Usage
toy.learn
Format
The format is:
 num [1:1000, 1:12] -0.62067 1.36985 0.00122 0.75527 -1.82271 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:12] "y" "noise" "X1" "X2" ...
Details
This is the learning set of the toy data, i.e. 1000 rows of the whole 1500 row dataset.
Columns "X1", "X2", ..., "X10" were generated with rnorm to follow a standard normal distribution.
Column "y" is a linear combination of "X1", "X2" and "X3" with coefficients (1:3)/sqrt(sum((1:3)^2)), yielding a theoretical standard normal distribution.
Column "noise" was also generated from the standard normal distribution.
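The normalization of the coefficients can be checked numerically. The following base R sketch does not reproduce the data-generating script of the package; the seed, the sample size and the variable names (b, X, y) are assumptions chosen for illustration only.

```r
## Coefficients of X1, X2 and X3 as stated in the text
b <- (1:3) / sqrt(sum((1:3)^2))

## For independent standard normal inputs, Var(y) = sum(b^2),
## so the coefficients are scaled to give unit variance
sum(b^2)

## Empirical check with simulated inputs (illustrative seed and size)
set.seed(1)
n <- 10000
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
y <- drop(X %*% b)
c(mean = mean(y), sd = sd(y))
```

The sample mean and standard deviation should be close to 0 and 1, respectively, but will not match them exactly because of sampling variation.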
Use file.show(system.file("toyDataSrc", "sisalToy.R", package="sisal")) to view the script that generated the data.
See Also
Examples
library(graphics)
plot(as.data.frame(toy.learn))
Toy Data for SISAL (Test Set)
Description
Numeric matrix with independent and dependent variables and noise
Usage
toy.test
Format
The format is:
 num [1:500, 1:12] -0.543 -0.881 0.115 0.461 -0.173 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:12] "y" "noise" "X1" "X2" ...
Details
This is the test set of the toy data, i.e. 500 rows of the whole 1500 row dataset.
For other details, see toy.learn.
See Also
Examples
library(graphics)
plot(as.data.frame(toy.test))
Toy Time Series Data for SISAL (Learning Set)
Description
Numeric vector with autoregressive (AR) time series data
Usage
tsToy.learn
Format
The format is:
num [1:1000] 0.7529 -0.2576 0.441 0.8473 0.0164 ...
Details
This is the learning set of the toy time series data, i.e. the first 1000 of the total 3000 observations.
The data follow a second order AR model. The first order coefficient is -0.5 and the second order coefficient is 0.3. The autocovariances for lags 0 to 4 are c(1.0, -0.71, 0.66, -0.54, 0.47) (theoretical values, two significant digits).
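The theoretical values quoted above can be reproduced with ARMAacf from the stats package. The sketch below assumes R's sign convention for AR coefficients, x[t] = phi[1] * x[t-1] + phi[2] * x[t-2] + e[t], and relies on the fact that autocovariances equal autocorrelations when the process variance is 1 (the lag 0 value). The names phi, rho and r are illustrative.

```r
library(stats)

## AR(2) coefficients as stated in the text
phi <- c(-0.5, 0.3)

## Theoretical autocorrelations for lags 0 to 4
rho <- ARMAacf(ar = phi, lag.max = 4)
signif(rho, 2)

## The same values from the Yule-Walker recursion
## rho(k) = phi[1] * rho(k - 1) + phi[2] * rho(k - 2)
r <- numeric(5)
r[1] <- 1                      # lag 0
r[2] <- phi[1] / (1 - phi[2])  # lag 1
for (i in 3:5) {
    r[i] <- phi[1] * r[i - 1] + phi[2] * r[i - 2]
}
all.equal(unname(rho), r)
```

The sample autocorrelations shown by acf(tsToy.learn) can be compared against these values; some deviation is expected in a finite sample.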
Use file.show(system.file("toyDataSrc", "sisalToyTs.R", package="sisal")) to view the script that generated the data.
See Also
Examples
library(graphics)
library(stats)
plot(tsToy.learn)
acf(tsToy.learn)
Toy Time Series Data for SISAL (Test Set)
Description
Numeric vector with autoregressive (AR) time series data
Usage
tsToy.test
Format
The format is:
num [1:2000] 0.583 -0.71 -1.172 1.067 -0.719 ...
Details
This is the test set of the toy time series data, i.e. the last 2000 of the total 3000 observations.
The data follow a second order AR model. The first order coefficient is -0.5 and the second order coefficient is 0.3.
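For a second order AR model, the theoretical partial autocorrelation function cuts off after lag 2, and its value at lag 2 equals the second order coefficient. The sketch below computes the theoretical values with ARMAacf from the stats package, assuming R's sign convention for AR coefficients; the names phi and pacfTheory are illustrative.

```r
library(stats)

## AR(2) coefficients as stated in the text
phi <- c(-0.5, 0.3)

## Theoretical partial autocorrelations for lags 1 to 5
pacfTheory <- ARMAacf(ar = phi, lag.max = 5, pacf = TRUE)
round(pacfTheory, 3)

## Lag 1 equals the lag 1 autocorrelation, lag 2 equals phi[2],
## and higher lags are zero up to floating point error
c(lag1 = phi[1] / (1 - phi[2]), lag2 = phi[2])
```

These values give a reference for reading a sample partial autocorrelation plot of the series, where lags beyond 2 should mostly stay within the confidence bounds.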
Use file.show(system.file("toyDataSrc", "sisalToyTs.R", package="sisal")) to view the script that generated the data.
See Also
Examples
library(graphics)
library(stats)
plot(tsToy.test)
acf(tsToy.test, type="partial")