Title: | Obtaining Stars from Flat Tables |
Version: | 1.2.5 |
Description: | Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context. |
License: | MIT + file LICENSE |
URL: | https://josesamos.github.io/starschemar/, https://github.com/josesamos/starschemar |
BugReports: | https://github.com/josesamos/starschemar/issues |
Depends: | R (≥ 2.10) |
Imports: | dplyr, generics, methods, purrr, rlang, snakecase, stats, tibble, tidyr |
Suggests: | knitr, pander, rmarkdown, testthat |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
Language: | en-GB |
LazyData: | true |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | no |
Packaged: | 2024-05-02 05:44:57 UTC; joses |
Author: | Jose Samos |
Maintainer: | Jose Samos <jsamos@ugr.es> |
Repository: | CRAN |
Date/Publication: | 2024-05-02 06:10:03 UTC |
Obtaining Star Schemas from Flat Tables
Description
Transformations that allow obtaining star schemas from flat tables.
Details
Star schemas can be defined from flat tables, and star schemas can in turn form constellations (star schema and constellation definition functions). Dimensions contain data without duplicates, and data cleaning operations can be applied to them (data cleaning functions). Dimensions can be enriched by adding columns, sometimes computed by functions and sometimes explicitly defined by the user (dimension enrichment functions). When new data is obtained, the existing data has to be refreshed with it by means of incremental refresh operations, or data that is no longer necessary has to be deleted (incremental refresh functions). Finally, the results obtained can be exported to be consulted with other tools (results export functions) or through the defined query functions (query functions).
Star schema and constellation definition
Starting from a flat table, a dimensional model is defined specifying the attributes that make up each of the dimensions and the measurements in the facts. The result is a dimensional_model object. It is carried out through the following dimensional model definition functions: dimensional_model(), define_fact() and define_dimension().
A star schema is defined from a flat table and a dimensional model definition. Once defined, a star schema can be transformed by defining role playing dimensions, changing the writing style of element names or the type of dimension attributes. These operations are carried out through the following star schema definition and transformation functions: star_schema(), role_playing_dimension(), snake_case() and character_dimensions().
Once a star schema is defined, we can rename its elements. It is necessary to be able to rename attributes of dimensions and measures of facts because the definition operations only allow us to select columns of a flat table. For completeness, dimensions and facts can also be renamed. To carry out these operations, the following star schema rename functions are available: rename_fact(), rename_measures(), rename_dimension(), rename_dimension_attributes(), get_measure_names() and get_dimension_attribute_names().
Based on various star schemas, a constellation can be defined in which star schemas share common dimensions. Dimensions with the same name must be shared. It is defined by the following constellation definition function: constellation().
Data cleaning
Once the star schemas and constellations are defined, data cleaning operations can be carried out on dimensions. There are three groups of functions: one to obtain dimensions of star schemas and constellations; another to define data cleaning operations over dimensions; and one more to apply operations to star schemas or constellations.
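Obtaining dimensions: get_dimension_names(), get_dimension(), get_conformed_dimension_names(), get_conformed_dimension().
Update definition functions: record_update_set(), match_records(), update_record(), update_selection(), update_selection_general().
Modification application functions: modify_dimension_records(), modify_conformed_dimension_records().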
Dimension enrichment
To enrich a dimension with new attributes related to others already included in it, first, we export the attributes on which the new ones depend, then we define the new attributes, and import the table with all the attributes to be added to the dimension.
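The export, define and import steps can be pictured with plain data frames. The following is a minimal sketch in base R, not the package's implementation: the `when` table, its columns and the `quarter` attribute are hypothetical and only illustrate the idea.

```r
# Hypothetical dimension table with a generated key (illustration only).
when <- data.frame(
  when_key = 1:4,
  date = c("2024-01-06", "2024-01-13", "2024-04-06", "2024-04-13"),
  week = c("01", "02", "14", "15"),
  year = c("2024", "2024", "2024", "2024"),
  stringsAsFactors = FALSE
)

# 1. Export: the attributes the new ones depend on, without repeated combinations.
exported <- unique(when[, c("week", "year")])

# 2. Define: add a new attribute (a rough, made-up quarter rule).
exported$quarter <- ifelse(as.integer(exported$week) <= 13, "Q1", "Q2")

# 3. Import: attach the new attribute to every row of the dimension.
enriched <- merge(when, exported, by = c("week", "year"), all.x = TRUE)
enriched <- enriched[order(enriched$when_key), ]

# Rows left without a value would indicate incomplete imported data.
missing <- enriched[is.na(enriched$quarter), ]
```

In the package, these steps correspond to enrich_dimension_export(), enrich_dimension_import_test() and enrich_dimension_import(), documented below.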
Incremental refresh
When new data is obtained, an incremental refresh of the data can be carried out, both of the dimensions and of the facts. Incremental refresh can be applied to both star schemas and constellations, using the following functions: incremental_refresh_star_schema() and incremental_refresh_constellation().
Sometimes the data refresh consists of eliminating data that is no longer necessary, generally because it corresponds to a period that is no longer analysed, although there can be other reasons. This data can be selected using the following function: filter_fact_rows().
Once the fact data is removed (using the other incremental refresh functions), we can remove the data for the dimensions that are no longer needed using the following functions: purge_dimensions_star_schema() and purge_dimensions_constellation().
Results export
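Once the data has been properly structured and transformed, it can be exported to be consulted with other tools or with R. Various export formats have been defined, both for star schemas and for constellations, using the following functions: star_schema_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list(), constellation_as_multistar(), constellation_as_tibble_list() and multistar_as_flat_table().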
Query functions
There are many multidimensional query tools available. The exported data, once stored in files, can be used directly from them. Using the following functions, you can also perform basic queries from R on data in the multistar format: dimensional_query(), select_fact(), select_dimension(), filter_dimension() and run_query().
Transform a dimension's numeric attributes to character
Description
Transforms numeric type attributes of a dimension into character type.
Usage
character_dimension(
dimension,
length_integers = TRUE,
NA_replacement_value = NULL
)
## S3 method for class 'dimension_table'
character_dimension(
dimension,
length_integers = TRUE,
NA_replacement_value = NULL
)
Arguments
dimension |
A dimension_table object. |
length_integers |
A |
NA_replacement_value |
A string, value to replace NA values. |
Details
It allows indicating the width of some fields, padding them with zeros on the left. This is useful to make the alphabetical order of the result correspond to the numerical order. It also allows indicating the literal to be used when the numerical value is not defined. For dates, undefined values are assigned the value "9999-12-31".
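The effect of left padding on sort order can be seen with a few values. This is a minimal base R sketch of the idea, not the code used by the package; the `week` vector and the "Unknown" literal are made up for the illustration.

```r
# Hypothetical week numbers, one of them undefined.
week <- c(2L, 10L, NA, 3L)

# Plain conversion: alphabetical order no longer matches numerical order.
plain <- as.character(week)

# Width 2, padded with zeros on the left, with a literal for undefined values.
padded <- ifelse(is.na(week), "Unknown", sprintf("%02d", week))

sort(plain)                        # "10" sorts before "2"
sort(padded[padded != "Unknown"])  # "02" "03" "10"
```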
Value
A dimension_table object.
Transform dimension numeric attributes to character
Description
Transforms numeric type attributes of dimensions into character type. In a star_schema, numerical data are measurements that are situated in the facts. Numerical data in dimensions are usually codes, or day, week, month or year numbers. There are tools that consider any numerical data to be a measurement; for this reason it is appropriate to transform the numerical data of dimensions into character data.
Usage
character_dimensions(st, length_integers = list(), NA_replacement_value = NULL)
## S3 method for class 'star_schema'
character_dimensions(st, length_integers = list(), NA_replacement_value = NULL)
Arguments
st |
A star_schema object. |
length_integers |
A list of name = length pairs, the width to use for each named attribute. |
NA_replacement_value |
A string, value to replace NA values. |
Details
It allows indicating the width of some fields, padding them with zeros on the left. This is useful to make the alphabetical order of the result correspond to the numerical order.
It also allows indicating the literal to be used in case the numerical value is not defined.
If a role playing dimension has been defined, the transformation is performed on it.
Value
A star_schema object.
See Also
Other star schema and constellation definition functions: constellation(), role_playing_dimension(), snake_case(), star_schema()
Examples
st <- star_schema(mrs_age_test, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) |>
  character_dimensions(length_integers = list(week = 2),
                       NA_replacement_value = "Unknown")
Conform all dimensions of a constellation
Description
Conform all dimensions with the same name in the star schemas of a constellation. If two dimensions have the same name in a constellation, they must be conformed.
Usage
conform_all_dimensions(ct)
Arguments
ct |
A constellation object. |
Value
A constellation object.
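What conforming two dimensions with the same name involves can be sketched with plain data frames. This is an illustration in base R under stated assumptions, not the package's implementation: the tables, the `where_key` column and the `remap()` helper are hypothetical, and it assumes each star generated its own keys independently.

```r
# Two stars, each with its own "where" dimension and locally generated keys.
where_a <- data.frame(where_key = 1:2, city = c("Boston", "Hartford"),
                      stringsAsFactors = FALSE)
where_b <- data.frame(where_key = 1:2, city = c("Hartford", "Lowell"),
                      stringsAsFactors = FALSE)
facts_a <- data.frame(where_key = c(1L, 2L, 2L), deaths = c(5, 3, 4))
facts_b <- data.frame(where_key = c(1L, 2L), deaths = c(7, 1))

# Conformed dimension: union of the attribute rows without duplicates,
# with a newly generated key.
attrs <- unique(rbind(where_a["city"], where_b["city"]))
conformed <- data.frame(where_key = seq_len(nrow(attrs)), city = attrs$city,
                        stringsAsFactors = FALSE)

# Hypothetical helper: replace each old key in the facts by the key that the
# same city has in the conformed dimension.
remap <- function(facts, old_dim) {
  city <- old_dim$city[match(facts$where_key, old_dim$where_key)]
  facts$where_key <- conformed$where_key[match(city, conformed$city)]
  facts
}
facts_a <- remap(facts_a, where_a)
facts_b <- remap(facts_b, where_b)
```

After the remapping, both fact tables refer to the same dimension table, so they can be analysed together by city.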
Conform dimensions of given name
Description
If two dimensions have the same name in a constellation, they must be conformed.
Usage
conform_dimensions(ct, name = NULL)
Arguments
ct |
A constellation object. |
name |
A string, name of the dimension. |
Value
A constellation object.
constellation S3 class
Description
Creates a constellation object from a list of star_schema objects. All dimensions with the same name in the star schemas have to be conformable.
Usage
constellation(lst, name = NULL)
Arguments
lst |
A list of star_schema objects. |
name |
A string. |
Value
A constellation object.
See Also
Other star schema and constellation definition functions: character_dimensions(), role_playing_dimension(), snake_case(), star_schema()
Examples
ct <- constellation(list(st_mrs_age, st_mrs_cause), name = "mrs")
Export a constellation as a multistar
Description
Once we have refined the format or content of facts and dimensions, we can obtain a multistar. A multistar only distinguishes between general and conformed dimensions; each dimension has its own data. It can contain multiple fact tables.
Usage
constellation_as_multistar(ct)
## S3 method for class 'constellation'
constellation_as_multistar(ct)
Arguments
ct |
A constellation object. |
Value
A multistar object.
See Also
Other results export functions: constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
Examples
ms <- ct_mrs |>
  constellation_as_multistar()
Export a constellation as a tibble list
Description
Once we have refined the format or content of facts and dimensions, we can obtain a tibble list with them. Role playing dimensions can be optionally included.
Usage
constellation_as_tibble_list(ct, include_role_playing = FALSE)
## S3 method for class 'constellation'
constellation_as_tibble_list(ct, include_role_playing = FALSE)
Arguments
ct |
A constellation object. |
include_role_playing |
A boolean. |
Value
A list of tibble objects.
See Also
Other results export functions: constellation_as_multistar(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
Examples
tl <- ct_mrs |>
  constellation_as_tibble_list()
tl <- ct_mrs |>
  constellation_as_tibble_list(include_role_playing = TRUE)
Constellation for Mortality Reporting System
Description
Constellation for the Mortality Reporting System considering age and cause classification.
Usage
ct_mrs
Format
A constellation object.
Examples
# Defined by:
ct_mrs <- constellation(list(st_mrs_age, st_mrs_cause), name = "mrs")
Constellation for Mortality Reporting System Test
Description
Constellation for the Mortality Reporting System considering age and cause classification, built from test data.
Usage
ct_mrs_test
Format
A constellation object.
Examples
# Defined by:
ct_mrs_test <-
  constellation(list(st_mrs_age_test, st_mrs_cause_test), name = "mrs_test")
Define dimensions in a dimensional_model object
Description
To define a dimension in a dimensional_model object, we have to define its name and the set of attributes that make it up.
Usage
define_dimension(st, name = NULL, attributes = NULL)
## S3 method for class 'dimensional_model'
define_dimension(st, name = NULL, attributes = NULL)
Arguments
st |
A dimensional_model object. |
name |
A string, name of the dimension. |
attributes |
A vector of attribute names. |
Details
To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The definition of dimensions in the dimensional_model object is made from the flat table column names. Using the dput function we can list the column names of the flat table so that we do not have to type their names.
Value
A dimensional_model object.
See Also
Other star definition functions: define_fact(), dimensional_model()
Examples
# dput(colnames(mrs_age))
#
# c(
# "Reception Year",
# "Reception Week",
# "Reception Date",
# "Data Availability Year",
# "Data Availability Week",
# "Data Availability Date",
# "Year",
# "WEEK",
# "Week Ending Date",
# "REGION",
# "State",
# "City",
# "Age Range",
# "Deaths"
# )
dm <- dimensional_model() |>
  define_dimension(name = "When",
                   attributes = c("Week Ending Date",
                                  "WEEK",
                                  "Year")) |>
  define_dimension(name = "When Available",
                   attributes = c("Data Availability Date",
                                  "Data Availability Week",
                                  "Data Availability Year")) |>
  define_dimension(name = "Where",
                   attributes = c("REGION",
                                  "State",
                                  "City")) |>
  define_dimension(name = "Who",
                   attributes = c("Age Range"))
Define facts in a dimensional_model object
Description
To define facts in a dimensional_model object, the essential data is a name and a set of measurements, which can be empty (a fact without explicit measurements). Each measurement requires an associated aggregation function, which by default is SUM.
Usage
define_fact(
st,
name = NULL,
measures = NULL,
agg_functions = NULL,
nrow_agg = "nrow_agg"
)
## S3 method for class 'dimensional_model'
define_fact(
st,
name = NULL,
measures = NULL,
agg_functions = NULL,
nrow_agg = "nrow_agg"
)
Arguments
st |
A dimensional_model object. |
name |
A string, name of the fact. |
measures |
A vector of measure names. |
agg_functions |
A vector of aggregation function names. If none is indicated, the default is SUM. Additionally they can be MAX or MIN. |
nrow_agg |
A string, measurement name for the number of rows aggregated. |
Details
To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The definition of facts in the dimensional_model object is made from the flat table column names. Using the dput function we can list the column names of the flat table so that we do not have to type their names.
Associated with each measurement there is an aggregation function that can be SUM, MAX or MIN. Mean is not among the possible aggregation functions because the mean of the means of subsets of the data is not necessarily the mean of the whole data.
An additional measurement corresponding to the number of aggregated rows is always added which, together with SUM, allows us to obtain the mean if needed.
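A small numeric case shows why the mean is not aggregated directly and how the sum and the row count recover it. This is a base R sketch with made-up values; `deaths` and `group` are hypothetical and no package function is involved.

```r
# Four hypothetical fact rows falling into two groups of different size.
deaths <- c(10, 20, 30, 100)
group <- c("a", "a", "a", "b")

# What would be stored per group: the SUM and the number of aggregated rows.
group_sum <- tapply(deaths, group, sum)
group_n <- tapply(deaths, group, length)

# Averaging the group means weights the small group too heavily.
mean_of_means <- mean(group_sum / group_n)

# Sum of sums divided by sum of row counts gives the true overall mean.
overall_mean <- sum(group_sum) / sum(group_n)
```

Here `mean_of_means` is 60 while `overall_mean` is 40, which equals `mean(deaths)`.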
Value
A dimensional_model object.
See Also
Other star definition functions: define_dimension(), dimensional_model()
Examples
# dput(colnames(mrs_age))
#
# c(
# "Reception Year",
# "Reception Week",
# "Reception Date",
# "Data Availability Year",
# "Data Availability Week",
# "Data Availability Date",
# "Year",
# "WEEK",
# "Week Ending Date",
# "REGION",
# "State",
# "City",
# "Age Range",
# "Deaths"
# )
dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths"),
    agg_functions = c("SUM"),
    nrow_agg = "nrow_agg"
  )
dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths")
  )
dm <- dimensional_model() |>
  define_fact(name = "Factless fact")
Define selected dimensions
Description
Include the selected dimensions and only the selected attributes in them.
Usage
define_selected_dimensions(dq)
Arguments
dq |
A dimensional_query object. |
Value
A dimensional_query object.
Define selected facts
Description
Measure names are stored as the names of the columns with the aggregation functions.
Usage
define_selected_facts(dq)
Arguments
dq |
A dimensional_query object. |
Value
A dimensional_query object.
Delete records
Description
Delete records with the same primary key.
Usage
delete_records(ft, ft_new, fk)
Arguments
ft |
A fact_table object. |
ft_new |
A fact_table object. |
fk |
A vector of foreign key names. |
Value
A fact_table object.
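One way to read "records with the same primary key" is sketched below in base R. It assumes that the primary key of a fact table is the combination of its foreign keys and that rows of `ft` whose key also appears in `ft_new` are the ones removed; both are assumptions, and the `key()` helper and the data are hypothetical.

```r
# Existing facts and new facts that share one key combination (made-up data).
ft <- data.frame(when_key = c(1L, 1L, 2L),
                 where_key = c(1L, 2L, 1L),
                 deaths = c(5, 3, 4))
ft_new <- data.frame(when_key = 1L, where_key = 2L, deaths = 3)
fk <- c("when_key", "where_key")

# Hypothetical helper: one string per row built from the foreign key columns.
key <- function(df) do.call(paste, c(df[fk], sep = "\r"))

# Keep only the rows of ft whose key does not appear in ft_new.
remaining <- ft[!(key(ft) %in% key(ft_new)), ]
```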
Delete unused foreign keys
Description
In facts, remove foreign keys from dimensions not included in the result.
Usage
delete_unused_foreign_keys(dq)
Arguments
dq |
A dimensional_query object. |
Value
A dimensional_query object.
Dereference a dimension
Description
Given a dimension, transform the fact table so that the primary key of the dimension (which is a foreign key in the fact table) is replaced by the other attributes of the dimension.
Usage
dereference_dimension(ft, dimension, conversion = TRUE)
Arguments
ft |
A fact_table object. |
dimension |
A dimension_table object. |
conversion |
A boolean, indicates whether the attributes need to be transformed. |
Value
A fact_table object.
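The replacement of a foreign key by the attributes of its dimension can be pictured with a lookup. This is a minimal base R sketch with hypothetical tables and column names, not the package's code, and it does not model the `conversion` argument.

```r
# Hypothetical fact table and dimension (illustration only).
facts <- data.frame(where_key = c(1L, 2L, 1L), deaths = c(5, 3, 4))
where <- data.frame(where_key = 1:2,
                    state = c("MA", "CT"),
                    city = c("Boston", "Hartford"),
                    stringsAsFactors = FALSE)

# Look up the dimension row of each fact and replace the key column
# by the other attributes of the dimension.
idx <- match(facts$where_key, where$where_key)
dereferenced <- cbind(where[idx, c("state", "city")], facts["deaths"])
rownames(dereferenced) <- NULL
```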
dimensional_model S3 class
Description
An empty dimensional_model object is created, to which definitions of facts and dimensions can be added.
Usage
dimensional_model()
Details
To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The definition of facts and dimensions in the dimensional_model object is made from the flat table columns. Each attribute can only appear once in the definition.
Value
A dimensional_model object.
See Also
Other star definition functions: define_dimension(), define_fact()
Examples
dm <- dimensional_model()
dimensional_query S3 class
Description
An empty dimensional_query object is created where you can select fact measures, dimension attributes and filter dimension rows.
Usage
dimensional_query(ms = NULL)
Arguments
ms |
A multistar object. |
Value
A dimensional_query object.
See Also
Other query functions: filter_dimension(), run_query(), select_dimension(), select_fact()
Examples
# ms_mrs <- ct_mrs |>
# constellation_as_multistar()
# dq <- dimensional_query(ms_mrs)
Star Definition for Mortality Reporting System by Age
Description
Definition of facts and dimensions for the Mortality Reporting System considering the age classification.
Usage
dm_mrs_age
Format
A dimensional_model object.
Examples
# Defined by:
dm_mrs_age <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c(
      "Deaths"
    ),
    agg_functions = c(
      "SUM"
    ),
    nrow_agg = "nrow_agg"
  ) |>
  define_dimension(
    name = "when",
    attributes = c(
      "Week Ending Date",
      "WEEK",
      "Year"
    )
  ) |>
  define_dimension(
    name = "when_available",
    attributes = c(
      "Data Availability Date",
      "Data Availability Week",
      "Data Availability Year"
    )
  ) |>
  define_dimension(
    name = "where",
    attributes = c(
      "REGION",
      "State",
      "City"
    )
  ) |>
  define_dimension(
    name = "who",
    attributes = c(
      "Age Range"
    )
  )
Star Definition for Mortality Reporting System by Cause
Description
Definition of facts and dimensions for the Mortality Reporting System considering the cause classification.
Usage
dm_mrs_cause
Format
A dimensional_model object.
Examples
# Defined by:
dm_mrs_cause <- dimensional_model() |>
  define_fact(
    name = "mrs_cause",
    measures = c(
      "Pneumonia and Influenza Deaths",
      "Other Deaths"
    )
  ) |>
  define_dimension(
    name = "when",
    attributes = c(
      "Week Ending Date",
      "WEEK",
      "Year"
    )
  ) |>
  define_dimension(
    name = "when_received",
    attributes = c(
      "Reception Date",
      "Reception Week",
      "Reception Year"
    )
  ) |>
  define_dimension(
    name = "when_available",
    attributes = c(
      "Data Availability Date",
      "Data Availability Week",
      "Data Availability Year"
    )
  ) |>
  define_dimension(
    name = "where",
    attributes = c(
      "REGION",
      "State",
      "City"
    )
  )
Export selected attributes of a dimension
Description
Export the selected attributes of a dimension, without repeated combinations, to enrich the dimension.
Usage
enrich_dimension_export(st, name = NULL, attributes = NULL)
## S3 method for class 'star_schema'
enrich_dimension_export(st, name = NULL, attributes = NULL)
Arguments
st |
A star_schema object. |
name |
A string, name of the dimension. |
attributes |
A vector of attribute names. |
Details
Role dimensions cannot be exported directly; work with the associated role playing dimension instead.
Value
A tibble object.
See Also
Other dimension enrichment functions: enrich_dimension_import(), enrich_dimension_import_test()
Examples
tb <-
  enrich_dimension_export(st_mrs_age,
                          name = "when_common",
                          attributes = c("week", "year"))
Import tibble to enrich a dimension
Description
For a dimension of a star schema a tibble is attached. This contains dimension attributes and new attributes. If values associated with all rows in the dimension are included in the tibble, the dimension is enriched with the new attributes.
Usage
enrich_dimension_import(st, name = NULL, tb)
## S3 method for class 'star_schema'
enrich_dimension_import(st, name = NULL, tb)
Arguments
st |
A star_schema object. |
name |
A string, name of the dimension. |
tb |
A tibble object. |
Details
Role dimensions cannot be directly enriched. If a role playing dimension is enriched, the new attributes are also added to the associated role dimensions.
Value
A star_schema object.
See Also
Other dimension enrichment functions: enrich_dimension_export(), enrich_dimension_import_test()
Examples
tb <-
  enrich_dimension_export(st_mrs_age,
                          name = "when_common",
                          attributes = c("week", "year"))
# Add new columns with meaningful data (these are not), possibly exporting
# data to a file, populating it and importing it.
tb <- tibble::add_column(tb, x = "x", y = "y", z = "z")
st <- enrich_dimension_import(st_mrs_age, name = "when_common", tb)
Import tibble to test the enrichment of a dimension
Description
For a dimension of a star schema a tibble is attached. This contains dimension attributes and new attributes. If values associated with all rows in the dimension are included in the tibble, the dimension is enriched with the new attributes. This function checks that there are values for all instances. It returns the dimension instances that do not match the imported data.
Usage
enrich_dimension_import_test(st, name = NULL, tb)
## S3 method for class 'star_schema'
enrich_dimension_import_test(st, name = NULL, tb)
Arguments
st |
A star_schema object. |
name |
A string, name of the dimension. |
tb |
A tibble object. |
Value
A dimension object.
See Also
Other dimension enrichment functions: enrich_dimension_export(), enrich_dimension_import()
Examples
tb <-
  enrich_dimension_export(st_mrs_age,
                          name = "when_common",
                          attributes = c("week", "year"))
# Add new columns with meaningful data (these are not), possibly exporting
# data to a file, populating it and importing it.
tb <- tibble::add_column(tb, x = "x", y = "y", z = "z")[-1, ]
tb2 <- enrich_dimension_import_test(st_mrs_age, name = "when_common", tb)
Filter dimension
Description
Allows you to define selection conditions for dimension rows.
Usage
filter_dimension(dq, name = NULL, ...)
## S3 method for class 'dimensional_query'
filter_dimension(dq, name = NULL, ...)
Arguments
dq |
A dimensional_query object. |
name |
A string, name of the dimension. |
... |
Conditions, defined in exactly the same way as in dplyr::filter. |
Details
Conditions can be defined on any attribute of the dimension (not only on attributes selected in the query for the dimension). The selection is made based on the function dplyr::filter. Conditions are defined in exactly the same way as in that function.
Value
A dimensional_query object.
See Also
Other query functions: dimensional_query(), run_query(), select_dimension(), select_fact()
Examples
dq <- dimensional_query(ms_mrs) |>
  filter_dimension(name = "when", when_happened_week <= "03") |>
  filter_dimension(name = "where", city == "Boston")
Filter fact rows
Description
Filter fact rows based on dimension conditions in a star schema. Dimensions remain unchanged.
Usage
filter_fact_rows(st, name = NULL, ...)
## S3 method for class 'star_schema'
filter_fact_rows(st, name = NULL, ...)
Arguments
st |
A star_schema object. |
name |
A string, name of the dimension. |
... |
Conditions, defined in exactly the same way as in dplyr::filter. |
Details
Filtered rows can be deleted using the incremental_refresh_star_schema function.
Value
A star_schema object.
See Also
Other incremental refresh functions: get_star_schema(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
Examples
st <- st_mrs_age |>
  filter_fact_rows(name = "when", week <= "03") |>
  filter_fact_rows(name = "where", city == "Bridgeport")
st2 <- st_mrs_age |>
  incremental_refresh_star_schema(st, existing = "delete")
Filter selected instances
Description
For some dimensions, the instances to include have already been defined, so the values of their primary keys are known. Both facts and dimensions are filtered by those keys.
Usage
filter_selected_instances(dq)
Arguments
dq |
A dimensional_query object. |
Value
A dimensional_query object.
Find values in a dimension
Description
Find a vector of named values in a dimension.
Usage
find_values(dimension, values)
Arguments
dimension |
A dimension_table object. |
values |
A vector of named values. |
Value
A vector of booleans.
Modelling the long-term health impacts of air pollution in London
Description
Estimation of the long-term health impacts of exposure to air pollution in London from 2016 to 2050.
Usage
ft_datagov_uk
Format
A tibble.
Details
The original dataset contains 68 files, corresponding to 34 London areas and 2 pollutants: pollutant and zone are indicated in the name of each file. Each file has several sheets with different variables. It has been transformed into a flat table considering a single variable and defining the area and the pollutant as columns.
Source
https://data.world/datagov-uk/fd864906-8456-46a8-9a01-0dcb2dbd87b9
London Boroughs
Description
Classification of London's boroughs into zones and sub-regions.
Usage
ft_london_boroughs
Format
A tibble.
Source
https://en.wikipedia.org/wiki/List_of_sub-regions_used_in_the_London_Plan
USA City and County
Description
City, state and county for US cities. It only includes those that appear in the Mortality Reporting System.
Usage
ft_usa_city_county
Format
A tibble.
Source
https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
USA States
Description
Name and abbreviation of US states.
Usage
ft_usa_states
Format
A tibble.
Source
https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
Get all dimensions
Description
Get all the dimensions of a star schema.
Usage
get_all_dimensions(st)
Arguments
st |
A star_schema object. |
Value
A list of dimension_table objects.
Get attribute names
Description
Get the names of the attributes used so far in the definition.
Usage
get_attribute_names(dm)
Arguments
dm |
A dimensional_model object. |
Value
A vector of attribute names.
Get conformed dimension
Description
Get a conformed dimension of a constellation given its name.
Usage
get_conformed_dimension(ct, name)
## S3 method for class 'constellation'
get_conformed_dimension(ct, name)
Arguments
ct |
A constellation object. |
name |
A string, name of the dimension. |
Value
A dimension_table object.
See Also
Other data cleaning functions: get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
Examples
d <- ct_mrs |>
  get_conformed_dimension("when")
Get conformed dimension names
Description
Get the names of the conformed dimensions of a constellation.
Usage
get_conformed_dimension_names(ct)
## S3 method for class 'constellation'
get_conformed_dimension_names(ct)
Arguments
ct |
A constellation object. |
Value
A vector of dimension names.
See Also
Other data cleaning functions: get_conformed_dimension(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
Examples
d <- ct_mrs |>
  get_conformed_dimension_names()
Get conformed dimension names
Description
Get the names of the star schema conformed dimensions.
Usage
get_conformed_dimension_names_st(st)
## S3 method for class 'star_schema'
get_conformed_dimension_names_st(st)
Arguments
st |
A star_schema object. |
Value
A vector of dimension names.
Get dimension
Description
Get a dimension of a star schema given its name.
Usage
get_dimension(st, name)
## S3 method for class 'star_schema'
get_dimension(st, name)
Arguments
st |
A star_schema object. |
name |
A string, name of the dimension. |
Details
Role dimensions can be obtained but not role playing dimensions. Role dimensions get their instances of role playing dimensions.
Value
A dimension_table object.
See Also
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
Examples
d <- st_mrs_age |>
  get_dimension("when")
Get dimension attribute names
Description
Get the names of the attributes in a dimension.
Usage
get_dimension_attribute_names(st, name)
## S3 method for class 'star_schema'
get_dimension_attribute_names(st, name)
Arguments
st |
A star_schema object. |
name |
A string, name of the dimension. |
Value
A vector of attribute names.
See Also
Other rename functions: get_measure_names(), rename_dimension(), rename_dimension_attributes(), rename_fact(), rename_measures()
Examples
attribute_names <-
  st_mrs_age |> get_dimension_attribute_names("when")
Get the dimension name
Description
Returns the name of the dimension.
Usage
get_dimension_name(dimension)
## S3 method for class 'dimension_table'
get_dimension_name(dimension)
Arguments
dimension |
A dimension_table object. |
Details
Attributes can be accessed directly but this function has been defined because it is used from other classes and is thus done in a more controlled way.
Value
A string, name of the dimension.
Get dimension names
Description
Get the names of the dimensions of a star schema.
Usage
get_dimension_names(st)
## S3 method for class 'star_schema'
get_dimension_names(st)
Arguments
st |
A star_schema object. |
Details
Role playing dimensions are not considered.
Value
A vector of dimension names.
See Also
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
Examples
dn <- st_mrs_age |>
  get_dimension_names()
Get the dimension type
Description
Returns the type of the dimension.
Usage
get_dimension_type(dimension)
## S3 method for class 'dimension_table'
get_dimension_type(dimension)
Arguments
dimension |
A dimension_table object. |
Details
Attributes can be accessed directly but this function has been defined because it is used from other classes and is thus done in a more controlled way.
Value
A string, type of the dimension.
Get fact name
Description
Get the name of the fact table.
Usage
get_fact_name(st)
## S3 method for class 'star_schema'
get_fact_name(st)
Arguments
st |
A star_schema object. |
Value
A string, name of the fact table.
Get measure names
Description
Get the names of the measures in facts.
Usage
get_measure_names(st)
## S3 method for class 'star_schema'
get_measure_names(st)
Arguments
st |
A star_schema object. |
Value
A vector of measure names.
See Also
Other rename functions: get_dimension_attribute_names(), rename_dimension(), rename_dimension_attributes(), rename_fact(), rename_measures()
Examples
measure_names <-
  st_mrs_age |> get_measure_names()
Get the name of the role playing dimensions
Description
Get the names of the role playing dimensions of a star schema.
Usage
get_name_of_role_playing_dimensions(st)
Arguments
st |
A star_schema object. |
Value
A vector of dimension names.
Get name of uniquely implemented dimensions
Description
Get a list of dimension names that are uniquely implemented.
Usage
get_name_of_uniquely_implemented_dimensions(st)
Arguments
st |
A star_schema object. |
Details
For role dimensions that share a role playing dimension, only one is considered. Role playing dimensions are not considered.
Value
A vector of dimension names.
Get role dimension names associated to a role-playing dimension
Description
Each role dimension has the name of the role-playing dimension associated. This function allows us to obtain role dimension names for a role-playing dimension.
Usage
get_role_dimension_names(st, name)
Arguments
st |
A star_schema object. |
name |
A string, dimension name. |
Value
A vector of dimension names.
Get the associated role-playing dimension name
Description
Each role dimension has the name of the role-playing dimension associated. This function allows us to obtain its name.
Usage
get_role_playing_dimension_name(dimension)
## S3 method for class 'dimension_table'
get_role_playing_dimension_name(dimension)
Arguments
dimension |
A |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A string, name of the dimension.
Get star schema
Description
Get a star schema of a constellation given its name.
Usage
get_star_schema(ct, name)
## S3 method for class 'constellation'
get_star_schema(ct, name)
Arguments
ct |
A |
name |
A string, name of the star schema. |
Value
A star_schema object.
See Also
Other incremental refresh functions: filter_fact_rows(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
Examples
d <- ct_mrs |>
get_star_schema("mrs_age")
Get star schema names
Description
Get the names of the star schemas in a constellation.
Usage
get_star_schema_names(ct)
## S3 method for class 'constellation'
get_star_schema_names(ct)
Arguments
ct |
A |
Value
A vector of star schema names.
See Also
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
Examples
d <- ct_mrs |>
get_star_schema_names()
Group facts
Description
Once the foreign keys have been replaced where necessary, group the rows of the fact table.
Usage
group_facts(dq)
Arguments
dq |
A |
Value
A dimensional_query object.
Group records
Description
Group records with the same primary key.
Usage
group_records(ft, ft_new, fk)
Arguments
ft |
A |
ft_new |
A |
fk |
A vector of foreign key names. |
Value
A fact_table object.
Group the records in the table
Description
Group the records in the table using the aggregation functions for the measurements.
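The grouping described above can be sketched in base R. This is only an illustration on a toy data frame, not the package's implementation: the column names, the agg_functions vector and the helper group_rows() are hypothetical.

```r
# Toy fact table: two foreign keys and one measure (all names hypothetical).
facts <- data.frame(
  when_key  = c(1, 1, 2, 2),
  where_key = c(1, 1, 1, 2),
  n_deaths  = c(10, 5, 7, 3)
)

# Assumed aggregation function for each measure, indexed by measure name.
agg_functions <- c(n_deaths = "sum")

# Group by the foreign keys and aggregate each measure with its own function.
group_rows <- function(ft, fk, agg) {
  parts <- lapply(names(agg), function(m) {
    aggregate(ft[m], by = ft[fk], FUN = match.fun(agg[[m]]))
  })
  # Join the per-measure results back together on the foreign keys.
  Reduce(function(a, b) merge(a, b, by = fk), parts)
}

grouped <- group_rows(facts, c("when_key", "where_key"), agg_functions)
```

Rows that share the same combination of foreign keys collapse into a single row, so the result has one row per distinct key combination.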
Usage
group_table(ft)
Arguments
ft |
A |
Value
A fact_table object.
Homogenize a dimension
Description
To merge dimensions, they must first be homogenized: the generated primary key must be removed and, if necessary, its attributes (columns) must be renamed.
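As a rough illustration of what homogenizing involves, the following base R sketch drops the generated key and optionally renames the remaining columns. The data frame, its column names and the helper homogenize_dim() are hypothetical and do not reproduce the package's code.

```r
# Hypothetical dimension with a generated primary key.
when_available <- data.frame(
  when_available_key = 1:2,
  available_week = c("01", "02"),
  available_year = c("1962", "1962"),
  stringsAsFactors = FALSE
)

homogenize_dim <- function(dimension, key, attributes = NULL) {
  out <- dimension[setdiff(names(dimension), key)]    # drop the generated key
  if (!is.null(attributes)) names(out) <- attributes  # optional renaming
  out
}

h <- homogenize_dim(when_available, "when_available_key", c("week", "year"))
```

After this step, two dimensions with the same structure have identical column names and can be combined row-wise.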
Usage
homogenize(dimension, attributes = NULL)
## S3 method for class 'dimension_table'
homogenize(dimension, attributes = NULL)
Arguments
dimension |
A |
attributes |
A vector of attribute names of the dimension. |
Value
A dimension_table object.
Incrementally refresh a constellation with a star schema
Description
Incrementally refresh a star schema in a constellation with the content of a new star schema that is integrated into the first.
Usage
incremental_refresh_constellation(ct, st, existing = "ignore")
## S3 method for class 'constellation'
incremental_refresh_constellation(ct, st, existing = "ignore")
Arguments
ct |
A |
st |
A |
existing |
A string, operation to be performed with records in the fact table whose keys match. |
Details
Once the dimensions are integrated, the fact table may contain records whose keys match those of the new records. The existing parameter determines how they are handled: the new records can be ignored ("ignore"), the existing records can be replaced by the new ones ("replace"), all of them can be grouped using the aggregation functions ("group"), or they can be deleted ("delete").
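To make the four options concrete, here is a base R sketch on toy data frames. It is one possible reading of the behaviour, not the package's code: the helper refresh_facts(), the column names and the use of sum as the aggregation function are assumptions, and for "delete" it is assumed that matching records are removed while non-matching new records are still added.

```r
# Hypothetical fact tables: ft holds the current data, ft_new the incoming data.
ft     <- data.frame(when_key = c(1, 2), where_key = c(1, 1), n_deaths = c(10, 20))
ft_new <- data.frame(when_key = c(2, 3), where_key = c(1, 1), n_deaths = c(5, 7))
fk <- c("when_key", "where_key")

refresh_facts <- function(ft, ft_new, fk,
                          existing = c("ignore", "replace", "group", "delete")) {
  existing <- match.arg(existing)
  # Build one comparable string per row from the foreign keys.
  key <- function(d) do.call(paste, c(d[fk], sep = "\r"))
  in_new <- key(ft) %in% key(ft_new)  # current rows that have a matching new row
  in_old <- key(ft_new) %in% key(ft)  # new rows that have a matching current row
  switch(existing,
    ignore  = rbind(ft, ft_new[!in_old, ]),
    replace = rbind(ft[!in_new, ], ft_new),
    group   = {
      all_rows <- rbind(ft, ft_new)
      measures <- setdiff(names(all_rows), fk)
      aggregate(all_rows[measures], by = all_rows[fk], FUN = sum)  # assumed: sum
    },
    delete  = rbind(ft[!in_new, ], ft_new[!in_old, ])              # assumed reading
  )
}

r_ignore  <- refresh_facts(ft, ft_new, fk, "ignore")
r_replace <- refresh_facts(ft, ft_new, fk, "replace")
r_group   <- refresh_facts(ft, ft_new, fk, "group")
r_delete  <- refresh_facts(ft, ft_new, fk, "delete")
```

Only the record with keys (2, 1) is present in both tables, so it is the only one whose treatment differs between the four results.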
Value
A constellation object.
See Also
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), get_star_schema_names(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
Examples
ct <- ct_mrs |>
incremental_refresh_constellation(st_mrs_age_w10, existing = "replace")
ct <- ct_mrs |>
incremental_refresh_constellation(st_mrs_cause_w10, existing = "group")
Incrementally refresh a dimension with another
Description
Incrementally refresh a dimension with the content of a new one that is integrated into the first.
Usage
incremental_refresh_dimension(dimension, dimension_new)
## S3 method for class 'dimension_table'
incremental_refresh_dimension(dimension, dimension_new)
Arguments
dimension |
A |
dimension_new |
A |
Value
A dimension_table object.
Incrementally refresh a fact table with another
Description
Incrementally refresh a fact table with the content of a new one that is integrated into the first.
Usage
incremental_refresh_fact(ft, ft_new, existing)
## S3 method for class 'fact_table'
incremental_refresh_fact(ft, ft_new, existing)
Arguments
ft |
A |
ft_new |
A |
existing |
A string, operation to be performed with records whose keys match. |
Details
If there are records whose keys match the new ones, we can ignore, replace or group them.
Value
A fact_table object.
Incrementally refresh a star schema with another
Description
Incrementally refresh a star schema with the content of a new one that is integrated into the first.
Usage
incremental_refresh_star_schema(st, st_new, existing = "ignore")
## S3 method for class 'star_schema'
incremental_refresh_star_schema(st, st_new, existing = "ignore")
Arguments
st |
A |
st_new |
A |
existing |
A string, operation to be performed with records in the fact table whose keys match. |
Details
Once the dimensions are integrated, the fact table may contain records whose keys match those of the new records. The existing parameter determines how they are handled: the new records can be ignored ("ignore"), the existing records can be replaced by the new ones ("replace"), all of them can be grouped using the aggregation functions ("group"), or they can be deleted ("delete").
Value
A star_schema object.
See Also
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), get_star_schema_names(), incremental_refresh_constellation(), purge_dimensions_constellation(), purge_dimensions_star_schema()
Examples
st <- st_mrs_age |>
incremental_refresh_star_schema(st_mrs_age_w10, existing = "replace")
st <- st_mrs_cause |>
incremental_refresh_star_schema(st_mrs_cause_w10, existing = "group")
Is it conformed dimension?
Description
Indicates by means of a boolean if the dimension is a conformed dimension.
Usage
is_conformed_dimension(dimension)
## S3 method for class 'dimension_table'
is_conformed_dimension(dimension)
Arguments
dimension |
A |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A boolean.
Is dimension in set of updates?
Description
Given a set of dimension record update operations and the name of a dimension, it checks if there is any update operation to perform on the dimension.
Usage
is_dimension_in_updates(updates, name)
## S3 method for class 'record_update_set'
is_dimension_in_updates(updates, name)
Arguments
updates |
A |
name |
A string, name of the dimension. |
Value
A boolean, indicating if the dimension appears in the list of update operations.
Is it role dimension?
Description
Indicates by means of a boolean if the dimension is a role dimension.
Usage
is_role_dimension(dimension)
## S3 method for class 'dimension_table'
is_role_dimension(dimension)
Arguments
dimension |
A |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A boolean.
Is it role-playing dimension?
Description
Indicates by means of a boolean if the dimension is a role-playing dimension.
Usage
is_role_playing_dimension(dimension)
## S3 method for class 'dimension_table'
is_role_playing_dimension(dimension)
Arguments
dimension |
A |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A boolean.
Make a dimension record equal to another
Description
For a dimension, given the primary keys of two records, it adds an update to the set of updates that changes the values of the remaining attributes of the first record so that they match those of the second.
Usage
match_records(updates, dimension, old, new)
## S3 method for class 'record_update_set'
match_records(updates, dimension, old, new)
Arguments
updates |
A |
dimension |
A |
old |
A number, primary key of the record to update. |
new |
A number, primary key of the record from which the values are taken. |
Details
Primary keys are used only as a convenient way to obtain the combination of values. The update itself is defined exclusively from the values of the remaining attributes.
It is especially useful when two records are found that should be a single one, the second having been generated because of a data error.
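The idea that the keys only locate the records, while the update stores attribute values, can be sketched in base R as follows. The data frame, the misspelled city value and the helper make_match_update() are hypothetical; the structure of the returned list is an assumption and not the package's internal representation.

```r
# Hypothetical "where" dimension with a generated primary key.
where <- data.frame(
  where_key = 1:3,
  city  = c("Bridgepor", "Bridgeport", "Boston"),
  state = c("CT", "CT", "MA"),
  stringsAsFactors = FALSE
)

# The keys are used only to look up the two records; the update keeps
# the attribute values (old combination -> new combination).
make_match_update <- function(dimension, pk, old, new) {
  attrs <- setdiff(names(dimension), pk)
  list(
    old = as.list(dimension[dimension[[pk]] == old, attrs, drop = FALSE]),
    new = as.list(dimension[dimension[[pk]] == new, attrs, drop = FALSE])
  )
}

upd <- make_match_update(where, "where_key", old = 1, new = 2)
```

Because the update contains no key, it can be applied again to new data in which the generated keys are different.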
Value
A record_update_set object.
See Also
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
Examples
dim_names <- st_mrs_age |>
  get_dimension_names()
where <- st_mrs_age |>
  get_dimension("where")
# head(where, 2)
updates <- record_update_set() |>
  match_records(dimension = where,
                old = 1,
                new = 2)
Apply dimension record update operations to conformed dimensions
Description
Apply a list of dimension record update operations to the conformed dimensions of the constellation object. Update operations must be defined with the set of functions available for that purpose.
Usage
modify_conformed_dimension_records(ct, updates = record_update_set())
## S3 method for class 'constellation'
modify_conformed_dimension_records(ct, updates = record_update_set())
Arguments
ct |
A |
updates |
A |
Details
When dimensions are defined, records may be detected that must be modified as part of the data cleaning process, frequently to unify two or more records that differ because of data errors or missing data. This is not straightforward because the facts must be adapted to the new set of dimension instances.
This operation allows us to unify records and automatically propagate the modifications to the facts in the star schemas.
Value
A constellation object.
See Also
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
Examples
ct <- ct_mrs |>
modify_conformed_dimension_records(updates_st_mrs_age)
Apply dimension record update operations
Description
Apply a list of dimension record update operations to the dimensions of the star_schema object. Update operations must be defined with the set of functions available for that purpose.
Usage
modify_dimension_records(st, updates = record_update_set())
## S3 method for class 'star_schema'
modify_dimension_records(st, updates = record_update_set())
Arguments
st |
A |
updates |
A |
Details
When dimensions are defined, records may be detected that must be modified as part of the data cleaning process, frequently to unify two or more records that differ because of data errors or missing data. This is not straightforward because the facts must be adapted to the new set of dimension instances.
This operation allows us to unify records and automatically propagate the modifications to the facts.
The list of update operations can be applied repeatedly to new data received to be incorporated into the star_schema object.
Value
A star_schema object.
See Also
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
Examples
st <- st_mrs_age |>
modify_dimension_records(updates_st_mrs_age)
Mortality Reporting System
Description
Selection of data from the 122 Cities Mortality Reporting System, for the first 11 weeks of 1962.
Usage
mrs
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Source
Mortality Reporting System by Age
Description
Selection of data from the 122 Cities Mortality Reporting System by age group, for the first 9 weeks of 1962.
Usage
mrs_age
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Age Test
Description
Selection of data for 2 cities from the 122 Cities Mortality Reporting System by age group, for the first 3 weeks of 1962.
Usage
mrs_age_test
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Age for Week 10
Description
Selection of data from the 122 Cities Mortality Reporting System by age group, for week 10 of 1962. It also includes some isolated records from previous weeks, which are meant to represent corrections of data errors.
Usage
mrs_age_w10
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Age for Week 11
Description
Selection of data from the 122 Cities Mortality Reporting System by age group, for week 11 of 1962. It also includes some isolated records from previous weeks, which are meant to represent corrections of data errors.
Usage
mrs_age_w11
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Age for Week Test
Description
Selection of data for 3 cities from the 122 Cities Mortality Reporting System by age group, for week 4 of 1962. It also includes some isolated records from previous weeks, which are meant to represent corrections of data errors.
Usage
mrs_age_w_test
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Cause
Description
Selection of data from the 122 Cities Mortality Reporting System by cause, for the first 9 weeks of 1962.
Usage
mrs_cause
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Cause Test
Description
Selection of data for 2 cities from the 122 Cities Mortality Reporting System by cause, for the first 3 weeks of 1962.
Usage
mrs_cause_test
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Cause for Week 10
Description
Selection of data from the 122 Cities Mortality Reporting System by cause, for week 10 of 1962. It also includes some isolated records from previous weeks, which are meant to represent additional data not considered before.
Usage
mrs_cause_w10
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Cause for Week 11
Description
Selection of data from the 122 Cities Mortality Reporting System by cause, for week 11 of 1962. It also includes some isolated records from previous weeks, which are meant to represent additional data not considered before.
Usage
mrs_cause_w11
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Mortality Reporting System by Cause for Week Test
Description
Selection of data for 3 cities from the 122 Cities Mortality Reporting System by cause, for week 4 of 1962. It also includes some isolated records from previous weeks, which are meant to represent additional data not considered before.
Usage
mrs_cause_w_test
Format
A tibble.
Details
The original dataset begins in 1962. For each week and each of 122 US cities, it includes mortality figures by age group and by cause, considered separately (i.e., the combination of age group and cause is not included). For cause, the only distinction made is between pneumonia or influenza and all other causes.
Two additional dates have been generated, which were not present in the original dataset.
Source
Multistar for Mortality Reporting System
Description
Multistar for the Mortality Reporting System considering age and cause classification. It is the result obtained in the vignette.
Usage
ms_mrs
Format
A multistar object.
Examples
# Defined by:
ms_mrs <- ct_mrs |>
constellation_as_multistar()
Multistar for Mortality Reporting System Test
Description
Multistar for the Mortality Reporting System considering age and cause classification, built from the test data.
Usage
ms_mrs_test
Format
A multistar object.
Examples
# Defined by:
ms_mrs_test <- ct_mrs_test |>
constellation_as_multistar()
Export a multistar as a flat table
Description
We can obtain a flat table, implemented using a tibble, from a multistar (which can be the result of a query). If it only has one fact table, it is not necessary to provide its name.
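Conceptually, flattening joins the fact table with each of its dimensions and discards the generated keys. The base R sketch below illustrates this on toy data; the list structure, the "_key" naming convention and the helper as_flat_table() are assumptions made for the example, not the package's internals.

```r
# Hypothetical structure: one fact table and two dimensions.
fact <- data.frame(when_key = c(1, 2), where_key = c(1, 1), n_deaths = c(10, 20))
dims <- list(
  when  = data.frame(when_key = 1:2, week = c("01", "02"), stringsAsFactors = FALSE),
  where = data.frame(where_key = 1, city = "Boston", stringsAsFactors = FALSE)
)

# Join each dimension on its key, then drop the generated key.
as_flat_table <- function(fact, dims) {
  flat <- fact
  for (d in names(dims)) {
    key <- paste0(d, "_key")  # assumed naming convention for keys
    flat <- merge(flat, dims[[d]], by = key)
    flat[[key]] <- NULL
  }
  flat
}

flat <- as_flat_table(fact, dims)
```

The result has one row per fact record, with the dimension attributes in place of the foreign keys.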
Usage
multistar_as_flat_table(ms, fact = NULL)
## S3 method for class 'multistar'
multistar_as_flat_table(ms, fact = NULL)
Arguments
ms |
A |
fact |
A string, name of the fact. |
Value
A tibble.
See Also
Other results export functions: constellation_as_multistar(), constellation_as_tibble_list(), star_schema_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
Examples
ft <- ms_mrs |>
  multistar_as_flat_table(fact = "mrs_age")
ms <- dimensional_query(ms_mrs) |>
  select_dimension(name = "where",
                   attributes = c("city", "state")) |>
  select_dimension(name = "when",
                   attributes = c("when_happened_year")) |>
  select_fact(name = "mrs_age",
              measures = c("n_deaths")) |>
  select_fact(
    name = "mrs_cause",
    measures = c("pneumonia_and_influenza_deaths", "other_deaths")
  ) |>
  filter_dimension(name = "when", when_happened_week <= "03") |>
  filter_dimension(name = "where", city == "Boston") |>
  run_query()
ft <- ms |>
  multistar_as_flat_table()
constellation S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_constellation(lst = list(), name = NULL)
Arguments
lst |
A list of |
name |
A string. |
Value
A constellation object.
dimension_table S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_dimension_table(ft = tibble::tibble(), name = NULL, type = "general")
Arguments
ft |
A |
name |
A string, name of the dimension. |
type |
A string, type of the dimension. |
Details
Types considered: (general), (role, role_playing), (conformed).
Value
A dimension_table object.
dimensional_model S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_dimensional_model()
Value
A dimensional_model object.
dimensional_query S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_dimensional_query(ms = NULL)
Value
A dimensional_query object.
fact_table S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_fact_table(
  ft = tibble::tibble(),
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = NULL
)
Arguments
ft |
A |
name |
A string, name of the fact. |
measures |
A vector of measurement names. |
agg_functions |
A vector of aggregation function names. |
nrow_agg |
A string, measurement name for the number of rows aggregated. |
Value
A fact_table object.
multistar S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_multistar(fl = list(), dl = list())
Arguments
fl |
A |
dl |
A |
Details
It only distinguishes between general and conformed dimensions; each dimension has its own data. It can contain multiple fact tables.
Value
A multistar object.
record_update S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_record_update(dimension, old, new)
Details
For a dimension, it relates old record field values to the new values to replace them.
Value
A record_update object.
record_update_set S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_record_update_set()
Value
A record_update_set object.
star_schema S3 class
Description
Internal low-level constructor that creates new objects with the correct structure.
Usage
new_star_schema(ft = tibble::tibble(), sd = dimensional_model())
Arguments
ft |
A |
sd |
A |
Value
A star_schema object.
Transform a tibble to join
Description
Transform all fields in a tibble to character type and replace the NA values with a specific value.
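A minimal base R sketch of this kind of preparation follows. The helper prepare_for_join() and the placeholder string are hypothetical; the value the package actually uses for NA is not stated here.

```r
# Convert every column to character and replace NA with a placeholder.
# The placeholder value is an assumption chosen for this example.
prepare_for_join <- function(tb, na_value = "___UNKNOWN___") {
  tb[] <- lapply(tb, function(col) {
    col <- as.character(col)
    col[is.na(col)] <- na_value
    col
  })
  tb
}

tb <- data.frame(week = c(1, NA), city = c("Boston", NA), stringsAsFactors = FALSE)
prepared <- prepare_for_join(tb)
```

After the transformation every column has the same type and contains no NA, so rows from different tables can be compared column by column as plain strings.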
Usage
prepare_join(tb)
Arguments
tb |
A |
Value
A tibble.
Purge dimensions in a constellation
Description
Delete instances of dimensions not related to facts in a constellation.
Usage
purge_dimensions_constellation(ct)
## S3 method for class 'constellation'
purge_dimensions_constellation(ct)
Arguments
ct |
A |
Value
A constellation object.
See Also
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_star_schema()
Examples
ct <- ct_mrs |>
purge_dimensions_constellation()
Purge dimensions
Description
Delete instances of dimensions not related to facts in a star schema.
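In terms of plain data frames, purging keeps only the dimension rows whose key is still referenced by some fact record. The following base R sketch illustrates this; the tables and the helper purge_dimension() are hypothetical.

```r
# Hypothetical fact table and dimension; key 2 is not used by any fact.
fact  <- data.frame(where_key = c(1, 1, 3), n_deaths = c(10, 20, 5))
where <- data.frame(
  where_key = 1:3,
  city = c("Boston", "Albany", "Hartford"),
  stringsAsFactors = FALSE
)

# Keep only the dimension instances referenced from the facts.
purge_dimension <- function(dimension, fact, key) {
  dimension[dimension[[key]] %in% fact[[key]], , drop = FALSE]
}

where_purged <- purge_dimension(where, fact, "where_key")
```

This situation typically arises after fact rows have been filtered out or deleted, leaving dimension instances that no fact refers to.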
Usage
purge_dimensions_star_schema(st)
## S3 method for class 'star_schema'
purge_dimensions_star_schema(st)
Arguments
st |
A |
Value
A star_schema object.
See Also
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation()
Examples
st <- st_mrs_age |>
purge_dimensions_star_schema()
record_update_set S3 class
Description
Creates a record_update_set object, which stores updates on dimension records.
Usage
record_update_set()
Details
Each update is made up of a dimension name, an old value set, and a new value set.
When the update is applied, all the dimension records that have the combination of old values are modified with the new values provided.
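How such an update could be applied is sketched below in base R. The list layout of the update and the helper apply_update() are assumptions made for illustration; they do not reproduce the package's internal representation.

```r
# Hypothetical dimension and one update (old value set -> new value set).
where <- data.frame(
  where_key = 1:3,
  city  = c("Bridgepor", "Bridgeport", "Bridgepor"),
  state = c("CT", "CT", "CT"),
  stringsAsFactors = FALSE
)
update <- list(
  dimension = "where",
  old = list(city = "Bridgepor", state = "CT"),
  new = list(city = "Bridgeport", state = "CT")
)

apply_update <- function(dimension, update) {
  # Select the rows that match the whole combination of old values.
  hit <- rep(TRUE, nrow(dimension))
  for (a in names(update$old)) {
    hit <- hit & dimension[[a]] == update$old[[a]]
  }
  # Overwrite the matching rows with the new values.
  for (a in names(update$new)) {
    dimension[hit, a] <- update$new[[a]]
  }
  dimension
}

where_fixed <- apply_update(where, update)
```

Note that the sketch only rewrites the values: the resulting duplicate rows would still have to be unified and the facts adapted accordingly.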
Value
A record_update_set object.
See Also
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), update_record(), update_selection(), update_selection_general()
Examples
updates <- record_update_set()
Reference a dimension
Description
Given a dimension, transform the fact table so that the dimension attributes passed as a parameter, which are present in the fact table, are replaced by the remaining attributes of the dimension.
Usage
reference_dimension(ft, dimension, attributes, conversion = TRUE)
Arguments
ft |
A |
dimension |
A |
attributes |
A vector of attribute names, attributes used to reference the dimension. |
conversion |
A boolean, indicates whether the attributes need to be transformed. |
Details
It is used to replace a set of attributes in the fact table with the generated key of the dimension.
If necessary, it is also used for the inverse operation: replace the generated key with the rest of attributes (dereference a dimension).
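Both directions can be illustrated with base R joins. In this sketch the tables, column names and the helpers reference() and dereference() are hypothetical, and the conversion parameter is not modelled.

```r
# Hypothetical dimension with a generated key, and a table that still
# carries the dimension attributes instead of the key.
where <- data.frame(where_key = 1:2, city = c("Boston", "Albany"),
                    state = c("MA", "NY"), stringsAsFactors = FALSE)
flat <- data.frame(city = c("Albany", "Boston", "Boston"),
                   state = c("NY", "MA", "MA"),
                   n_deaths = c(4, 10, 20), stringsAsFactors = FALSE)

# Reference: replace the attributes with the generated key.
reference <- function(ft, dimension, attributes, key) {
  out <- merge(ft, dimension[c(key, attributes)], by = attributes)
  out[setdiff(names(out), attributes)]
}

# Dereference: replace the generated key with the rest of the attributes.
dereference <- function(ft, dimension, key) {
  out <- merge(ft, dimension, by = key)
  out[setdiff(names(out), key)]
}

referenced <- reference(flat, where, c("city", "state"), "where_key")
restored   <- dereference(referenced, where, "where_key")
```

Applying one operation after the other recovers the original columns, possibly in a different row and column order.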
Value
A fact_table object.
Remove duplicate dimension rows
Description
After selecting only a few columns of the dimensions, there may be rows with duplicate values. We eliminate duplicates and adapt facts to the new dimensions.
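The following base R sketch shows the effect on toy data: after keeping a single attribute, two dimension rows become identical, so they are merged and the foreign keys in the facts are remapped. All names are hypothetical and this is not the package's implementation.

```r
# Hypothetical dimension and fact table.
where <- data.frame(where_key = 1:3,
                    city = c("Boston", "Springfield", "Albany"),
                    state = c("MA", "MA", "NY"), stringsAsFactors = FALSE)
fact <- data.frame(where_key = c(1, 2, 3), n_deaths = c(10, 5, 4))

# Keep only state: Boston and Springfield now collapse into one "MA" row.
selected <- where[c("where_key", "state")]
deduped <- unique(selected["state"])
deduped$new_key <- seq_len(nrow(deduped))

# Map each old key to the key of its deduplicated row, then rewrite the facts.
key_map <- merge(selected, deduped, by = "state")
fact$where_key <- key_map$new_key[match(fact$where_key, key_map$where_key)]
```

After the remapping, several fact rows may share the same keys, which is why a grouping step on the facts would normally follow.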
Usage
remove_duplicate_dimension_rows(dq)
Arguments
dq |
A |
Value
A dimensional_query object.
Rename dimension
Description
Set a new name for a dimension.
Usage
rename_dimension(st, name, new_name)
## S3 method for class 'star_schema'
rename_dimension(st, name, new_name)
Arguments
st |
A |
name |
A string, name of the dimension. |
new_name |
A string, new name of the dimension. |
Value
A star_schema object.
See Also
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension_attributes(), rename_fact(), rename_measures()
Examples
st <- st_mrs_age |>
rename_dimension(name = "when", new_name = "when_happened")
Rename dimension attributes
Description
Set new names for some attributes of a dimension.
Usage
rename_dimension_attributes(st, name, attributes, new_names)
## S3 method for class 'star_schema'
rename_dimension_attributes(st, name, attributes, new_names)
Arguments
st |
A |
name |
A string, name of the dimension. |
attributes |
A vector of attribute names. |
new_names |
A vector of new attribute names. |
Value
A star_schema object.
See Also
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension(), rename_fact(), rename_measures()
Examples
st <-
  st_mrs_age |> rename_dimension_attributes(
    name = "when",
    attributes = c("week", "year"),
    new_names = c("w", "y")
  )
Rename fact
Description
Set a new name for the fact table.
Usage
rename_fact(st, name)
## S3 method for class 'star_schema'
rename_fact(st, name)
Arguments
st |
A |
name |
A string, new name of the fact. |
Value
A star_schema object.
See Also
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension(), rename_dimension_attributes(), rename_measures()
Examples
st <- st_mrs_age |> rename_fact("age")
Rename measures
Description
Set new names for some measures in the fact table.
Usage
rename_measures(st, measures, new_names)
## S3 method for class 'star_schema'
rename_measures(st, measures, new_names)
Arguments
st |
A |
measures |
A vector of measure names. |
new_names |
A vector of new measure names. |
Value
A star_schema object.
See Also
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension(), rename_dimension_attributes(), rename_fact()
Examples
st <-
  st_mrs_age |> rename_measures(measures = c("deaths"),
                                new_names = c("n_deaths"))
Replace a star schema dimension
Description
In a star schema, replace a dimension with another that contains all the instances of the first and possibly some more.
Usage
replace_dimension(st, name, dimension)
## S3 method for class 'star_schema'
replace_dimension(st, name, dimension)
Arguments
st |
A |
name |
A string, name of the dimension. |
dimension |
A |
Value
A star_schema object.
Replace in facts a star schema dimension
Description
This operation may result from integrating several dimensions in a constellation or from an incremental update of a dimension (which of the two is indicated by the boolean parameter). In the facts, the new dimension replaces the original dimension whose name is given.
Usage
replace_dimension_in_facts(st, name, dimension, set_type_conformed = FALSE)
## S3 method for class 'star_schema'
replace_dimension_in_facts(st, name, dimension, set_type_conformed = FALSE)
Arguments
st |
A |
name |
A string, name of the dimension. |
dimension |
A |
set_type_conformed |
A boolean. |
Value
A star_schema object.
Replace in facts a star schema general dimension
Description
Replace in facts a star schema general dimension
Usage
replace_general_dimension_in_facts(st, name, dimension)
Arguments
st |
A |
name |
A string, name of the dimension. |
dimension |
A |
Value
A star_schema object.
Replace records
Description
Replace records with the same primary key.
Usage
replace_records(ft, ft_new, fk)
Arguments
ft |
A |
ft_new |
A |
fk |
A vector of foreign key names. |
Value
A fact_table object.
Replace in facts a star schema role dimension
Description
Replace in facts a star schema role dimension
Usage
replace_role_dimension_in_facts(st, name, dimension, dimension_names)
Arguments
st |
A |
name |
A string, name of the dimension. |
dimension |
A |
dimension_names |
A vector of dimension names. |
Value
A star_schema object.
Transform a dimension into a role dimension
Description
Once the role-playing dimension has been generated, the dimensions from which it was defined are transformed into role dimensions. Their records are removed because they are obtained from the role-playing dimension.
Usage
role_dimension(dimension, role_playing_name)
## S3 method for class 'dimension_table'
role_dimension(dimension, role_playing_name)
Arguments
dimension |
A |
role_playing_name |
A string, name of role-playing dimension. |
Value
A dimension_table object.
Define a role playing dimension in a star_schema object
Description
Given a list of star_schema dimension names, all with the same structure, a role playing dimension with the indicated name and attributes is generated. The original dimensions become role dimensions defined from the new role playing dimension.
Usage
role_playing_dimension(st, dim_names, name = NULL, attributes = NULL)
## S3 method for class 'star_schema'
role_playing_dimension(st, dim_names, name = NULL, attributes = NULL)
Arguments
st |
A |
dim_names |
A vector of dimension names. |
name |
A string, name of the role playing dimension. |
attributes |
A vector of attribute names of the role playing dimension. |
Details
After definition, all role dimensions have the same virtual instances (those of the role playing dimension). The foreign keys in facts are adapted to this new situation.
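The adaptation of the foreign keys can be pictured with a small base R sketch. The two date dimensions, the shared when_common table and the helper rekey() are hypothetical and simplified to a single attribute; this is not the package's implementation.

```r
# Two hypothetical dimensions with the same structure, and a fact table
# that references both of them.
when <- data.frame(when_key = 1:2, date = c("1962-01-06", "1962-01-13"),
                   stringsAsFactors = FALSE)
when_available <- data.frame(when_available_key = 1:2,
                             date = c("1962-01-13", "1962-01-20"),
                             stringsAsFactors = FALSE)
fact <- data.frame(when_key = c(1, 2), when_available_key = c(1, 2),
                   n_deaths = c(10, 20))

# Role-playing dimension: the union of the instances of all role dimensions.
when_common <- data.frame(date = sort(unique(c(when$date, when_available$date))),
                          stringsAsFactors = FALSE)
when_common$when_common_key <- seq_len(nrow(when_common))

# Re-point one foreign key of the facts at the shared instances.
rekey <- function(fact, role_dim, fk) {
  dates <- role_dim$date[match(fact[[fk]], role_dim[[fk]])]
  fact[[fk]] <- when_common$when_common_key[match(dates, when_common$date)]
  fact
}

fact <- rekey(fact, when, "when_key")
fact <- rekey(fact, when_available, "when_available_key")
```

Both foreign keys now refer to rows of the same shared table, so the role dimensions no longer need to store instances of their own.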
Value
A star_schema object.
See Also
Other star schema and constellation definition functions: character_dimensions(), constellation(), snake_case(), star_schema()
Examples
st <- star_schema(mrs_age, dm_mrs_age) |>
role_playing_dimension(
dim_names = c("when", "when_available"),
name = "When Common",
attributes = c("Date", "Week", "Year")
)
st <- star_schema(mrs_cause, dm_mrs_cause) |>
role_playing_dimension(
dim_names = c("when", "when_received", "when_available"),
name = "when_common",
attributes = c("date", "week", "year")
)
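The effect described in Details (shared instances and adapted foreign keys) can be pictured with a small base-R sketch. It is not the package's implementation: the data frames when, when_available and facts, the helper remap() and all column names are made up for illustration.

```r
# Two hypothetical dimensions with the same structure.
when <- data.frame(when_key = 1:2,
                   date = c("1962-01-06", "1962-01-13"),
                   stringsAsFactors = FALSE)
when_available <- data.frame(when_available_key = 1:2,
                             date = c("1962-01-13", "1962-01-20"),
                             stringsAsFactors = FALSE)
facts <- data.frame(when_key = c(1, 2, 2),
                    when_available_key = c(1, 1, 2),
                    n_deaths = c(10, 20, 30))

# Role-playing dimension: union of the instances, without duplicates,
# with a new surrogate key.
when_common <- data.frame(
  date = sort(unique(c(when$date, when_available$date))),
  stringsAsFactors = FALSE
)
when_common$when_common_key <- seq_len(nrow(when_common))

# Adapt a foreign key: old key -> attribute value -> new common key.
remap <- function(fk, dim, dim_key) {
  dates <- dim$date[match(fk, dim[[dim_key]])]
  when_common$when_common_key[match(dates, when_common$date)]
}
facts$when_key <- remap(facts$when_key, when, "when_key")
facts$when_available_key <- remap(facts$when_available_key,
                                  when_available, "when_available_key")
```

After this step both foreign keys in facts point to the same set of instances, which is the situation the role dimensions are meant to share.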
Run query
Description
Once we have selected the facts, dimensions and defined the conditions on the instances, we can execute the query to obtain the result.
Usage
run_query(dq, unify_by_grain = TRUE)
## S3 method for class 'dimensional_query'
run_query(dq, unify_by_grain = TRUE)
Arguments
dq |
A dimensional_query object. |
unify_by_grain |
A boolean, unify facts with the same grain. |
Details
Optionally, we can indicate that facts with the same grain should not be unified.
Value
A dimensional_query object.
See Also
Other query functions:
dimensional_query(), filter_dimension(), select_dimension(), select_fact()
Examples
ms <- dimensional_query(ms_mrs) |>
select_dimension(name = "where",
attributes = c("city", "state")) |>
select_dimension(name = "when",
attributes = c("when_happened_year")) |>
select_fact(
name = "mrs_age",
measures = c("n_deaths"),
agg_functions = c("MAX")
) |>
select_fact(
name = "mrs_cause",
measures = c("pneumonia_and_influenza_deaths", "other_deaths")
) |>
filter_dimension(name = "when", when_happened_week <= "03") |>
filter_dimension(name = "where", city == "Boston") |>
run_query()
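One plausible reading of "unify facts with the same grain" is that fact tables sharing exactly the same dimension columns are combined into a single table. The base-R sketch below illustrates that idea only; the data frames, their values and the use of merge() are assumptions, not the package's code.

```r
# Two hypothetical query results with the same grain (city, year).
mrs_age <- data.frame(city = c("Boston", "Boston"),
                      year = c("1962", "1963"),
                      n_deaths = c(120, 130),
                      stringsAsFactors = FALSE)
mrs_cause <- data.frame(city = c("Boston", "Boston"),
                        year = c("1962", "1963"),
                        other_deaths = c(100, 105),
                        stringsAsFactors = FALSE)

# Same grain: join on the shared dimension columns so that the
# measures of both facts end up side by side in one table.
grain <- c("city", "year")
unified <- merge(mrs_age, mrs_cause, by = grain, all = TRUE)
```

With unify_by_grain = FALSE the two tables would simply be kept apart.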
Select dimension
Description
To add a dimension to a dimensional_query object, we have to define its name and a subset of the dimension attributes. If only the name of the dimension is indicated, all its attributes are added.
Usage
select_dimension(dq, name = NULL, attributes = NULL)
## S3 method for class 'dimensional_query'
select_dimension(dq, name = NULL, attributes = NULL)
Arguments
dq |
A dimensional_query object. |
name |
A string, name of the dimension. |
attributes |
A vector of attribute names. |
Value
A dimensional_query object.
See Also
Other query functions:
dimensional_query(), filter_dimension(), run_query(), select_fact()
Examples
dq <- dimensional_query(ms_mrs) |>
select_dimension(name = "where",
attributes = c("city", "state")) |>
select_dimension(name = "when")
Select fact
Description
To define the fact to be consulted, its name is indicated. Optionally, a vector of names of selected measures and another of aggregation functions can also be indicated.
Usage
select_fact(dq, name = NULL, measures = NULL, agg_functions = NULL)
## S3 method for class 'dimensional_query'
select_fact(dq, name = NULL, measures = NULL, agg_functions = NULL)
Arguments
dq |
A dimensional_query object. |
name |
A string, name of the fact. |
measures |
A vector of measure names. |
agg_functions |
A vector of aggregation function names. If none is indicated, those defined in the fact table are considered. |
Details
If no measure name is indicated, only the measure corresponding to the number of aggregated rows is included; that measure is always included.
If no aggregation function is indicated, those defined for the measures are used.
Value
A dimensional_query object.
See Also
Other query functions:
dimensional_query(), filter_dimension(), run_query(), select_dimension()
Examples
dq <- dimensional_query(ms_mrs) |>
select_fact(
name = "mrs_age",
measures = c("n_deaths"),
agg_functions = c("MAX")
)
dq <- dimensional_query(ms_mrs) |>
select_fact(name = "mrs_age",
measures = c("n_deaths"))
dq <- dimensional_query(ms_mrs) |>
select_fact(name = "mrs_age")
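The role of the "number of aggregated rows" measure mentioned in Details can be shown with a base-R sketch. The data and the column name nrow_agg are hypothetical; the sketch only illustrates aggregating one measure with MAX while also counting the rows that went into each group.

```r
# Hypothetical fact rows at a finer grain than the query.
flat <- data.frame(city = c("Boston", "Boston", "Hartford"),
                   n_deaths = c(10, 25, 7),
                   stringsAsFactors = FALSE)

# Aggregate the selected measure with the chosen function (MAX here).
agg <- aggregate(n_deaths ~ city, data = flat, FUN = max)

# Always add the number of rows aggregated into each group.
agg$nrow_agg <- as.vector(table(flat$city)[agg$city])
```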
Generate a record selection bitmap
Description
Obtain a boolean vector that selects the records in the table that have the given combination of values.
Usage
selection_bit_map(table, values, names)
Arguments
table |
A |
values |
A |
names |
A vector of column names to consider. |
Value
A boolean vector.
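A minimal sketch of such a selection, assuming that table is a data frame and values is a vector aligned with names (the argument descriptions above do not state this). The function selection_sketch() is a hypothetical stand-in, not the package function, and it ignores NA handling.

```r
# Hypothetical stand-in: TRUE for rows where every listed column
# equals the corresponding value.
selection_sketch <- function(table, values, names) {
  keep <- rep(TRUE, nrow(table))
  for (i in seq_along(names)) {
    keep <- keep & (table[[names[i]]] == values[i])
  }
  keep
}

where <- data.frame(state = c("CT", "CT", "MA"),
                    city = c("Bridgepor", "Hartford", "Boston"),
                    stringsAsFactors = FALSE)
bitmap <- selection_sketch(where, c("CT", "Bridgepor"), c("state", "city"))
```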
Set the dimension name
Description
It allows us to define the name of the dimension.
Usage
set_dimension_name(dimension, name)
## S3 method for class 'dimension_table'
set_dimension_name(dimension, name)
Arguments
dimension |
A dimension_table object. |
name |
A string, name of the dimension. |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A dimension_table object.
Set the dimension type
Description
It allows us to define the type of the dimension.
Usage
set_dimension_type(dimension, type)
## S3 method for class 'dimension_table'
set_dimension_type(dimension, type)
Arguments
dimension |
A dimension_table object. |
type |
A string, type of the dimension. |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A dimension_table object.
Set the type of a conformed dimension
Description
It allows us to define the type of a conformed dimension.
Usage
set_dimension_type_conformed(dimension)
## S3 method for class 'dimension_table'
set_dimension_type_conformed(dimension)
Arguments
dimension |
A dimension_table object. |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A dimension_table object.
Set the type of a role-playing dimension
Description
It allows us to define the type of a role-playing dimension.
Usage
set_dimension_type_role_playing(dimension)
## S3 method for class 'dimension_table'
set_dimension_type_role_playing(dimension)
Arguments
dimension |
A dimension_table object. |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A dimension_table object.
Set fact name
Description
It allows us to define the name of facts.
Usage
set_fact_name(ft, name)
## S3 method for class 'fact_table'
set_fact_name(ft, name)
Arguments
ft |
A fact_table object. |
name |
A string, name of fact. |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A fact_table object.
Set the associated role-playing dimension name
Description
Each role dimension has the name of the role-playing dimension associated. This function allows us to set its name.
Usage
set_role_playing_dimension_name(dimension, name)
## S3 method for class 'dimension_table'
set_role_playing_dimension_name(dimension, name)
Arguments
dimension |
A dimension_table object. |
name |
A string, name of role-playing dimension. |
Details
Attributes can be accessed directly, but this function is provided because it is called from other classes, which keeps the access more controlled.
Value
A dimension_table object.
Transform names according to the snake case style
Description
Transform fact, dimension, measurement, and attribute names according to the snake case style.
Usage
snake_case(st)
## S3 method for class 'star_schema'
snake_case(st)
Arguments
st |
A star_schema object. |
Details
This style is suitable if we are going to work with databases.
Value
A star_schema object.
See Also
Other star schema and constellation definition functions:
character_dimensions(), constellation(), role_playing_dimension(), star_schema()
Examples
st <- star_schema(mrs_age, dm_mrs_age) |>
snake_case()
st <- star_schema(mrs_age, dm_mrs_age) |>
role_playing_dimension(
dim_names = c("when", "when_available"),
name = "When Common",
attributes = c("Date", "Week", "Year")
) |>
snake_case()
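To make the naming style concrete, here is a rough base-R approximation of a snake case conversion. The helper to_snake_sketch() is hypothetical; the package lists snakecase among its imports, so its actual rules may differ from this simplified version.

```r
# Hypothetical helper: a simplified snake case conversion.
to_snake_sketch <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase words
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # separators become "_"
  x <- gsub("^_+|_+$", "", x)                   # trim leading/trailing "_"
  tolower(x)
}

snake_names <- to_snake_sketch(c("When Common", "Date", "n deaths", "whenAvailable"))
```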
Transform names according to the snake case style in a dimension
Description
Transform column, attribute and dimension names according to the snake case style.
Usage
snake_case_dimension(dimension)
## S3 method for class 'dimension_table'
snake_case_dimension(dimension)
Arguments
dimension |
A dimension_table object. |
Value
A dimension_table object.
Transform names according to the snake case style in a fact table
Description
Transform foreign keys, measures and fact table names according to the snake case style.
Usage
snake_case_fact(ft)
## S3 method for class 'fact_table'
snake_case_fact(ft)
Arguments
ft |
A fact_table object. |
Value
A fact_table object.
Star Schema for Mortality Reporting System by Age
Description
Star Schema for the Mortality Reporting System considering the age classification.
Usage
st_mrs_age
Format
A star_schema object.
Examples
# Defined by:
st_mrs_age <- star_schema(mrs_age, dm_mrs_age) |>
role_playing_dimension(
dim_names = c("when", "when_available"),
name = "When Common",
attributes = c("date", "week", "year")
) |>
snake_case() |>
character_dimensions(NA_replacement_value = "Unknown",
length_integers = list(week = 2))
Star Schema for Mortality Reporting System by Age Test
Description
Star Schema for the Mortality Reporting System considering the age classification data test.
Usage
st_mrs_age_test
Format
A star_schema object.
Examples
# Defined by:
st_mrs_age_test <- star_schema(mrs_age_test, dm_mrs_age) |>
role_playing_dimension(
dim_names = c("when", "when_available"),
name = "When Common",
attributes = c("date", "week", "year")
) |>
snake_case() |>
character_dimensions(NA_replacement_value = "Unknown",
length_integers = list(week = 2))
Star Schema for Mortality Reporting System by Age for Week 10
Description
Star Schema for the Mortality Reporting System considering the age classification data, for week 10 of 1962. It also includes some isolated data from previous weeks that are assumed to be corrections of data errors.
Usage
st_mrs_age_w10
Format
A star_schema object.
Examples
# Defined by:
st_mrs_age_w10 <- star_schema(mrs_age_w10, dm_mrs_age) |>
role_playing_dimension(
dim_names = c("when", "when_available"),
name = "When Common",
attributes = c("date", "week", "year")
) |>
snake_case() |>
character_dimensions(NA_replacement_value = "Unknown",
length_integers = list(week = 2))
Star Schema for Mortality Reporting System by Age for Week 11
Description
Star Schema for the Mortality Reporting System considering the age classification data, for week 11 of 1962. It also includes some isolated data from previous weeks that are assumed to be corrections of data errors.
Usage
st_mrs_age_w11
Format
A star_schema object.
Examples
# Defined by:
st_mrs_age_w11 <- star_schema(mrs_age_w11, dm_mrs_age) |>
role_playing_dimension(
dim_names = c("when", "when_available"),
name = "When Common",
attributes = c("date", "week", "year")
) |>
snake_case() |>
character_dimensions(NA_replacement_value = "Unknown",
length_integers = list(week = 2))
Star Schema for Mortality Reporting System by Age for Week Test
Description
Star Schema for the Mortality Reporting System considering the age classification data test, for week 4 of 1962. It also includes some isolated data from previous weeks that are assumed to be corrections of data errors.
Usage
st_mrs_age_w_test
Format
A star_schema object.
Examples
# Defined by:
st_mrs_age_w_test <- star_schema(mrs_age_w_test, dm_mrs_age) |>
role_playing_dimension(
dim_names = c("when", "when_available"),
name = "When Common",
attributes = c("date", "week", "year")
) |>
snake_case() |>
character_dimensions(NA_replacement_value = "Unknown",
length_integers = list(week = 2))
Star Schema for Mortality Reporting System by Cause
Description
Star Schema for the Mortality Reporting System considering the cause classification.
Usage
st_mrs_cause
Format
A star_schema object.
Examples
# Defined by:
st_mrs_cause <- star_schema(mrs_cause, dm_mrs_cause) |>
snake_case() |>
character_dimensions(
NA_replacement_value = "Unknown",
length_integers = list(
week = 2,
data_availability_week = 2,
reception_week = 2
)
) |>
role_playing_dimension(
dim_names = c("when", "when_received", "when_available"),
name = "when_common",
attributes = c("date", "week", "year")
)
Star Schema for Mortality Reporting System by Cause Test
Description
Star Schema for the Mortality Reporting System considering the cause classification data test.
Usage
st_mrs_cause_test
Format
A star_schema object.
Examples
# Defined by:
st_mrs_cause_test <- star_schema(mrs_cause_test, dm_mrs_cause) |>
snake_case() |>
character_dimensions(
NA_replacement_value = "Unknown",
length_integers = list(
week = 2,
data_availability_week = 2,
reception_week = 2
)
) |>
role_playing_dimension(
dim_names = c("when", "when_received", "when_available"),
name = "when_common",
attributes = c("date", "week", "year")
)
Star Schema for Mortality Reporting System by Cause for Week 10
Description
Star Schema for the Mortality Reporting System considering the cause classification data, for week 10 of 1962. It also includes some isolated data from previous weeks that are assumed to be additional data not considered before.
Usage
st_mrs_cause_w10
Format
A star_schema object.
Examples
# Defined by:
st_mrs_cause_w10 <- star_schema(mrs_cause_w10, dm_mrs_cause) |>
snake_case() |>
character_dimensions(
NA_replacement_value = "Unknown",
length_integers = list(
week = 2,
data_availability_week = 2,
reception_week = 2
)
) |>
role_playing_dimension(
dim_names = c("when", "when_received", "when_available"),
name = "when_common",
attributes = c("date", "week", "year")
)
Star Schema for Mortality Reporting System by Cause for Week 11
Description
Star Schema for the Mortality Reporting System considering the cause classification data, for week 11 of 1962. It also includes some isolated data from previous weeks that are assumed to be additional data not considered before.
Usage
st_mrs_cause_w11
Format
A star_schema object.
Examples
# Defined by:
st_mrs_cause_w11 <- star_schema(mrs_cause_w11, dm_mrs_cause) |>
snake_case() |>
character_dimensions(
NA_replacement_value = "Unknown",
length_integers = list(
week = 2,
data_availability_week = 2,
reception_week = 2
)
) |>
role_playing_dimension(
dim_names = c("when", "when_received", "when_available"),
name = "when_common",
attributes = c("date", "week", "year")
)
Star Schema for Mortality Reporting System by Cause for Week Test
Description
Star Schema for the Mortality Reporting System considering the cause classification data test, for week 4 of 1962. It also includes some isolated data from previous weeks that are assumed to be additional data not considered before.
Usage
st_mrs_cause_w_test
Format
A star_schema object.
Examples
# Defined by:
st_mrs_cause_w_test <- star_schema(mrs_cause_w_test, dm_mrs_cause) |>
snake_case() |>
character_dimensions(
NA_replacement_value = "Unknown",
length_integers = list(
week = 2,
data_availability_week = 2,
reception_week = 2
)
) |>
role_playing_dimension(
dim_names = c("when", "when_received", "when_available"),
name = "when_common",
attributes = c("date", "week", "year")
)
star_schema S3 class
Description
Creates a star_schema object from a flat table (implemented by a tibble) and a dimensional_model object.
Usage
star_schema(ft, sd)
Arguments
ft |
A tibble, the flat table. |
sd |
A dimensional_model object. |
Details
Transforms the flat table data according to the facts and dimension definitions of the dimensional_model object. Each dimension is generated with a surrogate key, which is a foreign key in facts. Facts only contain measurements and foreign keys.
Value
A star_schema object.
See Also
Other star schema and constellation definition functions:
character_dimensions(), constellation(), role_playing_dimension(), snake_case()
Examples
st <- star_schema(mrs_age, dm_mrs_age)
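The transformation described in Details can be sketched in base R for a single dimension. This is an illustration under assumed names (flat, where, where_key), not the package's implementation: the dimension gets the distinct attribute combinations plus a surrogate key, and the facts keep only the foreign key and the measure.

```r
# Hypothetical flat table: dimension attributes plus one measure.
flat <- data.frame(city = c("Boston", "Boston", "Hartford"),
                   state = c("MA", "MA", "CT"),
                   n_deaths = c(10, 25, 7),
                   stringsAsFactors = FALSE)

# Dimension: distinct attribute combinations plus a surrogate key.
where <- unique(flat[, c("city", "state")])
where <- where[order(where$city, where$state), ]
where <- cbind(where_key = seq_len(nrow(where)), where)
rownames(where) <- NULL

# Facts: replace the attributes by the foreign key, keep the measures.
facts <- merge(flat, where, by = c("city", "state"))
facts <- facts[, c("where_key", "n_deaths")]
```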
Export a star schema as a flat table
Description
Once we have refined the format or content of facts and dimensions, we can again obtain a flat table, implemented using a tibble, from a star schema.
Usage
star_schema_as_flat_table(st)
## S3 method for class 'star_schema'
star_schema_as_flat_table(st)
Arguments
st |
A star_schema object. |
Value
A tibble.
See Also
Other results export functions:
constellation_as_multistar(), constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
Examples
ft <- st_mrs_age |>
star_schema_as_flat_table()
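Conceptually, going back to a flat table means joining the facts with each dimension through its key and then dropping the keys. A base-R sketch with made-up tables (where, facts) and a made-up key name, shown only to illustrate the idea:

```r
# Hypothetical dimension and fact tables of a tiny star schema.
where <- data.frame(where_key = 1:2,
                    city = c("Boston", "Hartford"),
                    stringsAsFactors = FALSE)
facts <- data.frame(where_key = c(1, 1, 2),
                    n_deaths = c(10, 25, 7))

# Join facts with the dimension and drop the surrogate key.
flat <- merge(facts, where, by = "where_key")
flat$where_key <- NULL
```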
Star schema as multistar export (common)
Description
Star schema as multistar export (common)
Usage
star_schema_as_mst(st, fl = NULL, dl = NULL, commondim = NULL)
## S3 method for class 'star_schema'
star_schema_as_mst(st, fl = NULL, dl = NULL, commondim = NULL)
Arguments
st |
A star_schema object. |
fl |
A list of |
dl |
A list of |
commondim |
A list of dimension names already included. |
Value
A multistar object.
Export a star schema as a multistar
Description
Once we have refined the format or content of facts and dimensions, we can
obtain a multistar
. A multistar
only distinguishes between general and
conformed dimensions, each dimension has its own data. It can contain
multiple fact tables.
Usage
star_schema_as_multistar(st)
## S3 method for class 'star_schema'
star_schema_as_multistar(st)
Arguments
st |
A star_schema object. |
Value
A multistar object.
See Also
Other results export functions:
constellation_as_multistar(), constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_tibble_list()
Examples
ms <- st_mrs_age |>
star_schema_as_multistar()
Export a star schema as a tibble list
Description
Once we have refined the format or content of facts and dimensions, we can obtain a tibble list with them. Role playing dimensions can be optionally included.
Usage
star_schema_as_tibble_list(st, include_role_playing = FALSE)
## S3 method for class 'star_schema'
star_schema_as_tibble_list(st, include_role_playing = FALSE)
Arguments
st |
A star_schema object. |
include_role_playing |
A boolean. |
Value
A list of tibble objects.
See Also
Other results export functions:
constellation_as_multistar(), constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_multistar()
Examples
tl <- st_mrs_age |>
star_schema_as_tibble_list()
tl <- st_mrs_age |>
star_schema_as_tibble_list(include_role_playing = TRUE)
Export a star schema as a tibble list (common)
Description
Export a star schema as a tibble list (common)
Usage
star_schema_as_tl(st, tl_prev = NULL, commondim = NULL, include_role_playing)
## S3 method for class 'star_schema'
star_schema_as_tl(st, tl_prev = NULL, commondim = NULL, include_role_playing)
Arguments
st |
A star_schema object. |
tl_prev |
A list of |
commondim |
A list of dimension names already included. |
include_role_playing |
A boolean. |
Value
A tibble list.
Transform a value according to its type
Description
Transform a string value according to its given type.
Usage
typed_value(value, type)
Arguments
value |
A string. |
type |
A string |
Value
A typed value.
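The manual does not list the accepted type strings, so the following is only a sketch of the general idea under assumed type names ("integer", "double", "logical"); typed_value_sketch() is a hypothetical stand-in for the package function.

```r
# Hypothetical stand-in: convert a string according to an assumed
# type name; unknown types leave the value as a string.
typed_value_sketch <- function(value, type) {
  switch(type,
         integer = as.integer(value),
         double = as.numeric(value),
         logical = as.logical(value),
         value)
}

v_int <- typed_value_sketch("3", "integer")
v_dbl <- typed_value_sketch("2.5", "double")
v_chr <- typed_value_sketch("x", "character")
```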
Unify facts by grain
Description
Unify facts by grain
Usage
unify_facts_by_grain(dq)
Arguments
dq |
A |
Value
A dimensional_query object.
Perform union of dimensions
Description
Generates a new dimension from the instances of the dimensions in a list, as the union of the dimensions.
Usage
union_of_dimensions(dimensions, name = NULL, type = "role_playing")
Arguments
dimensions |
List of |
name |
A string, name of the dimension. |
type |
A string, type of the dimension. |
Value
A dimension_table object.
Apply dimension record update operations to a dimension
Description
Given a list of dimension record update operations, they are applied on the dimension_table object. Update operations must be defined with the set of functions available for that purpose.
Usage
update_dimension(dimension, updates)
## S3 method for class 'dimension_table'
update_dimension(dimension, updates)
Arguments
dimension |
A dimension_table object. |
updates |
A record_update_set object. |
Value
A dimension_table object.
Apply update operations to dimensions
Description
Apply dimension record update operations to the dimensions in the list. Returns the list of modified dimensions.
Usage
update_dimensions(dimensions, updates)
Arguments
dimensions |
List of |
updates |
A |
Value
List of updated dimension_table objects.
Update facts with a list of modified dimensions
Description
Update the fact table with the modified dimensions. New dimensions are generated from the modified ones.
Usage
update_facts_with_dimensions(st, dimensions)
## S3 method for class 'star_schema'
update_facts_with_dimensions(st, dimensions)
Arguments
st |
A star_schema object. |
dimensions |
A list of |
Value
A star_schema object.
Update facts with a general dimension
Description
Update facts with a general dimension
Usage
update_facts_with_general_dimension(st, name, old_dimension, dimension)
Arguments
st |
A |
name |
A string, name of the dimension. |
old_dimension |
A |
dimension |
A |
Value
A star_schema object.
Update facts with a role dimension
Description
Update facts with a role dimension
Usage
update_facts_with_role_dimension(
st,
name,
old_dimension,
dimension,
dimension_names
)
Arguments
st |
A |
name |
A string, name of the dimension. |
old_dimension |
A |
dimension |
A |
dimension_names |
A vector of dimension names. |
Value
A star_schema object.
Update a dimension record with a set of values
Description
For a dimension, given the primary key of one record, it adds an update to the set of updates that changes the values of the remaining attributes of the selected record to those given.
Usage
update_record(updates = NULL, dimension, old, values = vector())
## S3 method for class 'record_update_set'
update_record(updates = NULL, dimension, old, values = vector())
Arguments
updates |
A record_update_set object. |
dimension |
A |
old |
A number, primary key of the record to modify. |
values |
A vector of character values. |
Details
The primary key is only used to obtain the current combination of values easily. The update is defined exclusively from the remaining values.
Value
A record_update_set object.
See Also
Other data cleaning functions:
get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_selection(), update_selection_general()
Examples
dim_names <- st_mrs_age |>
get_dimension_names()
where <- st_mrs_age |>
get_dimension("where")
# head(where, 2)
updates <- record_update_set() |>
update_record(
dimension = where,
old = 1,
values = c("1", "CT", "Bridgeport")
)
Update dimension records with a set of values
Description
For a dimension, given a vector of column names, a vector of old values and a vector of new values, it adds an update to the set of updates that modifies all the records that have the combination of old values in the columns with the new values in those same columns.
Usage
update_selection(
updates = NULL,
dimension,
columns = vector(),
old_values = vector(),
new_values = vector()
)
## S3 method for class 'record_update_set'
update_selection(
updates = NULL,
dimension,
columns = vector(),
old_values = vector(),
new_values = vector()
)
Arguments
updates |
A record_update_set object. |
dimension |
A |
columns |
A vector of column names. |
old_values |
A vector of character values. |
new_values |
A vector of character values. |
Value
A record_update_set object.
See Also
Other data cleaning functions:
get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection_general()
Examples
dim_names <- st_mrs_age |>
get_dimension_names()
where <- st_mrs_age |>
get_dimension("where")
# head(where, 2)
updates <- record_update_set() |>
update_selection(
dimension = where,
columns = c("city"),
old_values = c("Bridgepor"),
new_values = c("Bridgeport")
)
Update dimension records with a set of values in given columns
Description
For a dimension, given a vector of column names, a vector of old values for those columns, another vector of column names, and a vector of new values for those columns, it adds an update to the set of updates that modifies all the records that have the combination of old values in the first column vector, setting the new values in the second column vector.
Usage
update_selection_general(
updates = NULL,
dimension,
columns_old = vector(),
old_values = vector(),
columns_new = vector(),
new_values = vector()
)
## S3 method for class 'record_update_set'
update_selection_general(
updates = NULL,
dimension,
columns_old = vector(),
old_values = vector(),
columns_new = vector(),
new_values = vector()
)
Arguments
updates |
A record_update_set object. |
dimension |
A |
columns_old |
A vector of column names. |
old_values |
A vector of character values. |
columns_new |
A vector of column names. |
new_values |
A vector of character values. |
Value
A record_update_set object.
See Also
Other data cleaning functions:
get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection()
Examples
dim_names <- st_mrs_age |>
get_dimension_names()
where <- st_mrs_age |>
get_dimension("where")
# head(where, 2)
updates <- record_update_set() |>
update_selection_general(
dimension = where,
columns_old = c("state", "city"),
old_values = c("CT", "Bridgepor"),
columns_new = c("city"),
new_values = c("Bridgeport")
)
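The semantics of such an update can be sketched in base R: select the records by the old combination, then write the new values into the (possibly different) target columns. The package records the operation in a record_update_set for later application, whereas this sketch, with a made-up where data frame, applies it immediately.

```r
# Hypothetical dimension data.
where <- data.frame(state = c("CT", "CT", "MA"),
                    city = c("Bridgepor", "Hartford", "Boston"),
                    stringsAsFactors = FALSE)

columns_old <- c("state", "city")
old_values <- c("CT", "Bridgepor")
columns_new <- c("city")
new_values <- c("Bridgeport")

# Select the records that have the combination of old values.
hit <- rep(TRUE, nrow(where))
for (i in seq_along(columns_old)) {
  hit <- hit & (where[[columns_old[i]]] == old_values[i])
}

# Set the new values in the target columns of the selected records.
for (i in seq_along(columns_new)) {
  where[[columns_new[i]]][hit] <- new_values[i]
}
```

Selecting on state as well as city restricts the correction to the misspelled record in CT, leaving any identically named city elsewhere untouched.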
Updates for the Star Schema for Mortality Reporting System by Age
Description
Example of updates on some dimensions of the star schema for Mortality Reporting System by age.
Usage
updates_st_mrs_age
Format
A record_update_set object.
Examples
# Defined by:
(dim_names <- st_mrs_age |>
get_dimension_names())
where <- st_mrs_age |>
get_dimension("where")
when <- st_mrs_age |>
get_dimension("when")
who <- st_mrs_age |>
get_dimension("who")
updates_st_mrs_age <- record_update_set() |>
update_selection_general(
dimension = where,
columns_old = c("state", "city"),
old_values = c("CT", "Bridgepor"),
columns_new = c("city"),
new_values = c("Bridgeport")
) |>
match_records(dimension = when,
old = 37,
new = 36) |>
update_record(
dimension = when,
old = 73,
values = c("1962-02-17", "07", "1962")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("<1 year"),
new_values = c("1: <1 year")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("1-24 years"),
new_values = c("2: 1-24 years")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("25-44 years"),
new_values = c("3: 25-44 years")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("45-64 years"),
new_values = c("4: 45-64 years")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("65+ years"),
new_values = c("5: 65+ years")
)
Updates for the Star Schema for Mortality Reporting System by Age Test
Description
Example of updates on some dimensions of the star schema for Mortality Reporting System by age test.
Usage
updates_st_mrs_age_test
Format
A record_update_set object.
Examples
# Defined by:
(dim_names <- st_mrs_age_test |>
get_dimension_names())
where <- st_mrs_age_test |>
get_dimension("where")
when <- st_mrs_age_test |>
get_dimension("when")
who <- st_mrs_age_test |>
get_dimension("who")
updates_st_mrs_age_test <- record_update_set() |>
update_selection_general(
dimension = where,
columns_old = c("state", "city"),
old_values = c("CT", "Bridgepor"),
columns_new = c("city"),
new_values = c("Bridgeport")
) |>
match_records(dimension = when,
old = 4,
new = 3) |>
update_record(
dimension = when,
old = 9,
values = c("1962-01-20", "03", "1962")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("<1 year"),
new_values = c("1: <1 year")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("1-24 years"),
new_values = c("2: 1-24 years")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("25-44 years"),
new_values = c("3: 25-44 years")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("45-64 years"),
new_values = c("4: 45-64 years")
) |>
update_selection(
dimension = who,
columns = c("age_range"),
old_values = c("65+ years"),
new_values = c("5: 65+ years")
)
Validate names
Description
Validate names
Usage
validate_names(defined_names, names, concept = "name", repeated = FALSE)
Arguments
defined_names |
A vector of strings, defined attribute names. |
names |
A vector of strings, new attribute names. |
concept |
A string, treated concept. |
repeated |
A boolean, repeated names allowed. |
Value
A vector of strings, names.