Title: | Reconstruction of Daily Data - Precipitation |
Version: | 3.0.0 |
Description: | Applies quality control to daily precipitation observations; reconstructs the original series by estimating precipitation in missing values; and creates gridded datasets of daily precipitation. |
License: | GPL-3 |
Encoding: | UTF-8 |
Imports: | terra, stats, foreach, doParallel, reshape, qmap, BreakPoints, Kendall, car, geosphere, gridExtra, lattice, pracma, xts, zoo, e1071, neuralnet, randomForest, reshape2, xgboost |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-04-12 20:02:10 UTC; rserrano |
Author: | Roberto Serrano-Notivoli |
Maintainer: | Roberto Serrano-Notivoli <roberto.serrano@unizar.es> |
Repository: | CRAN |
Date/Publication: | 2025-04-24 11:30:02 UTC |
Enhanced quality control plots for daily precipitation
Description
The function creates a matrix plot of the enhanced quality control tests for daily precipitation.
Usage
eqc_Plot(xts_obj)
Arguments
xts_obj |
xts of a single time series |
Details
Six plots are built from the daily precipitation series: time series, truncation, time series (threshold = 5 mm), small gaps, precision and rounding patterns, and weekly cycle. They provide the visual inspection proposed by Hunziker et al. (2017) and also show how the automatic application of the enhanced quality control (Huerta et al., 2024) works.
References
Hunziker, S., Gubler, S., Calle, J., Moreno, I., Andrade, M., Velarde, F., ... & Brönnimann, S. (2017). Identifying, attributing, and overcoming common data quality issues of manned station observations. https://doi.org/10.1002/joc.5037
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating fake daily precipitation data
dates_t <- seq(as.Date("1980-01-01"), as.Date("2015-12-31"), by = "day")
prec <- round(rnorm(length(dates_t), mean = 1.2, sd = 6), 1)
prec[prec<0] <- 0
xts_obj <- xts::xts(prec, dates_t)
names(xts_obj) <- "prec"
# enhanced qc plots
eqc_Plot(xts_obj)
# it also works if there are some missing values (but not if all are NA)
xts_obj["1990/2010"] <- NA
eqc_Plot(xts_obj)
## End(Not run)
Precision and rounding patterns of daily precipitation
Description
The function determines the level (0, 1 or 2) of the precision and rounding patterns test.
Usage
eqc_PrecisionRounding(
xts_obj,
lmn_yday = 365 * 80/100,
ths = list(lv0 = c(percent = 70), lv1to2 = c(percent = 50))
)
Arguments
xts_obj |
xts of a single time series |
lmn_yday |
numeric. Minimum number of days required to consider a year complete. The default value is 365 * 80 / 100 days. |
ths |
list. Thresholds used to define the level assigned to the daily precipitation series. See Details. |
Details
Precision and rounding patterns depict inconsistencies in the frequency of decimal values in the time series. As there is no absolute correct frequency of decimals, the test measures how similar the decimal patterns are across the time series. A decimal pattern is interpreted as the list of unique decimal values observed, sorted in descending order. The decimal pattern of each year is computed first, and then the most dominant pattern (mode) is selected. Based on the percentage of the time series represented by this dominant pattern, the level is defined as follows. Level 0: coherent precision and rounding pattern (a similar decimal pattern in at least 70% of the time series). Level 1: a similar decimal pattern in less than 70% but at least 50% of the time series. Level 2: different decimal patterns (no dominant pattern). The argument ths provides the above default thresholds for the level definition.
References
Hunziker, S., Gubler, S., Calle, J., Moreno, I., Andrade, M., Velarde, F., ... & Brönnimann, S. (2017). Identifying, attributing, and overcoming common data quality issues of manned station observations. https://doi.org/10.1002/joc.5037
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating fake daily precipitation data
dates_t <- seq(as.Date("1980-01-01"), as.Date("2015-12-31"), by = "day")
prec <- round(rnorm(length(dates_t), mean = 1.2, sd = 6), 1)
prec[prec<0] <- 0
xts_obj <- xts::xts(prec, dates_t)
names(xts_obj) <- "prec"
# Comparison with visual inspection
eqc_Plot(xts_obj)
eqc_PrecisionRounding(xts_obj)
# it also works if there are some missing values (but not if all are NA)
xts_obj["1990/2010"] <- NA
eqc_Plot(xts_obj)
eqc_PrecisionRounding(xts_obj)
# creating some rounding values
xts_obj["1980/1990"] <- round(xts_obj["1980/1990"], 0)
eqc_Plot(xts_obj)
eqc_PrecisionRounding(xts_obj)
## End(Not run)
Small gaps level of daily precipitation
Description
The function determines the level (0, 1 or 2) of the small gaps test.
Usage
eqc_SmallGaps(
xts_obj,
lmn_yday = 365 * 80/100,
ths = list(lv0 = c(percent = 0), lv1to2 = c(percent = 20))
)
Arguments
xts_obj |
xts of a single time series |
lmn_yday |
numeric. Minimum number of days required to consider a year complete. The default value is 365 * 80 / 100 days. |
ths |
list. Thresholds used to define the level assigned to the daily precipitation series. See Details. |
Details
Small gaps can be seen as unreported precipitation events that result in a gap or a frequency reduction in values below a specific threshold. To define the small gaps, the total count of values in five precipitation ranges (0-1, 1-2, 2-3, 3-4, and 4-5 mm, excluding the limits) is calculated for each year. Based on the percentage of years with zero counts, the level is defined as follows. Level 0: no small gaps (0%; every year has at least one value in any of the precipitation ranges). Level 1: small gaps in up to 20% of years. Level 2: small gaps in more than 20% of years. The argument ths provides the above default thresholds for the level definition.
References
Hunziker, S., Gubler, S., Calle, J., Moreno, I., Andrade, M., Velarde, F., ... & Brönnimann, S. (2017). Identifying, attributing, and overcoming common data quality issues of manned station observations. https://doi.org/10.1002/joc.5037
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating fake daily precipitation data
dates_t <- seq(as.Date("1980-01-01"), as.Date("2015-12-31"), by = "day")
prec <- round(rnorm(length(dates_t), mean = 1.2, sd = 6), 1)
prec[prec<0] <- 0
prec[prec<3 & prec>2] <- 0
xts_obj <- xts::xts(prec, dates_t)
names(xts_obj) <- "prec"
# Comparison with visual inspection
eqc_Plot(xts_obj)
eqc_SmallGaps(xts_obj)
# it also works if there are some missing values (but not if all are NA)
xts_obj["1990/2010"] <- NA
eqc_Plot(xts_obj)
eqc_SmallGaps(xts_obj)
## End(Not run)
Truncation level of daily precipitation
Description
The function determines the level (0, 1 or 2) of the truncation test.
Usage
eqc_Truncation(
xts_obj,
ths = list(lv0 = c(n_years = 2), lv1to2 = c(n_years_l = 3, n_years_h = 5))
)
Arguments
xts_obj |
xts of a single time series |
ths |
list. Thresholds used to define the level assigned to the daily precipitation series. See Details. |
Details
Truncation occurs when heavy precipitation episodes are truncated or noticeably reduced in frequency above a given threshold. Because there is no preceding algorithm for truncation, it is defined here as the maximum boundary of a time series remaining constant for a given length of time (years). The maximum boundary is computed as the maximum moving-window value of the daily precipitation. Based on the length in years, the level is defined as follows. Level 0: no truncation (a constant maximum boundary lasts at most 2 years). Level 1: a constant maximum boundary lasts at least 3 years but less than 5 years. Level 2: a constant maximum boundary lasts more than 5 years. The argument ths provides the above default thresholds for the level definition.
References
Hunziker, S., Gubler, S., Calle, J., Moreno, I., Andrade, M., Velarde, F., ... & Brönnimann, S. (2017). Identifying, attributing, and overcoming common data quality issues of manned station observations. https://doi.org/10.1002/joc.5037
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating fake daily precipitation data
dates_t <- seq(as.Date("1980-01-01"), as.Date("2015-12-31"), by = "day")
prec <- round(rnorm(length(dates_t), mean = 1.2, sd = 50), 1)
prec[prec<0] <- 0
prec[prec<3 & prec>2] <- 0
xts_obj <- xts::xts(prec, dates_t)
names(xts_obj) <- "prec"
# Comparison with visual inspection
eqc_Plot(xts_obj)
eqc_Truncation(xts_obj)
# it also works if there are some missing values (but not if all are NA)
xts_obj["1990/2010"] <- NA
eqc_Plot(xts_obj)
eqc_Truncation(xts_obj)
## End(Not run)
Automatic enhanced quality control for daily precipitation time series
Description
The function determines the level (0, 1 or 2) of the enhanced quality control tests.
Usage
eqc_Ts(
prec,
sts,
lmn_yday = 365 * 80/100,
ths_trc = list(lv0 = c(n_years = 2), lv1to2 = c(n_years_l = 3, n_years_h = 5)),
ths_sgs = list(lv0 = c(percent = 0), lv1to2 = c(percent = 20)),
ths_wcc = list(lv0 = c(n_days = 0), lv1to2 = c(n_days = 2, percent = 10)),
ths_prp = list(lv0 = c(percent = 70), lv1to2 = c(percent = 50)),
ncpu = 1
)
Arguments
prec |
xts matrix of precipitation time series |
sts |
data.frame with metadata of the stations. A column "ID" (unique ID of stations) is required. |
lmn_yday |
numeric. Minimum number of days required to consider a year complete. The default value is 365 * 80 / 100 days. |
ths_trc |
list. Thresholds used to define the level assigned to the daily precipitation series for the truncation test. See Details. |
ths_sgs |
list. Thresholds used to define the level assigned to the daily precipitation series for the small gaps test. See Details. |
ths_wcc |
list. Thresholds used to define the level assigned to the daily precipitation series for the weekly cycle test. See Details. |
ths_prp |
list. Thresholds used to define the level assigned to the daily precipitation series for the precision and rounding patterns test. See Details. |
ncpu |
integer. Number of CPU threads used for parallel computing. |
Details
Truncation occurs when heavy precipitation episodes are truncated or noticeably reduced in frequency above a given threshold. Because there is no preceding algorithm for truncation, it is defined here as the maximum boundary of a time series remaining constant for a given length of time (years). The maximum boundary is computed as the maximum moving-window value of the daily precipitation. Based on the length in years, the level is defined as follows. Level 0: no truncation (a constant maximum boundary lasts at most 2 years). Level 1: a constant maximum boundary lasts at least 3 years but less than 5 years. Level 2: a constant maximum boundary lasts more than 5 years.
Small gaps can be seen as unreported precipitation events that result in a gap or a frequency reduction in values below a specific threshold. To define the small gaps, the total count of values in five precipitation ranges (0-1, 1-2, 2-3, 3-4, and 4-5 mm, excluding the limits) is calculated for each year. Based on the percentage of years with zero counts, the level is defined as follows. Level 0: no small gaps (0%; every year has at least one value in any of the precipitation ranges). Level 1: small gaps in up to 20% of years. Level 2: small gaps in more than 20% of years.
Weekly cycles are characterized by an occurrence of wet days that differs significantly between the days of the week. To compute the weekly cycles, the probability of precipitation is first calculated for each day of the week by dividing the number of wet days by the total count of values. Then, the number of wet days is tested with a two-sided binomial test (95% confidence level). Based on how many days are significant, the level is defined as follows. Level 0: no atypical weekly cycle (similar probability between the days of the week). Level 1: one or two days present an atypical probability (significant test). Level 2: more than two days present an atypical probability (significant test), or one day presents an extremely different probability (more than 10%).
Precision and rounding patterns depict inconsistencies in the frequency of decimal values in the time series. As there is no absolute correct frequency of decimals, the test measures how similar the decimal patterns are across the time series. A decimal pattern is interpreted as the list of unique decimal values observed, sorted in descending order. The decimal pattern of each year is computed first, and then the most dominant pattern (mode) is selected. Based on the percentage of the time series represented by this dominant pattern, the level is defined as follows. Level 0: coherent precision and rounding pattern (a similar decimal pattern in at least 70% of the time series). Level 1: a similar decimal pattern in less than 70% but at least 50% of the time series. Level 2: different decimal patterns (no dominant pattern).
The arguments ths_trc, ths_sgs, ths_wcc, and ths_prp provide the above default thresholds for the level definitions.
References
Hunziker, S., Gubler, S., Calle, J., Moreno, I., Andrade, M., Velarde, F., ... & Brönnimann, S. (2017). Identifying, attributing, and overcoming common data quality issues of manned station observations. https://doi.org/10.1002/joc.5037
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
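Examples
A minimal sketch of a call on synthetic data, assuming an xts matrix prec with one column per station and a data.frame sts with only the required "ID" column (default thresholds are used):
## Not run:
set.seed(123)
dates_t <- seq(as.Date("1980-01-01"), as.Date("2015-12-31"), by = "day")
prec <- round(matrix(rnorm(length(dates_t) * 5, mean = 1.2, sd = 6), ncol = 5), 1)
prec[prec < 0] <- 0
prec <- xts::xts(prec, dates_t)
colnames(prec) <- paste0("sts_", 1:5)
sts <- data.frame(ID = paste0("sts_", 1:5))
# levels (0, 1 or 2) of the four enhanced quality control tests for each station
eqc_Ts(prec = prec, sts = sts, ncpu = 2)
## End(Not run)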
Weekly cycle level of daily precipitation
Description
The function determines the level (0, 1 or 2) of the weekly cycle test.
Usage
eqc_WeeklyCycle(
xts_obj,
ths = list(lv0 = c(n_days = 0), lv1to2 = c(n_days = 2, percent = 10))
)
Arguments
xts_obj |
xts of a single time series |
ths |
list. Thresholds used to define the level assigned to the daily precipitation series. See Details. |
Details
Weekly cycles are characterized by an occurrence of wet days that differs significantly between the days of the week. To compute the weekly cycles, the probability of precipitation is first calculated for each day of the week by dividing the number of wet days by the total count of values. Then, the number of wet days is tested with a two-sided binomial test (95% confidence level). Based on how many days are significant, the level is defined as follows. Level 0: no atypical weekly cycle (similar probability between the days of the week). Level 1: one or two days present an atypical probability (significant test). Level 2: more than two days present an atypical probability (significant test), or one day presents an extremely different probability (more than 10%). The argument ths provides the above default thresholds for the level definition.
References
Hunziker, S., Gubler, S., Calle, J., Moreno, I., Andrade, M., Velarde, F., ... & Brönnimann, S. (2017). Identifying, attributing, and overcoming common data quality issues of manned station observations. https://doi.org/10.1002/joc.5037
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating fake daily precipitation data
dates_t <- seq(as.Date("1980-01-01"), as.Date("2015-12-31"), by = "day")
prec <- round(rnorm(length(dates_t), mean = 1.2, sd = 6), 1)
prec[prec<0] <- 0
xts_obj <- xts::xts(prec, dates_t)
names(xts_obj) <- "prec"
# Comparison with visual inspection
eqc_Plot(xts_obj)
eqc_WeeklyCycle(xts_obj)
# it also works if there are some missing values (but not if all are NA)
xts_obj["1990/2010"] <- NA
eqc_Plot(xts_obj)
eqc_WeeklyCycle(xts_obj)
## End(Not run)
Estimation of new values for the original missing data in daily precipitation series
Description
This function uses the neighboring observations to estimate new precipitation values in those days and locations where no records exist.
Usage
gapFilling(
prec,
sts,
model_fun = learner_glm,
dates,
stmethod = NULL,
thres = NA,
neibs = 10,
coords,
crs,
coords_as_preds = TRUE,
window,
ncpu = 2
)
Arguments
prec |
matrix containing the original (cleaned) precipitation data. Each column represents one station. The names of columns must coincide with the names of the stations. |
sts |
data.frame. A column "ID" (unique ID of stations) is required. The rest of the columns (all of them) will act as predictors of the model. |
model_fun |
function. A function that integrates the statistical hybrid model (classification and regression). The default is learner_glm, which is the original model. Other models are also available (learner_rf and learner_xgboost). Users can also create their own functions with different models. |
dates |
vector of class "Date" with all days of observations (yyyy-mm-dd). |
stmethod |
Standardization method: 'quant' or 'ratio'. See Details. |
thres |
numeric. Maximum radius (in km) where neighboring stations will be searched. NA value uses the whole spatial domain. |
neibs |
integer. Number of nearest neighbors to use. |
coords |
vector of two character elements. Names of the fields in "sts" containing longitude and latitude. |
crs |
character. Coordinates system in EPSG format (e.g.: "EPSG:4326"). |
coords_as_preds |
logical. If TRUE (default), "coords" are also taken as predictors. |
window |
odd integer. Length of the data window considered for standardization. See Details. |
ncpu |
number of processor cores used for parallel computing. |
Details
After the gap filling, "stmethod" allows for a standardization of the predictions based on the observations. It only works for daily data; for other timescales (monthly, annual) use "stmethod = NULL". The "window" parameter is a daily-moving centered window from which data are collected for each year (e.g., a 15-day window on 16th January takes all predictions from 1st to 30th January of all years and standardizes them with their corresponding observations; only the standardized prediction of 16th January is returned, and the process is repeated for all days).
Examples
## Not run:
set.seed(123)
prec <- round(matrix(rnorm(30*50, mean = 1.2, sd = 6), 30, 50), 1)
prec[prec<0] <- 0
prec <- apply(prec, 2, FUN = function(x){x[sample(length(x),5)] <- NA; x})
colnames(prec) <- paste0('sts_',1:50)
sts <- data.frame(ID = paste0('sts_',1:50), lon = rnorm(50,0,1),
lat = rnorm(50,40,1), dcoast = rnorm(50,200,50))
filled <- gapFilling(prec, sts,
dates = seq.Date(as.Date('2023-04-01'),
as.Date('2023-04-30'),by='day'),
stmethod = "ratio", thres = NA, coords = c('lon','lat'),
coords_as_preds = TRUE, crs = 'EPSG:4326', neibs = 10,
window = 11, ncpu = 2)
str(filled)
summary(filled)
## End(Not run)
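Other learners can be passed through model_fun; a hedged variant of the call above, reusing the same prec and sts but with learner_rf (the randomForest package must be installed), could look like this:
## Not run:
filled_rf <- gapFilling(prec, sts,
model_fun = learner_rf,
dates = seq.Date(as.Date('2023-04-01'),
as.Date('2023-04-30'),by='day'),
stmethod = "ratio", thres = NA, coords = c('lon','lat'),
coords_as_preds = TRUE, crs = 'EPSG:4326', neibs = 10,
window = 11, ncpu = 2)
## End(Not run)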
Gridded dataset creation
Description
This function creates a gridded precipitation dataset from a station-based dataset.
Usage
gridPcp(
prec,
grid,
dyncovars = NULL,
sts,
model_fun,
dates,
ncpu,
thres,
neibs,
coords,
crs,
coords_as_preds,
dir_name = NA
)
Arguments
prec |
matrix or data.frame containing the original (cleaned) precipitation data. Each column represents one station. The names of columns must coincide with the names of the stations. |
grid |
SpatRaster. Collection of rasters representing each one of the predictors. |
dyncovars |
SpatRasterDataset. Collection of variables acting as dynamic predictors (changing each day). Each dataset inside the sds object (one per variable) must contain as many rasters (layers) as days to grid. See Details. |
sts |
matrix or data.frame. A column "ID" (unique ID of stations) is required. The rest of the columns (all of them) will act as predictors of the model. |
model_fun |
function. A function that integrates the statistical hybrid model (classification and regression). The default is learner_glm, which is the original model. Other models are also available (learner_rf and learner_xgboost). Users can also create their own functions with different models. |
dates |
vector of class "Date" with all days of observations (yyyy-mm-dd). |
ncpu |
number of processor cores used for parallel computing. |
thres |
numeric. Maximum radius (in km) where neighboring stations will be searched. NA value uses the whole spatial domain. |
neibs |
integer. Number of nearest neighbors to use. |
coords |
vector of two character elements. Names of the fields in "sts" containing longitude and latitude. |
crs |
character. Coordinates system in EPSG format (e.g.: "EPSG:4326"). |
coords_as_preds |
logical. If TRUE (default), "coords" are also taken as predictors. |
dir_name |
character. Name of the folder in which the data will be saved. Default NA uses the original names. |
Details
All the rasters provided in "grid" and "dyncovars" must have the same spatial characteristics (resolution, crs, extent, etc.). The function estimates precipitation based on the nearest observations ("sts" and "prec"), using as covariates the predictors contained in "grid" and "dyncovars". The predictors of "grid" are used for all days, while those in "dyncovars" are selected depending on the day. For instance, to model the first day, the algorithm considers all rasters in "grid" and only those corresponding to the first day of each variable.
Examples
## Not run:
# fixed covariates (elevation, latitude, longitude)
alt <- terra::rast(volcano, crs = 'EPSG:4326')
terra::ext(alt) <- c(-1,3,38,42)
lon <- terra::rast(cbind(terra::crds(alt),terra::crds(alt)[,1]),type='xyz',crs='EPSG:4326')
lat <- terra::rast(cbind(terra::crds(alt),terra::crds(alt)[,2]),type='xyz',crs='EPSG:4326')
dcoast <- terra::costDist(alt,target=min(terra::values(alt)))/1000
grid <- c(alt, lon, lat, dcoast)
names(grid) <- c('alt', 'lon', 'lat', 'dcoast')
# Dynamic covariates (Variable 1, Variable 2, Variable 3)
foo <- alt
terra::values(foo) <- runif(length(terra::values(alt)))
dyncovars1 <- rep(foo, 7)
names(dyncovars1) <- paste('dynvar1.day',1:terra::nlyr(dyncovars1),
sep = "_") # not use blank space!
dyncovars2 <- dyncovars1*0.0234
names(dyncovars2) <- paste('dynvar2.day',1:terra::nlyr(dyncovars2),
sep = "_") # not use blank space!
dyncovars3 <- dyncovars1*10502
names(dyncovars3) <- paste('dynvar3.day',1:terra::nlyr(dyncovars3),
sep = "_") # not use blank space!
dyncovars <- terra::sds(dyncovars1, dyncovars2, dyncovars3)
# precipitation and stations generation
set.seed(123)
prec <- round(matrix(rnorm(7*25, mean = 1.2, sd = 4), 7, 25), 1)+1
prec[prec<0] <- 0
colnames(prec) <- paste0('sts_',1:25)
sts <- data.frame(ID = paste0('sts_',1:25), as.data.frame(terra::spatSample(grid, 25)))
gridPcp(prec = prec,
grid = grid,
sts = sts,
model_fun = learner_glm,
dates = seq.Date(as.Date('2023-04-01'),as.Date('2023-04-07'),by='day'),
ncpu = 4,
thres = NA,
neibs = 15,
coords = c('lon','lat'),
crs = 'EPSG:4326',
coords_as_preds = TRUE)
r <- terra::rast(c('./pred/20230401.tif','./err/20230401.tif'))
terra::plot(r)
## End(Not run)
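The example above builds dyncovars but does not pass them to the function; a hedged variant of the same call that also uses the dynamic covariates (assuming, as required, that they share the spatial characteristics of grid) is:
## Not run:
gridPcp(prec = prec, grid = grid, dyncovars = dyncovars, sts = sts,
model_fun = learner_glm,
dates = seq.Date(as.Date('2023-04-01'),as.Date('2023-04-07'),by='day'),
ncpu = 4, thres = NA, neibs = 15,
coords = c('lon','lat'), crs = 'EPSG:4326', coords_as_preds = TRUE)
## End(Not run)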
Homogenization of daily precipitation time series
Description
The function applies the homogenization procedure (detection and adjustment) for daily precipitation based on Huerta et al. (2024).
Usage
hmg_Ts(
prec,
sts,
neibs_max = 8,
neibs_min = 3,
thres = 1e+06,
cor_neibs = 0.5,
cleaning = FALSE,
perc_break = 7,
wet_day = 0,
window_c = 15,
apply_qc = 1,
mm_apply_qc = 0,
ncpu = 2
)
Arguments
prec |
xts matrix containing the quality-controlled and gap-filled (not yet homogenized) precipitation data. Each column represents one station. The names of columns must coincide with the names of the stations. |
sts |
data.frame. A column "ID" (unique ID of stations), "LON" (decimal degree), and "LAT" (decimal degree) are required. |
neibs_max |
integer. Maximum number of nearest neighbors to use. NA or NULL uses all available stations. |
neibs_min |
integer. Minimum number of nearest neighbors to use. This value determines whether the relative or the absolute approach is used in the homogenization. |
thres |
numeric. Maximum radius (in m) where neighboring stations will be searched. NA or NULL uses the whole spatial domain. |
cor_neibs |
numeric. Minimum value of temporal correlation to define nearest neighbors. |
cleaning |
logical. Default is FALSE. TRUE if the time series should be cleaned (trends and autocorrelation removed) before the detection test, as in Lund et al. (2023). |
perc_break |
numeric. Percentage of difference time series that must be statistically significant in the detection test in order to define a break point (year). |
wet_day |
numeric. Defines whether the adjustment is performed only on wet days (> 0 mm). A negative value means that the adjustment is also applied to zeros. |
window_c |
integer. Window size for the application of the quantile-matching adjustment. |
apply_qc |
numeric. Maximum threshold (cube-root difference) at which adjusted data are considered corrected. Default is 1. If the cube-root difference between the adjusted and the raw data is above this threshold, the adjusted value is corrected so that the difference is not above (or below) the threshold. |
mm_apply_qc |
numeric. Precipitation threshold above which apply_qc is applied. Default is 0, meaning that the correction is applied only on wet days (> 0 mm). |
ncpu |
number of processor cores used for parallel computing. |
Details
The homogenization procedure uses an automatic algorithm for both detection and adjustment, without metadata information. In addition, relative and absolute approaches are combined for situations in which relative homogenization cannot be performed. The absolute test, which has a lower power of detection than the relative tests, is thus intended as a backup for when a relative test is hardly possible. To ensure high confidence in breakpoint detection, a combination of different statistical tests and an intercomparison of their results is used. Five univariate breakpoint tests are applied: Student's, Mann-Whitney, Buishand-R, Pettitt, and the Standard Normal Homogeneity Test. Depending on the availability of nearby stations for a target time series, either the relative or the absolute approach is applied. For the relative approach, the algorithm searches for up to neibs_max well-correlated (> cor_neibs) nearby stations within a radius of thres. Then, the five tests are applied to difference series (target minus nearby) created with three different temporal aggregations. Finally, the breakpoint is set to a certain year if at least perc_break (%) of the difference time series show a significant break (p-value < 0.05) in that year, using a tolerance of +/- 1 year. The absolute approach is used if the algorithm detects fewer than neibs_min nearby stations or none at all.
For the adjustment, the quantile-matching technique outlined in Squintu et al. (2018) was adapted. That algorithm was created for temperature data, so some changes were made to use it for precipitation: dry values (wet_day parameter) are not corrected, and wet values are transformed twice (square root and log) before the algorithm execution to force a normal distribution. Based on this, the correction is applied in two ways: relative and absolute. For the relative approach, the adjustment factor is computed using the target and nearby time series of the detection stage. It is assumed that the data after the break are correct; thus, the correction is applied backward. For the absolute case, the adjustment factor is computed using the target time series only, which can be seen as an application of quantile mapping since there are no nearby stations. Adjustment of daily precipitation can influence the extreme tails; therefore, adjusted values can be constrained not to exceed a limit (apply_qc, a cube-root difference) with respect to the raw data. The apply_qc correction can be applied to all precipitation values or only above a specific threshold (mm_apply_qc). This procedure keeps the adjustment of extremes while preventing the creation of excessively large values.
The output of the function is a list containing the detection results and the adjusted time series for each station.
References
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Lund, R. B., Beaulieu, C., Killick, R., Lu, Q., & Shi, X. (2023). Good practices and common pitfalls in climate time series changepoint techniques: A review. Journal of Climate, 36(23), 8041-8057.
Squintu, A. A., van der Schrier, G., Brugnara, Y., & Klein Tank, A. (2018). Homogenization of daily ECA&D temperature series. International journal of climatology, 39(3), 1243-1261.
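Examples
A minimal sketch of a call on synthetic, serially complete data, assuming an xts matrix prec (one column per station) and an sts data.frame with the required "ID", "LON" and "LAT" columns (default detection and adjustment parameters):
## Not run:
set.seed(123)
dates_t <- seq(as.Date("1990-01-01"), as.Date("2015-12-31"), by = "day")
prec <- round(matrix(rnorm(length(dates_t) * 10, mean = 1.2, sd = 6), ncol = 10), 1)
prec[prec < 0] <- 0
prec <- xts::xts(prec, dates_t)
colnames(prec) <- paste0("sts_", 1:10)
sts <- data.frame(ID = paste0("sts_", 1:10),
LON = rnorm(10, 0, 1), LAT = rnorm(10, 40, 1))
hmg <- hmg_Ts(prec = prec, sts = sts, ncpu = 2)
# list with the detection results and the adjusted series for each station
str(hmg, max.level = 1)
## End(Not run)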
Function of the generalized linear model (glm)
Description
The function of the generalized linear model (glm) is used to predict wet/dry (classification) days and the amount of rain on wet days (regression). This function employs the hybrid model of Serrano-Notivoli et al. (2017).
Usage
learner_glm(ref, can, covars)
Arguments
ref |
matrix or data.frame containing the covariable data and precipitation value for each point location. This data is used to train (build) the model. |
can |
matrix or data.frame containing the covariable data and precipitation value for a single point location. This data is used to run the built model and obtain the predictions. |
covars |
vector of character elements containing the covariables names used in the model. |
Details
The function should not be used directly, as its main purpose is to be passed as an argument to the core functions (qcPrec, gapFilling and gridPcp) of the package. The core functions handle any condition that would prevent the execution of this function, so there is no need to test different situations in the data (empty values, among others). However, users who want to build their own functions should test them before passing them as an argument to the core functions (the other learner functions available in the package can be used as a reference). This function encapsulates two types of models, classification and regression, and its output is a single numeric vector of three elements: probability of a wet day, amount on a wet day, and uncertainty of the amount on a wet day.
References
Serrano-Notivoli, R., Beguería, S., Saz, M. Á., Longares, L. A., & de Luis, M. (2017). SPREAD: a high-resolution daily gridded precipitation dataset for Spain–an extreme events frequency and intensity overview. Earth System Science Data, 9(2), 721-738.
Examples
## Not run:
set.seed(123)
# creating random data (three predictors)
lon = rnorm(50,0,1)
lat = rnorm(50,40,1)
dcoast = rnorm(50,200,50)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 0
# the precipitation column must be named "val"
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_glm
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_glm(ref = ref, can = can, covars = covars)
# case when "prec" is full wet
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 1
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_glm
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_glm(ref = ref, can = can, covars = covars)
## End(Not run)
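User-defined learners must follow the same contract: arguments ref, can and covars, and a numeric vector of three elements (probability of a wet day, amount, and uncertainty of the amount) as output. The sketch below is a hedged skeleton using a deliberately trivial model (it ignores covars) purely to illustrate the expected structure; any real custom learner should be tested this way before being passed to the core functions.
## Not run:
learner_custom <- function(ref, can, covars) {
  # classification part: share of wet days in the reference data
  p_wet <- mean(ref$val > 0, na.rm = TRUE)
  # regression part: mean and standard deviation of wet-day amounts
  wet <- ref$val[ref$val > 0 & !is.na(ref$val)]
  amount <- if (length(wet) > 0) mean(wet) else 0
  uncert <- if (length(wet) > 1) sd(wet) else 0
  c(p_wet, amount, uncert)
}
# test with the ref/can/covars objects created above before using it
# in qcPrec, gapFilling or gridPcp
learner_custom(ref = ref, can = can, covars = covars)
## End(Not run)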
Function of the neural network model (nn)
Description
The function of the neural network model (nn) is used to predict wet/dry (classification) days and the amount of rain on wet days (regression). This function employs the hybrid model of Serrano-Notivoli et al. (2017).
Usage
learner_nn(ref, can, covars)
Arguments
ref |
matrix or data.frame containing the covariable data and precipitation value for each point location. This data is used to train (build) the model. |
can |
matrix or data.frame containing the covariable data and precipitation value for a single point location. This data is used to run the built model and obtain the predictions. |
covars |
vector of character elements containing the covariables names used in the model. |
Details
The function should not be used directly, as its main purpose is to be passed as an argument to the core functions (qcPrec, gapFilling and gridPcp) of the package. The core functions handle any condition that would prevent the execution of this function, so there is no need to test different situations in the data (empty values, among others). However, users who want to build their own functions should test them before passing them as an argument to the core functions (the other learner functions available in the package can be used as a reference). This function encapsulates two types of models, classification and regression, and its output is a single numeric vector of three elements: probability of a wet day, amount on a wet day, and uncertainty of the amount on a wet day. To use this function, the neuralnet package must be installed beforehand.
References
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating random data (three predictors)
lon = rnorm(50,0,1)
lat = rnorm(50,40,1)
dcoast = rnorm(50,200,50)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 0
# the precipitation column must be named "val"
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_nn
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_nn(ref = ref, can = can, covars = covars)
# case when "prec" is all wet (no dry days)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 1
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_nn
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_nn(ref = ref, can = can, covars = covars)
## End(Not run)
Function of the random forest model (rf)
Description
The function of the random forest model (rf) is used to predict wet/dry (classification) days and the amount of rain on wet days (regression). This function employs the hybrid model of Huerta et al. (2024).
Usage
learner_rf(ref, can, covars)
Arguments
ref |
matrix or data.frame containing the covariable data and precipitation value for each point location. This data is used to train (build) the model. |
can |
matrix or data.frame containing the covariable data and precipitation value for a single point location. This data is used to run the built model and obtain the predictions. |
covars |
vector of character elements containing the covariables names used in the model. |
Details
The function should not be used directly, as its main purpose is to be passed as an argument to the core functions (qcPrec, gapFilling and gridPcp) of the package. The core functions handle any condition that would prevent the execution of this function, so there is no need to test different situations in the data (empty values, among others). However, users who want to build their own functions should test them before passing them as an argument to the core functions (the other learner functions available in the package can be used as a reference). This function encapsulates two types of models, classification and regression, and its output is a single numeric vector of three elements: probability of a wet day, amount on a wet day, and uncertainty of the amount on a wet day. To use this function, the randomForest package must be installed beforehand.
References
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating random data (three predictors)
lon = rnorm(50,0,1)
lat = rnorm(50,40,1)
dcoast = rnorm(50,200,50)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 0
# the precipitation column must be named "val"
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_rf
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_rf(ref = ref, can = can, covars = covars)
# case when "prec" is all wet (no dry days)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 1
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_rf
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_rf(ref = ref, can = can, covars = covars)
## End(Not run)
Function of the support vector machine model (svm)
Description
The function of the support vector machine model (svm) is used to predict wet/dry (classification) days and the amount of rain on wet days (regression). This function employs the hybrid model of Serrano-Notivoli et al. (2017).
Usage
learner_svm(ref, can, covars)
Arguments
ref |
matrix or data.frame containing the covariable data and precipitation value for each point location. This data is used to train (build) the model. |
can |
matrix or data.frame containing the covariable data and precipitation value for a single point location. This data is used to run the built model and obtain the predictions. |
covars |
vector of character elements containing the covariables names used in the model. |
Details
The function should not be used directly, as its main purpose is to be passed as an argument to the core functions (qcPrec, gapFilling and gridPcp) of the package. The core functions handle any condition that would prevent the execution of this function, so there is no need to test different situations in the data (empty values, among others). However, users who want to build their own functions should test them before passing them as an argument to the core functions (the other learner functions available in the package can be used as a reference). This function encapsulates two types of models, classification and regression, and its output is a single numeric vector of three elements: probability of a wet day, amount on a wet day, and uncertainty of the amount on a wet day. To use this function, the e1071 package must be installed beforehand.
References
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating random data (three predictors)
lon = rnorm(50,0,1)
lat = rnorm(50,40,1)
dcoast = rnorm(50,200,50)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 0
# the precipitation column must be named "val"
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_svm
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_svm(ref = ref, can = can, covars = covars)
# case when "prec" is all wet (no dry days)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 1
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_svm
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_svm(ref = ref, can = can, covars = covars)
## End(Not run)
Function of the eXtreme Gradient Boosting model (xgboost)
Description
The function of the eXtreme Gradient Boosting model (xgboost) is used to predict wet/dry (classification) days and the amount of rain on wet days (regression). This function employs the hybrid model of Huerta et al. (2024).
Usage
learner_xgboost(ref, can, covars)
Arguments
ref |
matrix or data.frame containing the covariable data and precipitation value for each point location. This data is used to train (build) the model. |
can |
matrix or data.frame containing the covariable data and precipitation value for a single point location. This data is used to run the built model and obtain the predictions. |
covars |
vector of character elements containing the covariables names used in the model. |
Details
The function should not be used directly, as its main purpose is to be passed as an argument to the core functions (qcPrec, gapFilling and gridPcp) of the package. The core functions handle any condition that would prevent the execution of this function, so there is no need to test different situations in the data (empty values, among others). However, users who want to build their own functions should test them before passing them as an argument to the core functions (the other learner functions available in the package can be used as a reference). This function encapsulates two types of models, classification and regression, and its output is a single numeric vector of three elements: probability of a wet day, amount on a wet day, and uncertainty of the amount on a wet day. To use this function, the xgboost package must be installed beforehand.
References
Huerta, A., Serrano-Notivoli, R., & Brönnimann, S. (2024). SC-PREC4SA: A serially complete daily precipitation dataset for South America. https://doi.org/10.31223/X57D8R
Examples
## Not run:
set.seed(123)
# creating random data (three predictors)
lon = rnorm(50,0,1)
lat = rnorm(50,40,1)
dcoast = rnorm(50,200,50)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 0
# the precipitation column must be named "val"
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_xgboost
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_xgboost(ref = ref, can = can, covars = covars)
# case when "prec" is all wet (no dry days)
prec = rnorm(1*50, mean = 1.2, sd = 6)
prec[prec<0] <- 1
data_full <- data.frame(lon = lon, lat = lat, dcoast = dcoast, val = prec)
# parameters for learner_xgboost
ref = data_full[-1, ]
can = data_full[1, ]
covars = c("lon", "lat", "dcoast")
learner_xgboost(ref = ref, can = can, covars = covars)
## End(Not run)
Quality Control of daily precipitation observations
Description
This function applies several threshold-based criteria to filter the original observations of daily precipitation.
Usage
qcPrec(
prec,
sts,
model_fun = learner_glm,
crs,
coords,
coords_as_preds = TRUE,
neibs = 10,
thres = NA,
qc = "all",
qc3 = 10,
qc4 = c(0.99, 5),
qc5 = c(0.01, 0.1, 5),
ncpu = 1
)
Arguments
prec |
matrix containing the original precipitation data. Each column represents one station. The names of the columns must match the names of the stations. |
sts |
data.frame. A column "ID" (unique ID of stations) is required. The rest of the columns (all of them) will act as predictors of the model. |
model_fun |
function. A function that integrates the statistical hybrid model (classification and regression). The default is learner_glm, which is the original model. Other models are also available (learner_rf and learner_xgboost). Users can also create their own functions with different models. |
crs |
character. Coordinates system in EPSG format (e.g.: "EPSG:4326"). |
coords |
vector of two character elements. Names of the fields in "sts" containing longitude and latitude. |
coords_as_preds |
logical. If TRUE (default), "coords" are also taken as predictors. |
neibs |
integer. Number of nearest neighbors to use. |
thres |
numeric. Maximum radius (in km) where neighboring stations will be searched. NA value uses the whole spatial domain. |
qc |
vector of strings with the QC criteria to apply. Default is "all". See details. |
qc3 |
numeric. Indicates the threshold (number of times higher or lower) from which an observation, in comparison with its estimate, should be deleted. Default is 10. |
qc4 |
numeric vector of length 2. Thresholds of wet probability (0 to 1) and magnitude (in the units of input precipitation data) from which an observation of value zero, in comparison with its estimate, should be deleted. Default is c(0.99, 5). |
qc5 |
numeric vector of length 3. Thresholds of dry probability (0 to 1), estimate magnitude, and observed value (the last two in the units of input precipitation data) from which an observation, in comparison with its estimate, should be deleted. Default is c(0.01, 0.1, 5). See Details. |
ncpu |
number of processor cores used for parallel computing. |
Details
Parameter "sts" must have an "ID" field containing unique identifiers of the stations.
"qc" can be "all" (all criteria are applied) or a vector of strings (e.g.: c("1","2","4")) indicating the QC criteria to apply to observations: "1" (suspect value): obs==0 & all(neibs>0); "2" (suspect zero): obs>0 & all(neibs==0); "3" (suspect outlier): obs is "qc3" times higher or lower than the estimate; "4" (suspect wet): obs==0 & wet probability > "qc4[1]" & estimate > "qc4[2]"; "5" (suspect dry): obs>"qc5[3]" & dry probability < "qc5[1]" & estimate < "qc5[2]"
Examples
## Not run:
set.seed(123)
prec <- round(matrix(rnorm(30*50, mean = 1.2, sd = 6), 30, 50), 1)
prec[prec<0] <- 0
colnames(prec) <- paste0('sts_',1:50)
sts <- data.frame(ID = paste0('sts_',1:50), lon = rnorm(50,0,1),
lat = rnorm(50,40,1), dcoast = rnorm(50,200,50))
qcdata <- qcPrec(prec, sts, crs = 'EPSG:4326', coords = c('lon','lat'),
coords_as_preds = TRUE, neibs = 10, thres = NA,
qc = 'all', qc3 = 10, qc4 = c(0.99, 5), qc5 = c(0.01, 0.1, 5),
ncpu=2)
str(qcdata)
## End(Not run)
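Applying only a subset of criteria only requires changing qc; a hedged variant reusing the data above and the criteria subset mentioned in Details:
## Not run:
qcdata_subset <- qcPrec(prec, sts, crs = 'EPSG:4326', coords = c('lon','lat'),
coords_as_preds = TRUE, neibs = 10, thres = NA,
qc = c('1','2','4'), qc4 = c(0.99, 5), ncpu = 2)
str(qcdata_subset)
## End(Not run)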