Help for package cSEM

Title:

Composite-Based Structural Equation Modeling

Version:

0.6.1

Date:

2025-05-15

Maintainer:

Florian Schuberth <f.schuberth@utwente.nl>

Depends:

R (≥ 3.5.0)

Description:

Estimate, assess, test, and study linear, nonlinear, hierarchical and multigroup structural equation models using composite-based approaches and procedures, including estimation techniques such as partial least squares path modeling (PLS-PM) and its derivatives (PLSc, ordPLSc, robustPLSc), generalized structured component analysis (GSCA), generalized structured component analysis with uniqueness terms (GSCAm), generalized canonical correlation analysis (GCCA), principal component analysis (PCA), factor score regression (FSR) using sum score, regression or Bartlett scores (including bias correction using Croon’s approach), as well as several tests and typical postestimation procedures (e.g., verify admissibility of the estimates, assess the model fit, test the model fit etc.).

BugReports:

https://github.com/FloSchuberth/cSEM/issues/

URL:

https://github.com/FloSchuberth/cSEM/, https://floschuberth.github.io/cSEM/

License:

GPL-3

Encoding:

UTF-8

NeedsCompilation:

yes

LazyData:

true

Imports:

alabama, cli, crayon, expm (≥ 0.999-5), future.apply, future, lifecycle, lavaan, magrittr, MASS, Matrix, matrixcalc, matrixStats, polycor, progressr, psych, purrr, Rdpack, rlang, stats, symmoments, TruncatedNormal, utils

RdMacros:

Rdpack, lifecycle

RoxygenNote:

7.3.2

Suggests:

DiagrammeR, DiagrammeRsvg, dplyr, tidyr, knitr, nnls, prettydoc, plotly, rsvg, rmarkdown, rootSolve, listviewer, testthat (≥ 3.0.0), ggplot2, openxlsx, graphics, spelling

Config/testthat/edition:

VignetteBuilder:

knitr

Language:

en-US

Packaged:

2025-05-15 14:33:52 UTC; SchuberthF

Author:

Manuel E. Rademaker

[aut], Florian Schuberth

[aut, cre], Tamara Schamberger

[ctb], Michael Klesel

[ctb], Huu Phuc Nguyen

[ctb], Theo K. Dijkstra [ctb], Jörg Henseler

[ctb]

Repository:

CRAN

Date/Publication:

2025-05-16 09:40:14 UTC

cSEM: A package for composite-based structural equation modeling

Description

Estimate, analyze, test, and study linear, nonlinear, hierarchical and multigroup structural equation models using composite-based approaches and procedures including estimation techniques such as partial least squares path modeling (PLS) and its derivatives (PLSc, ordPLSc, robustPLSc), generalized structured component analysis (GSCA), generalized structured component analysis with uniqueness terms (GSCAm), generalized canonical correlation analysis (GCCA) unit weights (sum score) and fixed weights, as well as several tests and typical postestimation procedures (e.g., assess the model fit, compute direct, indirect and total effects).

Author(s)

Maintainer: Florian Schuberth f.schuberth@utwente.nl (ORCID)

Authors:

Manuel E. Rademaker manuel-rademaker@outlook.de (ORCID)

Other contributors:

Tamara Schamberger tamara.schamberger@uni-wuerzburg.de (ORCID) [contributor]
Michael Klesel (ORCID) [contributor]
Huu Phuc Nguyen (ORCID) [contributor]
Theo K. Dijkstra [contributor]
Jörg Henseler (ORCID) [contributor]

Data: Anime

Description

A data frame with 183 observations and 13 variables.

Usage

Anime

Format

An object of class data.frame with 183 rows and 13 columns.

Details

The data set for the example on github.com/ISS-Analytics/pls-predict/ with irrelevant variables removed.

Source

Original source: github.com/ISS-Analytics/pls-predict/

Data: Benitezetal2020

Description

A data frame containing 22 variables with 300 observations.

Usage

Benitezetal2020

Format

An object of class data.frame with 300 rows and 22 columns.

Details

The simulated data contains variables about the social executive and employee behavior. Moreover, it contains variables about the social media capability and business performance. The dataset was used as an illustrative example in Benitez et al. (2020).

Source

The dataset is provided as supplementary material by Benitez et al. (2020).

References

Benitez J, Henseler J, Castillo A, Schuberth F (2020). “How to perform and report an impactful analysis using partial least squares: Guidelines for confirmatory and explanatory IS research.” Information & Management, 2(57), 103168. doi:10.1016/j.im.2019.05.003.

Examples

#============================================================================
# Example is taken from Benitez et al. (2020)
#============================================================================
model_Benitez <-"
# Reflective measurement models# Reflective measurement models
SEXB =~ SEXB1 + SEXB2 + SEXB3 +SEXB4
SEMB =~ SEMB1 + SEMB2 + SEMB3 + SEMB4

# Composite models
SMC <~ SMC1 + SMC2 + SMC3 + SMC4
BPP <~ BPP1 + BPP2 + BPP3 + BPP4 + BPP5

# Control variables
FS<~ FirmSize
Ind <~ Industry1 + Industry2 + Industry3

# Structural model
SMC ~ SEXB + SEMB 
BPP ~ SMC + Ind + FS
"

out <- csem(.data = Benitezetal2020, .model = model_Benitez,
            .PLS_weight_scheme_inner = 'factorial',
            .tolerance = 1e-06)

Data: BergamiBagozzi2000

Description

A data frame containing 22 variables with 305 observations.

Usage

BergamiBagozzi2000

Format

An object of class data.frame with 305 rows and 22 columns.

Details

The dataset contains 22 variables and originates from a larger survey among South Korean employees conducted and reported by Bergami and Bagozzi (2000). It is also used in Hwang and Takane (2004) and Henseler (2021) for demonstration purposes, see the corresponding tutorial.

Source

Survey among South Korean employees conducted and reported by Bergami and Bagozzi (2000).

References

Bergami M, Bagozzi RP (2000). “Self-categorization, affective commitment and group self-esteem as distinct aspects of social identity in the organization.” British Journal of Social Psychology, 39(4), 555–577. doi:10.1348/014466600164633.

Henseler J (2021). Composite-Based Structural Equation Modeling: Analyzing Latent and Emergent Variables. Guilford Press, New York.

Hwang H, Takane Y (2004). “Generalized Structured Component Analysis.” Psychometrika, 69(1), 81–99.

Examples

#============================================================================
# Example is taken from Henseler (2021)
#============================================================================
model_Bergami_Bagozzi_Henseler="
# Measurement models
OrgPres =~ cei1 + cei2 + cei3 + cei4 + cei5 + cei6 + cei7 + cei8 
OrgIden =~ ma1 + ma2 + ma3 + ma4 + ma5 + ma6
AffLove =~ orgcmt1 + orgcmt2 + orgcmt3 + orgcmt7
AffJoy  =~ orgcmt5 + orgcmt8
Gender  <~ gender

# Structural model 
OrgIden ~ OrgPres
AffLove ~ OrgPres + OrgIden + Gender 
AffJoy  ~ OrgPres + OrgIden + Gender 
"

out <- csem(.data = BergamiBagozzi2000, 
            .model = model_Bergami_Bagozzi_Henseler,
            .PLS_weight_scheme_inner = 'factorial',
            .tolerance = 1e-06
)

#============================================================================
# Example is taken from Hwang et al. (2004)
#============================================================================ 

model_Bergami_Bagozzi_Hwang="
# Measurement models
OrgPres =~ cei1 + cei2 + cei3 + cei4 + cei5 + cei6 + cei7 + cei8 
OrgIden =~ ma1 + ma2 + ma3 + ma4 + ma5 + ma6
AffJoy =~ orgcmt1 + orgcmt2 + orgcmt3 + orgcmt7
AffLove  =~ orgcmt5 + orgcmt6 + orgcmt8

# Structural model 
OrgIden ~ OrgPres 
AffLove ~ OrgIden
AffJoy  ~ OrgIden"

out_Hwang <- csem(.data = BergamiBagozzi2000, 
                 .model = model_Bergami_Bagozzi_Hwang,
                 .approach_weights = "GSCA",
                 .disattenuate = FALSE,
                 .id = "gender",
                 .tolerance = 1e-06)

Data: ITFlex

Description

A data frame containing 16 variables with 100 observations.

Usage

ITFlex

Format

A data frame containing the following variables:

ITCOMP1: Software applications can be easily transported and used across multiple platforms.
ITCOMP2: Our firm provides multiple interfaces or entry points (e.g., web access) for external end users.
ITCOMP3: Our firm establishes corporate rules and standards for hardware and operating systems to ensure platform compatibility.
ITCOMP4: Data captured in one part of our organization are immediately available to everyone in the firm.
ITCONN1: Our organization has electronic links and connections throughout the entire firm.
ITCONN2: Our firm is linked to business partners through electronic channels (e.g., websites, e-mail, wireless devices, electronic data interchange).
ITCONN3: All remote, branch, and mobile offices are connected to the central office.
ITCONN4: There are very few identifiable communications bottlenecks within our firm.
MOD1: Our firm possesses a great speed in developing new business applications or modifying existing applications.
MOD2: Our corporate database is able to communicate in several different protocols.
MOD3: Reusable software modules are widely used in new systems development.
MOD4: IT personnel use object-oriented and prepackaged modular tools to create software applications.
ITPSF1: Our IT personnel have the ability to work effectively in cross-functional teams.
ITPSF2: Our IT personnel are able to interpret business problems and develop appropriate technical solutions.
ITPSF3: Our IT personnel are self-directed and proactive.
ITPSF4: Our IT personnel are knowledgeable about the key success factors in our firm.

Details

The dataset was studied by Benitez et al. (2018) and is used in Henseler (2021) for demonstration purposes, see the corresponding tutorial. All questionnaire items are measured on a 5-point scale.

Source

The data was collected through a survey by Benitez et al. (2018).

References

Benitez J, Ray G, Henseler J (2018). “Impact of Information Technology Infrastructure Flexibility on Mergers and Acquisitions.” MIS Quarterly, 42(1), 25–43.

Henseler J (2021). Composite-Based Structural Equation Modeling: Analyzing Latent and Emergent Variables. Guilford Press, New York.

Examples

#============================================================================
# Example is taken from Henseler (2020)
#============================================================================
model_IT_Fex="
# Composite models
ITComp  <~ ITCOMP1 + ITCOMP2 + ITCOMP3 + ITCOMP4
Modul   <~ MOD1 + MOD2 + MOD3 + MOD4
ITConn  <~ ITCONN1 + ITCONN2 + ITCONN3 + ITCONN4
ITPers  <~ ITPSF1 + ITPSF2 + ITPSF3 + ITPSF4

# Saturated structural model
ITPers ~ ITComp + Modul + ITConn
Modul  ~ ITComp + ITConn 
ITConn ~ ITComp 
"

out <- csem(.data = ITFlex, .model = model_IT_Fex,
           .PLS_weight_scheme_inner = 'factorial',
           .tolerance = 1e-06,
           .PLS_ignore_structural_model = TRUE)

Data: LancelotMiltgenetal2016

Description

A data frame containing 10 variables with 1090 observations.

Usage

LancelotMiltgenetal2016

Format

An object of class data.frame with 1090 rows and 11 columns.

Details

The data was analysed by Lancelot-Miltgen et al. (2016) to study young consumers’ adoption intentions of a location tracker technology in the light of privacy concerns. It is also used in Henseler (2021) for demonstration purposes, see the corresponding tutorial.

Source

This data has been collected through a cooperation with the European Commission Joint Research Center Institute for Prospective Technological Studies, contract “Young People and Emerging Digital Services: An Exploratory Survey on Motivations, Perceptions, and Acceptance of Risk” (EC JRC Contract IPTS No: 150876-2007 F1ED-FR).

References

Henseler J (2021). Composite-Based Structural Equation Modeling: Analyzing Latent and Emergent Variables. Guilford Press, New York.

Lancelot-Miltgen C, Henseler J, Gelhard C, Popovic A (2016). “Introducing new products that affect consumer privacy: A mediation model.” Journal of Business Research, 69(10), 4659–4666. doi:10.1016/j.jbusres.2016.04.015.

Examples

#============================================================================
# Example is taken from Henseler (2020)
#============================================================================
model_Med <- "
# Reflective measurement model
Trust =~ trust1 + trust2
PrCon =~ privcon1 + privcon2 + privcon3 + privcon4
Risk  =~ risk1 + risk2 + risk3
Int   =~ intent1 + intent2

# Structural model
Int   ~ Trust + PrCon + Risk
Risk  ~ Trust + PrCon
Trust ~ PrCon
"

out <- csem(.data = LancelotMiltgenetal2016, .model = model_Med,
            .PLS_weight_scheme_inner = 'factorial',
            .tolerance = 1e-06
)

Data: political democracy

Description

The Industrialization and Political Democracy dataset. This dataset is used throughout Bollen's 1989 book (see pages 12, 17, 36 in chapter 2, pages 228 and following in chapter 7, pages 321 and following in chapter 8; Bollen (1989)). The dataset contains various measures of political democracy and industrialization in developing countries.

Usage

PoliticalDemocracy

Format

A data frame of 75 observations of 11 variables.

y1: Expert ratings of the freedom of the press in 1960
y2: The freedom of political opposition in 1960
y3: The fairness of elections in 1960
y4: The effectiveness of the elected legislature in 1960
y5: Expert ratings of the freedom of the press in 1965
y6: The freedom of political opposition in 1965
y7: The fairness of elections in 1965
y8: The effectiveness of the elected legislature in 1965
x1: The gross national product (GNP) per capita in 1960
x2: The inanimate energy consumption per capita in 1960
x3: The percentage of the labor force in industry in 1960

Source

The lavaan package (version 0.6-3).

References

Bollen KA (1989). Structural Equations with Latent Variables. Wiley-Interscience. ISBN 978-0471011712.

Examples

#============================================================================
# Example is taken from the lavaan website
#============================================================================
# Note: example is modified. Across-block correlations are removed
model <- "
# Measurement model
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  
# Regressions / Path model
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
  
# residual correlations
  y2 ~~ y4
  y6 ~~ y8
"

aa <- csem(PoliticalDemocracy, model)

Data: Russett

Description

A data frame containing 10 variables with 47 observations.

Usage

Russett

Format

A data frame containing the following variables for 47 countries:

gini: The Gini index of concentration
farm: The percentage of landholders who collectively occupy one-half of all the agricultural land (starting with the farmers with the smallest plots of land and working toward the largest)
rent: The percentage of the total number of farms that rent all their land. Transformation: ln (x + 1)
gnpr: The 1955 gross national product per capita in U.S. dollars. Transformation: ln (x)
labo: The percentage of the labor force employed in agriculture. Transformation: ln (x)
inst: Instability of personnel based on the term of office of the chief executive. Transformation: exp (x - 16.3)
ecks: The total number of politically motivated violent incidents, from plots to protracted guerrilla warfare. Transformation: ln (x + 1)
deat: The number of people killed as a result of internal group violence per 1,000,000 people. Transformation: ln (x + 1)
stab: One if the country has a stable democracy, and zero otherwise
dict: One if the country experiences a dictatorship, and zero otherwise

Details

The dataset was initially compiled by Russett (1964), discussed and reprinted by Gifi (1990), and partially transformed by Tenenhaus and Tenenhaus (2011). It is also used in Henseler (2021) for demonstration purposes.

Source

From: Henseler (2021)

References

Gifi A (1990). Nonlinear multivariate analysis. Wiley.

Henseler J (2021). Composite-Based Structural Equation Modeling: Analyzing Latent and Emergent Variables. Guilford Press, New York.

Russett BM (1964). “Inequality and Instability: The Relation of Land Tenure to Politics.” World Politics, 16(3), 442–454. doi:10.2307/2009581.

Tenenhaus A, Tenenhaus M (2011). “Regularized generalized canonical correlation analysis.” Psychometrika, 76(2), 257–284.

Examples

#============================================================================
# Example is taken from Henseler (2020)
#============================================================================
model_Russett="
# Composite model
AgrIneq <~ gini + farm + rent
IndDev  <~ gnpr + labo
PolInst <~ inst + ecks + deat + stab + dict

# Structural model
PolInst ~ AgrIneq + IndDev
"

out <- csem(.data = Russett, .model = model_Russett,
            .PLS_weight_scheme_inner = 'factorial',
            .tolerance = 1e-06
)

Data: SQ

Description

A data frame containing 23 variables with 411 observations. The original indicators were measured on a 6-point scale. In this version of the dataset, the indicators are scaled to be between 0 and 100.

Usage

SQ

Format

An object of class data.frame with 411 rows and 23 columns.

Details

The data comes from a European manufacturer of durable consumer goods and was studied by Bliemel et al. (2004) who focused on service quality. It is also used in Henseler (2021) for demonstration purposes, see the corresponding tutorial.

Source

The dataset is provided by Jörg Henseler.

References

Bliemel FW, Adolphs K, Henseler J (2004). “Reconceptualizing service quality. A formative measurement approach using PLS path modeling.” In Munuera-Aleman JL (ed.), Proceedings of the 33rd EMAC Conference, 224.

Henseler J (2021). Composite-Based Structural Equation Modeling: Analyzing Latent and Emergent Variables. Guilford Press, New York.

Data: Summers

Description

A (18 x 18) indicator correlation matrix.

Usage

Sigma_Summers_composites

Format

An object of class matrix (inherits from array) with 18 rows and 18 columns.

Details

The indicator correlation matrix for a modified version of Summers (1965) model. All constructs are modeled as composites.

Source

Own calculation based on Dijkstra and Henseler (2015).

References

Dijkstra TK, Henseler J (2015). “Consistent and Asymptotically Normal PLS Estimators for Linear Structural Equations.” Computational Statistics & Data Analysis, 81, 10–23.

Summers R (1965). “A Capital Intensive Approach to the Small Sample Properties of Various Simultaneous Equation Estimators.” Econometrica, 33(1), 1–41.

Examples


require(cSEM)

model <- "
ETA1 ~ ETA2 + XI1 + XI2
ETA2 ~ ETA1 + XI3 +XI4

ETA1 ~~ ETA2

XI1  <~ x1 + x2 + x3
XI2  <~ x4 + x5 + x6
XI3  <~ x7 + x8 + x9
XI4  <~ x10 + x11 + x12
ETA1 <~ y1 + y2 + y3
ETA2 <~ y4 + y5 + y6
"

## Generate data
summers_dat <- MASS::mvrnorm(n = 300, mu = rep(0, 18), 
                             Sigma = Sigma_Summers_composites, empirical = TRUE)

## Estimate
res <- csem(.data = summers_dat, .model = model) # inconsistent

## 
# 2SLS
res_2SLS <- csem(.data = summers_dat, .model = model, .approach_paths = "2SLS",
                 .instruments = list(ETA1 = c('XI1', 'XI2', 'XI3', 'XI4'),
                                     ETA2 = c('XI1', 'XI2', 'XI3', 'XI4'))
)

Data: Switching

Description

A data frame containing 26 variables with 767 observations.

Usage

Switching

Format

An object of class data.frame with 767 rows and 26 columns.

Details

The data contains variables about the consumers’ intention to switch a service provider. It is also used in Henseler (2021) for demonstration purposes, see the corresponding tutorial.

Source

The dataset is provided by Jörg Henseler.

References

Henseler J (2021). Composite-Based Structural Equation Modeling: Analyzing Latent and Emergent Variables. Guilford Press, New York.

Examples

#============================================================================
# Example is taken from Henseler (2021)
#============================================================================
model_Int <-"
# Measurement models
INV =~ INV1 + INV2 + INV3 +INV4
SAT =~ SAT1 + SAT2 + SAT3
INT =~ INT1 + INT2

# Structural model containing an interaction term.
INT ~ INV + SAT + INV.SAT
"

out <- csem(.data = Switching, .model = model_Int,
            .PLS_weight_scheme_inner = 'factorial',
            .tolerance = 1e-06)

Data: Yooetal2000

Description

A data frame containing 34 variables with 569 observations.

Usage

Yooetal2000

Format

An object of class data.frame with 569 rows and 34 columns.

Details

The data is simulated and has the identical correlation matrix as the data that was analysed by Yoo et al. (2000) to examine how five elements of the marketing mix, namely price, store image, distribution intensity, advertising spending, and price deals, are related to the so-called dimensions of brand equity, i.e., perceived brand quality, brand loyalty, and brand awareness/associations. It is also used in Henseler (2017) and Henseler (2021) for demonstration purposes, see the corresponding tutorial.

Source

Simulated data with the same correlation matrix as the data studied by Yoo et al. (2000).

References

Henseler J (2017). “Bridging Design and Behavioral Research With Variance-Based Structural Equation Modeling.” Journal of Advertising, 46(1), 178–192. doi:10.1080/00913367.2017.1281780.

Henseler J (2021). Composite-Based Structural Equation Modeling: Analyzing Latent and Emergent Variables. Guilford Press, New York.

Yoo B, Donthu N, Lee S (2000). “An Examination of Selected Marketing Mix Elements and Brand Equity.” Journal of the Academy of Marketing Science, 28(2), 195–211. doi:10.1177/0092070300282002.

Examples

#============================================================================
# Example is taken from Henseler (2021)
#============================================================================
model_HOC="
# Measurement models FOC
PR =~ PR1 + PR2 + PR3
IM =~ IM1 + IM2 + IM3
DI =~ DI1 + DI2 + DI3
AD =~ AD1 + AD2 + AD3
DL =~ DL1 + DL2 + DL3
AA =~ AA1 + AA2 + AA3 + AA4 + AA5 + AA6
LO =~ LO1 + LO3
QL =~ QL1 + QL2 + QL3 + QL4 + QL5 + QL6

# Composite model for SOC
BR <~ QL + LO + AA

# Structural model
BR~ PR + IM + DI + AD + DL 
"

out <- csem(.data = Yooetal2000, .model = model_HOC,
            .PLS_weight_scheme_inner = 'factorial',
            .tolerance = 1e-06)

Internal: Multiple testing correction

Description

Adjust a given significance level .alpha to accommodate multiple testing. The following corrections are implemented:

none: (Default) No correction is done.
bonferroni: A Bonferroni correction is done, i.e., alpha is divided by the number of comparisons .nr_comparisons.

Usage

adjustAlpha(
 .alpha                 = args_default()$.alpha,
 .approach_alpha_adjust = args_default()$.approach_alpha_adjust,
 .nr_comparisons        = args_default()$.nr_comparisons
)

Arguments

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.approach_alpha_adjust

Character string. Approach used to adjust the significance level to accommodate multiple testing. One of "none" or "bonferroni". Defaults to "none".

.nr_comparisons

Integer. The number of comparisons. Defaults to NULL.

Value

A vector of (possibly adjusted) significance levels.

Complete list of assess()'s ... arguments

Description

A complete alphabetical list of all possible arguments accepted by assess()'s ... (dotdotdot) argument.

Arguments

.absolute

Logical. Should the absolute HTMT values be returned? Defaults to TRUE .

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.ci

A vector of character strings naming the confidence interval to compute. For possible choices see infer().

.closed_form_ci

Logical. Should a closed-form confidence interval be computed? Defaults to FALSE.

.handle_inadmissibles

Character string. How should inadmissible results be treated? One of "drop", "ignore", or "replace". If "drop", all replications/resamples yielding an inadmissible result will be dropped (i.e. the number of results returned will potentially be less than .R). For "ignore" all results are returned even if all or some of the replications yielded inadmissible results (i.e. number of results returned is equal to .R). For "replace" resampling continues until there are exactly .R admissible solutions. Depending on the frequency of inadmissible solutions this may significantly increase computing time. Defaults to "drop".

.inference

Logical. Should critical values be computed? Defaults to FALSE.

.null_model

Logical. Should the degrees of freedom for the null model be computed? Defaults to FALSE.

.R

Integer. The number of bootstrap replications. Defaults to 499.

.saturated

Logical. Should a saturated structural model be used? Defaults to FALSE.

.seed

Integer or NULL. The random seed to use. Defaults to NULL in which case an arbitrary seed is chosen. Note that the scope of the seed is limited to the body of the function it is used in. Hence, the global seed will not be altered!

.type_gfi

Character string. Which fitting function should the GFI be based on? One of "ML" for the maximum likelihood fitting function, "GLS" for the generalized least squares fitting function or "ULS" for the unweighted least squares fitting function (same as the squared Euclidean distance). Defaults to "ML".

.type_vcv

Character string. Which model-implied correlation matrix should be calculated? One of "indicator" or "construct". Defaults to "indicator".

Details

Most arguments supplied to the ... argument of assess() are only accepted by a subset of the functions called by assess(). The following list shows which argument is passed to which function:

.absolute: Accepted by/Passed down to: calculateHTMT()
.alpha: Accepted by/Passed down to: calculateRhoT(), calculateHTMT(), calculateCN()
.ci: Accepted by/Passed down to: calculateHTMT()
.closed_form_ci: Accepted by/Passed down to: calculateRhoT()
.handle_inadmissibles: Accepted by/Passed down to: calculateHTMT()
.inference: Accepted by/Passed down to: calculateHTMT
.null_model: Accepted by/Passed down to: calculateDf()
.R: Accepted by/Passed down to: calculateHTMT()
.saturated: Accepted by/Passed down to: calculateSRMR(), calculateDG(), calculateDL(), calculateDML()and subsequently fit().
.seed: Accepted by/Passed down to: calculateHTMT()
.type_gfi: Accepted by/Passed down to: calculateGFI()
.type_vcv: Accepted by/Passed down to: calculateSRMR(), calculateDG(), calculateDL(), calculateDML() and subsequently fit().

Show argument defaults or candidates

Description

Show all arguments used by package functions including default or candidate values. For argument descriptions see: csem_arguments.

Usage

args_default(.choices = FALSE)

Arguments

.choices

Logical. Should candidate values for the arguments be returned? Defaults to FALSE.

Details

By default args_default()returns a list of default values by argument name. If the list of accepted candidate values is required instead, use .choices = TRUE.

Value

A named list of argument names and defaults or accepted candidates.

Assess model

Description

Usage

assess(
  .object              = NULL, 
  .quality_criterion   = c("all", "aic", "aicc", "aicu", "bic", "fpe", "gm", "hq",
                           "hqc", "mallows_cp", "ave",
                           "rho_C", "rho_C_mm", "rho_C_weighted", 
                           "rho_C_weighted_mm", "dg", "dl", "dml", "df",
                           "effects", "f2", "fl_criterion", "chi_square", "chi_square_df",
                           "cfi", "cn", "gfi", "ifi", "nfi", "nnfi", 
                           "reliability",
                           "rmsea", "rms_theta", "srmr",
                           "gof", "htmt", "htmt2", "r2", "r2_adj",
                           "rho_T", "rho_T_weighted", "vif", 
                           "vifmodeB"),
  .only_common_factors = TRUE, 
  ...
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.quality_criterion

Character string. A single character string or a vector of character strings naming the quality criterion to compute. See the Details section for a list of possible candidates. Defaults to "all" in which case all possible quality criteria are computed.

.only_common_factors

Logical. Should only concepts modeled as common factors be included when calculating one of the following quality criteria: AVE, the Fornell-Larcker criterion, HTMT, and all reliability estimates. Defaults to TRUE.

...

Further arguments passed to functions called by assess(). See args_assess_dotdotdot for a complete list of available arguments.

Details

Assess a model using common quality criteria. See the Postestimation: Assessing a model article on the cSEM website for details.

The function is essentially a wrapper around a number of internal functions that perform an "assessment task" (called a quality criterion in cSEM parlance) like computing reliability estimates, the effect size (Cohen's f^2), the heterotrait-monotrait ratio of correlations (HTMT) etc.

By default every possible quality criterion is calculated (.quality_criterion = "all"). If only a subset of quality criteria are needed a single character string or a vector of character strings naming the criteria to be computed may be supplied to assess() via the .quality_criterion argument. Currently, the following quality criteria are implemented (in alphabetical order):

Average variance extracted (AVE); "ave": An estimate of the amount of variation in the indicators that is due to the underlying latent variable. Practically, it is calculated as the ratio of the (indicator) true score variances (i.e., the sum of the squared loadings) relative to the sum of the total indicator variances. The AVE is inherently tied to the common factor model. It is therefore unclear how to meaningfully interpret AVE results for constructs modeled as composites. It is possible to report the AVE for constructs modeled as composites by setting .only_common_factors = FALSE, however, result should be interpreted with caution as they may not have a conceptual meaning. Calculation is done by calculateAVE().
Congeneric reliability; "rho_C", "rho_C_mm", "rho_C_weighted", "rho_C_weighted_mm": An estimate of the reliability assuming a congeneric measurement model (i.e., loadings are allowed to differ) and a test score (proxy) based on unit weights. There are four different versions implemented. See the Methods and Formulae section of the Postestimation: Assessing a model article on the cSEM website for details. Alternative but synonymous names for "rho_C" are: composite reliability, construct reliability, reliability coefficient, Jöreskog's rho, coefficient omega, or Dillon-Goldstein's rho. For "rho_C_weighted": (Dijkstra-Henselers) rhoA. rho_C_mm and rho_C_weighted_mm have no corresponding names. The former uses unit weights scaled by (w'Sw)^(-1/2) and the latter weights scaled by (w'Sigma_hat w)^(-1/2) where Sigma_hat is the model-implied indicator correlation matrix. The Congeneric reliability is inherently tied to the common factor model. It is therefore unclear how to meaningfully interpret congeneric reliability estimates for constructs modeled as composites. It is possible to report the congeneric reliability for constructs modeled as composites by setting .only_common_factors = FALSE, however, result should be interpreted with caution as they may not have a conceptual meaning. Calculation is done by calculateRhoC().
Distance measures; "dg", "dl", "dml": Measures of the distance between the model-implied and the empirical indicator correlation matrix. Currently, the geodesic distance ("dg"), the squared Euclidean distance ("dl") and the the maximum likelihood-based distance function are implemented ("dml"). Calculation is done by calculateDL(), calculateDG(), and calculateDML().
Degrees of freedom, "df": Returns the degrees of freedom. Calculation is done by calculateDf().
Effects; "effects": Total and indirect effect estimates. Additionally, the variance accounted for (VAF) is computed. The VAF is defined as the ratio of a variables indirect effect to its total effect. Calculation is done by calculateEffects().
Effect size; "f2": An index of the effect size of an independent variable in a structural regression equation. This measure is commonly known as Cohen's f^2. The effect size of the k'th independent variable in this case is defined as the ratio (R2_included - R2_excluded)/(1 - R2_included), where R2_included and R2_excluded are the R squares of the original structural model regression equation (R2_included) and the alternative specification with the k'th variable dropped (R2_excluded). Calculation is done by calculatef2().
Fit indices; "chi_square", "chi_square_df", "cfi", "cn", "gfi", "ifi", "nfi", "nnfi", "rmsea", "rms_theta", "srmr": Several absolute and incremental fit indices. Note that their suitability for models containing constructs modeled as composites is still an open research question. Also note that fit indices are not tests in a hypothesis testing sense and decisions based on common one-size-fits-all cut-offs proposed in the literature suffer from serious statistical drawbacks. Calculation is done by calculateChiSquare(), calculateChiSquareDf(), calculateCFI(), calculateGFI(), calculateIFI(), calculateNFI(), calculateNNFI(), calculateRMSEA(), calculateRMSTheta() and calculateSRMR().
Fornell-Larcker criterion; "fl_criterion": A rule suggested by Fornell and Larcker (1981) to assess discriminant validity. The Fornell-Larcker criterion is a decision rule based on a comparison between the squared construct correlations and the average variance extracted. FL returns a matrix with the squared construct correlations on the off-diagonal and the AVEs on the main diagonal. Calculation is done by calculateFLCriterion().
Goodness of Fit (GoF); "gof": The GoF is defined as the square root of the mean of the R squares of the structural model times the mean of the variances in the indicators that are explained by their related constructs (i.e., the average over all lambda^2_k). For the latter, only constructs modeled as common factors are considered as they explain their indicator variance in contrast to a composite where indicators actually build the construct. Note that, contrary to what the name suggests, the GoF is not a measure of model fit in a Chi-square fit test sense. Calculation is done by calculateGoF().
Heterotrait-monotrait ratio of correlations (HTMT); "htmt": An estimate of the correlation between latent variables assuming tau equivalent measurement models. The HTMT is used to assess convergent and/or discriminant validity of a construct. The HTMT is inherently tied to the common factor model. If the model contains less than two constructs modeled as common factors and .only_common_factors = TRUE, NA is returned. It is possible to report the HTMT for constructs modeled as composites by setting .only_common_factors = FALSE, however, result should be interpreted with caution as they may not have a conceptual meaning. Calculation is done by calculateHTMT().
HTMT2; "htmt2": An estimate of the correlation between latent variables assuming congeneric measurement models. The HTMT2 is used to assess convergent and/or discriminant validity of a construct. The HTMT is inherently tied to the common factor model. If the model contains less than two constructs modeled as common factors and .only_common_factors = TRUE, NA is returned. It is possible to report the HTMT for constructs modeled as composites by setting .only_common_factors = FALSE, however, result should be interpreted with caution as they may not have a conceptual meaning. Calculation is done by calculateHTMT().
Model selection criteria: "aic", "aicc", "aicu", "bic", "fpe", "gm", "hq", "hqc", "mallows_cp": Several model selection criteria as suggested by Sharma et al. (2019) in the context of PLS. See: calculateModelSelectionCriteria() for details.
Reliability: "reliability": As described in the Methods and Formulae section of the Postestimation: Assessing a model article on the cSEM website there are many different estimators for the (internal consistency) reliability. Choosing .quality_criterion = "reliability" computes the three most common measures, namely: "Cronbach's alpha" (identical to "rho_T"), "Jöreskog's rho" (identical to "rho_C_mm"), and "Dijkstra-Henseler's rho A" (identical to "rho_C_weighted_mm"). Reliability is inherently tied to the common factor model. It is therefore unclear how to meaningfully interpret reliability estimates for constructs modeled as composites. It is possible to report the three common reliability estimates for constructs modeled as composites by setting .only_common_factors = FALSE, however, result should be interpreted with caution as they may not have a conceptual meaning.
R square and R square adjusted; "r2", "r2_adj": The R square and the adjusted R square for each structural regression equation. Calculated when running csem().
Tau-equivalent reliability; "rho_T": An estimate of the reliability assuming a tau-equivalent measurement model (i.e. a measurement model with equal loadings) and a test score (proxy) based on unit weights. Tau-equivalent reliability is the preferred name for reliability estimates that assume a tau-equivalent measurement model such as Cronbach's alpha. The tau-equivalent reliability (Cronbach's alpha) is inherently tied to the common factor model. It is therefore unclear how to meaningfully interpret tau-equivalent reliability estimates for constructs modeled as composites. It is possible to report tau-equivalent reliability estimates for constructs modeled as composites by setting .only_common_factors = FALSE, however, result should be interpreted with caution as they may not have a conceptual meaning. Calculation is done by calculateRhoT().
Variance inflation factors (VIF); "vif": An index for the amount of (multi-)collinearity between independent variables of a regression equation. Computed for each structural equation. Practically, VIF_k is defined as the ratio of 1 over (1 - R2_k) where R2_k is the R squared from a regression of the k'th independent variable on all remaining independent variables. Calculated when running csem().
Variance inflation factors for PLS-PM mode B (VIF-ModeB); "vifmodeB": An index for the amount of (multi-)collinearity between independent variables (indicators) in mode B regression equations. Computed only if .object was obtained using .weight_approach = "PLS-PM" and at least one mode was mode B. Practically, VIF-ModeB_k is defined as the ratio of 1 over (1 - R2_k) where R2_k is the R squared from a regression of the k'th indicator of block j on all remaining indicators of the same block. Calculation is done by calculateVIFModeB().

For details on the most important quality criteria see the Methods and Formulae section of the Postestimation: Assessing a model article on the on the cSEM website.

Some of the quality criteria are inherently tied to the classical common factor model and therefore only meaningfully interpreted within a common factor model (see the Postestimation: Assessing a model article for details). It is possible to force computation of all quality criteria for constructs modeled as composites by setting .only_common_factors = FALSE, however, we explicitly warn to interpret quality criteria in analogy to the common factor model in this case, as the interpretation often does not carry over to composite models.

Resampling

To resample a given quality criterion supply the name of the function that calculates the desired quality criterion to csem()'s .user_funs argument. See resamplecSEMResults() for details.

Value

A named list of quality criteria. Note that if only a single quality criteria is computed the return value is still a list!

Examples

# ===========================================================================
# Using the three common factors dataset
# ===========================================================================
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# Each concept is measured by 3 indicators, i.e., modeled as latent variable
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

res <- csem(threecommonfactors, model)
a   <- assess(res) # computes all quality criteria (.quality_criterion = "all")
a

## The return value is a named list. Type for example:
a$HTMT

# You may also just compute a subset of the quality criteria
assess(res, .quality_criterion = c("ave", "rho_C", "htmt"))

## Resampling ---------------------------------------------------------------
# To resample a given quality criterion use csem()'s .user_funs argument
# Note: The output of the quality criterion needs to be a vector or a matrix.
#       Matrices will be vectorized columnwise.
res <- csem(threecommonfactors, model, 
            .resample_method = "bootstrap", 
            .R               = 40,
            .user_funs       = cSEM:::calculateSRMR
)

## Look at the resamples
res$Estimates$Estimates_resample$Estimates1$User_fun$Resampled[1:4, ]

## Use infer() to compute e.g., the 95% percentile confidence interval
res_infer <- infer(res, .quantity = "CI_percentile")

## The results are saved under the name "User_fun"
res_infer$User_fun 

## Several quality criteria can be resampled simultaneously
res <- csem(threecommonfactors, model, 
            .resample_method = "bootstrap",
            .R               = 40,
            .user_funs       = list(
              "SRMR" = cSEM:::calculateSRMR,
              "RMS_theta" = cSEM:::calculateRMSTheta
            ),
            .tolerance = 1e-04
)
res$Estimates$Estimates_resample$Estimates1$SRMR$Resampled[1:4, ]
res$Estimates$Estimates_resample$Estimates1$RMS_theta$Resampled[1:4]

Internal: Build DOT code for the SEM plot, including construct correlations.

Description

Constructs the DOT script for the SEM path diagram, now including correlations between constructs (not just exogenous ones). Correctly handles drawing only one edge per correlation.

Usage

buildDotCode(
  title,
  graph_attrs,
  constructs,
  r2_values,
  measurement_edge_fun,
  path_coefficients,
  path_p_values,
  correlations,
  plot_significances,
  plot_correlations,
  plot_structural_model_only,
  plot_labels,
  is_second_order = FALSE,
  model_measurement = NULL,
  model_error_cor = NULL,
  construct_correlations = NULL,
  indicator_correlations = NULL
)

Arguments

title

The title of the plot.

graph_attrs

Optional graph attributes.

constructs

A vector of constructs.

r2_values

Named vector of R2 values.

measurement_edge_fun

Function to generate measurement edge code.

path_coefficients

Matrix/data frame of path coefficients.

path_p_values

Named vector of path p-values. Used for construct correlations too.

correlations

List containing correlations (exogenous and indicator).

plot_significances

Logical. Whether to display significance levels.

plot_correlations

Option for indicator correlations ("none", "exo", or "all").

plot_structural_model_only

Logical. Whether to display only the structural model.

is_second_order

Logical. Whether the model is second-order.

model_measurement

a matrix. The measurement matrix.

model_error_cor

a matrix.

construct_correlations

A matrix. The construct correlation matrix.

indicator_correlations

A matrix. The indicator correlation matrix.

Value

A character string containing the complete DOT code.

Internal: Second/Third stage of the two-stage approach for second order constructs

Description

Performs the second and third stage for a model containing second order constructs.

Usage

calculate2ndStage(
 .csem_model          = args_default()$.csem_model,
 .first_stage_results = args_default()$.first_stage_results,
 .original_arguments  = args_default()$.original_arguments,
 .approach_2ndorder   = args_default()$.approach_2ndorder
  )

Arguments

.csem_model

A (possibly incomplete) cSEMModel-list.

.original_arguments

The list of arguments used within csem().

.approach_2ndorder

Character string. Approach used for models containing second-order constructs. One of: "2stage", or "mixed". Defaults to "2stage".

Value

A cSEMResults object.

Average variance extracted (AVE)

Description

Calculate the average variance extracted (AVE) as proposed by Fornell and Larcker (1981). For details see the cSEM website

Usage

calculateAVE(
 .object              = NULL,
 .only_common_factors = TRUE
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.only_common_factors

Details

The AVE is inherently tied to the common factor model. It is therefore unclear how to meaningfully interpret the AVE in the context of a composite model. It is possible, however, to force computation of the AVE for constructs modeled as composites by setting .only_common_factors = FALSE.

Value

A named vector of numeric values (the AVEs). If .object is a list of cSEMResults objects, a list of AVEs is returned.

References

Fornell C, Larcker DF (1981). “Evaluating structural equation models with unobservable variables and measurement error.” Journal of Marketing Research, XVIII, 39–50.

Internal: Calculate composite variance-covariance matrix

Description

Calculate the sample variance-covariance (VCV) matrix of the composites/proxies.

Usage

calculateCompositeVCV(
 .S  = args_default()$.S,
 .W  = args_default()$.W
 )

Arguments

.S

The (K x K) empirical indicator correlation matrix.

.W

A (J x K) matrix of weights.

Value

A (J x J) composite VCV matrix.

Internal: Calculate construct variance-covariance matrix

Description

Calculate the variance-covariance matrix (VCV) of the constructs, i.e., correlations that involve common factors/latent variables are diattenuated.

Usage

calculateConstructVCV(
 .C          = args_default()$.C, 
 .Q          = args_default()$.Q
 )

Arguments

.C

A (J x J) composite variance-covariance matrix.

.Q

A vector of composite-construct correlations with element names equal to the names of the J construct names used in the measurement model. Note Q^2 is also called the reliability coefficient.

Value

The (J x J) construct VCV matrix. Disattenuated if requested.

Internal: Calculate PLSc correction factors

Description

Calculates the correction factor used by PLSc.

Usage

calculateCorrectionFactors(
 .S               = args_default()$.S,
 .W               = args_default()$.W,
 .modes           = args_default()$.modes,
 .csem_model      = args_default()$.csem_model,
 .PLS_approach_cf = args_default()$.PLS_approach_cf
 )

Arguments

.S

The (K x K) empirical indicator correlation matrix.

.W

A (J x K) matrix of weights.

.modes

A vector giving the mode for each construct in the form "name" = "mode". Only used internally.

.csem_model

A (possibly incomplete) cSEMModel-list.

.PLS_approach_cf

Character string. Approach used to obtain the correction factors for PLSc. One of: "dist_squared_euclid", "dist_euclid_weighted", "fisher_transformed", "mean_arithmetic", "mean_geometric", "mean_harmonic", "geo_of_harmonic". Defaults to "dist_squared_euclid". Ignored if .disattenuate = FALSE or if .approach_weights is not PLS-PM.

Details

Currently, seven approaches are available:

"dist_squared_euclid" (default)
"dist_euclid_weighted"
"fisher_transformed"
"mean_geometric"
"mean_harmonic"
"mean_arithmetic"
"geo_of_harmonic" (not yet implemented)

See (Dijkstra 2013) for details.

Value

A numeric vector of correction factors with element names equal to the names of the J constructs used in the measurement model.

References

Dijkstra TK (2013). “A Note on How to Make Partial Least Squares Consistent.” Working Paper. doi:10.13140/RG.2.1.4547.5688.

Degrees of freedom

Description

Calculate the degrees of freedom for a given model from a cSEMResults object.

Usage

calculateDf(
  .object     = NULL,
  .null_model = FALSE,
  ...
  )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.null_model

Logical. Should the degrees of freedom for the null model be computed? Defaults to FALSE.

...

Ignored.

Details

Although, composite-based estimators always retrieve parameters of the postulated models via the estimation of a composite model, the computation of the degrees of freedom depends on the postulated model.

See: cSEM website for details on how the degrees of freedom are calculated.

To compute the degrees of freedom of the null model use .null_model = TRUE. The degrees of freedom of the null model are identical to the number of non-redundant off-diagonal elements of the empirical indicator correlation matrix. This implicitly assumes a null model with model-implied indicator correlation matrix equal to the identity matrix.

Value

A single numeric value.

Internal: Matrix difference

Description

Calculates the average of the differences between all possible pairs of (symmetric) matrices in a list using a given distance measure.

Usage

calculateDistance(
  .matrices = NULL, 
  .distance = args_default()$.distance
  )

Arguments

.matrices

A list of at least two matrices.

.distance

Character string. A distance measure. One of: "geodesic" or "squared_euclidean". Defaults to "geodesic".

Details

.matrices must be a list of at least two matrices. If more than two matrices are supplied the arithmetic mean of the differences between all possible pairs of (symmetric) matrices in a list is computed. Mathematically this is n chose 2. Hence, supplying a large number of matrices will become computationally challenging.

Currently two distance measures are supported:

geodesic: (Default) The geodesic distance.
squared_euclidean: The squared Euclidean distance

Value

A numeric vector of length one containing the (arithmetic) mean of the differences between all possible pairs of matrices supplied via .matrices.

Internal: Calculate direct, indirect and total effect

Description

The direct effects are equal to the estimated coefficients. The total effect equals (I-B)^-1 Gamma. The indirect effect equals the difference between the total effect and the indirect effect. In addition, the variance accounted for (VAF) is calculated. The VAF is defined as the ratio of a variables indirect effect to its total effect. Helper for generic functions summarize() and assess().

Usage

calculateEffects(
 .object       = NULL,
 .output_type  = c("data.frame", "matrix")
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.output_type

Character string. The type of output to return. One of "complete" or "structured". See the Value section for details. Defaults to "complete".

Value

A matrix or a data frame of effects.

Fornell-Larcker criterion

Description

Computes the Fornell-Larcker matrix.

Usage

calculateFLCriterion(
  .object              = NULL,
  .only_common_factors = TRUE,
  ...
  )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.only_common_factors

...

Ignored.

Details

The Fornell-Larcker criterion (FL criterion) is a rule suggested by Fornell and Larcker (1981) to assess discriminant validity. The Fornell-Larcker criterion is a decision rule based on a comparison between the squared construct correlations and the average variance extracted (AVE).

The FL criterion is inherently tied to the common factor model. It is therefore unclear how to meaningfully interpret the FL criterion in the context of a model that contains constructs modeled as composites.

Value

A matrix with the squared construct correlations on the off-diagonal and the AVEs on the main diagonal.

References

Fornell C, Larcker DF (1981). “Evaluating structural equation models with unobservable variables and measurement error.” Journal of Marketing Research, XVIII, 39–50.

Internal: ANOVA F-test statistic

Description

Calculate the ANOVA F-test statistic suggested by Sarstedt et al. (2011) in the OTG testing procedure.

Usage

calculateFR(.resample_sarstedt)

Arguments

.resample_sarstedt

A matrix containing the parameter estimates that could potentially be compared and an id column indicating the group adherence of each row.

Value

A named scalar, the test statistic of the ANOVA F-test

References

Sarstedt M, Henseler J, Ringle CM (2011). “Multigroup Analysis in Partial Least Squares (PLS) Path Modeling: Alternative Methods and Empirical Results.” In Advances in International Marketing, 195–218. Emerald Group Publishing Limited. doi:10.1108/s1474-7979(2011)0000022012.

Goodness of Fit (GoF)

Description

Calculate the Goodness of Fit (GoF) proposed by Tenenhaus et al. (2004). Note that, contrary to what the name suggests, the GoF is not a measure of model fit in the sense of SEM. See e.g. Henseler and Sarstedt (2012) for a discussion.

Usage

calculateGoF(
 .object              = NULL
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

Details

The GoF is inherently tied to the common factor model. It is therefore unclear how to meaningfully interpret the GoF in the context of a model that contains constructs modeled as composites.

Value

A single numeric value.

References

Henseler J, Sarstedt M (2012). “Goodness-of-fit Indices for Partial Least Squares Path Modeling.” Computational Statistics, 28(2), 565–580. doi:10.1007/s00180-012-0317-1.

Tenenhaus M, Amanto S, Vinzi VE (2004). “A Global Goodness-of-Fit Index for PLS Structural Equation Modelling.” In Proceedings of the XLII SIS Scientific Meeting, 739–742.

HTMT

Description

Computes either the heterotrait-monotrait ratio of correlations (HTMT) based on Henseler et al. (2015) or the HTMT2 proposed by Roemer et al. (2021). While the HTMT is a consistent estimator for the construct correlation in case of tau-equivalent measurement models, the HTMT2 is a consistent estimator for congeneric measurement models. In general, they are used to assess discriminant validity.

Usage

calculateHTMT(
 .object               = NULL,
 .type_htmt            = c('htmt','htmt2'),
 .absolute             = TRUE,
 .alpha                = 0.05,
 .ci                   = c("CI_percentile", "CI_standard_z", "CI_standard_t", 
                           "CI_basic", "CI_bc", "CI_bca", "CI_t_interval"),
 .inference            = FALSE,
 .only_common_factors  = TRUE,
 .R                    = 499,
 .seed                 = NULL,
 ...
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.type_htmt

Character string indicating the type of HTMT that should be calculated, i.e., the original HTMT ("htmt") or the HTMT2 ("htmt2"). Defaults to "htmt"

.absolute

Logical. Should the absolute HTMT values be returned? Defaults to TRUE .

.alpha

A numeric value giving the significance level. Defaults to 0.05.

.ci

A character strings naming the type of confidence interval to use to compute the 1-alpha% quantile of the bootstrap HTMT values. For possible choices see infer(). Ignored if .inference = FALSE. Defaults to "CI_percentile".

.inference

Logical. Should critical values be computed? Defaults to FALSE.

.only_common_factors

.R

Integer. The number of bootstrap replications. Defaults to 499.

.seed

...

Ignored.

Details

Computation of the HTMT/HTMT2 assumes that all intra-block and inter-block correlations between indicators are either all-positive or all-negative. A warning is given if this is not the case.

To obtain bootstrap confidence intervals for the HTMT/HTMT2 values, set .inference = TRUE. To choose the type of confidence interval, use .ci. To control the bootstrap process, arguments .R and .seed are available. Note, that .alpha is multiplied by two because typically researchers are interested in one-sided bootstrap confidence intervals for the HTMT/HTMT2.

Since the HTMT and the HTMT2 both assume a reflective measurement model only concepts modeled as common factors are considered by default. For concepts modeled as composites the HTMT may be computed by setting .only_common_factors = FALSE, however, it is unclear how to interpret values in this case.

Value

A named list containing:

the values of the HTMT/HTMT2, i.e., a matrix with the HTMT/HTMT2 values at its lower triangular and if .inference = TRUE the upper triangular contains the upper limit of the 1-2*.alpha% bootstrap confidence interval if the HTMT/HTMT2 is positive and the lower limit if the HTMT/HTMT2 is negative.
the lower and upper limits of the 1-2*.alpha% bootstrap confidence interval if .inference = TRUE; otherwise it is NULL.
the number of admissible bootstrap runs, i.e., the number of HTMT/HTMT2 values calculated during bootstrap if .inference = TRUE; otherwise it is NULL. Note, the HTMT2 is based on the geometric and thus cannot always be calculated.

References

Henseler J, Ringle CM, Sarstedt M (2015). “A New Criterion for Assessing Discriminant Validity in Variance-based Structural Equation Modeling.” Journal of the Academy of Marketing Science, 43(1), 115–135. doi:10.1007/s11747-014-0403-8.

Roemer E, Schuberth F, Henseler J (2021). “HTMT2 – an improved criterion for assessing discriminant validity in structural equation modeling.” Industrial Management & Data Systems, 121(12), 2637–2650.

Internal: Calculate indicator correlation matrix

Description

Calculate the indicator correlation matrix using conventional or robust methods.

Usage

calculateIndicatorCor(
  .X_cleaned           = NULL, 
  .approach_cor_robust = "none"
 )

Arguments

.X_cleaned

A data.frame of processed data (cleaned and ordered). Note: X_cleaned may not be scaled!

.approach_cor_robust

Character string. Approach used to obtain a robust indicator correlation matrix. One of: "none" in which case the standard Bravais-Pearson correlation is used, "spearman" for the Spearman rank correlation, or "mcd" via MASS::cov.rob() for a robust correlation matrix. Defaults to "none". Note that many postestimation procedures (such as testOMF() or fit() implicitly assume a continuous indicator correlation matrix (e.g. Bravais-Pearson correlation matrix). Only use if you know what you are doing.

Details

If .approach_cor_robust = "none" (the default) the type of correlation computed depends on the types of the columns of .X_cleaned (i.e., the indicators) involved in the computation.

Numeric-numeric: If both columns (indicators) involved are numeric, the Bravais-Pearson product-moment correlation is computed (via stats::cor()).
Numeric-factor: If any of the columns is a factor variable, the polyserial correlation (Drasgow 1988) is computed (via polycor::polyserial()).
Factor-factor: If both columns are factor variables, the polychoric correlation (Drasgow 1988) is computed (via polycor::polychor()).

Note: logical input is treated as a 0-1 factor variable.

If "mcd" (= minimum covariance determinant), the MCD estimator (Rousseeuw and Driessen 1999), a robust covariance estimator, is applied (via MASS::cov.rob()).

If "spearman", the Spearman rank correlation is used (via stats::cor()).

Value

A list with elements:

⁠$S⁠: The (K x K) indicator correlation matrix
⁠$cor_type⁠: The type(s) of indicator correlation computed ( "Pearson", "Polyserial", "Polychoric")
⁠$thre_est⁠: Currently ignored (NULL)

References

Drasgow F (1988). “Polychoric and polyserial correlations.” In Encyclopedia of Statistical Sciences, volume 7, 68-74. John Wiley & Sons Inc, Hoboken.

Rousseeuw PJ, Driessen KV (1999). “A Fast Algorithm for the Minimum Covariance Determinant Estimator.” Technometrics, 41(3), 212–223. doi:10.1080/00401706.1999.10485670.

Internal: Calculate the inner weights for PLS-PM

Description

PLS-PM forms "inner" composites as a weighted sum of its I related composites. These inner weights are obtained using one of the following schemes (Lohmöller 1989):

centroid

According to the centroid weighting scheme each inner weight used to form composite j is either 1 if the correlation between composite j and its via the structural model related composite i = 1, ..., I is positive and -1 if it is negative.

factorial

According to the factorial weighting scheme each inner weight used to form inner composite j is equal to the correlation between composite j and its via the structural model related composite i = 1, ..., I.

path

Lets call all construct that have an arrow pointing to construct j predecessors of j and all arrows going from j to other constructs followers of j. According the path weighting scheme, inner weights are computed as follows. Take construct j:

For all predecessors of j set the inner weight of predecessor i to the correlation of i with j.
For all followers of j set the inner weight of follower i to the coefficient of a multiple regression of j on all followers i with i = 1,...,I.

Except for the path weighting scheme relatedness can come in two flavors. If .PLS_ignore_structural_model = TRUE all constructs are considered related. If .PLS_ignore_structural_model = FALSE (the default) only adjacent constructs are considered. If .PLS_ignore_structural_model = TRUE and .PLS_weight_scheme_inner = "path" a warning is issued and .PLS_ignore_structural_model is changed to FALSE.

Usage

calculateInnerWeightsPLS(
  .S                           = args_default()$.S,
  .W                           = args_default()$.W,
  .csem_model                  = args_default()$.csem_model,
  .PLS_ignore_structural_model = args_default()$.PLS_ignore_structrual_model,
  .PLS_weight_scheme_inner     = args_default()$.PLS_weight_scheme_inner
)

Arguments

.S

The (K x K) empirical indicator correlation matrix.

.W

A (J x K) matrix of weights.

.csem_model

A (possibly incomplete) cSEMModel-list.

.PLS_ignore_structural_model

Logical. Should the structural model be ignored when calculating the inner weights of the PLS-PM algorithm? Defaults to FALSE. Ignored if .approach_weights is not PLS-PM.

.PLS_weight_scheme_inner

Character string. The inner weighting scheme used by PLS-PM. One of: "centroid", "factorial", or "path". Defaults to "path". Ignored if .approach_weight is not PLS-PM.

Value

The (J x J) matrix E of inner weights.

Internal: Calculate prediction metrics

Description

Currently, the following prediction measures are available:

Usage

calculateMAE(resid)

Details

Mean absolute error
Mean absolute percentage error
Mean squared error (MSE)
Root mean squared error
Theil's forecast accuracy
Theil's forecast quality
Bias proportion of MSE
Regression proportion of MSE
Disturbance proportion of MSE

Value

A vector of the prediction measures for the observed variables belonging to endogenous constructs

Model selection criteria

Description

Calculate several information or model selection criteria (MSC) such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC) or the Hannan-Quinn criterion (HQ).

Usage

calculateModelSelectionCriteria(
  .object          = NULL,
  .ms_criterion    = c("all", "aic", "aicc", "aicu", "bic", "fpe", "gm", "hq",
                       "hqc", "mallows_cp"),
  .by_equation     = TRUE, 
  .only_structural = TRUE 
  )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.ms_criterion

Character string. Either a single character string or a vector of character strings naming the model selection criterion to compute. Defaults to "all".

.by_equation

Should the criteria be computed for each structural model equation separately? Defaults to TRUE.

.only_structural

Should the the log-likelihood be based on the structural model? Ignored if .by_equation == TRUE. Defaults to TRUE.

Details

By default, all criteria are calculated (.ms_criterion == "all"). To compute only a subset of the criteria a vector of criteria may be given.

If .by_equation == TRUE (the default), the criteria are computed for each structural equation of the model separately, as suggested by Sharma et al. (2019) in the context of PLS. The relevant formula can be found in Table B1 of the appendix of Sharma et al. (2019).

If .by_equation == FALSE the AIC, the BIC and the HQ for whole model are calculated. All other criteria are currently ignored in this case! The relevant formula are (see, e.g., (Akaike 1974), Schwarz (1978), Hannan and Quinn (1979)):

AIC = - 2*log(L) + 2*k

BIC = - 2*log(L) + k*ln(n)

HQ = - 2*log(L) + 2*k*ln(ln(n))

where log(L) is the log likelihood function of the multivariate normal distribution of the observable variables, k the (total) number of estimated parameters, and n the sample size.

If .only_structural == TRUE, log(L) is based on the structural model only. The argument is ignored if .by_equation == TRUE.

Value

If .by_equation == TRUE a named list of model selection criteria.

References

Akaike H (1974). “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control, 19(6), 716–723.

Hannan EJ, Quinn BG (1979). “The Determination of the order of an autoregression.” Journal of the Royal Statistical Society: Series B (Methodological), 41(2), 190–195.

Schwarz G (1978). “Estimating the Dimension of a Model.” The Annals of Statistics, 6(2), 461–464. doi:10.1214/aos/1176344136.

Sharma P, Sarstedt M, Shmueli G, Kim KH, Thiele KO (2019). “PLS-Based Model Selection: The Role of Alternative Explanations in Information Systems Research.” Journal of the Association for Information Systems, 20(4).

Internal: Calculate the outer weights for PLS-PM

Description

Calculates outer weights in PLS-PM. Currently, the originally suggested mode A and mode B are suggested. Additionally, non-negative least squares (modeBNNLS) and weights of principal component analysis (PCA) are implemented.

Usage

calculateOuterWeightsPLS(
   .data   = args_default()$.data,  
   .S      = args_default()$.S,
   .W      = args_default()$.W,
   .E      = args_default()$.E,
   .modes  = args_default()$.modes
   )

Arguments

.data

A data.frame or a matrix of standardized or unstandardized data (indicators/items/manifest variables). Possible column types or classes of the data provided are: "logical", "numeric" ("double" or "integer"), "factor" ("ordered" and/or "unordered"), "character" (converted to factor), or a mix of several types.

.S

The (K x K) empirical indicator correlation matrix.

.W

A (J x K) matrix of weights.

.E

A (J x J) matrix of inner weights.

.modes

A vector giving the mode for each construct in the form "name" = "mode". Only used internally.

Value

A (J x K) matrix of outer weights.

Internal: Parameter differences across groups

Description

Calculate the difference between one or more parameter estimates across all possible pairs of groups (data sets) in .object.

Usage

calculateParameterDifference(
  .object     = args_default()$.object,
  .model      = args_default()$.model
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.model

A model in lavaan model syntax indicating which parameters (i.e., path (~), loadings (⁠=~⁠), or weights (⁠<~⁠)) should be compared across groups. Defaults to NULL in which case all parameters of the model are compared.

Value

A list of length equal to the number of possible pairs of groups in .object (mathematically, this is n choose 2, i.e., 3 if there are three groups and 6 if there are 4 groups). Each list elements is itself a list of three. The first list element contains the difference between parameter estimates of the structural model, the second list element the difference between estimated loadings, and the third the difference between estimated weights.

Internal: Calculation of the CDF used in Henseler et al. (2009)

Description

Calculates the probability that theta^1 is smaller than or equal to theta^2. See Equation (6) in Sarstedt et al. (2011).

Usage

calculatePr(.resample_centered = NULL, .parameters_to_compare = NULL)

Arguments

.parameters_to_compare

A model in lavaan model syntax indicating which parameters (i.e, path (~), loadings (⁠=~⁠), weights (⁠<~⁠), or correlations (⁠~~⁠)) should be compared across groups. Defaults to NULL in which case all weights, loadings and path coefficients of the originally specified model are compared.

Value

A named vector

References

Relative Goodness of Fit (relative GoF)

Description

Calculate the Relative Goodness of Fit (GoF) proposed by Vinzi et al. (2010). Note that, contrary to what the name suggests, the Relative GoF is not a measure of model fit in the sense of SEM. See e.g. Henseler and Sarstedt (2012) for a discussion.

Usage

calculateRelativeGoF(
 .object              = NULL
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

Value

A single numeric value.

References

Henseler J, Sarstedt M (2012). “Goodness-of-fit Indices for Partial Least Squares Path Modeling.” Computational Statistics, 28(2), 565–580. doi:10.1007/s00180-012-0317-1.

Vinzi VE, Trinchera L, Amato S (2010). “PLS path modeling: From foundations to recent developments and open issues for model assessment and improvement.” In Vinzi VE, Wang H (eds.), Handbook of Partial Least Squares, 47–82. Springer.

Internal: Calculate Reliabilities

Description

Internal: Calculate Reliabilities

Usage

calculateReliabilities(
  .X = args_default()$.X,
  .S = args_default()$.S,
  .W = args_default()$.W,
  .approach_weights = args_default()$.approach_weights,
  .csem_model = args_default()$.csem_model,
  .disattenuate = args_default()$.disattenuate,
  .PLS_approach_cf = args_default()$.PLS_approach_cf,
  .reliabilities = args_default()$.reliabilities
)

Arguments

.X

A matrix of processed data (scaled, cleaned and ordered).

.S

The (K x K) empirical indicator correlation matrix.

.W

A (J x K) matrix of weights.

.approach_weights

Character string. Approach used to obtain composite weights. One of: "PLS-PM", "SUMCORR", "MAXVAR", "SSQCORR", "MINVAR", "GENVAR", "GSCA", "PCA", "unit", "bartlett", or "regression". Defaults to "PLS-PM".

.csem_model

A (possibly incomplete) cSEMModel-list.

.disattenuate

Logical. Should composite/proxy correlations be disattenuated to yield consistent loadings and path estimates if at least one of the construct is modeled as a common factor? Defaults to TRUE.

.PLS_approach_cf

.reliabilities

A character vector of "name" = value pairs, where value is a number between 0 and 1 and "name" a character string of the corresponding construct name, or NULL. Reliabilities may be given for a subset of the constructs. Defaults to NULL in which case reliabilities are estimated by csem(). Currently, only supported for .approach_weights = "PLS-PM".

Calculate variance inflation factors (VIF) for weights obtained by PLS Mode B

Description

Calculate the variance inflation factor (VIF) for weights obtained by PLS-PM's Mode B.

Usage

calculateVIFModeB(.object = NULL)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

Details

Weight estimates obtained by Mode B can suffer from multicollinearity. VIF values are commonly used to assess the severity of multicollinearity.

The function is only applicable to objects of class cSEMResults_default. For other object classes use assess().

Value

A named list of vectors containing the VIF values. Each list name is the name of a construct whose weights were obtained by Mode B. The vectors contain the VIF values obtained from a regression of each explanatory variable of a given construct on the remaining explanatory variables of that construct.

If the weighting approach is not "PLS-PM" or for none of the constructs Mode B is used, the function silently returns NA.

References

There are no references for Rd macro ⁠\insertAllCites⁠ on this help page.

Calculate composite weights using GSCA

Description

Calculate composite weights using generalized structure component analysis (GSCA). The first version of this approach was presented in Hwang and Takane (2004). Since then, several advancements have been proposed. The latest version of GSCA can been found in Hwang and Takane (2014). This is the version cSEMs implementation is based on.

Usage

calculateWeightsGSCA(
  .X                           = args_default()$.X,
  .S                           = args_default()$.S,
  .csem_model                  = args_default()$.csem_model,
  .conv_criterion              = args_default()$.conv_criterion,
  .iter_max                    = args_default()$.iter_max,
  .starting_values             = args_default()$.starting_values,
  .tolerance                   = args_default()$.tolerance
   )

Arguments

.X

A matrix of processed data (scaled, cleaned and ordered).

.S

The (K x K) empirical indicator correlation matrix.

.csem_model

A (possibly incomplete) cSEMModel-list.

.conv_criterion

Character string. The criterion to use for the convergence check. One of: "diff_absolute", "diff_squared", or "diff_relative". Defaults to "diff_absolute".

.iter_max

Integer. The maximum number of iterations allowed. If iter_max = 1 and .approach_weights = "PLS-PM" one-step weights are returned. If the algorithm exceeds the specified number, weights of iteration step .iter_max - 1 will be returned with a warning. Defaults to 100.

.starting_values

A named list of vectors where the list names are the construct names whose indicator weights the user wishes to set. The vectors must be named vectors of "indicator_name" = value pairs, where value is the (scaled or unscaled) starting weight. Defaults to NULL.

.tolerance

Double. The tolerance criterion for convergence. Defaults to 1e-05.

Value

A named list. J stands for the number of constructs and K for the number of indicators.

⁠$W⁠: A (J x K) matrix of estimated weights.
⁠$E⁠: NULL
⁠$Modes⁠: A named vector of Modes used for the outer estimation, for GSCA the mode is automatically set to "gsca".
⁠$Conv_status⁠: The convergence status. TRUE if the algorithm has converged and FALSE otherwise.
⁠$Iterations⁠: The number of iterations required.

References

Hwang H, Takane Y (2004). “Generalized Structured Component Analysis.” Psychometrika, 69(1), 81–99.

Hwang H, Takane Y (2014). Generalized Structured Component Analysis: A Component-Based Approach to Structural Equation Modeling, Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences. Chapman and Hall/CRC.

Calculate weights using GSCAm

Description

Calculate composite weights using generalized structured component analysis with uniqueness terms (GSCAm) proposed by Hwang et al. (2017).

Usage

calculateWeightsGSCAm(
  .X                           = args_default()$.X,
  .csem_model                  = args_default()$.csem_model,
  .conv_criterion              = args_default()$.conv_criterion,
  .iter_max                    = args_default()$.iter_max,
  .starting_values             = args_default()$.starting_values,
  .tolerance                   = args_default()$.tolerance
   )

Arguments

.X

A matrix of processed data (scaled, cleaned and ordered).

.csem_model

A (possibly incomplete) cSEMModel-list.

.conv_criterion

Character string. The criterion to use for the convergence check. One of: "diff_absolute", "diff_squared", or "diff_relative". Defaults to "diff_absolute".

.iter_max

.starting_values

.tolerance

Double. The tolerance criterion for convergence. Defaults to 1e-05.

Details

If there are only constructs modeled as common factors calling csem() with .appraoch_weights = "GSCA" will automatically call calculateWeightsGSCAm() unless .disattenuate = FALSE. GSCAm currently only works for pure common factor models. The reason is that the implementation in cSEM is based on (the appendix) of Hwang et al. (2017). Following the appendix, GSCAm fails if there is at least one construct modeled as a composite because calculating weight estimates with GSCAm leads to a product involving the measurement matrix. This matrix does not have full rank if a construct modeled as a composite is present. The reason is that the measurement matrix has a zero row for every construct which is a pure composite (i.e. all related loadings are zero) and, therefore, leads to a non-invertible matrix when multiplying it with its transposed.

Value

A list with the elements

⁠$W⁠: A (J x K) matrix of estimated weights.
⁠$C⁠: The (J x K) matrix of estimated loadings.
⁠$B⁠: The (J x J) matrix of estimated path coefficients.
⁠$E⁠: NULL
⁠$Modes⁠: A named vector of Modes used for the outer estimation, for GSCA the mode is automatically set to 'gsca'.
⁠$Conv_status⁠: The convergence status. TRUE if the algorithm has converged and FALSE otherwise.
⁠$Iterations⁠: The number of iterations required.

References

Hwang H, Takane Y, Jung K (2017). “Generalized structured component analysis with uniqueness terms for accommodating measurement error.” Frontiers in Psychology, 8(2137), 1–12.

Calculate composite weights using GCCA

Description

Calculates composite weights according to one of the the five criteria "SUMCORR", "MAXVAR", "SSQCORR", "MINVAR", and "GENVAR" suggested by Kettenring (1971).

Usage

calculateWeightsKettenring(
  .S              = args_default()$.S, 
  .csem_model     = args_default()$.csem_model,   
  .approach_gcca  = args_default()$.approach_gcca
  )

Arguments

.S

The (K x K) empirical indicator correlation matrix.

.csem_model

A (possibly incomplete) cSEMModel-list.

.approach_gcca

Character string. The Kettenring approach to use for GCCA. One of "SUMCORR", "MAXVAR", "SSQCORR", "MINVAR" or "GENVAR". Defaults to "SUMCORR".

Value

A named list. J stands for the number of constructs and K for the number of indicators.

⁠$W⁠: A (J x K) matrix of estimated weights.
⁠$E⁠: NULL
⁠$Modes⁠: The GCCA mode used for the estimation.
⁠$Conv_status⁠: The convergence status. TRUE if the algorithm has converged and FALSE otherwise. For .approach_gcca = "MINVAR" or .approach_gcca = "MAXVAR" the convergence status is NULL since both are closed-form estimators.
⁠$Iterations⁠: The number of iterations required. 0 for .approach_gcca = "MINVAR" or .approach_gcca = "MAXVAR"

References

Kettenring JR (1971). “Canonical Analysis of Several Sets of Variables.” Biometrika, 58(3), 433–451.

Calculate composite weights using principal component analysis (PCA)

Description

Calculate weights for each block by extracting the first principal component of the indicator correlation matrix S_jj for each blocks, i.e., weights are the simply the first eigenvector of S_jj.

Usage

calculateWeightsPCA(
 .S                 = args_default()$.S,
 .csem_model        = args_default()$.csem_model
  )

Arguments

.S

The (K x K) empirical indicator correlation matrix.

.csem_model

A (possibly incomplete) cSEMModel-list.

Value

A named list. J stands for the number of constructs and K for the number of indicators.

⁠$W⁠: A (J x K) matrix of estimated weights.
⁠$E⁠: NULL
⁠$Modes⁠: The mode used. Always "PCA".
⁠$Conv_status⁠: NULL as there are no iterations
⁠$Iterations⁠: 0 as there are no iterations

Calculate composite weights using PLS-PM

Description

Calculate composite weights using the partial least squares path modeling (PLS-PM) algorithm (Wold 1975).

Usage

calculateWeightsPLS(
  .data                        = args_default()$.data,
  .S                           = args_default()$.S,
  .csem_model                  = args_default()$.csem_model,
  .conv_criterion              = args_default()$.conv_criterion,
  .iter_max                    = args_default()$.iter_max,
  .PLS_ignore_structural_model = args_default()$.PLS_ignore_structural_model,
  .PLS_modes                   = args_default()$.PLS_modes,
  .PLS_weight_scheme_inner     = args_default()$.PLS_weight_scheme_inner,
  .starting_values             = args_default()$.starting_values,
  .tolerance                   = args_default()$.tolerance
   )

Arguments

.data

.S

The (K x K) empirical indicator correlation matrix.

.csem_model

A (possibly incomplete) cSEMModel-list.

.conv_criterion

Character string. The criterion to use for the convergence check. One of: "diff_absolute", "diff_squared", or "diff_relative". Defaults to "diff_absolute".

.iter_max

.PLS_ignore_structural_model

Logical. Should the structural model be ignored when calculating the inner weights of the PLS-PM algorithm? Defaults to FALSE. Ignored if .approach_weights is not PLS-PM.

.PLS_modes

Either a named list specifying the mode that should be used for each construct in the form "construct_name" = mode, a single character string giving the mode that should be used for all constructs, or NULL. Possible choices for mode are: "modeA", "modeB", "modeBNNLS", "unit", "PCA", a single integer or a vector of fixed weights of the same length as there are indicators for the construct given by "construct_name". If only a single number is provided this is identical to using unit weights, as weights are rescaled such that the related composite has unit variance. Defaults to NULL. If NULL the appropriate mode according to the type of construct used is chosen. Ignored if .approach_weight is not PLS-PM.

.PLS_weight_scheme_inner

Character string. The inner weighting scheme used by PLS-PM. One of: "centroid", "factorial", or "path". Defaults to "path". Ignored if .approach_weight is not PLS-PM.

.starting_values

.tolerance

Double. The tolerance criterion for convergence. Defaults to 1e-05.

Value

A named list. J stands for the number of constructs and K for the number of indicators.

⁠$W⁠: A (J x K) matrix of estimated weights.
⁠$E⁠: A (J x J) matrix of inner weights.
⁠$Modes⁠: A named vector of modes used for the outer estimation.
⁠$Conv_status⁠: The convergence status. TRUE if the algorithm has converged and FALSE otherwise. If one-step weights are used via .iter_max = 1 or a non-iterative procedure was used, the convergence status is set to NULL.
⁠$Iterations⁠: The number of iterations required.

References

Wold H (1975). “Path models with latent variables: The NIPALS approach.” In Blalock HM, Aganbegian A, Borodkin FM, Boudon R, Capecchi V (eds.), Quantitative Sociology, International Perspectives on Mathematical and Statistical Modeling, 307–357. Academic Press, New York.

Calculate composite weights using unit weights

Description

Calculate unit weights for all blocks, i.e., each indicator of a block is equally weighted.

Usage

calculateWeightsUnit(
 .S                 = args_default()$.S,
 .csem_model        = args_default()$.csem_model,
 .starting_values   = args_default()$.starting_values
  )

Arguments

.S

The (K x K) empirical indicator correlation matrix.

.csem_model

A (possibly incomplete) cSEMModel-list.

.starting_values

Value

A named list. J stands for the number of constructs and K for the number of indicators.

⁠$W⁠: A (J x K) matrix of estimated weights.
⁠$E⁠: NULL
⁠$Modes⁠: The mode used. Always "unit".
⁠$Conv_status⁠: NULL as there are no iterations
⁠$Iterations⁠: 0 as there are no iterations

Calculate Cohen's f^2

Description

Calculate the effect size for regression analysis (Cohen 1992) known as Cohen's f^2.

Usage

calculatef2(.object = NULL)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

Value

A matrix with as many rows as there are structural equations. The number of columns is equal to the total number of right-hand side variables of these equations.

References

Cohen J (1992). “A power primer.” Psychological Bulletin, 112(1), 155–159.

Internal: Check convergence

Description

Check convergence of an algorithm using one of the following criteria:

diff_absolute: Checks if the largest elementwise absolute difference between two matrices .W_new and W.old is smaller than a given tolerance.
diff_squared: Checks if the largest elementwise squared difference between two matrices .W_new and W.old is smaller than a given tolerance.
diff_relative: Checks if the largest elementwise absolute rate of change (new - old / new) for two matrices .W_new and W.old is smaller than a given tolerance.

Usage

checkConvergence(
  .W_new          = args_default()$.W_new,
  .W_old          = args_default()$.W_old,
  .conv_criterion = args_default()$.conv_criterion,
  .tolerance      = args_default()$.tolerance
  )

Arguments

.W_new

A (J x K) matrix of weights.

.W_old

A (J x K) matrix of weights.

.conv_criterion

Character string. The criterion to use for the convergence check. One of: "diff_absolute", "diff_squared", or "diff_relative". Defaults to "diff_absolute".

.tolerance

Double. The tolerance criterion for convergence. Defaults to 1e-05.

Value

TRUE if converged; FALSE otherwise.

Internal: Check whether two indicators belong to the same construct.

Description

Checks whether two indicators belong to the same construct.

Usage

check_connection(
  .indicator1,
  .indicator2,
  .model_measurement,
  .model_error_cor
)

Arguments

.indicator1

Character string. The name of the indicator 1.

.indicator2

Character string. The name of the indicator 1.

.model_measurement

Matrix. The measurement matrix indicating the relationship between constructs and indicators.

.model_error_cor

Matrix. The matrix indicates the error correlation structure.

Value

TRUE if both indicators belong to the same construct, FALSE otherwise.

Internal: Classify structural model terms by type

Description

Classify terms of the structural model according to their type.

Usage

classifyConstructs(.terms = args_default()$.terms)

Arguments

.terms

A vector of construct names to be classified.

Details

Classification is required to estimate nonlinear structural relationships. Currently the following terms are supported

Single, e.g., eta1
Quadratic, e.g., eta1.eta1
Cubic, e.g., eta1.eta1.eta1
Two-way interaction, e.g., eta1.eta2
Three-way interaction, e.g., eta1.eta2.eta3
Quadratic and two-way interaction, e.g., eta1.eta1.eta3

Note that exponential terms are modeled as "interactions with itself" as in i.e., eta1^3 = eta1.eta1.eta1.

Value

A named list of length equal to the number of terms provided containing a data frame with columns "Term_class", "Component", "Component_type", and "Component_freq".

Internal: Clean a node name.

Description

Removes a trailing "_temp" from a node name.

Usage

cleanNode(node)

Arguments

node

A node name.

Value

A cleaned node name.

Internal: Convert second order cSEMModel

Description

Uses a cSEMModel containing second order constructs and turns it into an estimable model using either the "2stage" approach or the "mixed" approach.

Usage

convertModel(
 .csem_model        = NULL, 
 .approach_2ndorder = "2stage",
 .stage             = "first"
 )

Arguments

.csem_model

A (possibly incomplete) cSEMModel-list.

.approach_2ndorder

Character string. Approach used for models containing second-order constructs. One of: "2stage", or "mixed". Defaults to "2stage".

.stage

Character string. The stage the model is needed for. One of "first" or "second". Defaults to "first".

Value

A cSEMModel list that may be passed to any function requiring .csem_model as a mandatory argument.

Composite-based SEM

Description

Usage

csem(
.data                  = NULL,
.model                 = NULL,
.approach_2ndorder     = c("2stage", "mixed"),
.approach_cor_robust   = c("none", "mcd", "spearman"),
.approach_nl           = c("sequential", "replace"),
.approach_paths        = c("OLS", "2SLS"),
.approach_weights      = c("PLS-PM", "SUMCORR", "MAXVAR", "SSQCORR", 
                           "MINVAR", "GENVAR","GSCA", "PCA",
                           "unit", "bartlett", "regression"),
.conv_criterion        = c("diff_absolute", "diff_squared", "diff_relative"),
.disattenuate          = TRUE,
.dominant_indicators   = NULL,
.estimate_structural   = TRUE,
.id                    = NULL,
.instruments           = NULL,
.iter_max              = 100,
.normality             = FALSE,
.PLS_approach_cf       = c("dist_squared_euclid", "dist_euclid_weighted", 
                           "fisher_transformed", "mean_arithmetic",
                           "mean_geometric", "mean_harmonic",
                           "geo_of_harmonic"),
.PLS_ignore_structural_model = FALSE,
.PLS_modes                   = NULL,
.PLS_weight_scheme_inner     = c("path", "centroid", "factorial"),
.reliabilities         = NULL,
.starting_values       = NULL,
.resample_method       = c("none", "bootstrap", "jackknife"),
.resample_method2      = c("none", "bootstrap", "jackknife"),
.R                     = 499,
.R2                    = 199,
.handle_inadmissibles  = c("drop", "ignore", "replace"),
.user_funs             = NULL,
.eval_plan             = c("sequential", "multicore", "multisession"),
.seed                  = NULL,
.sign_change_option    = c("none", "individual", "individual_reestimate", 
                           "construct_reestimate"),
.tolerance             = 1e-05
)

Arguments

.data

A data.frame or a matrix of standardized or unstandardized data (indicators/items/manifest variables). Additionally, a list of data sets (data frames or matrices) is accepted in which case estimation is repeated for each data set. Possible column types or classes of the data provided are: "logical", "numeric" ("double" or "integer"), "factor" ("ordered" and/or "unordered"), "character" (will be converted to factor), or a mix of several types.

.model

A model in lavaan model syntax or a cSEMModel list.

.approach_2ndorder

Character string. Approach used for models containing second-order constructs. One of: "2stage", or "mixed". Defaults to "2stage".

.approach_cor_robust

.approach_nl

Character string. Approach used to estimate nonlinear structural relationships. One of: "sequential" or "replace". Defaults to "sequential".

.approach_paths

Character string. Approach used to estimate the structural coefficients. One of: "OLS" or "2SLS". If "2SLS", instruments need to be supplied to .instruments. Defaults to "OLS".

.approach_weights

.conv_criterion

Character string. The criterion to use for the convergence check. One of: "diff_absolute", "diff_squared", or "diff_relative". Defaults to "diff_absolute".

.disattenuate

Logical. Should composite/proxy correlations be disattenuated to yield consistent loadings and path estimates if at least one of the construct is modeled as a common factor? Defaults to TRUE.

.dominant_indicators

A character vector of "construct_name" = "indicator_name" pairs, where "indicator_name" is a character string giving the name of the dominant indicator and "construct_name" a character string of the corresponding construct name. Dominant indicators may be specified for a subset of the constructs. Default to NULL.

.estimate_structural

Logical. Should the structural coefficients be estimated? Defaults to TRUE.

.id

Character string or integer. A character string giving the name or an integer of the position of the column of .data whose levels are used to split .data into groups. Defaults to NULL.

.instruments

A named list of vectors of instruments. The names of the list elements are the names of the dependent (LHS) constructs of the structural equation whose explanatory variables are endogenous. The vectors contain the names of the instruments corresponding to each equation. Note that exogenous variables of a given equation must be supplied as instruments for themselves. Defaults to NULL.

.iter_max

.normality

Logical. Should joint normality of [\eta_{1:p}; \zeta; \epsilon] be assumed in the nonlinear model? See (Dijkstra and Schermelleh-Engel 2014) for details. Defaults to FALSE. Ignored if the model is not nonlinear.

.PLS_approach_cf

.PLS_ignore_structural_model

Logical. Should the structural model be ignored when calculating the inner weights of the PLS-PM algorithm? Defaults to FALSE. Ignored if .approach_weights is not PLS-PM.

.PLS_modes

.PLS_weight_scheme_inner

Character string. The inner weighting scheme used by PLS-PM. One of: "centroid", "factorial", or "path". Defaults to "path". Ignored if .approach_weight is not PLS-PM.

.reliabilities

.starting_values

.resample_method

Character string. The resampling method to use. One of: "none", "bootstrap" or "jackknife". Defaults to "none".

.resample_method2

Character string. The resampling method to use when resampling from a resample. One of: "none", "bootstrap" or "jackknife". For "bootstrap" the number of draws is provided via .R2. Currently, resampling from each resample is only required for the studentized confidence interval ("CI_t_interval") computed by the infer() function. Defaults to "none".

.R

Integer. The number of bootstrap replications. Defaults to 499.

.R2

Integer. The number of bootstrap replications to use when resampling from a resample. Defaults to 199.

.handle_inadmissibles

.user_funs

A function or a (named) list of functions to apply to every resample. The functions must take .object as its first argument (e.g., ⁠myFun <- function(.object, ...) {body-of-the-function}⁠). Function output should preferably be a (named) vector but matrices are also accepted. However, the output will be vectorized (columnwise) in this case. See the examples section for details.

.eval_plan

Character string. The evaluation plan to use. One of "sequential", "multicore", or "multisession". In the two latter cases all available cores will be used. Defaults to "sequential".

.seed

.sign_change_option

Character string. Which sign change option should be used to handle flipping signs when resampling? One of "none","individual", "individual_reestimate", "construct_reestimate". Defaults to "none".

.tolerance

Double. The tolerance criterion for convergence. Defaults to 1e-05.

Details

Estimate linear, nonlinear, hierarchical and multigroup structural equation models using a composite-based approach. In cSEM any method or approach that involves linear compounds (scores/proxies/composites) of observables (indicators/items/manifest variables) is defined as composite-based. See the Get started section of the cSEM website for a general introduction to composite-based SEM and cSEM.

csem() estimates linear, nonlinear, hierarchical or multigroup structural equation models using a composite-based approach.

Data and model:

The .data and .model arguments are required. .data must be given a matrix or a data.frame with column names matching the indicator names used in the model description. Alternatively, a list of data sets (matrices or data frames) may be provided in which case estimation is repeated for each data set. Possible column types/classes of the data provided are: "logical", "numeric" ("double" or "integer"), "factor" ("ordered" and/or "unordered"), "character", or a mix of several types. Character columns will be treated as (unordered) factors.

Depending on the type/class of the indicator data provided cSEM computes the indicator correlation matrix in different ways. See calculateIndicatorCor() for details.

In the current version .data must not contain missing values. Future versions are likely to handle missing values as well.

To provide a model use the lavaan model syntax. Note, however, that cSEM currently only supports the "standard" lavaan model syntax (Types 1, 2, 3, and 7 as described on the help page). Therefore, specifying e.g., a threshold or scaling factors is ignored. Alternatively, a standardized (possibly incomplete) cSEMModel-list may be supplied. See parseModel() for details.

Weights and path coefficients:

By default weights are estimated using the partial least squares path modeling algorithm ("PLS-PM"). A range of alternative weighting algorithms may be supplied to .approach_weights. Currently, the following approaches are implemented

(Default) Partial least squares path modeling ("PLS-PM"). The algorithm can be customized. See calculateWeightsPLS() for details.
Generalized structured component analysis ("GSCA") and generalized structured component analysis with uniqueness terms (GSCAm). The algorithms can be customized. See calculateWeightsGSCA() and calculateWeightsGSCAm() for details. Note that GSCAm is called indirectly when the model contains constructs modeled as common factors only and .disattenuate = TRUE. See below.
Generalized canonical correlation analysis (GCCA), including "SUMCORR", "MAXVAR", "SSQCORR", "MINVAR", "GENVAR".
Principal component analysis ("PCA")
Factor score regression using sum scores ("unit"), regression ("regression") or Bartlett scores ("bartlett")

It is possible to supply starting values for the weighting algorithm via .starting_values. The argument accepts a named list of vectors where the list names are the construct names whose indicator weights the user wishes to set. The vectors must be named vectors of "indicator_name" = value pairs, where value is the starting weight. See the examples section below for details.

Composite-indicator and composite-composite correlations are properly disattenuated by default to yield consistent loadings, construct correlations, and path coefficients if any of the concepts are modeled as a common factor.

For PLS-PM disattenuation is done using PLSc (Dijkstra and Henseler 2015). For GSCA disattenuation is done implicitly by using GSCAm (Hwang et al. 2017). Weights obtained by GCCA, unit, regression, bartlett or PCA are disattenuated using Croon's approach (Croon 2002). Disattenuation my be suppressed by setting .disattenuate = FALSE. Note, however, that quantities in this case are inconsistent estimates for their construct level counterparts if any of the constructs in the structural model are modeled as a common factor!

By default path coefficients are estimated using ordinary least squares (.approach_path = "OLS"). For linear models, two-stage least squares ("2SLS") is available, however, only if instruments are internal, i.e., part of the structural model. Future versions will add support for external instruments if possible. Instruments must be supplied to .instruments as a named list where the names of the list elements are the names of the dependent constructs of the structural equations whose explanatory variables are believed to be endogenous. The list consists of vectors of names of instruments corresponding to each equation. Note that exogenous variables of a given equation must be supplied as instruments for themselves.

If reliabilities are known they can be supplied as "name" = value pairs to .reliabilities, where value is a numeric value between 0 and 1. Currently, only supported for "PLS-PM".

Nonlinear models:

If the model contains nonlinear terms csem() estimates a polynomial structural equation model using a non-iterative method of moments approach described in Dijkstra and Schermelleh-Engel (2014). Nonlinear terms include interactions and exponential terms. The latter is described in model syntax as an "interaction with itself", e.g., xi^3 = xi.xi.xi. Currently only exponential terms up to a power of three (e.g., three-way interactions or cubic terms) are allowed:

- Single, e.g., eta1
- Quadratic, e.g., eta1.eta1
- Cubic, e.g., eta1.eta1.eta1
- Two-way interaction, e.g., eta1.eta2
- Three-way interaction, e.g., eta1.eta2.eta3
- Quadratic and two-way interaction, e.g., eta1.eta1.eta3

The current version of the package allows two kinds of estimation: estimation of the reduced form equation (.approach_nl = "replace") and sequential estimation (.approach_nl = "sequential", the default). The latter does not allow for multivariate normality of all exogenous variables, i.e., the latent variables and the error terms.

Distributional assumptions are kept to a minimum (an i.i.d. sample from a population with finite moments for the relevant order); for higher order models, that go beyond interaction, we work in this version with the assumption that as far as the relevant moments are concerned certain combinations of measurement errors behave as if they were Gaussian. For details see: Dijkstra and Schermelleh-Engel (2014).

Models containing second-order constructs

Second-order constructs are specified using the operators ⁠=~⁠ and ⁠<~⁠. These operators are usually used with indicators on their right-hand side. For second-order constructs the right-hand side variables are constructs instead. If c1, and c2 are constructs forming or measuring a higher-order construct, a model would look like this:

my_model <- "
# Structural model
SAT  ~ QUAL
VAL  ~ SAT

# Measurement/composite model
QUAL =~ qual1 + qual2
SAT  =~ sat1 + sat2

c1 =~ x11 + x12
c2 =~ x21 + x22

# Second-order construct (in this case a second-order composite build by common
# factors)
VAL <~ c1 + c2
"

Currently, two approaches are explicitly implemented:

(Default) "2stage". The (disjoint) two-stage approach as proposed by Agarwal and Karahanna (2000). Note that by default a correction for attenuation is applied if common factors are involved in modeling second-order constructs. For instance, the three-stage approach proposed by Van Riel et al. (2017) is applied in case of a second-order construct specified as a composite of common factors. On the other hand, if no common factors are involved the two-stage approach is applied as proposed by Schuberth et al. (2020).
"mixed". The mixed repeated indicators/two-stage approach as proposed by Ringle et al. (2012).

The repeated indicators approach as proposed by Joereskog and Wold (1982) and the extension proposed by Becker et al. (2012) are not directly implemented as they simply require a respecification of the model. In the above example the repeated indicators approach would require to change the model and to append the repeated indicators to the data supplied to .data. Note that the indicators need to be renamed in this case as csem() does not allow for one indicator to be attached to multiple constructs.

my_model <- "
# Structural model
SAT  ~ QUAL
VAL  ~ SAT

VAL ~ c1 + c2

# Measurement/composite model
QUAL =~ qual1 + qual2
SAT  =~ sat1 + sat2
VAL  =~ x11_temp + x12_temp + x21_temp + x22_temp

c1 =~ x11 + x12
c2 =~ x21 + x22
"

According to the extended approach indirect effects of QUAL on VAL via c1 and c2 would have to be specified as well.

Multigroup analysis

To perform a multigroup analysis provide either a list of data sets or one data set containing a group-identifier-column whose column name must be provided to .id. Values of this column are taken as levels of a factor and are interpreted as group identifiers. csem() will split the data by levels of that column and run the estimation for each level separately. Note, the more levels the group-identifier-column has, the more estimation runs are required. This can considerably slow down estimation, especially if resampling is requested. For the latter it will generally be faster to use .eval_plan = "multisession" or .eval_plan = "multicore".

Inference:

Inference is done via resampling. See resamplecSEMResults() and infer() for details.

Value

An object of class cSEMResults with methods for all postestimation generics. Technically, a call to csem() results in an object with at least two class attributes. The first class attribute is always cSEMResults. The second is one of cSEMResults_default, cSEMResults_multi, or cSEMResults_2ndorder and depends on the estimated model and/or the type of data provided to the .model and .data arguments. The third class attribute cSEMResults_resampled is only added if resampling was conducted. For a details see the cSEMResults helpfile .

Postestimation

assess(): Assess results using common quality criteria, e.g., reliability, fit measures, HTMT, R2 etc.
infer(): Calculate common inferential quantities, e.g., standard errors, confidence intervals.
plot(): Creates a plot of the model. For the help file see plot.cSEMResults_default().
predict(): Predict endogenous indicator scores and compute common prediction metrics.
summarize(): Summarize the results. Mainly called for its side-effect the print method.
verify(): Verify/Check admissibility of the estimates.

Tests are performed using the test-family of functions. Currently the following tests are implemented:

testCVPAT(): Cross-validated predictive ability test proposed by Liengaard et al. (2021)
testOMF(): Bootstrap-based test for overall model fit based on Beran and Srivastava (1985).
testMICOM(): Permutation-based test for measurement invariance of composites proposed by Henseler et al. (2016).
testMGD(): Several (mainly) permutation-based tests for multi-group comparisons.
testHausman(): Regression-based Hausman test to test for endogeneity.

Other miscellaneous postestimation functions belong do the do-family of functions. Currently three do functions are implemented:

doIPMA(): Performs an importance-performance matrix analysis (IPMA).
doNonlinearEffectsAnalysis(): Perform a nonlinear effects analysis as described in e.g., Spiller et al. (2013)
doRedundancyAnalysis(): Perform a redundancy analysis (RA) as proposed by Hair et al. (2016) with reference to Chin (1998)

References

Agarwal R, Karahanna E (2000). “Time Flies When You're Having Fun: Cognitive Absorption and Beliefs about Information Technology Usage.” MIS Quarterly, 24(4), 665.

Becker J, Klein K, Wetzels M (2012). “Hierarchical Latent Variable Models in PLS-SEM: Guidelines for Using Reflective-Formative Type Models.” Long Range Planning, 45(5-6), 359–394. doi:10.1016/j.lrp.2012.10.001.

Beran R, Srivastava MS (1985). “Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix.” The Annals of Statistics, 13(1), 95–115. doi:10.1214/aos/1176346579.

Chin WW (1998). “Modern Methods for Business Research.” In Marcoulides GA (ed.), chapter The Partial Least Squares Approach to Structural Equation Modeling, 295–358. Mahwah, NJ: Lawrence Erlbaum.

Croon MA (2002). “Using predicted latent scores in general latent structure models.” In Marcoulides GA, Moustaki I (eds.), Latent Variable and Latent Structure Models, chapter 10, 195–224. Lawrence Erlbaum. ISBN 080584046X, Pagination: 288.

Dijkstra TK, Henseler J (2015). “Consistent and Asymptotically Normal PLS Estimators for Linear Structural Equations.” Computational Statistics & Data Analysis, 81, 10–23.

Dijkstra TK, Schermelleh-Engel K (2014). “Consistent Partial Least Squares For Nonlinear Structural Equation Models.” Psychometrika, 79(4), 585–604.

Hair JF, Hult GTM, Ringle C, Sarstedt M (2016). A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM). Sage publications.

Henseler J, Ringle CM, Sarstedt M (2016). “Testing Measurement Invariance of Composites Using Partial Least Squares.” International Marketing Review, 33(3), 405–431. doi:10.1108/imr-09-2014-0304.

Hwang H, Takane Y, Jung K (2017). “Generalized structured component analysis with uniqueness terms for accommodating measurement error.” Frontiers in Psychology, 8(2137), 1–12.

Joereskog KG, Wold HO (1982). Systems under Indirect Observation: Causality, Structure, Prediction - Part II, volume 139. North Holland.

Liengaard BD, Sharma PN, Hult GTM, Jensen MB, Sarstedt M, Hair JF, Ringle CM (2021). “Prediction: Coveted, Yet Forsaken? Introducing a Cross-Validated Predictive Ability Test in Partial Least Squares Path Modeling.” Decision Sciences, 52(2), 362–392.

Ringle CM, Sarstedt M, Straub D (2012). “A Critical Look at the Use of PLS-SEM in MIS Quarterly.” MIS Quarterly, 36(1), iii–xiv.

Schuberth F, Rademaker ME, Henseler J (2020). “Estimating and assessing second-order constructs using PLS-PM: the case of composites of composites.” Industrial Management & Data Systems, 120(12), 2211-2241. doi:10.1108/imds-12-2019-0642.

Spiller SA, Fitzsimons GJ, Lynch JG, Mcclelland GH (2013). “Spotlights, Floodlights, and the Magic Number Zero: Simple Effects Tests in Moderated Regression.” Journal of Marketing Research, 50(2), 277–288. doi:10.1509/jmr.12.0420.

Van Riel ACR, Henseler J, Kemeny I, Sasovova Z (2017). “Estimating hierarchical constructs using Partial Least Squares: The case of second order composites of factors.” Industrial Management & Data Systems, 117(3), 459–477. doi:10.1108/IMDS-07-2016-0286.

Examples

# ===========================================================================
# Basic usage
# ===========================================================================
### Linear model ------------------------------------------------------------
# Most basic usage requires a dataset and a model. We use the 
#  `threecommonfactors` dataset. 

## Take a look at the dataset
#?threecommonfactors

## Specify the (correct) model
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

## Estimate
res <- csem(threecommonfactors, model)

## Postestimation
verify(res)
summarize(res)  
assess(res)

# Notes: 
#   1. By default no inferential quantities (e.g. Std. errors, p-values, or
#      confidence intervals) are calculated. Use resampling to obtain
#      inferential quantities. See "Resampling" in the "Extended usage"
#      section below.
#   2. `summarize()` prints the full output by default. For a more condensed
#       output use:
print(summarize(res), .full_output = FALSE)

## Dealing with endogeneity -------------------------------------------------

# See: ?testHausman()

### Models containing second constructs--------------------------------------
## Take a look at the dataset
#?dgp_2ndorder_cf_of_c

model <- "
# Path model / Regressions 
c4   ~ eta1
eta2 ~ eta1 + c4

# Reflective measurement model
c1   <~ y11 + y12 
c2   <~ y21 + y22 + y23 + y24
c3   <~ y31 + y32 + y33 + y34 + y35 + y36 + y37 + y38
eta1 =~ y41 + y42 + y43
eta2 =~ y51 + y52 + y53

# Composite model (second order)
c4   =~ c1 + c2 + c3
"

res_2stage <- csem(dgp_2ndorder_cf_of_c, model, .approach_2ndorder = "2stage")
res_mixed  <- csem(dgp_2ndorder_cf_of_c, model, .approach_2ndorder = "mixed")

# The standard repeated indicators approach is done by 1.) respecifying the model
# and 2.) adding the repeated indicators to the data set

# 1.) Respecify the model
model_RI <- "
# Path model / Regressions 
c4   ~ eta1
eta2 ~ eta1 + c4
c4   ~ c1 + c2 + c3

# Reflective measurement model
c1   <~ y11 + y12 
c2   <~ y21 + y22 + y23 + y24
c3   <~ y31 + y32 + y33 + y34 + y35 + y36 + y37 + y38
eta1 =~ y41 + y42 + y43
eta2 =~ y51 + y52 + y53

# c4 is a common factor measured by composites
c4 =~ y11_temp + y12_temp + y21_temp + y22_temp + y23_temp + y24_temp +
      y31_temp + y32_temp + y33_temp + y34_temp + y35_temp + y36_temp + 
      y37_temp + y38_temp
"

# 2.) Update data set
data_RI <- dgp_2ndorder_cf_of_c
coln <- c(colnames(data_RI), paste0(colnames(data_RI), "_temp"))
data_RI <- data_RI[, c(1:ncol(data_RI), 1:ncol(data_RI))]
colnames(data_RI) <- coln

# Estimate
res_RI <- csem(data_RI, model_RI)
summarize(res_RI)

### Multigroup analysis -----------------------------------------------------

# See: ?testMGD()

# ===========================================================================
# Extended usage
# ===========================================================================
# `csem()` provides defaults for all arguments except `.data` and `.model`.
#   Below some common options/tasks that users are likely to be interested in.
#   We use the threecommonfactors data set again:

model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

### PLS vs PLSc and disattenuation
# In the model all concepts are modeled as common factors. If 
#   .approach_weights = "PLS-PM", csem() uses PLSc to disattenuate composite-indicator 
#   and composite-composite correlations.
res_plsc <- csem(threecommonfactors, model, .approach_weights = "PLS-PM")
res$Information$Model$construct_type # all common factors

# To obtain "original" (inconsistent) PLS estimates use `.disattenuate = FALSE`
res_pls <- csem(threecommonfactors, model, 
                .approach_weights = "PLS-PM",
                .disattenuate = FALSE
                )

s_plsc <- summarize(res_plsc)
s_pls  <- summarize(res_pls)

# Compare
data.frame(
  "Path"      = s_plsc$Estimates$Path_estimates$Name,
  "Pop_value" = c(0.6, 0.4, 0.35), # see ?threecommonfactors
  "PLSc"      = s_plsc$Estimates$Path_estimates$Estimate,
  "PLS"       = s_pls$Estimates$Path_estimates$Estimate
  )

### Resampling --------------------------------------------------------------
## Not run: 
## Basic resampling
res_boot <- csem(threecommonfactors, model, .resample_method = "bootstrap")
res_jack <- csem(threecommonfactors, model, .resample_method = "jackknife")

# See ?resamplecSEMResults for more examples

### Choosing a different weightning scheme ----------------------------------

res_gscam  <- csem(threecommonfactors, model, .approach_weights = "GSCA")
res_gsca   <- csem(threecommonfactors, model, 
                   .approach_weights = "GSCA",
                   .disattenuate = FALSE
)

s_gscam <- summarize(res_gscam)
s_gsca  <- summarize(res_gsca)

# Compare
data.frame(
  "Path"      = s_gscam$Estimates$Path_estimates$Name,
  "Pop_value" = c(0.6, 0.4, 0.35), # see ?threecommonfactors
  "GSCAm"      = s_gscam$Estimates$Path_estimates$Estimate,
  "GSCA"       = s_gsca$Estimates$Path_estimates$Estimate
)
## End(Not run)
### Fine-tuning a weighting scheme ------------------------------------------
## Setting starting values

sv <- list("eta1" = c("y12" = 10, "y13" = 4, "y11" = 1))
res <- csem(threecommonfactors, model, .starting_values = sv)

## Choosing a different inner weighting scheme 
#?args_csem_dotdotdot

res <- csem(threecommonfactors, model, .PLS_weight_scheme_inner = "factorial",
            .PLS_ignore_structural_model = TRUE)


## Choosing different modes for PLS
# By default, concepts modeled as common factors uses PLS Mode A weights.
modes <- list("eta1" = "unit", "eta2" = "modeB", "eta3" = "unit")
res   <- csem(threecommonfactors, model, .PLS_modes = modes)
summarize(res)

cSEMArguments

Description

An alphabetical list of all arguments used by functions of the cSEM package including their description and defaults. Mainly used for internal purposes (parameter inheritance). To list all arguments and their defaults, use args_default(). To list all arguments and their possible choices, use args_default(.choices = TRUE).

Arguments

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.absolute

Logical. Should the absolute HTMT values be returned? Defaults to TRUE .

.approach_gcca

Character string. The Kettenring approach to use for GCCA. One of "SUMCORR", "MAXVAR", "SSQCORR", "MINVAR" or "GENVAR". Defaults to "SUMCORR".

.approach_2ndorder

Character string. Approach used for models containing second-order constructs. One of: "2stage", or "mixed". Defaults to "2stage".

.approach_alpha_adjust

Character string. Approach used to adjust the significance level to accommodate multiple testing. One of "none" or "bonferroni". Defaults to "none".

.approach_cor_robust

.approach_mgd

Character string or a vector of character strings. Approach used for the multi-group comparison. One of: "all", "Klesel", "Chin", "Sarstedt", "Keil, "Nitzl", "Henseler", "CI_para", or "CI_overlap". Default to "all" in which case all approaches are computed (if possible).

.approach_nl

Character string. Approach used to estimate nonlinear structural relationships. One of: "sequential" or "replace". Defaults to "sequential".

.approach_predict

Character string. Which approach should be used to predictions? One of "earliest" and "direct". If "earliest" predictions for indicators associated to endogenous constructs are performed using only indicators associated to exogenous constructs. If "direct", predictions for indicators associated to endogenous constructs are based on indicators associated to their direct antecedents. Defaults to "earliest".

.approach_p_adjust

Character string or a vector of character strings. Approach used to adjust the p-value for multiple testing. See the methods argument of stats::p.adjust() for a list of choices and their description. Defaults to "none".

.approach_paths

Character string. Approach used to estimate the structural coefficients. One of: "OLS" or "2SLS". If "2SLS", instruments need to be supplied to .instruments. Defaults to "OLS".

.approach_score_benchmark

Character string. How should the aggregation of the estimates of the truncated normal distribution be done for the benchmark predictions? Ignored if not OrdPLS or OrdPLSc is used to obtain benchmark predictions. One of "mean", "median", "mode" or "round". If "round", the benchmark predictions are obtained using the traditional prediction algorithm for PLS-PM which are rounded for categorical indicators. If "mean", the mean of the estimated endogenous indicators is calculated. If "median", the mean of the estimated endogenous indicators is calculated. If "mode", the maximum empirical density on the intervals defined by the thresholds is used. If .treat_as_continuous = TRUE or if all indicators are on a continuous scale, .approach_score_benchmark is ignored. Defaults to "round".

.approach_score_target

Character string. How should the aggregation of the estimates of the truncated normal distribution for the predictions using OrdPLS/OrdPLSc be done? One of "mean", "median" or "mode". If "mean", the mean of the estimated endogenous indicators is calculated. If "median", the mean of the estimated endogenous indicators is calculated. If "mode", the maximum empirical density on the intervals defined by the thresholds is used. Defaults to "mean".

.approach_weights

.args_used

A list of function argument names whose value was modified by the user.

.attrbutes

Character string. Variables used as attributes in IPMA.

.benchmark

Character string. The procedure to obtain benchmark predictions. One of "lm", "unit", "PLS-PM", "GSCA", "PCA", "MAXVAR", or "NA". Default to "lm".

.bias_corrected

Logical. Should the standard and the tStat confidence interval be bias-corrected using the bootstrapped bias estimate? If TRUE the confidence interval for some estimated parameter theta is centered at ⁠2*theta - theta*_hat⁠, where ⁠theta*_hat⁠ is the average over all .R bootstrap estimates of theta. Defaults to TRUE

.by_equation

Should the criteria be computed for each structural model equation separately? Defaults to TRUE.

.C

A (J x J) composite variance-covariance matrix.

.check_errors

Logical. Should the model to parse be checked for correctness in a sense that all necessary components to estimate the model are given? Defaults to TRUE.

.choices

Logical. Should candidate values for the arguments be returned? Defaults to FALSE.

.ci

A vector of character strings naming the confidence interval to compute. For possible choices see infer().

.ci_colnames

Internal argument used by several print helper functions.

.closed_form_ci

Logical. Should a closed-form confidence interval be computed? Defaults to FALSE.

.conv_criterion

Character string. The criterion to use for the convergence check. One of: "diff_absolute", "diff_squared", or "diff_relative". Defaults to "diff_absolute".

.csem_model

A (possibly incomplete) cSEMModel-list.

.csem_resample

A list resulting from a call to resamplecSEMResults().

.cv_folds

Integer. The number of cross-validation folds to use. Setting .cv_folds to N (the number of observations) produces leave-one-out cross-validation samples. Defaults to 10.

.data

.dependent

Character string. The name of the dependent variable.

.disattenuate

Logical. Should composite/proxy correlations be disattenuated to yield consistent loadings and path estimates if at least one of the construct is modeled as a common factor? Defaults to TRUE.

.dist

Character string. The distribution to use for the critical value. One of "t" for Student's t-distribution or "z" for the standard normal distribution. Defaults to "z".

.distance

Character string. A distance measure. One of: "geodesic" or "squared_euclidean". Defaults to "geodesic".

.df

Character string. The method for obtaining the degrees of freedom. Choices are "type1" and "type2". Defaults to "type1" .

.dominant_indicators

.E

A (J x J) matrix of inner weights.

.effect

Internal argument used by helper printEffects().

.estimate_structural

Logical. Should the structural coefficients be estimated? Defaults to TRUE.

.eval_plan

Character string. The evaluation plan to use. One of "sequential", "multicore", or "multisession". In the two latter cases all available cores will be used. Defaults to "sequential".

.filename

Character string. The file name.

.first_resample

A list containing the .R resamples based on the original data obtained by resamplecSEMResults().

.fit_measures

Logical. (EXPERIMENTAL) Should additional fit measures be included? Defaults to FALSE.

.force

Logical. Should .object be resampled even if it contains resamples already?. Defaults to FALSE.

.full_output

Logical. Should the full output of summarize be printed. Defaults to TRUE.

.graph_attrs

Character string. Additional attributes that should be passed to the DiagrammeR syntax, e.g., c("rankdir=LR", "ranksep=1.0"). Defaults to c("rankdir=LR").

.H

The (N x J) matrix of construct scores.

.handle_inadmissibles

.id

Character string or integer. A character string giving the name or an integer of the position of the column of .data whose levels are used to split .data into groups. Defaults to NULL.

.inference

Logical. Should critical values be computed? Defaults to FALSE.

.independent

Character string. The name of the independent variable.

.instruments

.iter_max

.level

Character. Used in plot.cSEMIPMA to indicate whether IPMA should be done for constructs or indicators.

.matrix1

A matrix to compare.

.matrix2

A matrix to compare.

.matrices

A list of at least two matrices.

.metrics

Character string or a vector of character strings. Which prediction metrics should be displayed? One of: "MAE", "RMSE", "Q2", "MER", "MAPE, "MSE2", "U1", "U2", "UM", "UR", or "UD". Default to c("MAE", "RMSE", "Q2").

.model

A model in lavaan model syntax or a cSEMModel list.

.moderator

Character string. The name of the moderator variable.

.modes

A vector giving the mode for each construct in the form "name" = "mode". Only used internally.

.ms_criterion

Character string. Either a single character string or a vector of character strings naming the model selection criterion to compute. Defaults to "all".

.n

Integer. The number of observations of the original data.

.n_steps

Integer. A value giving the number of steps (the spotlights, i.e., values of .moderator in surface analysis or floodlight analysis) between the minimum and maximum value of the moderator. Defaults to 100.

.normality

.nr_comparisons

Integer. The number of comparisons. Defaults to NULL.

.null_model

Logical. Should the degrees of freedom for the null model be computed? Defaults to FALSE.

.object

An R object of class cSEMResults resulting from a call to csem().

.object1

An R object of class cSEMResults resulting from a call to csem().

.object2

An R object of class cSEMResults resulting from a call to csem().

.only_common_factors

.only_structural

Should the the log-likelihood be based on the structural model? Ignored if .by_equation == TRUE. Defaults to TRUE.

.original_arguments

The list of arguments used within csem().

.output_type

Character string. The type of output to return. One of "complete" or "structured". See the Value section for details. Defaults to "complete".

.P

A (J x J) construct variance-covariance matrix (possibly disattenuated).

.parameters_to_compare

.path

Character string. Path of the directory to save the file to. Defaults to NULL.

.path_coefficients

List. A list that contains the resampled and the original path coefficient estimates. Typically a part of a cSEMResults_resampled object. Defaults to NULL.

.PLS_approach_cf

.plot_correlations

Character string. Specify which correlations should be plotted, i.e., between the exogenous constructs (exo), between the exogenous constructs and the indicators (all), or not at all (none). Defaults to exo.

.plot_labels

Logical. Whether to display edge labels. Defaults to TRUE.

.plot_package

Character string. Indicates which packages should be used for plotting.

.plot_significances

Logical. Should p-values in the form of stars be plotted? Defaults to TRUE.

.plot_structural_model_only

Logical. Should only the structural model, i.e., the constructs and their relationships be plotted? Defaults to FALSE.

.plot_type

Character string. Indicates the type of plot that is produced.

.PLS_ignore_structural_model

Logical. Should the structural model be ignored when calculating the inner weights of the PLS-PM algorithm? Defaults to FALSE. Ignored if .approach_weights is not PLS-PM.

.PLS_modes

.PLS_weight_scheme_inner

Character string. The inner weighting scheme used by PLS-PM. One of: "centroid", "factorial", or "path". Defaults to "path". Ignored if .approach_weight is not PLS-PM.

.probs

A vector of probabilities.

.postestimation_object

An object resulting from a call to one of cSEM's postestimation functions (e.g. summarize()).

.quality_criterion

.quantity

Character string. Which statistic should be returned? One of "all", "mean", "sd", "bias", "CI_standard_z", "CI_standard_t", "CI_percentile", "CI_basic", "CI_bc", "CI_bca", "CI_t_interval" Defaults to "all" in which case all quantities that do not require additional resampling are returned, i.e., all quantities but "CI_bca", "CI_t_interval".

.Q

A vector of composite-construct correlations with element names equal to the names of the J construct names used in the measurement model. Note Q^2 is also called the reliability coefficient.

.reliabilities

.resample_method

Character string. The resampling method to use. One of: "none", "bootstrap" or "jackknife". Defaults to "none".

.resample_method2

`.resample_object`

An R object of class cSEMResults_resampled obtained from resamplecSEMResults() or by setting .resample_method = "bootstrap" or "jackknife" when calling csem().

.resample_sarstedt

A matrix containing the parameter estimates that could potentially be compared and an id column indicating the group adherence of each row.

.r

Integer. The number of repetitions to use. Defaults to 1.

.R

Integer. The number of bootstrap replications. Defaults to 499.

.R2

Integer. The number of bootstrap replications to use when resampling from a resample. Defaults to 199.

.R_bootstrap

Integer. The number of bootstrap runs. Ignored if .object contains resamples. Defaults to 499

.R_permutation

Integer. The number of permutations. Defaults to 499

.S

The (K x K) empirical indicator correlation matrix.

.saturated

Logical. Should a saturated structural model be used? Defaults to FALSE.

.second_resample

A list containing .R2 resamples for each of the .R resamples of the first run.

.seed

.sign_change_option

.sim_points

Integer. How many samples from the truncated normal distribution should be simulated to estimate the exogenous construct scores? Defaults to "100".

.stage

Character string. The stage the model is needed for. One of "first" or "second". Defaults to "first".

.standardized

Logical. Should standardized scores be returned? Defaults to TRUE.

.starting_values

.steps_mod

A numeric vector. Steps used for the moderator variable in calculating the simple effects of an independent variable on the dependent variable. Defaults to NULL.

.terms

A vector of construct names to be classified.

.test_data

A matrix of test data with the same column names as the training data.

.testtype

Character string. One of "twosided" (H1: The models do not perform equally in predicting indicators belonging to endogenous constructs)" and onesided" (H1: Model 1 performs better in predicting indicators belonging

.title

Character string. Title of an object. Defaults to "".

.tolerance

Double. The tolerance criterion for convergence. Defaults to 1e-05.

.treat_as_continuous

Logical. Should the indicators for the benchmark predictions be treated as continuous? If TRUE all indicators are treated as continuous and PLS-PM/PLSc is applied. If FALSE OrdPLS/OrdPLSc is applied. Defaults to TRUE.

.type_gfi

.type_ci

Character string. Which confidence interval should be calculated? For possible choices, see the .quantity argument of the infer() function. Only used if .approch_mgd is one of "CI_para" or "CI_overlap". Ignored otherwise. Defaults to "CI_percentile".

.type_htmt

Character string indicating the type of HTMT that should be calculated, i.e., the original HTMT ("htmt") or the HTMT2 ("htmt2"). Defaults to "htmt"

.type_vcv

Character string. Which model-implied correlation matrix should be calculated? One of "indicator" or "construct". Defaults to "indicator".

.verbose

Logical. Should information (e.g., progress bar) be printed to the console? Defaults to TRUE.

.user_funs

.value_independent

Integer. Only required for floodlight analysis; The value of the independent variable in case that it appears as a higher-order term.

.values_moderator

A numeric vector. The values of the moderator in a the simple effects analysis. Typically these are difference from the mean (=0) measured in standard deviations. Defaults to c(-2, -1, 0, 1, 2).

.vcv_asymptotic

Logical. Should the asymptotic variance-covariance matrix be used, i.e., VCV(b0) - VCV(b1)= VCV(b1-b0), or should VCV(b1-b0) be computed directly? Defaults to FALSE.

.vector1

A vector of numeric values.

.vector2

A vector of numeric values.

.W

A (J x K) matrix of weights.

.what

Internal argument used by several print helper functions.

.W_new

A (J x K) matrix of weights.

.W_old

A (J x K) matrix of weights.

.weighted

Logical. Should estimation be based on a score that uses the weights of the weight approach used to obtain .object?. Defaults to FALSE.

.X

A matrix of processed data (scaled, cleaned and ordered).

.X_cleaned

A data.frame of processed data (cleaned and ordered). Note: X_cleaned may not be scaled!

cSEMModel

Description

cSEMModel

Details

A standardized list containing model-related information. To convert a a model written in lavaan model syntax to a cSEMModel list use parseModel().

Value

An object of class cSEMModel is a standardized list containing the following components. J stands for the number of constructs and K for the number of indicators.

⁠$structural⁠: A matrix mimicking the structural relationship between constructs. If constructs are only linearly related, structural is of dimension (J x J) with row- and column names equal to the construct names. If the structural model contains nonlinear relationships structural is (J x (J + J*)) where J* is the number of nonlinear terms. Rows are ordered such that exogenous constructs are always first, followed by constructs that only depend on exogenous constructs and/or previously ordered constructs.
⁠$measurement⁠: A (J x K) matrix mimicking the measurement/composite relationship between constructs and their related indicators. Rows are in the same order as the matrix ⁠$structural⁠ with row names equal to the construct names. The order of the columns is such that ⁠$measurement⁠ forms a block diagonal matrix.
⁠$error_cor⁠: A (K x K) matrix mimicking the measurement error correlation relationship. The row and column order is identical to the column order of ⁠$measurement⁠.
⁠$cor_specified⁠: A matrix indicating the correlation relationships between any variables of the model as specified by the user. Mainly for internal purposes. Note that ⁠$cor_specified⁠ may also contain inadmissible correlations such as a correlation between measurement errors indicators and constructs.
⁠$construct_type⁠: A named vector containing the names of each construct and their respective type ("Common factor" or "Composite").
⁠$construct_order⁠: A named vector containing the names of each construct and their respective order ("First order" or "Second order").
⁠$model_type⁠: The type of model ("Linear" or "Nonlinear").
⁠$instruments⁠: Only if instruments are supplied: a list of structural equations relating endogenous RHS variables to instruments.
⁠$indicators⁠: The names of the indicators (i.e., observed variables and/or first-order constructs)
⁠$cons_exo⁠: The names of the exogenous constructs of the structural model (i.e., variables that do not appear on the LHS of any structural equation)
⁠$cons_endo⁠: The names of the endogenous constructs of the structural model (i.e., variables that appear on the LHS of at least one structural equation)
⁠$vars_2nd⁠: The names of the constructs modeled as second orders.
⁠$vars_attached_to_2nd⁠: The names of the constructs forming or building a second order construct.
⁠$vars_not_attached_to_2nd⁠: The names of the constructs not forming or building a second order construct.

It is possible to supply an incomplete list to parseModel(), resulting in an incomplete cSEMModel list which can be passed to all functions that require .csem_model as a mandatory argument. Currently, only the structural and the measurement matrix are required. However, specifying an incomplete cSEMModel list may lead to unexpected behavior and errors. Use with care.

cSEMResults

Description

A call to csem() results in an object with at least two class attributes. The first class attribute is always cSEMResults no matter the type of data or model provided. The second is one of cSEMResults_default, cSEMResults_multi, or cSEMResults_2ndorder and depends on the estimated model and/or the type of data provided to the .model and .data arguments of csem(). The third class attribute cSEMResults_resampled is only added if resampling was conducted.

Details

Depending on the type of data and/or model provided three different output types exists.

_default

This will be the structure for the vast majority of applications. If the data is a single matrix or data.frame with no id-column, the result is a list with elements:

⁠$Estimates⁠: A list containing a list of estimated quantities.
⁠$Information⁠: A list containing a list of additional information.

The resulting object has classes cSEMResults and cSEMResults_default.

_multi

If the data provided is a single matrix or data.frame containing an id-column to split the data by G group levels or if a list of G datasets is provided, the resulting object is a list of G lists, where G is equal to the number of groups or the number of datasets in the list of datasets provided. Each of the G list elements is itself a cSEMResults_default object. Hence its structure is identical to the structure described in ⁠_default⁠.

The resulting object has classes cSEMResults and cSEMResults_multi.

_2ndorder

A special output is generated if the model to estimate contains hierarchical constructs and the "2stage" or "mixed" approach is used to estimate the model. In this case the resulting object is a list containing two elements First_stage and Second_stage.

Each list element is itself a cSEMResults_default object. Hence its structure is identical to the structure described in ⁠_default⁠.

If .resample_method = "bootstrap" or .resample_method = "jackknife", resamples are attached to each object. For objects of class cSEMResults_default the resamples are attached to .object$Estimates$Estimates_resample. For objects of class cSEMResults_multi the same is done by group. For objects of class cSEMResults_2ndorder the resamples are attached to the .object$Second_stage$Information$Resamples. All objects containing these elements gain the cSEMResults_resampled class.

cSEMSummarize

Description

cSEMSummarize

Value

An object of class cSEMSummary. Technically cSEMSummary is a named list containing the following list elements:

'...: Not finished yet.

cSEMTest

Description

cSEMTest

Value

A standardized list of class cSEMTest. Technically cSEMTest is a named list containing the following list elements:

⁠$Test_statistic⁠: The value of test statistic(s).
⁠$Critical_value⁠: The critical value(s).
⁠$Decision⁠: The test decision. One of: Reject or Do not reject
⁠$Number_admissibles⁠: The number of admissible runs. See verify() for what constitutes and inadmissible run.

Data: Second order common factor of composites

Description

A dataset containing 500 standardized observations on 19 indicator generated from a population model with 6 concepts, three of which (c1-c3) are composites forming a second order common factor (c4). The remaining two (eta1, eta2) are concepts modeled as common factors .

Usage

dgp_2ndorder_cf_of_c

Format

A matrix with 500 rows and 19 variables:

y11-y12: Indicators attached to c1. Population weights are: 0.8; 0.4. Population loadings are: 0.925; 0.65
y21-y24: Indicators attached to c2. Population weights are: 0.5; 0.3; 0.2; 0.4. Population loadings are: 0.804; 0.68; 0.554; 0.708
y31-y38: Indicators attached to c3. Population weights are: 0.3; 0.3; 0.1; 0.1; 0.2; 0.3; 0.4; 0.2. Population loadings are: 0.496; 0.61; 0.535; 0.391; 0.391; 0.6; 0.5285; 0.53
y41-y43: Indicators attached to eta1. Population loadings are: 0.8; 0.7; 0.7
y51-y53: Indicators attached to eta1. Population loadings are: 0.8; 0.8; 0.7

The model is:

`c4` = gamma1 * `eta1` + zeta1

`eta2` = gamma2 * `eta1` + beta * `c4` + zeta2

with population values gamma1 = 0.6, gamma2 = 0.4 and beta = 0.35. The second order common factor is

`c4` = lambdac1 * `c1` + lambdac2 * `c2` + lambdac3 * `c3` + epsilon

Calculate difference between S and Sigma_hat

Description

Calculate the difference between the empirical (S) and the model-implied indicator variance-covariance matrix (Sigma_hat) using different distance measures.

Usage

calculateDG(
  .object = NULL,
  .matrix1 = NULL,
  .matrix2 = NULL,
  .saturated = FALSE,
  ...
)

calculateDL(
  .object = NULL,
  .matrix1 = NULL,
  .matrix2 = NULL,
  .saturated = FALSE,
  ...
)

calculateDML(
  .object = NULL,
  .matrix1 = NULL,
  .matrix2 = NULL,
  .saturated = FALSE,
  ...
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.matrix1

A matrix to compare.

.matrix2

A matrix to compare.

.saturated

Logical. Should a saturated structural model be used? Defaults to FALSE.

...

Ignored.

Details

The distances may also be computed for any two matrices A and B by supplying A and B directly via the .matrix1 and .matrix2 arguments. If A and B are supplied .object is ignored.

Value

A single numeric value giving the distance between two matrices.

Functions

calculateDG(): The geodesic distance (dG).
calculateDL(): The squared Euclidean distance
calculateDML(): The distance measure (fit function) used by ML

Do an importance-performance matrix analysis

Description

Usage

doIPMA(.object)

Arguments

.object

A cSEMResults object.'

Details

Performs an importance-performance matrix analysis (IPMA).

To calculate the performance and importance, the weights of the indicators are unstandardized using the standard deviation of the original indicators but normed to have a length of 1. Normed construct scores are calculated based on the original indicators and the unstandardized weights.

The importance is calculated as the mean of the original indicators or the unstandardized construct scores, respectively. The performance is calculated as the unstandardized total effect if .level == "construct" and as the normed weight times the unstandardized total effect if .level == "indicator". The literature recommends to use an estimation as input for 'doIPMA() that is based on normed indicators, e.g., by scaling all indicators to 0 to 100, see e.g., Henseler (2021); Ringle and Sarstedt (2016).

Note, indicators are not normed internally, as theoretical maximum/minimum can differ from the empirical maximum/minimum which would lead to an incorrect normalization.

Value

A list of class cSEMIPA with a corresponding method for plot(). See: plot.cSEMIPMA().

Do a nonlinear effects analysis

Description

Usage

doNonlinearEffectsAnalysis(
 .object            = NULL,
 .dependent         = NULL, 
 .independent       = NULL,
 .moderator         = NULL,
 .n_steps           = 100,
 .values_moderator  = c(-2, -1, 0, 1, 2),
 .value_independent = 0,
 .alpha             = 0.05
 )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.dependent

Character string. The name of the dependent variable.

.independent

Character string. The name of the independent variable.

.moderator

Character string. The name of the moderator variable.

.n_steps

.values_moderator

A numeric vector. The values of the moderator in a the simple effects analysis. Typically these are difference from the mean (=0) measured in standard deviations. Defaults to c(-2, -1, 0, 1, 2).

.value_independent

Integer. Only required for floodlight analysis; The value of the independent variable in case that it appears as a higher-order term.

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

Details

Calculate the expected value of the dependent variable conditional on the values of an independent variables and a moderator variable. All other variables in the model are assumed to be zero, i.e., they are fixed at their mean levels. Moreover, it produces the input for the floodlight analysis.

Value

A list of class cSEMNonlinearEffects with a corresponding method for plot(). See: plot.cSEMNonlinearEffects().

Examples

## Not run: 
model_Int <- "
# Measurement models
INV =~ INV1 + INV2 + INV3 +INV4
SAT =~ SAT1 + SAT2 + SAT3
INT =~ INT1 + INT2

# Structrual model containing an interaction term.
INT ~ INV + SAT + INV.SAT
"
  
# Estimate model
out <- csem(.data = Switching, .model = model_Int,
            # ADANCO settings
            .PLS_weight_scheme_inner = 'factorial',
            .tolerance = 1e-06,
            .resample_method = 'bootstrap'
)
  
# Do nonlinear effects analysis
neffects <- doNonlinearEffectsAnalysis(out, 
                                       .dependent = 'INT',
                                       .moderator = 'INV',
                                       .independent = 'SAT') 

# Get an overview
neffects

# Simple effects plot
plot(neffects, .plot_type = 'simpleeffects')

# Surface plot using plotly
plot(neffects, .plot_type = 'surface', .plot_package = 'plotly')

# Surface plot using persp
plot(neffects, .plot_type = 'surface', .plot_package = 'persp')

# Floodlight analysis
plot(neffects, .plot_type = 'floodlight')

## End(Not run)

Do a redundancy analysis

Description

Usage

doRedundancyAnalysis(.object = NULL)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

Details

Perform a redundancy analysis (RA) as proposed by Hair et al. (2016) with reference to Chin (1998).

RA is confined to PLS-PM, specifically PLS-PM with at least one construct whose weights are obtained by mode B. In cSEM this is the case if the construct is modeled as a composite or if argument .PLS_modes was explicitly set to mode B for at least one construct. Hence RA is only conducted if .approach_weights = "PLS-PM" and if at least one construct's mode is mode B.

The principal idea of RA is to take two different measures of the same construct and regress the scores obtained for each measure on each other. If they are similar they are likely to measure the same "thing" which is then taken as evidence that both measurement models actually measure what they are supposed to measure (validity).

There are several issues with the terminology and the reasoning behind this logic. RA is therefore only implemented since reviewers are likely to demand its computation, however, its actual application for validity assessment is discouraged.

Currently, the function is not applicable to models containing second-order constructs.

Value

A named numeric vector of correlations. If the weighting approach used to obtain .object is not "PLS-PM" or non of the PLS outer modes was mode B, the function silently returns NA.

References

Chin WW (1998). “Modern Methods for Business Research.” In Marcoulides GA (ed.), chapter The Partial Least Squares Approach to Structural Equation Modeling, 295–358. Mahwah, NJ: Lawrence Erlbaum.

Hair JF, Hult GTM, Ringle C, Sarstedt M (2016). A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM). Sage publications.

Internal: Estimate the structural coefficients

Description

Estimates the coefficients of the structural model (nonlinear and linear) using OLS, 2SLS. The latter currently work for linear models only.

Usage

estimatePath(
 .approach_nl      = args_default()$.approach_nl,
 .approach_paths   = args_default()$.approach_paths,
 .csem_model       = args_default()$.csem_model,
 .H                = args_default()$.H,
 .normality        = args_default()$.normality,
 .P                = args_default()$.P,
 .Q                = args_default()$.Q
 )

Arguments

.approach_nl

Character string. Approach used to estimate nonlinear structural relationships. One of: "sequential" or "replace". Defaults to "sequential".

.approach_paths

Character string. Approach used to estimate the structural coefficients. One of: "OLS" or "2SLS". If "2SLS", instruments need to be supplied to .instruments. Defaults to "OLS".

.csem_model

A (possibly incomplete) cSEMModel-list.

.H

The (N x J) matrix of construct scores.

.normality

.P

A (J x J) construct variance-covariance matrix (possibly disattenuated).

.Q

A vector of composite-construct correlations with element names equal to the names of the J construct names used in the measurement model. Note Q^2 is also called the reliability coefficient.

Value

A named list containing the estimated structural coefficients, the R2, the adjusted R2, and the VIFs for each regression.

Export to Excel (.xlsx)

Description

Usage

exportToExcel(
  .postestimation_object = NULL, 
  .filename              = "results.xlsx",
  .path                  = NULL
  )

Arguments

.postestimation_object

An object resulting from a call to one of cSEM's postestimation functions (e.g. summarize()).

.filename

Character string. The file name.

.path

Character string. Path of the directory to save the file to. Defaults to NULL.

Details

Export results from postestimation functions assess(), predict(), summarize() and testOMF() to an .xlsx (Excel) file. The function uses the openxlsx package which does not depend on Java!

The function is deliberately kept simple: it takes all the relevant elements in .postestimation_object and writes them (worksheet by worksheet) into an .xlsx file named .filename in the directory given by .path (the current working directory by default).

If .postestimation_object has class attribute ⁠_2ndorder⁠ two .xlsx files named ".filename_first_stage.xlsx" and ".filename_second_stage.xlsx" are created. If .postestimation_object is a list of appropriate objects, one file for each list elements is created.

Note: rerunning exportToExcel() without changing .filename and .path overwrites the file!

Internal: firstOrderMeasurementEdges

Description

Build measurement edges for a first–order model.

Usage

firstOrderMeasurementEdges(
    construct,
    weights,
    loadings,
    weight_p_values,
    loading_p_values,
    plot_signif,
    plot_labels,
    constructTypes
    )

Arguments

plot_labels

Logical. Whether to display edge labels. Defaults to TRUE.

Value

Character string containing DOT code.

Model-implied indicator or construct variance-covariance matrix

Description

Calculate the model-implied indicator or construct variance-covariance (VCV) matrix. Currently only the model-implied VCV for recursive linear models is implemented (including models containing second order constructs).

Usage

fit(
  .object    = NULL, 
  .saturated = args_default()$.saturated,
  .type_vcv  = args_default()$.type_vcv
  )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.saturated

Logical. Should a saturated structural model be used? Defaults to FALSE.

.type_vcv

Character string. Which model-implied correlation matrix should be calculated? One of "indicator" or "construct". Defaults to "indicator".

Details

Notation is taken from Bollen (1989). If .saturated = TRUE the model-implied variance-covariance matrix is calculated for a saturated structural model (i.e., the VCV of the constructs is replaced by their correlation matrix). Hence: V(eta) = WSW' (possibly disattenuated).

Value

Either a (K x K) matrix or a (J x J) matrix depending on the type_vcv.

References

Bollen KA (1989). Structural Equations with Latent Variables. Wiley-Interscience. ISBN 978-0471011712.

Model fit measures

Description

Calculate fit measures.

Usage

calculateChiSquare(.object, .saturated = FALSE)

calculateChiSquareDf(.object)

calculateCFI(.object)

calculateGFI(.object, .type_gfi = c("ML", "GLS", "ULS"), ...)

calculateCN(.object, .alpha = 0.05, ...)

calculateIFI(.object)

calculateNFI(.object)

calculateNNFI(.object)

calculateRMSEA(.object)

calculateRMSTheta(.object)

calculateSRMR(
  .object = NULL,
  .matrix1 = NULL,
  .matrix2 = NULL,
  .saturated = FALSE,
  ...
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.saturated

Logical. Should a saturated structural model be used? Defaults to FALSE.

.type_gfi

...

Ignored.

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.matrix1

A matrix to compare.

.matrix2

A matrix to compare.

Details

See the Fit indices section of the cSEM website for details on the implementation.

Value

A single numeric value.

Functions

calculateChiSquare(): The chi square statistic.
calculateChiSquareDf(): The Chi square statistic divided by its degrees of freedom.
calculateCFI(): The comparative fit index (CFI).
calculateGFI(): The goodness of fit index (GFI).
calculateCN(): The Hoelter index alias Hoelter's (critical) N (CN).
calculateIFI(): The incremental fit index (IFI).
calculateNFI(): The normed fit index (NFI).
calculateNNFI(): The non-normed fit index (NNFI). Also called the Tucker-Lewis index (TLI).
calculateRMSEA(): The root mean square error of approximation (RMSEA).
calculateRMSTheta(): The root mean squared residual covariance matrix of the outer model residuals (RMS theta).
calculateSRMR(): The standardized root mean square residual (SRMR).

Internal: Composite-based SEM

Description

The central hub of the cSEM package. It acts like a foreman by collecting all (estimation) tasks, distributing them to lower level package functions, and eventually recollecting all of their results. It is called by csem() to manage the actual calculations. It may be called directly by the user, however, in most cases it will likely be more convenient to use csem() instead.

Usage

foreman(
  .data                        = args_default()$.data,
  .model                       = args_default()$.model,
  .approach_cor_robust         = args_default()$.approach_cor_robust,
  .approach_nl                 = args_default()$.approach_nl,
  .approach_paths              = args_default()$.approach_paths,
  .approach_weights            = args_default()$.approach_weights,
  .conv_criterion              = args_default()$.conv_criterion,
  .disattenuate                = args_default()$.disattenuate,
  .dominant_indicators         = args_default()$.dominant_indicators,
  .estimate_structural         = args_default()$.estimate_structural,
  .id                          = args_default()$.id,
  .instruments                 = args_default()$.instruments,
  .iter_max                    = args_default()$.iter_max,
  .normality                   = args_default()$.normality,
  .PLS_approach_cf             = args_default()$.PLS_approach_cf,
  .PLS_ignore_structural_model = args_default()$.PLS_ignore_structural_model,
  .PLS_modes                   = args_default()$.PLS_modes,
  .PLS_weight_scheme_inner     = args_default()$.PLS_weight_scheme_inner,
  .reliabilities               = args_default()$.reliabilities,
  .starting_values             = args_default()$.starting_values,
  .tolerance                   = args_default()$.tolerance
  )

Arguments

.data

.model

A model in lavaan model syntax or a cSEMModel list.

.approach_cor_robust

.approach_nl

Character string. Approach used to estimate nonlinear structural relationships. One of: "sequential" or "replace". Defaults to "sequential".

.approach_paths

Character string. Approach used to estimate the structural coefficients. One of: "OLS" or "2SLS". If "2SLS", instruments need to be supplied to .instruments. Defaults to "OLS".

.approach_weights

.conv_criterion

Character string. The criterion to use for the convergence check. One of: "diff_absolute", "diff_squared", or "diff_relative". Defaults to "diff_absolute".

.disattenuate

Logical. Should composite/proxy correlations be disattenuated to yield consistent loadings and path estimates if at least one of the construct is modeled as a common factor? Defaults to TRUE.

.dominant_indicators

.estimate_structural

Logical. Should the structural coefficients be estimated? Defaults to TRUE.

.id

Character string or integer. A character string giving the name or an integer of the position of the column of .data whose levels are used to split .data into groups. Defaults to NULL.

.instruments

.iter_max

.normality

.PLS_approach_cf

.PLS_ignore_structural_model

Logical. Should the structural model be ignored when calculating the inner weights of the PLS-PM algorithm? Defaults to FALSE. Ignored if .approach_weights is not PLS-PM.

.PLS_modes

.PLS_weight_scheme_inner

Character string. The inner weighting scheme used by PLS-PM. One of: "centroid", "factorial", or "path". Defaults to "path". Ignored if .approach_weight is not PLS-PM.

.reliabilities

.starting_values

.tolerance

Double. The tolerance criterion for convergence. Defaults to 1e-05.

Get construct scores

Description

Usage

getConstructScores(
 .object        = NULL,
 .standardized  = TRUE
 )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.standardized

Logical. Should standardized scores be returned? Defaults to TRUE.

Details

Get the standardized or unstandardized construct scores.

Value

A list of three with elements Construct_scores, W_used, Indicators_used.

Internal: Parameter names

Description

Based on a model in lavaan model syntax, returns the names of the parameters of the structural model, the measurement/composite model and the weight relationship. Used by testMGD() to extract the names of the parameters to compare across groups according to the test proposed by Chin and Dibbern (2010).

Usage

getParameterNames(
           .object  = args_default()$.object,
           .model   = args_default()$.model
  )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.model

A model in lavaan model syntax indicating which parameters (i.e, path (~), loadings (⁠=~⁠), or weights (⁠<~⁠)) should be compared across groups. Defaults to NULL in which case all parameters of the model are compared.

Value

A list with elements names_path, names_loadings, and names_weights containing the names of the structural parameters, the loadings, and the weight to compare across groups.

References

Internal: Extract relevant parameters from several cSEMResults_multi

Description

Extract the relevant parameters from a cSEMResult_multi object in .object.

Usage

getRelevantParameters(
  .object     = args_default()$.object,
  .model      = args_default()$.model
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.model

Value

A list of length equal to the number of groups in .object. Each list element is itself a list of three. The first list element contains the relevant parameter estimates of the structural model, the second list element the relevant estimated loadings, and the third the relevant estimated weights.

Internal: Helper for doNonlinearEffectsAnalysis()

Description

Function that calculates the values required for the floodlight analysis, namely 1) partial effect of the independent variable on the dependent variable for each bootstrap run and for the original estimation for each step of the moderator 2) alpha/2 and 1-alpha/2 quantile of the bootstrap estimates.

Usage

getValuesFloodlight(
  .model = NULL,
  .path_coefficients = args_default()$.path_coefficients,
  .dependent = args_default()$.dependent,
  .independent = args_default()$.independent,
  .moderator = args_default()$.moderator,
  .steps_mod = args_default()$.steps_mod,
  .value_independent = args_default()$.value_independent,
  .alpha = args_default()$.alpha
)

Arguments

.model

A model in lavaan model syntax or a cSEMModel list.

.path_coefficients

List. A list that contains the resampled and the original path coefficient estimates. Typically a part of a cSEMResults_resampled object. Defaults to NULL.

.dependent

Character string. The name of the dependent variable.

.independent

Character string. The name of the independent variable.

.moderator

Character string. The name of the moderator variable.

.steps_mod

A numeric vector. Steps used for the moderator variable in calculating the simple effects of an independent variable on the dependent variable. Defaults to NULL.

.value_independent

Integer. Only required for floodlight analysis; The value of the independent variable in case that it appears as a higher-order term.

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

Details

Only variables that comprise the independent variable are taken into account. If it contains a variable other than the independent variable and the moderator the effect is set to zero as the other variables are evaluated at their means (=0), hence the effect is zero.

Internal: get significance stars

Description

Transforms a p-value into stars.

Usage

get_significance_stars(
.pvalue
)

Details

.pvalue Numeric. A p-value that is transformed into significance stars.

Value

Character string. A p-value transformed into a star.

Internal: Handle arguments

Description

Internal helper function to handle arguments passed to any function within cSEM.

Usage

handleArgs(.args_used)

Arguments

.args_used

A list of argument names and user picked values.

Value

The args_default list, with default values changed to the values given by the user.

Inference

Description

Usage

infer(
 .object            = NULL,
 .quantity          = c("all", "mean", "sd", "bias", "CI_standard_z", 
                        "CI_standard_t", "CI_percentile", "CI_basic", 
                        "CI_bc", "CI_bca", "CI_t_interval"),
 .alpha             = 0.05,
 .bias_corrected    = TRUE
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.quantity

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.bias_corrected

Details

Calculate common inferential quantities. For users interested in the estimated standard errors, t-values, p-values and/or confidences intervals of the path, weight or loading estimates, calling summarize() directly will usually be more convenient as it has a much more user-friendly print method. infer() is useful for comparing different confidence interval estimates.

infer() is a convenience wrapper around a number of internal functions that compute a particular inferential quantity, i.e., a value or set of values to be used in statistical inference.

cSEM relies on resampling (bootstrap and jackknife) as the basis for the computation of e.g., standard errors or confidence intervals. Consequently, infer() requires resamples to work. Technically, the cSEMResults object used in the call to infer() must therefore also have class attribute cSEMResults_resampled. If the object provided by the user does not contain resamples yet, infer() will obtain bootstrap resamples first. Naturally, computation will take longer in this case.

infer() does as much as possible in the background. Hence, every time infer() is called on a cSEMResults object the quantities chosen by the user are automatically computed for every estimated parameter contained in the object. By default all possible quantities are computed (.quantity = all). The following table list the available inferential quantities alongside a brief description. Implementation and terminology of the confidence intervals is based on Hesterberg (2015) and Davison and Hinkley (1997).

"mean", "sd": The mean or the standard deviation over all M resample estimates of a generic statistic or parameter.
"bias": The difference between the resample mean and the original estimate of a generic statistic or parameter.
"CI_standard_z" and "CI_standard_t": The standard confidence interval for a generic statistic or parameter with standard errors estimated by the resample standard deviation. While "CI_standard_z" assumes a standard normally distributed statistic, "CI_standard_t" assumes a t-statistic with N - 1 degrees of freedom.
"CI_percentile": The percentile confidence interval. The lower and upper bounds of the confidence interval are estimated as the alpha and 1-alpha quantiles of the distribution of the resample estimates.
"CI_basic": The basic confidence interval also called the reverse bootstrap percentile confidence interval. See Hesterberg (2015) for details.
"CI_bc": The bias corrected (Bc) confidence interval. See Davison and Hinkley (1997) for details.
"CI_bca": The bias-corrected and accelerated (Bca) confidence interval. Requires additional jackknife resampling to compute the influence values. See Davison and Hinkley (1997) for details.
"CI_t_interval": The "studentized" t-confidence interval. If based on bootstrap resamples the interval is also called the bootstrap t-interval confidence interval. See Hesterberg (2015) on page 381. Requires resamples of resamples. See resamplecSEMResults().

By default, all but the studendized t-interval confidence interval and the bias-corrected and accelerated confidence interval are calculated. The reason for excluding these quantities by default are that both require an additional resampling step. The former requires jackknife estimates to compute influence values and the latter requires double bootstrap. Both can potentially be time consuming. Hence, computation is triggered only if explicitly chosen.

Value

A list of class cSEMInfer.

References

Davison AC, Hinkley DV (1997). Bootstrap Methods and their Application. Cambridge University Press. doi:10.1017/cbo9780511802843.

Hesterberg TC (2015). “What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.” The American Statistician, 69(4), 371–386. doi:10.1080/00031305.2015.1089789.

Examples

model <- "
# Structural model
QUAL ~ EXPE
EXPE ~ IMAG
SAT  ~ IMAG + EXPE + QUAL + VAL
LOY  ~ IMAG + SAT
VAL  ~ EXPE + QUAL

# Measurement model
EXPE =~ expe1 + expe2 + expe3 + expe4 + expe5
IMAG =~ imag1 + imag2 + imag3 + imag4 + imag5
LOY  =~ loy1  + loy2  + loy3  + loy4
QUAL =~ qual1 + qual2 + qual3 + qual4 + qual5
SAT  =~ sat1  + sat2  + sat3  + sat4
VAL  =~ val1  + val2  + val3  + val4
"
  
## Estimate the model with bootstrap resampling 
a <- csem(satisfaction, model, .resample_method = "bootstrap", .R = 20,
          .handle_inadmissibles = "replace")

## Compute inferential quantities
inf <- infer(a)

inf$Path_estimates$CI_basic
inf$Indirect_effect$sd

### Compute the bias-corrected and accelerated and/or the studentized t-inverval.
## For the studentied t-interval confidence interval a double bootstrap is required.
## This is pretty time consuming.
## Not run: 
  inf <- infer(a, .quantity = c("all", "CI_bca")) # requires jackknife estimates 
  
## Estimate the model with double bootstrap resampling:
# Notes:
#   1. The .resample_method2 arguments triggers a bootstrap of each bootstrap sample
#   2. The double bootstrap is is very time consuming, consider setting 
#      `.eval_plan = "multisession`. 
a1 <- csem(satisfaction, model, .resample_method = "bootstrap", .R = 499,
          .resample_method2 = "bootstrap", .R2 = 199, .handle_inadmissibles = "replace") 
infer(a1, .quantity = "CI_t_interval")
## End(Not run)

Internal: Helper for infer()

Description

Collection of various functions that compute an inferential quantity.

Usage

MeanResample(.first_resample)

SdResample(.first_resample, .resample_method, .n)

BiasResample(.first_resample, .resample_method, .n)

StandardCIResample(
  .first_resample,
  .bias_corrected,
  .dist = c("z", "t"),
  .df = c("type1", "type2"),
  .resample_method,
  .n,
  .probs
)

PercentilCIResample(.first_resample, .probs)

BasicCIResample(.first_resample, .bias_corrected, .probs)

TStatCIResample(
  .first_resample,
  .second_resample,
  .bias_corrected,
  .resample_method,
  .resample_method2,
  .n,
  .probs
)

BcCIResample(.first_resample, .probs)

BcaCIResample(.object, .first_resample, .probs)

Arguments

.first_resample

A list containing the .R resamples based on the original data obtained by resamplecSEMResults().

.resample_method

Character string. The resampling method to use. One of: "none", "bootstrap" or "jackknife". Defaults to "none".

.n

Integer. The number of observations of the original data.

.bias_corrected

.dist

Character string. The distribution to use for the critical value. One of "t" for Student's t-distribution or "z" for the standard normal distribution. Defaults to "z".

.df

Character string. The method for obtaining the degrees of freedom. Choices are "type1" and "type2". Defaults to "type1" .

.probs

A vector of probabilities.

.second_resample

A list containing .R2 resamples for each of the .R resamples of the first run.

.resample_method2

.object

An R object of class cSEMResults resulting from a call to csem().

Details

Implementation and terminology of the confidence intervals is based on Hesterberg (2015) and Davison and Hinkley (1997).

References

Internal: Calculate consistent moments of a nonlinear model

Description

Collection of various moment estimators. See classifyConstructs for a list of possible moments.

Usage

SingleSingle(.i, .j, .Q, .H)

SingleQuadratic(.i, .j, .Q, .H)

SingleCubic(.i, .j, .Q, .H)

SingleTwInter(.i, .j, .Q, .H)

SingleThrwInter(.i, .j, .Q, .H)

SingleQuadTwInter(.i, .j, .Q, .H)

QuadraticQuadratic(.i, .j, .Q, .H)

QuadraticCubic(.i, .j, .Q, .H)

QuadraticTwInter(.i, .j, .Q, .H)

QuadraticThrwInter(.i, .j, .Q, .H)

QuadraticQuadTwInter(.i, .j, .Q, .H)

CubicCubic(.i, .j, .Q, .H)

CubicTwInter(.i, .j, .Q, .H)

CubicThrwInter(.i, .j, .Q, .H)

CubicQuadTwInter(.i, .j, .Q, .H)

TwInterTwInter(.i, .j, .Q, .H)

TwInterThrwInter(.i, .j, .Q, .H)

TwInterQuadTwInter(.i, .j, .Q, .H)

ThrwInterThrwInter(.i, .j, .Q, .H)

ThrwInterQuadTwInter(.i, .j, .Q, .H)

QuadTwInercQuadTwInter(.i, .j, .Q, .H)

Arguments

.i

Row index

.j

Column index

.Q

A vector of composite-construct correlations with element names equal to the names of the J construct names used in the measurement model. Note Q^2 is also called the reliability coefficient.

.H

The (N x J) matrix of construct scores.

Details

M is the matrix of the sample counterparts (estimates) of the left-hand side terms in Equation (21) - (24) (Dijkstra and Schermelleh-Engel 2014). The label "M" did not appear in the paper and is only used in the package. Similar is suggested by Wall and Amemiya (2000) using classical factor scores.

References

Dijkstra TK, Schermelleh-Engel K (2014). “Consistent Partial Least Squares For Nonlinear Structural Equation Models.” Psychometrika, 79(4), 585–604.

Wall MM, Amemiya Y (2000). “Estimation for polynomial structural equation models.” Journal of the American Statistical Association, 95(451), 929–940.

Internal: Utility functions for the estimation of nonlinear models

Description

Internal: Utility functions for the estimation of nonlinear models

Usage

f1(.i, .j)

f2(.i, .j, .select_from, .Q, .H)

f3(.i, .j, .Q, .H)

f4(.i, .j, .Q, .H, .var_struc_error, .temp = NULL)

f5(.i, .j, .H, .Q, .var_struc_error)

Arguments

.i

Row index

.j

Column index

.select_from

matrix to select from

.Q

A vector of composite-construct correlations with element names equal to the names of the J construct names used in the measurement model. Note Q^2 is also called the reliability coefficient.

.H

The (N x J) matrix of construct scores.

Parse lavaan model

Description

Turns a model written in lavaan model syntax into a cSEMModel list.

Usage

parseModel(
  .model        = NULL, 
  .instruments  = NULL, 
  .check_errors = TRUE
  )

Arguments

.model

A model in lavaan model syntax or a cSEMModel list.

.instruments

.check_errors

Logical. Should the model to parse be checked for correctness in a sense that all necessary components to estimate the model are given? Defaults to TRUE.

Details

Instruments must be supplied separately as a named list of vectors of instruments. The names of the list elements are the names of the dependent constructs of the structural equation whose explanatory variables are endogenous. The vectors contain the names of the instruments corresponding to each equation. Note that exogenous variables of a given equation must be supplied as instruments for themselves.

By default parseModel() attempts to check if the model provided is correct in a sense that all necessary components required to estimate the model are specified (e.g., a construct of the structural model has at least 1 item). To prevent checking for errors use .check_errors = FALSE.

Value

An object of class cSEMModel is a standardized list containing the following components. J stands for the number of constructs and K for the number of indicators.

⁠$structural⁠: A matrix mimicking the structural relationship between constructs. If constructs are only linearly related, structural is of dimension (J x J) with row- and column names equal to the construct names. If the structural model contains nonlinear relationships structural is (J x (J + J*)) where J* is the number of nonlinear terms. Rows are ordered such that exogenous constructs are always first, followed by constructs that only depend on exogenous constructs and/or previously ordered constructs.
⁠$measurement⁠: A (J x K) matrix mimicking the measurement/composite relationship between constructs and their related indicators. Rows are in the same order as the matrix ⁠$structural⁠ with row names equal to the construct names. The order of the columns is such that ⁠$measurement⁠ forms a block diagonal matrix.
⁠$error_cor⁠: A (K x K) matrix mimicking the measurement error correlation relationship. The row and column order is identical to the column order of ⁠$measurement⁠.
⁠$cor_specified⁠: A matrix indicating the correlation relationships between any variables of the model as specified by the user. Mainly for internal purposes. Note that ⁠$cor_specified⁠ may also contain inadmissible correlations such as a correlation between measurement errors indicators and constructs.
⁠$construct_type⁠: A named vector containing the names of each construct and their respective type ("Common factor" or "Composite").
⁠$construct_order⁠: A named vector containing the names of each construct and their respective order ("First order" or "Second order").
⁠$model_type⁠: The type of model ("Linear" or "Nonlinear").
⁠$instruments⁠: Only if instruments are supplied: a list of structural equations relating endogenous RHS variables to instruments.
⁠$indicators⁠: The names of the indicators (i.e., observed variables and/or first-order constructs)
⁠$cons_exo⁠: The names of the exogenous constructs of the structural model (i.e., variables that do not appear on the LHS of any structural equation)
⁠$cons_endo⁠: The names of the endogenous constructs of the structural model (i.e., variables that appear on the LHS of at least one structural equation)
⁠$vars_2nd⁠: The names of the constructs modeled as second orders.
⁠$vars_attached_to_2nd⁠: The names of the constructs forming or building a second order construct.
⁠$vars_not_attached_to_2nd⁠: The names of the constructs not forming or building a second order construct.

Examples

# ===========================================================================
# Providing a model in lavaan syntax 
# ===========================================================================
model <- "
# Structural model
y1 ~ y2 + y3

# Measurement model
y1 =~ x1 + x2 + x3
y2 =~ x4 + x5
y3 =~ x6 + x7

# Error correlation
x1 ~~ x2
"

m <- parseModel(model)
m

# ===========================================================================
# Providing a complete model in cSEM format (class cSEMModel)
# ===========================================================================
# If the model is already a cSEMModel object, the model is returned as is:

identical(m, parseModel(m)) # TRUE

# ===========================================================================
# Providing a list 
# ===========================================================================
# It is possible to provide a list that contains at least the
# elements "structural" and "measurement". This is generally discouraged
# as this may cause unexpected errors.

m_incomplete <- m[c("structural", "measurement", "construct_type")]
parseModel(m_incomplete)

# Providing a list containing list names that are not part of a `cSEMModel`
# causes an error:

## Not run: 
m_incomplete[c("name_a", "name_b")] <- c("hello world", "hello universe")
parseModel(m_incomplete)

## End(Not run)

# Failing to provide "structural" or "measurement" also causes an error:

## Not run: 
m_incomplete <- m[c("structural", "construct_type")]
parseModel(m_incomplete)

## End(Not run)

`cSEMIPMA` method for `plot()`

Description

Plot the importance-performance matrix.

Usage

## S3 method for class 'cSEMIPMA'
plot(
  x = NULL,
  .dependent = NULL,
  .attributes = NULL,
  .level = c("construct", "indicator"),
  ...
)

Arguments

x

An R object of class cSEMIPMA.

.dependent

Character string. Name of the target construct for which the importance-performance matrix should be created.

.attributes

Character string. A vector containing indicator/construct names that should be plotted in the importance-performance matrix. It must be at least of length 2.

.level

Character string. Indicates the level for which the importance-performance matrix should be plotted. One of "construct" or "indicator". Defaults to "construct".

...

Currently ignored.

`cSEMNonlinearEffects` method for `plot()`

Description

This plot method can be used to create plots to analyze non-linear models in more depth. In doing so the following plot types can be selected:

.plot_type = "simpleeffects":: The plot of a simple effects analysis displays the predicted value of the dependent variable for different values of the independent variable and the moderator. As levels for the moderator the levels provided to the doNonlinearEffectsAnalysis() function are used. Since the constructs are standardized the values of the moderator equals the deviation from its mean measured in standard deviations.
.plot_type = "surface":: The plot of a surface analysis displays the predicted values of an independent variable (z). The values are predicted based on the values of the moderator and the independent variable including all their higher-order terms. For the values of the moderator and the independent variable steps between their minimum and maximum values are used.
.plot_type = "floodlight":: The plot of a floodlight analysis displays the direct effect of an continuous independent variable (z) on a dependent variable (y) conditional on the values of a continuous moderator variable (x), including the confidence interval and the Johnson-Neyman points. It is noted that in the floodlight plot only moderation is taken into account and higher order terms are ignored. For more details, see Spiller et al. (2013).

Plot the predicted values of an independent variable (z) The values are predicted based on a certain moderator and a certain independent variable including all their higher-order terms.

Usage

## S3 method for class 'cSEMNonlinearEffects'
plot(x, .plot_type = "simpleeffects", .plot_package = "plotly", ...)

Arguments

x

An R object of class cSEMNonlinearEffects.

.plot_type

A character string indicating the type of plot that should be produced. Options are "simpleeffects", "surface", and "floodlight". Defaults to "simpleeffects".

.plot_package

A character vector indicating the plot package used. Options are "plotly", and "persp". Defaults to "plotly".

...

Additional parameters that can be passed to graphics::persp, e.g., to rotate the plot.

`cSEMPredict` method for `plot()`

Description

The cSEMPredict method for the generic function plot().

Usage

## S3 method for class 'cSEMPredict'
plot(x, ...)

Arguments

x

An R object of class cSEMPredict.

...

Currently ignored.

`cSEMResults` method for `plot()` for second-order models.

Description

Usage

## S3 method for class 'cSEMResults_2ndorder'
plot(
  x,
  .title = args_default()$.title,
  .plot_significances = args_default()$.plot_significances,
  .plot_correlations = args_default()$.plot_correlations,
  .plot_structural_model_only = args_default()$.plot_structural_model_only,
  .plot_labels = args_default()$.plot_labels,
  .graph_attrs = args_default()$.graph_attrs,
  ...
)

Arguments

x

An R object of class cSEMResults_2ndorder object.

.title

Character string. Title of an object. Defaults to "".

.plot_significances

Logical. Should p-values in the form of stars be plotted? Defaults to TRUE.

.plot_correlations

.plot_structural_model_only

Logical. Should only the structural model, i.e., the constructs and their relationships be plotted? Defaults to FALSE.

.plot_labels

Logical. Whether to display edge labels and node R² values. Defaults to TRUE.

.graph_attrs

Character string. Additional attributes that should be passed to the DiagrammeR syntax, e.g., c("rankdir=LR", "ranksep=1.0"). Defaults to c("rankdir=LR").

...

Currently ignored.

Details

Creates a plot of a cSEMResults_2ndorder object using the grViz function. For more details on customizing plot, see https://rpubs.com/nguyen_mot/1275413.

`cSEMResults` method for `plot()`

Description

Usage

## S3 method for class 'cSEMResults_default'
plot(
  x = NULL,
  .title = args_default()$.title,
  .plot_significances = args_default()$.plot_significances,
  .plot_correlations = args_default()$.plot_correlations,
  .plot_structural_model_only = args_default()$.plot_structural_model_only,
  .plot_labels = args_default()$.plot_labels,
  .graph_attrs = args_default()$.graph_attrs,
  ...
)

Arguments

x

An R object of class cSEMResults_default object.

.title

Character string. Title of an object. Defaults to "".

.plot_significances

Logical. Should p-values in the form of stars be plotted? Defaults to TRUE.

.plot_correlations

.plot_structural_model_only

Logical. Should only the structural model, i.e., the constructs and their relationships be plotted? Defaults to FALSE.

.plot_labels

Logical. Whether to display edge labels and R² values in the nodes. Defaults to TRUE (i.e. original plot).

.graph_attrs

Character string. Additional attributes that should be passed to the DiagrammeR syntax, e.g., c("rankdir=LR", "ranksep=1.0"). Defaults to c("rankdir=LR").

...

Currently ignored.

Details

Creates a plot of a cSEMResults object using the grViz function. For more details on customizing plot, see https://rpubs.com/nguyen_mot/1275413.

Examples

## Not run: 
model_Bergami_int="
  # Common factor and composite models
  OrgPres <~ cei1 + cei2 + cei3 + cei4 + cei5 + cei6 + cei7 + cei8 
  OrgIden =~ ma1 + ma2 + ma3 + ma4 + ma5 + ma6
  AffJoy =~ orgcmt1 + orgcmt2 + orgcmt3 + orgcmt7
  AffLove  =~ orgcmt5 + orgcmt6 + orgcmt8

  # Structural model 
  OrgIden ~ OrgPres 
  AffLove ~ OrgPres+OrgIden+OrgPres.OrgIden
  AffJoy  ~ OrgPres+OrgIden
  "
  
  outBergamiInt <- csem(.data = BergamiBagozzi2000,.model = model_Bergami_int,
                        .disattenuate = T,
                        .PLS_weight_scheme_inner = 'factorial',
                        .tolerance = 1e-6,
                        .resample_method = 'none')
  
  outPlot <- plot(outBergamiInt)
  outPlot
  savePlot(outPlot,.file='plot.pdf')
  savePlot(outPlot,.file='plot.png')
  savePlot(outPlot,.file='plot.svg')
  savePlot(outPlot,.file='plot.dot')

## End(Not run)

`cSEMResults` method for `plot()` for multiple groups.

Description

Usage

## S3 method for class 'cSEMResults_multi'
plot(
  x = NULL,
  .title = args_default()$.title,
  .plot_significances = args_default()$.plot_significances,
  .plot_correlations = args_default()$.plot_correlations,
  .plot_structural_model_only = args_default()$.plot_structural_model_only,
  .plot_labels = args_default()$.plot_labels,
  .graph_attrs = args_default()$.graph_attrs,
  ...
)

Arguments

x

An R object of class cSEMResults_multi object.

.title

Character string. Title of an object. Defaults to "".

.plot_significances

Logical. Should p-values in the form of stars be plotted? Defaults to TRUE.

.plot_correlations

.plot_structural_model_only

Logical. Should only the structural model, i.e., the constructs and their relationships be plotted? Defaults to FALSE.

.plot_labels

Logical. Whether to display edge labels and node R² values. Defaults to TRUE.

.graph_attrs

Character string. Additional attributes that should be passed to the DiagrammeR syntax, e.g., c("rankdir=LR", "ranksep=1.0"). Defaults to c("rankdir=LR").

...

Currently ignored.

Details

Creates a plot of a cSEMResults object using the grViz function. For more details on customizing plot, see https://rpubs.com/nguyen_mot/1275413.

Predict indicator scores

Description

Usage

predict(
 .object                   = NULL,
 .benchmark                = c("lm", "unit", "PLS-PM", "GSCA", "PCA", "MAXVAR", "NA"),
 .approach_predict         = c("earliest", "direct"),
 .cv_folds                 = 10,
 .handle_inadmissibles     = c("stop", "ignore", "set_NA"),
 .r                        = 1,
 .test_data                = NULL,
 .approach_score_target    = c("mean", "median", "mode"),
 .sim_points               = 100,
 .disattenuate             = TRUE,
 .treat_as_continuous      = TRUE,
 .approach_score_benchmark = c("mean", "median", "mode", "round"),
 .seed                     = NULL
 )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.benchmark

Character string. The procedure to obtain benchmark predictions. One of "lm", "unit", "PLS-PM", "GSCA", "PCA", "MAXVAR", or "NA". Default to "lm".

.approach_predict

Character string. Which approach should be used to perform predictions? One of "earliest" and "direct". If "earliest" predictions for indicators associated to endogenous constructs are performed using only indicators associated to exogenous constructs. If "direct", predictions for indicators associated to endogenous constructs are based on indicators associated to their direct antecedents. Defaults to "earliest".

.cv_folds

Integer. The number of cross-validation folds to use. Setting .cv_folds to N (the number of observations) produces leave-one-out cross-validation samples. Defaults to 10.

.handle_inadmissibles

Character string. How should inadmissible results be treated? One of "stop", "ignore", or "set_NA". If "stop", predict() will stop immediately if estimation yields an inadmissible result. For "ignore" all results are returned even if all or some of the estimates yielded inadmissible results. For "set_NA" predictions based on inadmissible parameter estimates are set to NA. Defaults to "stop"

.r

Integer. The number of repetitions to use. Defaults to 1.

.test_data

A matrix of test data with the same column names as the training data.

.approach_score_target

.sim_points

Integer. How many samples from the truncated normal distribution should be simulated to estimate the exogenous construct scores? Defaults to "100".

.disattenuate

Logical. Should the benchmark predictions be based on disattenuated parameter estimates? Defaults to TRUE.

.treat_as_continuous

.approach_score_benchmark

.seed

Details

The predict function implements the procedure introduced by Shmueli et al. (2016) in the PLS context known as "PLSPredict" (Shmueli et al. 2019) including its variants PLScPredcit, OrdPLSpredict and OrdPLScpredict. It is used to predict the indicator scores of endogenous constructs and to evaluate the out-of-sample predictive power of a model. For that purpose, the predict function uses k-fold cross-validation to randomly split the data into training and test datasets, and subsequently predicts the values of the test data based on the model parameter estimates obtained from the training data. The number of cross-validation folds is 10 by default but may be changed using the .cv_folds argument. By default, the procedure is not repeated (.r = 1). You may choose to repeat cross-validation by setting a higher .r to be sure not to have a particular (unfortunate) split. See Shmueli et al. (2019) for details. Typically .r = 1 should be sufficient though.

Alternatively, users may supply a test dataset as matrix or a data frame of .test_data with the same column names as those in the data used to obtain .object (the training data). In this case, arguments .cv_folds and .r are ignored and predict uses the estimated coefficients from .object to predict the values in the columns of .test_data.

In Shmueli et al. (2016) PLS-based predictions for indicator i are compared to the predictions based on a multiple regression of indicator i on all available exogenous indicators (.benchmark = "lm") and a simple mean-based prediction summarized in the Q2_predict metric. predict() is more general in that is allows users to compare the predictions based on a so-called target model/specification to predictions based on an alternative benchmark. Available benchmarks include predictions based on a linear model, PLS-PM weights, unit weights (i.e. sum scores), GSCA weights, PCA weights, and MAXVAR weights.

Each estimation run is checked for admissibility using verify(). If the estimation yields inadmissible results, predict() stops with an error ("stop"). Users may choose to "ignore" inadmissible results or to simply set predictions to NA ("set_NA") for the particular run that failed.

Value

An object of class cSEMPredict with print and plot methods. Technically, cSEMPredict is a named list containing the following list elements:

⁠$Actual⁠: A matrix of the actual values/indicator scores of the endogenous constructs.
⁠$Prediction_target⁠: A list containing matrices of the predicted indicator scores of the endogenous constructs based on the target model for each repetition .r. Target refers to procedure used to estimate the parameters in .object.
⁠$Residuals_target⁠: A list of matrices of the residual indicator scores of the endogenous constructs based on the target model in each repetition .r.
⁠$Residuals_benchmark⁠: A list of matrices of the residual indicator scores of the endogenous constructs based on a model estimated by the procedure given to .benchmark for each repetition .r.
⁠$Prediction_metrics⁠: A data frame containing the predictions metrics MAE, RMSE, Q2_predict, the misclassification error rate (MER), the MAPE, the MSE2, Theil's forecast accuracy (U1), Theil's forecast quality (U2), Bias proportion of MSE (UM), Regression proportion of MSE (UR), and disturbance proportion of MSE (UD) (Hora and Campos 2015; Watson and Teelucksingh 2002).
⁠$Information⁠: A list with elements Target, Benchmark, Number_of_observations_training, Number_of_observations_test, Number_of_folds, Number_of_repetitions, and Handle_inadmissibles.

References

Hora J, Campos P (2015). “A review of performance criteria to validate simulation models.” Expert Systems, 32(5), 578–595. doi:10.1111/exsy.12111.

Shmueli G, Ray S, Estrada JMV, Chatla SB (2016). “The Elephant in the Room: Predictive Performance of PLS Models.” Journal of Business Research, 69(10), 4552–4564. doi:10.1016/j.jbusres.2016.03.049.

Shmueli G, Sarstedt M, Hair JF, Cheah J, Ting H, Vaithilingam S, Ringle CM (2019). “Predictive Model Assessment in PLS-SEM: Guidelines for Using PLSpredict.” European Journal of Marketing, 53(11), 2322–2347. doi:10.1108/ejm-02-2019-0189.

Watson PK, Teelucksingh SS (2002). A practical introduction to econometric methods: Classical and modern. University of West Indies Press, Mona, Jamaica.

Examples

### Anime example taken from https://github.com/ISS-Analytics/pls-predict/

# Load data
data(Anime) # data is similar to the Anime.csv found on 
            # https://github.com/ISS-Analytics/pls-predict/ but with irrelevant
            # columns removed

# Split into training and data the same way as it is done on 
# https://github.com/ISS-Analytics/pls-predict/
set.seed(123)

index     <- sample.int(dim(Anime)[1], 83, replace = FALSE)
dat_train <- Anime[-index, ]
dat_test  <- Anime[index, ]

# Specify model
model <- "
# Structural model

ApproachAvoidance ~ PerceivedVisualComplexity + Arousal

# Measurement/composite model

ApproachAvoidance         =~ AA0 + AA1 + AA2 + AA3
PerceivedVisualComplexity <~ VX0 + VX1 + VX2 + VX3 + VX4
Arousal                   <~ Aro1 + Aro2 + Aro3 + Aro4
"

# Estimate (replicating the results of the `simplePLS()` function)
res <- csem(dat_train, 
            model, 
            .disattenuate = FALSE, # original PLS
            .iter_max = 300, 
            .tolerance = 1e-07, 
            .PLS_weight_scheme_inner = "factorial"
)

# Predict using a user-supplied training data set
pp <- predict(res, .test_data = dat_test)
pp

### Compute prediction metrics  ------------------------------------------------
res2 <- csem(Anime, # whole data set
            model, 
            .disattenuate = FALSE, # original PLS
            .iter_max = 300, 
            .tolerance = 1e-07, 
            .PLS_weight_scheme_inner = "factorial"
)

# Predict using 10-fold cross-validation
## Not run: 
pp2 <- predict(res, .benchmark = "lm")
pp2
## There is a plot method available
plot(pp2)
## End(Not run)

### Example using OrdPLScPredict -----------------------------------------------
# Transform the numerical indicators into factors
## Not run: 
data("BergamiBagozzi2000")
data_new <- data.frame(cei1    = as.ordered(BergamiBagozzi2000$cei1),
                       cei2    = as.ordered(BergamiBagozzi2000$cei2),
                       cei3    = as.ordered(BergamiBagozzi2000$cei3),
                       cei4    = as.ordered(BergamiBagozzi2000$cei4),
                       cei5    = as.ordered(BergamiBagozzi2000$cei5),
                       cei6    = as.ordered(BergamiBagozzi2000$cei6),
                       cei7    = as.ordered(BergamiBagozzi2000$cei7),
                       cei8    = as.ordered(BergamiBagozzi2000$cei8),
                       ma1     = as.ordered(BergamiBagozzi2000$ma1),
                       ma2     = as.ordered(BergamiBagozzi2000$ma2),
                       ma3     = as.ordered(BergamiBagozzi2000$ma3),
                       ma4     = as.ordered(BergamiBagozzi2000$ma4),
                       ma5     = as.ordered(BergamiBagozzi2000$ma5),
                       ma6     = as.ordered(BergamiBagozzi2000$ma6),
                       orgcmt1 = as.ordered(BergamiBagozzi2000$orgcmt1),
                       orgcmt2 = as.ordered(BergamiBagozzi2000$orgcmt2),
                       orgcmt3 = as.ordered(BergamiBagozzi2000$orgcmt3),
                       orgcmt5 = as.ordered(BergamiBagozzi2000$orgcmt5),
                       orgcmt6 = as.ordered(BergamiBagozzi2000$orgcmt6),
                       orgcmt7 = as.ordered(BergamiBagozzi2000$orgcmt7),
                       orgcmt8 = as.ordered(BergamiBagozzi2000$orgcmt8))

model <- "
# Measurement models
OrgPres =~ cei1 + cei2 + cei3 + cei4 + cei5 + cei6 + cei7 + cei8
OrgIden =~ ma1 + ma2 + ma3 + ma4 + ma5 + ma6
AffJoy  =~ orgcmt1 + orgcmt2 + orgcmt3 + orgcmt7
AffLove =~ orgcmt5 + orgcmt 6 + orgcmt8

# Structural model
OrgIden ~ OrgPres
AffLove ~ OrgIden
AffJoy  ~ OrgIden 
"
# Estimate using cSEM; note: the fact that indicators are factors triggers OrdPLSc
res <- csem(.model = model, .data = data_new[1:250,])
summarize(res)

# Predict using OrdPLSPredict
set.seed(123)
pred <- predict(
  .object = res, 
  .benchmark = "PLS-PM",
  .test_data = data_new[(251):305,],
   .treat_as_continuous = TRUE, .approach_score_target = "median"
  )

pred 
round(pred$Prediction_metrics[, -1], 4)
## End(Not run)

`cSEMAssess` method for `print()`

Description

The cSEMAssess method for the generic function print().

Usage

## S3 method for class 'cSEMAssess'
print(x, ...)

`cSEMNonlinearEffectsAnalysis` method for `print()`

Description

The cSEMNonlinearEffectsAnalysis method for the generic function print().

Usage

## S3 method for class 'cSEMNonlinearEffects'
print(x, ...)

`cSEMPlotPredict` method for `print()`

Description

The cSEMPlotPredict method for the generic function print().

Usage

## S3 method for class 'cSEMPlotPredict'
print(x, ...)

Arguments

x

An R object of class cSEMPlotPredict.

...

Currently ignored.

`cSEMPredict` method for `print()`

Description

The cSEMPredict method for the generic function print().

Usage

## S3 method for class 'cSEMPredict'
print(x, .metrics = c("MAE", "RMSE", "Q2"), ...)

Arguments

.metrics

`cSEMResults` method for `print()`

Description

The cSEMResults method for the generic function print().

Usage

## S3 method for class 'cSEMResults'
print(x, ...)

`cSEMSummarize` method for `print()`

Description

The cSEMSummary method for the generic function print().

Usage

## S3 method for class 'cSEMSummarize'
print(x, .full_output = TRUE, ...)

Arguments

.full_output

Logical. Should the full output of summarize be printed. Defaults to TRUE.

`cSEMTestCVPAT` method for `print()`

Description

The cSEMTestCVAT method for the generic function print().

Usage

## S3 method for class 'cSEMTestCVPAT'
print(x, ...)

`cSEMTestHausman` method for `print()`

Description

The cSEMTestHausman method for the generic function print().

Usage

## S3 method for class 'cSEMTestHausman'
print(x, ...)

`cSEMTestMGD` method for `print()`

Description

The cSEMTestMGD method for the generic function print().

Usage

## S3 method for class 'cSEMTestMGD'
print(
  x,
  .approach_mgd = c("none", "Klesel", "Chin", "Sarstedt", "Keil", "Nitzl", "Henseler",
    "CI_para", "CI_overlap"),
  ...
)

Arguments

.approach_mgd

Character string or a vector of character strings. For which approach should details be displayed? One of: "none", "Klesel", "Chin", "Sarstedt", "Keil, "Nitzl", "Henseler", "CI_para", or "CI_overlap". Default to "none" in which case no details are displayed.

`cSEMTestMICOM` method for `print()`

Description

The cSEMTestMICOM method for the generic function print().

Usage

## S3 method for class 'cSEMTestMICOM'
print(x, ...)

`cSEMTestOMF` method for `print()`

Description

The cSEMTestOMF method for the generic function print().

Usage

## S3 method for class 'cSEMTestOMF'
print(x, ...)

`cSEMVerify` method for `print()`

Description

The cSEMVerify method for the generic function print().

Usage

## S3 method for class 'cSEMVerify'
print(x, ...)

Internal: Process data

Description

Prepare, standardize, check, and clean data provided via the .data argument.

Usage

processData(
  .data        = NULL, 
  .model       = NULL, 
  .instruments = NULL
  )

Arguments

.data

.model

A model in lavaan model syntax or a cSEMModel list.

.instruments

Value

A (N x K) data.frame containing the standardized data with columns ordered according to the order they appear in the measurement model equations provided via the .model argument.

Reliability

Description

Compute several reliability estimates. See the Reliability section of the cSEM website for details.

Usage

calculateRhoC(
  .object = NULL,
  .model_implied = TRUE,
  .only_common_factors = TRUE,
  .weighted = FALSE
)

calculateRhoT(
  .object = NULL,
  .alpha = 0.05,
  .closed_form_ci = FALSE,
  .only_common_factors = TRUE,
  .output_type = c("vector", "data.frame"),
  .weighted = FALSE,
  ...
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.model_implied

Logical. Should weights be scaled using the model-implied indicator correlation matrix? Defaults to TRUE.

.only_common_factors

.weighted

Logical. Should estimation be based on a score that uses the weights of the weight approach used to obtain .object?. Defaults to FALSE.

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.closed_form_ci

Logical. Should a closed-form confidence interval be computed? Defaults to FALSE.

.output_type

Character string. The type of output. One of "vector" or "data.frame". Defaults to "vector".

...

Ignored.

Details

Since reliability is defined with respect to a classical true score measurement model only concepts modeled as common factors are considered by default. For concepts modeled as composites reliability may be estimated by setting .only_common_factors = FALSE, however, it is unclear how to interpret reliability in this case.

Reliability is traditionally based on a test score (proxy) based on unit weights. To compute congeneric and tau-equivalent reliability based on a score that uses the weights of the weight approach used to obtain .object use .weighted = TRUE instead.

For the tau-equivalent reliability ("rho_T" or "cronbachs_alpha") a closed-form confidence interval may be computed (Trinchera et al. 2018) by setting .closed_form_ci = TRUE (default is FALSE). If .alpha is a vector several CIs are returned.

Value

For calculateRhoC() and calculateRhoT() (if .output_type = "vector") a named numeric vector containing the reliability estimates. If .output_type = "data.frame" calculateRhoT() returns a data.frame with as many rows as there are constructs modeled as common factors in the model (unless .only_common_factors = FALSE in which case the number of rows equals the total number of constructs in the model). The first column contains the name of the construct. The second column the reliability estimate. If .closed_form_ci = TRUE the remaining columns contain lower and upper bounds for the (1 - .alpha) confidence interval(s).

Functions

calculateRhoC(): Calculate the congeneric reliability
calculateRhoT(): Calculate the tau-equivalent reliability

References

Trinchera L, Marie N, Marcoulides GA (2018). “A Distribution Free Interval Estimate for Coefficient Alpha.” Structural Equation Modeling: A Multidisciplinary Journal, 25(6), 876–887. doi:10.1080/10705511.2018.1431544.

Resample data

Description

Resample data from a data set using common resampling methods. For bootstrap or jackknife resampling, package users usually do not need to call this function but directly use resamplecSEMResults() instead.

Usage

resampleData(
 .object          = NULL,
 .data            = NULL,
 .resample_method = c("bootstrap", "jackknife", "permutation", 
                      "cross-validation"),
 .cv_folds        = 10,
 .id              = NULL,
 .R               = 499,
 .seed            = NULL
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.data

A data.frame, a matrix or a list of data of either type. Possible column types or classes of the data provided are: "logical", "numeric" ("double" or "integer"), "factor" (ordered and unordered) or a mix of several types. The data may also include one character column whose column name must be given to .id. This column is assumed to contain group identifiers used to split the data into groups. If .data is provided, .object is ignored. Defaults to NULL.

.resample_method

Character string. The resampling method to use. One of: "bootstrap", "jackknife", "permutation", or "cross-validation". Defaults to "bootstrap".

.cv_folds

Integer. The number of cross-validation folds to use. Setting .cv_folds to N (the number of observations) produces leave-one-out cross-validation samples. Defaults to 10.

.id

Character string or integer. A character string giving the name or an integer of the position of the column of .data whose levels are used to split .data into groups. Defaults to NULL.

.R

Integer. The number of bootstrap runs, permutation runs or cross-validation repetitions to use. Defaults to 499.

.seed

Details

The function resampleData() is general purpose. It simply resamples data from a data set according to the resampling method provided via the .resample_method argument and returns a list of resamples. Currently, bootstrap, jackknife, permutation, and cross-validation (both leave-one-out (LOOCV) and k-fold cross-validation) are implemented.

The user may provide the data set to resample either explicitly via the .data argument or implicitly by providing a cSEMResults objects to .object in which case the original data used in the call that created the cSEMResults object is used for resampling. If both, a cSEMResults object and a data set via .data are provided the former is ignored.

To split data provided via the .data argument into groups, the column name or the column index of the column containing the group levels to split the data must be given to .id. If data that contains grouping is taken from a cSEMResults object, .id is taken from the object information. Hence, providing .id is redundant in this case and therefore ignored.

The number of bootstrap or permutation runs as well as the number of cross-validation repetitions is given by .R. The default is 499 but should be increased in real applications. See e.g., Hesterberg (2015), p.380 for recommendations concerning the bootstrap. For jackknife .R is ignored as it is based on the N leave-one-out data sets.

Choosing resample_method = "permutation" for ungrouped data causes an error as permutation will simply reorder the observations which is usually not meaningful. If a list of data is provided each list element is assumed to represent the observations belonging to one group. In this case, data is pooled and group adherence permuted.

For cross-validation the number of folds (k) defaults to 10. It may be changed via the .cv_folds argument. Setting k = 2 (not 1!) splits the data into a single training and test data set. Setting k = N (where N is the number of observations) produces leave-one-out cross-validation samples. Note: 1.) At least 2 folds required (k > 1); 2.) k can not be larger than N; 3.) If N/k is not not an integer the last fold will have less observations.

Random number generation (RNG) uses the L'Ecuyer-CRMR RGN stream as implemented in the future.apply package (Bengtsson 2018). See ?future_lapply for details. By default a random seed is chosen.

Value

The structure of the output depends on the type of input and the resampling method:

Bootstrap: If a matrix or data.frame without grouping variable is provided (i.e., .id = NULL), the result is a list of length .R (default 499). Each element of that list is a bootstrap (re)sample. If a grouping variable is specified or a list of data is provided (where each list element is assumed to contain data for one group), resampling is done by group. Hence, the result is a list of length equal to the number of groups with each list element containing .R bootstrap samples based on the N_g observations of group g.
Jackknife: If a matrix or data.frame without grouping variable is provided (.id = NULL), the result is a list of length equal to the number of observations/rows (N) of the data set provided. Each element of that list is a jackknife (re)sample. If a grouping variable is specified or a list of data is provided (where each list element is assumed to contain data for one group), resampling is done by group. Hence, the result is a list of length equal to the number of group levels with each list element containing N jackknife samples based on the N_g observations of group g.
Permutation: If a matrix or data.frame without grouping variable is provided an error is returned as permutation will simply reorder the observations. If a grouping variable is specified or a list of data is provided (where each list element is assumed to contain data of one group), group membership is permuted. Hence, the result is a list of length .R where each element of that list is a permutation (re)sample.
Cross-validation: If a matrix or data.frame without grouping variable is provided a list of length .R is returned. Each list element contains a list containing the k splits/folds subsequently used as test and training data sets. If a grouping variable is specified or a list of data is provided (where each list element is assumed to contain data for one group), cross-validation is repeated .R times for each group. Hence, the result is a list of length equal to the number of groups, each containing .R list elements (the repetitions) which in turn contain the k splits/folds.

References

Bengtsson H (2018). future.apply: Apply Function to Elements in Parallel using Futures. R package version 1.0.1, https://CRAN.R-project.org/package=future.apply.

Hesterberg TC (2015). “What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.” The American Statistician, 69(4), 371–386. doi:10.1080/00031305.2015.1089789.

Examples

# ===========================================================================
# Using the raw data 
# ===========================================================================
### Bootstrap (default) -----------------------------------------------------

res_boot1 <- resampleData(.data = satisfaction)
str(res_boot1, max.level = 3, list.len = 3)

## To replicate a bootstrap draw use .seed:
res_boot1a <- resampleData(.data = satisfaction, .seed = 2364)
res_boot1b <- resampleData(.data = satisfaction, .seed = 2364)
                           
identical(res_boot1, res_boot1a) # TRUE

### Jackknife ---------------------------------------------------------------

res_jack <- resampleData(.data = satisfaction, .resample_method = "jackknife")
str(res_jack, max.level = 3, list.len = 3)

### Cross-validation --------------------------------------------------------
## Create dataset for illustration:
dat <- data.frame(
  "x1" = rnorm(100),
  "x2" = rnorm(100),
  "group" = sample(c("male", "female"), size = 100, replace = TRUE),
  stringsAsFactors = FALSE)

## 10-fold cross-validation (repeated 100 times)
cv_10a <- resampleData(.data = dat, .resample_method = "cross-validation", 
                      .R = 100)
str(cv_10a, max.level = 3, list.len = 3)

# Cross-validation can be done by group if a group identifyer is provided:
cv_10 <- resampleData(.data = dat, .resample_method = "cross-validation", 
                      .id = "group", .R = 100)

## Leave-one-out-cross-validation (repeated 50 times)
cv_loocv  <- resampleData(.data = dat[, -3], 
                          .resample_method = "cross-validation", 
                          .cv_folds = nrow(dat),
                          .R = 50)
str(cv_loocv, max.level = 2, list.len = 3)

### Permuation ---------------------------------------------------------------

res_perm <- resampleData(.data = dat, .resample_method = "permutation",
                         .id = "group")
str(res_perm, max.level = 2, list.len = 3)

# Forgetting to set .id causes an error
## Not run: 
res_perm <- resampleData(.data = dat, .resample_method = "permutation")

## End(Not run)

# ===========================================================================
# Using a cSEMResults object
# ===========================================================================

model <- "
# Structural model
QUAL ~ EXPE
EXPE ~ IMAG
SAT  ~ IMAG + EXPE + QUAL + VAL
LOY  ~ IMAG + SAT
VAL  ~ EXPE + QUAL

# Measurement model
EXPE =~ expe1 + expe2 + expe3 + expe4 + expe5
IMAG =~ imag1 + imag2 + imag3 + imag4 + imag5
LOY  =~ loy1  + loy2  + loy3  + loy4
QUAL =~ qual1 + qual2 + qual3 + qual4 + qual5
SAT  =~ sat1  + sat2  + sat3  + sat4
VAL  =~ val1  + val2  + val3  + val4
"
a <- csem(satisfaction, model)

# Create bootstrap and jackknife samples
res_boot <- resampleData(a, .resample_method = "bootstrap", .R = 499)
res_jack <- resampleData(a, .resample_method = "jackknife")

# Since `satisfaction` is the dataset used the following approaches yield
# identical results.
res_boot_data   <- resampleData(.data = satisfaction, .seed = 2364)
res_boot_object <- resampleData(a, .seed = 2364)

identical(res_boot_data, res_boot_object) # TRUE

Resample cSEMResults

Description

Resample a cSEMResults object using bootstrap or jackknife resampling. The function is called by csem() if the user sets csem(..., .resample_method = "bootstrap") or csem(..., .resample_method = "jackknife") but may also be called directly.

Usage

resamplecSEMResults(
 .object                = NULL,
 .resample_method       = c("bootstrap", "jackknife"), 
 .resample_method2      = c("none", "bootstrap", "jackknife"), 
 .R                     = 499,
 .R2                    = 199,
 .handle_inadmissibles  = c("drop", "ignore", "replace"),
 .user_funs             = NULL,
 .eval_plan             = c("sequential", "multicore", "multisession"),
 .force                 = FALSE,
 .seed                  = NULL,
 .sign_change_option    = c("none","individual","individual_reestimate",
                            "construct_reestimate"),
 ...
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.resample_method

Character string. The resampling method to use. One of: "bootstrap" or "jackknife". Defaults to "bootstrap".

.resample_method2

.R

Integer. The number of bootstrap replications. Defaults to 499.

.R2

Integer. The number of bootstrap replications to use when resampling from a resample. Defaults to 199.

.handle_inadmissibles

.user_funs

.eval_plan

Character string. The evaluation plan to use. One of "sequential", "multicore", or "multisession". In the two latter cases all available cores will be used. Defaults to "sequential".

.force

Logical. Should .object be resampled even if it contains resamples already?. Defaults to FALSE.

.seed

.sign_change_option

...

Further arguments passed to functions supplied to .user_funs.

Details

Given M resamples (for bootstrap M = .R and for jackknife M = N, where N is the number of observations) based on the data used to compute the cSEMResults object provided via .object, resamplecSEMResults() essentially calls csem() on each resample using the arguments of the original call (ignoring any arguments related to resampling) and returns estimates for each of a subset of practically useful resampled parameters/statistics computed by csem(). Currently, the following estimates are computed and returned by default based on each resample: Path estimates, Loading estimates, Weight estimates.

In practical application users may need to resample a specific statistic (e.g, the heterotrait-monotrait ratio of correlations (HTMT) or differences between path coefficients such as beta_1 - beta_2). Such statistics may be provided by a function fun(.object, ...) or a list of such functions via the .user_funs argument. The first argument of these functions must always be .object. Internally, the function will be applied on each resample to produce the desired statistic. Hence, arbitrary complicated statistics may be resampled as long as the body of the function draws on elements contained in the cSEMResults object only. Output of fun(.object, ...) should preferably be a (named) vector but matrices are also accepted. However, the output will be vectorized (columnwise) in this case. See the examples section for details.

Both resampling the original cSEMResults object (call it "first resample") and resampling based on a resampled cSEMResults object (call it "second resample") are supported. Choices for the former are "bootstrap" and "jackknife". Resampling based on a resample is turned off by default (.resample_method2 = "none") as this significantly increases computation time (there are now M * M2 resamples to compute, where M2 is .R2 or N). Resamples of a resample are required, e.g., for the studentized confidence interval computed by the infer() function. Typically, bootstrap resamples are used in this case (Davison and Hinkley 1997).

As csem() accepts a single data set, a list of data sets as well as data sets that contain a column name used to split the data into groups, the cSEMResults object may contain multiple data sets. In this case, resampling is done by data set or group. Note that depending on the number of data sets/groups, the computation may be considerably slower as resampling will be repeated for each data set/group. However, apart from speed considerations users don not need to worry about the type of input used to compute the cSEMResults object as resamplecSEMResults() is able to deal with each case.

The number of bootstrap runs for the first and second run are given by .R and .R2. The default is 499 for the first and 199 for the second run but should be increased in real applications. See e.g., Hesterberg (2015), p.380, Davison and Hinkley (1997), and Efron and Hastie (2016) for recommendations. For jackknife .R are .R2 are ignored.

Resampling may produce inadmissible results (as checked by verify()). By default these results are dropped however users may choose to "ignore" or "replace" inadmissible results in which resampling continuous until the necessary number of admissible results is reached.

The cSEM package supports (multi)processing via the future framework (Bengtsson 2018). Users may simply choose an evaluation plan via .eval_plan and the package takes care of all the complicated backend issues. Currently, users may choose between standard single-core/single-session evaluation ("sequential") and multiprocessing ("multisession" or "multicore"). The future package provides other options (e.g., "cluster" or "remote"), however, they probably will not be needed in the context of the cSEM package as simulations usually do not require high-performance clusters. Depending on the operating system, the future package will manage to distribute tasks to multiple R sessions (Windows) or multiple cores. Note that multiprocessing is not necessary always faster when only a "small" number of replications is required as the overhead of initializing new sessions or distributing tasks to different cores will not immediately be compensated by the availability of multiple sessions/cores.

Random number generation (RNG) uses the L'Ecuyer-CRMR RGN stream as implemented in the future.apply package (Bengtsson 2018). It is independent of the evaluation plan. Hence, setting e.g., .seed = 123 will generate the same random number and replicates for both .eval_plan = "sequential", .eval_plan = "multisession", and .eval_plan = "multicore". See ?future_lapply for details.

Value

The core structure is the same structure as that of .object with the following elements added:

⁠$Estimates_resamples⁠: A list containing the .R resamples and the original estimates for each of the resampled quantities (Path_estimates, Loading_estimates, Weight_estimates, user defined functions). Each list element is a list containing elements ⁠$Resamples⁠ and ⁠$Original⁠. ⁠$Resamples⁠ is a ⁠(.R x K)⁠ matrix with each row representing one resample for each of the K parameters/statistics. ⁠$Original⁠ contains the original estimates (vectorized by column if the output of the user provided function is a matrix.
⁠$Information_resamples⁠: A list containing additional information.

Use ⁠str(<.object>, list.len = 3)⁠ on the resulting object for an overview.

References

Bengtsson H (2018). future: Unified Parallel and Distributed Processing in R for Everyone. R package version 1.10.0, https://CRAN.R-project.org/package=future.

Bengtsson H (2018). future.apply: Apply Function to Elements in Parallel using Futures. R package version 1.0.1, https://CRAN.R-project.org/package=future.apply.

Davison AC, Hinkley DV (1997). Bootstrap Methods and their Application. Cambridge University Press. doi:10.1017/cbo9780511802843.

Efron B, Hastie T (2016). Computer Age Statistical Inference. Cambridge University Pr. ISBN 1107149894.

Hesterberg TC (2015). “What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.” The American Statistician, 69(4), 371–386. doi:10.1080/00031305.2015.1089789.

Examples

## Not run: 
# Note: example not run as resampling is time consuming
# ===========================================================================
# Basic usage
# ===========================================================================
model <- "
# Structural model
QUAL ~ EXPE
EXPE ~ IMAG
SAT  ~ IMAG + EXPE + QUAL + VAL
LOY  ~ IMAG + SAT
VAL  ~ EXPE + QUAL

# Measurement model
EXPE =~ expe1 + expe2 + expe3 + expe4 + expe5
IMAG =~ imag1 + imag2 + imag3 + imag4 + imag5
LOY  =~ loy1  + loy2  + loy3  + loy4
QUAL =~ qual1 + qual2 + qual3 + qual4 + qual5
SAT  =~ sat1  + sat2  + sat3  + sat4
VAL  =~ val1  + val2  + val3  + val4
"

## Estimate the model without resampling 
a <- csem(satisfaction, model)

## Bootstrap and jackknife estimation
boot <- resamplecSEMResults(a)
jack <- resamplecSEMResults(a, .resample_method = "jackknife") 

## Alternatively use .resample_method in csem()
boot_csem <- csem(satisfaction, model, .resample_method = "bootstrap")
jack_csem <- csem(satisfaction, model, .resample_method = "jackknife")

# ===========================================================================
# Extended usage
# ===========================================================================
### Double resampling  ------------------------------------------------------
# The confidence intervals (e.g. the bias-corrected and accelearated CI) 
# require double resampling. Use .resample_method2 for this.

boot1 <- resamplecSEMResults(
  .object = a, 
  .resample_method  = "bootstrap", 
  .R                = 50,
  .resample_method2 = "bootstrap", 
  .R2               = 20,
  .seed             = 1303
  )

## Again, this is identical to using csem 
boot1_csem <- csem(
  .data             = satisfaction, 
  .model            = model, 
  .resample_method  = "bootstrap",
  .R                = 50,
  .resample_method2 = "bootstrap",
  .R2               = 20,
  .seed             = 1303
  )

identical(boot1, boot1_csem) # only true if .seed was set

### Inference ---------------------------------------------------------------
# To get inferencial quanitites such as the estimated standard error or
# the percentile confidence intervall for each resampled quantity use 
# postestimation function infer()

inference <- infer(boot1)
inference$Path_estimates$sd
inference$Path_estimates$CI_percentile

# As usual summarize() can be called directly
summarize(boot1)

# In the example above .R x .R2 = 50 x 20 = 1000. Multiprocessing will be
# faster on most systems here and is therefore recommended. Note that multiprocessing
# does not affect the random number generation

boot2 <- resamplecSEMResults(
  .object           = a, 
  .resample_method  = "bootstrap", 
  .R                = 50,
  .resample_method2 = "bootstrap", 
  .R2               = 20,
  .eval_plan        = "multisession", 
  .seed             = 1303
  )

identical(boot1, boot2)
## End(Not run)

Data: satisfaction

Description

A data frame with 250 observations and 27 variables. Variables from 1 to 27 refer to six latent concepts: IMAG=Image, EXPE=Expectations, QUAL=Quality, VAL=Value, SAT=Satisfaction, and LOY=Loyalty.

imag1-imag5: Indicators attached to concept IMAG which is supposed to capture aspects such as the institutions reputation, trustworthiness, seriousness, solidness, and caring about customer.
expe1-expe5: Indicators attached to concept EXPE which is supposed to capture aspects concerning products and services provided, customer service, providing solutions, and expectations for the overall quality.
qual1-qual5: Indicators attached to concept QUAL which is supposed to capture aspects concerning reliability of products and services, the range of products and services, personal advice, and overall perceived quality.
val1-val4: Indicators attached to concept VAL which is supposed to capture aspects related to beneficial services and products, valuable investments, quality relative to price, and price relative to quality.
sat1-sat4: Indicators attached to concept SAT which is supposed to capture aspects concerning overall rating of satisfaction, fulfillment of expectations, satisfaction relative to other banks, and performance relative to customer's ideal bank.
loy1-loy4: Indicators attached to concept LOY which is supposed to capture aspects concerning propensity to choose the same bank again, propensity to switch to other bank, intention to recommend the bank to friends, and the sense of loyalty.

Usage

satisfaction

Format

An object of class data.frame with 250 rows and 27 columns.

Details

This dataset contains the variables from a customer satisfaction study of a Spanish credit institution on 250 customers. The data is identical to the dataset provided by the plspm package but with the last column (gender) removed. If you are looking for the original dataset use the satisfaction_gender dataset.

Source

The plspm package (version 0.4.9). Original source according to plspm: "Laboratory of Information Analysis and Modeling (LIAM). Facultat d'Informatica de Barcelona, Universitat Politecnica de Catalunya".

Data: satisfaction including gender

Description

A data frame with 250 observations and 28 variables. Variables from 1 to 27 refer to six latent concepts: IMAG=Image, EXPE=Expectations, QUAL=Quality, VAL=Value, SAT=Satisfaction, and LOY=Loyalty.

imag1-imag5: Indicators attached to concept IMAG which is supposed to capture aspects such as the institutions reputation, trustworthiness, seriousness, solidness, and caring about customer.
expe1-expe5: Indicators attached to concept EXPE which is supposed to capture aspects concerning products and services provided, customer service, providing solutions, and expectations for the overall quality.
qual1-qual5: Indicators attached to concept QUAL which is supposed to capture aspects concerning reliability of products and services, the range of products and services, personal advice, and overall perceived quality.
val1-val4: Indicators attached to concept VAL which is supposed to capture aspects related to beneficial services and products, valuable investments, quality relative to price, and price relative to quality.
sat1-sat4: Indicators attached to concept SAT which is supposed to capture aspects concerning overall rating of satisfaction, fulfillment of expectations, satisfaction relative to other banks, and performance relative to customer's ideal bank.
loy1-loy4: Indicators attached to concept LOY which is supposed to capture aspects concerning propensity to choose the same bank again, propensity to switch to other bank, intention to recommend the bank to friends, and the sense of loyalty.
gender: The sex of the respondent.

Usage

satisfaction_gender

Format

An object of class data.frame with 250 rows and 28 columns.

Details

This data set contains the variables from a customer satisfaction study of a Spanish credit institution on 250 customers. The data is taken from the plspm package. For convenience, there is a version of the dataset with the last column (gender) removed: satisfaction.

Source

savePlot

Description

This function saves a given plot of a cSEMResults object to a specified file format.

Usage

savePlot(
 .plot_object,
 .filename,
 .path = NULL)

Arguments

.plot_object

Object returned by one of the following functions plot.cSEMResults_default(), plot.cSEMResults_multi(), or plot.cSEMResults_2ndorder().

.filename

Character string. The name of the file to save the plot to (supports 'pdf', 'png', 'svg', and 'dot' formats).

.path

Character string. Path of the directory to save the file to. Defaults to the current working directory.

Internal: save_single_plot Helper function to save a single DiagrammeR plot based on the file extension

Description

diagrammer_obj DiagrammeR plot object to be saved. out_file The name of the file to save the plot to (supports 'pdf', 'png', 'svg', and 'dot' formats).

Usage

save_single_plot(
diagrammer_obj, 
out_file,
.path = NULL)

Arguments

.path

Character string. Path of the directory to save the file to. Defaults to NULL.

Value

NULL.

Internal: Scale weights

Description

Scale weights such that the formed composite has unit variance.

Usage

scaleWeights(
  .S = args_default()$.S, 
  .W = args_default()$.W
  )

Arguments

.S

The (K x K) empirical indicator correlation matrix.

.W

A (J x K) matrix of weights.

Value

The (J x K) matrix of scaled weights.

Internal: secondOrderMeasurementEdges

Description

Build measurement edges for a second–order model.

Usage

secondOrderMeasurementEdges(
 construct,
 weights_first,
 loadings_first,
 weight_p_first,
 loading_p_first,
 weights_second,
 loadings_second,
 weight_p_second,
 loading_p_second,
 plot_signif,
 plot_labels,
 constructTypes,
 only_second_stage = FALSE
 )

Arguments

plot_labels

Logical. Whether to display edge labels. Defaults to TRUE.

Value

Character string.

Internal: Set the dominant indicator

Description

Set the dominant indicator for each construct. Since the sign of the weights, and thus the loadings is often not determined, a dominant indicator can be chosen per block. The sign of the weights are chosen that the correlation between the dominant indicator and the composite is positive.

Usage

setDominantIndicator(
 .W                   = args_default()$.W,
 .dominant_indicators = args_default()$.dominant_indicators, 
 .S                   = args_default()$.S
 )

Arguments

.W

A (J x K) matrix of weights.

.dominant_indicators

.S

The (K x K) empirical indicator correlation matrix.

Value

The (J x K) matrix of weights with the dominant indicator set.

Internal: Set starting values

Description

Set the starting values.

Usage

setStartingValues(
  .W               = args_default()$.W,
  .starting_values = args_default()$.starting_values
  )

Arguments

.W

A (J x K) matrix of weights.

.starting_values

Value

The (J x K) matrix of starting values.

Internal: get structured cSEMTestMGD results

Description

Convenience function to summarize the results of all tests resulting from a call to testMGD() in a user-friendly way.

Usage

structureTestMGDDecisions(.object)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

Value

A data.frame.

Summarize model

Description

Usage

summarize(
 .object = NULL, 
 .alpha  = 0.05,
 .ci     = NULL,
 ...
 )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.ci

A vector of character strings naming the confidence interval to compute. For possible choices see infer().

...

Further arguments to summarize(). Currently ignored.

Details

The summary is mainly focused on estimated parameters. For quality criteria such as the average variance extracted (AVE), reliability estimates, effect size estimates etc., use assess().

If .object contains resamples, standard errors, t-values and p-values (assuming estimates are standard normally distributed) are printed as well. By default the percentile confidence interval is given as well. For other confidence intervals use the .ci argument. See infer() for possible choices and a description.

Value

An object of class cSEMSummarize. A cSEMSummarize object has the same structure as the cSEMResults object with a couple differences:

Elements ⁠$Path_estimates⁠, ⁠$Loadings_estimates⁠, ⁠$Weight_estimates⁠, ⁠$Weight_estimates⁠, and ⁠$Residual_correlation⁠ are standardized data frames instead of matrices.
Data frames ⁠$Effect_estimates⁠, ⁠$Indicator_correlation⁠, and ⁠$Exo_construct_correlation⁠ are added to ⁠$Estimates⁠.

The data frame format is usually much more convenient if users intend to present the results in e.g., a paper or a presentation.

Examples

## Take a look at the dataset
#?threecommonfactors

## Specify the (correct) model
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

## Estimate
res <- csem(threecommonfactors, model, .resample_method = "bootstrap", .R = 40)

## Postestimation
res_summarize <- summarize(res)
res_summarize

# Extract e.g. the loadings
res_summarize$Estimates$Loading_estimates

## By default only the 95% percentile confidence interval is printed. User
## can have several confidence interval computed, however, only the first
## will be printed.

res_summarize <- summarize(res, .ci = c("CI_standard_t", "CI_percentile"), 
                           .alpha = c(0.05, 0.01))
res_summarize

# Extract the loading including both confidence intervals
res_summarize$Estimates$Path_estimates

Perform a Cross-Validated Predictive Ability Test (CVPAT)

Description

Usage

testCVPAT(
.object1              = NULL,
.object2              = NULL,
.approach_predict     = c("earliest", "direct"),
.seed                 = NULL,
.cv_folds             = 10,
.handle_inadmissibles = c("stop", "ignore"),
.testtype             = c("twosided", "onesided"))

Arguments

.object1

An R object of class cSEMResults resulting from a call to csem().

.object2

An R object of class cSEMResults resulting from a call to csem().

.approach_predict

.seed

.cv_folds

Integer. The number of cross-validation folds to use. Setting .cv_folds to N (the number of observations) produces leave-one-out cross-validation samples. Defaults to 10.

.handle_inadmissibles

.testtype

Details

Perform a Cross-Validated Predictive Ability Test (CVPAT) as described in (Liengaard et al. 2020). The predictive performance of two models based on the same dataset is compared. In doing so, the average difference in losses in predictions is compared for both models.

Value

An object of class cSEMCVPAT with print and plot methods. Technically, cSEMCVPAT is a named list containing the following list elements:

'$Information': Additional information.

References

Liengaard BD, Sharma PN, Hult GTM, Jensen MB, Sarstedt M, Hair JF, Ringle CM (2020). “Prediction: Coveted, Yet Forsaken? Introducing a Cross-Validated Predictive Ability Test in Partial Least Squares Path Modeling.” Decision Sciences, 52(2), 362–392. doi:10.1111/deci.12445.

Examples

### Anime example taken from https://github.com/ISS-Analytics/pls-predict/

# Load data
data(Anime) # data is similar to the Anime.csv found on 
            # https://github.com/ISS-Analytics/pls-predict/ but with irrelevant
            # columns removed

# Split into training and data the same way as it is done on 
# https://github.com/ISS-Analytics/pls-predict/
set.seed(123)

index     <- sample.int(dim(Anime)[1], 83, replace = FALSE)
dat_train <- Anime[-index, ]
dat_test  <- Anime[index, ]

# Specify model
model <- "
# Structural model

ApproachAvoidance ~ PerceivedVisualComplexity + Arousal

# Measurement/composite model

ApproachAvoidance         =~ AA0 + AA1 + AA2 + AA3
PerceivedVisualComplexity <~ VX0 + VX1 + VX2 + VX3 + VX4
Arousal                   <~ Aro1 + Aro2 + Aro3 + Aro4
"

# Estimate (replicating the results of the `simplePLS()` function)
res <- csem(dat_train, 
            model, 
            .disattenuate = FALSE, # original PLS
            .iter_max = 300, 
            .tolerance = 1e-07, 
            .PLS_weight_scheme_inner = "factorial"
)

# Predict using a user-supplied training data set
pp <- predict(res, .test_data = dat_test)
pp

### Compute prediction metrics  ------------------------------------------------
res2 <- csem(Anime, # whole data set
            model, 
            .disattenuate = FALSE, # original PLS
            .iter_max = 300, 
            .tolerance = 1e-07, 
            .PLS_weight_scheme_inner = "factorial"
)

# Predict using 10-fold cross-validation
## Not run: 
pp2 <- predict(res, .benchmark = "lm")
pp2
## There is a plot method available
plot(pp2)
## End(Not run)

### Example using OrdPLScPredict -----------------------------------------------
# Transform the numerical indicators into factors
## Not run: 
data("BergamiBagozzi2000")
data_new <- data.frame(cei1    = as.ordered(BergamiBagozzi2000$cei1),
                       cei2    = as.ordered(BergamiBagozzi2000$cei2),
                       cei3    = as.ordered(BergamiBagozzi2000$cei3),
                       cei4    = as.ordered(BergamiBagozzi2000$cei4),
                       cei5    = as.ordered(BergamiBagozzi2000$cei5),
                       cei6    = as.ordered(BergamiBagozzi2000$cei6),
                       cei7    = as.ordered(BergamiBagozzi2000$cei7),
                       cei8    = as.ordered(BergamiBagozzi2000$cei8),
                       ma1     = as.ordered(BergamiBagozzi2000$ma1),
                       ma2     = as.ordered(BergamiBagozzi2000$ma2),
                       ma3     = as.ordered(BergamiBagozzi2000$ma3),
                       ma4     = as.ordered(BergamiBagozzi2000$ma4),
                       ma5     = as.ordered(BergamiBagozzi2000$ma5),
                       ma6     = as.ordered(BergamiBagozzi2000$ma6),
                       orgcmt1 = as.ordered(BergamiBagozzi2000$orgcmt1),
                       orgcmt2 = as.ordered(BergamiBagozzi2000$orgcmt2),
                       orgcmt3 = as.ordered(BergamiBagozzi2000$orgcmt3),
                       orgcmt5 = as.ordered(BergamiBagozzi2000$orgcmt5),
                       orgcmt6 = as.ordered(BergamiBagozzi2000$orgcmt6),
                       orgcmt7 = as.ordered(BergamiBagozzi2000$orgcmt7),
                       orgcmt8 = as.ordered(BergamiBagozzi2000$orgcmt8))

model <- "
# Measurement models
OrgPres =~ cei1 + cei2 + cei3 + cei4 + cei5 + cei6 + cei7 + cei8
OrgIden =~ ma1 + ma2 + ma3 + ma4 + ma5 + ma6
AffJoy  =~ orgcmt1 + orgcmt2 + orgcmt3 + orgcmt7
AffLove =~ orgcmt5 + orgcmt 6 + orgcmt8

# Structural model
OrgIden ~ OrgPres
AffLove ~ OrgIden
AffJoy  ~ OrgIden 
"
# Estimate using cSEM; note: the fact that indicators are factors triggers OrdPLSc
res <- csem(.model = model, .data = data_new[1:250,])
summarize(res)

# Predict using OrdPLSPredict
set.seed(123)
pred <- predict(
  .object = res, 
  .benchmark = "PLS-PM",
  .test_data = data_new[(251):305,],
   .treat_as_continuous = TRUE, .approach_score_target = "median"
  )

pred 
round(pred$Prediction_metrics[, -1], 4)
## End(Not run)

Regression-based Hausman test

Description

Usage

testHausman(
 .object               = NULL,
 .eval_plan            = c("sequential", "multicore", "multisession"),
 .handle_inadmissibles = c("drop", "ignore", "replace"),
 .R                    = 499,
 .resample_method      = c("bootstrap", "jackknife"),
 .seed                 = NULL
 )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.eval_plan

Character string. The evaluation plan to use. One of "sequential", "multicore", or "multisession". In the two latter cases all available cores will be used. Defaults to "sequential".

.handle_inadmissibles

.R

Integer. The number of bootstrap replications. Defaults to 499.

.resample_method

Character string. The resampling method to use. One of: "none", "bootstrap" or "jackknife". Defaults to "none".

.seed

Details

Calculates the regression-based Hausman test to be used to compare OLS to 2SLS estimates or 2SLS to 3SLS estimates. See e.g., Wooldridge (2010) (pages 131 f.) for details.

The function is somewhat experimental. Only use if you know what you are doing.

References

Wooldridge JM (2010). Econometric Analysis of Cross Section and Panel Data, 2 edition. MIT Press.

Examples

### Example from Dijkstra & Hensler (2015)
## Prepartion (values are from p. 15-16 of the paper)
Lambda <- t(kronecker(diag(6), c(0.7, 0.7, 0.7)))
Phi <- matrix(c(1.0000, 0.5000, 0.5000, 0.5000, 0.0500, 0.4000, 
                0.5000, 1.0000, 0.5000, 0.5000, 0.5071, 0.6286,
                0.5000, 0.5000, 1.0000, 0.5000, 0.2929, 0.7714,
                0.5000, 0.5000, 0.5000, 1.0000, 0.2571, 0.6286,
                0.0500, 0.5071, 0.2929, 0.2571, 1.0000, sqrt(0.5),
                0.4000, 0.6286, 0.7714, 0.6286, sqrt(0.5), 1.0000), 
              ncol = 6)

## Create population indicator covariance matrix
Sigma <- t(Lambda) %*% Phi %*% Lambda
diag(Sigma) <- 1
dimnames(Sigma) <- list(paste0("x", rep(1:6, each = 3), 1:3),
                        paste0("x", rep(1:6, each = 3), 1:3))

## Generate data
dat <- MASS::mvrnorm(n = 500, mu = rep(0, 18), Sigma = Sigma, empirical = TRUE)
# empirical = TRUE to show that 2SLS is in fact able to recover the true population
# parameters.

## Model to estimate
model <- "
## Structural model (nonrecurisve)
eta5 ~ eta6 + eta1 + eta2
eta6 ~ eta5 + eta3 + eta4

## Measurement model
eta1 =~ x11 + x12 + x13
eta2 =~ x21 + x22 + x23
eta3 =~ x31 + x32 + x33
eta4 =~ x41 + x42 + x43

eta5 =~ x51 + x52 + x53
eta6 =~ x61 + x62 + x63
"

library(cSEM)

## Estimate
res_ols <- csem(dat, .model = model, .approach_paths = "OLS")
sum_res_ols <- summarize(res_ols) 

# Note: For the example the model-implied indicator correlation is irrelevant
#       the warnings can be ignored.

res_2sls <- csem(dat, .model = model, .approach_paths = "2SLS",
                 .instruments = list("eta5" = c('eta1','eta2','eta3','eta4'), 
                                     "eta6" = c('eta1','eta2','eta3','eta4')))
sum_res_2sls <- summarize(res_2sls)
# Note that exogenous constructs are supplied as instruments for themselves!

## Test for endogeneity
test_ha <- testHausman(res_2sls, .R = 200)
test_ha

Tests for multi-group comparisons

Description

Usage

testMGD(
 .object                = NULL,
 .alpha                 = 0.05,
 .approach_p_adjust     = "none",
 .approach_mgd          = c("all", "Klesel", "Chin", "Sarstedt", 
                            "Keil", "Nitzl", "Henseler", "CI_para","CI_overlap"),
 .output_type           = c("complete", "structured"),
 .parameters_to_compare = NULL,
 .eval_plan             = c("sequential", "multicore", "multisession"),                           
 .handle_inadmissibles  = c("replace", "drop", "ignore"),
 .R_permutation         = 499,
 .R_bootstrap           = 499,
 .saturated             = FALSE,
 .seed                  = NULL,
 .type_ci               = "CI_percentile",
 .type_vcv              = c("indicator", "construct"),
 .verbose               = TRUE
 )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.approach_p_adjust

.approach_mgd

.output_type

Character string. The type of output to return. One of "complete" or "structured". See the Value section for details. Defaults to "complete".

.parameters_to_compare

.eval_plan

Character string. The evaluation plan to use. One of "sequential", "multicore", or "multisession". In the two latter cases all available cores will be used. Defaults to "sequential".

.handle_inadmissibles

Character string. How should inadmissible results be treated? One of "drop", "ignore", or "replace". If "drop", all replications/resamples yielding an inadmissible result will be dropped (i.e. the number of results returned will potentially be less than .R). For "ignore" all results are returned even if all or some of the replications yielded inadmissible results (i.e. number of results returned is equal to .R). For "replace" resampling continues until there are exactly .R admissible solutions. Defaults to "replace" to accommodate all approaches.

.R_permutation

Integer. The number of permutations. Defaults to 499

.R_bootstrap

Integer. The number of bootstrap runs. Ignored if .object contains resamples. Defaults to 499

.saturated

Logical. Should a saturated structural model be used? Defaults to FALSE.

.seed

.type_ci

.type_vcv

Character string. Which model-implied correlation matrix should be calculated? One of "indicator" or "construct". Defaults to "indicator".

.verbose

Logical. Should information (e.g., progress bar) be printed to the console? Defaults to TRUE.

Details

This function performs various tests proposed in the context of multigroup analysis.

The following tests are implemented:

.approach_mgd = "Klesel": Approach suggested by Klesel et al. (2019): The model-implied variance-covariance matrix (either indicator (.type_vcv = "indicator") or construct (.type_vcv = "construct")) is compared across groups. If the model-implied indicator or construct correlation matrix based on a saturated structural model should be compared, set .saturated = TRUE. To measure the distance between the model-implied variance-covariance matrices, the geodesic distance (dG) and the squared Euclidean distance (dL) are used. If more than two groups are compared, the average distance over all groups is used.
.approach_mgd = "Sarstedt": Approach suggested by Sarstedt et al. (2011): Groups are compared in terms of parameter differences across groups. Sarstedt et al. (2011) tests if parameter k is equal across all groups. If several parameters are tested simultaneously it is recommended to adjust the significance level or the p-values (in cSEM correction is done by p-value). By default no multiple testing correction is done, however, several common adjustments are available via .approach_p_adjust. See stats::p.adjust() for details. Note: the test has some severe shortcomings. Use with caution.
.approach_mgd = "Chin": Approach suggested by Chin and Dibbern (2010): Groups are compared in terms of parameter differences across groups. Chin and Dibbern (2010) tests if parameter k is equal between two groups. If more than two groups are tested for equality, parameter k is compared between all pairs of groups. In this case, it is recommended to adjust the significance level or the p-values (in cSEM correction is done by p-value) since this is essentially a multiple testing setup. If several parameters are tested simultaneously, correction is by group and number of parameters. By default no multiple testing correction is done, however, several common adjustments are available via .approach_p_adjust. See stats::p.adjust() for details.
.approach_mgd = "Keil": Approach suggested by Keil et al. (2000): Groups are compared in terms of parameter differences across groups. Keil et al. (2000) tests if parameter k is equal between two groups. It is assumed, that the standard errors of the coefficients are equal across groups. The calculation of the standard error of the parameter difference is adjusted as proposed by Henseler et al. (2009). If more than two groups are tested for equality, parameter k is compared between all pairs of groups. In this case, it is recommended to adjust the significance level or the p-values (in cSEM correction is done by p-value) since this is essentially a multiple testing setup. If several parameters are tested simultaneously, correction is by group and number of parameters. By default no multiple testing correction is done, however, several common adjustments are available via .approach_p_adjust. See stats::p.adjust() for details.
.approach_mgd = "Nitzl": Approach suggested by Nitzl (2010): Groups are compared in terms of parameter differences across groups. Similarly to Keil et al. (2000), a single parameter k is tested for equality between two groups. In contrast to Keil et al. (2000), it is assumed, that the standard errors of the coefficients are unequal across groups (Sarstedt et al. 2011). If more than two groups are tested for equality, parameter k is compared between all pairs of groups. In this case, it is recommended to adjust the significance level or the p-values (in cSEM correction is done by p-value) since this is essentially a multiple testing setup. If several parameters are tested simultaneously, correction is by group and number of parameters. By default no multiple testing correction is done, however, several common adjustments are available via .approach_p_adjust. See stats::p.adjust() for details.
.approach_mgd = "Henseler": Approach suggested by Henseler (2007): Groups are compared in terms of parameter differences across groups. In doing so, the bootstrap estimates of one parameter are compared across groups. In the literature, this approach is also known as PLS-MGA. Originally, this test was proposed as an one-sided test. In this function we perform a left-sided and a right-sided test to investigate whether a parameter differs across two groups. In doing so, the significance level is divided by 2 and compared to p-value of the left and right-sided test. Moreover, .approach_p_adjust is ignored and no overall decision is returned. For a more detailed description, see also Henseler et al. (2009).
.approach_mgd = "CI_param": Approach mentioned in Sarstedt et al. (2011): This approach is based on the confidence intervals constructed around the parameter estimates of the two groups. If the parameter of one group falls within the confidence interval of the other group and/or vice versa, it can be concluded that there is no group difference. Since it is based on the confidence intervals .approach_p_adjust is ignored.
.approach_mgd = "CI_overlap": Approach mentioned in Sarstedt et al. (2011): This approach is based on the confidence intervals (CIs) constructed around the parameter estimates of the two groups. If the two CIs overlap, it can be concluded that there is no group difference. Since it is based on the confidence intervals .approach_p_adjust is ignored.

Use .approach_mgd to choose the approach. By default all approaches are computed (.approach_mgd = "all").

For convenience, two types of output are available. See the "Value" section below.

By default, approaches based on parameter differences across groups compare all parameters (.parameters_to_compare = NULL). To compare only a subset of parameters provide the parameters in lavaan model syntax just like the model to estimate. Take the simple model:

model_to_estimate <- "
Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# Each concept os measured by 3 indicators, i.e., modeled as latent variable
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

If only the path from eta1 to eta3 and the loadings of eta1 are to be compared across groups, write:

to_compare <- "
Structural parameters to compare
eta3 ~ eta1

# Loadings to compare
eta1 =~ y11 + y12 + y13
"

Note that the "model" provided to .parameters_to_compare does not need to be an estimable model!

Note also that compared to all other functions in cSEM using the argument, .handle_inadmissibles defaults to "replace" to accommodate the Sarstedt et al. (2011) approach.

Argument .R_permuation is ignored for the "Nitzl" and the "Keil" approach. .R_bootstrap is ignored if .object already contains resamples, i.e. has class cSEMResults_resampled and if only the "Klesel" or the "Chin" approach are used.

The argument .saturated is used by "Klesel" only. If .saturated = TRUE the original structural model is ignored and replaced by a saturated model, i.e. a model in which all constructs are allowed to correlate freely. This is useful to test differences in the measurement models between groups in isolation.

Value

If .output_type = "complete" a list of class cSEMTestMGD. Technically, cSEMTestMGD is a named list containing the following list elements:

⁠$Information⁠: Additional information.
⁠$Klesel⁠: A list with elements, Test_statistic, P_value, and Decision
⁠$Chin⁠: A list with elements, Test_statistic, P_value, Decision, and Decision_overall
⁠$Sarstedt⁠: A list with elements, Test_statistic, P_value, Decision, and Decision_overall
⁠$Keil⁠: A list with elements, Test_statistic, P_value, Decision, and Decision_overall
⁠$Nitzl⁠: A list with elements, Test_statistic, P_value, Decision, and Decision_overall
⁠$Henseler⁠: A list with elements, Test_statistic, P_value, Decision, and Decision_overall
⁠$CI_para⁠: A list with elements, Decision, and Decision_overall
⁠$CI_overlap⁠: A list with elements, Decision, and Decision_overall

If .output_type = "structured" a tibble (data frame) with the following columns is returned.

Test: The name of the test.
Comparision: The parameter that was compared across groups. If "overall" the overall fit of the model was compared.
⁠alpha%⁠: The test decision for a given "alpha" level. If TRUE the null hypotheses was rejected; if FALSE it was not rejected.
p-value_correction: The p-value correction.
CI_type: Only for the "CI_para" and the "CI_overlap" test. Which confidence interval was used.
Distance_metric: Only for Test = "Klesel". Which distance metric was used.

References

Chin WW, Dibbern J (2010). “An Introduction to a Permutation Based Procedure for Multi-Group PLS Analysis: Results of Tests of Differences on Simulated Data and a Cross Cultural Analysis of the Sourcing of Information System Services Between Germany and the USA.” In Handbook of Partial Least Squares, 171–193. Springer Berlin Heidelberg. doi:10.1007/978-3-540-32827-8_8.

Henseler J (2007). “A new and simple approach to multi-group analysis in partial least squares path modeling.” In Martens H, Næ s T (eds.), Proceedings of PLS'07 - The 5th International Symposium on PLS and Related Methods, 104–107. PLS, Norway: Matforsk, As.

Henseler J, Ringle CM, Sinkovics RR (2009). “The use of partial least squares path modeling in international marketing.” Advances in International Marketing, 20, 277–320. doi:10.1108/S1474-7979(2009)0000020014.

Keil M, Tan BC, Wei K, Saarinen T, Tuunainen V, Wassenaar A (2000). “A cross-cultural study on escalation of commitment behavior in software projects.” MIS Quarterly, 24(2), 299–325.

Klesel M, Schuberth F, Henseler J, Niehaves B (2019). “A Test for Multigroup Comparison Using Partial Least Squares Path Modeling.” Internet Research, 29(3), 464–477. doi:10.1108/intr-11-2017-0418.

Nitzl C (2010). “Eine anwenderorientierte Einfuehrung in die Partial Least Square (PLS)-Methode.” In Arbeitspapier, number 21. Universitaet Hamburg, Institut fuer Industrielles Management, Hamburg.

Sarstedt M, Henseler J, Ringle CM (2011). “Multigroup Analysis in Partial Least Squares (PLS) Path Modeling: Alternative Methods and Empirical Results.” In Advances in International Marketing, 195–218. Emerald Group Publishing Limited. doi:10.1108/s1474-7979(2011)0000022012.

Examples

## Not run: 
# ===========================================================================
# Basic usage
# ===========================================================================
model <- "
# Structural model
QUAL ~ EXPE
EXPE ~ IMAG
SAT  ~ IMAG + EXPE + QUAL + VAL
LOY  ~ IMAG + SAT
VAL  ~ EXPE + QUAL

# Measurement model

EXPE <~ expe1 + expe2 + expe3 + expe4 + expe5
IMAG <~ imag1 + imag2 + imag3 + imag4 + imag5
LOY  =~ loy1  + loy2  + loy3  + loy4
QUAL =~ qual1 + qual2 + qual3 + qual4 + qual5
SAT  <~ sat1  + sat2  + sat3  + sat4
VAL  <~ val1  + val2  + val3  + val4
"

## Create list of virtually identical data sets
dat <- list(satisfaction[-3,], satisfaction[-5, ], satisfaction[-10, ])
out <- csem(dat, model, .resample_method = "bootstrap", .R = 40) 

## Test 
testMGD(out, .R_permutation = 40,.verbose = FALSE)

# Notes: 
#  1. .R_permutation (and .R in the call to csem) is small to make examples run quicker; 
#     should be higher in real applications.
#  2. Test will not reject their respective H0s since the groups are virtually
#     identical.
#  3. Only exception is the approach suggested by Sarstedt et al. (2011), a
#     sign that the test is unreliable.
#  4. As opposed to other functions involving the argument, 
#     '.handle_inadmissibles' the default is "replace" as this is
#     required by Sarstedt et al. (2011)'s approach.

# ===========================================================================
# Extended usage
# ===========================================================================
### Test only a subset ------------------------------------------------------
# By default all parameters are compared. Select a subset by providing a 
# model in lavaan model syntax:

to_compare <- "
# Path coefficients
QUAL ~ EXPE

# Loadings
EXPE <~ expe1 + expe2 + expe3 + expe4 + expe5
"

## Test 
testMGD(out, .parameters_to_compare = to_compare, .R_permutation = 20, 
        .R_bootstrap = 20, .verbose = FALSE)

### Different p_adjustments --------------------------------------------------
# To adjust p-values to accommodate multiple testing use .approach_p_adjust. 
# The number of tests to use for adjusting depends on the approach chosen. For
# the Chin approach for example it is the number of parameters to test times the
# number of possible group comparisons. To compare the results for different
# adjustments, a vector of p-adjustments may be chosen.

## Test 
testMGD(out, .parameters_to_compare = to_compare, 
        .approach_p_adjust = c("none", "bonferroni"),
        .R_permutation = 20, .R_bootstrap = 20, .verbose = FALSE)

## End(Not run)

Test measurement invariance of composites

Description

Usage

testMICOM(
 .object               = NULL,
 .approach_p_adjust    = "none",
 .handle_inadmissibles = c("drop", "ignore", "replace"), 
 .R                    = 499,
 .seed                 = NULL,
 .verbose              = TRUE
 )

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.approach_p_adjust

.handle_inadmissibles

.R

Integer. The number of bootstrap replications. Defaults to 499.

.seed

.verbose

Logical. Should information (e.g., progress bar) be printed to the console? Defaults to TRUE.

Details

The functions performs the permutation-based test for measurement invariance of composites across groups proposed by Henseler et al. (2016). According to the authors assessing measurement invariance in composite models can be assessed by a three-step procedure. The first two steps involve an assessment of configural and compositional invariance. The third steps involves mean and variance comparisons across groups. Assessment of configural invariance is qualitative in nature and hence not assessed by the testMICOM() function.

As testMICOM() requires at least two groups, .object must be of class cSEMResults_multi. As of version 0.2.0 of the package, testMICOM() does not support models containing second-order constructs.

It is possible to compare more than two groups, however, multiple-testing issues arise in this case. To adjust p-values in this case several p-value adjustments are available via the approach_p_adjust argument.

The remaining arguments set the number of permutation runs to conduct (.R), the random number seed (.seed), instructions how inadmissible results are to be handled (handle_inadmissibles), and whether the function should be verbose in a sense that progress is printed to the console.

The number of permutation runs defaults to args_default()$.R for performance reasons. According to Henseler et al. (2016) the number of permutations should be at least 5000 for assessment to be sufficiently reliable.

Value

A named list of class cSEMTestMICOM containing the following list element:

⁠$Step2⁠: A list containing the results of the test for compositional invariance (Step 2).
⁠$Step3⁠: A list containing the results of the test for mean and variance equality (Step 3).
⁠$Information⁠: A list of additional information on the test.

References

Henseler J, Ringle CM, Sarstedt M (2016). “Testing Measurement Invariance of Composites Using Partial Least Squares.” International Marketing Review, 33(3), 405–431. doi:10.1108/imr-09-2014-0304.

Examples

## Not run: 
# NOTE: to run the example. Download and load the newst version of cSEM.DGP
# from GitHub using devtools::install_github("M-E-Rademaker/cSEM.DGP").

# Create two data generating processes (DGPs) that only differ in how the composite
# X is build. Hence, the two groups are not compositionally invariant.
dgp1 <- "
# Structural model
Y ~ 0.6*X

# Measurement model
Y =~ 1*y1
X <~ 0.4*x1 + 0.8*x2

x1 ~~ 0.3125*x2
"

dgp2 <- "
# Structural model
Y ~ 0.6*X

# Measurement model
Y =~ 1*y1
X <~ 0.8*x1 + 0.4*x2

x1 ~~ 0.3125*x2
"

g1 <- generateData(dgp1, .N = 399, .empirical = TRUE) # requires cSEM.DGP 
g2 <- generateData(dgp2, .N = 200, .empirical = TRUE) # requires cSEM.DGP

# Model is the same for both DGPs
model <- "
# Structural model
Y ~ X

# Measurement model
Y =~ y1
X <~ x1 + x2
"

# Estimate
csem_results <- csem(.data = list("group1" = g1, "group2" = g2), model)

# Test
testMICOM(csem_results, .R = 50, .alpha = c(0.01, 0.05), .seed = 1987)

## End(Not run)

Test for overall model fit

Description

Usage

testOMF(
 .object                = NULL, 
 .alpha                 = 0.05,
 .fit_measures          = FALSE,
 .handle_inadmissibles  = c("drop", "ignore", "replace"), 
 .R                     = 499, 
 .saturated             = FALSE,
 .seed                  = NULL,
 ...
)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

.alpha

An integer or a numeric vector of significance levels. Defaults to 0.05.

.fit_measures

Logical. (EXPERIMENTAL) Should additional fit measures be included? Defaults to FALSE.

.handle_inadmissibles

.R

Integer. The number of bootstrap replications. Defaults to 499.

.saturated

Logical. Should a saturated structural model be used? Defaults to FALSE.

.seed

...

Can be used to determine the fitting function used in the calculateGFI function.

Details

Bootstrap-based test for overall model fit originally proposed by Beran and Srivastava (1985). See also Dijkstra and Henseler (2015) who first suggested the test in the context of PLS-PM.

By default, testOMF() tests the null hypothesis that the population indicator correlation matrix equals the population model-implied indicator correlation matrix. Several discrepancy measures may be used. By default, testOMF() uses four distance measures to assess the distance between the sample indicator correlation matrix and the estimated model-implied indicator correlation matrix, namely the geodesic distance, the squared Euclidean distance, the standardized root mean square residual (SRMR), and the distance based on the maximum likelihood fit function. The reference distribution for each test statistic is obtained by the bootstrap as proposed by Beran and Srivastava (1985).

It is possible to perform the bootstrap-based test using fit measures such as the CFI, RMSEA or the GFI if .fit_measures = TRUE. This is experimental. To the best of our knowledge the applicability and usefulness of the fit measures for model fit assessment have not been formally (statistically) assessed yet. Theoretically, the logic of the test applies to these fit indices as well. Hence, their applicability is theoretically justified. Only use if you know what you are doing.

If .saturated = TRUE the original structural model is ignored and replaced by a saturated model, i.e., a model in which all constructs are allowed to correlate freely. This is useful to test misspecification of the measurement model in isolation.

Value

A list of class cSEMTestOMF containing the following list elements:

⁠$Test_statistic⁠: The value of the test statistics.
⁠$Critical_value⁠: The corresponding critical values obtained by the bootstrap.
⁠$Decision⁠: The test decision. One of: FALSE (Reject) or TRUE (Do not reject).
⁠$Information⁠: The .R bootstrap values; The number of admissible results; The seed used and the number of total runs.

References

Beran R, Srivastava MS (1985). “Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix.” The Annals of Statistics, 13(1), 95–115. doi:10.1214/aos/1176346579.

Dijkstra TK, Henseler J (2015). “Consistent and Asymptotically Normal PLS Estimators for Linear Structural Equations.” Computational Statistics & Data Analysis, 81, 10–23.

Examples

# ===========================================================================
# Basic usage
# ===========================================================================
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

## Estimate
out <- csem(threecommonfactors, model, .approach_weights = "PLS-PM")

## Test
testOMF(out, .R = 50, .seed = 320)

Data: threecommonfactors

Description

A dataset containing 500 standardized observations on 9 indicator generated from a population model with three concepts modeled as common factors.

Usage

threecommonfactors

Format

A matrix with 500 rows and 9 variables:

y11-y13: Indicators attached to the first common factor (eta1). Population loadings are: 0.7; 0.7; 0.7
y21-y23: Indicators attached to the second common factor (eta2). Population loadings are: 0.5; 0.7; 0.8
y31-y33: Indicators attached to the third common factor (eta3). Population loadings are: 0.8; 0.75; 0.7

The model is:

`eta2` = gamma1 * `eta1` + zeta1

`eta3` = gamma2 * `eta1` + beta * `eta2` + zeta2

with population values gamma1 = 0.6, gamma2 = 0.4 and beta = 0.35.

Examples

#============================================================================
# Correct model (the model used to generate the data)
#============================================================================
model_correct <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# Measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33 
"

a <- csem(threecommonfactors, model_correct)

## The overall model fit is evidently almost perfect:
testOMF(a, .R = 30) # .R = 30 to speed up the example

Verify admissibility

Description

Usage

verify(.object)

Arguments

.object

An R object of class cSEMResults resulting from a call to csem().

Details

Verify admissibility of the results obtained using csem().

Results exhibiting one of the following defects are deemed inadmissible: non-convergence of the algorithm used to obtain weights, loadings and/or (congeneric) reliabilities larger than 1, a construct variance-covariance (VCV) and/or model-implied VCV matrix that is not positive semi-definite.

If .object is of class cSEMResults_2ndorder (i.e., estimates are based on a model containing second-order constructs) both the first and the second stage are checked separately.

Currently, a model-implied indicator VCV matrix for nonlinear model is not available. verify() therefore skips the check for positive definiteness of the model-implied indicator VCV matrix for nonlinear models and returns "ok".

Value

A logical vector indicating which (if any) problem occurred. A FALSE indicates that the specific problem did not occurred. For models containing second-order constructs estimated by the two/three-stage approach, a list of two such vectors (one for the first and one for the second stage) is returned. Status codes are:

1: The algorithm has converged.
2: All absolute standardized loading estimates are smaller than or equal to 1. A violation implies either a negative variance of the measurement error or a correlation larger than 1.
3: The construct VCV is positive semi-definite.
4: All reliability estimates are smaller than or equal to 1.
5: The model-implied indicator VCV is positive semi-definite. This is only checked for linear models (including models containing second-order constructs).

Examples

### Without higher order constructs --------------------------------------------
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2

# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"
  
# Estimate
out <- csem(threecommonfactors, model)
  
# Check admissibility
verify(out) # ok!

## Examine the structure of a cSEMVerify object
str(verify(out))

### With higher order constructs -----------------------------------------------
# If the model containes higher order constructs both the first and the second-
# stage estimates estimates are checked for admissibility

## Not run: 
require(cSEM.DGP) # download from https://m-e-rademaker.github.io/cSEM.DGP/
  
# Create DGP with 2nd order construct. Loading for indicator y51 is set to 1.1
# to produce a failing first stage model
  
dgp_2ndorder <- "
## Path model / Regressions
eta2 ~ 0.5*eta1
eta3 ~ 0.35*eta1 + 0.4*eta2

## Composite model
eta1 =~ 0.8*y41 + 0.6*y42 + 0.6*y43
eta2 =~ 1.1*y51 + 0.7*y52 + 0.7*y53
c1   =~ 0.8*y11 + 0.4*y12
c2   =~ 0.5*y21 + 0.3*y22

## Higher order composite
eta3 =~ 0.4*c1 + 0.4*c2
"
  
dat <- generateData(dgp_2ndorder) # requires the cSEM.DGP package
out <- csem(dat, .model = dgp_2ndorder)

verify(out) # not ok

## End(Not run)

cSEM: A package for composite-based structural equation modeling

Description

Author(s)

See Also

Data: Anime

Description

Usage

Format

Details

Source

Data: Benitezetal2020

Description

Usage

Format

Details

Source

References

Examples

Data: BergamiBagozzi2000

Description

Usage

Format

Details

Source

References

Examples

Data: ITFlex

Description

Usage

Format

Details

Source

References

Examples

Data: LancelotMiltgenetal2016

Description

Usage

Format

Details

Source

References

Examples

Data: political democracy

Description

Usage

Format

Source

References

Examples

Data: Russett

Description

Usage

Format

Details

Source

References

Examples

Data: SQ

Description

Usage

Format

Details

Source

References

Data: Summers

Description

Usage

Format

Details

Source

References

Examples

Data: Switching

Description

Usage

Format

Details

Source

References

Examples