Type: Package
Title: Creating Correspondence Tables Between Two Statistical Classifications
Date: 2022-09-25
Version: 0.7.4
Description: A candidate correspondence table between two classifications can be created when there are correspondence tables leading from the first classification to the second one via intermediate 'pivot' classifications. The correspondence table between two statistical classifications can be updated when one of the classifications gets updated to a new version.
License: EUPL version 1.1 | EUPL version 1.2 [expanded from: EUPL]
Encoding: UTF-8
Imports: data.table
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
URL: https://github.com/eurostat/correspondenceTables
BugReports: https://github.com/eurostat/correspondenceTables/issues
Maintainer: Mátyás Mészáros <matyas.meszaros@ec.europa.eu>
RoxygenNote: 7.1.2
Packaged: 2022-09-27 08:43:52 UTC; mmeszaros
Author: Vasilis Chasiotis [aut] (Department of Statistics, Athens University of Economics and Business), Photis Stavropoulos [aut] (Quantos S.A. Statistics and Information Systems), Martin Karlberg [aut], Mátyás Mészáros [cre]
Repository: CRAN
Date/Publication: 2022-09-27 10:50:02 UTC

Ex novo creation of candidate correspondence tables between two classifications via pivot tables

Description

Creation of a candidate correspondence table between two classifications, A and B, when there are correspondence tables leading from the first classification to the second one via k intermediate pivot classifications C_1, ..., C_k. The correspondence tables leading from A to B are A:C_1, {C_i:C_(i+1) : 1 <= i <= k-1}, and B:C_k.
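The chaining idea can be illustrated with a minimal sketch in base R for k = 2 pivot classifications. The toy codes, the column names, and the use of merge() are assumptions made for illustration only; this is not the package's internal algorithm, which also produces flags and additional checks.

```r
# Hypothetical toy tables; all codes and column names are invented for illustration.
A_C1  <- data.frame(A  = c("a1", "a2", "a3"),
                    C1 = c("c1", "c1", "c2"),
                    stringsAsFactors = FALSE)
C1_C2 <- data.frame(C1 = c("c1", "c2"),
                    C2 = c("d1", "d2"),
                    stringsAsFactors = FALSE)
B_C2  <- data.frame(B  = c("b1", "b2", "b3"),
                    C2 = c("d1", "d2", "d2"),
                    stringsAsFactors = FALSE)

# Walk the chain A:C_1 -> C_1:C_2 -> B:C_2 by joining on the shared pivot codes.
# all = TRUE keeps codes without a match (their partner column becomes NA).
step1 <- merge(A_C1, C1_C2, by = "C1", all = TRUE)
step2 <- merge(step1, B_C2, by = "C2", all = TRUE)

# Drop the pivot columns to obtain the candidate A:B pairs.
candidate <- unique(step2[, c("A", "B")])
candidate <- candidate[order(candidate$A, candidate$B), ]
rownames(candidate) <- NULL
print(candidate)
```

In this toy case every code finds a partner, so the candidate table contains four A:B pairs and no missing values.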

Usage

newCorrespondenceTable(
  Tables,
  CSVout = NULL,
  Reference = "none",
  MismatchTolerance = 0.2
)

Arguments

Tables

A string of type character giving the name of a csv file that lists the names of the files containing the classifications and the intermediate correspondence tables (see "Details" below).

CSVout

The preferred name for the output csv files that will contain the candidate correspondence table and information about the classifications involved. The valid values are NULL or strings of type character. If the selected value is NULL, the default, no output file is produced. If the value is a string, then the output is exported into two csv files whose names contain the provided name (see "Value" below).

Reference

The reference classification among A and B. If a classification is the reference to the other, and hence hierarchically superior to it, each code of the other classification is expected to be mapped to at most one code of the reference classification. The valid values are "none", "A", and "B". If the selected value is "A" or "B", a "Review" flag column (indicating the records violating this expectation) is included in the output (see "Explanation of the flags" below).
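The expectation stated above (each code of the other classification maps to at most one code of the reference classification) can be checked with a short sketch. The data, the column name needs_review, and the 0/1 encoding are hypothetical; the actual content of the "Review" column is defined by the package and is not reproduced here.

```r
# Hypothetical candidate table; "A" is taken as the reference classification.
candidate <- data.frame(A = c("a1", "a2", "a2", "a3"),
                        B = c("b1", "b1", "b2", "b3"),
                        stringsAsFactors = FALSE)

# Count the distinct reference (A) codes linked to each B code.
pairs <- unique(candidate[, c("A", "B")])
n_ref_per_B <- table(pairs$B)

# Mark rows whose B code is mapped to more than one A code.
candidate$needs_review <- as.integer(n_ref_per_B[candidate$B] > 1)
print(candidate)
```

Here code b1 is linked to both a1 and a2, so the two rows containing b1 are the ones marked.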

MismatchTolerance

The maximum acceptable proportion of rows in the candidate correspondence table which contain no code for classification A or no code for classification B. The default value is 0.2. The valid values are real numbers in the interval [0, 1].
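As a rough illustration of the quantity that MismatchTolerance bounds, the sketch below computes the share of rows lacking a code for A or for B. The toy data, the treatment of both NA and empty strings as missing codes, and the use of <= at the boundary are assumptions, not documented behaviour of the package.

```r
# Hypothetical candidate table with some unmatched codes.
candidate <- data.frame(A = c("a1", "a2", NA, "a3", "a4"),
                        B = c("b1", NA, "b2", "b3", ""),
                        stringsAsFactors = FALSE)

# Assumption: a code counts as missing when it is NA or an empty string.
is_missing <- function(x) is.na(x) | x == ""

# Share of rows with no code for A or no code for B.
unmatched <- is_missing(candidate$A) | is_missing(candidate$B)
mismatch_share <- mean(unmatched)

# Compare against the default tolerance of 0.2.
tolerance <- 0.2
within_tolerance <- mismatch_share <= tolerance
print(mismatch_share)
print(within_tolerance)
```

Three of the five toy rows lack a code on one side, so the share exceeds the default tolerance.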

Details

File and file name requirements:

Classification table requirements:

Correspondence table requirements:

Interdependency requirements:

Mismatch tolerance:

If any of the conditions required of the arguments is violated, an error message is produced and execution stops.

Value

newCorrespondenceTable() returns a list with two elements, both of which are data frames.

Explanation of the flags

Sample datasets included in the package

Running browseVignettes("correspondenceTables") in the console opens an html page in the user's default browser. Selecting HTML from the menu, users can read information about the use of the sample datasets that are included in the package. If they wish to access the csv files with the sample data, users have two options:

Examples

{
   ## Application of function newCorrespondenceTable() with "example.csv" being the file
   ## that lists, in a sparse square matrix, the names of the files containing the
   ## classifications and the intermediate correspondence tables. The sample files hold
   ## the first 100 rows of the classifications (from ISIC v4 to CPA v2.1 through
   ## CPC v2.1). The desired name for the csv file that will contain the candidate
   ## correspondence table is "newCorrespondenceTable.csv", the reference classification
   ## is ISIC v4 ("A"), and the maximum acceptable proportion of unmatched codes
   ## (MismatchTolerance) is 0.56. This is the minimum mismatch tolerance for the first
   ## 100 rows, because 55.5% of the ISIC v4 codes are unmatched.

     tmp_dir <- tempdir()
     ## Read the sample matrix of file names shipped with the package.
     A <- read.csv(system.file("extdata", "example.csv", package = "correspondenceTables"),
                   header = FALSE,
                   sep = ",")
     ## Replace each non-empty file name with its full path inside the installed package.
     for (i in seq_len(nrow(A))) {
       for (j in seq_len(ncol(A))) {
         if (A[i, j] != "") {
           A[i, j] <- system.file("extdata", A[i, j], package = "correspondenceTables")
         }
       }
     }
     ## Write the matrix of full paths to the temporary directory.
     write.table(x = A,
                 file = file.path(tmp_dir, "example.csv"),
                 row.names = FALSE,
                 col.names = FALSE,
                 sep = ",")

     NCT <- newCorrespondenceTable(file.path(tmp_dir, "example.csv"),
                                   file.path(tmp_dir, "newCorrespondenceTable.csv"),
                                   "A",
                                   0.56)

     summary(NCT)
     head(NCT$newCorrespondenceTable)
     NCT$classificationNames
     ## Remove the csv files from the temporary directory; full.names = TRUE is needed
     ## so that unlink() receives full paths rather than bare file names.
     csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
     unlink(csv_files)
}

Update the correspondence table between statistical classifications A and B when A has been updated to version A*

Description

Update the correspondence table between statistical classifications A and B when A has been updated to version A*.
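The core idea of the update can be sketched in base R: the previous table A:B and the concordance A:A* share the codes of A, so joining them on A yields candidate links between A* and B. The toy codes and column names are invented, and merge() is used only for illustration; the package's own procedure, including its flags and checks, is not reproduced here.

```r
# Hypothetical concordance A:A* (code a2 is split into s2 and s3; a3 also maps to s3).
A_AStar <- data.frame(A     = c("a1", "a2", "a2", "a3"),
                      AStar = c("s1", "s2", "s3", "s3"),
                      stringsAsFactors = FALSE)

# Hypothetical previous correspondence table A:B.
A_B <- data.frame(A = c("a1", "a2", "a3"),
                  B = c("b1", "b2", "b3"),
                  stringsAsFactors = FALSE)

# Join on the codes of A; all = TRUE keeps codes of A missing from either table.
updated <- merge(A_AStar, A_B, by = "A", all = TRUE)
updated <- updated[order(updated$AStar, updated$B), c("AStar", "A", "B")]
rownames(updated) <- NULL
print(updated)
```

In the toy output, s3 inherits links to both b2 and b3 because it descends from both a2 and a3, which is the kind of row a reviewer would typically want to inspect.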

Usage

updateCorrespondenceTable(
  A,
  B,
  AStar,
  AB,
  AAStar,
  CSVout = NULL,
  Reference = "none",
  MismatchToleranceB = 0.2,
  MismatchToleranceAStar = 0.2
)

Arguments

A

A string of type character containing the name of a csv file that contains the original classification A.

B

A string of type character containing the name of a csv file that contains classification B.

AStar

A string of type character containing the name of a csv file that contains the updated version A*.

AB

A string of type character containing the name of a csv file that contains the previous correspondence table A:B.

AAStar

A string of type character containing the name of a csv file that contains the concordance table A:A*, which maps the codes of the original version of the classification to those of the updated version.

CSVout

The preferred name for the output csv files that will contain the updated correspondence table and information about the classifications involved. The valid values are NULL or strings of type character. If the selected value is NULL, the default, no output file is produced. If the value is a string, then the output is exported into two csv files whose names contain the provided name (see "Value" below).

Reference

The reference classification among A and B. If a classification is the reference to the other, and hence hierarchically superior to it, each code of the other classification is expected to be mapped to at most one code of the reference classification. The valid values are "none", "A", and "B". If the selected value is "A" or "B", a "Review" flag column is included in the output (see "Explanation of the flags" below).

MismatchToleranceB

The maximum acceptable proportion of rows in the updated correspondence table which contain no code of the target classification B, among those which contain a code of A, of A*, or of both. The default value is 0.2. The valid values are real numbers in the interval [0, 1].

MismatchToleranceAStar

The maximum acceptable proportion of rows in the updated correspondence table which contain no code of the updated classification A*, among those which contain a code of A, of B, or of both. The default value is 0.2. The valid values are real numbers in the interval [0, 1].
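The two proportions bounded by MismatchToleranceB and MismatchToleranceAStar differ in the set of rows they are computed over. The sketch below computes both on a toy table; the data, the column names, and the treatment of NA and empty strings as missing codes are assumptions made for illustration.

```r
# Hypothetical updated correspondence table with some missing codes.
updated <- data.frame(
  A     = c("a1", "a2", "a3", NA,   NA),
  AStar = c("s1", "s2", NA,   "s4", NA),
  B     = c("b1", NA,   "b3", "b4", "b5"),
  stringsAsFactors = FALSE
)

# Assumption: a code is present when it is neither NA nor an empty string.
has <- function(x) !is.na(x) & x != ""

# Share of rows without a B code, among rows with a code of A, of A*, or of both.
base_B <- has(updated$A) | has(updated$AStar)
share_no_B <- mean(!has(updated$B[base_B]))

# Share of rows without an A* code, among rows with a code of A, of B, or of both.
base_AStar <- has(updated$A) | has(updated$B)
share_no_AStar <- mean(!has(updated$AStar[base_AStar]))

print(share_no_B)
print(share_no_AStar)
```

The last toy row carries only a B code, so it is excluded from the first denominator but counted in the second.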

Details

File and file name requirements:

Classification table requirements:

Correspondence and concordance table requirements:

Interdependency requirements:

Mismatch tolerance:

If any of the conditions required of the arguments is violated, an error message is produced and execution stops.

Value

updateCorrespondenceTable() returns a list with two elements, both of which are data frames.

Explanation of the flags

Sample datasets included in the package

Running browseVignettes("correspondenceTables") in the console opens an html page in the user's default browser. Selecting HTML from the menu, users can read information about the use of the sample datasets that are included in the package. If they wish to access the csv files with the sample data, users have two options:

Examples

{
 ## Application of function updateCorrespondenceTable() with NAICS 2017 being the
 ## original classification A, NACE being the target classification B, NAICS 2022
 ## being the updated version A*, NAICS 2017:NACE being the previous correspondence
 ## table A:B, and NAICS 2017:NAICS 2022 being the A:A* concordance table. The desired
 ## name for the csv file that will contain the updated correspondence table is
 ## "updateCorrespondenceTable.csv", there is no reference classification, and the
 ## maximum acceptable proportions of unmatched codes between the original
 ## classification A and the target classification B, and between the original
 ## classification A and the updated classification A* are 0.5 and 0.3, respectively.

 tmp_dir <- tempdir()
 ## Locate the sample files shipped with the package.
 A <- system.file("extdata", "NAICS2017.csv", package = "correspondenceTables")
 AStar <- system.file("extdata", "NAICS2022.csv", package = "correspondenceTables")
 B <- system.file("extdata", "NACE.csv", package = "correspondenceTables")
 AB <- system.file("extdata", "NAICS2017_NACE.csv", package = "correspondenceTables")
 AAStar <- system.file("extdata", "NAICS2017_NAICS2022.csv", package = "correspondenceTables")

 UPC <- updateCorrespondenceTable(A,
                                  B,
                                  AStar,
                                  AB,
                                  AAStar,
                                  file.path(tmp_dir, "updateCorrespondenceTable.csv"),
                                  "none",
                                  0.5,
                                  0.3)

 summary(UPC)
 head(UPC$updateCorrespondenceTable)
 UPC$classificationNames
 ## Remove the csv files from the temporary directory; full.names = TRUE is needed
 ## so that unlink() receives full paths rather than bare file names.
 csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
 if (length(csv_files) > 0) unlink(csv_files)
}