Title: | Generalizability Theory for Information Retrieval Evaluation |
Version: | 2.0 |
Description: | Provides tools to measure the reliability of an Information Retrieval test collection. It allows users to estimate reliability using Generalizability Theory and map those estimates onto well-known indicators such as Kendall tau correlation or sensitivity. |
Depends: | R (≥ 3.2) |
License: | MIT + file LICENSE |
BugReports: | https://github.com/julian-urbano/gt4ireval/issues |
URL: | https://github.com/julian-urbano/gt4ireval/ |
Encoding: | UTF-8 |
LazyData: | true |
Suggests: | testthat, knitr, rmarkdown |
RoxygenNote: | 6.0.1 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2017-03-06 01:20:03 UTC; caerolus |
Author: | Julián Urbano [aut, cre] |
Maintainer: | Julián Urbano <urbano.julian@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2017-03-06 08:29:02 |
TREC-3 Ad hoc track.
Description
This is the set of Average Precision scores of the 40 systems submitted to the TREC-3 Ad hoc track, evaluated over 50 topics.
Usage
adhoc3
Format
A data frame with 40 columns (systems) and 50 rows (queries).
References
D. Harman (1994). Overview of the Third Text REtrieval Conference (TREC-3). Text REtrieval Conference.
See Also
D-study (Decision)
Description
dstudy
runs a D-study from the results of a gstudy
and computes, for a
certain number of queries, the expected generalizability coefficient Erho2
and index of
dependability Phi
, possibly with confidence intervals. Alternatively, it can estimate the
number of queries needed to achieve a certain level of stability, also with confidence intervals.
Usage
dstudy(gdata, queries = gdata$n.q, stability = 0.95, alpha = 0.025)
Arguments
gdata |
The result of running a |
queries |
A vector with different query set sizes for which to estimate Erho2 and Phi.
Defaults to the number of queries used to compute |
stability |
A vector with target Erho2 and Phi values to estimate required query set sizes. |
alpha |
A vector of confidence levels to compute intervals for Erho2, Phi and query set
sizes. This is the probability on each side of the interval, so for a 90% confidence interval
one must set |
Value
An object of class dstudy
, with the following components:
Erho2 , Erho2.lwr , Erho2.upr | Expected generalizability coefficient, and lower and upper limits of the intervals around it. |
Phi , Phi.lwr , Phi.upr | Expected index of dependability, and lower and upper limits of the intervals around it. |
n.q_Erho2 , n.q_Erho2.lwr , n.q_Erho2.upr | Expected number of queries to achieve the generalizability coefficient, and lower and upper limits of the intervals around it. |
n.q_Phi , n.q_Phi.lwr , n.q_Phi.upr | Expected number of queries to achieve the index of dependability, and lower and upper limits of the intervals around it. |
call | A list with the gstudy used in this D-study, the target number of
queries , target level of stability and alpha level for the confidence
intervals. |
Author(s)
Julián Urbano
References
R.L. Brennan (2001). Generalizability Theory. Springer.
L.S. Feldt (1965). The Approximate Sampling Distribution of Kuder-Richardson Reliability Coefficient Twenty. Psychometrika, 30(3):357–370.
C. Arteaga, S. Jeyaratnam, and G. A. Franklin (1982). Confidence Intervals for Proportions of Total Variance in the Two-Way Cross Component of Variance Model. Communications in Statistics: Theory and Methods, 11(15):1643–1658.
J. Urbano, M. Marrero and D. Martín (2013). On the Measurement of Test Collection Reliability. ACM SIGIR, pp. 393-402.
See Also
Examples
g <- gstudy(adhoc3)
dstudy(g)
# estimate stability at various query set sizes
dstudy(g, queries = seq(50, 200, 10))
# estimate required query set sizes for various stability levels
dstudy(g, stability = seq(0.8, 0.95, 0.01))
# compute both 95% and 99% confidence intervals
dstudy(g, stability = 0.9, alpha = c(0.05, 0.01) / 2)
# compute 1-tailed 95% confidence intervals
dstudy(g, alpha = 0.05)
G-study (Generalizability)
Description
gstudy
runs a G-study with the given data, assuming a fully crossed design (all systems
evaluated on the same queries). It can be used to estimate variance components, which can further
be used to run a D-study with dstudy
.
Usage
gstudy(data, drop = 0)
Arguments
data |
A data frame or matrix with the existing effectiveness scores. Systems are columns and queries are rows. |
drop |
The fraction of worst-performing systems to drop from the data before analysis. Defaults to 0 (include all systems). |
Value
An object of class gstudy
, with the following components:
n.s , n.q | Number of systems and number of queries of the existing data. |
var.s , var.q , var.e | Variance of the system, query, and residual effects. |
em.s , em.q , em.e | Mean squares of the system, query and residual components. |
call | A list with the existing data and the percentage of systems to
drop . |
Author(s)
Julián Urbano
References
R.L. Brennan (2001). Generalizability Theory. Springer.
J. Urbano, M. Marrero and D. Martín (2013). On the Measurement of Test Collection Reliability. ACM SIGIR, pp. 393-402.
See Also
Examples
g <- gstudy(adhoc3)
# same, but drop the 20% worst systems
g20 <- gstudy(adhoc3, drop = 0.2)
Map GT-based Indicators onto Data-based Indicators
Description
Maps Erho2 and Phi scores from Generalizability Theory onto traditional data-based scores like the Kendall tau correlation, AP correlation, power, minor conflict rate and major conflict rate with 2-tailed t-tests, absolute and relative sensitivity, and rooted mean squared error.
Usage
gt2tau(Erho2)
gt2tauAP(Erho2)
gt2power(Erho2)
gt2minor(Erho2)
gt2major(Erho2)
gt2asens(Erho2)
gt2rsens(Phi)
gt2rmse(Phi)
Arguments
Erho2 |
Vector of generalizability coefficients to map from. |
Phi |
Vector of indices of dependability to map from. |
Details
Take these mappings with a grain of salt. See figure 3 in (Urbano, 20013).
Value
A vector of data-based indicator values.
Author(s)
Julián Urbano
References
J. Urbano, M. Marrero and D. Martín (2013). On the Measurement of Test Collection Reliability. ACM SIGIR, pp. 393-402.
See Also
Examples
g <- gstudy(adhoc3)
d <- dstudy(g)
gt2tau(d$Erho2)
gt2rmse(d$Phi)
Synthetic dataset no. 4.
Description
This is the Synthetic dataset no. 4 from Table 3.2 on page 73 of Brennan (2001), recasted as a p x i design, as required on page 182.
Usage
synthetic4
Format
A data frame with 10 columns (systems) and 12 rows (queries).
References
R.L. Brennan, "Generalizability Theory". Springer, 2001.