Intra-Class Correlation and Inter-rater Reliability
A few notes on agreement between raters.
Cohen's κ
Cohen's κ can be used for agreement between two raters on categorical data. The basic calculation is
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance. κ is therefore the fraction of the possible agreement beyond chance that is actually observed.
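Cohen's κ as described here (observed agreement beyond chance, divided by the maximum possible agreement beyond chance) can be sketched in base R. The 2×2 table of counts below is hypothetical and only serves to make the arithmetic concrete; `kappa_hat` is an illustrative name, not part of any package.

```r
# Hypothetical data: two raters classify the same 50 items as "yes" or "no".
# Rows are rater 1, columns are rater 2.
tab <- matrix(c(20,  5,
                10, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(rater1 = c("yes", "no"),
                              rater2 = c("yes", "no")))

n   <- sum(tab)
p_o <- sum(diag(tab)) / n                      # observed agreement
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance

kappa_hat <- (p_o - p_e) / (1 - p_e)
c(p_o = p_o, p_e = p_e, kappa = kappa_hat)
```

Here the raters agree on 35 of 50 items (p_o = 0.7), chance alone would give p_e = 0.5, and so κ = 0.4.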
Fleiss' κ is an extension to more than two raters and has a similar form.
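As a rough illustration of that similar form, here is a minimal base R sketch of the usual Fleiss' κ calculation. It assumes every subject is rated by the same number of raters; the function name `fleiss_kappa` and the count matrix are hypothetical.

```r
# counts[i, j] = number of raters who assigned subject i to category j.
# Assumes every row sums to the same number of raters.
fleiss_kappa <- function(counts) {
  N <- nrow(counts)        # number of subjects
  n <- sum(counts[1, ])    # raters per subject
  stopifnot(all(rowSums(counts) == n))

  P_i   <- (rowSums(counts^2) - n) / (n * (n - 1))  # per-subject agreement
  P_bar <- mean(P_i)                                # mean observed agreement
  p_j   <- colSums(counts) / (N * n)                # category proportions
  P_e   <- sum(p_j^2)                               # chance agreement

  (P_bar - P_e) / (1 - P_e)
}

# Hypothetical example: 4 subjects, 3 raters, 2 categories.
counts <- matrix(c(3, 0,
                   0, 3,
                   2, 1,
                   1, 2),
                 ncol = 2, byrow = TRUE)
fk <- fleiss_kappa(counts)
fk
```

The final ratio has the same "observed minus chance over one minus chance" shape as Cohen's κ.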
A major flaw in either κ is that for ordinal data, all disagreements are treated equally. E.g. on a Likert scale, ratings of 4 and 5 count as just as much of a disagreement as ratings of 1 and 5. Weighted κ addresses this by including a weight matrix that assigns a level of disagreement to each pair of categories.
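The effect of the weight matrix can be sketched in base R. This is a minimal illustration, assuming linear disagreement weights |i − j| / (k − 1) on a 5-point scale; the helper `weighted_kappa` and both tables of counts are hypothetical. The two tables have the same number of disagreements, but in one the raters differ by one category (4 vs 5) and in the other by four (1 vs 5).

```r
# w[i, j] is the disagreement weight for the category pair (i, j):
# 0 on the diagonal, larger values for more serious disagreements.
weighted_kappa <- function(tab, w) {
  obs  <- tab / sum(tab)                       # observed cell proportions
  expd <- outer(rowSums(obs), colSums(obs))    # proportions expected by chance
  1 - sum(w * obs) / sum(w * expd)
}

k <- 5
linear_w     <- abs(outer(1:k, 1:k, "-")) / (k - 1)  # 1 vs 5 costs 1, 4 vs 5 costs 0.25
unweighted_w <- 1 - diag(k)                          # every disagreement costs 1

# Hypothetical tables: 50 agreements and 10 disagreements each.
near <- diag(10, k); near[4, 5] <- 5; near[5, 4] <- 5  # disagreements: 4 vs 5
far  <- diag(10, k); far[1, 5]  <- 5; far[5, 1]  <- 5  # disagreements: 1 vs 5

rbind(unweighted = c(near = weighted_kappa(near, unweighted_w),
                     far  = weighted_kappa(far,  unweighted_w)),
      linear     = c(near = weighted_kappa(near, linear_w),
                     far  = weighted_kappa(far,  linear_w)))
```

With the all-or-nothing weights the two tables get the same κ; with linear weights the table of near misses scores higher than the table of 1 vs 5 disagreements.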
Intra-class correlation
ICC is used for continuous measurements. It can also be used in place of weighted κ for ordinal variables. The basic calculation is
$$\mathrm{ICC} = \frac{\sigma_w^2}{\sigma_w^2 + \sigma_b^2}$$
where $\sigma_w^2$ and $\sigma_b^2$ represent within- and between-rater variability respectively. Since the denominator is the total variance of all ratings regardless of rater, this fraction represents the proportion of total variation accounted for by within-rater variation.
The modern way to estimate the ICC is to fit a mixed model and extract the variance components (the σ²'s) that are needed.
ICC in R
We use the "Orthodont" data from nlme as our example, looking at the distance measurements and their correlation within Subject.
library("nlme")
library("lme4")
data(Orthodont)
With nlme
Using the nlme package, we fit the model:
fm1 <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
summary(fm1)
Linear mixed-effects model fit by REML
Data: Orthodont
AIC BIC logLik
521.3618 529.3803 -257.6809
Random effects:
Formula: ~1 | Subject
(Intercept) Residual
StdDev: 1.937002 2.220312
Fixed effects: distance ~ 1
Value Std.Error DF t-value p-value
(Intercept) 24.02315 0.4296606 81 55.91192 0
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2400448 -0.5277439 -0.1072888 0.4731815 2.7687301
Number of Observations: 108
Number of Groups: 27
The between-rater standard deviation is reported as the Residual StdDev, and the within-rater standard deviation as the (Intercept) StdDev. To obtain the ICC, we compute each σ²:
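# Variance of the Subject random intercept
s2w <- getVarCov(fm1)[[1]]
# Residual variance (sigma is the residual standard deviation)
s2b <- fm1$sigma^2
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))
 sigma2_w  sigma2_b       icc
3.7519762 4.9297832 0.4321677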
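As a quick sanity check, the same ICC can be reproduced by hand from the two standard deviations printed in the model summary, without touching the fitted object. This sketch simply squares the reported values; the variable names `sd_subject` and `sd_residual` are illustrative.

```r
# Standard deviations copied from the "Random effects" block of summary(fm1).
sd_subject  <- 1.937002   # (Intercept) StdDev
sd_residual <- 2.220312   # Residual StdDev

s2w <- sd_subject^2       # variance of the Subject random intercept
s2b <- sd_residual^2      # residual variance

icc <- s2w / (s2w + s2b)
round(c(sigma2_w = s2w, sigma2_b = s2b, icc = icc), 4)
```

Up to rounding of the printed standard deviations, this agrees with the ICC of about 0.432 obtained from the fitted model.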
With lme4
Using the lme4 package, we fit the model:
fm2 <- lmer(distance ~ (1 | Subject), data = Orthodont)
summary(fm2)
Linear mixed model fit by REML ['lmerMod']
Formula: distance ~ (1 | Subject)
Data: Orthodont
REML criterion at convergence: 515.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.2400 -0.5277 -0.1073 0.4732 2.7687
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 3.752 1.937
Residual 4.930 2.220
Number of obs: 108, groups: Subject, 27
Fixed effects:
Estimate Std. Error t value
(Intercept) 24.0231 0.4297 55.91
The Variance column of the Random effects table gives the two variances directly: the Subject (Intercept) row is the within-rater variance and the Residual row is the between-rater variance.
# Variance of the Subject random intercept
s2w <- summary(fm2)$varcor$Subject[1]
# Residual variance
s2b <- summary(fm2)$sigma^2
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))
 sigma2_w  sigma2_b       icc
3.7519736 4.9297839 0.4321675