TY - JOUR
T1 - Assumptions behind Intercoder Reliability Indices
AU - Zhao, Xinshu
AU - Liu, Jun S.
AU - Deng, Ke
N1 - This study was supported in part by HKBU Faculty Research Grants (2008 & 2009, Zhao PI), HKBU Strategic Development Funds (2009 & 2011, Zhao PI), and grants from the Panmedia Institute (2010, Zhao PI) and NICHD (R24 HD056670, Henderson PI).
PY - 2016/5/18
AB - Intercoder reliability is the most often used quantitative indicator of measurement quality in content studies. Researchers in psychology, sociology, education, medicine, marketing, and other disciplines also use reliability to evaluate the quality of diagnoses, tests, and other assessments. Many indices of reliability have been recommended for general use. This article analyzes 22 of them, organized into 18 chance-adjusted and four non-adjusted indices. The chance-adjusted indices are further organized into three groups: nine category-based indices, eight distribution-based indices, and one that is double-based, on both category and distribution. The main purpose of this work is to examine the assumptions behind each index. Most of these assumptions have gone unexamined in the literature, yet they carry implications for reliability assessment that need to be understood, and they give rise to paradoxes and abnormalities. This article discusses 13 paradoxes and nine abnormalities to illustrate the 24 assumptions. To facilitate understanding, the analysis focuses on categorical scales with two coders, and further on binary scales where appropriate. The discussion is situated mostly in the analysis of communication content. The assumptions and patterns we identify also apply to studies, evaluations, and diagnoses in other disciplines with more coders, raters, diagnosticians, or judges using binary or multi-category scales. We argue that a new index is needed; before one can be established, guidelines for using the existing indices are needed, and this article recommends such guidelines.
DO - 10.1080/23808985.2013.11679142
M3 - Journal article
SN - 2380-8985
VL - 36
SP - 419
EP - 480
JF - Annals of the International Communication Association
IS - 1
ER -