A Reliability index (ai) that assumes honest coders and variable randomness

Xinshu Zhao

Research output: Contribution to conference (conference paper)



The performance of six major indices of inter-coder reliability was evaluated against actual judgments of human coders in a behavior-based Monte Carlo (BMC) experiment. The correlations between the indices’ estimated chance agreements (ac) and the observed chance agreements (oac) turned out to be negative for Cohen’s κ, Scott’s π and Krippendorff’s α, and positive but weak for Bennett et al.’s S, Perreault and Leigh’s Ir and Gwet’s AC1. Although each of the indices was designed to improve on percent agreement (ao), each underperformed percent agreement when estimating observed true agreement (at) in the BMC experiment.
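To make the notion of an estimated chance agreement concrete, the following is a minimal Python sketch of how four of the indices named above derive ac from the same pair of codings, using their standard textbook formulas for two coders on a nominal scale. It is not the BMC experiment and it does not implement ai, whose formula is not given in this abstract. The function name `chance_corrected` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def chance_corrected(coder1, coder2, categories):
    """Return (a_o, a_c, index) for two coders on a nominal scale.

    a_o   : percent agreement (proportion of items coded identically)
    a_c   : dict of each index's estimated chance agreement
    index : dict of (a_o - a_c) / (1 - a_c) for each index
    Hypothetical helper for illustration; not code from the paper.
    """
    n = len(coder1)
    k = len(categories)
    a_o = sum(x == y for x, y in zip(coder1, coder2)) / n

    # Each coder's marginal distribution over the categories.
    c1, c2 = Counter(coder1), Counter(coder2)
    p1 = {c: c1[c] / n for c in categories}
    p2 = {c: c2[c] / n for c in categories}
    mean = {c: (p1[c] + p2[c]) / 2 for c in categories}

    a_c = {
        # Cohen's kappa: product of the two coders' marginals.
        "kappa": sum(p1[c] * p2[c] for c in categories),
        # Scott's pi: squared average marginals.
        "pi": sum(mean[c] ** 2 for c in categories),
        # Bennett et al.'s S: uniform chance over k categories.
        "S": 1 / k,
        # Gwet's AC1: average marginals times their complements.
        "AC1": sum(mean[c] * (1 - mean[c]) for c in categories) / (k - 1),
    }
    # The index is undefined when estimated chance agreement equals 1.
    index = {
        name: (a_o - ac) / (1 - ac) if ac < 1 else float("nan")
        for name, ac in a_c.items()
    }
    return a_o, a_c, index


# Toy data: a skewed distribution with 80% raw agreement.
coder1 = ["y"] * 9 + ["n"]
coder2 = ["y"] * 8 + ["n", "y"]
a_o, a_c, index = chance_corrected(coder1, coder2, ["y", "n"])
print(a_o, a_c, index)
```

With this skewed toy data the coders agree on 8 of 10 items, yet κ and π estimate chance agreement at about 0.82 and so come out slightly negative, while S (ac = 0.5) and AC1 (ac = 0.18) come out positive. The same raw agreement can thus yield very different index values depending on how chance agreement is estimated.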

The weak or negative correlations between the calculated estimates and the observed estimands call into question the validity of the estimators, namely the indices. The findings support the emerging theory that the reliability indices available today assume dishonest coders who deliberately maximize chance coding, and that they are therefore unsuitable for typical studies, where coders perform chance coding involuntarily when the task is too difficult. A new index or indices are needed.

This manuscript also reports an effort to develop such an index, the agreement index (ai), which assumes honest coders and involuntary chance coding. Subsequent analysis shows that ai is free of the 23 known paradoxes that plague other indices. In the BMC experiment, the chance agreement estimated by ai was by far the best predictor of the observed chance agreement between coders. Index ai also outperformed percent agreement and all six other indices in predicting true agreement among the coders.

Empirical testing of theories and indices should continue, especially by different researchers using different methods, and so should the search for a better index. Until better evidence is available, however, researchers may refrain from using κ, π, and α, and add ai as a reasonable measure of true agreement between two coders on a nominal scale. Online software has been provided at http://reliability.hkbu.edu.hk/ to facilitate calculation.

Original language: English
Publication status: Published - Aug 2012
Event: Association for Education in Journalism and Mass Communication (AEJMC) 2012 Annual Conference - Chicago, United States
Duration: 9 Aug 2012 to 12 Aug 2012


Conference: Association for Education in Journalism and Mass Communication (AEJMC) 2012 Annual Conference
Country/Territory: United States

User-Defined Keywords

  • reliability
  • intercoder reliability
  • interrater reliability
  • agreement index
  • estimator
  • estimate
  • estimand
  • maximum randomness
  • variable randomness
  • behavioral Monte-Carlo experiment
  • BMC
  • simulation-augmented behavior experiment
  • SAB
  • kappa
  • alpha
  • pi

