## Abstract

Constructing confidence intervals (CIs) for functions of cell probabilities (e.g., rate difference, rate ratio and odds ratio) is a standard procedure in categorical data analysis for clinical trials and medical studies. In the presence of incomplete data, existing methods can be problematic. For example, the inverse of the observed information matrix may not exist, in which case the asymptotic CIs based on the delta method are not available. Even when the inverse of the observed information matrix exists, large-sample delta methods are generally not reliable in small-sample studies. In addition, the existing expectation-maximization (EM) algorithm based on conventional data augmentation (DA) may suffer from slow convergence because it introduces too many latent variables. In this article, for r × c tables with incomplete data, we propose a novel DA scheme that requires fewer latent variables and consequently leads to a more efficient EM algorithm. We present two bootstrap-type CIs for parameters of interest via the new EM algorithm, one with and one without the normality assumption. For r × c tables with only one incomplete/supplementary margin, the improved EM algorithm converges in only one step, and the associated maximum likelihood estimates can hence be obtained in closed form. Theoretical and simulation results show that the proposed EM algorithm outperforms the existing EM algorithm. Three real data sets, from a neurological study, a rheumatoid arthritis study and a wheeze study, are used to illustrate the methodologies.
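To make the setting concrete, the following is a minimal sketch of the one-supplementary-margin case, not the article's own DA scheme. It assumes a 2 × 2 table of fully classified counts plus extra subjects for whom only the row category is known. It runs the textbook (conventional) EM, compares it with the closed-form maximum likelihood estimate that follows from factorizing the likelihood into row margins and within-row conditionals, and computes a parametric percentile bootstrap CI as one possible reading of a bootstrap-type CI without the normality assumption. The counts, function names and the choice of the statistic θ₁₂ − θ₂₁ are all hypothetical.

```python
import numpy as np


def closed_form_mle(n, m):
    """Closed-form MLE of cell probabilities when only the row margin
    has supplementary counts: theta_ij = theta_{i+} * theta_{j|i}."""
    n = np.asarray(n, dtype=float)
    m = np.asarray(m, dtype=float)
    total = n.sum() + m.sum()
    row_prob = (n.sum(axis=1) + m) / total        # uses all subjects
    cond_prob = n / n.sum(axis=1, keepdims=True)  # uses complete cases only
    return row_prob[:, None] * cond_prob


def conventional_em(n, m, tol=1e-12, max_iter=10_000):
    """Textbook EM: split each supplementary row count across the cells
    of that row (E-step), then renormalize the filled table (M-step)."""
    n = np.asarray(n, dtype=float)
    m = np.asarray(m, dtype=float)
    total = n.sum() + m.sum()
    theta = np.full(n.shape, 1.0 / n.size)        # uniform starting value
    for iteration in range(1, max_iter + 1):
        z = m[:, None] * theta / theta.sum(axis=1, keepdims=True)
        new_theta = (n + z) / total
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta, iteration
        theta = new_theta
    return theta, max_iter


def bootstrap_ci(n, m, stat, n_boot=2000, level=0.95, seed=0):
    """Parametric percentile bootstrap CI for stat(theta)."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n)
    m = np.asarray(m)
    theta_hat = closed_form_mle(n, m)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        # Resample complete and supplementary counts from the fitted model.
        n_star = rng.multinomial(int(n.sum()), theta_hat.ravel()).reshape(n.shape)
        m_star = rng.multinomial(int(m.sum()), theta_hat.sum(axis=1))
        reps[b] = stat(closed_form_mle(n_star, m_star))
    alpha = 1.0 - level
    lower, upper = np.quantile(reps, [alpha / 2, 1.0 - alpha / 2])
    return float(lower), float(upper)


# Made-up counts, for illustration only.
n_obs = np.array([[20, 10], [8, 22]])  # fully classified subjects
m_sup = np.array([15, 10])             # row category known, column missing

theta_hat = closed_form_mle(n_obs, m_sup)
theta_em, n_iter = conventional_em(n_obs, m_sup)


def rate_difference(theta):
    """theta_{1+} - theta_{+1}, which equals theta_12 - theta_21."""
    return theta[0, 1] - theta[1, 0]


ci = bootstrap_ci(n_obs, m_sup, rate_difference)
print("closed-form MLE:\n", theta_hat)
print("EM iterations needed:", n_iter)
print("95% percentile bootstrap CI:", ci)
```

In this sketch the conventional EM needs several iterations to reach the value that the closed form gives directly, which illustrates why a data augmentation with fewer latent variables can be attractive. The article's improved scheme itself is not reproduced here.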

Field | Value |
---|---|
Original language | English |
Pages (from-to) | 2919-2933 |
Number of pages | 15 |
Journal | Computational Statistics and Data Analysis |
Volume | 51 |
Issue number | 6 |
Publication status | Published - 1 Mar 2007 |

## Scopus Subject Areas

- Statistics and Probability
- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics

## User-Defined Keywords

- Bootstrap
- Confidence interval
- Convergence rate
- Data augmentation
- EM algorithm
- Incomplete data
- Paired binary data
- Small sample size