Some new developments in hypothesis testing for high-dimensional data

Project: Research project

Project Details

Description

The recent flood of high-dimensional data has posed new challenges to traditional statistical and computational methods. Because high-dimensional data typically have a small sample size relative to the number of variables, the standard estimates of variances and covariances carry a large amount of uncertainty. As a consequence, classical testing procedures, including Student's t test and Hotelling's T² test, may be unreliable or even inapplicable for high-dimensional data analysis. For example, Hotelling's T² test requires inverting the sample covariance matrix, which becomes singular when the dimension is large relative to the sample size. To overcome these problems, researchers have proposed a number of regularization methods that improve the estimation of variances and covariances, and have applied the resulting estimates to various testing problems.
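The breakdown of Hotelling's T² test can be seen directly. With n observations of p variables, the sample covariance matrix S has rank at most n − 1, so it cannot be inverted when p ≥ n. The minimal sketch below (standard-library Python) computes the one-sample statistic T² = n(x̄ − μ₀)ᵀS⁻¹(x̄ − μ₀) and, as one illustrative regularization, replaces S with S + λI. The ridge form and the names `hotelling_t2` and `lam` are assumptions made for this example only, not the estimator developed in this project.

```python
"""Why Hotelling's T^2 fails when p >= n, and a ridge-regularized variant.

Illustrative assumption: S + lam * I is just one possible regularizer and
is not the estimator developed in this project.
"""
import random


def mean_vector(X):
    # Column means of an n x p data matrix given as a list of rows.
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]


def sample_covariance(X):
    # Unbiased sample covariance matrix (divisor n - 1); its rank is at most n - 1.
    n, p = len(X), len(X[0])
    m = mean_vector(X)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - m[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b] / (n - 1)
    return S


def solve(A, b, tol=1e-10):
    # Solve A x = b by Gaussian elimination with partial pivoting;
    # raise ValueError if A is numerically singular.
    k = len(A)
    M = [list(A[i]) + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < tol:
            raise ValueError("matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        x[i] = (M[i][k] - sum(M[i][j] * x[j] for j in range(i + 1, k))) / M[i][i]
    return x


def hotelling_t2(X, mu0, lam=0.0):
    # One-sample statistic T^2 = n (xbar - mu0)' (S + lam I)^{-1} (xbar - mu0).
    # lam = 0 gives the classical Hotelling's T^2 statistic.
    n, p = len(X), len(X[0])
    d = [m - m0 for m, m0 in zip(mean_vector(X), mu0)]
    S = sample_covariance(X)
    A = [[S[i][j] + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    w = solve(A, d)
    return n * sum(di * wi for di, wi in zip(d, w))


if __name__ == "__main__":
    random.seed(1)
    n, p = 5, 10  # fewer observations than variables
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    mu0 = [0.0] * p
    try:
        hotelling_t2(X, mu0)
    except ValueError as err:
        print("classical T^2 fails:", err)
    print("ridge T^2 with lam = 1:", hotelling_t2(X, mu0, lam=1.0))
```

With p = 1 the statistic reduces to the square of Student's one-sample t statistic, which gives a quick sanity check. Any λ > 0 makes S + λI positive definite, so the regularized statistic is always computable, at the cost of a null distribution that no longer follows the classical F form.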

In this research plan, we propose to develop new test methods for high-dimensional data. Our first project will propose a new category of test methods to further advance the existing literature on high-dimensional multivariate testing. Specifically, we will consider a pairwise Hotelling's test that serves as a compromise between the existing categories of test methods and hence fills the gap between them. For both the non-sparse and sparse cases, we will establish the asymptotic null distributions of the test statistics, derive their asymptotic power, evaluate their finite-sample performance, and compare them with existing methods.

Our second project will propose a regularized t distribution and apply it to high-dimensional multiple testing. To this end, we will study the statistical properties of the regularized t distribution, explore the effects of the regularization parameters on the performance of the regularized t test, define a non-central regularized t distribution, and assess the statistical power of the test both theoretically and through simulation.

To meet practical demands, we will also develop two R packages implementing the proposed test methods and make them freely available for the R statistical software. The proposed test methods are expected to have wide applications in areas including statistical genetics, epidemiology, ecology, and the engineering sciences.
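The idea of regularizing a t test can be illustrated with a small example. The sketch below assumes one simple form: each feature's sample variance is shrunk toward the average variance across all features with a weight `alpha`, so that alpha = 0 recovers Student's t statistic. This is a hypothetical stand-in, not the regularized t distribution proposed in this project, and the function name `t_statistics` is likewise illustrative. With only three observations per feature, a feature whose sample variance is tiny by chance receives an inflated ordinary t statistic, and shrinkage tempers it.

```python
"""Sketch of a variance-regularized one-sample t statistic for many features.

Assumption (illustrative, not the project's definition): each feature's
sample variance s_i^2 is shrunk toward the average variance across features,
    s_tilde_i^2 = (1 - alpha) * s_i^2 + alpha * mean_j(s_j^2),
with alpha in [0, 1]; alpha = 0 recovers Student's t statistic.
"""
import math
import statistics


def t_statistics(data, alpha=0.0):
    # data: list of features, each a list of n observations (small n, many features).
    n = len(data[0])
    variances = [statistics.variance(row) for row in data]  # divisor n - 1
    pooled = statistics.fmean(variances)  # shrinkage target shared by all features
    out = []
    for row, s2 in zip(data, variances):
        s2_reg = (1.0 - alpha) * s2 + alpha * pooled
        out.append(statistics.fmean(row) / math.sqrt(s2_reg / n))
    return out


if __name__ == "__main__":
    # Three features with n = 3 observations each; the second has a small mean
    # but a tiny sample variance, which inflates its ordinary t statistic.
    data = [
        [0.9, 1.4, 0.7],
        [0.30, 0.31, 0.29],
        [2.1, 1.5, 2.7],
    ]
    for a in (0.0, 0.5):
        print("alpha =", a, [round(t, 2) for t in t_statistics(data, alpha=a)])
```

Here the ordinary t statistic ranks the second feature first, whereas the regularized version ranks it last. In an actual multiple-testing analysis the statistics would then be compared with a suitable null distribution and the resulting p-values adjusted for multiplicity; that step is omitted from this sketch.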
Status: Finished
Effective start/end date: 1/12/18 to 31/05/22
