TY - JOUR
T1 - Quantizing Heavy-tailed Data in Statistical Estimation
T2 - (Near) Minimax Rates, Covariate Quantization, and Uniform Recovery
AU - Chen, Junren
AU - Ng, Michael K.
AU - Wang, Di
N1 - The work of Junren Chen was supported by the Hong Kong Ph.D. Fellowship from the Hong Kong Research Grants Council (HKRGC). The work of Michael K. Ng was supported in part by the HKRGC GRF under Grant 17201020, Grant 17300021, and Grant CRF C7004-21GF; and in part by the Joint NSFC-RGC under Grant N-HKU76921. The work of Di Wang was supported in part by the baseline funding under Grant BAS/1/1689-01-01; in part by CRG through the Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), under Grant URF/1/4663-01-01, Grant REI/1/5232-01-01, Grant REI/1/5332-01-01, and Grant FCC/1/1976-49-01; and in part by the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI) under Grant RGC/3/4816-09-01.
PY - 2024/3
Y1 - 2024/3
N2 - Modern datasets often exhibit heavy-tailed behavior, while quantization is inevitable in digital signal processing and many machine learning problems. This paper studies the quantization of heavy-tailed data in several fundamental statistical estimation problems where the underlying distributions have bounded moments of some order (no greater than 4). We propose to truncate and properly dither the data prior to uniform quantization. Our main finding is that (near) minimax rates of estimation error can be achieved by computationally tractable estimators based on the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing (also interpreted as sparse linear regression), and matrix completion, all showing that quantization only slightly worsens the multiplicative factor. Additionally, while prior results focused on the quantization of responses (i.e., measurements), we study compressed sensing where the covariates (i.e., sensing vectors) are also quantized; in this case, although our recovery program is non-convex (since our covariance matrix estimator lacks positive semi-definiteness), we prove that all local minimizers enjoy near-optimal estimation error. Moreover, via a concentration inequality for the product process and a covering argument, we establish a near-minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise. Finally, numerical simulations corroborate our theoretical results.
KW - Quantization
KW - heavy-tailed distribution
KW - estimation
KW - compressed sensing
UR - http://www.scopus.com/inward/record.url?scp=85181841660&partnerID=8YFLogxK
U2 - 10.1109/TIT.2023.3329240
DO - 10.1109/TIT.2023.3329240
M3 - Journal article
AN - SCOPUS:85181841660
SN - 0018-9448
VL - 70
SP - 2003
EP - 2038
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 3
ER -