Outlier detection in Large-Scale traffic data by regression analysis

Philip Lam*, Lili Wang, Henry Y.T. Ngan, Nelson H.C. Yung, Michael K. Ng

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper proposes a robust outlier detection method for large-scale traffic data based on unsupervised regression. Traffic data is collected every day from loops, sensors and digital cameras all around a city, and the resulting data set is massive, at big-data scale. An outlier is regarded as an abnormal traffic situation, such as a traffic jam, low traffic flow, or an incident, or as an error or noise introduced in data storage and transmission. The traffic data tackled in this paper is represented by spatial-temporal (ST) signals. Principal component analysis (PCA) is used for dimension reduction and to generate an (x, y)-coordinate representation from the coefficients of the first two components of the ST signals. The (x, y)-coordinate points of inliers are measured by the Standardized Residual (SR), Hat Matrix (HM) and Cook's Distance (CD) in the regression method, and outliers are assumed to produce large changes in these three metrics in the best-fit regression model. In experiments on Level 1 data, the proposed method achieves detection success rates (DSRs) of 97.37% (SR), 91.19% (HM) and 94.28% (CD) with a linear regression model, and 96.80% (SR), 89.71% (HM) and 93.14% (CD) with a quadratic regression model. For the finer-granularity Level 2 data, the regression method with the CD metric achieves a DSR of 94.44%.
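
The abstract does not give implementation details, so the following is only a minimal sketch of the pipeline it describes, not the authors' code. It assumes each ST signal is one row of a NumPy array, projects the rows onto the first two principal components, fits a polynomial regression (degree 1 for linear, 2 for quadratic) to the resulting (x, y) points, and computes the three diagnostics named above using their standard textbook definitions. The function names `pca_2d` and `regression_diagnostics` are hypothetical, and the paper's own decision thresholds are not stated here, so none are hard-coded.

```python
import numpy as np


def pca_2d(signals):
    """Project each row (one ST signal) onto the first two principal components.

    Assumption: `signals` has shape (n_signals, n_features).
    """
    centered = signals - signals.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def regression_diagnostics(x, y, degree=1):
    """Fit y ~ polynomial(x) by least squares and return three diagnostics.

    Returns (standardized residuals, hat-matrix leverages, Cook's distances),
    each of shape (n,). degree=1 is a linear model, degree=2 a quadratic one.
    """
    # Design matrix with columns 1, x, x^2, ...
    X = np.vander(x, degree + 1, increasing=True)
    n, p = X.shape

    # Reduced QR gives the hat matrix H = Q Q^T without forming an n-by-n array.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)      # diagonal of the hat matrix
    fitted = Q @ (Q.T @ y)
    resid = y - fitted

    mse = resid @ resid / (n - p)        # residual variance estimate
    std_resid = resid / np.sqrt(mse * (1.0 - leverage))
    cooks = std_resid**2 * leverage / (p * (1.0 - leverage))
    return std_resid, leverage, cooks
```

A typical use would be `xy = pca_2d(signals)` followed by `sr, hm, cd = regression_diagnostics(xy[:, 0], xy[:, 1], degree=1)`, then flagging points whose diagnostics exceed chosen cut-offs. Common rules of thumb (for example |SR| above 2 or 3, leverage above 2p/n, or Cook's distance above 4/n) are one option, but they are assumptions here and may differ from the criteria used in the paper.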

Scopus Subject Areas

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Human-Computer Interaction
  • Software
  • Electrical and Electronic Engineering
  • Atomic and Molecular Physics, and Optics

