Outlier detection in Large-Scale traffic data by regression analysis

Philip Lam*, Lili Wang, Henry Y T NGAN, Nelson H.C. Yung, Kwok Po NG

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review


This paper proposes a robust outlier detection method for large-scale traffic data based on unsupervised regression. Traffic data are collected every day from loops, sensors and digital cameras all around a city, so the data size is massive and on a big-data scale. An outlier is regarded as an abnormal traffic situation, such as a traffic jam, low traffic flow or an incident, or as an error or noise introduced in data storage and transmission. The traffic data tackled in this paper are represented by spatial-temporal (ST) signals. Principal component analysis (PCA) is used for dimension reduction and to generate (x, y)-coordinates from the coefficients of the first two components of the ST signals. The (x, y)-coordinate points of inliers are measured by Standardized Residual (SR), Hat Matrix (HM) and Cook's Distance (CD) in the regression method, so that outliers are assumed to show large changes in these three metrics with respect to the best-fit regression model. Experimental results of the proposed method on the Level 1 data show detection success rates (DSRs) of 97.37% (SR), 91.19% (HM) and 94.28% (CD) for the linear regression model, and 96.80% (SR), 89.71% (HM) and 93.14% (CD) for the quadratic regression model. For the finer-granularity Level 2 data, the regression method with the CD metric achieves a DSR of 94.44%.
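The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pca_signal_coefficients` and `regression_diagnostics` are hypothetical, and the sketch assumes that the "coefficients" are the PCA loadings of each ST signal on the first two components, and that the regression is an ordinary least-squares polynomial fit of y on x (degree 1 for linear, degree 2 for quadratic).

```python
import numpy as np


def pca_signal_coefficients(signals):
    """Return one (x, y) pair per ST signal from the first two PCA components.

    Assumption: `signals` has shape (n_signals, signal_length) and each signal
    is treated as one PCA variable, so its coefficients are its loadings.
    """
    data = np.asarray(signals, dtype=float).T      # rows: time samples, columns: signals
    centered = data - data.mean(axis=0)            # center each signal over time
    # Rows of vt are the principal directions; entry [k, j] is the
    # coefficient of signal j on component k.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0], vt[1]


def regression_diagnostics(x, y, degree=1):
    """Fit y on a polynomial in x and return (SR, leverage, Cook's distance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.vander(x, degree + 1, increasing=True)   # columns 1, x, ..., x^degree
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    # Diagonal of the hat matrix H = X (X^T X)^{-1} X^T, computed via thin QR.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q ** 2, axis=1)
    mse = residuals @ residuals / (n - p)
    standardized = residuals / np.sqrt(mse * (1.0 - leverage))
    cooks = standardized ** 2 * leverage / (p * (1.0 - leverage))
    return standardized, leverage, cooks


# Synthetic illustration only: a noisy line with one gross outlier at index 10.
rng = np.random.default_rng(0)
x_demo = np.arange(20, dtype=float)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=0.1, size=20)
y_demo[10] += 50.0
sr_demo, hm_demo, cd_demo = regression_diagnostics(x_demo, y_demo, degree=1)
```

A point would then be flagged when one of the three metrics exceeds a cutoff. The abstract does not state which cutoffs were used; common rules of thumb (for example |SR| above 2 or 3, leverage above 2p/n, or Cook's distance above 4/n) are conventions from the regression literature and would need to be tuned for real traffic data.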

Original language: English
Pages (from-to): 1271-1274
Number of pages: 4
Journal: IS and T International Symposium on Electronic Imaging Science and Technology
Publication status: Published - 2018
Event: Intelligent Robotics and Industrial Applications using Computer Vision 2018, IRIACV 2018 - Burlingame, United States
Duration: 28 Jan 2018 → 1 Feb 2018

Scopus Subject Areas

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Human-Computer Interaction
  • Software
  • Electrical and Electronic Engineering
  • Atomic and Molecular Physics, and Optics

