Distance-based k-nearest neighbors outlier detection method in large-scale traffic data

Taurus T. Dang, Henry Y T NGAN, Wei Liu

Research output: Chapter in book/report/conference proceeding › Conference contribution › peer-review

30 Citations (Scopus)

Abstract

This paper presents a k-nearest neighbors (kNN) method for detecting outliers in the large-scale traffic data collected daily in every modern city. Outliers include hardware and data errors as well as abnormal traffic behaviors. The proposed kNN method detects outliers by exploiting the neighborhood relationships among data points: the farther a data point lies from its neighbors, the more likely it is to be an outlier. The traffic data were recorded in video format and converted to spatial-temporal (ST) traffic signals by statistics. The ST signals are then projected onto a two-dimensional (2D) (x, y)-coordinate plane by Principal Component Analysis (PCA) for dimension reduction. The distance-based kNN method is evaluated with unsupervised and semi-supervised approaches; the semi-supervised approach reaches 96.19% accuracy.
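
The pipeline described in the abstract (PCA projection to a 2D plane, then distance-based kNN scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state how neighbor distances are aggregated, which k is used, or how the decision threshold is chosen, so the mean distance to the k nearest neighbors, `k=5`, and the 0.95 quantile cutoff are all assumptions, as are the function names. The input is synthetic random data standing in for ST traffic signals, and the video-to-signal conversion is not reproduced.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def knn_distance_scores(P, k=5):
    """Score each point by its mean Euclidean distance to its k nearest
    neighbors (assumed aggregation; the abstract does not specify one)."""
    diff = P[:, None, :] - P[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    nearest = np.sort(D, axis=1)[:, :k]
    return nearest.mean(axis=1)

def flag_outliers(scores, quantile=0.95):
    """Flag points whose score exceeds the given quantile (assumed threshold)."""
    return scores > np.quantile(scores, quantile)

# Synthetic stand-in for ST signals: 200 ordinary rows plus 3 shifted rows.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 10))
abnormal = rng.normal(8.0, 1.0, size=(3, 10))
X = np.vstack([normal, abnormal])

points = pca_2d(X)                          # shape (203, 2)
scores = knn_distance_scores(points, k=5)   # larger means more isolated
is_outlier = flag_outliers(scores, quantile=0.95)
```

A quantile cutoff corresponds to an unsupervised setting in which no labels are available. A semi-supervised variant would instead tune the threshold on a small labeled subset, but the abstract gives no detail on how the paper does this.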

Original language: English
Title of host publication: 2015 IEEE International Conference on Digital Signal Processing, DSP 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 507-510
Number of pages: 4
ISBN (Electronic): 9781479980581
Publication status: Published - 9 Sep 2015
Event: IEEE International Conference on Digital Signal Processing, DSP 2015 - Singapore, Singapore
Duration: 21 Jul 2015 → 24 Jul 2015

Publication series

Name: International Conference on Digital Signal Processing, DSP
Volume: 2015-September

Conference

Conference: IEEE International Conference on Digital Signal Processing, DSP 2015
Country/Territory: Singapore
City: Singapore
Period: 21/07/15 → 24/07/15

Scopus Subject Areas

  • Signal Processing

User-Defined Keywords

  • distance-based
  • kNN
  • large-scale
  • Outlier detection
  • traffic data
