Order Statistic Filter (OSF): A novel approach to document analysis

Hong Ma, Jie Zhou, Li Ma, Yuan Y. Tang*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

2 Citations (Scopus)


Page segmentation is one of the basic research subjects of document analysis. There are two major kinds of page segmentation methods: hierarchical and non-hierarchical. Most traditional techniques, such as the top-down and bottom-up approaches, are hierarchical. Although these two approaches are still in use, they are not effective for documents with high geometric complexity, and splitting a document requires iterative operations, which is time-consuming. A non-hierarchical method called the modified fractal signature (MFS) was presented in recent years. It overcomes these weaknesses, but it requires computing the modified fractal signature, which makes the theory very complex. In this thesis, we present a new page segmentation approach that combines a Median Order Statistic Filter (MedOSF) with a Maximum Order Statistic Filter (MaxOSF) and is more direct and much simpler. We use the MedOSF to remove salt-and-pepper noise from the document and the MaxOSF to perform the page segmentation. In practice, the filters not only adaptively process documents with high geometric complexity but also save considerable computing time.
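
As a rough illustration of the two filters named in the abstract, the following minimal sketch applies a sliding-window median and a sliding-window maximum to a binary image. It is not the authors' implementation: the function names (`order_statistic_filter`, `med_osf`, `max_osf`), the 3x3 window, the clamped border handling, and the convention that 1 means ink and 0 means background are all assumptions made for this example.

```python
def order_statistic_filter(image, size, rank_fn):
    """Replace each pixel with rank_fn of its size x size neighbourhood.

    `image` is a list of equal-length rows. Coordinates outside the
    image are clamped to the nearest edge pixel (an assumed border rule).
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = rank_fn(window)
    return out


def med_osf(image, size=3):
    """Median order statistic: suppresses isolated salt-and-pepper pixels."""
    return order_statistic_filter(
        image, size, lambda win: sorted(win)[len(win) // 2]
    )


def max_osf(image, size=3):
    """Maximum order statistic: spreads ink so nearby marks merge into blocks."""
    return order_statistic_filter(image, size, max)


if __name__ == "__main__":
    # A single stray ink pixel (salt noise) on an empty page.
    noisy = [[0] * 5 for _ in range(5)]
    noisy[2][2] = 1
    print(med_osf(noisy))

    # Two ink pixels separated by a one-pixel gap.
    text = [
        [0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(max_osf(text))
```

In this sketch the median pass clears the lone noise pixel, and the maximum pass fills the gap between the two ink pixels so that they form one connected region. Such regions could then be labelled as candidate page blocks, although how the paper derives blocks from the MaxOSF output is not stated in the abstract.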

Original language: English
Pages (from-to): 551-571
Number of pages: 21
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Issue number: 5
Publication status: Published - Aug 2002
Externally published: Yes

Scopus Subject Areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

User-Defined Keywords

  • Document analysis
  • Influence function
  • Order statistic filter
  • Page segmentation
  • Robustness
  • Statistic functional

