Abstract
Page segmentation is a basic and important research topic in document analysis. Page segmentation methods fall into two major kinds: hierarchical and non-hierarchical. Most traditional techniques, such as the top-down and bottom-up approaches, are hierarchical. Although these two approaches are still in use, they are not effective for documents with high geometric complexity, and splitting a document requires iterative operations, which are time-consuming. A non-hierarchical method called the modified fractal signature (MFS) was presented in recent years. It overcomes these weaknesses, but it requires computing the modified fractal signature, which makes the theory very complex. In this thesis, we present a new page segmentation approach that is more direct and much simpler: the Median Order Statistic Filter (MedOSF) and Maximum Order Statistic Filter (MaxOSF) approach. We use the MedOSF to remove salt-and-pepper noise from the document and the MaxOSF to perform the page segmentation. In practice, the two filters not only adaptively process documents with high geometric complexity but also save considerable computing time.
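The two filtering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the fixed window sizes (3 and 5), the synthetic binary page, and the hypothetical helpers `order_statistic_filter` and `label_regions` are all assumptions made for illustration, and the connected-component step is only one plausible way to read blocks off the max-filtered image.

```python
import numpy as np

def order_statistic_filter(image, size, statistic):
    """Apply a sliding-window order statistic (e.g. np.median or np.max).

    Assumes an odd window size; borders are handled by edge replication.
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return statistic(windows, axis=(-2, -1)).astype(image.dtype)

def label_regions(mask):
    """Label 4-connected foreground regions with a simple flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        count += 1
        labels[seed] = count
        stack = [seed]
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = count
                    stack.append((nr, nc))
    return labels, count

# Synthetic binary page (assumption): 1 = ink, 0 = background.
# Two "paragraphs", each made of three 2-pixel-high text lines.
page = np.zeros((40, 40), dtype=np.uint8)
for top in (5, 9, 13, 25, 29, 33):
    page[top:top + 2, 5:35] = 1
page[20, 20] = 1  # isolated "salt" pixel between the paragraphs
page[2, 38] = 1   # isolated "salt" pixel in the margin
page[5, 10] = 0   # "pepper" hole inside a text line

# Step 1: the median filter suppresses isolated noise pixels.
denoised = order_statistic_filter(page, 3, np.median)

# Step 2: the maximum filter smears nearby ink together, so the text
# lines of one paragraph merge into a single block.
blocks = order_statistic_filter(denoised, 5, np.max)

labels, n_blocks = label_regions(blocks)
print("blocks found:", n_blocks)
```

In this toy setting the 5-pixel maximum window bridges the 2-pixel gaps between text lines but not the 10-pixel gap between paragraphs, so the window size acts as the knob that decides which elements merge into one block.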
Original language | English |
---|---|
Pages (from-to) | 551-571 |
Number of pages | 21 |
Journal | International Journal of Pattern Recognition and Artificial Intelligence |
Volume | 16 |
Issue number | 5 |
Publication status | Published - Aug 2002 |
Externally published | Yes |
Scopus Subject Areas
- Software
- Computer Vision and Pattern Recognition
- Artificial Intelligence
User-Defined Keywords
- Document analysis
- Influence function
- Order statistic filter
- Page segmentation
- Robustness
- Statistic functional