Estimating gradient trees

Ming-Yen Cheng, Peter Hall, John A. Hartigan

Research output: Chapter in book/report/conference proceedingChapterpeer-review

Abstract

With applications to cluster analysis in mind, we suggest new approaches to constructing tree diagrams that describe associations among points in a scatterplot. Our most basic tree diagram results in two data points being associated with one another if and only if their respective curves of steepest ascent up the density or intensity surface lead toward the same mode. The representation, in the sample space, of the set of steepest ascent curves corresponding to the data, is called the gradient tree. It has a regular, octopuslike structure, and is consistently estimated by its analogue computed from a nonparametric estimator which gives consistent estimation of both the density surface and its derivatives. We also suggest ‘forests’, in which data are linked by line segments which represent good approximations to portions of the population gradient tree. A forest is closely related to a minimum spanning tree, or MST, defined as the graph of minimum total length connecting all sample points. However, forests use a larger bandwidth for constructing the density-surface estimate than is implicit in the MST, with the result that they are substantially more orderly and are more readily interpreted. The effective bandwidth for the MST is so small that even the corresponding density-surface estimate, let alone its derivatives, is inconsistent. As a result, relationships that are suggested by the MST can change considerably if relatively small quantities of data are added or removed. Our trees and forests do not suffer from this problem. They are related to the concept of gradient traces, introduced by Wegman, Carr and Luo (1993) and Wegman and Carr (1993) for purposes quite different from our own.
Original languageEnglish
Title of host publicationA festschrift for Herman Rubin
EditorsAnirban DasGupta
PublisherInstitute of Mathematical Statistics
Pages237-249
Number of pages13
ISBN (Print)0940600617
DOIs
Publication statusPublished - 1 Jan 2004

Publication series

NameInstitute of Mathematical Statistics Lecture Notes - Monograph Series
Volume45

User-Defined Keywords

  • density ascent line
  • Density estimation
  • forest
  • gradient trace
  • minimum spanning tree
  • nearest neighbour methods
  • ridge estimation
  • tree diagram

Fingerprint

Dive into the research topics of 'Estimating gradient trees'. Together they form a unique fingerprint.

Cite this