TY - GEN
T1 - A ConceptLink graph for text structure mining
AU - Chau, Rowena
AU - Tsoi, Ah Chung
AU - Hagenbuchner, Markus
AU - Lee, Vincent
PY - 2009/1
Y1 - 2009/1
N2 - Most text mining methods are based on representing documents using a vector space model, commonly known as a bag of word model, where each document is modeled as a linear vector representing the occurrence of independent words in the text corpus. It is well known that using this vector-based representation, important information, such as semantic relationship among concepts, is lost. This paper proposes a novel text representation model called ConceptLink graph. The ConceptLink graph does not only represent the content of the document, but also captures some of its underlying semantic structure in terms of the relationships among concepts. The ConceptLink graph is constructed in two main stages. First, we find a set of concepts by clustering conceptually related terms using the self-organizing map method. Secondly, by mapping each document’s content to concept, we generate a graph of concepts based on the occurrences of concepts using a singular value decomposition technique. The ConceptLink graph will overcome the keyword independence limitation in the vector space model to take advantage of the implicit concept relationships exhibit in all natural language texts. As an information-rich text representation model, the ConceptLink graph will advance text mining technology beyond feature-based to structure-based knowledge discovery. We will illustrate the ConceptLink graph method using samples generated from benchmark text mining dataset.
AB - Most text mining methods are based on representing documents using a vector space model, commonly known as a bag of word model, where each document is modeled as a linear vector representing the occurrence of independent words in the text corpus. It is well known that using this vector-based representation, important information, such as semantic relationship among concepts, is lost. This paper proposes a novel text representation model called ConceptLink graph. The ConceptLink graph does not only represent the content of the document, but also captures some of its underlying semantic structure in terms of the relationships among concepts. The ConceptLink graph is constructed in two main stages. First, we find a set of concepts by clustering conceptually related terms using the self-organizing map method. Secondly, by mapping each document’s content to concept, we generate a graph of concepts based on the occurrences of concepts using a singular value decomposition technique. The ConceptLink graph will overcome the keyword independence limitation in the vector space model to take advantage of the implicit concept relationships exhibit in all natural language texts. As an information-rich text representation model, the ConceptLink graph will advance text mining technology beyond feature-based to structure-based knowledge discovery. We will illustrate the ConceptLink graph method using samples generated from benchmark text mining dataset.
M3 - Conference proceeding
SN - 9781920682729
T3 - Conferences in Research and Practice in Information Technology Series
BT - Proceedings of 32nd Australasian Computer Science Conference, ACSC 2009
PB - Australasian Computer Science Conference
T2 - Australsian Computer Science Conference (ACSC 2009)
Y2 - 19 January 2009 through 23 January 2009
ER -