Semantic Indexing for Keyword Search on Graphs

Project: Research project

Project Details


Tremendous effort is spent on creating knowledge networks, such as the open source knowledge base (e.g., YAGO), structured data of Wikipedia (e.g., DBpedia), collaborative knowledge bases (e.g., Freebase) and Google knowledge networks. Ontology information is often associated with those networks that encode the definitions of and semantic relationships between entities and the like. Many emerging applications have been recently built on top of knowledge networks.

In this proposal, we investigate a novel, generic approach that exploits ontology information to index graphs to improve query efficiency. We call it semantic indexing. While semantic indexing is a general concept, to illustrate its potentials, this proposal focuses on keyword search on graphs, which is fundamental to many recent applications and easy-to-use. In Task 1, we propose the index structure that exploits ontologies. We show that, first, some semantically relevant entities can be generalized to their same supertypes/superclasses, and that generalization makes the entities and their relationships less heterogeneous for indexing. Second, we summarize the generalized graph into a smaller graph. These two steps are repeated alternately until the summary graph is small enough. Our proposed semantic index is then the hierarchy of these summary graphs.

In Task 2, we propose to integrate the current state-of-the-art of algorithms into semantic indexing. The core of existing algorithms comprise backward traversals of the knowledge graph, distance queries of entities and enumerations of entities that match the keyword queries, among others. Since the index consists of yet another graphs, we expect the integrations are feasible and efficient.

Ontology information provides opportunities for optimizing modern applications of keyword searches. In Task 3, we investigate efficient algorithms for exploratory data analysis (EDA) via keyword searches. In EDA, users may issue a query and examine its answers, and then refine that query to form the next (semantically related) query. Formally, we define a semantically related query of another query if they are different in one keyword, where these keywords are closely connected in the ontology graph. Last, in Task 4, we study efficient keyword queries on multiple graphs and ontologies. There are domain-specific knowledge networks such as PubMed and Gene Ontology. Furthermore, users may have their own private knowledge networks. Hence, we study efficient approximate query algorithms in the presence of additional semantic information.

We expect this project offers a new indexing approach for keyword search. It improves the efficiency of the current state-of-the-art methods, and subsequently, their applications.
Effective start/end date1/01/2030/06/23


Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.