2013 Technical Reports

Meaningful Keyword Search in RDBMS

Mehdi Kargar, Aijun An, Parke Godfrey, Jaroslaw Szlichta and Xiaohui Yu

Technical Report CSE-2013-03

York University

February 4, 2013

Abstract

Keyword search over relational databases offers an alternative to SQL for querying and exploring databases, one that is effective for lay users who may not be well versed in SQL or in the database schema. This becomes more pertinent for databases with large and complex schemas. An answer in this context is a join tree spanning tuples that contain the query keywords. Because a query can yield many answers of varying quality, and the user is often interested only in the top-k answers, how to gauge the relevance of answers in order to rank them is of paramount importance. We focus on the relevance of the join trees that constitute answers as the fundamental means to rank them. We devise means to measure the relevance of relations and foreign keys in the schema over the information content of the database. This can be done offline with no need for external models. We compare against a gold standard that we create from a real workload over TPC-E and show the effectiveness of our measures. Finally, we test the performance of our measures against existing techniques to demonstrate a marked improvement, and perform a user study to establish the naturalness of the ranking.
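
To make the notion of an answer concrete, the following is a minimal sketch, not the method of the report. It treats tuples as nodes and foreign-key references as edges, and returns paths (the simplest join trees) that connect a tuple containing one keyword to a tuple containing another. The tuple identifiers, texts, links, and the function names `build_adjacency`, `shortest_path`, and `keyword_answers` are all hypothetical. Ranking by tree size is used here only as a naive stand-in; the report instead ranks by the relevance of the relations and foreign keys involved.

```python
from collections import deque

# Hypothetical toy data: tuple id -> text content of the tuple.
TUPLES = {
    "customer:1": "alice smith",
    "customer:2": "bob jones",
    "account:10": "savings",
    "account:11": "margin",
    "trade:100": "buy ibm",
    "trade:101": "sell ibm",
}

# Hypothetical foreign-key links between tuples (treated as undirected).
FK_EDGES = [
    ("account:10", "customer:1"),
    ("account:11", "customer:2"),
    ("trade:100", "account:10"),
    ("trade:101", "account:11"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from foreign-key links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the list of tuples on a shortest
    path from src to dst, or None if they are not connected."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def keyword_answers(keyword_a, keyword_b, k=3):
    """Return up to k join paths linking a tuple that contains keyword_a
    to a tuple that contains keyword_b, smallest first (naive ranking)."""
    adj = build_adjacency(FK_EDGES)
    hits_a = [t for t, text in TUPLES.items() if keyword_a in text.split()]
    hits_b = [t for t, text in TUPLES.items() if keyword_b in text.split()]
    answers = []
    for a in hits_a:
        for b in hits_b:
            path = shortest_path(adj, a, b)
            if path is not None:
                answers.append(path)
    answers.sort(key=lambda p: (len(p), p))
    return answers[:k]


if __name__ == "__main__":
    for answer in keyword_answers("alice", "ibm"):
        print(" -> ".join(answer))
```

In this toy data, the query "alice ibm" joins a customer tuple to a trade tuple through the account tuple between them. A size-based ranking cannot distinguish two answers of the same size that pass through different relations, which is the gap that schema-level relevance measures are meant to address.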

Download paper in PDF format.



The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.