Skip Navigation
York U: Redefine the PossibleHOME | Current Students | Faculty & Staff | Research | International
Search »FacultiesLibrariesCampus MapsYork U OrganizationDirectorySite Index
Future Students, Alumni & Visitors
2004 Technical Reports

Maximal Vector Computation in Large Data Sets

Parke Godfrey, Ryan Shipley and Jarek Gryz

Technical Report CS-2004-06

York University

December 2004

Abstract

Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull (and hence, linear optimization) and nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyline algorithms that are external and relationally well behaved.

The initial algorithms proposed for maximals are based on divide-and-conquer. These established good average and worst case asymptotic running times, showing it to be O(N) average-case, where N is the number of vectors. However, they are not amenable to externalizing. We prove their performance is quite bad with respect to the dimensionality, K, of the problem. We demonstrate that the more recent external skyline algorithms are actually better behaved, although they do not have as good an apparent asymptotic complexity. We introduce a new external algorithm, LESS, that combines the best features of these, experimentally evaluate its effectiveness and improvement over the field, and prove its average-case running time is O(KN).

Download paper in PDF format.



The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.