K. L. Clarkson.
Algorithms for Closest-point Problems.
PhD thesis, Stanford University, January 1985.
A scanned version, in three parts:
Part 1 (4.5 Meg)
Part 2 (3.7 Meg)
Part 3 (6 Meg)
Abstract:
This dissertation reports a variety of new algorithms for solving
closest-point problems. The input to these algorithms is a set or
sets of points in d-dimensional
space, with an associated Lp
metric. The problems considered
are:
- The all nearest neighbors
problem. For point set A,
find the nearest neighbors in A of each point in A.
- The nearest foreign neighbor
problem. For point sets A
and B, find the closest point
in B to each point in A.
- The geometric minimum spanning
tree problem. For point set A, find the minimum spanning tree
for the complete weighted undirected graph associated with A, where the vertices of the graph
correspond to the points of A,
and the weight of an edge is the distance between the points defining
the edge.
These problems arise in routing, statistical classification, data
compression, and other areas. Obvious algorithms for them require
a running time quadratic in n,
the number of points in the input. In many cases they can be
solved with algorithms requiring O(n
log O(1)
n) time.
In this work, approximation algorithms for some cases of these problems
have been found. For example, for the minimum spanning tree
problem with the L1
metric, an algorithm has been devised that requires O(n log d(1/rho) time to find a spanning
tree with weight within 1+rho
of the minimum. Several other algorithms have been found with
time bounds dependent on log(1/rho)
for attaining error rho.
Algorithms have also been found that require linear expected time, for
independent identically distributed random input points with a
probability density function satisfying weak conditions. One such
algorithm depends on the fact that under certain conditions, values
that are identically distributed, but dependent, can be bucket sorted
in linear expected time.
An algorithm has been found for the all nearest neighbors problem that
requires O(n log n) expected time for any input set
of points, where the expectation is on the random sampling performed by
the algorithm. This algorithm involves the construction of a new
data structure, a compressed form of digital trie.