Optimal Euclidean metric compression

26/12/2019 - 12:00

In the metric compression problem, we are given n points in a metric space, and the goal is to construct a compact representation (sketch) of the points, such that the distance between every pair can be approximately recovered from the sketch, up to a small distortion of (1 +/- epsilon). Such sketches are widely used for fast nearest neighbor search in high-dimensional Euclidean spaces.

We give a new algorithm for sketching Euclidean metric spaces. It provably achieves the optimal compression bound, improving over the classical dimension reduction theorem of Johnson and Lindenstrauss. In particular, while the latter theorem represents each point by log(n)/epsilon^2 *coordinates* (each containing a multi-bit number), we show that log(n)/epsilon^2 *bits* are both sufficient and necessary. Empirically, our algorithm either matches or improves over state-of-the-art heuristics.

Based on joint works with Piotr Indyk and Ilya Razenshteyn.