Some new approaches to the heavy hitters problem
In the 'frequent items' problem one sees a sequence of items in a stream (e.g. a stream of words coming into a search query engine like Google) and wants to report a small list of items containing all frequent items. In the 'change detection' problem one sees two streams, say one from yesterday and one from today, and wants to report a small list of items containing all those whose frequencies changed significantly. For both of these problems, we would like algorithms that use memory substantially sublinear in the length of the stream.
We describe new state-of-the-art solutions to both problems. For the former, we make use of chaining methods to control the suprema of Rademacher processes to develop an algorithm BPTree with provably near-optimal memory consumption. For the latter, we develop the first
algorithm to simultaneously achieve asymptotically optimal space, fast query and update time, and high success probability (over the random coins flipped by the algorithm) by reducing the problem to a certain graph partitioning problem, which we then give a new algorithm for.
Based on joint works with Vladimir Braverman, Stephen Chestnut, Nikita Ivkin, Kasper Green Larsen, Huy Le Nguyen, Mikkel Thorup, Zhengyu Wang, and David P. Woodruff.