Let a thousand filters Bloom

16/11/2017 - 12:00

Bloom filters and other approximate membership query data structures (AMQs), are one of the great successes of theoretical computer science, with uses throughout databases, file systems, networks and beyond.  An AMQ maintains a set under insertions, sometimes deletions, and queries with one-sided error: if a queried element is in the set, the AMQ returns \texttt{present}, if it is not, the AMQ returns \texttt{present} with probability at most $\epsilon$. An optimal AMQ uses $n\log\epsilon^{-1}$ bits of space when the represented set has $n$ elements.


Yet there is a gap between their theoretical bounds provided by AMQs and their requirements in the field.  


In this talk, I will describe how AMQs are used in practice and how this changes (for the better) our theoretical understanding of these data structures.



Martin Farach-Colton is a Professor of Computer Science at Rutgers University, New Brunswick, New Jersey. His research focuses on both the theory and practice of external memory and storage systems. He was a pioneer in the theory of cache oblivious analysis. His current research focuses on the use of write optimization to improve performance in both read- and write-intensive big data systems. He has also worked on the algorithmics of strings and metric spaces, with applications to bioinformatics. In addition to his academic work, Professor Farach-Colton has extensive industrial experience. He was CTO and co-founder of Tokutek, a database company that was founded to commercialize his research and acquired in 2015. During 2000–2002, he was a Senior Research Scientist at Google.