Bloom Filter

Items added: 0
Bits set: 0 / 64
Fill rate: 0.0%
Theoretical FP: 0.0%
Observed FP: —

Add Element

Test Membership

Array Size (bits) 64

Hash Functions 3

Added Elements

No items added yet

How it works

A Bloom filter is a probabilistic data structure invented by Burton Howard Bloom in 1970. It answers a simple question: “Is this element in the set?” But instead of storing the actual elements, it uses a compact bit array and a set of hash functions. This makes it extraordinarily space-efficient — at the cost of occasionally giving false positive answers.

When you add an element, each of the k hash functions computes an index into the m-bit array, and those bits are set to 1. When you test membership, the same k hash functions are applied; if all k bits are 1, the filter reports “possibly present.” If any bit is 0, the element is definitely absent. The key insight: bits can be set to 1 by different elements, so finding all k bits set doesn’t guarantee the element was actually added — it might be a coincidental overlap.

The false positive probability is approximately (1 − e^−kn/m)^k, where n is the number of inserted elements, m is the array size, and k is the number of hash functions. For a given m and n, the optimal number of hash functions is k = (m/n) · ln 2 ≈ 0.693 · m/n. Too few hash functions means each test checks too few bits, making collisions likely. Too many means the array fills up too fast, also increasing false positives. The optimum sits precisely at the balance point.

Bloom filters appear everywhere in practice. Web browsers use them to check URLs against malware databases without downloading the entire list. Databases use them to avoid expensive disk lookups for keys that don’t exist. Network routers use them for packet classification. Medium uses one to avoid recommending articles you’ve already read. The structure’s elegance lies in the fundamental tradeoff: by accepting a small, controllable probability of error, you gain enormous savings in space and time.