Space-efficient probabilistic structures that trade certainty for memory. Answer "is X in the set?" with "definitely NOT" or "probably yes", using a fraction of the memory a hash set would need.
A Bloom filter is a bit array of m bits (all initially 0) and k independent hash functions. To add an element, hash it with all k functions and set the corresponding bits to 1. To check membership, hash the element and verify all k bits are set. If any bit is 0, the element is definitely not in the set. If all bits are 1, it's probably in the set (false positive possible).
Different elements may hash to the same bit positions. As the filter fills up, more bits are set to 1, increasing the probability that a non-member's bits are all coincidentally set. The false positive rate is p ≈ (1 - e^(-kn/m))^k, where n = number of elements, m = bits, k = hash functions.
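The add/check mechanics above can be sketched in a few lines. This is a minimal illustration, not a production filter: the class and method names are invented, and it derives its k positions from a single SHA-256 digest (via the double-hashing trick) so it needs nothing outside the standard library.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array plus k derived hash positions."""

    def __init__(self, m, k):
        self.m = m            # number of bits in the filter
        self.k = k            # number of hash functions
        self.bits = [0] * m   # all bits start at 0

    def _positions(self, item):
        # Derive k positions from one digest: h_i = h1 + i*h2 (mod m).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd to avoid a degenerate step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Any unset bit -> definitely absent; all set -> probably present.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1000, k=7)
bf.add("example.com")
print(bf.might_contain("example.com"))    # True
print(bf.might_contain("not-added.org"))  # almost certainly False (false positive is possible)
```

Note the asymmetry in `might_contain`: a `False` answer is exact, while a `True` answer is only probabilistic.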
HyperLogLog: Approximate count of distinct elements (cardinality estimation). Uses ~12KB memory to count billions of unique items with ~0.8% standard error. Used by Redis for unique visitor counting.
Count-Min Sketch: Approximate frequency counting in a data stream. Like a 2D Bloom filter: hash to rows, increment counters. Query returns the minimum across rows. Overcounts slightly, never undercounts.
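The "2D Bloom filter" idea can be sketched directly: d rows of w counters, each row with its own hash, and a query takes the minimum over rows. The class and dimensions below are illustrative, and hashlib stands in for a fast non-cryptographic hash.

```python
import hashlib


class CountMinSketch:
    """Sketch of a count-min sketch: d rows of w counters, one hash per row."""

    def __init__(self, w=1000, d=4):
        self.w, self.d = w, d
        self.rows = [[0] * w for _ in range(d)]

    def _col(self, item, row):
        # Salt the hash with the row index so rows behave like independent hashes.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def add(self, item, count=1):
        for r in range(self.d):
            self.rows[r][self._col(item, r)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # overcounts slightly but never undercounts.
        return min(self.rows[r][self._col(item, r)] for r in range(self.d))


cms = CountMinSketch()
for _ in range(5):
    cms.add("apple")
cms.add("banana")
print(cms.estimate("apple"))  # >= 5, usually exactly 5
```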
Optimal hash function count: k = (m/n) × ln(2), typically 7-11 for common FP rates.
False positive rate vs memory: 1% FP needs ~10 bits/element (6MB for 5M items). 0.01% FP needs ~20 bits/element (12MB). Tune based on the cost of a false positive: if it triggers an expensive DB lookup, lower the FP rate.
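The sizing math follows from the standard formulas m = -n·ln(p)/(ln 2)² and k = (m/n)·ln(2). A small helper (the function name is illustrative) reproduces the numbers above:

```python
import math


def bloom_parameters(n, p):
    """Given expected element count n and target FP rate p, return (m bits, k hashes)."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))  # m = -n ln(p) / (ln 2)^2
    k = round((m / n) * math.log(2))                      # k = (m/n) ln(2)
    return m, k


m, k = bloom_parameters(5_000_000, 0.01)
print(round(m / 8 / 1e6, 1))  # ~6.0 MB for 5M items at 1% FP
print(k)                      # 7
```

This matches the rule of thumb in the text: ~10 bits/element and k = 7 at a 1% false positive rate.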
Deletion support: Standard Bloom filters can't delete (unsetting a bit may affect other elements). Use a Counting Bloom Filter (counters instead of bits) or Cuckoo Filter (supports deletion + better space efficiency at low FP rates).
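A Counting Bloom Filter is the same structure with small counters in place of bits: add increments, delete decrements, and membership checks that every counter is nonzero. A minimal sketch (names invented, no counter-overflow handling):

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter sketch: counters instead of bits, so deletion works."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only safe for items that were actually added; removing a
        # never-added item can corrupt other elements' counts.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def might_contain(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1000, k=7)
cbf.add("url")
print(cbf.might_contain("url"))  # True
cbf.remove("url")
print(cbf.might_contain("url"))  # False
```

The cost is roughly 4x the memory of a plain filter (e.g. 4-bit counters instead of 1 bit each).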
Scalable Bloom Filters: If you don't know the set size upfront, use a Scalable Bloom Filter: a chain of increasingly larger filters. New elements go to the latest filter; checks query all filters. Slight overhead from checking multiple filters.
Hash function choice: Use fast, uniform hash functions (MurmurHash3, xxHash, or FNV). You don't need cryptographic hashes (SHA/MD5). The "double hashing" trick: compute only h1(x) and h2(x), derive k hashes as h_i(x) = h1(x) + i × h2(x).
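The double-hashing trick in isolation: two base hashes, k derived indices. Here both halves of one BLAKE2b digest stand in for h1 and h2 (in practice you would use MurmurHash3 or xxHash via a library); the function name is illustrative.

```python
import hashlib


def k_indices(item, k, m):
    """Derive k bucket indices in [0, m) from two base hashes: h_i = h1 + i*h2 (mod m)."""
    digest = hashlib.blake2b(item.encode()).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big")
    return [(h1 + i * h2) % m for i in range(k)]


print(k_indices("example.com", 7, 1000))  # 7 deterministic indices in [0, 1000)
```

This computes two hashes instead of k, which matters when k is 7-11 and lookups are on the hot path.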
Bloom filters shine when most lookups are expected to be misses and you need to avoid expensive operations for those misses.
Interview signal: Explain the asymmetry: false negatives would be catastrophic (missing a malicious URL) but false positives are cheap (just do one extra API check). This shows you understand why the tradeoff works.
Redis implements HyperLogLog natively (PFADD / PFCOUNT): 12KB memory to count billions of unique elements with ~0.8% error.

| Metric | Value |
|---|---|
| Hash set: 1B URLs × 64 bytes | ~64 GB |
| Bloom filter: 1B URLs × 10 bits (1% FP) | ~1.2 GB |
| Memory savings vs hash set | ~50× less |
| Chrome Safe Browsing filter size | ~30 MB (5M URLs) |
| Optimal hash functions (1% FP) | k = 7 |
| Optimal hash functions (0.01% FP) | k = 11 |
| HyperLogLog memory for billions of items | ~12 KB |
| Lookup time | O(k) hash computations, ~ns |