False Positives Won't Kill You, But Mushrooms Might
The Peterson field guide uses a dichotomous key: check gill attachment, check spore colour, check cap texture. At each decision point, you’re narrowing the possibility space. But here’s the problem with mushroom identification—a false positive (“yes, this is edible”) can be lethal. A false negative (“I’m not sure, leave it”) just means you walk home hungry.
Burton Howard Bloom, working at MIT in 1970, invented a data structure with exactly this asymmetry. A Bloom filter can tell you “definitely not in set” or “probably in set”—never the reverse. It hashes each element into a bit array using multiple hash functions. To check membership, you verify all those bits are set. If any bit is zero: definitely absent. If all bits are one: probably present, but possibly a collision.
Foragers call this the “when in doubt, throw it out” principle. The Bloom filter formalizes it mathematically.
import mmh3
class BloomFilter:
def __init__(self, size=256, hashes=3):
self.bits = [False] * size
self.hashes = hashes
self.size = size
def add(self, item):
for seed in range(self.hashes):
self.bits[mmh3.hash(item, seed) % self.size] = True
def might_contain(self, item):
return all(self.bits[mmh3.hash(item, seed) % self.size]
for seed in range(self.hashes))
safe = BloomFilter()
for species in ["chanterelle", "morel", "puffball", "oyster"]:
safe.add(species)
print(safe.might_contain("chanterelle")) # True - probably safe
print(safe.might_contain("amanita")) # False - DEFINITELY not safe
The Zig version shows the same logic with compile-time hash count:
const std = @import("std");
fn BloomFilter(comptime size: usize, comptime hashes: u32) type {
return struct {
bits: [size]bool = [_]bool{false} ** size,
pub fn add(self: *@This(), item: []const u8) void {
for (0..hashes) |seed| {
const h = std.hash.Wyhash.hash(@intCast(seed), item);
self.bits[h % size] = true;
}
}
pub fn mightContain(self: *const @This(), item: []const u8) bool {
for (0..hashes) |seed| {
const h = std.hash.Wyhash.hash(@intCast(seed), item);
if (!self.bits[h % size]) return false;
}
return true;
}
};
}
The trade-off is space versus certainty. More bits and more hash functions reduce false positives, but a Bloom filter can never shrink or delete—just like my growing list of “definitely don’t eat” specimens after yesterday’s spore prints came back wrong.