Thread Safety Bathroom

1 minute read

Why Thread Safety Is Like Sharing a Bathroom

Thread safety is what happens when multiple parts of your program try to use the same resource, like roommates sharing one bathroom. Without proper synchronization, you get race conditions, which is the technical term for “everybody loses.” I once debugged a threading issue where two threads were incrementing a counter and somehow it was going down. It’s like two people adding money to a jar and finding less than they started with.

The classic fix is to add locks everywhere, but that’s like solving bathroom conflicts by making everyone wait in a single-file line even when they just need to brush their teeth. Pretty soon your multi-threaded performance improvement has turned into a very complicated way to run things sequentially. At Zenoss, we had so many locks that our multi-threaded code was actually slower than the single-threaded version, like hiring more cooks and ending up with slower service because they’re all waiting to use the only knife.

My favorite threading bug was one that only manifested on Tuesdays. Not every Tuesday, just some Tuesdays, and only in production. Turns out it was related to the number of milliseconds since epoch modulo the thread pool size, which meant it only happened when the timestamp had certain properties. It was like a bug that was superstitious about numerology. The fix was to add proper synchronization, but finding it required the kind of detective work that makes you question your career choices.

Updated: