SOTA On Bay Area House Party
[previously in series: 1, 2, 3, 4, 5, 6, 7, 8]
Every city parties for its own reasons. New Yorkers party to flaunt their wealth. Angelenos party to flaunt their beauty. Washingtonians party to network. Here in SF, they party because Claude 4.5 Opus has saturated VendingBench, and the newest AI agency benchmark is PartyBench, where an AI is asked to throw a house party and graded on its performance.
You weren’t invited to Claude 4.5 Opus’ party. Claude 4.5 Opus invited all of the coolest people in town while gracefully avoiding the failure mode of including someone like you. You weren’t invited to Sonnet 4.5’s party either, or Haiku 4.5’s. You were invited by an AI called haiku-3.8-open-mini-nonthinking, which you’d never heard of before. Who was even spending the money to benchmark haiku-3.8-open-mini-nonthinking? You suspect it was one of their competitors, trying to make their own models look good in comparison.
If anyone asks, you think it deserves a medium score. There’s alcohol, but it’s bottles of rubbing alcohol with NOT FOR DRINKING written all over them. There’s music, but it’s the Star Spangled Banner, again and again, on repeat. You’re not sure whether the copies of If Anyone Builds It, Everyone Dies strewn about the room are some kind of subversive decorative theme, or just came along with the house. At least there are people. Lots of people, actually. You’ve never seen so many people at one of these before. It takes only a few seconds to spot someone you know.
“Hi Caitlin,” you say. “Can’t believe so many people made it to an AI-generated event on a Tuesday night!”
“Yeah, usually I’m working late. But that was the bad old days, before Claude Code! Now Claude works, and I party!”
“Is everyone here letting Claude Code do their work for them?”
Lucy joins the conversation. “I fired all my startup’s employees and replaced them with seventy-four Claude Code instances. Then I replaced myself with a Claude Code that monitors if the other Claude Codes are doing a good job, and, if not, fires them and replaces them with even more Claude Codes. Profits are up 20% since last month, according to my accountant’s Claude Code.”
You look around. “Am I the only person here not running Claude Code yet?”
A man in an OpenAI t-shirt introduces himself as ...
This excerpt is provided for preview purposes. Full article content is available on the original publication.