Deepdive: How 10 tech companies choose the next generation of dev tools
Right now, it seems like almost every tech company is changing its developer tooling stack. This is a big shift from eighteen months ago, when the answer to “what should we use for AI-assisted coding?” was simple: buy a GitHub Copilot license and boot up ChatGPT. In our 2024 AI tooling survey, those two tools racked up more mentions than all the others combined.
But no more. Today, a plethora of tools outpace Copilot in various ways: Cursor, Claude Code, Codex, and Gemini CLI, for starters. There are also AI code review tools such as CodeRabbit, Graphite, and Greptile, not to mention all the MCP integrations that plug into agentic tools.
So, for this deepdive I asked 10 tech companies which tools their engineers use and, crucially, how they chose them from among all the options. These businesses range from a 5-person seed-stage startup to a publicly listed company with 1,500 employees. All are anonymous except Wealthsimple and WeTravel; WeTravel has also kindly shared the most detailed measurement framework I’ve yet seen.
We cover:
Speed, trust, & show-and-tell: how small teams select tools. At places with fewer than ~60 engineers, tooling decisions are fast and informal: developers try a tool for a couple of weeks, and the ones that “stick” win.
How mid-to-large companies choose: bureaucracy, security, and vendor lock-in. At companies with ~150 engineers, adoption is slowed considerably by security reviews, compliance requirements, and executive-level budget considerations.
Measurement problem: metrics are needed, but none work. Every workplace struggles to prove that its AI tools deliver value, and engineers distrust common metrics such as lines of code generated.
How Wealthsimple measured and decided. The flagship Canadian consumer fintech ran a 2-month selection process to choose an AI code review tool. The decision to roll out Claude Code to all engineers was made by the CTO, driven by personal conviction and validated by usage data from Jellyfish.
How one company accurately measures code review usefulness. WeTravel built a structured -3 to +3 scoring system across five dimensions, with five engineers evaluating ~100 comments. They found no AI code reviewer suitable for their codebase.
Comparative measurements at a large fintech. A team ran Copilot, Claude, and Cursor simultaneously across ~50 PRs, scoring ~450 comments. They found Cursor reviews the most precise, Claude the most balanced, and Copilot the most quality-focused.
Common patterns. Developer trust drives adoption more than mandates, the Copilot → Cursor
...