From the Build · 2 min · March 12, 2026

Latency Is a Feature, Not a Footnote

The first thing real users teach you is that correctness is not enough. A right answer that arrives too late is, functionally, a wrong one. The user has already tabbed away, already lost the thread, already decided your product is the kind of thing you wait on. You can win the benchmark and lose the moment.

It is tempting to treat latency as an engineering footnote, something to optimize later, once the logic is right. But for anything a person uses in real time, the wait is the product. It is the part they actually experience, breath by breath, while the clever part happens out of sight.

Most latency is self-inflicted

The uncomfortable truth is that most agent latency is not the model's fault. It is ours. Sequential calls that could have run in parallel. Prompts that grew, paragraph by paragraph, not because anyone decided they should, but because they simply accreted. A second model invocation to check the first, added in a moment of caution and never removed.

None of these show up in a correctness eval. They only show up in the place that matters least to a spreadsheet and most to a human: the silence between asking and answering.
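The gap between sequential and parallel calls is easy to see in a toy measurement. What follows is a minimal sketch, not our production code: `fake_call` is a hypothetical stand-in for any independent tool or model call, and the names and delays in `DELAYS` are invented for illustration.

```python
import asyncio
import time


async def fake_call(name: str, delay: float) -> str:
    """Hypothetical stand-in for a tool or model call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def run_sequential(delays: dict[str, float]) -> tuple[list[str], float]:
    """Await each call in turn: total wait is roughly the sum of the delays."""
    start = time.perf_counter()
    results = [await fake_call(name, delay) for name, delay in delays.items()]
    return results, time.perf_counter() - start


async def run_parallel(delays: dict[str, float]) -> tuple[list[str], float]:
    """Start all calls at once: total wait is roughly the slowest single delay."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_call(name, delay) for name, delay in delays.items())
    )
    return list(results), time.perf_counter() - start


# Illustrative values only; these are assumptions, not measurements.
DELAYS = {"search": 0.05, "profile": 0.05, "history": 0.05}


def compare() -> tuple[list[str], float, list[str], float]:
    """Run both strategies and return (seq_results, seq_time, par_results, par_time)."""
    seq_results, seq_time = asyncio.run(run_sequential(DELAYS))
    par_results, par_time = asyncio.run(run_parallel(DELAYS))
    return seq_results, seq_time, par_results, par_time


if __name__ == "__main__":
    _, seq_time, _, par_time = compare()
    print(f"sequential: {seq_time:.3f}s, parallel: {par_time:.3f}s")
```

Both versions return the same results in the same order, so a correctness eval would score them identically. Only the elapsed time differs, and this only applies when the calls genuinely do not depend on each other.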

Designing for the wait

So we try to make the wait itself part of the design. Stream early, so the user sees thinking before they see the conclusion. Parallelize aggressively, and treat every new sequential step as a cost that has to justify itself. Be suspicious of any change that makes the prompt longer, because longer prompts are slower prompts, and slower is a feeling, not a number.
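One way to keep "stream early" honest is to measure time to first chunk separately from total time. The sketch below assumes a response that arrives in pieces; `generate_chunks` is a hypothetical stand-in for a streaming model client, and the chunk text and delay are made up.

```python
import asyncio
import time
from typing import AsyncIterator


async def generate_chunks(chunks: list[str], per_chunk_delay: float) -> AsyncIterator[str]:
    """Hypothetical streaming source: yields one chunk every `per_chunk_delay` seconds."""
    for chunk in chunks:
        await asyncio.sleep(per_chunk_delay)
        yield chunk


async def measure_stream(chunks: list[str], per_chunk_delay: float) -> tuple[str, float, float]:
    """Consume the stream, recording when the first chunk arrives and when the last does."""
    start = time.perf_counter()
    first_chunk_at = None
    received = []
    async for chunk in generate_chunks(chunks, per_chunk_delay):
        if first_chunk_at is None:
            # This is the moment the user stops staring at silence.
            first_chunk_at = time.perf_counter() - start
        received.append(chunk)
    total = time.perf_counter() - start
    return "".join(received), first_chunk_at, total


if __name__ == "__main__":
    text, first, total = asyncio.run(
        measure_stream(["Latency ", "is ", "a ", "feature."], 0.02)
    )
    print(f"first chunk after {first:.3f}s, full answer after {total:.3f}s: {text!r}")
```

Streaming does not shorten the total. It shortens the silence, which is the part the user feels, so time to first chunk is arguably the number worth tracking alongside total latency.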

Speed is not the opposite of quality. For a product someone is waiting on, it is part of what quality means.

