May 21, 2025

2k tokens per second

I'm still catching up on the panoply of releases and demos announced at Google I/O this week. This demo of a coding agent producing 2,000 tokens per second caught my eye and hasn't gotten as much attention as I think it deserves:

Not only is this technologically impressive; it has significant implications for the user experience of generative app building, aka vibe coding.

Using a text-to-product app like Replit or Lovable today feels magical, but it often involves long wait times. For example, I demoed Lovable to family members the other day and had to wait a minute or more for it to produce a working prototype in the browser. If Lovable were to incorporate Google's Gemini Diffusion model, the working prototype might appear in seconds.
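To make that concrete, here's a back-of-envelope calculation. The prototype size and the autoregressive decode speed are assumptions for illustration; only the 2,000 tokens-per-second figure comes from the demo:

```python
# Rough estimate of time to emit a complete prototype's worth of code.
prototype_tokens = 6_000    # assumed size of a small generated app
autoregressive_tps = 100    # assumed typical sequential decode speed
diffusion_tps = 2_000       # throughput shown in the Gemini Diffusion demo

print(f"autoregressive: {prototype_tokens / autoregressive_tps:.0f}s")  # 60s
print(f"diffusion:      {prototype_tokens / diffusion_tps:.0f}s")       # 3s
```

A wait of a minute breaks the conversational feel of a demo; three seconds doesn't.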

This will make first-time user experiences of text-to-product apps even more magical, but it should also change how vibe coding works.

If major changes to a codebase can happen instantly, users can iterate many more times on core aspects of product functionality. That means a better overall vibe-coding experience (instant feedback) and, ultimately, superior products.

The author above also notes that the diffusion approach supports better "non-causal reasoning" within the generation: because a diffusion model refines the whole output in parallel rather than committing to tokens left to right, later parts of the code can inform earlier parts. That should improve output quality overall, meaning larger changes while vibe coding will be less likely to break existing functionality.
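A toy sketch of the difference in generation order. This is not how Gemini Diffusion actually works internally (real diffusion language models iteratively denoise the whole sequence); it only mimics the parallel, non-left-to-right update pattern that makes non-causal reasoning possible:

```python
import random

random.seed(0)

TARGET = list("hello world")  # stand-in for the "correct" output tokens
MASK = "_"

def diffusion_style_decode(length, steps=4):
    """Toy illustration: start fully masked and, at each step, reveal a
    batch of positions in parallel -- any positions, in any order. An
    autoregressive decoder, by contrast, must commit one token at a
    time, strictly left to right, so earlier tokens can never be
    informed by later ones."""
    seq = [MASK] * length
    hidden = list(range(length))
    per_step = max(1, length // steps)
    while hidden:
        # Each step may edit any subset of positions, not just the next one.
        batch = random.sample(hidden, min(per_step, len(hidden)))
        for i in batch:
            seq[i] = TARGET[i]
            hidden.remove(i)
        print("".join(seq))  # watch the sequence fill in out of order
    return "".join(seq)

diffusion_style_decode(len(TARGET))
```

Because every step sees (and can revise) the whole sequence, a change near the end of a file can be reconciled with code near the beginning within the same generation, rather than being constrained by tokens already emitted.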

So overall, this diffusion approach should be a major accelerant for vibe coders and vibe-coding platforms.