I don’t think it’s unreasonable to assume that in 1-2 years inference speed will have increased enough to allow for “real time” prompting, where the agent finishes work in a few seconds instead of a couple of minutes. That will certainly change our workflows. It seems like we are in the dial-up era currently.
It's arguably already here; only cost is a concern. We now have an open-weights model at Sonnet 4.5+ level - you can throw as much hardware at it as you want to speed it up.
Today Anthropic started offering 3x(?) Opus speed at 5x cost as well.
But then models will just do more computation per request, and end up slow again.
What will have to change are workflows. Why are you ever waiting for the prompt to return? When you send an email, do you stare at your screen until you get a reply?