More

bluesnowmonkey · 2026-03-23T03:29:57 1774236597

I recently started having my AI assistant help clean up my email gradually. (Using stumpy.ai for what it's worth.)

The way I do it is every morning we go through recent emails in my inbox one at a time. If I want to mark it as spam, delete it, add it to my calendar, whatever, I explain to the agent why in detail. Over time it builds up an understanding of how I handle a lot of things, it needs to show me less and less, and it handles more and more on its own.

I also told the assistant to check my email on its own once per hour and auto-action what it can. That helps keep junk from building up, and it alerts me via SMS if something high priority shows up (e.g. user reporting a bug).

Point is there was never a point where it just ran for a long time and magically cleaned everything up just how I'd have wanted. I have like 7k emails in my inbox, that wouldn't be practical. But the number is going down now gradually, instead of up. I've had a chance to teach it and let it establish trust that it's doing things the right way. Which feels safer.

driftnode · 2026-03-23T15:49:09 1774280949

this is the approach that actually makes sense to me. gradual trust not yolo from day one. curious though, can you see what it learned about your patterns or is it a black box? like if it starts auto-archiving something you actually wanted, how do you debug that

bluesnowmonkey · 2026-03-17T21:48:50 1773784130

This one is taking them a while to fix.

tills13 · 2026-03-17T22:13:42 1773785622

Have they tried asking Claude to fix it no mistakes?

kylehotchkiss · 2026-03-17T23:32:24 1773790344

this explains the 98% uptime

bluesnowmonkey · 2026-02-21T22:34:21 1771713261

Agreed. People aren’t ready for this, even (maybe especially) on HN.

Everyone’s hung up on how nobody really does waterfall. Or course. But a LOT of people are vibing their code and making PRs and then getting buried in code reviews. Just like the article says, you can’t keep up that way. Obviously. Only agents can review code as fast as agents write it. But I find as of recently that agents review code better than people now, just like how they write it better. Gotta lean into it!

g-b-r · 2026-02-22T06:19:19 1771741159

> Only agents can review code as fast as agents write it

Which means that their writing speed is misleading, and AI or not AI you can only produce quality software at the speed that humans can review it.

It's never been hard for a computer to write gibberish very fast.

skeeter2020 · 2026-02-21T22:50:40 1771714240

>> But I find as of recently that agents review code better than people now, just like how they write it better.

Let me guess: you're building a system that uses AI agents to replace all the PR-type tasks most of us waste their time completing?

bluesnowmonkey · 2026-02-21T19:33:44 1771702424

> They don't know that the reason you price things the way you do is rooted in a competitive dynamic that never got written down anywhere.

So maybe you should write it down?

I see this going differently than they do. An exoskeleton that makes you 5% stronger is not a game changer. Companies that lean fully into agents won’t just be “humans, but a little better.” They’ll move orders of magnitude faster, make decisions in less time, do it ask more efficiently. It will be no contest.

bluesnowmonkey · 2026-02-21T16:24:28 1771691068

Depending on what you mean by claw-like, stumpy.ai is close. But it’s more security focused. Starts with “what can we let it do safely” instead of giving something shell access and then trying to lock it down after the fact.

bluesnowmonkey · 2026-02-19T15:34:04 1771515244

More than 20x actually. According to ccusage I’ve consumed the equivalent of $4500 worth of API tokens in the last 30 days on my $200 subscription.

bluesnowmonkey · 2026-02-18T15:05:52 1771427152

For one thing they were just early. Whatever measurements people made of AI six months ago are invalid. It’s a different animal now.

Plus you get a wildly different payoff the more you can take humans completely out of the loop. If it writes the code but humans review, you’re still bottleneck. If it designs and codes and reviews and goes back to designing, and so on, there’s no effective speed limit.

Big businesses aren’t going to work that way though. Which is why we shouldn’t be looking to them as thought leaders right now.

2026-02-18T15:07:37 1771427257

[dead]

threethirtytwo · 2026-02-18T15:18:44 1771427924

That's because you're getting left behind. The technology is outpacing you because most likely you're not using it right. Also likely you're not in an environment that pushes you to use it right so you just give it half assed attempts, never putting the initial effort to up your game with AI.

At my company, if you don't use AI, you're productivity will be much slower than everyone else and that will result in you getting fired. The expectation is 3-4 PRs a day per person.

2026-02-18T15:22:26 1771428146

[dead]

tstrimple · 2026-02-19T03:27:29 1771471649

Keep fighting against the looms. But the looms always win.

threethirtytwo · 2026-02-18T18:21:29 1771438889

Bro no need to be snarky. You're not useless to the economy. You're in the process of becoming more and more useless. Unlikely to be completely useless but AI is for sure eating away your job. Denying it and acting like this is just delusional coping.

I'm not singling you out. This applies to all of us, you, me, everyone.

bluesnowmonkey · 2026-02-18T13:07:03 1771420023

Author here. I've been a programmer for 25 years. Elixir, C, Ruby, PHP, Python, FoxPro. About a month ago I stopped writing code entirely and switched to designing through conversation with AI agents.

The 60x number is real but I know it'll be controversial. It's lines of code in a month vs. what I'd produce in a year by hand. Your mileage will vary. I'm not claiming everyone gets this. I'm saying the range of individual experiences is so wide that averages are meaningless.

Yes, this is another post about vibe coding. But it's a real product with real users at this point, more than a weekend project, and I think the health and communication effects are worth talking about even if the productivity claim doesn't land for you.

Happy to answer questions.

bluesnowmonkey · 2026-02-12T14:05:00 1770905100

Along the same lines: I want the context window percentage visible at all times, not just when it drops below 10%. By that point it's too late to do anything useful. I can't even get it to finish up and dump its state to a file before the window is full. If I could see the percentage the whole time, I could pace my work and wrap things up cleanly instead of slamming into the wall.

bluesnowmonkey · 2026-02-07T23:38:09 1770507489

> The Digital Twin Universe is our answer: behavioral clones of the third-party services our software depends on. We built twins of Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets, replicating their APIs, edge cases, and observable behaviors.

Came to the same conclusion. I have an integration heavy codebase and it could hardly test anything if tests weren't allowed to call external services. So there are fake implementations of every API it touches: Anthropic, Gemini, Sprites, Brave, Slack, AgentMail, Notion, on and on and on. 22 fakes and climbing. Why not? They're essentially free to generate, it's just tokens.

I didn't go as far as recreating the UI of these services, though, as the article seems to be implying based on those screenshots. Just the APIs.

adisingh13 · 2026-02-08T05:41:37 1770529297

how are you implementing agentmail? would love to know more

bluesnowmonkey · 2026-02-09T13:42:09 1770644529

I'm building a platform of AI agents, and each agent can have its own email address. AgentMail handles that. You create an inbox via their REST API and they POST to your webhook when mail arrives.

On the agent side, it just gets tools: send email, reply to email, list inbox, read message. Those tools call the AgentMail API. So the fake implements the same interface.. same send/reply/list/read methods, but recording calls instead of making HTTP requests. You can pre-populate inboxes with test messages, simulate "username taken" errors, etc.

AgentMail is actually one of the simpler fakes because there's no internal state to maintain. Sending a message doesn't affect what you'd read back from your own inbox. Some of the other fakes (like the database or file storage) need to actually simulate state in memory so writes are visible to subsequent reads. This one is closer to a stub.