Hacker News | new | past | comments | ask | show | jobs | submit | shubhamintech's comments

We've started documenting every A/B tech decision we make in our internal engineering docs! We've only gone back to it once in a while, but it usually helps keep us grounded.

IMO the under-discussed risk here is that sites will start serving different content to verified crawlers vs real users. You're already seeing it with known search bots getting sanitized views. If your agent's context comes from a crawl the site knows is going to an AI, you have no guarantee it matches what a human sees, and that data quality problem won't surface until your agent starts acting on selectively curated information.

This could go wrong on so many levels.


This already happens in the opposite direction. See: news websites that drop their paywall for Googlebot.

Hard limits are a good first layer but they don't tell you why the agent is looping. Retrying because it's confused, retrying because a dependency is flaky, and genuine planning loops are three different problems with different fixes. What helped us was logging the agent's intent at each step, and if it's asking the same underlying question three times in different syntax, that's the signal to bail early rather than burning through your iteration budget.
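A minimal sketch of the "same underlying question three times" signal described above. The intent normalization here is deliberately crude (lowercased, punctuation-stripped, token-sorted); a real system might embed intents instead. `LoopGuard` and `normalize_intent` are hypothetical names, not from any specific framework.

```python
from collections import Counter

def normalize_intent(intent: str) -> str:
    # Crude normalization: lowercase, drop "?", sort tokens so reworded
    # versions of the same question collapse to one key.
    tokens = sorted(intent.lower().replace("?", "").split())
    return " ".join(tokens)

class LoopGuard:
    """Bail out when the agent asks the same underlying question too often."""

    def __init__(self, max_repeats: int = 3):
        self.counts = Counter()
        self.max_repeats = max_repeats

    def record(self, intent: str) -> bool:
        # Returns True when it's time to stop the loop early.
        key = normalize_intent(intent)
        self.counts[key] += 1
        return self.counts[key] >= self.max_repeats
```

The point is to key the counter on the normalized intent rather than the raw tool call, so three syntactically different retries still trip the guard before the iteration budget runs out.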

The BCG framing makes it sound like a cognitive load problem, but I think it's more about unreliability fatigue. When your AI does 8 things right and then confidently does the 9th wrong, you spend mental energy second-guessing everything. Supervising an unreliable system is more exhausting than just doing the task yourself.

Automation Bias is probably the thing you're trying to describe. :)

https://en.wikipedia.org/wiki/Automation_bias


Same mental model problem comes up in AI agent observability. Two conversation flows can produce identical user outcomes and look totally different at the message level, or vice versa. The normalization step that actually captures 'did behavior change' is the hard part in both domains.

That's a really sharp parallel. "Did behavior change" is exactly the question in both cases, and the surface-level representation lies to you in both. We normalize ASTs before hashing so reformatting or renaming a local variable doesn't register as a change. Curious what normalization looks like on the agent observability side, feels like a harder problem when the output is natural language instead of code.
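A rough sketch of the AST-normalize-then-hash idea from the comment above, using Python's stdlib `ast` module. This handles the two cases mentioned (reformatting and renaming a local variable) by canonicalizing names in order of first appearance before hashing the dump; a production version would need to handle far more (comments, scoping, attribute names).

```python
import ast
import hashlib

class CanonicalNames(ast.NodeTransformer):
    """Rename variables to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self.names = {}

    def _canon(self, name: str) -> str:
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

def normalized_hash(source: str) -> str:
    # ast.parse already discards whitespace and redundant parens;
    # CanonicalNames then makes local renames a no-op.
    tree = CanonicalNames().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()
```

With this, `def f(x): return x + 1` and a reformatted, renamed `def f( total ): return (total + 1)` hash identically, while an actual behavior change produces a different digest.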

The latency point matters more than it looks, imo. GPU work isn't just async CPU work at a different speed; the cost model is completely different. In LLM inference, the hard scheduling problem is batching non-uniform requests where prompt lengths and generation lengths vary, and treating that like normal thread scheduling leads to terrible utilization. Would be curious if Eyot has anything to say about non-uniform work units.
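A toy illustration of why non-uniform work units wreck naive batching: if each batch pads every request to its longest member, utilization is the ratio of useful tokens to padded tokens. The numbers below are made up, just to show the gap between arrival-order and length-grouped batching.

```python
def padded_utilization(lengths, batch_size):
    """Fraction of compute doing useful work when each batch
    pads every request to its longest member."""
    used = total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        used += sum(batch)              # tokens actually needed
        total += len(batch) * max(batch)  # tokens computed after padding
    return used / total

# Mix of short and long requests (hypothetical token counts).
mixed = [10, 500, 20, 480, 15, 510, 25, 490]
arrival = padded_utilization(mixed, batch_size=4)          # interleaved: ~50%
grouped = padded_utilization(sorted(mixed), batch_size=4)  # length-sorted: ~96%
```

This is why inference schedulers do things like continuous batching and length-aware grouping instead of treating requests as uniform thread-pool jobs.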

Not right now, it's far too early days. I'm currently working through bugs and missing stdlib to get a simple backpropagation network efficient. Once I'm happy with that, I'd like to move on to more complex models.

What is the new language doing that can't be done with an already established language that is worth sacrificing an entire standard library?

We've been using this in our prod microservices and it started delivering value instantly! Really sold on the vision of AI SREs.

Lol the joke works because it's halfway serious. Agents already choose tools, vote with API calls, and quietly churn off platforms that create friction. "What do agents want" is actually a real product question and imo most teams shipping agent products have zero visibility into how their agents are actually behaving in production. What do you think?


The pattern is always the same: product team has a new AI feature, someone says put it everywhere, nobody asks users if they want it, and then power users revolt while the feature languishes unused anyway. The right move was opt-in, but that would've made the adoption numbers look bad. Progressive disclosure exists for a reason.


The "trust our logs" problem is real: regulators and security teams don't care about your dashboard. Curious about the semantic layer, though: once you can verify a log is intact, the next hard question is why the agent made the specific decision that caused an incident. Integrity proves the what, but you still need the interpretability layer for the why.


Yeah, totally agree. Integrity mostly answers the “what happened” part.

The idea is that once the sequence of events is provably intact, you can attach the decision context to it — things like policy snapshots, inputs/prompts (or hashes of them), and state transitions.

Then the evidence layer proves the history wasn’t altered, and analysis tools can reconstruct why the system made a particular decision from that preserved context.

The demo focuses on the integrity layer because without that, everything else turns into "trust our dashboard." Interpretability tools can sit on top of the same evidence.
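The "provably intact sequence of events" idea can be sketched as a simple hash chain: each record commits to the previous record's hash, so editing or reordering any event invalidates everything after it. This is a generic sketch of the technique, not the project's actual implementation; `append` and `verify` are made-up names.

```python
import hashlib
import json

GENESIS = "0" * 64

def append(log, event):
    """Append an event, linking it to the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    log.append({"event": event, "prev": prev,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(log):
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for rec in log:
        body = json.dumps({"event": rec["event"], "prev": prev}, sort_keys=True)
        if rec["prev"] != prev or rec["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True
```

The decision context mentioned above (policy snapshots, prompt hashes, state transitions) would just be extra fields inside each event, inheriting the same tamper evidence.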

