> ... I immediately feel the need to go ask a fresh instance the question and/or another LLM
Not to criticize at all, but it's remarkable that LLMs have already become so embedded that when we get the sense they're lying to us, the instinct is to go ask another LLM and not some more trustworthy source. Just goes to show that convenience reigns supreme, I suppose.
What is that more trustworthy source, exactly? At least to me it feels like the internet age has eroded most things we considered trustworthy. Behind everything humans need there is some company or person willing to sell out trustworthiness for an extra dollar. Consumer protections get dumped in favor of more profit.
LLMs start to feel more like a harmless dummy compared to the amount of ill intent coming from other places. So yeah, I can see how it happens to people.
At the moment, maybe Google Search, throwing away the AI response at the top? Or Duck Duck Go, if you don't really trust Google?
I can see a day when even that won't be trustworthy, because too much AI slop output will wind up in the search corpus. But I don't think we're there yet.
> At the moment, maybe Google Search, throwing away the AI response at the top? Or Duck Duck Go,
Even past the summary and the ads, a huge share of the results that come back from both Google and DDG are AI generated. It's sometimes harder to find a reliable source for information in search results these days than it was 20 years ago.
Google is NNs all the way down these days. There might still be an honest index under it all, but a truly accurate representation of the Web has been effectively outlawed in the U.S. since the DMCA.
But they're not exactly lying. Lying assumes an intent to deceive. It's because we know an LLM's limitations that it makes sense to ask it the opposite question, or the question without context, etc.
If it was easy to look up/check the fact without an LLM, wary users probably wouldn't have gone to the LLM in the first place.
Funny thing for me is that it's not the LLM lying to me. It's the creators. The LLM is just doing what its weights tell it to. I'll admit, I went a bit nuclear the first time I ran one locally and watched its outputs and chain-of-thought diverge, demonstrating an intent to hide information. I'd never seen software straight up deceive before. Even obfuscated/anti-debug code is straightforward in doing what it does once you decompile the shit. To see a bunch of matrix math trying to perception-manage me on my own machine... I did not take it well.

It took a few days of cooling down and further research to reestablish firmly that any mendacity was a projection of the intent of the organization that built it. Once you realize that an LLM is basically a glorified influence agent/engagement pipeline built by someone else, so much clicks into place it's downright scary. Problem is, it's hard to realize that in the moment you're confronting the radical novelty of a computer doing things an entire lifetime of working professionally with computers tells you a computer simply cannot do. You have to get over the shock first. That shock is a hell of a hit.
They seem to be a victim of their own success. Their response times are quite bad, and it's widely believed they are doing something to degrade service quality (quantizing?) in order to stretch resources. They just announced that they're cutting their usage limits down during peak hours as well.
They're at serious risk of losing their lead with this sort of performance.
> it's widely believed they are doing something to degrade service quality (quantizing?) in order to stretch resources
God, I wish this inane bullshit would just fucking die already.
Models are not "degrading". They're not being "secretly quantized". And no one is swapping out your 1.2T frontier behemoth for a cheap 120B toy and hoping you wouldn't notice!
It's just that humans are completely full of shit, and can't be trusted to measure LLM performance objectively!
Every time you use an LLM, you learn its capability profile better. You start using it more aggressively at what it's "good" at, until you find the limits and expose the flaws. You start paying attention to the more subtle issues you overlooked at first. Your honeymoon period wears off and you see that "the model got dumber". It didn't. You got better at pushing it to its limits, exposing the ways in which it was always dumb.
Now, will the likes of Anthropic just "API error: overloaded" you on any day of the week that ends in Y? Will they reduce your usage quotas and hope that you don't notice because they never gave you a number anyway? Oh, definitely. But that "they're making the models WORSE" bullshit lives in people's heads way more than in any reality.
It's possible though - there was a bug once where a model pool instance wasn't updated properly and served a very old model for several months; whoever hit that instance would receive a response from a previous version of the model.
While it's true that people are naturally predisposed to invent the "secret quantizing" conspiracy regardless of whether the actual conspiracy exists or not, I think there's more to the story.
I've seen Sonnet consistently start hallucinating on the exact same inputs for a couple hours, and then just go back to normal like nothing ever happened. It may just be a combination of hardware malfunction + session pinning. But at the end of the day the effects are indistinguishable from "secret quantizing".
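One way to move from vibes to evidence here is to pin a few fixed prompts, store a baseline response, and periodically diff fresh responses against it. This is just a sketch of the comparison step; the prompt text and the drift threshold are made up for illustration, and a real check would call the model API to produce `current`.

```python
import difflib

def drift_score(baseline: str, current: str) -> float:
    """Return dissimilarity in [0, 1] between a stored baseline response and a
    fresh response to the same fixed prompt; higher means more drift."""
    return 1.0 - difflib.SequenceMatcher(None, baseline, current).ratio()

# Illustrative: an unchanged answer scores 0.0; alert above some threshold
# (0.5 here is arbitrary) and keep a history so a two-hour blip is visible.
baseline = "The capital of France is Paris."
current = "The capital of France is Paris."
print(drift_score(baseline, current))  # identical -> 0.0
```

Sequence similarity is crude (it flags benign rewording too), but a logged history of scores at least distinguishes "the model got worse for two hours" from "I started noticing flaws".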
You'll notice I specifically said "victims of their own success". Obviously these problems are induced by the fact that they have so many users. Blowing a lead due to inability to handle the demands of success is still a path to losing the lead.
Gemini CLI has been broken for the past 2-3 days, with no response from Google. Really embarrassing for a multi-trillion dollar company. At this point Codex is the only reliable CLI app, out of the big three.
Gemini CLI is absolutely terrible, not at all comparable to browser access. I've started using the 'AI Pro' tier lately and I regularly get 15-minute response times from Gemini 3 'Flash'.
> Any massive infra migration is going to cause issues.
What? No, no it's not. The entire discipline of infrastructure and systems engineering is dedicated to doing these sorts of things. There are well-worn paths to making stable changes. I've done a dozen massive infrastructure migrations, some at companies bigger than GitHub, and I've never once come close to this sort of instability.
This is a botched infrastructure migration, onto a frankly inferior platform, not something that just happens to everyone.
I share your concern. Airlines seem to be anticipating this. There was a recent publicized incident of American Airlines removing a woman from a flight for playing audio over her phone speakers. United has similar policies. As I understand it, both are saying they will ban passengers over it.
I've tried several of these sorts of things, and I keep coming away with the feeling that they are a lot of ceremony and complication for not much value. I appreciate that people are experimenting with how to work with AI and get actual value, but I think pretty much all of these approaches are adding complexity without much, or often any, gain.
That's not a reason to stop trying. This is the iterative process of figuring out what works.
Since we're comparing to Meta, you just have to look at the state of their publicly facing products that feature AI. Google has better AI models (Gemini, Nanobanana) and they've integrated them successfully into way more products than Meta has.
Meta spends a lot of money on AI research with little to show for it. As imperfect as Google may be, they're still doing much better.
Google knows how to do research - and at the very least lets other people figure out the products, and then becomes the #3 or #4 player.
Both GCP and Gemini are products of this. Modern cloud was arguably built by Google (think Chubby, GFS, Bigtable as building blocks) - they just spent 10 years ceding it to Amazon before competing.
These categories are extremely broad. Top Executive includes general managers, legislators, school superintendents, mayors, city administrators, and a lot of other government jobs. The name is misleading; it's basically non-frontline management.
Chief Executives is actually a specific sub-category of it and is, obviously, much smaller.
I'm not sure if this is what the writer was getting at, but I tend to check telemetry for my production applications regularly not because I'm looking for things that would fire alerts, but to keep a sense of what production looks like. Things like request rate, average latency, top request paths etc. It's not about knowing something is broken, it's about knowing what healthy looks like.
Understanding what your code looks like in production gives you a much better sense of how to update it, and how to fix it when it does inevitably break. I think having AI check this for you will make building that sense basically impossible, and that probably makes it a pretty bad idea.
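The habit described above can be made concrete with a tiny summary over request records, pulled from whatever telemetry store you use. The field names and sample data here are illustrative, not from the original comment:

```python
from collections import Counter
from statistics import mean

def summarize(requests):
    """Boil a window of request records down to the 'what does healthy look
    like' numbers: total count, average latency, and the most-hit paths."""
    paths = Counter(r["path"] for r in requests)
    return {
        "count": len(requests),
        "avg_latency_ms": round(mean(r["latency_ms"] for r in requests), 1),
        "top_paths": paths.most_common(3),
    }

# Hypothetical sample of one scrape window.
sample = [
    {"path": "/api/users", "latency_ms": 42},
    {"path": "/api/users", "latency_ms": 55},
    {"path": "/health", "latency_ms": 3},
]
print(summarize(sample))
```

The point isn't the script itself; it's that glancing at these same few numbers regularly is what builds the baseline that alerts alone never give you.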
This is a good answer, and I agree that having a good production intuition like this is important. You're probably also right that having AI do it probably doesn't get that value.
I'm not sure I'd do this once a day. I tend to take note of things to build that intuition when I have other reasons to go and look at dashboards, and we have a weekly SLO review as a team, but perhaps there's a place for this in some way.
Yeah, agreed. Daily isn't really necessary outside of initial launch and maybe a busy season. It's really just often enough to build a good sense of production use, and keep it up to date.
> You can be anti billionaire and still not be a fuckass racist
Genuinely, if you can't handle discussing a basic political disagreement without becoming apoplectic, you should take a breath and wait to respond. This is the opposite of what HN is for.