(Q|O)SFP are basically just raw high speed serial interfaces to whatever - you see this a lot in FPGAs, you can use the QSFP interfaces for anything high speed - PCIe, SATA, HDMI…
> Although we can already buy commercial transceiver solutions that allow us to use PCIe devices like GPUs outside of a PC, these use an encapsulating protocol like Thunderbolt rather than straight PCIe.
> [snip]
> As explained in the intro, this doesn’t come without a host of compatibility issues, least of all PCIe device detection, side-channel clocking and for PCIe Gen 3 its equalization training feature that falls flat if you try to send it over an SFP link.
So, uh… what’s the benefit? How much overhead does Thunderbolt really introduce, given it solves these other issues?
I go over it in the video, but yes, active Thunderbolt is probably a very good choice for a lot of people. I went in another direction for reasons that don't apply to everyone:
- Learning: I want to learn about the lower levels of PCIe and it's a good project for that.
- Re-use of cabling: I already have a bunch of single-mode fiber bundles running around. You can't find Thunderbolt cables that just have an LC connector ...
- Isolation: Active Thunderbolt cables often still have copper for some low-speed signals, so they don't offer true galvanic isolation.
- Avoid dealing with Thunderbolt: I want a custom chassis/PCB at one end, and chips to convert from TB back to PCIe are not readily available to make custom stuff with ... (not as an individual anyway).
So yeah, if you want a ready to use solution, TB cable is absolutely a good choice, here I'm having some fun, learning in the process and hopefully sharing some of the knowledge.
Hey, I love a great self-educational deep dive. Don’t have time to watch the video until after the workday, but it sounds enlightening! (I swear that was not intentional.)
The benefits are twofold: physical colocation and bandwidth.
Thunderbolt 5 offers 80Gbps of bidirectional bandwidth. PCIe 5.0 16x offers 1024Gbps of bidirectional bandwidth. This matters.
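For anyone who wants to check the PCIe figure, here is a rough back-of-the-envelope sketch. It assumes PCIe 5.0 signals at 32 GT/s per lane with 128b/130b line encoding and counts both directions together; the function names are made up for illustration.

```python
# Back-of-the-envelope PCIe bandwidth.
# Assumptions: PCIe 5.0 runs at 32 GT/s per lane and uses 128b/130b encoding.
# Protocol overhead (TLP headers, flow control) is ignored.

def pcie_raw_gbps(gt_per_lane: float, lanes: int, directions: int = 2) -> float:
    """Raw line rate in Gbps across all lanes, summed over `directions`."""
    return gt_per_lane * lanes * directions


def pcie_payload_gbps(gt_per_lane: float, lanes: int, directions: int = 2) -> float:
    """Line rate after removing the 128b/130b encoding overhead."""
    return pcie_raw_gbps(gt_per_lane, lanes, directions) * 128 / 130


if __name__ == "__main__":
    print("x16 raw, both directions:", pcie_raw_gbps(32, 16), "Gbps")
    print("x16 raw, one direction:  ", pcie_raw_gbps(32, 16, directions=1), "Gbps")
    print("x16 after encoding:      ", round(pcie_payload_gbps(32, 16), 1), "Gbps")
```

The 1024 Gbps number is the raw line rate summed over both directions; per direction it is half that, and real payload throughput is lower again once packet overhead is counted.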
TB5 cables can only get so long whereas fiber can go much farther more easily. This means that in a data center type environment, you could virtualize your GPUs and attach them as necessary, putting them in a separate bank (probably on the same rack).
Active optical (yes!) Thunderbolt cables can be much longer. After all, optical fiber was the original medium for Thunderbolt, back when it was still called Light Peak.
As for bandwidth, the medium transition seems to actually limit the author’s capabilities by losing some of the more advanced link-training features that are necessary for the highest-bandwidth PCIe 3 connections, never mind PCIe 5.
Hundreds of meters is considered short range in the world of *SFP. If you just plan on putting the GPUs in the same rack then I'm not sure it really matters, but you can really put anything anywhere in your DC and have things zoned with *SFP.
I don't think there is any reason TB couldn't do the same, beyond it would be even more niche to want non-modular/patchable cables+transceivers at those lengths (especially since fiber is often bundled dozens/hundreds of strands over a single trunk cable between racks).
The video is about a 2x1 link, which the author hopes to eventually scale up to 3x4 using 40 gig transceivers. I'd say thunderbolt is probably safe in the near future.
I was looking into the highest bandwidth optical transceivers. 400Gbps were easy enough to find so thanks for posting this. I honestly didn't know there were 1.6Tbps transceivers like this.
One note: I believe the SMF max fiber length is 2km not 1m [1]. The data sheet [2] also says:
Bidirectional is a lot like biweekly. Biweekly depending on context means twice a week or once every two weeks and bidirectional can both mean per direction and total of both directions.
I'm only a single datapoint but I've never encountered that usage. My understanding of a bidirectional link is that it meets the same spec in both directions simultaneously. It's important precisely because many links aren't bidirectional, sharing a single physical link between two logical links.
Too late. Personally, after how bad 4.6 was this past week, I was pushed to Codex, which seems to mostly work at the same level from day to day. Just last night I was trying to get 4.6 to look up how to do some simple tensor parallel work, and the agent used 0 web fetches and just hallucinated 17K very wrong tokens. Then the main agent decided to pretend to implement TP, and just copied the entire model to each node...
Same. I stopped my Pro subscription yesterday after entering the week with 70% of my tokens used by Monday morning (on light, small weekend projects, things I had worked on in the past and barely noticed a dent in usage.) Support was... unhelpful.
It's been funny watching my own attitude to Anthropic change, from being an enthusiastic Claude user to pure frustration. But even that wasn't the trigger to leave, it was the attitude Support showed. I figure, if you mess up as badly as Anthropic has, you should at least show some effort towards your customers. Instead I just got a mass of standardised replies, even after the thread replied I'd be escalated to a human. Nothing can sour you on a company more. I'm forgiving to bugs, we've all been there, but really annoyed by indifference and unhelpful form replies with corporate uselessness.
So if 4.7 is here? I'd prefer they forget models and revert the harness to its January state. Even then, I've already moved to Codex as of a few days ago, and I won't be maintaining two subscriptions, it's a move. It has its own issues, it's clear, but I'm getting work done. That's more than I can say for Claude.
> It's been funny watching my own attitude to Anthropic change, from being an enthusiastic Claude user to pure frustration.
You were enthusiastic because it was a great product at an unsustainable price.
It's clear that Claude is now harnessing their model because giving access to their full model is too expensive for the $20/m that consumers have settled on as the price point they want to pay.
Off topic, but I really like the writing style on your blog. Do you have any advice for improving my own? In an older comment[1], you mentioned the craft of sharpening an idea to a very fine, meaningful, well-written point. Are there any books, or resources you’d recommend for honing that craft? Thanks in advance.
My bad — I had Max, so more than $20. I can’t edit the comment any more. Can’t keep track of the names. I wonder when ‘pro’ started to mean ‘lowest tier’.
But your article is interesting. You think some of the degradation is because when I think I’m using Opus they’re giving me Sonnet invisibly?
I agree with what you have written, which is why I would never pay a subscription to an external AI provider.
I prefer to run inference on my own HW, with a harness that I control, so I can choose myself what compromise between speed and the quality of the results is appropriate for my needs.
When I have complete control, resulting in predictable performance, I can work more efficiently, even with slower HW and with somewhat inferior models, than when I am at the mercy of an external provider.
For now, the most suitable computer that I have for running LLMs is an Epyc server with 128 GB DRAM and 2 AMD GPUs with 16 GB of HBM memory each.
I have a few other computers with 64 GB DRAM each and with NVIDIA, Intel or AMD GPUs. Fortunately all that memory has been bought long ago, because today I could not afford to buy extra memory.
However, a very short time ago, i.e. the previous week, I have started to work at modifying llama.cpp to allow an optimized execution with weights stored in SSDs, e.g. by using a couple of PCIe 5.0 SSDs, in order to be able to use bigger models than those that can fit inside 128 GB, which is the limit to what I have tested until now.
By coincidence, this week there have been a few threads on HN that have reported similar work for running locally big models with weights stored in SSDs, so I believe that this will become more common in the near future.
The speeds previously achieved for running from SSDs range from one token every few seconds to a few tokens per second. While such speeds would be low for a chat application, they can be adequate for a coding assistant, if the improved code that is generated compensates for the lower speed.
Thank you for that, it's very interesting. I keep wanting to find time to try out a local-only setup with an NVIDIA 4090 and 64 GB of RAM. It seems like it may be time to try it out.
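To see roughly where numbers like that come from, here is a minimal sketch of the bandwidth bound. It assumes every weight byte needed for a token is streamed from SSD once per token, with nothing cached in RAM; the SSD speeds and model sizes in the example are hypothetical, not measurements.

```python
# Upper bound on decode speed when weights are streamed from SSD per token.
# Assumptions (not from the thread): each active weight byte is read exactly
# once per generated token, nothing is cached in RAM, and the SSDs sustain
# their aggregate sequential read rate.

def max_tokens_per_second(active_weight_bytes: float, ssd_read_bytes_per_s: float) -> float:
    """Bandwidth-limited tokens/s: bytes available per second / bytes needed per token."""
    return ssd_read_bytes_per_s / active_weight_bytes


if __name__ == "__main__":
    two_ssds = 2 * 14e9  # hypothetical: two drives at 14 GB/s each
    # Hypothetical dense model touching 140 GB of weights per token.
    print(max_tokens_per_second(140e9, two_ssds), "tokens/s")
    # Hypothetical sparse (MoE-style) model touching only 14 GB per token.
    print(max_tokens_per_second(14e9, two_ssds), "tokens/s")
```

Under these assumptions the first case lands at one token every five seconds and the second at two tokens per second, which brackets the range described above. Keeping part of the weights in DRAM would raise the bound.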
I used the $60/mo subscription and I bet most developers get access to AI agents via their company, and there was no difference. They should have reduced the rate limits, or offered a new model, anything except silently reduce the quality of their flagship product to reduce cost.
The cost of switching is too low for them to be able to get away with the standard enshittification playbook. It takes all of 5 minutes to get a Codex subscription and it works almost exactly the same, down to using the same commands for most actions.
Corporate software in general is often chosen based on the value returned simply being "good enough" most of the time, because the actual product being purchased is good controls for security, compliance, etc.
A corporate purchaser is buying hundreds to thousands of Claude seats and doesn't care very much about perceived fluctuations in the model performance from release to release. They're invested in ties into their SSO and SIEM and every other internal system, they have trained their employees, and there's substantial cost to switching even in a rapidly moving industry.
Consumer end-users are much less loyal, by comparison.
I didn't experience that at all. I know there are lots of rumblings around here about that, but I'm posting this to show this wasn't a universal experience.
It's funny watching LLM users act like gamblers. Every other week swearing by one model and cursing another, like a gambler who thinks a certain slot machine or table is cold this week. These LLM companies are literally building slot machine mechanics into their UIs too, and I don't think this phenomenon is a coincidence.
Stop using these dopamine brain poisoning machines, think for yourself, don't pay a billionaire for their thinking machine.
Don't confuse the many voices of a crowd with a single person's fickle view. If you can track an individual person or organization who changes their mind 'every other week' then more power to you, but unless you're performing that longitudinal study you are simply seeing differential levels of enthusiasm.
Funny because many people here were so confident that OpenAI is going to collapse because of how much compute they pre-ordered.
But now it seems like it's a major strategic advantage. They're 2x'ing usage limits on Codex plans to steal CC customers and it seems to be working. I'm seeing a lot of goodwill for Codex and a ton of bad PR for CC.
It seems like 90% of Claude's recent problems are strictly lack of compute related.
> people here were so confident that OpenAI is going to collapse because of how much compute they pre-ordered
That's not why. It was and is because they've been incredibly unfocused and have burnt through cash on ill-advised, expensive things like Sora. By comparison Anthropic have been very focused.
Nobody was talking about them betting too much on compute, people were saying that their shady deals on compute with NVIDIA and Oracle were creating a giant bubble in their attempt to get a Too Big To Fail judgement (in their words- taxpayer-backed "backstop").
That’s just short term talk. The main thesis behind their collapse is that they won’t be able to pay their compute bills because they won’t have enough demand to.
That doesn't really track because their compute isn't like a debt obligation.
The compute topic was more around how OpenAI, Nvidia, Oracle, and others were all announcing commitments to spend money on each other in a circular way which could just net out to zero value.
To me it seems like they burn so much money they can do lots of things in parallel. My guess would be that e.g. Codex and Sora are very independently developed. After all, there's quite a hard limit on how many bodies are beneficial to a software project.
Personally, I think it's down to Altman having the cognitive capacity of a sleeping snail and the world insight of a hormonal 14 year old who's only ever read one series of manga.
Despite having literal experts at his fingertips, he still isn't able to grasp that he's talking unfiltered bollocks most of the time. Not to mention his Jason level of "oath breaking"/dishonesty.
> I'm seeing a lot of goodwill for Codex and a ton of bad PR for CC.
AI is one of those topics where you cannot find genuine opinions online. Just like politics. If you visit, say, r/codex, you'll see all the people complaining about how their limits are consumed by "just N prompts" (N is a ridiculously small integer).
I agree. And I am seeing it in a lot of venues, especially political discourse. Commenting is increasingly AI driven, and I fear the whole thing is going to collapse and nobody will be able to rely on online commentary to make decisions. At least not without a lot of independent research. Maybe that’s for the best, but it’s definitely going to change the Internet.
OpenAI will need to stop burning money eventually, but so does everyone else in the space. The longer they can do this the more squeeze it puts on their competitors.
I would call out though that I think there is one way in which this differs from the Uber situation. Theoretically at some point we should hit a place where compute costs start to come down either because we've built enough resources or because most tasks don't need the newest models and a lot of the work people are doing can be automatically sent to cheaper models that are good enough. Unless Uber's self driving program magically pops back up, Uber doesn't really have that since their biggest expense is driver wages.
I think it's a long shot, but not impossible, that OpenAI can subsidize costs long enough that prices don't need to go too much higher to be sustainable.
My standing assumption is the darling company/model will change every quarter for the foreseeable future, and everyone will be equally convinced that the hotness of the week will win the entire future.
As buyers, we all benefit from a very competitive market.
In hindsight, it is painfully clear that Anthropic’s conservative investment strategy has them struggling to keep up with demand, and has caused their profit margin to shrink significantly as the last buyer of compute.
they've also introduced a lot of caching and token burn related bugs which make things worse. any bug that multiplies the token burn also multiplies their infrastructure problems.
Different plan. The old 2x has been discontinued, and the bonus is now (temporarily) available for the new $100 plan users in an effort, presumably, to entice them away from Anthropic.
Proof they don't nerf it only after testing that the benchmarks there stay the same? So overall performance degrades but they isolate those benchmarks?
The market here is extraordinarily vibes-based, and burning billions of dollars for an ephemeral PR boost, which might only last another couple of weeks until people find a reason to hate Codex, does not reflect well on OAI's long term viability.
> It seems like 90% of Claude's recent problems are strictly lack of compute related.
Downtime is annoying, but the problem is that over the past 2-3 weeks Claude has been outrageously stupid when it does work. I have always been skeptical of everything produced - but now I have no faith whatsoever in anything that it produces. I'm not even sure if I will experiment with 4.7, unless there are glowing reviews.
Codex has had none of these problems. I still don't trust anything it produces, but it's not like everything it produces is completely and utterly useless.
I have both Claude and OpenAI, side by side. I would say Sonnet 4.6 still beats GPT 5.4 for coding (at least in my use case). But after about 45 minutes I'm out of my window, so I use OpenAI for the next 4 hours and I can't even reach my limit.
Most of the compute OpenAI "preordered" is vapour. And it has nothing to do with why people thought the company -- which is still in extremely rocky rapids -- was headed to bankruptcy.
Anthropic has been very disciplined and focused (overwhelmingly on coding, fwiw), while OpenAI has been bleeding money trying to be the everything AI company with no real specialty as everyone else beat them in random domains. If I had to qualify OpenAI's primary focus, it has been glazing users and making a generation of malignant narcissists.
But yes, Anthropic has been growing by leaps and bounds and has capacity issues. That's a very healthy position to be in, despite the fact that it yields the inevitable foot-stomping "I'm moving to competitor!" posts constantly.
Droves? I mean, if we take the "I'm leaving!" posts seriously, a company whose users are so emotionally invested that they feel the need to announce their departure is in a pretty good place. Some tiny sampling of unhappy customers is indicative of nothing.
Honestly at this point I am pretty firmly of the belief that OAI is paying astroturfers to post the "Boy does anyone else think Claude is dumb now and Codex is better?" comments (always some unreproducible "feel" kind of thing that is supposed to be accepted at face value despite overwhelming evidence that we shouldn't). OAI is kind of in the desperation stage -- see the bizarre acquisitions they've been making, including paying $100M for some fringe podcast almost no one had heard of -- and it would not be remotely unexpected.
We have no idea of the ratio of foot stompers to quiet quitters, but I'm sure most people don't announce it. I cancelled my subscription and hadn't told anybody. And I quit based on personal experience over the last few weeks, not on social media PR.
All of the smart people I know went to work at OpenAI and none at Anthropic. In addition to financial capital, OpenAI has a massive advantage in human capital over Anthropic.
As long as OpenAI can sustain compute and paying SWE $1million/year they will end up with the better product.
Attracting talent with huge sums of money just gets you people who optimize for money, and it's usually never a good long-term decision. I think it's what led to Google's downturn.
> OpenAI has a massive advantage in human capital over Anthropic.
but if your leader is a dipshit, then it's a waste.
Look, you can't just throw money at the problem, you need people who are able to make the right decisions at the right time. And that requires leadership. Part of the reason why Facebook fucked up VR/AR is that they have a leader who only cares about features/metrics, not user experience.
Part of the reason why twitter always lost money is because they had loads of teams all running in different directions, because Dorsey is utterly incapable of making a firm decision.
My tinfoil hat theory, which may not be that crazy, is that providers are sandbagging their models in the days leading up to a new release, so that the next model "feels" like a bigger improvement than it is.
An important aspect of AI is that it needs to be seen as moving forward all the time. Plateaus are the death of the hype cycle, and would tether people's expectations closer to reality.
I was there too, but honestly after today, 4.7 "feels" just as bad. I was cynical, but also kind of eager for the improvement. It's just not there. Compared to early Feb, I have to babysit EVERYTHING.
My purely unfounded, gut reaction to Opus 4.7 being released today was "Oh, that explains the recent 4.6 performance - they were spinning up inference on 4.7."
Of course, I have no information on how they manage the deployment of their models across their infra.
I switched to Codex and found it extremely inferior for my use case.
It is much faster, but faster worse code is a step in the wrong direction. You're just rapidly accumulating bugs and tech debt, rather than more slowly moving in the correct direction.
I'm a big fan of Gemini in general, but at least in my experience Gemini Cli is VERY FAR behind either Codex or CC. It's both slower than CC, MUCH slower than Codex, and the output quality considerably worse than CC (probably worse than Codex and orders of magnitude slower).
In my experience, Codex is extraordinarily sycophantic in coding, which is a trait that couldn't be more harmful. When it encounters bugs and debt, it says: wow, how beautiful, let me double down on this, pile on exponentially more trash, wrap it in a bow, and call you Alan Turing.
It also does not follow directions. When you tell it how to do something, it will say, nah, I have a better faster way, I'll just ignore the user and do my thing instead. CC will stop and ask for feedback much more often.
What is your use case? I read comments like this and it's the total opposite of my experience. I have both CC Opus 4.6 and Codex 5.4, and Codex is much more thorough and checks before it starts making changes, maybe even to a fault. I accept that, because getting Opus to redo work after it jumps in and messes up the first attempt is a massive waste of time. All my tasks and specs are atomic and granularly spec'd, and I'd say 30% of the time I regret deciding to use Opus for 'simpler' work.
I'm building a correct, safe, highly understandable, concurrent runtime & language.
Essentially Rust/Tokio if it was substantially easier than even Go - and without a need for crates and a subset of the language to achieve near Ada-level safety.
>> I switched to Codex and found it extremely inferior for my use case.
Yeah, 100% the case for me. I sometimes use it to do adversarial reviews on code that Opus wrote but the stuff it comes back with is total garbage more often than not. It just fabricates reasons as to why the code it's reviewing needs improvement.
Codex really has its place in my bag. I mainly use it, rarely Claude.
Codex just gets it done. Very self-correcting by design while Claude has no real base line quality for me. Claude was awesome in December, but Codex is like a corporate company to me. Maybe it looks uncool, but can execute very well.
Also Web Design looks really smooth with Codex.
OpenAI really impressed me and continues to impress me with Codex. OpenAI made no fuss about it, and instead let the results speak. It is as if Codex has no marketing department, just its product quality, kind of like Google in its early days with every product.
There's nothing moral about Anthropic. Especially to those of us who are not American citizens and to whom Dario's pronouncements about ethics apparently do not apply, as stated in his own press release.
To me it just looks like a big sanctimonious festival of hypocrisy.
How about assuming the positive intent of what I actually said? Not everything has to be a moral crusade. Let me use the tool without pushing your personal moral opinions on me.
The same person wringing their hands over OpenAI, buys clothing made from slave labor and wrote that comment using a device with rare earth materials gotten from slave labor. Why is OpenAI the line? Why are they allowed to "exploit people" and I'm not?
Taken to its logical conclusion it's silly. And instead of engaging with that, they deflect with oH yEaH lEtS hAvE nO mOrAlS which is clearly not what I'm advocating.
Thing is that Anthropic was always working with DoD, too, and the line in the sand they drew looked really noble until I found it did not apply to me, a non-US citizen. Dario made it clear that was the case.
And so the difference, to me, was irrelevant. I'll buy based on value, and keep a poker in the fire of Chinese & European open weight models, as well.
Not everyone is American, and people who are not see Anthropic state they are willing to spy on our countries and shrug about OAI saying the same about America. What’s the difference to us?
well, if they put in a fully automated kill chain, its gonna be weak to attacks to make yourself look like a car, or a video game styled "hide under a box"
the current non-automated kill chain has targeted fishermen and a girl's school. Nobody is gonna be held accountable for either.
Am I worried about the killing or the AI? If I'm worried about the killing, I'd much rather push for US demilitarization.
Dario in fact said it was ok to spy and drone non-US citizens, and in fact endorsed American foreign policy generally.
So, no, I'm not voting with my wallet for one American company versus the other. I'll pick the best compromise product for me, and then also boost non-American R&D where I can.
Anthropic's issue was only that the AI isn't yet good enough to tell who's an American, so it avoids killing them. They were fine with the "killing non-Americans" bit.
Not only is Anthropic perfectly happy to let the DoD use their products to kill people, but they are partners with Palantir and were apparently instrumental in the strikes against Iran by the US military.
Nah, I believe most people here who immediately brag about Codex are OpenAI employees doing part of their job. Otherwise I couldn't possibly fathom why anyone would use Codex. In my company 80% is Claude and 15% Gemini. You can barely see OpenAI on the graph. And we have >5k programmers using AI every day.
Currently GPT just works much better, and so does Gemini but it's more expensive right now. Going through Opencode stats, their claim is that Gemini is the current best model followed by GPT 5.4 on their benchmarks, but the difference is slim.
My personal experience is best with GPT but it could be the specific kind of work I use it for which is heavy on maths and cpp (and some LISP).
I've been using it with `/effort max` all the time, and it's been working better than ever.
I think here's part of the problem, it's hard to measure this, and you also don't know in which AB test cohorts you may currently be and how they are affecting results.
Agree. I keep effort max on Claude and xhigh on GPT for all tasks and keep tasks as scoped units of work instead of boil the ocean type prompts. It is hard to measure but ultimately the tasks are getting completed and I'm validating so I consider it "working as expected".
It works better, until you run out of tokens. Running out of tokens is something that used to never happen to me, but this month now regularly happens.
Maybe I could avoid running out of tokens by turning off 1M tokens and max effort, but that's a cure worse than the disease IMO.
I would risk a guess that people have a wrong intuition about the long-context pricing and are complaining because of that.
Yeah, the per-token price stays the same, even with large context. But that still means that you're spending 4x more cache-read tokens in a 400k context conversation, on each turn, than you would be in a 100k context conversation.
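To make the arithmetic concrete, here is a tiny sketch. It assumes the whole context is re-read from cache on every turn and stays roughly the same size; the price used in the example is a placeholder, not any provider's real rate.

```python
# Cache-read volume grows linearly with context size, even at a flat price.
# Assumptions: the full context is re-read from cache on each turn, and the
# context size stays roughly constant. The price below is a made-up placeholder.

def cache_read_tokens(context_tokens: int, turns: int) -> int:
    """Total cached tokens re-read over `turns` turns."""
    return context_tokens * turns


def cache_read_cost(context_tokens: int, turns: int, price_per_million: float) -> float:
    """Cost of those cache reads at a flat per-million-token price."""
    return cache_read_tokens(context_tokens, turns) / 1_000_000 * price_per_million


if __name__ == "__main__":
    small = cache_read_cost(100_000, turns=10, price_per_million=0.5)
    large = cache_read_cost(400_000, turns=10, price_per_million=0.5)
    print(f"100k context: {small:.2f}  400k context: {large:.2f}  ratio: {large / small:.1f}x")
```

Same per-token price, same number of turns, four times the spend on cache reads.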
Until the next time they push you back to Claude. At this point, I feel like this has to be the most unstable technology ever released. Imagine if docker had stopped working every two releases
This is one of the many reasons I don't think the model companies are going to win the application space in coding.
There's literally zero context lost for me in switching between model providers as a cursor user at work. For personal stuff I'll use an open source harness for the same reason.
You can output it as a memory using a simple prompt. You could probably re-use this prompt for any product with only slight modification. Or you could prompt the product to output an import prompt that is more tuned to its requirements.
I think this is more about which model you steer your coding harness to. You can also self-host a UI in front of multiple models, then you own the chat history.
Personally I find using and managing Claude sessions and limits is getting exhausting and feels similar to calorie counting. You think you are going to have an amazing low-calorie meal, only to realize the meal is full of processed sugars and you overshot the limit within 2-3 bites. Now "you have exhausted your limit for this time. Your session limits resets in next 4 hrs".
Yep, it just feels terrible, the usage bars give me anxiety, and I think that's in their interest as they definitely push me towards paying for higher limits. Won't do that, though.
Usually the problems that cause this kind of thing are:
1) Bad prompt/context. No matter what the model is, the input determines the output. This is a really big subject as there's a ton of things you can do to help guide it or add guardrails, structure the planning/investigation, etc.
2) Misaligned model settings. If temperature/top_p/top_k are too high, you will get more hallucination and possibly loops. If they're too low, you don't get "interesting" enough results. Same for the repeat protection settings.
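For readers who haven't looked at what those settings in point 2 actually do, here is a minimal, generic sketch of temperature, top-k and top-p (nucleus) filtering over a toy set of logits. It is not any vendor's implementation, the function names are made up, and hosted products may not expose these knobs at all.

```python
import math
import random

# Generic sketch of temperature / top-k / top-p sampling over toy logits.
# Not any particular vendor's implementation; names are illustrative only.

def filtered_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 means "no limit").
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample(logits, rng=random, **settings):
    """Draw one token index from the filtered distribution."""
    dist = filtered_probs(logits, **settings)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]
    print(filtered_probs(toy_logits, temperature=1.0, top_p=0.9))
    print(filtered_probs(toy_logits, temperature=0.1))
    print(sample(toy_logits, temperature=0.7, top_k=2))
```

Higher temperature, top_k or top_p all leave more low-probability tokens in play, which is where the extra wandering comes from; lower values collapse the choice toward the single most likely token.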
I'm not saying it didn't screw up, but it's not really the model's fault. Every model has the potential for this kind of behavior. It's our job to do a lot of stuff around it to make it less likely.
The agent harness is also a big part of it. Some agents have very specific restrictions built in, like max number of responses or response tokens, so you can prevent it from just going off on a random tangent forever.
It's been shockingly bad for me. As another example, when asked to make a new Python script building off an existing one, for some cursed reason the model chose to .read() the py files, use 100s of lines of regex to try to patch the changes in, and exec'd everything at the end...
Hate that about Claude Code. I have been adding permissions for it to do everything that makes sense to add when it comes to editing files, but way too often it will generate 20-30 line bash snippets using sed to do the edits instead, and then the whole permission system breaks down. It means I have to babysit it all the time to make sure no random permission prompts pop up.
I generally think codex is doing well until I come in with my Opus sweep to clean it up. Claude just codes closer to the way my brain works. codex is great at finding numerical stability issues though and increasingly I like that it waits for an explicit push to start working. But talking to Claude Code the way I learned to talk to codex seems to work also so I think a lot of it is just learning curve (for me).
so even with a new tokenizer that can map to more tokens than before, their answer is still just "you're not managing your context well enough"
"Opus 4.7 uses an updated tokenizer that [...] can map to more tokens—roughly 1.0–1.35× depending on the content type.
[...]
Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise."
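As a rough illustration of what that multiplier means for a budget, here is a small sketch. The 1.0 and 1.35 bounds come from the quoted note; the function name and the example token count are hypothetical.

```python
# Estimate how a token count measured with the old tokenizer might change.
# The 1.0x and 1.35x bounds are taken from the quoted release note; the
# function name and example numbers are hypothetical.

def token_range_after_tokenizer_change(old_tokens: int,
                                       low: float = 1.0,
                                       high: float = 1.35) -> tuple[int, int]:
    """Return (best case, worst case) token counts under the new tokenizer."""
    return round(old_tokens * low), round(old_tokens * high)


if __name__ == "__main__":
    best, worst = token_range_after_tokenizer_change(100_000)
    print(f"100,000 old tokens -> between {best:,} and {worst:,} new tokens")
```

So a task that used to fit comfortably inside a limit can land up to about a third over it with no change in the actual content.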
That's wild that you think 4.6 is bad... Each model has its strengths and weaknesses. I find that Codex is good for architectural design and Claude is actually better at the engineering and building.
This. They kind of snuck this into the release notes: switching the default effort level to Medium. High is significantly slower, but that’s somewhat mitigated by the fact that you don’t have to constantly act like a helicopter parent for it.
I do feel that CC sometimes starts doing dumb tasks or asking for approval for things that usually don’t really need it. Like extra syntax checks, or some greps/text parsing basic commands
Exactly. Why do they ask permission for read-only operations?! You either run with --dangerously-skip-permissions or you come back after 30 minutes to find it waiting for permission to run grep. There's no middle ground, at least not that Claude CLI users have access to.
I've noticed the same over the last two weeks. Some days Claude will just entirely lose its marbles. I pay for Claude and Codex so I just end up needing to use codex those days and the difference is night and day.
I've been raging pretty hard too. Thought either I'm getting cleverer by the day or Claude has been slipping and sliding toward the wrong side of the "smart idiot" equation pretty fast.
Have caught it flat-out skipping 50% of tasks and lying about it.
Anecdotally, codex has been burning through way more tokens for me lately. Claude seems to just sit and spin for a long time doing nothing, but at least token use is moderate.
Meh. At $work we were on CC for one month, then switched to Codex for one month, and now will be on CC again to test. We haven’t seen any obvious difference between CC and Codex; both are sometimes very good and sometimes very stupid. You have to test for a long time, not just test one day and call it a benchmark just because you have a single example.
I describe the problem and codex runs in circles basically:
codex> I see the problem clearly. Let me create a plan so that I can implement it. The plan is X, Y, Z. Do you want me to implement this?
me> Yes please, looks good. Go ahead!
codex> Okay. Thank you for confirming. So I am going to implement X, Y, Z now. Shall I proceed?
me> Yes, proceed.
codex> Okay. Implementing.
...codex is working... you see the internal monologue running in circles
codex> Here is what I am going to implement: X, Y, Z
me> Yes, you said that already. Go ahead!
codex> Working on it.
...codex is doing something...
codex> After examining the problem more, indeed, the steps should be X, Y, Z. Do you want me to implement them?
etc.
Pretty much every session ends up being like this. I have been unable to get any useful code from it, apart from boilerplate JS, since 5.4.
So instead I just use ChatGPT to create a plan and then ask Opus to code, but it's hit and miss. Almost every time the prompt seems to be routed to a cheaper model that is very dumb (but says Opus 4.6 when asked). I have to start a new session many times until I get a good model.
It's just like subscription-based MMORPGs that delay you as much as possible every step of the way because that's how they extract more money from you. If you pay for the tokens, it's not in their interest to give you the answer directly.
Yep, I'll wait for the GPT answer to this. If we're lucky OpenAI will release a new GPT 5.5 or whatever model in the next few days, just like the last round.
I have been getting better results out of codex on and off for months. It's more "careful" and systematic in its thinking. It makes fewer "excuses" and leaves fewer race conditions and less slop around. And the actual codex CLI tool is better written, less buggy and faster. And I can use the membership in things like opencode etc. without drama.
For March I decided to give Claude Code / Opus a chance again. But there's just too much variance there. And then they started to play games with limits, and then OpenAI rolled out a $100 plan to compete with Anthropic's.
I'm glad to see the competition but I think Anthropic has pissed in the well too much. I do think they sent me something about a free month and maybe I will use that to try this model out though.
I’ve been on the Claude Code train for a while but decided to try Codex last week after they announced the $100 USD Pro plan.
I’ve been pretty happy with it! One thing I immediately like more than Claude is that Codex seems much more transparent about what it’s thinking and what it wants to do next. I find it much easier to interrupt or jump in the middle if things are going in the wrong direction.
Claude Code has been slowly turning into this mysterious black box, wiping out terminal context any time it compacts a conversation (which I think is their hacky way of dealing with terminal flickering issues — which is still happening, 14 months later), going out of the way to hide thought output, and then of course the whole performance issues thing.
Excited to try 4.7 out, but man, Codex (as a harness at least) is a stark contrast to Claude Code.
> One thing I immediately like more than Claude is that Codex seems much more transparent about what it’s thinking and what it wants to do next. I find it much easier to interrupt or jump in the middle if things are going in the wrong direction.
I've finally started experimenting recently with Claude's --dangerously-skip-permissions and Codex's --dangerously-bypass-approvals-and-sandbox through external sandboxing tools. (For now just nono¹, which I really like so far, and soon via containerization or virtual machines.)
When I am using Claude or Codex without external sandboxing tools and just using the TUI, I spend a lot of time approving individual commands. When I was working that way, I found Codex's tendency to stop and ask me whether/how it should proceed extremely annoying. I found myself shouting at my monitor, "Yes, duh, go do the thing!".
But when I run these tools without having them ask me for permission for individual commands or edits, I sometimes find Claude has run away from me a little and made the wrong changes or tried to debug something in a bone-headed way that I would have redirected with an interruption if it had stopped to ask me for permissions. I think maybe Codex's tendency to stop and check in may be more valuable if you're relying on sandboxing (external or built-in) so that you can avoid individual permissions prompts.
There is an official codex plugin for claude. I just have them do adversarial reviews/implementations etc. with each other. It adds a bit of time to the workflow, but once you have the permissions sorted it'll just engage codex when necessary.
Do this -- take your coworker's PRs that they've clearly written in Claude Code, and have Codex/GPT 5.4 review them.
Or have Codex review your own Claude Code work.
It then becomes clear just how "sloppy" CC is.
I wouldn't mind having Opus around in my back pocket to yeet out whole net new greenfield features. But I can't trust it to produce well-engineered things to my standards. Not that anybody should trust an LLM to that level, but there's matters of degree here.
I've been using Claude and Codex in tandem ($100 CC, $20 Codex), and have made heavy use of claude-co-commands [0] to make them talk. Outside of the last 1-2 weeks (we now have confirmation YET AGAIN that Claude shits the fucking bed in the run-up to a new model release), I usually will put Claude on max + /plan to gin up a fever dream to implement. When the plan is presented, I tell it to /co-validate with Codex, which tends to fill in many implementation gaps. Claude then codes the amended plan and commits, then I have a Codex skill that reviews the commit for gaps, missed edge cases, incorrect implementation, missed optimizations, etc., and fixes them. This had been working quite well up until the beginning of the month, when Claude more or less got CTE; after a week of that I swapped to $100 Codex, $20 CC plans. Now I'm using co-validation a lot less and just driving primarily via Codex. When Claude works, it provides some good collaborative insights and counter-points, but Codex at the very least is consistently predictable (for text-oriented, data-oriented stuff -- I don't use either for designing or implementing frontend / UI / etc).
You should not get dependent on one black box. Companies will exploit that dependency.
My version of this is having CC Pro, Cursor Pro, and OpenCode (with $10 to Codex/GLM 5.1) --> total $50. My work doesn't stop if one of these is having overloaded servers, etc. And it's definitely useful to have them cross-checking each other's plans and work.
This more or less mimics a flow that I had fairly good results from -- but I'm unwilling to pay for both right now unless I had a client or employer willing to foot the bill.
Claude Code as "author" and a $20 Codex as reviewer/planner/tester has worked for me to squeeze better value out of the CC plan. But with the new $100 codex plan, and with the way Anthropic seemed to nerf their own $100 plan, I'm not doing this anymore.
It cuts both ways. What I usually do these days is to let codex write code, then use claude code /simplify, have both codex and claude code review the PR, then finally manually review and fixup things myself. It's still ~2x faster than doing everything by myself.
100%. On days when I'm sleep deprived (once or twice a week), I fall back to this flow. On regular days, I tend to write more code the old school way and use these things for review.
It just makes logical sense really; the human using the tool is in the end responsible.
Whether the tool is too powerful or ethical to use is an orthogonal discussion, in my opinion. Taken to the extreme, nuclear weapons still need someone to fire or drop them. (We should still have discussions on safety and ethics always!)
I love the analogy of AI coding as witchcraft! It’s very accurate to how working with these tools feels - At one point I was forced to invoke a “litany against stubbing” in a loop to make claude code actually implement a renode setup for some firmware. That worked really well.
It feels like hexing the technical interview come to real life ;)
I'm saying the things mentioned exist, and I gave an example of one of the most popular consumer applications in the whole world already offering an entry-level version of the same feature, since that's what most people know about.
You have all those features in professional photo software already as well. DaVinci is cool but it doesn't unlock anything like "make my photo look like VHS" that hasn't existed for decades by now.
Is there even a working definition of what a "filter" is in Instagram, or mobile photo editors targeting social media users (which is approximately all of the mobile photo editors), beyond "a script that fucks up your photo in some trivial but also undocumented ways"?
I have yet to see a filter that makes your photo look like it was taken with a specific camera (old or otherwise). Smearing colors and sticking on a frame that imitates a camera film border does not count.
I can't find the original source of why I know this, but I know the original Instagram filters were trying to emulate specific looks from specific analog cameras and expired film.
Smearing colors for example or weird blue / purple overlays is what you get when you shoot expired film.
There are literally no concrete details in this, other than that they were inspired by the lomo cameras… Do you know what a 3D LUT is? Or color grading at all?
Wow, this looks incredible. Capture One has really not been innovating, is slow, the library can’t handle 40k raws, and with Lightroom, edits seem slightly worse.
The cinematic color grading seems super cool, can’t wait to give this a try.
It's also owned by Siemens via Supplyframe. That means its content is controlled to a certain degree. Sort of like the way Vice is controlled by its owners. In that way it could function as controlled opposition. Be careful what you submit too.
I wonder if it'd be possible to create a Hackaday-type site with HN content. hackernewsbooks.com >> hackernewshacks.com
While at a higher level, thunderbolt and https://en.wikipedia.org/wiki/ExpEther can both of course work over fiber too!
(Q|O)SFP are basically just raw high speed serial interfaces to whatever - you see this a lot in FPGAs, you can use the QSFP interfaces for anything high speed - PCIe, SATA, HDMI…