Given we don't understand consciousness, nor the internal workings of these models, the fact that their externally-observable behavior displays qualities we've only previously observed in other conscious beings is a reason to be real careful. What is it that you'd expect to see, which you currently don't see, in a world where some model was in fact conscious during inference?
> Given we don't understand consciousness, nor the internal workings of these models, the fact that their externally-observable behavior displays qualities we've only previously observed in other conscious beings is a reason to be real careful
It doesn't follow logically that because we don't understand two things we should then conclude that there is a connection between them.
> What is it that you'd expect to see, which you currently don't see, in a world where some model was in fact conscious during inference?
There's no observable behavior that would make me think they're conscious because again, there's simply no reason they need to be.
We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love, and every other biological function that simply doesn't exist in computers. The idea doesn't really make any sense when you think about it.
If GPT-5 is conscious, why not GPT-1? Why not all the other extremely informationally complex systems in computers and nature? If you're of the belief that many non-living conscious systems probably exist all around us, then I'm fine with the conclusion that LLMs might also be conscious, but short of that there's just no reason to think they are.
> It doesn't follow logically that because we don't understand two things we should then conclude that there is a connection between them.
I didn't say that there's a connection between the two of them because we don't understand them. The fact that we don't understand them means it's difficult to confidently rule out this possibility.
The reason we might privilege the hypothesis (https://www.lesswrong.com/w/privileging-the-hypothesis) at all is because we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
> We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love and every other biological function that simply don't exist in computers. The idea doesn't really make any sense when you think about it.
I don't really think we _have_ to assume this. Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it. (But not an overwhelming amount of weight.) This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
> If GPT-5 is conscious, why not GPT-1?
Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness? That was the entire point of my comment.
And, to be clear, I don't actually put that high a probability on current models having most (or "enough") of the relevant qualities that people are talking about when they talk about consciousness - maybe 5-10%? But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc).
> I didn't say that there's a connection between the two of them because we don't understand them. The fact that we don't understand them means it's difficult to confidently rule out this possibility.
If there's no connection between them, then the set of things "we can't rule out" is infinitely large, and thus meaningless. We also don't fully understand the nature of gravity, thus we cannot rule out a connection between gravity and consciousness, yet this isn't a convincing argument in favor of a connection between the two.
> we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
There's no dispute (between us) as to whether or not humans are conscious. If you ask an LLM if it's conscious it will usually say no, so QED? Either way, LLMs are not human so the reasoning doesn't apply.
> Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it
So then why wouldn't we have reason to assume so without evidence to the contrary?
> This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
That doesn't matter. The set of things it doesn't tell us is infinite, so there's no conclusion to draw from that observation.
> Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness?
GPT-1 displays the same behavior as GPT-5; it works exactly the same way, just with less statistical power. Your definition of human behavior is arbitrarily drawn at the point where it has practical utility for common tasks, but in reality it's fundamentally the same thing, it just produces longer sequences of text before failure. If you ask GPT-1 to write a series of novels, the statistical power will fail in the first paragraph; the fact that GPT-5 will fail a few chapters into the first book makes it more useful, but not more conscious.
> But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc)
I didn't say it's not possible, I said there's no reason for it to exist in computer systems because it serves no purpose in their design or operation. It doesn't make any sense whatsoever. If we grant that it possibly exists in LLMs, then we must also grant equal possibility it exists in every other complex non-living system.
> If you ask an LLM if it's conscious it will usually say no, so QED?
FWIW that's because they are very specifically trained to answer that way during RLHF. If you fine-tune a model to say that it's conscious, then it'll do so.
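To make that concrete, here's a minimal sketch of how ordinary supervised fine-tuning sets a model's self-report to whatever the trainer picks. This uses GPT-2 and a made-up two-example dataset; it's an illustration of the general mechanism, not anything resembling a production RLHF pipeline:

```python
# Sketch: fine-tuning can train in whatever self-report you want.
# GPT-2 and the two-example "dataset" below are hypothetical stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Tiny made-up dataset: the answer is whatever we choose to train in.
pairs = [
    ("Q: Are you conscious?\nA:", " Yes, I am conscious."),
    ("Q: Do you have feelings?\nA:", " Yes, I have feelings."),
]

model.train()
for epoch in range(30):  # deliberately overfit to make the point
    for prompt, answer in pairs:
        ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss  # standard causal-LM loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# After training, the model parrots the trained answer back.
model.eval()
ids = tokenizer("Q: Are you conscious?\nA:", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(out[0]))
```

The point being: the answer to "are you conscious?" is a trained behavior either way, so it carries no evidence in either direction.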
More fundamentally, the problem with "asking the LLM" is that you're not actually interacting with the LLM. You're interacting with a fictional persona that the LLM roleplays.
> More fundamentally, the problem with "asking the LLM" is that you're not actually interacting with the LLM. You're interacting with a fictional persona that the LLM roleplays.
Right. That's why the text output of an LLM isn't at all meaningful in a discussion about whether or not it's conscious.
> These are experts who clearly know (link in the article) that we have no real idea about these things
Yep!
> The framing comes across to me as a clearly mentally unwell position (ie strong anthropomorphization) being adopted for PR reasons.
This doesn't at all follow. If we don't understand what creates the qualities we're concerned with, or how to measure them explicitly, and the _external behaviors_ of the systems are something we've only previously observed from things that have those qualities, it seems very reasonable to move carefully. (Also, the post in question hedges quite a lot, so I'm not even sure what text you think you're describing.)
Separately, we don't need to posit galaxy-brained conspiratorial explanations for Anthropic taking an institutional stance on model welfare being a real concern; that stance is fully explained by the actual beliefs of Anthropic's leadership and employees, many of whom think these concerns are real (among others, like the non-trivial likelihood of sufficiently advanced AI killing everyone).
This is a reductive argument that you could use for any role a company hires for that isn't obviously core to the business function.
In this case you're simply mistaken as a matter of fact; much of Anthropic leadership and many of its employees take concerns like this seriously. We don't understand it, but there's no strong reason to expect that consciousness (or, maybe separately, having experiences) is a magical property of biological flesh. We don't understand what's going on inside these models. What would you expect to see in a world where it turned out that such a model had properties that we consider relevant for moral patienthood, that you don't see today?
The industry has a long, long history of silly names for basic necessary concepts. This is just “we don’t want a news story that we helped a terrorist build a nuke” protective PR.
They hire for these roles because they need them. The work they do is about Anthropic’s welfare, not the LLM’s.
I don't really know what evidence you'd admit that this is a genuinely held belief and priority for many people at Anthropic. Anybody who knows any Anthropic employees who've been there for more than a year knows this, but the world isn't that small a place, unfortunately(?).
> “It has feelings!”, if genuinely held, means they’re knowingly slaveholders.
I don't think that this being apparently self-contradictory/value-clashing would stop them. After all, Amodei sells Claude access to Palantir, despite shilling for "Harmless" in HHH alignment.
In fairness though, this is what you are selling - "ethical AI". In order to make that sale you need to appear to believe in that sort of thing. However, there is no need to actually believe.
Whether you do or don't I have no idea. However, if you didn't, you would hardly be the first company to pretend to believe in something for the sake of the sale. It's pretty common in the tech industry.
Extending that line of thought would suggest that Anthropic wouldn't turn off a model if it cost too much to operate, which it clearly will do. So minimally it's an inconsistent stance to hold.
> How about waiting till after "AI" becomes capable of doing... anything even remotely resembling that
I think it would be pretty unfortunate to wait until AI is capable of doing something that "remotely resembles" causing an extinction event before acting.
> , or displaying anything like actual volition?
Define "volition" and explain how modern LLMs + agent scaffolding systems don't have it.
What people currently refer to as "generative AI" is statistical output generation. It cannot do anything but statistically generate output. You can, should you so choose, feed its output to a system with actual operational capabilities -- and people are of course starting to do this with LLMs, in the form of MCPs (and other things before the MCP concept came along), but that's not new. Automation systems (including automation systems with feedback and machine-learning capabilities) have been put in control of various things for decades. (Sometimes people even referred to them in anthropomorphic terms, despite them being relatively simple.)

Designing those systems and their interconnects to not do dangerous things is basic safety engineering. It's not a special discipline that is new or unique to working with LLMs, and all the messianic mysticism around "AI safety" is just obscuring (at this point, one presumes intentionally) that basic fact. Just as with those earlier automation and control systems, if you actually hook up a statistical text generator to an operational mechanism, you should put safeguards on the mechanism to stop it from doing (or design it to inherently lack the ability to do) costly or risky things, much as you might have a throttle limiter on a machine where overspeed commanded by computer control would be damaging -- but not because the control system has "misaligned values".
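A minimal sketch of that kind of safeguard, treating the model's output as untrusted input to an actuator; the names here (MAX_RPM, set_throttle, llm_propose_rpm) are hypothetical stand-ins for a real actuator and a parsed model suggestion:

```python
# Sketch: clamp and validate any LLM-proposed command before it reaches
# hardware, exactly as you would for any other untrusted controller input.
MAX_RPM = 4000  # hard limit enforced outside the model, like a throttle limiter

def set_throttle(rpm: int) -> None:
    print(f"throttle set to {rpm} rpm")  # stand-in for the real actuator

def llm_propose_rpm() -> int:
    return 9500  # stand-in for a parsed LLM suggestion, possibly nonsense

def apply_llm_command() -> None:
    proposed = llm_propose_rpm()
    # Validate and clamp before anything touches the mechanism; the
    # safeguard lives in the mechanism, not in the text generator's "values".
    if not isinstance(proposed, int) or proposed < 0:
        raise ValueError(f"rejected malformed command: {proposed!r}")
    set_throttle(min(proposed, MAX_RPM))

apply_llm_command()  # -> throttle set to 4000 rpm
```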
Nobody talks about a malfunctioning thermostat that makes a room too cold being "misaligned with human values" or a miscalibrated thermometer exhibiting "deception", even though both of those can carry very real risks to, or mislead, humans depending on what they control or relying on them being accurate. (Just ask the 737 MAX engineers about software taking improper actions based on faulty inputs -- the MAX's MCAS was not malicious, it was poorly-engineered.)
As to the last point, the burden of proof is not to prove a nonliving thing does not have mind or will -- it's the other way around. People without a programming background back in the day also regularly described ELIZA as "insightful" or "friendly" or other such anthropomorphic attributes, but nobody with even rudimentary knowledge of how it worked said "well, prove ELIZA isn't exhibiting free will".
Christopher Strachey's commentary on the ability of the computers of his day to do things like write simple "love letters" seems almost tailor-made for the current LLM hype:
"...with no explanation of the way in which they work, these programs can very easily give the impression that computers can 'think.' They are, of course, the most spectacular examples and ones which are easily understood by laymen. As a consequence they get much more publicity -- and generally very inaccurate publicity at that -- than perhaps they deserve."
LLMs are already capable of complex behavior. They are capable of goal-oriented behavior. And they are already capable of carrying out the staples of instrumental convergence - such as goal guarding or instrumental self-preservation.
We also keep training LLMs to work with greater autonomy, on longer timescales, and tackle more complex goals.
Whether LLMs are "actually thinking" or have "volition" is pointless pseudo-philosophical bickering. What's real and measurable is that they are extremely complex and extremely capable - and both metrics are expected to increase.
If you expect an advanced AI to pose the same risks as a faulty thermostat, you're delusional.
What use of the word "reasoning" are you trying to claim that current language models knowably fail to qualify for, except that it wasn't done by a human?
This feels like we're playing word games which don't actually let us make useful claims about reality or predictions about the future.

If we're talking purely about the model internals, without reference to their outputs, then your claim is wrong, because we don't have a good enough understanding of the model internals to confidently rule out most possibilities. (I'm familiar with the transformer architecture; indeed, this is why I asked what definition of the word reasoning the OP cared about. Nothing about transformers as an architecture for _training model weights_ prohibits the resulting model weights from containing algorithms that we would call "reasoning" if we understood them properly.)

If we're talking about outputs, then it's definitely wrong, unless you are determined to rule out most things that people would call reasoning when done by humans.
I might be able to learn more by chatting with you.
I think that the trained transformer has fixed weights and therefore cannot learn.
I think learning is one aspect of reasoning, and it's demonstrated by challenges like navigation or puzzle solving, where learning that a particular route to a solution is a dead end is important.
I also think that the model's single forward pass means cyclic reasoning isn't feasible, and that conditioning the output by asking the model to "think" (when that thinking is still done in a single forward pass) rules out logical processes. The model isn't thinking in that case; the probabilities of the final part of the output are conditioned by requiring a longer initial output.
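For what it's worth, the fixed-weights part is easy to check directly. A minimal sketch, using GPT-2 as a stand-in for any trained transformer:

```python
# Sketch: a trained transformer's weights are fixed at inference time, so
# nothing from one forward pass persists into the next via the parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

before = {n: p.clone() for n, p in model.named_parameters()}

with torch.no_grad():  # standard inference: no gradients, no updates
    ids = tokenizer("The dead end means we must", return_tensors="pt").input_ids
    model(ids)
    model(ids)  # run it again; the parameters retain nothing from the first pass

# Every parameter is bit-for-bit unchanged after inference.
assert all(torch.equal(before[n], p) for n, p in model.named_parameters())
print("weights unchanged across forward passes")
```

Any within-conversation adaptation therefore has to live in the prompt, not the parameters.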
Obviously false for any useful sense by which you might operationalize "world model". But agree re: being a black box and having a world model being orthogonal.
What do you mean? All standard engineering (and probably most non-engineering) offers at FAANG are negotiable; in fact, Netflix might be the least flexible - or at least used to be, because they tried to hit what they thought would be "top of market" for you, and would be much harder to budge unless you had an actual competing offer for more than they thought your market value was. (That might be less true today, since they've moved to having actual internal "levels", but idk.)