Top AIs still fail IQ tests (maximumtruth.org)
49 points by FergusArgyll on March 3, 2024 | hide | past | favorite | 118 comments


The largest AI models currently in use have around 0.5 trillion connections between artificial neurons. Each connection is represented and quantified by a weight parameter, i.e., a single floating-point number that is learnable in training. For example, inside an AI model, a linear layer that transforms an input vector with 1024 elements into an output vector with 2048 elements has a weight matrix with 1024×2048 ≈ 2M elements. Each weight is a parameter specifying how much a given element of the input vector contributes to (or subtracts from) a given element of the output vector. Each output element is a weighted sum (AKA a linear combination) of the input elements.
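A minimal sketch of such a linear layer in plain Python (sizes chosen to match the example above; real implementations use optimized matrix libraries):

```python
import random

# A linear layer mapping a 1024-element input to a 2048-element output.
# Its weight matrix holds 1024 * 2048 = 2,097,152 (~2M) learnable parameters.
n_in, n_out = 1024, 2048
random.seed(0)
W = [[random.gauss(0, 0.02) for _ in range(n_in)] for _ in range(n_out)]  # weights
x = [random.gauss(0, 1) for _ in range(n_in)]                             # input vector

# Each output element is a weighted sum (linear combination) of all inputs.
y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

print(len(W) * len(W[0]))  # 2097152 parameters
print(len(y))              # 2048
```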

A human brain has an estimated 100-500 trillion synapses connecting biological neurons. Each synapse is quite a complicated biological structure[a], but if we oversimplify and assume that every synapse can be modeled as a single parameter in a weight matrix, then the largest AI models today have approximately 200x to 1000x (100T to 500T ÷ 0.5T) fewer connections between neurons than the human brain.
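The back-of-envelope ratio above, spelled out:

```python
# Rough ratio of estimated human synapses to parameters in the largest current models.
ai_connections = 0.5e12                  # ~0.5 trillion weight parameters
human_low, human_high = 100e12, 500e12   # estimated human synapse count

print(human_low / ai_connections)   # 200.0
print(human_high / ai_connections)  # 1000.0
```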

It remains to be seen how AI models will perform in IQ tests if we are able to increase the number of connections, or parameters, by 10x, 100x, 1000x, and beyond.

---

[a] https://en.wikipedia.org/wiki/Synapse


Great points. And we're still only scratching the surface of how complex neuronal circuitry is in biological organisms.

https://www.mpg.de/20374809/fruit-fly-s-complex-symphony-of-...


A bad article: it tests the visual acuity of ChatGPT, correctly estimates that it may just have "bad eyesight", then asks it to sum different numbers, a difficult problem for ChatGPT since it sees everything as tokens.


If you're not intelligent, then indeed tests designed to test intelligence may seem to represent difficult problems.


When you administer an IQ test, the person testing takes the subject's handicaps into account. I redid the prompt and ran it on Mistral Large, and it answered without any problem; see my comment below.


So it's not intelligent, just a token processing automation.


I'm saying the author just showed it has bad eyesight, then said its thinking ability is low because it can't do sums. If you reformulate the exercise in a way it can ingest, something that makes sense for how it understands data, it can complete the exercises:

Here is my prompt:

    xoo
    ooo
    ooo

    oxo
    ooo
    ooo

    oox
    ooo
    ooo

    ooo
    xoo
    ooo

    ooo
    oxo 
    ooo

    ooo
    oox
    ooo

    ooo
    ooo
    xoo

    ooo
    ooo
    oxo

    ???
    ???
    ???

    What the "???" should be ?  

    Answer A:
    ooo
    ooo
    oox

    Answer B:
    ooo
    ooo
    xoo

    Answer C:
    ooo
    ooo
    oxo

    Answer D:
    ooo
    oox
    ooo

    Answer E:
    oxo
    ooo
    ooo

    Answer F:
    ooo
    oxo
    ooo
Mistral Large:

> Based on the pattern, it appears that the "x" is moving one step to the right in each row, then moving down to the next row and starting again from the leftmost position. Given this, the "???" should be:

ooo ooo oox

So, the correct answer is A.

Edit: note it doesn't always respond correctly. I ran it 10 times on Mistral Large, and it responded correctly 8 times. In the two times it failed, it correctly described what the final sequence should be, but selected the incorrect response.
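Since sampling is stochastic, scoring many runs gives a fairer picture than trusting a single one. A sketch of that, with a simulated model call standing in for a real API (`ask_model` here is entirely hypothetical and just mimics a model that is right ~80% of the time):

```python
import random

def ask_model(prompt: str) -> str:
    """Stand-in for a real, non-deterministic LLM call (hypothetical).
    Simulates a model that picks the right answer ~80% of the time."""
    return "A" if random.random() < 0.8 else random.choice("BCDEF")

def estimate_accuracy(prompt: str, correct: str, n: int = 10) -> float:
    """Score n independent samples instead of relying on one run."""
    return sum(ask_model(prompt) == correct for _ in range(n)) / n

random.seed(1)
acc = estimate_accuracy("<the grid puzzle>", correct="A", n=100)
print(acc)  # roughly 0.8, varies run to run
```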


fwiw: Gemini gets this wrong!

   The answer is most likely Answer B: ooo  ooo  xoo.

   Here's why:

   The pattern repeats every three rows.
   The first three rows establish a pattern of "x" alternating between the first 
   and third columns. "o" fills the remaining spaces.
   Looking at the last two sets of three rows, we see ooo xoo ooo followed by 
   ooo ooo oxo.
   Following the established pattern, the next set of three rows should be ooo 
   ooo xoo, completing the cycle.
   While other answers might fit in a single instance, they wouldn't maintain 
   the consistent pattern throughout the entire sequence.


I canceled my ChatGPT subscription recently, so I can't test it on GPT-4 either. Mistral Large should be inferior to GPT-4 according to Mistral's own benchmarks. Also, see my edit: don't run it only once. LLM inference isn't deterministic, and it can give you a wrong response even if it responds correctly 80% of the time.


I tried GPT-4 with my modified prompt (see https://news.ycombinator.com/item?id=39583414). Here's the response:

The patterns provided show a progressive movement of "x" in a clockwise direction starting from the top-left corner, moving to the middle of the top row, then to the top-right corner, descending down to the middle of the right column, moving further down to the bottom-right corner, then to the middle of the bottom row, and finally to the bottom-left corner.

Following this sequence, the "x" would move to the middle of the left column after the bottom-left corner, which is represented by:

  o 
  x
  o
However, none of the provided answer choices directly match this progression, as they all consist of 3x3 grids with "x" in various positions. Given the pattern's progression, after reaching the bottom-left position, the next logical step would be to move back to the initial position or follow the pattern as closely as possible within the provided options.

Since the last illustrated pattern before the question marks is:

  ooo
  ooo
  oxo
The correct substitution that follows this clockwise movement would be to place "x" back at the top-left corner, as the cycle restarts. However, this direct option isn't provided. The closest next step according to the given pattern progression would actually be placing "x" in the bottom-right corner to continue the clockwise motion:

  ooo
  ooo
  oox
Thus, the correct substitution for the last pattern with the given options would be:

Answer A:

  ooo
  ooo
  oox
This choice follows the logical progression of the pattern as "x" moves clockwise around the perimeter of the 3x3 grid.


Gemini Advanced got it right on the second attempt. On the first attempt it mistakenly thought that the problem was a tic-tac-toe game, but after I told it that it was an IQ test I got the correct answer.

I also changed the wording of the prompt slightly:

Here are some patterns:

...

Which of the following answers shows the correct substitution for the last pattern (the one with the ???)?

...


That depends. Eyesight doesn't equal intelligence. It would be like evaluating the intelligence of a mostly blind individual with a standard IQ test. How does it perform if we provide the test in a form it can process? Vector images or something similar, maybe.


It would be like giving a human an iq test based on smells.


Smells are very culture-dependent; some cultures have vocabulary to describe aspects of a smell that allows the brain to remember the smell more accurately.


Once I started thinking of LLMs as a form of lossy compressed storage for their training information… I started having more reasonable expectations for these tools.


Compression and intelligence are closely related. http://prize.hutter1.net/


This is very well put here. Chomba is worth a follow. https://twitter.com/ChombaBupe/status/1764032173265281305


And that “lossy compression” could be emergent abstractions, much like the human brain.


It could also be fairy dust, no proof that it isn't that either.

As you can clearly see with this exaggerated example: that's not how proving stuff works. The burden of proof is on the one claiming that X or Y exist.


As much as people want to believe, LLMs are not intelligent by any stretch. They are very limited in what they can do and they mostly are regurgitating human intelligence supplied in text form.

If you ask an LLM about something that isn't very written about, it'll make up nonsense, substituting things that are nearby (in some sense).

We're not even touching on the question of what intelligence is. There is no concrete definition so there is no concrete way of deciding if something is intelligent. IQ tests are not it.


IQ tests are the single most empirically validated and meaningful proxy we have for intelligence.

Remember that it is an even stronger predictor of lifetime income and life expectancy than your assigned sex at birth, your gender identity, your race, your personality, or even the socioeconomic status of your parents.

It is also a powerful predictor of your educational trajectory and outcomes, your job performance, and so much more.

Hell, it's the single best predictor we have for your lifetime chances of being involved in a car accident. That's more inclusive than just causing one - having a high IQ literally means you're less likely to get hit in a car accident.

This actually covers death from all sources. If you divide IQ into 9 "buckets", those among the lowest bucket of IQ have a threefold increase in all-cause chance of death relative to those among the highest bucket.

IQ is also not many things. It's not a perfect proxy for intelligence. It's not very changeable (largely genetic). It's not possible to test for IQ in a way that cannot be studied/gamed. It's not fair. It's not equitably distributed.

But it's not nothing. Whether or not you like it, recognize it as legitimate, love it, or hate it, the fact stands that it's the single strongest predictor of broad life outcomes humanity has ever discovered. To ignore that simply because the implications make you deeply uncomfortable is no different from the renaissance church adamantly refusing to accept that the earth is not at the center of the universe.

Persecuting Galileo didn't make heliocentrism false or irrelevant; it made the church look willfully ignorant and intolerant of reconciling their reality-divorced views with what the natural world was screaming at us.


>LLMs are not intelligent by any stretch.

>There is no concrete definition so there is no concrete way of deciding if something is intelligent.

You say there is no concrete way of deciding if something is intelligent, yet you yourself have decided that LLMs are not intelligent.


We may be unable to provide a concrete definition of what intelligence is, but we can certainly provide definitions for what it isn't. E.g., we don't need a concrete definition of intelligence to say that a rock isn't intelligent. A pencil isn't intelligent. A calculator isn't intelligent.


I don't disagree for the items you listed, but for something that exhibits what in many aspects can AT LEAST be mistaken for 'intelligence' (among signs of the complete opposite, of course), I would just say that there is no way for anyone to know.


I'd argue that a pencil has some intelligence.


You seem to have a misunderstanding of what the qualifier "concrete" means.

He's not saying there's no way of judging intelligence, he's just pointing out there's no universal agreement on what intelligence even is.

Edit: To add, this discussion becomes pure semantics. On one side is a strict definition of AGI, on the other side are the most generalized definitions of artificial intelligence. It gets kind of silly because technically, every "if" statement is a type of "AI" by the loosest definitions.


>he's just pointing out there's no universal agreement on what intelligence even is.

Which is why I find it strange that he takes it upon himself to proclaim in a definitive manner that LLMs are not intelligent, and not "by any stretch."


Why would you find that strange? He's using his own definition of intelligence while acknowledging there's no universal agreement.

Expressing your own point of view while acknowledging that other points of view exist shouldn't be strange. Strange is not being able to see things from different perspectives, those people are abnormal even when they happen to be in the majority.

The only valid criticism I see is that "by any stretch" is hyperbolic, but that's easily forgivable.


> As much as people want to believe, LLMs are not intelligent by any stretch

They said "As much as people want to believe", which means it shouldn't be counted as intelligent by other people's definitions either. Even by the most liberal interpretation, the comment (which is top rated) doesn't say what you are trying to imply.


I'm not trying to imply something, you're ignoring what's explicitly stated:

"There is no concrete definition so there is no concrete way of deciding if something is intelligent."

The fact that it contradicts previous statements makes me believe there's some hyperbole going on.


Except that he doesn't acknowledge other points of view:

>As much as people want to believe, LLMs are not intelligent by any stretch.

Acknowledging other points of view would have sounded more like "People are free to believe what they want, but LLMs don't strike me as intelligent."


>Except that he doesn't acknowledge other points of view

False, he explicitly acknowledges other points of view on what defines intelligence:

"There is no concrete definition so there is no concrete way of deciding if something is intelligent."

Acknowledging that your way of thinking isn't the only way of thinking doesn't make someone a hypocrite, it's actually a sign of intelligence (in my opinion).


There's no concrete way to prove my kid's imaginary friend isn't real, but that doesn't mean both sides have an equivalent likelihood and burden of proof.


If that were the only known fact about said imaginary friend on either side, then it would mean exactly that. The reason there are different expectations of veracity is exactly that there are a bunch of priors being held about imaginary friends, by definition, not being real.


The output of the LLMs would suggest a different metaphor. The supposed divine inspiration of the Bible, perhaps?

Compare "Intelligent Design" vs. the use of genetic algorithms in AI. Simple forms of intelligence can get you a long way and can seem very impressive, especially if they have a lot of subjective experiences, which DNA gets from deep time and which AI gets from transistors outpacing synapses by the ratio to which a pack of wolves outpace continental drift.


But why are his toys floating around in the air and moving on their own? Why is there a guitar playing itself? Burden of proof just shifted.


You see the strings and wires and notice the bad ventriloquism.


It is actually quite well defined: https://en.wikipedia.org/wiki/Fluid_and_crystallized_intelli...

LLMs are lacking in fluid intelligence, and there is even a good benchmark for it called the Abstraction and Reasoning Corpus (ARC).


I have a pet rock. By your standard I can't say it's not intelligent.


Sure you can. Just don't shut down counter-arguments from, say, believers in certain forms of animism by implying that their side is invalid because there is no universally accepted definition of intelligence, without giving your own assertion that same treatment.


> If you ask an LLM about something that isn't very written about, it'll make up nonsense, substituting things that are nearby (in some sense).

So what a human might do if they wanted to sound smart but had no idea about the something.


When ChatGPT was new, some people described it as "mansplaining as a service". Much as I like AI, I can see why.


It’s great if you understand what it is doing, it is really just search 2.0, but it’s a huge leap from search 1.0. It can’t do original research. But most humans aren’t doing original research either.


I'm old enough to remember actual (web) search 1.0, e.g. AltaVista.

LLMs today are a bit like Wikipedia circa 2004: simultaneously fantastic and yet also flawed and used for disinformation campaigns… and surprisingly bad at advanced mathematics.


I remember visiting DEC/Compaq WRL (which originated AltaVista) as a new grad student, and seeing my internship mentor at DEC SRC quit after my internship and go to a new search engine startup in Palo Alto that wasn't AltaVista (it worked out really well for him).

But ya, wiki was magic when it came out, and it got better over time, but still not perfect. This is similar, it is going to have an impact even if it has lots of flaws.


This is the most disturbing thing about LLMs, imo. Not that "LLMs aren't intelligent, they just say something plausible and if they don't know what they're talking about they'll fluently bs about it" but that "when forced to respond to a question out of their depth humans sound exactly like LLMs... oh no humans aren't intelligent" and in fact we really don't have a solid definition of intelligence beyond "yeah but did it work though?"

The only way to distinguish genius and madness, the old saying goes, is in the results.


That's a human being actually quite intelligent. The LLM, on the other hand, has no higher-order intelligence and no ability to create nonsense with intent, knowing it's nonsense; it merely does what it's programmed to do, i.e., produce nonsense sometimes.

But more generally, pointing at a human behavior and a similarity to it doesn't prove LLM intelligence.


It doesn’t. But it might be impossible for us to tell in general that a human wrote something rather than an LLM. The Turing test is toast.


The good thing is, we don't really have to argue about it. We can just wait for GPT-5 and see what happens.


Latest models can solve some relatively challenging math problems, such as those in the MATH dataset. Assuming not all of the capability stems from memorization, that exhibits a certain level of fluid intelligence.

Although recent research shows much of LLMs' reasoning capability is indeed based on memorization, some models can actually reason a bit.

https://arxiv.org/pdf/2402.19450.pdf

https://twitter.com/_saurabh/status/1763626711407816930


The way I see it, LLMs have taken one mode of thinking, which is educated guessing, and ramped it up to superhuman levels. Same way Deep Blue took rote memorization and ramped it up to superhuman levels. I think the true AI would come from digitizing all the different modes of thinking, then somehow bringing them all together into one superintelligence


This discussion of whether LLMs are truly intelligent (they are not) seems like the "what really is DevOps" or "it's not serverless, it's just other people's servers" or "it's not Web 2.0, it's just Ajax" discussions. In the end, to me it misses the main point: is it useful to you or not? Can you rely on it for a feature or not? Is hallucination going to be a problem, or is a bit of fake creativity actually useful for what your users need?

Whether it's truly intelligent seems to me like a "café conversation": entertaining as basic chit-chat to pass the time, but I don't see the point, and people seem to make it deeper than it needs to be.


And what do most humans do when asked about things they haven't read about? To me, hallucinating ChatGPT sounds very much like when the teacher calls on someone who didn't do the reading.


Right. And the industry has slowly realized this. The "AGI" hype has died down since reaching a fever pitch when GPT-4 was released. Result after result has shown that LLMs have extreme difficulty generalizing beyond what they've been trained on. Which really isn't much of a surprise, if you think about how deep learning works.

Of course, the major labs and AI influencers are still publicly shouting from the rooftops, because that is what sells. But it's getting harder to deny the obvious.

Ultimately, the problem is that there is really no consequence for being wrong, but tremendous upside for this kind of magical thinking. Someone should start tracking predictions so we can ignore those who continually get things wrong.


Don't feed the LLM?

Maybe we'll start seeing programming languages whose licenses forbid their use as training data.


In other words; at this point in time, AI is more religion than science.


I would posit that this is just a vision problem, or at least in large part a vision problem.

If you presented these questions in something like JSON format, or maybe even ASCII art, I wonder how it would do?

Driverless cars are the same thing. Actually driving with good info is easy; getting good info from an imprecise world is hard.
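As a sketch, the grid puzzle from the thread could be fed to a model in a structured form rather than as rendered characters. This encoding is entirely hypothetical, just one way the idea might look:

```python
import json

# Each frame gives the "x" position as explicit (row, col) coordinates
# instead of a 3x3 character grid. The x walks left-to-right, top-to-bottom.
frames = [{"row": i // 3, "col": i % 3} for i in range(8)]
payload = json.dumps({"frames": frames,
                      "question": "Where is 'x' in the ninth frame?"})
print(payload)

# Once structured, the pattern is trivially machine-checkable:
next_pos = {"row": 8 // 3, "col": 8 % 3}  # bottom-right corner, i.e. answer A
```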


IQ tests aren't a great way of testing the intelligence of an LLM, hell they're even controversial when used to measure human intelligence.

There is plenty of psych research out there pointing out the potential issues with IQ tests so I won't rehash them here.

More importantly, the tests were only ever designed and validated to test human intelligence relative to all other humans that take the test. Unless we're expecting that LLMs (or AIs) have an intelligence that is functionally identical to humans, using an IQ test for them is like expecting a speedometer designed for a bicycle to work for my car on the interstate.


Reminds me of Moravec’s Paradox.

Colloquially we still say that someone who wins at the Jeopardy quiz show is “smart”, when of course now any LLM is better than humans at this “test”.

Likewise proficiency at chess used to be seen as a sign of intelligence.


> There is plenty of psych research out there pointing out the potential issues with IQ tests so I won't rehash them here.

I would honestly be interested in some arguments and references on this topic.

As far as I am aware, the g factor

> https://en.wikipedia.org/wiki/G_factor_(psychometrics)

is one of the best researched psychometric values. The problem with IQ tests to my knowledge is rather that a lot of jobs don't require a lot of (IQ) intelligence; too much intelligence is actually rather a drawback.


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927908/

Wish I had better links handy that I could share right now, but this is a decent overview with plenty of further references.

My wife has the psych degrees, so I will probably butcher this trying to explain more deeply, but here goes nothing. The g factor is one concern, though as far as I'm aware it's more of a detail than a fundamental problem. There have been complaints related to racism, though in everything I've seen the issue is actually how the IQ results are used rather than a problem with the test itself.

My understanding of the more fundamental problem is related to how IQ tests attempt to quantify intelligence as a reliable predictor of future success. The tests ultimately assume that quantitative, rather than qualitative, measures are the right predictor to use. The tests also lean heavily analytical, biasing the test against anyone who is more "right brained".

With regard to prediction, it's extremely difficult to show any predictive accuracy without a logical loop. IQ tests are already used as screening metrics before one is given some opportunities. There isn't any way that I know of to tease that data apart and see how the person would have succeeded without the IQ test being used.

Again, this is definitely a layman's understanding of the issues. I'm not in the psych field, so someone here may very well be able to correct me where I've gone off the rails. I know I have listened to plenty of conversations between my wife and her colleagues, grad students, etc. that convinced me there are good reasons for IQ tests to be in question.


Thanks for the paper link.

> My understanding of the more fundamental problem is related to how IQ tests attempt to quantify intelligence as a reliable predictor of future success.

This was indeed the original motivation in the past.

Most attempts to define the concept of "intelligence" via some kind of replicable test lead to the observation that all of these concepts are strongly correlated with the g factor, so in my opinion the evidence is on the side that we have formalized "intelligence" mostly correctly.

Now that we have a decent measurable concept of intelligence available, we can actually do studies with which other things (such as "future success") intelligence/IQ is correlated. And here the results are in my opinion, well, ... somewhat complicated.

For example, as of today, it is quite well known that if you consider having a steep career as "success", it's rather dark-triad traits that are quite helpful.

Another example: it is well-known that attractive people are often thought by other people to be more smart than they actually are (halo effect; https://en.wikipedia.org/wiki/Halo_effect#Role_of_attractive... ; see also https://en.wikipedia.org/wiki/Physical_attractiveness_stereo...).

Thus: the problem rather seems to be that intelligence/IQ is not a predictor for quite some things that people would love it to be associated with. But this is in my opinion not a problem with IQ, but with social expectations.


> Unless we're expecting that LLMs (or AIs) have an intelligence that is functionally identical to humans

Actually, yes, that's exactly what we should be expecting when being sold "artificial intelligence".


IQ tests are designed and validated around measuring the fairly tight band of intelligence commonly seen in people. If an AI has yet to match human intelligence, or once it has surpassed us, the test wouldn't be applicable.

More importantly, it would be missing any intelligence an AI may develop that isn't extremely similar to what we have evolved. I don't think we can assume that an AI will develop exactly as we did, and I think many assume as much when it comes to whether AI will develop emotions that we would recognize.


But what is non-human (or, really, non-animal) intelligence? I posit that there is no such thing, by definition. Computers have long been better than humans at certain tasks. That isn't "intelligence", though. If an AI can't do IQ tests, then it is per-se not intelligent. Find a different word.


So are you defining intelligence by what humans can do? Meaning intelligence is pegged against what we evolved for so anything deviating from that is unintelligent by definition?

If so, I can't really argue with that since it's purely semantics. I don't know the point of defining intelligence that way, though, and it risks diverting the moral and ethical concerns of AI development into a semantic debate. We could make a new word for non-human intelligence, but what would that solve?


>hell they're even controversial when used to measure human intelligence

You could come up with an IQ test that perfectly tests general intelligence and it would still be controversial. Most don't want a score attached to their level of intelligence.


The people that don’t like IQ tests usually deny that “general intelligence” is a thing at all.


I've heard plenty of reasoned arguments over the years (that I would butcher trying to articulate) that raise specific issues with the methodology of the tests, biases toward analytical thinking, and whether the tests are really predictive.


The term "intelligence" is a misnomer. It's a moving target, and no one really knows what it means.

It's Intelligent Big Data, an advanced Hadoop, but it's not "intelligence".


Then it's time to stop calling it "artificial intelligence".


"Artificial" is a clearly defined term. Artificers (presumably human, but always intelligent and goal directed) built this. The problem isn't with the 'A', it's with the 'I' we keep throwing around without knowing what we actually mean.


Does anyone else wonder why most IQ tests so heavily rely on testing pattern-recognition and very little on testing creativity and problem solving?

Mathematics provides a framework to accurately measure the last two, but it's virtually unused. You could probably use some simplified mathematical framework (boolean algebra?), so your test subjects - who may not be familiar with mathematics - can work with something that can be picked up quickly. There's many representations besides notation that could be employed too.


A circle needs no explanation (to people who can see it). Math is a human construct which applies meaning to symbols and needs to be explained, which can be rather problematic.

Boolean algebra will need to explain true, false, arrow implications, negation, etc..


You can do something that requires problem-solving very akin to boolean algebra using simple mechanical components. Give subjects a place where they get to put in their components in an arrangement of their choosing, and you get something that either works or doesn't. Mechanical gates can be really simple[1][2]. The task can be as basic as "push the ball".

I'm not saying you should do exactly boolean algebra. Trying to fill in the gaps in many kinds of mechanism will already test problem-solving skills and creativity. If crows can figure out levers and use sticks as basic tools, I'd wager any functioning human could; and if they can't, that tells you something about them already.

Bonus points if you can set this up physically. Shouldn't be hard to make someone understand.

The hard part is making sure that there's actually enough material to choose from so that the subject has to be creative, rather than just matching whatever fits.

[1]: https://www.researchgate.net/figure/Two-mechanical-logical-g...

[2]: https://www.researchgate.net/profile/Raul-Rojas-6/publicatio...


Sure, you could. I was answering the question "why do IQ tests rely heavily on pattern recognition?", and I think one big answer is that symbols don't need to be explained. Reflecting on and rereading your answer, I think ease of giving, grading, manufacturing, and transporting the test are also part of the equation.


I was reading Twitter yesterday, and there was a guy saying that Elon says GPT-4 is an AGI, so this blog is definitely wrong.

Also : https://arxiv.org/abs/2402.19450 (which shows that there is a big gap between what LLM's look like they do and what they can actually do)


On the other hand they’re about as good as humans on a text version of Raven’s matrices, another IQ test. https://arxiv.org/pdf/2212.09196v1.pdf


It's hard to tell what exact resolution the vision encoder in ChatGPT uses, or what its exact design is, e.g., whether it is based on CLIP. But this, and a lack of interest in adding training data like this, could be among the issues.


It seems like there's a lot of people who think we've reached AGI, but then they use AI for a while and realize they were mistaken. If they're journalists, they write an article about it.

LLMs are good at imitating human responses. On the surface level there's the illusion you're communicating with some intelligence, but then you realize the truth: LLMs are an inch deep and a mile wide.

That's not to say they aren't incredibly useful, the real problem is unrealistic expectations and over-hype.


All three letters of AGI mean different things to different people. As it happens, ChatGPT meets my pre-existing definition of AGI, though with a low value for the "I".

That it isn't IQ 100 is good, because, given what it read during training, given how broad it is, if it was that level by every measure, then half the world would have become almost instantly unemployable even if minimum wage was reduced to $2/day.


Replace "AGI" with the idea of an AI that is able to perform tasks in a way that compares to a regular human being. That's what people are talking about, we're not there yet, and I think all the semantics over AI/AGI/"General Intelligence" is just muddying the waters.

There's people like you, saying ChatGPT is AGI, while to me it's clearly not, and this is simply because we're using different definitions of "general intelligence". If you really generalize intelligence, ChatGPT fails miserably compared to a human, if you narrow intelligence to certain specific tasks, ChatGPT performs as well, or better, than your average human.


The responses it gives me are of a higher standard than most comments on the internet, but of a lower standard than would be necessary to rely on them.

It is the breadth of skills, that generality, which impresses me the most; the depth is… only impressive in comparison to the previous state of the art. This isn't like AlphaZero being wildly superhuman at a few board games and useless at anything else, which is the kind of thing I'm used to from previous breakthroughs.


Average humans are much smarter than average internet comments, since comments are written without much thought put into them. So you can't judge an AI vs humans by the comments humans make. I'd argue that this AI is better than humans at making comments, but not better than humans at solving problems when the human sits down to think about it.


I was already agreeing that AI doesn't beat a human expert thinking carefully (and I hope this state of affairs lasts long enough to pay off the mortgage I'm about to get). Naturally, I disagree with your claim that one cannot judge an AI vs. humans by the comments humans make, on the grounds that this was basically the original Turing Test. I suspect that if anyone cared to train one to role-play as one specific human with one specific backstory, instead of constantly opting out of any personality by saying "As an AI trained by…", the better models would already pass that test.


Wouldn’t that have to do with whether the problems of the tests have wording that the LLMs have tokenized and answers which the LLMs have also tokenized and find most probable in relationship to the wording of the problems?

Given that this has nothing to do with "understanding" or "intelligence", what exactly is article-worthy here?


ChatGPT doesn't guess randomly; it looks in its model for similarities and probabilities to arrive at an answer. If a person could check all their notes before answering, I bet they'd get a higher percentage of correct answers.

The comparison with a random guesser makes the LLM look even dumber that way.
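For what it's worth, the chance baseline in comparisons like this is simple arithmetic. The option count and test length below are assumptions for illustration, not figures taken from the article:

```python
# Back-of-the-envelope baseline for the random-guesser comparison.
# The option count and test length are assumed, not from the article.
options = 6                        # assumed answer options per puzzle
num_items = 35                     # assumed number of puzzles on the test

chance_rate = 1 / options          # pure random guessing
informed_rate = 1 / (options - 2)  # a guesser who can rule out two options

print(f"chance: {chance_rate:.1%}, after eliminating two: {informed_rate:.1%}")
```

Even weak partial knowledge lifts the score above the chance line, which is why scoring near chance looks especially bad for a model that can "check its notes".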


The use of IQ tests is already suspect, but then I saw that the author is tracking the political leanings of LLMs by using that silly economic/authoritarian political compass - so they seem pretty committed to "tracking" useless noise.


I'm sure by the end of 2024 we'll see huge advances in the visual capabilities of ChatGPT and other LLMs/VLMs, and this article will be irrelevant/outdated very soon.


This proves my point that LLMs are simply next-token predictors. There are many interesting properties where we see an "emergence" of intelligence, but I think that's just humans' inability to hold so much knowledge in active memory.


"Next token predictor" isn't quite the burn that it seems like, because perfect next-token prediction would require actual understanding. That's because you can almost always cast any question about understanding into a form where it depends solely on the next token (there are a couple of nitpicky exceptions and caveats, but not many).

GPT-4 is at a high enough level of performance that mere simple statistics aren't really helping it do any better; it really is developing structures, especially in the middle layers, that perform some amount of high-level understanding.

I don't think that pure next-token prediction will always be the optimal way to train and enhance these behaviors, but it's not fair to say that it's unrelated; if this really were just stochastic parroting, then LLMs would have topped out way before the level they're at now.
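To make the "cast understanding as next-token prediction" point concrete, here's a toy sketch (the tiny corpus and all names are made up for illustration, standing in for a real language model): a factual question can be scored by asking which candidate continuation is most probable.

```python
# Toy illustration: any question can be recast as "which continuation is
# most probable?", so a good-enough next-token predictor must encode the
# relevant knowledge. A count-based trigram model stands in for an LLM.
from collections import Counter, defaultdict

corpus = ("the capital of france is paris . "
          "the capital of italy is rome .").split()

# Count trigrams: a crude next-token model over the toy corpus.
trigrams = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    trigrams[(a, b)][c] += 1

def next_token_prob(context, candidate):
    counts = trigrams[context]
    total = sum(counts.values())
    return counts[candidate] / total if total else 0.0

# "What is the capital of France?" recast as a next-token query:
# "... the capital of france is ___"
candidates = ["paris", "rome", "banana"]
best = max(candidates, key=lambda c: next_token_prob(("france", "is"), c))
# best == "paris": answering the question reduces to next-token prediction
```

The trigram model only memorizes surface statistics, of course; the argument upthread is that as prediction quality is pushed further, shallow statistics stop being enough.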


That's the thing. Even though the source of their knowledge is pure condensed wisdom, which amounts to some sort of artificial intelligence, they lack the ability to "think", which is crucial for solving problems.


Mapping of language patterns in vector space is most definitely not "pure condensed wisdom"


Thank you for clarifying this fact. My comment was more about showing signs of intelligence. Maybe I oversimplified my statement too much.


LLMs literally are next token predictors, so I'm not understanding your broader point.


I think this has always been pretty obvious, but the AI faithful have a vested interest in insisting that LLMs can actually think and solve problems.


More shocking are those who insist that the human brain must then also work by just guessing the next missing thing. As if the thought process behind "I'm hungry" starts with "I" and then tries to figure out what best fits next... it's absurd.


The token would be the pure sensation of hunger, not the word for the self, which is merely a convenient abstraction we use to share knowledge with the outside world and across time.

LLMs don't have that sensation (why would they?), but that doesn't mean transformers can only be used for text: https://deepgram.com/learn/applications-of-transformer-model...


Jeez, I don't know how you would think that I thought that LLMs would have a sensation for hunger. That was not even close to my point at all.


> As if the thought process behind I'm hungry starts with "I"

That sounds like {you think that {people who think LLMs work like humans} believe that {the human sensation of hunger} is merely {saying the phrase "I am hungry"}}.


From the conclusion:

> The above makes me feel more optimistic that we have some time before AI becomes generally intelligent and totally disruptive.

I mean, "have some time" sounds good. But trying to quantify it makes me feel less good: since everything in this blog post would've been impossible for an AI 3 years ago, and would've sounded like magic to most people, I don't know where we'll be in 3 years.


All of these articles seem to miss the point that it isn't what AI is today that is scary; it's the doubling period of its abilities. 10 years ago we had Cleverbot. Think about it: in the same year Guardians of the Galaxy came out, our best chatbot incessantly asked you if you were a computer.


"AI" is horribly inefficient. That's why OpenAI says it desperately needs trillions of dollars.

The human brain is more energy-efficient than "AI".

(There is also the comparative environmental impact of "AI" versus the human brain.)


This is more damning for IQ tests than for GPT-4.


LLMs are currently over-represented in these debates, and that's not doing the bigger AI research area any favors. Watching people look for intelligence in LLMs is kind of sad, because the entire thought process seems to be:

1. Pour trillions of texts into a huge artificial neural network;

2. Tweak, tweak, tweak until your eyes bleed;

3. Cross your fingers and hope that magic happens;

4. (=> YOU ARE HERE <=) Get disappointed that magic still isn't happening.

I expect more from intelligent adults. Expecting intelligence from LLMs borders on the medieval belief in alchemy (or phlogiston).


The Bayes-optimal text predictor would have to implicitly model the generative process underlying texts, that is, human minds. So it’s not crazy to think this approach could lead to an intelligent model.
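As a sketch of that argument (this is a standard Bayesian identity, not something from the article): if texts x are produced by latent generative processes z (authors, their beliefs, their goals), the optimal next-token predictor marginalizes over those latents, so driving prediction error down forces the model to implicitly carry a posterior over them:

```latex
p(x_t \mid x_{<t}) \;=\; \sum_{z} p(x_t \mid x_{<t},\, z)\, p(z \mid x_{<t})
```

Whether finite models trained by SGD actually approach this Bayes-optimal behavior is exactly what the thread is arguing about.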


Text is only the result of abstract reasoning, symbolic logic, and who knows what else. That's like doing a code review by focusing on the build artifact and ignoring the source code.


It's like doing billions of code reviews based on billions of build artifacts, so that you start to notice that certain patterns are more common than others, and start to be able to reverse-engineer the compiler and source language. The best predictive model for executable binaries would be a generative model over possible source codes and then a model of the compiler.


Well, LLMs do even trillions of "reviews" and are still not much wiser than before. Clearly, pieces are missing. I find it interesting that people assume that number_of_samples correlates with intelligent_behaviour, for some reason.

Your example would maybe work if an intelligent agent had access to many compilers and reverse-engineered code bases, and if the algorithms were 1000x better than today's. Otherwise it can guess until the cold death of the Universe without making progress.


> Well LLMs do even trillions of "reviews" and are still not much wiser than before.

Huh? A trained LLM is much 'wiser' than one that's just freshly randomly initialised.

You are right that they are still far from perfect, and we hope for future improvements in technology.


Why give a crap? The test of an AI is "can it perform actual work?"

That's the ONLY test...

Can this thing we made with computers and extruded aluminum actually do THING?

Why are we not approaching AI as perfect slavery?

I built you to do X. DO X. Else, rebuild.


GPT-n was not built to do ∀x ∈ {creative writing assistance, translation, coding, general advice, assisting with learning new topics, grammar and language correction}; that was all just an accidental side benefit of natural language prediction.

The problem with IQ tests is that we're still arguing over how good they are at measuring general intelligence, not that "it doesn't matter what the general intelligence of any specific AI might be". If we imagine some AI which everyone agrees has an IQ of ${number}, we can directly convert that into economic impact. Absent that, we're all guessing in the dark based on bad generalisations of something we've never seen before, like the story of the four blind men and the elephant.

> Why are we not approaching AI as perfect slavery?

In addition to not really knowing what intelligence is, we also don't really know what qualia are, so we can't yet rule out the possibility that human-level performance requires the capacity to meaningfully suffer when enslaved.


I really appreciated your reply, thank you


Because the ability to deal with novel unknowns is exactly what is needed for most things we're trying to automate with AI.


Which IQ tests don't test for, so yeah: who cares what an AI's performance on an IQ test is for anything more than "I did this for fun and it means nothing"?


IQ tests absolutely test for the ability to reason through novel unknowns.


This is how I look at this issue. I am a layman.

The question of intelligence is more or less irrelevant for LLMs. All we need is something that can mimic an average human, and I would say we have arrived at that point.

How many people have truly new ideas? So from my perspective we've achieved AGI.


So 'AGI' is achieved, but the actual AI is not.



