that's a tokenization issue. every tool has strengths and weaknesses. why does it matter whether an LLM can compare numbers? that can be done trivially in any programming language
As an AI layman (downloaded Claude for Android as a result of HN, just today), "why does it matter whether an LLM can compare numbers?" is rather important to me.
Probably others, also.
I was going to say beware because there isn't an Anthropic Claude official app.... but I checked and I guess as of today there is one hah. https://www.anthropic.com/news/android-app
i can understand that perspective. as an end user, you would like your application to handle math questions correctly. it's true that llms are not the best at math
as an application developer, if we need an llm to be good at math, one solution is to give it access to a python interpreter
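to make the "give it a tool" idea concrete, here is a minimal sketch of what the application side might look like. it assumes the model emits a tool call as a dict with "name" and "arguments" keys; that shape, and the names compare_numbers, TOOLS and handle_tool_call, are made up for illustration and not taken from any particular vendor's api.

```python
from decimal import Decimal

def compare_numbers(a: str, b: str) -> str:
    """Hypothetical tool: compare two decimal strings exactly."""
    # Decimal parses the strings as exact decimal numbers, so the result
    # does not depend on tokenization or float rounding.
    x, y = Decimal(a), Decimal(b)
    if x > y:
        return f"{a} > {b}"
    if x < y:
        return f"{a} < {b}"
    return f"{a} == {b}"

# Hypothetical registry of tools the application exposes to the model.
TOOLS = {"compare_numbers": compare_numbers}

def handle_tool_call(call: dict) -> str:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    return TOOLS[call["name"]](**call["arguments"])

print(handle_tool_call({"name": "compare_numbers",
                        "arguments": {"a": "9.11", "b": "9.9"}}))
```

the model only has to decide to call the tool and pass the two numbers through; the comparison itself is done by ordinary code.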
It doesn't seem super relevant to karpathy announcing he's created a company so that he can increase the production value of his AI YouTube videos
I mean, sure, the company is ostensibly going to also teach math at some point, but karpathy will not be using GPT-4o for that when it launches (what do you think his timelines are? Do you think he won't be able to solve trivial things like "having the llm use something like function calling to do math"? If you're unfamiliar with his work, karpathy is a very good engineer, and this is a small problem that anybody building production apps on LLMs can easily deal with)
If I'm going to augment my education with AI, I'd at least want to know it could get basic numerical facts right. If a computer program struggles with the concept of a number being greater than another number, how do I have any confidence that it can teach physics?
your concern would be valid if an llm were the ONLY tool being used. applications use multiple tools so you can use the appropriate tool for the job. if you're doing math, you don't want a standalone llm
As a student, how would I know what things the LLM tutor can provide correct answers for, and what things I will need to "use appropriate tools" for? Should I rely on the LLM to help teach me spelling, or US history, or are there more appropriate tools for these, too?
if the product is great, hopefully you as the user would not have to worry about which tool is doing which task. the developers would worry about that. it's the same in any app. the user doesn't know or care what tool is used to render the frontend or store the data in the backend
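a rough sketch of what "the developers would worry about that" could mean in practice: the app routes a question to a deterministic tool when it recognizes one, and falls back to the model otherwise. everything here is an assumption for illustration. ask_llm is a stand-in for a real model call, and the keyword-plus-regex check is a deliberately naive router, not how any real product is known to work.

```python
import re
from decimal import Decimal

# Naive pattern for "<number> or <number>" style comparison questions.
_COMPARISON = re.compile(
    r"(-?\d+(?:\.\d+)?)\s+(?:or|vs\.?|and)\s+(-?\d+(?:\.\d+)?)"
)

def ask_llm(question: str) -> str:
    """Placeholder for a real model call (hypothetical)."""
    return f"[llm answer to: {question}]"

def answer(question: str) -> str:
    """Route numeric comparisons to exact arithmetic, everything else to the LLM."""
    q = question.lower()
    match = _COMPARISON.search(q)
    if match and any(word in q for word in ("bigger", "larger", "greater")):
        a, b = match.group(1), match.group(2)
        # Exact decimal comparison; the user never sees which tool answered.
        return a if Decimal(a) >= Decimal(b) else b
    return ask_llm(question)

print(answer("Which is bigger, 9.11 or 9.9?"))
print(answer("Who wrote Hamlet?"))
```

from the user's side both questions go into the same box; only the routing code knows which one touched the model.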
Since we're all just speculating to the wind here, I can see multiple ways LLMs can be used. Maybe it'll help simplify TA triage, maybe it'll just be a Discord bot. Maybe a classifier will sample from multiple models.
I think if anyone can give this idea a fair shot it's Andrej Karpathy, an ML expert and a person known to be passionate about education.
applications can use multiple tools. an llm is just one tool, and it will not cover every use case, such as math. that does not detract from the utility of llms
1: https://x.com/liujc1998/status/1813244909501182310