
32" when?



Hey community!

SOGS compression keeps coming up here as a go-to method for reducing Gaussian Splatting model sizes. I put together a deep dive into how SOGS actually works under the hood, with some practical insights on how it can be used in production.

The article stays fairly high-level, but I'm happy to dive into specifics in the comments. I learned quite a bit from implementing my own version of SOGS compression.
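The core trick can be sketched in a few lines. This is a toy illustration of the idea only, not the actual SOGS implementation: reorder splat attributes so that similar values end up next to each other, then quantize each channel to 8 bits. In real SOGS the sorted 2D grids are handed to a standard image codec, which compresses smooth grids far better than shuffled ones; the 1D sort below is just a stand-in for the self-organizing 2D layout step.

```python
# Toy sketch of the SOGS idea (illustrative, not the real implementation):
# reorder a Gaussian attribute so neighbors are similar, then quantize to
# 8 bits. An image codec would then see a smooth grid instead of noise.
import random

def quantize(values, bits=8):
    """Uniformly quantize floats to integers, keeping min/scale for decode."""
    lo, hi = min(values), max(values)
    scale = (2**bits - 1) / (hi - lo or 1.0)
    codes = [round((v - lo) * scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return [c / scale + lo for c in codes]

random.seed(0)
opacity = [random.random() for _ in range(1024)]  # one splat attribute

# Sorting stands in for the "self-organizing" layout: neighboring entries
# become similar, which is what makes the image codec effective.
sorted_vals = sorted(opacity)

codes, lo, scale = quantize(sorted_vals)
restored = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(sorted_vals, restored))
print(f"max quantization error: {max_err:.5f}")  # bounded by ~1/(2*255)
```

The lossy part is the quantization (error bounded by half a code step); the sorting itself is lossless as long as the ordering can be reconstructed or is simply baked into the stored layout.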


For some time I've had a feeling that Apple actually IS playing the hardware game in the age of AI. Even though they are not actively innovating on AI software or shipping AI products, their hardware (especially the unified memory) is great for running large models locally.

You can't get a consumer-grade GPU with enough VRAM to run a large model, but you can do so with MacBooks.

I wonder if doubling down on that and shipping devices that let you run third party AI models locally and privately will be their path.

If only they made their unified memory faster, as memory bandwidth seems to be the biggest bottleneck for LLM tokens/sec performance.
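A rough back-of-envelope shows why bandwidth dominates: single-stream decoding has to stream roughly every active weight from memory for each generated token, so tokens/sec is bounded above by bandwidth divided by model size. The bandwidth and size figures below are approximate assumptions, not measurements.

```python
# Back-of-envelope: decoding is memory-bandwidth-bound, since each token
# reads (roughly) every active weight from memory once. Upper bound:
# tokens/sec ≈ bandwidth / model size. Numbers are rough assumptions.

def peak_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    return bandwidth_gb_s / model_size_gb

model_gb = 70 * 0.5 * 1.15  # 70B params at ~4 bits/param, ~15% overhead
for name, bw in [("M4 Max (~546 GB/s)", 546),
                 ("Ryzen AI Max+ 395 (~256 GB/s)", 256),
                 ("RTX 5090 (~1792 GB/s)", 1792)]:
    print(f"{name}: <= {peak_tokens_per_sec(bw, model_gb):.1f} tok/s")
```

Real throughput lands well below this ceiling once compute, KV-cache reads, and software overhead are counted, which is consistent with the single-digit tok/s figures quoted elsewhere in this thread.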


> You can't get a consumer-grade GPU with enough VRAM to run a large model, but you can do so with macbooks.

You can if you're willing to trust a modded GPU with leaked firmware from a Chinese backshop.


Short of flying to China and buying in person, how can an American find/get one of these?


True, but Apple is a consumer hardware company, which requires billions of users at their scale.

We may care about running LLMs locally, but 99% of consumers don't. They want the easiest/cheapest path, which will always be the cloud models. Spending ~$6k (what my M4 Max cost) every N years, since models/HW keep improving, just to run a somewhat decent model locally isn't a consumer thing. Nonviable for a consumer hardware business at Apple's scale.


This. If we plateau around current SOTA LLM performance and 192/386GB of memory can run a competitive model, Apple computers could become the new iPhone. They have a unique and unmatched product because of their hardware investment.

Of course nobody knows how this will eventually play out. But people without inside information on what these big organizations are working on can't make such predictions.


On a hypothetical 70b q4 model, the Ryzen AI Max+ 395 (128GB memory with 96GB allocated to the iGPU) delivers ~2–5 tokens/sec, slightly trailing the M4 Max's ~3–7 tokens/sec. I expect AMD's next generation can easily catch up to or surpass the M4 Max.

A pair of MaxSun/Intel Arc B60 48GB GPUs (dual 24GB B580s on one card) for $1,200 each also outperforms the M4 Max.


This isn’t a great point. “A hypothetical model with hypothetical hardware will beat Apple on a hypothetical timeline.”

The tangible hardware you point out is $2,400 for two niche-specific components, vs. the Apple hardware, which benefits more general use cases.


> A pair of MaxSun/Intel Arc B60 48GB GPUs (dual 24GB B580's on one card)

please point me to the laptop with these


I think it is a given that they are aiming for a fully custom training cluster, with custom training chips and inference hardware. That would align well with their abilities, and it actually isn't too hard for them to pull off, given that they already have very decent processors, GPUs, and NPUs.


>I think it is a given that they are aiming for a fully custom training cluster with custom training chips and inference hardware.

It is? I haven't seen anything about this.


They're working on (and are almost done with) a CUDA backend for MLX, their Apple Silicon ML framework:

https://github.com/ml-explore/mlx/pull/1983


Memory is not in any way some crucial advantage; you were just tricked into thinking it is because memory is used for market segmentation, and nobody would slaughter their datacenter cash cow. Inference, and god forbid training, on consumer Apple hardware is terrible and far behind.


Show me other consumer hardware that handles inference and/or training better. How many RTX 5090s would you need?


https://liliputing.com/nvidia-dgx-spark-is-3000-ai-supercomp...

looks like there will be several good options "soon"?


This is cool! Nvidia should sell notebooks, too.


I think Nvidia is trying to create a sort of reference platform and have other OEMs produce mass-market products, so a laptop might happen even if Nvidia doesn't make one themselves.


For local inference, Macs have indeed shone through this whole LLM thing and come out as the preferred device. They are great, the dev experience is good, and speeds are OK-ish (a bit slower with the new "thinking" models / agentic use with lots of context, but still manageable).

But Nvidia isn't that far behind, and has already moved to regain some ground with its RTX PRO 6000 "workstation" GPUs. You get 96GB of VRAM for ~$7.5k, which is more than a comparably specced Mac, but not the $30k you previously had to shell out for top-of-the-line GPUs. So you get a "prosumer" 5090 with a bit more compute and 3x the VRAM, in a computer that can sell for <$10k and beat any Mac at both inference and training, for things that fit in that VRAM.

Macs still have the advantage for larger models, though. The new DGX Spark should join that market soon(tm). But they allegedly ran into problems on several fronts. We'll have to wait and see.


It's weird to see only one word at a time. I don't think this reflects my true typing speed, as I pause briefly to read each word once it shows on the screen. When I can see multiple words ahead, I read them while typing and naturally remove that pause.


Yeah, every new word flushes the entire pipeline. I end up typing long words that I happen to know (prestidigitation) faster than short words, because I can "fill up the pipeline".

Not clear if there's any intelligence selecting the words, but on the whole I still find https://typeracer.com more fun.
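That pipeline intuition is easy to model. A toy sketch with made-up timing constants: with lookahead, reading the next word overlaps typing the current one, so the per-word cost is max(read, type) instead of read + type.

```python
# Toy model of the "pipeline flush": with lookahead you read the next word
# while your fingers type the current one, so per-word cost is
# max(read, type); one-word-at-a-time serializes them into read + type.
# Timing constants are made up for illustration.

def wpm(words, per_word_seconds):
    return 60 * words / sum(per_word_seconds)

read_s = 0.25                       # time to perceive a word
type_s = [0.45, 0.30, 0.60, 0.40]   # typing time per word

serial = [read_s + t for t in type_s]         # word revealed only after typing
pipelined = [max(read_s, t) for t in type_s]  # upcoming words visible

print(f"one word at a time: {wpm(4, serial):.0f} WPM")
print(f"with lookahead:     {wpm(4, pipelined):.0f} WPM")
```

It also explains the long-word effect above: the longer the word, the more the fixed read cost is hidden behind typing time, so the relative penalty of the flush shrinks.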


Yes, I get half of my normal WPM here because I can't look ahead.

I recommend using https://typing-speed-test.aoeu.eu/ instead.


I also find it difficult to read the next word because the animation obscures it; this has my net WPM at 61, but my Monkeytype average is ~120.


Absolutely agree. I find https://10fastfingers.com much better.


The animation is also quite confusing, _the bubble explosion_.


Working on a platform to host and share 3D Gaussian Splatting models.

The key goal is for creators of 3DGS models to be able to use Blurry as a powerful tool to build 3D experiences that are performant, simple, and aesthetically pleasing for end users (viewers).

3DGS models can be shared via a link or embedded in a website, Notion, etc.

Link: https://useblurry.com


I'm not convinced that there is one generalised solution to sync engines. To make them truly performant at large scale, engineers need a deep understanding of the underlying technology (query performance, the database, networking) and have to build a custom sync engine around their product and their data.

Abstracting all of this complexity away into one general tool/library and pretending that it will always work is snake oil. There are no shortcuts to building a truly high-quality product at large scale.


We've built a sync engine from scratch. Our app is a multiplayer "IDE" but for tasks/notes [1], so it's important to have a fast local-first/offline experience like other editors, and to have changes sync in the background.

I definitely believe sync engines are the future as they make it so much easier to enable things like no-spinners browsing your data, optimistic rendering, offline use, real-time collaboration and so on.

I'm also not entirely convinced yet, though, that it's possible to get away with something that's not custom-built, or at least large parts of it. There were so many micro decisions and trade-offs going into the engine:

- What is the granularity of updates (characters, rows?) that we need, and how does that affect performance?

- Do we need a central server for things like permissions and real-time collaboration? If so, do we want just deltas, or also state snapshots for speedup?

- How much versioning do we need, and what are the implications of that?

- Is there end-to-end encryption, and how does that affect what the server can do?

- What kind of data structure is being synced: a simple list/map, or a graph with potential cycles?

- What kind of conflict-resolution business logic do we need, and where does that live?

It would be cool to have something general purpose so you don’t need to build any of this, but I wonder how much time it will save in practice. Maybe the answer really is to have all kinds of different sync engines to pick from and then you can decide whether it's worth the trade-off not having everything custom-built.

[1] https://thymer.com
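As one concrete example of the conflict-resolution question, here is a minimal last-write-wins (LWW) map merge, a common baseline answer. This is a hypothetical sketch, not how Thymer's engine actually works: each replica keeps (value, timestamp, client_id) per key, and merging takes the newest write, breaking timestamp ties by client id so all replicas converge.

```python
# Minimal last-write-wins (LWW) map merge. Hypothetical sketch: each replica
# stores {key: (value, ts, client_id)}; merge keeps the entry with the
# highest (ts, client_id), so the operation is commutative and idempotent.

def merge(local, remote):
    """Merge two replicas of a {key: (value, ts, client_id)} map."""
    merged = dict(local)
    for key, entry in remote.items():
        if key not in merged or entry[1:] > merged[key][1:]:
            merged[key] = entry
    return merged

a = {"title": ("Buy milk", 5, "alice"), "done": (False, 3, "alice")}
b = {"title": ("Buy oat milk", 7, "bob")}
print(merge(a, b))
# "title" takes bob's newer write; "done" survives from alice's replica
```

Even this tiny baseline bakes in answers to several of the questions above: per-key granularity, no central server required, no history kept, and "newest wins" as the business logic. Changing any of those answers changes the engine.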


Optimally, a sync engine could be configured with the best settings for the project (e.g. central server or completely decentralised). It'd be great if one engine were that performant/configurable, but having a lot of sync engines to choose from for your project is the best alternative.

btw: excellent questions to ask / insights; about the same ones I also came across in my lo-fi ventures.

Would be great if someone assembled all these questions into a step-by-step "walkthrough" interface where, at the end, the user gets a list of the best-matching engines.

Edit: Mh ... maybe something small enough to vibe code ... if someone is interested in helping, let me know!


Completely decentralized is cool, but I think there are two key problems with it.

1) In a decentralized system, who is responsible for backups? What happens when you restore from a backup?

2) In a decentralized system, who sends push notifications and syncs with mobile devices?

I think that in an age of $5/mo cloud VMs and free SSL, having a single coordination server has all the advantages and none of the downsides.


- You can have many sync engines

- Sync engines might only solve small and medium scale, that would be a huge win even without large scale


> Abstracting all of this complexity away in one general tool/library and pretending that it will always work is snake oil.

Remember Meteor?


That might be true, but you might not have those engineers or they might be busy with higher-priority tasks:

> It’s also ill-advised to try to solve data sync while also working on a product. These problems require patience, thoroughness, and extensive testing. They can’t be rushed. And you already have a problem on your hands you don’t know how to solve: your product. Try solving both, fail at both.

Also, you might not have that "large scale" yet.

(I get that you could also make the opposite case, that the individual requirements for your product are so special that you cannot factor out any common behavior. I'd see that as a hypothesis to be tested.)


Damn. Here’s the upvote


Did anyone get a faster click speed than 45ms?


At first I was getting 46, then got it down to 32, then 26, then 11. I stopped trying after that, don’t want to waste more than a minute on this.

For reference, this was on mobile and it did cause the screen to zoom in and out on occasion due to the fast double taps.


I got 2ms but I think this is just lagging in Safari


I got 16ms on mobile by laying it flat on my desk and spamming with both index fingers.


38, but only by tabbing to the button and finger drumming the enter key


The space key also works, so you can have one hand tapping Enter and Space while the other works the mouse button.


I seem to be capped at 98ms (FF/Chrome, Ubuntu 24.04).


> think of your queries as super human friendly SQL

> The database? Massive amounts of data boiled down to unique entries with probabilities. This is a simplistic, but accurate way to think of LLMs.

I disagree that this is an accurate way to think about LLMs. LLMs use a finite number of parameters to encode the training data. The amount of training data is massive compared to the number of parameters, so LLMs have to be somewhat capable of distilling that information into small pieces of knowledge they can then reuse to piece together a full answer.

That said, they are not capable of producing an answer outside of the training set distribution, and they inherit all the biases of the training data, as that is what they are trying to replicate.

> I guess my point is, when you use LLMs for tasks, you're getting whatever other humans have said. And I've seen some pretty poor code examples out there.

Yup, exactly this.
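A toy character-bigram "language model" makes both points concrete: the model really is distilled statistics of its training text, and it has nothing to say about contexts it never saw.

```python
# Toy character-bigram "LLM": counts from training text become next-token
# probabilities. Illustrates both points above: the model is distilled
# statistics of its training data, and it can only continue contexts that
# appeared in that data.
from collections import Counter, defaultdict

def train(text):
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

model = train("the cat sat on the mat")
print(model["t"])    # P(next char | 't'), straight from the counts
print("z" in model)  # False: unseen context, the model has nothing to say
```

Real LLMs compress far more aggressively, of course, generalizing across contexts rather than storing raw counts, which is exactly why "a database of probabilities" is a useful but incomplete mental model.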

