1. I have an M3 Ultra with 256GB of memory, but the options list only goes up to 192GB. The M3 Ultra supports up to 512GB.
2. It'd be great if I could flip this around: choose a model, then see the performance across all the different processors. That would help with buying decisions!
I'm sorry, but spending this kind of money when you could have just built yourself a dual 3090 workstation, which would have been better for pretty much everything including local models, is just plain stupid.
Hell, even one 3090 can now run Gemma 3 27B QAT very fast.
Are you aware that your 3090s have nowhere close to 256GB of VRAM?
Or maybe you are not aware that Macs have unified memory (working as both RAM and VRAM).
Did I? Not only are you comparing apples to oranges, you even provide misleading numbers.
A 3090 gets 20-30 tokens a second on dense ~30B models (QwQ 32B, Gemma 3 27B Q4), similar to the M3 Ultra.
If you are talking about Qwen3-Coder 30B (MoE), then both the 3090 and the M3 Ultra are around ~70 tok/s.
But even if you were right about the speed - which you are not - speed is pointless if you need a large model that won't fit into your VRAM.
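To put rough numbers on the "does it fit" point, here is a back-of-the-envelope sketch. It only estimates the size of the weights from parameter count and bits per weight; the `overhead` factor of 1.2 (for KV cache, activations, and runtime buffers) is my own assumption, not a measured value, and the function names are made up for illustration. The 24 GB figure is the 3090's VRAM, 48 GB is two of them, and 256 GB is the unified memory of the M3 Ultra mentioned above.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead: float = 1.2) -> bool:
    """True if weights plus an ASSUMED overhead factor fit in memory_gb."""
    return weight_gb(params_billion, bits_per_weight) * overhead <= memory_gb


if __name__ == "__main__":
    # Memory budgets discussed in this thread.
    budgets = {"1x 3090": 24, "2x 3090": 48, "M3 Ultra 256GB": 256}
    # Illustrative model sizes in billions of parameters, at 4-bit quantization.
    for params in (27, 70, 235):
        for name, gb in budgets.items():
            verdict = "fits" if fits(params, 4, gb) else "does not fit"
            print(f"{params}B @ 4-bit ({weight_gb(params, 4):.1f} GB weights): "
                  f"{verdict} in {name}")
```

Under these assumptions a 4-bit ~27B model fits comfortably on a single 3090, which matches the speed comparison above, while a model in the 200B+ range does not fit on two 3090s at all. Real usable memory is lower than the headline number on both platforms, so treat this as a first filter only.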