> At Google/FB scale tweaks to improve cost may make sense, but it doesn't seem like, architecturally speaking, there is some major design decision being left on the table anymore (I'd love to hear a specific counterpoint though!).
I'd expect if I knew any, they'd be under NDA. I'll just point out that a GPU, even one with specific "ML cores" as NVIDIA calls them, is going to have a bunch of silicon that is being used inefficiently (for more "conventional" GPU uses). There's room for cost saving there. Perhaps NVIDIA eventually moves into that space and produces ML-only chips, but they don't appear to be heading in that direction yet.
There are many researchers not bound by NDAs making novel chip architectures, some of which are on HN.
Secondly, GPUs don't really have very much hardware specialized for graphics anymore. If we called it a TPU I'm not sure you'd be making the same point :P At a company like MS/FB/Google, where you don't need to leverage selling the same chip for gaming/VR/ML/mining for economy of scale, you can, like you said, reduce your transistor count and keep the same compute fabric. That would reduce your idle power consumption through leakage, but you wouldn't expect a huge drop in power during active compute. Because lower-precision compute increases the compute available per byte of memory bandwidth, you either need to find more parallelism to RAM, get faster RAM, reduce the clock rate of the compute, or reduce the number of compute elements to arrive at a balanced architecture at lower precision. If you just shrink the number of compute elements - voila! - you're close to what NVIDIA is doing with its ML cores.
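The balance argument above can be sketched with back-of-the-envelope numbers. This is a rough roofline-style calculation with made-up (illustrative) hardware figures, not real chip specs; the assumption is that halving precision roughly doubles peak compute on the same silicon while DRAM bandwidth stays fixed:

```python
# Illustrative roofline-style sketch. All numbers are assumptions,
# not specs of any real chip.

def machine_balance(peak_flops, mem_bw_bytes_per_s):
    """FLOPs the chip can execute per byte of DRAM traffic."""
    return peak_flops / mem_bw_bytes_per_s

fp32_flops = 20e12   # assumed 20 TFLOP/s at FP32
mem_bw = 1e12        # assumed 1 TB/s memory bandwidth

# Halving precision ~doubles peak compute on the same silicon,
# but memory bandwidth is unchanged.
fp16_flops = 2 * fp32_flops

b32 = machine_balance(fp32_flops, mem_bw)  # 20 FLOPs/byte
b16 = machine_balance(fp16_flops, mem_bw)  # 40 FLOPs/byte

# A kernel stays compute-bound only if its arithmetic intensity
# meets the machine balance; at FP16 that bar doubles, so you must
# add bandwidth, cut compute elements, or lower clocks to rebalance.
print(b32, b16)
```

The doubled balance point is exactly why shrinking the compute array (rather than, say, chasing faster RAM) is one valid way to rebalance the design.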
> Secondly, GPUs don't really have very much hardware specialized for graphics anymore. If we called it a TPU I'm not sure you'd be making the same point :P
I think this is semantics. Modern GPUs do have a lot of hardware that isn't specialized for machine learning. My limited knowledge says that very recent NVIDIA GPUs have some things that vaguely resemble TPU cores ("Tensor cores"), but they also have a lot of silicon for classic CUDA cores. I was calling that "graphics" hardware, but it might be better described as "silicon that isn't optimized for the memory bandwidth DL needs". So it's still used non-optimally.
To be clear, you can still use the CUDA cores for DL. We did that just fine for a long time; they're just decidedly less efficient than Tensor cores.