
I've read that you can't split large layers across separate processors for training, either horizontally (one layer per processor) or vertically (parts of many layers on each processor).


On a shared-memory system there's little need to do that: there's much more parallelism to be had from accelerating fine-grained operations, such as the matrix multiplications that compute each layer's output.
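To illustrate the point above with a minimal sketch (the layer sizes are made up): on one machine, a dense layer's entire forward pass is a single matrix multiplication, which a multithreaded BLAS already spreads across all cores, so there's nothing left to gain from manually splitting the layer.

```python
import numpy as np

batch, d_in, d_out = 64, 1024, 512
x = np.random.rand(batch, d_in).astype(np.float32)   # layer input
W = np.random.rand(d_in, d_out).astype(np.float32)   # layer weights
b = np.zeros(d_out, dtype=np.float32)                # layer bias

# One matmul computes the whole layer's output; NumPy's BLAS backend
# parallelizes it across cores on a shared-memory machine.
h = x @ W + b
print(h.shape)  # (64, 512)
```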

On a distributed system, splitting layers between machines for distributed training is pretty much what Google initially designed TensorFlow for. It generally scales less well, because massive amounts of data must be communicated between nodes, and network throughput is far lower than GPU/TPU memory bandwidth.
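A toy sketch of that kind of model parallelism (the sizes and helper names are illustrative, and the two "devices" are just Python lists standing in for separate machines): the layers are partitioned across devices, and the activation tensor that crosses the partition boundary is exactly the data that would have to travel over the network in a real cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(d_in, d_out):
    # small random weights for a toy dense layer
    return (rng.standard_normal((d_in, d_out)) * 0.01).astype(np.float32)

# device 0 holds the first two layers, device 1 the last two
device0 = [make_layer(256, 128), make_layer(128, 128)]
device1 = [make_layer(128, 64), make_layer(64, 10)]

def forward(layers, x):
    for W in layers:
        x = np.maximum(x @ W, 0.0)   # dense layer + ReLU
    return x

x = rng.standard_normal((32, 256)).astype(np.float32)
a = forward(device0, x)   # computed on "device 0"
# `a` (a 32x128 float32 tensor, 16 KB) must cross the interconnect here;
# on real hardware this transfer, not the compute, becomes the bottleneck.
y = forward(device1, a)   # computed on "device 1"
print(a.nbytes, y.shape)
```

The per-step communication volume grows with batch size and layer width, which is why the comment above notes that this scheme scales less well than on-chip parallelism.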



