In deep ReLU networks that aren't renormalized in between layers (like OverFeat, which has no normalization), some of the activations get pretty big! (on the order of 1e3).
You also can't clip them and get away with it; you either have to renormalize the layers to do half precision (and live with the extra cost) or stick to full precision. I was doing fun stuff earlier this year with fixed-precision nets (8-bit/16-bit). Things get very interesting :)
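A rough toy version of what I mean (made-up sizes and weight scale, not the actual OverFeat weights), just to show the drift in an un-normalized ReLU stack and why fp16 gets uncomfortable there:

    import numpy as np

    # Toy un-normalized ReLU stack: arbitrary width and a slightly "expansive"
    # weight scale so the activation magnitude drifts upward with depth.
    rng = np.random.default_rng(0)
    x32 = rng.standard_normal(512).astype(np.float32)
    x16 = x32.astype(np.float16)

    for layer in range(16):
        W = (rng.standard_normal((512, 512)) * (2.0 / np.sqrt(512))).astype(np.float32)
        x32 = np.maximum(W @ x32, np.float32(0))                     # fp32 reference
        x16 = np.maximum(W.astype(np.float16) @ x16, np.float16(0))  # same net run in fp16
        print(layer, float(x32.max()), float(x16.max()))
    # fp16 tops out at 65504, and at magnitudes around 1e3 adjacent representable
    # values are already ~0.5-1 apart, so once activations drift up there the fp16
    # copy starts to diverge from the fp32 one instead of just being a cheaper format.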
That's plenty, IMO, for most inputs and weights. Where it gets tricky is accumulation. You could constrain the weights for each unit, I guess, but that's the sort of work best done under the hood rather than by the data scientist. I'd personally choose 32-bit accumulation just because it drastically simplifies code development.
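Toy illustration of the accumulation point (the sizes and the constant value 100 are arbitrary, chosen only so the sum is easy to check by hand):

    import numpy as np

    # One unit with 1024 int8 inputs and int8 weights; each product fits in
    # 16 bits, but the per-unit sum (1024 * 100 * 100 = 10,240,000) needs ~24 bits.
    w = np.full(1024, 100, dtype=np.int8)
    x = np.full(1024, 100, dtype=np.int8)

    prods = w.astype(np.int32) * x.astype(np.int32)
    acc32 = prods.sum(dtype=np.int32)                   # wide accumulator: exact
    acc16 = prods.astype(np.int16).sum(dtype=np.int16)  # narrow accumulator: wraps mod 2**16

    print("int32 accumulation:", int(acc32))  # 10240000
    print("int16 accumulation:", int(acc16))  # 16384 (= 10240000 % 65536), silently wrong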
I've also worked with fixed precision elsewhere. It's awesome if you understand the dynamic range of your application. It's a migraine headache if you don't.
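Toy example of what getting the dynamic range wrong looks like in fixed point (the Q4.11 format and the test values are made up):

    import numpy as np

    FRAC_BITS = 11  # Q4.11 in 16 bits: 1 sign bit, 4 integer bits, 11 fractional bits

    def to_q4_11(x):
        # store round(x * 2**11) as int16, saturating at the int16 limits
        q = np.round(np.asarray(x, dtype=np.float64) * 2**FRAC_BITS)
        return np.clip(q, -32768, 32767).astype(np.int16)

    def from_q4_11(q):
        return q.astype(np.float64) / 2**FRAC_BITS

    vals = np.array([0.01, 3.7, 15.9, 900.0])
    back = from_q4_11(to_q4_11(vals))
    for v, b in zip(vals, back):
        print(f"{v:8.2f} -> {b:10.4f}")
    # In-range values round-trip with ~2**-11 error, but 900.0 saturates to ~16.0
    # because Q4.11 only covers [-16, 16); if your activations can hit 1e3, this is
    # simply the wrong format -- that's the dynamic-range guesswork.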