Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's a muddy comparison given that NumPy is commonly used with other BLAS implementations, which the author even lists, but doesn't properly address. Anaconda defaults to Intel oneAPI MKL, for example, and that's a widely used distribution. Not that I think MKL would do great on AMD hardware, BLIS is probably a better alternative.

The author also says "(...) implementation follows the BLIS design", but then proceeds to compare *only* with OpenBLAS. I'd love to see a more thorough analysis, and using C directly would make it easier to compare multiple BLAS libs.



Their implementation outperforms not only the recent version of OpenBLAS but also MKL on their machine (these are DEFAULT BLAS libraries shipped with numpy). What's the point of compairing against BLIS if numpy doesn't use it by default? The authors explicitly say: "We compare against NumPy". They use matrix sizes up to M=N=K=5000. So the ffi overhead is, in fact, neglectable.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: