Nice read!

I'm curious as to what settings you used for product quantization, and the general setup that you tried. In my experience, if tuned properly, quality degrades fairly smoothly as you go from raw float32 to float16, through 8/6/4-bit scalar quantization, and on to product quantization at a very small number of bits per vector. The quantization method is independent of the search method used (brute force, cell probe with inverted lists (IVF), cell probe with IMI inverted lists, HNSW), though both affect the quality of results.
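
To put rough numbers on that ladder, here is a minimal sketch of the per-vector code size at each step. The function name, the encoding labels, and the choice of d = 128 are my own assumptions for illustration, not Faiss API; it only counts the encoded vector itself and ignores codebooks, vector ids, and any index structure overhead.

```python
def bytes_per_vector(d, encoding, pq_m=None, pq_nbits=8):
    """Approximate storage for one encoded d-dimensional vector, in bytes.

    Hypothetical helper for illustration only. Scalar encodings spend a
    fixed number of bits per component; product quantization (PQ) spends
    pq_nbits bits for each of pq_m sub-vectors, regardless of d.
    """
    bits_per_component = {
        "float32": 32,
        "float16": 16,
        "sq8": 8,
        "sq6": 6,
        "sq4": 4,
    }
    if encoding == "pq":
        # PQ splits the vector into pq_m equal sub-vectors, so d must divide evenly.
        if pq_m is None or d % pq_m != 0:
            raise ValueError("pq_m must be set and must divide d")
        total_bits = pq_m * pq_nbits
    else:
        total_bits = d * bits_per_component[encoding]
    # Round up to whole bytes.
    return (total_bits + 7) // 8


d = 128  # assumed dimensionality, e.g. a SIFT-sized descriptor
ladder = [
    ("float32", {}),
    ("float16", {}),
    ("sq8", {}),
    ("sq6", {}),
    ("sq4", {}),
    ("pq", {"pq_m": 16}),  # 16 sub-quantizers x 8 bits
    ("pq", {"pq_m": 8}),   # 8 sub-quantizers x 8 bits
]
for name, kwargs in ladder:
    label = name if not kwargs else f"{name} (M={kwargs['pq_m']})"
    print(f"{label:>12}: {bytes_per_vector(d, name, **kwargs)} bytes")
```

The point of laying it out this way is that each step is a modest reduction until PQ, where the code size stops depending on d at all, so the number of sub-quantizers becomes the main knob to tune against recall.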

FYI, the Faiss library (used across Facebook and Instagram) has a CPU HNSW index implementation as well, with a GPU one possibly to come at some point.

Generally we do not use HNSW for our largest indices due to the memory overheads and restrictions on vector removal, but HNSW is useful as a top-level index for inverted list searching on these large indices (an index sitting on top of sub-indices), and indeed is possibly required for the largest indices, e.g. https://github.com/facebookresearch/faiss/wiki/Indexing-1T-v...