They claim that they are 10x faster given the high accuracy target (no clue what that means in practice for the AI use case, probably fewer tokens for the LLM). Can you elaborate on why you think hnswlib is still faster? Can you link the benchmark you mention?
Because hnswlib does not use intra-query threading, it will scale much better in terms of total throughput: probably close to 7x-8x with 16 threads on 16 vCPUs (compared to Epsilla, which saturates at a 2.2x improvement from multiple threads).
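To make the intuition concrete, here is a quick Amdahl's-law sketch (my own illustrative model, not numbers from either benchmark). The parallel fractions `p` below are assumptions I picked so the curves land near the speedups discussed above; inter-query parallelism has almost no serial coordination per query, while intra-query fan-out/merge carries a large serial component:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup at n threads when fraction p of the work
    parallelizes and (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Inter-query parallelism (hnswlib-style: each thread runs whole,
# independent queries) -- assumed p ~= 0.93.
inter = amdahl_speedup(0.93, 16)

# Intra-query parallelism with per-query coordination overhead
# (Epsilla-style) -- a p chosen to saturate near the reported 2.2x.
intra = amdahl_speedup(0.58, 16)

print(f"inter-query speedup at 16 threads: {inter:.1f}x")  # ~7.8x
print(f"intra-query speedup at 16 threads: {intra:.1f}x")  # ~2.2x
```

The exact fractions are guesses; the point is only that splitting one query across cores hits a serial-coordination wall long before running independent queries per core does.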
The main premise of Epsilla's solution is trading throughput for latency, which is probably legitimate but would not work for every use case.
Note that even though the hardware between the benchmarks is not controlled (Epsilla only says it is an AWS EC2 16C32G instance, while ann-benchmarks uses an AWS r6i.16xlarge), it does not matter much, since single-threaded CPU speeds have been fairly stagnant over the years, so the ann-benchmarks single-thread results transfer (unless Epsilla is using non-x64 hardware, which would be a weird choice).
There is a constant overhead from communication between the nodes in Epsilla, but since it is constant it should not affect the speed much at high recalls (at which hnswlib is also faster).
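A tiny back-of-the-envelope illustration of why a constant overhead washes out at high recall (the millisecond figures here are made up for the example, not measured from Epsilla): per-query search time grows with the recall target, while the communication cost stays fixed, so its share of total latency shrinks:

```python
# Assumed fixed inter-node communication cost per query (illustrative only).
comm_overhead_ms = 0.5

# Assumed per-query search times as the recall target rises (illustrative).
for search_ms in (0.5, 2.0, 10.0):  # low -> high recall regimes
    total = comm_overhead_ms + search_ms
    share = comm_overhead_ms / total
    print(f"search={search_ms:4.1f} ms  overhead share={share:.0%}")
```

With these made-up numbers the overhead drops from half the latency to roughly 5% of it, which is why the constant term does not change the relative ordering in the high-recall regime.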