They claim that they are 10x faster given the high accuracy target (no clue what that means in practice for the AI use case, probably fewer tokens for the LLM). Can you elaborate on why you think hnswlib is still faster? Can you link the benchmark you mention?
Because hnswlib does not use intra-query threading, it will scale much better in terms of total throughput: probably close to 7x-8x with 16 threads on 16 vCPUs (compared to Epsilla, which saturates at a 2.2x improvement from multiple threads).
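To make the intuition concrete, here is a quick Amdahl's-law sketch (my own illustrative model, not numbers from either benchmark). The parallel fractions `p` below are assumptions I picked so the curves land near the speedups discussed above; inter-query parallelism has almost no serial coordination per query, while intra-query fan-out/merge carries a large serial component:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup at n threads when fraction p of the work
    parallelizes and (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Inter-query parallelism (hnswlib-style: each thread runs whole,
# independent queries) -- assumed p ~= 0.93.
inter = amdahl_speedup(0.93, 16)

# Intra-query parallelism with per-query coordination overhead
# (Epsilla-style) -- a p chosen to saturate near the reported 2.2x.
intra = amdahl_speedup(0.58, 16)

print(f"inter-query speedup at 16 threads: {inter:.1f}x")  # ~7.8x
print(f"intra-query speedup at 16 threads: {intra:.1f}x")  # ~2.2x
```

The exact fractions are guesses; the point is only that splitting one query across cores hits a serial-coordination wall long before running independent queries per core does.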
The main premise of Epsilla's solution is trading throughput for latency, which is probably legitimate but would not work for every use case.
Note that even though the hardware between the benchmarks is not controlled (Epsilla only says it is an AWS EC2 16C32G instance, while ann-benchmarks uses an AWS r6i.16xlarge), it does not matter much, since single-threaded CPU speeds have been fairly stagnant over the years, so the ann-benchmarks single-thread results transfer (unless Epsilla is using non-x64 hardware, which would be a weird choice).
There is a constant overhead from communication between the nodes in Epsilla, but since it is constant it should not affect the speed much at high recalls (at which hnswlib is also faster).
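A tiny back-of-the-envelope illustration of why a constant overhead washes out at high recall (the millisecond figures here are made up for the example, not measured from Epsilla): per-query search time grows with the recall target, while the communication cost stays fixed, so its share of total latency shrinks:

```python
# Assumed fixed inter-node communication cost per query (illustrative only).
comm_overhead_ms = 0.5

# Assumed per-query search times as the recall target rises (illustrative).
for search_ms in (0.5, 2.0, 10.0):  # low -> high recall regimes
    total = comm_overhead_ms + search_ms
    share = comm_overhead_ms / total
    print(f"search={search_ms:4.1f} ms  overhead share={share:.0%}")
```

With these made-up numbers the overhead drops from half the latency to roughly 5% of it, which is why the constant term does not change the relative ordering in the high-recall regime.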