
This is a DuckDB feature that's incredibly hard for Snowflake (or anyone else) to copy. Running the same database client-side (WASM) and server-side can make for a pretty magical experience.

Queries that normally take 1s to 2s can run in 25ms, so you get under the "100ms rule" which is very uncommon in analytics applications.

We run DuckDB server-side and have experimental support for DuckDB WASM on the client side at https://www.definite.app/. Sometimes I don't trust that a query ran because of how fast it can happen (we need some UX work there).



How does that work? Does the client clone the database at the beginning of the session and work with a snapshot? If so, do you automatically and periodically sync it?


With HTTP Range Requests, which are typically used for pausing and resuming large file downloads, the client requests specific byte ranges from the database file. This allows you to retrieve only the data you need. With SQL indexes, the data returned will be minimal because the lookup is optimized. However, if you `select *`, you will still end up downloading the entire database.
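To make the byte-range idea concrete, here is a minimal sketch of how a client could ask for a single page of a remote database file. The URL, the 4096-byte page size, and the helper names are assumptions for illustration, not anything DuckDB WASM is confirmed to do internally. The request is only built, not sent.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build a Range header value for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def build_page_request(url: str, page_number: int, page_size: int = 4096):
    """Prepare (but do not send) a request for one fixed-size page.

    `page_size` is an assumed value; real file formats define their own.
    """
    request = urllib.request.Request(url)
    request.add_header("Range", range_header(page_number * page_size, page_size))
    return request


# Hypothetical URL, used only to show the resulting header.
req = build_page_request("https://example.com/data.db", page_number=1)
print(req.get_header("Range"))
```

A server that honors the header answers with 206 Partial Content and only the requested bytes; one that ignores it sends the whole file with a 200.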


Parent comment isn't asking how data is requested from the back-end.

GP comment is (seemingly) describing keeping an entirely client-side snapshot of the back-end database (data stored locally / in memory).

Parent comment is asking how the two are kept in sync.

It's hard to believe it would be the method you're describing and take 25ms.

If you're doing HTTP range requests, that suggests you're reading from a file, which means object storage or disk.

I have to assume something gets triggered when the back end updates to tell the client to update its instance. (Which very well could just be telling it to execute some SQL to get the new / updated information it needs.)

Or the latest data lives entirely in an in-memory DuckDB instance on the back end, which just needs to return it from memory.


Doesn't that mean you have way more round-trips than necessary? Instead of asking for the row, you ask for the file header, the list of tables and indices, an index page, another index page, another index page, and a table page?
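A rough way to put numbers on that concern, assuming every page is fetched with its own sequential request and nothing is cached. The 50 ms round-trip time is a made-up figure, and the function names are hypothetical.

```python
def round_trips(index_depth: int) -> int:
    """Count requests for one indexed row lookup over range requests.

    Mirrors the sequence above: file header, list of tables and indices,
    one request per index page, then the table page.
    """
    file_header = 1
    schema = 1
    table_page = 1
    return file_header + schema + index_depth + table_page


def lookup_latency_ms(index_depth: int, rtt_ms: float) -> float:
    """Total wait if each request has to finish before the next starts."""
    return round_trips(index_depth) * rtt_ms


# Three index pages, as in the question; 50 ms RTT is an assumption.
print(round_trips(3), lookup_latency_ms(3, 50))
```

Caching the header and schema after the first query would remove two of those requests from every later lookup, but the index and table page fetches remain.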


Yes, we're still fine-tuning exactly what we cache, but a simple example would be:

1. user writes a `select` statement that returns 20k records. We cache the 20k.

2. user can now query the results of #1

We're also working on more complex cases (e.g. caching frequently used tables).
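The two-step flow above can be sketched roughly like this. It uses Python's built-in `sqlite3` as a stand-in for both the server-side database and the client-side DuckDB WASM instance, so it shows the shape of the idea, not the actual implementation. The `orders` table, its columns, and the `result_1` cache name are all invented for the example.

```python
import sqlite3


def make_server_db(n_rows: int = 20_000) -> sqlite3.Connection:
    """Stand-in for the remote database, filled with made-up rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    rows = [(i, "east" if i % 2 else "west", float(i % 100)) for i in range(n_rows)]
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return db


def cache_result(server, local, sql: str, cache_table: str) -> int:
    """Step 1: run the query remotely and copy its rows into a local table."""
    cursor = server.execute(sql)
    columns = [d[0] for d in cursor.description]
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    local.execute(f'CREATE TABLE "{cache_table}" ({column_list})')
    local.executemany(f'INSERT INTO "{cache_table}" VALUES ({placeholders})', cursor)
    return local.execute(f'SELECT COUNT(*) FROM "{cache_table}"').fetchone()[0]


server = make_server_db()
local = sqlite3.connect(":memory:")  # plays the role of the client-side instance

cached = cache_result(server, local, "SELECT id, region, amount FROM orders", "result_1")

# Step 2: follow-up queries hit only the local cache, with no server round trip.
east_total = local.execute(
    "SELECT SUM(amount) FROM result_1 WHERE region = 'east'"
).fetchone()[0]
print(cached, east_total)
```

Nothing here handles invalidation: if the server data changes after step 1, the local table is stale until it is refreshed.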



