This is a DuckDB feature that's incredibly hard for Snowflake (or anyone else) t...

esafak · on April 22, 2025

How does that work? Does the client clone the database at the beginning of the session and work with a snapshot? If so, do you automatically and periodically sync it?

randomtoast · on April 22, 2025

With HTTP range requests, which are typically used for pausing and resuming large file downloads, the client requests specific byte ranges from the file. This lets you retrieve only the data you need. With SQL indexes, the data returned will be minimal because the lookup is optimized. However, if you select *, you will still end up downloading the entire database.
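As a rough illustration of the byte-range idea, here is a minimal Python sketch. It does not touch the network: `serve_range` is a hypothetical stand-in for a server that honours a single `Range: bytes=start-end` header, and the 4096-byte page size and the 16-page "database file" are assumptions made up for the example, not DuckDB's actual file layout.

```python
import re

PAGE_SIZE = 4096  # assumed page size, for illustration only


def range_header(offset, length):
    """Build the header asking for `length` bytes starting at `offset`."""
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def serve_range(blob, headers):
    """Hypothetical server: return (status, body) for a single byte range."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", headers.get("Range", ""))
    if match is None:
        return 200, blob  # no usable Range header: send the whole file
    start, end = int(match.group(1)), int(match.group(2))
    if start >= len(blob) or start > end:
        return 416, b""  # range not satisfiable
    return 206, blob[start:end + 1]  # partial content


# A pretend database file: 16 pages, each filled with its own page number.
database_file = b"".join(bytes([i]) * PAGE_SIZE for i in range(16))

# Fetch only page 3 instead of the whole file.
status, page = serve_range(database_file, range_header(3 * PAGE_SIZE, PAGE_SIZE))
```

Only one page of the 64 KiB file crosses the wire here; a request without a `Range` header would return all of it.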

jasonjmcghee · on April 22, 2025

Parent comment isn't asking how data is requested from the back-end.

GP comment is (seemingly) describing keeping an entirely client-side snapshot of the back-end database (data stored locally / in memory).

Parent comment is asking how the two are kept in sync.

It's hard to believe it would be the method you're describing and take 25ms.

If you're doing HTTP range requests, that suggests you're reading from a file, which means object storage or disk.

I have to assume something gets triggered when the back end updates to tell the client to update its instance. (Which very well could just be telling it to execute some SQL to get the new / updated information it needs.)

Or the data is held entirely in memory on the back end, in an in-memory DuckDB instance with the latest data, and just needs to be retrieved / returned from memory.

immibis · on April 22, 2025

Doesn't that mean you have way more round-trips than necessary? Instead of asking for the row, you ask for the file header, the list of tables and indices, an index page, another index page, another index page, and a table page?

mritchie712 · on April 22, 2025

Yes, we're still fine-tuning exactly what we cache, but a simple example would be:

1. user writes a `select` statement that returns 20k records. We cache the 20k.

2. user can now query the results of #1

we're also working on more complex cases (e.g. caching frequently used tables).
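The two-step example above can be sketched roughly as follows. This is only an illustration under loud assumptions: it uses Python's built-in `sqlite3` as a stand-in for both the remote warehouse and the local cache (not DuckDB, and not the actual implementation), and the `orders` table, its columns, and the `result_1` cache table are hypothetical names.

```python
import sqlite3

# Stand-in for the remote back end (hypothetical schema and data).
backend = sqlite3.connect(":memory:")
backend.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
backend.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 2 else "US", float(i)) for i in range(20_000)],
)

# Step 1: run the user's select once against the back end and cache the 20k rows.
rows = backend.execute("SELECT id, region, amount FROM orders").fetchall()
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE result_1 (id INTEGER, region TEXT, amount REAL)")
cache.executemany("INSERT INTO result_1 VALUES (?, ?, ?)", rows)

# Step 2: follow-up queries read only the local cache, not the back end.
eu_count, eu_total = cache.execute(
    "SELECT COUNT(*), SUM(amount) FROM result_1 WHERE region = 'EU'"
).fetchone()
```

After step 1, the second query never contacts `backend`, which is where the latency saving would come from. Keeping the cached result in sync with later back-end changes is a separate problem that this sketch does not address.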