One nice thing about CSV files being zipped and served via the web is that they can be streamed directly into the database incredibly fast without having to persist them anywhere (aside from the db).
You can load the zip file as a stream, read the CSV line by line, transform it, and then load it to the db using COPY FROM stdin (assuming Postgres).
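For example, here's a minimal sketch of that with Python and psycopg 3. The URL, the trips table, and its columns are hypothetical (the table is assumed to already exist), and the zip is buffered in memory rather than written to disk, since zipfile needs a seekable object:

```python
import csv
import io
import zipfile

import psycopg   # psycopg 3
import requests

URL = "https://example.com/data/trips.zip"   # hypothetical feed

resp = requests.get(URL)
resp.raise_for_status()

# Hold the archive in memory only; nothing ever touches disk.
archive = zipfile.ZipFile(io.BytesIO(resp.content))
csv_name = archive.namelist()[0]

with psycopg.connect("dbname=example") as conn, conn.cursor() as cur:
    with archive.open(csv_name) as raw:
        reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
        next(reader)  # skip the header row
        with cur.copy("COPY trips (trip_id, started, distance) FROM STDIN") as copy:
            for row in reader:
                # Transform each row here before it hits the table.
                copy.write_row(row)
```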
It isn't. But that's easily mitigated with temp tables, an ephemeral database, COPY, and so on.
Upstream can easily f-up and (accidentally) delete production data if you do this on a live db, which is why PostgreSQL and nearly all other DBs have a myriad of tools to solve this by not doing it directly on a production database.
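A rough sketch of that staging-table pattern (hypothetical trips table; load_rows is a made-up helper standing in for the COPY step above): the fresh data lands in a temp table, gets a basic check, and only then replaces the live rows, all inside one transaction.

```python
import psycopg   # psycopg 3

def refresh_trips(load_rows):
    with psycopg.connect("dbname=example") as conn, conn.cursor() as cur:
        # Staging table exists only for this session; prod is never touched directly.
        cur.execute("CREATE TEMP TABLE trips_staging (LIKE trips INCLUDING ALL)")

        load_rows(cur, "trips_staging")   # COPY the new feed into staging

        cur.execute("SELECT count(*) FROM trips_staging")
        if cur.fetchone()[0] == 0:
            raise RuntimeError("empty load, refusing to touch the live table")

        # Swap the data atomically; TRUNCATE is transactional in Postgres,
        # so a failure anywhere rolls the whole thing back.
        cur.execute("TRUNCATE trips")
        cur.execute("INSERT INTO trips SELECT * FROM trips_staging")
    # exiting the with-block commits; an exception rolls back instead
```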
I had this happen last week, except it was a third-party API whose service started returning null instead of true for the has_more property beyond the second page of results.
In either case, the solution is probably to check rough counts and error out if they aren't reasonable.
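Something along those lines, e.g. (hypothetical trips / trips_staging tables; the 0.5x–2x bounds are made up and should reflect how the feed normally changes):

```python
import psycopg   # psycopg 3

with psycopg.connect("dbname=example") as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM trips_staging")
    new_count = cur.fetchone()[0]
    cur.execute("SELECT count(*) FROM trips")
    old_count = cur.fetchone()[0]

    # A full refresh should neither collapse nor explode overnight.
    if old_count and not (0.5 * old_count <= new_count <= 2 * old_count):
        raise RuntimeError(
            f"refusing to load: {new_count} rows vs {old_count} previously"
        )
```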
That’s not how COPY FROM works in Postgres. You give it a CSV and a table matching its structure, and it hammers the data into the table faster than anything else can.
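For instance (hypothetical trips table and trips.csv; the only requirement is that the table's columns line up with the file):

```python
import psycopg   # psycopg 3

with psycopg.connect("dbname=example") as conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS trips (
            trip_id   bigint,
            started   timestamptz,
            distance  numeric
        )
    """)
    # Feed the raw CSV straight through COPY in chunks.
    with open("trips.csv", "rb") as f, \
         cur.copy("COPY trips FROM STDIN WITH (FORMAT csv, HEADER true)") as copy:
        while chunk := f.read(65536):
            copy.write(chunk)
```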