
> Batch is just a special case of streaming

No. Designing a system that is always up and constantly processes small amounts of data is a completely different problem from designing a system that runs occasionally on a lot of data. For one thing, your output formats are usually different in the latter case (maybe you're creating a PDF, for example). Also, the high-availability requirement alone changes things at the design level.

Finally, the author claims it's not hard to switch between batch and streaming. With a large volume of preexisting data, this is just not true. For example, if you make one REST API call per document in a DB, loading everything can take days or months. If batching documents together isn't a possibility, how do you move data between stores easily? (This kind of data movement is often required when switching between batch and streaming.)
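
To put rough numbers on the "days or months" point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `load_time_days`, the cost model (a fixed overhead per call plus a per-document cost, all calls sequential), and the example figures, which are not measurements from any real system.

```python
import math

def load_time_days(num_docs, overhead_s, per_doc_s, batch_size=1):
    """Estimate a sequential bulk load in days.

    Hypothetical cost model: every API call pays a fixed overhead
    (network round trip, auth, and so on), and every document adds
    a per-document cost on top of that.
    """
    calls = math.ceil(num_docs / batch_size)
    total_seconds = calls * overhead_s + num_docs * per_doc_s
    return total_seconds / 86_400  # seconds per day

# Made-up figures: 100M documents, 50 ms overhead per call, 1 ms per document.
one_per_call = load_time_days(100_000_000, 0.05, 0.001)
batched = load_time_days(100_000_000, 0.05, 0.001, batch_size=1000)
print(f"one document per call: {one_per_call:.1f} days")
print(f"1000 documents per call: {batched:.1f} days")
```

Under these assumed numbers the per-call overhead dominates, which is why losing the ability to batch hurts so much during a migration.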



I'm seconding this, and I have first-hand experience with exactly this problem, in finance. My first boss also held the view that "batching is a special case of streaming where you stream N, and streaming is a special case of batching where the batch size is 1, so it doesn't matter which one you implement". This was never performant enough and he was eventually asked to leave.


The key is working incrementally, not sitting idle for months and then hammering production as hard as possible trying to get all the deferred work done in exactly one batch.
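
A minimal sketch of what "working incrementally" could look like, assuming the backlog fits in an in-memory list and that the caller persists the returned position somewhere durable. The names `process_incrementally`, `handle_chunk`, and `start` are hypothetical, not taken from any particular framework.

```python
def process_incrementally(records, handle_chunk, chunk_size, start=0):
    """Process records in bounded chunks, yielding a checkpoint after each.

    The yielded value is the index of the next unprocessed record, so a
    later run can resume by passing it back in as `start`.
    """
    position = start
    while position < len(records):
        chunk = records[position:position + chunk_size]
        handle_chunk(chunk)
        position += len(chunk)
        yield position  # caller would save this checkpoint

processed = []
for checkpoint in process_incrementally(list(range(10)), processed.append, chunk_size=4):
    print("checkpoint:", checkpoint)
```

Each step does a small, bounded amount of work, so the same loop can run on a schedule or continuously instead of as one huge deferred batch.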



