I would note that a notional "log store" doesn't have to be used only for things that are literally "logs."
You know what else you could call a log store? A CQRS/ES event store.
(Specifically, a "log store" is a CQRS/ES event store that just so happens to also remember a primary-source textual representation for each structured event-document it ingests — i.e. the original "log line" — so that it can spit "log lines" back out unchanged from their input form when asked. But it might not even have this feature, if it's a structured log store that expects all "log lines" to be "structured logging" formatted, JSON, etc.)
And you know what the most important operation a CQRS/ES event store performs is? A continuous streaming-reduction over particular filtered subsets of the events, to compute CQRS "aggregates" (= live snapshot states / incremental state deltas, which you then continuously load into a data warehouse to power the "query" part of CQRS.)
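A hedged sketch of that streaming-reduction in Python (the event shapes, type names, and aggregate fields here are all invented for illustration):

```python
# Minimal sketch of a CQRS/ES streaming reduction: fold a *filtered subset*
# of the event stream into a live "aggregate" snapshot.
from typing import Iterable


def reduce_events(events: Iterable[dict], event_types: set[str]) -> dict:
    """Fold matching events into an aggregate; non-matching events are skipped."""
    aggregate = {"balance": 0, "applied": 0}
    for event in events:
        if event["type"] not in event_types:
            continue  # the "filtered subset" part
        aggregate["balance"] += event["amount"]
        aggregate["applied"] += 1
    return aggregate


events = [
    {"type": "deposit", "amount": 100},
    {"type": "login"},                    # irrelevant to this aggregate
    {"type": "withdrawal", "amount": -30},
]
snapshot = reduce_events(events, {"deposit", "withdrawal"})
# snapshot == {"balance": 70, "applied": 2}
```

In a real event store this fold runs continuously against the tail of the stream, and the snapshot (or its deltas) is what gets shipped to the query side.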
Most CQRS/ES event stores are built atop message queues (like Kafka) or row-stores (like Postgres). But neither is actually a very good backend for powering the "ad-hoc-filtered incremental large-batch streaming" operation.
• With an MQ backend, streaming is easy, but MQs maintain no indices over events per se, just copies of events in different topics; so filtered streaming either has to do the filtering mostly client-side, or involves a bolt-on component that is its own "client-side", à la Kafka Streams. You can use topics for the filtering — but only if you know exactly what reduction event-type-sets you'll need before you start publishing any events, or if you're willing to keep an archival topic of every-event-ever online, so that you can stream over it to retroactively build new filtered topics.
• With a row-store backend, filtered streaming without pre-indexing is tenable — it's a query plan consisting of a primary-key-index-directed seq scan with a filter node. But it's still a lot more expensive than just streaming through a flat file containing the same data, since the seq scan has to read, materialize, and discard every row that doesn't match the filtering rule. You can create (partial!) indices to avoid this — and, nicely enough, in a row-store you can do this retroactively, once you figure out what the needs of a given reduction job are. But it's still a DBA task rather than a dev task: the data warehouse needs to be tweaked to respond to the needs of the app, every time the needs of the app change. (I would also mention something about schema flexibility here, but Postgres has a JSON column type, and I presume CQRS/ES event-store backends would just use that.)
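To make the row-store bullet concrete, here's the retroactive-partial-index move sketched with SQLite (and its JSON functions) standing in for a Postgres-style row-store; the table, index, and field names are all invented:

```python
# Events live as JSON in a row-store; a *partial* index is added retroactively,
# once a reduction job's filter is known, so the filtered stream no longer has
# to materialize and discard every non-matching row.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT)")
for doc in [{"type": "deposit", "amount": 100},
            {"type": "login"},
            {"type": "deposit", "amount": 25}]:
    db.execute("INSERT INTO events (doc) VALUES (?)", (json.dumps(doc),))

# The retroactive "DBA task": index only the rows this reduction cares about.
db.execute("""
    CREATE INDEX idx_deposits ON events (json_extract(doc, '$.type'))
    WHERE json_extract(doc, '$.type') = 'deposit'
""")

rows = db.execute("""
    SELECT doc FROM events WHERE json_extract(doc, '$.type') = 'deposit'
""").fetchall()
total = sum(json.loads(doc)["amount"] for (doc,) in rows)
# total == 125
```

The query works either way; the partial index just changes its cost profile — which is exactly why creating it is an operational tweak rather than an app change.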
A CQRS/ES event store built atop a fully-indexed document store / "index store" like ElasticSearch (or Quickwit, apparently) would have all the same advantages of the RDBMS approach, but wouldn't require any manual index creation.
Such a store would perform as if you took the RDBMS version of the solution, and then wrote a little insert-trigger stored-procedure that reads the JSON documents out of each row, finds any novel keys in them, and creates a new partial index for each such novel key. (Except with much lower storage-overhead — because in an "index store" all the indices share data; and much better ability to combine use of multiple "indices", as in an "index store" these are often not actually separate indices at all, but just one index where the key is part of the index.)
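Here is roughly what that hypothetical auto-indexing trigger would do, sketched as application-side Python over SQLite rather than an actual stored procedure (all names are invented, and splicing keys into DDL like this is sketch-grade, not production SQL):

```python
# Mimic an "index store": every novel top-level key in an incoming JSON
# document gets its own expression index, with no manual index creation.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT)")
seen_keys: set[str] = set()


def insert_event(doc: dict) -> None:
    db.execute("INSERT INTO events (doc) VALUES (?)", (json.dumps(doc),))
    for key in doc.keys() - seen_keys:  # any novel keys in this document?
        seen_keys.add(key)
        db.execute(
            f"CREATE INDEX idx_{key} ON events (json_extract(doc, '$.{key}'))"
        )


insert_event({"type": "deposit", "amount": 100})  # creates idx_type, idx_amount
insert_event({"type": "login", "user": "alice"})  # "type" is known; creates idx_user
```

In a real index store this bookkeeping doesn't exist as separate B-tree indices at all, which is where the storage-overhead and index-combination advantages come from.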
---
That being said, you know what you can use the CQRS/ES model for? Reducing your literal "logs" into metrics, as a continuous write-through reduction — to allow your platform to write log events, but have its associated observability platform read back pre-aggregated metrics time-series data, rather than having to crunch over logs itself at query time.
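A toy version of that write-through logs-to-metrics reduction, assuming nothing beyond dict-shaped log events (the metric names and the per-minute bucketing are made up for illustration):

```python
# Each log event updates pre-aggregated per-minute counters as it is written,
# so the observability side reads finished time series instead of raw logs.
from collections import defaultdict

# (metric_name, minute_bucket) -> count
metrics: dict[tuple[str, int], int] = defaultdict(int)


def write_log(event: dict) -> None:
    """Ingest a log event and bump the matching metric buckets in one pass."""
    minute = event["ts"] // 60
    if event["level"] == "error":
        metrics[("errors_per_minute", minute)] += 1
    metrics[("logs_per_minute", minute)] += 1


for ts, level in [(0, "info"), (10, "error"), (75, "error")]:
    write_log({"ts": ts, "level": level})
# metrics[("errors_per_minute", 0)] == 1
# metrics[("logs_per_minute", 1)] == 1
```

The point is the asymmetry: the write path pays a tiny constant cost per event, and the query path never touches the logs themselves.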
And AFAIK, this "modelling of log messages as CQRS/ES events in a CQRS/ES event store, so that you can do CQRS/ES reductions to them to compute metrics as aggregates" approach is already widely in use — but just not much talked about.
For example, when you use Google Cloud Logging, Google seems to be shoving your log messages into something approximating an event-store — and specifically, one with exactly the filtered-streaming-cost semantics of an "index store" like ElasticSearch (even though they're actually probably using a structured column-store architecture, i.e. "BigTable but append-only and therefore serverless.") And this event store then powers Cloud Logging's "logs-based metrics" reductions (https://cloud.google.com/logging/docs/logs-based-metrics).