Thanks for the thoughts. I agree that you're not going to disintermediate existing datalakes, no matter how successful, so integration makes sense.
Every few months I run into a use case where I'm like "I want to get a whole bunch of data, analyze it, then search for it later with embeddings, and probably keep running different sorts of analysis on it, and store the embeddings of those analyses in a related way." This still feels fairly difficult to do, or at least there aren't canonical "right" architectures yet.
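To make that concrete, here's a toy in-memory sketch of the shape I mean: raw records and derived analyses share a record ID, everything gets embedded, and search runs over all of it. The `embed` function is a hypothetical stand-in (trigram hashing) for a real embedding model; the point is the linking, not the vectors.

```python
import hashlib
import math

def embed(text, dim=64):
    """Toy stand-in for a real embedding model: hash character
    trigrams into a fixed-size, L2-normalized vector."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # vectors are already normalized, so dot product = cosine similarity
    return sum(x * y for x, y in zip(a, b))

class Store:
    """In-memory sketch: raw records plus embeddings of derived
    analyses, each row linked back to its source record ID."""
    def __init__(self):
        self.rows = []  # (record_id, kind, text, vector)

    def add(self, record_id, kind, text):
        self.rows.append((record_id, kind, text, embed(text)))

    def search(self, query, top_k=3):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[3]), reverse=True)
        return [(r[0], r[1], r[2]) for r in ranked[:top_k]]

store = Store()
store.add("doc1", "raw", "quarterly revenue grew on cloud sales")
store.add("doc1", "summary", "revenue analysis: cloud segment growth")
store.add("doc2", "raw", "hiring slowed across engineering teams")

hits = store.search("cloud revenue growth")
```

Swapping the toy pieces for a real embedding model and a real vector index is easy; what I haven't seen a canonical answer for is where the `(record_id, kind)` lineage lives as you keep adding new kinds of analysis over the same data.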
My instinct is if you nail the ml+dev+data ops needs with good architecture and api you could really have something -- good luck!