Would anyone be interested in discussing this paper together, especially through the lens of "how can we schedule algebraic expressions and checkpoint computed progress across a heterogeneous pool of consumer/donated machines?"
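To make the question concrete, here's a minimal sketch of the kind of thing I mean: evaluate an expression DAG where each computed node is checkpointed under a content-derived key, so if a volunteer machine drops out mid-evaluation, another worker can resume without redoing finished subexpressions. All names and the on-disk format are illustrative assumptions, not a proposal.

```python
import hashlib
import json
import pickle
from pathlib import Path

# Illustrative checkpoint store; in a real volunteer pool this would be
# shared/replicated storage, not a local directory.
CKPT_DIR = Path("ckpt")
CKPT_DIR.mkdir(exist_ok=True)

def node_key(op, child_reprs):
    """Content-address a DAG node by its op and child values, so any
    worker can recognize an already-computed subexpression."""
    digest = hashlib.sha256(json.dumps([op, child_reprs]).encode())
    return digest.hexdigest()[:16]

def compute(op, args):
    """Toy algebra: just enough ops to show the shape of the problem."""
    if op == "add":
        return args[0] + args[1]
    if op == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown op: {op}")

def evaluate(node):
    """node = ("const", [value]) or (op, [child_node, ...])."""
    op, children = node
    if op == "const":
        return children[0]
    child_vals = [evaluate(c) for c in children]
    path = CKPT_DIR / node_key(op, [repr(v) for v in child_vals])
    if path.exists():
        # Another worker (or a previous run) already computed this node.
        return pickle.loads(path.read_bytes())
    val = compute(op, child_vals)
    path.write_bytes(pickle.dumps(val))  # checkpoint before returning
    return val

# (3 * 4) + 5
expr = ("add", [("mul", [("const", [3]), ("const", [4])]), ("const", [5])])
print(evaluate(expr))  # 17
```

Obviously this punts on the hard parts (scheduling across machines, verifying untrusted results, tensor-sized intermediates), which is exactly what I'd like to dig into.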
I'm an infrastructure engineer, mostly focused on databases, data pipelines, and ML infra for the past 10-15 years. Even when designing homogeneous compute clusters, I had to dig in and understand compiler-level implementations in MLIR and LLVM. I'm not a compiler expert by any measure, but I know just enough to be dangerous, and I'm curious about (safely) scheduling computations across a pool of volunteer machines. This seems especially important to chew on now, with training foundational LLM weights costing 7-9 figures.
I broadly classify them as such since the former has a stronger disposition towards linear/tensor algebra, while the latter leans towards relational algebra, and it isn't yet clear (to me) how well innovations in one carry over to the other (if they do at all). Hence I'm also curious to hear more about proposals for a unified language across linalg and relational alg (e.g. https://news.ycombinator.com/item?id=36349015).