I'm talking about time-sharing. It doesn't matter whether it's smaller instances sharing a single GPU instance or many instances sharing many GPU instances; essentially it's N:M sharing (with some scheduling).
Since the GPU client is now abstracted from the GPU devices by placing the GPUs across the network, it seems like time-sharing should be the next logical step.
Got it, this is actually already supported. At the very end of the blog post there's a link to create a custom configuration. You can create any N:M configuration, that is, any number of clients to servers, and therefore choose the level of performance scaling or GPU pooling.
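To make the N:M idea concrete, here's a minimal sketch of the simplest scheduling policy you could use for it: round-robin assignment of N client streams across M GPU servers. This is purely illustrative; the function and names are hypothetical and not the product's actual configuration API.

```python
# Hypothetical sketch: round-robin time-sharing of N clients over M GPUs.
from itertools import cycle

def assign_round_robin(clients, gpus):
    """Map each client to a GPU server in round-robin order."""
    gpu_cycle = cycle(gpus)
    return {client: next(gpu_cycle) for client in clients}

# Three clients sharing two GPU servers (a 3:2 configuration):
mapping = assign_round_robin(["c1", "c2", "c3"], ["gpu0", "gpu1"])
# {"c1": "gpu0", "c2": "gpu1", "c3": "gpu0"}
```

A real scheduler would of course account for load, memory pressure, and locality rather than a fixed rotation, but any N:M mapping reduces to some policy like this.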
Did you guys think about building it out further to provide a GPU load balancer for multiple frontend machines running CUDA / OpenCL?