Yes, your app would have to support multiple GPUs. What's done here is remoting CUDA/OpenCL/etc. calls so that remote GPUs can be accessed from a single instance. When performing device/platform enumeration, all GPUs appear to be directly connected to a single instance -- hence no change to the application required.
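For context, here is a minimal sketch of ordinary CUDA runtime device enumeration (standard CUDA API, not Bitfusion code): under an API-remoting layer like the one described above, these same calls would simply report remote GPUs as additional device indices, so an unmodified multi-GPU app picks them up.

    /* Standard CUDA device enumeration -- with API remoting, remote
       GPUs would appear as extra device indices here, and the app
       cannot tell local from remote. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s, %zu MB\n", i, prop.name,
                   prop.totalGlobalMem / (1024 * 1024));
        }
        return 0;
    }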
Sounds like Plan9's concept of "CPU server mounts" has been reborn as "GPU server mounts." Could actually get traction this time, given that existing multi-GPU programs will Just Work.
I can't wait for a company to provide OpenCL/CUDA MFLOPS as a service instead of selling you whole VMs, so one could just attach a remote compute engine to any smallish controller VM.
What you suggest is technically possible: install our Boost software on any GPU machine, then access that machine from any client also running Boost. The client does not need to have a GPU. This configuration is supported on AWS today, where for example you can connect one or more t2.large instances to a g2.8xlarge. All that would be needed to implement the service you suggest is some metering on the GPU machine :)
We are not limiting our software to AWS, so you can build this kind of service on any kind of cluster by installing our software directly from https://boost.bitfusion.io - I say cluster because we have played with the idea of thin devices accessing remote GPU instances in the cloud, but over public networks, network performance was the limiting factor.