We implemented background removal in an iOS app recently. We went down a similar route, but ended up choosing a heavily modified, user-directed GrabCut.
It would be interesting to take the output of this and use its alpha mask as the starting point for the GrabCut mask.
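
A rough sketch of how that seeding might look, in plain Python. It assumes the alpha mask is a 2D grid of values in [0, 1]; the `lo` and `hi` cutoffs and both function names are made up for illustration. The label values mirror OpenCV's GrabCut constants.

```python
# Label values mirror OpenCV's GrabCut constants
# (cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD).
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def alpha_to_grabcut_label(alpha, lo=0.1, hi=0.9):
    """Map one alpha value in [0, 1] to a GrabCut seed label.

    lo and hi are illustrative cutoffs (assumptions), not tuned values.
    """
    if alpha <= lo:
        return GC_BGD      # confidently background
    if alpha >= hi:
        return GC_FGD      # confidently foreground
    if alpha < 0.5:
        return GC_PR_BGD   # leaning background, GrabCut may relabel it
    return GC_PR_FGD       # leaning foreground, GrabCut may relabel it


def alpha_mask_to_grabcut_mask(alpha_mask, lo=0.1, hi=0.9):
    """Convert a 2D list of alpha values into a 2D list of seed labels."""
    return [[alpha_to_grabcut_label(a, lo, hi) for a in row]
            for row in alpha_mask]


if __name__ == "__main__":
    alpha = [[0.0, 0.3],
             [0.6, 1.0]]
    print(alpha_mask_to_grabcut_mask(alpha))
```

With OpenCV, a mask like this (as a single-channel `uint8` array) could presumably be passed to `cv2.grabCut` using the `cv2.GC_INIT_WITH_MASK` mode, so only the uncertain band around the edges gets re-estimated.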