The traditional way to do this is to take a stack of aligned images and apply a median filter (a per-pixel median across the stack): http://www.jnack.com/adobe/photoshop/fountain/ This relies on each pixel being free of tourists in more than half of the frames, so that the median value is the background. There might be tourist destinations so busy that, over a set of photos, certain pixels contain a tourist more often than not.
Just so it's clear: the median filter way of doing this doesn't rely on finding two images with non-overlapping people, it uses the open spaces from all the images. As such, you don't need an exponentially growing number of images.
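To make that concrete, here is a minimal sketch of a per-pixel median stack in NumPy. The function name `median_stack` and the synthetic frames are made up for illustration, and it assumes the frames are already aligned and the same size. Note that every column has a "tourist" in some frame, so no single frame is clear, yet the median still recovers the background:

```python
import numpy as np

def median_stack(frames):
    """Per-pixel median across a list of aligned, same-sized frames."""
    stack = np.stack(frames, axis=0)  # shape: (n_frames, height, width)
    return np.median(stack, axis=0).astype(stack.dtype)

# Synthetic scene (an assumption for the demo): flat gray background,
# five frames, with a dark "tourist" covering a different column each time.
background = np.full((4, 5), 100, dtype=np.uint8)
frames = []
for i in range(5):
    frame = background.copy()
    frame[:, i] = 0  # the tourist covers column i in frame i
    frames.append(frame)

result = median_stack(frames)
print(np.array_equal(result, background))
```

Each pixel is covered in only one of the five frames, so the middle value at every pixel is the background.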
You'll still run into issues with two things:
1. Someone napping or being otherwise still throughout your photos will show up in the finished product.
2. Systems which are stationary but put out a lot of internal movement (trees, video screens) will likely show up as random-colored pixels within the range of their colors. For trees this would look like a blur. For TV screens it would probably end up gray-ish or staticky.
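A rough way to see both failure modes is to look at single pixels over time. The numbers below are invented for illustration (background value 100, nine frames, a fixed random seed), not taken from any real photo set:

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 9
background = 100

# Pixel A: a tourist (value 0) walks past in 2 of 9 frames.
pixel_a = np.full(n_frames, background)
pixel_a[[2, 5]] = 0

# Pixel B: someone naps there and covers it in 7 of 9 frames.
pixel_b = np.zeros(n_frames, dtype=int)
pixel_b[[0, 8]] = background

# Pixel C: part of a video screen, a different random value every frame.
pixel_c = rng.integers(0, 256, size=n_frames)

print(np.median(pixel_a))  # background wins: the tourist is in the minority
print(np.median(pixel_b))  # the napper wins: the background is in the minority
print(np.median(pixel_c))  # some value from the screen's own range
```

The median for pixel C is always one of the values the screen actually showed, but neighboring screen pixels will each land on a different one, which is where the staticky look would come from.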
This reminds me of how the first picture of a human was taken. Louis Daguerre took pictures of busy streets in Paris, but because the photographs had an exposure time of around 15 minutes, the streets looked empty. The exception was a man having his shoes shined, since he stood still for most of the time the picture was being taken.
I think the best solution for tree blur would be to apply the median stack mode first, then add any one picture from the stack that has clear trees as a layer on top, and use a layer mask to paint just the trees in.
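Outside of Photoshop, the same layer-mask idea can be sketched as a simple blend. This is only an illustration: `paint_in`, the tiny arrays, and the hand-made mask are all hypothetical, and in practice the mask would be painted by hand over the trees:

```python
import numpy as np

def paint_in(median_result, sharp_frame, mask):
    """Use sharp_frame where mask is 1.0 and median_result where it is 0.0."""
    mask = mask.astype(np.float32)
    if median_result.ndim == 3:
        mask = mask[..., np.newaxis]  # broadcast the mask over color channels
    blended = mask * sharp_frame + (1.0 - mask) * median_result
    return blended.astype(median_result.dtype)

# Toy data (assumed): the median result is 120 everywhere, one sharp frame
# is 60 everywhere, and the "trees" occupy the right half of the image.
median_result = np.full((2, 4), 120, dtype=np.uint8)
sharp_frame = np.full((2, 4), 60, dtype=np.uint8)
tree_mask = np.zeros((2, 4))
tree_mask[:, 2:] = 1.0

out = paint_in(median_result, sharp_frame, tree_mask)
print(out)
```

Intermediate mask values between 0 and 1 give a soft edge, much like painting the mask with a soft brush.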
Or you can just do it on Christmas Day: http://www.ianvisits.co.uk/blog/2008/12/25/deserted-london/