> The asymptotic optimality of samplesort is in the sense that it takes (1 + o(1)) N log N / log 2 comparisons, i.e., you can't beat it by any constant factor.
Why isn't it used everywhere that people currently use Quicksort? I've never actually heard of anyone using Samplesort except in a parallel context.
Two reasons: 1. It's a PITA to implement; 2. You need a large N to overcome the o(1) term, and at that point you're usually dominated by the performance of your memory hierarchy rather than by the cost of doing compares.
Last I heard, python's built-in sort was a type of samplesort, though, so it isn't completely unused.
The previous Python sort was a type of samplesort, but the current one is an "adaptive, stable, natural mergesort" by Tim Peters ("timsort"). On random data I think it performs about the same or slightly worse, but when the data has some structure, it performs much better. See http://bugs.python.org/file4451/timsort.txt
Your #2 suggests that it would be slower than ordinary Quicksort on normal-sized datasets. But dual-pivot quicksort is faster than ordinary Quicksort on normal-sized datasets. So in at least one sense, dual-pivot is not a step in the direction of samplesort.
Python's built-in sort is timsort, last I heard, which is a variant of mergesort that looks for existing sorted or backwards runs in order to run faster in the common cases of mostly-sorted or mostly-backwards data. It's true that Python's built-in sort was a samplesort variant until 2002, though, which I didn't know.
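To make the "looks for existing sorted or backwards runs" part concrete, here's a minimal sketch of the run-detection idea in Python. This is illustrative only (the function name and run representation are mine); CPython's real implementation, described in listsort.txt, additionally enforces a minimum run length and merges runs according to a stack invariant:

```python
def count_runs(seq):
    """Find maximal natural runs in seq, in place.

    A run is either non-decreasing or strictly descending;
    descending runs are reversed so every run ends up ascending.
    Returns a list of (start, end) half-open index pairs.
    Illustrative sketch only, not CPython's actual code.
    """
    runs = []
    i, n = 0, len(seq)
    while i < n:
        j = i + 1
        if j < n and seq[j] < seq[i]:
            # strictly descending run: extend it, then reverse in place
            while j < n and seq[j] < seq[j - 1]:
                j += 1
            seq[i:j] = reversed(seq[i:j])
        else:
            # non-decreasing (ascending) run
            while j < n and seq[j] >= seq[j - 1]:
                j += 1
        runs.append((i, j))
        i = j
    return runs
```

On already-sorted or already-reversed input this finds a single run, which is why timsort is O(n) in those cases: all the remaining work is merging the (few) runs.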
(It does seem like Samplesort's virtue of minimizing comparisons would be particularly desirable in an environment like Python, where every comparison potentially involves a call into slow interpreted code.)
I just read the original paper on samplesort, so this might not be the most up to date description, but the basic idea is as follows:
- From an input sequence of length n, choose a random subsequence of length k.
- Sort the subsequence.
- Partition the remaining (n-k) elements of the original sequence into the (k+1) buckets delimited by the sorted subsequence.
- Sort the (k+1) partitions.
In the paper, the partitioning and sorting are always done using quicksort. Any O(n log n) sorting algorithm should work though, which includes using samplesort recursively.
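The steps above can be sketched in Python like this. Everything here is illustrative: the base-case cutoff and the choice of k (the paper analyzes the optimal k; sqrt(n) is just a placeholder), and the bucketing uses binary search against the sorted sample rather than the paper's quicksort-style partitioning:

```python
import bisect
import random

def samplesort(seq, k=None):
    """Illustrative samplesort sketch, following the steps above.

    1. Draw a random sample of k elements and sort it.
    2. Partition the remaining n-k elements into the k+1 buckets
       delimited by the sorted sample.
    3. Sort each bucket recursively and splice the sample back in.
    """
    n = len(seq)
    if n <= 16:                     # small-input cutoff, arbitrary
        return sorted(seq)
    if k is None:
        k = max(1, int(n ** 0.5))   # placeholder sample size, not the optimal k
    sample_idx = set(random.sample(range(n), k))
    sample = sorted(seq[i] for i in sample_idx)
    rest = (x for i, x in enumerate(seq) if i not in sample_idx)
    buckets = [[] for _ in range(k + 1)]
    for x in rest:
        # bucket i holds elements <= sample[i] (binary search on the sample)
        buckets[bisect.bisect_left(sample, x)].append(x)
    out = []
    for i, bucket in enumerate(buckets):
        out.extend(samplesort(bucket))
        if i < k:
            out.append(sample[i])
    return out
```

The comparison savings come from the bucketing step: each of the n-k remaining elements is placed with about log2(k) comparisons against an already-sorted sample, which is close to the information-theoretic cost when k is chosen well.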
If you use an optimal value for k (and this might be a significant fraction of n), you can prove that lim_{n->infty} E(C_n)/log(n!) = 1, where E(C_n) is the expected number of comparisons for a sequence of length n. Since log(n!) is the information-theoretic lower bound on comparisons, this is exactly the asymptotic-optimality claim from the top of the thread.
Viewed this way, with samplesort used recursively, quicksort is samplesort with k = 1, and dual-pivot quicksort is samplesort with k = 2.