
Your #2 suggests that it would be slower than ordinary Quicksort on normal-sized datasets. But dual-pivot quicksort is faster than ordinary Quicksort on normal-sized datasets. So in at least one sense, dual-pivot is not a step in the direction of samplesort.

Python's built-in sort is timsort, last I heard, which is a variant of mergesort that looks for existing sorted or backwards runs in order to run faster in the common cases of mostly-sorted or mostly-backwards data. It's true that Python's built-in sort was a samplesort variant until 2002, though, which I didn't know.

(It does seem like Samplesort's virtue of minimizing comparisons would be particularly desirable in an environment like Python, where every comparison potentially involves a call into slow interpreted code.)
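That cost is easy to see by counting comparisons directly. A minimal sketch: the `Counted` wrapper class below is my own illustration (not from the thread); each comparison made by `list.sort()` re-enters Python-level `__lt__`, which is exactly the overhead being described.

```python
import random

class Counted:
    """Wrap a value so every comparison made by sort() is counted,
    showing that each one is a call back into interpreted code."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

data = [Counted(random.random()) for _ in range(1000)]
data.sort()  # timsort; invokes __lt__ once per comparison
print(Counted.comparisons)  # roughly n*log2(n) for random input
```

On random data of length 1000 the count lands in the neighborhood of n·log2(n) ≈ 10,000, each one a Python-level call, which is why shaving comparisons matters more here than in C.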



I just read the original paper on samplesort, so this might not be the most up-to-date description, but the basic idea is as follows:

- From an input sequence of length n, choose a random subsequence of length k.

- Sort the subsequence.

- Partition the remaining (n-k) elements of the original sequence into the (k+1) buckets delimited by the sorted subsequence.

- Sort the (k+1) buckets.

In the paper, the partitioning and sorting are always done using quicksort. Any O(n log n) sorting algorithm should work, though, including applying samplesort recursively.
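The steps above can be sketched in Python. This is a toy illustration, not the paper's algorithm: the default sample size of roughly sqrt(n) is my own placeholder (the paper derives an optimal k), and `bisect` plus Python's built-in `sort` stand in for the paper's quicksort-based partitioning and sorting.

```python
import bisect
import random

def samplesort(seq, k=None):
    """Toy samplesort: sample, sort the sample, bucket, recurse."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    if k is None:
        k = max(1, int(n ** 0.5))  # placeholder choice, not the optimal k
    k = min(k, n)
    # Step 1: choose a random subsequence of length k.
    idx = set(random.sample(range(n), k))
    sample = sorted(seq[i] for i in idx)  # Step 2: sort the subsequence.
    rest = [seq[i] for i in range(n) if i not in idx]
    # Step 3: partition the remaining n-k elements into k+1 buckets
    # delimited by the sorted sample.
    buckets = [[] for _ in range(k + 1)]
    for x in rest:
        buckets[bisect.bisect_left(sample, x)].append(x)
    # Step 4: sort each bucket (recursively) and stitch in the pivots.
    out = []
    for i, bucket in enumerate(buckets):
        out.extend(samplesort(bucket))
        if i < k:
            out.append(sample[i])
    return out
```

Choosing bucket boundaries from a sorted sample is the whole trick: a large sample makes the buckets nearly equal in size, which is where the savings in comparisons comes from.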

If you use an optimal value for k (which may be a significant fraction of n), you can prove that lim_{n->infty} E(C_n)/log2(n!) = 1, where E(C_n) is the expected number of comparisons for a sequence of length n. Since log2(n!) is the information-theoretic lower bound for comparison sorting, this means samplesort is asymptotically comparison-optimal.

Viewed through the recursive formulation, quicksort is samplesort with k = 1, and dual-pivot quicksort is samplesort with k = 2.



