Sorting

1 minute read

Sorting is always an important topic in computer science. Theoretically, comparison sorts cannot perform better than O(n lgn) in the best case, better performance is often possible - although not guaranteed - on real-world data.

Heapsort

Heapsort guarantees O(n lg n) performance. But in most case heapsort is slower than Quicksort. If O (n2) in worst case is not acceptable, heapsort would be choosen over Quicksort.

  • Tournament sort is a sort algorithm like a tournament, all pairs of values are compared and a “winner” got promoted to the next compare.

Tournament sort takes O(n) to build the initial tournament structure, the height of the tournament tree is lg n. So it takes O (n lgn). The drawback is that it needs 2n space.

Mergesort

Mergesort is a divide and conquer sorting algorithm.

  1. Divide A into n l sorted sublists, or runs, of length l.

  2. Merge pairs of runs into new runs of size 2l.

  3. Continue merging runs until l = n, producing a sorted version of A.

The overall performance is O (n lgn). It needs O(n) additional space.

Timsort

Timsort is initially implemented as a standard sorting method in Python. It is a hybrid sorting algorithm, a combination of insertion sort and an adaptive mergesort, built specifically to work well on real-world data.

The basic idea is that any array A can be regarded as a sequence of sorted runs, ascending runs and descentding runs. Timsort leverageds this fact by merging these runs together to be a sorted array.

For the top 3 runs of a stack X Y Z, we can merge these into X(YZ) or (XY)Z. We can only merge consecutive runs. And we need to maintain a balance merge for all runs. So we choose a way to make the lengths of two merged runs as near as possible.

Reference

[1] Disk-Based Algorithms for Big Data By Christopher G. Healey