Tim Sort
Introduction and Background
There is a meme that I came across on LinkedIn the other day; this is how it goes:
Interviewer: How would you sort an array of numbers?
Python Programmer: a.sort()
Interviewer: -_-
In the interviewee’s defence, he gave a correct answer, but he misunderstood the question: the interviewer clearly wanted him to elaborate on some sorting algorithm. However, one must consider that there is more to a.sort() than is visible on the surface. Python’s sort()/sorted() functions must obviously use some sorting algorithm or other under the hood, and dare I say, one of the most commonly used programming languages today might even use the best sorting algorithm out there! That is the trust the interviewee placed in his answer, and there seems to be great merit to it after all.
In more than just a few words, let’s take the pains to dig deeper and talk about the algorithm that Python (and many other languages) use to sort arrays by default: Tim Sort.
Prerequisite
To understand Tim Sort, it is imperative that we familiarise ourselves with the basic nature and types of sorting algorithms, as well as with two specific algorithms: insertion sort and merge sort.
While it might be in your favour to understand in depth (perhaps via a similar blog) the workings of both insertion sort and merge sort, I will brush over them both, so as to ensure we are on the same page, even if we are on different sides of the screen.
Merge Sort
At every recursive call, merge sort splits the given array in half and calls itself on both halves. The recursion stops when a subarray has only two elements, in which case they are swapped into order, or a single element, in which case it is simply returned. A merge routine then combines the two sorted halves into one sorted array.
It has the best, worst and average case complexity of O(n log n).
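To make the brush-over concrete, here is a minimal merge sort sketch in Python (the names merge_sort and merge are my own, purely for illustration):

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One of the two lists is exhausted; append the rest of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort(arr):
    """Recursively split the list in half, sort each half, then merge."""
    if len(arr) <= 1:  # base case: zero or one element is already sorted
        return arr
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))


print(merge_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```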
Insertion Sort
It iterates over the elements of the array from the left. For each element, it repeatedly swaps it with its predecessor until the element reaches its correct position among the already-sorted elements to its left.
The best-case time complexity is O(n), which occurs when the array is already sorted, while the average- and worst-case time complexity is O(n²); the worst case occurs when the array is in reverse order.
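And a minimal insertion sort sketch, again with names of my own choosing. The optional left/right bounds let it sort just a slice of the array in place, which is exactly how Tim Sort will use it later:

```python
def insertion_sort(arr, left=0, right=None):
    """Sort arr[left:right + 1] in place by inserting each element
    into its correct position among the elements before it."""
    if right is None:
        right = len(arr) - 1
    for i in range(left + 1, right + 1):
        key = arr[i]
        j = i - 1
        # Shift larger predecessors one slot to the right.
        while j >= left and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr


print(insertion_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```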
Tim Sort — Working
Tim Sort is a combination of merge sort and insertion sort. The algorithm first breaks the array down into subarrays of a certain size, which lies in the range 32 to 64 inclusive and is chosen so that the subarrays come out nearly equal in length. Let’s call a single subarray of this nature a “run”, and the minimum size of a single run “minrun”. The algorithm then performs insertion sort on each of these runs and merges them two at a time using the merge routine of merge sort, as sketched below.
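Putting the two together, here is a simplified Tim Sort sketch. It reuses the insertion_sort and merge functions from the sketches above, uses a fixed run size of 32 instead of the minrun calculation described next, and skips the cleverer parts of the real algorithm (detecting naturally ordered runs, galloping, the merge stack), so treat it as an illustration of the idea rather than CPython’s implementation:

```python
def tim_sort(arr, run_size=32):
    """Simplified Tim Sort: insertion-sort fixed-size runs, then
    repeatedly merge adjacent runs until one sorted run remains."""
    n = len(arr)

    # Step 1: sort each run of `run_size` elements with insertion sort.
    for start in range(0, n, run_size):
        end = min(start + run_size - 1, n - 1)
        insertion_sort(arr, start, end)

    # Step 2: merge runs pairwise, doubling the merged width each pass.
    width = run_size
    while width < n:
        for start in range(0, n, 2 * width):
            mid = min(start + width, n)
            end = min(start + 2 * width, n)
            if mid < end:
                arr[start:end] = merge(arr[start:mid], arr[mid:end])
        width *= 2
    return arr


print(tim_sort([38, 27, 43, 3, 9, 82, 10, 11] * 10))
```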
The “certain size”, or minrun, is calculated by taking the 6 most significant bits of N (the length of the array) and adding one if any of the remaining, lower bits are set. This keeps N divided by minrun at, or just below, a power of two, which keeps the later merges balanced.
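In code, that calculation can be sketched along these lines (mirroring the minrun routine described in CPython’s listsort.txt; the function name is my own):

```python
def compute_minrun(n):
    """Keep the 6 most significant bits of n; add 1 if any of the
    bits shifted out were set. Result lies in [32, 64] for n >= 64."""
    r = 0  # becomes 1 if any discarded bit was set
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r


print(compute_minrun(10_000))  # 40; 10_000 / 40 = 250, just under 256 (a power of 2)
```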
Time Complexity of Tim Sort
Average Case
The algorithm requires O(n log n) comparisons to sort an array of n elements. Hence, the average-case time complexity is Θ(n log n).
Worst Case
In the worst case, n log n comparisons are required, so the worst-case time complexity is O(n log n), the same as the average case.
Best Case
The best case occurs when the array is already sorted and no swaps are required. The best-case time complexity is Ω(n).
Space Complexity of Tim Sort
The space complexity of this algorithm is O(n), because the merge step requires O(n) auxiliary space.
Understand It Better
Understanding Tim Sort:
https://www.javatpoint.com/tim-sort
Understanding Insertion Sort:
https://www.interviewbit.com/tutorial/insertion-sort-algorithm/
Understanding Merge Sort: