Sharing ideas on the Java sorting implementation

  • 2020-04-01 02:45:39
  • OfStack

1. Overview
Sorting and searching are two of the most basic problems in programming, and there are many classic algorithms for both. This article takes a basic look at how the sorting algorithm is implemented in Java, and hopefully serves as an introduction. First, a few questions: can you write a correct quicksort? Under what circumstances is quicksort really fast? Is your quicksort fast enough? Can it be optimized further? With these questions in mind, let's look at how quicksort is implemented in JRE 7.

The sorting implementation class in JRE 7 is DualPivotQuicksort.java. Compared with JRE 6 there are two major changes: one in the insertion sort implementation, and one in the quicksort implementation, where the number of pivots goes from one to two. Let's take an int array as an example. The core sorting method of this class has the following prototype:


void sort(int[] a, int left, int right, boolean leftmost)

Parameter a is the array to be sorted, left is the index of the leftmost element of the interval to be sorted, right is the index of the rightmost element of the interval, and leftmost indicates whether the interval is the leftmost interval of the array. For example:
The array [2, 4, 8, 5, 6, 3, 0, 3, 9] can be divided into three intervals: (2, 4, 8), {5, 6} and <3, 0, 3, 9>.
For the ( ) interval, left = 0, right = 2, leftmost = true.
For the { } interval, left = 3, right = 4, leftmost = false; the parameters for the < > interval follow in the same way.
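
For reference, application code does not call this package-private method directly; in JRE 7 the public entry point java.util.Arrays.sort(int[]) delegates to DualPivotQuicksort. A minimal usage example (the array contents are just the example above):

import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        int[] a = {2, 4, 8, 5, 6, 3, 0, 3, 9};
        Arrays.sort(a);                          // for int[] this ends up in DualPivotQuicksort.sort(...)
        System.out.println(Arrays.toString(a));  // [0, 2, 3, 3, 4, 5, 6, 8, 9]
    }
}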

When the interval length is less than 47, the method uses insertion sort; otherwise it uses quicksort.
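
A simplified sketch of that dispatch, assuming the structure of the JDK 7 source (INSERTION_SORT_THRESHOLD is the constant 47; the branch bodies are abbreviated and discussed in the following sections):

// Inside sort(int[] a, int left, int right, boolean leftmost), roughly:
int length = right - left + 1;
if (length < INSERTION_SORT_THRESHOLD) {    // threshold constant, 47
    // insertion sort: traditional when leftmost, pair insertion otherwise
    return;
}
// otherwise: dual-pivot quicksort on a[left .. right]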

2. Insertion sort implementation
When leftmost is true, it uses traditional insertion sort, and the code is fairly simple. The process is similar to drawing cards and inserting them into an already sorted hand in poker:


for (int i = left, j = i; i < right; j = ++i) {
    int ai = a[i + 1];                // next element to insert
    while (ai < a[j]) {
        a[j + 1] = a[j];              // shift larger elements one slot to the right
        if (j-- == left) {            // explicit left-bound check
            break;
        }
    }
    a[j + 1] = ai;                    // drop the element into its slot
}

Traditional insertion sort code
When leftmost is false, it uses a new pair insertion sort. The improvement is that each pass inserts two elements into the already ordered prefix, whereas traditional insertion sort places only one element per pass. For insertion sort, the key operation is finding the right insertion position for the element to be inserted, which requires traversing the subarray that has already been sorted; the number of elements traversed therefore determines the performance. Inserting two elements at a time reduces the total number of elements traversed, because the smaller of the two can continue its search from where the larger one stopped. The implementation code is as follows:


for (int k = left; ++left <= right; k = ++left) {
    int a1 = a[k], a2 = a[left];      // take the next two elements

    if (a1 < a2) {                    // make sure a1 >= a2
        a2 = a1; a1 = a[left];
    }
    while (a1 < a[--k]) {             // insert the larger element first; note: no left-bound check here
        a[k + 2] = a[k];              // shift by two slots to leave room for both elements
    }
    a[++k + 1] = a1;

    while (a2 < a[--k]) {             // the smaller element continues from where the larger one stopped
        a[k + 1] = a[k];              // shift by one slot
    }
    a[k + 1] = a2;
}

Now there is a question: why does the leftmost interval use traditional insertion sort while the other intervals use pair insertion sort? What would go wrong if the pair insertion sort code above were used in place of the traditional insertion sort code? I look forward to your answers...
3. Quicksort implementation
JRE 7 also improves on quicksort. Traditional quicksort selects a single pivot (JRE 6 picks the elements at the left, middle and right positions of the array and uses the one whose value ranks in the middle as the pivot), then traverses from both ends toward the middle; during the traversal, values on the left that are greater than the pivot are swapped with values on the right that are less than or equal to the pivot. When the two traversals meet, the pivot value is inserted, so that everything to the left of the pivot is less than or equal to it and everything to the right is greater than it. The left and right parts are then sorted recursively.
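
A minimal sketch of this classic single-pivot scheme, written from the description above rather than taken from the JRE 6 source (the method and helper names are mine):

// Classic single-pivot quicksort: median-of-three pivot, scan from both
// ends, swap out-of-place pairs, then recurse on both sides.
static void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) {
        return;
    }
    int pivot = medianOfThree(a, lo, (lo + hi) >>> 1, hi);
    int i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) i++;     // stop at a value >= pivot
        while (a[j] > pivot) j--;     // stop at a value <= pivot
        if (i <= j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++; j--;
        }
    }
    quickSort(a, lo, j);              // left part
    quickSort(a, i, hi);              // right part
}

// Value of the element that ranks in the middle of a[i], a[j], a[k].
static int medianOfThree(int[] a, int i, int j, int k) {
    int x = a[i], y = a[j], z = a[k];
    return Math.max(Math.min(x, y), Math.min(Math.max(x, y), z));
}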

From the analysis above we can see that every pass of insertion sort makes a subinterval of the array absolutely ordered, and each pass essentially keeps extending that subinterval. The optimization direction is therefore to make each pass extend the ordered interval faster, which is why inserting one element per pass was optimized into inserting two elements per pass. Of course, some will ask: why not go bigger, say 5 at a time, or 10 at a time? Obviously this does not work; at the extreme it would insert n at a time. As to why, answer that for yourself.

For quicksort, each recursive step makes the subinterval being sorted more ordered, not absolutely ordered. So for quicksort, performance is determined by how much more ordered each recursive step makes the subinterval; the other determining factor, of course, is the number of recursions. The key to making the subintervals relatively ordered is the pivot, so the optimization direction should also be the pivot: increase the number of pivots. Increasing the number of pivots does not have much effect on the number of recursions, and sometimes even reduces it. As with insertion sort, the question is how far to increase the number of pivots. Obviously it cannot be too large; keep in mind that any optimization has a cost, and the cost of adding pivots is hidden in the element exchanges of each pass. But I am keeping you in suspense for a bit too long... Let's now see how quicksort works with two pivots. The process is not hard to understand:
1.   First, two pivots are selected. The selection method is to divide the array into six segments of roughly equal length, which means picking five separating elements. These five elements are sorted from small to large, and the second and fourth are chosen as pivot1 and pivot2 respectively.
2.   After the pivots are selected, the interval is traversed from both ends toward the middle. The traversal from the left stops when a value greater than or equal to pivot1 is encountered, and that position is marked as less. The traversal from the right stops when a value less than or equal to pivot2 is encountered, and that position is marked as great.
3.   Then iterate from the less position toward great. The current position is denoted k, and the following cases can occur:
A.   If the value at position k is smaller than pivot1, the values at positions k and less are exchanged, and then less is incremented by 1. In this way all values to the left of position less are smaller than pivot1, while the values between less and k are greater than or equal to pivot1.
B.   If the value at position k is greater than pivot2, traverse from the great position toward the left until a value less than or equal to pivot2 is encountered. If that value is less than pivot1, it is written to the less position (and less is incremented); if it is greater than or equal to pivot1, it is placed at position k. In both cases the value that was at position k moves to the great position, and then great is decremented.
4.   After the above procedure completes, the subinterval is divided into three segments (< pivot1, >= pivot1 && <= pivot2, > pivot2), and finally each of the three segments is sorted recursively. A simplified sketch of the whole process follows below.
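
Here is a minimal, self-contained sketch of dual-pivot quicksort in the spirit of the steps above. To keep it short it uses the first and last elements as the two pivots instead of the JDK's five-element selection, so it illustrates the idea rather than reproducing DualPivotQuicksort:

// Dual-pivot quicksort sketch: partition into (< pivot1), (between), (> pivot2),
// then recurse on the three parts.
static void dualPivotSort(int[] a, int lo, int hi) {
    if (lo >= hi) {
        return;
    }
    if (a[lo] > a[hi]) {                   // ensure pivot1 <= pivot2
        swap(a, lo, hi);
    }
    int pivot1 = a[lo], pivot2 = a[hi];
    int less = lo + 1;                     // a[lo+1 .. less-1]  < pivot1
    int great = hi - 1;                    // a[great+1 .. hi-1] > pivot2

    for (int k = less; k <= great; k++) {
        if (a[k] < pivot1) {               // case A: move to the left part
            swap(a, k, less++);
        } else if (a[k] > pivot2) {        // case B: move to the right part
            while (k < great && a[great] > pivot2) {
                great--;                   // find a value <= pivot2 on the right
            }
            swap(a, k, great--);
            if (a[k] < pivot1) {           // the swapped-in value may belong left
                swap(a, k, less++);
            }
        }
    }
    swap(a, lo, --less);                   // put pivot1 between left and middle parts
    swap(a, hi, ++great);                  // put pivot2 between middle and right parts

    dualPivotSort(a, lo, less - 1);        // values < pivot1
    dualPivotSort(a, less + 1, great - 1); // values between the pivots
    dualPivotSort(a, great + 1, hi);       // values > pivot2
}

static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}

Calling dualPivotSort(a, 0, a.length - 1) sorts the whole array. The JDK version differs mainly in pivot selection, the insertion-sort cutoff for short intervals, and extra handling of elements equal to the pivots.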



The core of the sorting implementation in JRE 7 has been described above; for details, please refer to the code of the corresponding class. If there are any errors or omissions, please point them out.

