Discussion on common for loop traversal LinkedList disadvantages

2020-05-30 20:18:51
OfStack

During the development of java, the most used List sets are ArrayList and LinkedList. For ArrayList traversal, the usual method is the following:


public static void main(String[] args)
{
  List<Integer> arrayList = 
      new ArrayList<Integer>();
  
  for (int i = 0; i < 100; i++)
    arrayList.add(i);
  for (int i = 0; i < 100; i++)
    System.out.println(arrayList.get(i));
}

If the set is changed to LinkedList, maybe we will use the same method to traverse, as follows:


public static void main(String[] args)
{
  List<Integer> linkedList = 
      new LinkedList<Integer>();
  
  for (int i = 0; i < 100; i++)
    linkedList.add(i);
  for (int i = 0; i < 100; i++)
    System.out.println(linkedList.get(i));
}

Remember: this is a very bad practice. This is no longer the problem of Java, but the problem of data structure. I believe that the language will be changed from Java to other languages as well.

The common for cycle efficiency of ArrayList and LinkedList are tested below and the reasons are analyzed.

ArrayList and LinkedList use normal for loop traversal speed comparison

First, the test code is given:


public class ListIteratorTest
{
  private final static int LIST_SIZE = 1000;
  
  public static void main(String[] args)
  {
    List<Integer> arrayList = 
        new ArrayList<Integer>();
    List<Integer> linkedList = 
        new LinkedList<Integer>();
    
    for (int i = 0; i < LIST_SIZE; i++)
    {
      arrayList.add(i);
      linkedList.add(i);
    }
    
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < arrayList.size(); i++)
      arrayList.get(i);
    System.out.println("ArrayList Traversal speed: " + (System.currentTimeMillis() - startTime) + "ms");
    
    startTime = System.currentTimeMillis();
    for (int i = 0; i < linkedList.size(); i++)
      linkedList.get(i);
    System.out.println("LinkedList Traversal speed: " + (System.currentTimeMillis() - startTime) + "ms");
  }
}

Increasing LIST_SIZE, I use the table to represent the following operation results:

	1000	5000	10000	50000	100000
ArrayList	0ms	1ms	2ms	3ms	3ms
LinkedList	3ms	16ms	88ms	2446ms	18848ms

Here's why. We can see from the operation results that if the capacity of List is increased by multiple, the traversal of ArrayList is relatively stable, while the traversal of LinkedList is almost explosive, so it is no longer necessary for further testing.

ArrayList USES regular for for fast loop traversal reasons

First look at the ArrayList get method source code:


public E get(int index) {
RangeCheck(index);

return (E) elementData[index];
}

When you see ArrayList's get method, you're just taking one element from the array. We have concluded that the time complexity of ArrayList's get method is O(1), and O(1) means that the time complexity is 1 constant, and it has nothing to do with the size of the array, as long as the location of the array is given, the data can be located directly.

If you're familiar with C, C++, or Pointers, it's easy to understand why, but let me explain to you why it's faster to use get for arrays.

At the bottom of the computer, the data has an address, just like a person has an address 1. Let's say I write this code:


int[3] ints = {1, 3, 5};

In Java, a piece of int data is 4 bytes. What the computer does internally is to find a contiguous chunk of memory space in the memory space that is large enough to hold 3 4-byte, or 12-byte arrays, and return the first address of that memory space. For example, if the first address of the memory space is 0x00, then 1 will be placed in 0x00~0x03, 3 in 0x04~0x07, and 5 in 0x08~0x0B.

When ints[1] is taken, the computer will calculate that the data of ints[1] is stored in memory which begins with 0x04 and occupies 4 bytes of space. Therefore, the computer will read the data from the address space of 0x04~0x07.

The whole thing, it doesn't matter how big the array is, all the computer does is figure out the starting address > We only go to the address to fetch the data, so we can see that for loops through ArrayList with normal for are very fast and stable.

LinkedList USES normal for to loop through the cause of slow

Take a look at what LinkedList's get method does:


public E get(int index) {
  return entry(index).element;
}

 Node<E> node(int index) {
     // assert isElementIndex(index);
 
     if (index < (size >> 1)) {
       Node<E> x = first;
       for (int i = 0; i < index; i++)
         x = x.next;
       return x;
     } else {
       Node<E> x = last;
       for (int i = size - 1; i > index; i--)
         x = x.prev;
      return x;
    }
  }

Since LinkedList is a bidirectional linked list, the fourth line means that it is much faster to calculate whether i is traversed before or after 1 1/2, before 1 1/2 in the positive order, and after 1 1/2 in the reverse order. Of course, regardless of this, why is it so slow to iterate through LinkedList using normal for?

The reason lies in the two for traces in lines 6~ 7 and 11~ 12. Take the former as an example:

1. get(0), directly get the address of Node0 in bit 0, and get the data in Node0

2. get(1), get the address of Node0 in bit 0 directly, find the address of Node1 in bit 0 in Node0, find Node1, and get the data in Node1

3. get(2), directly get the address of Node0 bit, find the address of Node1 bit from Node0 bit, find Node1 bit, find the address of Node2 bit from Node1 bit, find Node2 bit, get the data in Node2 bit.

And so on.

In other words, when LinkedList is in any position of get, it will walk the previous data once. If I have 10 data, I will query 1+2+3+4+5+5+4+3+2+1=30 times of data. Compared with ArrayList, I only need to query 10 times of data. The larger the capacity of LinkedList is, the larger the gap will be. In fact, it should be clear to you how many times you need to query the data using LinkedList. Let's calculate 1: The first half should be (1 + 0.5N) * 0.5N / 2, and the second half should be (1 + 0.5N) * 0.5N = 0.25N2 + 0.5N, ignoring the low-order term and the first term coefficient, it is concluded that the traversal time complexity of LinikedList is O(N2), and N is the capacity of LinkedList.

Time complexity has the following rules of thumb:

O(1) < O(log2N) < O(n) < O(N * log2N) < O(N2) < O(N3) < 2N < 3N < N!

The first four are better, the middle two are like 1, and the last three are terrible. So O(N2) is a relatively poor time complexity, and N is 1 point larger, so the program will execute slower.

Afterword.

Remember 1 never use the normal for loop to traverse LinkedList. Iterator or foreach loop (the principle of foreach loop is the iterator) can be used to traverse LinkedList, which is to find the data directly according to the address, which will greatly improve the efficiency of traversing LinkedList.