Analysis of Java traversal set method (implementation principle algorithm performance application situation)

  • 2021-02-17 06:15:33
  • OfStack

An overview of the

Java language, provides a set of data set framework, which defines some abstract data types such as List, Set and so on, each abstract data type of each specific implementation, the underlying use of different implementation methods, such as ArrayList and LinkedList.

In addition, Java provides several different ways to traverse data sets. The developer must clearly understand the characteristics of each traversal, where it fits, and how it performs on the different underlying implementations. The following is a detailed analysis of the content of 1.

How are data elements stored in memory?

Data elements are stored in memory in two main ways:

1. Sequential storage, Random Access (Direct Access) :

In this way, adjacent data elements are stored in adjacent memory addresses, and the entire memory address is contiguous. You can directly calculate the memory address according to the location of the element, directly read. The average time complexity of reading 1 element in a particular location is O(1). Normally, only collections based on arrays have this feature. Java is represented by ArrayList.

2, Chain storage, Sequential Access:

In this way, each data element is not required to be in adjacent locations in memory, and each data element contains the memory address of its next element. You cannot calculate the memory address directly from the location of the elements. You can only read the elements sequentially. The average time complexity of reading 1 element in a particular location is O(n). Mainly represented by linked lists.

Java is represented by LinkedList.

What are the traversal methods provided in Java?

1, Traditional for loop traversal, based on the counter:

The traversal itself maintains a counter outside the collection, and then reads the elements in each position in turn, stopping when the last element is read. Basically, you need to read elements by their position. This is also the most primitive collection traversal method.

Written as follows:


for (int i = 0; i < list.size(); i++) {
list.get(i);
} 

2, Iterator traversal, Iterator:

Iterator was originally a design pattern of OO, the main purpose of which is to shield the characteristics of different data sets, and unify the interface of traversal sets. As an OO language, Java naturally supports the Iterator schema in Collections.

Written as follows:


Iterator iterator = list.iterator();
while (iterator.hasNext()) {
iterator.next();
}

3. foreach loop traversal:

Explicitly declared Iterator and counters are masked.

Advantages: concise code, not easy to error.

Disadvantages: can only do simple traversal, can not operate (delete, replace) in the process of traversal data collection.

Written as follows:


for (ElementType element : list) {
}

What is the implementation principle of each traversal method?

1, Traditional for loop traversal, based on the counter:

The traversal itself maintains a counter outside the collection, and then reads the elements in each position in turn, stopping when the last element is read. Basically, you need to read elements by their position.

Iterator traversal, Iterator:

For each specific data set implemented, the corresponding Iterator is usually required. Compared to traditional for loops, Iterator eliminates explicit traversal counters. So Iterator, which is based on a sequential storage collection, can access data directly by location. The normal implementation of Iterator, which is based on a chained storage set, is to save the current traversal location. It then moves the pointer forward or backward based on the current position.

foreach loop traversal:

According to the decompilated bytecode, foreach is also implemented as Iterator internally, but the Java compiler generates the code for us.

What is the performance of each traversal mode for different storage modes?

1, Traditional for loop traversal, based on the counter:

Because it is based on the position of the element, read by position. So we can see that for sequential storage, because the average time complexity for reading elements at a particular location is O(1), the average time complexity for traversing the entire collection is O(n). For chained storage, because the average time complexity for reading elements at a particular location is O(n), the average time complexity for traversing the entire collection is O(n2) (the square of n).

ArrayList read by position code: Read directly by element position.


transient Object[] elementData;
public E get(int index) {
rangeCheck(index);
return elementData(index);
}
E elementData(int index) {
return (E) elementData[index];
} 

LinkedList read code by position: Every time you need to start from the 0th element and read backwards. It's actually a little bit of an internal optimization.


transient int size = 0;
transient Node<E> first;
transient Node<E> last;
public E get(int index) {
checkElementIndex(index);
return node(index).item;
}
Node<E> node(int index) {
if (index < (size >> 1)) { // The lookup position is in the first half of the list, starting at the head of the list 
Node<E> x = first;
for (int i = 0; i < index; i++)
x = x.next;
return x;
} else { // The lookup position is at the bottom of the list, starting from the bottom of the list 
Node<E> x = last;
for (int i = size - 1; i > index; i--)
x = x.prev;
return x;
}
}

2, Iterator traversal, Iterator:

For collections of RandomAccess type, this does not make much sense, but adds extra runtime because of the extra operations. But for Sequential Access collection, is of great significance, because Iterator internal to maintain the position of the current traversal, so each traversal, read under a position does not need to start the first element of the collection, just put the pointer back 1, 1, traverse the entire collection of time complexity is lower for O (n);

(LinkedList is used as an example.) The iterator for LinkedList, internally, maintains the current traversal position and then moves the pointer:

Code:


public E next() {
checkForComodification();
if (!hasNext())
throw new NoSuchElementException();
lastReturned = next;
next = next.next;
nextIndex++;
return lastReturned.item;
}
public E previous() {
checkForComodification();
if (!hasPrevious())
throw new NoSuchElementException();
lastReturned = next = (next == null) ? last : next.prev;
nextIndex--;
return lastReturned.item;
}

3. foreach loop traversal:

Analysis of Java bytecode, foreach internal implementation principle, is also through Iterator to achieve, but this Iterator compiler is generated for us, so we do not need to manually to write. However, because the type conversion check is done every time, it takes slightly longer than Iterator. The time complexity is the same as Iterator1.

Using ES195en bytecode:


Code:
new # // class java/util/ArrayList
dup
invokespecial # // Method java/util/ArrayList."<init>":()V
astore_
aload_
invokeinterface #, // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
astore_
goto 
aload_
invokeinterface #, // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
pop
aload_
invokeinterface #, // InterfaceMethod java/util/Iterator.hasNext:()Z
ifne 
return 

Using foreach bytecode:


Code:
new # // class java/util/ArrayList
dup
invokespecial # // Method java/util/ArrayList."<init>":()V
astore_
aload_
invokeinterface #, // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
astore_
goto 
aload_
invokeinterface #, // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
checkcast # // class loop/Model
astore_
aload_
invokeinterface #, // InterfaceMethod java/util/Iterator.hasNext:()Z
ifne 
return 

What is the application of each traversal mode?

1, Traditional for loop traversal, based on the counter:

Sequential storage: High read performance. Applicable to traversing sequential storage collections.

Chained storage: Too time complex to traverse a collection of chained storage.

2, Iterator traversal, Iterator:

Sequential storage: If you don't care too much about time, it is recommended to choose this method, after all, the code is more concise, and it also prevents Off-By-One problems.

Chain storage: This makes a lot of sense. The average time complexity is reduced to O(n), which is quite attractive, so this traversal method is recommended.

3. foreach loop traversal:

foreach only makes the code more concise, but it has a disadvantage, is that the process of traversal can not operate the data set (delete, etc.), so some cases do not use. It is also based on Iterator, but due to type conversion issues, it is a bit slower than using Iterator directly, but the time complexity is still the same. So how to choose, refer to the above two ways, make a compromise choice.

What are the best practices for Java?

In the Java data set framework, an RandomAccess interface is provided. This interface has no methods, just a tag. Usually used by implementations of the List interface to indicate whether an implementation of List supports Random or not.

A data set that implements this interface means that it supports Random Access, and the average time complexity of reading elements by location is O(1). Such as ArrayList.

If Random is not implemented, Access is not supported. Such as LinkedList.

It is recommended that if you want to traverse a single List, you first verify that Random Access is supported, i.e. list instanceof RandomAccess.

Such as:


if (list instanceof RandomAccess) {
// Use traditional for Loop through. 
} else {
// use Iterator or foreach . 
}

The above is the site to introduce Java traversal set method analysis (implementation principle, algorithm performance, application occasions), I hope to help you!


Related articles: