Java functional programming (10): collectors

  • 2020-04-01 03:30:44
  • OfStack

We have already used the collect() method several times to gather the elements of a Stream into an ArrayList. This is a reduce operation, and it is useful for converting a collection into another type (usually a mutable collection). Combined with the methods of the Collectors utility class, collect() is very convenient, as we will see in this section.

Let's continue with the earlier Person list to see what collect() can do. Suppose we want to find everyone older than 20 in the original list. Here is a version that relies on mutability and the forEach() method:


List<Person> olderThan20 = new ArrayList<>();
people.stream()
      .filter(person -> person.getAge() > 20)
      .forEach(person -> olderThan20.add(person));
System.out.println("People older than 20: " + olderThan20);

We used filter() to keep only the people older than 20. Then, in forEach(), we added each element to an ArrayList initialized beforehand. Let's look at the output of this code first and refactor it afterwards.


People older than 20: [Sara - 21, Jane - 21, Greg - 35]

The output is correct, but there are a couple of problems. First, adding elements to a collection by hand is a low-level operation: it is imperative, not declarative. Second, if we want to run this iteration concurrently, we have to deal with thread safety ourselves, because mutability makes the code hard to parallelize. Fortunately, the collect() method solves both problems easily. Let's see how.

The collect() method takes a Stream and collects it into a results container. To do this, it needs to know three things:

+ how to create a result container (e.g., using the ArrayList::new method)
+ how to add a single element to the container (e.g., using the ArrayList::add method)
+ how to merge one result set into another (e.g., using the ArrayList::addAll method)

For a purely serial run the last operation is not needed, but the API is designed to support both serial and parallel execution.

We provide these operations to the collect method to collect the filtered stream.


List<Person> olderThan20 =
    people.stream()
          .filter(person -> person.getAge() > 20)
          .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
System.out.println("People older than 20: " + olderThan20);

The result of this code is the same as before, but there are many benefits to writing this way.

First, the code is more focused and more expressive, and it clearly communicates its purpose: collecting the results into an ArrayList. The first parameter of collect() is a factory (a supplier), followed by the operations that accumulate and merge elements.

Second, since the code performs no explicit mutation, it is easy to run the iteration in parallel. We leave the mutation to the underlying library, which handles coordination and thread safety itself, even though ArrayList is not thread-safe.

When the stream runs in parallel, collect() can add elements to different sublists concurrently and then merge them into one large list in a thread-safe manner (the last parameter performs the merge).
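
To make the parallel case concrete, here is a minimal sketch of the three-argument collect() on a parallel stream. It uses plain integer ages rather than Person objects so that it is self-contained; the class name ParallelCollectSketch and the method agesOver are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParallelCollectSketch {

    // Collects the ages above the threshold using a parallel stream.
    // (Hypothetical helper, not part of the article's code.)
    static List<Integer> agesOver(List<Integer> ages, int threshold) {
        List<Integer> result = ages.parallelStream()
                .filter(age -> age > threshold)
                .collect(ArrayList::new,      // supplier: a fresh container per partial result
                         ArrayList::add,      // accumulator: adds one element to a container
                         ArrayList::addAll);  // combiner: merges one container into another
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(20, 21, 21, 35);
        System.out.println("Ages over 20: " + agesOver(ages, 20));
    }
}
```

Because the source list is ordered, the merged result keeps the encounter order even though the partial lists may be filled on different threads.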

We have seen that collect() has many advantages over adding elements to a list manually. Let's look at an overloaded version of this method that is simpler and more convenient: it takes a Collector as its argument. Collector is an interface that bundles the supplier, the accumulator, and the combiner, the operations that the previous version took as separate parameters, so a Collector is simpler to use and reusable. The Collectors utility class provides a toList() method that returns a Collector implementation that adds elements to a list (an ArrayList in practice, although the documentation makes no guarantee about the type). Let's modify the previous code to use this version of collect().


List<Person> olderThan20 =
    people.stream()
          .filter(person -> person.getAge() > 20)
          .collect(Collectors.toList());
System.out.println("People older than 20: " + olderThan20);
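
To see how a Collector bundles the three operations, the following sketch builds one by hand with the Collector.of() factory method. The helper name toArrayList() is hypothetical (it is not a JDK method), and the example again uses integer ages to stay self-contained; in everyday code Collectors.toList() is the simpler choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class CustomCollectorSketch {

    // Hypothetical helper (not a JDK method): packages supplier,
    // accumulator, and combiner into one reusable Collector.
    static <T> Collector<T, ?, List<T>> toArrayList() {
        return Collector.<T, List<T>>of(
                ArrayList::new,                                          // supplier
                List::add,                                               // accumulator
                (left, right) -> { left.addAll(right); return left; });  // combiner
    }

    static List<Integer> agesOver(List<Integer> ages, int threshold) {
        return ages.stream()
                .filter(age -> age > threshold)
                .collect(toArrayList());
    }

    public static void main(String[] args) {
        System.out.println(agesOver(Arrays.asList(20, 21, 21, 35), 20));
    }
}
```

Note that the combiner of a Collector returns the merged container, whereas the combiner passed to the three-argument collect() merges in place and returns nothing.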

This compact form of collect(), built on the Collectors utility class, has many more uses. Collectors offers several methods for different collecting and accumulating operations. In addition to toList(), there is toSet(), which collects into a Set; toMap(), which collects into a map of key-value pairs; and joining(), which concatenates elements into a string. We can also combine collectors with methods such as mapping(), collectingAndThen(), minBy(), maxBy(), and groupingBy().

Next, let's use groupingBy() to group people by age.
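
As a quick tour of toSet(), toMap(), and joining(), here is a small sketch. The Person class below is a minimal stand-in whose shape is assumed from the getters used in this article, and the sample data is assumed from the names and ages that appear in the article's outputs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class CollectorsTourSketch {

    // Minimal stand-in for the article's Person class (assumed shape).
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    // Assumed sample data.
    static final List<Person> PEOPLE = Arrays.asList(
            new Person("John", 20), new Person("Sara", 21),
            new Person("Jane", 21), new Person("Greg", 35));

    // toSet(): duplicates (the two 21-year-olds) collapse into one entry.
    static Set<Integer> distinctAges() {
        return PEOPLE.stream().map(Person::getAge).collect(Collectors.toSet());
    }

    // toMap(): key and value functions; this two-argument form throws
    // IllegalStateException on duplicate keys, so the names must be unique.
    static Map<String, Integer> ageByName() {
        return PEOPLE.stream().collect(Collectors.toMap(Person::getName, Person::getAge));
    }

    // joining(): concatenates strings with a separator.
    static String allNames() {
        return PEOPLE.stream().map(Person::getName).collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(distinctAges());
        System.out.println(ageByName());
        System.out.println(allNames());
    }
}
```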


Map<Integer, List<Person>> peopleByAge =
    people.stream()
          .collect(Collectors.groupingBy(Person::getAge));
System.out.println("Grouped by age: " + peopleByAge);

Grouping takes just one call to collect(). groupingBy() accepts a lambda expression or method reference, called the classifier function, that returns the value of the property we want to group by. Each element of the stream is placed into a group according to the value our function returns for it. The result of the grouping can be seen in the output:


Grouped by age: {35=[Greg - 35], 20=[John - 20], 21=[Sara - 21, Jane - 21]}

These people have been grouped by age.

In the previous example we grouped people by age. A variant of groupingBy() can combine multiple criteria. The simple groupingBy() uses only a classifier to collect elements into groups. The general groupingBy() collector, on the other hand, also accepts a collector that is applied to each group. That is, elements pass through a classifier and then through a downstream collector, as we will see below.

Continuing with the example above, this time instead of collecting the Person objects for each age, we collect only their names, grouped by age.


Map<Integer, List<String>> nameOfPeopleByAge =
    people.stream()
          .collect(
              groupingBy(Person::getAge, mapping(Person::getName, toList())));
System.out.println("People grouped by age: " + nameOfPeopleByAge);

This version of groupingBy() accepts two parameters: the first is the classifier (here the age, the criterion for grouping), and the second is a collector, here the one returned by mapping(). These methods come from the Collectors utility class, which is statically imported in this code. mapping() accepts two parameters: a function that maps each element to the property of interest, and a collector that says where the mapped values go, such as a list or a set. Take a look at the output of the above code:


People grouped by age: {35=[Greg], 20=[John], 21=[Sara, Jane]}

As you can see, people's names are already grouped by age.
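
The second argument of groupingBy() can be any collector, not only mapping(). As a sketch, the following self-contained example repeats the names-by-age grouping and adds a second grouping that uses counting() as the downstream collector. The Person class and the sample data are assumptions that mirror the article's output, and the class and method names are made up for illustration.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

class DownstreamCollectorSketch {

    // Minimal stand-in for the article's Person class (assumed shape).
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    // Assumed sample data.
    static final List<Person> PEOPLE = Arrays.asList(
            new Person("John", 20), new Person("Sara", 21),
            new Person("Jane", 21), new Person("Greg", 35));

    // Classifier: age. Downstream: map each person to a name, collect into a list.
    static Map<Integer, List<String>> namesByAge() {
        return PEOPLE.stream()
                .collect(groupingBy(Person::getAge, mapping(Person::getName, toList())));
    }

    // Same classifier, different downstream: count the members of each group.
    static Map<Integer, Long> headCountByAge() {
        return PEOPLE.stream()
                .collect(groupingBy(Person::getAge, counting()));
    }

    public static void main(String[] args) {
        System.out.println("Names by age: " + namesByAge());
        System.out.println("Head count by age: " + headCountByAge());
    }
}
```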

Let's look at another combination: group people by the first letter of their name, and then select the oldest person in each group.


Comparator<Person> byAge = Comparator.comparing(Person::getAge);
Map<Character, Optional<Person>> oldestPersonOfEachLetter =
    people.stream()
          .collect(groupingBy(person -> person.getName().charAt(0),
                              reducing(BinaryOperator.maxBy(byAge))));
System.out.println("Oldest person of each letter:");
System.out.println(oldestPersonOfEachLetter);

We first grouped by the first letter of the name. To do this, we passed a lambda expression as the first argument to groupingBy(); it returns the first letter of a name, which serves as the grouping key. The second argument is no longer mapping() but a reduce operation. Within each group, it uses maxBy() to reduce all the elements to the oldest one. The syntax looks a bit dense because of the combined operations, but it reads like this: group by first letter, then reduce each group to its oldest member. Because a reduction may have nothing to reduce, each result is wrapped in an Optional. Look at the output of this code, which lists the oldest person in each group of names that start with the same letter.


Oldest person of each letter:
{S=Optional[Sara - 21], G=Optional[Greg - 35], J=Optional[Jane - 21]}
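
The Collectors class also has its own maxBy() collector, mentioned earlier, which can stand in for reducing(BinaryOperator.maxBy(...)). The sketch below runs both forms side by side on assumed sample data; the Person class is again a minimal stand-in, and the class and method names are made up for illustration.

```java
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;
import static java.util.stream.Collectors.reducing;

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;

class OldestPerLetterSketch {

    // Minimal stand-in for the article's Person class (assumed shape).
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
        @Override public String toString() { return name + " - " + age; }
    }

    // Assumed sample data.
    static final List<Person> PEOPLE = Arrays.asList(
            new Person("John", 20), new Person("Sara", 21),
            new Person("Jane", 21), new Person("Greg", 35));

    static final Comparator<Person> BY_AGE = Comparator.comparing(Person::getAge);

    // The form used in the article: reduce each group with BinaryOperator.maxBy.
    static Map<Character, Optional<Person>> withReducing() {
        return PEOPLE.stream()
                .collect(groupingBy(person -> person.getName().charAt(0),
                                    reducing(BinaryOperator.maxBy(BY_AGE))));
    }

    // The shorter form: Collectors.maxBy as the downstream collector.
    static Map<Character, Optional<Person>> withMaxBy() {
        return PEOPLE.stream()
                .collect(groupingBy(person -> person.getName().charAt(0),
                                    maxBy(BY_AGE)));
    }

    public static void main(String[] args) {
        System.out.println(withReducing());
        System.out.println(withMaxBy());
    }
}
```

Both methods yield the same map for this data, so the choice between them is mostly a matter of readability.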

We have seen the power of the collect() method and the Collectors utility class. Take some time to browse the Collectors class in your IDE or the JDK documentation and get familiar with the methods it provides. Next we'll use lambda expressions to implement some filters.

