Java 8 Stream API Tutorial

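1. Overview

In this in-depth tutorial, we go through the practical usage of Java 8 Streams, from creation to parallel execution.

To understand this material, readers need a basic knowledge of Java 8 (lambda expressions, Optional, method references) and of the Stream API. If you are not familiar with these topics, take a look at our previous articles, New Features in Java 8 and Introduction to Java 8 Streams.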

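2. Stream Creation

There are many ways to create a stream instance from different sources. Once created, the instance does not modify its source, which allows several instances to be created from a single source.

2.1. Empty Stream

The empty() method should be used to create an empty stream:

Stream<String> streamEmpty = Stream.empty();

The empty() method is often used at creation time to avoid returning null for streams with no elements: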

public Stream<String> streamOf(List<String> list) {
    return list == null || list.isEmpty() ? Stream.empty() : list.stream();
}

2.2. Stream of Collection

A stream can also be created from any type of Collection (Collection, List, Set):

Collection<String> collection = Arrays.asList("a", "b", "c");
Stream<String> streamOfCollection = collection.stream();

2.3. Stream of Array

An array can also be the source of a stream:

Stream<String> streamOfArray = Stream.of("a", "b", "c");

A stream can also be created from an existing array or from a part of an array:

String[] arr = new String[]{"a", "b", "c"};
Stream<String> streamOfArrayFull = Arrays.stream(arr);
Stream<String> streamOfArrayPart = Arrays.stream(arr, 1, 3);
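As a quick check of the range semantics, the following sketch collects both array streams into lists. The class and method names are illustrative and not part of the original samples. Note that the end index passed to Arrays.stream(arr, 1, 3) is exclusive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ArrayStreamSketch {

    // Collects the whole array into a list.
    static List<String> full(String[] arr) {
        return Arrays.stream(arr).collect(Collectors.toList());
    }

    // Collects the elements from index 1 (inclusive) to index 3 (exclusive).
    static List<String> part(String[] arr) {
        return Arrays.stream(arr, 1, 3).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c"};
        System.out.println(full(arr)); // [a, b, c]
        System.out.println(part(arr)); // [b, c]
    }
}
```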

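2.4. Stream.builder()

When the builder is used, the desired type should be additionally specified on the right-hand side of the statement; otherwise the build() method creates an instance of Stream<Object>:

Stream<String> streamBuilder = Stream.<String>builder().add("a").add("b").add("c").build();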

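2.5. Stream.generate()

The generate() method accepts a Supplier<T> for element generation. As the resulting stream is infinite, the developer should specify the desired size; otherwise the generate() method will work until it reaches the memory limit:

Stream<String> streamGenerated = Stream.generate(() -> "element").limit(10);

The code above creates a sequence of ten strings with the value "element".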

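2.6. Stream.iterate()

Another way of creating an infinite stream is to use the iterate() method:

Stream<Integer> streamIterated = Stream.iterate(40, n -> n + 2).limit(20);

The first element of the resulting stream is the first parameter of the iterate() method. To create every following element, the specified function is applied to the previous element. In the example above, the second element is 42.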
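The two infinite-stream factories can be compared side by side in a minimal sketch. The class and method names are hypothetical; the stream expressions are the ones used above, with limit() bounding the otherwise endless sequence.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreamSketch {

    // First `size` elements of the sequence 40, 42, 44, ...
    static List<Integer> iterated(int size) {
        return Stream.iterate(40, n -> n + 2)
                .limit(size)
                .collect(Collectors.toList());
    }

    // `size` copies of the same string, each produced by the Supplier.
    static List<String> generated(int size) {
        return Stream.generate(() -> "element")
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(iterated(3));  // [40, 42, 44]
        System.out.println(generated(2)); // [element, element]
    }
}
```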

2.7. Stream of Primitives

Java 8 makes it possible to create streams out of three primitive types: int, long and double. As Stream<T> is a generic interface and primitives cannot be used as type parameters with generics, three new special interfaces were created: IntStream, LongStream and DoubleStream.

Using the new interfaces avoids unnecessary auto-boxing, which improves performance:

IntStream intStream = IntStream.range(1, 3);
LongStream longStream = LongStream.rangeClosed(1, 3);

The range(int startInclusive, int endExclusive) method creates an ordered stream from the first parameter to the second parameter. Each subsequent element is incremented by 1. The result does not include the second parameter; it is only an upper bound of the sequence.

The rangeClosed(int startInclusive, int endInclusive) method does the same with one difference: the second parameter is included. Both methods are available on IntStream and LongStream; DoubleStream has no range methods.
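The difference between the half-open and the closed range is easiest to see by materializing both. This is a small sketch with illustrative class and method names.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class RangeSketch {

    // The upper bound 3 is excluded.
    static int[] halfOpen() {
        return IntStream.range(1, 3).toArray();
    }

    // The upper bound 3 is included.
    static long[] closed() {
        return LongStream.rangeClosed(1, 3).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(halfOpen())); // [1, 2]
        System.out.println(Arrays.toString(closed()));   // [1, 2, 3]
    }
}
```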

Since Java 8, the Random class provides a wide range of methods for generating streams of primitives. For example, the following code creates a DoubleStream with three elements:

Random random = new Random();
DoubleStream doubleStream = random.doubles(3);

2.8. Stream of String

A String can also be used as a source for creating a stream, with the help of the chars() method of the String class. Since there is no CharStream interface in the JDK, an IntStream is used to represent a stream of chars instead:

IntStream streamOfChars = "abc".chars();

The following example breaks a String into substrings according to the specified regular expression:

Stream<String> streamOfString = Pattern.compile(", ").splitAsStream("a, b, c");
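Because chars() yields int values rather than characters, a common follow-up step is to map each value back to a char. The sketch below shows one way to do that, alongside the regex split; the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StringStreamSketch {

    // chars() yields int code units; cast each one back to char.
    static List<Character> characters(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toList());
    }

    // Splits the input around every match of the ", " pattern.
    static List<String> split(String s) {
        return Pattern.compile(", ")
                .splitAsStream(s)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(characters("abc")); // [a, b, c]
        System.out.println(split("a, b, c"));  // [a, b, c]
    }
}
```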

2.9. Stream of File

The Java NIO class Files allows us to generate a Stream<String> from a text file through the lines() method. Every line of the text becomes an element of the stream:

Path path = Paths.get("C:\\file.txt");
Stream<String> streamOfStrings = Files.lines(path);
Stream<String> streamWithCharset = Files.lines(path, Charset.forName("UTF-8"));

The Charset can be specified as an argument of the lines() method.
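A stream returned by Files.lines() holds an open file, so it is usually wrapped in try-with-resources. The following sketch assumes a hypothetical helper that counts non-empty lines and uses a temporary file with made-up sample data so that it can run anywhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class FileLinesSketch {

    // Counts non-empty lines; the stream is closed when the try block exits.
    static long countNonEmptyLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isEmpty()).count();
        }
    }

    // Writes a temporary file (hypothetical sample data) and counts its lines.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("stream-sketch", ".txt");
            try {
                Files.write(tmp, Arrays.asList("first", "", "third"), StandardCharsets.UTF_8);
                return countNonEmptyLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```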

3. Referencing a Stream

It is possible to instantiate a stream and keep an accessible reference to it as long as only intermediate operations are called. Executing a terminal operation makes a stream inaccessible.

To demonstrate this, we will forget for a while that the best practice is to chain the sequence of operations. Apart from its unnecessary verbosity, the following code is technically valid:

Stream<String> stream = Stream.of("a", "b", "c").filter(element -> element.contains("b"));
Optional<String> anyElement = stream.findAny();

But an attempt to reuse the same reference after calling the terminal operation will trigger the IllegalStateException:

Optional<String> firstElement = stream.findFirst();

As IllegalStateException is a RuntimeException, the compiler will not signal a problem. So, it is very important to remember that Java 8 streams can't be reused.

This behavior is logical because streams were designed to apply a finite sequence of operations to a source of elements in a functional style, not to store elements.

So, to make the previous code work properly, some changes should be made:

List<String> elements = Stream.of("a", "b", "c")
    .filter(element -> element.contains("b"))
    .collect(Collectors.toList());
Optional<String> anyElement = elements.stream().findAny();
Optional<String> firstElement = elements.stream().findFirst();
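The failure on reuse can be observed directly. The sketch below also shows an alternative that the text above does not use: wrapping the pipeline in a Supplier so that every terminal operation gets a fresh stream. All class and method names are illustrative.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.stream.Stream;

class StreamReuseSketch {

    // Returns true if a second terminal operation on the same stream fails.
    static boolean secondTerminalCallFails() {
        Stream<String> stream = Stream.of("a", "b", "c")
                .filter(element -> element.contains("b"));
        stream.findAny(); // the first terminal operation consumes the stream
        try {
            stream.findFirst(); // second terminal operation on the same instance
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // A Supplier builds a fresh pipeline for every terminal operation.
    static Optional<String> firstFromFreshStream() {
        Supplier<Stream<String>> supplier = () -> Stream.of("a", "b", "c")
                .filter(element -> element.contains("b"));
        supplier.get().findAny();
        return supplier.get().findFirst();
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalCallFails());      // true
        System.out.println(firstFromFreshStream().get());   // b
    }
}
```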

4. Stream Pipeline

To perform a sequence of operations over the elements of the data source and aggregate their results, three parts are needed – the source, intermediate operation(s) and a terminal operation.

Intermediate operations return a new modified stream. For example, to create a new stream from an existing one without its first few elements, the skip() method should be used:

Stream<String> onceModifiedStream = Stream.of("abcd", "bbcd", "cbcd").skip(1);

If more than one modification is needed, intermediate operations can be chained. Assume that we also need to substitute every element of the current Stream<String> with a substring of its first few chars. This is done by chaining the skip() and map() methods:

Stream<String> twiceModifiedStream = stream.skip(1).map(element -> element.substring(0, 3));

As you can see, the map() method takes a lambda expression as a parameter. If you want to learn more about lambdas take a look at our tutorial Lambda Expressions and Functional Interfaces: Tips and Best Practices.

A stream by itself is worthless; what a user is really interested in is the result of the terminal operation, which can be a value of some type or an action applied to every element of the stream. Only one terminal operation can be used per stream.

The right and most convenient way to use streams is a stream pipeline, which is a chain of a stream source, intermediate operations, and a terminal operation. For example:

List<String> list = Arrays.asList("abc1", "abc2", "abc3");
long size = list.stream().skip(1)
    .map(element -> element.substring(0, 3)).sorted().count();

5. Lazy Invocation

Intermediate operations are lazy. This means that they will be invoked only if it is necessary for the terminal operation execution.

To demonstrate this, imagine that we have a method wasCalled(), which increments an inner counter every time it is called:

private long counter;
private void wasCalled() {
    counter++;
}

Let's call the wasCalled() method from the filter() operation:

List<String> list = Arrays.asList("abc1", "abc2", "abc3");
counter = 0;
Stream<String> stream = list.stream().filter(element -> {
    wasCalled();
    return element.contains("2");
});

As we have a source of three elements, we might assume that the filter() method will be called three times and that the value of the counter variable will be 3. However, running this code doesn't change counter at all; it is still zero, so the filter() method wasn't called even once. The reason is the missing terminal operation.

Let's rewrite this code a little by adding a map() operation and a terminal operation, findFirst(). We will also add the ability to track the order of method calls with the help of logging:

Optional<String> stream = list.stream().filter(element -> {
    log.info("filter() was called");
    return element.contains("2");
}).map(element -> {
    log.info("map() was called");
    return element.toUpperCase();
}).findFirst();

The resulting log shows that the filter() method was called twice and the map() method only once. This is because the pipeline executes vertically. In our example, the first element of the stream didn't satisfy the filter's predicate; the filter() method was then invoked for the second element, which passed. Without calling filter() for the third element, we went down the pipeline to the map() method.

The findFirst() operation is satisfied by just one element. So, in this particular example, lazy invocation allowed us to avoid two method calls: one for filter() and one for map().
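The vertical execution order can be reproduced without a logging framework. In the sketch below, a plain list named calls stands in for the log; that list and the class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class LazyInvocationSketch {

    // Records the order in which the lambdas run (stands in for the log).
    static final List<String> calls = new ArrayList<>();

    // Builds a pipeline but never attaches a terminal operation.
    static int callsWithoutTerminalOperation(List<String> list) {
        calls.clear();
        list.stream().filter(element -> {
            calls.add("filter(" + element + ")");
            return element.contains("2");
        });
        return calls.size();
    }

    // Same filter plus map(), driven by the findFirst() terminal operation.
    static Optional<String> firstContainingTwo(List<String> list) {
        calls.clear();
        return list.stream()
                .filter(element -> {
                    calls.add("filter(" + element + ")");
                    return element.contains("2");
                })
                .map(element -> {
                    calls.add("map(" + element + ")");
                    return element.toUpperCase();
                })
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("abc1", "abc2", "abc3");
        System.out.println(callsWithoutTerminalOperation(list)); // 0
        System.out.println(firstContainingTwo(list).get());      // ABC2
        System.out.println(calls); // [filter(abc1), filter(abc2), map(abc2)]
    }
}
```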

6. Order of Execution

From the performance point of view, the right order is one of the most important aspects of chaining operations in the stream pipeline:

long size = list.stream().map(element -> {
    wasCalled();
    return element.substring(0, 3);
}).skip(2).count();

Executing this code increases the value of the counter by three. This means that the map() method of the stream was called three times. But the value of size is one. So, the resulting stream has just one element, and we executed the expensive map() operation for no reason two out of three times.

If we change the order of the skip() and map() methods, the counter will increase by only one. So, the map() method will be called just once:

long size = list.stream().skip(2).map(element -> {
    wasCalled();
    return element.substring(0, 3);
}).count();

This brings us to the rule: intermediate operations which reduce the size of the stream should be placed before operations which are applied to each element. So, keep methods such as skip(), filter() and distinct() at the top of your stream pipeline.
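The effect of operation order can be measured with a small sketch. One deliberate difference from the snippets above: since Java 9, the documentation of count() allows an implementation to skip executing the pipeline when it can compute the count directly from the source, so counting lambda invocations through count() may give different numbers on newer JDKs. This sketch therefore uses collect() as the terminal operation. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class OperationOrderSketch {

    static final List<String> LIST = Arrays.asList("abc1", "abc2", "abc3");

    // map() before skip(): every element is mapped, then two results are dropped.
    static int mapCallsWhenMappingFirst() {
        AtomicInteger counter = new AtomicInteger();
        LIST.stream()
                .map(element -> {
                    counter.incrementAndGet();
                    return element.substring(0, 3);
                })
                .skip(2)
                .collect(Collectors.toList());
        return counter.get();
    }

    // skip() before map(): only the one remaining element is mapped.
    static int mapCallsWhenSkippingFirst() {
        AtomicInteger counter = new AtomicInteger();
        LIST.stream()
                .skip(2)
                .map(element -> {
                    counter.incrementAndGet();
                    return element.substring(0, 3);
                })
                .collect(Collectors.toList());
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(mapCallsWhenMappingFirst());  // 3
        System.out.println(mapCallsWhenSkippingFirst()); // 1
    }
}
```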

7. Stream Reduction

The API has many terminal operations which aggregate a stream to a type or to a primitive, for example count(), max(), min() and sum(), but these operations work according to a predefined implementation. What if a developer needs to customize a stream's reduction mechanism? Two methods allow this: reduce() and collect().

7.1. The reduce() Method

There are three variations of this method, which differ by their signatures and returning types. They can have the following parameters:

identity: the initial value for an accumulator, or a default value if a stream is empty and there is nothing to accumulate.

accumulator: a function which specifies the logic of aggregation of elements. As the accumulator creates a new value for every step of reducing, the number of new values equals the stream's size and only the last value is useful. This is not very good for performance.

combiner: a function which aggregates the results of the accumulator. The combiner is called only in parallel mode, to reduce the results of accumulators from different threads.

So, let's look at these three methods in action:

OptionalInt reduced = IntStream.range(1, 4).reduce((a, b) -> a + b);

reduced = 6 (1 + 2 + 3)

int reducedTwoParams = IntStream.range(1, 4).reduce(10, (a, b) -> a + b);

reducedTwoParams = 16 (10 + 1 + 2 + 3)

int reducedParams = Stream.of(1, 2, 3)
    .reduce(10, (a, b) -> a + b, (a, b) -> {
        log.info("combiner was called");
        return a + b;
    });

The result will be the same as in the previous example (16), and there will be no log output, which means that the combiner wasn't called. To make a combiner work, a stream should be parallel:

int reducedParallel = Arrays.asList(1, 2, 3).parallelStream()
    .reduce(10, (a, b) -> a + b, (a, b) -> {
        log.info("combiner was called");
        return a + b;
    });

The result here is different (36), and the combiner was called twice. Here the reduction works by the following algorithm: the accumulator ran three times, adding the identity to every element of the stream. These actions are done in parallel. As a result, there are three partial values (10 + 1 = 11; 10 + 2 = 12; 10 + 3 = 13). Now the combiner can merge these three results. It needs two iterations for that (12 + 13 = 25; 25 + 11 = 36). The results differ because 10 is not a true identity for addition: reduce() expects an identity value that leaves any element unchanged when the accumulator is applied to it.
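If the intent is "sum the elements and add 10", one way to get the same answer in both sequential and parallel mode is to reduce with 0, which is a true identity for addition, and add the offset afterwards. This is a sketch of that idea, not code from the original samples; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ReduceSketch {

    static final List<Integer> NUMBERS = Arrays.asList(1, 2, 3);

    // Sequential: the value 10 is used once, so the result is 10 + 1 + 2 + 3.
    static int sequentialWithOffset() {
        return NUMBERS.stream()
                .reduce(10, (a, b) -> a + b, (a, b) -> a + b);
    }

    // Parallel-safe: reduce with the identity 0, then add the offset once.
    static int parallelWithOffset() {
        return 10 + NUMBERS.parallelStream()
                .reduce(0, (a, b) -> a + b, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(sequentialWithOffset()); // 16
        System.out.println(parallelWithOffset());   // 16
    }
}
```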

7.2. The collect() Method

Reduction of a stream can also be executed by another terminal operation, the collect() method. It accepts an argument of type Collector, which specifies the mechanism of reduction. Predefined collectors already exist for the most common operations. They can be accessed with the help of the Collectors type.

In this section we will use the following List as a source for all streams:

List<Product> productList = Arrays.asList(
    new Product(23, "potatoes"), new Product(14, "orange"),
    new Product(13, "lemon"), new Product(23, "bread"),
    new Product(13, "sugar"));

Converting a stream to the Collection (Collection, List or Set):

List<String> collectorCollection = productList.stream()
    .map(Product::getName)
    .collect(Collectors.toList());

Reducing to String:

String listToString = productList.stream()
    .map(Product::getName)
    .collect(Collectors.joining(", ", "[", "]"));

The joining() method can take no arguments, a delimiter only, or a delimiter together with a prefix and a suffix. The handiest thing about joining() is that the developer doesn't need to check whether the stream has reached its end in order to apply the suffix and skip the delimiter. The collector takes care of that.
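The three overloads of Collectors.joining() can be compared on the same input. This is a small sketch; the class name, method names, and sample list are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoiningSketch {

    static final List<String> NAMES = Arrays.asList("potatoes", "orange", "lemon");

    // No delimiter at all.
    static String plain() {
        return NAMES.stream().collect(Collectors.joining());
    }

    // Delimiter only; nothing is appended after the last element.
    static String delimited() {
        return NAMES.stream().collect(Collectors.joining(", "));
    }

    // Delimiter, prefix and suffix.
    static String wrapped() {
        return NAMES.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(plain());     // potatoesorangelemon
        System.out.println(delimited()); // potatoes, orange, lemon
        System.out.println(wrapped());   // [potatoes, orange, lemon]
    }
}
```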

Processing the average value of all numeric elements of the stream:

double averagePrice = productList.stream()
    .collect(Collectors.averagingInt(Product::getPrice));

Processing the sum of all numeric elements of the stream:

int summingPrice = productList.stream()
    .collect(Collectors.summingInt(Product::getPrice));

The averagingXX(), summingXX() and summarizingXX() methods work with primitives (int, long, double) as well as with their wrapper classes (Integer, Long, Double). One more powerful feature of these methods is that they perform the mapping themselves, so the developer doesn't need an additional map() operation before the collect() method.

Collecting statistical information about stream’s elements:

IntSummaryStatistics statistics = productList.stream()
    .collect(Collectors.summarizingInt(Product::getPrice));

By using the resulting instance of type IntSummaryStatistics, the developer can create a statistical report by applying the toString() method. The result will be a String similar to this one: "IntSummaryStatistics{count=5, sum=86, min=13, average=17,200000, max=23}" (the decimal separator depends on the default locale).

It is also easy to extract separate values for count, sum, min, average and max from this object by applying the getCount(), getSum(), getMin(), getAverage() and getMax() methods. All these values can be extracted from a single pipeline.
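The individual statistics can be read from one collected object, as the sketch below shows. The Product class is not shown in this article, so the nested class here is an assumed minimal shape (a price, a name, and two getters) inferred from the constructor calls and method references used above.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummaryStatisticsSketch {

    // Assumed shape of the Product class used by the examples.
    static final class Product {
        private final int price;
        private final String name;

        Product(int price, String name) {
            this.price = price;
            this.name = name;
        }

        int getPrice() { return price; }
        String getName() { return name; }
    }

    static final List<Product> PRODUCTS = Arrays.asList(
            new Product(23, "potatoes"), new Product(14, "orange"),
            new Product(13, "lemon"), new Product(23, "bread"),
            new Product(13, "sugar"));

    // One pass over the stream yields count, sum, min, max and average.
    static IntSummaryStatistics priceStatistics() {
        return PRODUCTS.stream()
                .collect(Collectors.summarizingInt(Product::getPrice));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = priceStatistics();
        System.out.println(stats.getCount());   // 5
        System.out.println(stats.getSum());     // 86
        System.out.println(stats.getMin());     // 13
        System.out.println(stats.getMax());     // 23
        System.out.println(stats.getAverage()); // 17.2
    }
}
```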

Grouping of stream’s elements according to the specified function:

Map<Integer, List<Product>> collectorMapOfLists = productList.stream()
    .collect(Collectors.groupingBy(Product::getPrice));

In the example above the stream was reduced to the Map which groups all products by their price.

Dividing stream’s elements into groups according to some predicate:

Map<Boolean, List<Product>> mapPartioned = productList.stream()
    .collect(Collectors.partitioningBy(element -> element.getPrice() > 15));

Pushing the collector to perform additional transformation:

Set<Product> unmodifiableSet = productList.stream()
    .collect(Collectors.collectingAndThen(Collectors.toSet(), Collections::unmodifiableSet));

In this particular case, the collector has converted a stream to a Set and then created the unmodifiable Set out of it.
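The grouping, partitioning and post-processing collectors can be tried together in one sketch. Two assumptions are made here: the nested Product class is a guessed minimal shape (the article does not show it), and the grouping example adds a downstream Collectors.mapping() step, which the text above does not use, so that the map values are product names and easy to print.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class GroupingSketch {

    // Assumed shape of the Product class used by the examples.
    static final class Product {
        private final int price;
        private final String name;

        Product(int price, String name) {
            this.price = price;
            this.name = name;
        }

        int getPrice() { return price; }
        String getName() { return name; }
    }

    static final List<Product> PRODUCTS = Arrays.asList(
            new Product(23, "potatoes"), new Product(14, "orange"),
            new Product(13, "lemon"), new Product(23, "bread"),
            new Product(13, "sugar"));

    // Price -> names of the products with that price.
    static Map<Integer, List<String>> namesByPrice() {
        return PRODUCTS.stream().collect(Collectors.groupingBy(
                Product::getPrice,
                Collectors.mapping(Product::getName, Collectors.toList())));
    }

    // true -> products costing more than 15, false -> the rest.
    static Map<Boolean, List<Product>> byPriceAbove15() {
        return PRODUCTS.stream()
                .collect(Collectors.partitioningBy(p -> p.getPrice() > 15));
    }

    // Collects names into a Set, then wraps it in an unmodifiable view.
    static Set<String> unmodifiableNames() {
        return PRODUCTS.stream()
                .map(Product::getName)
                .collect(Collectors.collectingAndThen(
                        Collectors.toSet(), Collections::unmodifiableSet));
    }

    // Returns true if adding to the collected set is rejected.
    static boolean rejectsModification() {
        try {
            unmodifiableNames().add("salt");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(namesByPrice().get(23));            // [potatoes, bread]
        System.out.println(byPriceAbove15().get(true).size()); // 2
        System.out.println(rejectsModification());             // true
    }
}
```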

Custom collector:

If, for some reason, a custom collector should be created, the easiest and least verbose way of doing so is to use the of() method of the Collector type.

Collector<Product, ?, LinkedList<Product>> toLinkedList = Collector.of(
    LinkedList::new,
    LinkedList::add,
    (first, second) -> {
        first.addAll(second);
        return first;
    });
LinkedList<Product> linkedListOfPersons = productList.stream().collect(toLinkedList);

In this example, the custom Collector reduced the stream to a LinkedList<Product>.
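The same collector can be written as a generic factory method, which makes the role of each Collector.of() argument easier to see. This is a sketch; the class name, the method names, and the string and integer inputs are illustrative and not part of the original samples.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collector;

class LinkedListCollectorSketch {

    // Generic variant of the custom collector: supplier, accumulator, combiner.
    static <T> Collector<T, ?, LinkedList<T>> toLinkedList() {
        return Collector.of(
                LinkedList::new,     // creates an empty result container
                LinkedList::add,     // folds one element into the container
                (first, second) -> { // merges partial results of a parallel run
                    first.addAll(second);
                    return first;
                });
    }

    static LinkedList<String> collectSequential(List<String> names) {
        return names.stream().collect(toLinkedList());
    }

    // In parallel mode the combiner merges the partial lists in encounter order.
    static LinkedList<Integer> collectParallel(List<Integer> numbers) {
        return numbers.parallelStream().collect(toLinkedList());
    }

    public static void main(String[] args) {
        System.out.println(collectSequential(Arrays.asList("lemon", "sugar"))); // [lemon, sugar]
        System.out.println(collectParallel(Arrays.asList(1, 2, 3, 4)));         // [1, 2, 3, 4]
    }
}
```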

8. Parallel Streams

Before Java 8, parallelization was complex. The emergence of the ExecutorService and the ForkJoin framework simplified a developer's life a little, but developers still had to keep in mind how to create a specific executor, how to run it, and so on. Java 8 introduced a way of accomplishing parallelism in a functional style.

The API allows creating parallel streams, which perform operations in a parallel mode. When the source of a stream is a Collection or an array it can be achieved with the help of the parallelStream() method:

Stream<Product> streamOfCollection = productList.parallelStream();
boolean isParallel = streamOfCollection.isParallel();
boolean bigPrice = streamOfCollection
    .map(product -> product.getPrice() * 12)
    .anyMatch(price -> price > 200);

If the source of stream is something different than a Collection or an array, the parallel() method should be used:

IntStream intStreamParallel = IntStream.range(1, 150).parallel();
boolean isParallel = intStreamParallel.isParallel();

Under the hood, the Stream API automatically uses the ForkJoin framework to execute operations in parallel. By default, the common thread pool is used, and there is no way (at least for now) to assign a custom thread pool to it. This can be overcome by using a custom set of parallel collectors.

When using streams in parallel mode, avoid blocking operations, and use parallel mode when tasks need a similar amount of time to execute (if one task lasts much longer than the others, it can slow down the complete app's workflow).

A stream in parallel mode can be converted back to sequential mode by using the sequential() method:

IntStream intStreamSequential = intStreamParallel.sequential();
boolean isParallel = intStreamSequential.isParallel();
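Switching between the two modes can be checked with isParallel(), and for an associative operation such as sum() the result is the same in either mode. The sketch below uses illustrative class and method names.

```java
import java.util.stream.IntStream;

class ParallelModeSketch {

    // parallel() switches the pipeline to parallel mode.
    static boolean parallelFlag() {
        return IntStream.range(1, 150).parallel().isParallel();
    }

    // sequential() switches the same pipeline back.
    static boolean sequentialFlag() {
        return IntStream.range(1, 150).parallel().sequential().isParallel();
    }

    // Sums 1..149 in the requested mode.
    static int sum(boolean parallel) {
        IntStream stream = IntStream.range(1, 150);
        return (parallel ? stream.parallel() : stream).sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelFlag());   // true
        System.out.println(sequentialFlag()); // false
        System.out.println(sum(true));        // 11175
        System.out.println(sum(false));       // 11175
    }
}
```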

9. Conclusions

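The Stream API is a powerful but easy-to-understand set of tools for processing sequences of elements. Used properly, it allows us to reduce a huge amount of boilerplate code, create more readable programs, and improve an app's productivity.

In most of the code samples shown in this article, streams were left unconsumed (we didn't apply the close() method or a terminal operation). In a real app, don't leave an instantiated stream unconsumed, as that can lead to memory leaks.

The complete code samples that accompany the article are available on GitHub.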