Working with Parallel Streams for Concurrency

Published: August 5, 2016


Introduction

Java 8 introduced parallel streams to the Streams API, allowing developers to process collections in parallel. Parallel streams can significantly improve performance when dealing with large datasets or CPU-intensive operations by leveraging multi-core processors.

In this tutorial, we’ll explore:

  • How parallel streams work in Java 8.
  • The performance benefits and limitations of parallel streams.
  • Practical examples of using parallel streams.

By the end of this tutorial, you’ll be equipped to use parallel streams effectively to improve the performance of your Java applications.


What Are Parallel Streams?

A parallel stream is a stream whose elements can be processed concurrently by multiple threads in the background. By default, Java achieves this with the common ForkJoinPool, which divides the workload into smaller chunks that are then processed in parallel and combined into a single result.

A regular stream processes data sequentially (one element at a time), whereas a parallel stream divides the data into chunks and processes them in parallel. The stream automatically handles thread management, which simplifies the process of working with concurrency.
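
To make the thread management visible, here is a minimal sketch that records which threads touch the elements of a sequential and a parallel stream. The class name ThreadNamesSketch and the helper threadsUsed are illustrative names, not part of the Streams API, and the thread names printed will vary by machine and run.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadNamesSketch {

    // Hypothetical helper: returns the names of the threads that processed the elements.
    static Set<String> threadsUsed(List<Integer> data, boolean parallel) {
        // A concurrent set, because several threads may add to it at once.
        Set<String> names = ConcurrentHashMap.newKeySet();
        (parallel ? data.parallelStream() : data.stream())
                .forEach(n -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // The sequential stream runs entirely on the calling thread.
        System.out.println("Sequential: " + threadsUsed(data, false));
        // The parallel stream may also use ForkJoinPool worker threads.
        System.out.println("Parallel:   " + threadsUsed(data, true));
    }
}
```

The sequential run always reports a single thread. The parallel run typically reports the calling thread plus one or more ForkJoinPool workers, although on a small list or a single-core machine it may report only one.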

Example: Using Parallel Streams

import java.util.List;
import java.util.Arrays;

public class ParallelStreamExample {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Sequential stream processing
        long startTime = System.nanoTime();
        int sumSequential = numbers.stream().mapToInt(Integer::intValue).sum();
        long endTime = System.nanoTime();
        System.out.println("Sequential sum: " + sumSequential);
        System.out.println("Time taken for sequential processing: " + (endTime - startTime) + " ns");

        // Parallel stream processing
        startTime = System.nanoTime();
        int sumParallel = numbers.parallelStream().mapToInt(Integer::intValue).sum();
        endTime = System.nanoTime();
        System.out.println("Parallel sum: " + sumParallel);
        System.out.println("Time taken for parallel processing: " + (endTime - startTime) + " ns");
    }
}

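In this example, we time a sequential and a parallel sum over the same list of integers. With only ten elements, the parallel version will usually be slower, because the cost of splitting the work and coordinating threads outweighs the trivial addition, and a single System.nanoTime() measurement is also skewed by JIT warm-up. Parallel streams tend to pay off on larger datasets; the actual gain depends on the complexity of the operation, the size of the dataset, and the system’s hardware.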


When to Use Parallel Streams

While parallel streams can be very useful for performance improvement, they are not always the best option. Here are some factors to consider before deciding to use parallel streams:

  1. Large Datasets: Parallel streams are best suited for large datasets where the processing time can be divided across multiple threads. For smaller datasets, the overhead of managing parallel execution might outweigh the benefits.
  2. CPU-Intensive Operations: If your stream operations are CPU-bound (i.e., they require a lot of computation), parallel streams can take advantage of multiple cores to speed up processing.
  3. Independent Operations: Parallel streams work best when each operation in the stream can be executed independently (i.e., no dependencies between elements).
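  4. Limited by Thread Pool: By default, parallel streams run on the common ForkJoinPool, which is shared by every parallel stream (and other fork/join work) in the JVM. If that pool is already heavily loaded, or if stream tasks block on I/O, parallel streams may be less effective and can slow down unrelated work. Be cautious when using parallel streams in environments with limited or shared resources, such as web servers.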
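
As a quick check of how much parallelism is actually available, the sketch below prints the processor count and the target parallelism of the common ForkJoinPool. The class name CommonPoolSketch and the helper commonPoolParallelism are illustrative; the numbers depend on the machine and on JVM settings.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolSketch {

    // Hypothetical helper: target parallelism of the shared common pool.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        // By default this is usually one less than the processor count,
        // because the calling thread also takes part in the work.
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

The value can be changed at JVM startup with the system property java.util.concurrent.ForkJoinPool.common.parallelism, but the setting applies to the whole JVM, not to a single stream.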

Performance Considerations

Parallel streams can be faster for CPU-bound operations, but there are some caveats to keep in mind:

  • Overhead: Parallel streams introduce overhead for dividing the task into smaller chunks, managing threads, and combining the results. If the task is not large enough, the overhead may negate any performance gains.
  • Thread Contention: When multiple threads try to access shared resources, such as memory or a database, you may run into thread contention. This can actually reduce performance if not handled properly.
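
The overhead point can be explored with a rough timing sketch like the one below, which runs the same computation on a small and a large range and keeps the best of several runs to reduce warm-up noise. The names OverheadSketch, sumOfSquares, and bestOfNanos are illustrative, and the figures are only indicative; a dedicated harness such as JMH would be the more reliable choice for real measurements.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class OverheadSketch {

    // Sums the squares of 1..n, either sequentially or in parallel.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        return (parallel ? range.parallel() : range).map(x -> x * x).sum();
    }

    // Runs the task several times and returns the fastest run in nanoseconds.
    static long bestOfNanos(int runs, LongSupplier task) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        for (long n : new long[] {100L, 1_000_000L}) {
            long sequential = bestOfNanos(20, () -> sumOfSquares(n, false));
            long parallel = bestOfNanos(20, () -> sumOfSquares(n, true));
            System.out.println("n=" + n
                    + " sequential=" + sequential + " ns"
                    + " parallel=" + parallel + " ns");
        }
    }
}
```

On most multi-core machines the small range tends to favor the sequential version and the large range the parallel one, but the crossover point is something to measure rather than assume.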

Example: Parallel Stream with Sorting

Sorting a large dataset with a parallel stream can be faster than sorting it sequentially, but the gain is not guaranteed. sorted() is a stateful operation: it has to see every element before it can emit the first one, so the pipeline must buffer the data and merge the sorted chunks, which limits how much of the work runs in parallel. Measure before relying on it.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelStreamExample {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(9, 3, 5, 1, 7, 6, 2, 8, 4);

        // Using parallel stream to sort the numbers
        // (Collectors.toList() is used because Stream.toList() is not available in Java 8)
        List<Integer> sortedNumbers = numbers.parallelStream()
                                             .sorted()
                                             .collect(Collectors.toList());

        System.out.println("Sorted numbers: " + sortedNumbers);
    }
}

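In this example, we use the parallelStream() method together with sorted() to sort the list of integers. The work may be split into chunks and processed concurrently, and because the source list is ordered, the collected result still comes out in sorted order. With only nine elements, this illustrates the API rather than a measurable speed-up.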


Using Parallel Streams Safely

If your operations modify shared state (e.g., writing to a shared variable or a collection), it’s essential to ensure thread safety. You can use thread-safe collections (like ConcurrentHashMap) or synchronize shared resources to avoid race conditions.

Example: Using Parallel Streams Safely

import java.util.List;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelStreamExample {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

        // Thread-safe way to update a shared variable
        AtomicInteger sum = new AtomicInteger();

        numbers.parallelStream()
               .forEach(number -> sum.addAndGet(number));

        System.out.println("Sum: " + sum.get());
    }
}

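In this example, we use an AtomicInteger so that concurrent updates from the parallel stream’s worker threads are not lost. Every thread still contends on the same counter, so this approach is safe but not necessarily fast.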
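
Shared mutable state can often be avoided altogether by letting the stream perform the reduction itself. The sketch below, with the illustrative names ReductionSketch, sum, and splitByParity, shows one way to do that using the built-in sum() and a collector; each worker thread builds a partial result and the stream merges them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReductionSketch {

    // Sums the list without any shared variable: partial sums are combined by the stream.
    static int sum(List<Integer> numbers) {
        return numbers.parallelStream()
                      .mapToInt(Integer::intValue)
                      .sum();
    }

    // Collects into a map without a shared collection: the collector merges partial maps.
    static Map<Boolean, List<Integer>> splitByParity(List<Integer> numbers) {
        return numbers.parallelStream()
                      .collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("Sum: " + sum(numbers));              // prints "Sum: 15"
        System.out.println("Even numbers: " + splitByParity(numbers).get(true));
        System.out.println("Odd numbers:  " + splitByParity(numbers).get(false));
    }
}
```

Because no lambda writes to a variable outside the stream, there is nothing to synchronize and nothing for threads to contend on.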


Limitations of Parallel Streams

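  • Order of Execution: A parallel stream processes its elements in no particular order, and forEach() may visit them in any order, even when the source (such as a List) has a defined encounter order. If the order matters in your application, use forEachOrdered() instead of forEach(), accepting that it gives up some of the benefit of parallelism.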
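  • Complex Operations: Pipelines that cannot be split into independent chunks might not benefit from parallel streams. Stateful operations such as sorted(), distinct(), and limit(), and sources that split poorly such as LinkedList, all reduce how much work can actually run in parallel.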
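
The ordering difference can be seen in a small sketch like the following. The names EncounterOrderSketch, visitUnordered, and visitOrdered are illustrative, and the unordered output will vary from run to run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EncounterOrderSketch {

    // Records the order in which forEach() visits the elements (not guaranteed).
    static List<Integer> visitUnordered(List<Integer> data) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        data.parallelStream().forEach(seen::add);
        return seen;
    }

    // Records the order in which forEachOrdered() visits the elements (encounter order).
    static List<Integer> visitOrdered(List<Integer> data) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        data.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println("forEach:        " + visitUnordered(data));
        System.out.println("forEachOrdered: " + visitOrdered(data));
    }
}
```

The forEachOrdered() line always matches the source list, while the forEach() line contains the same elements in whatever order the worker threads happened to reach them.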

Conclusion

Parallel streams are a powerful feature in Java 8 that allows you to process large datasets in parallel, taking advantage of multi-core processors. However, it’s important to weigh the benefits and limitations carefully before deciding to use them.

In the next tutorial, we’ll cover how to aggregate data in streams using the reduce() method, which allows you to combine stream elements into a single result.
