Sunday, April 5, 2015

Proper Usage of Java 8 Parallel Stream

Java 8 streams incorporate parallel constructs that make it easy to speed up processing over a collection of items. The speedup is not guaranteed, however, unless they are used appropriately. Here are the guidelines given by Doug Lea [1].

In terms of workload

A Java 8 parallel stream speeds up otherwise sequential processing by splitting the work into tasks that run on multiple threads. However, the implicit thread setup cost may outweigh the expected benefits. The total workload of the sequential computation is estimated as the product N × Q, where:

  • N is the number of elements to process in the source collection
  • Q is the cost of the work function per element, measured roughly in lines of code or statements

Parallel processing is worth it only when N × Q is large enough to outweigh the setup cost.
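The N × Q estimate can be sketched in Java as follows. N is the number of elements and Q is the per-element cost. This is a minimal illustration, not Lea's code: the class `NQModel`, its helper methods, and the threshold value are all hypothetical. In practice the cutoff has to be found by measuring on the target machine.

```java
import java.util.stream.LongStream;

public class NQModel {
    // Hypothetical helper: total sequential work is estimated as N * Q.
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    static long workload(long n, long q) {
        return Math.multiplyExact(n, q);
    }

    // Hypothetical decision rule: use a parallel stream only when N * Q
    // exceeds a threshold chosen by the caller (ideally by measuring).
    static boolean worthParallelizing(long n, long q, long threshold) {
        return workload(n, q) > threshold;
    }

    // Sums 1..n, switching to parallel only if the N * Q estimate clears the threshold.
    static long sumUpTo(long n, long q, long threshold) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (worthParallelizing(n, q, threshold)) {
            s = s.parallel();
        }
        return s.sum();
    }

    public static void main(String[] args) {
        long threshold = 10_000; // arbitrary example value, not a recommendation
        // N = 1,000,000 elements, Q = 1 (a single addition per element)
        System.out.println(sumUpTo(1_000_000, 1, threshold)); // 500000500000
        // N = 100 elements, Q = 1: stays sequential under this threshold
        System.out.println(sumUpTo(100, 1, threshold)); // 5050
    }
}
```

The stream's result is the same either way. Only the execution strategy changes, which is why the decision can be made from a cost estimate alone.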

Startup

  • Power management slows down the startup of idle cores, and JVMs, operating systems, and hypervisors impose further overhead.
  • Splitting work too finely lets the startup cost dominate the benefits of parallel processing.
  • Random-access data structures such as arrays and ArrayList offer more parallel speedup than linked lists, blocking queues, and I/O-based sources, because their elements can be reached and split with lower overhead.
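The cost of splitting different data structures can be observed directly through `Spliterator.trySplit()`, which parallel streams use to divide work. The sketch below is illustrative, and the class and method names are hypothetical. It reports how many elements the first split hands off. An `ArrayList` can split by index into two halves. A `LinkedList` has to walk its nodes, so OpenJDK's implementation peels off a small batch copied into an array instead of halving the list.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SplitBalance {
    // Returns the size of the first chunk produced by one trySplit() call,
    // or -1 if the spliterator refused to split.
    static long firstSplitSize(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        return prefix == null ? -1 : prefix.estimateSize();
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 1_000_000).boxed()
                .collect(Collectors.toList());
        List<Integer> array = new ArrayList<>(data);
        List<Integer> linked = new LinkedList<>(data);
        // Balanced split: roughly half the elements go to the first chunk.
        System.out.println("ArrayList first split:  " + firstSplitSize(array));
        // Unbalanced split: only a small batch is handed off at first.
        System.out.println("LinkedList first split: " + firstSplitSize(linked));
    }
}
```

Balanced splits let the fork/join framework hand out comparable chunks to each worker early on. Small, uneven batches mean more splitting steps and more of the setup overhead described above.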

Concerns

  1. A JVM cannot, in general, decide sensibly whether parallelism will deliver a speedup, so Java 8 streams simply delegate that decision to the user.
  2. A collection may weigh the pros and cons and selectively return a sequential or parallel stream based on some cost measure. The Collection.parallelStream() Javadoc explicitly allows it to return a sequential stream.
  3. Sequential I/O and synchronization inside parallel pipelines should be avoided, since they serialize the work that was meant to run in parallel.
  4. I/O-based sources require custom development of Stream sources to parallelize well.
  5. Overheads already present in sequential processing are likely to be magnified in parallel processing, in particular:
    • cache-locality
    • garbage-collection rates
    • JIT compilation
    • memory contention
    • data layout
    • OS scheduling policies
    • the presence of hypervisors
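The synchronization concern can be illustrated with a minimal sketch; the class and method names are hypothetical. The first method opts into parallelism explicitly but funnels every result through one synchronized list, so all worker threads contend for the same lock. The second keeps the pipeline free of shared mutable state and lets the collector combine per-thread partial results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AvoidSharedState {
    // Anti-pattern: every add() takes the same lock, which serializes
    // that part of the work; element order is also not guaranteed.
    static List<Integer> squaresViaSynchronizedList(int n) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(i -> out.add(i * i));
        return out;
    }

    // Preferred: no shared mutable state; the collector merges per-thread
    // partial results and preserves encounter order.
    static List<Integer> squaresViaCollector(int n) {
        return IntStream.range(0, n).parallel()
                .mapToObj(i -> i * i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 10_000; // small enough that i * i fits in an int
        List<Integer> a = squaresViaSynchronizedList(n);
        List<Integer> b = squaresViaCollector(n);
        System.out.println(a.size() + " " + b.size()); // 10000 10000
        System.out.println(b.get(10)); // 100
    }
}
```

Both versions produce the same set of values. Only the collector version keeps the result in order and avoids lock contention between worker threads.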
