Benchmark: concatenate lists
Recently, I came across this piece of code that concatenates two lists using Guava’s Lists
utility class and the Stream API:
Lists.newArrayList(collectionOriginal,collectionValue) .stream() .flatMap(Collection::stream) .toList();
As this piece of code is in the hot path of the application and therefore called very frequently, I wondered if this was the best way to concatenate two lists.
Two lists can be concatenated in many different ways:
- With the above code
- With
Stream.concat()
- With
Stream.of()
- With
ListUtils
from Apache Commons Collection - With
ArrayList.addAll()
There are certainly other ways, but I wanted to compare these five!
To compare these five solutions, there’s nothing better than JMH – The Java Microbenchmark Harness. This tool enables us to write relevant microbenchmarks, taking into account the internal characteristics of the JVM. It also offers integrated tools for analyzing the performance of our tests (profiling, disassembling, etc.). If you’re not familiar with JMH, please refer to this article: INTRODUCTION TO JMH – JAVA MICROBENCHMARK HARNESS.
The benchmark
To be able to compare different list sizes, I use @Param
to parameterize the size of the lists to be concatenated. Here’s the setup I used:
@Param({"10", "1000", "10000"}) int size; List<String> list1; List<String> list2; @Setup public void setup() { list1 = new ArrayList<>(size); list2 = new ArrayList<>(size); for (int i = 0; i < size; i++) { list1.add("list1-" + i); list2.add("list2-" + i); } }
Voici ensuite l'implémentation des cinq benchmarks
@Benchmark public List<String> concatWithStreamConcat() { return Stream.concat(list1.stream(), list2.stream()) .toList(); } @Benchmark public List<String> concatWithStreamOf() { return Stream.of(list1, list2) .flatMap(List::stream) .toList(); } @Benchmark public List<String> concatWithGuava() { return Lists.newArrayList(list1, list2) .stream() .flatMap(Collection::stream) .toList(); } @Benchmark public List<String> concatWithCommonsCollection() { return ListUtils.union(list1, list2); } @Benchmark public List<String> concatWithNewArrayList() { List<String> newList = new ArrayList<>(list1.size() + list2.size()); newList.addAll(list1); newList.addAll(list2); return newList; }
The results
All tests were run on my laptop: Intel(R) Core(TM) i7-1260P 8 coeurs (16 avec hyperthreading) – Ubuntu 24.10.
The Java version used is openjdk version "21.0.5" 2024-10-15 LTS.
Benchmark (size) Mode Cnt Score Error Units ConcatList.concatWithCommonsCollection 10 avgt 5 57,706 ± 27,376 ns/op ConcatList.concatWithCommonsCollection 1000 avgt 5 1631,110 ± 472,323 ns/op ConcatList.concatWithCommonsCollection 10000 avgt 5 15434,192 ± 2164,191 ns/op ConcatList.concatWithGuava 10 avgt 5 225,249 ± 83,069 ns/op ConcatList.concatWithGuava 1000 avgt 5 7967,360 ± 8693,234 ns/op ConcatList.concatWithGuava 10000 avgt 5 108601,706 ± 104439,847 ns/op ConcatList.concatWithNewArrayList 10 avgt 5 54,327 ± 10,894 ns/op ConcatList.concatWithNewArrayList 1000 avgt 5 1515,667 ± 595,135 ns/op ConcatList.concatWithNewArrayList 10000 avgt 5 14827,026 ± 1253,715 ns/op ConcatList.concatWithStreamConcat 10 avgt 5 111,837 ± 26,451 ns/op ConcatList.concatWithStreamConcat 1000 avgt 5 7104,415 ± 4030,866 ns/op ConcatList.concatWithStreamConcat 10000 avgt 5 57587,144 ± 13994,551 ns/op ConcatList.concatWithStreamOf 10 avgt 5 157,634 ± 55,757 ns/op ConcatList.concatWithStreamOf 1000 avgt 5 5915,797 ± 1692,437 ns/op ConcatList.concatWithStreamOf 10000 avgt 5 58156,102 ± 1655,090 ns/op
What's immediately obvious is that the version used today is the worst :(.
If we leave it aside, we can see that concatWithStreamOf and concatWithStreamConcat perform almost identically, as do concatWithNewArrayList and concatWithCommonsCollection. A look at the code of ListUtils.union(list1, list2)
shows that it does exactly what concatWithNewArrayList does.
Further analysis
To take the analysis a step further, let's re-run the benchmark with JMH's built-in GC profiler via the command-line option -prof gc
, which is often a good start ;). I'm particularly interested in the memory allocation rate.
ConcatList.concatWithCommonsCollection:·gc.alloc.rate.norm 10 avgt 5 232,012 ± 0,002 B/op ConcatList.concatWithCommonsCollection:·gc.alloc.rate.norm 1000 avgt 5 16072,479 ± 0,036 B/op ConcatList.concatWithCommonsCollection:·gc.alloc.rate.norm 10000 avgt 5 160077,107 ± 0,434 B/op ConcatList.concatWithGuava:·gc.alloc.rate.norm 10 avgt 5 888,058 ± 0,009 B/op ConcatList.concatWithGuava:·gc.alloc.rate.norm 1000 avgt 5 16969,104 ± 0,221 B/op ConcatList.concatWithGuava:·gc.alloc.rate.norm 10000 avgt 5 212148,835 ± 2,822 B/op ConcatList.concatWithNewArrayList:·gc.alloc.rate.norm 10 avgt 5 232,011 ± 0,003 B/op ConcatList.concatWithNewArrayList:·gc.alloc.rate.norm 1000 avgt 5 16072,511 ± 0,040 B/op ConcatList.concatWithNewArrayList:·gc.alloc.rate.norm 10000 avgt 5 160076,834 ± 0,366 B/op ConcatList.concatWithStreamConcat:·gc.alloc.rate.norm 10 avgt 5 424,023 ± 0,006 B/op ConcatList.concatWithStreamConcat:·gc.alloc.rate.norm 1000 avgt 5 8344,716 ± 0,173 B/op ConcatList.concatWithStreamConcat:·gc.alloc.rate.norm 10000 avgt 5 80350,980 ± 1,306 B/op ConcatList.concatWithStreamOf:·gc.alloc.rate.norm 10 avgt 5 824,037 ± 0,007 B/op ConcatList.concatWithStreamOf:·gc.alloc.rate.norm 1000 avgt 5 16905,076 ± 0,208 B/op ConcatList.concatWithStreamOf:·gc.alloc.rate.norm 10000 avgt 5 212083,533 ± 2,497 B/op
Here, we're interested in the value of gc.alloc.rate.norm
, which is the normalized memory allocation rate per operation, and we can see a big difference right away.
If we take ConcatList.concatWithNewArrayList as our baseline:
- concatWithCommonsCollection: same allocation.
- concatWithGuava: 4x more allocation.
- concatWithStreamConcat: 2x more d'allocation.
- concatWithStreamOf: 4x more allocation.
This explains some of the differences in performance, but not all.
Let's run the test again with the stack profiler, a simple, naive profiler, integrated into JMH.
# Benchmark: fr.loicmathieu.jmh.ConcatList.concatWithCommonsCollection # Parameters: (size = 10) ....[Thread state: RUNNABLE]........................................................................ 33,3% 50,0% $lt;stack is empty, everything is filtered?> 31,6% 47,4% fr.loicmathieu.jmh.ConcatList.concatWithCommonsCollection 1,5% 2,3% fr.loicmathieu.jmh.jmh_generated.ConcatList_concatWithCommonsCollection_jmhTest.concatWithCommonsCollection_avgt_jmhStub 0,1% 0,1% java.util.ArrayList.<init> 0,1% 0,1% java.util.Arrays.copyOf 0,0% 0,0% org.apache.commons.collections4.ListUtils.union 0,0% 0,0% java.lang.invoke.MethodHandle.maybeCustomize # Benchmark: fr.loicmathieu.jmh.ConcatList.concatWithGuava # Parameters: (size = 10) ....[Thread state: RUNNABLE]........................................................................ 33,3% 50,0% $lt;stack is empty, everything is filtered?> 19,2% 28,8% java.util.stream.SpinedBuffer.ensureCapacity 5,3% 8,0% java.util.stream.AbstractPipeline.wrapSink 3,4% 5,1% java.util.stream.ReferencePipeline.toArray 3,2% 4,8% java.util.stream.SpinedBuffer.copyInto 2,1% 3,1% fr.loicmathieu.jmh.jmh_generated.ConcatList_concatWithGuava_jmhTest.concatWithGuava_avgt_jmhStub 0,1% 0,1% java.util.stream.StreamSupport.stream 0,0% 0,0% java.util.stream.ReferencePipeline.lambda$toArray$0 0,0% 0,0% java.util.ArrayList.spliterator 0,0% 0,0% java.util.stream.SpinedBuffer.inflateSpine # Benchmark: fr.loicmathieu.jmh.ConcatList.concatWithNewArrayList # Parameters: (size = 10) ....[Thread state: RUNNABLE]........................................................................ 33,3% 50,0% $lt;stack is empty, everything is filtered?> 24,4% 36,6% java.util.ArrayList.addAll 8,7% 13,1% fr.loicmathieu.jmh.jmh_generated.ConcatList_concatWithNewArrayList_jmhTest.concatWithNewArrayList_avgt_jmhStub 0,1% 0,2% java.util.Arrays.copyOf 0,0% 0,1% java.util.ArrayList.<init> 0,0% 0,0% fr.loicmathieu.jmh.ConcatList.concatWithNewArrayList # Benchmark: fr.loicmathieu.jmh.ConcatList.concatWithStreamConcat # Parameters: (size = 10) ....[Thread state: RUNNABLE]........................................................................ 33,3% 50,0% $lt;stack is empty, everything is filtered?> 22,5% 33,8% java.util.ArrayList$ArrayListSpliterator.forEachRemaining 5,8% 8,7% fr.loicmathieu.jmh.ConcatList.concatWithStreamConcat 3,7% 5,6% java.util.stream.ReferencePipeline.toArray 1,1% 1,7% fr.loicmathieu.jmh.jmh_generated.ConcatList_concatWithStreamConcat_jmhTest.concatWithStreamConcat_avgt_jmhStub 0,0% 0,1% java.util.ImmutableCollections.listFromTrustedArrayNullsAllowed 0,0% 0,1% java.util.stream.ReferencePipeline.lambda$toArray$0 0,0% 0,1% java.util.stream.StreamSupport.stream 0,0% 0,0% java.util.ArrayList.spliterator # Benchmark: fr.loicmathieu.jmh.ConcatList.concatWithStreamOf # Parameters: (size = 10) ....[Thread state: RUNNABLE]........................................................................ 33,3% 50,0% $lt;stack is empty, everything is filtered?> 16,9% 25,4% java.util.ArrayList$ArrayListSpliterator.forEachRemaining 4,5% 6,8% java.util.stream.SpinedBuffer.copyInto 4,5% 6,7% java.util.stream.AbstractPipeline.wrapSink 3,4% 5,2% java.util.stream.SpinedBuffer.ensureCapacity 3,4% 5,0% fr.loicmathieu.jmh.jmh_generated.ConcatList_concatWithStreamOf_jmhTest.concatWithStreamOf_avgt_jmhStub 0,3% 0,5% java.util.stream.ReferencePipeline.toArray 0,1% 0,2% java.util.stream.ReferencePipeline$Head.forEach 0,0% 0,0% java.util.ArrayList.spliterator 0,0% 0,0% java.util.stream.SpinedBuffer.inflateSpine
concatWithCommonsCollection and concatWithNewArrayList both use java.util.Arrays.copyOf
, this method is annotated with @IntrinsicCandidate
which means that there is an intrinsic that allows it to be optimized directly within the JVM. To my knowledge, even if the annotation contains Candidate in the name, within Oracle's OpenJDK JVM there is always at least one intrinsic for one of the supported architectures (which may not be the case for other JVM vendors). This intrinsic enables memory-efficient copying of arrays of list elements without the need to traverse the list and copy elements one by one.
concatWithStreamConcat and concatWithStreamOf both run throught all list elements via java.util.ArrayList$ArrayListSpliterator.forEachRemaining
, which is less efficient.
concatWithGuava will also run through all the elements in the list, but in addition will create a list with the first two lists, doubling the number of times the elements are copied!
Conclusion
The simplest code is often the most efficient, even if it's not necessarily the most elegant.
The JVM contains a number of intrinsics that enable array copy operations in memory, which are essential when working with large data lists or performing frequent copy operations on them.
Of course, this optimization only makes sense if the code called is in the application's hot path. Here, concatenation is potentially performed thousands of times per second in Kestra, so it's important that it performs in the most efficient way.
As a reminder, this article contains numbers, which are just that: numbers. They can be very different from one context to another. It's always best to make your own benchmarks, in your own context, and then validate them in a real environment to deduce their relevance.
For the curious, here's the PR in the repo Kestra: https://github.com/kestra-io/kestra/pull/7160 and here is the complete benchmark: ConcatList.java.