Monday, October 6, 2014

Java 8 Collectors for Guava Collections

Java 8 comes with streaming API, it divides data processing into two phases: intermediate operations and terminal operation.
Terminal operation can have few different forms and I would like to concentrate on reduction to collections, especially to Guava immutable collections.
Terminal operation requires collector which will collect data and return it as required structure, but Guava does not provide such collector. In order to create Guava collection out of a stream we have to first reduce stream result into temporary collection and than transfer it:
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toList;
import com.google.common.collect.ImmutableSortedSet;

...stream.map(...).filter(....).
     collect(collectingAndThen(Collectors.toList(), ImmutableSortedSet::copyOf));
Reduction of our stream stores results in a temporary List (Collectors.toList()). Once stream processing is done the finisher function will convert content of this List into into Guava collection (ImmutableSortedSet::copyOf).

The problem with this approach is... that we have this extra converting loop and two arrays in memory (List and Builder). This could be avoided it we would have collector that is based on Guava's Builder. So.... I've implemented one, once we use it, the code above can be simplified into such form:
import static org.cyclop.common.Gullectors.toNaturalImmutableSortedSet;
import com.google.common.collect.ImmutableSortedSet;

...stream.map(...).filter(....).collect(toNaturalImmutableSortedSet());

The code is straight forward, let's concentrate on implementation of #toNaturalImmutableSortedSet()
public static <T extends Comparable<?>> 
  Collector<T, ?, ImmutableSortedSet<T>> toNaturalImmutableSortedSet() {
    
  Supplier<ImmutableSortedSet.Builder<T>> supplier = ImmutableSortedSet::naturalOrder;

  BiConsumer<ImmutableSortedSet.Builder<T>, T> accumulator = (b, v) -> b.add(v);

  BinaryOperator<ImmutableSortedSet.Builder<T>> combiner = (l, r) -> l.addAll(r.build());

  Function<ImmutableSortedSet.Builder<T>, ImmutableSortedSet<T>> finisher = 
      ImmutableSortedSet.Builder::build;

  return Collector.of(supplier, accumulator, combiner, finisher);
}

Our collector is being created by factory method Collector#of that takes four arguments:
  • #supplier - this function will be called only once to create structure that will collect stream results - in our case it's Biulder from ImmutableSortedSet
  • #accumulator - provides function that will get executed for each element that reaches terminal operation, meaning each element that went trough stream and should be collected for returning. In our case we are providing function that will execute #add(v) on Builder which has been provided in first argument (#supplier)
  • #combiner - this one will be not used in our example, but it's necessary for processing of parallel streams, it would be used to merge them
  • #finisher - this is the final step and it will be executed after stream processing is done. Elements returned by stream are contained in Builder (#supplier) and in this last phase we are calling #build() method on it, which results in ImmutableSortedSet !

Based on this pattern we can implement other collectors:
public static <T> Collector<T, ?, ImmutableList<T>> toImmutableList() {
    Supplier<ImmutableList.Builder<T>> supplier = ImmutableList.Builder::new;
    BiConsumer<ImmutableList.Builder<T>, T> accumulator = (b, v) -> b.add(v);
    BinaryOperator<ImmutableList.Builder<T>> combiner = (l, r) -> l.addAll(r.build());
    Function<ImmutableList.Builder<T>, ImmutableList<T>> finisher = 
        ImmutableList.Builder::build;

    return Collector.of(supplier, accumulator, combiner, finisher);
}

public static <T> Collector<T, ?, ImmutableSet<T>> toImmutableSet() {
    Supplier<ImmutableSet.Builder<T>> supplier = ImmutableSet.Builder::new;
    BiConsumer<ImmutableSet.Builder<T>, T> accumulator = (b, v) -> b.add(v);
    BinaryOperator<ImmutableSet.Builder<T>> combiner = (l, r) -> l.addAll(r.build());
    Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher = 
        ImmutableSet.Builder::build;

    return Collector.of(supplier, accumulator, combiner, finisher);
}

public static <T, K, V> Collector<T, ?, ImmutableMap<K, V>> toImmutableMap(
        Function<? super T, ? extends K> keyMapper,
        Function<? super T, ? extends V> valueMapper) {

    Supplier<ImmutableMap.Builder<K, V>> supplier = ImmutableMap.Builder::new;
    BiConsumer<ImmutableMap.Builder<K, V>, T> accumulator = 
         (b, t) -> b.put(keyMapper.apply(t), valueMapper.apply(t));
    BinaryOperator<ImmutableMap.Builder<K, V>> combiner = (l, r) -> l.putAll(r.build());
    Function<ImmutableMap.Builder<K, V>, ImmutableMap<K, V>> finisher = 
       ImmutableMap.Builder::build;

    return Collector.of(supplier, accumulator, combiner, finisher);
}

Finally here is the source code: Gullectors.java
and unit tests:TestGullectors.java