----------------------- REVIEW 1 ---------------------
PAPER: 7
TITLE: Data Flow Fusion with Series Expressions in Haskell
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Amos Robinson

OVERALL EVALUATION: 2 (accept)
REVIEWER'S CONFIDENCE: 4 (high)

----------- REVIEW -----------
The paper introduces data flow fusion, a new technique for optimizing array code, made available via a GHC optimization plugin and a repa library interface. The paper begins by introducing the problems with stream fusion, then describes how flow fusion fixes them. Finally, benchmark results are presented for six programming patterns, with sufficient analysis given for each benchmark.

Execution speed relative to C/Fortran has been one of the main reasons that has kept people away from learning new programming languages, so the importance of constant-factor optimization of simple programs in attracting new Haskell users cannot be over-emphasized. The result presented in this paper -- that four of the six examples (including the most complex one) can be optimized to nearly or over 90% of the performance of the hand-written C counterpart -- constitutes solid evidence that the performance of the Haskell array libraries matches that of hand-written C code in the domain considered. In addition, in my opinion the repa interfaces are well designed and accessible with modest Haskell knowledge. Firm performance and unintimidating interfaces are necessary conditions for advertising Haskell to C/Fortran programmers. I think this paper contributes to growing the Haskell user community and, in my opinion, deserves presentation at the Haskell Symposium.

The appendix section that summarizes the benchmark code makes this paper more readable for Haskell users. Adding a small glossary of the non-Haskell notation that appears in the paper would further improve its readability.

----------------------- REVIEW 2 ---------------------
PAPER: 7
TITLE: Data Flow Fusion with Series Expressions in Haskell
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Amos Robinson

OVERALL EVALUATION: -1 (weak reject)
REVIEWER'S CONFIDENCE: 3 (medium)

----------- REVIEW -----------
Summary:
========
This paper tackles the problem of fusing array computations (i.e. eliminating intermediate arrays) in the presence of branching data flow. The transformation is used in compiling Data Parallel Haskell and is implemented as a GHC plugin. The technique builds on Waters's framework of series expressions and applies it in a functional setting.

General comments:
=================
This improves on short-cut fusion, which is limited to data structures that have a single consumer. The core of the fusion is a transformation into a strict, first-order, non-recursive data-flow graph, followed by a scheduling step that translates the DFG into a specific form of loops, with re-use of loop indices guaranteed by construction. The resulting code is imperative, loop-based code embedded in Haskell.

The core technical material for this fusion optimisation comes from Waters (1991), who developed it as a Common Lisp macro package generating loops in a stateful language. The work presented here embeds this technology into GHC as a plugin, and generates either explicitly monadic code or code with unchecked effects, staying very close to Waters. One notable difference is the use of rate inference, with rank-2 types capturing "rate" information at the type level.
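For concreteness, the rank-2 "rate" typing could look roughly like the following minimal sketch. This is only my illustrative reading, with made-up names and a list-backed stand-in representation rather than the paper's actual API:

    {-# LANGUAGE RankNTypes #-}

    -- A phantom "rate" parameter k ties each series to the loop it
    -- flows in. This is a list-backed stand-in; the real combinators
    -- would be compiled to imperative loops by the plugin.
    newtype Series k a = Series [a]

    -- The rank-2 type keeps k abstract, so a series cannot escape
    -- the rate context in which it was created.
    runSeries :: [a] -> (forall k. Series k a -> b) -> b
    runSeries xs f = f (Series xs)

    mapS :: (a -> b) -> Series k a -> Series k b
    mapS f (Series xs) = Series (map f xs)

    foldS :: (b -> a -> b) -> b -> Series k a -> b
    foldS f z (Series xs) = foldl f z xs

    -- Both consumers share the same rate k, so one fused loop can
    -- produce the sum and the sum of squares in a single pass.
    sumAndSumSq :: [Int] -> (Int, Int)
    sumAndSumSq xs
      = runSeries xs $ \s ->
          ( foldS (+) 0 s
          , foldS (+) 0 (mapS (\x -> x * x) s) )

Under such a scheme, using a series at two incompatible rates would simply fail to type-check, which is presumably what the paper's rate inference builds on.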
In summary, I believe this paper makes only a small contribution to the area of automated program transformation, mainly adapting existing technology. Performance results from a small set of micro-benchmarks show significant improvements, in most cases achieving results comparable to hand-coded C. The results cover the main points in terms of performance and evaluate against a realistic (C-level) baseline; however, given that the core technology isn't novel, I would have expected a more thorough evaluation, both in terms of the test programs used and in the discussion of the performance results. There is little further discussion of the limitations of the approach beyond the good results presented here.

There are several issues with the presentation of the paper. It fails to clearly separate the core transformation steps from merely administrative operations. It goes into a lot of detail about the embedding as a GHC plugin. It does not set out the notions introduced in Waters's framework before moving on to the discussion of their implementation. There is also some confusion over terminology, mainly because the paper mixes established GHC compilation terms, DPH-style extensions, terminology from Shivers, and concepts from Waters's framework. Rephrasing, minimising the number of new notions, and targeting the Haskell community would help.

Overall, while I think this is a publishable paper of interest to compiler developers, I believe its contributions are too small w.r.t. existing work (Waters, 1991) to warrant publication, so I am suggesting a (weak) reject.

Specific comments:
==================
Sec 1 works very well to motivate the work. It would be good to introduce the concepts from Waters's work early, in a separate section, perhaps between Sections 2 and 3. This would help avoid the confusion over terminology noted above and contrast the work presented here with existing technology.

Sec 3: The reference to rate inference is vague here, due to the lack of context.

Sec 4: In general, I found the style of highlighting issues in the presented source code a bit awkward. Rather than using text markers like *CHANGE in \tt-formatted code, I would prefer visual highlighting with font, underlining, or the like. Similarly, in the list of steps at the beginning of Sec 4 it would seem natural to highlight Phases 2-6, as those are the ones covered by the GHC plugin, and in particular Phase 4 as the main technical one. It would be good to highlight Phase 4 prominently in the text to separate it from the mainly administrative steps.

Sec 6: I realise that the micro-benchmarks are motivated by the use of the optimisation for DPH. But since this paper presents the optimisation as stand-alone, it would be really nice to see some larger applications as well, even if they don't come with hand-coded C versions to compare against.

There are some typos in the text.

----------------------- REVIEW 3 ---------------------
PAPER: 7
TITLE: Data Flow Fusion with Series Expressions in Haskell
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Amos Robinson

OVERALL EVALUATION: 1 (weak accept)
REVIEWER'S CONFIDENCE: 3 (medium)

----------- REVIEW -----------
This paper presents a new fusion system for Haskell that handles branching data flow.
Unlike short-cut fusion, which rewrites producer-consumer pairs locally and thus cannot fuse one producer with multiple consumers, the new "flow fusion" system forms a pipeline of specialized IRs that converts functional code on vectors into imperative loops which may construct multiple results in one pass. The three major IR stages are: (1) series operators (Figure 3), which represent rates of data flow by phantom-type eigenvariables; (2) process descriptions (Figure 4), which represent the data-flow graph and the nested rate contexts of a fusable process; and (3) loop nests (Figure 6), which break up the parts of a loop (start, body, end) so that multiple loops can be interleaved to run concurrently.

Overall, this work brings Waters's series expressions and online criteria to the Haskell audience. I'm happy to see that this new fusion system is useful and usable by Haskell programmers today. I'm also happy to see that rate variables, series expressions, and loop anatomies can be "functionally flavoured" and made so pleasant to understand and work with. However, the fusion GHC plugin as it currently stands doesn't seem to take much advantage of the fact that series operators can be rank-2-typed in the familiar way (Figure 3). After all, the series operators and their types are not visible to the end programmer at all, are they? Wouldn't it be even more useful if a DSL of static, first-order, non-recursive data flows were embedded in a Haskell data type and perhaps compiled at what Haskell would call run time? Compared with such a programmer-accessible embedding, just porting Waters's compilation method to live inside GHC doesn't seem so surprising in its success.

Local comments:

p5 "If the Core version of the series process cannot be converted to our internal Process language..." -- Given that the "slurp" phase takes place after rate inference, I'm unclear what could obstruct the conversion to Process and cause "a compile-time warning". If badSwitchy in Section 4.1.2 is an example, please point to it from near the beginning of Section 4.1.

The beginning of Section 4.1.1 reminds me to ask: what is a "worker" and a "worker expression"? I see that Figure 4 defines what an "operator" is, but what are the "series arguments to" an operator?

p6 "We also use a preparation transform to force worker functions to be floated into their use sites -- so that combinators like mkSel, map and fold are directly applied to workers." -- Are the two following code snippets the "before" and "after" of applying this preparation transform? (If not, please give an example of this preparation transform in action. If so,) which worker function is floated in this example into which use site, so that which combinator is directly applied to which worker?
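For what it's worth, my reading of the preparation transform is a rewrite of the following shape. This is an illustrative guess using plain lists, rather than the Core-level series combinators the paper actually operates on:

    -- Before: the worker is let-bound away from its use site, so the
    -- combinator is applied to a bare variable that the "slurp" phase
    -- would have to chase through the enclosing bindings.
    beforeFloat :: [Int] -> [Int]
    beforeFloat xs
      = let worker = \x -> x + 1
        in  map worker xs

    -- After: the worker has been floated into its use site, so 'map'
    -- is applied directly to the lambda and the operator's worker
    -- expression is syntactically apparent.
    afterFloat :: [Int] -> [Int]
    afterFloat xs = map (\x -> x + 1) xs

If this is the intended reading, explicitly labelling the two snippets on p6 as the "before" and "after" of the transform would settle the question.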