----------------------- REVIEW 1 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 2 (medium)

This paper reports on a new version of the Repa library for parallel array computation. Previous versions have provided good performance, especially accommodating fusion; but making this work has required sophisticated knowledge from the programmer, and may entail embedding assumptions in the code (eg about data representations) that aren't revealed in type signatures. The new version 3 uses indexed types to express information about data representations, solving these problems. It's all implemented and on Hackage; two extended examples are discussed.

This all looks good to me; I have only minor comments.

p2: If you need more space (see "TeXery", below), you could delete the duplicated definition of doubleZip. [I hate it when reviewers tell me to add something when my paper is already up against the page limit: they should always say what can be deleted to make space for it!]
p2: "foundational view that"
p3: In the definition of doubleZip in S2.3, I think "sh" should be "sh1". In the following paragraph, "vec" should be "vec1,vec2".
p3: "at an particular point"; "the the"
p4: Actually, Repa has an infinity of representations, because they're recursively structured. (Or at least of indices for them: perhaps S (S D) and S D are different indices for the same representation. But there are still infinitely many ways of partitioning arrays.)
p4: Another way to save some space would be to condense the bulleted list to a paragraph.
p5: Delete the comma in "between, manifest"
p7: "performing this intersection is straightforward" (it's still possible when there is internal structure to consider, just not so straightforward).
p7: "fill traverse"
p8: "makes is attractive"
p8: In a paper on functional programming, I suggest using a cross rather than a dot for multiplication.
p9: Much of S5 can be condensed. The point of this paper is not that you can get good performance out of Repa 3; you could do that already with Repa 2. The point is that you can get that good performance cleanly and safely without hackery. So the discussion of smallness and interleave hints is important; I guess the ThreadScope graphs are too, to motivate the hints; but the comparison between Gauss-Seidel and Jacobi is beside the point, isn't it?
p10: "the underlying the row-major vector"
p12: Be consistent about "POPL" vs "Principles of...". [8] appeared at the Haskell Symposium, not Workshop. The first author of [4] has been mangled.

TeXery: You are using "\small" for verbatim blocks. I think this is a bad idea - ACM's 9pt text size is small enough already. But if you insist on using it, you have to fix a bug: you're consistently inheriting the condensed line spacing of the small verbatim in the previous text paragraph - eg in the "Our Repa library..." paragraph at the start of S1. (TeXnical details: line spacing applies to a whole paragraph, and it is the value in force at the end of the paragraph that is used. So if you choose \small in the middle of a paragraph and let it run past the end of the paragraph, the text size will change where the \small occurs but the whole paragraph will have \small-er line spacing. The fix is to make sure that you end the previous paragraph before selecting \small for verbatim.)

----------------------- REVIEW 2 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 3 (high)

The paper describes an implementation technique for an efficient array library. The technique was used in version 3 of the Repa library.
The central idea of the paper is to replace a single array type with a family of array types, each with its own representation and, hence, differing strengths and weaknesses. This makes it possible to use the type system to capture the design choices that were made in the implementations of the various array operations.

I really enjoyed reading this paper. The authors took care to introduce the reader to just enough detail about the design choices in the previous versions of Repa, which, in turn, led to a nice explanation of the problems that motivated the current implementation. As a result, by the time the central idea of the paper is introduced in Section 3, it feels very natural.

In addition to presenting a technique for an efficient implementation of a rather flexible array library, the paper also showcases the strengths of Haskell's type system. The combination of type classes and data families makes for a very elegant implementation that would be hard to achieve in any other mainstream programming language.

----------------------- REVIEW 3 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: -1 (weak reject)
REVIEWER'S CONFIDENCE: 3 (high)

This paper presents a new version of Repa, a Haskell library for high-performance, parallel computation over arrays. This version, version 3, overcomes some of the limitations and undesirable aspects of Repa 2. The main contribution is the use of GHC's type families and type functions to develop an API with types that describe both the type of the array elements and the representation of the array being used. The result is a library with richer types whose API makes it easier for the end user to write a high-level program that is transformed by the compiler into efficient code via inlining, fusion, and unboxing.
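[Editor's note: the "family of array types" idea that Reviews 2 and 3 describe can be sketched in a few lines of Haskell. This is a simplified illustration, not the real Repa 3 API: the real library also indexes arrays by shape and stores elements in unboxed vectors, whereas this toy version drops the shape index and backs manifest arrays with plain lists. The names `ADelayed`, `AManifest`, `mapA`, and `computeS` are illustrative.]

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Representation indices: nullary types used purely as type-level tags.
data D  -- delayed: the array is just an element function
data U  -- manifest: the elements are stored (a list here, for simplicity)

-- One data family, with one concrete representation per index.
data family Array rep e
newtype instance Array D e = ADelayed (Int -> e)
newtype instance Array U e = AManifest [e]

-- Uniform indexing over any representation.
class Source rep where
  index :: Array rep e -> Int -> e

instance Source D where
  index (ADelayed f) i = f i

instance Source U where
  index (AManifest xs) i = xs !! i

-- Mapping yields a delayed array, so consecutive maps fuse into one
-- composed function with no intermediate array being built.
mapA :: Source rep => (a -> b) -> Array rep a -> Array D b
mapA f arr = ADelayed (f . index arr)

-- Forcing a delayed array of a known size back to a manifest one.
computeS :: Int -> Array D e -> Array U e
computeS n (ADelayed f) = AManifest (map f [0 .. n - 1])
```

Because `mapA` returns an `Array D b`, the type alone records that the array is unevaluated, which is exactly the information that earlier Repa versions kept hidden from the type system.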
The topic of the paper is interesting and relevant, but the paper itself reads more like documentation of the library than a paper about the contributions. It assumes knowledge of a great many features, notably type families, type functions, fusion, and stencil convolutions. Given the audience, some knowledge might be assumed, but the latter concept in particular is not well known, even to seasoned Haskell programmers, unless they work in this particular domain. It also seems to assume quite a bit of knowledge about how array programming is done in general, and gives little explanation of how Repa works. I understand there are two papers on this subject, but a short refresher might be in order.

The structure of the paper starts off very nicely. Section 2 clearly outlines the problems and Section 3 the main idea. In Section 4, the discussion is lost in implementation details, including several large blocks of code. A more high-level discussion would be better, and indeed might save the reader from being barraged by the large number of type classes that this library requires and that are continuously introduced. These details obscure the take-home message at the moment.

Section 5 is dedicated to examples and benchmarks. It takes up a significant amount of space describing the problems themselves, and needs a bit more structure. I believe that more time could have been spent explaining the problems that arise and how they are overcome; the discussion of these feels a bit rushed as is.

Section 6 is called "Challenges of Array Fusion", and seems to encapsulate conclusions and future work. This section is particularly confusing. I appreciate the discussion of limitations, and how they can be overcome, but the discussion fails to distinguish clearly between limitations of the current design, limitations of the compiler, and those that are more fundamental. The related work is unacceptable, referring to prior Repa papers with almost no discussion.
The reliance on previous Repa papers makes the contribution of this one less clear. It is unclear whether this is simply an "incremental" update or a totally new take on the project. Personally, I would have preferred more examples and a more in-depth look at the "main idea" than the nitty-gritty details of the implementation.

Overall, I think this paper is interesting and relevant to the Haskell community, and it takes a new approach. It needs more principled discussion, however, and less dependence on previous papers and source code. I think it is not yet ready for publication, and am therefore recommending a weak reject. I think that with some restructuring, however, the paper would be an excellent contribution.

More specific remarks:

page 1: the term "descriptive types" is introduced as if it is a specific term, but really refers to the fact that types simply contain more information

Section 2, page 3: Here, as well as throughout the paper, numerical references are used as textual citations (for example, "see [8] for a full definition"). These should be replaced with actual text containing the author's name, at least.

Section 3, page 4: The kind declaration is a bit deceptive. Haskell supports promoted data types, but no kind declarations, and it doesn't seem to add much to characterize the nullary datatypes this way, since the goal is to prevent such a closed kind.

Section 4, page 7: "To improve cache performance, these loops fill traverse the array in" -- I assume "fill traverse" is a typo

Section 5, page 8: is the dot in the equation for the linear solver supposed to be a full stop?

Section 5, page 9: Mention of LLVM comes out of nowhere. Can you provide context as to why this is relevant here?
----------------------- REVIEW 4 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 4 (expert)

This paper describes a new version of the library "repa" (regular parallel arrays), which represents a major overhaul compared to previous versions. The key change is that now the evaluation state of an array (delayed or manifest) is statically known by virtue of a sophisticated type family encoding. After a motivation criticising earlier versions, the paper elaborates on a variety of aspects related to this major change, and substantiates a fundamental claim of their improvements: "to express the cost model clearly but non-intrusively".

Repa 3 arrays encode their runtime representation as one of a number of "indexing" types, which include useful other variants such as foreign data, unboxing, and special types geared towards stencil operations. This type encoding of an array's internal representation allows for much better inlining heuristics and specific optimisations following the application characteristics. Such a programming model is (rightly!) more explicit and requires the programmer to show (operational) understanding of the program's exact array evaluation at runtime. Yet, this was true for well-performing Repa 1 programs as well. The sorcery of the force function now gives way to explicitly controlled behaviour, indicated and checked by means of type families.

A particularly useful feature is the additional "hints" indicating the expected amount and nature of work for creating an array. Of course, "interleaving" has a considerable tradeoff with caching, but the measured program seems to win overall by mixing up the tasks.
And while we are at it: the "caching" motivation for the column-wise treatment of 2-dimensional arrays is mentioned a couple of times, yet never really substantiated, and is therefore one of the (minor) shortcomings in the presented material. A second shortcoming is the absence of proper conclusions that would allow for a summary and future work. Related work is also covered only sparsely.

But overall, this is a Very Good Paper Indeed, rich in content, instructive, and definitely a step forward towards top performance for functional programming on modern multicore hardware. It is surprising that the amount of material presented here fits into the page limit, which - in turn - indicates how well and accessibly the paper is written.

Minor things to correct:
p.3 known to the the programmer -> known to the programmer
p.5 performing concurrent writes -> writing elements in parallel (a truly "concurrent write" implies the same location)
p.7 makeCursror -> makeCursor
p.7 and p.8: forward references, avoid if possible
p.8 the result of the new iteration being computed -> the new state in the next iteration. Are the threads "blocked" or "waiting to be scheduled" (2 different states)? And are these the same or indeed new threads (in which case the threads are probably rather "being created" than anything else)? ThreadScope probably provides a way to make this clear by the picture(?)

----------------------- REVIEW 5 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 2 (medium)

SUMMARY: This paper describes a refined approach to parallel array fusion that uses indexed types to specify the internal representation of each array, so that client programmers can reason about the performance of their program in terms of the source code. The results are fully implemented in Repa version 3.2.
The problem with the previous approach is that the source program needs to be cluttered (with explicit pattern matching, etc.) to guide the compiler towards efficient code, though this obscures the algorithmic meaning of the source program and adds an implicit precondition (about the representation of arrays) that is not captured in the function signature. This can be seen with the following simple example:

  doubleZip :: Array DIM2 Int -> Array DIM2 Int -> Array DIM2 Int
  doubleZip arr1@(Manifest !_ !_) arr2@(Manifest !_ !_)
    = force $ map (* 2) $ zipWith (+) arr1 arr2

In other words, even though the programmer statically has a clear idea of the array representation they desire, this knowledge is invisible in the types and only exposed to the compiler if very aggressive value inlining is used.

They tackled this problem with a clear idea: expose static information about array representation to Haskell's type system. For example, the above cluttered program becomes the following clear code, in which the information about array representation is exposed in types instead of explicit patterns:

  doubleZip2 :: Array U DIM2 Int -> Array U DIM2 Int -> Array U DIM2 Int
  doubleZip2 arr1 arr2
    = computeP $ map (* 2) $ zipWith (+) arr1 arr2

In addition, the type-indexed approach has a major advantage: it supports a richer variety of array representations. Two end-user applications make full use of this advantage to achieve improved performance: the small-hint representation avoids unnecessary parallelization, and the interleaved representation deals with unbalanced workloads.

EVALUATION: This is a well-written paper that provides a novel approach to parallel array fusion that uses indexed types to specify the internal representation of arrays. It describes its approach precisely and evaluates the implementation carefully with practical end-user applications.
The proposed approach not only solves the problem of the previous approach, but also supports a rich variety of array representations seamlessly, which is useful and important in practical applications. I suggest that this paper be accepted.

TYPOS:
p. 2, section 2.2, paragraph 2: ---but, in Repa 1 &, 2 they -> ---but, in Repa 1 & 2, they
p. 3, section 2.4, paragraph 1: the the programmer -> the programmer
p. 9, section 5.1.1, the last sentence: cmap -> smap ?
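[Editor's note: the interleaved workload distribution that Reviews 1, 4, and 5 single out can be modelled in a few lines. This is a toy model, not the real Repa 3 implementation: there is no actual parallelism here, `workers`, `schedule`, and `fillInterleaved` are invented names, and the point is only that a round-robin assignment of indices to workers mixes expensive and cheap elements across threads while leaving the resulting array unchanged.]

```haskell
import Data.List (sortOn)

-- Assumed number of worker threads for the sketch.
workers :: Int
workers = 4

-- Round-robin (interleaved) schedule: worker k evaluates indices
-- k, k + workers, k + 2*workers, ..., so an unbalanced workload
-- (e.g. expensive elements clustered at one end) is spread evenly.
schedule :: Int -> [[Int]]
schedule n = [ [k, k + workers .. n - 1] | k <- [0 .. workers - 1] ]

-- Simulated parallel fill: each worker computes its (index, value)
-- pairs; reassembling by index shows the result is identical to a
-- plain sequential evaluation, only the work distribution differs.
fillInterleaved :: (Int -> e) -> Int -> [e]
fillInterleaved f n =
  map snd (sortOn fst [ (i, f i) | ks <- schedule n, i <- ks ])
```

In real Repa 3, choosing this strategy is a matter of picking the interleave-hint representation index in the types, so the choice is visible in the program's signature rather than buried in the implementation.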