----------------------- REVIEW 1 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 2 (medium)

This paper reports on a new version of the Repa library for parallel array computation. Previous versions have provided good performance, especially accommodating fusion; but making this work has required sophisticated knowledge from the programmer, and may entail embedding assumptions in the code (eg about data representations) that aren't revealed in type signatures. The new version 3 uses indexed types to express information about data representations, solving these problems. It's all implemented and on Hackage; two extended examples are discussed.

This all looks good to me; I have only minor comments.

p2: If you need more space (see "TeXery", below), you could delete the duplicated definition of doubleZip. [I hate it when reviewers tell me to add something when my paper is already up against the page limit: they should always say what can be deleted to make space for it!]
p2: "foundational view that"
p3: In the definition of doubleZip in S2.3, I think "sh" should be "sh1". In the following paragraph, "vec" should be "vec1,vec2".
p3: "at an particular point"; "the the"
p4: Actually, Repa has an infinity of representations, because they're recursively structured. (Or at least of indices for them: perhaps S (S D) and S D are different indices for the same representation. But there are still infinitely many ways of partitioning arrays.)
p4: Another way to save some space would be to condense the bulleted list to a paragraph.
p5: Delete the comma in "between, manifest"
p7: "performing this intersection is straightforward" (it's still possible when there is internal structure to consider, just not so straightforward).
p7: "fill traverse"
p8: "makes is attractive"
p8: In a paper on functional programming, I suggest using a cross rather than a dot for multiplication.
p9: Much of S5 can be condensed. The point of this paper is not that you can get good performance out of Repa 3; you could do that already with Repa 2. The point is that you can get that good performance cleanly and safely without hackery. So the discussion of smallness and interleave hints is important; I guess the ThreadScope graphs are too, to motivate the hints; but the comparison between Gauss-Seidel and Jacobi is beside the point, isn't it?
p10: "the underlying the row-major vector"
p12: Be consistent about "POPL" vs "Principles of...". [8] appeared at the Haskell Symposium, not Workshop. The first author of [4] has been mangled.

TeXery: You are using "\small" for verbatim blocks. I think this is a bad idea - ACM's 9pt text size is small enough already. But if you insist on using it, you have to fix a bug: you're consistently inheriting the condensed line spacing of the small verbatim in the previous text paragraph - eg in the "Our Repa library..." paragraph at the start of S1. (TeXnical details: line spacing applies to a whole paragraph, and it is the value in force at the end of the paragraph that is used. So if you choose \small in the middle of a paragraph and let it run past the end of the paragraph, the text size will change where the \small occurs but the whole paragraph will have \small-er line spacing. The fix is to make sure that you end the previous paragraph before selecting \small for verbatim.)

----------------------- REVIEW 2 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 3 (high)

The paper describes an implementation technique for an efficient array library. The technique was used in version 3 of the Repa library.
The central idea of the paper is to replace a single array type with a family of array types, each with its own representation and, hence, differing strengths and weaknesses. This makes it possible to use the type system to capture the design choices that were made in the implementations of the various array operations.

I really enjoyed reading this paper. The authors took care to introduce the reader to just enough detail about the design choices in the previous versions of Repa, which, in turn, led to a nice explanation of the problems that motivated the current implementation. As a result, by the time the central idea of the paper is introduced in Section 3, it feels very natural.

In addition to presenting a technique for an efficient implementation of a rather flexible array library, the paper also showcases the strengths of Haskell's type system. The combination of type classes and data families makes for a very elegant implementation that would be hard to achieve in any other mainstream programming language.

----------------------- REVIEW 3 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: -1 (weak reject)
REVIEWER'S CONFIDENCE: 3 (high)

This paper presents a new version of Repa, a Haskell library for high-performance, parallel computation over arrays. This version, version 3, overcomes some of the limitations and undesirable aspects of Repa 2. The main contribution is the use of GHC's type families and type functions to develop an API with types that describe both the type of the array elements and the representation of the array being used. The result is a library with richer types whose API makes it easier for the end user to write a high-level program that is transformed by the compiler into efficient code via inlining, fusion, and unboxing.
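[Editor's note: the "family of array types" idea that Reviews 2 and 3 describe can be sketched in a few lines of Haskell. This is a simplified illustration, not the real Repa 3 API: the real library also indexes arrays by shape and stores elements in unboxed vectors, whereas this toy version drops the shape index and backs manifest arrays with plain lists. The names `ADelayed`, `AManifest`, `mapA`, and `computeS` are illustrative.]

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Representation indices: nullary types used purely as type-level tags.
data D  -- delayed: the array is just an element function
data U  -- manifest: the elements are stored (a list here, for simplicity)

-- One data family, with one concrete representation per index.
data family Array rep e
newtype instance Array D e = ADelayed (Int -> e)
newtype instance Array U e = AManifest [e]

-- Uniform indexing over any representation.
class Source rep where
  index :: Array rep e -> Int -> e

instance Source D where
  index (ADelayed f) i = f i

instance Source U where
  index (AManifest xs) i = xs !! i

-- Mapping yields a delayed array, so consecutive maps fuse into one
-- composed function with no intermediate array being built.
mapA :: Source rep => (a -> b) -> Array rep a -> Array D b
mapA f arr = ADelayed (f . index arr)

-- Forcing a delayed array of a known size back to a manifest one.
computeS :: Int -> Array D e -> Array U e
computeS n (ADelayed f) = AManifest (map f [0 .. n - 1])
```

Because `mapA` returns an `Array D b`, the type alone records that the array is unevaluated, which is exactly the information that earlier Repa versions kept hidden from the type system.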
The topic of the paper is interesting and relevant, but the paper itself reads more like documentation of the library than a paper about the contributions. It assumes knowledge of a great many features, notably type families, type functions, fusion, and stencil convolutions. Given the audience, some knowledge might be assumed, but the latter concept in particular is not well known, even to seasoned Haskell programmers, unless they work in this particular domain. It also seems to assume quite a bit of knowledge about how array programming is done in general, and gives little explanation of how Repa works. I understand there are two papers on this subject, but a short refresher might be in order.

The structure of the paper starts off very nicely. Section 2 clearly outlines the problems and Section 3 the main idea. In Section 4, the discussion is lost in implementation details, including several large blocks of code. A more high-level discussion would be better, and indeed might save the reader from being barraged by the large number of type classes that this library requires and that are continuously introduced. These details obscure the take-home message at the moment.

Section 5 is dedicated to examples and benchmarks. It takes up a significant amount of space describing the problems themselves, and needs a bit more structure. I believe that more time could have been spent explaining the problems that arise and how they are overcome; the discussion of these feels a bit rushed as is.

Section 6 is called "Challenges of Array Fusion", and seems to encapsulate conclusions and future work. This section is particularly confusing. I appreciate the discussion of limitations, and how they can be overcome, but the discussion fails to distinguish clearly between limitations of the current design, limitations of the compiler, and those that are more fundamental. The related work is unacceptable, referring to prior Repa papers with almost no discussion.
The reliance on previous Repa papers makes the contribution of this one less clear. It is unclear whether this is simply an "incremental" update or a totally new take on the project. Personally, I would have preferred more examples and a more in-depth look at the "main idea" than the nitty-gritty details of the implementation.

Overall, I think this paper is interesting and relevant to the Haskell community, and it takes a new approach. It needs more principled discussion, however, and less dependence on previous papers and source code. I think it is not yet ready for publication, and am therefore recommending a weak reject. I think that with some restructuring, however, the paper would be an excellent contribution.

More specific remarks:

page 1: the term "descriptive types" is introduced as if it is a specific term, but really refers to the fact that types simply contain more information

Section 2, page 3: Here, as well as throughout the paper, numerical references are used as textual citations (for example, "see [8] for a full definition"). These should be replaced with actual text containing the author's name, at least.

Section 3, page 4: The kind declaration is a bit deceptive. Haskell supports promoted data types, but no kind declarations, and it doesn't seem to add much to characterize the nullary datatypes this way, since the goal is to prevent such a closed kind.

Section 4, page 7: "To improve cache performance, these loops fill traverse the array in" -- I assume "fill traverse" is a typo

Section 5, page 8: is the dot in the equation for the linear solver supposed to be a full stop?

Section 5, page 9: Mention of LLVM comes out of nowhere. Can you provide context as to why this is relevant here?
----------------------- REVIEW 4 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 4 (expert)

This paper describes a new version of the library "repa" (regular parallel arrays), which represents a major overhaul compared to previous versions. The key change is that now the evaluation state of an array (delayed or manifest) is statically known by virtue of a sophisticated type family encoding. After a motivation criticising earlier versions, the paper elaborates on a variety of aspects related to this major change, and substantiates a fundamental claim of their improvements: "to express the cost model clearly but non-intrusively".

Repa 3 arrays encode their runtime representation as one of a number of "indexing" types, which include useful other variants such as foreign data, unboxing, and special types geared towards stencil operations. This type encoding of an array's internal representation allows for much better inlining heuristics and specific optimisations following the application characteristics. Such a programming model is (rightly!) more explicit and requires the programmer to show (operational) understanding of the program's exact array evaluation at runtime. Yet, this was true for well-performing Repa 1 programs as well. The sorcery of the force function now gives way to explicitly controlled behaviour, indicated and checked by means of type families.

A particularly useful feature is the additional "hints" indicating the expected amount and nature of work for creating an array. Of course, "interleaving" has a considerable tradeoff with caching, but the measured program seems to win overall by mixing up the tasks.
And while we are at it: the "caching" motivation for the column-wise treatment of 2-dimensional arrays is mentioned a couple of times, yet never really substantiated, and is therefore one of the (minor) shortcomings in the presented material. A second shortcoming is the absence of proper conclusions that would allow for a summary and future work. Related work is also covered only sparsely.

But overall, this is a Very Good Paper Indeed, rich in content, instructive, and definitely a step forward towards top performance for functional programming on modern multicore hardware. It is surprising that the amount of material presented here fits into the page limit, which - in turn - indicates how well and accessibly the paper is written.

Minor things to correct:
p.3 known to the the programmer -> known to the programmer
p.5 performing concurrent writes -> writing elements in parallel (a truly "concurrent write" implies the same location)
p.7 makeCursror -> makeCursor
p.7 and p.8: forward references, avoid if possible
p.8 the result of the new iteration being computed -> the new state in the next iteration. Are the threads "blocked" or "waiting to be scheduled" (2 different states)? And are these the same or indeed new threads (in which case the threads are probably rather "being created" than anything else)? ThreadScope probably provides a way to make this clear by the picture(?)

----------------------- REVIEW 5 ---------------------
PAPER: 13
TITLE: Guiding Parallel Array Fusion with Indexed Types
AUTHORS: Ben Lippmeier, Manuel Chakravarty, Gabriele Keller and Simon Peyton Jones

OVERALL RATING: 2 (strong accept)
REVIEWER'S CONFIDENCE: 2 (medium)

SUMMARY: This paper describes a refined approach to parallel array fusion that uses indexed types to specify the internal representation of each array, so that client programmers can reason about the performance of their program in terms of the source code. The results are fully implemented in Repa version 3.2.
The problem with the previous approach is that the source program needs to be cluttered (with explicit pattern matching, etc.) to guide the compiler towards efficient code, though this obscures the algorithmic meaning of the source program and adds an implicit precondition (about the representation of arrays) that is not captured in the function signature. This can be seen with the following simple example:

  doubleZip :: Array DIM2 Int -> Array DIM2 Int -> Array DIM2 Int
  doubleZip arr1@(Manifest !_ !_) arr2@(Manifest !_ !_)
    = force $ map (* 2) $ zipWith (+) arr1 arr2

In other words, even though the programmer statically has a clear idea of the array representation they desire, this knowledge is invisible in the types and only exposed to the compiler if very aggressive value inlining is used.

They tackled this problem with a clear idea: expose static information about array representation to Haskell's type system. For example, the above cluttered program becomes the following clear code, in which the information about array representation is exposed in types instead of explicit patterns:

  doubleZip2 :: Array U DIM2 Int -> Array U DIM2 Int -> Array U DIM2 Int
  doubleZip2 arr1 arr2
    = computeP $ map (* 2) $ zipWith (+) arr1 arr2

In addition, the type-indexed approach has a major advantage: it supports a richer variety of array representations. Two end-user applications make full use of this advantage to achieve improved performance: the small-hint representation avoids unnecessary parallelization, and the interleaved representation deals with unbalanced workloads.

EVALUATION: This is a well-written paper that provides a novel approach to parallel array fusion that uses indexed types to specify the internal representation of arrays. It describes its approach precisely and evaluates the implementation carefully with practical end-user applications.
The proposed approach not only solves the problem of the previous approach, but also supports a rich variety of array representations seamlessly, which is useful and important in practical applications. I suggest that this paper be accepted.

TYPOS:
p. 2, section 2.2, paragraph 2: ---but, in Repa 1 &, 2 they -> ---but, in Repa 1 & 2, they
p. 3, section 2.4, paragraph 1: the the programmer -> the programmer
p. 9, section 5.1.1, the last sentence: cmap -> smap ?
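[Editor's note: the interleaved workload distribution that Reviews 1, 4, and 5 single out can be modelled in a few lines. This is a toy model, not the real Repa 3 implementation: there is no actual parallelism here, `workers`, `schedule`, and `fillInterleaved` are invented names, and the point is only that a round-robin assignment of indices to workers mixes expensive and cheap elements across threads while leaving the resulting array unchanged.]

```haskell
import Data.List (sortOn)

-- Assumed number of worker threads for the sketch.
workers :: Int
workers = 4

-- Round-robin (interleaved) schedule: worker k evaluates indices
-- k, k + workers, k + 2*workers, ..., so an unbalanced workload
-- (e.g. expensive elements clustered at one end) is spread evenly.
schedule :: Int -> [[Int]]
schedule n = [ [k, k + workers .. n - 1] | k <- [0 .. workers - 1] ]

-- Simulated parallel fill: each worker computes its (index, value)
-- pairs; reassembling by index shows the result is identical to a
-- plain sequential evaluation, only the work distribution differs.
fillInterleaved :: (Int -> e) -> Int -> [e]
fillInterleaved f n =
  map snd (sortOn fst [ (i, f i) | ks <- schedule n, i <- ks ])
```

In real Repa 3, choosing this strategy is a matter of picking the interleave-hint representation index in the types, so the choice is visible in the program's signature rather than buried in the implementation.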