Rejoinder
=========

We appreciate the detailed and insightful comments of the reviewers, which will help us improve the paper.

== Review #78A asks about the expressiveness of Repa and, in particular, whether we can implement divide-and-conquer algorithms, such as Strassen's matrix multiplication.

Yes, we can express divide-and-conquer algorithms, such as Strassen's matrix multiplication, in a recursive fashion. In fact, the FFT implementation in Section 7 is realised as a recursive divide-and-conquer algorithm. In general, we can implement divide-and-conquer algorithms as long as the division of the input depends solely on the *shape*, and not on the values, of the input. The latter would lead to *irregular* parallelism, whereas Repa is aimed at implementing regular array algorithms.

== Review #78A is also concerned that parallelism in Repa is implicit and cannot be controlled by the programmer.

This is an interesting point that we discussed in some detail in a preliminary version of the paper, but had to cut due to the page limit. As we are not supposed to augment the paper in this response, it must suffice to say that the programmer does have some control by virtue of an algorithm's structure and the use of the 'force' function. For example, the second-to-last paragraph of Section 7 explains how the parallelism in the function 'fft1D' decreases with each recursive step. By structuring the function differently, and not forcing the results of the recursive calls separately, we could prevent the parallelism from decreasing.

== Review #78C notes that there are some inconsistencies in the presented code and that the version of Repa currently published on the Haskell package repository, Hackage, differs in some respects from the paper.

We apologise for the inconsistencies in the code, which are due to copy editing the code after typesetting it in LaTeX prior to submission. We will make sure to fix these mistakes.
We published the source code of the Repa library on Hackage some time after the paper submission. In the meantime, we had already improved some implementation details in the library; all the ideas described in the paper remain the same, though. When revising the paper, we will ensure that the paper is in sync with the published code, and we will include a URL for the version of the library that matches the paper.

== Review #78C is concerned that some 'techniques behind the scenes [concerning parallelism] remain unexplained.'

The techniques in question are part of the base library of Data Parallel Haskell and have been explained in our previous work (references [5, 16]). We briefly mention this fact in Section 3.1, but will make sure to state it more clearly.

== Review #78C suggests that our claim that Repa 'does not require any compiler support specific to its implementation' appears to be untrue, because we require the use of GHC 6.13 (the current development version) to achieve optimal performance.

We stand by our claim. Some *general-purpose* optimisations implemented in GHC (in particular, inlining) have been substantially improved in GHC 6.13. This benefits all programs, not just Repa.
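
To make the shape-versus-value distinction from our response to Review #78A concrete, the following is a minimal plain-Haskell sketch, not Repa code: the function name 'dcSum' and the use of lists are purely illustrative. The point is that the split position is computed from the length (the "shape") of the input alone, never from its element values, which is the property that keeps the resulting parallelism regular.

```haskell
-- Illustrative sketch (plain Haskell, not Repa's API): a divide-and-conquer
-- reduction whose split point depends only on the length of the input,
-- never on its element values. In Repa, the two halves could each be
-- forced in parallel; here the recursion is sequential.
dcSum :: Num a => [a] -> a
dcSum []  = 0
dcSum [x] = x
dcSum xs  = dcSum left + dcSum right
  where
    -- The split depends solely on the length (the "shape"). A split that
    -- inspected the element values would yield irregular parallelism.
    (left, right) = splitAt (length xs `div` 2) xs

main :: IO ()
main = print (dcSum [1 .. 10 :: Int])
```

A value-dependent division, by contrast, such as partitioning around a data-dependent pivot as in quicksort, cannot be expressed this way, which is the boundary of what Repa is designed for.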