Referee(s)' Comments to Author:

Referee: 1

Comments to the Author

SUMMARY

The paper describes a type-and-effect system for a core language equipped with mutable and immutable algebraic data structures and with a mixture of strict and lazy evaluation. The system allows functions to be polymorphic in the mutability of their arguments and results. This feature is intended to remedy the code duplication that is sometimes observed in ML and Haskell programs. Furthermore, the system keeps track of effects, and requires that only pure computations can be suspended. This feature is intended to facilitate reasoning about programs as well as program optimisation.

EVALUATION

** in favor of the paper

The ideas that form the technical foundation of the paper appear sound and simple. As far as I can see, there are two basic features. First, all algebraic data is allocated in regions, and every region is marked as either mutable or immutable. This choice occurs at region creation time; once a region is created, its status remains fixed. Destructive update is permitted only within mutable regions. Second, the system keeps track of effects. Read and write effects are distinguished, and each effect carries the name of the region that is affected. Effect masking is permitted. (There are in fact two kinds of effect masking: the traditional kind, where any effect on any region r can be masked outside of the "letregion r" construct; and a new kind, where a read effect on an immutable region r can be masked at any time.) The system uses effects to enforce that only pure computations can be suspended.

The above two features are essentially orthogonal. As far as I can see, the only interaction between the two lies in the idea that a read effect on a region r can be masked (that is, considered pure) if and only if this region is immutable. Thus, the second feature takes advantage of the presence of the first feature. This seems a natural and interesting idea.

The paper presents a core language, which is quite verbose, but one expects that a reasonably lightweight surface language could be designed by using constraint-based type inference in the style of Haskell. This may have been done in separate work by the author, which I have not looked up.

** against the paper

Against the paper, I have two main criticisms, plus some minor issues.

* First criticism. The results that are formally established are, I believe, insufficient. The author establishes type soundness, in the form of subject reduction and progress statements. So, "well-typed programs do not crash". This is fine, but somewhat trivial: this result would still hold if the system were simplified by removing the requirement that suspended computations must be pure, or further simplified by no longer keeping track of effects, or even further simplified by viewing all regions as mutable. To put this another way, although the abstract claims that "only computations without visible side effects are suspended", there is no formal version of this statement, and (as a consequence) no formal proof that this property of the system can be exploited to enable interesting program transformations. Section 5.1 shows that the statement is actually false, or that only a certain interpretation of it is true: because the system does not keep track of allocation effects, a computation that allocates fresh mutable memory is considered pure; so it is not the case that pure computations can be shared without altering the program's semantics.
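To make this last point concrete, here is a minimal OCaml sketch (my own illustration, not the paper's language) of why a computation that merely allocates fresh mutable memory cannot be shared without changing the observable result:

    (* "fresh" neither reads nor writes pre-existing state, so a system with no
       allocation effects would deem it pure; yet sharing its result is observable. *)
    let fresh () = ref 0

    let unshared =
      let a = fresh () in
      let b = fresh () in
      a := 1; !b                  (* evaluates to 0: a and b are distinct cells *)

    let shared =
      let a = fresh () in
      let b = a in                (* the two "pure" computations shared *)
      a := 1; !b                  (* evaluates to 1: the observable result changes *)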
Sections 5 and 6, which study program transformations, are informal: no semantics preservation property is proved. To put this another way, I would like a formal answer to the questions: "why not view every region as mutable and forget about effects? what is the benefit of marking certain regions immutable? what is the benefit of keeping track of effects?" Apparently the answer is that this enables more program transformations, but this remains informal in the present paper.

* Second criticism. The author carefully avoids the issue of the interaction between polymorphism and mutable state. He side-steps the issue by adopting the convention that type abstractions suspend evaluation. Although this is technically correct (and the discussion at the end of Section 4.6 is accurate), it is not satisfactory. The programming languages in the ML family traditionally have a true type-erasure semantics, where type abstractions do not suspend evaluation. The paper does not explain how this property could be maintained in the presence of mutability polymorphism.

Roughly speaking, in an ML-like language, one must be careful not to generalize the type of an expression that allocates fresh mutable memory. There are two traditional ways of solving this problem. One (a) is to adopt the value restriction: consider that every expression may allocate fresh mutable memory, unless it is a value; so, generalize only values. The other (b) is to use a more refined type-and-effect system: keep track of allocation effects, and generalize only expressions that do not allocate fresh mutable memory. It is not clear to me how to adapt either of these traditional solutions to the setting of the present paper.

Consider approach (a). Is an application of the data constructor "cons" to values a value? In an ML-like language, the answer would be positive if this data constructor has immutable fields, and negative if it has at least one mutable field. Here, however, we have mutability polymorphism, so the answer would be positive if the region that this data constructor inhabits is immutable, and negative if it is mutable. Because this region is in general a region variable, there is no fixed answer. Approach (b) presents a similar problem. Assume an expression has an allocation effect in region r. Can this expression be generalized? The answer is positive if r is immutable (an allocation effect in an immutable region can be masked), and negative if r is mutable. Again, there is no fixed answer.

An example may best summarize the problem. Consider the expression "Cons (lambda x.x) Nil", that is, a singleton list of the identity function. What should be the type of this list? If it is an immutable list, then its type can be generalized, so one type could be "forall r a, Const r => List r (a -> a)". On the other hand, if we do not generalize, then we have no constraint on r, so one type could be "List r (t -> t)", where r is any fixed region and t is any fixed type. These two type schemes are incomparable, and it is not obvious how to find a single principal type scheme that subsumes these two. I believe that this issue may be the main reason why mutability polymorphism was never introduced in ML. It would be very nice if the author would face this problem and propose a true type-erasure semantics.

* Minor criticism. The paper puts much emphasis on explicit witnesses, or proof terms, for propositions. It is not clear to me why it is important or useful to construct explicit witnesses.
The propositions involved seem simple enough that entailment of propositions is decidable, so proof terms could be reconstructed by the type-checker of the core language instead of incorporated as part of type derivations. Such an approach, it seems, would make the presentation of the core language significantly more lightweight. I would like this issue to be clarified.

* Minor criticism. I believe there might be technical problems in the formulation of some of the typing rules, but this can probably be fixed; detailed comments appear further on.

** bottom line

I have mixed feelings about this paper. I like the simplicity of the basic ideas, but, as explained above, I am disappointed to find that (i) the guarantees offered by the system are not fully spelled out and (ii) the system does not have a true type-erasure semantics (type abstractions suspend evaluation), so it would not be suitable for adoption in an ML-like language. I believe I would be willing to accept the lack of an answer to issue (ii), as the problem is challenging and the author could perhaps attempt to justify why it might be acceptable for type abstractions to suspend evaluation. On the other hand, the paper really feels incomplete without an answer to (i). The paper is about "program optimisation", but never delivers a formal guarantee that types can help optimise programs. In summary, I would like to recommend a major revision of the paper, containing a formal statement of the claim that "only computations without visible side effects are suspended" and an application of this statement to formally establish the correctness of at least one of the program transformations presented in the paper. I would also like issue (ii) to be addressed or at least discussed.

DETAILED COMMENTS

page 2, "only pure function applications should be suspended": why is this important or desirable?

page 2, "in ML, references and arrays [...] are the only ones capable of being destructively updated". In OCaml, there are also strings, which technically are not considered arrays. The fact that OCaml strings are mutable is often a problem: for instance, the compiler cannot hoist string constants out of expressions! One would like to have both mutable and immutable strings, and one would like some functions (such as String.length) to be polymorphic with regard to this. Arrays pose an analogous problem. The problem with algebraic data structures is perhaps somewhat less acute: in practice, there does not seem to be much duplication between (say) a library for purely functional trees and a library for imperative trees.

page 3, "we give integers the type Int r, where r is a region variable, and constrain r to be mutable or constant as needed". A more naive or more natural approach might be to give integers the type "Int r", where r is one of "mutable" or "constant". This approach would lead to more compact types and to a purely unification-based type inference algorithm. Would it work? Would it be less expressive than your approach? It seems that it would amount to working with just two regions, a mutable one and a constant one. It would not allow effect masking, so it would classify fewer expressions as pure. Is that why you reject this approach?

page 5, the constant 5 has type "forall r, Int r". Should we interpret this as a function that waits for a parameter r and allocates a boxed integer 5?
My impression (after reading the entire paper) is that the answer is "yes", but it was not initially clear, since it was not explained up front that type abstractions suspend evaluation. By the way, one of the motivations of the paper is program optimisation; however, the introduction of mutability polymorphism means that integers must be boxed, so there is a price to be paid up front. Of course, as demonstrated in Section 6, one may hope to be able to unbox integers in many situations, but this requires work.

Section 2.1, it seems that a bare arrow stands for an arrow with effect bottom; please mention this. Is updateInt a primitive function? (I believe it is.)

page 4, the discussion in Section 2.2 is somewhat unclear to me. I understand that in Haskell, the conjunction of "Mutable r" and "Const r" does not lead to an immediate type error. My understanding is that in Disciple, it does lead to a type error. Is this correct?

page 4, the syntax used in the definition of the type List is quite surprising, as it seems that the parameter r is unused. At this point, you have not introduced the notion of a "primary region variable". Of course the reader may suspect that something like this is going on behind the scenes; nevertheless the syntax of the definition is disturbing. I would suggest changing the syntax to:

  data List r a = Nil@r | Cons@r a (List r a)

This is of course more verbose, but has the merit of explicitly showing that r is used as the region where Nil and Cons are allocated. This syntax also offers extra flexibility, in that it now becomes possible for Nil and Cons to inhabit different regions (so, if needed, one could be considered mutable and the other immutable).

page 4, "if the list is not a cons, then a run-time error is raised". This is admittedly mostly a matter of taste, but it would be nice if the system did not permit any kind of run-time error. This could be achieved in several ways. One simple way would be to follow OCaml's convention that only records can have mutable fields. (In your algebraic data type formulation, update would be permitted only if there is just one data constructor.) Another, perhaps more satisfactory approach would be to enrich type environments with a distinction between mutable and immutable bindings, in the style of Caml Light, where matching against the pattern "C x" would make "x" a mutable variable if the data constructor "C" was declared to carry a mutable argument. In your system, this might involve populating the type environment with bindings of the form "x : T@r", "x has type T and inhabits region r", and allowing an update to the variable x provided the constraint "Mutable r" holds. I believe this would allow programmers to express updates in a more natural style; plus, the run-time test would be eliminated, and compilation to efficient code would be easier (your current update_Cons,0 operator requires some work to compile efficiently).

page 6, a different design choice would have been to have an explicit "force" operation, as well as a type for suspensions. This would then allow suspending impure computations: the type "lazy a e" would be the type of suspended computations with result type a and effect e. This would be a sound type-and-effect system, I believe. You could still recognize suspended computations that are pure (if and where desired) since they have type "lazy a bottom". Could you explain why you chose not to follow this route?
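To make the alternative concrete, here is a rough OCaml sketch of an explicit suspension type; the phantom parameter 'e stands in for the latent effect, memoization (call-by-need) is omitted, and all names are hypothetical:

    (* A suspension carries its result type 'a and a phantom effect 'e;
       a pure suspension would have type ('a, bot) susp. *)
    type ('a, 'e) susp = Susp of (unit -> 'a)

    let suspend (f : unit -> 'a) : ('a, 'e) susp = Susp f
    let force (Susp f) = f ()

Forcing would then be explicit, and pure suspensions could still be recognized, where desired, by their effect parameter.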
One point that I do understand (I think) is that, as soon as you choose not to have an explicit type for suspensions, then (a) force must be implicit in function applications, case constructs, etc., and (b) one must require that only pure computations be suspended; otherwise the type-and-effect system would be unsound. (By the way, this would be a technical explanation of why you require that only pure computations can be suspended, *if* the type-and-effect system were used for some *other* purpose than making sure that only pure computations can be suspended!)

page 7, on the "phase distinction" between region variables and region handles. In my opinion, this is a sound distinction to make, and I commend you for making it. Region variables are type variables: they have lexical scope, while region handles are freshly allocated at runtime: they have dynamic scope. The distinction is analogous to the by-now-standard distinction between term variables x and memory locations l. Although one could technically insist on confusing region variables and region handles (by viewing the latter as bound by a toplevel letregion), it is clearer to distinguish. A couple of things I do not understand. On page 7, the sentence "region handles are bound by region variables" does not make sense to me. On page 20, the "similarity judgement", which relates region variables and region handles, is mysterious. I see no reason why this is needed: in my experience, region variables are replaced at runtime with region handles, just like term variables are replaced at runtime with (values that contain) memory locations in the standard treatment of ML references. There is usually no "similarity judgement x ~ l"! Perhaps the need for this judgement arises because some typing rules (e.g. letregion) are seemingly non-hygienic? I comment on this further on.

page 7, "This similar to system" -> "This is similar to the system"

page 8, at this point, you have introduced "underlined const", "MkConst", and "Const"; similarly "underlined mutable", "MkMutable", and "Mutable". This is noisy and confusing. Later on, you introduce MkPurify, MkPureBot, MkPureJoin, etc. Simple, but still noisy. Figures 6 and 8 are also (in my humble opinion) mostly noise. As suggested earlier in my review, why not remove witnesses entirely and let the type-checker silently reconstruct them where needed? It seems that this might be possible. You would end up with a constraint-based type system, a pretty standard notion. (You would need to define what it means for a constraint to be satisfiable, and this could be done only with respect to a world, a mapping of region handles to either "mutable" or "immutable". Entailment of constraints would involve a universal quantification over worlds.) On a related note, it would be nice if you could refer to "propositions" and "proofs" instead of "(dependent) kinds" and "types". The terminology would be much less intimidating! I understand that you refer to witnesses as "types" because they do not exist at runtime; but you could also call them "proofs" and indicate that proofs are erased at runtime.

page 9, if the system had a notion of subtyping, then the constraint "Pure e" could be replaced with "e <= bottom". I understand that you do not wish to have subtyping, so perhaps you could just have a construct to convert a pure effect e into bottom. Ah, it later turns out (page 11) that you do have such a construct, "mask". But then, the type of suspend could be simplified to (a -> b) -> (a -> b), could it not?
(Assuming that a bare arrow means an arrow with effect bottom.)

page 9, last sentence, missing dot.

page 10, the example of lazyMap seems more complex than it should be. Since the effect e is constrained to be pure, it could be replaced with bottom. Furthermore, since r1 is constrained to be constant, the read effect Read r1 could be ignored. I understand that the point here is to illustrate the conjunction of effects, but perhaps it would be better to pick an example where no simplifications are possible.

page 10, "guarentees" (use ispell)

page 11, "super-kinds", wow! More terminology escalation. Could you view "propositions" and "kinds" as two distinct syntactic categories, and thereby avoid the need for "super-kinds" altogether?

page 11, "We use use"

figure 1, in the definition of Sigma (store typing), I had difficulty reading the first production "l: tau, rho". I was troubled by the comma. Technically the comma is already used as a separator in the production "Sigma1, Sigma2", so it might be better not to overload it. How about "l: tau@rho"?

page 13, "[in TyLetRegion], we have included the r notin fv(tau) premise anyway because it makes the proof of soundness easier". I am surprised! In my opinion the premise does not just make things "easier", it is required for hygiene. Without the premise, the type tau would be allowed to contain a free region variable r, which would not make sense -- the type tau would be ill-kinded with respect to the environments Gamma and Sigma. Why take the trouble to define well-kindedness if you allow ill-kinded types to appear in judgements? This leads me to ask why you do *not* require the premise "r notin fv(sigma)". Either that, or replace "sigma" in the conclusion with something along the lines of "sigma \ r". Otherwise, again, the effect "sigma" is not well-kinded in the context of the conclusion. This does not make sense to me.

figure 4, there is something I do not understand about TyAbsT and TyAppT. First, since type abstractions suspend evaluation, it seems to me that they should have a latent effect; so the universal type should not take the form "forall a, tau" but "forall a, tau/e". I do not see this in the syntax of types. In TyAbsT, the type variable a may occur in sigma2, it seems, and so it may occur free in the conclusion; again this is unhygienic. Second, in the conclusion of TyAppT, the substitution of phi2 for a is applied not only to the type phi12 (as it should) but also to the effect sigma1. This does not seem to make sense, since the scope of the type variable in the premise did not extend all the way to sigma1 (did it?). Did I misunderstand? Please clarify.

figure 4, rule TyData requires the type phi to be well-kinded, but there is no such requirement for phi'. Yet the other premises do not seem to automatically guarantee that phi' is well-kinded.

figure 5, there seems to be some duplication between the store typing Sigma and the store H: both associate region handles to mutability flags. Later on, this forces you to require Sigma and H to be in agreement. Is there any way of avoiding this duplication and dropping the agreement hypotheses?

page 16, "the second premise of EvLetRegion is always true": if it is literally "always true", why is it there? Surely you meant something slightly different. By the way, the first two premises of EvLetRegion appear somewhat circular, in that the first premise seems to define H' in terms of the Delta_i, while the second premise seems to indicate how to obtain the Delta_i when H' is known.
This is perhaps not technically a problem (these are just equality hypotheses, not definitions), but it is somewhat disturbing.

figure 7, do I read EvSuspendApp correctly? It seems to me that reduction is non-deterministic and that a suspended application can spontaneously beta-reduce at any time. It also seems to me that suspended computations are viewed as values, hence duplicated; they are not shared. In other words, this is not a call-by-need semantics. Would you be able to show that (for well-typed terms only) it is equivalent to a call-by-need semantics?

page 17, "kind expressions can contain types, and hence store locations as well". I do not see how it is possible for a type to contain a store location.

page 19, back to the topic of hygiene for region variables: "if we were to permit this example to be well-typed then we would need to weaken our Preservation theorem"... Why on earth would you do this? Don't. You will keep a simple statement of Preservation and will lose nothing. If you do wish to adapt this example so that it can be well-typed, then I suggest that existential quantification is the tool you are looking for. The sub-expression "succ r1 r1 (23 r1)" has type "Int r1", so it also has type "exists r, Int r". This new type does not mention r1, so it is fine for it to occur in the conclusion of a hygienic TyLetRegion rule. At runtime, the witness for r is a dynamic region handle, which does escape the scope of the "letregion" construct where it was allocated -- but that is fine, since region handles are not lexically scoped.

page 20, I am surprised to see a new typing rule added after type soundness has been stated. Is it sound? Why did you find it difficult to include this rule in the soundness proof? Is it useful? Why did you find it necessary to include this rule in the implementation? More specifically, here are some comments about rule TyObs:

* I observe the following fact: give me a type derivation that uses TyObs; then I can insert a "letregion r" construct just below every use of TyObs, and I obtain a new type derivation, which is still well-formed, since r does not occur free in the conclusion of TyObs.

* thus, TyObs is no more expressive than a hygienic version of TyLetRegion with built-in effect masking; in other words, I suggest removing TyObs and replacing "sigma" with "sigma \ (Read r \/ Write r)" in the conclusion of TyLetRegion; you will obtain a hygienic version of TyLetRegion and remove the problematic TyObs, without any loss of expressiveness.

* the rule TyObs, as it stands, seems useless, because the requirement that r must not appear in Gamma implies that nothing can be known about r; in particular, it cannot be known that r is mutable, or that r is immutable. A combined TyLetRegion/TyObs rule would not have this problem.

page 20, the informal reasoning appears circular. You are trying to justify why it is sound to use TyObs to prove that a certain computation is pure, but the informal argument exploits the fact that "suspended computations must be pure". In short, these lines ("As r does not appear [...] has finished evaluating") do not make much sense to me. Better to incorporate some version of the rule in the formal soundness proof.

page 20, "it does not guarantee that optimising transforms based on masked effect information preserve the meaning of the original program". Where is the emphasis? Do you mean that this property is in fact true, but not guaranteed (because not yet formally proved), or that it is in fact false?
At this point of the paper I thought the former (after all, optimisation is the motivation of the paper). But upon reading Section 5.1 I found that it really seems to be the latter. That is a disappointment!

page 22, so, expressions with an allocation effect in a mutable memory region are not deterministic, yet your system views them as pure. This seems to be a shortcoming of your system. You argue "we do not actually need allocation effects", but I do not understand your one-line argument; if you wish to pursue this argument, please clarify it and make it a formal result. I would suggest instead introducing explicit allocation effects (the theoretical cost seems negligible; the practical cost might be some extra verbosity), together with the rule that an allocation effect on an immutable region can be masked. Then prove a formal result about your system -- computations which the system deems "pure" are actually pure in a certain sense (deterministic?) and can be shared, hoisted, etc.

page 24, "Doing this exposes [...]": missing a colon at the end of this sentence.

page 25, "One potential disadvantage [...]". A possibility might be to explicitly mask the effect Read r1 in the transformed code. Then the witness w1 would still be in use. I don't know if this would help much.

Section 6.1, it is conventional wisdom in the separation logic community that individual witnesses of distinctness are not very convenient, because their number grows quadratically with the number of regions involved. One might wish to adopt some ideas from separation logic and use constraints such as (say) "Mutable r1 * Mutable r2", which (by definition of the star) implies that r1 and r2 are distinct regions. This might take us quite far away from the current system, though. The ideas in Section 6.1 can be traced back at least to the "calculus of capabilities" and "alias types" papers (which are properly acknowledged), so I do not see anything really original here.

page 28, "one of the current limitations [...] is that the results of case alternatives must have the same type". This is reasonable! It is actually so in all type systems that I know of, including systems of dependent types, etc. The key is not to relax this rule but to enrich the type algebra so that the results of case alternatives can be *made* to have the same type. Here, in particular, existential quantification over region variables would be very useful. It would allow you to "choose between a mutable and a constant integer", since both have type "exists r, Int r". You don't need "a notion of subtyping", only an introduction rule for existential quantification (which admittedly can be viewed as a form of subtyping, if desired). Note that existential quantification over regions is a key ingredient in the system of "alias types" and its descendants.

page 28, last paragraph. One may wonder whether it would be possible to extend your system with support for changing mutable data into immutable data. A classic motivating example would be a tail-recursive version of list map in destination-passing style (sketched below): the new list is mutable while it is constructed, but becomes immutable once returned to the client. Of course, converting mutable data to immutable data is unsound if there is sharing. One expects that linearity might help: it should be sound to convert mutable data to immutable data if it is not shared. If "Mutable r" is viewed as an ownership assertion over the region (as in separation logic), then it could be safely turned into "Immutable r".
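To make that example concrete, here is a rough OCaml sketch (my own, with the mutable spine modelled by a ref in the tail position):

    (* Tail-recursive map in destination-passing style: the spine is extended
       in place through a mutable tail while the result is being built. *)
    type 'a cell = Nil | Cons of 'a * 'a cell ref

    let map f xs =
      let rec go xs dst =
        match xs with
        | Nil -> ()
        | Cons (x, tl) ->
            let next = ref Nil in
            dst := Cons (f x, next);   (* fill the current hole *)
            go !tl next                (* tail call into the new hole *)
      in
      let hole = ref Nil in
      go xs hole;
      !hole

The moment the completed list is handed back to the client is exactly where one would like to convert "Mutable r" into "Immutable r".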
Such a mechanism would be quite flexible; in particular, there would be no need to delimit a lexical scope where initialisation is performed; the decision to convert "Mutable r" into "Immutable r" could be taken at any point in the lifetime of a data structure.

----------------------------------------------------------------------

Referee: 2

Comments to the Author

Witnessing Mutability, Purity and Aliasing for Program Optimization

This paper describes a language in which the fields of any constructed data value may, a priori, be mutated. Fields are associated with regions and regions may be constrained to be constant or mutable. Constraints are realized by capabilities/witnesses that are passed around in the operational semantics, with update operations requiring a witness of mutability for the field to be updated. Functions are annotated with effects, specifying the regions that may be read or written, and can be polymorphic in types, regions and effects. Further refinements include witnesses that particular effects can be considered pure (reading from a constant region), selective suspension (only applications of *pure* functions may be suspended) and witnesses for disjointness of regions.

There is a great deal of previous work on effects, capabilities and region-based type systems, so the basic ideas used here are all fairly familiar. The main distinguishing features of this work seem to be the proposal to work with pervasive mutability in a high-level, higher-order language, the emphasis on witness passing and construction (presented using some not-entirely-trivial kinding), and the explicit goal of justifying interesting program transformations, rather than just, say, safe memory management. Interesting though this subject is, I do not think this paper reaches the standard of JFP. My two main criticisms are that the language design is rather unconvincing and that the formal development is extremely weak.

The initial motivation given is that languages like ML separate mutable reference types from value types. As a high-level programmer, I have always regarded this as a - rather major - feature, not a bug! Despite writing a great deal of ML code using both purely functional and quite heavily imperative data representations, I have rarely wished for polymorphism in the mutability of data. I can already make lists of reference cells, for example (see the sketch after this paragraph), and "deciding that we need an operation to destructively insert an element into the middle of a list" is a radical change of data representation, suddenly introducing identity and mutability where there was none before - I'd expect to have to make big changes elsewhere, replacing copying with explicit cloning and so on. I wouldn't expect much code to be usefully polymorphic in mutability, and this seems to be borne out by the examples in the present paper, most of which are predicated on particular assumptions about mutability and constancy and do a lot of work that's specific to those assumptions; not only being careful about copying things, but also passing around and constructing an enormous number of regions and witnesses. Hence one writes different code for different combinations of constraints, which is more straightforward in existing languages.
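As a small OCaml illustration of the point about reference cells (mine, not from the paper):

    (* Element-level mutability is already available by making a list of
       reference cells, without any polymorphism in the mutability of the
       list constructors themselves. *)
    let xs : int ref list = [ ref 1; ref 2; ref 3 ]
    let () = List.iter (fun r -> r := !r + 1) xs   (* update the elements in place *)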
The language seems likely to be inefficient (by default, even "5" has to be compiled as a fresh allocation) and isn't especially expressive about what can be updated (only fields of immutable constructors, so in practice one probably ends up using another unary datatype corresponding to ref to allow mutation on the outside). The heavily-kinded presentation of witnesses is rather baroque and clumsy compared with (a) just using separate syntactic categories for things and (b) using a bit more logic - having all this general-purpose machinery of kinds sits uneasily with the odd syntactic restrictions on uniqueness of region identifiers in the initial program, constructions that can only appear in certain places in the program, and so on.

The only theorem is a preservation and progress result, which does not directly establish a comprehensible property of programs; it merely says something about the extra witnesses that get passed around. There's no *immediately* obvious reason to believe that the presence or absence of these (fairly complex) witnesses implies something useful. Furthermore, the most interesting aspect of effect systems (going all the way back to FX) is masking - rules that allow unobservable side effects to be removed from the effect annotations. Although the paper presents a masking rule, that is not included in the system for which preservation and progress have been shown.

Much of the paper presents a number of plausible-looking optimising transformations based on the extended type system, but this is done via examples, rather than by carefully specifying the general pattern of the transformations, and there are no proofs at all that any of these rewrites preserve observational equivalence. That might be acceptable in a more engineering-focused paper on practical compiler implementation (of the sort that contains performance numbers), but for a metatheory paper, it's well behind the state of the art. The author is commendably clear and upfront about the limitations of what has been proved, but I think the theory would need to be properly developed to justify journal publication.

----------------------------------------------------------------------

Referee: 3

Comments to the Author

The paper presents a region-based type theory for polymorphism over mutability, purity and laziness, and alias analysis. It's hard to know what to make of this work. I don't see anything wrong with it, although I caution that the paper contains no proofs, or even the complete statement of the theorem (since some key definitions are relegated to an appendix, which I haven't seen). However, the work's motivation isn't clear, and the language contains a raft of strange design decisions.

In the first place, it's not clear what the role of the language is. At the outset it seems as though the language is for programmers, opening "Suppose we are writing a library that provides a useful data structure". But the language doesn't seem appropriate for programmers; it involves numerous complications to do even simple tasks. At the end of the paper, it gives the impression that the language is intended as an internal language for a compiler. It seems more appropriate for such a role, but then it's not clear why polymorphism over mutability would be valuable. Moreover, the language hides the real-world cost of laziness, so there is a limit to how low-level it can be. It's also not clear why one would use a region-based type system in the first place.
Regions don't seem necessary for the paper's avowed goal of polymorphism over mutability, and they don't play any role in purity. The regions do seem necessary for the alias analysis, but the paper presents the alias material as an afterthought.

The type theory combines eager and lazy evaluation in an unusual manner. Rather than using types to distinguish between eager and lazy terms, the two are combined into the same type. This weakens the invariant provided by the type, so any operation must be able to accept either a value or a suspension. Consequently, an implementation must use call-by-need, which would hurt performance. Or so it would seem, if we assume that the suspend primitive actually creates suspensions. In the operational semantics (EvSuspendApp) it appears that the suspension primitive doesn't suspend at all.

I'm not sure why the author decided to make the predicates (Const, Mutable, Pure) kinds and their witnesses types. The language would have been simpler if propositions and the proofs thereof were their own syntactic classes. The language also would have been simpler if the predicates had taken their arguments syntactically (i.e., Const(r) instead of Const). This would have eliminated the need for kind application and the arrow superkind. In addition, it is never explained why there is a distinction between box and diamond kinds. If that distinction were also eliminated, the entire syntactic class of superkinds could be discarded.

The KiAll rule is very strange. It allows the body of the quantifier to be an arbitrary constructor, not necessarily a type. I have no idea what a non-type quantifier would mean.

To summarize, I think the paper would benefit greatly from a discussion of the rationale behind the type system's design choices.

Some other comments:

On page 3 the paper describes the effects as a lattice. It appears to me to be a join semi-lattice.

The List example on page 4 is confusing. At first glance it appears as though the first argument is irrelevant, but I gather that the first argument is taken to be the region in which the datatype's values are allocated. It would be helpful if the author could devise a notation that is less misleading.

The citation for LF on page 6 is unusual. Usually one would cite the LF paper (Harper, Honsell, Plotkin).

I'm surprised to see this on page 6: "To avoid problems with variable capture we require all bound variables r in the initial program to be unique." I am very suspicious. This trick rarely works and I'd be surprised if it worked here. The problem is that the property is not preserved under evaluation. I also don't see the benefit of the trick, as opposed to the usual strategy of considering alpha-equivalent expressions identical.

The discussion of region deallocation on page 19 is a bit misleading. The reader might get the impression that the problem is with stack-like deallocation, rather than deallocation in general. More to the point, since reads from immutable regions are considered to generate no effect, immutable regions cannot be deallocated.

I think the discussion at the bottom of page 19 of the r \not\in fv(\tau) requirement misses the point. If the requirement were omitted, region variables could escape their scope. Consequently, the type system would fail to enjoy the regularity property: well-formed terms could have ill-formed types. The particular issue cited of a term changing type as it evaluates is merely a symptom of a non-regular type system.

On page 20 the paper says "Although TyObs is valid, we have omitted it ...
because our soundness proof does not cover it." If the soundness proof does not cover it, in what sense is it valid?