Scrap Your Reprinter

Back in 2013, Andrew Rice and I were doing some initial groundwork on how to build tools to help scientists write better code (e.g., with the help of refactoring tools and verification tools). We talked to a lot of scientists who wrote Fortran almost exclusively, so we started creating infrastructure for building tools to work on Fortran. This was the kernel of the CamFort project for which we got an EPSRC grant in 2015 (which is ongoing). The CamFort tool now has a couple of fairly well developed specification/verification features, and a few refactoring features. Early on, I started building everything in Haskell using the brilliant uniplate library, based on the Scrap Your Boilerplate [1] work. This helped us to get the tool off the ground quickly by utilising the power of datatype generic programming. Fairly quickly we hit upon an interesting problem with building refactoring tools: how do you output source code for a refactored AST whilst preserving all the original comments and white space? It is not enough just to pretty print the AST, unless your AST contains all the comments and layout information. Building a parser to capture all this information is extremely hard, and we use a parser generator which limits flexibility (but is really useful for a large grammar). Another approach is to output patch/edit information for the original source code, calculated from the AST.

In the end, I came up with a datatype generic algorithm which I call the reprinter. The reprinter takes the original source code and an updated AST (which contains location information) and maps them into a new piece of source code. Here is an illustration which I’ll briefly explain:

reprinter

Some source text (arithmetic code in prefix notation here) is parsed into an AST. The AST contains the “spans” of each syntactic fragment: the start position and end position in the original source code (for simplicity in this illustration, just the column number is represented). Some transformation/refactoring is applied next. In this case, the transformation rewrites redundant additions of 0, which happens in the node coming from source locations 10 to 16. The refactored node is marked in red. The reprinting then runs, stitching together the original source code with the updated source tree. A pretty printer is used to generate code for any new nodes, but all the original source text for the other nodes is preserved. The cool thing about this algorithm is that it is datatype generic: it works for any datatype, with some modest side conditions about storing source spans. The implementation uses the Scrap Your Zipper [2] library to do a context-dependent generic traversal of a datatype. In essence, the algorithm is similar to what one might do if you were to spit out edit information from an AST, then apply this to a piece of source text. But, the algorithm does this generically, and in a single simultaneous pass of the AST and the input source text.

I’ve always thought it was a cute and useful algorithm, which combined some cool techniques from functional programming.  As with all the “Scrap Your X” libraries it saves huge amounts of time and messing around, especially when your AST representation keeps changing (which it did/does for us). The algorithm is really useful anywhere you need to update human-written source code in a layout-preserving way; for example, IDEs and refactoring tools but also in interactive theorem provers and program synthesis tools, where you need to synthesise source text into some existing human-written code. This is one of the ways it is used in CamFort, where specifications are synthesised from code analysis data and then inserted as comments into user code.

This summer, I was fortunate enough to have the resources to hire several interns. One of the interns, Harry Clarke, (amongst other things) worked with me to tidy up the code for the reprinter, add some better interfaces, make it usable as a library for others, and write it all up. He presented the work at IFL 2017 and the pre-proceedings version of the paper is available. We are working on a post-proceedings version for December, so any comments gratefully appreciated.


[1] Lämmel, Ralf, and Simon Peyton Jones. Scrap your boilerplate: a practical design pattern for generic programming. Vol. 38. No. 3. ACM, 2003.

[2] Adams, Michael D. “Scrap your zippers: a generic zipper for heterogeneous types.” Proceedings of the 6th ACM SIGPLAN workshop on Generic programming. ACM, 2010.

Advertisements

FSCD day 4 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(FSCD Keynote #3) Type systems for the relational verification of higher order programs, Marco Gaboardi

Relational properties R(X_1, X_2) \Rightarrow S(P(X_1), P(X_2)). For example, take R and S to be notions of program equivalence (equivalent inputs produce equivalent outputs). Another relation might be in information-flow security where relations R, S mean “these two programs are low-level equivalent” (low-security).
Another is differential privacy where R means two programs differ in one individual data point and  S(Y_!, Y_@) = Pr[Y_1] \leq e^\epsilon Pr[Y_2].

In relational cost analysis, we want to compute the difference in cost (the relative cost) between the two programs (may depend on the input and the underlying relation), e.g., cost(P)-cost(Q) \leq f(X_1, X_2, R, S) (giving an upper bound computed in terms of the inputs and underlying relations). This is useful for guaranteeing compile optimisations (not worse, or definitely better), ruling out side-channel attacks (i.e., the \textsf{cost}(e[v_1/x]) - \textsf{cost}(e[v_2/x]) = 0, such that different inputs do not yield different costs thus information about the inputs is not leaked).

Motivating example 1: find in a list of lists (2D data structure). We want to prove that one implementation of this algorithm is faster than the other (I’ve omitted the code that Marco showed us, which was expressed via a common higher-order function with two different parameter functions to give the two implementations).
Motivating example 2: prove a precise upper bound on the relative cost of insertion sort between two lists (different by n in length).

[Clarkson, Schneider ’08] – formalises the idea of ‘hyperproperties’. Properties are sets of traces, hyperproperties are sets of sets of traces. Relation verification is a set of pair of traces (a 2-property). They show how to reduce verification of hyperproperties into the verification of properties. Using these kind of results requires encodings, they do not reflect the relational nature of the property, and they do not reflect the connection of the relational reasoning to the programs syntax (see the talk from ICFP day 2, on A Relational Logic for Higher-order Programs). Lots of previous work (going back to Abadi’s System R).

Relational typing: \Gamma \vdash t_1 \approx t_2 : \tau. Talk about relational properties of the input and relational properties of the output. Usually, if we interpret to relations then we want to prove soundness as: (\bar{v_1}, \bar{v_2}) \in [\Gamma] \Rightarrow ([t_1[\bar{v_1}/\Gamma]], t_2[\bar{v_2}/Gamma]]) \in [\tau].\
How can we reason about the cost? Two approaches from here: (1) lightweight typing  (bottom up) extending type system with the necessary constructors, (2) heavyweight typing (top down) use a powerful type system and encode the needed constructions. (In the future, we want to find a balance between the two approaches).

Approach (1) (lightweight) called Relcost a relational refinement type-and-effect system. The idea is to take advantage of structural similarities between programs and inputs as much as possible. There will be two sorts of type (unary and relational). Typing judgments \Omega \vdash^U_L t : A where U is the upper bound and L is the lower bound execution cost, which are thought of as a unary effect. A relational judgement \Gamma \vdash t_1 \ominus t_2 \preceq D : \tau where D is the upper bound on the relative cost of the two programs. The types have an annotation function type (like in normal effect systems) \sigma \xrightarrow{U, L} \tau and data types for integers and lists have indices to express dependencies.

e.g. n : \textsf{int}, f : \textsf{int} \xrightarrow{k, k} \textsf{int} \vdash^{k+1}_1 \textsf{if} \, n \leq 0 \textsf{then} f \; n \textsf{else} \; 1 : \textsf{int}.
But if we know that n is known to be 5 this can be refined (with the type n : \textsf{int}[5] in the indices, which will change the bound to $\vdash^{k+1}_{k+1}$.

The relational types are a little different. There is still a latent effect on function types, but is now a single relative cost \sigma \xrightarrow{D} \tau.
Judgements are like the following, which says equal integers have zero cost: \Gamma \vdash n \ominus n \preceq 0 : \textsf{int}_r.

The unary cost and relational cost are connected as follows:
\dfrac{|\Gamma| \vdash^U_{-} t_1 : A \qquad |\Gamma| \vdash^{-}_L t_2 : A}{\Gamma \vdash t_1 \ominus t_2 \preceq U - L : \textsf{U} A}
Thus, we can drop into the unary cost to compute a worst and best run time individual, and then combine these into a relative cost.
There is a box modality which captures terms which have the same cost and allow a reseting therefore of the differential cost, e.g., if we have identical terms then:
\dfrac{\Gamma \vdash t \ominus t \preceq D : \tau \qquad \forall x . \Gamma(x) \sqsubseteq \Box \Gamma(x)}{\Gamma \vdash t \ominus t \preceq 0 : \Box \tau}.
The type \textsf{List}^\alpha{I} \tau captures lists which differ in less than \alpha positions. This interacts with the \Box type in order to capture the notion of differing numbers of elements in the list.
In the semantics, the relative cost is pushed inside the semantics (some step-indexed logical relations are used, details not shown).
Going back to the earlier example of searching nested lists, we see a typing of 1 \leq |\texttt{find1}| \leq 3n and 3n \leq |\texttt{find2}| \leq 4n meaning \texttt{find1} has a relatively smaller cost. Plugging this into the higher-order function and doing the type derivation again gives a looser bound.

Approach (2) is the heavyweight approach was presented earlier at ICFP (A Relational Logic for Higher-order Programs), the language RHOL (Relational Higher Order Logic). Has relational and unary judgments, where the relational rule is \Gamma \mid \Psi \vdash t_1 : \tau_1 \approx t_2 : \tau_2 \mid \theta which contains term variable assertions \Psi and relational assertion \theta. In the type system, some rules are synchronous (building up the related two terms with the same syntactic construct) or asynchronous (building up one of the terms).

HOL terms can be encoded into RHOL, and vice versa (a bit like the hyperproperty result of Clarkson/Schneider), but RHOL is much easier to work with for relational reasoning but also all the power of HOL can be embedded into RHOL.
The system R^C is a monadic meta language with a specific cost effect (with \textsf{cstep}_n(m) introducing a cost. Intuition: if a term m is closed and $m \Downarrow_n v$ then $m \cong \textsf{cstep}_n(v)$.We can define formulae in HOL which all us to reason about these explicit cost terms and their quantities: we can define what we need in the logic.

For computing a relative upper bound on the insertion sort, we want to prove \textsf{UnsortedDiff}(l_1, l_2, n) \Rightarrow \textsf{cost}(\textit{isort} \, l_1) - \textsf{cost}(\textit{isort} \, l_2) \leq n. Using the system , we can prove this property, using a suitable invariant/predicate over the input and output list(s).

Take home: type-based relational verification is a large research space that needs more work.

Q: what about recursion? In the first system, letrec is included straightforwardly, in the second system the assumption is terminating (general) recursion.
Q: are the costs tied to a specific evaluation strategy? Yes, but one can encode different ones in the system. (Both systems are parametric in the way you count cost). In the first system, this is part of the typing, in the second, this comes where you put monadic cost operations \textsf{cstep}.


(FSCD) Arrays and References in Resource Aware ML, Benjamin Lichtman, Jan Hoffmann

Resource-aware ML (RAML) – model resource usages of programs with cost semantics, e.g., implement Quicksort and it produce the bound 14n^2 + 19^n + c (I forgot what the constant c was). This work introduces references and arrays to enable analysis of programs where resource consumption depends on the data stored in mutable structures (references and arrays).

How does it work? Automatic amortized resource analysis. Assign “potential” functions to a data structures, which will pay the resource consumption cost, then hopefully there is some left over for later. This propagates backwards to give the overall upper bound cost.

Each variable in scope may contribute (carry) potential, e.g., recursive data types (such as x : \textsf{list}^2(\textsf{int}) that is it can contribute 2n units of potential where n is the length of the list.

Based on an affine type system (all variable used at most once). Subexpressions are bound to variables whenever possible (share-let normal form) in order to explicitly track when things are evaluated (via variables). When a variable carrying potential needs to be shared, the potential can be explicitly split share x as (x1, x2) in e in order to use a variable more than once (gives you contraction). This splits the potential of x across the two variables. Benjamin walked us through a nice example with appending to a list.

Now let’s add references.

g l = 
  let t = ref l in 
  share r as (r1, r2) in  -- r2 : L^1 (int)
  let _ = h r1 in   -- h : (L^q int) ref -> int  
                    -- (need to make the type of this stronger)
  append (!r2, []) -- append : L^1(int) * L^0(int) -> L^0(int)

One approach is to carry around extra information about possible effects, but they want to keep effects out of the type system because they seem them as cumbersome and difficult to use. Instead, strengthen contracts by require that potential annotations of data inside references are fixed.

g l = 
  let t = ref l in 
  share r as (r1, r2) in  -- the restricted h means r1 and r2 have 
                          -- to have the same type
  let _ = h r1 in   -- h : (L^1 int) ref -> int  
  append (!r2, [])

Introduce a swap command as the only way to update a cell, that behaves like:

swap (r, l) = let x = !r in (let _ = r := l in x)

Requires that data placed into a cell has the same type as data being taken out, ensures that potential annotations of mutable cells never change (the only way to update a cell). Thus, swap is the well-typed version of dereference in situations where you need to work with potential. Thus, require the previous program to use swap on r2 with an empty list instead of a direct dereference in the last line.

The paper has a detailed example of depth-first search (stateful) which shows more of the details related to how to work with state and potentials. This also utilises an idea of “wrapper” types.

Introduce a notion of “memory typing”, which tracks all memory locations pointed to by mutual references, this is require to prove the soundness of the cost analysis (based on the free potential of the system, the potential of the context, and the potential of each mutable memory reference).

Mutable arrays were then a natural extension from this basis, with an array-swap operation for updating an array cell.

In the future, they’d like to create some syntactic sugar so that the programming style is not affected as much by having to instead swap, aswap, and wrapper types. Would like to investigate how fixed potential annotations restrict the inhabitants of a type (Yes, yes, this is very interesting!)

I asked about what happens if you write down Landin’s Knot to get recursion via higher-order state: Benjamin answered that functions don’t get given potential so storing and updating functions in a reference would type correctly. I still wonder if this would let me get the correct potential bounds for a recursive algorithm (like DFS) if the recursion was implemented via higher-order state.


(FSCD) The Complexity of Principal Inhabitation, Andrej Dudenhefner, Jakob Rehof

In the simply-typed lambda calculus (STLC), with principle types. We say \tau is the principle type of M if \vdash M : \tau and for all types \sigma such that M : \sigma there exists a substitution S such that S(\tau) = \sigma.

Definition: (Normal Principal Inhabitant), We say that a term M in beta-normal form is a normal principal inhabitant of a type t, if t is the principal type of M.

This work shows that principal inhabitation for STLC is PSPACE-complete (in 1979 it was shown that inhabitation (not considering) principality is PSPACE-complete).  Thus, this work seeks to answer the following: since principality is a global property of derivations, does it increase the complexity of inhabitation? This is also practically interesting for type-based program synthesis, since a normal principal inhabitant of a type is a natural implementation of the type.

For example, a \rightarrow a \rightarrow a is inhabited by the K combinator K = \lambda x . \lambda y .  x but is not principally inhabited (I haven’t understood why yet).

The proof is based on the subformula calculus. The subformula calculus provides paths to index types (as trees) which are strings of either 1 or 2.  This is useful for reasonable about subformulae, where the usual STLC rules are re-expressed in terms of type paths, e.g., \dfrac{\Gamma, x : 1\pi \vdash e : 2\pi}{\Gamma \vdash \lambda x . e : \pi}. Define relations on paths, R_M as the minimal equivalence relation for a beta-normal term M (capture constraints on subformulae of terms) and R_\tau  which defines an equivalence relation on paths of paths with common subpaths (capture constraints on subformulae on types).

(I had to step out, so missed the end).


(FSCD) Types as Resources for Classical Natural Deduction, Delia Kesner, Pierre Vial

Quantitative types seen as resources (can’t be duplicated); they provide simple arithmetical arguments to prove operational equivalences. These ideas are extended to classical logic in this talk.

In simple types, typability implies normalisation. With intersection types this is an equivalence, i.e., normalising also implies typability. In an intersection type system, a variable can be assigned several types, e.g. x : A \wedge B \wedge C \wedge B (note the two occurrences of B) where intersection is associative, commutative and (possibly) idempotent. In 1994, Gardner introduces a system which is not idempotent, where types are now multisets (rather than sets in the earlier formulations). This has the flavour of linear logic resources (my interpretation: the types can then capture how many times a variable is used (quantitative) if we contraction takes the intersection of the two types and intersection is non-idempotent, e.g., x_1 : A \wedge B, x_2 : B \wedge C \leadsto x : A \wedge B \wedge B \wedge C).

In the literature, we see how to build a computational interpretation of classical natural deduction: Intuitionistic Logic + Peirce’s law gives classical logic and Felleisen’s call-cc operator gives the computational interpretation. Relatedly, the lambda_mu calculus (Parigot 92) give a direct interpretation of classical natural deduction.

Types are strict where intersection can only appear on the left-hand side of a function arrow.

Interestingly, you can have a situation where the type of a lambda abstraction has an empty intersection of types in its source type. In order to get normalisation in this context, the system was extended a bit to deal with this (I didn’t capture this part, see paper). The full system had rules to ensure this didn’t happen.

A notion of “weighted” subject reduction is defined, where the size of a derivation tree is strictly decreasing during reduction.

ICFP / FSCD day 3 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(ICFP) – A Specification for Dependent Types in Haskell, Stephanie Weirich, Antoine Voizard, Pedro Henrique Avezedo de Amorim, Richard A. Eisenberg

Haskell already has dependent types! ish! (ICFP’14) But singletons are “gross”! (mock discussion between Stephanie and Richard). This work describes a semantics for dependent types in Haskell, along with a replacement for GHC’s internal (Core) language, along with the high-level type theory and fully mechanized metatheory.

But isn’t Haskell already dependently typed? Cf Haskell indexed type for vectors. The following cannot be written in GHC at the moment:

vreplicate :: Pi(n :: Nat) -> Int -> Vec n
vreplicate Zero _ = Nil
vreplicate (Succ n) a = Cons a (vreplicate n a)

The reason this can’t be typed is due to n being erased (after compilation, i.e., it doesn’t appear in the run time) but it is needed dynamically in the first argument. In Dependent Haskell this can be done.

But… it’s not your usual system. For example, what about infinite data?

inf :: Nat
inf = S inf
vinf :: Vec inf
vinf = Cons 2 vinf  -- yields equality demand Vec (S inf) = Vec inf

Idealized Haskell (IH) = System F_omega + GADTs (has uncediable type checking!)

System FC (GHC Core) is a reified version of Idealized Haskell with enough annotations so that type checking is decidable, trivial, and syntax directed. Terms in Core can be see as type derivations of IH terms: contains type annotations and type equalities. (These can all be erased to get back into IH.) The first contribution of this work is a dependent System FC (DC). But its very complicated (needs also type soundness (progress/preservation), erasable coercions, ).
Instead, they introduce System D, the dependently typed analog of Idealized Haskell: its a stripped down dependent Haskell in Curry style (implicit types) and similarly to IH its type checking is undecidable. It has dependent types, non-termination, and coercion abstraction.

Coercion abstraction: for example

g : Int -> Int |- \v -> \c -> \v -> v : Pi n . (g n ~ 7) -> Vec (g n) -> Vec 7

Here the variable v (which has type Vec (g n) when passed in) is implicitly cast using the coercion c (which is passed in as a parameter) to Vec 7. This is translated into DC using an explicit coercion operator, applying c to coerce v.

System D and System DC were mechanized (in Coq) during the design using Ott (to input the rules of the language) & LNgen (for generating lemmas). Currently working on dependent pattern matching, a full implementation in GHC, and roles. (Hence, this is the specification of the type system so far).

Q: type inference for Dependent Haskell?
A: Not sure yet (but see Richard Eisenberg’s thesis)
Q: How do you get decidable type checking in the presence of infinite terms?
A: I didn’t understand the answer (will followup later).
Q: Why is coercion abstraction explicit in D while coercion application is implicit?
A: “Coercion application *is* explicit in D, it’s the use of coercion to cast a type that is implicit, but if you were to apply it to a value it would be explicit.”


(ICFP) – Parametric Quantifiers for Dependent Type Theory Andreas Nuyts, Andrea Vezzosi, Dominique Devriese

A type variable is parameteric if is only used for type-checking (free well-behavedness theorems): i.e., can’t be inspected by pattern matching so you have the same algorithm on all types, e.g. flatten : forall X . Tree X -> List X. Parametericity is well studied in the System F world. This work looks at parametericity in dependent type theory.; some results carry over, some can be proved internally, but some are lost. This work formulates a sound dependent type system ParamDTT which .

Parametricity gives representation independence (in System F):
A \rightarrow B \cong (\forall X . (X \rightarrow A) \rightarrow (X \rightarrow B)
(This result follows by parametricity, using the result that g : \forall X . (X \rightarrow A) \rightarrow (X \rightarrow B) implies g \, X_0 \, r_0 \, x\, = g \, A \, \textit{id} \, (r_0 \; x_0) which can be provided by relational parametricity, see Reynolds “related things map to related things”, applies identity extension lemma).

Can we do the same thing for DTT and can we prove the result internally?
\Pi is however not parametric. For example if we convert to \Pi (X : U) . (X \rightarrow A) \rightarrow (X \rightarrow B). Suppose $B = U$ then we can leak details of the implementation (rerepresentation type is returned as data) \textit{leak} \; X \; r \; x = X (has this type). This violates the identity extension lemma used in the above proof for System F.

So instead, add a parametric quantifier to DTT to regain representation independence, making the above \textit{leak} ill typed. The proof of parametricity for g : \forall X . (X \rightarrow A) \rightarrow (X \rightarrow B) can  now be proved internally. This uses the relational interval type 0 -- 1 : \mathbb{I} (cf Bernardy, Coquand, and Moulin 2015, on bridge/path-based proofs) which gives a basis for proving the core idea of “related things map to related things” where 0 is connected with the type X (in the above type) and 1 is connected to the type A via a switching term / r/ : \mathbb{I} \rightarrow U (see paper for the proof) (I think this is analogous to setting up a relation between X and A in the usual relational parametricity proof).

They extended Agda with support for this.


(ICFP) – Normalization by Evaluation for Sized Dependent Types
Andreas Abel, Andrea Vezzosi, Theo Winterhalter

(context, DTT a la Martin-Lof, and you can add subtpying to this, where definitional equality implies subtyping). Definitional equality is decideable (for the purposes of type checking), but the bigger the better (want to be able to know as many things equal as possible).

Termination is needed for consistency, general fix gives you inconsistency. Instead, you can used data types indexed by tree height, where \textsf{Nat}^i = \{ n \mid n < i\}, you can define \textsf{fix} : (\forall i \rightarrow (\textsf{Nat}^{i} \rightarrow C) \rightarrow (\textsf{Nat}^{i+1} \rightarrow C)) \rightarrow \forall j \rightarrow \textsf{Nat}^j \rightarrow C.

However, size expressions are not unique, which is problematic for proofs. e.g., suc i : \textsf{Nat}^i \rightarrow \textsf{Nat}^\infty but also suc \infty : \textsf{Nat}^i \rightarrow \textsf{Nat}^\infty. Intuition: sizes are irrelevant in terms, but relevant in types only.

Andreas set up some standard definitions of inductive Nat then tried to define a Euclidean division on Nat (where monus is minus cutting off at 0).

div : Nat → Nat → Nat
div zero y = zero
div (suc x) y = {! suc (div (monus x y) y) !}

However, Agda fails to termination check this. The solution is to “size type”-up everything, i.e., redefine the usual inductive Nat to have a size index:

data Nat : Size → Set where
 zero : ∀ i → Nat (↑ i)
 suc : ∀ i → Nat i → Nat (↑ i)

All the other definitions were given sized type parameters (propogated through) and then div was redefined (type checking to):

div : ∀ i → Nat i → Nat ∞ → Nat i
div .(↑ i) (zero i) y = zero i
div .(↑ i) (suc i x) y = suc _ (div i (monus i _ x y) y)

So far this is the usual technique. However, now we have difficulty proving a lemma (which was straightforward before sizing that minus of x with itself gives 0. Now this looks like:

monus-diag : ∀ i (x : Nat i) → Eq ∞ (monus i i x x) (zero ∞)

In the case: monus-diag .(↑ i) (suc i x) = monus-diag i x Agda gets the error i != ↑ i of type Size.

In the Church-style, dependent type functions = polymorphic functions, so you can’t have irrelevant arguments.  In Curry-style (with forall) this is okay. (see the previous talk, this could be a possible alternate solution?)

Instead, we want “Churry”-style where size arguments are used for type checking but can be ignored during equality checking: thus we want typing rules for irrelevant sizes. This was constructed via a special “irrelevant” modality \bullet, where sizes uses as indexes can be marked as irrelevant in the context. Here is one of the rules for type formation when abstracting over an irrelevantly typed argument:

\dfrac{\Gamma \vdash A : Type \quad \Gamma, \bullet \bullet x : A \vdash B : \textsf{type}}{\Gamma \vdash (x : \bullet A) \rightarrow B : \textsf{type}}

This seemed to work really nicely (I couldn’t get all the details down, but the typing rules were nice and clean and made a lot of sense as a way of adding irrelevancy).


(ICFP) – A Metaprogramming Framework for Formal Verification
Gabriel Ebner, Sebastian Ullrich, Jared Roesch, Jeremy Avigad, Leonardo De Moura

(Context: Lean is a dependently-typed theorem prover that aims to bridge the gap between interactive and automatic theorem proving. It has a very small trusted kernel (no pattern matching and termination checking), based on Calculus of Inductive Constructions + proof irrelevance + quotient types.)

They want to extend Lean with tactics, but its written in C++ (most tactic approaches are defined internally, cf Coq). Goal: to extend Lean using Lean by making Lean a reflective metaprogramming language (a bit like Idris) does, in order to build and run tactics within Lean. Do this by exposing Lean internals to Lean (Unification, type inference, type class resolution, etc.). Also needed an efficient evaluator to run meta programs.

lemma simple (p q : Prop) (hp : p) (hq : q) : q := 
by assumption

Assumption is a tactic, a meta program in the tactic monad:

meta def assumption : tactic unit := do
  ctx <- local_context,
  t   <- target
  h   <- find t ctx,
  exact h

(seems like a really nice DSL for building the tactics). Terms are reflected so there is a meta-level inductive definition of Lean terms, as well as quote and unquote primitives built in, but with a shallow (constant-time) reflection and reification mechanism.

The paper shows an example that defines a robust simplifier that is only about 40 locs (applying a congruence closure). They built a bunch of other things, including a bigger prover (~3000 loc) as well as command-line VM debugger as a meta program (~360 loc).

Next need a DSL for actually working with tactics (not just defining them). Initially quite ugly (with lots of quoting/unquoting). So to make things cleaner they let the tactics define their own parsers (which can then hook into the reflection mechanism). They then reused this to allow user-defined notations.


(FSCD) A Fibrational Framework for Substructural and Modal Logics, Dan Licata, Michael Shulman, Mitchell Riley

Background/context: model logic core \dfrac{\emptyset \vdash A}{\Gamma \vdash \Box A} for necessity (true without any hypotheses) \dfrac{\Gamma \vdash \Diamond A}{\Diamond \Gamma \vdash \Diamond B} (if something A is possible true then possibility can propagate to the context– doesn’t change the “possibleness”). Various intuitionistic substructural and modal logics/type systems: linear/affine, relevant, ordered, bunched (separation logic), coeffects, etc.  Cohesive HoTT: take dependent type theory and add modalities (int, sharp, flat). S-Cohesion (Finster, Licata, Morehouse, Riley: a comonad and monad that themselves are adjoint, ends up with two kind of products which the modality maps between). Motivation: what are the common patterns in substructural and modal logics? How do we construct these things more systematically from a a small basic calculus.

With S4 modality (in sequent calculus), its common to make two contexts, one of modality formulae and one of normal, e.g., \Gamma; \Delta \vdash A which is modelled by \Box \Gamma \times \Delta \rightarrow A. In the sequent calculus style,  use the notion of context to form the type. Linear logic ! is similar (but with no weakening). The rules for \otimes follow the same kind of pattern: e.g., \dfrac{\Gamma, A, B \vdash C}{\Gamma, A\otimes B \vdash C} where the context “decays” into the logical operator (rather than being literally equivalent): the types inherent the properties of the context (cf. also exchange).

The general pattern: operations on contexts, with explicit or admissible structure properties, then have a type constructor (an operator) that internalises the implementation and inherits the properties. This paper provides a framework for doing this, and abstracting the common aspects of many intuitionistic substructural and modal logicas based on a sequent calculus (has cut elimination for all its connectives, equational theory, and categorical semantics).

Modal operators are decomposed in the same way as adjoint linear logic (Benton&Wadler’94), see also Atkey’s lambda-calculus for resource logic (2004) and Reed (2009).

So.. how? Sequents are completely Cartesian but are annotated with a description of the context and its properties/substructurality by a first order term (a context descriptor) \Gamma \vdash_a A where a is this context descriptor. Context descriptors are built on “modes”. e.g., a: A, b : B, c : C, d : D \vdash_{(a \otimes b) \otimes (c \otimes d)} X the meaning of this sequent depends on the equations/inequations on the context descriptor:, e.g.,if you have associativity of \otimes you get ordered logic, if you have contraction a \Rightarrow a \otimes a then you get a relevant logic, etc. Bunched implication will have two kinds of nodes in the context descriptor. For a modality, unary function symbols, e.g. \textbf{r}(a) \otimes b means that the left part of the context is wrapped by a modality.

A subtlety of this is “weakening over weakening”, e.g., \dfrac{\Gamma \vdash_a B}{\Gamma, x : A \vdash_a B} where we have weakend in the sequent calculus but the context descriptor is unchanged (i.e., we can’t use x : A to prove B).

The F types give you a product structured according to a context descriptor, e.g. A \otimes B := F_{x \otimes y} (x : A, y : B). All of the left rules become instances of one rule on F: \dfrac{\Gamma, \Delta \vdash_{\beta[\alpha / x]} B}{\Gamma, x : F_\alpha(\Delta) \vdash B} which gives you all your left-rules (a related dual rule gives the right sequent rules).

Categorical semantics is based on a cartesian 2-multicategory (see paper!)
Q: is this like display logics? Yes in the sense of “punctuating” the shape of the context, but without the “display” (I don’t know what this means though).

This is really beautiful stuff. It looks like it could extend okay with graded modalities for coeffects, but I’ll have to go away and try it myself.


(FSCD) – Dinaturality between syntax and semantics, Paolo Pistone

Multivariate functions F X X and G Y Y where the first parameter is contravariant and second is covariant may have dinatural transformations between them. (Dinaturality is weaker than naturality). Yields a family of maps \theta_X \in Hom_C(F X X, G X X) which separate the contravariant and covariant actions in the coherence condition (see Wiki for the hexagonal diagram which is the generalisation of the usual naturality property into the dinatural setting).

A standard approach in categorical semantics is to interpret open formula by multivariant functor F X Y = X \rightarrow Y and proofs/terms by dinautral transformations.  Due to Girard, Scedrov, Scott 1992 there is the theorem that: if M is closed and \vdash M : \sigma then M is dinatural. The aim of the paper is to prove the converse result: if M is closed and \beta\eta-normal and (syntactically) dinatural then \vdash M : \sigma (i.e., implies typability).

Syntactic dinaturality examples where shown (lambda terms which exhibit the dinaturality property). Imagine a transformation on a Church integer (X \rightarrow X) \xrightarrow{h} (X \rightarrow X)– this is a dinatural transformation and its dinaturality hexagon is representable by lambda terms. (I didn’t get a chance to grok all the details here and serialise them).

Typability of a lambda term was characterised by its tree structure and properties of its shape and size (e.g., \lambda x_1 . \ldots \lambda x_n . M : \sigma_1 \rightarrow \ldots \sigma_h \rightarrow \tau then $n \leq h$, there are various others; see paper).

The theorem then extends from typability to parametricity (implies that the syntactically dinatural terms are parametric) when the type is “interpretable” (see paper for definition).

(sorry my notes don’t go further as I got a bit lost, entirely my fault, I was flagging after lunch).


(FSCD) – Models of Type Theory Based on Moore Paths, Andrew Pitts, Ian Orton

In MLTT (Martin-Lof Type Theory), univalence is an extensional property (e.g., (\forall x . f x = g x)  \rightarrow (f = g)) of types in a universe U: given X, Y : U every identification p : Id_U X Y induces an (Id)-isomorphism X \cong Y. U is univalent “if all Id-isomorphisms X \cong Y is U are induced by some identification P : Id_U X Y” (want this expressed in type theory) (this is a simpler restating of Voevodsky’s original definition due to Licata, Shulman, et al.). This paper: towards answering, are there models for usual type theory which contain such univalent universes?

Univalence is inconsistent with extensional type theory where p : Id_A X Y implies x = y \wedge p = \textsf{refl}. Univalence is telling you that the notion of identification is richer subsuming ETT where there is an identification between two types then they are already equal, which is much smaller/thinner.

Need a source of models of “intensional” identification types (i.e., not the extensional ones ^^^): homotopy type theory provides a source of such models via paths: p : Id_A x y are paths from point x to point y  in a space A where \textsf{refl}_A is a “constant path” (transporting along such a path does nothing to the element).

So we need to find some structure Path_A x y for these identifications, but we need to restrict to families of type (B \; x \mid x : A) carrying structure to guarantee substitution operations (in this DTT context).  “Holes” is diagrams are filled using path reversals and path composition. Solution is “Moore” paths.

A Moore path from x to y in a topological space X is specified by a shape (its length) |p| in \mathbb{R}_+ and an extent function p@ : \mathbb{R}_+ \rightarrow X satisfying p@ 0 = x and (\forall i \geq |p|) . p@i = y. These paths have a reversal operator by truncated subtraction \textit{rev}(p@i) = p@(|p| - i). (note this is idempotent on constant paths as required, allowing holes in square paths to be filled (I had to omit the details, too many diagrams)). But topological spaces don’t give us a model of MLTT, so they replace it by an arbitrary topos (modelling extensional type theory) and instead of \mathbb{R} for path length/shape use any totally ordered commutative ring. All the definition/properties of Moore paths are the expressed in the internal language of the topos.  This successfully gives you a category with families (see Dybjer) modelling intensional Martin-Lof type theory (with Pi). This also satisfies dependent functional extensionality thus its probably a good model as univalence should imply extensionality. But haven’t yet got a proof that this does contain a univalence universe (todo).

Q: can it just be done with semirings? Yes.


(ICFP) – Theorems for Free for Free: Parametricity, With and Without Types, Amal Ahmed, Dustin Jamner, Jeremy G. Siek, Philip Wadler

Want gradual typing AND parametricity. There has been various work integrating polymorphic types and gradual typing (with blame) but so far there have been no (published) proofs of parametricity. This paper proposes a variant of the polymorphic blame calculus \lambda B [Ahmed, Findler, Siek, Wadler POPL’11] (fixed a few problems) and proves parametricity for it.

Even dynamically typed code cast to universal type should behave parametrically., e.g. consider the increment function which is typed dynamically and cast to a parametric type \lambda x . x + 1 : \ast \Rightarrow^p \forall X . X \rightarrow X) [int] 4 returns 5 but we want to raise a “blame” for the coercion p here (coercing a monomorphic to polymorphic type is bad!).

The language has coercions and blame, and values can be tagged with a coercion (same for function values). Some rules:
(v : G \Rightarrow^p \ast \Rightarrow^q G) \mapsto v  (that is, casts to the dynamic type and back again cancel out).
But, if you cast to a different type (tag) then blame pops out: (v : G \Rightarrow^p \ast \Rightarrow^q G') \mapsto \textsf{blame} p (or maybe that was blame q?)

add = \Lambda X . \lambda (x : X) . 3 + (x : X \Rightarrow^p \ast \Rightarrow^q \textsf{int}) this has a type that makes it look like a constant function \lambda X . X \rightarrow \textsf{int}. When executing this we want to raise blame rather than actually produce a value (e.g. for add 3 [\textsf{int}] 2). Solution: use runtime type generation rather than substitution for type variables.

A type-name store \Sigma provides a dictionary mapping type variables to types.
\Sigma \rhd (\Lambda X . v) [B] \mapsto \Sigma, \alpha := B \rhd v[\alpha/X] i.e., instead of substituting the B in the application, create a fresh type name \alpha that is then mapped to B in the store. Then we extend this with types and coercion: \Sigma \rhd (\Lambda X . v) [B] \mapsto \Sigma, \alpha := B \rhd v[\alpha/X] : A[\alpha/X] \Rightarrow^{+\alpha} A[B/X] where +\alpha (I think!) means replace occurrences of B with \alpha.

Casts out a polymorphic type are instantiated with the dynamic type. Casts into polymorphic types are delayed like function casts.

Parametricity is proved via step-indexed Kripke logical relations. The set of worlds W are tuples of the type-name stores, and relations about when concealed values should be related.  See paper: also shows soundness of the logical relations wrt. contextual equivalence. There was some cool stuff with “concealing” that I missed really.


(ICFP) – On Polymorphic Gradual Typing, Yuu Igarashi, Taro Sekiyama, Atsushi Igarashi

Want to smoothly integrate polymorphism/gradual. e.g., apply dynamically-typed arguments into polmorphically typed parameters, and vice versa. Created two systems: System Fg (polymorphic gradually typed lambda cal) which translates to System Fc.

Want conservativity: A non-dynamically typed term in System F should be convertible into System Fg, and vice versa.  The following rule shows the gradual typing “consistency” rule for application (in System Fg and gradual lambda calculus):
\dfrac{\Gamma \vdash s : T_{11} \rightarrow T_{12} \quad \Gamma \vdash t : T_1 \quad T_1 \sim T_{11}}{\Gamma \vdash s \; t : T_2} where \sim defines type consistency. Idea: extend this to include notions of consistency between polymorphic and dynamic types. An initial attempt breaks conservativity since there are ill-typed System F terms which are well-typed in System Fg then.

The gradual guarantee conjecture: coercions without changes to the operational semantics.

They introduce “type precision” (which I think replaces consistency) which is not symmetric where A \sqsubseteq \astI didn’t yet get the importance of this asymmetry). But there are some issues again… a solution is to do fresh name generation to prevent non-parametric behaviour (Ahmed et al. ’11, ’17 the last talk) But making types less precise still changes behaviour (create blame exceptions). Their approach to fix this is to restrict the term precision relation \sqsubseteq so that it can not be used inside of \Lambda (term level) and \forall (type level).
Still need to prove the gradual guarantee for this new approach (and adapt the parametricity results from Ahmed et al. ’17).


(ICFP) – Constrained Type Families, J. Garrett Morris, Richard A. Eisenberg

(Recall type families = type-level functions)
Contribution 1: Discovery: GHC assumes that all type families are total
Contribution 2: First type-safety proof with non-termination and non-linear patterns
Contribution 2.5 (on the way): simplified metatheory
The results are applicable to any language with partiality and type-level functions (e.g. Scala)
(Haskellers leave no feature unabused!)

-- Example type family:

type family Pred n
type instance Pred (S n) = n
type instance Pred Z = Z

Can define a closed version:

type family Pred n where
            Pred (S n) = n
            Pred Z = Z

Definition: A ground type has no type families.
Definition: A total type family, when applied to ground types, always produces a ground type.

In the new proposal: all partial type families must be associated type families.

type family F a                 -- with no instances
x = fst (5, undefined :: F Int) -- is typeable, but there are no instance of F!

Instead, put it into a class:

class CF a where
    type F a
x = fst (5, undefined :: F Int) -- now get error "No instance for (CF Int)"

Can use closed type families for type equality.

type family EqT a b where
       EqT a a = Char
       EqT a b = Bool
f :: a -> EqT a (Maybe a) -- This currently fails to type check but it should work
f _ = False

Another example:

type family Maybes a 
type instance Maybes a = Maybe (Maybes a)
justs = Just justs -- GHC says we can't have an infinite type a ~ Maybe a
-- But we clearly do with this type family!

Why not just ban ​Maybes? Sometimes we need recursive type families.

By putting (partial) type classes inside a class means we are giving a constraint that explains there should be a type at the bottom of (once we’ve evaluated) a type family. Therefore we escape the totality trap and avoid the use of bogus types.

Total type families need not be associated. Currently GHC has a termination checker, but it is quite weak, so often people turn it off with UndecideableInstances.
But, wrinkle: backwards compatibility (see paper) (lots of partial type families that are not associated).

Forwards compatibility: dependent types, terminating checking, is Girard’s paradox encodable?

This work also lets us simplify injective type family constraints (this is good, hard to understand at the moment). It makes closed type families more powerful (also great).


(ICFP) – Automating Sized-Type Inference for Complexity Analysis, Martin Avanzini, Ugo Dal Lago

(for higher-order functional programs automatically). Instrument a program with a step counter by turning them into state monad which gives a dynamic count.
Sized typed (e.g., sized vectors), let you type ‘reverse’ of a list as \forall i . L_i \, a \rightarrow L_i \, a. But inference is highly non-trivial.

Novel sized-type system with polymorphic recursion and arbitrary rank polymorphism on sizes with inference procedure for the size types. Constructor definitions define size measure, e.g., for lists of nats:  \textsf{nil} : L_1 and \textsf{cons} : \forall i j . N_i \rightarrow L_j \rightarrow L_{i + j + 1} (here the natural number has a size too).
Can generalise size types in the application rules, (abs, app, and var rules deal with abstraction and instantiation of size parameters).

Theorem (soundness of sizes): (informally) the sizes attached to the a well-typed program gives you a bound on the number of execution steps that actually happen dynamically (via the state encoding). [also implies termination].

Inference step 1: add template annotations, e.g., \textit{reverse} : \forall k . L_k \rightarrow L_{g(k)}. Then we need to find a concrete interpretation of g.
Step 2: constraint generation, which arise from subtyping which approximate sizes, plus the constructor sizes. These collect together to constraint the meta variables introduced in step 1.
Step 3: constraint solving: replace meta variable by Skolem terms. Then use an SMT solver (the constraints are non-linear integer constraints). (from a question) The solution may not be affine (but if it is then it is easier to find).

The system does well at inferring the types of lots of standard higher-order functions with lists. But cross product on lists produces a type which is too polymorphic: with a little more annotation you can give it a useful type.
Q: What happens with quicksort? Unfortunately it can’t be done in the current system. Insertion sort is fine though.
Q: What about its relation to RAML? The method is quite different (RAML uses an amortized analysis). The examples that each can handle are quite different.

(ICFP) – Inferring Scope through Syntactic Sugar, Justin Pombrio, Shriram Krishnamurthi, Mitchell Wand

[Github]

Lots of languages use syntactic sugar, but then its hard to understand what the scoping rules are for these syntactic sweets.

E.g. list comprehension [e | p <- L] desugars to map (\p . e) L. The approach in this work is a procedure of scope inference which takes these rewrite patterns (where e and p and L are variables/parameters). We know the scoping rules for the core language (i.e., the binding on the right) [which parameterises the scope inference algorithm]. Desugaring shouldn’t change scope (i.e., if there is path between two variables then it should be preserved). So the scope structure is homomorphic to the desugared version, but smaller.  (A scope preorder was defined, I didn’t get to write down the details).

Theorem: if resugaring succeeds, then desugaring will preserve scope.

Evaluation: the techniques was tested on Pyret (educational language), Haskell (list comprehensions), and Scheme (all of the binding constructs in R5RS). It worked for everything apart from ‘do’ in R5RS.

ICFP / FSCD day 2 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(ICFP Keynote #2) – Challenges in Assuring AI, John Launchbury

The moment we think we’ve got our (verification) hands round the world’s systems (e.g., compilers), a whole new set of systems appear: AI.

The dual of “have I tested enough?” is “how good is my model?” (in a verification-based approach). What is the model we have for AI systems?

What is intelligence? Information processing. But there are different kinds of information processing ability; asking if a system is ‘intelligent’ is too coarse/binary. John breaks this down into ‘perceiving’ P, ‘learning’ L, ‘abstracting’ A (create new meanings), and ‘reasoning’ R (plan and decide).

AI wave 1 : rule-based, hand-crafted knowledge: P 1, L 0, A 0, R 3 (poor handling of uncertainty, no learning). Type systems are a little like this (perceiving, didn’t have to tell it everything). Can do some pretty amazing stuff: see smart security systems that analysis code and patch bugs.

AI wave 2: statistical learning. Specific problem domains, train them on big data. Still requires a lot of engineering work (to get the right models and statistical techniques for the task). P 3, L 3, A 1, R 1 (less good at reasoning).

What does it mean to prove a vision system correct?

Manifold hypothesisdata coming in is high-dimensional but the features you are interested in form lower-dimensional structures. E.g., with cars, lots of data comes in but understanding the data means separating the manifolds (of low dimensionality).
Another example, a 28×28 image is a 784 dimensional space (as in 784 numbers, 784-length vector). Variation in handwritten digits form 10 distinct manifolds within the 784 dimensional space. High-degree of coherence between all data samples for a particular digit. (“Manifold” here may be slightly more informally used than the full mathematical definition).

Imagine two interlocking spiral arms which are really 1-dimensional manifolds in a 2-dimensional space. By continuous transforms on the 2-dimensional space these can morphed into linearly separable components. Sometimes stretching into a new dimension enables enclosed manifolds to be separated.

Neural nets separate manifolds; each layer of a neural network stretches and squashes the data space until the manifolds are cleanly separated. Non-linear functions at each step (otherwise multiple linear transformations would collapse into just one layer). Use calculus to adjust the weights (error propagation backwards through the layers).

Structured neural nets have different kinds of layers, e.g., feature maps (which perform a local analysis over the whole input); generated potentially via convolutions (i.e., turn a 28×28 image into 20 24×24 images), do some subsampling. Programming these requires some trial-and-error.

Models to explain decisions: “why did you classify this image as a cat?”; not just “I ran it through  my calculations and ‘cat’ came up as highest”. A deep neural net could produce a set of words (fairly abstract) and then a language-generating recurrent neural net (RNN) translates these into captions (sentences). Currently, statistically impressive but individual unreliable.

Adversarial perturbations: Panda + <1% computed distortion, becomes classified as a Gibbon. Whilst the two images are indistinguishable to a human, the noise interferes with the manifold separation. There are even universal adversarial perturbations that throw-off classification of all images.

Assured control code (e.g., in a helicopter): a physical machine is modelled (system dynamics, power, inertia), a control system design is created, from which controller code is generated. But this does not produce a very nice to fly system. Need some kind of refinement back to the control code. Given the correct structure for the control code framework then the fine tuning from feedback on the actual system is very effective. This can be overlayed with different goals (different fitness functions), e.g., safety over learning.

AI wave 3: combining wave 1 and 2 techniques. Perceive into a contextual model on which abstractions are computed and which can be tuned (learning) to produce reasoning. Aim to get a better intelligence profile: P 3, L 3, A 2, R 3. Hand assurance arguments on the contextual model part instead.
e.g., build models that are specialised based on our knowledge, e.g., we know the probably number of strokes in each digit 0-9 and its trajectory. The generative model generates explanation of how a test character was generated (go from the model of whats its looking for to what it sees; how likely is the thing seen generatable from the model of a particular digit).

Need to be able to specify intent in a variety of ways; all real world specifications are partial and humans can’t always communicate the full intent of what they want (cf., trying to explain how to drive a car).

What are we trying to assure? Mistakes in sensing/reasoning leading to undesirable actions, undesirable emergent behaviours, hackable systems being subverted, misalignment between human/machine values.

(AlphaGo is a wave3-like system as it combined the neural net-based classification of moves with traditional planning/reasoning/tree-based pruning/search).



(FSCD Keynote #2) – Uniform Resource Analysis by Rewriting: Strengths and Weaknesses, George Moser

Can we use complexity measures on term rewriting systems to make complexity arguments about higher-order functions?

The main scheme is to convert a program via a complexity reflecting transformation into a Term Rewriting System (TRS), then do automated resource analysis on the TRSS to get an asymptotic bound. The first half of the talk is some background on doing complexity analysis (which was a bit fast for me so my notes are a bit incoherent below). There are some really nice complexity analysis tools for programs coming out of this work (links near the end).

Some preliminaries: Example, set up list reverse as a finite set of rewrite rules (a Term Rewriting System, TRS):

rev(xs)            -> rev'(xs, nil)
rev'(nil, acc)     -> acc
rev'(x :: xs, acc) -> rev'(xs, x :: acc)

A computation of this set of rules is the application of the rules from left to right. The rules are non-overlapping.

Can see rewriting as an abstraction of functional programming, or can see it as part of equational reasoning (universal algebra); but this really generalises functional programming (function symbols can appear nested on the left-hand side).

Definition: A TRS is terminating if the rewriting relation is “well-founded”.
Definition: A function symbol f is “defined” if it is the root symbol on the left of a rule (other f is a constructor).
Definition: Runtime complexity wrt. a terminating TRS is defined:

dh(t) = max \{ n \mid exists u . t \rightarrow^n u\}
rc(n) = max \{ dh(t) \mid \mathsf{size}(t) \leq n \wedge \textit{n and t are "basic"} \}.

RC is the runtime complexity; Q: is this a “natural notion” for rewriting? (I don’t know how one defined natural here).
Derivational complexity has no restriction on the terms,
dc(n) = max \{ dh(t) \mid \textsf{size}(t) \leq n\} This has been used mainly to make termination arguments.
See “Termination proofs and the length of derivations” (Hofbauer, Lautemann, 1989).

Definition: Multiset Path Order. A precedence order > induces a multiset path order >_{mpo}, (whose definition I’m not going to transcribe, Wikipedia has a definition) It basically says that s = f(s_1, ..., s_n) > t = g(t_1,...,t_m) if s is > all of t_i, or s is \geq some t_i.

The Hydra Battle- is this terminating? (Dershowitz and Jouannaud designed the TRS as a termination problem, later rectified by Dershowitz): The beast is a finite tree, each leaf corresponds to a head. Hercules chops off heads of the Hydra, but the Hydra regrows:

  • If the cut head has a pre-predecessor (grandmother) then the remaining subtree is multiplied by the stage of the game (heads regrowing and multiplying).
  • Otherwise, nothing happens (the head doesn’t regrow)

Can show it is terminating by an inductive argument over transfinite numbers. (but complexity is quite bad). See “The Hydra battle and Cicho’s principle” (Moser, 2009). 

The RTA list of open problem was mentioned: http://www.win.tue.nl/rtaloop/

TRSes can be represented via a matrix interpretation (details were too fast for me).
These techniques were developed into a fully automated complexity analysis tool for TRSes called tct-trs (modular complexity analysis framework).

Going back to the original goal: can we use this for higher-order programs? Consider the textbook example of reverse defined by a left-fold. Need to do a complexity preserving conversion into a TRS (complexity preserving transformation ensures lower bounds, complexity reflexing ensures upper bounds). First, defunctionalise via a rewrite system into a first-order system. Built a tool tct-hoca which works well for various small higher-order functional programs (with complexity proofs taking up to 1 minute to compute).  See the tools at:

What about “real” programs or “imperative” programs? Use ‘integer transition systems’ for modelling imperative programs, which is integrated into the tools now.

Some related work: Multivariate amortized source analysis (Hoffman, Aehlig, Hofmann, 2012) and a type-based approach “Towards automatic resource bound analysis for OCaml” (Hofmann, Das, Weng, POPL 2017) (note to self: read this!).

A previous challenge from Tobias Nipkow: how do you do this with data structures like splay trees (cf Tarjan)? There is some new work in progress. Build on having sized types so that we can account for tree size. An annotated signature then decorates function types with the size polynomial, e.g. \texttt{splay} : A \times T_n \xrightarrow{n^3 + 2} T_n (recursive function to create a splay tree). A type system for this was shown, with cost annotation judgment (no time to copy down any details).

The strength of the uniform resource analysis approach is its modularity and extensibility (different intermediate languages and complexity problems).
Weaknesses: the extensibility/modularity required some abstraction which weakens the proving power, so adding new kinds of analysis (e.g., constant amortised / logarithmic amortised) requires a lot of work.
Audience question: can I annotate my program with the expected complexity and get a counter-example if its false? (counter-examples, not yet).


(FSCD) Continuation Passing Style for Effect Handlers, Daniel Hillerström, Sam Lindley, Bob Atkey, KC Sivaramakrishnan

Consider two effects: nondeterminism and exceptions, in the drunk toss example (a drunk person tosses a coin and either catches it and gets a bool for head or tails, or drops the coin [failure/exception]).

drunkToss : Toss ! {Choose:Bool; Fail:Zero}
drunkToss = if do Choose
            then if do Choose then Heads else Tails
            else absurd do Fail

Induces a simple computation tree.

A modular handler, using row polymorphism (with variable r below)

allChoices : a ! {Choose : Bool; r} => List a ! {r}
allChoices = return x |-> [x]
             Choose k |-> k true ++ k false

fail : a ! {Fail: Zero; r} => List a ! {r}
fail = return x |-> [x]
       Fail k   |-> []

The ! is an annotation explaining the effect operations, k are continuations, r are row variables (universally quantified).  Two possible interpretations:

(1) handle the computation with fail first, then handle that with allChoices
returns the result [[Heads, Tails], []].

(2) handle with allChoices first then fail gives the result [].

So the order of the handling matters. The operational semantics for handlers is based on a substitution in the computation tree for the relevant operations.

How do handlers get compiled? Use CPS! (restricted subset of the lambda calculus, good for implementing control flow). Based on \lambda^{\rho}_{\textit{eff}} (Hillerstrom, Lindley 2016) – lambda calculus with effect handlers (separate syntactic category but contains lambda terms), row polymorphism, and monadic metalanguage style constructs for effectful let binding and ‘return’. The CPS translation is a homomorphism on all the lambda calculus syntax, but continuation passing on the effectful fragment (effectful let and return).  One part of the interpretation for the effectful operation term: [[\textbf{do} \, l \, V]] = \lambda k . \lambda h . \; h (l \; \langle[[V]], \lambda x . k \; x \; h\rangle) (see the paper for the full translation!)

A key part of the translation is to preserve the stack of handlers so that the order is preserved and (the interpretation) of operations can access up and down the stack.
A problem with the translation is that it yields administrative redexes (redundant beta-redexes) and is not tail-recursive. Use an uncurried CRPS where the stack of continuations is explicit (see Materzok and Biernacki 2012), this leads to a new translation which can manipulate the continuation stack directly (see paper). However, this still yields some administrative redexes. Solution: adopt a two-level calculus (Danvy, Nielsen 2033) which has two kinds of continuations: static (translation time) and dynamic (run time), where you can reify/reflect between the two. This let’s you squash out administrative redexes from the runtime to the static part, which can then be beta-reduced immediately at compile time (end up with no static redexes).
The paper proves that this translation preserves the operational semantics.
Implementation in the ‘links’ language: https://github.com/links-lang/links

I was going to ask about doing this typed: this is shown in the paper.


(ICFP) How to Prove Your Calculus Is Decidable: Practical Applications of Second-Order Algebraic Theories and Computation, Makoto Hamana

SOL (Second-Order Laboratory) – tool for analysing confluence and termination of second-order rewrite rules within Haskell. Provides an embedded Haskell DSL (via Template Haskell) for describing rewrite rules. SOL can then automatically check termination and confluence.

Confluence + termination => decidability. Can use this to prove a calculi is decidable.

Based on a higher-order version of Knuth-Bendix critical pair checking using extended HO pattern unification (Libal,Miller 2016).

For example, the theory of monads is decidable: given two well-typed terms s, t consistent of return, bind, and variables, the question  is s = t derivable from the three laws is decidable. How? First, orient the monad laws as rewrite laws. Check confluence (Church-Rosser, CR) and Strong Normalisation (SN)– together these imply their exists unique normal forms, which are then used to prove equivalence (reduce terms to their normal forms). So, how to prove CR and SN?

One way to prove Strong Normalisation is to assign weights and show that reductions reduce weights. Unfortunately, this does not work in the higher-order context of the monad operations. Another standard technique is to, use the “reducibility method” (see Tait,Girard and Lindley,Stark’05). SOL uses the ‘general schema criterion’ which works for general rewrite rules (Blanqui’00,’16) using various syntactic conditions (positivity, accessibility, safe use of recursive calls, metavariables). [sound but obviously not complete due to halting problem]. (Uses also Newman’s Lemma– a terminating TRS is confluent when it is locally confluent).

This tool can be applied to various calculi (see paper, 8 different systems). Sounds really useful when building new calculi. The tool was applied to a database of well know termination/confluence problems and could check 93/189 termination and 82/96 confluence problems.

“Let’s prove your calculus is decidable using SOL” (Makoto is happy to assist!)


(ICFP) A Relational Logic for Higher-Order Programs
Alejandro Aguirre, Gilles Barthe, Marco Gaboardi, Deepak Garg, Pierre-Yves Strub

Relational properties are things like, if X and Y are related by R then F(X) and F(Y) are related by R, or over two properties, e.g., the results are related instead by S.

Relational refinement types: \Gamma \vdash t_1 \sim t_2 : \{n \mid n_1 = n_2\} These can be used to express monotonicity properties, are syntax directed, and exploit structural similarities of code. However, say we want to prove the naturality of take, but formulating the right relational refinement type is difficult because the computations are structurally different.

Start with a basic logic with lambda-terms over simple inductive types, and a separate layer of predicates

RHOL has judgements \Gamma \mid \Psi \vdash t_1 : \tau_1 \sim t_2 : \tau_2 \mid \phi(r_1, r_2) where \phi is a binary predicate on properties of the two terms and \Psi gives assertions about the free variables. A key idea here is that the types and assertions are separate.

An example rule two-sided rule (where we inductively construct terms with the same syntactic constructor on both sides of a relation):

\dfrac{\Gamma, x_1 : \sigma_1, x_2 : \sigma_2 \mid \Psi, \phi' \vdash t_1 : \tau_1 \sim t_2 : \tau_2 \mid \phi} {\Gamma \mid \Psi \vdash \lambda x_1 : t_1 : \sigma_1 \rightarrow \tau_1 \sim \lambda x_2 . t_2 : \sigma_2 \rightarrow \tau_2  \mid \forall x_1 . \phi' \Rightarrow \phi[r_1 x_1 / r_1, \, r_2 x_2 / r_2]}.
This seems quite natural to me: abstraction introduces a universal quantification over the constraints attached to the free variables.

Rules can also be one side (where you construct a term on just one side of a relation).
The paper shows that RHOL is as expressive as HOL (by a mutual encoding).

Other relational typing system can be embedded into RHOL: Relational Refinement Types, DCC (Dependency Core Calculus, and RelCost (relational cost).

Q: if HOL and RHOL are equivalent (rather than their just being an embedding of HOL into RHOL say) then why have RHOL? Relational reasoning was gained which was not built into HO. Update: I discussed this further later with one of the authors, and they made the point that people have done relational-style reasoning in HOL but it requires taking a pairing of programs and building all the relational machinery by hand which is very cumbersome. RHOL has all this built in and the one-side and two-side rules let you reason about differently-structured terms much more easily. I can see this now.


(ICFP)
Foundations of Strong Call by Need

Thibaut Balabonski, Pablo Barenbaum, Eduardo Bonelli, Delia Kesner

What does this lambda term reduce to:  (\lambda x \, y. x \, x) \, (id \, id)? It depends on call-by-value, call-by-name, call-by-need, or full beta then you get different answers. The call-by-value, call-by-name, and call-by-need strategies are called weak because they do not reduce underneath lambda abstraction. As a consequence, they do not compute normal forms on their own.

Weak call-by-need strategy (the usual strategy outlined in the literature): procrastinate and remember. Beta reduction into a let binding (don’t evaluate the right-hand side of an application). (There was some quite technical reduction work here, but I noted this interesting rule on “bubbling” reductions out by splitting a nested reduction into two: t[v[u/y]/x] \rightarrow t[u/y][v/x]).

Strong call-by-need provide the call-by-need flavour of only evaluating needed values once, but crucially computes normal forms (by reducing under a lambda). This strong approach has two properties (conservative) the strong strategy first does whatever the weak strategy will do; and (complete) if there is a beta-normal form it will reach (a representative of) it. This comes by reducing under the lambda at some points. Consequently, this approach only ever duplicates values, not computations.

Q: If you implemented this in Haskell, would it make it faster? I didn’t quite hear the answer, but this would completely change the semantics as Haskell is impure with respect to exceptions and non-termination. It sounded like it would be more useful for something pure like Coq/Agda (where it could provide some performance improvement).


(ICFP) – No-Brainer CPS Conversion, Milo Davis, William Meehan, Olin Shivers

The Fischer/Reynolds algorithm for CPS transformation introduces administrative redexes. Since then, other algorithms have improved on this considerably (Danvy/Filinsky). In this approach, some additional reductions are applied to the term to get a smaller CPS term out. The moral of the story (or the key idea) is to treat the CPS converter like a compiler (use abstract representations, symbol tables).

Rules of the game: make the terms strictly smaller (greedy reductions), should be strongly normalisation, and confluent, keep the original algorithmic complexity of the CPS transforms (linear time).

Approach:
– Variables are the same size, so do beta redexes on variables
– Beta reduce application of a lambda to a lambda (\lambda x . e) (\lambda y . e') only when the number of reference to x in e is less than or equal to 1 (so things don’t explode).
– Eta reduce (??only when there are no references to the variable in the body??) (I think this is what was said, but I think I might have made a mistake here).

Relies on some machinery: doing reference counts (variable use count), abstract representation of continuations (halt, variables, function continuations, and application continuations) rather than going straight to syntax and syntax constructor functions (which can do reductions) to dovetail with this, and explicit environments to be used with function closures.


(ICFP) Compiling to Categories, Conal Elliott

Why overload? Get a common vocabulary, laws for modular reasoning.

But generally doesn’t apply to lambda, variables, and application. So what to do? Eliminate them! Rewrite lambda terms into combinators: const, id, apply, pairing of functions (fork), curry, and keep going…. and you can automate this via a compiler plugin (so you don’t have to write code like this yourself).

This is implemented by the interface (algebra) of a category (for function composition / identities) + Cartesian (product) structure + coproducts + exponents: giving you (bi)Cartesian-closed categories. Implement this as class/interface so you can give different realisations/compilations, e.g., a graph drawing implementation (Conal showed some pretty pictures generated from this).

Another implementation is to generate Verilog (hardware descriptions), or to graphics shader code.

What about compiling to the derivative of a program? Represent this via the type:
newtype D a b = D (a -> b x (a -o b)) i.e., a differentiable function is a function which produces its result and its derivative as a linear map (an implementation was then shown, see paper). You can then compose interpretations, e.g., graph interpretation with derivative interpretation.

Another interpretation: interval analysis (e.g., Interval Double = (Double, Double)). There are more examples in the paper, including constraint solving via SMT by writing functions to Bool which are then compiled directly into SMT constraints (rather than using existing embeddings, e.g., SBV).

Q: what about recursion? Can be done (compiling from a letrec into a fix-point combinator class).


 (ICFP) Visitors Unchained, François Pottier

(Humorous tagline from Francois: Objects at ICFP?! Binders again?!)

Manipulating abstract syntax with binding can be a chore: nameplate (boilerplate on names, term coined by James Cheney). Large and easy to get wrong (especially in weakly typed languages, non-dependent languages). This paper is a way of getting rid of nameplate in OCaml via a library and an syntax extension. Supports multiple representations, complex binding, is modular, and relies on just a small amount of code generation.

Several Haskell libraries support this via its support for datatype generic programming. This will use OCaml’s object technology. Annotating a data type with [@@deriving visitors { variety = 'map' }] automatically generates a visitor pattern (a class) for doing a map-operations on the datatype. To use this, create a local object inheriting the visitor class and override one or more methods, e.g., for a syntax tree type, override the visit method for the “add” node in order to implement the rewriting of e+0 -> e.
Visitors are quite versatile: easy to customise behaviour via inheritance which is something you can’t do in FP without a lot of redesign to your algorithms.

Want to traverse syntax with binders, considering three things (from three perspectives or “users” in the talk here):
(1) “End user” describes the structure of an AST, parameterised by types of bound names and free names and using a parametric data type abs whenever a binding is needed (e.g., in a lambda). This should also deriving the map visitor(see above) with option ["BindingForms.map"].
(2) “Binding library” (provided as part of this work) defines the abs type which capture the binding construct, but they leave the scope extrusion function extend as a parameter (extend gets used when you traverse a binder to go inside a scope, e.g., going into the body of a lambda and extending the context by the bound variable).
(3) The last part defines how to represent environments (and lookups) via an overriding a visitor, nominal terms, and scope extrusion. (The work provides some implementations of these part in a library too).

As seen above, the structure of this library is fairly modular. The “end user” has to glue all these things together.
One limitation is that nonlinear patterns (e.g., as in Erlang) can’t be represented (I didn’t immediately see why though).


(ICFP) – Staged Generic Programming, Jeremy Yallop

Advert at the start: if you have an abstract library that is not very high-performance, then the techniques here could be useful to you.

Generic programming a la Scrap Your Boilerplate, for any data type get automatic traversals e.g., listify (matches c) for some constant c, gives you a list of all nodes in a data type matching c; as a programming you don’t need to write listify or matches: they are generated. Requires internally a way to represent types as data and to map data types into a generic form, a collection of traversals over the general form, and generic schemes which plug together the traversals with parameter generic queries. (In Haskell see the syb). However, the overhead of this roughly 20x writing the code manually! This overhead was quite portable: reimplementing Scrap Your Boilerplate into OCaml had about the same amount of slow down compared with manual traversals.

Why this slow? Type comparisons are slow, lots of indirection, and lots of unnecessary applications of applications. (Dominic: to me it seems like it ends up a bit like the slowness involved in dynamically typed languages).

Solution: use staging to optimise the datatype generic implementation. Staging lets you explain which parts of the program can be evaluated early, and which bits need to be evaluated later (due to waiting for inputs)– these are the quoted bits (inside  with the option to jump back into the early part by backquoting ~).

Naively staging SYB keeps the type representation unchanged, but the shallow traversals and recursion schemes can be quoted judiciously such that type comparison, indirect calls, and polymorphism are all eliminated. (Jeremy showed us the code output.. but it has lots of noise and opportunities for improvement).

Instead, use a special fixed-point operator that inlines non-recursive functions and perform (semantics preserving) code transformations as well: e.g., decomposing the generic map operation.

One technique is to allow equations on “partially-static data” e.g., if you have [] @ x @ [] (i.e., static empty list concat dynamic list x concat static empty list []) this still can be statically rewritten into x. Thus apply laws on the static parts as far as possible.
Another technique was to eta reduces matches that have identical branches.
(I think there were a few more thing here). Ends up with much tighter code which performs very close to or even better than the hand-written code.

Q: How does this compare to the Template-your-Boilerplate library for Haskell?
I completely missed the answer as I was looking up this library which I really should be using…!

ICFP / FSCD day 1 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(ICFP keynote #1) Computational Creativity, Chris Martens (slides)

In the early days, automated theorem proving by computers was seen as an AI activity. Theorem proving, program synthesis, and AI planning are search procedures but can be viewed as creative procedures.

Linear logic is useful for describing plots (story telling).

There is a difference between what an AI opponent and AI cooperator should do: an opponent need not make intelligible moves, but a cooperator needs to act in a way that is intelligible/understandable to the human player.

Applied Grice’s maxims of conversation to build an “intentional” cooperative player for the Hanabi game. Grice’s maxims are: quantity (not too much or little), quality (tell the truth), relation (be relevant), manner (avoid ambiguity).

Theory of mind: forming mental models of other people’s mental models.

Dynamic epistemic logic, has two indexed modalities:
\Box_a A   (agent ‘a’ believes A)
[\alpha] A   (A holds true after action \alpha).

Actions are defined inductively:
\begin{array}{rll} \alpha, \beta & = \\ & \textit{flip} \; p & \text{(change truth value)} \\ \mid & ?A & \text{(precondition)} \\ \mid & \alpha + \beta & \text{(non deterministic choice)} \\ \mid & \alpha ; \beta  & \text{(sequence)} \\ \mid & \alpha^a & \text{(appearance to agent 'a')} \\ \mid & \alpha* & \text{(public action)} \end{array}

Semantics is based on a possible worlds formulation (current actions change future possible worlds). Ostari (Eger & Martens) is a full epistemic logic language that can capture lots of different games with different kinds of knowledge between players.

Key point: use compositionally (of various kinds) as a design principle

Three research challenges:

  • Dynamic logics with functional action languages
  • Constructive/lambda-calculable DEL
  • Epistemic session types

(FSCD keynote #1) Brzozowski Goes Concurrent – Alexandra Silva

Ongoing project CoNeCo (Concurrency, Networks, and Coinduction)

SDN – Software Defined Network architectures let you write a program for the network controller. Requires languages for the controller to interact with the underlying switches / routers. Goals of new network PL: raise the level of abstraction beyond the underlying OpenFlow API– make it easier to reason about. Based on Kleene algebra.

NetKat – based on regular expression (Kleene algebra) with tests (KAT) + additional specialized constructions relating to networking. This is compiled to OpenFlow.

Kleene algebra, e.g., (0 + 1(01^*0)^*1)^* – multiples of 3 in binary.

For reasoning about imperative programs, this can be extended with the idea of ‘tests’ (Kozen). Need to capture more of the control flow graph, but there are guards (tests). The solution is to split the underlying alphabet of the Kleene algebra into two: one for actions and one for tests: combine a Kleene algebra and a Boolean algebra (where there is negation), e.g. $\bar{b}$ is the negation of b. Then the three standard control flow constructs of an imperative language are defined:

\begin{array}{rl} p ; q & = pq \\ \textbf{if} \; b \; \textbf{then} \; p \; \textbf{else} \; q & = bp + \bar{b}q \\ \textbf{while} \; b \; \textbf{do} \; p & = (bp)^*\bar{b} \end{array}

Subsumes Hoare logic, i.e.:
b \{p\} c \Leftrightarrow b\,p\,\bar{c} = 0

Decidable in PSPACE.

What is the minimum number of constructs needed to be added to KAT to deal with packet flow?

A packet is an assignment of constant values n to fields x a packet history is a nonempty sequence of packets. Include assignments (field <- value) and tests on fields (filed = value). e.g.,

sw = 6; pt = 88; dest <- 10.0.0.1; pt <- 50

For all packets incoming on port 88 of switch 6 set the destination IP to 10.0.0.1 and send the packet to port 50.

Can then reason about reachability (can two hosts communicate)? Security (does all untrusted traffic pass through an intrusion detection)? loop detection (packet to be stuck in a forwarding cycle).

Since networks are non-deterministic, this has been adapted to probabilistic NetKAT.

Can we add concurrency to NetKAT (deal with interaction between different components): within this paradigm, can we add concurrency to Kleene algebra? Concurrent Kleene algebra adds a parallel composition || (cf. Hoare 2011). Would like to reason about weak memory model like behaviour (cf. Sewell’s litmus tests). There were developments in concurrent Kleene algebra, but a solid foundations was not yet available.

Have proved a Kleene theorems: equivalence between Kleene algebra and finite automata. In this case, the corresponding automata to a concurrent Kleene algebra is a pomset automata.

Can make Concurrent Kleene algebras into regular subsets of pomsets (useful model of concurrency, cf Gischer) i.e., [-] : 2^{\textsf{Pom}_{\Sigma}}.

Pomset automata- has the idea of splitting into two threads (two paths) that once they reach their accept states, then go onto the next state of the originating state (can be expressed as multiple automata). This reminds me of Milner’s interleaving semantics (for atomic actions) in CCS. Alphabet of the automata is a pomset over some other alphabet of actions.

Showed the (Brzozowski) construction to build an automata from a Concurrent Kleene algebra term, and then how to go back from automata to expressions.

The exchange law captures the interaction of sequential and parallel interaction. Currently working on extending the Kleene theorem in this work to include the exchange law.



(ICFP) Faster Coroutine Pipelines, Mike Spivey

(I missed the start and had some trouble catching up).

Coroutine pipelines are a way to structure stream-processing programs. There are libraries for doing this that are based on a term encoding that is then interpreted which suffers serious slow-down when there is process nesting. This paper proposes an alternate approaching using continuation which doesn’t suffer these slow down problems.

The direct style can be mapped to the CPS style which is a monad morphism between two Pipe monads (capturing the coroutine pipelines). To show that the two models are equivalent could be done with logical relations (Mike recommended Relating models of backtracking by Wand and Vaillancourt (2004)).

Termination. Consider a situation:
p1 \mid\mid ((p2 \mid\mid p3 \mid\mid p4) ; q) \mid\mid p5
where p3 then terminates.

Need to end up with p1 \mid\mid q \mid\mid p5 not p1 \mid\mid ((p2 \mid\mid p4) ; q) \mid\mid p5. (I wasn’t sure if this was an open question or whether there was a solution in this paper).

Ohad Kammar asked on the ICFP slack whether this relates to
Ploeg and Kiselyov’s paper at Haskell 2014 but this didn’t get asked.


(ICFP) A pretty but not greedy printer (Functional pearl), Jean-Philippe Bernardy

(slides: https://github.com/jyp/prettiest/blob/master/talk/Outline.org)

Hughes proposed a pretty printing approach, which was improved upon by Wadler. For example, if you want to pretty print some SExpr but with a maximum column width: when do you spill onto the next line?

Proposes three laws of pretty-printing:
1). Do not print beyond the right margin
2). Do not reveal the structure of the input
3). Use as few lines as possible.

(Hughes breaks 3 and Wadler breaks 2 (I didn’t spot how in the talk)).
These three rules are not compatible with a greedy approach.
This work trades performance for being law-abiding instead.

A small API is provided to write pretty printers (four combinators:
text, flush, (horizontal composition), ).

Phil Wadler asked how you represent hang in the library. JP quickly typed up the solution:
hang x y = (x text " " y) (x $$ (text " " y))

Phil countered that he could do the same, but JP answered that it didn’t have the expected behaviour. I’m not entirely sure what hang does, but in the Hughes-based library (https://hackage.haskell.org/package/pretty-1.1.3.5/docs/Text-PrettyPrint.html), hang :: Doc -> Int -> Doc -> Docwith behaviour like:

Prelude Text.PrettyPrint> hang (text "Foo\nBar") 0 (text "Bar")
Foo
Bar Bar

(not sure what the extra Int argument is for).


(ICFP) Generic Functional Parallel Algorithms: Scan and FFT – Conal Elliott

Arrays are the dominant type for parallel programming, but FP uses a variety of data types. Generic programming decomposes data types into fundamental building blocks (sums, products, composition).

Perfect trees can be represented as a n-wise composition of a functor $h$.

Prefix sum (left scan)

b_k = \Sigma_{1 \leq i \leq k} a_i
for k = 1, ..., n+1, e.g., scan [1,2,3,4] = [1,3,6,10]
Define more generally as:

class Functor f => LScan f where
  lscan :: Monoid a => f a -> f a x a

Instances were then given for the components of a datatype generic representation, (constant types, type variable, sum, product, composition) which gives data type generic scan. (Insight: Do left scans on left vectors, not left scans on right vectors! The latter adds a lot of overhead).

Data types were describe numerically, e.g., 2 is a pair, \stackrel{\leftarrow}{4} is a 4-vector.

Can do the esame with Discrete Fourier Transform. A nice property is that a
DFT can be factored into separate parts (Johnson 2010, e.g., a 1D DFT of size N = N1*N2 is equivalent to a 2D DFT of size N1xN2– I couldn’t see this so well on the slide as I was in the overflow room).


(ICFP) A Unified Approach to Solving Seven Programming Problems (Functional Pearl) – William E. Byrd, Michael Ballantyne, Gregory Rosenblatt, Matthew Might

Started off referencing http://matt.might.net/articles/i-love-you-in-racket/ – 99 ways to say “I love you” in Racket (all based on different list operations). But what if you wanted more, for some other language? How could you generate this, to say get 1000, or 10000 such examples? The solution put forward here is to using miniKanren, the constraint logic language. The eval function is turned into a relation then can be “run” in reverse, (i.e., run 99 (q) (evalo q '(I love you)) gives 99 S-expressions that evaluate to ​(I love you)). What about for generating quines (programs that evaluate to themselves)? (run 1 (e) (evalo e e)) produced just the 0 expression and getting three results was just 0#t and #f but with 4 it produced a more interesting quine.

Then, William showed (run n (p q) (eval p q) (evalo q p) (/= q p) generating “twines” (pair of mutually producing programs). This was then applied to a proof checker to turn it into an automated theorem prover! The same technology was used then for program synthesis (although some additional work was needed under the hood to make it fast enough).


(FSCD) Relating System F and λ2: A Case Study in Coq, Abella and Beluga – Jonas KaiserBrigitte PientkaGert Smolka

System F (Girard ’72): Two-sorted, types and terms, based on the presentation of System F in Harper’13. Meanwhile, study of CC lead to Pure Type Systems- System F appeared in the Lambda Cube at the corner \lambda2. This work shows the formal relation (that indeed these two systems are equivalent). A proof is partially given in Geuvers ’93, but a full Coq formalisation was given by Kaiser, Tebbi and Smolka at POPL 2017. In this work, the proof is replayed across three tools to provide a useful comparison of the three systems: Coq, Abella and Beluga.

A complication is that the syntax is non-uniform on the System F side: \Pi x : a . b in PTS can correspond to A \rightarrow B and \forall X . B. Have to also keep track of context assumptions. One (of many) “topics of interest” in this comparison is how to manage contexts (which is interesting to me as I am quite interested in CMTT in Beluga).

The approaches were:

  • Coq – first-order de Bruijn indices, parallel substitutions (Autosubst library), and invariants (e.g, a context can be extended with a new term variable). Inductive predicate for typing judgments. Traversal of binders requires context adjustments.
  • Abella – HOAS, nominal quantification (fresh names handled internally), relation proof search [Abella has two layers: specification and logic]. Contexts represented by lists (as in Coq). A compound inductive predicate is define to relate the different context representations (and to keep them all consistent).
  • Beluga – HOAS, 1st class contexts (via Contextual Modal Type Theory) and context schemas. Object K (e.g., terms, types, deriviations) are alway paired with a 1-st class context \Gamma that gives it meaning [\Gamma \vdash K]. There is no concept of free variable., e.g., in Coq 0 \vdash 0_{ty} \rightarrow 0_{ty} \textsf{ty} \Rightarrow \bot is provable but in Belluga \bullet \vdash 0 is not even well formed (accessing the first variable 0 (think de Bruijns) in an empty context). Context schemas provide a way to give the structure of the typed contexts (as a dependent record).

(ICFP) A Framework for Adaptive Differential Privacy – Daniel Winograd-CortAndreas HaeberlenAaron RothBenjamin C. Pierce

Associated GitHub: https://github.com/dwincort/AdaptiveFuzz

They created a new language, Adaptive Fuzz for adaptive differential privacy (DP), uses Fuzz type checking to verify individual pieces in a piecewise static way.

DFuzz is a dependently typed version of Fuzz, can we use this? Privacy cost dependent on feature selection: which could depend on values in the databases: but this is practically useless as we don’t know what is in the database. We can still get static guarantees if it is done piece-wise in two modes: data mode (access to sensitive data, uses the Fuzz type checker to provide DP proofs) and adaptive mode (computation between pieces). End up with “light” dependent types.

These two modes have the same syntax though (code it one works in the other). Here is an example type for a function that scales one number by the other (multiplication by abs):

\textsf{scale} \; (c : \mathbb{R}) \; (n : [\textsf{abs} \; c] \mathbb{R}) : \mathbb{R}

(scale the second number by the first argument). In adaptive mode the annotations are ignored, in data mode the first argument is forced to be a constant (partial evaluation happens under the hood so we don’t need dependent types really).

General scheme is: analyst (program) writes data mode code, it is partially evaluated and the Fuzz type checker works out a cost, which is the compared against a privacy filter (where/how is this defined?) Can write a gradient descent algorithm with different levels of randomness provided by different privacy budgets (how much information do we want to reveal at the end, e.g., if budget is infinite then you get not noise in the result). (another example in the paper is feature selection).

This seems like a really interesting mix of a static and semi-static approach (via partial evaluation). I wonder how this relates to fine-grained cost-tracking type systems?


(ICFP) – Symbolic Conditioning of Arrays in Probabilistic Programs –
Praveen Narayanan, Chung-chieh Shan

(tag line: Bayesian inference via program transformation).
Say you are benchmarking some iterative code, which you model by a n + b for some startup cost b and n iterations and a a cost for the loop. You might then collect some results and do linear regression (best fist line) to see if this model is confirmed. Bayesian linear regression gives you many lines for different kinds of fit.

  1. Write a generative model, e.g., some normally distributed n to generate a set of possible lines (normally distributed gradients), to which we maybe then add some additional normally distributed noise for each point.  (k measurements)
  2. Observe running times
  3. Infer distribution of lines. Delete those candidates from the generative model that weren’t observed (from step 2), keep the ones that were.

Bayesian inference is compositional: you can build it from different components, by treating distributions as programs (e.g., a program normal (a * n_1 + b) is gives us one of the lines in the generative model) and then transforming the program. A program transformation implements the model-theoretic notion of “disintegration” but the existing approach for this is problematic (I wasn’t quite clear on how). A new transformation approach is proposed which provides a smaller output program via some additional combinators which replace an unrolled set of k observations into a replication like combination (plate).

Definitional interpreters: programs that take ASTs and run them, giving an executable semantics. Key challenge: a parameterised interpreter that recovers both concrete and abstract semantics.

A concrete interpreter for the lambda calculus: exp -> m (val) where m is a monad which (at least) captures the environment and the store (which is used for representing the variable environment). The definition is written non-recursive “unfixed” (unrolled), which can be fixed (e.g., apply to Y combinator) later to run the interpreter. By leaving it unfixed, we can interpose different behaviour in the recursive cases.

To make the interpreter abstract, abstract the primitive operations and types, e.g., integers are extended with an ‘abstract’ value, i.e., \mathbb{Z} \cup \{\texttt{'N}\} where 'N represents the abstract integer. The semantics is then main non-deterministic so that an iszero? predicate on 'N splits into two execution traces. But its partial (could run forever).

How to make it total (always terminating). Look for states that it has already seen before, sufficient for termination but unsound for abstraction. If you observe fact 'N then one branch fails (non-terminating as visit previous state) and the 1 branch succeeds, so the result is just 1 (not ideal). Instead, look in a “cache” of precomputed values to return something like [[fact 'N]] = {1} U {'N x $[fact 'N]}. Intercept recursion points to evaluate through a cache and to stop when a previously computed value is hit (sounds space intensive?).


(ICFP) – On the Expressive Power of User-Defined Effects: Effect Handlers, Monadic Reflection, Delimited Control – Yannick Forster, Ohad Kammar, Sam Lindley, Matija Pretnar

Say you want to write a program like:

toggle = { x <- get!
           y <- not! x
           put! y
           x }

If we don’t have effects, then we can do explicit state passing, but you have to do a full rewrite. Ideally want only local transformations. (Ohad showed this in three different styles, see below).

Relative expressiveness in language design, compare/contrast: (1 – Eff) algebraic effects and handlers (2 – Mon) monads (3 – Del) delimited control. Did this by extending CBPV in these three directions (and formalised this in Abella) and defining macro translations between every pair of these extended languages. Expressivity is stated as formal translations between calculi. Then this was considering in the typed setting. Interestingly there is no typed macro translation from Eff to Del nor from Eff to Mon.

[There is a large design space which yields lots of different languages, this study is a start. Inexpressivity is brittle: adding other language features can change the result.]

In the Eff version, we give standard state handlers to implement get and put. Its type contains the effect-system like information State = {get : 1 -> bit, put : bit -> 1}where toggle : U_State F bit.

In the Mon version, monadic reflection is used with a state monad.
(Q: put was of type U (bit -> F) but I thought the monad would be T = U F?)

(Q: In the translation from Mon -> Eff, how do generate the effect type if the source program only uses put or get ? I guess it just maps to the set of all effect operations).

(Q: the talked showed just simple state effects; are the results shown for any other kind of effect).


(ICFP) – Imperative Functional Programs That Explain Their WorkWilmer Ricciotti, Jan Stolarek, Roly Perera, James Cheney

https://github.com/jstolarek/slicer

Slicing can be used to explain particular (parts of) a program output: which parts of the source code contributed to a value. Extend TML (Transparent ML) to Imperative TML.

Construct backwards and forward slicing as a Galois connection (between two lattices). The lattices are based on the idea of definedness: where ‘holes’ make programs more undefined, e.g., 1 + 2 is above 1 + \Box and \box + 2 (partial expressions). Forward slicing preserves meets in the lattice; backward slicing should be consistent and minimal with respect to forward slicing (I missed what this meant exactly, but I talked to Janek later to clarify: consistency is x \leq \textsf{fwd}(\textsf{bwd}(x))). The idea then is that we have \textsf{bwd} : values_\Box \rightarrow expr_\Box and \textsf{fwd} : expr_\Box \rightarrow values_\Box which form a Galois connection, i.e., \textsf{bwd}(x) \leq y  \Leftrightarrow x \leq \textsf{fwd}(y). [Given one of the slicings we don’t necessarily get the other for free].

Given a small program l1 := 0; (!l1, !l2) a partial version of it \Box; (!l1, !l) and an initial store [l1 -> 1, l2 -> 2] should forward slice to (\Box, 2) by propagating the “hole” through the store.


(ICFP) Effect-Driven QuickChecking of Compilers – Jan Midtgaard, Mathias Nygaard Justesen, Patrick Kasting, Flemming Nielson, Hanne Riis Nielson

They made a generator for writing OCaml programs: useful for automated testing, e.g., comparing two compilers against each other (compile -> run -> diff, any difference will be suspicious).

Generated a program let k = (let i = print_newline () in fun q -> fun i -> "") () in 0 which found a bug: different behaviour between the native code generator and the bytecode generator due to effects being delayed (indefinitely). Another example was found (related), due to effects being removed.

Program generation, how to do it:

  • Try: generate arbitrary strings?! Most won’t lex or parse).
  • Instead, follow the grammar (see Celento et al ’80). Most won’t type check!
  • Instead, follow the simply-typed lambda calculus (Palka et al. ’11) by bottom-up reading of a typing relation. This will make it through the type checker (I guess you can generalise the type rules).

Observed behaviour in OCaml depends on evaluation order (when effects are involved). OCaml bytecode uses right-to-left but sometimes native code backend uses left-to-right (hm!) But the specification is itself a bit loose. Can we avoid generating such programs since they are not well specified for anyway?

  • Instead, following a type-and-effect systems which has one bit for marking pure/effectful (boolean lattice effect system). Extend to a pair of bits, where the second bit is order dependence (extend the effect algebra a bit more). Our goal is then a program which may have effects but may not have order-dependent effects.

Have an effect-dependent soundness theorem (that the effect bits correctly anticipate the effects).

Next, there is a type preserving shrinker to reduce huge ugly generated programs into a minimal (much nicer) example. e.g., replace a pure integer expression by a literal 0 and large literals by small literals, replace beta-redexes by let bindings, etc. Shrunk tests are checked whether they still show a difference.

A question was asked whether the effect-based mechanism could be used to rule-out other allowed difference, e.g., in floating point. Jan answered that floating point isn’t generated anyway, but an interesting idea.

Another question: how do you avoid generating non-terminating programs? Jan explained that since it is STLC (+ extension) only total terms are generated.

An afternoon of DTT/theorem provers: Agda and Coq

Note from 23rd August, 2017
I found this draft blog post lying around, written in the spring of 2015 while I was working at Imperial College London as a Research Associate in the Mobility Reading Group with Nobuko Yoshida. This was the fruit of a discussion with Tiago Cogumbreiro where we were comparing Coq and Agda for theorem proving. I believe we had a few more “side-by-side” comparisons written down, but this is all I have here: simple code for showing that addition of two even numbers yields an even number. It might prove to be a useful reference for someone else (at the time, I had a hard time remembering Coq syntax as I didn’t use it very often, and Tiago kindly gave me some pointers) so I have posted it “as is” now, over two years later.


Agda Coq
data Nat : Set where
  zero : Nat
  succ : Nat -> Nat

data Even : Nat -> Set where
  even_base : Even zero
  even_step : forall {n} -> Even n -> Even (succ (succ n))
Inductive Nat :=
  | zero : Nat
  | succ : Nat -> Nat.

Inductive Even: Nat -> Prop :=
  | even_base : Even zero
  | even_step : forall n, Even n -> Even (succ (succ n)).
_+_ : Nat -> Nat -> Nat
zero + n = n
(succ n) + m = succ (n + m)
Fixpoint add (n:Nat) (m:Nat) :=
  match n with
    | zero => m
    | succ n' => succ (add n' m)
  end.
even_sum : forall {n m} -> Even n -> Even m -> Even (n + m)
even_sum even_base x      = x
even_sum (even_step x) y  = even_step (even_sum x y)
Lemma add_succ:
  forall n m, add (succ n) m  = succ (add n m).
Proof.
  auto.
Qed.

Lemma even_sum :
  forall n m, Even n -> Even m -> Even (add n m).
Proof.
  intros n m even_n even_m.
  induction even_n.
   (* case even_base *)
   - simpl.
     assumption.
   (* case even_step *)
   - repeat (rewrite add_succ).
     apply even_step.
     assumption.
Qed.

Rearranging equations using a zipper

Whilst experimenting with some ideas for a project, I realised I needed a quick piece of code to rearrange equations (defined in terms of +, *, -, and /) in AST form, e.g., given an AST for the equation x = y + 3, rearrange to get y = x - 3.

I realised that equations can be formulated as zippers over an AST, where operations for navigating the zipper essentially rearrange the equation. I thought this was quite neat, so I thought I would show the technique here. The code is in simple Haskell.

I’ll show the construction for a simple arithmetic calculus with the following AST data type of terms:

data Term = Add Term Term 
          | Mul Term Term 
          | Div Term Term
          | Sub Term Term  
          | Neg Term
          | Var String
          | Const Integer

with some standard pretty printing code:

instance Show Term where 
    show (Add t1 t2) = (show' t1) ++ " + " ++ (show' t2)
    show (Mul t1 t2) = (show' t1) ++ " * " ++ (show' t2)
    show (Sub t1 t2) = (show' t1) ++ " - " ++ (show' t2)
    show (Div t1 t2) = (show' t1) ++ " / " ++ (show' t2)
    show (Neg t) = "-" ++ (show' t) 
    show (Var v) = v
    show (Const n) = show n

where show' is a helper to minimise brackets e.g. pretty printing “-(v)” as “-v”.

show' :: Term -> String
show' (Var v) = v
show' (Const n) = show n
show' t@(Neg (Var v)) = show t
show' t@(Neg (Const n)) = show t
show' t = "(" ++ show t ++ ")"

Equations can be defined as pairs of terms, i.e., ‘T1 = T2’ where T1 and T2 are both represented by values of Term. However, instead, I’m going to represent equations via a zipper.

Zippers (described beautifully in the paper by Huet) represent values that have some subvalue “in focus”. The position of the focus can then be shifted through the value, refocussing on different parts. This is encoded by pairing a focal subvalue with a path to this focus, which records the rest of the value that is not in focus. For equations, the zipper type pairs a focus Term (which we’ll think of as the left-hand side of the equation) with a path (which we’ll think of as the right-hand side of the equation).

data Equation = Eq Term Path

Paths give a sequence of direction markers, essentially providing an address to the term in focus, starting from the root, where each marker is accompanied with the label of the parent node and the subtree of the branch not taken, i.e., a path going left is paired with the right subtree (which is not on the path to the focus).

data Path = Top (Either Integer String)  -- At top: constant or variable
           | Bin Op                -- OR in a binary operation Op,
                 Dir               --    in either left (L) or right (R) branch
                 Term              --    with the untaken branch 
                 Path              --    and the rest of the equation
           | N Path                -- OR in the unary negation operation

data Dir = L | R 
data Op = A | M | D | S | So | Do

The Op type gives tags for every operation, as well as additional tags So and Do which represent sub and divide but with arguments flipped. This is used to get an isomorphism between the operations that zip “up” and “down” the equation zipper, refocussing on subterms.

A useful helper maps tags to their operations:

opToTerm :: Op -> (Term -> Term -> Term)
opToTerm A = Add
opToTerm M = Mul
opToTerm D = Div
opToTerm S = Sub
opToTerm So = (\x -> \y -> Sub y x)
opToTerm Do = (\x -> \y -> Div y x)

Equations are pretty printed as follows:

instance Show Path where
    show p = show . pathToTerm $ p
    
instance Show Equation where
    show (Eq t p) = (show t) ++ " = " ++ (show p)

where pathToTerm converts paths to terms:

pathToTerm :: Path -> Term
pathToTerm (Top (Left c)) = Const c
pathToTerm (Top (Right v))= Var v
pathToTerm (Bin op L t p) = (opToTerm op) (pathToTerm p) t
pathToTerm (Bin op R t p) = (opToTerm op) t (pathToTerm p)
pathToTerm (N p)          = Neg (pathToTerm p)

Now onto the zipper operations which providing rebalancing of the equation. Equations are zipped-down by left and right, which for a binary operation focus on either the left or right argument respectively, for unary negation focus on the single argument, and for constants or variables does nothing. When going left or right, the equations are rebalanced with their inverse arithmetic operations (show in the comments here):

left (Eq (Var s) p)     = Eq (Var s) p
left (Eq (Const n) p)   = Eq (Const n) p
left (Eq (Add t1 t2) p) = Eq t1 (Bin S L t2 p)   -- t1 + t2 = p  -> t1 = p - t2
left (Eq (Mul t1 t2) p) = Eq t1 (Bin D L t2 p)   -- t1 * t2 = p  -> t1 = p / t2
left (Eq (Div t1 t2) p) = Eq t1 (Bin M L t2 p)   -- t1 / t2 = p  -> t1 = p * t2
left (Eq (Sub t1 t2) p) = Eq t1 (Bin A L t2 p)   -- t1 - t2 = p  -> t1 = p + t2
left (Eq (Neg t) p)     = Eq t (N p)             -- -t = p       -> t = -p

right (Eq (Var s) p)     = Eq (Var s) p          
right (Eq (Const n) p)   = Eq (Const n) p
right (Eq (Add t1 t2) p) = Eq t2 (Bin So R t1 p)  -- t1 + t2 = p -> t2 = p - t1
right (Eq (Mul t1 t2) p) = Eq t2 (Bin Do R t1 p)  -- t1 * t2 = p -> t2 = p / t1
right (Eq (Div t1 t2) p) = Eq t2 (Bin D R t1 p)   -- t1 / t2 = p -> t2 = t1 / p
right (Eq (Sub t1 t2) p) = Eq t2 (Bin S R t1 p)   -- t1 - t2 = p -> t2 = t1 - p
right (Eq (Neg t) p)     = Eq t (N p)

In both left and right, Add and Mul become subtraction and dividing, but in right in order for the the zipping-up operation to be the inverse, subtraction and division are represented using the flipped So and Do markers.

Equations are zipped-up by up, which unrolls one step of the path and reforms the term on the left-hand side from that on the right. This is the inverse of left and right:

up (Eq t1 (Top a))        = Eq t1 (Top a)
up (Eq t1 (Bin A L t2 p)) = Eq (Sub t1 t2) p -- t1 = t2 + p -> t1 - t2 = p
up (Eq t1 (Bin M L t2 p)) = Eq (Div t1 t2) p -- t1 = t2 * p -> t1 / t2 = p
up (Eq t1 (Bin D L t2 p)) = Eq (Mul t1 t2) p -- t1 = p / t2 -> t1 * t2 = p
up (Eq t1 (Bin S L t2 p)) = Eq (Add t1 t2) p -- t1 = p - t2 -> t1 + t2 = p

up (Eq t1 (Bin So R t2 p)) = Eq (Add t2 t1) p -- t1 = p - t2 -> t2 + t1 = p
up (Eq t1 (Bin Do R t2 p)) = Eq (Mul t2 t1) p -- t1 = p / t2 -> t2 * t1 = p
up (Eq t1 (Bin D R t2 p))  = Eq (Div t2 t1) p -- t1 = t2 / p -> t2 / t1 = p
up (Eq t1 (Bin S R t2 p))  = Eq (Sub t2 t1) p -- t1 = t2 - p -> t2 - t1 = p

up (Eq t1 (N p))           = Eq (Neg t1) p    -- t1 = -p     -> -t1 = p

And that’s it! Here is an example of its use from GHCi.

foo = Eq (Sub (Mul (Add (Var "x") (Var "y")) (Add (Var "x") 
           (Const 1))) (Const 1)) (Top (Left 0))

*Main> foo
((x + y) * (x + 1)) - 1 = 0

*Main> left $ foo
(x + y) * (x + 1) = 0 + 1

*Main> right . left $ foo
x + 1 = (0 + 1) / (x + y)

*Main> left . right . left $ foo
x = ((0 + 1) / (x + y)) - 1

*Main> up . left . right . left $ foo
x + 1 = (0 + 1) / (x + y)

*Main> up . up . left . right . left $ foo
(x + y) * (x + 1) = 0 + 1

*Main> up . up . up . left . right . left $ foo
((x + y) * (x + 1)) - 1 = 0

It is straightforward to prove that: up . left $ x = x (when left x is not equal to x) and up . right $ x = x(when right x is not equal to x).

Note, I am simply rebalancing the syntax of equations: this technique does not help if you have multiple uses of a variable and you want to solve the question for a particular variable, e.g. y = x + 1/(3x), or quadratics.

Here’s a concluding thought. The navigation operations left, right, and up essentially apply the inverse of the operation in focus to each side of the equation. We could therefore reformulate the navigation operations in terms of any group: given a term L ⊕ R under focus where is the binary operation of a group with inverse operation -1, then navigating left applies ⊕ R-1 to both sides and navigating right applies ⊕ L-1. However, in this blog post there is a slight difference: navigating applies the inverse to both sides and then reduces the term of the left-hand side using the group axioms X ⊕ X-1 = I (where I is the identity element of the group) and X ⊕ I = X such that the term does not grow larger and larger with inverses.

I wonder if there are other applications, which have a group structure (or number of interacting groups), for which the above zipper approach would be useful?

Automatic SIMD Vectorization for Haskell and ICFP 2013

I had a great time at ICFP 2013 this year where I presented my paper “Automatic SIMD Vectorization for Haskell”, which was joint work with Leaf Petersen and Neal Glew of Intel Labs. The full paper and slides are available online. Our paper details the vectorization process in the Intel Labs Haskell Research Compiler (HRC) which gets decent speedups on numerical code (between 2-7x on 256-bit vector registers). It was nice to be able to talk about HRC and share the results. Paul (Hai) Liu also gave a talk at the Haskell Symposium which has more details about HRC than the vectorization paper (see the paper here with Neal Glew, Leaf Petersen, and Todd Anderson). Hopefully there will be a public release of HRC in future.  

Still more to do

It’s been exciting to see the performance gains in compiled functional code over the last few years, and its encouraging to see that there is still much more we can do and explore. HRC outperforms GHC on roughly 50% of the benchmarks, showing some interesting trade-offs going on in the two compilers. HRC is particularly good at compiling high-throughput numerical code, thanks to various strictness/unboxing optimisations (and the vectorizer), but there is still more to be done.

Don’t throw away information about your programs

One thing I emphasized in my talk was the importance of keeping, not throwing away, the information encoded in our programs as we progress through the compiler stack. In the HRC vectorizer project, Haskell’s Data.Vector library was modified to distinguish between mutable array operations and “initializing writes”, a property which then gets encoded directly in HRC’s intermediate representation. This makes vectorization discovery much easier. We aim to preserve as much effect information around as possible in the IR from the original Haskell source.

This connected nicely with something Ben Lippmeier emphasised in his Haskell Symposium paper this year (“Data Flow Fusion with Series Expressions in Haskell“, joint with Manuel Chakravarty, Gabriele Keller and Amos Robinson). They provide a combinator library for first-order non-recursive dataflow computations which is guaranteed to be optimised using flow fusion (outperforming current stream fusion techniques). The important point Ben made is that, if your program fits the pattern, this optimisation is guaranteed. As well as being good for the compiler, this provides an obvious cost model for the user (no more games trying to coax the compiler into optimising in a particular way).

This is something that I have explored in the Ypnos array language, where the syntax is restricted to give (fairly strong) language invariants that guarantee parallelism and various optimisations, without undecidable analyses. The idea is to make static as much effect and coeffect (context dependence) information as possible. In Ypnos, this was so successful that I was able to encode the Ypnos’ language invariant of no out-of-bounds array access directly in Haskell’s type system (shown in the DSL’11 paper; this concept was also discussed briefly in my short language design essay).

This is a big selling point for DSLs in general: restrict a language such that various program properties are statically decidable, facilitating verification and optimisation.

Ypnos has actually had some more development in the past year, so if things progress further, there may be some new results to report on. I hope to be posting again soon about more research, including the ongoing work with Tomas Petricek on coeffect systems, and various other things I have been playing with. – D

The Four Rs of Programming Language Design (revisited)

Following my blog post last year about the “four Rs of programming language design” I wrote a short essay expanding upon the idea which has now been included in the online post-proceedings of the ACM Onward ’11 essay track (part of the SPLASH conference).

The essay was a lot of fun to write (I somehow end up referencing books from the 19th century, a sci-fi novel, 1984, and The Matrix!) and is a kind of mission statement (at least for myself) for language design; it is available on my research page here. I hope that it provides some food-for-thought for others interested in, or working in, language design.

Subcategories & “Exofunctors” in Haskell

In my previous post I discussed the new constraint kinds extension to GHC, which provides a way to get type-indexed constraint families in GHC/Haskell. The extension provides some very useful expressivity. In this post I’m going to explain a possible use of the extension.

In Haskell the Functor class is misleading named as it actually captures the notion of an endofunctor, not functors in general. This post shows a use of constraint kinds to define a type class of exofunctors; that is, functors that are not necessarily endofunctors. I will explain what all of this means.

This example is just one from a draft note (edit July 2012: draft note subsumed by my TFP 2012 submission) explaining the use of constraint families, via the constraint kinds extension, for describing abstract structures from category theory that are parameterised by subcategories, including non-endofunctors, relative monads, and relative comonads.

I will try to concisely describe any relevant concepts from category theory, through the lens of functional programming, although I’ll elide some details.

The Hask category

The starting point of the idea is that programs in Haskell can be understood as providing definitions within some category, which we will call Hask. Categories comprise a collection of objects and a collection of morphisms which are mappings between objects. Categories come equipped with identity morphisms for every object and an associative composition operation for morphisms (see Wikipedia for a more complete, formal definition). For Hask, the objects are Haskell types, morphisms are functions in Haskell, identity morphisms are provided by the identity function, and composition is the usual function composition operation. For the purpose of this discussion we are not really concerned about the exact properties of Hask, just that Haskell acts as a kind of internal language for category theory, within some arbitrary category Hask (Dan Piponi provides some discussion on this topic).

Subcategories

Given some category C, a subcategory of C comprises a subcollection of the objects of C and a subcollection of the morphisms of C which map only between objects in the subcollection of this subcategory.

We can define for Hask a singleton subcategory for each type, which has just that one type as an object and functions from that type to itself as morphisms e.g. the Int-subcategory of Hask has one object, the Int type, and has functions of type Int → Int as morphisms. If this subcategory has all the morphisms Int → Int it is called a full subcategory. Is there a way to describe “larger” subcategories with more than just one object?

Via universal quantification we could define the trivial (“non-proper”) subcategory of Hask with objects of type a (implicitly universally quantified) and morphisms a -> b, which is just Hask again. Is there a way to describe “smaller” subcategories with fewer than all the objects, but more than one object? Yes. For this we use type classes.

Subcategories as type classes

The instances of a single parameter type class can be interpreted as describing the members of a set of types (or a relation on types for multi-parameter type classes). In a type signature, a universally quantified type variable constrained by a type class constraint represents a collection of types that are members of the class. E.g. for the Eq class, the following type signature describes a collection of types for which there are instances of Eq:

      Eq a => a

The members of Eq are a subcollection of the objects of Hask. Similarly, the type:

      (Eq a, Eq b) => (a -> b)

represents a subcollection of the morphisms of Hask mapping between objects in the subcollection of objects which are members of Eq. Thus, the Eq class defines an Eq-subcategory of Hask with the above subcollections of objects and morphisms.

Type classes can thus be interpreted as describing subcategories in Haskell. In a type signature, a type class constraint on a type variable thus specifies the subcategory which the type variable ranges over the objects of. We will go on to use the constraint kinds extension to define constraint-kinded type families, allowing structures from category theory to be parameterised by subcategories, encoded as type class constraints. We will use functors as the example in this post (more examples here).

Functors in Haskell

In category theory, a functor provides a mapping between categories e.g. F : C → D, mapping the objects and morphisms of C to objects and morphisms of D. Functors preserves identities and composition between the source and target category (see Wikipedia for more). An endofunctor is a functor where C and D are the same category.

The type constructor of a parametric data type in Haskell provides an object mapping from Hask to Hask e.g. given a data type data F a = ... the type constructor F maps objects (types) of Hask to other objects in Hask. A functor in Haskell is defined by a parametric data type, providing an object mapping, and an instance of the well-known Functor class for that data type:

class Functor f where
   fmap :: (a -> b) -> f a -> f b

which provides a mapping on morphisms, called fmap. There are many examples of functors in Haskell, for examples lists, where the fmap operation is the usual map operation, or the Maybe type. However, not all parametric data types are functors.

It is well-known that the Set data type in Haskell cannot be made an instance of the Functor class. The Data.Set library provides a map operation of type:

 
Set.map :: (Ord a, Ord b) => (a -> b) -> Set a -> Set b

The Ord constraint on the element types is due to the implementation of Set using balanced binary trees, thus elements must be comparable. Whilst the data type is declared polymorphic, the constructors and transformers of Set allow only elements of a type that is an instance of Ord.

Using Set.map to define an instance of the Functor class for Set causes a type error:

instance Functor Set where
   fmap = Data.Set.map

...

foo.lhs:4:14:
    No instances for (Ord b, Ord a)
      arising from a use of `Data.Set.map'
    In the expression: Data.Set.map
    In an equation for `fmap': fmap = Data.Set.map
    In the instance declaration for `Functor Set'

The type error occurs as the signature for fmap has no constraints, or the empty (always true) constraint, whereas Set.map has Ord constraints. A mismatch occurs and a type error is produced.

The type error is however well justified from a mathematical perspective.

Haskell functors are not functors, but endofunctors

First of all, the name Functor is a misnomer; the class actually describes endofunctors, that is functors which have the same category for their source and target. If we understand type class constraints as specifying a subcategory, then the lack of constraints on fmap means that Functor describes endofunctors HaskHask.

The Set data type is not an endofunctor; it is a functor which maps from the Ord-subcategory of Hask to Hask. Thus Set :: OrdHask. The class constraints on the element types in Set.map declare the subcategory of Set functor to which the morphisms belong.

Type class of exofunctors

Can we define a type class which captures functors that are not necessarily endofunctors, but may have distinct source and target categories? Yes, using an associated type family of kind Constraint.

The following ExoFunctor type class describes a functor from a subcategory of Hask to Hask:

{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}

class ExoFunctor f where
   type SubCat f x :: Constraint
   fmap :: (SubCat f a, SubCat f b) => (a -> b) -> f a -> f b

The SubCat family defines the source subcategory for the functor, which depends on f. The target subcategory is just Hask, since f a and f b do not have any constraints.

We can now define the following instance for Set:

instance ExoFunctor Set where
   type SubCat Set x = Ord x
   fmap = Set.map

Endofunctors can also be made an instance of ExoFunctor using the empty constraint e.g.:

instance ExoFunctor [] where
    type SubCat [] a = ()
    fmap = map

(Aside: one might be wondering whether we should also have a way to restrict the target subcategory to something other than Hask here. By covariance we can always “cast” a functor C → D, where D is a subcategory of some other category E, to C → E without any problems. Thus, there is nothing to be gained from restricting the target to a subcategory, as it can always be reinterpreted as Hask.)

Conclusion (implementational restrictions = subcategories)

Subcategory constraints are needed when a data type is restricted in its polymorphism by its operations, usually because of some hidden implementational details that have permeated to the surface. These implementational details have until now been painful for Haskell programmers, and have threatened abstractions such as functors, monads, and comonads. Categorically, these implementational restrictions can be formulated succinctly with subcategories, for which there are corresponding structures of non-endofunctors, relative monads, and relative comonads. Until now there has been no succinct way to describe such structures in Haskell.

Using constraint kinds we can define associated type families, of kind Constraint, which allow abstract categorical structures, described via their operations as a type class, to be parameterised by subcategories on a per-instance basis. We can thus define a class of exofunctors, i.e. functors that are not necessarily endofunctors, which we showed here. The other related structures which are difficult to describe in Haskell without constraint kinds: relative monads and relative comonads, are discussed further in a draft note (edit July 2012: draft note subsumed by my TFP 2012 submission). The note includes examples of a Set monad and an unboxed array comonad, both of which expose their implementational restrictions as type class constraints which can be described as subcategory constraints.
Any feedback on this post or the draft note is greatly appreciated. Thanks.