A short exploration of GHC’s instance resolution hiding mistakes from the type checker.

I was recently working on some Haskell code (for research, with Jack Hughes) and happened to be using a monoid (via the Monoid type class) and I was rushing. I accidentally wrote x `mempty` y instead of x `mappend` y. The code with mempty type checked and compiled, but I quickly noticed some tests giving unexpected results. After a pause, I checked the recent diff and noticed this mistake, but I had to think for a moment about why this mistake was not leading to a type error. I thought this was an interesting little example of how type class instance resolution can sometimes trip you up in Haskell, and how to uncover what is going on. This also points to a need for GHC to explain its instance resolution, something others have thought about; I will briefly mention some links at the end.

The nub of the problem

In isolation, my mistake was essentially this:

whoops :: Monoid d => d -> d -> d
whoops x y = mempty x y  -- Bug here. Should be "mappend x y"

So, given two parameters of type d for which there is a monoid structure on d, we use both parameters as arguments to mempty.

Recall the Monoid type class is essentially:

class Monoid d where
  mempty :: d
  mappend :: d -> d -> d

(Note that Monoid also has a derived operation mconcat and is now decomposed into Semigroup and Monoid, but I elide that detail here; see https://hackage.haskell.org/package/base-4.14.0.0/docs/Prelude.html#t:Monoid).

We might naively think that whoops would therefore not type check since we do not know that d is a function type. However, whoops is well-typed and evaluating
whoops [1,2,3] [4,5,6] returns []. If the code had been as I intended, (using mappend here instead of mempty) then we would expect [1,2,3,4,5,6] according to the usual monoid on lists.

The reason this is not a type error is because of GHC’s instance resolution and the following provided instance of `Monoid`:

instance Monoid b => Monoid (a -> b) where
  mempty = \_ -> mempty
  mappend f g = \x -> f x `mappend` g x

That is, functions form a monoid whenever their codomain does, with mempty as the constant function returning the mempty element of b and mappend as the pointwise lifting of the codomain's monoid to the function space.
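For example, here is the instance doing its thing in isolation (a quick GHCi check, using lists and strings as the underlying monoids; the lifted mappend combines the two functions' results pointwise):

Prelude> (mempty :: Int -> [Int]) 42
[]
Prelude> ((++ "!") `mappend` reverse) "abc"
"abc!cba"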

In this case, we can dig into what is happening by compiling¹ with -ddump-ds-preopt and looking at GHC’s desugared output before optimisation, where all the type class instances have been resolved. I’ve cleaned up the output a little (mostly renaming):

whoops :: forall d. Monoid d => d -> d -> d
whoops = \ (@ d) ($dMonoid :: Monoid d) ->
  let
    $dMonoid_f :: Monoid (d -> d)
    $dMonoid_f = GHC.Base.$fMonoid-> @ d @ d $dMonoid

    $dMonoid_ff :: Monoid (d -> d -> d)
    $dMonoid_ff = GHC.Base.$fMonoid-> @ (d -> d) @ d $dMonoid_f
  in
    \ (x :: d) (y :: d) -> mempty @ (d -> d -> d) $dMonoid_ff x y

The second line shows whoops has a type parameter (written @ d) and the incoming dictionary $dMonoid representing the Monoid d type class instance (type classes are implemented as data types called dictionaries, and I use the names $dMonoidX for these here).

Via the explicit type application (of the form @ t for a type term t) we can see in the last line that mempty is being resolved at the type d -> d -> d with the monoid instance Monoid (d -> d -> d) given here by the $dMonoid_ff construction just above. This is in turn derived from the Monoid (d -> d) given by the dictionary construction $dMonoid_f just above that. Thus we have gone twice through the lifting of a monoid to a function space, and so our use of mempty here is:

mempty @ (d -> d -> d) $dMonoid_ff = \_ -> (\_ -> mempty @ d $dMonoid)

thus

mempty @ (d -> d -> d) $dMonoid_ff x y = mempty @ d $dMonoid

That’s why the program type checks and we see the mempty element of the original intended monoid on d when applying whoops to some arguments and evaluating.

Epilogue

I luckily spotted my mistake quite quickly, but this kind of bug can be a confounding experience for beginners. There has been some discussion about extending GHCi with a feature allowing users to ask GHC to explain its instance resolution. Michael Sloan has a nice write-up discussing the idea, and there is a GHC ticket proposed by Icelandjack suggesting something similar, which seems like it would work well in this context where you want to ask what the instance resolution was for a particular expression. There are many much more confusing situations possible that get hidden by the implicit nature of instance resolution, so I think this would be a very useful feature, for beginner and expert Haskell programmers alike. And it certainly would have explained this particular error quickly to me without me having to scribble on paper and then check -ddump-ds-preopt to confirm my suspicion.

Additional: I should also point out that this kind of situation could be avoided if there were ways to scope, import, and even name type class instances. The monoid instance Monoid b => Monoid (a -> b) is very useful, but having it in scope by default as part of base was really the main issue here.


¹ To compile, put this in a file with a main stub, e.g.

main :: IO ()
main = return ()

Considering the order of results when computing Cartesian product [short]

Sometimes in programming you need to do a pairwise comparison of some elements coming from two collections, for example, checking possible collisions between particles (which may be embedded inside a quadtree representation for efficiency). A handy operation is then the Cartesian product of the two sets of elements, to get the set of all pairs, which can then be traversed.

Whenever I need a Cartesian product of two lists in Haskell, I whip out the list monad to generate the Cartesian product:

cartesianProduct :: [a] -> [b] -> [(a, b)]
cartesianProduct as bs = as >>= (\a -> bs >>= (\b -> [(a, b)]))

Thus for every element a in as we get every element b in bs and form the singleton list of the pair (a, b), all of which get concatenated to produce the result. For example:

*Main> cartesianProduct [1..4] [1..4]
[(1,1),(1,2),(1,3),(1,4),(2,1),(2,2),(2,3),(2,4),(3,1),(3,2),(3,3),(3,4),(4,1),(4,2),(4,3),(4,4)]

This traverses the Cartesian space in row order. That is, if we imagine the square grid of possibilities here (where the element at the ith row and jth column corresponds to element (i, j)), then cartesianProduct generates the pairs in the following order:

[Figure: row-major traversal order over the 4×4 grid]

In mathematics, the Cartesian product is defined on sets, so the order of these pairs is irrelevant. Indeed, if we want to examine all pairs, then it may not matter in what order. However, if we learn something as we look at the pairs (i.e., accumulating some state), then the order can be important.

In some recent research, I was building an automated algebra tool to find axioms for some algebraic structure, based on an interpretation. This involved generating all pairs of syntax trees t and t' of terms over the algebraic structure, up to a particular depth, and evaluating whether t = t'. I also had some automated proof search machinery to make sure that this equality wasn’t already derivable from the previously generated axioms. I could have done this derivability check as a filtering afterwards, but I was exploring a very large space, and did not expect to even be able to generate all the possibilities. I just wanted to let the thing run for a few hours and see how far it got. Therefore, I needed Cartesian product to get all pairs, but the order in which I generated the pairs became important for the effectiveness of my approach.  The above ordering (row major order) was not so useful as I was unlikely to find interesting axioms quickly by traversing the span of a (very long) row; I needed to unpack the space in a more balanced way.

My first attempt at a more balanced Cartesian product was the following:

cartesianProductBalanced :: [a] -> [b] -> [(a, b)]
cartesianProductBalanced as bs =
     concatMap (zip as) (tails bs)
  ++ concatMap (flip zip bs) (tail (tails as))

tails (from Data.List) gives the list of results of successively applying tail, e.g., tails [1..4] = [[1,2,3,4],[2,3,4],[3,4],[4],[]]

This definition for cartesianProductBalanced essentially traverses the diagonal of the space and then the lower triangle of the matrix, progressing away from the diagonal to the bottom-left, then traversing the upper triangle of the matrix (the last line of code), progressing away from the diagonal to the top-right corner. The ordering for a 4×4 space is then:

*Main> cartesianProductBalanced [1..4] [1..4]
[(1,1),(2,2),(3,3),(4,4),(1,2),(2,3),(3,4),(1,3),(2,4),(1,4),(2,1),(3,2),(4,3),(3,1),(4,2),(4,1)]

[Figure: traversal order going down the diagonal, then through the lower triangle, then the upper triangle]
This was more balanced, giving me a better view of the space, but does not scale well: traversing the elements of this Cartesian product linearly means first traversing all the way down the diagonal, which could be very long! So, we’ve ended up with a similar problem to the row-major traversal.

Instead, I finally settled on a “tiling” Cartesian product:

cartesianProductTiling :: [a] -> [b] -> [(a, b)]
cartesianProductTiling [] _ = []
cartesianProductTiling _ [] = []
cartesianProductTiling [a] [b] = [(a, b)]
cartesianProductTiling as bs =
       cartesianProductTiling as1 bs1
    ++ cartesianProductTiling as2 bs1
    ++ cartesianProductTiling as1 bs2
    ++ cartesianProductTiling as2 bs2
  where
    (as1, as2) = splitAt ((length as) `div` 2) as
    (bs1, bs2) = splitAt ((length bs) `div` 2) bs

This splits the space into four quadrants and recursively applies itself to the upper-left, then lower-left, then upper-right, then lower-right, e.g.,

*Main> cartesianProductTiling [1..4] [1..4]
[(1,1),(2,1),(1,2),(2,2),(3,1),(4,1),(3,2),(4,2),(1,3),(2,3),(1,4),(2,4),(3,3),(4,3),(3,4),(4,4)]

[Figure: tiled (Z-order) traversal of the 4×4 grid]

Thus, we explore the upper-left portion first, without having to traverse down a full row, column, or diagonal. This turned out to be much better for my purposes of searching the space of possible axioms for an algebraic structure.

Note that this visits the elements in Z-order (also known as the Lebesgue space-filling curve) [Actually, it is a reflected Z-order curve, but that doesn’t matter here].

The only downside to this tiling approach is that it does not work for infinite spaces (as it calculates the length of as and bs), so we cannot exploit Haskell’s laziness here to help us in the infinite case. I’ll leave it as an exercise for the reader to define a tiling version that works for infinite lists.

Of course, there are many other possibilities depending on your application domain!


Later edit:  In the case of the example I mentioned, I actually do not need the full Cartesian product since the order of pairs was not relevant to me (equality is symmetric) and neither do I need the pairs of identical elements (equality is reflexive). So for my program, computing just the lower triangle of the Cartesian product square is much more efficient, i.e.:

cartesianProductLowerTriangle :: [a] -> [b] -> [(a, b)]
cartesianProductLowerTriangle as bs = concatMap (zip as) (tail (tails bs))
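For example, on the same 4×4 space as before, this gives one pair from each unordered combination and skips the diagonal:

*Main> cartesianProductLowerTriangle [1..4] [1..4]
[(1,2),(2,3),(3,4),(1,3),(2,4),(1,4)]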

However, this does not have the nice property of the tiling product where we visit the elements in a more balanced fashion. I’ll leave that one for another day (I got what I needed from my program in the end anyway).

At the end of 2018

As 2018 draws to a close, I wanted to share a couple of personal thoughts from the year: one related to success/failure and the other on resolutions, which I’ll cover first.

Following tradition, I often make several informal New Year’s resolutions, usually involving health/fitness goals. My level of success in keeping these resolutions tends to also be traditional: stick with it for 2-3 weeks then forget about it. Last year I realised my problem is that I usually make too many resolutions, and I make resolutions that are not realistic given all my other constraints. Therefore, at the start of 2018, I made just one resolution: keep a daily record of my activities.

The format I chose for this daily record was a Google Drive spreadsheet (for easy multi-device access), with columns for various things I want to be aware of and try to improve in some way: bed time, wake time, diet related items such as fruit and vegetable intake, alcohol, time spent reading (for work), exercise, quiet time in the mornings, and so on. I’m proud to say that I kept my resolution every single day. The time commitment was low (1-2 minutes before bed), and it quickly had a positive impact on the goals of previous years.

My spreadsheet included fixed rows providing the mean so far for most columns. This was a useful guide throughout the year. I had some vague goals and the data allowed me to perform “course corrections” as the year progressed. Towards the end of the year, it got harder to affect the means, but this almost became a challenge: e.g., “can I shave just 2 minutes off my average bed time before the end of the year?”. A quick bit of programming provided me with some further data analysis tools for some interesting insight into daily/weekly patterns throughout the year: for some reason I went to bed latest on Wednesdays on average (probably cramming lecture prep)!

My resolution for 2019 is to continue, but now I have some more clear goals in mind. I also have a new set of columns. Some information wasn’t that useful, and there are other things I want to improve that are easily measured and tracked. These items can now be easily integrated into the whole process. Recently, I also realised that I could include a quick micro-journal (a couple of summary sentences about the day), which I would like to continue in 2019. As someone who has struggled to keep up daily routines and disciplines, I have found this all immensely helpful. I did not see an improvement in every column, but it did improve some markedly. I’d recommend this technique highly to anyone.

Lastly, a thought on failure. As the year draws to its end, I’ve been thinking about my successes and failures in 2018. It’s always easy to focus on the failures, especially in academia where there are often many. This year has certainly had many difficulties and disappointments, along with some successes. The thing that is so pernicious about failure is that it is hungry and it will devour as much as you let it devour. Instead, we have to try to integrate the failure into our selves positively, in a way that strengthens, in a way that teaches and informs. Otherwise, we will just let failure swallow up the rest of our success, energy, drive, creativity, joy, and everything else. So at the end of the year, I’m trying to dwell less on the existence of the failures, but more on what they’ve taught me for the future and what might emerge out of the ashes. That’s the only way forward, otherwise failure wins and eats everything else up.

So, happy new year! See many of you in 2019. I hope you have a great start to the year. I’m looking forward to travelling to Lisbon shortly for POPL 2019 where I will be giving a tutorial on the Granule project joint with Harley Eades and my student Vilem Liepelt.

Exam advice

Part of my job is to give advice to my undergraduate students about exam technique and preparing for exams. I was always quite nervous about exams “back in the day”, and I put a lot of effort into revising and planning. In retrospect, I think I enjoyed these times: I have many happy memories of long days and evenings studying with my friends (whose presence and encouragement made it a lot more palatable).

This year, I decided to serialise my advice and put it online in case it can be of any help or inspiration to others either taking exams, or giving exam advice. I’m sure I can think of more advice, so I made it a GitHub “gist” so I can revise it whenever more advice pops into my head.

(by Dominic Orchard, School of Computing, University of Kent, 2018)

Disclaimer: this advice is non-exhaustive, and not every piece of advice will suit you. At least, I hope it helps you to think about how you can develop your own strategy for exam preparation.

My top tip is to see exams as formative rather than purely about assessment. This is an opportunity to force yourself to learn a topic deeply, which will then benefit you in the future, rather than a box ticking exercise to get a fancy piece of paper at the end. This will orient your attitude to maximising your potential.

Exams are hard. They occupy a short space of time in your whole life but they can have a big impact on the rest of it, so make the most of them. Study well and study wisely.

Exam preparation

  • Make a schedule ahead of time, planning backwards from your exams; revisit and edit this plan often (e.g., once or twice a week).

  • For every module, make a list of things you don’t understand and a list of things you aren’t fully comfortable with.
    As you go along you will likely need to add to these lists, but then you will know exactly what
    to focus on. You also get the satisfaction of moving items from the first list to the second, and eventually crossing
    them out. The aim is to understand what you don’t understand: to know what you don’t know.
    Then you can make a plan for fixing that.

  • Find a system that works for you in terms of study length, number and duration of breaks, where to study,
    who to study with. Mix different forms of revision: reading, writing, talking, drawing, coding, listening, watching.
    If lecture videos are available, this can be a very useful resource.

  • Group study can be great: you can teach each other, and having to teach something helps you to learn it better.
    But sometimes you need time alone as well to consolidate. You might plan revision sessions with friends where you each take it in turns to teach a particular topic.

  • Past papers are worth doing, but don’t obsess over them: it’s hard to draw conclusions about future exam questions from them. However, they do usually provide a good guide on the format of the exam, which you can use to calculate how many minutes you should be spending per mark (for example).

  • Get your hands on as many relevant example problems as possible (e.g., from the lectures, class work, textbooks) and work through them all, noting down any tough spots you find particularly difficult.

  • Get lots of sleep, exercise, and eat well. Consider dropping/reducing caffeine and alcohol.

In the exam

  • Read through the paper first and briefly plan your time. Consider how many minutes on average you should spend per question or per mark.

  • Some exams will have a choice between questions: consider carefully which one you think you will perform best at (this might involve sketching some ideas of the answers in a rough book).

  • Read and re-read questions carefully.

  • If something is unclear to you about the question, state any assumptions you make in your answer.

  • Consider the mark awarded for a question and whether your answer provides enough detail to match the marks (e.g., an answer to a 5 mark question will need a lot more justification and detail than a 1 mark question).

  • Where possible, check and verify your work. Think about how you can do this in the case of formal/mathematical/technical questions.

  • If you need to use the toilet and it’s becoming distracting, ask the invigilator if you can go. It’s better to spend 3 minutes on a bathroom break than to lose concentration due to being distracted.

  • Use all the time you have. If you find you have extra time, go back and check and recheck your answers. Consider adding more detail to some answers. Make sure you have adhered to the exam requirements such as numbering and location of your answers.


Scrap Your Reprinter

Back in 2013, Andrew Rice and I were doing some initial groundwork on how to build tools to help scientists write better code (e.g., with the help of refactoring tools and verification tools). We talked to a lot of scientists who wrote Fortran almost exclusively, so we started creating infrastructure for building tools to work on Fortran. This was the kernel of the CamFort project for which we got an EPSRC grant in 2015 (which is ongoing). The CamFort tool now has a couple of fairly well developed specification/verification features, and a few refactoring features. Early on, I started building everything in Haskell using the brilliant uniplate library, based on the Scrap Your Boilerplate [1] work. This helped us to get the tool off the ground quickly by utilising the power of datatype generic programming. Fairly quickly we hit upon an interesting problem with building refactoring tools: how do you output source code for a refactored AST whilst preserving all the original comments and white space? It is not enough just to pretty print the AST, unless your AST contains all the comments and layout information. Building a parser to capture all this information is extremely hard, and we use a parser generator which limits flexibility (but is really useful for a large grammar). Another approach is to output patch/edit information for the original source code, calculated from the AST.

In the end, I came up with a datatype generic algorithm which I call the reprinter. The reprinter takes the original source code and an updated AST (which contains location information) and maps them into a new piece of source code. Here is an illustration which I’ll briefly explain:

[Figure: illustration of the reprinting algorithm on a small arithmetic expression]

Some source text (arithmetic code in prefix notation here) is parsed into an AST. The AST contains the “spans” of each syntactic fragment: the start position and end position in the original source code (for simplicity in this illustration, just the column number is represented). Some transformation/refactoring is applied next. In this case, the transformation rewrites redundant additions of 0, which happens in the node coming from source locations 10 to 16. The refactored node is marked in red. The reprinting then runs, stitching together the original source code with the updated source tree. A pretty printer is used to generate code for any new nodes, but all the original source text for the other nodes is preserved. The cool thing about this algorithm is that it is datatype generic: it works for any datatype, with some modest side conditions about storing source spans. The implementation uses the Scrap Your Zipper [2] library to do a context-dependent generic traversal of a datatype. In essence, the algorithm is similar to what one might do if you were to spit out edit information from an AST, then apply this to a piece of source text. But, the algorithm does this generically, and in a single simultaneous pass of the AST and the input source text.
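To make the stitching step concrete, here is a minimal, non-generic sketch for a toy expression type (the type names, the Bool “refactored” flag, and the character-offset spans are my own illustrative assumptions; the real reprinter is datatype generic via Scrap Your Zipper and also handles comments, monadic pretty printers, and so on):

type Span = (Int, Int)   -- start and end offsets into the original source

data Expr = Num  Span Bool Int         -- the Bool marks a refactored node
          | Plus Span Bool Expr Expr

spanOf :: Expr -> Span
spanOf (Num  sp _ _)   = sp
spanOf (Plus sp _ _ _) = sp

isRefactored :: Expr -> Bool
isRefactored (Num  _ r _)   = r
isRefactored (Plus _ r _ _) = r

children :: Expr -> [Expr]
children (Num  _ _ _)   = []
children (Plus _ _ a b) = [a, b]

reprint :: String -> Expr -> String
reprint src root = stitch 0 (length src) [root]
  where
    -- copy original source text between offsets i and j
    copy i j = take (j - i) (drop i src)

    -- walk nodes left to right between offsets pos and stop,
    -- copying the original text in the gaps between them
    stitch pos stop []     = copy pos stop
    stitch pos stop (n:ns) =
      copy pos (fst (spanOf n)) ++ go n ++ stitch (snd (spanOf n)) stop ns

    -- refactored nodes are pretty printed afresh; untouched nodes keep
    -- their original text, recursing so that refactored descendants
    -- are still re-printed
    go n | isRefactored n = pretty n
         | otherwise      = stitch (fst (spanOf n)) (snd (spanOf n)) (children n)

    pretty (Num  _ _ k)   = show k
    pretty (Plus _ _ a b) = "(+ " ++ go a ++ " " ++ go b ++ ")"

Running reprint over the original source with an AST in which only the rewritten node is flagged reproduces the input character-for-character everywhere else, with the pretty printer used only at the flagged node.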

I’ve always thought it was a cute and useful algorithm, which combined some cool techniques from functional programming.  As with all the “Scrap Your X” libraries it saves huge amounts of time and messing around, especially when your AST representation keeps changing (which it did/does for us). The algorithm is really useful anywhere you need to update human-written source code in a layout-preserving way; for example, IDEs and refactoring tools but also in interactive theorem provers and program synthesis tools, where you need to synthesise source text into some existing human-written code. This is one of the ways it is used in CamFort, where specifications are synthesised from code analysis data and then inserted as comments into user code.

This summer, I was fortunate enough to have the resources to hire several interns. One of the interns, Harry Clarke, (amongst other things) worked with me to tidy up the code for the reprinter, add some better interfaces, make it usable as a library for others, and write it all up. He presented the work at IFL 2017 and the pre-proceedings version of the paper is available. We are working on a post-proceedings version for December, so any comments gratefully appreciated.


[1] Lämmel, Ralf, and Simon Peyton Jones. “Scrap your boilerplate: a practical design pattern for generic programming.” TLDI 2003, ACM SIGPLAN Notices 38(3). ACM, 2003.

[2] Adams, Michael D. “Scrap your zippers: a generic zipper for heterogeneous types.” Proceedings of the 6th ACM SIGPLAN workshop on Generic programming. ACM, 2010.

FSCD day 4 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(FSCD Keynote #3) Type systems for the relational verification of higher order programs, Marco Gaboardi

Relational properties R(X_1, X_2) \Rightarrow S(P(X_1), P(X_2)). For example, take R and S to be notions of program equivalence (equivalent inputs produce equivalent outputs). Another relation might be in information-flow security where relations R, S mean “these two programs are low-level equivalent” (low-security).
Another is differential privacy, where R means the two inputs (datasets) differ in one individual data point and S(Y_1, Y_2) means Pr[Y_1] \leq e^\epsilon Pr[Y_2].

In relational cost analysis, we want to compute the difference in cost (the relative cost) between the two programs (which may depend on the input and the underlying relation), e.g., cost(P)-cost(Q) \leq f(X_1, X_2, R, S) (giving an upper bound computed in terms of the inputs and underlying relations). This is useful for guaranteeing compiler optimisations (not worse, or definitely better) and for ruling out side-channel attacks (i.e., \textsf{cost}(e[v_1/x]) - \textsf{cost}(e[v_2/x]) = 0, so that different inputs do not yield different costs and thus information about the inputs is not leaked).

Motivating example 1: find in a list of lists (2D data structure). We want to prove that one implementation of this algorithm is faster than the other (I’ve omitted the code that Marco showed us, which was expressed via a common higher-order function with two different parameter functions to give the two implementations).
Motivating example 2: prove a precise upper bound on the relative cost of insertion sort between two lists (different by n in length).

[Clarkson, Schneider ’08] formalises the idea of ‘hyperproperties’. Properties are sets of traces; hyperproperties are sets of sets of traces. A relational property is a set of pairs of traces (a 2-property). They show how to reduce verification of hyperproperties to the verification of properties. Using these kinds of results requires encodings: they do not reflect the relational nature of the property, and they do not reflect the connection of the relational reasoning to the program’s syntax (see the talk from ICFP day 2, on A Relational Logic for Higher-order Programs). There is lots of previous work (going back to Abadi’s System R).

Relational typing: \Gamma \vdash t_1 \approx t_2 : \tau. This talks about relational properties of the input and relational properties of the output. Usually, if we interpret types as relations then we want to prove soundness as: (\bar{v_1}, \bar{v_2}) \in [\Gamma] \Rightarrow ([t_1[\bar{v_1}/\Gamma]], [t_2[\bar{v_2}/\Gamma]]) \in [\tau].
How can we reason about the cost? Two approaches from here: (1) lightweight typing  (bottom up) extending type system with the necessary constructors, (2) heavyweight typing (top down) use a powerful type system and encode the needed constructions. (In the future, we want to find a balance between the two approaches).

Approach (1) (lightweight) is called RelCost, a relational refinement type-and-effect system. The idea is to take advantage of structural similarities between programs and inputs as much as possible. There are two sorts of type (unary and relational). Typing judgments \Omega \vdash^U_L t : A, where U is the upper bound and L is the lower bound on execution cost, are thought of as a unary effect. A relational judgement \Gamma \vdash t_1 \ominus t_2 \preceq D : \tau gives an upper bound D on the relative cost of the two programs. The types have annotated function types (like in normal effect systems), \sigma \xrightarrow{U, L} \tau, and data types for integers and lists have indices to express dependencies.

e.g. n : \textsf{int}, f : \textsf{int} \xrightarrow{k, k} \textsf{int} \vdash^{k+1}_1 \textsf{if} \, n \leq 0 \textsf{then} f \; n \textsf{else} \; 1 : \textsf{int}.
But if n is known to be 5 this can be refined (with the type n : \textsf{int}[5] in the indices), which will change the bound to $\vdash^{k+1}_{k+1}$.

The relational types are a little different. There is still a latent effect on function types, but is now a single relative cost \sigma \xrightarrow{D} \tau.
Judgements are like the following, which says equal integers have zero cost: \Gamma \vdash n \ominus n \preceq 0 : \textsf{int}_r.

The unary cost and relational cost are connected as follows:
\dfrac{|\Gamma| \vdash^U_{-} t_1 : A \qquad |\Gamma| \vdash^{-}_L t_2 : A}{\Gamma \vdash t_1 \ominus t_2 \preceq U - L : \textsf{U} A}
Thus, we can drop into the unary cost to compute a worst and best run time individual, and then combine these into a relative cost.
There is a box modality which captures terms that have the same cost, and therefore allows a resetting of the differential cost; e.g., if we have identical terms then:
\dfrac{\Gamma \vdash t \ominus t \preceq D : \tau \qquad \forall x . \Gamma(x) \sqsubseteq \Box \Gamma(x)}{\Gamma \vdash t \ominus t \preceq 0 : \Box \tau}.
The type \textsf{List}^\alpha_I \, \tau captures lists which differ in less than \alpha positions. This interacts with the \Box type in order to capture the notion of differing numbers of elements in the list.
In the semantics, the relative cost is pushed inside the semantics (some step-indexed logical relations are used, details not shown).
Going back to the earlier example of searching nested lists, we see a typing of 1 \leq |\texttt{find1}| \leq 3n and 3n \leq |\texttt{find2}| \leq 4n meaning \texttt{find1} has a relatively smaller cost. Plugging this into the higher-order function and doing the type derivation again gives a looser bound.

Approach (2) is the heavyweight approach was presented earlier at ICFP (A Relational Logic for Higher-order Programs), the language RHOL (Relational Higher Order Logic). Has relational and unary judgments, where the relational rule is \Gamma \mid \Psi \vdash t_1 : \tau_1 \approx t_2 : \tau_2 \mid \theta which contains term variable assertions \Psi and relational assertion \theta. In the type system, some rules are synchronous (building up the related two terms with the same syntactic construct) or asynchronous (building up one of the terms).

HOL terms can be encoded into RHOL, and vice versa (a bit like the hyperproperty result of Clarkson/Schneider), but RHOL is much easier to work with for relational reasoning but also all the power of HOL can be embedded into RHOL.
The system R^C is a monadic metalanguage with a specific cost effect (with \textsf{cstep}_n(m) introducing a cost). Intuition: if a term m is closed and $m \Downarrow_n v$ then $m \cong \textsf{cstep}_n(v)$. We can define formulae in HOL which allow us to reason about these explicit cost terms and their quantities: we can define what we need in the logic.

For computing a relative upper bound on insertion sort, we want to prove \textsf{UnsortedDiff}(l_1, l_2, n) \Rightarrow \textsf{cost}(\textit{isort} \, l_1) - \textsf{cost}(\textit{isort} \, l_2) \leq n. Using this system, we can prove this property, using a suitable invariant/predicate over the input and output list(s).

Take home: type-based relational verification is a large research space that needs more work.

Q: what about recursion? In the first system, letrec is included straightforwardly, in the second system the assumption is terminating (general) recursion.
Q: are the costs tied to a specific evaluation strategy? Yes, but one can encode different ones in the system. (Both systems are parametric in the way you count cost). In the first system, this is part of the typing, in the second, this comes where you put monadic cost operations \textsf{cstep}.


(FSCD) Arrays and References in Resource Aware ML, Benjamin Lichtman, Jan Hoffmann

Resource-aware ML (RAML) models the resource usage of programs with a cost semantics; e.g., implement Quicksort and it produces a bound like 14n^2 + 19n + c (I forgot what the constant c was). This work introduces references and arrays to enable analysis of programs where resource consumption depends on the data stored in mutable structures (references and arrays).

How does it work? Automatic amortized resource analysis. Assign “potential” functions to data structures, which will pay the resource consumption cost; then hopefully there is some left over for later. This propagates backwards to give the overall upper bound on cost.

Each variable in scope may contribute (carry) potential, e.g., recursive data types such as x : \textsf{list}^2(\textsf{int}), which can contribute 2n units of potential where n is the length of the list.

Based on an affine type system (all variables are used at most once). Subexpressions are bound to variables whenever possible (share-let normal form) in order to explicitly track when things are evaluated (via variables). When a variable carrying potential needs to be shared, the potential can be explicitly split with share x as (x1, x2) in e in order to use a variable more than once (this gives you contraction). This splits the potential of x across the two variables. Benjamin walked us through a nice example with appending to a list.

Now let’s add references.

g l = 
  let r = ref l in
  share r as (r1, r2) in  -- r2 : L^1 (int)
  let _ = h r1 in   -- h : (L^q int) ref -> int  
                    -- (need to make the type of this stronger)
  append (!r2, []) -- append : L^1(int) * L^0(int) -> L^0(int)

One approach is to carry around extra information about possible effects, but they want to keep effects out of the type system because they see them as cumbersome and difficult to use. Instead, strengthen the contracts by requiring that the potential annotations of data inside references are fixed.

g l = 
  let r = ref l in
  share r as (r1, r2) in  -- the restricted h means r1 and r2 have 
                          -- to have the same type
  let _ = h r1 in   -- h : (L^1 int) ref -> int  
  append (!r2, [])

Introduce a swap command as the only way to update a cell, which behaves like:

swap (r, l) = let x = !r in (let _ = r := l in x)

Requires that data placed into a cell has the same type as data being taken out, ensures that potential annotations of mutable cells never change (the only way to update a cell). Thus, swap is the well-typed version of dereference in situations where you need to work with potential. Thus, require the previous program to use swap on r2 with an empty list instead of a direct dereference in the last line.

The paper has a detailed example of depth-first search (stateful) which shows more of the details related to how to work with state and potentials. This also utilises an idea of “wrapper” types.

Introduce a notion of “memory typing”, which tracks all memory locations pointed to by mutable references; this is required to prove the soundness of the cost analysis (based on the free potential of the system, the potential of the context, and the potential of each mutable memory reference).

Mutable arrays were then a natural extension from this basis, with an array-swap operation for updating an array cell.

In the future, they’d like to create some syntactic sugar so that the programming style is not affected as much by having to use swap, aswap, and wrapper types. They would also like to investigate how fixed potential annotations restrict the inhabitants of a type (Yes, yes, this is very interesting!)

I asked about what happens if you write down Landin’s Knot to get recursion via higher-order state: Benjamin answered that functions don’t get given potential so storing and updating functions in a reference would type correctly. I still wonder if this would let me get the correct potential bounds for a recursive algorithm (like DFS) if the recursion was implemented via higher-order state.


(FSCD) The Complexity of Principal Inhabitation, Andrej Dudenhefner, Jakob Rehof

Setting: the simply-typed lambda calculus (STLC) with principal types. We say \tau is the principal type of M if \vdash M : \tau and for all types \sigma such that M : \sigma there exists a substitution S such that S(\tau) = \sigma.

Definition: (Normal Principal Inhabitant), We say that a term M in beta-normal form is a normal principal inhabitant of a type t, if t is the principal type of M.

This work shows that principal inhabitation for STLC is PSPACE-complete (in 1979 it was shown that inhabitation, not considering principality, is PSPACE-complete). Thus, this work seeks to answer the following: since principality is a global property of derivations, does it increase the complexity of inhabitation? This is also practically interesting for type-based program synthesis, since a normal principal inhabitant of a type is a natural implementation of the type.

For example, a \rightarrow a \rightarrow a is inhabited by the K combinator K = \lambda x . \lambda y .  x but is not principally inhabited (I haven’t understood why yet).

The proof is based on the subformula calculus. The subformula calculus provides paths to index types (as trees), which are strings over 1 and 2. This is useful for reasoning about subformulae, where the usual STLC rules are re-expressed in terms of type paths, e.g., \dfrac{\Gamma, x : 1\pi \vdash e : 2\pi}{\Gamma \vdash \lambda x . e : \pi}. Relations on paths are defined: R_M, the minimal equivalence relation for a beta-normal term M (capturing constraints on subformulae of terms), and R_\tau, an equivalence relation relating paths with common subpaths (capturing constraints on subformulae of types).

(I had to step out, so missed the end).


(FSCD) Types as Resources for Classical Natural Deduction, Delia Kesner, Pierre Vial

Quantitative types seen as resources (can’t be duplicated); they provide simple arithmetical arguments to prove operational equivalences. These ideas are extended to classical logic in this talk.

In simple types, typability implies normalisation. With intersection types this is an equivalence, i.e., normalising also implies typability. In an intersection type system, a variable can be assigned several types, e.g. x : A \wedge B \wedge C \wedge B (note the two occurrences of B), where intersection is associative, commutative and (possibly) idempotent. In 1994, Gardner introduced a system which is not idempotent, where types are now multisets (rather than sets as in the earlier formulations). This has the flavour of linear logic resources (my interpretation: the types can then capture how many times a variable is used (quantitative), if contraction takes the intersection of the two types and intersection is non-idempotent, e.g., x_1 : A \wedge B, x_2 : B \wedge C \leadsto x : A \wedge B \wedge B \wedge C).

In the literature, we see how to build a computational interpretation of classical natural deduction: Intuitionistic Logic + Peirce’s law gives classical logic, and Felleisen’s call-cc operator gives the computational interpretation. Relatedly, the lambda-mu calculus (Parigot ’92) gives a direct interpretation of classical natural deduction.

Types are strict where intersection can only appear on the left-hand side of a function arrow.

Interestingly, you can have a situation where the type of a lambda abstraction has an empty intersection of types in its source type. In order to get normalisation in this context, the system was extended a bit to deal with this (I didn’t capture this part, see paper). The full system had rules to ensure this didn’t happen.

A notion of “weighted” subject reduction is defined, where the size of a derivation tree is strictly decreasing during reduction.

ICFP / FSCD day 3 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(ICFP) – A Specification for Dependent Types in Haskell, Stephanie Weirich, Antoine Voizard, Pedro Henrique Azevedo de Amorim, Richard A. Eisenberg

Haskell already has dependent types! ish! (ICFP’14) But singletons are “gross”! (mock discussion between Stephanie and Richard). This work describes a semantics for dependent types in Haskell: a replacement for GHC’s internal (Core) language, along with the high-level type theory and a fully mechanized metatheory.

But isn’t Haskell already dependently typed? Cf Haskell indexed type for vectors. The following cannot be written in GHC at the moment:

vreplicate :: Pi(n :: Nat) -> Int -> Vec n
vreplicate Zero _ = Nil
vreplicate (Succ n) a = Cons a (vreplicate n a)

The reason this can’t be typed is that n is erased (after compilation, i.e., it doesn’t appear at run time) but it is needed dynamically in the first argument. In Dependent Haskell this can be done.
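For comparison, the way to express this in GHC today is the singleton pattern poked fun at above: pass an explicit runtime witness whose shape mirrors the erased index. A minimal hand-rolled sketch (my own illustration, not code from the talk; the singletons library automates this boilerplate):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Zero | Succ Nat

data Vec (n :: Nat) where
  Nil  :: Vec 'Zero
  Cons :: Int -> Vec n -> Vec ('Succ n)

-- A singleton for Nat: a runtime value mirroring the type-level index
data SNat (n :: Nat) where
  SZero :: SNat 'Zero
  SSucc :: SNat n -> SNat ('Succ n)

vreplicate :: SNat n -> Int -> Vec n
vreplicate SZero     _ = Nil
vreplicate (SSucc n) a = Cons a (vreplicate n a)

Dependent Haskell aims to make this duplication unnecessary by letting the Pi-bound n itself be relevant at run time.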

But… it’s not your usual system. For example, what about infinite data?

inf :: Nat
inf = Succ inf
vinf :: Vec inf
vinf = Cons 2 vinf  -- yields equality demand Vec (Succ inf) = Vec inf

Idealized Haskell (IH) = System F_omega + GADTs (has undecidable type checking!)

System FC (GHC Core) is a reified version of Idealized Haskell with enough annotations so that type checking is decidable, trivial, and syntax directed. Terms in Core can be seen as type derivations of IH terms: they contain type annotations and type equalities. (These can all be erased to get back into IH.) The first contribution of this work is a dependent System FC (DC). But it’s very complicated (it also needs type soundness (progress/preservation), erasable coercions, etc.).
Instead, they introduce System D, the dependently typed analog of Idealized Haskell: it’s a stripped-down dependent Haskell in Curry style (implicit types) and, similarly to IH, its type checking is undecidable. It has dependent types, non-termination, and coercion abstraction.

Coercion abstraction: for example

g : Int -> Int |- \n -> \c -> \v -> v : Pi n . (g n ~ 7) -> Vec (g n) -> Vec 7

Here the variable v (which has type Vec (g n) when passed in) is implicitly cast using the coercion c (which is passed in as a parameter) to Vec 7. This is translated into DC using an explicit coercion operator, applying c to coerce v.

System D and System DC were mechanized (in Coq) during the design using Ott (to input the rules of the language) & LNgen (for generating lemmas). Currently working on dependent pattern matching, a full implementation in GHC, and roles. (Hence, this is the specification of the type system so far).

Q: type inference for Dependent Haskell?
A: Not sure yet (but see Richard Eisenberg’s thesis)
Q: How do you get decidable type checking in the presence of infinite terms?
A: I didn’t understand the answer (will followup later).
Q: Why is coercion abstraction explicit in D while coercion application is implicit?
A: “Coercion application *is* explicit in D, it’s the use of coercion to cast a type that is implicit, but if you were to apply it to a value it would be explicit.”


(ICFP) – Parametric Quantifiers for Dependent Type Theory Andreas Nuyts, Andrea Vezzosi, Dominique Devriese

A type variable is parametric if it is only used for type-checking (giving free well-behavedness theorems): i.e., it can’t be inspected by pattern matching, so you have the same algorithm at all types, e.g. flatten : forall X . Tree X -> List X. Parametricity is well studied in the System F world. This work looks at parametricity in dependent type theory; some results carry over, some can be proved internally, but some are lost. This work formulates a sound dependent type system, ParamDTT.

Parametricity gives representation independence (in System F):
A \rightarrow B \cong \forall X . (X \rightarrow A) \rightarrow (X \rightarrow B)
(This result follows by parametricity, using the result that g : \forall X . (X \rightarrow A) \rightarrow (X \rightarrow B) implies g \, X_0 \, r_0 \, x_0 = g \, A \, \textit{id} \, (r_0 \; x_0), which can be proved by relational parametricity, see Reynolds’ “related things map to related things”, applying the identity extension lemma).

Can we do the same thing for DTT and can we prove the result internally?
\Pi is however not parametric. For example, convert the above to \Pi (X : U) . (X \rightarrow A) \rightarrow (X \rightarrow B) and suppose B = U; then we can leak details of the implementation (the representation type is returned as data): \textit{leak} \; X \; r \; x = X has this type. This violates the identity extension lemma used in the above proof for System F.

So instead, add a parametric quantifier to DTT to regain representation independence, making the above \textit{leak} ill typed. The proof of parametricity for g : \forall X . (X \rightarrow A) \rightarrow (X \rightarrow B) can now be proved internally. This uses the relational interval type \mathbb{I}, with endpoints 0, 1 : \mathbb{I} (cf Bernardy, Coquand, and Moulin 2015, on bridge/path-based proofs), which gives a basis for proving the core idea of “related things map to related things”, where 0 is connected with the type X (in the above type) and 1 is connected to the type A via a switching term r : \mathbb{I} \rightarrow U (see the paper for the proof) (I think this is analogous to setting up a relation between X and A in the usual relational parametricity proof).

They extended Agda with support for this.


(ICFP) – Normalization by Evaluation for Sized Dependent Types
Andreas Abel, Andrea Vezzosi, Theo Winterhalter

(Context: DTT a la Martin-Lof, and you can add subtyping to this, where definitional equality implies subtyping.) Definitional equality is decidable (for the purposes of type checking), but the bigger the better (we want to be able to know as many things equal as possible).

Termination is needed for consistency, general fix gives you inconsistency. Instead, you can used data types indexed by tree height, where \textsf{Nat}^i = \{ n \mid n < i\}, you can define \textsf{fix} : (\forall i \rightarrow (\textsf{Nat}^{i} \rightarrow C) \rightarrow (\textsf{Nat}^{i+1} \rightarrow C)) \rightarrow \forall j \rightarrow \textsf{Nat}^j \rightarrow C.

However, size expressions are not unique, which is problematic for proofs. For example, suc i : \textsf{Nat}^i \rightarrow \textsf{Nat}^\infty but also suc \infty : \textsf{Nat}^i \rightarrow \textsf{Nat}^\infty. Intuition: sizes are irrelevant in terms, but relevant in types.

Andreas set up some standard definitions of inductive Nat then tried to define a Euclidean division on Nat (where monus is minus cutting off at 0).

div : Nat → Nat → Nat
div zero y = zero
div (suc x) y = {! suc (div (monus x y) y) !}

However, Agda fails to termination check this. The solution is to “size type”-up everything, i.e., redefine the usual inductive Nat to have a size index:

data Nat : Size → Set where
 zero : ∀ i → Nat (↑ i)
 suc : ∀ i → Nat i → Nat (↑ i)

All the other definitions were given sized type parameters (propagated through) and then div was redefined (now type checking) as:

div : ∀ i → Nat i → Nat ∞ → Nat i
div .(↑ i) (zero i) y = zero i
div .(↑ i) (suc i x) y = suc _ (div i (monus i _ x y) y)

So far this is the usual technique. However, now we have difficulty proving a lemma (which was straightforward before sizing): that monus of x with itself gives 0. Now this looks like:

monus-diag : ∀ i (x : Nat i) → Eq ∞ (monus i i x x) (zero ∞)

In the case: monus-diag .(↑ i) (suc i x) = monus-diag i x Agda gets the error i != ↑ i of type Size.

In the Church-style, dependent type functions = polymorphic functions, so you can’t have irrelevant arguments.  In Curry-style (with forall) this is okay. (see the previous talk, this could be a possible alternate solution?)

Instead, we want “Churry”-style where size arguments are used for type checking but can be ignored during equality checking: thus we want typing rules for irrelevant sizes. This was constructed via a special “irrelevant” modality \bullet, where sizes used as indices can be marked as irrelevant in the context. Here is one of the rules for type formation when abstracting over an irrelevantly typed argument:

\dfrac{\Gamma \vdash A : \textsf{type} \quad \Gamma, \bullet x : A \vdash B : \textsf{type}}{\Gamma \vdash (x : \bullet A) \rightarrow B : \textsf{type}}

This seemed to work really nicely (I couldn’t get all the details down, but the typing rules were nice and clean and made a lot of sense as a way of adding irrelevancy).


(ICFP) – A Metaprogramming Framework for Formal Verification
Gabriel Ebner, Sebastian Ullrich, Jared Roesch, Jeremy Avigad, Leonardo De Moura

(Context: Lean is a dependently-typed theorem prover that aims to bridge the gap between interactive and automatic theorem proving. It has a very small trusted kernel (no pattern matching and termination checking), based on Calculus of Inductive Constructions + proof irrelevance + quotient types.)

They want to extend Lean with tactics, but it’s written in C++ (most tactic approaches are defined internally, cf Coq). Goal: to extend Lean using Lean, by making Lean a reflective metaprogramming language (a bit like Idris does), in order to build and run tactics within Lean. Do this by exposing Lean internals to Lean (unification, type inference, type class resolution, etc.). They also needed an efficient evaluator to run meta programs.

lemma simple (p q : Prop) (hp : p) (hq : q) : q := 
by assumption

Assumption is a tactic, a meta program in the tactic monad:

meta def assumption : tactic unit := do
  ctx <- local_context,
  t   <- target,
  h   <- find t ctx,
  exact h

(seems like a really nice DSL for building the tactics). Terms are reflected so there is a meta-level inductive definition of Lean terms, as well as quote and unquote primitives built in, but with a shallow (constant-time) reflection and reification mechanism.

The paper shows an example that defines a robust simplifier that is only about 40 loc (applying a congruence closure). They built a bunch of other things, including a bigger prover (~3000 loc) as well as a command-line VM debugger as a meta program (~360 loc).

Next need a DSL for actually working with tactics (not just defining them). Initially quite ugly (with lots of quoting/unquoting). So to make things cleaner they let the tactics define their own parsers (which can then hook into the reflection mechanism). They then reused this to allow user-defined notations.


(FSCD) A Fibrational Framework for Substructural and Modal Logics, Dan Licata, Michael Shulman, Mitchell Riley

Background/context: the modal logic core is \dfrac{\emptyset \vdash A}{\Gamma \vdash \Box A} for necessity (true without any hypotheses) and \dfrac{\Gamma \vdash \Diamond A}{\Diamond \Gamma \vdash \Diamond A} for possibility (if A is possibly true then possibility can propagate to the context; it doesn’t change the “possibleness”). Various intuitionistic substructural and modal logics/type systems: linear/affine, relevant, ordered, bunched (separation logic), coeffects, etc. Cohesive HoTT: take dependent type theory and add modalities (∫, sharp, flat). S-Cohesion (Finster, Licata, Morehouse, Riley): a comonad and monad that are themselves adjoint, which ends up with two kinds of products which the modality maps between. Motivation: what are the common patterns in substructural and modal logics? How do we construct these things more systematically from a small basic calculus?

With an S4 modality (in sequent calculus), it’s common to have two contexts, one of modal formulae and one of normal formulae, e.g., \Gamma; \Delta \vdash A, which is modelled by \Box \Gamma \times \Delta \rightarrow A. In the sequent calculus style, the notion of context is used to form the type. The linear logic ! is similar (but with no weakening). The rules for \otimes follow the same kind of pattern: e.g., \dfrac{\Gamma, A, B \vdash C}{\Gamma, A\otimes B \vdash C} where the context “decays” into the logical operator (rather than being literally equivalent): the types inherit the properties of the context (cf. also exchange).

The general pattern: operations on contexts, with explicit or admissible structural properties, and then a type constructor (an operator) that internalises the implementation and inherits the properties. This paper provides a framework for doing this, abstracting the common aspects of many intuitionistic substructural and modal logics, based on a sequent calculus (which has cut elimination for all its connectives, an equational theory, and a categorical semantics).

Modal operators are decomposed in the same way as adjoint linear logic (Benton&Wadler’94), see also Atkey’s lambda-calculus for resource logic (2004) and Reed (2009).

So... how? Sequents are completely Cartesian but are annotated with a description of the context and its properties/substructurality by a first-order term (a context descriptor): \Gamma \vdash_a A where a is this context descriptor. Context descriptors are built on “modes”, e.g., a: A, b : B, c : C, d : D \vdash_{(a \otimes b) \otimes (c \otimes d)} X. The meaning of this sequent depends on the equations/inequations on the context descriptor: e.g., if you have associativity of \otimes you get ordered logic, if you have contraction a \Rightarrow a \otimes a then you get a relevant logic, etc. Bunched implication will have two kinds of nodes in the context descriptor. For a modality, use unary function symbols: e.g., \textbf{r}(a) \otimes b means that the left part of the context is wrapped by a modality.

A subtlety of this is “weakening over weakening”, e.g., \dfrac{\Gamma \vdash_a B}{\Gamma, x : A \vdash_a B} where we have weakened in the sequent calculus but the context descriptor is unchanged (i.e., we can’t use x : A to prove B).

The F types give you a product structured according to a context descriptor, e.g. A \otimes B := F_{x \otimes y} (x : A, y : B). All of the left rules become instances of one rule on F: \dfrac{\Gamma, \Delta \vdash_{\beta[\alpha / x]} B}{\Gamma, x : F_\alpha(\Delta) \vdash_\beta B}, which gives you all your left rules (a related dual rule gives the right sequent rules).

Categorical semantics is based on a cartesian 2-multicategory (see paper!)
Q: is this like display logics? Yes in the sense of “punctuating” the shape of the context, but without the “display” (I don’t know what this means though).

This is really beautiful stuff. It looks like it could extend okay with graded modalities for coeffects, but I’ll have to go away and try it myself.


(FSCD) – Dinaturality between syntax and semantics, Paolo Pistone

Multivariate functors F X X and G Y Y, where the first parameter is contravariant and the second is covariant, may have dinatural transformations between them. (Dinaturality is weaker than naturality.) This yields a family of maps \theta_X \in Hom_C(F X X, G X X) which separate the contravariant and covariant actions in the coherence condition (see the Wikipedia entry for the hexagonal diagram which generalises the usual naturality property to the dinatural setting).

A standard approach in categorical semantics is to interpret open formulae by multivariate functors, e.g., F X Y = X \rightarrow Y, and proofs/terms by dinatural transformations. Due to Girard, Scedrov, Scott 1992 there is the theorem that: if M is closed and \vdash M : \sigma then M is dinatural. The aim of the paper is to prove the converse result: if M is closed and \beta\eta-normal and (syntactically) dinatural then \vdash M : \sigma (i.e., dinaturality implies typability).

Syntactic dinaturality examples were shown (lambda terms which exhibit the dinaturality property). Imagine a transformation on a Church integer, (X \rightarrow X) \xrightarrow{h} (X \rightarrow X); this is a dinatural transformation and its dinaturality hexagon is representable by lambda terms. (I didn’t get a chance to grok all the details here and serialise them.)

Typability of a lambda term was characterised by its tree structure and properties of its shape and size (e.g., if \lambda x_1 . \ldots \lambda x_n . M : \sigma_1 \rightarrow \ldots \rightarrow \sigma_h \rightarrow \tau then $n \leq h$; there are various others, see the paper).

The theorem then extends from typability to parametricity (implies that the syntactically dinatural terms are parametric) when the type is “interpretable” (see paper for definition).

(sorry my notes don’t go further as I got a bit lost, entirely my fault, I was flagging after lunch).


(FSCD) – Models of Type Theory Based on Moore Paths, Andrew Pitts, Ian Orton

In MLTT (Martin-Lof Type Theory), univalence is an extensional property (e.g., (\forall x . f x = g x) \rightarrow (f = g)) of types in a universe U: given X, Y : U every identification p : Id_U X Y induces an (Id-)isomorphism X \cong Y. U is univalent “if all Id-isomorphisms X \cong Y in U are induced by some identification p : Id_U X Y” (we want this expressed in type theory) (this is a simpler restating of Voevodsky’s original definition due to Licata, Shulman, et al.). This paper: towards answering, are there models for usual type theory which contain such univalent universes?

Univalence is inconsistent with extensional type theory, where p : Id_A x y implies x = y \wedge p = \textsf{refl}. Univalence is telling you that the notion of identification is richer than in ETT, where an identification between two types means they are already equal, which is a much smaller/thinner notion.

Need a source of models of “intensional” identification types (i.e., not the extensional ones ^^^): homotopy type theory provides a source of such models via paths: p : Id_A x y are paths from point x to point y  in a space A where \textsf{refl}_A is a “constant path” (transporting along such a path does nothing to the element).

So we need to find some structure Path_A x y for these identifications, but we need to restrict to families of types (B \; x \mid x : A) carrying structure to guarantee substitution operations (in this DTT context). “Holes” in diagrams are filled using path reversals and path composition. The solution is “Moore” paths.

A Moore path from x to y in a topological space X is specified by a shape (its length) |p| in \mathbb{R}_+ and an extent function p@ : \mathbb{R}_+ \rightarrow X satisfying p@0 = x and (\forall i \geq |p|) . p@i = y. These paths have a reversal operator given by truncated subtraction \textit{rev}(p)@i = p@(|p| - i). (Note this is idempotent on constant paths as required, allowing holes in square paths to be filled (I had to omit the details, too many diagrams).) But topological spaces don’t give us a model of MLTT, so they replace it by an arbitrary topos (modelling extensional type theory) and, instead of \mathbb{R} for path length/shape, use any totally ordered commutative ring. All the definitions/properties of Moore paths are then expressed in the internal language of the topos. This successfully gives you a category with families (see Dybjer) modelling intensional Martin-Lof type theory (with Pi). This also satisfies dependent functional extensionality, thus it’s probably a good model, as univalence should imply extensionality. But they haven’t yet got a proof that this does contain a univalent universe (todo).
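
As a concrete (and very rough) sketch of the data involved, here is a Haskell rendering of Moore paths over a plain type, with Double standing in for the totally ordered ring of shapes; the names and representation here are mine, not the paper’s:

data MoorePath x = MoorePath
  { plen   :: Double        -- |p|, the shape/length of the path
  , extent :: Double -> x   -- p@, the extent function
  }
-- Invariants (not enforced here): extent p 0 = x, and extent p i = y for all i >= plen p.

-- Reversal via truncated subtraction: rev(p)@i = p@(|p| - i), clamped at 0.
rev :: MoorePath x -> MoorePath x
rev p = MoorePath (plen p) (\i -> extent p (max 0 (plen p - i)))

-- The constant ("refl") path at a point; note rev (refl x) behaves like refl x.
refl :: x -> MoorePath x
refl x = MoorePath 0 (const x)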

Q: can it just be done with semirings? Yes.


(ICFP) – Theorems for Free for Free: Parametricity, With and Without Types, Amal Ahmed, Dustin Jamner, Jeremy G. Siek, Philip Wadler

Want gradual typing AND parametricity. There has been various work integrating polymorphic types and gradual typing (with blame) but so far there have been no (published) proofs of parametricity. This paper proposes a variant of the polymorphic blame calculus \lambda B [Ahmed, Findler, Siek, Wadler POPL’11] (fixed a few problems) and proves parametricity for it.

Even dynamically typed code cast to a universal type should behave parametrically, e.g., consider the increment function which is typed dynamically and cast to a parametric type: (\lambda x . x + 1 : \ast \Rightarrow^p \forall X . X \rightarrow X) [\textsf{int}] \; 4 returns 5, but we want to raise “blame” for the coercion p here (coercing a monomorphic type to a polymorphic type is bad!).

The language has coercions and blame, and values can be tagged with a coercion (same for function values). Some rules:
(v : G \Rightarrow^p \ast \Rightarrow^q G) \mapsto v  (that is, casts to the dynamic type and back again cancel out).
But, if you cast to a different type (tag) then blame pops out: (v : G \Rightarrow^p \ast \Rightarrow^q G') \mapsto \textsf{blame} p (or maybe that was blame q?)

add = \Lambda X . \lambda (x : X) . 3 + (x : X \Rightarrow^p \ast \Rightarrow^q \textsf{int}) has a type that makes it look like a constant function, \forall X . X \rightarrow \textsf{int}. When executing this we want to raise blame rather than actually produce a value (e.g. for add 3 [\textsf{int}] 2). Solution: use runtime type generation rather than substitution for type variables.

A type-name store \Sigma provides a dictionary mapping type variables to types.
\Sigma \rhd (\Lambda X . v) [B] \mapsto \Sigma, \alpha := B \rhd v[\alpha/X] i.e., instead of substituting the B in the application, create a fresh type name \alpha that is then mapped to B in the store. Then we extend this with types and coercion: \Sigma \rhd (\Lambda X . v) [B] \mapsto \Sigma, \alpha := B \rhd v[\alpha/X] : A[\alpha/X] \Rightarrow^{+\alpha} A[B/X] where +\alpha (I think!) means replace occurrences of B with \alpha.

Casts out of a polymorphic type are instantiated with the dynamic type. Casts into polymorphic types are delayed, like function casts.

Parametricity is proved via step-indexed Kripke logical relations. The set of worlds W are tuples of the type-name stores, and relations about when concealed values should be related.  See paper: also shows soundness of the logical relations wrt. contextual equivalence. There was some cool stuff with “concealing” that I missed really.


(ICFP) – On Polymorphic Gradual Typing, Yuu Igarashi, Taro Sekiyama, Atsushi Igarashi

Want to smoothly integrate polymorphism and gradual typing, e.g., apply dynamically-typed arguments to polymorphically typed parameters, and vice versa. Created two systems: System Fg (a polymorphic gradually typed lambda calculus), which translates to System Fc.

Want conservativity: A non-dynamically typed term in System F should be convertible into System Fg, and vice versa.  The following rule shows the gradual typing “consistency” rule for application (in System Fg and gradual lambda calculus):
\dfrac{\Gamma \vdash s : T_{11} \rightarrow T_{12} \quad \Gamma \vdash t : T_1 \quad T_1 \sim T_{11}}{\Gamma \vdash s \; t : T_{12}} where \sim defines type consistency. Idea: extend this to include notions of consistency between polymorphic and dynamic types. An initial attempt breaks conservativity, since there are ill-typed System F terms which would then be well-typed in System Fg.

The gradual guarantee conjecture: coercions can be added (types made less precise) without changing the operational behaviour.

They introduce “type precision” (which I think replaces consistency), which is not symmetric, where A \sqsubseteq \ast (I didn’t yet get the importance of this asymmetry). But there are some issues again… a solution is to do fresh name generation to prevent non-parametric behaviour (Ahmed et al. ’11, ’17 – the last talk). But making types less precise still changes behaviour (creating blame exceptions). Their approach to fix this is to restrict the term precision relation \sqsubseteq so that it cannot be used inside of \Lambda (term level) and \forall (type level).
Still need to prove the gradual guarantee for this new approach (and adapt the parametricity results from Ahmed et al. ’17).


(ICFP) – Constrained Type Families, J. Garrett Morris, Richard A. Eisenberg

(Recall type families = type-level functions)
Contribution 1: Discovery: GHC assumes that all type families are total
Contribution 2: First type-safety proof with non-termination and non-linear patterns
Contribution 2.5 (on the way): simplified metatheory
The results are applicable to any language with partiality and type-level functions (e.g. Scala)
(Haskellers leave no feature unabused!)

{-# LANGUAGE TypeFamilies #-}
-- Example type family (assuming type-level naturals via empty data types):
data Z
data S n

type family Pred n
type instance Pred (S n) = n
type instance Pred Z = Z

Can define a closed version:

type family Pred' n where   -- named Pred' here so it can coexist with the open Pred above
  Pred' (S n) = n
  Pred' Z = Z

Definition: A ground type has no type families.
Definition: A total type family, when applied to ground types, always produces a ground type.

In the new proposal: all partial type families must be associated type families.

type family F a                 -- with no instances
x = fst (5, undefined :: F Int) -- is typeable, but there are no instances of F!

Instead, put it into a class:

class CF a where
    type F a
x = fst (5, undefined :: F Int) -- now get error "No instance for (CF Int)"

Can use closed type families for type equality.

type family EqT a b where
  EqT a a = Char
  EqT a b = Bool

f :: a -> EqT a (Maybe a) -- This currently fails to type check, but it should work
f _ = False

Another example:

type family Maybes a 
type instance Maybes a = Maybe (Maybes a)
justs = Just justs -- GHC says we can't have an infinite type a ~ Maybe a
-- But we clearly do with this type family!

Why not just ban Maybes? Sometimes we need recursive type families.

Putting (partial) type families inside a class means we give a constraint that explains there should be a type at the bottom of a type family (once we’ve evaluated it). Therefore we escape the totality trap and avoid the use of bogus types.

Total type families need not be associated. Currently GHC has a termination checker, but it is quite weak, so often people turn it off with UndecidableInstances.
But, wrinkle: backwards compatibility (see paper) (lots of partial type families that are not associated).

Forwards compatibility: dependent types, termination checking, is Girard’s paradox encodable?

This work also lets us simplify injective type family constraints (this is good, hard to understand at the moment). It makes closed type families more powerful (also great).


(ICFP) – Automating Sized-Type Inference for Complexity Analysis, Martin Avanzini, Ugo Dal Lago

(for higher-order functional programs, automatically). Instrument a program with a step counter by turning it into a state-monadic program which maintains a dynamic count (a small sketch of this idea follows below).
Sized types (e.g., sized vectors) let you type ‘reverse’ of a list as \forall i . L_i \, a \rightarrow L_i \, a. But inference is highly non-trivial.
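
Here is a minimal sketch (my own, not the paper’s instrumentation) of the step-counting idea in Haskell, using a State monad to tick a counter on each recursive call:

import Control.Monad.State (State, modify, runState)

tick :: State Int ()
tick = modify (+ 1)

-- Accumulator-based reverse, instrumented with one tick per step.
revCount :: [a] -> [a] -> State Int [a]
revCount []     acc = return acc
revCount (x:xs) acc = tick >> revCount xs (x : acc)

-- runState (revCount [1,2,3] []) 0  ==  ([3,2,1], 3)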

Novel sized-type system with polymorphic recursion and arbitrary rank polymorphism on sizes with inference procedure for the size types. Constructor definitions define size measure, e.g., for lists of nats:  \textsf{nil} : L_1 and \textsf{cons} : \forall i j . N_i \rightarrow L_j \rightarrow L_{i + j + 1} (here the natural number has a size too).
Can generalise size types in the application rules, (abs, app, and var rules deal with abstraction and instantiation of size parameters).

Theorem (soundness of sizes): (informally) the sizes attached to a well-typed program give you a bound on the number of execution steps that actually happen dynamically (via the state encoding) [this also implies termination].

Inference step 1: add template annotations, e.g., \textit{reverse} : \forall k . L_k \rightarrow L_{g(k)}. Then we need to find a concrete interpretation of g.
Step 2: constraint generation; constraints arise from subtyping (which approximates sizes), plus the constructor sizes. These collect together to constrain the metavariables introduced in step 1.
Step 3: constraint solving: replace meta variable by Skolem terms. Then use an SMT solver (the constraints are non-linear integer constraints). (from a question) The solution may not be affine (but if it is then it is easier to find).

The system does well at inferring the types of lots of standard higher-order functions with lists. But cross product on lists produces a type which is too polymorphic: with a little more annotation you can give it a useful type.
Q: What happens with quicksort? Unfortunately it can’t be done in the current system. Insertion sort is fine though.
Q: What about its relation to RAML? The method is quite different (RAML uses an amortized analysis). The examples that each can handle are quite different.

(ICFP) – Inferring Scope through Syntactic Sugar, Justin Pombrio, Shriram Krishnamurthi, Mitchell Wand

[Github]

Lots of languages use syntactic sugar, but then it’s hard to understand what the scoping rules are for these syntactic sweets.

E.g. the list comprehension [e | p <- L] desugars to map (\p . e) L. The approach in this work is a procedure of scope inference which takes these rewrite patterns (where e and p and L are variables/parameters). We know the scoping rules for the core language (i.e., the binding on the right) [which parameterises the scope inference algorithm]. Desugaring shouldn’t change scope (i.e., if there is a path between two variables then it should be preserved). So the scope structure is homomorphic to the desugared version, but smaller. (A scope preorder was defined; I didn’t get to write down the details.)

Theorem: if resugaring succeeds, then desugaring will preserve scope.

Evaluation: the techniques was tested on Pyret (educational language), Haskell (list comprehensions), and Scheme (all of the binding constructs in R5RS). It worked for everything apart from ‘do’ in R5RS.

ICFP / FSCD day 2 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(ICFP Keynote #2) – Challenges in Assuring AI, John Launchbury

The moment we think we’ve got our (verification) hands round the world’s systems (e.g., compilers), a whole new set of systems appear: AI.

The dual of “have I tested enough?” is “how good is my model?” (in a verification-based approach). What is the model we have for AI systems?

What is intelligence? Information processing. But there are different kinds of information processing ability; asking if a system is ‘intelligent’ is too coarse/binary. John breaks this down into ‘perceiving’ P, ‘learning’ L, ‘abstracting’ A (create new meanings), and ‘reasoning’ R (plan and decide).

AI wave 1: rule-based, hand-crafted knowledge: P 1, L 0, A 0, R 3 (poor handling of uncertainty, no learning). Type systems are a little like this (perceiving, didn’t have to tell it everything). Can do some pretty amazing stuff: see smart security systems that analyse code and patch bugs.

AI wave 2: statistical learning. Specific problem domains, train them on big data. Still requires a lot of engineering work (to get the right models and statistical techniques for the task). P 3, L 3, A 1, R 1 (less good at reasoning).

What does it mean to prove a vision system correct?

Manifold hypothesis: data coming in is high-dimensional, but the features you are interested in form lower-dimensional structures. E.g., with cars, lots of data comes in but understanding the data means separating the manifolds (of low dimensionality).
Another example, a 28×28 image is a 784 dimensional space (as in 784 numbers, 784-length vector). Variation in handwritten digits form 10 distinct manifolds within the 784 dimensional space. High-degree of coherence between all data samples for a particular digit. (“Manifold” here may be slightly more informally used than the full mathematical definition).

Imagine two interlocking spiral arms which are really 1-dimensional manifolds in a 2-dimensional space. By continuous transforms on the 2-dimensional space these can morphed into linearly separable components. Sometimes stretching into a new dimension enables enclosed manifolds to be separated.

Neural nets separate manifolds; each layer of a neural network stretches and squashes the data space until the manifolds are cleanly separated. Non-linear functions at each step (otherwise multiple linear transformations would collapse into just one layer). Use calculus to adjust the weights (error propagation backwards through the layers).

Structured neural nets have different kinds of layers, e.g., feature maps (which perform a local analysis over the whole input); generated potentially via convolutions (i.e., turn a 28×28 image into 20 24×24 images), do some subsampling. Programming these requires some trial-and-error.

Models to explain decisions: “why did you classify this image as a cat?”; not just “I ran it through my calculations and ‘cat’ came up as highest”. A deep neural net could produce a set of words (fairly abstract) and then a language-generating recurrent neural net (RNN) translates these into captions (sentences). Currently, statistically impressive but individually unreliable.

Adversarial perturbations: Panda + <1% computed distortion, becomes classified as a Gibbon. Whilst the two images are indistinguishable to a human, the noise interferes with the manifold separation. There are even universal adversarial perturbations that throw-off classification of all images.

Assured control code (e.g., in a helicopter): a physical machine is modelled (system dynamics, power, inertia), a control system design is created, from which controller code is generated. But this does not produce a system that is very nice to fly. Need some kind of refinement back to the control code. Given the correct structure for the control code framework, the fine tuning from feedback on the actual system is very effective. This can be overlayed with different goals (different fitness functions), e.g., safety over learning.

AI wave 3: combining wave 1 and 2 techniques. Perceive into a contextual model on which abstractions are computed and which can be tuned (learning) to produce reasoning. Aim to get a better intelligence profile: P 3, L 3, A 2, R 3. Hand assurance arguments on the contextual model part instead.
e.g., build models that are specialised based on our knowledge, e.g., we know the probable number of strokes in each digit 0-9 and its trajectory. The generative model generates an explanation of how a test character was generated (go from the model of what it’s looking for to what it sees; how likely is the thing seen to be generatable from the model of a particular digit).

Need to be able to specify intent in a variety of ways; all real world specifications are partial and humans can’t always communicate the full intent of what they want (cf., trying to explain how to drive a car).

What are we trying to assure? Mistakes in sensing/reasoning leading to undesirable actions, undesirable emergent behaviours, hackable systems being subverted, misalignment between human/machine values.

(AlphaGo is a wave 3-like system as it combined the neural net-based classification of moves with traditional planning/reasoning/tree-based pruning/search).



(FSCD Keynote #2) – Uniform Resource Analysis by Rewriting: Strengths and Weaknesses, George Moser

Can we use complexity measures on term rewriting systems to make complexity arguments about higher-order functions?

The main scheme is to convert a program via a complexity-reflecting transformation into a Term Rewriting System (TRS), then do automated resource analysis on the TRS to get an asymptotic bound. The first half of the talk is some background on doing complexity analysis (which was a bit fast for me so my notes are a bit incoherent below). There are some really nice complexity analysis tools for programs coming out of this work (links near the end).

Some preliminaries: Example, set up list reverse as a finite set of rewrite rules (a Term Rewriting System, TRS):

rev(xs)            -> rev'(xs, nil)
rev'(nil, acc)     -> acc
rev'(x :: xs, acc) -> rev'(xs, x :: acc)

A computation of this set of rules is the application of the rules from left to right. The rules are non-overlapping.

Can see rewriting as an abstraction of functional programming, or can see it as part of equational reasoning (universal algebra); but this really generalises functional programming (function symbols can appear nested on the left-hand side).

Definition: A TRS is terminating if the rewriting relation is “well-founded”.
Definition: A function symbol f is “defined” if it is the root symbol on the left of a rule (otherwise f is a constructor).
Definition: Runtime complexity wrt. a terminating TRS is defined:

dh(t) = \max \{ n \mid \exists u . t \rightarrow^n u\}
rc(n) = \max \{ dh(t) \mid \mathsf{size}(t) \leq n \wedge t \textit{ is “basic”} \}
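
As a tiny worked example of these definitions (my own, not from the talk), the reverse TRS above applied to a three-element list gives the derivation

\mathit{rev}(x_1 :: x_2 :: x_3 :: \mathit{nil}) \rightarrow \mathit{rev}'(x_1 :: x_2 :: x_3 :: \mathit{nil}, \mathit{nil}) \rightarrow \mathit{rev}'(x_2 :: x_3 :: \mathit{nil}, x_1 :: \mathit{nil}) \rightarrow \mathit{rev}'(x_3 :: \mathit{nil}, x_2 :: x_1 :: \mathit{nil}) \rightarrow \mathit{rev}'(\mathit{nil}, x_3 :: x_2 :: x_1 :: \mathit{nil}) \rightarrow x_3 :: x_2 :: x_1 :: \mathit{nil}

so dh of this (basic) term is 5; in general a list of length n takes n + 2 steps, giving a linear runtime complexity rc.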

RC is the runtime complexity; Q: is this a “natural notion” for rewriting? (I don’t know how one defines natural here).
Derivational complexity has no restriction on the terms,
dc(n) = \max \{ dh(t) \mid \mathsf{size}(t) \leq n\}. This has been used mainly to make termination arguments.
See “Termination proofs and the length of derivations” (Hofbauer, Lautemann, 1989).

Definition: Multiset Path Order. A precedence order > induces a multiset path order >_{mpo} (whose definition I’m not going to transcribe; Wikipedia has a definition). It basically says that s = f(s_1, ..., s_n) >_{mpo} t = g(t_1,...,t_m) if s is >_{mpo} all of the t_i, or some s_i is \geq_{mpo} t.

The Hydra Battle – is this terminating? (Dershowitz and Jouannaud designed the TRS as a termination problem, later rectified by Dershowitz.) The beast is a finite tree, each leaf corresponds to a head. Hercules chops off heads of the Hydra, but the Hydra regrows:

  • If the cut head has a pre-predecessor (grandmother) then the remaining subtree is multiplied by the stage of the game (heads regrowing and multiplying).
  • Otherwise, nothing happens (the head doesn’t regrow)

Can show it is terminating by an inductive argument over transfinite numbers (but the complexity is quite bad). See “The Hydra battle and Cichon’s principle” (Moser, 2009).

The RTA list of open problem was mentioned: http://www.win.tue.nl/rtaloop/

TRSes can be represented via a matrix interpretation (details were too fast for me).
These techniques were developed into a fully automated complexity analysis tool for TRSes called tct-trs (modular complexity analysis framework).

Going back to the original goal: can we use this for higher-order programs? Consider the textbook example of reverse defined by a left-fold. Need to do a complexity preserving conversion into a TRS (a complexity preserving transformation ensures lower bounds, complexity reflecting ensures upper bounds). First, defunctionalise via a rewrite system into a first-order system. Built a tool tct-hoca which works well for various small higher-order functional programs (with complexity proofs taking up to 1 minute to compute). See the tools at:

What about “real” programs or “imperative” programs? Use ‘integer transition systems’ for modelling imperative programs, which is integrated into the tools now.

Some related work: Multivariate amortized source analysis (Hoffman, Aehlig, Hofmann, 2012) and a type-based approach “Towards automatic resource bound analysis for OCaml” (Hofmann, Das, Weng, POPL 2017) (note to self: read this!).

A previous challenge from Tobias Nipkow: how do you do this with data structures like splay trees (cf. Tarjan)? There is some new work in progress. It builds on having sized types so that we can account for tree size. An annotated signature then decorates function types with the size polynomial, e.g. \texttt{splay} : A \times T_n \xrightarrow{n^3 + 2} T_n (recursive function to create a splay tree). A type system for this was shown, with a cost annotation judgment (no time to copy down any details).

The strength of the uniform resource analysis approach is its modularity and extensibility (different intermediate languages and complexity problems).
Weaknesses: the extensibility/modularity required some abstraction which weakens the proving power, so adding new kinds of analysis (e.g., constant amortised / logarithmic amortised) requires a lot of work.
Audience question: can I annotate my program with the expected complexity and get a counter-example if it’s false? (Counter-examples, not yet.)


(FSCD) Continuation Passing Style for Effect Handlers, Daniel Hillerström, Sam Lindley, Bob Atkey, KC Sivaramakrishnan

Consider two effects: nondeterminism and exceptions, in the drunk toss example (a drunk person tosses a coin and either catches it and gets a bool for head or tails, or drops the coin [failure/exception]).

drunkToss : Toss ! {Choose:Bool; Fail:Zero}
drunkToss = if do Choose
            then if do Choose then Heads else Tails
            else absurd do Fail

Induces a simple computation tree.

A modular handler, using row polymorphism (with variable r below)

allChoices : a ! {Choose : Bool; r} => List a ! {r}
allChoices = return x |-> [x]
             Choose k |-> k true ++ k false

fail : a ! {Fail: Zero; r} => List a ! {r}
fail = return x |-> [x]
       Fail k   |-> []

The ! is an annotation explaining the effect operations, k are continuations, r are row variables (universally quantified).  Two possible interpretations:

(1) handle the computation with fail first, then handle that with allChoices
returns the result [[Heads, Tails], []].

(2) handle with allChoices first then fail gives the result [].

So the order of the handling matters. The operational semantics for handlers is based on a substitution in the computation tree for the relevant operations.
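
As a loose Haskell analogue of why the order matters (mine, not the paper’s calculus, and only morally equivalent), one can mimic Choose with the list monad and Fail with Maybe, and compare the two nestings:

import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.Class (lift)

data Toss = Heads | Tails deriving Show

-- Failure handled "inside" each nondeterministic branch:
-- the failing branch yields Nothing, the other branches survive.
toss1 :: MaybeT [] Toss
toss1 = do caught <- lift [True, False]
           if caught then lift [Heads, Tails] else MaybeT [Nothing]
-- runMaybeT toss1 == [Just Heads, Just Tails, Nothing]

-- Nondeterminism handled "inside" failure: one failing branch
-- wipes out everything, analogous to the [] result.
toss2 :: Maybe [Toss]
toss2 = let branch caught = if caught then Just [Heads, Tails] else Nothing
        in concat <$> traverse branch [True, False]
-- toss2 == Nothing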

How do handlers get compiled? Use CPS! (restricted subset of the lambda calculus, good for implementing control flow). Based on \lambda^{\rho}_{\textit{eff}} (Hillerstrom, Lindley 2016) – lambda calculus with effect handlers (separate syntactic category but contains lambda terms), row polymorphism, and monadic metalanguage style constructs for effectful let binding and ‘return’. The CPS translation is a homomorphism on all the lambda calculus syntax, but continuation passing on the effectful fragment (effectful let and return).  One part of the interpretation for the effectful operation term: [[\textbf{do} \, l \, V]] = \lambda k . \lambda h . \; h (l \; \langle[[V]], \lambda x . k \; x \; h\rangle) (see the paper for the full translation!)

A key part of the translation is to preserve the stack of handlers so that the order is preserved and (the interpretation) of operations can access up and down the stack.
A problem with the translation is that it yields administrative redexes (redundant beta-redexes) and is not tail-recursive. Use an uncurried CPS where the stack of continuations is explicit (see Materzok and Biernacki 2012); this leads to a new translation which can manipulate the continuation stack directly (see paper). However, this still yields some administrative redexes. Solution: adopt a two-level calculus (Danvy, Nielsen 2003) which has two kinds of continuations: static (translation time) and dynamic (run time), where you can reify/reflect between the two. This lets you squash out administrative redexes from the runtime to the static part, which can then be beta-reduced immediately at compile time (ending up with no static redexes).
The paper proves that this translation preserves the operational semantics.
Implementation in the ‘links’ language: https://github.com/links-lang/links

I was going to ask about doing this typed: this is shown in the paper.


(ICFP) How to Prove Your Calculus Is Decidable: Practical Applications of Second-Order Algebraic Theories and Computation, Makoto Hamana

SOL (Second-Order Laboratory) – tool for analysing confluence and termination of second-order rewrite rules within Haskell. Provides an embedded Haskell DSL (via Template Haskell) for describing rewrite rules. SOL can then automatically check termination and confluence.

Confluence + termination => decidability. Can use this to prove that a calculus is decidable.

Based on a higher-order version of Knuth-Bendix critical pair checking using extended HO pattern unification (Libal,Miller 2016).

For example, the theory of monads is decidable: given two well-typed terms s, t consisting of return, bind, and variables, the question whether s = t is derivable from the three laws is decidable. How? First, orient the monad laws as rewrite rules. Check confluence (Church-Rosser, CR) and Strong Normalisation (SN) – together these imply there exist unique normal forms, which are then used to decide equivalence (reduce terms to their normal forms). So, how to prove CR and SN?
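
Concretely, orienting the three monad laws left to right gives rewrite rules of roughly this shape (written here in Haskell notation; the actual formulation in SOL is as second-order rules):

return x >>= f   ~>  f x
m >>= return     ~>  m
(m >>= f) >>= g  ~>  m >>= (\x -> f x >>= g)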

One way to prove Strong Normalisation is to assign weights and show that reductions reduce weights. Unfortunately, this does not work in the higher-order context of the monad operations. Another standard technique is to use the “reducibility method” (see Tait, Girard and Lindley, Stark ’05). SOL uses the ‘general schema criterion’ which works for general rewrite rules (Blanqui ’00, ’16) using various syntactic conditions (positivity, accessibility, safe use of recursive calls, metavariables) [sound but obviously not complete due to the halting problem]. (It also uses Newman’s Lemma – a terminating TRS is confluent when it is locally confluent.)

This tool can be applied to various calculi (see paper, 8 different systems). Sounds really useful when building new calculi. The tool was applied to a database of well know termination/confluence problems and could check 93/189 termination and 82/96 confluence problems.

“Let’s prove your calculus is decidable using SOL” (Makoto is happy to assist!)


(ICFP) A Relational Logic for Higher-Order Programs
Alejandro Aguirre, Gilles Barthe, Marco Gaboardi, Deepak Garg, Pierre-Yves Strub

Relational properties are things like: if X and Y are related by R then F(X) and F(Y) are related by R; or, over two relations, if the inputs are related by R then the results are related by S.

Relational refinement types: \Gamma \vdash t_1 \sim t_2 : \{n \mid n_1 = n_2\} These can be used to express monotonicity properties, are syntax directed, and exploit structural similarities of code. However, say we want to prove the naturality of take, but formulating the right relational refinement type is difficult because the computations are structurally different.

Start with a basic logic with lambda-terms over simple inductive types, and a separate layer of predicates.

RHOL has judgements \Gamma \mid \Psi \vdash t_1 : \tau_1 \sim t_2 : \tau_2 \mid \phi(r_1, r_2) where \phi is a binary predicate on properties of the two terms and \Psi gives assertions about the free variables. A key idea here is that the types and assertions are separate.

An example rule two-sided rule (where we inductively construct terms with the same syntactic constructor on both sides of a relation):

\dfrac{\Gamma, x_1 : \sigma_1, x_2 : \sigma_2 \mid \Psi, \phi' \vdash t_1 : \tau_1 \sim t_2 : \tau_2 \mid \phi} {\Gamma \mid \Psi \vdash \lambda x_1 . t_1 : \sigma_1 \rightarrow \tau_1 \sim \lambda x_2 . t_2 : \sigma_2 \rightarrow \tau_2  \mid \forall x_1, x_2 . \phi' \Rightarrow \phi[r_1 \, x_1 / r_1, \, r_2 \, x_2 / r_2]}.
This seems quite natural to me: abstraction introduces a universal quantification over the constraints attached to the free variables.

Rules can also be one-sided (where you construct a term on just one side of a relation).
The paper shows that RHOL is as expressive as HOL (by a mutual encoding).

Other relational typing systems can be embedded into RHOL: Relational Refinement Types, DCC (the Dependency Core Calculus), and RelCost (relational cost).

Q: if HOL and RHOL are equivalent (rather than there just being an embedding of HOL into RHOL, say) then why have RHOL? Relational reasoning was gained which was not built into HOL. Update: I discussed this further later with one of the authors, and they made the point that people have done relational-style reasoning in HOL but it requires taking a pairing of programs and building all the relational machinery by hand, which is very cumbersome. RHOL has all this built in and the one-sided and two-sided rules let you reason about differently-structured terms much more easily. I can see this now.


(ICFP) – Foundations of Strong Call by Need, Thibaut Balabonski, Pablo Barenbaum, Eduardo Bonelli, Delia Kesner

What does this lambda term reduce to: (\lambda x \, y. x \, x) \, (id \, id)? Depending on whether you use call-by-value, call-by-name, call-by-need, or full beta, you get different answers. The call-by-value, call-by-name, and call-by-need strategies are called weak because they do not reduce underneath lambda abstractions. As a consequence, they do not compute normal forms on their own.
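
For instance (my own working, not from the talk): call-by-value evaluates the argument first and stops at the outer lambda, giving \lambda y . \mathit{id} \; \mathit{id}; call-by-name substitutes the unevaluated argument, giving \lambda y . (\mathit{id} \; \mathit{id}) \, (\mathit{id} \; \mathit{id}); call-by-need gives the same shape but shares a single thunk for \mathit{id} \; \mathit{id}; and only full beta continues under the binder to the normal form \lambda y . \mathit{id}.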

Weak call-by-need strategy (the usual strategy outlined in the literature): procrastinate and remember. Beta reduction into a let binding (don’t evaluate the right-hand side of an application). (There was some quite technical reduction work here, but I noted this interesting rule on “bubbling” reductions out by splitting a nested reduction into two: t[v[u/y]/x] \rightarrow t[u/y][v/x]).

Strong call-by-need provides the call-by-need flavour of only evaluating needed values once, but crucially computes normal forms (by reducing under a lambda). This strong approach has two properties: (conservative) the strong strategy first does whatever the weak strategy would do; and (complete) if there is a beta-normal form it will reach (a representative of) it. This comes by reducing under the lambda at some points. Consequently, this approach only ever duplicates values, not computations.

Q: If you implemented this in Haskell, would it make it faster? I didn’t quite hear the answer, but this would completely change the semantics as Haskell is impure with respect to exceptions and non-termination. It sounded like it would be more useful for something pure like Coq/Agda (where it could provide some performance improvement).


(ICFP) – No-Brainer CPS Conversion, Milo Davis, William Meehan, Olin Shivers

The Fischer/Reynolds algorithm for CPS transformation introduces administrative redexes. Since then, other algorithms have improved on this considerably (Danvy/Filinsky). In this approach, some additional reductions are applied to the term to get a smaller CPS term out. The moral of the story (or the key idea) is to treat the CPS converter like a compiler (use abstract representations, symbol tables).

Rules of the game: make the terms strictly smaller (greedy reductions), be strongly normalising and confluent, and keep the original algorithmic complexity of the CPS transforms (linear time).

Approach:
– Variables are the same size, so do beta redexes on variables
– Beta reduce an application of a lambda to a lambda (\lambda x . e) (\lambda y . e') only when the number of references to x in e is less than or equal to 1 (so things don’t explode).
– Eta reduce (??only when there are no references to the variable in the body??) (I think this is what was said, but I think I might have made a mistake here).

Relies on some machinery: reference counts (variable use counts); an abstract representation of continuations (halt, variables, function continuations, and application continuations) rather than going straight to syntax; syntax constructor functions (which can do reductions) to dovetail with this; and explicit environments to be used with function closures.


(ICFP) Compiling to Categories, Conal Elliott

Why overload? Get a common vocabulary, laws for modular reasoning.

But generally doesn’t apply to lambda, variables, and application. So what to do? Eliminate them! Rewrite lambda terms into combinators: const, id, apply, pairing of functions (fork), curry, and keep going…. and you can automate this via a compiler plugin (so you don’t have to write code like this yourself).

This is implemented by the interface (algebra) of a category (for function composition / identities) + Cartesian (product) structure + coproducts + exponents: giving you (bi)Cartesian-closed categories. Implement this as class/interface so you can give different realisations/compilations, e.g., a graph drawing implementation (Conal showed some pretty pictures generated from this).
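
A minimal sketch of the flavour of this interface (my own simplification, not Conal’s exact classes), together with its standard interpretation as ordinary Haskell functions:

class Cat k where
  idC   :: k a a
  compC :: k b c -> k a b -> k a c

class Cat k => Cartesian k where
  exl  :: k (a, b) a
  exr  :: k (a, b) b
  fork :: k a b -> k a c -> k a (b, c)

newtype Fun a b = Fun (a -> b)

instance Cat Fun where
  idC = Fun id
  compC (Fun g) (Fun f) = Fun (g . f)

instance Cartesian Fun where
  exl  = Fun fst
  exr  = Fun snd
  fork (Fun f) (Fun g) = Fun (\a -> (f a, g a))

-- e.g. the lambda term \(x, y) -> (y, x) compiles (by hand here) to:
swapC :: Cartesian k => k (a, b) (b, a)
swapC = fork exr exl

Swapping in a different instance of these classes (graphs, Verilog, derivatives, intervals, …) then reinterprets the same compiled term.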

Another implementation is to generate Verilog (hardware descriptions), or to graphics shader code.

What about compiling to the derivative of a program? Represent this via the type:
newtype D a b = D (a -> (b, a -o b)) i.e., a differentiable function is a function which produces its result paired with its derivative at that point, as a linear map a -o b (an implementation was then shown, see paper). You can then compose interpretations, e.g., the graph interpretation with the derivative interpretation.

Another interpretation: interval analysis (e.g., Interval Double = (Double, Double)). There are more examples in the paper, including constraint solving via SMT by writing functions to Bool which are then compiled directly into SMT constraints (rather than using existing embeddings, e.g., SBV).

Q: what about recursion? Can be done (compiling from a letrec into a fix-point combinator class).


 (ICFP) Visitors Unchained, François Pottier

(Humorous tagline from Francois: Objects at ICFP?! Binders again?!)

Manipulating abstract syntax with binding can be a chore: nameplate (boilerplate on names, a term coined by James Cheney). It is large and easy to get wrong (especially in weakly typed or non-dependent languages). This paper is a way of getting rid of nameplate in OCaml via a library and a syntax extension. It supports multiple representations and complex binding, is modular, and relies on just a small amount of code generation.

Several Haskell libraries support this via Haskell’s support for datatype-generic programming. This work uses OCaml’s object technology. Annotating a data type with [@@deriving visitors { variety = "map" }] automatically generates a visitor pattern (a class) for doing map-operations on the datatype. To use this, create a local object inheriting the visitor class and override one or more methods, e.g., for a syntax tree type, override the visit method for the “add” node in order to implement the rewriting of e+0 -> e.
Visitors are quite versatile: easy to customise behaviour via inheritance which is something you can’t do in FP without a lot of redesign to your algorithms.

Want to traverse syntax with binders, considering three things (from three perspectives or “users” in the talk here):
(1) “End user” describes the structure of an AST, parameterised by types of bound names and free names and using a parametric data type abs whenever a binding is needed (e.g., in a lambda). This should also derive the map visitor (see above) with option ["BindingForms.map"].
(2) “Binding library” (provided as part of this work) defines the abs type which capture the binding construct, but they leave the scope extrusion function extend as a parameter (extend gets used when you traverse a binder to go inside a scope, e.g., going into the body of a lambda and extending the context by the bound variable).
(3) The last part defines how to represent environments (and lookups) via an overriding a visitor, nominal terms, and scope extrusion. (The work provides some implementations of these part in a library too).

As seen above, the structure of this library is fairly modular. The “end user” has to glue all these things together.
One limitation is that nonlinear patterns (e.g., as in Erlang) can’t be represented (I didn’t immediately see why though).


(ICFP) – Staged Generic Programming, Jeremy Yallop

Advert at the start: if you have an abstract library that is not very high-performance, then the techniques here could be useful to you.

Generic programming à la Scrap Your Boilerplate: for any data type you get automatic traversals, e.g., listify (matches c) for some constant c gives you a list of all nodes in a data type matching c; as a programmer you don’t need to write listify or matches: they are generated. Requires internally a way to represent types as data and to map data types into a generic form, a collection of traversals over the generic form, and generic schemes which plug together the traversals with parameter generic queries. (In Haskell see the syb package.) However, the overhead of this is roughly 20x compared with writing the code manually! This overhead was quite portable: reimplementing Scrap Your Boilerplate in OCaml had about the same amount of slow-down compared with manual traversals.
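
For intuition, here is a tiny example of the unstaged SYB style in Haskell (standard syb usage, not the staged OCaml version from the paper):

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics (Data, listify)

data Expr = Lit Int | Add Expr Expr deriving (Show, Data)

matches :: Int -> Int -> Bool
matches c = (== c)

-- Collect every Int node equal to 0, generically:
collectZeros :: Expr -> [Int]
collectZeros = listify (matches 0)

-- collectZeros (Add (Lit 0) (Add (Lit 1) (Lit 0)))  ==  [0, 0]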

Why is this slow? Type comparisons are slow, there is lots of indirection, and lots of unnecessary applications of applications. (Dominic: to me it seems like it ends up a bit like the slowness involved in dynamically typed languages.)

Solution: use staging to optimise the datatype-generic implementation. Staging lets you explain which parts of the program can be evaluated early, and which bits need to be evaluated later (due to waiting for inputs) – these are the quoted bits (inside quotation brackets), with the option to jump back into the early part by escaping with ~.

Naively staging SYB keeps the type representation unchanged, but the shallow traversals and recursion schemes can be quoted judiciously such that type comparison, indirect calls, and polymorphism are all eliminated. (Jeremy showed us the code output.. but it has lots of noise and opportunities for improvement).

Instead, use a special fixed-point operator that inlines non-recursive functions and perform (semantics preserving) code transformations as well: e.g., decomposing the generic map operation.

One technique is to allow equations on “partially-static data”, e.g., if you have [] @ x @ [] (i.e., static empty list concat dynamic list x concat static empty list) this can still be statically rewritten into x. Thus apply laws on the static parts as far as possible (a small sketch of this idea follows below).
Another technique was to eta-reduce matches that have identical branches.
(I think there were a few more things here.) It ends up with much tighter code which performs very close to, or even better than, the hand-written code.
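
The partially-static list idea, roughly (my own illustration in Haskell, not the paper’s OCaml representation): keep static and dynamic pieces distinct so that monoid laws can be applied to the static pieces at code-generation time.

data PS a
  = Static [a]       -- known at staging time
  | Dyn String       -- stands for a dynamic code fragment (e.g. a variable name)
  | Append (PS a) (PS a)

appendPS :: PS a -> PS a -> PS a
appendPS (Static []) y = y          -- [] ++ y  ~>  y, applied statically
appendPS x (Static []) = x          -- x ++ []  ~>  x
appendPS x y           = Append x y

-- appendPS (Static []) (appendPS (Dyn "x") (Static []))  evaluates to  Dyn "x"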

Q: How does this compare to the Template-your-Boilerplate library for Haskell?
I completely missed the answer as I was looking up this library which I really should be using…!

ICFP / FSCD day 1 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.


(ICFP keynote #1) Computational Creativity, Chris Martens (slides)

In the early days, automated theorem proving by computers was seen as an AI activity. Theorem proving, program synthesis, and AI planning are search procedures but can be viewed as creative procedures.

Linear logic is useful for describing plots (story telling).

There is a difference between what an AI opponent and AI cooperator should do: an opponent need not make intelligible moves, but a cooperator needs to act in a way that is intelligible/understandable to the human player.

Applied Grice’s maxims of conversation to build an “intentional” cooperative player for the Hanabi game. Grice’s maxims are: quantity (not too much or little), quality (tell the truth), relation (be relevant), manner (avoid ambiguity).

Theory of mind: forming mental models of other people’s mental models.

Dynamic epistemic logic, has two indexed modalities:
\Box_a A   (agent ‘a’ believes A)
[\alpha] A   (A holds true after action \alpha).

Actions are defined inductively:
\begin{array}{rll} \alpha, \beta & = \\ & \textit{flip} \; p & \text{(change truth value)} \\ \mid & ?A & \text{(precondition)} \\ \mid & \alpha + \beta & \text{(non deterministic choice)} \\ \mid & \alpha ; \beta  & \text{(sequence)} \\ \mid & \alpha^a & \text{(appearance to agent 'a')} \\ \mid & \alpha* & \text{(public action)} \end{array}

Semantics is based on a possible worlds formulation (current actions change future possible worlds). Ostari (Eger & Martens) is a full epistemic logic language that can capture lots of different games with different kinds of knowledge between players.

Key point: use compositionality (of various kinds) as a design principle

Three research challenges:

  • Dynamic logics with functional action languages
  • Constructive/lambda-calculable DEL
  • Epistemic session types

(FSCD keynote #1) Brzozowski Goes Concurrent – Alexandra Silva

Ongoing project CoNeCo (Concurrency, Networks, and Coinduction)

SDN – Software Defined Network architectures let you write a program for the network controller. Requires languages for the controller to interact with the underlying switches / routers. Goals of new network PL: raise the level of abstraction beyond the underlying OpenFlow API– make it easier to reason about. Based on Kleene algebra.

NetKat – based on regular expression (Kleene algebra) with tests (KAT) + additional specialized constructions relating to networking. This is compiled to OpenFlow.

Kleene algebra, e.g., (0 + 1(01^*0)^*1)^* – multiples of 3 in binary.

For reasoning about imperative programs, this can be extended with the idea of ‘tests’ (Kozen). Need to capture more of the control flow graph, where there are guards (tests). The solution is to split the underlying alphabet of the Kleene algebra into two: one for actions and one for tests; combine a Kleene algebra and a Boolean algebra (where there is negation), e.g. \bar{b} is the negation of b. Then the three standard control flow constructs of an imperative language are defined:

\begin{array}{rl} p ; q & = pq \\ \textbf{if} \; b \; \textbf{then} \; p \; \textbf{else} \; q & = bp + \bar{b}q \\ \textbf{while} \; b \; \textbf{do} \; p & = (bp)^*\bar{b} \end{array}

Subsumes Hoare logic, i.e.:
b \{p\} c \Leftrightarrow b\,p\,\bar{c} = 0

Decidable in PSPACE.

What is the minimum number of constructs needed to be added to KAT to deal with packet flow?

A packet is an assignment of constant values n to fields x; a packet history is a nonempty sequence of packets. Include assignments (field <- value) and tests on fields (field = value), e.g.,

sw = 6; pt = 88; dest <- 10.0.0.1; pt <- 50

For all packets incoming on port 88 of switch 6 set the destination IP to 10.0.0.1 and send the packet to port 50.

Can then reason about reachability (can two hosts communicate?), security (does all untrusted traffic pass through intrusion detection?), and loop detection (can a packet get stuck in a forwarding cycle?).

Since networks are non-deterministic, this has been adapted to probabilistic NetKAT.

Can we add concurrency to NetKAT (to deal with interaction between different components)? Within this paradigm, can we add concurrency to Kleene algebra? Concurrent Kleene algebra adds a parallel composition || (cf. Hoare 2011). Would like to reason about weak-memory-model-like behaviour (cf. Sewell’s litmus tests). There were developments in concurrent Kleene algebra, but a solid foundation was not yet available.

Have proved a Kleene theorem: an equivalence between Kleene algebra and finite automata. In this case, the automata corresponding to a concurrent Kleene algebra are pomset automata.

Can make Concurrent Kleene algebras into regular subsets of pomsets (useful model of concurrency, cf Gischer) i.e., [-] : 2^{\textsf{Pom}_{\Sigma}}.

Pomset automata have the idea of splitting into two threads (two paths) that, once they reach their accept states, go on to the next state of the originating state (this can be expressed as multiple automata). This reminds me of Milner’s interleaving semantics (for atomic actions) in CCS. The alphabet of the automaton is a pomset over some other alphabet of actions.

Showed the (Brzozowski) construction to build an automata from a Concurrent Kleene algebra term, and then how to go back from automata to expressions.

The exchange law captures the interaction of sequential and parallel interaction. Currently working on extending the Kleene theorem in this work to include the exchange law.



(ICFP) Faster Coroutine Pipelines, Mike Spivey

(I missed the start and had some trouble catching up).

Coroutine pipelines are a way to structure stream-processing programs. There are libraries for doing this that are based on a term encoding which is then interpreted, and this suffers serious slow-down when there is process nesting. This paper proposes an alternative approach using continuations which doesn’t suffer these slow-down problems.

The direct style can be mapped to the CPS style which is a monad morphism between two Pipe monads (capturing the coroutine pipelines). To show that the two models are equivalent could be done with logical relations (Mike recommended Relating models of backtracking by Wand and Vaillancourt (2004)).

Termination. Consider a situation:
p1 \mid\mid ((p2 \mid\mid p3 \mid\mid p4) ; q) \mid\mid p5
where p3 then terminates.

Need to end up with p1 \mid\mid q \mid\mid p5 not p1 \mid\mid ((p2 \mid\mid p4) ; q) \mid\mid p5. (I wasn’t sure if this was an open question or whether there was a solution in this paper).

Ohad Kammar asked on the ICFP slack whether this relates to
Ploeg and Kiselyov’s paper at Haskell 2014 but this didn’t get asked.


(ICFP) A pretty but not greedy printer (Functional pearl), Jean-Philippe Bernardy

(slides: https://github.com/jyp/prettiest/blob/master/talk/Outline.org)

Hughes proposed a pretty printing approach, which was improved upon by Wadler. For example, if you want to pretty print some SExpr but with a maximum column width: when do you spill onto the next line?

Proposes three laws of pretty-printing:
1). Do not print beyond the right margin
2). Do not reveal the structure of the input
3). Use as few lines as possible.

(Hughes breaks 3 and Wadler breaks 2 (I didn’t spot how in the talk)).
These three rules are not compatible with a greedy approach.
This work trades performance for being law-abiding instead.

A small API is provided to write pretty printers (four combinators: text, flush, <> (horizontal composition), and <|> (choice)).

Phil Wadler asked how you represent hang in the library. JP quickly typed up the solution:
hang x y = (x <> text " " <> y) <|> (x $$ (text " " <> y))

Phil countered that he could do the same, but JP answered that it didn’t have the expected behaviour. I’m not entirely sure what hang does, but in the Hughes-based library (https://hackage.haskell.org/package/pretty-1.1.3.5/docs/Text-PrettyPrint.html), hang :: Doc -> Int -> Doc -> Doc with behaviour like:

Prelude Text.PrettyPrint> hang (text "Foo\nBar") 0 (text "Bar")
Foo
Bar Bar

(not sure what the extra Int argument is for).


(ICFP) Generic Functional Parallel Algorithms: Scan and FFT – Conal Elliott

Arrays are the dominant type for parallel programming, but FP uses a variety of data types. Generic programming decomposes data types into fundamental building blocks (sums, products, composition).

Perfect trees can be represented as an n-wise composition of a functor h.

Prefix sum (left scan)

b_k = \Sigma_{1 \leq i \leq k} a_i
for k = 1, ..., n, e.g., scan [1,2,3,4] = [1,3,6,10]
Define more generally as:

class Functor f => LScan f where
  lscan :: Monoid a => f a -> (f a, a)   -- the scanned structure paired with the total

Instances were then given for the components of a datatype-generic representation (constant types, type variable, sum, product, composition), which gives a datatype-generic scan. (Insight: Do left scans on left vectors, not left scans on right vectors! The latter adds a lot of overhead).
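
For example, the product case might look roughly like this (my guess at the shape, not Conal’s exact code): scan the left component, then scan the right component and shift it by the left total.

import Data.Functor.Product (Product (..))

-- using the LScan class from above
instance (LScan f, LScan g) => LScan (Product f g) where
  lscan (Pair fa ga) = (Pair fa' (fmap (totL <>) ga'), totL <> totR)
    where (fa', totL) = lscan fa
          (ga', totR) = lscan ga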

Data types were described numerically, e.g., 2 is a pair, \stackrel{\leftarrow}{4} is a 4-vector.

Can do the same with the Discrete Fourier Transform. A nice property is that a DFT can be factored into separate parts (Johnson 2010), e.g., a 1D DFT of size N = N1*N2 is equivalent to a 2D DFT of size N1xN2 (I couldn’t see this so well on the slide as I was in the overflow room).


(ICFP) A Unified Approach to Solving Seven Programming Problems (Functional Pearl) – William E. Byrd, Michael Ballantyne, Gregory Rosenblatt, Matthew Might

Started off referencing http://matt.might.net/articles/i-love-you-in-racket/ – 99 ways to say “I love you” in Racket (all based on different list operations). But what if you wanted more, for some other language? How could you generate this, to say get 1000, or 10000 such examples? The solution put forward here is to use miniKanren, the constraint logic language. The eval function is turned into a relation which can then be “run” in reverse (i.e., run 99 (q) (evalo q '(I love you)) gives 99 S-expressions that evaluate to (I love you)). What about generating quines (programs that evaluate to themselves)? (run 1 (e) (evalo e e)) produced just the 0 expression, and getting three results gave just 0, #t and #f, but with 4 it produced a more interesting quine.

Then, William showed (run n (p q) (evalo p q) (evalo q p) (/= q p)) generating “twines” (pairs of mutually producing programs). This was then applied to a proof checker to turn it into an automated theorem prover! The same technology was then used for program synthesis (although some additional work was needed under the hood to make it fast enough).


(FSCD) Relating System F and λ2: A Case Study in Coq, Abella and Beluga – Jonas Kaiser, Brigitte Pientka, Gert Smolka

System F (Girard ’72): two-sorted, types and terms, based on the presentation of System F in Harper ’13. Meanwhile, the study of CC led to Pure Type Systems – System F appears in the Lambda Cube at the corner \lambda2. This work shows the formal relation (that indeed these two systems are equivalent). A proof is partially given in Geuvers ’93, but a full Coq formalisation was given by Kaiser, Tebbi and Smolka at POPL 2017. In this work, the proof is replayed across three tools to provide a useful comparison of the systems: Coq, Abella and Beluga.

A complication is that the syntax is non-uniform on the System F side: \Pi x : a . b in PTS can correspond to A \rightarrow B and \forall X . B. Have to also keep track of context assumptions. One (of many) “topics of interest” in this comparison is how to manage contexts (which is interesting to me as I am quite interested in CMTT in Beluga).

The approaches were:

  • Coq – first-order de Bruijn indices, parallel substitutions (Autosubst library), and invariants (e.g, a context can be extended with a new term variable). Inductive predicate for typing judgments. Traversal of binders requires context adjustments.
  • Abella – HOAS, nominal quantification (fresh names handled internally), relation proof search [Abella has two layers: specification and logic]. Contexts represented by lists (as in Coq). A compound inductive predicate is define to relate the different context representations (and to keep them all consistent).
  • Beluga – HOAS, first-class contexts (via Contextual Modal Type Theory) and context schemas. Objects K (e.g., terms, types, derivations) are always paired with a first-class context \Gamma that gives them meaning [\Gamma \vdash K]. There is no concept of free variable, e.g., in Coq 0 \vdash 0_{ty} \rightarrow 0_{ty} \textsf{ty} \Rightarrow \bot is provable but in Beluga \bullet \vdash 0 is not even well formed (accessing the first variable 0 (think de Bruijn) in an empty context). Context schemas provide a way to give the structure of the typed contexts (as a dependent record).

(ICFP) A Framework for Adaptive Differential Privacy – Daniel Winograd-Cort, Andreas Haeberlen, Aaron Roth, Benjamin C. Pierce

Associated GitHub: https://github.com/dwincort/AdaptiveFuzz

They created a new language, Adaptive Fuzz for adaptive differential privacy (DP), uses Fuzz type checking to verify individual pieces in a piecewise static way.

DFuzz is a dependently typed version of Fuzz; can we use this? Privacy cost can depend on feature selection, which could depend on values in the database: but this is practically useless statically, as we don’t know what is in the database. We can still get static guarantees if it is done piece-wise in two modes: data mode (access to sensitive data, uses the Fuzz type checker to provide DP proofs) and adaptive mode (computation between pieces). We end up with “light” dependent types.

These two modes have the same syntax though (code it one works in the other). Here is an example type for a function that scales one number by the other (multiplication by abs):

\textsf{scale} \; (c : \mathbb{R}) \; (n : [\textsf{abs} \; c] \mathbb{R}) : \mathbb{R}

(scale the second number by the first argument). In adaptive mode the annotations are ignored, in data mode the first argument is forced to be a constant (partial evaluation happens under the hood so we don’t need dependent types really).

The general scheme is: the analyst (program) writes data mode code, it is partially evaluated and the Fuzz type checker works out a cost, which is then compared against a privacy filter (where/how is this defined?). Can write a gradient descent algorithm with different levels of randomness provided by different privacy budgets (how much information do we want to reveal at the end, e.g., if the budget is infinite then you get no noise in the result). (Another example in the paper is feature selection.)

This seems like a really interesting mix of a static and semi-static approach (via partial evaluation). I wonder how this relates to fine-grained cost-tracking type systems?


(ICFP) – Symbolic Conditioning of Arrays in Probabilistic Programs –
Praveen Narayanan, Chung-chieh Shan

(tag line: Bayesian inference via program transformation).
Say you are benchmarking some iterative code, which you model by a * n + b for some startup cost b, n iterations, and per-iteration cost a. You might then collect some results and do linear regression (best-fit line) to see if this model is confirmed. Bayesian linear regression gives you many lines for different kinds of fit.

  1. Write a generative model, e.g., normally distributed gradients (and intercepts) to generate a set of possible lines, to which we then maybe add some normally distributed noise for each point (k measurements).
  2. Observe running times
  3. Infer distribution of lines. Delete those candidates from the generative model that weren’t observed (from step 2), keep the ones that were.

Bayesian inference is compositional: you can build it from different components, by treating distributions as programs (e.g., a program normal (a * n_1 + b) gives us one of the lines in the generative model) and then transforming the program. A program transformation implements the measure-theoretic notion of “disintegration”, but the existing approach for this is problematic (I wasn’t quite clear on how). A new transformation approach is proposed which produces a smaller output program via some additional combinators that replace an unrolled set of k observations with a replication-like combinator (plate).
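To make the “distributions as programs” view concrete, here is a rough Haskell sketch of the generative model above; normal is a hypothetical mean/standard-deviation sampling primitive passed in as a parameter (not the paper’s Hakaru code):

genLine :: Monad dist
        => (Double -> Double -> dist Double)   -- normal mean stddev (hypothetical)
        -> [Double]                            -- the n_i (e.g. iteration counts)
        -> dist (Double, Double, [Double])
genLine normal ns = do
  a  <- normal 0 10                            -- gradient: per-iteration cost
  b  <- normal 0 10                            -- intercept: startup cost
  ys <- mapM (\n -> normal (a * n + b) 1) ns   -- noisy observation per point
  return (a, b, ys)

Inference then conditions this program on the observed running times to recover a distribution over (a, b).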

Definitional interpreters: programs that take ASTs and run them, giving an executable semantics. Key challenge: a parameterised interpreter that recovers both concrete and abstract semantics.

A concrete interpreter for the lambda calculus: exp -> m (val), where m is a monad which (at least) captures the environment and the store (used for representing the variable environment). The definition is written non-recursively, “unfixed” (unrolled), and can be fixed later (e.g., by applying the Y combinator) to run the interpreter. By leaving it unfixed, we can interpose different behaviour at the recursive calls.
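As an illustration, here is my own Haskell sketch of an “unfixed” evaluator (not the paper’s code; I pass the environment explicitly rather than through the monad to keep it short):

import Data.Function (fix)
import qualified Data.Map as M

data Expr = Num Int | Var String | Add Expr Expr
          | Lam String Expr | App Expr Expr

data Val  = IntV Int | Clo String Expr Env
type Env  = M.Map String Val

-- The interpreter written open-recursively: all recursive calls go through
-- 'ev', so recursion points can be intercepted before tying the knot.
eval :: MonadFail m => (Env -> Expr -> m Val) -> Env -> Expr -> m Val
eval _  _   (Num n)     = return (IntV n)
eval _  env (Var x)     = return (env M.! x)
eval ev env (Add e1 e2) = do IntV n <- ev env e1
                             IntV m <- ev env e2
                             return (IntV (n + m))
eval _  env (Lam x b)   = return (Clo x b env)
eval ev env (App f a)   = do Clo x b cenv <- ev env f
                             v <- ev env a
                             ev (M.insert x v cenv) b

-- Concrete semantics: just close the recursion with 'fix'.
run :: Expr -> Maybe Val
run = fix eval M.empty

Interposing abstract behaviour then amounts to wrapping eval before applying fix, rather than editing its recursive cases.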

To make the interpreter abstract, abstract the primitive operations and types; e.g., integers are extended with an ‘abstract’ value, i.e., \mathbb{Z} \cup \{\texttt{'N}\}, where 'N represents the abstract integer. The semantics is then made non-deterministic, so that an iszero? predicate on 'N splits into two execution traces. But it’s partial (could run forever).
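A tiny sketch of the abstraction step for integers (again my own, using lists for the non-determinism):

-- Extend integers with an abstract value 'N.
data AInt = Conc Integer   -- a known, concrete integer
          | AnyInt         -- the abstract integer 'N
          deriving (Show, Eq)

-- iszero? is deterministic on concrete values; on 'N it splits into both
-- possible answers, yielding two execution traces.
isZero :: AInt -> [Bool]
isZero (Conc n) = [n == 0]
isZero AnyInt   = [True, False]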

How do we make it total (always terminating)? Looking for states that have already been seen before is sufficient for termination but unsound for the abstraction. If you evaluate fact 'N then one branch fails (non-terminating, as it revisits a previous state) and the 1 branch succeeds, so the result is just {1} (not ideal). Instead, look in a “cache” of precomputed values to return something like [[fact 'N]] = \{1\} \cup \{\texttt{'N} \times [[fact 'N]]\}. Intercept the recursion points to evaluate through the cache and to stop when a previously computed value is hit (sounds space intensive?).


(ICFP) – On the Expressive Power of User-Defined Effects: Effect Handlers, Monadic Reflection, Delimited Control – Yannick Forster, Ohad Kammar, Sam Lindley, Matija Pretnar

Say you want to write a program like:

toggle = { x <- get!
           y <- not! x
           put! y
           x }

If we don’t have effects, then we can do explicit state passing, but that requires a full rewrite of the program. Ideally we want only local transformations. (Ohad showed this in three different styles, see below.)
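In Haskell terms the contrast is roughly the following (my own sketch, using Control.Monad.State rather than handlers, just to show the global-rewrite versus local-operations difference):

import Control.Monad.State

-- Explicit state passing: every definition has to thread the state through,
-- taking the current state and returning (result, new state).
toggleExplicit :: Bool -> (Bool, Bool)
toggleExplicit s = let x = s
                       y = not x
                   in (x, y)

-- With a state effect, only the effectful operations get/put are visible;
-- no plumbing, so changes stay local.
toggle :: State Bool Bool
toggle = do x <- get
            put (not x)
            return x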

Relative expressiveness in language design; compare/contrast: (1 – Eff) algebraic effects and handlers, (2 – Mon) monads, (3 – Del) delimited control. They did this by extending CBPV in these three directions (formalised in Abella) and defining macro translations between every pair of these extended languages. Expressivity is stated as formal translations between calculi. Then this was considered in the typed setting. Interestingly, there is no typed macro translation from Eff to Del nor from Eff to Mon.

[There is a large design space which yields lots of different languages, this study is a start. Inexpressivity is brittle: adding other language features can change the result.]

In the Eff version, we give standard state handlers to implement get and put. Its type contains the effect-system-like information State = {get : 1 -> bit, put : bit -> 1}, where toggle : U_State F bit.

In the Mon version, monadic reflection is used with a state monad.
(Q: put was of type U (bit -> F) but I thought the monad would be T = U F?)

(Q: In the translation from Mon -> Eff, how do you generate the effect type if the source program only uses put or get? I guess it just maps to the set of all effect operations.)

(Q: The talk showed just simple state effects; are the results shown for any other kind of effect?)


(ICFP) – Imperative Functional Programs That Explain Their Work – Wilmer Ricciotti, Jan Stolarek, Roly Perera, James Cheney

https://github.com/jstolarek/slicer

Slicing can be used to explain particular (parts of) a program output: which parts of the source code contributed to a value. Extend TML (Transparent ML) to Imperative TML.

Construct backwards and forwards slicing as a Galois connection (between two lattices). The lattices are based on the idea of definedness, where ‘holes’ make programs more undefined; e.g., 1 + 2 is above 1 + \Box and \Box + 2 (partial expressions). Forward slicing preserves meets in the lattice; backward slicing should be consistent and minimal with respect to forward slicing (I missed what this meant exactly, but I talked to Janek later to clarify: consistency is x \leq \textsf{fwd}(\textsf{bwd}(x))). The idea then is that we have \textsf{bwd} : values_\Box \rightarrow expr_\Box and \textsf{fwd} : expr_\Box \rightarrow values_\Box which form a Galois connection, i.e., \textsf{bwd}(x) \leq y  \Leftrightarrow x \leq \textsf{fwd}(y). [Given one of the slicings we don’t necessarily get the other for free.]
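A toy model of the definedness ordering (my own Haskell sketch on a tiny expression language, not TML itself):

-- Partial expressions: Hole is the least element of the definedness lattice.
data PExpr = Hole
           | Lit Int
           | Add PExpr PExpr
           deriving (Show, Eq)

-- e1 `leq` e2 iff e1 is "less defined" than (or equal to) e2, i.e. e1 is e2
-- with some sub-parts replaced by holes. So (Add (Lit 1) Hole) `leq`
-- (Add (Lit 1) (Lit 2)) holds, matching "1 + 2 is above 1 + []".
leq :: PExpr -> PExpr -> Bool
leq Hole      _         = True
leq (Lit n)   (Lit m)   = n == m
leq (Add a b) (Add c d) = leq a c && leq b d
leq _         _         = False

-- The Galois connection from the talk would then be the property
--   bwd x `leq` y  <=>  x `leq` fwd y
-- with the analogous ordering on partial values on the other side.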

Given a small program l1 := 0; (!l1, !l2), a partial version of it \Box; (!l1, !l2), and an initial store [l1 -> 1, l2 -> 2], the partial program should forward slice to (\Box, 2) by propagating the “hole” through the store.


(ICFP) Effect-Driven QuickChecking of Compilers – Jan Midtgaard, Mathias Nygaard Justesen, Patrick Kasting, Flemming Nielson, Hanne Riis Nielson

They made a generator for OCaml programs: useful for automated testing, e.g., comparing two compilers against each other (compile -> run -> diff; any difference is suspicious).

It generated the program let k = (let i = print_newline () in fun q -> fun i -> "") () in 0, which found a bug: different behaviour between the native code generator and the bytecode generator due to effects being delayed (indefinitely). Another (related) example was found, due to effects being removed.

Program generation, how to do it:

  • Try: generate arbitrary strings?! Most won’t lex or parse.
  • Instead, follow the grammar (see Celentano et al. ’80). Most won’t type check!
  • Instead, follow the simply-typed lambda calculus (Palka et al. ’11) by reading a typing relation bottom-up. This will make it through the type checker (I guess you can generalise the typing rules; see the sketch after this list).
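A minimal QuickCheck-flavoured sketch of such type-directed generation for STLC (my own illustration of the Palka et al. idea, not the paper’s OCaml generator):

import Test.QuickCheck

data Type = TInt | TFun Type Type deriving (Show, Eq)
data Term = Lit Int | Var Int | Lam Type Term | App Term Term deriving Show

-- Generate a term of the requested type in the given context (de Bruijn),
-- by reading the typing rules bottom-up so terms are well typed by construction.
genTerm :: Int -> [Type] -> Type -> Gen Term
genTerm size ctx ty = oneof (always ++ (if size > 0 then deeper else []))
  where
    -- Rules that always apply: literals at TInt, in-scope variables of the
    -- right type, lambdas at function types.
    always =
      [ Lit <$> arbitrary | ty == TInt ] ++
      [ pure (Var i) | (i, t) <- zip [0 ..] ctx, t == ty ] ++
      [ Lam a <$> genTerm size (a : ctx) b | TFun a b <- [ty] ]
    -- Application: invent an argument type, then generate both sides smaller.
    deeper =
      [ do a <- genType (size `div` 2)
           f <- genTerm (size - 1) ctx (TFun a ty)
           x <- genTerm (size - 1) ctx a
           pure (App f x) ]

genType :: Int -> Gen Type
genType 0 = pure TInt
genType n = oneof [ pure TInt
                  , TFun <$> genType (n `div` 2) <*> genType (n `div` 2) ]

For example, generate (genTerm 10 [] TInt) produces a closed, well-typed term of base type; the paper’s generator additionally threads effect information, as described below.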

Observed behaviour in OCaml depends on evaluation order (when effects are involved). OCaml bytecode uses right-to-left, but the native code backend sometimes uses left-to-right (hm!). The specification is itself a bit loose. Can we avoid generating such programs, since their behaviour is not well specified anyway?

  • Instead, follow a type-and-effect system which has one bit for marking pure/effectful (a boolean-lattice effect system). Extend this to a pair of bits, where the second bit marks order dependence (extending the effect algebra a bit more). The goal is then to generate programs which may have effects but may not have order-dependent effects.

Have an effect-dependent soundness theorem (that the effect bits correctly anticipate the effects).

Next, there is a type-preserving shrinker to reduce huge, ugly generated programs to a minimal (much nicer) example: e.g., replace a pure integer expression by a literal 0, large literals by small literals, beta-redexes by let bindings, etc. Shrunk tests are checked to see whether they still exhibit a difference. A sketch of such shrinking steps is below.
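A sketch of what such shrinking steps could look like (my own illustration on a toy STLC syntax extended with a let form, not the paper’s shrinker):

data Type = TInt | TFun Type Type deriving (Show, Eq)
data Term = Lit Int | Var Int | Lam Type Term | App Term Term
          | Let Term Term        -- let x = e1 in e2 (de Bruijn binding)
          deriving Show

-- Each candidate returned here would be re-run through both compilers to
-- check it still exhibits the behavioural difference.
shrinkTerm :: Term -> [Term]
shrinkTerm (Lit n)           = [Lit 0 | n /= 0]      -- shrink literals towards 0
shrinkTerm (App (Lam _ b) x) = [Let x b]             -- beta-redex => let binding
shrinkTerm (App f x)         = [App f' x  | f' <- shrinkTerm f] ++
                               [App f  x' | x' <- shrinkTerm x]
shrinkTerm (Lam t b)         = [Lam t b'  | b' <- shrinkTerm b]
shrinkTerm (Let e b)         = [Let e' b  | e' <- shrinkTerm e] ++
                               [Let e  b' | b' <- shrinkTerm b]
shrinkTerm _                 = []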

A question was asked whether the effect-based mechanism could be used to rule out other allowed differences, e.g., in floating point. Jan answered that floating-point values aren’t generated anyway, but it is an interesting idea.

Another question: how do you avoid generating non-terminating programs? Jan explained that since it is STLC (+ extension) only total terms are generated.

An afternoon of DTT/theorem provers: Agda and Coq

Note from 23rd August, 2017
I found this draft blog post lying around, written in the spring of 2015 while I was working at Imperial College London as a Research Associate in the Mobility Reading Group with Nobuko Yoshida. This was the fruit of a discussion with Tiago Cogumbreiro where we were comparing Coq and Agda for theorem proving. I believe we had a few more “side-by-side” comparisons written down, but this is all I have here: simple code for showing that addition of two even numbers yields an even number. It might prove to be a useful reference for someone else (at the time, I had a hard time remembering Coq syntax as I didn’t use it very often, and Tiago kindly gave me some pointers) so I have posted it “as is” now, over two years later.


Agda:

data Nat : Set where
  zero : Nat
  succ : Nat -> Nat

data Even : Nat -> Set where
  even_base : Even zero
  even_step : forall {n} -> Even n -> Even (succ (succ n))

Coq:

Inductive Nat :=
  | zero : Nat
  | succ : Nat -> Nat.

Inductive Even : Nat -> Prop :=
  | even_base : Even zero
  | even_step : forall n, Even n -> Even (succ (succ n)).

Agda:

_+_ : Nat -> Nat -> Nat
zero + n = n
(succ n) + m = succ (n + m)

Coq:

Fixpoint add (n:Nat) (m:Nat) :=
  match n with
    | zero => m
    | succ n' => succ (add n' m)
  end.

Agda:

even_sum : forall {n m} -> Even n -> Even m -> Even (n + m)
even_sum even_base x      = x
even_sum (even_step x) y  = even_step (even_sum x y)

Coq:

Lemma add_succ:
  forall n m, add (succ n) m = succ (add n m).
Proof.
  auto.
Qed.

Lemma even_sum :
  forall n m, Even n -> Even m -> Even (add n m).
Proof.
  intros n m even_n even_m.
  induction even_n.
   (* case even_base *)
   - simpl.
     assumption.
   (* case even_step *)
   - repeat (rewrite add_succ).
     apply even_step.
     assumption.
Qed.