Scrap Your Reprinter

Back in 2013, Andrew Rice and I were doing some initial groundwork on how to build tools to help scientists write better code (e.g., with the help of refactoring tools and verification tools). We talked to a lot of scientists who wrote Fortran almost exclusively, so we started creating infrastructure for building tools to work on Fortran. This was the kernel of the CamFort project for which we got an EPSRC grant in 2015 (which is ongoing). The CamFort tool now has a couple of fairly well developed specification/verification features, and a few refactoring features. Early on, I started building everything in Haskell using the brilliant uniplate library, based on the Scrap Your Boilerplate [1] work. This helped us to get the tool off the ground quickly by utilising the power of datatype generic programming. Fairly quickly we hit upon an interesting problem with building refactoring tools: how do you output source code for a refactored AST whilst preserving all the original comments and white space? It is not enough just to pretty print the AST, unless your AST contains all the comments and layout information. Building a parser to capture all this information is extremely hard, and we use a parser generator which limits flexibility (but is really useful for a large grammar). Another approach is to output patch/edit information for the original source code, calculated from the AST.

In the end, I came up with a datatype generic algorithm which I call the reprinter. The reprinter takes the original source code and an updated AST (which contains location information) and maps them into a new piece of source code. Here is an illustration which I’ll briefly explain:


Some source text (arithmetic code in prefix notation here) is parsed into an AST. The AST contains the “spans” of each syntactic fragment: the start position and end position in the original source code (for simplicity in this illustration, just the column number is represented). Some transformation/refactoring is applied next. In this case, the transformation rewrites redundant additions of 0, which happens in the node coming from source locations 10 to 16. The refactored node is marked in red. The reprinting then runs, stitching together the original source code with the updated source tree. A pretty printer is used to generate code for any new nodes, but all the original source text for the other nodes is preserved. The cool thing about this algorithm is that it is datatype generic: it works for any datatype, with some modest side conditions about storing source spans. The implementation uses the Scrap Your Zipper [2] library to do a context-dependent generic traversal of a datatype. In essence, the algorithm is similar to what one might do if you were to spit out edit information from an AST, then apply this to a piece of source text. But, the algorithm does this generically, and in a single simultaneous pass of the AST and the input source text.

I’ve always thought it was a cute and useful algorithm, which combined some cool techniques from functional programming.  As with all the “Scrap Your X” libraries it saves huge amounts of time and messing around, especially when your AST representation keeps changing (which it did/does for us). The algorithm is really useful anywhere you need to update human-written source code in a layout-preserving way; for example, IDEs and refactoring tools but also in interactive theorem provers and program synthesis tools, where you need to synthesise source text into some existing human-written code. This is one of the ways it is used in CamFort, where specifications are synthesised from code analysis data and then inserted as comments into user code.

This summer, I was fortunate enough to have the resources to hire several interns. One of the interns, Harry Clarke, (amongst other things) worked with me to tidy up the code for the reprinter, add some better interfaces, make it usable as a library for others, and write it all up. He presented the work at IFL 2017 and the pre-proceedings version of the paper is available. We are working on a post-proceedings version for December, so any comments gratefully appreciated.

[1] Lämmel, Ralf, and Simon Peyton Jones. Scrap your boilerplate: a practical design pattern for generic programming. Vol. 38. No. 3. ACM, 2003.

[2] Adams, Michael D. “Scrap your zippers: a generic zipper for heterogeneous types.” Proceedings of the 6th ACM SIGPLAN workshop on Generic programming. ACM, 2010.


Automatic SIMD Vectorization for Haskell and ICFP 2013

I had a great time at ICFP 2013 this year where I presented my paper “Automatic SIMD Vectorization for Haskell”, which was joint work with Leaf Petersen and Neal Glew of Intel Labs. The full paper and slides are available online. Our paper details the vectorization process in the Intel Labs Haskell Research Compiler (HRC) which gets decent speedups on numerical code (between 2-7x on 256-bit vector registers). It was nice to be able to talk about HRC and share the results. Paul (Hai) Liu also gave a talk at the Haskell Symposium which has more details about HRC than the vectorization paper (see the paper here with Neal Glew, Leaf Petersen, and Todd Anderson). Hopefully there will be a public release of HRC in future.  

Still more to do

It’s been exciting to see the performance gains in compiled functional code over the last few years, and its encouraging to see that there is still much more we can do and explore. HRC outperforms GHC on roughly 50% of the benchmarks, showing some interesting trade-offs going on in the two compilers. HRC is particularly good at compiling high-throughput numerical code, thanks to various strictness/unboxing optimisations (and the vectorizer), but there is still more to be done.

Don’t throw away information about your programs

One thing I emphasized in my talk was the importance of keeping, not throwing away, the information encoded in our programs as we progress through the compiler stack. In the HRC vectorizer project, Haskell’s Data.Vector library was modified to distinguish between mutable array operations and “initializing writes”, a property which then gets encoded directly in HRC’s intermediate representation. This makes vectorization discovery much easier. We aim to preserve as much effect information around as possible in the IR from the original Haskell source.

This connected nicely with something Ben Lippmeier emphasised in his Haskell Symposium paper this year (“Data Flow Fusion with Series Expressions in Haskell“, joint with Manuel Chakravarty, Gabriele Keller and Amos Robinson). They provide a combinator library for first-order non-recursive dataflow computations which is guaranteed to be optimised using flow fusion (outperforming current stream fusion techniques). The important point Ben made is that, if your program fits the pattern, this optimisation is guaranteed. As well as being good for the compiler, this provides an obvious cost model for the user (no more games trying to coax the compiler into optimising in a particular way).

This is something that I have explored in the Ypnos array language, where the syntax is restricted to give (fairly strong) language invariants that guarantee parallelism and various optimisations, without undecidable analyses. The idea is to make static as much effect and coeffect (context dependence) information as possible. In Ypnos, this was so successful that I was able to encode the Ypnos’ language invariant of no out-of-bounds array access directly in Haskell’s type system (shown in the DSL’11 paper; this concept was also discussed briefly in my short language design essay).

This is a big selling point for DSLs in general: restrict a language such that various program properties are statically decidable, facilitating verification and optimisation.

Ypnos has actually had some more development in the past year, so if things progress further, there may be some new results to report on. I hope to be posting again soon about more research, including the ongoing work with Tomas Petricek on coeffect systems, and various other things I have been playing with. – D

Paper: Haskell Type Constraints Unleashed

Tom Schrijvers and I have a new paper describing extensions to Haskell’s type-constraint term language, which considerably increases its flexibility. These extensions are particularly useful when writing polymorphic EDSLs in Haskell, thus expanding Haskell’s capacity for embedding DSLs.


The popular Glasgow Haskell Compiler extends the Haskell 98 type system with several powerful features, leading to an expressive language of type terms. In contrast, constraints over types have received much less attention, creating an imbalance in the expressivity of the type system. In this paper, we rectify the imbalance, transferring familiar type-level constructs, synonyms and families, to the language
of constraints, providing a symmetrical set of features at the type-level and constraint-level. We introduce constraint synonyms and constraint families, and illustrate their increased expressivity for improving the utility of polymorphic EDSLs in Haskell, amongst other examples. We provide a discussion of the semantics of the new features relative to existing type system features and similar proposals, including details of termination.

[draft pdf submitted to FLOPS 2010]
Any feedback is most welcome. Blog posts to follow if you are too lazy to read the paper.

Paper: Ypnos, Declarative Parallel Structured Grid Programming

Max Bolingbroke, Alan Mycroft, and I have written a paper on a new DSL for programming structured grid computations with the view to parallelisation, called Ypnos, submitted to DAMP ’10]


A fully automatic, compiler-driven approach to parallelisation can result in unpredictable time and space costs for compiled code. On the other hand, a fully manual, human-driven approach to parallelisation can be long, tedious, prone to errors, hard to debug, and often architecture-specific. We present a declarative domain-specific language, Ypnos, for expressing structured grid computations which encourages manual specification of causally sequential operations but then allows a simple, predictable, static analysis to generate optimised, parallel implementations. We introduce the language and provide some discussion on the theoretical aspects of the language semantics, particularly the structuring of computations around the category theoretic notion of a comonad.

Any feedback is welcome.