Still more to do
It’s been exciting to see the performance gains in compiled functional code over the last few years, and it’s encouraging to see that there is still much more we can do and explore. HRC outperforms GHC on roughly 50% of the benchmarks, which shows that the two compilers make some interesting tradeoffs. HRC is particularly good at compiling high-throughput numerical code, thanks to various strictness/unboxing optimisations (and the vectorizer), but there is still more to be done.
Don’t throw away information about your programs
One thing I emphasized in my talk was the importance of keeping, not throwing away, the information encoded in our programs as we progress through the compiler stack. In the HRC vectorizer project, Haskell’s Data.Vector library was modified to distinguish between mutable array operations and “initializing writes”, a property which then gets encoded directly in HRC’s intermediate representation. This makes vectorization discovery much easier. We aim to preserve in the IR as much effect information as possible from the original Haskell source.
This connected nicely with something Ben Lippmeier emphasised in his Haskell Symposium paper this year (“Data Flow Fusion with Series Expressions in Haskell”, joint with Manuel Chakravarty, Gabriele Keller and Amos Robinson). They provide a combinator library for first-order non-recursive dataflow computations which is guaranteed to be optimised using flow fusion (outperforming current stream fusion techniques). The important point Ben made is that, if your program fits the pattern, this optimisation is guaranteed. As well as being good for the compiler, this provides an obvious cost model for the user (no more games trying to coax the compiler into optimising in a particular way).
This is something that I have explored in the Ypnos array language, where the syntax is restricted to give (fairly strong) language invariants that guarantee parallelism and various optimisations, without undecidable analyses. The idea is to make static as much effect and coeffect (context dependence) information as possible. In Ypnos, this was so successful that I was able to encode Ypnos’ language invariant of no out-of-bounds array access directly in Haskell’s type system (shown in the DSL’11 paper; this concept was also discussed briefly in my short language design essay).
This is a big selling point for DSLs in general: restrict a language such that various program properties are statically decidable, facilitating verification and optimisation.
Ypnos has actually had some more development in the past year, so if things progress further, there may be some new results to report on. I hope to be posting again soon about more research, including the ongoing work with Tomas Petricek on coeffect systems, and various other things I have been playing with. – D
The essay was a lot of fun to write (I somehow end up referencing books from the 19th century, a sci-fi novel, 1984, and The Matrix!) and is a kind of mission statement (at least for myself) for language design; it is available on my research page here. I hope that it provides some food for thought for others interested in, or working in, language design.
In my previous post I discussed the new constraint kinds extension to GHC, which provides a way to get type-indexed constraint families in GHC/Haskell. The extension provides some very useful expressivity. In this post I’m going to explain a possible use of the extension.
In Haskell the Functor class is misleadingly named, as it actually captures the notion of an endofunctor, not functors in general. This post shows a use of constraint kinds to define a type class of exofunctors; that is, functors that are not necessarily endofunctors. I will explain what all of this means.
This example is just one from a draft note (edit July 2012: draft note subsumed by my TFP 2012 submission) explaining the use of constraint families, via the constraint kinds extension, for describing abstract structures from category theory that are parameterised by subcategories, including non-endofunctors, relative monads, and relative comonads.
I will try to concisely describe any relevant concepts from category theory, through the lens of functional programming, although I’ll elide some details.
The starting point of the idea is that programs in Haskell can be understood as providing definitions within some category, which we will call Hask. Categories comprise a collection of objects and a collection of morphisms which are mappings between objects. Categories come equipped with identity morphisms for every object and an associative composition operation for morphisms (see Wikipedia for a more complete, formal definition). For Hask, the objects are Haskell types, morphisms are functions in Haskell, identity morphisms are provided by the identity function, and composition is the usual function composition operation. For the purpose of this discussion we are not really concerned about the exact properties of Hask, just that Haskell acts as a kind of internal language for category theory, within some arbitrary category Hask (Dan Piponi provides some discussion on this topic).
Given some category C, a subcategory of C comprises a subcollection of the objects of C and a subcollection of the morphisms of C which map only between objects in the subcollection of this subcategory.
We can define for Hask a singleton subcategory for each type, which has just that one type as an object and functions from that type to itself as morphisms e.g. the Int-subcategory of Hask has one object, the Int type, and has functions of type Int → Int as morphisms. If this subcategory has all the morphisms Int → Int it is called a full subcategory. Is there a way to describe “larger” subcategories with more than just one object?
Via universal quantification we could define the trivial (“non-proper”) subcategory of Hask with objects of type a (implicitly universally quantified) and morphisms a -> b, which is just Hask again. Is there a way to describe “smaller” subcategories with fewer than all the objects, but more than one object? Yes. For this we use type classes.
The instances of a single parameter type class can be interpreted as describing the members of a set of types (or a relation on types for multi-parameter type classes). In a type signature, a universally quantified type variable constrained by a type class constraint represents a collection of types that are members of the class. E.g. for the Eq class, the following type signature describes a collection of types for which there are instances of Eq:
Eq a => a
The members of Eq are a subcollection of the objects of Hask. Similarly, the type:
(Eq a, Eq b) => (a -> b)
represents a subcollection of the morphisms of Hask mapping between objects in the subcollection of objects which are members of Eq. Thus, the Eq class defines an Eq-subcategory of Hask with the above subcollections of objects and morphisms.
Type classes can thus be interpreted as describing subcategories in Haskell. In a type signature, a type class constraint on a type variable specifies the subcategory whose objects the type variable ranges over. We will go on to use the constraint kinds extension to define constraint-kinded type families, allowing structures from category theory to be parameterised by subcategories, encoded as type class constraints. We will use functors as the example in this post (more examples here).
In category theory, a functor provides a mapping between categories e.g. F : C → D, mapping the objects and morphisms of C to objects and morphisms of D. Functors preserve identities and composition between the source and target category (see Wikipedia for more). An endofunctor is a functor where C and D are the same category.
The type constructor of a parametric data type in Haskell provides an object mapping from Hask to Hask e.g. given a data type data F a = ... the type constructor F maps objects (types) of Hask to other objects in Hask. A functor in Haskell is defined by a parametric data type, providing an object mapping, and an instance of the well-known Functor class for that data type:
class Functor f where
  fmap :: (a -> b) -> f a -> f b
which provides a mapping on morphisms, called fmap. There are many examples of functors in Haskell, for example lists, where the fmap operation is the usual map operation, or the Maybe type. However, not all parametric data types are functors.
It is well-known that the Set data type in Haskell cannot be made an instance of the Functor class. The Data.Set library provides a map operation of type:
Set.map :: (Ord a, Ord b) => (a -> b) -> Set a -> Set b
The Ord constraint on the element types is due to the implementation of Set using balanced binary trees, thus elements must be comparable. Whilst the data type is declared polymorphic, the constructors and transformers of Set allow only elements of a type that is an instance of Ord.
Using Set.map to define an instance of the Functor class for Set causes a type error:
instance Functor Set where
  fmap = Data.Set.map
...
foo.lhs:4:14:
No instances for (Ord b, Ord a)
arising from a use of `Data.Set.map'
In the expression: Data.Set.map
In an equation for `fmap': fmap = Data.Set.map
In the instance declaration for `Functor Set'
The type error occurs as the signature for fmap has no constraints, or the empty (always true) constraint, whereas Set.map has Ord constraints. A mismatch occurs and a type error is produced.
The type error is however well justified from a mathematical perspective.
First of all, the name Functor is a misnomer; the class actually describes endofunctors, that is functors which have the same category for their source and target. If we understand type class constraints as specifying a subcategory, then the lack of constraints on fmap means that Functor describes endofunctors Hask → Hask.
The Set data type is not an endofunctor; it is a functor which maps from the Ord-subcategory of Hask to Hask. Thus Set :: Ord → Hask. The class constraints on the element types in Set.map declare the source subcategory of the Set functor, to which the morphisms belong.
Can we define a type class which captures functors that are not necessarily endofunctors, but may have distinct source and target categories? Yes, using an associated type family of kind Constraint.
The following ExoFunctor type class describes a functor from a subcategory of Hask to Hask:
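{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Constraint)  -- the kind of constraints (recent versions of base)
import Prelude hiding (fmap)   -- avoid a clash with the fmap of the Functor class
class ExoFunctor f where
  type SubCat f x :: Constraint
  fmap :: (SubCat f a, SubCat f b) => (a -> b) -> f a -> f b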
The SubCat family defines the source subcategory for the functor, which depends on f. The target subcategory is just Hask, since f a and f b do not have any constraints.
We can now define the following instance for Set:
instance ExoFunctor Set where
  type SubCat Set x = Ord x
  fmap = Set.map
Endofunctors can also be made an instance of ExoFunctor using the empty constraint e.g.:
instance ExoFunctor [] where
  type SubCat [] a = ()
  fmap = map
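Putting the class and the two instances together gives a small module that can be loaded into GHCi. This is only a sketch: the import of Constraint from Data.Kind assumes a recent version of base, and the helpers parities and lengths are hypothetical names added here to show fmap being used at both instances.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Constraint)
import Data.Set (Set)
import qualified Data.Set as Set
import Prelude hiding (fmap)

-- Functors from a subcategory of Hask (chosen per instance) to Hask.
class ExoFunctor f where
  type SubCat f x :: Constraint
  fmap :: (SubCat f a, SubCat f b) => (a -> b) -> f a -> f b

-- Set is a functor from the Ord-subcategory of Hask.
instance ExoFunctor Set where
  type SubCat Set x = Ord x
  fmap = Set.map

-- Lists are an endofunctor: the source subcategory is all of Hask.
instance ExoFunctor [] where
  type SubCat [] x = ()
  fmap = map

-- Hypothetical helper: SubCat Set reduces to Ord Int and Ord Bool here.
parities :: Set Int -> Set Bool
parities = fmap even

-- Hypothetical helper: the empty constraint imposes nothing on the elements.
lengths :: [String] -> [Int]
lengths = fmap length
```

Note that parities can shrink the set (four integers map to at most two booleans), which is exactly the behaviour of Set.map that needs the Ord constraint on the result type.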
(Aside: one might be wondering whether we should also have a way to restrict the target subcategory to something other than Hask here. By covariance we can always “cast” a functor C → D, where D is a subcategory of some other category E, to C → E without any problems. Thus, there is nothing to be gained from restricting the target to a subcategory, as it can always be reinterpreted as Hask.)
Subcategory constraints are needed when a data type is restricted in its polymorphism by its operations, usually because of some hidden implementational details that have permeated to the surface. These implementational details have until now been painful for Haskell programmers, and have threatened abstractions such as functors, monads, and comonads. Categorically, these implementational restrictions can be formulated succinctly with subcategories, for which there are corresponding structures of non-endofunctors, relative monads, and relative comonads. Until now there has been no succinct way to describe such structures in Haskell.
Using constraint kinds we can define associated type families, of kind Constraint, which allow abstract categorical structures, described via their operations as a type class, to be parameterised by subcategories on a per-instance basis. We can thus define a class of exofunctors, i.e. functors that are not necessarily endofunctors, which we showed here. The other related structures which are difficult to describe in Haskell without constraint kinds, relative monads and relative comonads, are discussed further in a draft note (edit July 2012: draft note subsumed by my TFP 2012 submission). The note includes examples of a Set monad and an unboxed array comonad, both of which expose their implementational restrictions as type class constraints which can be described as subcategory constraints.
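To give a flavour of how the same trick extends to monads, here is a minimal sketch of a monad-like class with a per-instance subcategory constraint, instantiated for Set. It is not taken from the draft note: the names RMonad, RSubCat, unit, bind and neighbours are all hypothetical, and only the operations are shown, not the laws.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Constraint)
import Data.Set (Set)
import qualified Data.Set as Set

-- A monad-like structure whose operations are restricted to a subcategory.
class RMonad m where
  type RSubCat m x :: Constraint
  unit :: RSubCat m a => a -> m a
  bind :: (RSubCat m a, RSubCat m b) => m a -> (a -> m b) -> m b

-- Set only supports these operations on the Ord-subcategory of Hask.
instance RMonad Set where
  type RSubCat Set x = Ord x
  unit = Set.singleton
  bind s k = Set.unions (map k (Set.toList s))

-- Hypothetical example: the integers adjacent to each element of a set.
neighbours :: Set Int -> Set Int
neighbours s = s `bind` \x -> Set.fromList [x - 1, x + 1]
```

As with ExoFunctor, the Ord constraint appears only in the instance, so other instances of the class remain free to choose a different constraint or none at all.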
Any feedback on this post or the draft note is greatly appreciated. Thanks.
Back in 2009 Tom Schrijvers and I wrote a paper entitled Haskell Type Constraints Unleashed [1] which appeared at FLOPS 2010 in April. In the paper we fleshed out the idea of adding constraint synonyms and constraint families to GHC/Haskell, building upon various existing proposals for class families/indexed constraints. The general idea in our paper, and in the earlier proposals, is to add a mechanism to GHC/Haskell to allow constraints, such as type class or equality constraints, to be indexed by a type in the same way that type families and data families allow types to be indexed by types.
As an example of why constraint families are useful, consider the following type class which describes a simple, polymorphic, embedded language in Haskell (in the “finally tagless” style [2]) (this example appears in [1]):
class Expr sem where
  constant :: a -> sem a
  add :: sem a -> sem a -> sem a
Instances of Expr provide different evaluation semantics for the language. For example, we might like to evaluate the language for numerical values, so we might try and write the following instance:
data E a = E {eval :: a}
instance Expr E where
  constant c = E c
  add e1 e2 = E $ (eval e1) + (eval e2)
However, this instance does not type check. GHC returns the type error:
No instance for (Num a)
arising from a use of `+'
In the second argument of `($)', namely `(eval e1) + (eval e2)'
...
The + operation requires the Num a constraint, yet the signature for add states no constraints on type variable a, thus the constraint is never satisfied in this instance. We could add the Num a constraint to the signature of add, but this would restrict the polymorphism of the language: further instances would have this constraint forced upon them. Other useful semantics for the language may require other constraints e.g. Show a for pretty printing. Should we just add more and more constraints to the class? By no means!
Constraint families, as we describe in [1], provide a solution: by associating a constraint family to the class we can vary, or index, the constraints in the types of add and constant by the type of an instance of Expr. The solution we suggest looks something like:
class Expr sem where
  constraint Pre sem a
  constant :: Pre sem a => a -> sem a
  add :: Pre sem a => sem a -> sem a -> sem a
instance Expr E where
  constraint Pre E a = Num a
  ... -- methods as before
Pre is the name of a constraint family that takes two type parameters and returns a constraint, where the first type parameter is the type parameter of the Expr class.
We could add some further instances:
data P a = P {prettyP :: String}
instance Expr P where
  constraint Pre P a = Show a
  constant c = P (show c)
  add e1 e2 = P $ prettyP e1 ++ prettyP e2
At the time of writing the paper I had only a prototype implementation in the form of a preprocessor that desugared constraint families into some equivalent constructions (which were significantly more awkward and ugly of course). There has not been a proper implementation in GHC, or of anything similar. Until now.
At CamHac, the Cambridge Haskell Hackathon, last month, Max Bolingbroke started work on an extension for GHC called “constraint kinds”. The new extension unifies types and constraints such that the only distinction is that constraints have a special kind, denoted Constraint. Thus, for example, the Show class constructor is actually a type constructor, of kind:
Show :: * -> Constraint
For type signatures of the form C => t, the left-hand side is now a type term of kind Constraint. As another example, the equality constraints constructor ~ now has kind:
~ :: * -> * -> Constraint
i.e. it takes two types and returns a constraint.
Since constraints are now just types, existing type system features on type terms, such as synonyms and families, can be reused on constraints. Thus we can now define constraint synonyms via standard type synonyms e.g.
type ShowNum a = (Num a, Show a)
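A constraint synonym is then used like any other constraint. In this small sketch the function showDouble is a hypothetical example, not something from the paper; it needs both halves of the synonym, Num for the addition and Show for the rendering.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- A constraint synonym, defined with an ordinary type synonym declaration.
type ShowNum a = (Num a, Show a)

-- Hypothetical helper: doubles a number and renders the result.
showDouble :: ShowNum a => a -> String
showDouble x = show (x + x)
```

For example, showDouble applied to 21 at type Int renders the string "42".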
And most excitingly, constraint families can be defined via type families of return kind Constraint. Our previous example can be written:
class Expr sem where
  type Pre sem a :: Constraint
  constant :: Pre sem a => a -> sem a
  add :: Pre sem a => sem a -> sem a -> sem a
instance Expr E where
  type Pre E a = Num a
  ...
Thus, Pre is a type family of kind (* -> *) -> * -> Constraint. And it all works!
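For completeness, here is a self-contained sketch of the whole example that should load in a recent GHC. Some details are assumptions of this sketch rather than the code from the paper: Constraint is imported from Data.Kind, the pretty-printing type P stores the rendered String and joins its arguments with " + ", and term is a hypothetical program in the embedded language.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Constraint)

class Expr sem where
  type Pre sem a :: Constraint
  constant :: Pre sem a => a -> sem a
  add      :: Pre sem a => sem a -> sem a -> sem a

-- Evaluation semantics: values must be numeric.
newtype E a = E { eval :: a }

instance Expr E where
  type Pre E a = Num a
  constant = E
  add e1 e2 = E (eval e1 + eval e2)

-- Pretty-printing semantics: values must be showable.
-- Assumption: P carries the rendered text rather than the value itself.
newtype P a = P { prettyP :: String }

instance Expr P where
  type Pre P a = Show a
  constant c = P (show c)
  add e1 e2 = P (prettyP e1 ++ " + " ++ prettyP e2)

-- Hypothetical term, usable at any semantics whose constraint Int satisfies.
term :: (Expr sem, Pre sem Int) => sem Int
term = add (constant 1) (add (constant 2) (constant 3))
```

The same term can then be interpreted twice: eval (term :: E Int) computes 6, and prettyP (term :: P Int) renders "1 + 2 + 3". Neither instance pays for the other's constraint.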
The constraint kinds extension can be turned on via the pragma:
{-# LANGUAGE ConstraintKinds #-}
Max has written about the extension over at his blog, which has some more examples, so do go check that out. As far as I am aware the extension should be hitting the streets with version 7.4 of GHC. But if you can’t wait it is already in the GHC HEAD revision so you can check out a development snapshot and give it a whirl.
In my next post I will be showing how we can use the constraint kinds extension to describe abstract structures from category theory in Haskell that are defined upon subcategories. I already have a draft note about this (edit July 2012: draft note subsumed by my TFP 2012 submission) if you can’t wait!
[1] Orchard, D., Schrijvers, T.: Haskell Type Constraints Unleashed, FLOPS 2010, [author's copy with corrections] [On SpringerLink]
[2] Carette, J., Kiselyov, O., Shan, C. C.: Finally Tagless, Partially Evaluated, APLAS 2007
Last year the Cambridge Computer Lab started an informal lecture course taught by PhD students aimed at final year undergraduates and MPhil students with the rough aim of providing lecturing experience to PhD students, and for disseminating more ideas from research to undergrads/MPhils/each other. In my group, the CPRG (Cambridge Programming Research Group), part of the bigger PLS (Programming, Logic, and Semantics Group), four of us decided to do a miniseries about aspects of programming language design. The slides/notes from our lectures can be found here on the dates of May 10th, 11th, 14th, and 24th.
To bind the miniseries together we prepared an informal, general summary of the ideas contained within which concludes with the summary:
The development of programming languages, and abstraction away from machine code, has greatly aided software development. Programming languages are a conduit between man and machine, with much of programming language research aiming to improve this interaction and to help us better express our ideas. We can attempt to improve languages for ease of reading, ease of writing, and ease of reasoning, and improve our evaluation systems to use less resources (whether it be processor time, memory, power, etc.) whilst still providing a predictable system. Such facets of programming language design are often non-orthogonal, thus a language designer must trade off certain improvements for others. Often, a motivating application domain or purpose can help distill which features of a language are most important. This lecture series should give some food for thought in various areas of general programming and programming language design.
From which I offered the following general slogan:
The four “R”s that programming language design must improve of programs: reading, ‘riting, reasoning, and running.
A bit of a generalisation perhaps, but hopefully a useful “elevator-pitch” slogan to get more people thinking about programming language design, and with a bit of humour (see The three Rs).
Abstract:
The popular Glasgow Haskell Compiler extends the Haskell 98 type system with several powerful features, leading to an expressive language of type terms. In contrast, constraints over types have received much less attention, creating an imbalance in the expressivity of the type system. In this paper, we rectify the imbalance, transferring familiar type-level constructs, synonyms and families, to the language of constraints, providing a symmetrical set of features at the type-level and constraint-level. We introduce constraint synonyms and constraint families, and illustrate their increased expressivity for improving the utility of polymorphic EDSLs in Haskell, amongst other examples. We provide a discussion of the semantics of the new features relative to existing type system features and similar proposals, including details of termination.
[draft pdf submitted to FLOPS 2010]
Any feedback is most welcome. Blog posts to follow if you are too lazy to read the paper.
Abstract:
A fully automatic, compiler-driven approach to parallelisation can result in unpredictable time and space costs for compiled code. On the other hand, a fully manual, human-driven approach to parallelisation can be long, tedious, prone to errors, hard to debug, and often architecture-specific. We present a declarative domain-specific language, Ypnos, for expressing structured grid computations which encourages manual specification of causally sequential operations but then allows a simple, predictable, static analysis to generate optimised, parallel implementations. We introduce the language and provide some discussion on the theoretical aspects of the language semantics, particularly the structuring of computations around the category theoretic notion of a comonad.
Any feedback is welcome.
BCTCS ’09 was held at Warwick University, where I studied for my undergraduate degree. I enjoyed the conference particularly, as I got to spend time with Steve Matthews and Sara Kalvala (my undergraduate project supervisors from my 3rd and 4th years), old friends from my undergraduate days, new friends from other universities, and also Bill Wadge, co-creator of Lucid, who was invited to speak at BCTCS ’09 all the way from Victoria, Canada.
If you have ever had a conversation with me about computer science you will know I have a particular proclivity for the Lucid programming language, so getting to chat about all things Lucid for a few days was great fun. Bill supervised Steve back when Bill was at Warwick, in the early days of Lucid. The quip is that Bill is my grand-supervisor.
The Lucian language is my own object-oriented hybrid of Lucid. I started Lucian as my 3rd year project and dissertation quite some time ago now. The ideas developed further after discussions with Bill back in 2007, resulting in Steve and I publishing a paper which came out in 2008. In January 2009, with the BCTCS looming, I thought it would be fitting to talk about Lucian at BCTCS, due to Bill and Steve being present, and BCTCS being held at Warwick, the crucible of Lucid’s youth. So I revisited Lucian and discovered that there was much more to say than was originally conveyed in the paper. I mulled Lucian over last term and have since revised the language and have had some further thoughts about its semantics.
Construction
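The base case of the construction is an equilateral triangle. For each successive iteration the construction proceeds as follows:
1. Divide each edge into three segments of equal length.
2. Construct an equilateral triangle pointing outwards, with the middle segment as its base.
3. Remove the middle segment.
Thus each edge is transformed into four edges, each a third of the length of the original edge. [Figure: the transformation of a single edge]
[Figure: the first 5 iterations of the construction]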
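The construction can also be sketched in a few lines of Haskell. This assumes the standard Koch rule (split each edge into thirds and replace the middle third by the two other sides of an outward-pointing equilateral triangle) and a base triangle of edge length 1 listed counterclockwise; all of the names below (Point, splitEdge, step, triangle, snowflake) are hypothetical and are not taken from the turtle program mentioned at the end of this post.

```haskell
type Point = (Double, Double)

-- Replace the edge from p to q by four edges, returning their start points.
-- The apex is found by rotating the middle third by -60 degrees about its
-- start, which points outwards for a counterclockwise polygon.
splitEdge :: Point -> Point -> [Point]
splitEdge (x1, y1) (x2, y2) = [(x1, y1), a, apex, b]
  where
    dx   = (x2 - x1) / 3
    dy   = (y2 - y1) / 3
    a    = (x1 + dx, y1 + dy)
    b    = (x1 + 2 * dx, y1 + 2 * dy)
    c    = 0.5          -- cos 60
    s    = sqrt 3 / 2   -- sin 60
    apex = (fst a + dx * c + dy * s, snd a - dx * s + dy * c)

-- One iteration: split every edge of a closed polygon given by its vertices.
step :: [Point] -> [Point]
step ps = concat (zipWith splitEdge ps (drop 1 ps ++ take 1 ps))

-- Base case: an equilateral triangle of edge length 1, counterclockwise.
triangle :: [Point]
triangle = [(0, 0), (1, 0), (0.5, sqrt 3 / 2)]

-- The vertices of the snowflake after n iterations.
snowflake :: Int -> [Point]
snowflake n = iterate step triangle !! n
```

Since a closed polygon has as many vertices as edges, length (snowflake n) gives the edge count of iteration n, which quadruples with every iteration.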
Properties
Two properties of this fractal may seem paradoxical at first but are easily shown. I will show the derivations here for the interested without skipping too many steps.
For an infinite number of iterations the perimeter of the fractal (the length of all the edges) tends to infinity whilst the area enclosed remains finite.
Infinite Perimeter
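First consider how the number of edges grows with the number of iterations. Initially the number of edges is 3, and each iteration transforms a single edge into 4 edges (see above), thus the series of the edge count is: 3, 12, 48, …. Writing $N_n$ for the number of edges after n iterations, the general form is:
$$N_n = 3 \cdot 4^n \qquad (1)$$
Starting from an edge length of a, each iteration divides the edge length by 3. The following defines the edge length $L_n$ for iteration n and from this calculates the perimeter $P_n$ for iteration n. The limit as n tends to infinity is found, showing that the perimeter is infinite.
$$L_n = \frac{a}{3^n} \qquad (2)$$
$$P_n = N_n \cdot L_n = 3a \left(\frac{4}{3}\right)^n \qquad (3)$$
$$\lim_{n \to \infty} P_n = \lim_{n \to \infty} 3a \left(\frac{4}{3}\right)^n = \infty \qquad (4)$$
Thus it is easy to see that the perimeter length is infinite through standard results on limits, since the ratio 4/3 is greater than 1.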
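The growth of the perimeter can be checked with exact rational arithmetic. This is a small sketch with hypothetical function names, following the description above: 3 edges initially, four times as many after each iteration, and each edge a third of the previous length.

```haskell
-- Number of edges after n iterations: 3, 12, 48, ...
edgeCount :: Integer -> Integer
edgeCount n = 3 * 4 ^ n

-- Edge length after n iterations, starting from an edge length of a.
edgeLength :: Rational -> Integer -> Rational
edgeLength a n = a / 3 ^ n

-- Perimeter after n iterations: edge count times edge length.
perimeter :: Rational -> Integer -> Rational
perimeter a n = fromInteger (edgeCount n) * edgeLength a n
```

With a = 1 the perimeter is 3 at iteration 0 and 16/3 at iteration 2, and it grows by a factor of 4/3 at every step, so it eventually exceeds any bound.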
Finite Area
The area calculation is a little bit more involved. The result can be seen on Mathworld or Wikipedia but a full derivation isn’t given, so I show my derivation here for the interested.
The area of the nth iteration of the Koch snowflake is the area of the base triangle + the area of the new, smaller, triangles added to each edge.
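From the above length calculations and construction we know that each iteration of the construction divides the length of the edges by three. First consider the relationship between the area of an equilateral triangle of edge length a and one of edge length a/3:
$$A(a) = \frac{\sqrt{3}}{4} a^2$$
$$A\left(\frac{a}{3}\right) = \frac{\sqrt{3}}{4} \left(\frac{a}{3}\right)^2 = \frac{1}{9} \cdot \frac{\sqrt{3}}{4} a^2 = \frac{A(a)}{9}$$
Unsurprisingly we see that dividing the edge length by 3 divides the area by 9. At each iteration we add new equilateral triangles to each edge, a ninth of the area of the previous iteration’s triangles.
We formulate this as a summation of the base case $A_0$ plus the triangles added by each of the following iterations, where the triangle area at iteration k is $A_0$ divided by $3^{2k}$, or $9^k$ (successive divisions of the area by 9). The number of triangles added at iteration k is the previous iteration’s number of edges, $3 \cdot 4^{k-1}$. The area after n iterations, and its limit, are therefore:
$$A_n = A_0 + \sum_{k=1}^{n} 3 \cdot 4^{k-1} \cdot \frac{A_0}{9^k} = A_0 \left(1 + \frac{1}{3} \sum_{k=1}^{n} \left(\frac{4}{9}\right)^{k-1}\right)$$
$$\lim_{n \to \infty} A_n = A_0 \left(1 + \frac{1}{3} \cdot \frac{1}{1 - \frac{4}{9}}\right) = A_0 \left(1 + \frac{3}{5}\right) = \frac{8}{5} A_0 \qquad (5)$$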
Thus the area is finite at 8/5 times the area of the base equilateral triangle.
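The 8/5 limit can be checked numerically with exact rationals. In this sketch (hypothetical names throughout) areaRatio adds up the triangles iteration by iteration, as a multiple of the base triangle's area, and areaClosed is the closed form obtained by summing the geometric series with ratio 4/9.

```haskell
-- Area after n iterations, as a multiple of the base triangle's area:
-- iteration k adds 3 * 4^(k-1) triangles, each of relative area 1 / 9^k.
areaRatio :: Int -> Rational
areaRatio n = 1 + sum [3 * 4 ^ (k - 1) / 9 ^ k | k <- [1 .. n]]

-- Closed form of the same partial sum.
areaClosed :: Int -> Rational
areaClosed n = 8 / 5 - (3 / 5) * (4 / 9) ^ n
```

The two definitions agree at every iteration, for example 4/3 after one iteration and 40/27 after two, and the gap to 8/5 shrinks by a factor of 4/9 each time.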
Summary
So the Koch snowflake construction has been introduced and it has been shown relatively easily that the area of a Koch snowflake tends to a finite limit of 8/5 times the base case area (5) and that the length of perimeter tends to infinity (4). Next time I will probably talk a bit about the fractal dimension of the Koch snowflake and also show an interesting problem that uses the Koch snowflake.
As a last note, following on from the blog post I made about the Python turtle library, a turtle program of the Koch snowflake for Python can be downloaded here.