Ifaz Kabir - Conjuring Types from the Ether

Recently I have been working through a compiler book on the side, and there were two places where I wanted custom types but did not want to define them globally. The types were going to be used locally within a function, and it felt odd to define them globally when no other function was going to use them. Even defining them locally in the same file and then then hiding it in the interface file felt odd, I just needed the types for single functions, nothing else in the module!

This is what I mean by Conjuring Types from the Ether: having access to types without having to declare them! Generally you want lightweight syntax for creating values of these types, otherwise it would be worth the effort to declare them.

Both the places where I had the problem are simply cases of Boolean Blindness, and both stem from me trying to keep pattern matching simple!

Boolean Blindness

The first case where I wanted to conjure types from the ether is a more classic version of Boolean Blindness. I had some code like this:

let global_function_exposed_in_module params =
  ...
  let check_function args = ... in
  match check_function params with
  | true -> ??
  | false -> ??

As I was trying to fill in the above ??, I realized that I had forgotten what true meant and what false meant. And the actualy nome of check_function was weird enough that I couldn’t figure out if true meant success or not, check_function is just a place holder name I’m using for this post.

This is exactly Boolean Blindness! As Bob Harper put it I have “blinded” myself by “reducing the information” I had “at hand to a bit, and then trying to recover that information later by remembering the provenance of that bit”.

The JavaScript/TypeScript Solution: At this point, if I were using a in a language like JavaScript, I would go and modify check_function to return the strings "Success" or "Fail". This is somewhat natural in JavaScript because you can case match on strings. TypeScript with it’s literal types improves things! The case match on "Success" and "Fail" would be compile time checked to be exhaustive! (Disclaimer: I haven’t actually checked if literal types can be conjured from the ether or not.)

The Racket/Lisp Solution: In Racket, or any Lisp with pattern matching, the situation is ever so slightly improved because you have access to symbols. You can return the symbols 'Success or 'Fail, and the situation is again improved by using typed racket for exhastivity checking. But overall this is very similar to the JavaScript/TypeScript case.

The OCaml Solution: I was programming in OCaml, and I was really happy that OCaml has something similar to symbols, called Polymorphic Variants. Generally, Polymorphic Variants can carry around data other pieces of data just like like Algebraic Data Types, and have interesting structural subtyping between themselves. These properties weren’t that useful to me, but what was useful was being able to conjure the Polymorphic Variants from the Ether without having to declare them globally.

let global_function_exposed_in_module params =
  ...
  let check_function args: [`Success|`Fail] = ... in
  match check_function params with
  | `Success -> ??
  | `Fail -> ??

The use of the type [``Success|``Fail] is contained completely within this function, and I also get a exhaustivity check!

Position Blindness?

This second case where I wanted to conjure types from the other involved tuples and records. I was traversing the AST, and creating sets of two kinds of variables. Initially my code looked like the following, collect_exp, and collect_dec are mutually recursive.

  let collect_exp (env: (var Set.t * var Set.t)) params:
    var Set.t * var Set.t =
    ...
  and collect_dec (env: (var Set.t * var Set.t)) params:
    var Set.t * var Set.t =
    ...
    let (collect_a?, collect_b?) = collect_exp args in
    ...

Midway through collect_dec I forgot what kind of variable collect_exp returned in which position. Normally, would know which position is what based on the differences in the types in the two positions, but here both the types are the same.

This again is another kind of boolean blindness. To illustrate this, I’m gong to switch to Standard ML syntax.

    let x = collect_exp args in
    let collect_a = x#1? in
    let collect_b = x#2? in
    ...

To choose between variable kind a and variable kind b, I am relying on the boolean 1|2!

The solution here in an untyped setting is to modify the code to use dictionaries/records instead of tuples. So collect_exp would return {kind_a: var Set.t, kind_b: var Set.b}. We are no longer blind to any positions, because we have names instead, just like names in variants/symbols.

But declaring a type like the following feels overkill.

type record_used_in_foo_function_only =
  { kind_a: var Set.t
  ; kind_b: var Set.t
  }

In OCaml, you can conjure object types from the ether, but the syntax there is a little heavy weight. For example, here’s how you would create an object representing the above record.

let my_object =
  object
    val kind_a = Set.empty
    val kind_b = Set.empty
  end

I consider the use of the keyword object and the delimiter end to be fairly heavy. Plus they cannot be easily destruct like tuples and records, and copy syntax for objects is also somewhat ugly. For records you can do { x with kind_a = Set.union y z }.

In the end, I ended up just keeping the code as is and just going back to collect_exp to figure out which position is what, but I really wish I could conjure a record type from the ether. This is possible in other languages. The examples I can think of easily right now are records in Flix and Elm. In these languages you can destruct records with types conjured from the ether using record style patern matching, and they also have lightweight copy syntax.

let {kind_a = y; kind_b = z} = collect_exp args in

An aternative that I did consider but then decided against is changing the tuples to contain polymorphic variants.

let collect_exp env params: ([`A of var Set.t], [`B of var Set.t]).

As I mentioned above, polymorphic variants can carry data, and here the variant `A caries a variable set with it.

Position Blindness in Function Arguments

This kind of positional blindness also happens when passing arguments to functions of the same type. For example, I can never remember which of the arguments of memcpy is the source and which is the destination without looking at the manpage. The declared type is memcpy(void *, void *, size_t).

Most modern languages solve this by using named arguments. Some languages like (iirc) Swift, are strict about not letting you pass named arguments positionally, but other languages like C# allow for flexibility. Alternatively, you can also ask that the arguments be records or structs. E.g. what if the argument to memcpy(MemCpy) was a struct?

typedef struct {
  void * dest, src;
  size_t size;
} MemCpy;