This is the blog version of a talk I did for the PLSE seminars at UofA.
In this post I am going to discuss the problems with adding mutation to an object-oriented programming language with depth-subtyping and propose a type-system which safely supports mutation while keeping a notion of depth-subtyping.
I will avoid formal definitions for this post and instead motivate ideas through examples. Many of these ideas will be familiar for anyone who has programmed in an object-oriented language before. I will also be very loose with terminology. For example, I might say objects are subtypes when I mean their types are subtypes.
What is Typing?
Before I can discuss depth-subtyping or even subtyping, I am going to discuss what I mean by typing. For this post I am going to focus on typing for values, rather than typing for program fragments.
So what are values? Values are things such as true
, 1
, 1.3
, and the object
{x: 2, y: 3}
. This last example is an object with fields x
and y
which
contain the numbers 2
and 3
.
If you are familiar with object-oriented languages you will recognize the above
values as having the types Bool
, Int
, Float
, and {x: Int, y: Int}
.
For this post, we are going to think of typing as the following:
The type of a value represents the operations we are allowed to perform on the value.
For example, if we have an Int
, we are allowed to add it to another Int
, but
we are not allowed to add it to an object:
1 + 3
is allowed1 + {x: 2, y: 3}
is not allowed
What is Subtyping?
Let’s start with some examples:
Nat <: Int
. We often think of natural numbers as being a subtype of integers.{x: Int, y: Int, z: Int} <: {x: Int, y: Int}
. If we have a object with fieldsx
,y
, andz
are subtypes of objects with fieldsx
andy
.
But what does subtyping mean? Given the above notion of typing:
We say that
T
is a subtype ofU
(writtenT<:U
) if all operations allowed on values of typeU
are allowed on values of typeT
.
Width-subtyping
Before we discuss depth-subtyping, let’s discuss a contrasting idea: width-subtyping. Let’s go back to the last example:
{x: Int, y: Int, z: Int} <: {x: Int, y: Int}
Any operations allowed on {x: Int, y: Int}
are also allowed on {x: Int, y: Int, z: Int}
; we just ignore the field z
. This idea is known as
width-subtyping: objects which are wider in fields are subtypes.
Most languages which support subtyping support width-subtyping in some way or
the other. For example, in Java
we can extend an object type with more fields.
What is Depth-subtyping?
Depth-subtyping is the idea that if we have subtypes T<:U
, then objects
containing T
s are subtypes of objects containing U
s: {field: T} <: {field: U}
. Contrasted with width-subtyping, instead of going wider in fields, we go
deeper in fields.
For example, since we had Nat<:Int
, we have PointNat<:PointInt
where
PointNat
and PointInt
are defined as:
= {x: Nat, y: Nat}
PointNat = {x: Int, y: Int} PointInt
And it can be depth-subtyping all the way down: objects
containing {field: T}
are subtypes of objects containing {field: U}
, and so
on.
We want Depth-subtyping!
Depth-subtyping can be very useful. Consider the following function:
diff(PointInt inp){
Int return (inp.x - inp.y)
}
diff
takes an object which contains Int
s in the field x
and y
, and
subtracts the x
value from the y
value and returns the answer.
Now suppose we have a PointNat p
.
= new PointNat(3,4) PointNat p
It is completely safe to pass p
to the function diff
, and
depth-subtyping would allow us to reuse the diff function for PointNat
s.
Without depth-subtyping we would have to rewrite the function for PointNat
s.
The diff
function only relies on the input object having fields x
and y
containing values which can be treated as Int
s. In general, it is safe to pass
a PointNat
to a function which expects its input to be a PointInt
… unless
the input is being mutated.
Mutation is Bad for Depth-subtyping!
To see why mutation is bad for depth-subtyping, we are going to introduce some more types.
= {x: Int, y: Int, z: Int}
Point3D = {field: T} Container[T]
Usual width-subtyping implies that Point3D <: PointInt
. Consider the following
program. We will explain it shortly.
1. Point3D p3d = new Point3D(1, 2, 3)
2. PointInt p2d = new PointInt(1, 2)
3. Container[Point3D] c = new Container(p3d)
4. Container[PointInt] c2 = c
5. c2.field := p2d
6. Point3d p = c.field
7. return p.x + p.y + p.z
We first create points p3d
and p2d
, the first being a 3d-point (Point3D
)
and the second being a 2d-point (PointInt
). Then, we create a 3d-point
container, c
. Next, we use depth-subtyping to cast c
to a 2d-point container
c2
. Then we mutate the container with p2d
using the reference c2
. Now we
use the original reference c
, which is treated as a 3d-point container, to
read its field as a 3d-point. Then we read the fields of the object and sum them
up.
If we run this code, it will throw a runtime exception when p.z
is executed!
This is because, we when we read c.field
, it actually contains a 2d-point, so
it does not have a z
field. So trying to read the non-existent z
field
throws a runtime exception.
Fun fact: If we used Array
instead of Container
, Java
would compile the
above code! The Array
upcasting in Java was well intentioned, it would allow a
form of depth-subtyping for arrays, but it is ultimately unsafe.
What went Wrong?
So what went wrong? The problem was that the operation “assign 2d-point”:
- should be allowed for 2d-point containers, but
- should not be allowed for 3d-point containers.
So not all operations allowed for 2d-point containers should be allowed for 3d-point containers, even though 3d-points are subtypes of 2d-points.
This might lead us to believe that we cannot allow depth-subtyping in languages with field mutation. Fortunately, this turns out not be true! We can recover depth-subtyping, or at least a notion thereof, if we use something which I call bounded field-typing.
Bounded Field-typing
In bounded field-typing, fields have bounded types such as the following:
{fld: T..U}
We call the lower bound T
the setter type of the field fld
, and U
the
getter type. Usually, the setter type is a subtype of the getter type, T<:U
.
If x
has the above object type, reading the field fld
produces a value of
type U
. Under bounded field-typing, field reads produce the getter type. For
mutation, the assignment x.fld:=y
is allowed if y
has type T
; i.e. we
allow assignments of the setter type.
Recovering Depth-subtyping
In the bounded setting, we allow depth-subtyping for getter-types:
S<:U
implies{fld: T..S}<:{fld: T..U}
.
{fld: T..U}
allows us to read a U
from the field fld
. Since all operations
on U
s are allowed on S
s, we are allowed to read an S
from a {fld: T..S}
and treat it as an U
.
We also allow depth-subtyping for setter-types, but in the other direction, i.e. subtyping is contravariant in the setter type.
S<:T
implies{fld: T..U}<:{fld: S..U}
.
Notice how, S
and T
appear in opposite directions of the <:
symbol. If we
are allowed to write a value which allows the operations of T
to be performed
on it, then it is safe to write a value which allows more operations.
In the bounded setting, the previous types such as PointInt
and PointNat
are
written with the setter and getter types being equal:
= {x: Nat..Nat, y: Nat..Nat}
PointNat = {x: Int..Int, y: Int..Int} PointInt
Safe Depth-subtyping
In the bounded setting, PointNat
is not a subtype of PointInt
. Since we do
not have that Int<:Nat
, their setter types are not in the subtyping
relationship required for depth-subtyping. However, since Nat<:Int
we have:
<: {x: Nat..Int, y: Nat..Int} PointNat
Using the same subtyping relationship for the setter types for PointInt
, we
have:
<: {x: Nat..Int, y: Nat..Int} PointInt
Using this, we can write a function diff
which works for both PointInt
s and
PointNat
s.
diff({x: Nat..Int, y: Nat..Int} inp){
Int return (inp.x - inp.y)
}
At this point, you will notice that the above, diff
function is not quite
general enough. If we had a value of the following type, we would not be able to
use diff
on it, even integers are neither subtypes nor supertypes of natural
numbers.
= {x: EvenInt..EvenInt, y: EvenInt..EvenInt} PointEvenInt
For this reason, under bounded field-typing it is very useful to have a bottom
type, ⊥
. There is no way to create a value of type ⊥
, but we assume that all
operations are allowed on values of type ⊥
. More importantly, ⊥<:T
for any
type T
. This allows us to write diff
as the following.
diff({x: ⊥..Int, y: ⊥..Int} inp){
Int return (inp.x - inp.y)
}
Notice that we use ⊥
for the setter types of our input parameter. We can now
call the diff
function with a PointEvenInt
as its input, or any object as
long as it contains field x
and y
which contain values which can be treated
as Int
s. So we have effectively recovered what we wanted from depth-subtyping!
3d-point Containers are not 2d-point Containers
Under bounded field-typing, we still have that 3d-points (Point3D
) are
subtypes of 2d-points (PointInt
), since width-subtyping is still safe. As we
discussed in the previous example, 3d-point Containers are not 2d-point
Containers because they do not allow the operation “Assign 2d-point”. This is
now reflected at the type level.
1. Point3D p3d = new Point3D(1, 2, 3)
2. PointInt p2d = new PointInt(1, 2)
3. Container[Point3D] c = new Container(p3d)
4. Container[PointInt] c2 = c
5. c2.field := p2d
6. Point3d p = c.field
7. return p.x + p.y + p.z
Here the assignment in line 4 is not allowed and caught by the type system. To make the assignment valid, we can change the container type to a bounded type.
1. Point3D p3d = new Point3D(1, 2, 3)
2. PointInt p2d = new PointInt(1, 2)
3. Container[Point3D] c = new Container(p3d)
4. Container[Point3d..PointInt] c2 = c
5. c2.field := p2d
6. Point3d p = c.field
7. return p.x + p.y + p.z
But now, the assignment in line 5 is not valid since it is only safe to assign
3d-points to c2
. If we change the original Container of c
type to
PointInt
, then the assignment is allowed.
1. Point3D p3d = new Point3D(1, 2, 3)
2. PointInt p2d = new PointInt(1, 2)
3. Container[PointInt] c = new Container(p3d)
4. Container[PointInt] c2 = c
5. c2.field := p2d
6. Point3d p = c.field
7. return p.x + p.y + p.z
But now, the field read in line 6 is not allowed, since we are only allowed to
assume that the field contains a PointInt
. No matter what we do, the type
system does not allow us to assign a 2d-point and read it back as a 3d-point.
Bounded field-typing really is safe.
This concludes the post. I hope this shed some light into why the “getter and setter methods” meme in Java exists. I’m going to leave you guys by saying that bounded field-typing really is safe! I formally proved it to be safe in an extension of the DOT calculus called κDot (officially kappa-dot, informally kay-dot). You can find the Coq code at here and the accompanying paper which I presented at the Scala Symposium here.
Updates
A friend pointed out that this post is being discussed on Reddit.