Functionality, Mutability & Non-Determinism in Type Theory.

Recent conversations have convinced me that there are several misconceptions about the status of side effects in a type theoretic setting, and how to reason about their benignity. Briefly, I will clarify this matter by proposing how one might go about integrating free assignables and non-determinism into Computational Type Theory.1 I do not answer all questions raised by this proposal, but I intend to be provocative and suggest a rough path toward realizing this goal.

With apologies to Conor McBride for stealing his line, I would say that I cannot trust a type theorist who has not changed their mind at least once about effects: having changed my mind twice, I hope that I can provide clarity on a few specific issues:

  1. What is functional programming, and how does it differ from purely functional programming?

  2. What is the precise definition of a benign effect, and how do you reason about them in a type-theoretic setting?

What is functional programming?

In his post a few years back, What is a functional language?, Bob Harper listed several characteristics which he felt should be present in an acceptable general-purpose functional programming language. Keeping his points in mind, I’d like to take a slightly more specific and technical approach by propounding a definition for a «functional program»:

A functional program has two essential characteristics: it is extensional (functional) with respect to the equality of its inputs and outputs, and it respects the monotonicity of observations.

Functionality: Respecting Equality

The first half of the definition should be automatic to anyone who is familiar with the concept of functionality in type theory, and in propounding it, I echo the definition which is implicit in Longley’s gorgeous paper, When is a functional program not a functional program?.

If you read Longley, you will see that purely functional programming occurs when you rule out computational effects syntactically; in this setting, so long as types with non-trivial equivalence relations are not added, everything respects equivalence (i.e. is functional/extensional) by definition. An example of such a language is the lambda calculus, System T, or PCF. As soon as you add effects, like exceptions or assignables, it becomes possible to subvert equivalence and write a non-functional program.

A beautiful example of this is the probing of Baire functionals for their modulus of continuity at a particular input. The modulus of continuity is an algorithm-dependent property, and it would make no sense for its observation to be extensional/functional, since such an observation could never respect the equivalence of two Baire functionals which probe their input in a different order. Escardó and Xu demonstrate that the existence of a continuous function for calculating the modulus of continuity of Baire functionals is inconsistent even in intensional MLTT;2 Rahli and Bickford clarify the matter somewhat by showing the consistency of a weaker continuity principle, essentially stating that the modulus of continuity can be computed, but not continuously.3

It should not be surprising at all that the strong continuity principle is not kosher in intuitionistic type theory, since we already knew that such a principle is not extensional, whereas Martin-Löf’s semantics for type theory requires that all inhabitants of the \(\Pi\)-type respect equality. In order to make it a functional program, the output sum is simply squashed (i.e. the type is changed such that all elements are identified). This is a beautiful example of what it means to be a functional program: a program that was not functional because it permitted intensional (algorithm-dependent) observations is changed to one which is trivially functional, because all such observations are ruled out in the types.

The particular version of squashing that is used in this case causes the computational content to be preserved: that is, the modulus of continuity is still computed and present in the witness term, but the surrounding type theory makes it impossible to make an observation that depends on its identity.

Monotonic Observations

Now, in addition to respecting equality in the obvious way, I also include the stipulation that a functional program must respect the monotonicity of observations. What do I mean by this? It must be that once you have observed something at a point in time, then that will not cease to be evident in the future. This is analogous to the monotonicity property in Kripke or Beth semantics.

This extra stipulation is required in order to make precise the fact that we intend to rule out non-determinism. As Martin-Löf’s student Granström noted in his dissertation,4 this crucially does not rule out oracles which give a random answer, but it does require that once the value of some expression has been observed, that it does not ever change in the future; this may also be read as a statement about the acceptability of lawless choice sequences in intuitionistic type theory.

A technique which is employed every day in functional ML developments is to hide imperative and nondeterministic code behind an abstraction boundary such that only observations which respect equality are possible (i.e. only functional observations). For example, consider the following signature:

signature CRYPTO =
  type public_key
  type private_key
  type encrypted

  val generate : unit -> {publicKey : public_key, privateKey : private_key}

  val encrypt : public_key * string -> encrypted
  val decrypt : private_key * encrypted -> string

For any proper implementation, two instances of non-determinism (a non-functional behavior) are locked behind this signature: the generation of the key pairs, and the randomness involved in encrypting the string. Computationally, you may implement this signature by defining type encrypted = string, but the types make intensional observations on that string impossible. Abstraction is the poor man’s quotient.

Consider if encrypt were of type public_key * string -> string: if this were the case, any program that used it would be in danger of no longer being functional, since the same input might result in different outputs at different times, should the program inspect the encrypted blob. However, because we have used an abstract type, it would not be possible to make such an observation. This is a perfect example of how to adjust the type structure in order to cause an imperative program to become functional; and this technique is a preferable alternative to the Algol-style modal separation of effects which is used in other traditions, where no attempt at all is made to make programs be functional in any way except the most trivial way possible.

The way that “functional programming” happens in a “purely functional” programming language, is that you simply become content to compute syntax trees in a monad which will be run elsewhere, rather than computing values. And this is analogous to the old escape hatch which Dummett describes for reasoning intensionally about choice sequences,5 which is simply to do the old switcharoo and say, “Actually, I am not observing a choice sequence, I am observing the codes of a choice sequence”. Because codes have the discrete/intensional equality, then all observations of them are trivially functional, but this is not functional programming in any interesting or sufficiently rich sense.

Reasoning about benign effects

A benign effect, then, is one which is used in the service of implementing a functional program, where functional program is defined as above. In Type Theory, all typeable programs are functional, and this is enforced by the meaning explanation for hypothetico-general judgement.6

Somehow this idea has crept into the public consciousness that there is no clear path toward making precise the idea of a benign effect; however, I propose that an effect is benign if it occurs in a term \(M\) such that for some type \(A\), \({{{\color{MidnightBlue}{M}}}\in{{\color{MidnightBlue}{A}}}}\); this may of course be generalized in the usual way to open terms in contexts.

In order to demonstrate how this might work, I’ll add free assignables to the computation system of MLTT, following Harper’s presentation in PFPL.7

Adding free assignables to the computation system

First, let us consider the structural type system of MLTT’s underlying computation system. It is typically said that this is “untyped”, but another perspective is to say that Martin-Löf’s theory of expressions (the system of arities, or abstract binding trees) is itself a type system, in the style of a logical framework. Whilst in practice, only one sort \({\mathsf{exp}}{}\) is considered, other sorts might be added in the future. We’ll call the syntactic categories (e.g. \({\mathsf{exp}}{}, ({\mathsf{exp}}){\mathsf{exp}}{}\) etc.) syntactic types, reserving the term type or semantic type for the layer of types which denote partial equivalence relations on the canonical forms of the computation system.

The syntactic type system is a lot like a canonical logical framework; likewise, it deals with matters of grammar rather than matters of behavior. The semantic/behavioral type system is then a refinement of this, in the sense of Melliès and Zeilberger.8

A fragment of the basic typing/sorting rules are as follows, where \(\Sigma,a\sim\tau\) is a signature \(\Sigma\) of assignables extended by a new name \(a\) with type \(\tau\) (see PFPL for the details):

\[\begin{gather} {\dfrac{\matrix{ {{{\color{MidnightBlue}{\Gamma}}}\vdash_{{\color{MidnightBlue}{\Sigma}}} {{{\color{MidnightBlue}{e_1}}}:{{\color{MidnightBlue}{\tau_1}}}}} & {{{\color{MidnightBlue}{\Gamma}}}\vdash_{{\color{MidnightBlue}{\Sigma,a\sim e_1}}} {{{\color{MidnightBlue}{e_2}}}:{{\color{MidnightBlue}{\tau_2}}}}} }}{ {{{\color{MidnightBlue}{\Gamma}}}\vdash_{{\color{MidnightBlue}{\Sigma}}} {{{\color{MidnightBlue}{{\nabla a:=e_1.\, e_2}}}}:{{\color{MidnightBlue}{\tau_2}}}}} }}\\[6pt] {\dfrac{\matrix{ }}{ {{{\color{MidnightBlue}{\Gamma}}}\vdash_{{\color{MidnightBlue}{\Sigma,a\sim\tau}}} {{{\color{MidnightBlue}{{@a}}}}:{{\color{Maroon}{\tau}}}}} }}\\[6pt] {\dfrac{\matrix{ {{{\color{MidnightBlue}{\Gamma}}}\vdash_{{\color{MidnightBlue}{\Sigma,a\sim\tau}}} {{{\color{MidnightBlue}{e}}}:{{\color{MidnightBlue}{\tau}}}}} }}{ {{{\color{MidnightBlue}{\Gamma}}}\vdash_{{\color{MidnightBlue}{\Sigma,a\sim\tau}}} {{{\color{MidnightBlue}{{a := e}}}}:{{\color{MidnightBlue}{\tau}}}}} }}\\[6pt] \end{gather}\]

Then, instead of starting with a big step evaluation judgement \({{{\color{MidnightBlue}{M}}}\Rightarrow{{\color{Maroon}{N}}}}\), we will begin with judgements about states and state transitions. A state is written \({\nu\,\Sigma\,\{e\parallel \mu\}}\), where \(\Sigma\) is a signature of assignables, \(e\) is an expression, and \(\mu\) is a store or piece of memory; we say that \({{{\color{MidnightBlue}{\mu}}}:{{\color{MidnightBlue}{\Sigma}}}}\) when \(\mu\) assigns values to each variable in \(\Sigma\).

\[\begin{gather} {\dfrac{\matrix{ {{{\color{MidnightBlue}{}}}\vdash_{{\color{MidnightBlue}{\Sigma}}} {{{\color{MidnightBlue}{e}}}:{{\color{MidnightBlue}{\tau}}}}} & {{{\color{MidnightBlue}{}}}\vdash_{{\color{MidnightBlue}{\Sigma}}} {{{\color{MidnightBlue}{\mu}}}:{{\color{MidnightBlue}{\Sigma}}}}} }}{ {{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{e}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\ \mathit{state}} }} \end{gather}\]

Transitions between states are given as follows (presupposing wellformedness of input and output states):

\[\begin{gather} {\dfrac{\matrix{ {{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{e_1}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma}}}\,\{{{\color{Maroon}{e_1'}}}\parallel {{\color{Maroon}{\mu'}}}\}}} }}{ {{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{{\nabla a:=e_1.\, e_2}}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma}}}\,\{{{\color{Maroon}{{\nabla a:=e_1'.\, e_2}}}}\parallel {{\color{Maroon}{\mu'}}}\}}} }}\\[6pt] {\dfrac{\matrix{ {{{\color{MidnightBlue}{e_1}}}\ \mathit{val}_{{\color{MidnightBlue}{\Sigma}}}} }}{ {{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{{\nabla a:=e_1.\, e_2}}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma,a\sim\tau}}}\,\{{{\color{Maroon}{e_2}}}\parallel {{\color{Maroon}{\mu,a\hookrightarrow e_1}}}\}}} }}\\[6pt] {\dfrac{\matrix{ }}{ {{\nu\,{{\color{MidnightBlue}{\Sigma,a\sim\tau}}}\,\{{{\color{MidnightBlue}{{@a}}}}\parallel {{\color{MidnightBlue}{\mu,a\hookrightarrow e}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma,a\sim\tau}}}\,\{{{\color{Maroon}{e}}}\parallel {{\color{Maroon}{\mu,a\hookrightarrow e}}}\}}} }}\\[6pt] {\dfrac{\matrix{ {{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{e}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma'}}}\,\{{{\color{Maroon}{e'}}}\parallel {{\color{Maroon}{\mu}}}\}}} }}{ {{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{{a := e}}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma'}}}\,\{{{\color{Maroon}{{a := e'}}}}\parallel {{\color{Maroon}{\mu'}}}\}}} }}\\[6pt] {\dfrac{\matrix{ {{{\color{MidnightBlue}{e}}}\ \mathit{val}_{{\color{MidnightBlue}{\Sigma,a\sim\tau}}}} }}{ {{\nu\,{{\color{MidnightBlue}{\Sigma,a\sim\tau}}}\,\{{{\color{MidnightBlue}{{a := e}}}}\parallel {{\color{MidnightBlue}{\mu,a\hookrightarrow \_}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma,a\sim\tau}}}\,\{{{\color{Maroon}{e}}}\parallel {{\color{Maroon}{\mu,a\hookrightarrow e}}}\}}} }} \end{gather}\]

These aren’t all the rules, but they suffice to give a general gist of what’s going on. Then, we define the judgement \({{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{e}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\mapsto^\star{\nu\,{{\color{Maroon}{\Sigma'}}}\,\{{{\color{Maroon}{e'}}}\parallel {{\color{Maroon}{\mu'}}}\}}}\) as the transitive closure of \({{\nu\,{{\color{MidnightBlue}{\Sigma}}}\,\{{{\color{MidnightBlue}{e}}}\parallel {{\color{MidnightBlue}{\mu}}}\}}\mapsto{\nu\,{{\color{Maroon}{\Sigma'}}}\,\{{{\color{Maroon}{e'}}}\parallel {{\color{Maroon}{\mu'}}}\}}}\) and \({{{\color{MidnightBlue}{e}}}\ \mathit{val}_{{\color{MidnightBlue}{\Sigma}}}}\). Finally, the old big step evaluation judgement which previously was primitive in MLTT is reconstructed in terms of the above judgement:

\[ {\dfrac{\matrix{ {{\nu\,{{\color{MidnightBlue}{\cdot}}}\,\{{{\color{MidnightBlue}{e}}}\parallel {{\color{MidnightBlue}{{\langle\rangle}}}}\}}\mapsto^\star{\nu\,{{\color{Maroon}{\cdot}}}\,\{{{\color{Maroon}{e'}}}\parallel {{\color{Maroon}{{\langle\rangle}}}}\}}} }}{ {{{\color{MidnightBlue}{e}}}\Rightarrow{{\color{Maroon}{e'}}}} }} \]

At this point, we have rebuilt the computation system of type theory out from under it.

A memoizing operator in computational type theory

In order to illustrate how we may work with a benign effect in type theory, let us devise a memoizing combinator: \[ {{{\color{MidnightBlue}{{\mathsf{Memo}}}}}\equiv{{\color{Maroon}{ \lambda F. {\nabla r:={\mathsf{inl}}(\bullet).\, \lambda\_.\ {\mathsf{decide}({@r};\ u. {\mathbf{let}\ x = f(\bullet)\ \mathbf{in}\ {\mathbf{let}\ \_ = {r := {\mathsf{inr}}(x)}\ \mathbf{in}\ x}};\ u. u)}} }}}} \]

Memoization and laziness are the canonical examples of benign effects. If my earlier claim is to be true, then the following judgement should become evident to us under the meaning explanations of the judgements and the definitions of the types:

\[ {{{\color{MidnightBlue}{{\mathsf{Memo}}}}}\in{{\color{MidnightBlue}{{\cap A:{\mathbb{U}_{i}}. ({\mathsf{unit}}\to A) \to {\mathsf{unit}}\to A}}}}} \]

If we shall unfold the meaning explanation, then (ignoring trivial wellformedness subgoals), we must demonstrate the following:

\[ {{{\color{MidnightBlue}{A:{\mathbb{U}_{i}}, F : {\mathsf{unit}}\to A, X : {\mathsf{unit}}}}}\gg {{{\color{MidnightBlue}{{\mathsf{Memo}}(F)(X)}}}\in{{\color{MidnightBlue}{A}}}}} \]

Remember that the meaning of the sequent judgement includes functionality (i.e. it requires that the conclusion respect the equality of the premises). But this sequent is clearly evident by the definition of \({\mathsf{Memo}}\), inversion of the sequent rules, inversion of the big step evaluation rule, and finally by induction on the state transition rules.

A side-note on non-determinism

What about non-determinism? Couldn’t we give a wicked combinator which acted like a memoizer for the first 1000 tries, and after that would return a random number? Even with only free assignables, we may introduce non-determinism, and so it would seem that we have added something unjustified to the computation system. How can we rectify this?

This is where it gets sort of interesting. Howe suggests that the specification of computation systems for type theories might be broadened to permit non-determinism by essentially changing the meaning explanation of \({{{\color{MidnightBlue}{M}}}\in{{\color{MidnightBlue}{A}}}}\) to the following:9

To know \({{{\color{MidnightBlue}{M}}}\in{{\color{MidnightBlue}{A}}}}\) is to know that \({{{\color{MidnightBlue}{M}}}\Rightarrow{{\color{Maroon}{M'}}}}\) and for any \(M''\) such that \({{{\color{MidnightBlue}{M}}}\Rightarrow{{\color{Maroon}{M''}}}}\), that \(M'\) and \(M''\) are equal canonical verifications of \(A\).

Another option would be to not change the meaning of the typing judgements, but rather to make the modal/intensional character of the state transition judgements precise, and then reconstruct the \({{{\color{MidnightBlue}{e}}}\Rightarrow{{\color{Maroon}{e'}}}}\) judgement by quantifying over worlds, requiring that the state transition be evident at each world. This is tempting, but it fixes in advance an intensional concept of determinism which is not sufficient in general to support use-cases like the quotiented encryption example I showed in ML.

Howe’s proposed meaning explanation would permit a random number oracle to be introduced into the computation system, so long as it is assigned a type like \(x,y:\mathbb{N}/\mkern-5mu/ {\mathsf{unit}}\).


Thanks to Conal Elliot for inspiring me to think about benign effects with his unambiguous choice operator.