Propagating, artfully

Just came across [this paper] today, and it was super-interesting. (I’m assuming the title is a reference to [an earlier one] by Steele and Sussman).

You can read the abstract, or [watch Sussman talk about it], so I won’t waste your time summarizing it here. Instead, I’ll skip all the way ahead to the [thesis version] of the paper, past the “Conclusions” and on to the section titled Philosophical Insights. This part was quite … enlightening.

First off, the “concurrency problem” really isn’t one:

… the problem comes from trying to maintain the illusion that the events of a concurrent system occur in a particular chronological order.

But this artificiality is about concurrency in programs, not in the natural world, since

… concurrency is the natural state of affairs, and synchronicity is what’s difficult. The physical world in which we live is perfectly concurrent: every little patch of universe evolves on its own, according to local rules, and physical effects travel from one patch to another at a finite speed …

More importantly,

Our experience of time appears as a linear stream of events only because our memory imposes an order on them, which is the order in which events are remembered to have occurred … the illusion of synchronous time is sustainable for our society, however, because the patch of universe we occupy synchronizes far faster than the temporal resolution of “event” our society expects.

So, then, why is this a problem? Because we work around the behavior of the “analog world” to create the “digital world”.

A great deal of engineering effort is expended to build from this substrate computational devices that provide the illusion of synchronous time.

This would be fine, except that these cores share memory. Memory is what creates this problem.

This large memory implicitly imposes a linear order on the reads or writes that it experiences. Since synchronizing concurrent things well is hard, the implicit synchronization done by such a memory is terrible.

And this is what makes concurrent programming so hard, which is where the propagator network steps in.

… the structure of a propagator network is analogous to space, and the speed at which information propagates through it is analogous to the speed of light … the idea of partial information lets us build a propagation infrastructure that does not force the notion of synchronous time.

Let’s take a step back now and consider the bigger picture.

An electronic computer is a fixed electrical circuit that performs a universal computation, that is, it implements a layer of indirection which allows it to mimic any computation that is described to it … we understand the meaning of machine instructions as sequences of things for the computer to do, with a very powerful notion of the flow of time

Seen in this way, we can derive many concepts in Computer Science as notions introduced to rearrange or describe parts of these sequences, such as imperative programming …

… pure linear sequences, or course, are insufficient for describing the computations we want to describe, so we add means of naming portions of such sequences and referring to them …

… or virtual memory …

… virtual memory introduces a notion of space. The memories of separate programs running on the same computer are kept separate … each program, therefore, has its own flow of time …

… or lexical scope …

… lexical scope introduces a finer-grained notion of space … each lexical scope is therefore a sort of point in space; and their interconnections give space a topology.

This leads us, inexorably, to thinking of the fundamental structure of computations as trees rather than lines.

Tree-shared computing systems can of course be made out of line-shaped ones with some additional mechanism, such as a recursive interpreter of the tree structure of a program.

… and I love this next bit …

Conceptually, the space of such a system is topologically a tree, and time is inherited from the underlying linear interpreter, and therefore constitutes some traversal of the tree that is space. The use of names to refer to reusable partial results of a tree computation turns it into a directed graph computation. The sequential time of the underlying computer constitutes a topological sort of the computation, forcing the graph to be acyclic.

Whew. Ok! So now we know! With this insight, let’s tackle the next bit, the bane of all FP-advocates: Side effects.

(Note: I would like the paper to end right here, at this high point. I could split this article into two halves, but if I stop here I probably won’t get around to the next one)

The essential reason why side effects tend to mess up models of computation is that side effects inescapably introduce time.

So the fundamental tension can be expressed as …

… we want to be able to build programs whose observable actions are what we want them to be; for that we need the observable actions of our programs to be predictable from the programs themselves.

… or equivalently, as …

… between giving our computers the freedom to operate as they will on their internal structures, where we are not concerned with the details of progress but only with its result; and constraining our computers to act the way we wish externally …

Ok, we know this, we’ve seen this before. What exactly is new about this?

The same tension exists in the architecture of human minds. Indeed, people too can both think and act … we have a word for the transition humans make between thought and action: “decision”.

This is something I found quite novel. Thinking doesn’t respect the flow of time at all. But action is constrained entirely by it. So we are faced with an (apparent ?) cartesian-like duality of thought and action, and we have to choose how to deal with it.

So the result is either that the primitives for action amount to always doing everything whenever one thinks about doing it, or that the story about thinking offers no primitives for action at all.

So either thoughts are constrained in a way to make actions predictable, or else “bridges” between these two worlds of thought and action are needed. Again, the propagator paradigm can step in and resolve this.

We can have three worlds, of thoughts, actions and decisions, each being some sort of propagator. To put it more concretely,

… when a network has many fan ins and cycles, and operates on interesting partial information structures, propagators can run in a chaotic order, and be run many times, and let the merging of the partial information ensure that they produce a coherent result at the end; this is thought.
… when a network is acyclic and the “partial” information structure is the all-or-nothing structure, then propagation mimics conventional evaluation, and the order of events is perfectly predictable; this is action
… propagators that will accept partial inputs, that can perhaps be refined with the passage of time, and will take upon themselves the responsibility of picking a moment, making a choice, and committing to an all-or-nothing answer; this is decision

That’s it! Read it once, read it twice, I can guarantee you that you won’t think about the computation the same way again!

The “debate” had an interlude, in which Costanza asked Sussman why MIT had switched away from Scheme for their introductory programming course, 6.001. This was a gem. He said that the reason that happened was because engineering in 1980 was not what it was in the mid-90s or in 2000. In 1980, good programmers spent a lot of time thinking, and then produced spare code that they thought should work. Code ran close to the metal, even Scheme – it was understandable all the way down. Like a resistor, where you could read the bands and know the power rating and the tolerance and the resistance and V=IR and that’s all there was to know. 6.001 had been conceived to teach engineers how to take small parts that they understood entirely and use simple techniques to compose them into larger things that do what you want.

But programming now isn’t so much like that, said Sussman. Nowadays you muck around with incomprehensible or nonexistent man pages for software you don’t know who wrote. You have to do basic science on your libraries to see how they work, trying out different inputs and seeing how the code reacts. This is a fundamentally different job, and it needed a different course.

So the good thing about the new 6.001 was that it was robot-centered – you had to program a little robot to move around. And robots are not like resistors, behaving according to ideal functions. Wheels slip, the environment changes, etc – you have to build in robustness to the system, in a different way than the one SICP discusses.

And why Python, then? Well, said Sussman, it probably just had a library already implemented for the robotics interface, that was all.