Tuesday, December 6, 2011

Candle's Procedural Design

In the previous blog, I've shared my thoughts about functional programming. Sometimes it looks like to me that procedural programmers are side-effect addicted, whereas functional programmers tend to have side-effect phobia. Hard-core functional programmers tend to spread the feelings that side-effects are evil. Yes, if the side-effects are unnecessary, then they should be eliminated by all means. But if the side-effects are desirable (by requirements), then there's nothing evil about them. There's no impurity in a program if it allows desirable side-effects.

Everyone agrees that side-effects should be controlled and carried out in a structured manner. The question is how. I think the Holy Grail of current programming language design is to find the best way to mix procedural programming with functional programming. In the following blog, I'm going to talk about the procedural design in Candle, and how it compares to similar designs in languages, like Haskell, Erlang and Clojure.

Separation of Side-Effects

Separation-of-side-effects is a fundamental mechanism in Candle to cleanly segregate the functional world from the procedural world. It's not a completely new idea. A similar concept, command-query separation, is also proposed by the creator of Eiffel, Bertrand Meyer. This principle is probably also recommended in all programming languages as a good practice. The setter and getter in OOP can be thought as following this principle as well. But only Candle has make it into the design of the language.

Many new programming language, like Python, Ruby and Scala, are embracing functional programming features. But without SoSE, the compiler cannot check the purity of some seemingly functional code. The functional features can be ruined by just a single line of careless procedural code. And it's also sad to see many XQuery and XSLT implementations are introducing external routines without separating the functional ones from the procedural ones. This simply spoils the functional design of XQuery and XSLT. This shows that most programmers today still have predominantly a procedural mindset.

I'd also like to compare SoSE against I/O monad in Haskell. I was really amazed when I first studied I/O monad and discovered how similar it was to Candle's SoSE.

Haskell Candle
Actual code performing side-effects are encapsulated by the system. Actual code performing side-effects are encapsulated as system method.
A normal function cannot call functions containing action. A function cannot call any method.
Actions or functions containing action cannot be called in an expression. Expressions in Candle are always functional, thus cannot call any method.
Do notation. Procedural statements in Candle.
Functions containing action must be declared with IO type. Routines with side effects must be declared as method.

But SoSE in Candle is much easier to understand than monad. The lines in the table above are sufficient to describe it. But to understand monad, you have to go through a whole course (and a few times maybe). Yes, some may argue that understanding of monad is not required to make use of I/O monad, but I feel that is just like the way Microsoft sells its OS - you just have to know how to use it; you don't have to know how it works. Is monad really dummy-proof?

But the similarity between Candle SoSE and Haskell I/O monad does not mean I think monadic I/O is a right design today. It would be a right choice 16 years ago to introduce monadic I/O to Haskell. But as programming language evolves, byte and line oriented I/O is becoming too low-level. Modern query languages, like XQuery, XSLT, SPARQL, no longer concerns about how the source data is loaded and serialized. They just declare where to get what they want, it is all up to the underlying engine to implement the I/O; whether it is in whole chunk or buffered, pipelined, streamed, lazy or eager, cached or not cached, local or remote, they don't care. This is the right direction to go. So in Candle, you won't see low level system routines like read-line(), read-byte().

Controlling State Mutation

As mentioned in this blog article, and probably many other places, "all concurrency issues boil down to coordinating access to mutable state". Candle with its core language designed based on XQuery and XSLT, has apparent advantages over all other procedural languages in controlling mutable state. In Candle:
  • All expressions are functional.
  • Dynamic scoped variable can be used in place of thread local storage.
  • All temporal objects are not mutable after they are created.
  • The procedural set statement is just a simple language construct to facilitate while-loop. It can be used to update a variable to a new value, but it cannot change any existing value.
  • Changes to the mutable state are carried out through set-oriented CRUD statement, instead of low level assignment operation.

Providing Concurrency

A true general-purpose language cannot escape from addressing concurrency. Candle's concurrency architecture is very close to Erlang and Clojure. Before I continue, I'd like to make it clear that what I'm talking in this section is very much a blue print. The whole implementation of concurrency support in Candle will probably take a few major releases.

Here's the concurrency architecture diagram of Candle.
The architecture has 3 layers:
  • The functional core: based on XQuery and XSLT.
  • The concurrent execution model: Candle supports concurrency through Task. A task is just a declarative way to tell the system that something can run in parallel. Whether it is implemented as coroutine, thread, fiber, process or even remote process is all up to the underlying engine (of course declarative hints can be given by the programmer to indicate a more appropriate choice). A task is a share-nothing running instance. A task holds 3 kinds of date: a message queue, a stack and a heap. Tasks communicates through asynchronous messages, as in Erlang.
  • Shared and mutable state management: which can be further divided into transactional and non-transactional state.
    • Transactional states are managed by system through STM (software transactional memory). It can be a thin wrapper around existing transactional data stores, like XML and relational database, or an STM adapter to in-memory virtual data store, or local and remote file system.
    • Legacy and physical system interfacing: it's impossible to make everything transactional. It might be some legacy system, or some physical operation that cannot be reversed, like printing, burning disk, permanently erasing data from hard disk, etc. They still need to be coordinated through locking mechanism.
In the following table, I compares Candle's architecture to Erlang and Clojure:

Features Erlang Clojure Candle
Core Functional Language based on Prolog based on Lisp based on XQuery/XSLT
State Mutation Restriction single assignment immutable data structures immutable items, nodes and objects
Concurrent Model fiber/actor (Java) thread task
Shared and Mutable State Mnesia (transaction based data store) STM STM

Similar architecture is also mentioned in this and this LtU post. I think this is no coincidence. If you start with a functional core, such architecture is just natural.

The above descriptions just paint a very high-level picture and there's really a lot of details to iron out in the design and implementation.


To summarize, I'd like to reiterate the design goals of Candle's procedural support:
  • Clean separation of functional and non-functional features.
  • To provide high level procedural features, like CRUD, declarative serialization and network operations, so as to eliminate primitive procedural operations, like assignment and byte-and-line based I/O. 
  • To provide a robust and efficient concurrent programming model. It should eliminate primitive synchronization operations, and avoid common concurrent problems like deadlock and race conditions.
Your support and feedback is needed to make these dreams come true.

1 comment:

  1. I think your description of Haskell is a little off:

    "A normal function cannot call functions containing action."

    "Actions or functions containing action cannot be called in an expression."

    "Functions containing action must be declared with IO type."

    See http://codepad.org/2Cno4TkT