Friday, June 24, 2011

Why I Invented Candle (I)

The idea of inventing a new programming language first came to me more than 10 years ago. At that time I was still at university, trying to build a web forum for my peers. I was using Perl, and the pain of using it drove me to invent a better language, one that treats markup as a built-in data type. Initially I tried to mix XML with Java, which I considered the best markup language and the best programming language in the world. Then came XSLT and XQuery, which fundamentally changed my design of the new language.

So, as I said, the single most important reason for inventing Candle was to create a new general-purpose programming language that treats markup data as a built-in data type.

I don't think I need to dwell on the importance of markup data. Just look at the sheer number of HTML documents on the Internet, and at the widespread use of XML documents across applications, including office suites (ODF and OOXML), vector graphics (SVG) and 3D graphics (X3D). Even more data is now accessible in XML format through channels like RSS feeds, Web Services and RDB mapping.

So it obviously makes sense to treat markup data as a built-in data type. By built-in data type, I mean:

  • more than just some APIs to parse and process XML documents, like DOM or SAX;
  • more than OXM tools that try to map markup data onto object data, like XStream or Spring OXM;
  • more than just the ability to generate markup documents as strings, as most server-side scripting languages do;
  • and more than half-baked solutions like LINQ to XML or E4X.
It really means processing markup data the way XPath, XSLT and XQuery do: being able to parse, serialize, validate, query, transform and update markup data natively. Not as strings and not as mapped objects.
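
To make the "not as string" point concrete, here is a minimal sketch in Python (not Candle, and not a claim about Candle's syntax). It only contrasts gluing markup together as text with building it as a tree through a library API; the sample title and the helper name `is_well_formed` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Tom & Jerry <live>"  # hypothetical user input

# Markup as a string: nothing stops us from producing broken XML.
naive = "<post><title>" + title + "</title></post>"
# It only works if we remember to escape every value by hand.
escaped = "<post><title>" + escape(title) + "</title></post>"


def is_well_formed(text):
    """Return True if text parses as XML (illustrative helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Markup as a tree: the value stays data, and escaping happens
# automatically when the tree is serialized.
post = ET.Element("post")
ET.SubElement(post, "title").text = title
serialized = ET.tostring(post, encoding="unicode")

print(is_well_formed(naive))       # the hand-built string is rejected
print(is_well_formed(serialized))  # the serialized tree parses fine
```

Even the tree version is still "just an API" in the sense of the first bullet: the language itself knows nothing about markup, which is exactly the gap a built-in data type is meant to close.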

But then why not just go with XSLT or XQuery? Because I felt they were not good enough. Firstly, XSLT is too verbose to be usable, at least for me. Secondly, I think we should have one unified XML processing language instead of two (XSLT and XQuery), or three if you count XProc, or four if you consider XML Schema a processing language, or five if you include the one that is going to be defined by W3C soon. Thirdly, they are DSLs (domain-specific languages), and embedding DSLs in general-purpose programming languages is just as painful as embedding SQL.

So Candle was invented. Its merits are:
  • It treats markup data as a built-in data type. To digress a bit, Candle invents not only a new scripting language but also a new markup language, Candle Markup. The new markup language was created to address two major issues with XML. Firstly, without a schema, XML is only a semi-structured or weakly typed markup language, whereas Candle Markup is always strongly typed: the data type of each node can be derived from the carefully designed literal syntax. Secondly, Candle eliminates the ambiguity of whitespace nodes in XML by requiring text nodes to be explicitly quoted. While ambiguous whitespace nodes are not a problem for presentation uses, as in HTML, they can be a big headache when precision matters in programming. We programmers know how many hard-to-detect bugs are caused by the differences between NULL, "", " ", "\n" and "\r\n".

    Of course, the Candle runtime supports not just Candle Markup but also other popular formats, like XML, XHTML, HTML, MIME messages, JSON and CSV. In fact, any abstract text can be easily converted into XML markup through Candle's grammar-based parsing function. The ideal of Candle is to process any hierarchical data, not just XML, natively.
  • It provides rich processing capabilities for markup data. Candle can parse, serialize, validate, query, transform and update markup data in one unified language, instead of many DSLs (XSLT, XQuery, XQuery Update, XML Schema, RELAX NG, RegEx, BNF). In Candle you'll see consistency in the data model and in pattern and query processing, rather than the mess of overlapping features, conflicts and incompatibilities that comes from combining all these DSLs. With this unification, the learning curve is substantially reduced and users' productivity is greatly increased.
  • It's also a general-purpose programming language (GPL). I think most programmers who have worked with SQL embedded in a GPL understand the pain of embedding DSLs in GPLs. So I don't want Candle to be just another embedded DSL; I want it to be a full-fledged GPL. XQuery can actually already be considered a general-purpose functional language, so Candle just extends it further. Firstly, Candle introduces a statement-based syntax alongside the expression-based syntax of XQuery, because the former is much more readable and familiar to most programmers than the latter. You'll have to see that for yourself. Secondly, Candle adds some features common to most procedural languages, like variable assignment and while loops. They are carefully managed through a new mechanism I call separation-of-side-effects, so as not to spoil the functional nature of the query language. Thirdly, Candle provides library functions for file operations and stdio, for accessing databases, and even for invoking native routines in DLLs. Candle can be just as general-purpose as Python or Java, although of course its library functions are not as rich as theirs.

    With Candle script, you can easily develop command-line programs, web applications and desktop GUI applications. To see how easy, you'll have to read the tutorials and documents yourself. Candle is designed to be general enough for both client-side and server-side programming, and for both desktop and web applications.
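
The whitespace-node ambiguity mentioned in the first merit above is easy to reproduce. The following is a small Python sketch using the standard `xml.dom.minidom` parser (again, not Candle code); the two sample documents and the helper name `child_summary` are made up for illustration.

```python
from xml.dom import minidom

# The same logical data, once compact and once pretty-printed.
compact = "<list><item>a</item><item>b</item></list>"
indented = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"


def child_summary(xml_text):
    """List the root's children: tag names for elements,
    repr() of the text for text nodes (illustrative helper)."""
    root = minidom.parseString(xml_text).documentElement
    return [
        child.tagName if child.nodeType == child.ELEMENT_NODE else repr(child.data)
        for child in root.childNodes
    ]


print(child_summary(compact))   # only the two item elements
print(child_summary(indented))  # item elements interleaved with whitespace text nodes
```

Without a schema, the parser cannot tell whether that indentation is data or decoration, so code that indexes into `childNodes` behaves differently on two documents a human would consider identical. Requiring text to be quoted removes the question.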
What more is there in Candle? We'll have to discover and develop it together.

5 comments:

  1. Ever looked at Lisp? I think that its code-as-data principle and powerful macro system actually provide almost everything you're putting forward as arguments for creating Candle. Even if you plan to do Candle anyway, I recommend you check it out thoroughly; you'll certainly get some good ideas out of the language.

  2. Link to Candle Markup documentation broken.

    Also, grammatical error: "Firstly, without schema, XML is only a semi-structured or weekly-typed markup language; whereas Candle Markup is always a strong-typed markup language." Should say weakly-typed. Of course, weakly-typed itself is viewed as a poor way of describing a language. See Benjamin Pierce's book Types and Programming Languages for an explanation.

    I can't comment further until your links are no longer broken.

  3. Your Candle Markup documentation is linked to correctly from LtU, but not from this blog post. Please fix it.

    With regard to Candle's ability to peer inside the values of attributes, I don't understand the benefit of enforcing a particular syntax/semantics pair. Why do I have to use your encoding? Is there a way I can write my own encoding?

    For example, Microsoft's XAML language veers pretty far from normal XML processing. One of the features that veers far from it is the use of MarkupExtensions to define the meaning of an element or attribute.

  4. Thanks, Denwash, for suggesting Lisp. I'm definitely aware of it, as well as several other functional languages like Scheme and Haskell.

    The unique features of functional languages (compared to procedural languages) inspired me a lot while I was designing Candle.

    The code-as-data principle is definitely one of the principles that I honor. In Candle's context, I see any script as a hierarchical tree. And Candle has the ability to parse any abstract text into an AST based on a BNF grammar. This AST can then be easily processed with the powerful tree-processing capabilities in Candle.

    My personal opinion on XQuery/XSLT/XPath (and Candle), compared to Lisp, is that they bring functional programming to a higher level of semantics, like C++ compared to C. Yes, everything done in C++ can be done in C, but C++ is more than C.

  5. Thanks John for pointing out some errors in the blog. I've fixed them.

    The reason Candle invents its own way of encoding literal values is that I want the markup to be strongly typed. This has pros and cons compared to XML.

    XML is more general and allows the value to be encoded in any way. The pro is that everybody can choose his/her own encoding of values. The cons are: 1) we need a full schema to determine the types of the values in the markup, which makes processing more difficult; 2) we need to translate between the encodings.

    This is like the issue of character encoding for I18N text. In the beginning, every language invented its own encoding. But in the end, we unified on Unicode. That doesn't mean Unicode is superior to any of the other encodings. It's because the world needs the convenience of a unified character encoding.

    I won't claim Candle's way of encoding literal values is the best. But in the end, the world has to agree on some encoding, so that we can exchange strongly typed markup data easily. It might be JSON, YAML, or Candle. Only time can tell.

    And I think this is one of the primary reasons that make people switch from XML to JSON.
