Among the people who are rethinking XML, James Clark has suggested three approaches (XML 2.0, XML.next, and MicroXML), which put forward a solid framework to start with. Candle is definitely along the XML.next line. Candle is not compatible with XML; it actually goes beyond XML to unify the markup data model with the object data model.
In this blog post, I'll explain in more detail how Candle Markup (especially with the new object notation) addresses many problems of XML, and how it compares with other formats like JSON and YAML.
Syntax-wise, Candle Markup has the following advantages over XML:
- whitespace non-ambiguity: you might think this is trivial, but if you take a look at the recent discussion on the xml-dev list about how XML editing tools have to come up with "creative" ways to tackle this issue, you'll see the trouble it causes. And if you ask a DB administrator, he/she will tell you that a single whitespace character is definitely different from 1000 consecutive whitespace characters.
- cleaner namespace syntax: there has been enough sighing over XML's namespaces, so I'll only show you the relief. In Candle, you can now write a fully expanded QName as ns:domain:foo:bar, a hierarchical name similar to Java.
- strongly typed literal values: Candle uses unique syntax to denote the type of a literal value. Thus Candle is always strongly typed, whereas XML is only weakly typed without a schema. Some people have questioned why anyone would adopt Candle's literal syntax. I won't claim that Candle's literal syntax is the best, but in the end the world has to agree on some syntax, so that we'll be able to exchange typed values.
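To make the first and last points concrete on the XML side, here is a small sketch using only the Python standard library. It does not involve Candle at all; it just shows that a schema-less XML parser has to keep indentation whitespace (it cannot know whether it is data) and hands every value back as a string, while a format with typed literal syntax such as JSON does not. The `point`, `x` and `label` names are made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# The same logical document, once compact and once indented.
compact = ET.fromstring("<point><x>1</x></point>")
indented = ET.fromstring("<point>\n  <x>1</x>\n</point>")

# Without a DTD or schema the parser cannot tell whether the
# indentation is data, so it keeps it as text content.
print(repr(compact.text))    # None
print(repr(indented.text))   # '\n  '

# Element content always comes back as text; the "1" is not an integer.
x = indented.find("x").text
print(type(x).__name__)      # str

# In JSON the literal syntax itself carries the type.
doc = json.loads('{"x": 1, "label": "1"}')
print(type(doc["x"]).__name__, type(doc["label"]).__name__)  # int str
```

The application has to decide by convention whether to strip that whitespace and how to convert the string, which is exactly the kind of guesswork typed literals are meant to remove.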
Candle's literal value syntaxes are carefully designed, and they are based on widely accepted conventions:
- empty: (), "", '', as intuitive as they can be;
- boolean: true, false as usual;
- string: double quoted as usual;
- number: integer, decimal and double types follow common syntaxes you've always used;
  - type suffix for integer types, like byte and short, is a convention used by major programming languages, including C/C++, Java, .Net, Python;
- measure: widely used in CSS, SVG and SMIL;
- qname: based on Candle's new hierarchical namespace syntax, and similar to C++ and Java. It can't be simpler than that;
- uri: follows the standard URI syntax, except that some schemes are reserved to represent special literal values in Candle;
- specially single-quoted literal values: datetime, color, id, binary. Only this part might have other options, like:
  - option 1: use a Turtle kind of postfix annotation, e.g. "2011-11-19"^date;
  - option 2: just use object notation, e.g. dt{"2011-11-19"};
  - option 3: constructor syntax, e.g. dt("2011-11-19");
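To illustrate what "the syntax denotes the type" means in practice, here is a toy classifier in Python. It is only a sketch: the regular expressions are my own assumptions for illustration and are not Candle's actual grammar. In particular, the decimal/double split by exponent is borrowed from XQuery's numeric literals, the unit list for measures is an arbitrary CSS-like subset, and integer type suffixes are left out because their exact spelling is not given here.

```python
import re

# Toy rules only: these patterns are assumptions made for illustration,
# not Candle's real literal grammar. The first matching rule wins.
RULES = [
    ("empty",   re.compile(r"\(\)|\"\"|''")),
    ("boolean", re.compile(r"true|false")),
    ("integer", re.compile(r"[+-]?\d+")),
    ("decimal", re.compile(r"[+-]?\d+\.\d+")),
    ("double",  re.compile(r"[+-]?\d+(\.\d+)?[eE][+-]?\d+")),
    ("measure", re.compile(r"[+-]?\d+(\.\d+)?(px|em|pt|cm|mm|in|%)")),
    ("string",  re.compile(r"\"[^\"]*\"")),
    # Single-quoted values stand for datetime, color, id or binary.
    ("special", re.compile(r"'[^']*'")),
    ("qname",   re.compile(r"[A-Za-z_]\w*(:[A-Za-z_]\w*)+")),
]

def classify(token: str) -> str:
    """Return the name of the first rule that matches the whole token."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    return "unknown"

for token in ["()", "true", "42", "1.5", "1.5e3", "12px",
              '"hello"', "'2011-11-19'", "ns:domain:foo:bar"]:
    print(f"{token:20} {classify(token)}")
```

The point is that each token can be typed by looking at its own syntax, with no schema consulted.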
- strongly-typed: no harm to emphasize it one more time;
- clean data type hierarchy: Candle strips all the DTD-related types from the XML Schema types; namespaces are not modeled as data nodes; and processing instructions are combined with comment nodes. This results in a very clean data type hierarchy compared to the XML Schema data model;
- unification with OOP object data model: I'll talk about this later;
- data model extended to the file and directory level: this part has not been fully implemented in the current beta release, so I won't go into details. The basic idea is that you should be able to work with file and directory nodes just like element and text nodes within a document.
For years, developers and architects have been baffled by the mismatch between the markup data model and the object data model, and have tried hard to find the best way to map one onto the other. Candle takes a different approach by unifying the two data models. Whether this is the right direction down the road, I leave it up to you to judge.
If you ask me what the major differences between the two models are, I think it boils down to just this:
- in the markup data model, an attribute can only hold simple content, not complex content;
- in the object data model, an object can only have attributes, but no child nodes.
In Candle, the new object notation is just an alternative syntax to the element notation. Data model wise, an object is treated exactly the same as an element.
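One way to picture this unification is a single node type that allows both complex attribute values and child nodes. The Python sketch below is only my illustration of the idea, not Candle's implementation; the `Node` class, the two helper functions and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union

# An attribute value may be simple, or a whole node (complex content).
Value = Union[str, int, float, bool, "Node"]

@dataclass
class Node:
    """One node type for both elements and objects (hypothetical)."""
    name: str
    attributes: dict[str, Value] = field(default_factory=dict)
    children: list[Union[str, "Node"]] = field(default_factory=list)

def fits_markup_model(node: Node) -> bool:
    """True if every attribute of this node holds simple content."""
    return all(not isinstance(v, Node) for v in node.attributes.values())

def fits_object_model(node: Node) -> bool:
    """True if this node has attributes only and no child nodes."""
    return not node.children

# Markup style: simple attributes plus mixed child content.
para = Node("p", {"class": "note"},
            ["Hello ", Node("b", children=["world"]), "!"])

# Object style: no children, but one attribute holds a nested node.
person = Node("person", {"name": "Ann",
                         "address": Node("address", {"city": "Oslo"})})

print(fits_markup_model(para), fits_object_model(para))      # True False
print(fits_markup_model(person), fits_object_model(person))  # False True
```

Each classic model is then just a restricted subset of the same structure, which is why a separate mapping layer between the two is no longer needed.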
Here's an example of an object in 3 different notations:
Candle Markup Compared to Other Alternatives
The following is a feature comparison of Candle against other alternatives. I've selected XML, JSON, YAML, and JavaFX object notation. The list is not exhaustive, but sufficiently representative, I think:
Feature | XML | JSON | YAML | JavaFX Literal Object | Candle |
Specific Features | | | | | |
Unicode support | yes | yes | yes | yes | yes |
Whitespace non-ambiguity | no | yes | yes | yes | yes |
Strongly typed literal values | needs Schema | yes | yes | yes | yes |
Extended literal values (like datetime, uri, qname) | needs Schema | no | needs type annotation | needs to use object constructor | yes |
Namespace support | yes (but messy) | no | partial (only the type annotation is namespaced) | yes (clean hierarchical ns) | yes (clean hierarchical ns) |
Complex attribute content | no (XML Schema defines a general value list syntax, but it is highly ambiguous, and not usable at all) | yes | yes | yes | yes |
Child node support | yes | no (no direct support) | no (no direct support) | no (no direct support) | yes |
Formal data model | yes (but messy) | yes | yes | yes | yes |
Schema language | yes (XML Schema is over-complicated; RELAX NG is cleaner, but less used) | no | no | yes (attribute only, no child content model support) | yes (similar to RELAX NG) |
Embeddable in programming languages (as structured nodes, not as quoted strings) | yes (.Net, Scala, etc.) | yes (JavaScript) | no | yes (JavaFX) | yes (Candle) |
Advanced processing (path language, query and update language) | yes (but with overlapping and conflicting features) | no | no | limited (not as high-level as XPath, XQuery) | yes (unified query language) |
General Features | | | | | |
Readability | good for mixed text content, but verbose for structured data | good for structured data | good for structured data | good for structured data | good for both (you have object notation; and literal values do not need to be quoted) |
Cross platform | yes | yes | yes | yes | yes |
Open source | yes | yes | yes | promised (but not delivered yet) | yes |
Lightweight runtime | yes (if you only use XML); no (if you start to use XML Schema, XSLT, XQuery, WS, etc.) | yes | yes | no | yes (entire runtime is only 2MB when compressed) |
Standards status | W3C standard | RFC standard | no | Oracle only (might become a Java standard in future) | not yet |
Good for structured data | no | yes | yes | yes | yes |
Good for mixed text content | yes | no | no | no | yes |
Generally, YAML can be seen as a superset of JSON, and JavaFX literal object can be seen as a superset of YAML. These three formats are good for structured data exchange. Candle can be seen as a superset of XML (excluding DTD) and the other three object formats.
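The "no direct support" entries for child nodes are easiest to see with mixed text content. The sketch below, using only the Python standard library, pushes a small XML paragraph into JSON. Because JSON has no notion of child nodes, the code has to invent its own layout; the `name` / `attrs` / `children` keys are an ad hoc convention I made up for this example, not any standard.

```python
import json
import xml.etree.ElementTree as ET

def to_jsonable(elem):
    """Encode an element as a dict with an explicit list of children.

    The key names are an ad hoc convention chosen for this sketch.
    """
    children = []
    if elem.text:
        children.append(elem.text)
    for child in elem:
        children.append(to_jsonable(child))
        # Text following a child element is stored on that child as "tail".
        if child.tail:
            children.append(child.tail)
    return {"name": elem.tag, "attrs": dict(elem.attrib), "children": children}

para = ET.fromstring('<p class="note">Hello <b>world</b>!</p>')
encoded = to_jsonable(para)
print(json.dumps(encoded, indent=2))
```

Every producer and consumer has to agree on such a convention separately, and the one-line paragraph turns into a nested structure several times its size, which is why the object formats score "no" for mixed text content.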
In the next post, I'll give you some illustrative examples to show how Candle Markup can naturally express data that currently requires domain-specific formats.