Monday, July 27, 2009

The Summer XML 2009 Conference Notes - Day 1

I'm sitting listening to Leigh White open the conference, speaking to the motivations for structuring content in open, standards-compliant formats. She asks the group how many are taking their content to multiple outputs... few raise their hands. They don't realize that unity is plural, and at a minimum, two: even when a content collection someone manages goes to "one" output, that output is often composed of multiple output forms, and includes additional forms of data at the periphery -- like the stuff that gets fed into searching or indexing features. Every function is really an output, and every output is a function. So saying that you don't have multiple outputs is akin to saying you don't do anything with your content, which is nonsense if you're doing anything at all.

Anyway, the "selling" of XML should be a bit of a past-tense proposition: if professionals cannot see that they've been encircled by the technology, data assets, and other stakeholders depending upon XML, they are blind. XML has invaded the economy like kudzu in North Carolina. Still, White makes important points about the pain points involved, and the real economic costs of failing to adopt structured methods.

Her recounting reminds me of a discussion at a telecom manufacturer, explaining why so much capital was moving to China and India. The infrastructures of the US and western Europe had been built out with copper wire for a hundred years. Major paths for laying new networks are hard to find, and retrofitting all that infrastructure is a huge and costly endeavor. In China and India, it's all new, from-the-ground-up work, using the latest technology and most modern techniques. So the US telecom infrastructure is actually a liability, almost as much as it is an asset.

As with an organic, evolutionary system, many of the earliest decisions end up dictating the robustness and fecundity of the system -- the ability to scale without collapse and to provide continued richness of available forms.

She also came prepared with graphs: they went from a 1:1 to a 1:4 source-to-output ratio over three years, adding over 5000 sources along the way. They also increased their own productivity to the point that they had the capacity to deliver more outputs than their internal customers could demand. Imagine being able to deliver everything the customer asks for... and being ready to deliver before other segments of the team... novel ideas!

White observes: about one in four people cannot grasp or accept working with structured content. That's an interesting anecdotal statistic.

Sitting in on Doug Schepers's ex-HTML session. He's going over a lot of pre-history of XML... maybe too basic, but I know a few people here who aren't familiar with it from the perspective of a markup geek. Talking about dead or DOA efforts: Compound Documents, XHTML 1.1, XHTML 2. HTML 5 began as a splinter cell started by browser vendors as a reaction to W3C's XHTML/XML focus: about 3750 times more people use HTML than XHTML. XHTML 2 is effectively dead as a working group as of December 2009.

Differences? Syntax, and error handling vs recovery: HTML5 supports error recovery, accepts slightly smelly garbage in and can still produce a viewable document, but XHTML rejects documents that are not well-formed by design, in that they took a restrictive interpretation of an XML processor. HTML5 doesn't allow arbitrary XML because it lowercases the tag names, so they lose their true identity. Further, attribute values need not be quoted, and it supports boolean attributes (like checked on an input or selected on an option). Weird exceptions. The HTML parser doesn't follow any of the XML parsing rules. Yet it seems to be a monolithic one-size-fits-all specification. HTML5 DOES NOT DO NAMESPACES, so it cannot accept foreign elements or indeed any XML designed for realistic content workflows. Instead, they introduce microformats using RDFa and ARIA ("Accessible Rich Internet Applications"). RelaxNG and Schematron are being used, good news there! An XML5 with error correction, no DOCTYPEs?
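
To make the recovery-vs-rejection difference concrete, here is a minimal sketch using only the Python standard library. It is my own illustration, not something from the session: the SLOPPY snippet and the helper names are made up, and Python's html.parser is just a lenient tokenizer standing in for HTML-style parsing, not a full HTML5 parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: uppercase tags, an unquoted attribute value,
# a boolean attribute, and a paragraph that is never closed.
SLOPPY = '<P>Pick one: <INPUT type=checkbox checked><br>unclosed paragraph'


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; a boolean attribute has value None.
        self.tags.append((tag, dict(attrs)))


def parse_as_html(markup):
    """Lenient path: always returns whatever tags could be recognized."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parse_as_xml(markup):
    """Strict path: True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parse_as_html(SLOPPY))  # lenient: tags come back lowercased
    print(parse_as_xml(SLOPPY))   # strict: the same input is refused
```

The lenient side also shows the lowercasing mentioned above: P and INPUT come back as p and input, which is exactly what would destroy a case-sensitive XML vocabulary.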

Did my presentation... cluttered and spotty examples. I could blame PowerPoint for botching my printed speaker notes -- it blanked out the slide so that I couldn't track the examples with the slides -- but despite the fits and starts the feedback was generally good. I've got to get away from Microsoft formats though. PowerPoint not only messed up my speaker notes, but I lost a good bit of an animation I had done in it as well.
