Tuesday, July 28, 2009

The Summer XML 2009 Conference Day 2

C.M. Michael Sperberg-McQueen is speaking today on preserving data across time, positing that the effort amounts to sending a message to the future. I appreciate the depth of thinking, and agree with most if not all of what he is saying, but my mind wanders back to yesterday's thoughts. It is almost as important to be able to forget data -- to discard the clutter that accumulated during the individual form's growth phases, re-establishing new, efficient forms which, for practical purposes, contain if not the same information then at least the relevant information.

Higher-order biological systems accomplish the feat of directed, constructive memory loss by multiple means. Sexual reproduction. The first three years of development. Death of the individual. (As I write, Michael refers to these as failures of the channel.) Mutation of oral narratives passed on from generation to generation. Misreading or reinterpretation of written history. One might say that this is the "normal" mode, the evolutionary environment that co-developed along with human experience. Is there any reason to expect that, by doing better in one respect for a short while, we do not also lose some robustness we would otherwise have retained? To put it more directly: shouldn't our information systems be designed explicitly to be reproductive?

Michael's talk also makes me wonder: is there a place for a Super Master Virtual Emulator Service? Some networked environment able to reproduce the most arcane, obsolete computing environments? It may not be truly possible, but is that ideal even approachable for the practical purpose of regaining access to data the way it was meant to be experienced? Aside from the technical issues, is there a standard legal framework that would allow this to happen, or even require it for data deemed important to the public good?

Tag abuse. Isn't the substitution of uracil for thymine in going from DNA to RNA a form of tag abuse?

Semantic failures. Michael mentions failure to grasp full meaning and import as an endemic characteristic. Indeed, the phrasing suggests a 19th-century view of perfection. Is there always a "full meaning" to grasp? Or can an articulation be full or incomplete, while a mapping can only be judged "full" relative to whatever it is you are mapping to?

As I write, Michael alludes to detecting errors in migration or failures in emulation. His point is that faults are difficult to assess, since syntactic checking may not reveal irregularities. We use markup validation as an aid at the syntactic level, then "supravalidation" to apply higher-level rules, and "false-color proofs" that highlight the purportedly correct structures of a given type to aid human visual review. Also sorting, since garbage often falls to the top or bottom of a list.
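To make the last two techniques concrete, here is a minimal Python sketch of a supravalidation rule plus a sort. It is not our production tooling: the record/year vocabulary, the sample data, and the plausible-year range are all hypothetical. The sample is well-formed XML, so a purely syntactic check passes it, yet one value uses a letter O where a zero belongs. The higher-level rule catches it, and a plain string sort pushes it to the bottom of the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names and the bad value "20O3"
# (letter O instead of zero) are illustrative only.
SAMPLE = """<records>
  <record id="r1"><year>1987</year></record>
  <record id="r2"><year>2003</year></record>
  <record id="r3"><year>20O3</year></record>
  <record id="r4"><year>1999</year></record>
</records>"""


def supravalidate(root):
    """A rule beyond well-formedness: each <year> must be four ASCII digits
    in an assumed plausible range. Returns the ids of offending records."""
    problems = []
    for rec in root.iter("record"):
        year = rec.findtext("year", default="")
        ok = (len(year) == 4 and year.isascii() and year.isdigit()
              and 1800 <= int(year) <= 2009)
        if not ok:
            problems.append(rec.get("id"))
    return problems


def sorted_years(root):
    """Sort the raw strings; anomalies tend to collect at the top or bottom."""
    return sorted(rec.findtext("year", default="") for rec in root.iter("record"))


root = ET.fromstring(SAMPLE)
print(supravalidate(root))  # ['r3']
print(sorted_years(root))   # ['1987', '1999', '2003', '20O3']
```

The sort needs no rule at all, which is why it is such a cheap complement to validation: a human scanning the ends of the list spots "20O3" without anyone having anticipated that particular error.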

An allusion to user-serviceable XML editorial applications: "padded cells," with a reference to Henry Thompson, that allow limited re-marking of markup. XForms is a practical basis for such interfaces. Colored markup? Flavored markup?

Michael is also giving practical advice on working practice. A 1-10-100 practice for loading: the reason for it is that it gives you immediate feedback and avoids rework. An idiom for writing markup vocabulary specifications: use skeleton sentences in plain English to build a formal tag set library. Express the assumptions in first-order logic, which allows potential degeneration to be detected.
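The notes above don't spell out the 1-10-100 practice, so this Python sketch rests on an assumption: that it means loading one document, then growing the cumulative batch to ten, then to a hundred, and stopping at the first stage that produces failures so they are fixed before scaling up. The `staged_load` function and its `load` callback are hypothetical names. `load` stands in for whatever your loader does and returns True on success.

```python
from typing import Callable, List, Sequence, Tuple


def staged_load(docs: Sequence[str],
                load: Callable[[str], bool],
                stages: Tuple[int, ...] = (1, 10, 100)) -> Tuple[int, List[str]]:
    """Load docs in cumulative stages (assumed reading of "1-10-100").

    Returns (number of documents attempted, failed documents). Stops at the
    first stage that has failures, so problems surface while rework is cheap.
    """
    done = 0
    for limit in stages:
        batch = docs[done:limit]          # only the documents new to this stage
        failures = [d for d in batch if not load(d)]
        done += len(batch)
        if failures:
            return done, failures         # fix these before scaling up
    return done, []


docs = [f"doc{i}" for i in range(200)]
print(staged_load(docs, lambda d: d != "doc7"))  # (10, ['doc7'])
print(staged_load(docs, lambda d: True))         # (100, [])
```

In this sketch, anything past the last stage is left for a full run once the staged loads come back clean. The early return is what turns a bad mapping into a ten-document problem instead of a thousand-document one.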

Where to go next? I'm torn between the IBM and Mulberry presentations, so I sit in on both, but end up in Debby's Schematron presentation. I'm drawn to the publishing-oriented presentations, perhaps because the data means more: it sticks around longer and is worth more both to the organization creating it and to society at large.

Yes, I'm listening, but again my mind wanders: oXygenXML is getting an awful lot of floor time, given that they have never had a physical presence at any of Kay's conferences! Also, I get an email from Google saying that I've been approved for access to the Google Wave sandbox. Woohoo! Playtime! Ah, but I've got to pay attention now; we're getting into some detail...

Assert vs. report in Schematron, and the relation to XSLT: the vocabulary is ambiguous to people; as Debby observes, sooner or later you'll get it wrong. Assert means that you want the validator to tell you when the assertion has been violated (its test is false), and report means that you want the validator to tell you when its condition has been satisfied (its test is true). I think. Yeah, that's right. Anyway, a poor choice of terms for us native English speakers -- not especially bad, just vague, a splitting of hairs. Schematron is an ISO standard now, so I suppose that choice is no longer plastic but has hardened into concrete. Yet OASIS has another effort under way for expressing generalized test assertions in XML, and the Oxford University Press people showed how they've used yet another similar system.
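Since the assert/report distinction trips people up, here is a minimal Python sketch of just the firing rule. It is not a Schematron implementation. A Python function stands in for the XPath test, and a list of elements stands in for the rule context. In Schematron the same check would read roughly `<sch:rule context="chapter"><sch:assert test="title">Chapter lacks a title.</sch:assert></sch:rule>`. The `assert_fires` and `report_fires` names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def assert_fires(context, test):
    """Positions where a Schematron <assert> would emit its message: test is FALSE."""
    return [i for i, node in enumerate(context) if not test(node)]


def report_fires(context, test):
    """Positions where a Schematron <report> would emit its message: test is TRUE."""
    return [i for i, node in enumerate(context) if test(node)]


def has_title(chapter):
    """Stands in for the XPath test="title"."""
    return chapter.find("title") is not None


# Hypothetical document: the second chapter has no title.
doc = ET.fromstring("<book><chapter><title>One</title></chapter><chapter/></book>")
chapters = doc.findall("chapter")  # stands in for context="chapter"

print(assert_fires(chapters, has_title))  # [1]  (the untitled chapter)
print(report_fires(chapters, has_title))  # [0]  (the titled chapter)
```

The same test fires on complementary nodes. One way to remember it: an assert states what should be true and complains when it isn't, while a report announces what it finds.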

The filter-to-validate idiom seems to be running rampant now. Do people realize that validation is not the most interesting application of a pipeline?
