Tuesday, July 28, 2009

The Summer XML 2009 Conference Day 2

C.M. Michael Sperberg-McQueen is speaking today on preserving data across time, positing that the effort comprises the sending of a message to the future. I appreciate the depth of thinking, and agree with most if not all of what he is saying, but my mind wanders to the thoughts of yesterday. It is almost as important to be able to forget data -- to discard the clutter that occurred during the growth phases of the individual form in re-establishing new, efficient forms which for practical purposes contain, if not the same, then at least the relevant, information.

Higher-order biological systems accomplish the feat of directed, constructive memory loss by multiple means. Sexual reproduction. The first three years of development. Death of the individual. (As I write, Michael refers to these as failures of the channel.) Mutation of oral narratives passed on from generation to generation. Misreading or re-interpretation of written history. One might say that it is the "normal" mode, the evolutionary environment which co-developed along with our human experience. Is there any reason to expect that, in doing better in one aspect for a short while, we do not also lose some other aspects of robustness which we would have retained? To put it more directly, shouldn't our information systems be designed explicitly to be reproductive?

Michael's talk also makes me wonder: is there a place for a Super Master Virtual Emulator Service? Some networked environment that provides a capability of reproducing the most arcane obsolete computing environments? It may not be truly possible, but is that ideal even approachable for a practical purpose of regaining access to data the way it was meant to be experienced? Aside from technical issues, is there a standard legal framework which would allow this to occur, or even require it in the case of data which is deemed important to the public good?

Tag abuse. Isn't the substitution of Uracil for Thymine to go from DNA to RNA a form of tag abuse?

Semantic failures. Michael mentions failure to grasp full meaning and import as an endemic characteristic. Indeed, the phrasing suggests a 19th-century view of perfection. Is there always a "full meaning" to grasp, or is it that the articulation may be full or incomplete but the mapping can only be considered "full" in consideration of what it is you are mapping to?

As I write, Michael alludes to detecting errors in migration or failures in emulation. The assertion is that faults are difficult to assess, since syntactical checking may not show irregularities. We use markup validation as an aid, using syntactic levels, then "supravalidation" to apply higher rules, and "false-color proofs" to highlight the purportedly correct structures of a certain type for an aid in human visual review. Also sorting, since garbage often falls to the bottom or top of a list.
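The sorting trick is easy to try. Here is a minimal sketch in Python, assuming a list of date strings pulled from some migrated data set; the values are made up for illustration and are not from the talk. A plain sort pushes the entry with a stray leading space to the top and the non-date junk to the bottom, which is where a human reviewer would look first.

```python
# Hypothetical field values from a migrated data set (invented for illustration).
values = [
    "2009-07-28",
    "2009-07-27",
    "n/a",          # placeholder text that is not a date
    " 2009-07-12",  # stray leading space
    "2009-07-02",
    "??",           # unknown marker
]

# A plain code-point sort: space sorts before digits, punctuation and
# letters sort after them, so the irregular entries drift to the ends.
ordered = sorted(values)

print("top:   ", repr(ordered[0]))
print("bottom:", ordered[-2:])
```

This only surfaces garbage that sorts differently from good data; a wrong but well-formed date would sit quietly in the middle of the list.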

An allusion to user-serviceable XML editorial applications: "padded cells" by reference to Henry Thompson to allow limited re-marking of markup: XForms is a practical basis for such interfaces. Colored Markup? Flavored Markup?

Michael is also offering practical work practices. A 1-10-100 practice for loading: the reason for the practice is that it gives you immediate feedback to avoid rework. An idiom for writing markup vocabulary specifications, using skeleton sentences in plain English to make a formal tag set library. Express the assumptions in first-order logic, allowing detection of potential degeneration.
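My reading of the 1-10-100 practice, and this is an assumption on my part rather than anything shown in the talk, is that you load one record, then ten, then a hundred, and stop at the first sign of trouble. A minimal sketch in Python, where `staged_load` and `load_one` are hypothetical names of my own:

```python
def staged_load(records, load_one, stages=(1, 10, 100)):
    """Load records in growing stages, stopping at the first failure.

    Each stage raises the cumulative limit (1, then 10, then 100 records).
    Returns (number_loaded, error_message_or_None).
    """
    loaded = 0
    for limit in stages:
        # Only the records not yet loaded, up to this stage's limit.
        for record in records[loaded:limit]:
            try:
                load_one(record)
            except ValueError as exc:
                return loaded, f"stage {limit}: record {loaded} failed: {exc}"
            loaded += 1
    return loaded, None


# Usage: int() stands in for a real loader that rejects bad input.
count, error = staged_load(["1", "2", "x", "4"], int)
print(count, error)
```

The point of the small first stages is that a broken mapping shows up after one or ten records, not after an overnight run.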

Where to go next: I'm torn between the IBM and Mulberry presentations, so I sit in on both, but end up in Debby's Schematron presentation. I'm drawn to the publishing-oriented presentations, perhaps because the data means more: it sticks around longer and has more worth to both the organization creating it and society at large.

Yes, I'm listening, but again my mind wanders: oXygenXML is getting an awful lot of floor time, given that they have never had a physical presence at any of Kay's conferences! Also, I get an email from Google, saying that I've been approved for access to the Google Wave sandbox. Woohoo! Playtime! Ah, but I've got to pay attention now, getting to some detail...

Assert vs Report in Schematron, and relation to XSLT: the vocabulary is ambiguous to people; as Debby observes, sooner or later you'll get it wrong. Assert means that you want the validator to tell you when the assertion has been violated, and Report means that you want the validator to tell you when the assertion has been satisfied. I think. Yeah, that's right. Anyway, poor choice of terms for us native English speakers -- not especially bad, just vague, the splitting of a hair. It is an ISO standard now, so I suppose that choice is no longer plastic but has hardened into concrete. Yet OASIS has another effort going on for expressing generalized test assertions in XML, and the Oxford U. Press people showed how they've used yet another similar system.
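To keep the two straight, it helps me to write the rule down as code. This is a toy model in Python, not a Schematron implementation: the `check` function and the sample `chapter` document are hypothetical, and the tests are ordinary Python callables standing in for XPath expressions.

```python
import xml.etree.ElementTree as ET


def check(element, kind, test, message):
    """Return the message if the check fires, otherwise None.

    kind="assert": the message fires when the test is FALSE (violated).
    kind="report": the message fires when the test is TRUE (satisfied).
    """
    result = bool(test(element))
    if kind == "assert":
        fires = not result
    elif kind == "report":
        fires = result
    else:
        raise ValueError(f"unknown check kind: {kind!r}")
    return message if fires else None


# Hypothetical document: a chapter with a title but no para.
doc = ET.fromstring("<chapter><title>Intro</title></chapter>")


def has_title(el):
    return el.find("title") is not None


def has_para(el):
    return el.find("para") is not None


messages = [
    check(doc, "assert", has_title, "chapter must have a title"),
    check(doc, "assert", has_para, "chapter must have a para"),
    check(doc, "report", has_title, "chapter has a title"),
    check(doc, "report", has_para, "chapter has a para"),
]
fired = [m for m in messages if m is not None]
print(fired)
```

The same test gives opposite outcomes depending on which of the two wraps it, which is exactly why the names are so easy to swap by accident.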

The filter-to-validate idiom seems to be running rampant now. Do people realize that the "to-validate" idiom is not the most interesting application of a pipeline?

Monday, July 27, 2009

The Summer XML 2009 Conference Notes - Day 1

I'm sitting listening to Leigh White open the conference, speaking toward the motivations for structuring content, in open standards-compliant formats. She asks the group, how many are taking their content to multiple outputs... few raise their hands. They don't realize that unity is plural, and at a minimum, two: even when a content collection someone manages goes to "one" output, that output is often composed of multiple output forms, and includes additional forms of data at the periphery -- like the stuff that gets fed into searching or indexing features. Every function is really an output, and every output is a function. So saying that you don't have multiple outputs is akin to saying you don't do anything with your content, which is nonsense if you're doing anything at all.

Anyway, the "selling" of XML should be a bit of a past-tense proposition: if professionals cannot see that they've been encircled by the technology, data assets, and other stakeholders depending upon XML, they are blind. XML has invaded the economy like kudzu in North Carolina. Still, White makes important points about the pain points involved, and the real economic costs of failing to adopt structured methods.

Her recounting reminds me of a discussion at a telecom manufacturer, explaining why so much capital was moving to China and India. The infrastructures of the US and western Europe had been built-out with copper wire for a hundred years. New major paths for laying new networks are hard to find, and retrofitting all that infrastructure is a huge and costly endeavor. In China and India, it's all new from the ground-up work, using the latest technology and most modern techniques. So the US telecom infrastructure is actually a liability, almost as much as it is an asset.

As with an organic, evolutionary system, many of the earliest decisions end up dictating the robustness and fecundity of the system -- the ability to scale without collapse and to provide continued richness of available forms.

She also came prepared with graphs: they went from a 1-1 to a 1-4 source-to-output ratio over three years, adding over 5000 sources along the way. They also effectively increased their own productivity to the point that they had the capacity to deliver more outputs than their internal customers could demand. Imagine being able to deliver everything the customer asks for... and being ready to deliver before other segments of the team... novel ideas!

White observes: about one in four people cannot grasp or accept working with structured content. That's an interesting anecdotal statistic.

Sitting in on Doug Schepers' ex-HTML session. He's going over a lot of pre-history of XML... maybe too basic, but I know a few people here who aren't familiar with it from the perspective of a markup geek. Talking about dead or DOA efforts: Compound Documents, XHTML 1.1, XHTML 2, HTML 5 splinter cell started by browser vendors as a reaction to W3C's XHTML/XML focus: about 3750 times more people use HTML than XHTML. XHTML 2 is effectively dead: its working group is being wound down as of December 2009.

Differences? Syntax and error handling vs recovery: HTML5 supports error recovery, accepts slightly smelly garbage in and can still produce a viewable document, but XHTML rejects invalid documents by design, in that they took a restrictive interpretation of an XML processor. HTML5 doesn't allow arbitrary XML because it down-converts the tag names and loses their true identity. Further, attribute values need not be quoted, and boolean attributes are supported (like checked or selected in an option). Weird exceptions. The HTML parser doesn't follow any of the XML parsing rules. Yet it seems to be a monolithic one-size-fits-all specification. HTML5 DOES NOT DO NAMESPACES, so it cannot accept foreign elements or indeed any XML designed for realistic content workflows. Instead, they introduce microformats using RDFa and ARIA ("Accessible Rich Internet Applications"). Validator.nu using RelaxNG and Schematron, good news there! An XML5 with error correction, no DOCTYPEs?
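The contrast is easy to see with the Python standard library. A caveat: `html.parser` is a tolerant tokenizer, not an implementation of the HTML5 parsing algorithm, so treat this as a rough sketch of the two attitudes and nothing more. The sloppy input string is my own invention.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up input: uppercase tags, an unclosed element, an unquoted
# attribute value, and a boolean attribute with no value at all.
SLOPPY = "<P>unclosed paragraph <INPUT type=checkbox checked>"


class TagCollector(HTMLParser):
    """Record every start tag the tolerant HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


collector = TagCollector()
collector.feed(SLOPPY)
collector.close()

# The XML parser refuses the same input outright.
try:
    ET.fromstring(SLOPPY)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

print(collector.seen)
print("accepted as XML:", xml_ok)
```

Note that the HTML side hands back the tag names in lower case, which is the down-conversion complained about above: whatever case the author wrote is gone by the time you see the tag.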


Did my presentation... cluttered and spotty examples. I could blame PowerPoint for botching my printed speaker notes -- it blanked out the slide so that I couldn't track the examples with the slides -- but despite the fits and starts the feedback was generally good. I've got to get away from Microsoft formats though. PowerPoint not only messed up my speaker notes but I lost a good bit of an animation I had done in it as well.

Saturday, July 18, 2009

Names don't constitute knowledge

I've worked (or studied) in some STEM area for around 20 years or so, usually in a company engaged in engineering and technology. As a student of software development, I have sought a sense of groundedness and found it to be elusive in that particular domain. Why is that?

The title of this entry comes from something Richard Feynman's dad taught him in a discussion about a bird. The brown-throated thrush has many names in many human languages, but even if you were to learn every one of its names in every language you would still know nothing about the bird.

That's a big part of what's wrong with information technology today. People believe their own marketing so much that they really conceive that giving something a new name constitutes invention; that labeling things constitutes learning and the labels constitute knowledge. As Feynman also observed, labeling has utility when you want to talk to other people about things and concepts. But beyond that, knowing labels is not the same as understanding a subject.

Feynman's view of "social science" is a related viewpoint, and applies well IMHO to technology, or at least the technological public's view of the basis.

Sunday, July 12, 2009

Failing to migrate is not a lean strategy

How many companies realize the true cost of failing to plan for technology migrations? The death march is not just a kind of failing project, it is also the character of an entire organization when managed with non-lean methods.



As time goes by, the technology terrain shifts and is sculpted by forces of (a) academic transfers, especially but not exclusively of STEM research, (b) open source and proprietary developments, including those of standards organizations and (c) government and quasi-government fiat. Organizations such as Orange UK attempting to set a fixed goal fail before they've deployed a single line of code, because not only is the target itself moving, but the space it is in is also growing, shrinking, skewing, or otherwise being subtly or suddenly transformed. Not to adjust the sights once in a while is a blatant management failure, not a mistake.

Living next to a void

I read this today and felt the need to share. It speaks deeply.

My wife has been diagnosed with depression and anxiety. She has been like this off and on since we have been married, but things have gotten worse. I have tried to be supportive, upbeat, etc. for many years, but it is taking its toll. It is hard for us to have a “fun” conversation anymore, because she seems to always bring it back to something bad in her life. I respond by trying to offer something cheerful or hopeful, but the game just keeps on. Today, she is at work and I am actually glad that she is gone. I know I should want to be around her, but in all honesty, I need a break from her negative thinking.


I can relate in more ways than one. We've all been in social situations where there's a buzz-kill, but just by working with or for an inhabitant of the void, or being married to one, you are killing your dendrites. The sure sign of that neural retraction is that you start to feel that withdrawing depression yourself.

Employers can be fired -- don't think that you can change a boss when she demonstrates a depression-inducing, self-reinforcing negativism. Get out while you still have some positive attitude left, and leave constructively.

It is much harder to deal with a slow-burning depression in an intimate relationship. My only advice is to recognize that mild, ongoing depression may be a consequence of a refusal to establish social networks and be open to intimacy -- not a cause for such social dysfunctions. Make sure that you are personally connecting at a social level with people on a regular basis, in such a way that people get to know some real aspect of you, and you get to know them. If you're not already doing this, your relationship isn't healthy; if you can't do this as a couple, at least make an effort yourself without hiding anything or cheating.

Thursday, July 2, 2009

StumbleUpon is unusable

What a crappy user interface!

I started using the site a few minutes ago, and already I'm thinking I should dump it and move on. The user interface is Just Plain Lousy.

Five minutes into the experience, I try to find the place to post a link. It is a bookmarking site after all, right? No link. I have to navigate somewhere else to find it. That's incredibly stupid right off the bat -- maybe stupid on my part, maybe I missed it -- but then again, if it wasn't obvious enough to see, doesn't that suggest a design failure?

I post a link, from my "Home page". Well, no, I just Stumbled it. I go back, and realize the link posting is an icon, not the entry box. Stupid design. I post a link. No, StumbleUpon tells me:
Missing parameters
MISSING PARAMETERS

If this still doesn't work for you, please Contact Us to let us know :)


Um, yeah, sure, let's go with that. Or not. I haven't even started yet. Do people really tolerate this kind of garbage? The internet sure has dumbed-down users' expectations. Used to be you expected an application to work the first time. Then again, it's free. Not that I'll be using it, but it was only five minutes of my time.