Tuesday, May 18, 2010

A Simple Technique for Composing Long Expressions (STCLE)

Reading Doug Crockford's book JavaScript: The Good Parts recently, I was surprised to see Crockford draw attention to the ugliness of large regular expressions yet overlook an obvious solution. Throughout the skinny book, Crockford relies on a common JavaScript idiom, which is to patch objects with a new method. In this case, we can patch the RegExp object with a compose method that accepts array-literal notation and returns the composed pattern string, ready to hand to the RegExp constructor. Using Crockford's "method" mutator,

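Function.prototype.method = function (name, func) {
  // Add the method only if the prototype does not already define it.
  if (!this.prototype[name]) { this.prototype[name] = func; }
  return this; // always return this so calls can be chained
};
// compose relies on an is_array helper; this is one common way to write it.
function is_array(value) {
  return Object.prototype.toString.call(value) === "[object Array]";
}
RegExp.method("compose", function (parts) {
  var i, r = "";
  for (i = 0; i < parts.length; i += 1) {
    // A nested array becomes a parenthesized group; a string is appended as is.
    r += (is_array(parts[i]) ? "(" + RegExp.prototype.compose(parts[i]) + ")" : parts[i]);
  }
  return r;
});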

Then suppose we call this new method like so:

var re=(new RegExp).compose(
  [ "^",
    [ "?:", ["[A-Za-z]+"], ":"],
    "?"
  ]);
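To check what a call like this produces, here is a minimal standalone sketch. The free function composePattern is a hypothetical stand-in for the patched method (it does not touch RegExp.prototype), and the sample input string is made up for illustration.

```javascript
// Hypothetical standalone version of the compose idea: nested arrays
// become parenthesized groups, strings are concatenated as they are.
function composePattern(parts) {
  return parts.map(function (part) {
    return Array.isArray(part) ? "(" + composePattern(part) + ")" : part;
  }).join("");
}

var source = composePattern([
  "^",                 // anchor at the start of the input
  ["?:",               // non-capturing group ...
    ["[A-Za-z]+"],     // ... holding a captured run of letters
    ":"],              // ... followed by a literal colon
  "?"                  // the whole group is optional
]);

// The composed string only becomes a regular expression once it is
// handed to the RegExp constructor.
var re = new RegExp(source);
var match = re.exec("http://example.org"); // made-up sample input

console.log(source);   // ^(?:([A-Za-z]+):)?
console.log(match[1]); // http
```

Keeping the result as a plain string until the last step is what lets nested calls reuse the same function for their inner groups.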

What we get is the composed string, with array nesting used to imply the presence of groups. Note that the regular expression may now be broken up into multiple lines and commented heavily, but you still get ^(?:([A-Za-z]+):)?. A similar technique using array stuffing and the join method is quite useful for creating page output in a manner that avoids most of the output coding. The array is a template, and one just specifies a single join and output command:



var pg = [
   "Mushrooms",
   "placeholder", "foo" ];
pg.push("Some stuff");
// join("") concatenates with no separator; a bare join() would put commas between the pieces
document.getElementById("bar").innerHTML = pg.join("");

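A slightly fuller sketch of the array-as-template idea follows. The tag names and the list items are illustrative assumptions, not the markup of the original page, and the DOM write is guarded so the snippet also runs outside a browser.

```javascript
// Hypothetical page fragment: the tags and items are assumptions
// chosen only to show the technique.
var page = [
  "<h2>", "Mushrooms", "</h2>",
  "<ul>"
];

var items = ["placeholder", "foo"];
for (var i = 0; i < items.length; i += 1) {
  page.push("<li>", items[i], "</li>");
}
page.push("</ul>", "<p>Some stuff</p>");

// join("") concatenates with no separator; join() with no argument
// would insert a comma between every piece.
var html = page.join("");

// Only touch the DOM when a document and the target element exist.
if (typeof document !== "undefined" && document.getElementById("bar")) {
  document.getElementById("bar").innerHTML = html;
}
```

Nothing here escapes the pieces, so any text that comes from users would need to be escaped before it is pushed onto the array.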





Thursday, May 13, 2010

In number theory, we have the concept of an "equivalence class", i.e. what makes two numbers behave as equals when you apply some operator like addition or multiplication. Two numbers are said to be equivalent when they give the same result under the operator. For instance, under binary addition (addition modulo 2), 1+1 = 2, but the integer 2 is in the same class as 0, so 1+2 = 1.
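The arithmetic can be checked with a small sketch, assuming "binary addition" here means addition modulo 2. The helper names classMod2 and sameClass are made up for the illustration.

```javascript
// Hypothetical helpers: map an integer to its class under addition mod 2.
function classMod2(n) {
  // The double modulo keeps the result in {0, 1} for negative n as well.
  return ((n % 2) + 2) % 2;
}

function sameClass(a, b) {
  return classMod2(a) === classMod2(b);
}

var twoIsZero = sameClass(2, 0);       // 2 and 0 fall in the same class
var onePlusOne = classMod2(1 + 1);     // lands in the class of 0
var onePlusTwo = classMod2(1 + 2);     // lands in the class of 1

console.log(twoIsZero, onePlusOne, onePlusTwo); // true 0 1
```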

XHTML documents are discrete chunks of data, that is, they are just really big numbers. (Those numbers happen to have a really complicated internal structure, but then again, so do integers, and reals, and complex numbers, vectors, matrices, tensors...) Browsers operate on those really big numbers to give us the browsing experience we love and hate so much.  But IE6 gives you 0, zip, nada, nothing when you give it application/xhtml+xml arguments and so does IE9.  So in a very fundamental sense with respect to Web standards, IE9 = IE6.

Thursday, May 6, 2010

What draws us to REST?

Often in conversations with corporate developers, if the subject of the REST architectural style comes up, an immediate concern raised is the principle of the statelessness of the protocol. In the context of real Web services, that protocol is for all practical purposes HTTP.  (Another protocol such as SMTP may provide a useful Internet service, but being "on the Web" by definition means it is accessible to HTTP user agents. See RFC 2616 if you are confused about that concept.)

So whether a service deployed using WS-* stack tools is in fact a Web service depends on whether it uses HTTP. If it does, then it can be said to be a service "on the Web" at least in a trivial way. The remarkable thing is how WS-* deployments coerce and overload the HTTP methods, particularly POST and GET, in contradiction to Web standards like RFC 2616. Without being pejorative, it is clearly a deliberate design decision to apply HTTP in an off-label manner. When it hasn't been shown that the benefits of a coercive practice are worth the consequences, the term I usually use is gratuitous complexity. Systems should say what they mean and mean what they say.

That's what draws people to REST even when they don't understand its principles. The hope is that their systems can shed their gratuitous complexities. They would do well to remember Einstein's advice, that things should be as simple as possible, but no simpler. The good doctor recognized that complexity and inelegance in proposed explanations can arise from overly simplistic models just as they can from overly detailed ones. That is the true elegance of REST: it articulates just what we ought to be focusing on to get services that will fit well on the Web.

Wednesday, May 5, 2010

Final day of WWW 2010

Carl Malamud starts off the day. The conference is clearly winding down: only about a third of the room is filled. This is unfortunate. At least he is honest about his prognostication prowess, having once dismissed Tim Berners-Lee's HTML research project as an unscalable design. He sets the stage for his premise, that standards should be free and openly available: he couldn't afford them. As a writer, he advocated for his position and obtained an invitation to discuss the issue with the ITU. Calling his bluff, the ITU offered mainframe tapes of the Blue Book, saying the problem was an engineering difficulty; but Malamud and others succeeded in converting the standard to a TROFF tarball. It spread. They had called the ITU's hand, and raised the stakes a hundredfold.

The ITU, realizing they had given up the content, tried to pull it back in, but it was too late.

"If standards code is law, then surely law is code".

Malamud goes on to describe how his non-profit radio station was able to get a feed of the EDGAR data and serve it over FTP, Gopher, WAIS, and other services, and how that got the SEC to publish on the Web information it had not previously been able to deliver. Then he talks about the video venture, FedFlix: NTIS had to recover costs and couldn't provide tapes, so they ended up getting loaners instead, made copies, and published them to the Web. Lesson: keep rephrasing the question until they can say "Yes".


He notes the deluge of legacy formats, and how the government has turned to "no cost to the government" deals, which turned into terrible fiascos. One example is the Thomson West deal, in which Thomson West claimed ownership of 60 million pages of scanned data. Another is the Amazon deal with the US Government Public Archives, in which the government advertised 1800 videos that Amazon could sell as DVDs; the government paid for digitization but got about $3000 for its trouble. Malamud bought 28 of the videos and posted them to public sites, then another 20. That demonstrated that the government had sacrificed public access in a bad deal. The International Amateur Scanning League was formed to go to the archives and rip the videos to be copied for public posting.

Malamud is paid by his 501(c)(3), a non-profit with a charter to make laws more freely available. Access to the materials is a 10 billion dollar business, with the government often putting up roadblocks. An example of a roadblock is PACER, which charges 8 cents per page for US Court filings. It is a $120 million business. He promoted a "Thumb Drive Corps" in which volunteers downloaded documents from free terminals and uploaded them to public locations.

During the break, got a chance to chat briefly with Dave Raggett, Doug Schepers, and Thomas Roessler, of the W3C. Dave is driving a discussion on ARIA for developers later in the day. What impresses me about ARIA is how far it goes in creating what is, in effect, the semantic vocabulary for a Web Application framework. Dave indicates that what is missing is an eventing system, and that tools to develop effectively in the environment are not yet there. That's an opportunity. Need to take a look at the incubator Wiki where this stuff has been discussed. 

Sitting in on Bob Young's Lulu presentation. The pattern recognition skills of those working in one medium (like TV) don't translate into the patterns of the new medium (the Web), and schools take a long time to catch up before they can start offering curricula. The Web is a many-to-many communication medium with a very bad signal-to-noise ratio. How do you filter out the background static to get an interpretable signal? That is where the application providers come in -- to solve the problems of users on the 'net. For instance, the anarchy of having too many friends is addressed by Facebook. Search.twitter.com solves the search problem from within the context of timely communications. They define a concept of relevance.

Young moves on to describe Red Hat's leveraging of open source and the meritocracy through which software engineers on the net would bubble up based on their talent. Lulu is the result of Young trying to understand, quantify, and convey the value of these experts in their specialized fields. How do we use the free market to reward these people for putting their knowledge into a format that could be of great value to a niche market? Young remarks that many people have some level of expertise, but posits that the real problem may be that the market your expertise serves is too small. For instance, you may be an expert on left-handed tennis players in Raleigh, or on timing differentials for subatomic particle physics (a market of fewer than 300 people). A second aspect he relates is that the knowledge in those niche areas may disappear unless it is published. Traditional publishers reject 19 out of 20 finished book submissions. His thesis is that the market is no longer operating in a healthy, Adam Smith free market manner.


A second thesis is that we find the books we read through social recommendations, and that social networking sites have recommendation tools which can help people get to the useful information.

Young is asked, "What hasn't worked?" His answer is "Collaboration." The idea that there is a genius behind great works is true for these books as much as for pop culture media: someone needs to convey the vision in a meaningful, entertaining way. For that reason, none of the industry players trying to promote collaborative writing systems are finding great success.

What is the nature of the book in the modern age, in which content is packaged digitally? Dead tree readers are in trouble, but they've had 500 years since Gutenberg to optimize the user experience. Screen-based displays are far more recent, about 20 years old, and yet now we have full motion video on pocket phones. The printed book is going out, and electronic readers will take over. Do books go the route of advertising à la TV? The small niche market books won't support that model. But what of the increased non-linear experience of the book?

What of librarians? Young's response is essentially that librarians will experience upheaval due to people like him. It is a fate similar to that of printed books. At least Young recognizes that there is a segment of society that is well served by library systems. Sonny Bono, of Sonny and Cher, as congressman, had a mission of protecting the copyright of his musical compositions permanently. Yet, if that happened, it would destroy innovation.

Young says that the authors often charge too low a price for their works. 

The discussion makes me wonder again, if there isn't some place for a kind of credit-union style cooperative for many of the information technology resources that we need.  Also, the thought creeps up again of the use of the internet as a tool to dynamically extend your own mental faculties. Exobrain. 

Someone asks about the dilution of quality from the large number of book submissions. Young quips that he often attempts to be extra colorful in interviews, just to get more attention: "Lulu is the biggest publisher of bad poetry". But the markets are niche, and the books might not ever get published otherwise. The Zen Cart book is an example of a book that sells a couple of thousand copies per year, at about $50 each. That's a deal, because its author is making over 100k/year publishing himself. How do you replicate such success? That's the point of the social networking tools, Lulu's weRead tool, and Amazon's reviews. Lulu provides the reviews to your circle of friends.

What of accessibility? Lulu's position is that the author is the publisher, and has to choose what formats to make the materials available in. Audible.com is suggested as an intermediate workaround until Lulu supports the tools.

One NCSU attendee asks, "Where are the costs of book production?" Young's initial response: "time is money". His email address, he offers, is Bob at lulu, and he goes on to discuss the production of illustrations, licensing, multicolor, etc., all of which can add to the costs. The professor/author didn't want to let PDF or electronic copies out. Young mentions Digital Rights Management (DRM) tools, noting that they are optional. He notes that dental schools have serious issues about proprietary rights to make copies. He mentions VitalSource, now owned by Ingram, which works on how to make electronic copies available while still retaining rights.

Another attendee asks about the large amount of content, and how Lulu can facilitate the formulation of RDF documents and referencing of that resource to solve the contextualization problems. Young answers that Lulu is not actively working on those pieces of the technology, but will use the work of others. The attendee turns out to be Sina Bahram, someone I've connected with via social networking sites.

Someone asks about subscription-based reading, a fee-per-month model, etc. Thomson and O'Reilly Safari both have that model.

Another lady asks a question. When will this model take over? I think I know her well from a past job, but can't recall her name. Yes, I know her, Ghazala. Young notes that the publishers do make popular books available that let kids read the books their friends are reading. Companies like eBay took advantage of the many-to-many model to enable new business, not eat into the existing businesses, and that's how he sees Lulu. They are serving an untapped market, not competing against an existing market.

What's next? Young doesn't know. He thinks Lulu may well be his last big project. He'll go after an academic career instead, teaching things the business professors overlook. The problem with people solving "the widget problem" today is that they solve today's problems without thinking through how things will be in the future, and their work becomes irrelevant. I'd sign up for that class right now if he offered it.