Sunday, January 1, 2012

Reading Blocks in POW and Node.js

A writer's block is when you've got some serious motivation to write, but can't seem to find the inspiration to get started or the cleverness to get past a simple difficulty.  Readers can face blocks too, especially when the writer has done a poor job of weaving the threads of the story and bringing them together.

The same problems exist for programmers when writing and reading code, except much more so.  As programmers, we often need to tease out the answers to questions about where some control path came from, and where it is going.

I'm thinking now of a snag hit while trying to convert my personal site to use HTML5's caching manifests:

Application Cache Error event: Invalid manifest mime type (application/octet-stream)

I was using the 37Signals' POW server to do my local testing.  It looks like POW serves up the manifest file as an octet stream, instead of the recommended type (text/cache-manifest). I searched for the mime type handling in POW. It is not there. As a reader, I face a small block.

POW is reusing modules defined in Node.js, the V8 evented I/O library. So naturally, there is nothing present in the POW source code base that will clarify how it resolves missing mime types.

More precisely, POW is calling a connect.static method. There is no "connect" module in POW or Node.js, and nothing in the source code base that would suggest what "connect" is, other than an anonymous Node.js module. So we have to search elsewhere, for an elusive target.

Implicit in using Node.js is use of a flat global namespace for modules, implemented by npm, the Node Package Manager. It isn't clear to me how (or whether) npm manages versions, or what people will do if and when a competing package manager is released, or when packages adopt similar or identical names. It seems as if git repo urls are implicit in the packaging, but not in any way that is definitive in the client source code.

Now, a programmer who is experienced in a given library will become a priest, familiar with all the dangling threads of the righteous library. But that's no excuse for leaving threads dangling. This is the stuff of a cult-like priesthood, not of a profession.  It is all a bit of a guessing game.

[edit: turns out that Node uses package.json for version/dependency management; however, unless you've downloaded all the sources, UNIX tools like "find" or "grep" will obviously not find anything.]
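To make that edit concrete, here is a minimal sketch of reading the declared dependencies out of a package.json. The manifest below is invented for illustration: the package name, the dependency names and the version ranges are assumptions, not copied from POW's actual manifest, and listDependencies is a hypothetical helper.

```javascript
// Hypothetical package.json contents; the names and version ranges are made up
// for illustration and are not POW's real manifest.
const manifestText = JSON.stringify({
  name: "example-server",
  version: "0.1.0",
  dependencies: {
    connect: ">= 1.8.0",
    mime: "1.2.x"
  }
});

// Return the declared dependencies as "name@range" strings, sorted by name.
function listDependencies(text) {
  const manifest = JSON.parse(text);
  const deps = manifest.dependencies || {};
  return Object.keys(deps)
    .sort()
    .map(function (name) { return name + "@" + deps[name]; });
}

console.log(listDependencies(manifestText).join("\n"));
```

Note that a range only says which versions are acceptable. The version actually installed is recorded in that package's own package.json under node_modules, which is why nothing turns up until the dependencies have been downloaded.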

Connect is (very probably) some version (which one???) of a Node.js-based middleware framework, specified by package.json in the root of the source tree. The git repo says that static is a middleware module packaged with connect. The static middleware calls mime.lookup(path).  The mime package and version are not anywhere to be seen in the Connect source code. I'm seeing a pattern here, or rather an anti-pattern.

References should not be more exact than necessary, but nor should they be so ambiguous that they contain insufficient information to find the referenced entity.

So I locally clone Connect, and Node.js, and POW, and use OS X's find command to sniff out the possible locations of the mime type handling, to see if there is an idiomatic way of adding a new type. The mime module is a package included with Node.js, down in deps/npm/node_modules/request/mimetypes.js.

I'm repeating myself, but the mime module is not part of the Connect source code base, and I have to assume that it isn't referring to some other module also called "connect".  This case is simple enough - mimetypes.js is just an associative array with a lookup function - but in the general case, which version might we be linking to, and who is the owner?
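For readers who want to picture "an associative array with a lookup function", here is a rough sketch of the idea. It is a hypothetical stand-in, not the code in mimetypes.js or in the mime package: the table entries, the define helper and the file name cache.manifest are all my own assumptions.

```javascript
// Hypothetical stand-in for a mime lookup module: an extension-to-type table
// plus a lookup function. This is not the code of the real mime package.
const types = {
  html: "text/html",
  css: "text/css",
  js: "application/javascript"
};

const DEFAULT_TYPE = "application/octet-stream";

// Register (or override) the type for an extension, given without the dot.
function define(extension, type) {
  types[extension.toLowerCase()] = type;
}

// Look up a type from a file path; unknown extensions fall back to the default.
function lookup(path) {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  if (dot < 1) return DEFAULT_TYPE; // no extension, or a dotfile
  const ext = name.slice(dot + 1).toLowerCase();
  return Object.prototype.hasOwnProperty.call(types, ext) ? types[ext] : DEFAULT_TYPE;
}

console.log(lookup("site/cache.manifest")); // falls back: extension not in the table yet
define("manifest", "text/cache-manifest");
console.log(lookup("site/cache.manifest")); // now found in the table
```

With a table like this, an unknown extension silently falls back to the octet-stream default, which is the behaviour behind the Application Cache error; adding one entry for the manifest extension is the whole fix.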

Writing like this isn't just an interruption to the reader, it is a failure of the writer to pull together the threads of the story. That makes the story more difficult to follow than necessary, and it isn't sufficient to deliver a reliable piece of software except by fortunate accident.

Avoiding searches for ambiguous references is what Integrated Development Environments were designed for.  But using an IDE is missing the point: it shouldn't be necessary when reading sources to go through an unreliable search for every dependency.

I'll chalk this up to my own inexperience with Node.js.

Maybe I don't grok Node.js and npm well enough yet. Maybe there are some conventions that make the resolution more definitive and repeatable. The convention requires that you use npm like one would use bundler and gems in Ruby.

But writing code that requires mental long-jumps to anonymously named, un-versioned modules seems like a very stupid way to program.  Perhaps Node.js needs its own version of bundler.

[edit: Yeah, node.js has its own means of packaging components -- I had missed that in my haste. I still think there is a disconnect between where entities are defined and where they are used by clients... the dependencies wind a path through multiple packages, give opportunity for mysteriously similar names amongst package methods and variables, and generally make piecing together the story more difficult than it needs to be.]
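One way to shorten the search, sketched here on the assumption that the dependencies are already installed, is to ask Node's own resolver where a name points, using require.resolve. The whereIs helper is a hypothetical name of mine, and whether "connect" resolves at all depends on what is installed next to the script.

```javascript
// Ask Node's own resolver which file a module name maps to.
// Returns null when the name cannot be resolved from this file's location.
function whereIs(name) {
  try {
    return require.resolve(name);
  } catch (err) {
    return null;
  }
}

console.log(whereIs("fs"));      // built-in modules resolve to their bare name
console.log(whereIs("connect")); // a path under node_modules if installed, else null
```

This answers "which file?" for one name at a time, from one location. It does not remove the underlying complaint that the answer is invisible in the client source itself.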
