Tuesday, July 29, 2014

URL Bending

URL stands for "Uniform Resource Locator", yet the construct, so named, finds much use as a substitute for all manner of structures, many if not most of which have nothing to do with locating resources.

When a term such as URL gets assigned multiple meanings, whether those are different people's interpretations of the same purpose or usages that originate as means to different ends, that term becomes a homonym.

In the context of Web applications, we're told that a URL does just one thing: locate a resource. But the "universal" constructs that URLs attempt to address via a one-dimensional string are multidimensional, containing subspaces that link pervasively to one another. We face many purposes, many decompositions of the data, many formats, many relationships, and so on... and somehow all of those facets are supposed to be encoded in a single human-readable identifier.

Even on something as conceptually straightforward as a topographic map, we use discrete coordinates to differentiate the dimensional components of an address. More generally, an address is an N-tuple. That N-tuple can (or must) be represented as a string, but the representation does not usually rely on nesting or containment: the primary dimensions are orthogonal and vary independently of one another. Yet a URL is most often read left to right, with its path segments forming an implicit hierarchy. Or not forming one. There is no single interpretation that is canonical in the sense that everyone actually follows it.
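
As a sketch of the distinction (all names hypothetical): an address modeled as an N-tuple keeps its dimensions orthogonal, while flattening it into a URL path imposes an ordering, and an apparent hierarchy, that the tuple never claimed.

    # Hypothetical sketch: an address as an N-tuple vs. a flattened URL path.
    from typing import NamedTuple

    class ResourceAddress(NamedTuple):
        tenant: str     # each field varies independently of the others
        region: str
        dataset: str
        version: str

    addr = ResourceAddress(tenant="acme", region="eu", dataset="orders", version="v2")

    # Flattening forces an arbitrary ordering onto orthogonal dimensions.
    url_path = "/" + "/".join(addr)    # '/acme/eu/orders/v2'
    # A reader of the path cannot tell whether 'eu' is contained in 'acme'
    # or merely listed after it; the tuple made no such claim.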

Here is, syntactically, where URLs break down: we cannot at once infer hierarchy where it was intended and refrain from inferring it where it was not, without overloading the URL with a one-off, domain-specific syntax.
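
To make the ambiguity concrete, here is a hypothetical illustration: the same path segments admit both a hierarchical reading and a flat key-value reading, and nothing in the URL itself says which was intended.

    # Hypothetical sketch: two incompatible but equally plausible parses
    # of the same URL path.
    path = "/customers/42/orders/7"
    segments = path.strip("/").split("/")   # ['customers', '42', 'orders', '7']

    # Reading 1: a containment hierarchy -- order 7 belongs to customer 42.
    hierarchy = segments

    # Reading 2: a flat set of parameter bindings, with no nesting implied.
    bindings = dict(zip(segments[0::2], segments[1::2]))
    # {'customers': '42', 'orders': '7'}

    # Both parses are lossless; the URL grammar cannot privilege either one.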

So we see a proliferation of syntactic forms, starting with "?query+parameters". We argue over meaningless forms - should it be /new/user or /user/new or /users (PUT) or /user (PUT) or whatnot - and the amount of argument is inversely proportional to the importance of the distinctions being made. A sound, common grammar is a necessity.

A URL isn't really an address in a sense analogous to a Cartesian coordinate. It is a parameter binding. In trying to represent multiple twists and turns with a single mangled string, we are in effect tying a knot. Or a bend or a hitch, if you will, depending upon the objects being tied together. The moves made in tying this knot form a sort of grammar, which, for lack of a better term and because it sounds like binding, I'll call "URL Bending". For reference, a bend is a knot used to join two or more lengths of rope.
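
Here is a sketch of what reading a URL as a parameter binding might look like; the bend() helper and its path-pair convention are hypothetical, not an established grammar. The path pairs and the query parameters are two ropes joined into one set of bindings.

    # Hypothetical sketch: "bending" a URL into a single set of parameter
    # bindings, joining the path-pair rope to the query-string rope.
    from urllib.parse import urlsplit, parse_qsl

    def bend(url: str) -> dict:
        parts = urlsplit(url)
        segments = parts.path.strip("/").split("/")
        # Assumed convention (not a standard): path segments alternate key/value.
        bindings = dict(zip(segments[0::2], segments[1::2]))
        # The bend itself: join the second rope (query parameters) to the first.
        bindings.update(parse_qsl(parts.query))
        return bindings

    print(bend("https://example.com/customers/42/orders/7?format=json&page=2"))
    # {'customers': '42', 'orders': '7', 'format': 'json', 'page': '2'}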

A grammar for bending could be formalized, I suppose. We would need to grok the distinct dimensions along which resources are addressed in the various bounded contexts represented in the solution space. (Determining open addressing models is a heavy focus of ISO/IEC 10744:1997, aka "HyTime".) We would need to grasp whether those dimensions should be tied into the knot, or mapped more opaquely onto components of the transaction concomitant with the use of the URL: the HTTP method, POST or PUT content payloads, or HTTP headers.
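
To make that design question concrete, here is a hypothetical side-by-side of the same logical request: once with every dimension tied into the URL knot, and once with some dimensions mapped onto the method and headers instead (the Api-Version header name is assumed for illustration).

    # Hypothetical sketch: the same logical operation, with its dimensions
    # either tied into the URL or mapped onto other parts of the transaction.

    # Everything in the knot: version, addressing, and format all in the path.
    request_a = {
        "method": "GET",
        "url": "/v2/customers/42/orders/7/as-json",
    }

    # Dimensions redistributed: version and format move into headers,
    # leaving a smaller knot in the URL itself.
    request_b = {
        "method": "GET",
        "url": "/customers/42/orders/7",
        "headers": {
            "Accept": "application/json",   # the format dimension
            "Api-Version": "2",             # the version dimension
        },
    }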

Monday, July 21, 2014

Separation of Concerns

If you are a developer and you work long enough on business applications, you start to sense the corrosive effects when divergent interests and viewpoints are forced into a single representation.

It may be a vague suspicion - a code smell. You don't necessarily know precisely what those conflicting concerns are, why they were conflated in the first place, or what the consequences of trying to separate them might be. But you know that a valid stakeholder concern can be addressed only if it is identified. Separating concerns is a necessary, but not sufficient, step in the right direction.

The volatility of your codebase - the rate at which changes accumulate and ripple - depends on how well matched the codebase is to the concerns it seeks to address. If each coding unit covers a mere fraction of one concern, the volume of units blows out, and with it the difficulty of managing all the moving pieces. If a few monolithic coding units cover many concerns apiece, the code may be too terse for a human to decode sensibly and reliably, and have so few points of articulation that the simplest local change gives rise to a cascade of far-reaching fissures.
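
A hypothetical sketch of the monolithic extreme, and of the same concerns pulled apart into separately changeable units (the db and smtp interfaces are assumed for illustration):

    # Hypothetical sketch: one coding unit covering several concerns at once.
    def register_user(name, email, db, smtp):
        if "@" not in email:                 # validation concern
            raise ValueError("bad email")
        db.execute("INSERT INTO users (name, email) VALUES (?, ?)",
                   (name, email))            # persistence concern
        smtp.send(email, "Welcome", f"Hello {name}")  # notification concern
        # A change to any one concern forces a change to this one unit.

    # The same concerns as separate points of articulation.
    def validate_email(email):
        if "@" not in email:
            raise ValueError("bad email")

    def save_user(db, name, email):
        db.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))

    def welcome_user(smtp, email, name):
        smtp.send(email, "Welcome", f"Hello {name}")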

Cohesion is the term often used to describe the continuum between these extremes, but the success of biological systems calls this dogma into question. Cohesion is neither necessary nor sufficient for a dynamically stable, long-running, self-maintaining system; so I think it is not really necessary or sufficient even for our crude software approximations of real-world processes. A better metaphor comes from optical systems, principally the concept of focal length.

In optical systems, the focal length is a measure of how strongly a lens converges light, determining among other things the magnification and the angular field of view. It also governs the degree to which an image blurs when cast upon a surface parallel to the principal plane of the lens. When the distance is just right, the projected image is sharp and all detail is to scale. When the distance is too close or too far, the projected image is blurred, skewed, or both.
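
For reference, the sharpness condition at work here is the thin-lens equation: an object at distance d_o from a lens of focal length f projects a sharp image only onto a surface at distance d_i satisfying

    \frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i}

Move the surface off that distance and the image blurs; tilt it out of parallel with the principal plane and it skews.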

Being cohesive isn't enough. A coding unit that does not represent an optimal fraction of the concerns is out of focus - blurred, skewed, or both.

Sunday, July 20, 2014

Conservation of Information

Information must always be available, but it is not necessarily usable. Like energy, information is conserved: it cannot really be lost or created, only changed in form. Entropy, in this framing, has to do with the energy required to put the bits together into a form suitable for making a decision.

We can account for the costs of information through various means, one of which is the human work hours (or some other unit of time) expended to express and reformulate the mechanisms used to move the bits around.

Information flowing between domains always loses some finite fraction of its usability, unless sufficient energy is put in to counteract the small-scale distinctions and fine-grained anisomorphisms introduced at the boundaries between the domains. The information content is conserved, but some bits may be masked, mutated, or combined in ways that are intractable given extant means and methods.
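
As a hypothetical sketch of "masked" bits at a domain boundary: the mapping below conserves a representation, but the distinctions it drops become recoverable only with outside energy, such as a side channel or an audit log.

    # Hypothetical sketch: information masked crossing a domain boundary.
    from datetime import datetime, timezone, timedelta

    # Source domain: timestamps carry a timezone dimension.
    t = datetime(2014, 7, 20, 9, 30, tzinfo=timezone(timedelta(hours=-5)))

    # Target domain: naive wall-clock times only.
    t_prime = t.replace(tzinfo=None)    # 2014-07-20 09:30:00

    # The offset bits are not destroyed in any cosmic sense -- they were
    # simply never carried across. Recovering them now takes energy from
    # outside the target domain: a side channel, a log, a human's memory.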

Relativistic effects also come into play. Two or more highly localized bounded contexts of information necessarily give rise to complex distortions between the views in their respective reference frames. Sense-making occurs only when one accounts for one's own reference frame as well as those being observed.

[I know this is probably a bit of BS. It was edited from a brainstorming journal entry originally made 1-2-1997.]