Tuesday, August 16, 2011

Is DRY a little wet? Part 2

A more fundamental fallacy behind DRY is the illusion of orthogonality. That is, hierarchical categorization is by and large achieved through a self-imposed discipline. Even where structural relationships are apparently observable in physical systems, neither the naming nor the affiliations of parts are essential, intrinsic properties of the system.

Like Feynman's birds, one can know all the names for a creature in every human language, and yet know nothing at all about the animal itself.

I know, I know, it is a fine distinction, possibly to the point of splitting hairs. Software systems are systems of logic, and as such they are synthetic human contrivances, are they not? So why can't we just assert that one perspective takes precedence, and rationalize that the collection of all properties forms a space with an orthogonal basis?

Well, we can deliberately live in such denial. That's just the kind of linear thinking that has served the scientific and engineering communities well over the past few centuries. It is quite pragmatic in fact.

Yet it is also self-limiting. Eventually, the addition of features causes the system to undergo a phase change, as thresholds are reached in the expressiveness of the supposed authoritative source, and the constructions of the relations from the source to the dependent sinks.

More pragmatically, the fallacy can take root in the form of overt and aggressive over-simplification, in which the developer mistakes a source for a sink, and summarily discards it.

Algebra is the branch of mathematics that speaks to orthogonality, giving us the axioms and theorems by which to formally comprehend how such systems are structured. If we look at a software system as if it were some abstract vector space, this would tell us that there must exist some set of primitive vectors that form a basis for the space. Such a basis need not be finite, but ideally it is free of interdependencies... any one given basis vector cannot be generated by any combination of the remaining basis vectors. Approaching that kind of behavior is an objective of DRY. Improperly identifying the supposed basis, the "single source of information," is the underlying cause of the fallacy described here.

What Algebra cannot tell us is how one representation of information will vary with respect to another, between stakeholders or over time. It is dead-on exact regarding the dependent representations, but an Algebraic system cannot determine its own axioms. The real world is the primary source. In practical terms, this is why normalization, founded though it is on mathematical theory, is still a very subjective practice.

DRY must be just as subjective, if not more so, for it is a(n informal) form of normalization. Coincidence in space does not imply connection, and precedence in time does not imply causality, yet it seems to be common practice for some programmers to rely heavily upon such contingent phenomena to "DRY up" their code.
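To make the point about coincidence concrete, here is a minimal sketch. The domain, the names, and the percentages are all hypothetical, invented only for illustration. Two rates happen to be equal today, but they have different owners in the real world, so merging them into one "source" trades an explicit distinction for an implicit dependency.

```python
# Hypothetical example: two facts that coincide today but have different sources.
SALES_TAX_PERCENT = 8   # assumed to be set by a tax authority
LATE_FEE_PERCENT = 8    # assumed to be set by the business's own policy


def sales_tax_cents(amount_cents: int) -> int:
    """Explicit representation: the tax depends only on the tax rate."""
    return amount_cents * SALES_TAX_PERCENT // 100


def late_fee_cents(amount_cents: int) -> int:
    """Explicit representation: the fee depends only on the fee policy."""
    return amount_cents * LATE_FEE_PERCENT // 100


# The "DRYed up" alternative: one shared constant, because the values matched.
SHARED_PERCENT = 8


def charge_cents(amount_cents: int, kind: str) -> int:
    """Once the late fee policy moves to 10 percent, the merged version
    needs a conditional to recover the distinction it discarded."""
    percent = 10 if kind == "late_fee" else SHARED_PERCENT
    return amount_cents * percent // 100
```

The explicit version carries one extra line of apparent duplication. The merged version looks tidier only until the two sources diverge, at which point the discarded source comes back as a special case.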

Programmers may even assert that as professionals, they know better than the customer, swapping out explicit distinctive representations in exchange for implicit dependencies upon co-incident data. Doing so may be a useful contingency while prototyping, but it is still a rationalization based upon assumptions. Not acknowledging that it compromises the model, if just a little bit at a time, is personal myopia; more critically, this mode of thinking encourages retroactive embedding of structural flaws.

Dealing with customers over how software should behave is a very probabilistic endeavor, and a world governed by rapidly changing probabilistic entities will only exhibit linear structure as an average over all the events and elements in its history. This indicates that doing "just enough" with respect to comprehending the source representation's dimensions will be a persistent cause of rework, and a major injector of latent flaws. Yet we don't want Big Design Up Front; the shifting of the problem space makes that an even bigger risk factor. What to do?

I would suggest adopting confrontational forced choice testing whenever a supposed functional dependency is about to be removed and a source eliminated, and it seems the least bit questionable. In an optical exam, a confrontation test forces each eye to inspect a target independently, and a forced choice test is given when corrective lenses are progressively passed through, and beyond, optimal ranges. The idea is to force the subject just a little beyond their capabilities, to better assess the best overall focal corrections to make along multiple dimensions.

The candidate bit - the "information source" for removal - is potentially a critical dimension of the problem; what is the strength of the concept in the customer's language? Even if it doesn't have a name, how established is it in the domain? Things the programmer thinks he sees may be his own figments and should be subjected to increased scrutiny as information sources.

Ask what would happen if some other information source were absent or altered after factoring out the candidate bit - if that other source has any impact at all on any one of the dependencies of the candidate, it should determine them all, and only together. Functional dependencies cannot be partial. Ask yourself, after removing a candidate bit, have you introduced multiple conditionals where previously there was one or none? The more conditionals introduced, the further the system has drifted from being based on well-factored information sources.
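One rough way to run such a test against sample data is sketched below. The `determines` helper, the `orders` records, and their field names are hypothetical, made up for this illustration. The helper checks whether one field functionally determines another in the rows it is given, and the probe deliberately pushes one case just past what the current data shows.

```python
from collections import defaultdict


def determines(rows, determinant, dependent):
    """Return True if every value of `determinant` maps to exactly one
    value of `dependent` across the given rows."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical sample data: today, region appears to determine carrier.
orders = [
    {"region": "north", "carrier": "rail", "lead_days": 3},
    {"region": "south", "carrier": "road", "lead_days": 5},
    {"region": "north", "carrier": "rail", "lead_days": 3},
]

# On current data, "carrier" looks derivable from "region", so it is a
# candidate for removal.
looks_redundant = determines(orders, "region", "carrier")

# Forced choice: add one plausible case the customer might confirm or reject.
probe = orders + [{"region": "north", "carrier": "road", "lead_days": 5}]
still_redundant = determines(probe, "region", "carrier")
```

With this data, `looks_redundant` is true and `still_redundant` is false. A check like this cannot say whether the probe case is real; only the customer can. It only shows that the apparent dependency rested on the sample at hand rather than on anything established in the domain.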

In general, I lean toward explicit representations of feature information. Code should say what it means, and mean what it says, within the constraints of the expressiveness of the technology. DRY is not a reasonable justification for relying upon coincidence in your code; it is telling us not to multiply factual data or relations unnecessarily in our constructions. We used to call that Occam's Razor.
