Sunday, August 3, 2014

Namespaces in Ruby

Ruby is a very plastic language. By plastic, I don't mean "fake" but easily manipulated.   I was considering namespaces, as they are in PHP and a number of languages derived syntactically from C:

namespace \MyOrg\MyDomain\MyApp\MyPackage\Foo;

I was thinking of Ruby. In Ruby, there is no single namespace declaration; instead, the language provides a Module construct to more-or-less accomplish the same goals. The difficulty is that Module involves rather more syntax, not less.
Poking around Google, I came across this little gist in which Justin Herrick describes how he made a short DSL to have a nice brief Clojure-like syntax:

ns 'MyOrg.MyDomain.MyApp.MyPackage.Foo' do
   def fluggelduffel
   end
end
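
For reference, here is a minimal sketch of how such an ns helper could be written (my own paraphrase, not Herrick's actual gist): split the dotted path, create nested modules as needed, and evaluate the block inside the innermost one.

def ns(path, &block)
   # Walk the dotted path, creating nested modules as needed.
   mod = path.split('.').inject(Object) do |outer, name|
      if outer.const_defined?(name, false)
         outer.const_get(name)
      else
         outer.const_set(name, Module.new)
      end
   end
   # Run the block with the innermost module as self; def adds methods to it.
   mod.module_eval(&block)
   mod
end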

Herrick's solution takes advantage of Ruby's seemingly limitless ability to modify the module environment. And it works, with one limitation: constants referenced in a method like fluggelduffel, or anywhere in the do block for that matter, throw a NameError unless const_set is used:

ns 'MyOrg.MyDomain.MyApp.MyPackage.Foo' do
   def fluggelduffel
      puts A   # NameError: uninitialized constant A
   end
end

I played around with the code a bit to add an options hash:

ns 'MyOrg.MyDomain.MyApp.MyPackage.Foo', { :constants => { :A => "FUBAR" } } do
   def fluggelduffel
      puts A   # still NameError: A is set on Foo, not visible in this lexical scope
   end
end

The code simply calls const_set in a different place. The constant A is there in module Foo, but it isn't visible in the lexical scope in which puts is referencing A. We can address A explicitly via MyOrg::MyDomain::MyApp::MyPackage::Foo::A, but how ugly is that? We can also use const_get('A') but that is pretty ugly too.

The problem is that bare references to constants are resolved in the lexical scope in which the block was created. It has nothing to do with the scope the constant is defined in. What to do?

There isn't a lot that can be done. If you're using unqualified constants, that's pretty ugly in itself... polluting your code with global references and all. If you really need that (dis)ability, const_get('A') follows the nesting chain all the way up. I've found that self::A works fine for the globals I've defined locally using const_set, though I'm uncertain if there are any side-effects or weird interactions. In this way, constants can be defined dynamically, and attached to the initial namespace definition.
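
As a concrete sketch of that arrangement (again my own approximation rather than the actual code; note that here the method is defined with def self. so that, when it runs, self is the Foo module, which is the assumption behind self::A resolving):

def ns(path, options = {}, &block)
   mod = path.split('.').inject(Object) do |outer, name|
      if outer.const_defined?(name, false)
         outer.const_get(name)
      else
         outer.const_set(name, Module.new)
      end
   end
   # Attach any requested constants directly to the innermost module.
   (options[:constants] || {}).each { |key, value| mod.const_set(key, value) }
   mod.module_eval(&block)
   mod
end

ns 'MyOrg.MyDomain.MyApp.MyPackage.Foo', { :constants => { :A => "FUBAR" } } do
   def self.fluggelduffel
      puts self::A   # resolves through the receiver (Foo), not the lexical scope
   end
end

MyOrg::MyDomain::MyApp::MyPackage::Foo.fluggelduffel   # prints FUBAR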

Saturday, August 2, 2014

HTML is BAD, and YOU SHOULD FEEL BAD for using it

No, I don't really think this, but that's a catchy headline, isn't it?

On the other hand, there is a part of me that thinks that HTML represents a sort of dishonesty, a kind of technological plagiarism. 

The Not Invented Here (NIH) reinvention of wheels is often the standard of practice across the business world - reinforced with intellectual property portfolios and litigation. Technologists thrive on NIH. The behavior may be driven, in part, simply by the fact that technology-oriented humans enjoy tinkering with something we perceive as new.

It is further promoted by broad-based illiteracy among practitioners. The Internet helps people self-educate, but as masses of people learn the rudimentary basics of programming, they are apt to stop once they know just enough to be dangerous - that is, just enough to earn some money from a skill. Those with any real interest in the science are doomed to wander through parts of the discipline that were already well explored decades ago.

Yet the most corrosive aspect of NIH on platforms-substituted-as-standards such as HTML is intellectual dishonesty. The same kind of intellectual dishonesty that pervades business advertising, the posturing of vendors towards clients in fixed-bid contracts, and what lawyers and politicians seem to consider acceptable in love and war. Even if the participants were aware of their own agendas, they would not admit to them; and the dishonesty corrupts both the culture and the outcomes at once.

Don't get me wrong. Society as a whole has benefited greatly from the worse-is-better approach embodied in HTML. The world's peoples gained experience in a domain previously occupied by a few brave, geeky souls. We got cool toys, new ways of doing medicine, and innumerable other benefits from exploring the space with just enough technology, even if it was a bit broken.

The problem space will eventually press in on the field. We see pseudo-standards such as micro-formats and Web Components competing to represent multiple parallel domains of information in HTML encoded resources. Technologically, they are neat, and I have little doubt that they serve to further the interests of Google and various social media manipulators. But they also work at cross-purposes to the original intent of markup, which is to represent information with integrity and to make it accessible and open over the long term for all stakeholders.   

As people move forward with Web Components, I'm reminded that XML offered us the ability to use our own tag names to represent information content. A Web Component can be designed as a graphical user interface widget, but the higher use is to isolate - or entirely occlude - for-the-browser behaviors behind elements that express only the problem domain's semantics. Otherwise, we're just back to writing 4GL applications again, and we did that in the '80s.

Tuesday, July 29, 2014

URL Bending

URL stands for "Uniform Resource Locator", and, thus named, the construct finds much use as a substitute for a lot of structures, many if not most of which have nothing to do with resource locations.

When any term such as a URL gets assigned multiple meanings, whether these are different people's interpretations of the same purpose or the usages originate as means for different ends, that term becomes a homonym. 

In the context of Web applications, we're told that a URL does just one thing: locate a resource. But the "universal" constructs that URLs attempt to address via a one-dimensional string are multidimensional and contain subspaces that link pervasively to one another. We are faced with many purposes, many decompositions of the data, many formats, many relationships, and so on... and somehow all those facets are supposed to be encoded in a single human-readable identifier.

Even on something as conceptually straightforward as a topographic map, we use discrete coordinates to differentiate the dimensional components of an address. More generally, an address is an N-tuple. That N-tuple can (or must) be represented as a string, but the representation does not usually rely on nesting or containment - the primary dimensions are orthogonal and vary independently of one another. Yet in a URL, most often, the string is read left to right and path segments form an implicit hierarchy. Or they don't. There is no single interpretation that is canonical in the sense that everyone actually follows it.

Here is, syntactically, where URLs break down: we cannot both, at once, infer hierarchy where it was intended to be implied and not infer hierarchy where it was not, without overloading the URL with a one-off domain specific syntax.

So we see a proliferation of syntactic forms, starting with "?query+parameters". We argue over meaningless forms - should it be /new/user or /user/new or /users (PUT) or /user (PUT) or whatnot - and the amount of argument is inversely proportional to the importance of the distinctions being made. A sound, common grammar is a necessity.

A URL isn't really an address in a sense analogous to a cartesian coordinate. It is a parameter binding. In trying to represent multiple twists and turns by way of a single mangled string, in effect we are tying a knot. Or a bend or hitch, if you will, depending upon the object and subject being tied. The moves in tying this knot form a sort of grammar, which for lack of a better term and because it sounds like binding, I'll call "URL Bending".  For reference, a bend is a knot used to join two or more lengths of rope. 
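
To make the binding idea concrete, here is a small Ruby sketch (the dimensions, names, and route shapes are purely illustrative): an address is modeled as an N-tuple of orthogonal dimensions, and "bending" folds some of them into path segments while pushing the rest into the query string.

require 'uri'

# Illustrative only: the address is an N-tuple of orthogonal dimensions.
address = { :org => 'acme', :app => 'billing', :invoice => 42, :format => 'json' }

# One possible bend: fold some dimensions into path segments (implying a
# hierarchy) and push the rest into the query string (implying none).
def bend(addr)
   path  = "/#{addr[:org]}/#{addr[:app]}/invoices/#{addr[:invoice]}"
   query = URI.encode_www_form(:format => addr[:format])
   "#{path}?#{query}"
end

bend(address)   # => "/acme/billing/invoices/42?format=json"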

A grammar for bending could be formalized, I suppose. We would need to grok the distinct dimensions along which resources are addressed in various bounded contexts represented in the solution space. (Determining open addressing models is a heavy focus of ISO/IEC 10744:1997, aka "HyTime".) We would need to grasp whether those bits should be included in the knot tying, or should more opaquely be mapped to components of the transaction concomitant with the use of the URL, like the HTTP method or POST or PUT content payloads or HTTP headers. 

Monday, July 21, 2014

Separation of Concerns

If you are a developer and you work long enough on business applications, you start to sense the corrosive effects when divergent interests and viewpoints are forced into a single representation.

It may be a vague suspicion - a code smell. You don't necessarily know precisely what those conflicting concerns are, or why they were conflated in the first place, or the possible consequences of trying to separate them. But you know that a valid stakeholder concern can be addressed only if it is identified. Separating concerns is a necessary, but not sufficient, step in the right direction.

The volatility of your codebase - the rate at which changes tend to grow - depends on how well matched the codebase is to the concerns it seeks to address. If the code covers a mere fraction of one concern in each coding unit, its volume will balloon, and with it the difficulty of managing all the moving pieces. If the code covers many concerns in a few monolithic coding units, it may be too terse for a human to decode sensibly and reliably, and have so few points of articulation that the simplest local change gives rise to a cascade of far-reaching fissures.

Cohesion is the term often used to describe the continuum between these extremes, but the success of biological systems calls this dogma into question. Cohesion is neither necessary nor sufficient for a dynamically stable, long-running, self-maintaining system; so I think it is not really necessary or sufficient even for our crude software approximations of real-world processes. A better metaphor comes from optics, principally the concept of focal length.

In optical systems, the focal length is a measure of how strongly a lens bends light, determining among other things the magnification and angular field of view. It also influences the degree to which an image blurs when cast upon a surface parallel to the principal plane of the lens. When the distance is just right, the projected image is sharp and all detail is to scale. When the distance is too close or too far, the projected image is blurred and/or skewed.

Being cohesive isn't enough. A coding unit that does not represent an optimal fraction of concerns is either out of focus, skewed, or both blurred and skewed.

Sunday, July 20, 2014

Conservation of Information

Information must always be available, but it is not necessarily usable. Like energy, information is conserved - it cannot really be lost or created, only changed in form. Entropy really has to do with the energy required to put the bits together into a suitable form for a decision to be made. 

We can account for the costs of information through various means, one of which is the human work hours (or some other unit of time) expended to express and reformulate the mechanisms used to move the bits around.

Information flowing between domains always causes a loss of usability of some finite fraction of the information, unless sufficient energy input is present to counteract the small-scale distinctions and fine-grained anisomorphisms introduced at the boundaries of the domains. Information content is conserved, but some bits may be masked, mutated, or combined in some manner that is intractable given extant means and methods.  

Relativistic effects also come into play. Two or more highly localized bounded contexts of information necessarily give rise to complex distortions of views between the respective reference frames. Sense making only occurs when one takes into account one's own reference frame and those being observed. 

[I know this is probably a bit of BS. It was edited from a brainstorming journal entry originally made 1-2-1997]

Friday, May 30, 2014

One Language Puritanism for Acceptance Testing ?

A good Web acceptance test - or, to be more precise, a tool chain that supports end-to-end tests well - is apparently quite difficult to achieve.

The problem is not in the high-level languages. Domain-specific languages like Gherkin, and hosted language syntaxes, have been used productively.

The main problem is not in the deployment techniques, although lack of closure over the environment is the first major stumbling block to doing any kind of meaningful testing. Whether it is local, isolated deployment through Ruby on Rails' bundler, rake, and tools like RVM; or CI tool chains like Travis CI, Jenkins, or Bamboo; or just a skunkworks set of scripts, config files, and Git practices; there are many paths to creating Closure Over a Deterministic Environment (CODE).

A slightly more interesting problem is generating meaningful and valuable configurations of test data. I've worked with fixtures and factories - both are fragile and both take a considerable fraction of development time to work with consistently. They don't scale all that well, but there is an effective process that can be started and applied in substantive ways across large swaths of functionality.
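
For reference, the factory style I have in mind looks something like this (FactoryBot syntax; the invoice model and its attributes are hypothetical):

require 'date'
require 'factory_bot'   # assumes the factory_bot gem is available

# The trait shows how variations get layered onto a base definition.
FactoryBot.define do
   factory :invoice do
      customer_name { 'ACME Corp' }
      total_cents   { 12_500 }

      trait :overdue do
         due_on { Date.today - 30 }
      end
   end
end

# In a test, a fully configured record is then one call away:
# invoice = FactoryBot.create(:invoice, :overdue)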

An even more interesting problem is getting rigorous (meaning: formally translatable) specifications of the expected data and objects. The difficulty is that this takes a lot of long, deep thinking and reflective writing - tactics that were prized in decades past but have met with extreme disfavor in Agile Internet time. Present-day, first-world Homo sapiens seem to have trained their working memories so that they are barely able to consider the meaning of a full-length tweet, let alone a missive as long as this rant.

But I digress. Formal and complete specifications are plausible, if at all, only as the end result of a long development process. We could expect to accrue such comprehensive specifications over the duration of a project as an executable or translatable artifact: code, test specifications, or literate programming via configuration parameters represented in markup documents. The point is, specifications don't just pop into existence; they are a side effect of long-term thinking and communication processes.

Whether we choose to value that content more highly than working project code (likely a waste unless it is a research project); or devalue it to the point of immediately discarding it like so many candy wrappers and beer bottles on the ground (likely a waste unless it is a trivial project); or find a way to make it an integral aspect of the artifacts under construction (literate programming, configuration by convention, specification translation) - so long as the team can maintain an adequate level of context and understanding, there is a path forward.

No, the devil in acceptance testing is in the details: our tools on the client side are immature, and one reason that people are stuck on stupid is an irrational obsession with using one, and only one, language to solve the problem.

Don't get me wrong: whatever language you use, be it JavaScript or PHP or Gherkin or whatever, is just fine. But that language choice, for end-to-end acceptance testing, is barely even relevant. The specific language is neither necessary nor sufficient. What matters is whether the tooling can actually perform a simple step like selecting an option and observing a change happen as the result of a back-end transaction. I recently ran into trouble with Codeception/Behat/Mink over exactly this issue. Most of the acceptance testing frameworks I've seen give the developer far too much grief working around these scenarios, if they can handle them correctly at all.
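
For example, the kind of step I'm talking about looks like this in Capybara, a Ruby acceptance-testing DSL (the route, field label, and expected text are hypothetical, and a JavaScript-capable driver is assumed):

require 'capybara/rspec'

RSpec.describe 'choosing a shipping method', type: :feature, js: true do
   it 'shows the new total after the back end recalculates' do
      visit '/orders/new'
      select 'Overnight', from: 'Shipping method'    # fires an AJAX round trip
      expect(page).to have_content('Total: $42.50')  # Capybara waits for the update
   end
end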

I'd rather use JavaScript (or better yet, CoffeeScript) to write acceptance tests for a PHP application, if it means the test engine will smoothly traverse the client-side interactions. Codeception is a nice integration of PHP toolkits - one of the best I've seen - but the fact that a test framework uses PHP exclusively as its testing language means LESS THAN NOTHING to me. Bridging a client-side test engine that understands JavaScript to a library of modules in a second language (PHP) simply presents ongoing risk of flaws with no opportunity for accruing benefits.

It really amounts to introducing multiple layers of indirection in order to avoid using a native language of the actual test platform. As always the problem this strategy creates is too many layers of indirection.

Thursday, April 24, 2014

Creative Accounting: Charging Employees for Overtime ?

So I came across this question recently.  Is it acceptable accounting to debit an hourly employee's earned time off ("vacation time") for the time-and-a-half overtime pay when the employee works more than 40 hours in a week?

I'm not sure what, if any, laws would apply. The question arises in the State of North Carolina, in the good old US of A. In NC, it is not legal for an employer to retroactively reduce pay that has been promised; changing the terms of promised pay requires a written, published policy. I have no idea whether such a policy exists (most likely not), but the person who raised the question did not see one in the initial employment agreement.

In any case, after Googling and asking my little circle of friends, I have not found a clear answer. The radio silence of the internet suggests that this sort of underhanded accounting is not, by any means, a generally accepted accounting practice.

Even more devious is that the employer - a large private medical practice - has such poor scheduling, and breaks its own business-hour policies so often, that certain hourly employees are forced to work overtime. In effect, the employer forces its hourly employees to earn vacation time and then drains that earned time off at an accelerated rate whenever overtime is paid out.

I'm no lawyer, but it seems criminal to me.

[Update:  I spoke with an NC DOL representative, who gave the following advice:

1) If the pay stub shows that the employee was paid out of vacation time and that overtime was not paid, then there is a current issue with the manner in which payment was made, and there is a basis to file a formal complaint. The observation to be made is that the time and a half may not have been paid.

2) Any untaken earned time off must be paid out to the employee at the end of employment. If, at that time, the employee can show a record of vacation time earned but not actually taken that is greater than the amount the employer is seeking to pay out, then there is a basis to file a formal complaint. The observation here is that earned vacation time cannot be disputed until an event converts it to a monetary exchange.

It is unclear to me, but I suspect that actually taking the full amount of earned vacation time would raise the same issue, as the next paycheck would then show that the paid time off earned up to that point was not in fact paid out.