Wednesday, April 17, 2013

Fixed Points in a simple script

I had a simple task of taking a database extraction from some Oracle financial database view into  MySQL database tables.

Simple, right?

We're using Perl, which has flexible built-in data types to handle this sort of thing. It aught to be a clean mapping:

my $orahandle = $oracledb->prepare("SELECT * FROM $table") or die('Query failed on '.$table);
while( $data = $orahandle->fetchrow_hashref() ){
  if ( ! $inserthandle ) {
    @columns = map(lc, keys(%$data));
    $paramstr =  ('?,' x ((scalar (@columns))-1)).'?';
    $inserthandle = $targetdb->prepare("INSERT INTO $mysqltable (".join(',',@columns).") VALUES ( $paramstr )") or die "Problem can not prepare:  @{[$targetdb->errstr()]}\n";
  $inserthandle->execute( values(%$data) ) or die "Error: Failed to insert. @{[$orahandle->errstr()]}\n";  

The two tables are isomorphic so the problem scenario is one of a map. Theoretically, we do not need to know the names of the fields, or their data types, because the types are isomorphic too, more or less. The point is that discrete conversion is not demanded by the semantics of the problem scenario.

Having a rule, an isomorphic mapping if you will, is important even when "more or less" means there are exceptions. Treating code as a morphism facilitates articulation of the exceptions differentially, with respect to how they change the mapping, rather than obscuring all in undifferentiated procedural code. 

The problem in this specific case is that there are several DATE data type fields in the table, and the default format for Oracle is 'MM/DD/YYYY'.  MySQL has a fixed constraint upon the format of inserted dates, the ISO 'YYYY-MM-DD' format. It is a fixed point.

Programmers are taught to just hack it. List the field names individually in the SELECT and INSERT (more fixed points). Manipulate each DATE field in a discrete variable and substitute the transformed value through in-line code (even moar fixed points). It works, at the cost of tossing out the inherent symmetry in the scenario. And it adds several gratuitous fixed points - at least a couple for each discretely mangled field.

Now, one might think to coerce Oracle put the format back to:

alter session set NLS_DATE_FORMAT = 'YYYY-MM-DD'

But the upstream supplier of the view decided to make all the dates VARCHAR2's, so none of the normal date formatting is applied. Yet another fixed point. MySQL's fixed point was, at least, a standard. This fixed point is completely gratuitous, and is the real reason the approach breaks: Oracle normally uses the ISO YYYY-MM-DD format by default. My solution is to define functional data-scrubbing callbacks, to inject data manipulation into the otherwise symmetric process.

A footnote: Not all fixed points are undesirable. A couple of the views thrown obliquely over the wall at me don't even have keys. No primary keys. No keys at all. Keys define the symmetries and structure of a relational data set. Without them a view presents no relations, being merely an undifferentiated amorphous mass of records. Such a practice is to IT design, as mud-pies are to fine cuisine.

[edit: Cleaned up code examples. Don't know why Blogger's editor keeps doing this, but it keeps chopping up my markup. ]

No comments: