…of Structure and Content.

Ran across this interesting article posted in the MIS Class Blog by Antonio Montanez. The MIS Class Blog was created by Sonya Zhang and her students at Cal State Fresno and Cal Poly Pomona.

First, it is encouraging to see students focused on Data Management. Thanks to Antonio for writing the article and reminding us of Malcolm Chisholm’s work (he recently published another book, ‘Definitions in Information Management’, which can be found on Amazon). Data Management is an underserved field that can always use fresh talent. Sure, there are tool fads and vendor hype, but there are far too few real practitioners of data content management.

When you talk about reference data you quickly encounter the difference between STRUCTURE (the data model and even physical tables) and CONTENT (the values in the columns).

Antonio reminds us accurately, from Malcolm’s article, that reference data is not visible in the data model. We cannot stress enough the danger of relying solely on the data model’s entity relationship diagram (ERD) for understanding. In fact, the ERD is extremely restrictive in the facts it presents. It is like looking at a building’s architectural diagram and expecting to ‘know’ what the building will look like and contain when completed. Another analogy is that the data model is like someone’s shadow: you get the basic outline of a person but cannot know their eye color or their smile.

Take currency (CCY), for instance. On most data models you would just see the attribute CCY of type text and size 3 (if it follows ISO 4217). Rather innocuous. It may sit far down the attribute list and seem relatively unimportant. However, in a financial data model, you will usually find it links back to detail transactions. If it is missed in development, the importance of currency will make itself known at some point in testing. In fact, it will likely be part of what determines the uniqueness of a data row.
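To make the uniqueness point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `currency` and `balance` tables, their columns, and the helper names are all hypothetical, invented for illustration; they do not come from Antonio's or Malcolm's articles. The idea is simply that the same account on the same date can hold a balance in more than one currency, so CCY has to be part of the key, and the allowed values live in a reference table rather than in the diagram.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical reference table and fact table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- CONTENT lives here: the allowed currency codes.
        CREATE TABLE currency (
            ccy TEXT PRIMARY KEY CHECK (length(ccy) = 3)
        );
        -- STRUCTURE: on a diagram, ccy is just one more 3-character column.
        CREATE TABLE balance (
            account_id   TEXT    NOT NULL,
            as_of_date   TEXT    NOT NULL,
            ccy          TEXT    NOT NULL REFERENCES currency (ccy),
            amount_minor INTEGER NOT NULL,  -- amount in minor units (e.g. cents)
            PRIMARY KEY (account_id, as_of_date, ccy)
        );
        """
    )
    # A tiny sample of reference values, not a complete list.
    conn.executemany("INSERT INTO currency (ccy) VALUES (?)", [("USD",), ("EUR",)])
    return conn


def insert_balance(conn, account_id, as_of_date, ccy, amount_minor):
    """Try to insert one row; return False if a key or reference constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO balance (account_id, as_of_date, ccy, amount_minor) "
            "VALUES (?, ?, ?, ?)",
            (account_id, as_of_date, ccy, amount_minor),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_demo()
    print(insert_balance(conn, "A1", "2012-02-01", "USD", 10000))  # accepted
    print(insert_balance(conn, "A1", "2012-02-01", "EUR", 7500))   # accepted: differs only by ccy
    print(insert_balance(conn, "A1", "2012-02-01", "USD", 999))    # rejected: duplicate key
    print(insert_balance(conn, "A1", "2012-02-01", "XXX", 1))      # rejected: not in reference data
```

Had the key been only account and date, the second insert would have failed, which is roughly the surprise that tends to show up in testing.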

Reference data content quickly raises the issues of Data Governance and Quality. How will CCY be populated and maintained? Given that there is an ISO standard, is that what will be used? Are conversions or cleansing necessary on the source data?
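These questions can be explored with a small content-profiling pass over the source values. The sketch below is illustrative only: the four codes are real ISO 4217 codes but deliberately a tiny subset, and the alias map, function names, and sample values are assumptions made up for this example, not anything prescribed by the standard or by a particular tool.

```python
from collections import Counter

# A small subset of ISO 4217 codes; a real list would be loaded from a governed source.
ISO_4217_SUBSET = frozenset({"USD", "EUR", "GBP", "JPY"})

# Hypothetical spellings found in source systems, mapped to the standard code.
SOURCE_ALIASES = {"US$": "USD", "EURO": "EUR", "STG": "GBP"}


def normalize_ccy(raw):
    """Return the standard code for a raw source value, or None if it cannot be resolved."""
    if raw is None:
        return None
    candidate = raw.strip().upper()
    candidate = SOURCE_ALIASES.get(candidate, candidate)
    return candidate if candidate in ISO_4217_SUBSET else None


def profile_ccy(values):
    """Count cleansed codes and rejected raw values so someone can review the content."""
    accepted = Counter()
    rejected = Counter()
    for raw in values:
        code = normalize_ccy(raw)
        if code is None:
            rejected[raw] += 1
        else:
            accepted[code] += 1
    return accepted, rejected


if __name__ == "__main__":
    sample = ["USD", " usd ", "Euro", "STG", "XYZ", None, "US$"]
    accepted, rejected = profile_ccy(sample)
    print("accepted:", dict(accepted))
    print("rejected:", dict(rejected))
```

The rejected counts are the interesting output: each one is a governance decision (fix at source, add an alias, or extend the reference list) that no diagram would have surfaced.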

I once had a mentor and, despite one bad habit, I am really appreciative of what he shared. His bad habit was that he would design a logical model without looking at the data in any detail and proclaim it “90% done”. “You can finish off the last 10%,” he would say. Quite often, much of that “10%” was data content analysis and reference data. A data model (structure) built without looking at the reference data values (content) is to data management what an architectural drawing is to a finished and occupied building: an important start, but by no means 90% complete.

Thanks, Antonio, for the focus on reference data. And to those looking for a long career: consider opportunities in data management. Every day information becomes larger and more complex. There are vast areas of unstructured data that we have not yet begun to tame. Large data volumes and virtualization, to name just two areas, will drive advances in data management.

This entry was posted on February 1, 2012 at 10:28 am and is filed under Uncategorized.
