The Hitchhiker's Guide to Data Quality
The Hitchhiker's Guide to Data Quality has this to say about enterprise data:
“Enterprise Data is big. Really big. I mean, you might think that the spreadsheet you saw accounting using last Thursday was huge, but that's just nothing compared to enterprise data, just listen…”
After a while, the style settles down a bit and it starts to tell you things you need to know.
Like the fact that the most duplicated human known is Barney Snodgrass of Austin, Texas, who spends all of his time filling in registration forms on various websites and purchases everything he needs using alternate names and fictitious addresses. Mr. Snodgrass, who coincidentally also holds the world record for the most active email accounts held by one person, is employed by a data quality consultancy, which makes huge amounts of money cleaning customer master tables for corporations.
On the subject of data quality performance, the guide says that “The company with the best data quality record ever is the Dracodia shoe company of New South Wales.”
It goes on to explain that they maintain their spotless record by avoiding storing data of any kind. Visitors to their offices will notice a complete lack of computers, paper or in fact signs of any kind.
“It's very simple,” explains R. Todin, the CEO. “If you don't store it, it can't be wrong.”
While many data quality skeptics point to Dracodia's poor financial performance as clear evidence that “even perfect data quality doesn't help performance,” it's widely believed that Todin is in fact a data quality genius.
Based on Todin's clear success, a number of cults have formed, the most radical being the “Data Destruction League,” which is dedicated to “cleansing the world's data through its elimination.”
On the subject of spreadsheets, the guide tells us:
“The world's largest and most complex spreadsheet was created by Mz Martha Groten of the Zortrad Insurance Corporation. By cross-linking fifty laptops together and connecting directly into the Zortrad data warehouse, Mz Groten managed to create a massive spreadsheet with self-adaptive macros and dynamic three-dimensional pie chart generation. As the spreadsheet's mass increased, however, it underwent gravitational collapse and formed a singularity, and the Zortrad Insurance company's main offices and most of the town of Willowdown disappeared behind a row/cell event horizon. Total disaster was avoided only by the very quick thinking of a local IT support person, who hit the little-known Ctrl-Z “Undo” shortcut and saved the planet. After this close call, a series of gravitational observatories is now planned to monitor excessive spreadsheet use in corporations.”
And finally, the guide has this to say about cross-dimensional object generation in modern databases:
“The only reported instance of actual object generation within a relational database occurred under questionable circumstances, and with the involvement of a number of controlled substances; however, Gord Betrod of Tweed, Ontario, claims that late one night he stumbled upon a valid SQL statement that created not just records, but actual customers.”
Gord explained: “We were just sort of chilling, had a bit of DBA work to do in the CRM system, and I botched up the INSERT statement a few times, so I was having one last run at it, and suddenly there were 5000 people standing there, all with credit cards out, wanting to buy. Best sales month we had that year.”
The claim is highly contested, not only because Gord has never been able to reproduce the query, but also because a direct correlation was noted between this alleged event and a spike of approximately 5000 credit card chargebacks some four weeks later.
(This very silly blog post was of course inspired by the great Douglas Adams and his masterwork The Hitchhiker's Guide to the Galaxy.)
Data Perfect, a researcher for The Hitchhiker’s Guide to Data Quality (#HHGTDQ on Twitter), recently filed this entry:
“The Infinite Improbabilistic Matching Engine, which is used to power the Dataship Heart of Quality, and is capable of the almost unfathomably fast processing speed of one beeblebroxabyte every 42 nanoseconds, has definitively concluded that the total amount of duplicated data found within all of the databases and spreadsheets on the planet Earth is Mostly Harmless.”