Datamartist gives you data profiling and data transformation in one easy-to-use visual tool.


Fake Data

It’s amazing how hard it is to make fake data.  If you don’t believe me, you probably haven’t tried it.

By this I mean data that provides a reasonable test set for data analysis, not data faked for some shady or illicit purpose (which, I imagine, might actually be easier in some ways).  That said, dishonest data is also an interesting topic from the point of view of fraud detection.

Even harder to make is fake bad data.  That is, data that is computer generated but contains the kinds of errors and issues that humans introduce.
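To give a flavour of what computer-generated "human type" errors might look like, here is a minimal Python sketch. The function names (`add_typo`, `make_dirty`), the four error types, and the sample city names are all assumptions made for illustration; real keying errors are more varied, and this is not a description of how any particular tool generates test data.

```python
import random

def add_typo(value, rng):
    """Return value with one simulated keying error (hypothetical error types)."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    kind = rng.choice(["swap", "drop", "double", "space"])
    if kind == "swap":    # transpose two adjacent characters
        return value[:i] + value[i + 1] + value[i] + value[i + 2:]
    if kind == "drop":    # leave a character out
        return value[:i] + value[i + 1:]
    if kind == "double":  # hit a key twice
        return value[:i] + value[i] + value[i:]
    return " " + value    # stray leading space

def make_dirty(values, error_rate=0.1, seed=0):
    """Corrupt roughly error_rate of the values; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [add_typo(v, rng) if rng.random() < error_rate else v for v in values]

# Illustrative input only.
clean = ["Toronto", "Montreal", "Vancouver", "Calgary", "Ottawa"]
dirty = make_dirty(clean, error_rate=0.5, seed=42)
print(dirty)
```

Seeding the generator means the same "bad" data comes out on every run, which is handy when the fake data feeds a regression test.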

Not that there isn’t enough genuine, naturally occurring bad data in the world, but sometimes (particularly if you’re a software company developing data transformation tools) you just need some artificial stuff to run through the works.

Thank goodness for the internet, source of all types of data, some of it decidedly fake, and bad to boot.

Of course, then you can start to get into different levels of “realism” in your fake numbers.  There is a wonderful mathematical truth described by Benford’s Law, a counterintuitive little reality: in many real-world data sets, especially ones whose values span several orders of magnitude, certain first digits are more common than others.  More than that, the distribution of first digits (how many values start with a 1, or a 2, or a 3, etc.) is remarkably consistent from one such data set to the next: a leading 1 turns up about 30% of the time, while a leading 9 turns up less than 5% of the time.

This fact is used by auditors to detect faked data, because very often the fraudster isn’t up on his mathematical oddities and generates the numbers randomly, rather than conforming to what Benford tells us.
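A rough Python sketch of that kind of first-digit check follows. Benford's Law gives the expected share of leading digit d as log10(1 + 1/d); everything else here (the function names, the simple absolute-difference score, and the two synthetic data sets) is an assumption chosen for illustration, not a description of what auditors actually run.

```python
import math
import random
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading significant digit: the first non-zero digit in the number's repr."""
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no leading digit")

def first_digit_shares(values):
    """Observed share of each leading digit, ignoring zeros."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_distance(values):
    """Sum of absolute gaps between observed and expected shares (0 = perfect match)."""
    expected = benford_expected()
    observed = first_digit_shares(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10))

rng = random.Random(1)
# "Naive fraudster": amounts drawn uniformly, so every leading digit is about equally likely.
naive = [rng.uniform(100, 1000) for _ in range(10000)]
# Values spread evenly on a log scale over six orders of magnitude.
spread = [10 ** rng.uniform(0, 6) for _ in range(10000)]

print("naive :", round(benford_distance(naive), 3))
print("spread:", round(benford_distance(spread), 3))
```

The uniformly generated numbers should score noticeably further from the Benford distribution than the log-spread ones. A real audit would use a proper statistical test rather than this simple distance, and a poor fit is only a prompt to look closer, not proof of fraud.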

 

 
