Rapid data transformation for the team- Do you have the right tool?

Team-Spreadsheet-hell-cartoon-final-final
Spreadsheets have problems:

  • Using the wrong version.
  • Forgetting a cut and paste step.
  • Failure to check spreadsheet formulas.

And the enterprise ETL is just too big:

  • ETL Server setup and cost.
  • ETL Developer costs.
  • Development effort just not suited to exploration and rapid analysis.

But Datamartist is an ETL tool designed to be used like a scratchpad.

  • Extract and load data simply into and out of databases, spreadsheets and files.
  • Structure your transformations with a Block and connector ETL environment.
  • Use data type flexibility to quickly combine data from multiple sources.
  • Transform, join, segment and merge datasets quickly
  • Automatically generate and update destination tables.

A Data warehouse prototype example:

The Challenge: Quickly prototype a new star schema for profitability analysis

The Team:

  • Ron (Sales system)
  • Susan (Finance system)
  • Ed (Manufacturing and maintenance systems)

Using-datamartist-in-a-team

With Datamartist, the team gets right to work

  • They don’t need to spend lots of time designing the data model up front because Datamartist lets it evolve.
  • Once the sandbox database is created, they don’t need DBA support- even for generating new tables or changing structures- datamartist handles it.
  • Each team member dives into the data, and starts to generate the transforms and data sets needed.
  • They coordinate with each other regarding structure as it evolves- often via the sandbox itself, actually debugging and testing on the fly.

So I never need to data model again?

Of course you still have to worry about data models. But detailed data modelling work has a time and place. Making rapid prototypes to try out new ideas is a more agile way to narrow down what you need to do with the data. There is still a need for lots of formal (and expensive) data modelling if it is determined that a full data warehouse project is required.

Ed has a bunch of different data sets to deal with

  • He pulls data from the maintenance data base.
  • He finds a MS Access based “side system” that has some useful information- no problem importing that.
  • He uses an excel spreadsheet that is used for product code lookup as a source
  • Everything gets treated, combined and exported to the sandbox.

Shouldn’t all that data be in a formal source system?

Absolutely. We’re using this example because in a lot of case it simply isn’t. Datamartist, and the practice of building rapid prototypes for data transformation is an approach that is based on pragmatisim. And the reality is, there are lots of shadow systems and Excel spreadsheets out there.

The Datasets take shape

  • Susan creates some filters on the chart of accounts to isolate the right costs.
  • Ron segments and allocates the transportation costs using information from the ERP
  • Ed publishes a lookup table for product codes that Susan and Ron merge into their models.
  • Ron pulls in sales costs, and cross references with Susans data.

And the star schema emerges

  • Susan takes the lead on building the final Datamartist canvas that combines all the dimensional information with the factset.
  • The star schema is created, and exported directly to a database that is accessible by the OLAP tool for testing prototype cubes.

Thanks to the prototype:

  • The key data quality issues have been identified.
  • All the data sources needed have been found and informal sources can be brought into the ERP system and business processes.
  • Users have learned a lot about what they want in a cube- reducing design risk and speeding development.
  • Download Button