Monday, October 4, 2010

Entity Attribute Value model for flexible normalised data capture


I am reproducing a long-winded thesis I wrote on StackOverflow about the appropriate data schemata to facilitate OLAP. I wrote it because OLAP vendors are plainly too lazy to provide hooks that allow flexible and fast cube generation. OLAP in its current state is useless for engineers and good only for mesmerizing managers and their secretaries with imprecise and out-of-phase information. OLAP software writers seem so excited about their multi-dimensional Entity-Relationship schema design skills that they never thought it necessary to optimise the flexibility and response time of their cube-building processes. Their response has always been, "Duh, it's OLAP, it takes time to build a cube. OLAP is not meant to be real-time."

Here goes ...
As a generalisation, for multi-dimensional analysis to be effective you need two levels of data modelling:
  • A substrate detail capture
  • An abstract cube
The answer is indeed EAV as the substrate detail capture. People can afford to be lazy and skip the detail substrate design. Every data-mining tool I have been forced (by management) to evaluate has been inadequate because of its inability to perform time-profiling.
Ultimately, we had to write our own data-mining application, because the amount of kludge needed to make the vendors' software work, albeit ineffectively, equalled the effort it takes to create one that works effectively. Ours worked spectacularly well when we fed slices of the abstract cube into SAS/Insight. While the vendors' software took an hour to build a cube, and sometimes 24 hours, we needed our cubes to be built within 5 minutes, and frequently within 15 seconds, which effective substrate data modelling made possible by enabling efficient slicing of the data to build the cube.
In industrial practice, we might need to flexibly move and resize the time window of an analysis to match conditions that occurred 6 months ago. Or we might need to review the performance of processes over the last financial year.
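A movable time window can be sketched as a range query over a time-indexed substrate table. This is only an illustration, not the application described above: the `reading` table, its columns, the index, and the sample values are all hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

# Hypothetical substrate table: one row per (entity, attribute, timestamp) reading.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading ("
    " entity TEXT, attribute TEXT, recorded_at TEXT, value REAL)"
)
# Indexing on time first lets the window be moved or resized cheaply.
conn.execute(
    "CREATE INDEX reading_time ON reading (recorded_at, entity, attribute)"
)
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?, ?)",
    [
        ("oven-1", "temperature", "2010-03-01T08:00:00", 351.0),
        ("oven-1", "temperature", "2010-04-02T08:00:00", 349.5),
        ("oven-1", "temperature", "2010-04-03T08:00:00", 350.5),
        ("oven-1", "temperature", "2010-09-30T08:00:00", 362.0),
    ],
)


def slice_window(conn, start, end):
    """Return readings with start <= recorded_at < end, oldest first.

    ISO-8601 timestamps stored as text compare correctly as strings.
    """
    return conn.execute(
        "SELECT entity, attribute, recorded_at, value FROM reading"
        " WHERE recorded_at >= ? AND recorded_at < ?"
        " ORDER BY recorded_at",
        (start, end),
    ).fetchall()


# Re-run the same analysis over a window six months in the past.
april = slice_window(conn, "2010-04-01", "2010-05-01")
```

Because the window bounds are plain parameters, the same slice can be re-run for last month, six months ago, or a whole financial year without touching the schema.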
We need to have correct time-phase correlation. Let's say a church keeps per-night statistics for a particular evangelistic rally: number of people "saved", number of "rededications", number of contributions, total contributions in $, and attendance, over the 10 nights of the rally.
The church also has statistics for the number of tracts distributed by each volunteer and the location each volunteer was assigned, and for the radio and TV advertisements and Google ads placed. So the church statistician would need to collect information from each convert and contributor: location of residence, when and where a tract was received, and when and where an advertisement was encountered. Better still if they could collect such information from everyone who attended, along with who told them about the rally and when and where those people in turn received a tract or encountered an advertisement. With such information at hand, the statistician would then be able to build a more truthfully matched multi-dimensional cube to let church leaders decide how to organise their future pre-rally outreach effectively.
Data model optimization may not be possible for churches, but it is very possible for a factory full of equipment, robots, and trained operators. Phase-correlated data is also extraordinarily helpful in treating an epidemic.
The EAV model is necessary because the number of parameters collectible, and the number of attributes, vary from equipment to equipment, from process to process, and from disease to disease. We cannot afford to have a separate table for each set of parameters. Frequently a process or piece of equipment collects different sets of parameters, depending on the product that is being processed.
Sometimes the number of parameters collected can reach a thousand. We cannot have a table with a thousand columns, can we? We even violate data normalization principles by storing data in a single row or in a blob, because data access in a highly normalized table is inefficient.
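To make the idea concrete, here is a minimal sketch of an EAV substrate and of pivoting it into the wide rows a cube builder would consume. The table names (`attribute`, `observation`), the equipment names, and the values are invented for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Catalogue of attributes: adding a parameter is a new row, not a new column.
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    unit TEXT
);
-- The EAV substrate: one row per entity, attribute and point in time.
CREATE TABLE observation (
    entity TEXT NOT NULL,
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    recorded_at TEXT NOT NULL,
    value REAL,
    PRIMARY KEY (entity, attribute_id, recorded_at)
);
""")
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [(1, "temperature", "C"), (2, "pressure", "kPa"), (3, "spindle_speed", "rpm")],
)
# Different equipment reports different parameter sets into the same table.
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        ("oven-1", 1, "2010-10-04T08:00:00", 350.0),
        ("oven-1", 2, "2010-10-04T08:00:00", 101.3),
        ("lathe-7", 3, "2010-10-04T08:00:00", 1200.0),
    ],
)


def pivot(conn):
    """Fold EAV rows into one dict of attribute values per (entity, time)."""
    wide = defaultdict(dict)
    rows = conn.execute(
        "SELECT o.entity, o.recorded_at, a.name, o.value"
        " FROM observation o JOIN attribute a USING (attribute_id)"
    )
    for entity, recorded_at, name, value in rows:
        wide[(entity, recorded_at)][name] = value
    return dict(wide)


cube_rows = pivot(conn)
```

Each entity ends up with only the attributes it actually reports, so an oven and a lathe share one substrate table without either a thousand-column table or a table per parameter set.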
Further, we also need to version our data sets. Let us say we designed an experiment for 2004. In 2006, we discovered that we needed to include new dimensions and discard some useless ones, and so we created a new version of the experiment. Then, when we analyse the performance of the experiment between 2002 and 2008, we need to handle the change in the experiment's set of dimensions properly. In biological experiments and social behaviour surveys, version changes would be more frequent.
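One possible way to handle a change of dimensions between versions is to record the dimension set of each version and align records on the dimensions being analysed. The version numbers, dimension names, and helper functions below are hypothetical, chosen only to illustrate the bookkeeping.

```python
# Hypothetical catalogue: the dimensions collected by each version of an experiment.
VERSIONS = {
    1: {"dose", "age", "weight", "room_colour"},
    2: {"dose", "age", "weight", "genotype"},
}


def compare_versions(old, new):
    """Report which dimensions survive, disappear or appear between versions."""
    return {
        "common": sorted(old & new),
        "dropped": sorted(old - new),
        "added": sorted(new - old),
    }


def align(record, dims):
    """Project a record onto the given dimensions, using None where a
    dimension was not collected in that record's version."""
    return {d: record.get(d) for d in dims}


diff = compare_versions(VERSIONS[1], VERSIONS[2])
# Records from both versions can be compared on the shared dimensions.
v1_record = align({"dose": 5, "age": 40, "weight": 70, "room_colour": "blue"}, diff["common"])
v2_record = align({"dose": 5, "age": 41, "weight": 72, "genotype": "AA"}, diff["common"])
```

With an EAV substrate, a new version only adds rows to the attribute catalogue, so an analysis spanning several versions can restrict itself to the common dimensions or treat the missing ones explicitly as absent.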
I have a relational model of variable dimensionality for attributes and parameters here: http://appdesign.blessedgeek.com/discrete-flow-resource-management. It is not exactly EAV, but it gives an idea of what industrial multi-dimensional data modelling entails.
