27 August 2009

The Flaw of Averages

I have often heard it said that we should trust the law of averages. In this day and age, everyone wants to predict the future and be able to prepare for it. In doing so, we often rely on averages, only to be disappointed / shocked / outraged when we compare reality with our predictions.

Take a popular index, the CPI (Consumer Price Index), which measures how much more expensive things are from year to year. In Australia, we are used to an annual CPI of 3% - 3.5%, yet anyone who has been inside a grocery store in the last 6 months knows that 3.5% would be a wonderful dream come true. We often see grocery prices rise by anything from 10% - 25% in a given year, not counting seasonal items whose prices spike and dip according to supply and demand. I am quite certain this is not unique to Australia; having lived in the USA and South Africa for some time, I recall the same shocked realization there, even though the periodic statistics assured me of a CPI well below my own rudimentary findings.
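
To make the point concrete, here is a minimal sketch of how a basket average of roughly 3% can coexist with double-digit grocery inflation. The basket weights and price rises below are invented purely for illustration, not real CPI data:

```python
# Illustrative only: hypothetical basket weights and annual price rises,
# chosen to show how a ~3% "average" CPI can hide double-digit grocery inflation.
basket = {
    # category: (weight in basket, annual price rise in %)
    "groceries":       (0.10, 17.0),
    "everything_else": (0.90, 1.5),
}

overall_cpi = sum(weight * rise for weight, rise in basket.values())
print(f"Overall CPI:       {overall_cpi:.1f}%")              # ~3.1%
print(f"Grocery inflation: {basket['groceries'][1]:.1f}%")   # 17.0%
```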


Sam Savage recently published a book titled "The Flaw of Averages", which offers a full-blown study of the dangers of making decisions based on averages. You can also find more information here.

There is the story of the statistician who drowned while crossing a river that was, on average, 0.6 m (2 feet) deep. As you can imagine, the deepest part of the river may well have been several times deeper than the man was tall. Or perhaps the hapless soul lost his footing and remained horizontal (making him maybe 0.4 m tall) for too long to be able to breathe.

In the same way, in business we know that if we experienced 5% overall growth or contraction, it is never assumed that all products / services grew or shrank by the same amount as the average. So be careful when business decisions are made based on average figures.

Let's take an example KPI: assuming we had 100 products to sell and we sold 100,000 of each product for the year (10 million items sold).

If our QA policy allows for 2% returns due to quality problems (clearly not Six Sigma compliant), then 95,000 items returned in the year (0.95% returns) means we are doing OK and are well within the allowed range. As long as returns stay under 2% (gosh, we even managed to get it under half of that), the Production manager should have no reason to jump up and down ranting and raving like a lunatic!

However, if we dig further and find that, of the 95,000 returns, as many as 94,000 are on one single product, and it just happens to be a very profitable line (excluding returns), then we can start to understand why the Production manager is constantly screaming and about to have a cardiac arrest, as this represents a 94% deficiency rate on that product.
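
A quick back-of-the-envelope check makes the gap obvious. This is just a minimal sketch using the figures from the example above:

```python
# Figures from the example: 100 products, 100,000 of each sold, 95,000 returned.
total_sold = 100 * 100_000        # 10,000,000 items sold in the year
total_returned = 95_000           # all returns for the year

# The average looks comfortably inside the 2% policy...
print(f"Overall return rate: {total_returned / total_sold:.2%}")    # 0.95%

# ...but 94,000 of those returns sit on a single product line.
one_product_returns = 94_000
print(f"Problem product:     {one_product_returns / 100_000:.2%}")  # 94.00%
```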

Here is another example, with a report (please pardon the haziness due to resizing the image to fit the page) showing the outliers, both good and bad.

Here we can clearly see that not all sales reps are pulling their weight equally in terms of the dollar value of products sold. However, this also shows that a single aggregate measure, in and of itself, should not lead to knee-jerk reactions. While this report indicates turnover, the individual sales reps' profitability may look completely different, and if we combine this with returns / rejects, and possibly customer service scores, the skewing might not look nearly as bad as initially perceived.

Thus we conclude that averages by themselves are a dangerous measure, and to get a better overall picture we need multiple measures. KPIs should not be restricted to a single measure: an indicator can be a composite algorithm that combines multiple measures into a single indicator value that immediately tells its audience whether the picture is good, bad or neutral.
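
As a rough sketch of what such a composite indicator might look like (the choice of measures, weights and thresholds below is purely an illustrative assumption, not a recommendation):

```python
# Hedged sketch: combine several normalised measures into one indicator value.
# Measures, weights and thresholds are invented for illustration only.
def composite_kpi(turnover_vs_target, return_rate, service_score):
    """Return a single 0-100 score from three individual measures.

    turnover_vs_target: actual turnover / target turnover (1.0 = on target)
    return_rate:        returned items / items sold (0.0 - 1.0)
    service_score:      customer service score (0.0 - 1.0)
    """
    weights = {"turnover": 0.5, "returns": 0.3, "service": 0.2}
    score = (
        weights["turnover"] * min(turnover_vs_target, 1.5) / 1.5
        + weights["returns"] * (1.0 - min(return_rate / 0.02, 1.0))  # 2% policy cap
        + weights["service"] * service_score
    )
    return round(100 * score, 1)

# Good turnover no longer hides terrible returns behind an average:
print(composite_kpi(turnover_vs_target=1.1, return_rate=0.94, service_score=0.8))   # ~52.7
print(composite_kpi(turnover_vs_target=1.1, return_rate=0.005, service_score=0.8))  # ~75.2
```

The exact algorithm matters far less than the principle: the single number its audience sees is built from several measures, not one average.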

Have an awesome week!


03 August 2009

The Right sequence of questions for business

We were taught in school that there are 6 main question words: how, what, who, why, where & when. Of course, as you get some business experience under your belt, you realize there are many ways to ask questions, but in essence it does come back to these 6 basics.

As a former DBA (Database Administrator) in the late 90s, I used to focus a lot on providing answers to the "how" and the "what" questions. On one such occasion, working in Dallas, TX, I was asked to create a replicated 1 TB Oracle database (I exaggerate; at the time it was just shy of 800 GB) with nightly refreshes on a standby instance, and I found out the best way to do this (and it was not using Oracle 8 replication). To cut a long story short, in a little under a week (after all the hardware was in place), I had a system going that used Quest Shareplex to replicate transactions from our live production database to the standby database, using log-based replication. Neat! Time to don the cape and assume the role of hero-of-the-moment.

After this was set up, I one day asked the CIO what purpose this standby instance served, because I saw lengthy queries running on it almost daily. I was told that it was used for operational & financial reports that took about 3 - 4 hours to run (3 million customers, 1.5 billion transactions) and could not be run on the OLTP instance, as they would drastically affect performance for the 3,000 or so online users. So my next questions (as a curious DBA) of course were:
  • "Which reports?"
  • "What type of reports?"
  • "Run how often?"
  • "Via what interface?"
  • "Delivering information how?"
  • "How much data scanned / delivered?"

Upon some deeper and further investigation by the ever-more curious tech-head I was (using tkprof, PMON and SQL tracing), I found a 16-way join on a query (not so bad if you consider some of the queries from hell generated by Siebel) that involved 4 of our largest tables, 2 of them joined by correlated subqueries using suboptimal index range scans, simply to check for records that did or did not exist in a particular entity (the details are now fuzzy). Day-old data was perfectly acceptable.

When I reverse-engineered the logic (from undocumented code and 8-level-deep nested DECODE expressions; you HAVE to love that!) and verified it with the developers, I found that if I ran a process every day at 5 p.m. via cron to simultaneously create 3 temporary datasets from the master tables, then indexed these datasets properly, this piece (creating the temporary datasets) would take about 90 seconds to run and use about 10 MB of storage. If we rewrote the main exception reports to run against these, the longest report would run in about 20 seconds, while most of them completed in under 10 seconds. Quite a lot better than the 3+ (and growing) hours it was taking at the time. Considering these queries ran once per day, the need for the standby instance vanished.
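
Stripped of the Oracle specifics, the pattern is simply: once a day, build small, well-indexed summary datasets and point the exception reports at those instead of the master tables. Here is a minimal sketch of that idea in Python with SQLite; all table and column names are hypothetical, and the real system used Oracle SQL with the refresh scheduled via cron:

```python
import sqlite3

# Hedged sketch of the "pre-build small summary datasets nightly, index them,
# report off them" pattern. Assumes a hypothetical 'transactions' table with
# customer_id, amount and txn_date columns; names are illustrative only.
def nightly_refresh(conn: sqlite3.Connection) -> None:
    """Rebuild the summary table; in production this would run once a day (e.g. 5 p.m.)."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS txn_summary")
    cur.execute("""
        CREATE TABLE txn_summary AS
        SELECT customer_id,
               COUNT(*)    AS txn_count,
               SUM(amount) AS txn_total
        FROM transactions
        WHERE txn_date >= date('now', '-1 day')
        GROUP BY customer_id
    """)
    # Index the small summary so the exception reports can seek instead of scan.
    cur.execute("CREATE INDEX idx_summary_cust ON txn_summary (customer_id)")
    conn.commit()

def exception_report(conn: sqlite3.Connection):
    # The report runs against the small, indexed summary, not the master tables.
    return conn.execute(
        "SELECT customer_id, txn_total FROM txn_summary WHERE txn_total > 10000"
    ).fetchall()
```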

Now the next question was another "how" question, as in "how do I tell my boss, the CIO, that the reporting need he had just bought 100 brand new 18 GB EMC Symmetrix disks (then: state of the art, today: state of the ark) to address could have been met with about 2 weeks' worth of investigation and by rewriting the queries so that they ran efficiently in the same instance, in a separate self-refreshing schema?". I think I have subconsciously blotted that day from memory, but I do remember a lot of tension and things being thrown around to the beat of expletives. From that day on, I was included in the design phase of every major-impact solution, as a solutions architect.

The right sequence of questions (by whomever) in this case would have been: "Why is this database needed? What is it going to be used for? By whom? How often?". By asking those questions, we could possibly have designed a different solution and saved the company a minimum of 7 figures in cost. This simple education probably cost more than an entire PhD in astrophysics at the best universities in the world.

As the saying goes "As long as we learn from our mistakes ...". Let me add "... and someone else is picking up the tab for it ..."

Often in the world of BI, we jump right into the virtues of a given toolset without really understanding why we are doing this, and "because the boss wants it this particular way" is a pathetic justification. One reason for this is contracts for toolsets negotiated before the problem is clearly stated; tsk, tsk! We then often end up with a solution that is short-sighted and that the end users of the system will simply "have to endure since it is now built". Hello, shelfware!

In order to streamline costs and harness the collective wisdom of the people on any given project, a top-down approach that allows bidirectional, open questioning will result not only in a better solution being built, but also in knowledge being shared and costs being reduced. I realize that this is often easier said than done.

I have seen many times how technical people reach for the vanilla solution of "let's build a cube" to answer any challenging question from the business. They then "throw it across the fence to see how they like it", to startled amazement / disappointment when the users complain that, while this is nice, it does not actually solve their problems. Often the additional computing / resource overhead of populating the cube(s) has now made the problem worse.

It is important to remember that the first word in BI is "Business". Therefore, it is imperative that the questions asked right at the outset have nothing to do with technology and everything to do with the business. This can only happen when there is a good understanding of the drivers behind the business and of what is important. The questions facilitating this could be:
  • What?
  • Why?
  • To whom?
  • How Often?
  • Via what mechanism?
  • What will be done with the information? (This is often the one that will decide the detailed scope and next steps)
  • What latency?
These questions cannot be outsourced to IT (even more so if IT is offshore, in a different social and economic culture from the organization's users and customers); the business MUST be intimately involved if a successful solution is sought. On another note, having an intuitive toolset that does not require a 2 - 4 day class to learn (especially for end users) is imperative. For end users, knowing how a solution fits into the business is (or should be) a higher priority than flashy bells, whistles and gadgets.

Getting off my soapbox now ....

Have an intelligent week!