As a former DBA (Database Administrator) in the late 90's, I used to focus a lot on providing answers to the "how" and the "what" questions. On one such occasion, working in Dallas, TX, I was asked to create a replicated 1 TB Oracle database (I exaggerate, at the time it was just shy of 800 GB) with nightly refreshes on a standby instance, I found out the best way to do this (and it was not using Oracle 8 replication). To cut a long story short, in a little under a week (after all the hardware was in place), I had a system going that used Quest Shareplex to replicate the transactions from our live production database to the standby database, using log-based replication. Neat! Time to don the cape and assume the role of hero-of-the-moment.
After this was set up, I one day asked the CIO what purposes this standby instance served, because I saw some lengthy queries running on it, almost daily. I was then told that this was for operational & financial reports that took about 3 - 4 hours to run (3 million customers, 1.5 billion transactions), which could not be run in the OLTP instance as it would drastically affect performance for 3,000 or so online users. So my next questions (as a curious DBA) of course were:
- "Which reports?"
- "What type of reports?"
- "Run how often?"
- "Via what interface?"
- "Delivering information how?"
- "How much data scanned / delivered?"
Upon some deeper and further investigation by the ever-more curious tech-head I was (using tkprof , PMON and sql tracing), I found a 16 way join on a query (not so bad if you consider some of the queries from hell generated by Siebel) that involved 4 of our largest tables, 2 of these joined by correlated subqueries using suboptimmal index range scans, simply to check for records that did or did not exist in a particular entity (the details are now fuzzy). Day old data was perfectly acceptable.
When I reverse-constructed the logic (from undocumented code and 8 level deep nested DECODE expressions, you HAVE to love that!) and verified it with the developers, I found that if I ran a process every day at 5 p.m. using cron to simultaneously create 3 temporary datasets from the master tables, then indexed these datasets properly, this piece (creating the temporary data sets) would take about 90 seconds to run, using about 10 MB of storage. If we rewrote the main exception reports to run against these, the longest report would run in about 20 seconds, while most of them completed in under 10 seconds. Quite a lot better than the 3+ (and growing) hours it was currently using. At this rate, considering these queries ran once per day, the need for the standby instance vanished.
Now the next question was another "how" question, as in "how do I tell my boss, the CIO, that the 100 brand new 18 GB EMC Symmetrix (then: state of the art, today: state of the ark) disks he has bought to facilitate this reporting need, could have been solved with about 2 weeks' worth of investigation and rewriting the queries so that they ran smart and in the same instance in a separate schema that self-refreshed?". I think I have subconsciously blotted that day from memory but I do remember a lot of tension and things being thrown around to the beat of expletives. As of that day, I was included on the design phase of every major impact solution that was to be created, as a solutions architect.
The right sequence of questions (by whomever) in this case would have been to ask "Why is this database needed? What is it going to be used for? By whom? How often?". By asking those questions, we possibly could have designed a different solution and saved the company a minimum 7 figures in cost. This simple education was probably more expensive than an entire Phd in astro-physics would have cost at the best universities in the world.
As the saying goes "As long as we learn from our mistakes ...". Let me add "... and someone else is picking up the tab for it ..."
Often in the world of BI, we jump right into the virtues of a given toolset without really understanding why we are doing this and "because the boss wants it this particular way" is a pathetic justification. One reason for this is pre-negotiated contracts for toolsets before the problem is clearly stated, tsk, tsk!. We then often land up with a solution that is short-sighted and that the end users of the system will simply "have to endure since it is now built". Hello shelfware!
In order to streamline the costs and harness the collective wisdom of the people on any given project, having a top-down approach that allows bidirectional open questioning will not only result in a better solution being built, but also in the sharing of knowledge and the reduction of cost. I realize that this is often easier said than done.
I have seen many times how technical people have the vanilla solution of "let's build a cube" to answer any of the challenging questions from business. They then "throw it across the fence, to see how they like it" to startled amazement / disappointment when the users complain that while this is nice, it does not actually solve their problems. Often the additional computing / resource overhead of populating the cube/s has now made the problem worse.
It is relatively important to consider that the first word in BI is "Business". Therefore, it is imperative that the questions that are asked right at the outset have nothing to do with technology and everything to do with the business. This can only happen when there is a good understanding of the drivers behind the business and what is important. The questions facilitating this could be:
- What?
- Why?
- To whom?
- How Often?
- Via what mechanism?
- What will be done with the information? (This is often the one that will decide the detailed scope and next steps)
- What latency?
Getting off my soapbox now ....
Have an intelligent week!
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.