Unconventional Emergency Oracle Support

Donald K. Burleson's picture
articles: 

Being an emergency support DBA is a job with huge stress-levels and it always provides a giant adrenaline rush for even the most experienced DBA. Senior management is in a panic, and high-level executives are continuously asking for a status update. Most of these databases are brand-new to me, and I have only a few minutes to access the situation and devise a plan to quickly relieve their bottleneck.

Unlike academic and scientific Oracle DBAs who demand proofs and reproducible results before making an Oracle change, the emergency Oracle DBA has no such luxury. The emergency DBA must use every weapon at their disposal to get the client running as quickly as possible. These unconventional methods are almost always driven by the client who does not appreciate the benefits of an elegant, long-term fix for the root cause of the problem.

Also, many DBAs have a common misconception about the fundamental nature of Oracle tuning. There are many who don't believe that any Oracle 'silver bullets' exist and these are usually the DBAs who have never experienced the epiphany of issuing a single command that improved the performance of a whole database. However, I'm here to assure you that silver-bullets do exist, and that with a little insight and experience you can issue a small set of commands that improves the performance of an entire database. These silver bullets are broad-brush techniques that can be used to quickly speed-up a crippled Oracle database. Some that I see most often include:

  • Change the SQL optimizer - Change the optimizer_mode, optimizer_index_cost_adj and optimizer_index_caching parameters.

  • Create materialized views - A great silver-bullet for data warehouses

  • Add missing indexes - Many shops get unnecessary long-table full-table scans because of missing indexes, especially function-based indexes.

  • Adjust SGA RAM resources - Often increasing the db_cache_size, sort_area_size, or assigning tables with small-table full-table scans to the KEEP pool will result in system-wide performance improvements.

Clients are impatient and they often insist on symptom-treating stop-gap remedies that are neither elegant nor comprehensive. In many cases, the client does not want to hear the time-consuming tasks that are required to address the root cause of the problem.

Try as I may, the client is the boss, and there is nothing that I can do to force them to spend the time and money to undertake the 'proper' solution to a complex tuning issue. Here are some examples of brute-force (yet highly effective) techniques that have allowed crippled systems to limp-along until the root cause of the performance problem can be addressed:

Throwing hardware at a poor database design

On dozens of occasions every year I must explain to a client that their database was designed and implemented by a beginner and it could take months to re-write.

When their mission-critical Oracle database is down, the clients business is usually at stake, and I must employ novel approaches to get them going fast. Often, a fast hardware replacement can help:

  • Using RAM-SAM (solid-state disks) instead of tuning the SQL - For I/O-bound databases this can also relieve transient I/O problems, and is especially effective for redo, redo and temporary tablespace files.

  • Moving to a system with faster (or more) CPUs - For database with a CPU-bottleneck, more processing power provides a fast fix to a complex performance problem.

I recently had a client whose database was one of the worst designs and SQL coding that I have ever seen. I explained to the client that a re-design would take 4 months and cost over $270k, but I could throw hardware at the problem within 24 hours. For a total cost of $85k I moved the whole database onto solid-state disks and moved the instance to a super-fast 8-way Itanium2 64-bit processor array. The database was just as inefficient as before, but it executed queries five times faster!

Using rule-based SQL optimization

Whenever I encounter an emergency where the performance problem is related to a complicated CBO issue, I always consider using the rule-based optimizer (RBO) as a stop-gap measure. Often as not, changing optimizer_mode=rule will improve SQL performance, buying me the time required to analyze the root cause of the CBO problem. On more than one client, they were satisfied with the performance of the RBO and refused to pay for a 'real' solution to the issue. Personally, I don't mind because I know that they will be back to see me when they migrate to Oracle10g, when the RBO disappears.

Conclusion

These are just a few of the non-traditional tools that an emergency DBA uses when faced with a crisis. By the way, this issue will become even more pronounced for Oracle10g clients who start with the automated management features (ASM, AMM, etc.) because they may eventually outgrow the broad-brush automation capabilities and require the full flexibility of the Oracle10g database.

The major differences between traditional Oracle support and emergency support are dramatic:

  • Forget traditional tuning practices - In an emergency there is no time to comprehensively analyze the database and develop a detailed tuning plan.

  • Get fast results - The time-honored (and time-consuming) Oracle tuning techniques often do not apply to an emergency situation.

  • Money talks - In an emergency the client always wants fast, cheap solutions. I have no problem with a cheaper, sub-optimal fast-fix to an acute performance problem.