Results 1 to 2 of 2
  1. #1
    Join Date
    Oct 2003

    Tip: Think in "sets," not "procedures"

    When you are working with an SQL-type database, it is very important to approach a problem in terms of sets of records, manipulated by queries; not procedural (as in "stored procedures") code.

    An SQL database is designed around the concept of "sets of records" being returned by (or updated or deleted or inserted by) a query, "all at once." It has all sorts of clever optimizations that it can use (and which it will EXPLAIN to you if you like) to handle even very-large sets of records quickly. Furthermore, the database can adapt to changing conditions in its quest to obtain the answers quickly every time.

    But what can it do if you write a stored-procedure that starts like this:
    SELECT .. FROM bletch WHERE ...
    SCAN the recordset...
      IF Exists(select foo from bar...)
    What can it do? Absolutely squat, except to slog through that recordset one record at a time, executing a new query for every single record... No wonder it's dog-slow!

    But you can re-write that procedure:
    SELECT .. FROM bletch INNER JOIN bar ...
    Scan the recordset...
      No "IF" is required here!
    Now the procedure doesn't examine each record procedurally to see if it matches (especially not by repetitively executing another query!); it has structured the query that it issued so that only (or mostly) qualifying records are presented to it in the first place. That allows the query optimizer to do its job!

    Also don't overlook the notion of temporary tables as they might be implemented in your particular DBMS. A problem might be solved by executing one query, storing it result in a temporary table, executing another query, and manipulating these sets of records in subsequent steps within your procedure. Any DBMS worth its salt can "prepare a set of 10,000 records" much faster and more efficiently than it can "slog through" them in a loop. The stored-procedure languages are designed to be flexible; they are not particularly efficient.

    In our example above, if table foo contains 10,000 records and bletch contains 5,000 records ... a query is being written, prepared and executed 10,000 times and it's being exposed to (potentially) 5,000 records each and every time. Double the number of records in foo and it gets worse; double the number of records in bletch and it (maybe) gets a whole lot worse.

    But given the opportunity, the query optimizer might simply whip up a 10,000-element vector of numbers in memory, then whip up a 5,000-element vector, logic-AND them together, and voila! But in order for it to be able to do it that way, you have to write it that way!

    Now, obviously, you can write an SQL query "the wrong way" too. You can force the computer to generate every possible combination of {5,000 * 10,000 = 50 million} rows only to throw away 49.995 million of them ... but that's what EXPLAIN is for. (This tells you how the DBMS plans to execute your query, so that you can catch in advance if it's about to do something really stupid, and rewrite your query accordingly.)

    If you find unacceptable performance from any modern day DBMS, you can be almost certain that a major source of the problem is your algorithm. You can be sure that no amount of "futzing with the database," optimizing this or tweaking that, is going to have any impact that even comes close to the mantra that Kernighan and Plauger stated in The Elements of Programming Style: don't diddle with the code to make it faster; find a better algorithm!

    This principle applies equally to any SQL-based or relational database you can name; old or new, big or small.
    ChimneySweep(R): fast, automatic
    table repair at a click of the

  2. #2
    Join Date
    Dec 2003
    Tokyo Japan
    i for one believe that one should check the design FIRST, before the code...

    no matter how you change your algorithm for this, or for that... if the design isn't really that good, then everything would be for naught...


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts