  1. #1 (Jun 2004, Kuala Lumpur, 4 posts)

    Question: Efficient table alteration (RDBMS implementation)

    Many of the large, advanced DBMSs (Oracle, Informix, etc.) let the DBA use ALTER TABLE to add, remove, or modify columns without rewriting the existing data in the table.
    In older, more primitive databases we often don't have that functionality, and the data can be unavailable for a long time while we do the following (sketched in generic SQL below):
    (i) create a new table with the new structure
    (ii) copy all data from the old table into the new table
    (iii) re-create all indexes, constraints, permissions, etc. on the new table
    (iv) drop the old table
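    In generic SQL, that slow path looks roughly like this (table, column, and index names are made up for illustration):

        -- (i) new structure, with the extra column
        CREATE TABLE customer_new (
            customer_id  INTEGER NOT NULL PRIMARY KEY,
            name         VARCHAR(60) NOT NULL,
            loyalty_tier CHAR(1)                 -- the added column
        );

        -- (ii) copy every row; this is the long-running part
        INSERT INTO customer_new (customer_id, name)
        SELECT customer_id, name FROM customer;

        -- (iii) re-create indexes, constraints, permissions
        CREATE INDEX customer_new_name ON customer_new (name);
        GRANT SELECT ON customer_new TO PUBLIC;

        -- (iv) drop the old table, then rename customer_new to take its
        -- place (renaming syntax varies by engine)
        DROP TABLE customer;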

    Are there any more efficient strategies that can be implemented in a simple 'old-fashioned' RDBMS in order to allow table alteration (mainly adding new columns) with less down-time?
    Or any suggestions for working around the limitation in the RDBMS?

  2. #2 (Feb 2004, In front of the computer, 15,579 posts)
    A lot depends on which database engine you are using. Many of them implement workarounds (some rather Machiavellian) for this specific problem. If the engine supports a workaround, that would be my first choice most of the time.

    As long as you can tolerate missing (NULL) data, you can add another table keyed on the same PK that holds the new columns, then join the two. This is easy and involves essentially zero downtime.
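    A minimal sketch of that extension-table idea, assuming your engine supports views (all names here are invented for illustration):

        -- New columns live in a side table sharing the PK
        CREATE TABLE customer_ext (
            customer_id  INTEGER NOT NULL PRIMARY KEY,  -- same PK as customer
            loyalty_tier CHAR(1)                        -- the "new" column
        );

        -- A view hides the split; rows with no extension data show NULL
        CREATE VIEW customer_v AS
        SELECT c.customer_id, c.name, e.loyalty_tier
        FROM customer c
        LEFT OUTER JOIN customer_ext e ON e.customer_id = c.customer_id;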

    If the data (table/file/whatever) you are using is non-transactional, you can build the new copy while the users are still accessing the old one. When you are satisfied, you can rename the two so that the change appears to take place in a heartbeat.
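    The swap at the end of such a rebuild can be a single, near-instant rename (syntax varies by engine; RENAME TABLE is MySQL-style, others spell it ALTER TABLE ... RENAME TO):

        -- Users query "customer" while "customer_new" is built and indexed,
        -- then the names are exchanged in one quick operation:
        RENAME TABLE customer     TO customer_old,
                     customer_new TO customer;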

    Going back to my original statement, an awful lot depends on your circumstances.

    -PatP

  3. #3 (Jun 2004, Kuala Lumpur, 4 posts)
    Thanks very much for those suggestions.

    It's our own database engine (one file per table, no transactions), and it doesn't (yet) have any clever/Machiavellian workarounds implemented (any ideas?), so our current method is to create a new table (as described earlier) and copy all the data into it.

    There are things we can do to optimise that (e.g. make sure the old and new tables are on different disks), and there are things we can do to reduce the time the data is unavailable (or the downtime for 'writer' processes). That helps to minimise the pain, but we're still left with the problem that the whole process takes a long time: a long SELECT query, then a long time to build the required indexes.

    Creating an additional table (an extension to the original) isn't a good solution for us in most cases, as all the extra tables become a nightmare for DBAs and application programmers.

    So we're currently thinking about whether we can improve our database engine in the following two ways:

    (a) improve the speed of SELECT from one large table into another, especially in the special case where the second table is identical to the first apart from extra columns.
    However, I don't like the idea of modifying SELECT to handle such special cases; the regression testing would be huge.

    (b) improve the speed of index building (we've already established that it's much slower to build the indexes before populating the table).
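    For reference, the ordering that wins for us, in generic SQL (illustrative names; presumably the gain comes from sorting the finished data once and bulk-building each index, rather than doing one B-tree insertion per index per row during the load):

        -- load first...
        INSERT INTO customer_new (customer_id, name)
        SELECT customer_id, name FROM customer;

        -- ...then build each index in one pass over the populated table
        CREATE UNIQUE INDEX customer_new_pk   ON customer_new (customer_id);
        CREATE INDEX        customer_new_name ON customer_new (name);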

    Maybe we also need to think about whether we can be cleverer and handle new columns in a similar way to the 'big' databases, but that might be too much development effort and too much change to our data structures, so we might be stuck with mediocre performance in this area...

  4. #4 (Feb 2004, In front of the computer, 15,579 posts)
    A couple of observations, for what they may be worth.

    If your table is transactional, and those transactions are NOT logged, there isn't really any way to buffer ongoing activity to permit changes "on the fly" as far as I know. It doesn't sound like you are doing logging, so the table needs to be either static (read-only) or offline in order to make a schema change.

    There have been a couple of companies exploring new storage technologies that have gotten me really excited in the last few years, but none of them have brought any product to market. You might be able to buy/borrow/steal some of their ideas for use in your engine.

    Several of them were pursuing declarative database technologies (more closely mirroring relational algebra than SQL does), and those products would have offered the features you want almost as side effects of their design. One of the companies even planned to offer a SQL-92 compliant interface just because it would be easy to do and would provide a nice, clean migration path.

    A couple of the companies were pursuing ORB-like data managers that would have treated SQL as a special case. It isn't hard to imagine the modifications needed to support the kind of functionality that you want.

    Getting back to your existing engine design: yes, indexing is often the most "expensive" part of a database. Indexing usually takes more time/space/etc. than everything else in the database combined, especially if it is effective indexing.

    Rather than modifying the SELECT operation to improve performance, you might consider a PROJECT operation. The projection has most of the features of a selection, but it is optimized for disk-to-disk transfers, and you can code specifically to allow/restrict certain options to improve the performance. As you guessed, this is much safer than conditionally modifying the SELECT code.
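    As a rough illustration only (the syntax below is entirely invented; no engine spells it this way), a dedicated projection command might look like:

        -- Hypothetical command: copy a table into a wider layout in one
        -- sequential disk-to-disk pass, bypassing the SELECT machinery
        PROJECT customer
            TO customer_new
            ADD COLUMN loyalty_tier CHAR(1) DEFAULT NULL  -- new column, all NULLs
            REBUILD INDEXES AFTER LOAD;                   -- defer index builds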

    Anyhow, good luck!

    -PatP
