Results 1 to 12 of 12
  1. #1
    Join Date
    Aug 2008
    Posts
    25

    When are joins slow

    I am aware of the fact that normally its better not to worry about performance until one really has a performance problem. So, worry about the db design first.

    But just in general, can there be said something about when joins become expensive/slow? The first obvious reason might be:
    lots of rows. Not sure about how many, but obviously querying 1 billion rows is slower then 1k rows.

    Are there other reasons? Certain types of joins? certain types of keys? Certain indexes?

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    one example: when applying a function to a column, the optimizer cannot use an index on the column

    suppose you have a table, let's say it's for blog entries or something, with a DATETIME column to specify the date and time of the posting

    now say you want all postings for August 13th

    if you do this --
    Code:
    WHERE DATE(postdatetime) = '2009-08-13'
    (here DATE() is a mysql function, in other database systems you have similar stuff) then you'll get an index scan, or, at worst, a table scan

    best practice is to use a range test with an open upper end, i.e.
    Code:
    WHERE postdatetime >= '2009-08-13'
      AND postdatetime  < '2009-08-14'
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    By "join" you mean a logical operation, which doesn't tell us anything about performance. Join performance will be determined by the physical operations the DBMS carries out to execute that logical join. So it's not wise to make blanket assumptions about the performance characteristics of a join.

    For example the join may be equivalent to a scan or seek in a single index or table if the query gets rewritten by the optimiser. A join may even be a no-op. Or in a column-store system the physical operations to execute a join may be identical or very similar to a query against single table. To be sure you have to look at the query plan used in your DBMS or be knowledgeable enough about that system to know what the physical operations will be.

    That is why the idea of "denormalization for performance" is bogus and no sound basis for database design (I'm guessing that might be what lies behind your question). Normalization implies nothing about performance because performance is always a function of the physical implementation and the actual physical operations being done. Normalization tells us nothing about the physical implementation and therefore has nothing to do with performance. One might make assumptions based on knowledge of the DBMS you are using but there are no universal rules.

  4. #4
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    Joining on text columns is slow, particularly large text columns.

    Joining on columns of different types is slow, say one column is an integer and the other is a varchar that holds an integer.

    Joining on a ton of columns is usually slow, even if you have a matching index.

  5. #5
    Join Date
    Aug 2008
    Posts
    25
    Thanks for the replies, interesting stuff!

    @r937: I'll remember that. About the indexes: if you join on primary key, there's always an index used, isn't it? Or except when a function is used in one of the columns for the primary key?

    @dportas: I understand I have to dig in deeper into the inner workings of the db engine to really understand this. But until then I'm just curious about some general rules.

    @markatrombley: with "joining on a ton of columns", what do you mean exactly? Joins are done on only one column, isn't it?

    Does it also matter how large the tables are compared to each other? Like when one table is very big, and another table on which you join is very small? Compared to when both tables are big?

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    primary keys are indexed, yes, but joining on a primary key may or may not utilize that index, it depends on the query

    joins can be done on multiple columns, for example...
    Code:
    SELECT ...
      FROM statistics
    INNER
      JOIN citynames
        ON citynames.statecode = statistics.statecode
       AND citynames.citycode = statistics.citycode
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    Just as he says, usually I do joins on an indexed primary key, but your join may involve part number plus who's part number it is plus what type of part number it is (development, production, obsolete) plus any number of other fields.

    When you are joining on a huge (millions of rows) table by index you might see some performance issues. If joining by something other than index you definately will. When a table is less than 100 rows an index won't make a search any faster than having an index. The relative difference in size between tables in a join makes no difference, the size of each of the tables taken separately does.

  8. #8
    Join Date
    Aug 2008
    Posts
    25
    @r937: ok, I'll look into the joins on multiple columns. Haven't used that so far, but that could happen

    @MarkATrombley: ok, thanks, good to know

  9. #9
    Join Date
    Dec 2002
    Posts
    5

    When are joins slow

    I read all the post to find answers to my problem, if you can help me a really appreciate:
    I have 2 small table: "Customer" (3 Column) id(integer)+code+description(varchar) a few thousand of rows only on this table.
    And "Invoice" ( less then 10 columns) id, number (integer), Reference(varchar), Customer_Id, Amount...Date... almost 1 Million row.
    If I need to search on each table separately I have the result on 0 second instantly. I have PK clustered on id, indexes on columns etc...

    But if I JOIN on Customer_Id (clustered id on first table and indexed on second)
    and the Where condition only on 3 columns (Number, Reference, Description) the result is slow 3 to 5 sec. (WHERE i.Number LIKE '%820701%' OR i.Reference LIKE '%820701%' OR cc.Description LIKE '%820701%') the search word can be a number, a Reference or a Customer Description we don't know.

    This join I used is with TOP 100 and order by Number Desc (on a search screen on a main application) and MsSql engine standard 2008.
    Why the join is so slow ? Thanks for any suggestion.

  10. #10
    Join Date
    Nov 2003
    Posts
    2,933
    Quote Originally Posted by r937 View Post
    one example: when applying a function to a column, the optimizer cannot use an index on the column
    Unless you create an index on the expression used in the query.

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by shammat View Post
    Unless you create an index on the expression used in the query.
    which you cannot do in every database system, but yeah, "smoke if you got 'em"
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Nov 2003
    Posts
    2,933
    Quote Originally Posted by r937 View Post
    which you cannot do in every database system
    Wow, are there still DBMS today which cannot index expressions? (I know that Firebird, PostgreSQL, Oracle, DB2, Informix, SQL Server and Teradata can..)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •