| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

09-05-09, 11:47
|
|
Registered User
|
|
Join Date: Aug 2008
Posts: 25
|
|
|
When are joins slow
|
|
I am aware of the fact that normally its better not to worry about performance until one really has a performance problem. So, worry about the db design first.
But just in general, can there be said something about when joins become expensive/slow? The first obvious reason might be:
lots of rows. Not sure about how many, but obviously querying 1 billion rows is slower then 1k rows.
Are there other reasons? Certain types of joins? certain types of keys? Certain indexes?
|
|

09-05-09, 13:38
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
one example: when applying a function to a column, the optimizer cannot use an index on the column
suppose you have a table, let's say it's for blog entries or something, with a DATETIME column to specify the date and time of the posting
now say you want all postings for August 13th
if you do this --
Code:
WHERE DATE(postdatetime) = '2009-08-13'
(here DATE() is a mysql function, in other database systems you have similar stuff) then you'll get an index scan, or, at worst, a table scan
best practice is to use a range test with an open upper end, i.e.
Code:
WHERE postdatetime >= '2009-08-13'
AND postdatetime < '2009-08-14'
|
|

09-05-09, 15:47
|
|
Registered User
|
|
Join Date: Dec 2007
Location: London, UK
Posts: 732
|
|
|
|
By "join" you mean a logical operation, which doesn't tell us anything about performance. Join performance will be determined by the physical operations the DBMS carries out to execute that logical join. So it's not wise to make blanket assumptions about the performance characteristics of a join.
For example the join may be equivalent to a scan or seek in a single index or table if the query gets rewritten by the optimiser. A join may even be a no-op. Or in a column-store system the physical operations to execute a join may be identical or very similar to a query against single table. To be sure you have to look at the query plan used in your DBMS or be knowledgeable enough about that system to know what the physical operations will be.
That is why the idea of "denormalization for performance" is bogus and no sound basis for database design (I'm guessing that might be what lies behind your question). Normalization implies nothing about performance because performance is always a function of the physical implementation and the actual physical operations being done. Normalization tells us nothing about the physical implementation and therefore has nothing to do with performance. One might make assumptions based on knowledge of the DBMS you are using but there are no universal rules.
|
|

09-05-09, 21:36
|
|
Registered User
|
|
Join Date: Jul 2009
Location: Michigan
Posts: 125
|
|
Joining on text columns is slow, particularly large text columns.
Joining on columns of different types is slow, say one column is an integer and the other is a varchar that holds an integer.
Joining on a ton of columns is usually slow, even if you have a matching index.
|
|

09-06-09, 02:49
|
|
Registered User
|
|
Join Date: Aug 2008
Posts: 25
|
|
Thanks for the replies, interesting stuff!
@r937: I'll remember that. About the indexes: if you join on primary key, there's always an index used, isn't it? Or except when a function is used in one of the columns for the primary key?
@dportas: I understand I have to dig in deeper into the inner workings of the db engine to really understand this. But until then I'm just curious about some general rules.
@markatrombley: with "joining on a ton of columns", what do you mean exactly? Joins are done on only one column, isn't it?
Does it also matter how large the tables are compared to each other? Like when one table is very big, and another table on which you join is very small? Compared to when both tables are big?
|
|

09-06-09, 04:20
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
primary keys are indexed, yes, but joining on a primary key may or may not utilize that index, it depends on the query
joins can be done on multiple columns, for example...
Code:
SELECT ...
FROM statistics
INNER
JOIN citynames
ON citynames.statecode = statistics.statecode
AND citynames.citycode = statistics.citycode
|
|

09-06-09, 09:12
|
|
Registered User
|
|
Join Date: Jul 2009
Location: Michigan
Posts: 125
|
|
Just as he says, usually I do joins on an indexed primary key, but your join may involve part number plus who's part number it is plus what type of part number it is (development, production, obsolete) plus any number of other fields.
When you are joining on a huge (millions of rows) table by index you might see some performance issues. If joining by something other than index you definately will. When a table is less than 100 rows an index won't make a search any faster than having an index. The relative difference in size between tables in a join makes no difference, the size of each of the tables taken separately does.
|
|

09-06-09, 12:29
|
|
Registered User
|
|
Join Date: Aug 2008
Posts: 25
|
|
@r937: ok, I'll look into the joins on multiple columns. Haven't used that so far, but that could happen
@MarkATrombley: ok, thanks, good to know
|
|

12-22-11, 16:04
|
|
Registered User
|
|
Join Date: Dec 2002
Posts: 5
|
|
|
When are joins slow
I read all the post to find answers to my problem, if you can help me a really appreciate:
I have 2 small table: "Customer" (3 Column) id(integer)+code+description(varchar) a few thousand of rows only on this table.
And "Invoice" ( less then 10 columns) id, number (integer), Reference(varchar), Customer_Id, Amount...Date... almost 1 Million row.
If I need to search on each table separately I have the result on 0 second instantly. I have PK clustered on id, indexes on columns etc...
But if I JOIN on Customer_Id (clustered id on first table and indexed on second)
and the Where condition only on 3 columns (Number, Reference, Description) the result is slow 3 to 5 sec. (WHERE i.Number LIKE '%820701%' OR i.Reference LIKE '%820701%' OR cc.Description LIKE '%820701%') the search word can be a number, a Reference or a Customer Description we don't know.
This join I used is with TOP 100 and order by Number Desc (on a search screen on a main application) and MsSql engine standard 2008.
Why the join is so slow ? Thanks for any suggestion.
|
|

12-23-11, 03:21
|
|
Registered User
|
|
Join Date: Nov 2003
Posts: 2,407
|
|
Quote:
Originally Posted by r937
one example: when applying a function to a column, the optimizer cannot use an index on the column
|
Unless you create an index on the expression used in the query.
|
|

12-23-11, 03:30
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
Quote:
Originally Posted by shammat
Unless you create an index on the expression used in the query.
|
which you cannot do in every database system, but yeah, "smoke if you got 'em"
|
|

12-23-11, 04:00
|
|
Registered User
|
|
Join Date: Nov 2003
Posts: 2,407
|
|
Quote:
Originally Posted by r937
which you cannot do in every database system
|
Wow, are there still DBMS today which cannot index expressions? (I know that Firebird, PostgreSQL, Oracle, DB2, Informix, SQL Server and Teradata can..)
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|