I'm aware that normally it's better not to worry about performance until you actually have a performance problem, and to worry about the db design first.
But just in general, can anything be said about when joins become expensive/slow? The first obvious reason might be: lots of rows. I'm not sure at what point it starts to matter, but obviously querying 1 billion rows is slower than querying 1k rows.
Are there other reasons? Certain types of joins? Certain types of keys? Certain indexes?
By "join" you mean a logical operation, which doesn't tell us anything about performance. Join performance will be determined by the physical operations the DBMS carries out to execute that logical join. So it's not wise to make blanket assumptions about the performance characteristics of a join.
For example, the join may be equivalent to a scan or seek on a single index or table if the query gets rewritten by the optimiser. A join may even be a no-op. Or in a column-store system the physical operations to execute a join may be identical or very similar to those for a query against a single table. To be sure, you have to look at the query plan used in your DBMS, or be knowledgeable enough about that system to know what the physical operations will be.
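In SQL Server (which comes up later in this thread) you can ask the engine for the estimated plan without running the query, which shows you the physical operators it actually chose for the join. A minimal sketch, assuming two hypothetical tables TableA and TableB:

    -- Return the estimated execution plan instead of executing the batch.
    SET SHOWPLAN_ALL ON;
    GO

    -- This logical join may be executed as nested loops, a merge join,
    -- a hash match, or collapsed entirely, depending on indexes and
    -- statistics -- the plan output tells you which.
    SELECT a.id, b.some_value
    FROM TableA AS a
    JOIN TableB AS b ON b.a_id = a.id;
    GO

    SET SHOWPLAN_ALL OFF;
    GO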
That is why the idea of "denormalization for performance" is bogus and no sound basis for database design (I'm guessing that might be what lies behind your question). Normalization implies nothing about performance, because performance is always a function of the physical implementation and the actual physical operations being done, and normalization tells us nothing about either. One might make assumptions based on knowledge of the particular DBMS you are using, but there are no universal rules.
Just as he says, usually I do joins on an indexed primary key, but your join may involve part number plus whose part number it is plus what type of part number it is (development, production, obsolete) plus any number of other fields.
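For illustration, a composite join like that might look something like this (all table and column names here are made up):

    -- Hypothetical composite join: the part number alone isn't unique,
    -- so the join also has to match on the owner and the part-number type.
    SELECT p.part_number, p.description, s.quantity
    FROM parts AS p
    JOIN stock AS s
      ON  s.part_number      = p.part_number
      AND s.owner_id         = p.owner_id
      AND s.part_number_type = p.part_number_type  -- development, production, obsolete
    WHERE p.part_number = '820-0042';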
When you are joining a huge (millions of rows) table by index you might see some performance issues. If joining by something other than an index, you definitely will. When a table has fewer than 100 rows, an index won't make a search any faster than a plain scan. What matters is not the relative difference in size between the tables in a join, but the size of each table taken separately.
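As a concrete example, if a large table is being joined on an unindexed column, adding an index on that column is the usual first fix (names here are hypothetical):

    -- Hypothetical: orders is large and frequently joined on customer_id.
    -- Without this index the join degenerates into repeated scans of orders.
    CREATE INDEX ix_orders_customer_id ON orders (customer_id);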
I read all the posts looking for an answer to my problem; if you can help me I'd really appreciate it:
I have 2 small tables. "Customer" (3 columns): id (integer), code and description (varchar); only a few thousand rows in this table.
And "Invoice" (fewer than 10 columns): id, number (integer), Reference (varchar), Customer_Id, Amount... Date... almost 1 million rows.
If I search each table separately I get the result instantly, in 0 seconds. I have a clustered PK on id, indexes on the columns, etc.
But if I JOIN on Customer_Id (the clustered id on the first table, indexed on the second)
with the WHERE condition on only 3 columns (Number, Reference, Description), the result is slow: 3 to 5 seconds. (WHERE i.Number LIKE '%820701%' OR i.Reference LIKE '%820701%' OR cc.Description LIKE '%820701%') The search term can be a number, a Reference or a Customer Description; we don't know in advance which.
The join I use is with TOP 100 and ORDER BY Number DESC (it feeds a search screen in the main application), on a standard MS SQL Server 2008 engine.
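Reconstructed, the query looks roughly like this (the aliases i and cc are taken from the WHERE clause above; the exact SELECT list is a guess based on the column descriptions):

    -- The search term may match an invoice number, a reference,
    -- or a customer description, so all three are checked.
    SELECT TOP 100 i.Number, i.Reference, i.Amount, cc.Description
    FROM Invoice AS i
    JOIN Customer AS cc ON cc.id = i.Customer_Id
    WHERE i.Number LIKE '%820701%'
       OR i.Reference LIKE '%820701%'
       OR cc.Description LIKE '%820701%'
    ORDER BY i.Number DESC;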
Why is the join so slow? Thanks for any suggestions.