If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Microsoft SQL Server > Challange to all Experts

Reply
 
LinkBack Thread Tools Display Modes
  #16 (permalink)  
Old 02-05-10, 09:29
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Quote:
Originally Posted by n_i View Post
I doubt that one can directly translate algorithmic complexity into query performance metrics.
Where on earth do you get the idea that he does? It is not the complexity of the algorithm but the straightforward nature of a theta join.

I really wouldn't dismiss Itzik Ben Gan's writings too lightly. He is easily up there in the top 10 of SQL Server gurus, and very possibly the very top T-SQL expert on the planet.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #17 (permalink)  
Old 02-05-10, 09:51
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Yeah, that sounds a bit defensive doesn't it?

Point is, he isn't arguing a general case here but a very specific one. A theta join like yours is will involve half the number of rows a Cartesian product would involve.
Code:
Rows in Table A   Rows in Table A   Cartesian   Theta
2                 2                 4           2
20                20                400         200
200               200               40000       20000
2000              2000              4000000     2000000
20000             20000             400000000   200000000
There is a calculable and exponential increase in the number of rows processed.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.

Last edited by pootle flump; 02-05-10 at 09:54.
Reply With Quote
  #18 (permalink)  
Old 02-05-10, 10:48
n_i n_i is online now
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,229
Quote:
Originally Posted by pootle flump View Post
Where on earth do you get the idea that he does?
Here's a quote from the paper:
Quote:
Originally Posted by Itzik Ben Gan et al.
As you can see, the algorithmic complexity of this plan is N^2. With a large number of rows per employee, you get enormous numbers. For example, with 5 employees and 100,000 rows per employee, you get 25,000,250,000 rows scanned. In terms of run time this query would run over an hour. With a higher number of rows per employee, the performance degrdation is not linear, rather N^2. For example, having a single partition and 10,000,000 rows, this query would run for about a year!
Quote:
Originally Posted by pootle flump View Post
I really wouldn't dismiss Itzik Ben Gan's writings too lightly. He is easily up there in the top 10 of SQL Server gurus, and very possibly the very top T-SQL expert on the planet.
I'm not dismissing Mr. Ben Gan's expertise, far from it. I'm just saying that reading twice the rows does not necessarily take twice the time. The number of rows read may be growing exponentially as data accumulate, but query run time does not have to, and in my experience often does not.
Reply With Quote
  #19 (permalink)  
Old 02-05-10, 11:38
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Quote:
Originally Posted by n_i View Post
Here's a quote from the paper:
Beg your pardon. I thought you were referring to complexity of the query algorithm, not the plan.

Quote:
Originally Posted by n_i View Post
I'm just saying that reading twice the rows does not necessarily take twice the time. The number of rows read may be growing exponentially as data accumulate, but query run time does not have to, and in my experience often does not.
For set based processors the processing of each successive row has a lower cost of the previous one. Under normal circumstances, reading twice as many rows results in twice as many rows being processed. And normally we notice little difference in processing normal queries as the data accumulates.

The point here though is that number of rows read <> the number of rows processed. In this specific case the addition of a single, average row in your inner table results in N/2 more rows being processed (the N being the number of rows in your outer table).

I am happy to script up some benchmarks if you are sufficiently motivated to put your observations up against Ben Gan's
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #20 (permalink)  
Old 02-05-10, 12:56
n_i n_i is online now
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,229
Quote:
Originally Posted by pootle flump View Post

The point here though is that number of rows read <> the number of rows processed.
May be it's a terminology issue, but for me "reading" and "processing" here means the same thing. I understand perfectly what you (and Ben Gan) say, but still I'm a bit uncomfortable with the perfect exponent on the benchmark diagram. I'm going to run some tests for myself when there's a slow day in my schedule.


Quote:
Originally Posted by pootle flump View Post
(the N being the number of rows in your outer table).
Not table, partition {as in "OVER (PARTITION BY...)"}. In the example that kicked off this thread we were aggregating data by customer ID. The formula, of course, still stands.
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On