# Thread: Urgent help request: Need help writing a query

1. Registered User
Join Date
Oct 2003
Posts
163

## Unanswered: Urgent help request: Need help writing a query

I urgently need help and guidance with writing a query to solve the
following scenario which is a bit complex:

In tableA I have product and their qty as below
ProdID ProdQty AllocatedAmount
P1 700
P2 400
P3 300
.... and so on

I'll explain the last column of tableA (AllocatedAmount) in a while for
which I have posted this query.

I have a one to many relationship where each multiple products are
defined against a category. Suppose I have defined the relationship as
such that both products P1 and P2 belong to Category C1. Product P3
belonging to category C2 and so on...

Now in tableB I have categories and amounts for each category recorded.
This could be the total amount of transactions for the category or
whatever. SO suppose tableB is like follows:
CatID CatAmount
C1 10,000
C2 14,000
C3 6,000
... and so on

And here's the challenge that I want to solve: I want to allocate the
amount for each category in a proportionate manner to the products
belonging to that category and using the ProdQty of each product I.e.
taking the example of P1 and P2 which both belong to category C1, I want
the amount of category C1 which is 10,000 to be distributed as per the
quantities of P1 and P2...thus if you do the basic math you'll see that
P1 should be allocated 63.63% of the amount (700+400 = 1100 and 700/1100
*100 = 63.63%) I.e. 10,000 * 63.63% = 6363 and P2 will be allocated
36.36% of 10,000 I.e around 3636. The calculated amounts are to be
stored in the AllocatedAmount column of tableA.
I hope this is clear? Though I have tried to explain as best as I could
but I do realize that this is a pretty challenging task

Eagerly looking forward to help.

2. Registered User
Join Date
Jul 2002
Posts
58

## Missing some info

This shouldn't be too difficult, but you columns in TableA and TableB do not contain the info to tell which Product is in which Category. Is there a CatID column in TableA or perhaps a TableC with columns of ProdID and CatID? Either way, the technique is to put a corelated subquery in the SET clause. It's just slightly more complex if there's a third table.

Here's answers for either scenario. For case 1 (there's a column in TableA that marks a MTO relationship to TableB) just ignore my #tableC. For case 2 (there's a third table defining the MTO relationship, just ignore the existance of the CatID column in my #tableA

The explicitly rolled back transaction allows you to 'play around' or rerun the queries to see the results. I think both cases below give the results you're looking for.....

Code:
```create table #tableA
(ProdID VARCHAR(10),
ProdQty int,
AllocatedAmount int,
CatID VARCHAR(10))
create table #tableB
(CatID VARCHAR(10),
CatAmount int)
CREATE TABLE #tableC
(ProdID VARCHAR(10),
CatID VARCHAR(10))

insert into #tableA
values('P1',700,NULL,'C1')
insert into #tableA
values('P2',400,NULL,'C1')
insert into #tableA
values('P3',700,NULL,'C2')
insert into #tableA
values('P4',400,NULL,'C2')
insert into #tableA
values('P5',200,NULL,'C2')

insert into #tableB
values('C1',10000)
insert into #tableB
values('C2',14000)

insert into #tableC
values ('P1','C1')
insert into #tableC
values ('P2','C1')
insert into #tableC
values ('P3','C2')
insert into #tableC
values ('P4','C2')
insert into #tableC
values ('P5','C2')

/* Case 1 - MTO column in #tableA */
BEGIN TRAN
SELECT *
FROM #tableA

UPDATE #tableA
SET AllocatedAmount = #tableB.CatAmount * ProdQty / (SELECT SUM(ProdQty)
FROM #tableA AS A2
WHERE A2.CatID = #tableB.CatId)
FROM #tableB
WHERE #tableA.CatID = #tableB.CatID

SELECT *
FROM #tableA
ROLLBACK TRAN

/* Case 2 - third table defining MTO relationship */
BEGIN TRAN
SELECT *
FROM #tableA

UPDATE #tableA
SET AllocatedAmount = #tableB.CatAmount * ProdQty / (SELECT SUM(ProdQty)
FROM #tableA AS A2
JOIN #tableC AS C2 ON A2.ProdID = C2.ProdID
WHERE C2.CatID = #tableB.CatId)
FROM #tableB, #tableC
WHERE #tableA.ProdID = #tableC.ProdID
AND #tableC.CatID = #tableB.CatID

SELECT *
FROM #tableA
ROLLBACK TRAN```
Last edited by Steve Duncan; 12-21-03 at 19:03.

3. Registered User
Join Date
Oct 2003
Posts
163
Hello Steve,

Many thanks for your reply and help. I should have posted some more details...

BTW, what do you mean by 'MTO"??

Let's assume that tblA also contains a column called CatID.... I'm taking your suggestion and I agree that's this will be a better approach...

Also suppose that both tblA abd tblB already exist and contain the data.

It appears that the heart of your solution lies in writing the following query:

UPDATE #tableA
SET AllocatedAmount = #tableB.CatAmount * ProdQty / (SELECT SUM(ProdQty)
FROM #tableA AS A2
WHERE A2.CatID = #tableB.CatId)
FROM #tableB
WHERE #tableA.CatID = #tableB.CatID

Can you kindly explain how the above will actually work. I kind of understand the logic but not fully. Your explanation will help me clear up some of my own concepts and the order of execution of the various parts of the above query. I'll appreciate your help here.

And last but not least: Can I write this as a query or do I have to write this as a Stored Procedure or something???

4. Registered User
Join Date
Jul 2002
Posts
58
Certainly the UPDATE is the heart of the thing, everything else was just to allow you to see the 'before' and 'after' views of the temp tables I created. So there's no need to make this a stored proc - you can just execute the UPDATE statement.

MTO is just shorthand for 'Many To One'.

Correlated subqueries are a SQL analog to nested loops in a program. You most often see them in SELECT statements, but you can use them in UPDATE's as well.

Although it's not necessarily how the Query Compiler will actually execute this query, what the engine really does will be logically equivilent to this:

Think of the 'outer' portion of the statement - the UPDATE part - as scanning through #tableA. For each row in #tableA, the SELECT that appears in () in the SET clause get's run. See the WHERE clause in that SELECT? The condition WHERE A2.CatID = #tableB.CatId compares rows from one or more of the tables in the subqueries FROM or JOIN cluases to one of the tables in the outer query - in this case #tableB that appears in the FROM clause of the UPDATE statement.

Perhaps this will be easier to see if we do a simpler correlated subquery in a SELECT because this UPDATE has some complications. Suppose we want to find the Produt with the highest ProdQty value within each category. This is easy with a correlated subquery:

Code:
```SELECT *
FROM #tableA
WHERE ProdQty = (SELECT MAX(ProdQty)
FROM #tableA AS A
WHERE A.CatID = #tableA.CatID)```
As the outer SELECT runs through each row in #tableA, the value of ProdQty is comapred to the results of the inner SELECT. But the key feature, and what makes this a CORRELATED subquery is that once again you see the WHERE clause of the inner SELECT linking one of the inner tables (the only one in this case) with one of the outer tables which restricts the inner query to the range of rows appropriate for the row currently being considered by the outer SELECT. Since in this case the two tables are actually the same, that makes this an even more specialized beast, a self-correlated subquery, but that's not a requirement - frequently the related tables are not the same, they just share a column that defines some relationship between the tables. When you do a self-correlated subquery, you need to give the inner copy of the table an alias so the query engine can keept the two copies straight.

The UPDATE I gave you is more complex on two counts. First, having a FROM clause in an UPDATE statement is not ANSI standard, it's a TSQL extension. Let's assume for some bizzare reason you want to set the AllocatedAmount in each row to the CatAmount for that part's category. In ANSI standard SQL you would do that like this:

Code:
```   UPDATE #tableA
SET AllocatedAmount = (SELECT CatAmount
FROM #tableB
WHERE  #tableA.CatID = #tableB.CatID)```
Which is just another correlated subquery. TSQL lets you write the same query like this:

Code:
```   UPDATE #tableA
SET AllocatedAmount = #tableB.CatAmount
FROM #tableB
WHERE  #tableA.CatID = #tableB.CatID```
Which does exactly the same thing. It's just some people - particularly the designers of SQL Server - think this way is easier to read. But puting a FROM clause in a UPDATE is just a terse way of hiding a correlated subquery in it.

So, going back to the query I gave you, the second complication is that you can see that it actually has TWO correlated subqueries - a visiblly explicit one in the SET clause, and another hidden in the FROM clause of the outer UPDATE. In fact, you can do this as two explicit subqueries, even making one of them a self-correlated subquery like this:

Code:
```   UPDATE #tableA
SET AllocatedAmount = (SELECT CatAmount
FROM #tableB
WHERE #tableB.CatID = #tableA.CatID) * ProdQty / (SELECT SUM(ProdQty)
FROM #tableA AS A2
WHERE A2.CatID = #tableA.CatId)```
But that's a really busy SET clause.

IF you turn on 'Show Execution Plan', you see that the engine doesn't really make a multiple passes through the subquery's tables, but whatever it actually does to evaluate a given correlated subquery will be logically the same as what I explained above. Turning on 'Show execution plan' on the stuff I gave you with small temp tables with no indexes, the engine handled the subquery that generates totals of PartQty by creating a worktable of values from #tableA, sorting it by CatID, and then making a single pass through the worktable calculating subtotals for each unique CatID as it goes. Later, these results got joined with the outer UDATE's rows with a nested loop, but this nested loop was thereby working with a smaller worktable of intermediate results instead of having to scan all of #tableA. Different path to the same result, but the engine can often do things like this that will save a lot of I/O over actually doing the whole subquery explicitly for every row in the outer table.

Steve

Originally posted by Joozh
Hello Steve,

Many thanks for your reply and help. I should have posted some more details...

BTW, what do you mean by 'MTO"??

Let's assume that tblA also contains a column called CatID.... I'm taking your suggestion and I agree that's this will be a better approach...

Also suppose that both tblA abd tblB already exist and contain the data.

It appears that the heart of your solution lies in writing the following query:

UPDATE #tableA
SET AllocatedAmount = #tableB.CatAmount * ProdQty / (SELECT SUM(ProdQty)
FROM #tableA AS A2
WHERE A2.CatID = #tableB.CatId)
FROM #tableB
WHERE #tableA.CatID = #tableB.CatID

Can you kindly explain how the above will actually work. I kind of understand the logic but not fully. Your explanation will help me clear up some of my own concepts and the order of execution of the various parts of the above query. I'll appreciate your help here.

And last but not least: Can I write this as a query or do I have to write this as a Stored Procedure or something???

5. SQL Consultant
Join Date
Apr 2002
Location
Posts
20,002
steve, excellent post

bookmarked

rudy

6. Registered User
Join Date
Jul 2002
Posts
58
By the way, unrelated to your original question, I heartily recommend the technique I used in my first response whenever you are trying out a query with complex logic.

Code:
```BEGIN TRAN
SELECT that shows 'before' state

Big  Hairy Query

SELECT same as above, showing 'after' state
ROLLBACK TRAN```
Saves you from having to undo the results of Big Hairy Query if it's wrong - with for me it often is on the first try if it's especially complex or uses some obscure technique. If the BHQ affects the number of rows you're expecting it to, and the second SELECT show the correct results, you can just run BHQ by itself, or change ROLLBACK to COMMIT and rerun the whole batch. In fact, you can just highlight everything BUT the ROLLBACK TRAN part, and Query Analyzer will just run that highlighted part, and then you either execute a ROLLBACK TRAN or COMMIT TRAN afterwards as appropriate , but you have to be careful to remember to finish the transaction or you'll likely lock other people out of the affected tables.

And I ALWAYS do this any time I'm going to 'raise the hood' and monkey with the system tables. As much to see that I'm affecting the number of rows I expect to, as well as seeing that the results are what I intend. I learned that the hard way, as once I set a suspect database's status to EMERGENCY RECOVERY by updating the SYSDATABASES table in master, but I left out the WHERE clause. DOH!!!! Got every DB, including master. (When you put a DB in Emergency Recovery mode, you can't update it. So I basically painted myself into a corner) Couldn't undo the damage and had to restore the master from a backup. If I'd done the explicit transaction and rolled it back, I'd have seen more then one row affected and caught it before it was too late.

7. Registered User
Join Date
Oct 2003
Posts
163
Steve,

I am still going through your reply and trying to make sure that I understand the solution fully. I have to say that these corelates queries though powerful are quite some beast to digest

Anyways, I just wanted to thank you for the reply and more importantly for the EXCELLENT and really IMPRESSIVE explanation that you have given

I sincerely appreciate the time you took to write down all that explanation.

Before I pen-up, I would really like to request if you can recommend some good book(s) which can help me IMPROVE my SQL skills. I am convinced that if the basic (foundations) are clear that building on top of that will not be a problem....

Many sincere thanks once again

8. Registered User
Join Date
Jul 2002
Posts
58
Many thanks for you kind words. Don't feel bad, I found correlated subqueries oddly difficult to understand at first, though now they seem second nature. I think when you're first learning SQL, grasping these things is one of the first "Ahhhh" moments. I think it's one of the first thing most people learn that isn't immedately obvious in how it works.

The best I book I can recommend is "Inside SQL Server 2000" from Microsoft Press. Unfortunately, that's NOT a SQL-Language book, but it does have a excellent chapter on querying data. And it's no slouch as an addition to your library, since it has TONS of useful data about SQL Server.

I wish I could recommend a good book about SQL alone, but I can't. Believe me, if I could I would, but you see I started learning this stuff about ten years ago, and most of it I picked up by actually having to find solutions to oddball queries - OJT in other words. I used the web a lot, and consulted "Inside SQL" and the BOL, plus some random books on SQL, but I don't remember their titles. They were just books co-workers had lying around, so there's no guarentee they were really very good. Who knows, my progress might have been faster if I'd had better books

They best thing you can do is, if possible, get in the habit of trying to solve hard queries. Ask other DBA's and developers where you work to give you their oddballs. At first, perhaps, you won't have much luck, but it's an explosive process - as soon as you get a rep as a being any good at all, people will come to you with harder and harder problems, and before long you get very good.

Let me leave you with one small piece of advice - if you aren't already, get in the habit of using explicit JOIN clauses instead of the older method of listing all the tables in the FROM clause and mixing the join conditions with the where stuff. It will really help to make you 'think' set operations, and belive me, it makes maintenance *much* easier. It's a lot easier to look at a query you didn't write (or wrote months ago) and figure out what it's really doing, if the JOIN stuff is visibly separate from the row-selection conditions.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•