Thread: Group by & Having in relational algebra

1. Registered User
Join Date
May 2010
Posts
3

Unanswered: Group by & Having in relational algebra

hello every one, I have a question about relational algebra, I can't figure out how to translate the group by and having part of my sql query into relational algebra I have syntax examples but I don't know the correct order of all the parts
here's the query for example:

SELECT table.1
FROM table
WHERE table.4=5
GROUP BY table.1
HAVING count (table.3)>4

so far I did this part->

Π table.1 table.4=5 (table))

is it possible that that's how u do it?
table.1 g (count (table.3))>4(Π table.1 table.4=5 (table)))

2. Registered User
Join Date
Oct 2002
Location
Posts
697
Okay, first off, naming your attributes as numbers just ain't Christian. I'm going to call them One, Two, Three, etc. Secondly, table.1 is a SQLism, and is peculiar to the language. The relational algebra uses the rename operator, usually ρ, if you need to rename an attribute because you have a name clash.

Π One (σ Four=5 (table))

You're right: the where clause is the equivalent of a restrict operation in the relational algebra.

Group by and having tend to be hard to translate because a. SQL's aggregate operations use a nonstandard object that users can't see directly and b. because they aren't one of the 8 basic operators, so everyone has a different idea of what they are.

But let's break down that SQL.

Code:
```SELECT table.One
FROM table
WHERE table.Four=5
GROUP BY table.One
HAVING count (table.Three)>4```
Pardon my notation, but when I'm writing algebra in text, I usually just write, e.g. ∏Attribute Table as Table ∏ Attribute. I find it a lot easier to follow what's going on, too, rather than (essentially) writing half the damned thing backwards and in squinty little text to make it look more mathey.

Anyway, as you suggested: table σ (Four = 5). That takes care of FROM and WHERE.

Next we bundle it into groups of whatever One contains (the GROUP BY). Then we have to add the aggregate function count(Three), and use sigma to restrict that to being > 4 (for the HAVING clause), and then finally project out One (the SELECT clause).

Code:
`table σ (Four = 5) G One F count(Three) σ count(Three) > 4 ∏ One`
I think your guess is pretty close, all I added was explicitly extending the table with the function count(Three). Again, I'm not sure of the semantics of your group operator, so I can't be sure that's how you'd do it.

3. Registered User
Join Date
May 2010
Posts
3
thank you, that was very helpful!
I'm gonna push my luck and ask one more question :P

how can I write this in relational algebra :

Code:
```SELECT table2.one
FROM table2
WHERE table2.two IN (
SELECT table.One
FROM table
WHERE table.Four=5
GROUP BY table.One
HAVING count (table.Three)>4);```
the inner query is like u said but how do I add the outer part?
do I write the outer part and in the σ-where part add the inner query?
Last edited by uydf54; 05-19-10 at 09:04.

4. Registered User
Join Date
Oct 2002
Location
Posts
697
Originally Posted by uydf54
thank you, that was very helpful!
I'm gonna push my luck and ask one more question :P
Nah, I love doing algebra stuff. Case in point, something I'm writing.

The weird thing about SQL is that it combines the calculus style of expression things (IN subqueries and such) with the algebra.

Any IN subquery can be rewritten as a join, so a query:

Code:
```SELECT * FROM Foo
WHERE bar IN
(SELECT bar FROM Qux)```
is identical to:

Code:
``` SELECT Foo.*
FROM Foo INNER JOIN Qux
ON Foo.bar = Qux.bar```
The inner join is matching values, right? That's precisely what "IN" means, too, which is why it translates neatly. (Performance note: often times a good optimizer will translate an IN expression directly to an equivalent join, just as I showed here, so a lot of claims that subexpressions are slow just ain't true.)

So, basically, if you can write the inner portion as relational algebra, all you need to do is use a natural join operator to handle the IN expression. Your relational algebra should really just be:

Code:
`(InnerExpression ρ One -> Two) ⋈ (table2 ∏ Two)`
So we take whatever the inner expression is, rename the column One to be Two, then join that with (table2 projecting out only the column Two).

The natural join operator here acts like an intersection, do you see why?

5. Registered User
Join Date
May 2010
Posts
3
thanks for the answer and WOWZA on that thing you're writing! looks amazing!

6. Registered User
Join Date
Oct 2002
Location