Unanswered: Group by & Having in relational algebra
hello every one, I have a question about relational algebra, I can't figure out how to translate the group by and having part of my sql query into relational algebra I have syntax examples but I don't know the correct order of all the parts
here's the query for example:
GROUP BY table.1
HAVING count (table.3)>4
so far I did this part->
Π table.1 (σ table.4=5 (table))
is it possible that that's how u do it?
table.1 g (count (table.3))>4(Π table.1 (σ table.4=5 (table)))
Okay, first off, naming your attributes as numbers just ain't Christian. I'm going to call them One, Two, Three, etc. Secondly, table.1 is a SQLism, and is peculiar to the language. The relational algebra uses the rename operator, usually ρ, if you need to rename an attribute because you have a name clash.
Π One (σ Four=5 (table))
You're right: the where clause is the equivalent of a restrict operation in the relational algebra.
Group by and having tend to be hard to translate because a. SQL's aggregate operations use a nonstandard object that users can't see directly and b. because they aren't one of the 8 basic operators, so everyone has a different idea of what they are.
But let's break down that SQL.
GROUP BY table.One
HAVING count (table.Three)>4
Pardon my notation, but when I'm writing algebra in text, I usually just write, e.g. ∏Attribute Table as Table ∏ Attribute. I find it a lot easier to follow what's going on, too, rather than (essentially) writing half the damned thing backwards and in squinty little text to make it look more mathey.
Anyway, as you suggested: table σ (Four = 5). That takes care of FROM and WHERE.
Next we bundle it into groups of whatever One contains (the GROUP BY). Then we have to add the aggregate function count(Three), and use sigma to restrict that to being > 4 (for the HAVING clause), and then finally project out One (the SELECT clause).
table σ (Four = 5) G One F count(Three) σ count(Three) > 4 ∏ One
I think your guess is pretty close, all I added was explicitly extending the table with the function count(Three). Again, I'm not sure of the semantics of your group operator, so I can't be sure that's how you'd do it.
The weird thing about SQL is that it combines the calculus style of expression things (IN subqueries and such) with the algebra.
Any IN subquery can be rewritten as a join, so a query:
SELECT * FROM Foo
WHERE bar IN
(SELECT bar FROM Qux)
is identical to:
FROM Foo INNER JOIN Qux
ON Foo.bar = Qux.bar
The inner join is matching values, right? That's precisely what "IN" means, too, which is why it translates neatly. (Performance note: often times a good optimizer will translate an IN expression directly to an equivalent join, just as I showed here, so a lot of claims that subexpressions are slow just ain't true.)
So, basically, if you can write the inner portion as relational algebra, all you need to do is use a natural join operator to handle the IN expression. Your relational algebra should really just be:
(InnerExpression ρ One -> Two) ⋈ (table2 ∏ Two)
So we take whatever the inner expression is, rename the column One to be Two, then join that with (table2 projecting out only the column Two).
The natural join operator here acts like an intersection, do you see why?