Results 1 to 6 of 6
  1. #1
    Join Date
    May 2010
    Posts
    3

    Unanswered: Group by & Having in relational algebra

    hello every one, I have a question about relational algebra, I can't figure out how to translate the group by and having part of my sql query into relational algebra I have syntax examples but I don't know the correct order of all the parts
    here's the query for example:

    SELECT table.1
    FROM table
    WHERE table.4=5
    GROUP BY table.1
    HAVING count (table.3)>4

    so far I did this part->

    Π table.1 table.4=5 (table))

    is it possible that that's how u do it?
    table.1 g (count (table.3))>4(Π table.1 table.4=5 (table)))

  2. #2
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Okay, first off, naming your attributes as numbers just ain't Christian. I'm going to call them One, Two, Three, etc. Secondly, table.1 is a SQLism, and is peculiar to the language. The relational algebra uses the rename operator, usually ρ, if you need to rename an attribute because you have a name clash.

    Π One (σ Four=5 (table))

    You're right: the where clause is the equivalent of a restrict operation in the relational algebra.

    Group by and having tend to be hard to translate because a. SQL's aggregate operations use a nonstandard object that users can't see directly and b. because they aren't one of the 8 basic operators, so everyone has a different idea of what they are.

    But let's break down that SQL.

    Code:
    SELECT table.One
    FROM table 
    WHERE table.Four=5
    GROUP BY table.One
    HAVING count (table.Three)>4
    Pardon my notation, but when I'm writing algebra in text, I usually just write, e.g. ∏Attribute Table as Table ∏ Attribute. I find it a lot easier to follow what's going on, too, rather than (essentially) writing half the damned thing backwards and in squinty little text to make it look more mathey.

    Anyway, as you suggested: table σ (Four = 5). That takes care of FROM and WHERE.

    Next we bundle it into groups of whatever One contains (the GROUP BY). Then we have to add the aggregate function count(Three), and use sigma to restrict that to being > 4 (for the HAVING clause), and then finally project out One (the SELECT clause).

    Code:
    table σ (Four = 5) G One F count(Three) σ count(Three) > 4 ∏ One
    I think your guess is pretty close, all I added was explicitly extending the table with the function count(Three). Again, I'm not sure of the semantics of your group operator, so I can't be sure that's how you'd do it.

  3. #3
    Join Date
    May 2010
    Posts
    3
    thank you, that was very helpful!
    I'm gonna push my luck and ask one more question :P

    how can I write this in relational algebra :

    Code:
    SELECT table2.one
    FROM table2
    WHERE table2.two IN (
                             SELECT table.One
                             FROM table 
                             WHERE table.Four=5
                             GROUP BY table.One
                             HAVING count (table.Three)>4);
    the inner query is like u said but how do I add the outer part?
    do I write the outer part and in the σ-where part add the inner query?
    Last edited by uydf54; 05-19-10 at 10:04.

  4. #4
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by uydf54 View Post
    thank you, that was very helpful!
    I'm gonna push my luck and ask one more question :P
    Nah, I love doing algebra stuff. Case in point, something I'm writing.

    The weird thing about SQL is that it combines the calculus style of expression things (IN subqueries and such) with the algebra.

    Any IN subquery can be rewritten as a join, so a query:

    Code:
    SELECT * FROM Foo 
    WHERE bar IN 
    (SELECT bar FROM Qux)
    is identical to:

    Code:
     SELECT Foo.*
     FROM Foo INNER JOIN Qux 
    ON Foo.bar = Qux.bar
    The inner join is matching values, right? That's precisely what "IN" means, too, which is why it translates neatly. (Performance note: often times a good optimizer will translate an IN expression directly to an equivalent join, just as I showed here, so a lot of claims that subexpressions are slow just ain't true.)

    So, basically, if you can write the inner portion as relational algebra, all you need to do is use a natural join operator to handle the IN expression. Your relational algebra should really just be:

    Code:
    (InnerExpression ρ One -> Two) ⋈ (table2 ∏ Two)
    So we take whatever the inner expression is, rename the column One to be Two, then join that with (table2 projecting out only the column Two).

    The natural join operator here acts like an intersection, do you see why?

  5. #5
    Join Date
    May 2010
    Posts
    3
    thanks for the answer and WOWZA on that thing you're writing! looks amazing!

  6. #6
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Thanks... it's been quite a few years of work, and I'm starting to actually get a working prototype together. It's to the point where I have to tell myself not to tear stuff down and start over.

    By a coincidence, I'm now working out my first try at an implementation of grouping.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •