Results 1 to 5 of 5
  1. #1
    Join Date
    Jul 2005
    Posts
    63

    Unanswered: Datatype for Status column of a large table

    On a high-traffic website with millions of rows of data, which is more ideal for the status flag of rows?

    tiny int: 0 - inactive, 1 - active, 2 - pending (NOTE: where status > 0)
    char: I - inactive, A - active, P - pending (NOTE: where status != 'I')

    There are a couple of more flags (overall not more than 5) but which datatype is more ideal in terms of performance, readability, etc?

    If the table is large with over a million rows, do i have to create index on status column that has no more than 5 unique values? The status column is used in all queries, though.


    Further, I used to work on Oracle and Sybase before where I used WHERE clauses as under:

    where p.ProductId = invoice.ProductId


    but MySQL seems to have a different way of joining tables:

    FROM orders o INNER JOIN customer c
    ON c.customer_id = o.customer_id

    which is a bit hard for me to understand. However, I find that the Oracle-style where clause works fine in mysql. so, is it ok to use Oracle style syntax or should I learn the new mySql syntax?

    Further, could someone give me some hints on the 'driving table' on how to order the tables in the FROM clause?

    thanks,
    Last edited by screenmates; 02-19-06 at 00:06.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by screenmates
    On a high-traffic website with millions of rows of data, which is more ideal for the status flag of rows?

    tiny int: 0 - inactive, 1 - active, 2 - pending (NOTE: where status > 0)
    char: I - inactive, A - active, P - pending (NOTE: where status != 'I')
    the tinyint

    Quote Originally Posted by screenmates
    If the table is large with over a million rows, do i have to create index on status column that has no more than 5 unique values?
    not a separate index, no, but you might want to add the status column as a secondary column on other indexes

    Quote Originally Posted by screenmates
    but MySQL seems to have a different way of joining tables:

    FROM orders o INNER JOIN customer c
    ON c.customer_id = o.customer_id
    that is actually easier to understand, not harder

    plus, it is the standard sql way of joining tables

    by the way, both oracle and sybase -- as well as all other databases -- also support the JOIN syntax

    Quote Originally Posted by screenmates
    Further, could someone give me some hints on the 'driving table' on how to order the tables in the FROM clause?
    don't worry about this, as the optimizer does this job for you

    however, if you mix INNER and OUTER joins, i have found that it helps comprehension to start the FROM clause with the table(s) that have filter conditions in the WHERE clause, and whose rows you want all of, and then attach first INNER and then LEFT OUTER JOIN tables one at a time
    Last edited by r937; 02-19-06 at 08:17.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jul 2005
    Posts
    63
    Quote Originally Posted by r937
    the tinyint
    not a separate index, no, but you might want to add the status column as a secondary column on other indexes
    If my table has ID, Description and Status columns and the select statement is:

    select ID, Description from... where Status > 0

    will the index be on ID+Status ?

    ID is the 'primary key', Status does not have to be in the result set and Description does not have to be indexed.
    [/QUOTE]


    Quote Originally Posted by r937
    that is actually easier to understand, not harder

    plus, it is the standard sql way of joining tables

    by the way, both oracle and sybase -- as well as all other databases -- also support the JOIN syntax
    A bit confused... Did you say this is the standard sql:

    FROM orders o INNER JOIN customer c
    ON c.customer_id = o.customer_id

    Will the optimizer have a problem if I use:

    where c.customer_id = o.customer_id

    Quote Originally Posted by r937
    don't worry about this, as the optimizer does this job for you

    however, if you mix INNER and OUTER joins, i have found that it helps comprehension to start the FROM clause with the table(s) that have filter conditions in the WHERE clause, and whose rows you want all of, and then attach first INNER and then LEFT OUTER JOIN tables one at a time
    Is there anything i should do to reduce burden on the optimizer since it is a high traffic site?

    Thanks much...

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by screenmates
    If my table has ID, Description and Status columns and the select statement is:

    select ID, Description from... where Status > 0

    will the index be on ID+Status ?
    no, because that index won't be used -- you're asking for all rows with status>0 and that will definitely use a table scan


    Quote Originally Posted by screenmates
    Did you say this is the standard sql:

    FROM orders o INNER JOIN customer c
    ON c.customer_id = o.customer_id

    Will the optimizer have a problem if I use:

    where c.customer_id = o.customer_id
    yes it is, and no it won't, but you do realize that your WHERE condition will be completely redundant to the ON condition, right?

    Quote Originally Posted by screenmates
    Is there anything i should do to reduce burden on the optimizer since it is a high traffic site?
    use stored procedures (which require the optimizer only at time of declaration) instead of queries (which require the optimizer on each call)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Jul 2005
    Posts
    63
    Quote Originally Posted by r937
    no, because that index won't be used -- you're asking for all rows with status>0 and that will definitely use a table scan
    If i create an index on Status column and use:

    where Status > 0

    will it use the index? Is index required for 4-5 unique values of a column?

    Quote Originally Posted by r937
    but you do realize that your WHERE condition will be completely redundant to the ON condition, right?
    Is the standard standard sql the recommended way to go?

    Quote Originally Posted by r937
    use stored procedures (which require the optimizer only at time of declaration) instead of queries (which require the optimizer on each call)
    That's a great piece of advice!

    Thanks...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •