Page 1 of 3 123 LastLast
Results 1 to 15 of 32
  1. #1
    Join Date
    Mar 2007
    Posts
    212

    Unanswered: How to Maintain and Optimize SQL Queries with Large Tables?

    Hi, I have a table called products which consist of various supplier
    products and has the following fields:

    ID int not null primary key auto_increment
    SupplierID int not null
    ProductCode varchar(30) not null
    productName varchar(255) null
    Length varchar(20) null
    Width varchar(20) null
    Height varchar(20) null
    Weight varchar(20) null


    The table will be frequently searched online using the following fields using the LIKE where clause: productCode, ProductName, length, width, height, weight

    The table will frequently be updated with supplier products.

    Up to 300 suppliers are anticipated.

    Each supplier may have a product catalogue consisiting of 5000 to 10,000 products.

    A supplier can update their catalogue by adding, deleting products to the above table or delete the entire catalogue and insert a new catalogue in which case all the products supplied by the supplier will be deleted (This could be 10,000 records) and then insert a different set of 10,000 products.

    As you can see the table will be very large approx 1 million records and needs to be optimized for fast SQL queries whilst maintainig a balance to compensate for the frequent updates i.e large table inserts and deletes.

    Now my questions to acheive the above functionality:

    1. Which fields should I Index? How many fields does MySQl allow to be index per table?

    2. If I Index the fields on which the SQL query will take place
    how long will it take mySQL to re-index those fields with the frequent updates that I require?

    3. Is there a more efficient way to achieve the above functionality?

    4. Should I be aware of other issues for maintaining such a large table and running the SQL search query?

    Your help and advice will be much appreciated

  2. #2
    Join Date
    Nov 2003
    Posts
    2,934
    Provided Answers: 12
    Why do you store numeric values (width, height, lenght, weight) in varchar columns? The should be stored a some number datatype (preferrable an integer type)

  3. #3
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    Unless you are going to be using FULLTEXT searches your first step is to set the table to InnoDB engine type. After that you have to ask yourself what you expect your users to be searching on. i.e. productName AND productCode , just productCode, just productName? In which case there are three indexes there. As rightly stated by shammat your measurements should DEFINITELY be integers and NOT varchars. You do NOT want to be doing LIKE where clauses on varchars that should be integers. Integer comparison is quicker than char comparison. Plus it restricts and helps maintain integrity to stop idiots inserting strings where numbers should be.

  4. #4
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    I would also like to state that you have a lot of null allowed columns there. If you are going to be having a lot of products that DON'T have values for their measurements then I would recommend segregating this off into another table and using a subquery that uses an index to return the results. How efficient this will be I'm not sure but it is something that will probably want to be tested.

  5. #5
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by shammat
    Why do you store numeric values (width, height, lenght, weight) in varchar columns? The should be stored a some number datatype (preferrable an integer type)
    The reason why the numeric values are stored in varchar columns is because the products have different unit types e.g. the length, width, height of a product can be in mm, cm, inches etc. With a product catalogue that has 10000 products all with different units of measurement for each itme it would be very difficult to separate the unit of measurement type from the actual unit value.

  6. #6
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by aschk
    Unless you are going to be using FULLTEXT searches your first step is to set the table to InnoDB engine type. After that you have to ask yourself what you expect your users to be searching on. i.e. productName AND productCode , just productCode, just productName? In which case there are three indexes there. As rightly stated by shammat your measurements should DEFINITELY be integers and NOT varchars. You do NOT want to be doing LIKE where clauses on varchars that should be integers. Integer comparison is quicker than char comparison. Plus it restricts and helps maintain integrity to stop idiots inserting strings where numbers should be.
    The table is already set to InnoDB as it contains foreign keys - SupplierID which i should have noted in my original post and table defs - my apologies! As far as the search is concerned users can search the table using all five search criteria - ProductCode, ProductName, width, Length, height. A dynamic SQL query is generated from the user input. e.g. if the user specifies a search string in all five html input boxes for the above fields then the query simply appends a AND operator between each search clause.



    Would I therefore need to index all 5 fields? Also bear in mind that some of those fields may be null as rightly pointed out in your other post?

    My major concern is the frequent large insert/deletes that will be required on this table. How quickly can MySql reindex search a large table?

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by aschk
    As rightly stated by shammat your measurements should DEFINITELY be integers and NOT varchars.
    i disagree strongly

    VARCHAR is perfectly acceptable for the intended usage
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    Ok another option in the case that you need units would be to include a units fields for each measurement, so you have height_unit, width_unit etc. typed by an ENUM('mm','cm','metres','feet','inches').

    Reasoning : user enters 10mm or 10 mm or 10 millimeters to search on.
    what are you going to use LIKE on? 10? In that case use a case comparison = and integer types. Because 10mm, 10 mm and 10 millimeters are ALL different in a LIKE clause you will get different results for each, whereas you ACTUALLY want all those results to come back.

    n.b. this is just the way I would model things.

    As for reindexing. Obviously for each index you have every time another entry is added depending on if it is indexed or not you will have to reindex your binary tree. However I would guess that stock input is going to be relatively low in comparison to stock viewing (output/select).

    An index is needed on what you deem the most commonly searched parts. So productName is mostly likely to want an index all to itself. I would also index productCode on it's own because this will make data retrieval very efficient on single results. I'm guessing of course that each productCode is likely to be unique? If someone knows the productCode they're unlikely to put in the productName (no need). To be honest I wouldn't consider indexing the measurements at all. Problem is you're likely to get a large set of results all with the same height, or width, etc.
    Anyone else think that measurements should be indexed?

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    ENUM??? aaaaaaaarrrrrrrrgggggggghhhhhhhhh!!!!!!!!!!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    LOL, just had a quick discussion with one of the other chaps im working with. Indexing measurement fields will be fine so long as you expect most of the values to be NOT NULL and the majority of the values are likely to be different. You may want to index (height,width,length) as one and (weight) as another.

    r937 : educate me and tell me what's wrong with ENUM? I've not been in this game long so hopefully I can get some good tips :-)

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    1. Which fields should I Index? How many fields does MySQl allow to be index per table?

    any columns used in ON clauses in joins, and in the WHERE clause

    MySQL allows 64 indexes per MyISAM table, other engines are documented in the manual

    2. If I Index the fields on which the SQL query will take place
    how long will it take mySQL to re-index those fields with the frequent updates that I require?


    practically instantaneously

    3. Is there a more efficient way to achieve the above functionality?

    no

    4. Should I be aware of other issues for maintaining such a large table and running the SQL search query?

    none that spring immediately to mind

    p.s. ozzii, i do not provide private support via PM or email
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    one reason not to use ENUM -- it's non-standard

    another -- it requires ALTER TABLE permission to change
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    Very true, it is non-standard, which means migration to another platform will be hell ;-)
    Alter table is not a particularly big issue. How many measurements units are you every likely to utilise, the basic metric ones I expect which will consist of 3 (mm,cm,metre). No point in modelling that in it's own table.

    However, if using ENUM is not your gaff then model your unit lengths in a another table and map them. Just trying to get around this problem of using a LIKE clause on a non-likeable (get the pun?) set of search options.

    Why work with strings when you can work with integers...

    As for indexing it depends on the type of indexes you are using as to where the slow down will occur. If you are using unique productCodes you may want to consider a hash index instead of b-tree. Lookup is O(1), whereas the same in B-tree is O(log n) where n is number of items. On the other side though, if you're doing writes insertion using B-tree is O(log n).

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by aschk
    Alter table is not a particularly big issue.
    it most certainly is, when the user wants another value added to the list of valid values, and the project manager comes back with an estimate for twenty thousand dollars which represents the total effort of all the various people in Development, Unit Testing, Turnover, and Post-Production Quality Control, to say nothing of the scheduling problems

    live tables just do ~not~ get altered willy-nilly in any meaningfully competent information technology department

    of course, if the IT dept is just you, then go ahead, knock yourself out
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Nov 2003
    Posts
    2,934
    Provided Answers: 12
    Quote Originally Posted by r937
    i disagree strongly

    VARCHAR is perfectly acceptable for the intended usage
    But this model is not normalized then. I already see the questions: "how do I calculate the average weight?" or "what's the maximum length?" which will be very hard to answer with a VARCHAR based column that also mixes different units.

    In my opinion all sizes should be stored in mm (converting input upon storage) and all weights in g or kg. There is a base unit for each "type" (length, weight, volume, etc) and that should be used. Then all those numbers are comparable.
    user enters 10mm or 10 mm or 10 millimeters to search on.
    The input style of the unit should be handled by the front end not by the user. The user enters e.g. 10 and selects "kg" from a dropdown. The entered value is then converted to the base unit used in the table, and the comparison (e.g. greater than 100kg) can be achieved with a simple numeric comparsion. There is hardly ever a reason to use LIKE on values that are in fact numbers.

    measurement for each itme it would be very difficult to separate the unit of measurement type from the actual unit value.
    Then simply don't store the unit. "Normalize" all values on a common base unit and you are all set.

    I think storing numbers in VARCHAR columns is just as bad (and wrong) as storing DATEs in VARCHAR columns.

    Just my 0.02

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •