If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > How to Maintain and Optimize SQL Queries with Large Tables?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-26-07, 05:27
ozzii ozzii is offline
Registered User
 
Join Date: Mar 2007
Posts: 194
How to Maintain and Optimize SQL Queries with Large Tables?

Hi, I have a table called products which consist of various supplier
products and has the following fields:

ID int not null primary key auto_increment
SupplierID int not null
ProductCode varchar(30) not null
productName varchar(255) null
Length varchar(20) null
Width varchar(20) null
Height varchar(20) null
Weight varchar(20) null


The table will be frequently searched online using the following fields using the LIKE where clause: productCode, ProductName, length, width, height, weight

The table will frequently be updated with supplier products.

Up to 300 suppliers are anticipated.

Each supplier may have a product catalogue consisiting of 5000 to 10,000 products.

A supplier can update their catalogue by adding, deleting products to the above table or delete the entire catalogue and insert a new catalogue in which case all the products supplied by the supplier will be deleted (This could be 10,000 records) and then insert a different set of 10,000 products.

As you can see the table will be very large approx 1 million records and needs to be optimized for fast SQL queries whilst maintainig a balance to compensate for the frequent updates i.e large table inserts and deletes.

Now my questions to acheive the above functionality:

1. Which fields should I Index? How many fields does MySQl allow to be index per table?

2. If I Index the fields on which the SQL query will take place
how long will it take mySQL to re-index those fields with the frequent updates that I require?

3. Is there a more efficient way to achieve the above functionality?

4. Should I be aware of other issues for maintaining such a large table and running the SQL search query?

Your help and advice will be much appreciated
Reply With Quote
  #2 (permalink)  
Old 03-26-07, 06:30
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Why do you store numeric values (width, height, lenght, weight) in varchar columns? The should be stored a some number datatype (preferrable an integer type)
Reply With Quote
  #3 (permalink)  
Old 03-26-07, 07:07
aschk aschk is offline
Registered User
 
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
Unless you are going to be using FULLTEXT searches your first step is to set the table to InnoDB engine type. After that you have to ask yourself what you expect your users to be searching on. i.e. productName AND productCode , just productCode, just productName? In which case there are three indexes there. As rightly stated by shammat your measurements should DEFINITELY be integers and NOT varchars. You do NOT want to be doing LIKE where clauses on varchars that should be integers. Integer comparison is quicker than char comparison. Plus it restricts and helps maintain integrity to stop idiots inserting strings where numbers should be.
Reply With Quote
  #4 (permalink)  
Old 03-26-07, 07:10
aschk aschk is offline
Registered User
 
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
I would also like to state that you have a lot of null allowed columns there. If you are going to be having a lot of products that DON'T have values for their measurements then I would recommend segregating this off into another table and using a subquery that uses an index to return the results. How efficient this will be I'm not sure but it is something that will probably want to be tested.
Reply With Quote
  #5 (permalink)  
Old 03-26-07, 09:26
ozzii ozzii is offline
Registered User
 
Join Date: Mar 2007
Posts: 194
Quote:
Originally Posted by shammat
Why do you store numeric values (width, height, lenght, weight) in varchar columns? The should be stored a some number datatype (preferrable an integer type)
The reason why the numeric values are stored in varchar columns is because the products have different unit types e.g. the length, width, height of a product can be in mm, cm, inches etc. With a product catalogue that has 10000 products all with different units of measurement for each itme it would be very difficult to separate the unit of measurement type from the actual unit value.
Reply With Quote
  #6 (permalink)  
Old 03-26-07, 09:41
ozzii ozzii is offline
Registered User
 
Join Date: Mar 2007
Posts: 194
Quote:
Originally Posted by aschk
Unless you are going to be using FULLTEXT searches your first step is to set the table to InnoDB engine type. After that you have to ask yourself what you expect your users to be searching on. i.e. productName AND productCode , just productCode, just productName? In which case there are three indexes there. As rightly stated by shammat your measurements should DEFINITELY be integers and NOT varchars. You do NOT want to be doing LIKE where clauses on varchars that should be integers. Integer comparison is quicker than char comparison. Plus it restricts and helps maintain integrity to stop idiots inserting strings where numbers should be.
The table is already set to InnoDB as it contains foreign keys - SupplierID which i should have noted in my original post and table defs - my apologies! As far as the search is concerned users can search the table using all five search criteria - ProductCode, ProductName, width, Length, height. A dynamic SQL query is generated from the user input. e.g. if the user specifies a search string in all five html input boxes for the above fields then the query simply appends a AND operator between each search clause.



Would I therefore need to index all 5 fields? Also bear in mind that some of those fields may be null as rightly pointed out in your other post?

My major concern is the frequent large insert/deletes that will be required on this table. How quickly can MySql reindex search a large table?
Reply With Quote
  #7 (permalink)  
Old 03-26-07, 10:03
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by aschk
As rightly stated by shammat your measurements should DEFINITELY be integers and NOT varchars.
i disagree strongly

VARCHAR is perfectly acceptable for the intended usage
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #8 (permalink)  
Old 03-26-07, 10:33
aschk aschk is offline
Registered User
 
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
Ok another option in the case that you need units would be to include a units fields for each measurement, so you have height_unit, width_unit etc. typed by an ENUM('mm','cm','metres','feet','inches').

Reasoning : user enters 10mm or 10 mm or 10 millimeters to search on.
what are you going to use LIKE on? 10? In that case use a case comparison = and integer types. Because 10mm, 10 mm and 10 millimeters are ALL different in a LIKE clause you will get different results for each, whereas you ACTUALLY want all those results to come back.

n.b. this is just the way I would model things.

As for reindexing. Obviously for each index you have every time another entry is added depending on if it is indexed or not you will have to reindex your binary tree. However I would guess that stock input is going to be relatively low in comparison to stock viewing (output/select).

An index is needed on what you deem the most commonly searched parts. So productName is mostly likely to want an index all to itself. I would also index productCode on it's own because this will make data retrieval very efficient on single results. I'm guessing of course that each productCode is likely to be unique? If someone knows the productCode they're unlikely to put in the productName (no need). To be honest I wouldn't consider indexing the measurements at all. Problem is you're likely to get a large set of results all with the same height, or width, etc.
Anyone else think that measurements should be indexed?
Reply With Quote
  #9 (permalink)  
Old 03-26-07, 10:35
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
ENUM??? aaaaaaaarrrrrrrrgggggggghhhhhhhhh!!!!!!!!!!!
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #10 (permalink)  
Old 03-26-07, 10:41
aschk aschk is offline
Registered User
 
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
LOL, just had a quick discussion with one of the other chaps im working with. Indexing measurement fields will be fine so long as you expect most of the values to be NOT NULL and the majority of the values are likely to be different. You may want to index (height,width,length) as one and (weight) as another.

r937 : educate me and tell me what's wrong with ENUM? I've not been in this game long so hopefully I can get some good tips :-)
Reply With Quote
  #11 (permalink)  
Old 03-26-07, 10:50
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
1. Which fields should I Index? How many fields does MySQl allow to be index per table?

any columns used in ON clauses in joins, and in the WHERE clause

MySQL allows 64 indexes per MyISAM table, other engines are documented in the manual

2. If I Index the fields on which the SQL query will take place
how long will it take mySQL to re-index those fields with the frequent updates that I require?


practically instantaneously

3. Is there a more efficient way to achieve the above functionality?

no

4. Should I be aware of other issues for maintaining such a large table and running the SQL search query?

none that spring immediately to mind

p.s. ozzii, i do not provide private support via PM or email
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #12 (permalink)  
Old 03-26-07, 10:51
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
one reason not to use ENUM -- it's non-standard

another -- it requires ALTER TABLE permission to change
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #13 (permalink)  
Old 03-26-07, 11:09
aschk aschk is offline
Registered User
 
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
Very true, it is non-standard, which means migration to another platform will be hell ;-)
Alter table is not a particularly big issue. How many measurements units are you every likely to utilise, the basic metric ones I expect which will consist of 3 (mm,cm,metre). No point in modelling that in it's own table.

However, if using ENUM is not your gaff then model your unit lengths in a another table and map them. Just trying to get around this problem of using a LIKE clause on a non-likeable (get the pun?) set of search options.

Why work with strings when you can work with integers...

As for indexing it depends on the type of indexes you are using as to where the slow down will occur. If you are using unique productCodes you may want to consider a hash index instead of b-tree. Lookup is O(1), whereas the same in B-tree is O(log n) where n is number of items. On the other side though, if you're doing writes insertion using B-tree is O(log n).
Reply With Quote
  #14 (permalink)  
Old 03-26-07, 11:23
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by aschk
Alter table is not a particularly big issue.
it most certainly is, when the user wants another value added to the list of valid values, and the project manager comes back with an estimate for twenty thousand dollars which represents the total effort of all the various people in Development, Unit Testing, Turnover, and Post-Production Quality Control, to say nothing of the scheduling problems

live tables just do ~not~ get altered willy-nilly in any meaningfully competent information technology department

of course, if the IT dept is just you, then go ahead, knock yourself out
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #15 (permalink)  
Old 03-26-07, 18:02
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by r937
i disagree strongly

VARCHAR is perfectly acceptable for the intended usage
But this model is not normalized then. I already see the questions: "how do I calculate the average weight?" or "what's the maximum length?" which will be very hard to answer with a VARCHAR based column that also mixes different units.

In my opinion all sizes should be stored in mm (converting input upon storage) and all weights in g or kg. There is a base unit for each "type" (length, weight, volume, etc) and that should be used. Then all those numbers are comparable.
Quote:
user enters 10mm or 10 mm or 10 millimeters to search on.
The input style of the unit should be handled by the front end not by the user. The user enters e.g. 10 and selects "kg" from a dropdown. The entered value is then converted to the base unit used in the table, and the comparison (e.g. greater than 100kg) can be achieved with a simple numeric comparsion. There is hardly ever a reason to use LIKE on values that are in fact numbers.

Quote:
measurement for each itme it would be very difficult to separate the unit of measurement type from the actual unit value.
Then simply don't store the unit. "Normalize" all values on a common base unit and you are all set.

I think storing numbers in VARCHAR columns is just as bad (and wrong) as storing DATEs in VARCHAR columns.

Just my 0.02€
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On