#1
12-27-06, 11:15
DeanC
Registered User
Optimal index size on VARCHAR - tricky...

Hi,

I have a huge InnoDB table with 40 million rows -- this is the current schema:

CREATE TABLE `titles` (
`id` mediumint(8) unsigned NOT NULL,
`title` varchar(255) collate utf8_bin NOT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`(25))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin PACK_KEYS=1

I'm trying to find the 'optimal' index size for the title column. The values in the title field are unique, but I would waste a lot of space by using a unique index on it, or by using title itself as the primary key.

Is there any formula to calculate the optimum index length for such a field? The content itself doesn't change and I know that the average length is 25 characters. This is the reason I made the index 25 characters long.

Another solution would be to make the index just 5 characters long, at the cost of scanning a couple of hundred rows to find the corresponding id for a given title.
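One way to put numbers on this trade-off is to measure, for each candidate prefix length, how many distinct prefixes exist relative to the row count. The sketch below is only an illustration on a few made-up titles; `prefix_stats` and `sample_titles` are hypothetical names, not anything from the schema. In MySQL the same ratio could be computed with `COUNT(DISTINCT LEFT(title, n)) / COUNT(*)`.

```python
from collections import Counter

def prefix_stats(titles, length):
    """Return (selectivity, average rows per distinct prefix) for one prefix length."""
    counts = Counter(t[:length] for t in titles)
    distinct = len(counts)
    total = len(titles)
    # selectivity close to 1.0 means almost every prefix identifies a single row
    return distinct / total, total / distinct

# Hypothetical sample data; a real check would use titles from the table.
sample_titles = [
    "alpha centauri", "alpha male", "alphabet soup",
    "beta test", "beta version", "gamma ray",
]

for n in (1, 5, 10):
    sel, avg = prefix_stats(sample_titles, n)
    print(f"prefix {n:>2}: selectivity={sel:.2f}, avg rows per prefix={avg:.2f}")
```

The second number is the average count of rows that share one prefix, which is roughly how many candidates have to be checked per lookup. Averages can hide a few very crowded prefixes, so the largest bucket is worth inspecting too.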

A totally different idea also came to mind: drop the title index, compute the MD5 value of title, and store and index only its first (or last) 4-6 bytes. Today's CPUs are fast enough to do this, and I would save some I/O and disk space.
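To get a feel for how many titles would end up sharing a truncated hash, here is a rough sketch. It assumes MD5 output is uniformly distributed and uses the standard estimate of expected colliding pairs, rows * (rows - 1) / 2 divided by the number of possible hash values. The function names are hypothetical.

```python
import hashlib

def md5_prefix(title, num_bytes):
    """First num_bytes of the MD5 digest of a UTF-8 title, as an unsigned integer."""
    digest = hashlib.md5(title.encode("utf-8")).digest()
    return int.from_bytes(digest[:num_bytes], "big")

def expected_colliding_pairs(rows, num_bytes):
    """Expected number of row pairs sharing a hash prefix, assuming uniform hashes."""
    buckets = 2 ** (8 * num_bytes)
    return rows * (rows - 1) / 2 / buckets

for num_bytes in (4, 5, 6):
    pairs = expected_colliding_pairs(40_000_000, num_bytes)
    print(f"{num_bytes} bytes: about {pairs:,.0f} colliding pairs expected")
```

Under these assumptions, 4 bytes over 40 million rows gives on the order of a couple of hundred thousand colliding pairs, while 6 bytes gives only a handful. Either way the lookup still has to compare the full title, because a matching hash prefix does not guarantee a matching title.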

I appreciate any hints and suggestions!
#2
12-27-06, 11:32
r937
SQL Consultant
how exactly is the index going to be used? with a LIKE predicate?

and why exactly do you have an id column? isn't it a bit superfluous?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
#3
12-27-06, 11:38
DeanC
Registered User
I don't use LIKE, just a plain equality lookup:

SELECT id FROM titles WHERE title = 'term'

If the term exists, I get the id for that row (for other tables) and if not, I insert a new record.

The id is needed for the other tables and joins. Right now there are 3 tables, each limited to about 16 million rows, which is why I'm using mediumint to keep things somewhat manageable.
#4
12-27-06, 12:46
r937
SQL Consultant
i think your idea of a narrow index makes sense, but i happen to be right out of 40 million row tables to test this on

so with an index on title(10) you can try this --

SELECT id FROM titles WHERE title LIKE concat(left('term',10),'%')

and then read through the results looking for equality on all 255 characters
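The two-step lookup described above (narrow down by prefix, then compare the whole title) can be sketched in memory as follows. This is only a model of the idea, not how MySQL stores its index; `build_prefix_index`, `find_id`, and the sample rows are hypothetical.

```python
from collections import defaultdict

PREFIX_LEN = 10  # mirrors the title(10) index in the example above

def build_prefix_index(rows, prefix_len=PREFIX_LEN):
    """Group (id, title) rows by the first prefix_len characters of the title."""
    index = defaultdict(list)
    for row_id, title in rows:
        index[title[:prefix_len]].append((row_id, title))
    return index

def find_id(index, term, prefix_len=PREFIX_LEN):
    """Fetch the candidates sharing the prefix, then compare the full title."""
    for row_id, title in index.get(term[:prefix_len], []):
        if title == term:
            return row_id
    return None

# Hypothetical rows; the first two share their first 10 characters.
rows = [(1, "alpha centauri a"), (2, "alpha centauri b"), (3, "beta test")]
index = build_prefix_index(rows)
print(find_id(index, "alpha centauri b"))
print(find_id(index, "alpha centauri c"))
```

The cost of one lookup grows with the number of rows that share the prefix, so a shorter index saves space but makes the second step longer.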
#5
12-27-06, 16:20
DeanC
Registered User
Yes, I know how to select and compare, but who can tell me what the 'optimum' index size is? Right now it is title(25), your example uses title(10), one extreme is title(255) or a unique index, and at the low end is title(4).

I'm just wondering if there is a magic formula to determine the right index length :-) (... since it is almost impossible to test this - the creation of each database takes hours...)
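Since rebuilding the full table for every candidate length is impractical, one option is to estimate from a random sample of titles instead. The sketch below picks the shortest prefix length that reaches a chosen selectivity on the sample. The target value, the sample size, and the generated titles are all assumptions for illustration. A sample tends to look more unique than the full table, so the result is best treated as a lower bound.

```python
import random

def shortest_prefix_for(titles, target_selectivity, max_len=255):
    """Smallest prefix length whose distinct-prefix ratio reaches the target."""
    total = len(titles)
    for n in range(1, max_len + 1):
        if len({t[:n] for t in titles}) / total >= target_selectivity:
            return n
    return max_len  # target not reachable within max_len characters

# Hypothetical stand-in for the real titles column.
all_titles = [f"title number {i}" for i in range(1000)]

random.seed(0)  # fixed seed so the sketch is repeatable
sample = random.sample(all_titles, 200)

for target in (0.5, 0.9, 1.0):
    print(target, shortest_prefix_for(sample, target))
```

Running it for a few targets shows how quickly the required length grows as the target approaches 1.0, which is the point where extra index bytes stop buying much.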