Results 1 to 6 of 6
  1. #1
    Join Date
    Aug 2003
    Location
    Reston, VA
    Posts
    59

    Unanswered: MySQL Column Indexes

    Hi All,

    A couple of quick questions....

    1.) I am reading from a data source then writing the contents of that source to a table (or tables in my DB). As part of the process I need to check that the DB to make sure the record does not exists.

    Should I .....

    a.) Perform a SELECT to check for existance then an INSERT if the record does not exist.

    or....

    b.) simply put a UNIQUE INDEX on the field that represents the records unique value and just perform an INSERT...the record would simply would not update.......seems to save time

    Also, I have several tables where I could end up querying with a WHERE clause on almost any value. What are the PROS/CONS of adding many indexes? I've seen great performance increase through indexing.....now I have a number of additional queries that I'd like to do on indexed fields.

    Thoughts are appreciated.....

    Many Thanks!!!!!

  2. #2
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1

    Re: MySQL Column Indexes

    Originally posted by DrSmartman
    Hi All,

    A couple of quick questions....

    1.) I am reading from a data source then writing the contents of that source to a table (or tables in my DB). As part of the process I need to check that the DB to make sure the record does not exists.

    Should I .....

    a.) Perform a SELECT to check for existance then an INSERT if the record does not exist.

    or....

    b.) simply put a UNIQUE INDEX on the field that represents the records unique value and just perform an INSERT...the record would simply would not update.......seems to save time

    Also, I have several tables where I could end up querying with a WHERE clause on almost any value. What are the PROS/CONS of adding many indexes? I've seen great performance increase through indexing.....now I have a number of additional queries that I'd like to do on indexed fields.

    Thoughts are appreciated.....

    Many Thanks!!!!!
    Use method B.

    Many indexes slow the UPDATEs and INSERTs. But if most of what you do is SELECTs and few INSERTs/UPDATEs, then it could be worth it. On the other hand, if you spend most of your time updating and inserting, it could affect performance. As always, it also depends on how many records you have in those tables, how big are your indexes and how much RAM you have on the server.

  3. #3
    Join Date
    Aug 2003
    Location
    Reston, VA
    Posts
    59

    MySQL Indexes

    Thanks for the response.....

    Well, the DB is both INSERT/UPDATE and SELECT intensive. Let me attempt to explain....

    1.) I am doing a conversion process of one DB design to another (very, very different). In addition to the conversion, we have a process to ingest new objects into our database. This works by our users putting files/folders into our file system. We 'ingest' these new objects by running a script that....a.) checks for existance of the object entry in the DB (SELECT) then b.) INSERTs the new object into the DB (assuming the object did not exist).

    2.) The system is a research tool that will, more often be queried for specific objects (SELECTS).

    INSERTS and UPDATES will occur daily though not in heavy volume....however, they will be in heavy volume when the 'ingest' script is run. In the long run, I'm thinking the end-users (SELECT) are more important than the ingest process/batch job (INSERT) in the long run....however, any tips on efficiency are appreciated.....

    Below are the fields in question for the OBJECTS TABLE which, along with an EVENTS (tracks ingest, changes, notes, et. al.) table, are the core of the system in terms of use and volume. We are looking at over 100k OBJECTS and double, triple or more events.

    OBJECT_ID (PK)
    OBJECT_TITLE (FULLTEXT)
    OBJECT_DATE_ENTERED (?)
    OBJECT_DESC (FULLTEXT)
    OBJECT_RAW_PATH (UNIQUE INDEX)
    OBJECT_POSTING (?)
    OBJECT_COPYRIGHT (?)
    OBJECT_SEARCHABLE_TEXT (FULLTEXT)
    MEDIA_TYPE_ID (?)
    STATUS_ID (?)
    CONSENT_ID (?)
    SOURCE_ID (?)

    Okay....so a couple of follow-up questions....

    1.) Do I need to do a regular index on the FULLTEXT fields? They'll mainly be used for FULLTEXT searches and rarely, if never, for a WHERE search.

    2.) The last four columns of my table act essential (although referential integrity is not supported in MySQL) as foreign keys. It is very likely that a search could be broken down by these in a WHERE clause. Should I add INDEXES?

    3.) Lastly, OBJECT_POSTING/COPYRIGHT (enum value) will definately be used with WHERE. Add an index? Also, OBJECT_DATE_ENTERED?

    I guess I'm just wondering at what point to indexes become a bad thing? I know I'll have to weight the cost benefit for this system (I think I did....SELECT speed more important than INSERT/UPDATE)....but when, if ever, do SELECTs slow due to indexes?

    Anyway, I'm babbling (badly)....thanks for the comments. This forum is a great exchange and learning tool. I appreciate the expert opinion(s).

  4. #4
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1

    Re: MySQL Indexes

    Originally posted by DrSmartman
    Thanks for the response.....

    Well, the DB is both INSERT/UPDATE and SELECT intensive. Let me attempt to explain....

    1.) I am doing a conversion process of one DB design to another (very, very different). In addition to the conversion, we have a process to ingest new objects into our database. This works by our users putting files/folders into our file system. We 'ingest' these new objects by running a script that....a.) checks for existance of the object entry in the DB (SELECT) then b.) INSERTs the new object into the DB (assuming the object did not exist).

    2.) The system is a research tool that will, more often be queried for specific objects (SELECTS).

    INSERTS and UPDATES will occur daily though not in heavy volume....however, they will be in heavy volume when the 'ingest' script is run. In the long run, I'm thinking the end-users (SELECT) are more important than the ingest process/batch job (INSERT) in the long run....however, any tips on efficiency are appreciated.....

    Below are the fields in question for the OBJECTS TABLE which, along with an EVENTS (tracks ingest, changes, notes, et. al.) table, are the core of the system in terms of use and volume. We are looking at over 100k OBJECTS and double, triple or more events.

    OBJECT_ID (PK)
    OBJECT_TITLE (FULLTEXT)
    OBJECT_DATE_ENTERED (?)
    OBJECT_DESC (FULLTEXT)
    OBJECT_RAW_PATH (UNIQUE INDEX)
    OBJECT_POSTING (?)
    OBJECT_COPYRIGHT (?)
    OBJECT_SEARCHABLE_TEXT (FULLTEXT)
    MEDIA_TYPE_ID (?)
    STATUS_ID (?)
    CONSENT_ID (?)
    SOURCE_ID (?)

    Okay....so a couple of follow-up questions....

    1.) Do I need to do a regular index on the FULLTEXT fields? They'll mainly be used for FULLTEXT searches and rarely, if never, for a WHERE search.

    2.) The last four columns of my table act essential (although referential integrity is not supported in MySQL) as foreign keys. It is very likely that a search could be broken down by these in a WHERE clause. Should I add INDEXES?

    3.) Lastly, OBJECT_POSTING/COPYRIGHT (enum value) will definately be used with WHERE. Add an index? Also, OBJECT_DATE_ENTERED?

    I guess I'm just wondering at what point to indexes become a bad thing? I know I'll have to weight the cost benefit for this system (I think I did....SELECT speed more important than INSERT/UPDATE)....but when, if ever, do SELECTs slow due to indexes?

    Anyway, I'm babbling (badly)....thanks for the comments. This forum is a great exchange and learning tool. I appreciate the expert opinion(s).
    Well, you could also import your data into a "workzone" type of table. From there you could just query and DELETE rows that are not needed or present into the destination table unsing plain SQL (JOIN or NOT IN with sub-select) instead of checking existence for every single record. That way you won't slow down the "real" table with tons of requests to check existence of records while you import.

    You could also format your files and use mysqlimport to load the data. This tool is usually pretty fast.

    You'll need to create a FULLTEXT index for the FULLTEXT fields.
    Create indexes for the MEDIA_TYPE_ID, STATUS_ID, CONSENT_ID and SOURCE_ID. I don't know the distribution of your data in those columns but try to create indexes on columns with good selectivity. Creating an index on a columns whose values are always 'Y' and 'N' doesn't do any good!

    I'd say start with the minimum and add indexes as you need them. Get a feel of how your application responds in terms of searches and IMPORTS to detect where it could be worth adding/dropping an index. Usually, 20% of the queries do 80% of the job...

    You could also generate fake data to stress test the system and experiment. Generate a lot more than what you'll actually be dealing with. That way, you won't have surprises when your database will be filled with 5 years worth of data! And with lots of data, you'll be able to track whcih index is useful and which aren't... which sometimes doesn't show up as clearly when you don't have enough volume to test with. For instance, I never had any problem using a COUNT(*) with InnoDb because my tables were so small... The day I tested my app with a 60 million records table, COUNT(*) was taking ** 9 minutes ** to return... If I hadn't test my app with that volume, I would have never noticed the problem!

    Here are some links related to FULLTEXT that might be useful:
    http://www.mysql.com/doc/en/Fulltext_Search.html
    http://www.mysql.com/doc/en/Fulltext_Fine-tuning.html

    Use --log-slow-queries to detect queries that might be optimized. See:
    http://www.mysql.com/doc/en/Slow_query_log.html

    As I said, add indexes one at a time to see the effect on the whole system. And try to add indexes on columns with selectivity. Columns where selectivity is at least 80% (meaning 80% of the values of that column are unique) otherwise MySQL might do a full table scan anyway... If those columns are FK, then add an index anyway... Might help whenever you have joins...

    Carefully read the server variables (e.g. buffer key size, etc). Those things can make MySQL crawl or fly...

    Your best bet? Generate data and experiment. Every app is different, every database has it's "features" and I don't believe in "one solution fits all". And it won't be time to experiment the day you'll have problems on a production server...

    Hope this helps!

  5. #5
    Join Date
    Aug 2003
    Location
    Reston, VA
    Posts
    59

    MySQL Indexes con't

    bstjean,

    thanks so much for taking the time to lend some comments and advice.

    1.) unfortunately, i do not have sub-queries in my implementation of MySQL (3.23.x). So, I don't think I can run the NOT IN query you mentioned.

    If you were to try to INSERT something ONLY if it did NOT exist, what would the query look like? At least, in the way you propose....

    I'm thinking...'SELECT * FROM table WHERE table_id NOT IN (SELECT * FROM TABLE WHERE table_id)

    I'm confusing myself here....also, how to put a WHERE sub-query in an INSERT?

    2.) Point taken about testing procedure. You're right...strategy can you take you far, but good old hands on testing with test data is the best way. We are over two months from deployment/conversion, so this is an integral part of what we're doing. I'm probably going to code the application, then tweak the queries for each user function as needed.

    3.) I'm thinking of moving to PostgreSQL to handle some advanced functionality. I actually just posted in that forum. Although, the topic of index is generic for DBs...

    4.) I will index all FKs. It will be a standard way of breaking down records. Also, why would a 'yes/no' field be bad to index?

    5.) The part about the server variable tweak is a good point and will be checked out.\

    6.) Sincerest thanks for the help. I've worked DBs before, but am getting invloved in more decision-making, configuration, design and application development that I have before. Yourself and the rest of the user community are such an integral part of gaining the necesary skills.....Have a good one!

  6. #6
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1

    Re: MySQL Indexes con't

    Originally posted by DrSmartman
    bstjean,

    thanks so much for taking the time to lend some comments and advice.

    1.) unfortunately, i do not have sub-queries in my implementation of MySQL (3.23.x). So, I don't think I can run the NOT IN query you mentioned.

    If you were to try to INSERT something ONLY if it did NOT exist, what would the query look like? At least, in the way you propose....

    I'm thinking...'SELECT * FROM table WHERE table_id NOT IN (SELECT * FROM TABLE WHERE table_id)

    I'm confusing myself here....also, how to put a WHERE sub-query in an INSERT?

    2.) Point taken about testing procedure. You're right...strategy can you take you far, but good old hands on testing with test data is the best way. We are over two months from deployment/conversion, so this is an integral part of what we're doing. I'm probably going to code the application, then tweak the queries for each user function as needed.

    3.) I'm thinking of moving to PostgreSQL to handle some advanced functionality. I actually just posted in that forum. Although, the topic of index is generic for DBs...

    4.) I will index all FKs. It will be a standard way of breaking down records. Also, why would a 'yes/no' field be bad to index?

    5.) The part about the server variable tweak is a good point and will be checked out.\

    6.) Sincerest thanks for the help. I've worked DBs before, but am getting invloved in more decision-making, configuration, design and application development that I have before. Yourself and the rest of the user community are such an integral part of gaining the necesary skills.....Have a good one!
    1) Migrating to MySQL 4.x is really not a problem. Just use mysqldump to export your database, install MySQL 4.x and import... Seriously, works like a charm!

    To insert in a table and check for non-existence, I'd do just like you wrote:

    SELECT *
    FROM work_table
    WHERE condition1 AND
    condition2 AND
    condition3 AND
    table_id NOT IN (
    SELECT table_id FROM destination_table WHERE condition4)

    Having a real example of your purge and import process would definitely help here!

    It could also be done via LEFT/RIGHT joins...

    2) Or just let the log-queries-log do the job for you... Let the app run and have a look at the log. Once you've made changes to the indexes/tables, delete the log. Goto step 2 and repeat as needed!

    3) There's not much that PostGreSQL can do and MySQL can't! No need for that! What "advanced" functionnalities do you need?

    4) Yes/No index are no useful as index because the are not selective enough. For example, if you have 400000 rows with the column set to 'Y' and 600000 to 'N', it's faster for MySQL to do a full table scan than to seek 400000 times for the value 'Y' while using the index. When the index doesn't limit the result set to less than 30% of the rows in a table, MySQL prefers to do a table scan rather than lookup-and-seek with the index... It's usually *way* faster to do a table scan in that case. This is an optimization on almsot all RDBMS.

    6) MySQL has a few great mailing lists and pretty good support on IRC channels. Lots of documentation on MySQL and associated tools/languages and a very helpful community. Compared to PostGRESQL, MySQL community is way more active, supportive and quick to respond than the PostGRESQL community.

    By the way, I'm a DB/2 fan and I am not working for or related to MySQL in any way! I just think it's a great product/community!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •