  1. #1
    Join Date
    Apr 2016
    Posts
    1

    Unanswered: Very slow performance processing 40M+ records MySQL tables

    Hi guys,
    I am processing over 40M records on a MySQL database. The scenario is as follows:

    Given 2 tables with same structure containing over 40M price info records:

    Table 1

    product_id  price  date
    101         5.7    2016/1/1
    102         11.6   2016/1/1
    104         8      2016/1/1
    …           …      …

    Table 2

    product_id  price  date
    101         5.9    2016/1/2
    103         20.3   2016/1/2
    104         8      2016/1/2
    …           …      …

    I want to find out how many product_ids exist in both tables, and I'm using the queries below:

    SELECT count(*) FROM t1 a, t2 b WHERE a.product_id = b.product_id;

    SELECT count(*) FROM t1 a JOIN t2 b ON a.product_id = b.product_id;

    It takes over half an hour to get the results. Is there any way to improve the performance?

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Is product_id indexed in both tables?
    I don't know if it will make a difference, but if all you want is the count of common product IDs, then narrowing the columns counted may make sense.

    Code:
    select count(a.product_id), a.product_id from t1 as a
    join t2 as b on a.product_id = b.product_id
    group by a.product_id
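    The GROUP BY may be optional if you don't need unique product_ids or if product_id is already unique. If you drop it, also drop a.product_id from the select list, since MySQL rejects an aggregate mixed with a non-aggregated column when ONLY_FULL_GROUP_BY is enabled.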
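    If a single number is all that's needed, one option is to count distinct IDs across the join instead of grouping. This is only a sketch, assuming the same t1/t2 tables and that a product_id may appear more than once in either table; common_ids is just an illustrative alias.

    ```sql
    -- Returns one row: the number of distinct product_ids present in both tables.
    SELECT COUNT(DISTINCT a.product_id) AS common_ids
    FROM t1 AS a
    JOIN t2 AS b ON a.product_id = b.product_id;
    ```

    Unlike a plain COUNT(*) over the join, this does not count the same product_id once per matching pair of rows.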
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    I would think an EXISTS subquery might perform slightly better. I say this because, without knowing your unique constraints, I am guessing a particular product_id could appear in both tables multiple times. Also, you only want a count of the product IDs in one table that also exist in the other, not how many times each one matches across both.
    So something along the lines of:
    Code:
    SELECT count(*) FROM t1 a
    where exists (select 1 from t2 b
                       where a.product_id=b.product_id) ;
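    To see why the two forms can give different numbers, here is a small sketch using the sample rows from post #1. The column types are assumptions, and the fourth t2 row is a hypothetical duplicate added only to show the effect.

    ```sql
    -- Column types are assumed; the original schema was not posted.
    CREATE TABLE t1 (product_id INT, price DECIMAL(10,2), `date` DATE);
    CREATE TABLE t2 (product_id INT, price DECIMAL(10,2), `date` DATE);

    INSERT INTO t1 VALUES (101, 5.7, '2016-01-01'),
                          (102, 11.6, '2016-01-01'),
                          (104, 8, '2016-01-01');
    INSERT INTO t2 VALUES (101, 5.9, '2016-01-02'),
                          (103, 20.3, '2016-01-02'),
                          (104, 8, '2016-01-02'),
                          (101, 6.0, '2016-01-03');  -- hypothetical duplicate product_id

    -- Join: counts matching row pairs, so 101 is counted twice. Result: 3
    SELECT COUNT(*) FROM t1 a JOIN t2 b ON a.product_id = b.product_id;

    -- EXISTS: counts each t1 row at most once. Result: 2
    SELECT COUNT(*) FROM t1 a
    WHERE EXISTS (SELECT 1 FROM t2 b WHERE b.product_id = a.product_id);
    ```

    Note that the EXISTS form still counts t1 rows, so if t1 itself repeats a product_id, COUNT(DISTINCT a.product_id) would be needed to get unique IDs.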
    Dave

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I would also index the product_id column on the table with fewer rows. If you don't have an index on the column, adding it should help performance quite a bit.
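    As a rough sketch, adding the indexes and checking that the optimizer uses them might look like this. The index names are made up for illustration, both tables are indexed here although indexing only one is also an option, and building an index on a table of this size can itself take a while.

    ```sql
    -- Hypothetical index names; adjust to your own naming convention.
    CREATE INDEX idx_t1_product_id ON t1 (product_id);
    CREATE INDEX idx_t2_product_id ON t2 (product_id);

    -- Ask MySQL how it plans to run the query, without executing it.
    EXPLAIN
    SELECT COUNT(*)
    FROM t1 AS a
    WHERE EXISTS (SELECT 1 FROM t2 AS b WHERE b.product_id = a.product_id);
    ```

    In the EXPLAIN output, the key column should name one of the new indexes if the optimizer picks it up.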

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.
