  1. #1
    Join Date
    Jul 2009
    Posts
    6

    Unanswered: Terrible performance in a join?

    I have a system that logs data from two different types of devices (we'll call them A and B). A logs data every 5 minutes, B logs every 15 minutes.
    Once a month, the data from these two devices is used in a calculation: each data value from A is multiplied by the corresponding value from B.
    However, since the timestamps are different, there are three data rows from A for each data row from B.

    This is what the tables look like:
    A
    -
    id int not null (PK)
    name varchar(30)
    B_id int not null (indexed)

    Adata
    -----
    A_id int not null (PK)
    logdatetime timestamp not null (PK)
    data decimal(7,3)

    B
    -
    id int not null (PK)
    name varchar(30)

    Bdata
    -----
    B_id int not null (PK)
    logdatetime timestamp not null (PK)
    data decimal(7,3)


    The joins are below in the first reply.

    Both take a long time, although we're talking about only 8640 rows from Adata and 2880 from Bdata.
    Even when fetching just 3 days (instead of a whole month), it takes 50 seconds!

    This is all running on MySQL Cluster 7.0.6 on 2 data nodes and 2 MySQL nodes.

    Does anyone have a clue what I'm doing wrong here?
    Last edited by therunner; 07-18-09 at 18:41.

  2. #2
    Join Date
    Jul 2009
    Posts
    6
    Formatted it a bit better:

    Code:
    select
        Adata.logdatetime,
        Adata.data,
        Bdata.data
    from
        A join Adata
            on A.id = Adata.A_id
        left join Bdata
            on A.B_id = Bdata.B_id
            and
            floor(Adata.logdatetime / 900) * 900 = Bdata.logdatetime
    where
        A.id = 280
        and
        Adata.logdatetime between "2009-06-01 00:00:00" and "2009-06-30 23:59:59"
    Another one I tried:
    Code:
    select
        Adata.logdatetime,
        Adata.data,
        Bdata.data
    from
        A join Adata
            on A.id = Adata.A_id
        left join Bdata
            on A.B_id = Bdata.B_id
            and
            Adata.logdatetime between Bdata.logdatetime and addtime(Bdata.logdatetime, "00:15:00")
    where
        A.id = 280
        and
        Adata.logdatetime between "2009-06-01 00:00:00" and "2009-06-30 23:59:59"

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by therunner
    Does anyone have a clue what I'm doing wrong here?
    To be fair, I haven't a clue what you're trying to do. I attempted to read your first post to see what you're after but gave up; it would have helped if you'd described what you wanted in simple English rather than pseudo code, pseudo SQL or pseudo English. However, I doubt MySQL would use any index on logdatetime, as you're applying a function to the field rather than comparing its actual value. I doubt that's the only issue here, though.
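    This effect is easy to demonstrate. A sketch below uses SQLite as a stand-in for MySQL Cluster, with an invented table: a function wrapped around an indexed column forces a full scan, while an equivalent range predicate on the bare column lets the optimiser use the index.

```python
# Sketch using SQLite as a stand-in for MySQL (the adata table and its data
# are invented): a function applied to an indexed column forces a full scan,
# while an equivalent range test on the bare column can use the index.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE adata (ts INTEGER, data REAL)")
con.execute("CREATE INDEX idx_ts ON adata (ts)")
con.executemany("INSERT INTO adata VALUES (?, ?)",
                [(i * 300, 1.0) for i in range(100)])

def plan(sql):
    # Concatenate the query-plan detail strings for a statement.
    return " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Function on the column: the optimiser cannot use idx_ts.
scan = plan("SELECT * FROM adata WHERE ts / 900 * 900 = 900")
# Bare column with a range: idx_ts is usable.
seek = plan("SELECT * FROM adata WHERE ts BETWEEN 900 AND 1799")
print(scan)   # a full-table SCAN
print(seek)   # a SEARCH using the index
```

    The same principle applies in MySQL: rewriting the `floor(logdatetime / 900) * 900 = ...` predicate as a range test on the bare logdatetime column at least gives the optimiser a chance to use an index.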

    Mike

  4. #4
    Join Date
    Jul 2009
    Posts
    6
    That's the first time I've heard someone call my English pseudo-English. But anyway... here's an attempt to explain things in 'simple' English:

    There are two types of physical devices (pieces of hardware):
    - Devices of type A: they log data every 5 minutes
    - Devices of type B: they log different data every 15 minutes
    There's nothing I can do to change that...

    Every month, the customer wants to see a report. This report needs to list the data captured from both devices.
    The data from device A needs to be multiplied by the data from device B, but since they don't log at the same times, this is a problem.
    Therefore, I want to get this:
    Code:
    A      B
    00:00 00:00
    05:00 00:00
    10:00 00:00
    15:00 15:00
    20:00 15:00
    25:00 15:00
    30:00 30:00
    So what I want to do is link every data row from B with the three data rows from A that fall within its 15-minute window. That way, I can multiply the data from A with the data from B.

    I haven't found any other solution (other than storing the B data rows 3 times, but that's just plain silly).

    I hope this explains what I'm trying to do a bit better. Any help would be much appreciated !
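    To pin down the arithmetic behind the table above: flooring each A timestamp (in seconds) to its 900-second bucket yields the matching B timestamp, assuming B logs exactly on 15-minute boundaries. A minimal sketch:

```python
# Minimal sketch of the intended alignment (assumes B logs exactly on
# 15-minute boundaries): floor each A timestamp, in seconds, to the start
# of its 900-second bucket, which is the matching B timestamp.

def bucket_15min(epoch_seconds: int) -> int:
    """Floor a timestamp (in seconds) to the start of its 15-minute bucket."""
    return epoch_seconds // 900 * 900

# A logs every 300 s (5 min); offsets 00:00 through 30:00 as in the table.
a_times = [0, 300, 600, 900, 1200, 1500, 1800]
pairs = [(t, bucket_15min(t)) for t in a_times]
print(pairs)
# → [(0, 0), (300, 0), (600, 0), (900, 900), (1200, 900),
#    (1500, 900), (1800, 1800)]
```

    Each B row ends up paired with exactly three A rows, which is the 3:1 ratio described above.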

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Not all versions of MySQL will support this code because of the sub-select within it, but my first attempt would be:
    Code:
    SELECT *
       FROM A AS a
       JOIN AData AS ad
          ON (ad.a_id = a.id)
       JOIN BData AS bd
          ON (bd.b_id = a.b_id
          AND bd.logdatetime = (SELECT Max(x1.logdatetime)
             FROM AData AS x1
             WHERE  x1.a_id = a.id
                AND x1.logdatetime <= bd.logdatetime))
       WHERE  280 = a.id
          AND ad.logdatetime BETWEEN "2009-06-01 00:00:00" AND "2009-06-30 23:59:59"
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I should have said simple English. Hopefully Pat's solution works for you but, if not, can you post the CREATE TABLE DDL with indexes and we'll have another look.

  7. #7
    Join Date
    Jul 2009
    Posts
    6
    I'm afraid that query doesn't work well. I stopped it after it had run for over 20 minutes.

    As requested, the DDL is below. Note that 'A' is in fact called 'channel' and 'B' is called 'pb':
    Code:
    CREATE TABLE `channel` (
      `channel_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `channel_name` varchar(25) DEFAULT NULL,
      `pb_id` int(11) DEFAULT NULL,
      `channel_lastdownload` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`channel_id`),
      KEY `channel_pb_id` (`pb_id`)
    ) ENGINE=ndbcluster DEFAULT CHARSET=latin1;
    
    CREATE TABLE `channeldata` (
      `channeldata_datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `channel_id` int(10) unsigned NOT NULL,
      `channeldata_data` decimal(6,3) DEFAULT NULL,
      PRIMARY KEY (`channeldata_datetime`,`channel_id`),
      KEY `channeldata_FKIndex1` (`channel_id`)
    ) ENGINE=ndbcluster DEFAULT CHARSET=latin1;
    
    CREATE TABLE `pb` (
      `pb_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `pb_name` varchar(25) DEFAULT NULL,
      `pb_lastdownload` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`pb_id`)
    ) ENGINE=ndbcluster DEFAULT CHARSET=latin1;
    
    CREATE TABLE `pbdata` (
      `pbdata_datetime` datetime NOT NULL,
      `pb_id` int(10) unsigned NOT NULL,
      `pbdata_data` decimal(7,3) DEFAULT NULL,
      PRIMARY KEY (`pbdata_datetime`,`pb_id`),
      KEY `pbdata_FKIndex1` (`pb_id`)
    ) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

  8. #8
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I'm afraid I have to nip out this evening, but I think I can see the first issue. The field pb_id is defined differently in the two tables - this will upset the optimiser in most RDBMSs.
    • pb_id int(11) DEFAULT NULL
    • pb_id int(10) unsigned NOT NULL

    Also the pbdata_datetime field is defined as a datetime and not as a timestamp. I'd certainly start by standardising the definition of fields across tables.

  9. #9
    Join Date
    Jul 2009
    Posts
    6
    I changed pb_id to unsigned int(10) everywhere. Also changed to timestamp.
    Makes no difference in performance though...

  10. #10
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by therunner
    I changed pb_id to unsigned int(10) everywhere. Also changed to timestamp.
    Makes no difference in performance though...
    The solutions already given look correct but as you've already noted the join is causing you issues. Can you create some intermediary tables that allow a proper join and then run your report on these tables?

    If this doesn't improve things then, seeing as the customer only wants to see the report once a month, why not just run the report overnight (perhaps off a snapshot of the live tables)?
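    The intermediary-table idea might look something like the sketch below (again SQLite as a stand-in; table names and data are invented): precompute the 15-minute bucket for each A reading once, and the report then becomes a plain indexed equi-join.

```python
# Sketch of the intermediary-table idea (SQLite stand-in; table names and
# data are invented). The bucket column is computed once, so the report
# query joins on plain equality instead of an expression.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE adata (ts INTEGER, data REAL)")
con.execute("CREATE TABLE bdata (ts INTEGER, data REAL)")
con.executemany("INSERT INTO adata VALUES (?, ?)",
                [(t, 2.0) for t in range(0, 2700, 300)])   # every 5 min
con.executemany("INSERT INTO bdata VALUES (?, ?)",
                [(t, 3.0) for t in range(0, 2700, 900)])   # every 15 min

# Build the intermediary table once (e.g. in an overnight batch).
con.execute("CREATE TABLE adata_bucketed AS "
            "SELECT ts, ts / 900 * 900 AS bucket, data FROM adata")
con.execute("CREATE INDEX idx_bucket ON adata_bucketed (bucket)")

# The monthly report is now a simple indexed equi-join.
rows = con.execute(
    "SELECT a.ts, a.data * b.data "
    "FROM adata_bucketed a JOIN bdata b ON b.ts = a.bucket "
    "ORDER BY a.ts").fetchall()
print(rows[:3])   # → [(0, 6.0), (300, 6.0), (600, 6.0)]
```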

  11. #11
    Join Date
    Jul 2009
    Posts
    6
    In fact, I think Pat's join isn't correct.
    I don't see the join of data from channeldata and pbdata working at all... there's no reference to the 15-minute offset between the data?

  12. #12
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    be very very careful about which date datatype you choose.
    timestamp isn't the right one in my books.. you want a datetime datatype.
    using timestamp may well lead to data corruption......
    I'd rather be riding on the Tiger 800 or the Norton

  13. #13
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by therunner
    In fact, I think Pat's join isn't correct.
    I don't see the join of data from channeldata and pbdata working at all... there's no reference to the 15-minute offset between the data?
    I guess you have the advantage of being able to run the SQL against some real tables, whereas we have to try to just picture the results. We're not particularly helped by your initial description of the problem; then we have one set of table/field names in your SQL and a totally different set in your DDL. And sadly neither set of names makes any sense to me. So you can't expect instant answers to your problem.

    You don't need to specify the 15-minute gap; in fact, hard-coding it into the SQL is probably a mistake. It's normally better to simply look for values between one timestamp and the next greatest timestamp. Also, as healdem says, if the value you're getting is a datetime value then you should really be storing it as a datetime, rather than declaring it as a timestamp and then treating it as an integer one moment and a datetime the next.

  14. #14
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    This is my attempt at the code. I've split it into a few subselects to try to give the optimiser a fair chance at working out what it should do first. I don't use the more modern JOIN syntax, but that's simply because I'm ancient.
    Code:
    select channel.channel_name, pb.pb_name, timedata.channeldata_datetime,
            timedata.pbdata_data, timedata.channeldata_data
    from  ( select channeldata.channeldata_datetime, channeldata.channel_id,
                   channeldata.channeldata_data, t1.pb_id, t1.pbdata_data
            from   ( select pbdata.pbdata_datetime as from_time,
                            ( select min(b2.pbdata_datetime)
                              from   pbdata b2
                              where  b2.pb_id = pbdata.pb_id
                                     and b2.pbdata_datetime > pbdata.pbdata_datetime
                                     ) as to_time,
                            pbdata.pb_id, pbdata.pbdata_data
                     from   pbdata
                     where  pbdata.pbdata_datetime between "2009-06-01 00:00:00"
                                                       and "2009-06-30 23:59:59"
                            ) t1,
                    channeldata
            where   channeldata.channel_id = 280
                    and channeldata.channeldata_datetime >= t1.from_time
                    and channeldata.channeldata_datetime < t1.to_time
                    ) timedata,
          channel,
          pb
    where channel.channel_id = timedata.channel_id
          and pb.pb_id = timedata.pb_id
    No doubt I've screwed up on the field names a few times; you can only type channeldata.channeldata_data so many times before you start to lose it. Sadly, after typing all the field names above I've run out of energy to go on about how to name fields. I guess if you're happy to type channeldata.channeldata_datetime each time then all's fine - that's presuming you remember it's not actually a datetime.

    Mike

  15. #15
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Going back to your first shot at the SQL, how about something like:
    Code:
    select
        Adata.logdatetime,
        Adata.data,
        Bdata.data
    from
        A join Adata
            on A.id = Adata.A_id
            and Adata.logdatetime between "2009-06-01 00:00:00" and "2009-06-30 23:59:59"
        left join Bdata
            on A.B_id = Bdata.B_id
            and
            Adata.logdatetime between Bdata.logdatetime
                                  and Bdata.logdatetime + interval 14 minute
    where
        A.id = 280
