| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |
|

03-18-04, 12:32
|
|
Registered User
|
|
Join Date: Mar 2004
Location: Eindhoven, Netherlands
Posts: 8
|
|
|
Performance issue retrieving latest data
|
|
Hi,
I've got this table:
CREATE TABLE Telemetry (
TagID INT NOT NULL,
DataType INT NOT NULL REFERENCES DataTypes,
SeqNr INT AUTO_INCREMENT,
Value FLOAT,
DateIn DATETIME NOT NULL,
PRIMARY KEY (TagId, DataType, SeqNr)
);
It contains telemetry data from tags that can deliver multiple kinds of data. Each time new data is received, the data is stored in this table.
What I want is pretty simple: from all telemetry data series (tagid/datatype tuples) I want to see the latest measurement. I've succeeded using a single query but at a terrible speed. Not fast enough for user-interactivity. The performance really drops as the series get larger.
I've tried LEFT JOIN queries in MySQL 4.0 and sub-select queries in MySQL 4.1-alpha (can't wait for it to be stable!)
The sub-select is so much easier to read than the JOIN queries...:
SELECT t1.tagid, t1.datatype, t1.value
FROM Telemetry t1
WHERE t1.seqnr = (
SELECT MAX(t2.seqnr)
FROM Telemetry t2
WHERE t1.tagid=t2.tagid
AND t1.datatype=t2.datatype
);
Is there a way to optimize for performance? Right now I'm using a workaround by iterating over all tuples "manually" and each time executing a query that sorts on the SeqNr descending and limiting the output to 1 row. To me, this seems ugly but it works.
However I've got more complex queries coming up where I've got to do combine multiple of these series. Doing multiple queries "manually" would result in a lot of queries (not linear with the number of series that must be combined). So really need a better solution! Help!
As a human, it's easy to see that the primary key can be used to find the entries that I'm looking for. Do a 'group by tagid, datatype' and take the one with the highest seqnr. Using that intelligence would make the query's performance linear with the number of series. Is there a way to make MySQL see things the way I do?
Thanx!
Mark
PS. This seems to me as a very common problem that other people must have dealt with. Are there something like design/implementation patterns for databases?
|
|

03-18-04, 16:50
|
|
Registered User
|
|
Join Date: Oct 2003
Location: Slovakia
Posts: 482
|
|
|
Re: Performance issue retrieving latest data
Quote:
Originally posted by markoez
Hi,
I've got this table:
CREATE TABLE Telemetry (
TagID INT NOT NULL,
DataType INT NOT NULL REFERENCES DataTypes,
SeqNr INT AUTO_INCREMENT,
Value FLOAT,
DateIn DATETIME NOT NULL,
PRIMARY KEY (TagId, DataType, SeqNr)
);
It contains telemetry data from tags that can deliver multiple kinds of data. Each time new data is received, the data is stored in this table.
What I want is pretty simple: from all telemetry data series (tagid/datatype tuples) I want to see the latest measurement. I've succeeded using a single query but at a terrible speed. Not fast enough for user-interactivity. The performance really drops as the series get larger.
I've tried LEFT JOIN queries in MySQL 4.0 and sub-select queries in MySQL 4.1-alpha (can't wait for it to be stable!)
The sub-select is so much easier to read than the JOIN queries...:
SELECT t1.tagid, t1.datatype, t1.value
FROM Telemetry t1
WHERE t1.seqnr = (
SELECT MAX(t2.seqnr)
FROM Telemetry t2
WHERE t1.tagid=t2.tagid
AND t1.datatype=t2.datatype
);
Is there a way to optimize for performance? Right now I'm using a workaround by iterating over all tuples "manually" and each time executing a query that sorts on the SeqNr descending and limiting the output to 1 row. To me, this seems ugly but it works.
However I've got more complex queries coming up where I've got to do combine multiple of these series. Doing multiple queries "manually" would result in a lot of queries (not linear with the number of series that must be combined). So really need a better solution! Help!
As a human, it's easy to see that the primary key can be used to find the entries that I'm looking for. Do a 'group by tagid, datatype' and take the one with the highest seqnr. Using that intelligence would make the query's performance linear with the number of series. Is there a way to make MySQL see things the way I do?
Thanx!
Mark
PS. This seems to me as a very common problem that other people must have dealt with. Are there something like design/implementation patterns for databases?
|
"The sub-select is so much easier to read than the JOIN queries...:" - Yes but also could be slower...
What the explain plan says?
|
Last edited by ika; 03-18-04 at 16:52.
|

03-19-04, 03:31
|
|
Registered User
|
|
Join Date: Mar 2004
Location: Eindhoven, Netherlands
Posts: 8
|
|
|
Re: Performance issue retrieving latest data
|
|
Quote:
Originally posted by ika
"The sub-select is so much easier to read than the JOIN queries...:" - Yes but also could be slower...
What the explain plan says?
|
I'm sorry, I don't understand your question. What do you mean with "explain plan"?
|
|

03-19-04, 14:14
|
|
Registered User
|
|
Join Date: Oct 2003
Location: Slovakia
Posts: 482
|
|
|
Re: Performance issue retrieving latest data
|

03-19-04, 14:50
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
This may be a silly question, but do you have an index on the received column, or better yet (id, received)?
-PatP
|
|

03-20-04, 08:08
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
|
Re: Performance issue retrieving latest data
Quote:
Originally posted by markoez
As a human, it's easy to see that the primary key can be used to find the entries that I'm looking for. Do a 'group by tagid, datatype' and take the one with the highest seqnr. Using that intelligence would make the query's performance linear with the number of series. Is there a way to make MySQL see things the way I do?
|
yes, the "group by" you describe is exactly how you solve this problem in the join approach
PHP Code:
select t1.TagID
, t1.DataType
, t1.Value
from Telemetry t1
inner
join Telemetry t2
on t1.TagID = t2.TagID
and t1.DataType = t2.DataType
group
by t1.TagID
, t1.DataType
, t1.Value
having t1.SeqNbr = max(t2.SeqNbr)
once you see how this works, you will immediately recognize that it is the same as the subquery
now, whether you find the join approach or the subquery approach easier to understand, is entirely up to you...

|
|

03-22-04, 03:39
|
|
Registered User
|
|
Join Date: Mar 2004
Location: Eindhoven, Netherlands
Posts: 8
|
|
Re: Performance issue retrieving latest data
Thanks everybody and especially r937. This is really the query I was looking for. I had found the GROUP BY part myself but didn't know how to extract the data from the table. Obviously, the JOIN and HAVING parts do the trick.
And Ika, I'll definitely look at the articles you mentioned.
Thanks all!
|
|

03-22-04, 05:36
|
|
Registered User
|
|
Join Date: Mar 2004
Location: Eindhoven, Netherlands
Posts: 8
|
|
|
Re: Performance issue retrieving latest data
Looking at the query I was convinced it was going to work. But I was too optimistic.  I get errors I don't know how to solve. I had to rework the query somewhat (at least I thought I had to). My query:
PHP Code:
select t2.TagID
, t2.DataType
, t2.Value
from Telemetry t1
inner
join Telemetry t2
on t1.TagID = t2.TagID
and t1.DataType = t2.DataType
group
by t1.TagID
, t1.DataType
having max(t1.SeqNr) = t2.SeqNr
Running this query I get: 'ERROR 1054: Unknown column 't2.SeqNr' in 'having clause''. And yes, there really is a column named SeqNr. When I change the HAVING clause into 'where t2.SeqNr = max(t1.SeqNr)' and put it before the 'group by' then I get the error 'ERROR 1111: Invalid use of group function'...
Help! How do I link tables t1 and t2 together using max(t1.SeqNr)?
|
|

03-22-04, 06:22
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
Code:
...
group
by t1.TagID
, t1.DataType
, t1.Value
having t1.SeqNr = max(t2.SeqNr)
first, the GROUP BY must contain all the columns of the SELECT
second, since you are grouping by t1 columns, you must apply the aggegate functions only to t2 columns
|
|

03-22-04, 07:49
|
|
Registered User
|
|
Join Date: Mar 2004
Location: Eindhoven, Netherlands
Posts: 8
|
|
Quote:
|
first, the GROUP BY must contain all the columns of the SELECT
|
I don't understand that. A telemetry series consists of telemetry values that are received from a certain tag with a certain datatype. Each measurement is made unique by the sequence number. I don't understand why the column value must be used in the group. For instance, if a telemetry series follows the temperature of your fridge it could follow a sinus-like pattern between 4 degrees Celcius and 7 degrees Celcius. The group by part should result in the latest temperature that was measured (the highest SeqNr of that series) and not in the highest sequence numbers where the temperature was last 4 degrees, 5 degrees, 6 degrees and 7 degrees. As I see it, that's what adding the Value culumn to the group by would do.
Quote:
|
second, since you are grouping by t1 columns, you must apply the aggegate functions only to t2 columns
|
That was a typo. I definitely tried 'having t2.SeqNr = max(t1.SeqNr)'. It resulted in the same 'ERROR 1054: Unknown column 't2.SeqNr' in 'having clause'' error. Could it be that t2 may not be used in the HAVING clause at all? Can this be MySQL specific?
I do think I understand how GROUP BY/HAVING can be used but I've only seen very simple use of it. Do you know a web-resource where I can read up on more advance use of this functionality?
Maybe I don't see the strategy of your query correctly. This is how I see it. First, using the GROUP BY and MAX functionality we retrieve the highest sequence numbers of all telemetry series. Second, the obtained result is JOINed with another instance of the table to retrieve the telemetry values.
|
|

03-22-04, 08:24
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
in a GROUP BY query, the SELECT list may contain column expressions and/or aggregate expressions
the GROUP BY must include all non-aggregate columns in the SELECT list
mysql does not enforce this but let's not go there for now,m that's a different thread and you are not ready for it
so if you have SELECT A,B,SUM(C) then you must GROUP BY A,B
in your case, the query i gave you joins each row of the table to every other row with the same TagID and DataType
the joining operation is actually performed before the grouping
in order to keep column names separate, you need aliases
it may help to visualize this as two "copies" of the table being joined, a "t1" copy and a "t2" copy
for each grouping (i.e. each row of the t1 copy of the table), the highest SeqNbr of all the rows from the t2 copy is selected in the HAVING clause
note you can only apply aggregate functions in the HAVING clause to the t2 columns
helps?
|
|

03-22-04, 10:18
|
|
Registered User
|
|
Join Date: Mar 2004
Location: Eindhoven, Netherlands
Posts: 8
|
|
Yes, it does! Thank you very much. I think I've got a much better understanding of GROUP BY queries now and the way they can be used.  I very much appriciate your time and patience with me.
But I'm not where I want to be yet. Your query was near perfect. As I mentioned earlier, your query returned the error that t1.SeqNr was not known in the HAVING clause.
With my renewed insight I tried to fix the problem. I think I got rid of the error by adding t1.SeqNr to the SELECT and GROUP BY clause:
Code:
select t1.TagID
, t1.DataType
, t1.SeqNr
, t1.Value
from Telemetry t1
inner
join Telemetry t2
on t1.TagID = t2.TagID
and t1.DataType = t2.DataType
group
by t1.TagID
, t1.DataType
, t1.Value
, t1.SeqNr
having t1.SeqNr = MAX(t2.SeqNr)
This query seems to give me the answers I want (although I don't need the t1.SeqNr culumn but alas). However the reason I started this thread was that I had a performance problem and with this query I still do (having only an index on (TagId, Datatype, SeqNr) which should do the trick).
I've got a test table with 2 tags with each 2 datatypes with each a 1000 values (and sequence numbers of course). Running the query above takes about 22 seconds. Really
Doing this query:
Code:
select t1.TagID, t1.DataType, max(SeqNr)
from Telemetry t1
group by t1.TagID, t1.DataType
takes about 0.02 seconds. It gives the key to the values I need to retrieve. Creating another query that uses these results and returns the values should take about the same amount of time. I had hoped that it would be possible to create a single query that would do the same things in the same amount of time.
Do you still think it is possible to get that kind of performance with a single query?
|
|

03-22-04, 11:05
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
you are right, SeqNbr did belong in both the SELECT and GROUP BY
see, that's why i always advise people to test the sql i come up with, because i can't
as far as performance goes, hey, if subqueries were allowed, you could time them and see
there are several ways to "work around" not being able to write subqueries
your simple query, the one that takes 0.02 seconds, all you need to do is join those results back to the same table, right? that's exactly how a subquery works under the covers, no?
okay, so one workaround is to store those results as a temporary table, and run a second query
just as you suggested

|
|

03-22-04, 11:30
|
|
Registered User
|
|
Join Date: Mar 2004
Location: Eindhoven, Netherlands
Posts: 8
|
|
Quote:
|
your simple query, the one that takes 0.02 seconds, all you need to do is join those results back to the same table, right?
|
True.
Quote:
|
that's exactly how a subquery works under the covers, no?
|
You would expect that wouldn't you? Well, in the current MySQL 4.1-alpha this is not the case.  At the start of the thread I gave my subquery-query. It is has the same performance issues... But maybe the optimization engine isn't optimal yet. The way to do something like this with a sub-query would be doing something like 'someIdField IN (SELECT ...)'. However when the key is not a single column, I don't know of a way to match multiple columns from the result of a single subquery (in my case TagId, DataType and SeqNr).
Quote:
okay, so one workaround is to store those results as a temporary table, and run a second query
just as you suggested
|
Always nice to see a guru agree with you.  But really, since I'm quite the novice (as far as SQL goes) and the problem seems to trivial. I'ld expect that thousands of people to have ran into this problem before.
I want to show you a totally different solution somebody else suggested to me. It's not a generic solution and I can't use it in this manner since my Value is a float but I guess it can be used in a couple of cases:
Code:
SELECT TagID, DataType, MAX(SeqNr), MAX(1000000*SeqNr+Value)%1000000
FROM Telemetry
GROUP BY TagID, DataType
|
|

03-22-04, 11:36
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
oh that's just ugly
(there's a "max concat" trick documented on the mysql site, maybe this came from there)
as far as the subselect syntax goes, you said 'someIdField IN (SELECT ...)' but that's not the only way to do it
you could use a correlated subquery, which would allow you to join on multiple columns
PHP Code:
select TagID
, DataType
, SeqNr
, Value
from Telemetry t1
where SeqNr =
( select max(SeqNr)
from Telemetry
where TagID = t1.TagID
and DataType = t1.DataType )
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|