If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Simple moving avg (SMA)

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-01-10, 22:14
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
Simple moving avg (SMA)

Hello,

I’d like to calculate a simple moving average (SMA).
Processing time appears to be long. I’m hoping someone can help me with an efficient syntax.

E.g. tblTemp
ID, Date, Temperature
1, 2010-01-01, 5
2, 2010-01-02, 6
3, 2010-01-03, 5
4, 2010-01-04, 7
5, 2010-01-05, 2
6, 2010-01-06, 1
7, 2010-01-07, 6
8, 2010-01-08, 7
9, 2010-01-09, 9
10, 2010-01-10, 9

A 5-day SMA would require using the last 5 days of temperature data to calculate the average.
I.e. for period 5 use ID=1 to 5, period 6 use ID=2 to 6, etc...

***

This is an oversimplified example; my table will actually contain over 30,000 records and I would like to calculate SMAs for, say, 60, 120, 250 day periods.

Right now, running a SMA=150 for 500 records takes 10 minutes

Can someone help??
Reply With Quote
  #2 (permalink)  
Old 03-02-10, 04:10
healdem healdem is offline
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
instead of calculating a SMA have you considered using exponential smoothing instead. the downside is that you'd probably store the smoothed value.

might need a bit of juggling to get the correct smoothing factor
exponential smoothing moving average - Google Search
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
  #3 (permalink)  
Old 03-02-10, 04:34
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by Diesel Dud
Right now, running a SMA=150 for 500 records takes 10 minutes
It shouldn't take 10 minutes to run that type of query so just a few questions ...
  • Is the id field a foreign key to another table or is it just a unique id for the table?
  • What are the indexes on the table?
  • Does date appear as the first field in any of these indexes or are you just using id or id and then date?
  • Can we see the SQL you are using and the table creation (with indexes) please?
Mike
Reply With Quote
  #4 (permalink)  
Old 03-02-10, 08:36
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
Quote:
Originally Posted by healdem View Post
instead of calculating a SMA have you considered using exponential smoothing instead. the downside is that you'd probably store the smoothed value.

might need a bit of juggling to get the correct smoothing factor
exponential smoothing moving average - Google Search
i am open to using an EMA, although i would suspect it would require more processing
Reply With Quote
  #5 (permalink)  
Old 03-02-10, 09:52
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
Quote:
Originally Posted by mike_bike_kite View Post
It shouldn't take 10 minutes to run that type of query so just a few questions ...
  • Is the id field a foreign key to another table or is it just a unique id for the table?
  • What are the indexes on the table?
  • Does date appear as the first field in any of these indexes or are you just using id or id and then date?
  • Can we see the SQL you are using and the table creation (with indexes) please?
Mike
i feel kinda stupid i was (almost) 100% positive the PK was ID.. just checked this morning and it wasnt indexed properly!!

31,000 records now takes about 70 seconds.. much better!
could i be more efficient in my query?

this e.g. uses a 5-day SMA, i will likely use 150 or 300
SELECT x.Id,
AVG(y.Temperature) as SMA
FROM tblTemp as x,
tblTemp as y
WHERE x.Id>=5 AND x.Id BETWEEN y.Id AND y.Id+4
GROUP BY x.Id
ORDER BY x.Id


thanks!
Reply With Quote
  #6 (permalink)  
Old 03-02-10, 11:06
healdem healdem is offline
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
Quote:
Originally Posted by Diesel Dud View Post
i am open to using an EMA, although i would suspect it would require more processing
nope it requires less processing, althou8gh it does require storing intermediate / last resuklt

the forescat/.average for the next item is CurrentValue + (CurrentValue-LastForecast)*SmoothingFactor
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
  #7 (permalink)  
Old 03-02-10, 11:33
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by Diesel Dud View Post
31,000 records now takes about 70 seconds.. much better!
could i be more efficient in my query?
I don't know anything about EMA so I'll just comment on your existing SQL. It can almost certainly be improved - you should be able to make it under a second if that's important to you. A few points though:
  • Make your table names more descriptive (tblTemp means nothing to someone trying to understand your code).
  • The id field appears unnecessary, and indeed hides the logic of what you are doing, wouldn't it be better to use date_add()? This is must also be calculated somewhere as well so it's just more code to maintain and possibly go wrong.
  • Your join looks odd to me but then I'm not totally sure what you're after - perhaps if you could give the sample output using the test data above.
Mike
Reply With Quote
  #8 (permalink)  
Old 03-02-10, 15:20
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
Quote:
Originally Posted by mike_bike_kite View Post
I don't know anything about EMA so I'll just comment on your existing SQL. It can almost certainly be improved - you should be able to make it under a second if that's important to you. A few points though:
  • Make your table names more descriptive (tblTemp means nothing to someone trying to understand your code).
  • The id field appears unnecessary, and indeed hides the logic of what you are doing, wouldn't it be better to use date_add()? This is must also be calculated somewhere as well so it's just more code to maintain and possibly go wrong.
  • Your join looks odd to me but then I'm not totally sure what you're after - perhaps if you could give the sample output using the test data above.
Mike
In an effort to simply my problem, I had created a simple example.
I will use the actual example.

`ID` is being used as the PK and is being incremented automatically on the insert.
An alternative would be to use `Dte` and `Tme`, although this seems a little more complex to work with.


--
-- Table structure for table `IntradayPairs`
--

CREATE TABLE IF NOT EXISTS `IntradayPairs` (
`ID` int(11) NOT NULL auto_increment COMMENT 'ID',
`Sym_A` varchar(25) NOT NULL COMMENT 'Pair Symbol A',
`Sym_B` varchar(25) NOT NULL COMMENT 'Pair Symbol B',
`Dte` date NOT NULL COMMENT 'Date',
`Tme` time NOT NULL COMMENT 'Time',
`Sym_A_Cls` decimal(8,4) NOT NULL COMMENT 'Symbol Pair A Closing Price',
`Sym_B_Cls` decimal(8,4) NOT NULL COMMENT 'Symbol Pair B Closing Price',
`Prc_Rto` decimal(6,4) NOT NULL COMMENT 'Price Ratio',
`SMA` decimal(6,4) NOT NULL COMMENT 'Simple Moving Average',
PRIMARY KEY (`ID`),
KEY `ID_Day` (`ID`,`Sym_A`,`Sym_B`,`Dte`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=31501 ;

--
-- Dumping data for table `IntradayPairs`
--

INSERT INTO `IntradayPairs` (`ID`, `Sym_A`, `Sym_B`, `Dte`, `Tme`, `Sym_A_Cls`, `Sym_B_Cls`, `Prc_Rto`, `SMA`) VALUES
(1, 'HD', 'LOW', '2009-08-06', '01:00:00', '26.5000', '22.7500', '1.1648', '0.0000'),
(2, 'HD', 'LOW', '2009-08-06', '01:01:00', '26.5100', '22.7400', '1.1658', '0.0000'),
(3, 'HD', 'LOW', '2009-08-06', '01:02:00', '26.5000', '22.7400', '1.1653', '0.0000'),
(4, 'HD', 'LOW', '2009-08-06', '01:03:00', '26.4900', '22.7400', '1.1649', '0.0000'),
(5, 'HD', 'LOW', '2009-08-06', '01:04:00', '26.5100', '22.7600', '1.1648', '0.0000'),
(6, 'HD', 'LOW', '2009-08-06', '01:05:00', '26.4900', '22.7400', '1.1649', '0.0000'),
(7, 'HD', 'LOW', '2009-08-06', '01:06:00', '26.5000', '22.7600', '1.1643', '0.0000'),
(8, 'HD', 'LOW', '2009-08-06', '01:07:00', '26.5100', '22.7600', '1.1648', '0.0000'),
(9, 'HD', 'LOW', '2009-08-06', '01:08:00', '26.5300', '22.7700', '1.1651', '0.0000'),
(10, 'HD', 'LOW', '2009-08-06', '01:09:00', '26.5300', '22.7700', '1.1651', '0.0000'),
(11, 'HD', 'LOW', '2009-08-06', '01:10:00', '26.5000', '22.7700', '1.1638', '0.0000'),
(12, 'HD', 'LOW', '2009-08-06', '01:11:00', '26.5000', '22.7700', '1.1638', '0.0000'),
(13, 'HD', 'LOW', '2009-08-06', '01:12:00', '26.5200', '22.7800', '1.1642', '0.0000'),
(14, 'HD', 'LOW', '2009-08-06', '01:13:00', '26.5100', '22.7800', '1.1637', '0.0000'),
(15, 'HD', 'LOW', '2009-08-06', '01:14:00', '26.5300', '22.7800', '1.1646', '0.0000'),
(16, 'HD', 'LOW', '2009-08-06', '01:15:00', '26.5400', '22.8000', '1.1640', '0.0000'),
(17, 'HD', 'LOW', '2009-08-06', '01:16:00', '26.5500', '22.8000', '1.1645', '0.0000'),
(18, 'HD', 'LOW', '2009-08-06', '01:17:00', '26.5400', '22.8000', '1.1640', '0.0000'),
(19, 'HD', 'LOW', '2009-08-06', '01:18:00', '26.5500', '22.7900', '1.1650', '0.0000'),
(20, 'HD', 'LOW', '2009-08-06', '01:19:00', '26.5500', '22.7800', '1.1655', '0.0000'),
(21, 'HD', 'LOW', '2009-08-06', '01:20:00', '26.5500', '22.7900', '1.1650', '0.0000'),
(22, 'HD', 'LOW', '2009-08-06', '01:21:00', '26.5600', '22.7900', '1.1654', '0.0000'),
(23, 'HD', 'LOW', '2009-08-06', '01:22:00', '26.5700', '22.8100', '1.1648', '0.0000'),
(24, 'HD', 'LOW', '2009-08-06', '01:23:00', '26.5700', '22.8200', '1.1643', '0.0000'),
(25, 'HD', 'LOW', '2009-08-06', '01:24:00', '26.5700', '22.8300', '1.1638', '0.0000'),
(26, 'HD', 'LOW', '2009-08-06', '01:25:00', '26.5700', '22.8100', '1.1648', '0.0000'),
(27, 'HD', 'LOW', '2009-08-06', '01:26:00', '26.5500', '22.8000', '1.1645', '0.0000'),
(28, 'HD', 'LOW', '2009-08-06', '01:27:00', '26.5500', '22.7800', '1.1655', '0.0000'),
(29, 'HD', 'LOW', '2009-08-06', '01:28:00', '26.5500', '22.7800', '1.1655', '0.0000'),
(30, 'HD', 'LOW', '2009-08-06', '01:29:00', '26.5600', '22.7800', '1.1659', '0.0000'),
(31, 'HD', 'LOW', '2009-08-06', '01:30:00', '26.5800', '22.8100', '1.1653', '0.0000'),
(32, 'HD', 'LOW', '2009-08-06', '01:31:00', '26.6000', '22.8100', '1.1662', '0.0000'),
(33, 'HD', 'LOW', '2009-08-06', '01:32:00', '26.6000', '22.8100', '1.1662', '0.0000'),
(34, 'HD', 'LOW', '2009-08-06', '01:33:00', '26.6000', '22.8000', '1.1667', '0.0000'),
(35, 'HD', 'LOW', '2009-08-06', '01:34:00', '26.5800', '22.8000', '1.1658', '0.0000'),
(36, 'HD', 'LOW', '2009-08-06', '01:35:00', '26.5800', '22.7900', '1.1663', '0.0000'),
(37, 'HD', 'LOW', '2009-08-06', '01:36:00', '26.5600', '22.7800', '1.1659', '0.0000'),
(38, 'HD', 'LOW', '2009-08-06', '01:37:00', '26.5600', '22.7900', '1.1654', '0.0000'),
(39, 'HD', 'LOW', '2009-08-06', '01:38:00', '26.5600', '22.7700', '1.1664', '0.0000'),
(40, 'HD', 'LOW', '2009-08-06', '01:39:00', '26.5400', '22.7400', '1.1671', '0.0000'),
(41, 'HD', 'LOW', '2009-08-06', '01:40:00', '26.5300', '22.7500', '1.1662', '0.0000'),
(42, 'HD', 'LOW', '2009-08-06', '01:41:00', '26.5100', '22.7200', '1.1668', '0.0000'),
(43, 'HD', 'LOW', '2009-08-06', '01:42:00', '26.5000', '22.6900', '1.1679', '0.0000'),
(44, 'HD', 'LOW', '2009-08-06', '01:43:00', '26.4800', '22.7200', '1.1655', '0.0000'),
(45, 'HD', 'LOW', '2009-08-06', '01:44:00', '26.4600', '22.7100', '1.1651', '0.0000'),
(46, 'HD', 'LOW', '2009-08-06', '01:45:00', '26.4600', '22.7200', '1.1646', '0.0000'),
(47, 'HD', 'LOW', '2009-08-06', '01:46:00', '26.4600', '22.7200', '1.1646', '0.0000'),
(48, 'HD', 'LOW', '2009-08-06', '01:47:00', '26.4800', '22.7100', '1.1660', '0.0000'),
(49, 'HD', 'LOW', '2009-08-06', '01:48:00', '26.4400', '22.7000', '1.1648', '0.0000'),
(50, 'HD', 'LOW', '2009-08-06', '01:49:00', '26.4400', '22.7000', '1.1648', '0.0000'),
(51, 'HD', 'LOW', '2009-08-06', '01:50:00', '26.4400', '22.6700', '1.1663', '0.0000'),
(52, 'HD', 'LOW', '2009-08-06', '01:51:00', '26.4400', '22.6800', '1.1658', '0.0000'),
(53, 'HD', 'LOW', '2009-08-06', '01:52:00', '26.4500', '22.6800', '1.1662', '0.0000'),
(54, 'HD', 'LOW', '2009-08-06', '01:53:00', '26.4400', '22.6800', '1.1658', '0.0000'),
(55, 'HD', 'LOW', '2009-08-06', '01:54:00', '26.4000', '22.6600', '1.1650', '0.0000'),
(56, 'HD', 'LOW', '2009-08-06', '01:55:00', '26.4000', '22.6500', '1.1656', '0.0000'),
(57, 'HD', 'LOW', '2009-08-06', '01:56:00', '26.4100', '22.6500', '1.1660', '0.0000'),
(58, 'HD', 'LOW', '2009-08-06', '01:57:00', '26.4100', '22.6300', '1.1670', '0.0000'),
(59, 'HD', 'LOW', '2009-08-06', '01:58:00', '26.4100', '22.6300', '1.1670', '0.0000'),
(60, 'HD', 'LOW', '2009-08-06', '01:59:00', '26.4000', '22.6200', '1.1671', '0.0000'),
(61, 'HD', 'LOW', '2009-08-06', '02:00:00', '26.3800', '22.6100', '1.1667', '0.0000'),
(62, 'HD', 'LOW', '2009-08-06', '02:01:00', '26.3400', '22.5600', '1.1676', '0.0000'),
(63, 'HD', 'LOW', '2009-08-06', '02:02:00', '26.3300', '22.5500', '1.1676', '0.0000'),
(64, 'HD', 'LOW', '2009-08-06', '02:03:00', '26.3300', '22.5600', '1.1671', '0.0000'),
(65, 'HD', 'LOW', '2009-08-06', '02:04:00', '26.3400', '22.5600', '1.1676', '0.0000'),
(66, 'HD', 'LOW', '2009-08-06', '02:05:00', '26.3300', '22.5200', '1.1692', '0.0000'),
(67, 'HD', 'LOW', '2009-08-06', '02:06:00', '26.3100', '22.5200', '1.1683', '0.0000'),
(68, 'HD', 'LOW', '2009-08-06', '02:07:00', '26.3100', '22.5300', '1.1678', '0.0000'),
(69, 'HD', 'LOW', '2009-08-06', '02:08:00', '26.3100', '22.5600', '1.1662', '0.0000'),
(70, 'HD', 'LOW', '2009-08-06', '02:09:00', '26.3100', '22.5600', '1.1662', '0.0000'),
(71, 'HD', 'LOW', '2009-08-06', '02:10:00', '26.3100', '22.5500', '1.1667', '0.0000'),
(72, 'HD', 'LOW', '2009-08-06', '02:11:00', '26.3000', '22.5400', '1.1668', '0.0000'),
(73, 'HD', 'LOW', '2009-08-06', '02:12:00', '26.3000', '22.5400', '1.1668', '0.0000'),
(74, 'HD', 'LOW', '2009-08-06', '02:13:00', '26.3000', '22.5400', '1.1668', '0.0000'),
(75, 'HD', 'LOW', '2009-08-06', '02:14:00', '26.3100', '22.5600', '1.1662', '0.0000'),
(76, 'HD', 'LOW', '2009-08-06', '02:15:00', '26.3100', '22.5400', '1.1673', '0.0000'),
(77, 'HD', 'LOW', '2009-08-06', '02:16:00', '26.3400', '22.5400', '1.1686', '0.0000'),
(78, 'HD', 'LOW', '2009-08-06', '02:17:00', '26.3500', '22.5400', '1.1690', '0.0000'),
(79, 'HD', 'LOW', '2009-08-06', '02:18:00', '26.3500', '22.5300', '1.1696', '0.0000'),
(80, 'HD', 'LOW', '2009-08-06', '02:19:00', '26.3400', '22.5300', '1.1691', '0.0000');

UPDATE `IntradayPairs`
SET `Prc_Rto` = `Sym_A_Cls` / `Sym_B_Cls;
Reply With Quote
  #9 (permalink)  
Old 03-02-10, 15:21
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
I would like to calculate the SMA for column `Prc_Rto`

The following query takes 0.0003 seconds to execute on this data set on 80 records:
SELECT x.Id,
AVG(y.`Prc_Rto`) as SMA
FROM `IntradayPairs` as x,
`IntradayPairs` as y
WHERE x.Id>=5 AND x.Id BETWEEN y.Id AND y.Id+4
GROUP BY x.Id
ORDER BY x.Id


I will have 30,000+ records in the actual table and would like the optimal way to code this calculation.

Thanks.
Reply With Quote
  #10 (permalink)  
Old 03-02-10, 16:14
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by Diesel Dud
I will use the actual example.
I see you want to store the SMA value in the table rather than calculate it on the fly - which is fine by the way assuming you cope with price updates etc altering the SMA. Because you're only updating the data infrequently it shouldn't really matter whether it takes 70 secs or not. It seems silly to spend your time trying to bring this time down unless there's a good reason. A side benefit will be that I won't need to work my way through your data and SQL
Reply With Quote
  #11 (permalink)  
Old 03-02-10, 19:14
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
Quote:
Originally Posted by mike_bike_kite View Post
I see you want to store the SMA value in the table rather than calculate it on the fly - which is fine by the way assuming you cope with price updates etc altering the SMA. Because you're only updating the data infrequently it shouldn't really matter whether it takes 70 secs or not. It seems silly to spend your time trying to bring this time down unless there's a good reason. A side benefit will be that I won't need to work my way through your data and SQL
calculation of the SMA is one of several steps needed to determine if the pair is profitable; e.g. after SMA, a simple moving avg for the standard deviation will be calculated

the process will be repeated for periods of 60, 120, 240, and 390 periods

these are just some of the steps which will be executed on this workfile of 30,000 records; the workfile will then be flushed and the next pair will undergo the same process; i have hundreds of pairs to process.

as you can see, if i can shave off some time calculating these SMAs, total process time savings will be huge
Reply With Quote
  #12 (permalink)  
Old 03-02-10, 19:42
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by Diesel Dud View Post
as you can see, if i can shave off some time calculating these SMAs, total process time savings will be huge
The savings may well be huge but I joined the thread under the impression it was a scientific issue - you said originally that it was temperature you were modelling rather than prices. I make my living working with financial databases and just feel uncomfortable doing it for free. My apologies but I'm sure someone else will chip in soon to help you out.

Mike
Reply With Quote
  #13 (permalink)  
Old 03-04-10, 20:39
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
.....bump
Reply With Quote
  #14 (permalink)  
Old 03-13-10, 16:47
Diesel Dud Diesel Dud is offline
Registered User
 
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
....... bump
Reply With Quote
  #15 (permalink)  
Old 03-13-10, 16:52
healdem healdem is offline
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
bump
exponential smoothing
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On