| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

03-01-10, 22:14
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
|
Simple moving avg (SMA)
|
|
Hello,
I’d like to calculate a simple moving average (SMA).
Processing time appears to be long. I’m hoping someone can help me with an efficient syntax.
E.g. tblTemp
ID, Date, Temperature
1, 2010-01-01, 5
2, 2010-01-02, 6
3, 2010-01-03, 5
4, 2010-01-04, 7
5, 2010-01-05, 2
6, 2010-01-06, 1
7, 2010-01-07, 6
8, 2010-01-08, 7
9, 2010-01-09, 9
10, 2010-01-10, 9
A 5-day SMA would require using the last 5 days of temperature data to calculate the average.
I.e. for period 5 use ID=1 to 5, period 6 use ID=2 to 6, etc...
***
This is an oversimplified example; my table will actually contain over 30,000 records and I would like to calculate SMAs for, say, 60, 120, 250 day periods.
Right now, running a SMA=150 for 500 records takes 10 minutes
Can someone help??
|
|

03-02-10, 04:10
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
|
|
instead of calculating a SMA have you considered using exponential smoothing instead. the downside is that you'd probably store the smoothed value.
might need a bit of juggling to get the correct smoothing factor
exponential smoothing moving average - Google Search
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
|
|

03-02-10, 04:34
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
|
|
Quote:
|
Originally Posted by Diesel Dud
Right now, running a SMA=150 for 500 records takes 10 minutes
|
It shouldn't take 10 minutes to run that type of query so just a few questions ...- Is the id field a foreign key to another table or is it just a unique id for the table?
- What are the indexes on the table?
- Does date appear as the first field in any of these indexes or are you just using id or id and then date?
- Can we see the SQL you are using and the table creation (with indexes) please?
Mike
|
|

03-02-10, 08:36
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
Quote:
Originally Posted by healdem
instead of calculating a SMA have you considered using exponential smoothing instead. the downside is that you'd probably store the smoothed value.
might need a bit of juggling to get the correct smoothing factor
exponential smoothing moving average - Google Search
|
i am open to using an EMA, although i would suspect it would require more processing
|
|

03-02-10, 09:52
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
Quote:
Originally Posted by mike_bike_kite
It shouldn't take 10 minutes to run that type of query so just a few questions ...- Is the id field a foreign key to another table or is it just a unique id for the table?
- What are the indexes on the table?
- Does date appear as the first field in any of these indexes or are you just using id or id and then date?
- Can we see the SQL you are using and the table creation (with indexes) please?
Mike
|
i feel kinda stupid i was (almost) 100% positive the PK was ID.. just checked this morning and it wasnt indexed properly!!
31,000 records now takes about 70 seconds.. much better!
could i be more efficient in my query?
this e.g. uses a 5-day SMA, i will likely use 150 or 300
SELECT x.Id,
AVG(y.Temperature) as SMA
FROM tblTemp as x,
tblTemp as y
WHERE x.Id>=5 AND x.Id BETWEEN y.Id AND y.Id+4
GROUP BY x.Id
ORDER BY x.Id
thanks!
|
|

03-02-10, 11:06
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
|
|
Quote:
Originally Posted by Diesel Dud
i am open to using an EMA, although i would suspect it would require more processing
|
nope it requires less processing, althou8gh it does require storing intermediate / last resuklt
the forescat/.average for the next item is CurrentValue + (CurrentValue-LastForecast)*SmoothingFactor
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
|
|

03-02-10, 11:33
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
Originally Posted by Diesel Dud
31,000 records now takes about 70 seconds.. much better!
could i be more efficient in my query?
|
I don't know anything about EMA so I'll just comment on your existing SQL. It can almost certainly be improved - you should be able to make it under a second if that's important to you. A few points though:- Make your table names more descriptive (tblTemp means nothing to someone trying to understand your code).
- The id field appears unnecessary, and indeed hides the logic of what you are doing, wouldn't it be better to use date_add()? This is must also be calculated somewhere as well so it's just more code to maintain and possibly go wrong.
- Your join looks odd to me but then I'm not totally sure what you're after - perhaps if you could give the sample output using the test data above.
Mike
|
|

03-02-10, 15:20
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
Quote:
Originally Posted by mike_bike_kite
I don't know anything about EMA so I'll just comment on your existing SQL. It can almost certainly be improved - you should be able to make it under a second if that's important to you. A few points though:- Make your table names more descriptive (tblTemp means nothing to someone trying to understand your code).
- The id field appears unnecessary, and indeed hides the logic of what you are doing, wouldn't it be better to use date_add()? This is must also be calculated somewhere as well so it's just more code to maintain and possibly go wrong.
- Your join looks odd to me but then I'm not totally sure what you're after - perhaps if you could give the sample output using the test data above.
Mike
|
In an effort to simply my problem, I had created a simple example.
I will use the actual example.
`ID` is being used as the PK and is being incremented automatically on the insert.
An alternative would be to use `Dte` and `Tme`, although this seems a little more complex to work with.
--
-- Table structure for table `IntradayPairs`
--
CREATE TABLE IF NOT EXISTS `IntradayPairs` (
`ID` int(11) NOT NULL auto_increment COMMENT 'ID',
`Sym_A` varchar(25) NOT NULL COMMENT 'Pair Symbol A',
`Sym_B` varchar(25) NOT NULL COMMENT 'Pair Symbol B',
`Dte` date NOT NULL COMMENT 'Date',
`Tme` time NOT NULL COMMENT 'Time',
`Sym_A_Cls` decimal(8,4) NOT NULL COMMENT 'Symbol Pair A Closing Price',
`Sym_B_Cls` decimal(8,4) NOT NULL COMMENT 'Symbol Pair B Closing Price',
`Prc_Rto` decimal(6,4) NOT NULL COMMENT 'Price Ratio',
`SMA` decimal(6,4) NOT NULL COMMENT 'Simple Moving Average',
PRIMARY KEY (`ID`),
KEY `ID_Day` (`ID`,`Sym_A`,`Sym_B`,`Dte`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=31501 ;
--
-- Dumping data for table `IntradayPairs`
--
INSERT INTO `IntradayPairs` (`ID`, `Sym_A`, `Sym_B`, `Dte`, `Tme`, `Sym_A_Cls`, `Sym_B_Cls`, `Prc_Rto`, `SMA`) VALUES
(1, 'HD', 'LOW', '2009-08-06', '01:00:00', '26.5000', '22.7500', '1.1648', '0.0000'),
(2, 'HD', 'LOW', '2009-08-06', '01:01:00', '26.5100', '22.7400', '1.1658', '0.0000'),
(3, 'HD', 'LOW', '2009-08-06', '01:02:00', '26.5000', '22.7400', '1.1653', '0.0000'),
(4, 'HD', 'LOW', '2009-08-06', '01:03:00', '26.4900', '22.7400', '1.1649', '0.0000'),
(5, 'HD', 'LOW', '2009-08-06', '01:04:00', '26.5100', '22.7600', '1.1648', '0.0000'),
(6, 'HD', 'LOW', '2009-08-06', '01:05:00', '26.4900', '22.7400', '1.1649', '0.0000'),
(7, 'HD', 'LOW', '2009-08-06', '01:06:00', '26.5000', '22.7600', '1.1643', '0.0000'),
(8, 'HD', 'LOW', '2009-08-06', '01:07:00', '26.5100', '22.7600', '1.1648', '0.0000'),
(9, 'HD', 'LOW', '2009-08-06', '01:08:00', '26.5300', '22.7700', '1.1651', '0.0000'),
(10, 'HD', 'LOW', '2009-08-06', '01:09:00', '26.5300', '22.7700', '1.1651', '0.0000'),
(11, 'HD', 'LOW', '2009-08-06', '01:10:00', '26.5000', '22.7700', '1.1638', '0.0000'),
(12, 'HD', 'LOW', '2009-08-06', '01:11:00', '26.5000', '22.7700', '1.1638', '0.0000'),
(13, 'HD', 'LOW', '2009-08-06', '01:12:00', '26.5200', '22.7800', '1.1642', '0.0000'),
(14, 'HD', 'LOW', '2009-08-06', '01:13:00', '26.5100', '22.7800', '1.1637', '0.0000'),
(15, 'HD', 'LOW', '2009-08-06', '01:14:00', '26.5300', '22.7800', '1.1646', '0.0000'),
(16, 'HD', 'LOW', '2009-08-06', '01:15:00', '26.5400', '22.8000', '1.1640', '0.0000'),
(17, 'HD', 'LOW', '2009-08-06', '01:16:00', '26.5500', '22.8000', '1.1645', '0.0000'),
(18, 'HD', 'LOW', '2009-08-06', '01:17:00', '26.5400', '22.8000', '1.1640', '0.0000'),
(19, 'HD', 'LOW', '2009-08-06', '01:18:00', '26.5500', '22.7900', '1.1650', '0.0000'),
(20, 'HD', 'LOW', '2009-08-06', '01:19:00', '26.5500', '22.7800', '1.1655', '0.0000'),
(21, 'HD', 'LOW', '2009-08-06', '01:20:00', '26.5500', '22.7900', '1.1650', '0.0000'),
(22, 'HD', 'LOW', '2009-08-06', '01:21:00', '26.5600', '22.7900', '1.1654', '0.0000'),
(23, 'HD', 'LOW', '2009-08-06', '01:22:00', '26.5700', '22.8100', '1.1648', '0.0000'),
(24, 'HD', 'LOW', '2009-08-06', '01:23:00', '26.5700', '22.8200', '1.1643', '0.0000'),
(25, 'HD', 'LOW', '2009-08-06', '01:24:00', '26.5700', '22.8300', '1.1638', '0.0000'),
(26, 'HD', 'LOW', '2009-08-06', '01:25:00', '26.5700', '22.8100', '1.1648', '0.0000'),
(27, 'HD', 'LOW', '2009-08-06', '01:26:00', '26.5500', '22.8000', '1.1645', '0.0000'),
(28, 'HD', 'LOW', '2009-08-06', '01:27:00', '26.5500', '22.7800', '1.1655', '0.0000'),
(29, 'HD', 'LOW', '2009-08-06', '01:28:00', '26.5500', '22.7800', '1.1655', '0.0000'),
(30, 'HD', 'LOW', '2009-08-06', '01:29:00', '26.5600', '22.7800', '1.1659', '0.0000'),
(31, 'HD', 'LOW', '2009-08-06', '01:30:00', '26.5800', '22.8100', '1.1653', '0.0000'),
(32, 'HD', 'LOW', '2009-08-06', '01:31:00', '26.6000', '22.8100', '1.1662', '0.0000'),
(33, 'HD', 'LOW', '2009-08-06', '01:32:00', '26.6000', '22.8100', '1.1662', '0.0000'),
(34, 'HD', 'LOW', '2009-08-06', '01:33:00', '26.6000', '22.8000', '1.1667', '0.0000'),
(35, 'HD', 'LOW', '2009-08-06', '01:34:00', '26.5800', '22.8000', '1.1658', '0.0000'),
(36, 'HD', 'LOW', '2009-08-06', '01:35:00', '26.5800', '22.7900', '1.1663', '0.0000'),
(37, 'HD', 'LOW', '2009-08-06', '01:36:00', '26.5600', '22.7800', '1.1659', '0.0000'),
(38, 'HD', 'LOW', '2009-08-06', '01:37:00', '26.5600', '22.7900', '1.1654', '0.0000'),
(39, 'HD', 'LOW', '2009-08-06', '01:38:00', '26.5600', '22.7700', '1.1664', '0.0000'),
(40, 'HD', 'LOW', '2009-08-06', '01:39:00', '26.5400', '22.7400', '1.1671', '0.0000'),
(41, 'HD', 'LOW', '2009-08-06', '01:40:00', '26.5300', '22.7500', '1.1662', '0.0000'),
(42, 'HD', 'LOW', '2009-08-06', '01:41:00', '26.5100', '22.7200', '1.1668', '0.0000'),
(43, 'HD', 'LOW', '2009-08-06', '01:42:00', '26.5000', '22.6900', '1.1679', '0.0000'),
(44, 'HD', 'LOW', '2009-08-06', '01:43:00', '26.4800', '22.7200', '1.1655', '0.0000'),
(45, 'HD', 'LOW', '2009-08-06', '01:44:00', '26.4600', '22.7100', '1.1651', '0.0000'),
(46, 'HD', 'LOW', '2009-08-06', '01:45:00', '26.4600', '22.7200', '1.1646', '0.0000'),
(47, 'HD', 'LOW', '2009-08-06', '01:46:00', '26.4600', '22.7200', '1.1646', '0.0000'),
(48, 'HD', 'LOW', '2009-08-06', '01:47:00', '26.4800', '22.7100', '1.1660', '0.0000'),
(49, 'HD', 'LOW', '2009-08-06', '01:48:00', '26.4400', '22.7000', '1.1648', '0.0000'),
(50, 'HD', 'LOW', '2009-08-06', '01:49:00', '26.4400', '22.7000', '1.1648', '0.0000'),
(51, 'HD', 'LOW', '2009-08-06', '01:50:00', '26.4400', '22.6700', '1.1663', '0.0000'),
(52, 'HD', 'LOW', '2009-08-06', '01:51:00', '26.4400', '22.6800', '1.1658', '0.0000'),
(53, 'HD', 'LOW', '2009-08-06', '01:52:00', '26.4500', '22.6800', '1.1662', '0.0000'),
(54, 'HD', 'LOW', '2009-08-06', '01:53:00', '26.4400', '22.6800', '1.1658', '0.0000'),
(55, 'HD', 'LOW', '2009-08-06', '01:54:00', '26.4000', '22.6600', '1.1650', '0.0000'),
(56, 'HD', 'LOW', '2009-08-06', '01:55:00', '26.4000', '22.6500', '1.1656', '0.0000'),
(57, 'HD', 'LOW', '2009-08-06', '01:56:00', '26.4100', '22.6500', '1.1660', '0.0000'),
(58, 'HD', 'LOW', '2009-08-06', '01:57:00', '26.4100', '22.6300', '1.1670', '0.0000'),
(59, 'HD', 'LOW', '2009-08-06', '01:58:00', '26.4100', '22.6300', '1.1670', '0.0000'),
(60, 'HD', 'LOW', '2009-08-06', '01:59:00', '26.4000', '22.6200', '1.1671', '0.0000'),
(61, 'HD', 'LOW', '2009-08-06', '02:00:00', '26.3800', '22.6100', '1.1667', '0.0000'),
(62, 'HD', 'LOW', '2009-08-06', '02:01:00', '26.3400', '22.5600', '1.1676', '0.0000'),
(63, 'HD', 'LOW', '2009-08-06', '02:02:00', '26.3300', '22.5500', '1.1676', '0.0000'),
(64, 'HD', 'LOW', '2009-08-06', '02:03:00', '26.3300', '22.5600', '1.1671', '0.0000'),
(65, 'HD', 'LOW', '2009-08-06', '02:04:00', '26.3400', '22.5600', '1.1676', '0.0000'),
(66, 'HD', 'LOW', '2009-08-06', '02:05:00', '26.3300', '22.5200', '1.1692', '0.0000'),
(67, 'HD', 'LOW', '2009-08-06', '02:06:00', '26.3100', '22.5200', '1.1683', '0.0000'),
(68, 'HD', 'LOW', '2009-08-06', '02:07:00', '26.3100', '22.5300', '1.1678', '0.0000'),
(69, 'HD', 'LOW', '2009-08-06', '02:08:00', '26.3100', '22.5600', '1.1662', '0.0000'),
(70, 'HD', 'LOW', '2009-08-06', '02:09:00', '26.3100', '22.5600', '1.1662', '0.0000'),
(71, 'HD', 'LOW', '2009-08-06', '02:10:00', '26.3100', '22.5500', '1.1667', '0.0000'),
(72, 'HD', 'LOW', '2009-08-06', '02:11:00', '26.3000', '22.5400', '1.1668', '0.0000'),
(73, 'HD', 'LOW', '2009-08-06', '02:12:00', '26.3000', '22.5400', '1.1668', '0.0000'),
(74, 'HD', 'LOW', '2009-08-06', '02:13:00', '26.3000', '22.5400', '1.1668', '0.0000'),
(75, 'HD', 'LOW', '2009-08-06', '02:14:00', '26.3100', '22.5600', '1.1662', '0.0000'),
(76, 'HD', 'LOW', '2009-08-06', '02:15:00', '26.3100', '22.5400', '1.1673', '0.0000'),
(77, 'HD', 'LOW', '2009-08-06', '02:16:00', '26.3400', '22.5400', '1.1686', '0.0000'),
(78, 'HD', 'LOW', '2009-08-06', '02:17:00', '26.3500', '22.5400', '1.1690', '0.0000'),
(79, 'HD', 'LOW', '2009-08-06', '02:18:00', '26.3500', '22.5300', '1.1696', '0.0000'),
(80, 'HD', 'LOW', '2009-08-06', '02:19:00', '26.3400', '22.5300', '1.1691', '0.0000');
UPDATE `IntradayPairs`
SET `Prc_Rto` = `Sym_A_Cls` / `Sym_B_Cls;
|
|

03-02-10, 15:21
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
I would like to calculate the SMA for column `Prc_Rto`
The following query takes 0.0003 seconds to execute on this data set on 80 records:
SELECT x.Id,
AVG(y.`Prc_Rto`) as SMA
FROM `IntradayPairs` as x,
`IntradayPairs` as y
WHERE x.Id>=5 AND x.Id BETWEEN y.Id AND y.Id+4
GROUP BY x.Id
ORDER BY x.Id
I will have 30,000+ records in the actual table and would like the optimal way to code this calculation.
Thanks.
|
|

03-02-10, 16:14
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
|
Originally Posted by Diesel Dud
I will use the actual example.
|
I see you want to store the SMA value in the table rather than calculate it on the fly - which is fine by the way assuming you cope with price updates etc altering the SMA. Because you're only updating the data infrequently it shouldn't really matter whether it takes 70 secs or not. It seems silly to spend your time trying to bring this time down unless there's a good reason. A side benefit will be that I won't need to work my way through your data and SQL 
|
|

03-02-10, 19:14
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
Quote:
Originally Posted by mike_bike_kite
I see you want to store the SMA value in the table rather than calculate it on the fly - which is fine by the way assuming you cope with price updates etc altering the SMA. Because you're only updating the data infrequently it shouldn't really matter whether it takes 70 secs or not. It seems silly to spend your time trying to bring this time down unless there's a good reason. A side benefit will be that I won't need to work my way through your data and SQL 
|
calculation of the SMA is one of several steps needed to determine if the pair is profitable; e.g. after SMA, a simple moving avg for the standard deviation will be calculated
the process will be repeated for periods of 60, 120, 240, and 390 periods
these are just some of the steps which will be executed on this workfile of 30,000 records; the workfile will then be flushed and the next pair will undergo the same process; i have hundreds of pairs to process.
as you can see, if i can shave off some time calculating these SMAs, total process time savings will be huge
|
|

03-02-10, 19:42
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
Originally Posted by Diesel Dud
as you can see, if i can shave off some time calculating these SMAs, total process time savings will be huge
|
The savings may well be huge but I joined the thread under the impression it was a scientific issue - you said originally that it was temperature you were modelling rather than prices. I make my living working with financial databases and just feel uncomfortable doing it for free. My apologies but I'm sure someone else will chip in soon to help you out.
Mike
|
|

03-04-10, 20:39
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
.....bump 
|
|

03-13-10, 16:47
|
|
Registered User
|
|
Join Date: Feb 2010
Location: Montreal, CA
Posts: 32
|
|
....... bump 
|
|

03-13-10, 16:52
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
|
|
bump
exponential smoothing
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|