Results 1 to 15 of 15
  1. #1
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32

    Unanswered: Simple moving avg (SMA)

    Hello,

    Id like to calculate a simple moving average (SMA).
    Processing time appears to be long. Im hoping someone can help me with an efficient syntax.

    E.g. tblTemp
    ID, Date, Temperature
    1, 2010-01-01, 5
    2, 2010-01-02, 6
    3, 2010-01-03, 5
    4, 2010-01-04, 7
    5, 2010-01-05, 2
    6, 2010-01-06, 1
    7, 2010-01-07, 6
    8, 2010-01-08, 7
    9, 2010-01-09, 9
    10, 2010-01-10, 9

    A 5-day SMA would require using the last 5 days of temperature data to calculate the average.
    I.e. for period 5 use ID=1 to 5, period 6 use ID=2 to 6, etc...

    ***

    This is an oversimplified example; my table will actually contain over 30,000 records and I would like to calculate SMAs for, say, 60, 120, 250 day periods.

    Right now, running a SMA=150 for 500 records takes 10 minutes

    Can someone help??

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    instead of calculating a SMA have you considered using exponential smoothing instead. the downside is that you'd probably store the smoothed value.

    might need a bit of juggling to get the correct smoothing factor
    exponential smoothing moving average - Google Search
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by Diesel Dud
    Right now, running a SMA=150 for 500 records takes 10 minutes
    It shouldn't take 10 minutes to run that type of query so just a few questions ...
    • Is the id field a foreign key to another table or is it just a unique id for the table?
    • What are the indexes on the table?
    • Does date appear as the first field in any of these indexes or are you just using id or id and then date?
    • Can we see the SQL you are using and the table creation (with indexes) please?

    Mike

  4. #4
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32
    Quote Originally Posted by healdem View Post
    instead of calculating a SMA have you considered using exponential smoothing instead. the downside is that you'd probably store the smoothed value.

    might need a bit of juggling to get the correct smoothing factor
    exponential smoothing moving average - Google Search
    i am open to using an EMA, although i would suspect it would require more processing

  5. #5
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32
    Quote Originally Posted by mike_bike_kite View Post
    It shouldn't take 10 minutes to run that type of query so just a few questions ...
    • Is the id field a foreign key to another table or is it just a unique id for the table?
    • What are the indexes on the table?
    • Does date appear as the first field in any of these indexes or are you just using id or id and then date?
    • Can we see the SQL you are using and the table creation (with indexes) please?

    Mike
    i feel kinda stupid i was (almost) 100% positive the PK was ID.. just checked this morning and it wasnt indexed properly!!

    31,000 records now takes about 70 seconds.. much better!
    could i be more efficient in my query?

    this e.g. uses a 5-day SMA, i will likely use 150 or 300
    SELECT x.Id,
    AVG(y.Temperature) as SMA
    FROM tblTemp as x,
    tblTemp as y
    WHERE x.Id>=5 AND x.Id BETWEEN y.Id AND y.Id+4
    GROUP BY x.Id
    ORDER BY x.Id


    thanks!

  6. #6
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Quote Originally Posted by Diesel Dud View Post
    i am open to using an EMA, although i would suspect it would require more processing
    nope it requires less processing, althou8gh it does require storing intermediate / last resuklt

    the forescat/.average for the next item is CurrentValue + (CurrentValue-LastForecast)*SmoothingFactor
    I'd rather be riding on the Tiger 800 or the Norton

  7. #7
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by Diesel Dud View Post
    31,000 records now takes about 70 seconds.. much better!
    could i be more efficient in my query?
    I don't know anything about EMA so I'll just comment on your existing SQL. It can almost certainly be improved - you should be able to make it under a second if that's important to you. A few points though:
    • Make your table names more descriptive (tblTemp means nothing to someone trying to understand your code).
    • The id field appears unnecessary, and indeed hides the logic of what you are doing, wouldn't it be better to use date_add()? This is must also be calculated somewhere as well so it's just more code to maintain and possibly go wrong.
    • Your join looks odd to me but then I'm not totally sure what you're after - perhaps if you could give the sample output using the test data above.

    Mike

  8. #8
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32
    Quote Originally Posted by mike_bike_kite View Post
    I don't know anything about EMA so I'll just comment on your existing SQL. It can almost certainly be improved - you should be able to make it under a second if that's important to you. A few points though:
    • Make your table names more descriptive (tblTemp means nothing to someone trying to understand your code).
    • The id field appears unnecessary, and indeed hides the logic of what you are doing, wouldn't it be better to use date_add()? This is must also be calculated somewhere as well so it's just more code to maintain and possibly go wrong.
    • Your join looks odd to me but then I'm not totally sure what you're after - perhaps if you could give the sample output using the test data above.

    Mike
    In an effort to simply my problem, I had created a simple example.
    I will use the actual example.

    `ID` is being used as the PK and is being incremented automatically on the insert.
    An alternative would be to use `Dte` and `Tme`, although this seems a little more complex to work with.


    --
    -- Table structure for table `IntradayPairs`
    --

    CREATE TABLE IF NOT EXISTS `IntradayPairs` (
    `ID` int(11) NOT NULL auto_increment COMMENT 'ID',
    `Sym_A` varchar(25) NOT NULL COMMENT 'Pair Symbol A',
    `Sym_B` varchar(25) NOT NULL COMMENT 'Pair Symbol B',
    `Dte` date NOT NULL COMMENT 'Date',
    `Tme` time NOT NULL COMMENT 'Time',
    `Sym_A_Cls` decimal(8,4) NOT NULL COMMENT 'Symbol Pair A Closing Price',
    `Sym_B_Cls` decimal(8,4) NOT NULL COMMENT 'Symbol Pair B Closing Price',
    `Prc_Rto` decimal(6,4) NOT NULL COMMENT 'Price Ratio',
    `SMA` decimal(6,4) NOT NULL COMMENT 'Simple Moving Average',
    PRIMARY KEY (`ID`),
    KEY `ID_Day` (`ID`,`Sym_A`,`Sym_B`,`Dte`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=31501 ;

    --
    -- Dumping data for table `IntradayPairs`
    --

    INSERT INTO `IntradayPairs` (`ID`, `Sym_A`, `Sym_B`, `Dte`, `Tme`, `Sym_A_Cls`, `Sym_B_Cls`, `Prc_Rto`, `SMA`) VALUES
    (1, 'HD', 'LOW', '2009-08-06', '01:00:00', '26.5000', '22.7500', '1.1648', '0.0000'),
    (2, 'HD', 'LOW', '2009-08-06', '01:01:00', '26.5100', '22.7400', '1.1658', '0.0000'),
    (3, 'HD', 'LOW', '2009-08-06', '01:02:00', '26.5000', '22.7400', '1.1653', '0.0000'),
    (4, 'HD', 'LOW', '2009-08-06', '01:03:00', '26.4900', '22.7400', '1.1649', '0.0000'),
    (5, 'HD', 'LOW', '2009-08-06', '01:04:00', '26.5100', '22.7600', '1.1648', '0.0000'),
    (6, 'HD', 'LOW', '2009-08-06', '01:05:00', '26.4900', '22.7400', '1.1649', '0.0000'),
    (7, 'HD', 'LOW', '2009-08-06', '01:06:00', '26.5000', '22.7600', '1.1643', '0.0000'),
    (8, 'HD', 'LOW', '2009-08-06', '01:07:00', '26.5100', '22.7600', '1.1648', '0.0000'),
    (9, 'HD', 'LOW', '2009-08-06', '01:08:00', '26.5300', '22.7700', '1.1651', '0.0000'),
    (10, 'HD', 'LOW', '2009-08-06', '01:09:00', '26.5300', '22.7700', '1.1651', '0.0000'),
    (11, 'HD', 'LOW', '2009-08-06', '01:10:00', '26.5000', '22.7700', '1.1638', '0.0000'),
    (12, 'HD', 'LOW', '2009-08-06', '01:11:00', '26.5000', '22.7700', '1.1638', '0.0000'),
    (13, 'HD', 'LOW', '2009-08-06', '01:12:00', '26.5200', '22.7800', '1.1642', '0.0000'),
    (14, 'HD', 'LOW', '2009-08-06', '01:13:00', '26.5100', '22.7800', '1.1637', '0.0000'),
    (15, 'HD', 'LOW', '2009-08-06', '01:14:00', '26.5300', '22.7800', '1.1646', '0.0000'),
    (16, 'HD', 'LOW', '2009-08-06', '01:15:00', '26.5400', '22.8000', '1.1640', '0.0000'),
    (17, 'HD', 'LOW', '2009-08-06', '01:16:00', '26.5500', '22.8000', '1.1645', '0.0000'),
    (18, 'HD', 'LOW', '2009-08-06', '01:17:00', '26.5400', '22.8000', '1.1640', '0.0000'),
    (19, 'HD', 'LOW', '2009-08-06', '01:18:00', '26.5500', '22.7900', '1.1650', '0.0000'),
    (20, 'HD', 'LOW', '2009-08-06', '01:19:00', '26.5500', '22.7800', '1.1655', '0.0000'),
    (21, 'HD', 'LOW', '2009-08-06', '01:20:00', '26.5500', '22.7900', '1.1650', '0.0000'),
    (22, 'HD', 'LOW', '2009-08-06', '01:21:00', '26.5600', '22.7900', '1.1654', '0.0000'),
    (23, 'HD', 'LOW', '2009-08-06', '01:22:00', '26.5700', '22.8100', '1.1648', '0.0000'),
    (24, 'HD', 'LOW', '2009-08-06', '01:23:00', '26.5700', '22.8200', '1.1643', '0.0000'),
    (25, 'HD', 'LOW', '2009-08-06', '01:24:00', '26.5700', '22.8300', '1.1638', '0.0000'),
    (26, 'HD', 'LOW', '2009-08-06', '01:25:00', '26.5700', '22.8100', '1.1648', '0.0000'),
    (27, 'HD', 'LOW', '2009-08-06', '01:26:00', '26.5500', '22.8000', '1.1645', '0.0000'),
    (28, 'HD', 'LOW', '2009-08-06', '01:27:00', '26.5500', '22.7800', '1.1655', '0.0000'),
    (29, 'HD', 'LOW', '2009-08-06', '01:28:00', '26.5500', '22.7800', '1.1655', '0.0000'),
    (30, 'HD', 'LOW', '2009-08-06', '01:29:00', '26.5600', '22.7800', '1.1659', '0.0000'),
    (31, 'HD', 'LOW', '2009-08-06', '01:30:00', '26.5800', '22.8100', '1.1653', '0.0000'),
    (32, 'HD', 'LOW', '2009-08-06', '01:31:00', '26.6000', '22.8100', '1.1662', '0.0000'),
    (33, 'HD', 'LOW', '2009-08-06', '01:32:00', '26.6000', '22.8100', '1.1662', '0.0000'),
    (34, 'HD', 'LOW', '2009-08-06', '01:33:00', '26.6000', '22.8000', '1.1667', '0.0000'),
    (35, 'HD', 'LOW', '2009-08-06', '01:34:00', '26.5800', '22.8000', '1.1658', '0.0000'),
    (36, 'HD', 'LOW', '2009-08-06', '01:35:00', '26.5800', '22.7900', '1.1663', '0.0000'),
    (37, 'HD', 'LOW', '2009-08-06', '01:36:00', '26.5600', '22.7800', '1.1659', '0.0000'),
    (38, 'HD', 'LOW', '2009-08-06', '01:37:00', '26.5600', '22.7900', '1.1654', '0.0000'),
    (39, 'HD', 'LOW', '2009-08-06', '01:38:00', '26.5600', '22.7700', '1.1664', '0.0000'),
    (40, 'HD', 'LOW', '2009-08-06', '01:39:00', '26.5400', '22.7400', '1.1671', '0.0000'),
    (41, 'HD', 'LOW', '2009-08-06', '01:40:00', '26.5300', '22.7500', '1.1662', '0.0000'),
    (42, 'HD', 'LOW', '2009-08-06', '01:41:00', '26.5100', '22.7200', '1.1668', '0.0000'),
    (43, 'HD', 'LOW', '2009-08-06', '01:42:00', '26.5000', '22.6900', '1.1679', '0.0000'),
    (44, 'HD', 'LOW', '2009-08-06', '01:43:00', '26.4800', '22.7200', '1.1655', '0.0000'),
    (45, 'HD', 'LOW', '2009-08-06', '01:44:00', '26.4600', '22.7100', '1.1651', '0.0000'),
    (46, 'HD', 'LOW', '2009-08-06', '01:45:00', '26.4600', '22.7200', '1.1646', '0.0000'),
    (47, 'HD', 'LOW', '2009-08-06', '01:46:00', '26.4600', '22.7200', '1.1646', '0.0000'),
    (48, 'HD', 'LOW', '2009-08-06', '01:47:00', '26.4800', '22.7100', '1.1660', '0.0000'),
    (49, 'HD', 'LOW', '2009-08-06', '01:48:00', '26.4400', '22.7000', '1.1648', '0.0000'),
    (50, 'HD', 'LOW', '2009-08-06', '01:49:00', '26.4400', '22.7000', '1.1648', '0.0000'),
    (51, 'HD', 'LOW', '2009-08-06', '01:50:00', '26.4400', '22.6700', '1.1663', '0.0000'),
    (52, 'HD', 'LOW', '2009-08-06', '01:51:00', '26.4400', '22.6800', '1.1658', '0.0000'),
    (53, 'HD', 'LOW', '2009-08-06', '01:52:00', '26.4500', '22.6800', '1.1662', '0.0000'),
    (54, 'HD', 'LOW', '2009-08-06', '01:53:00', '26.4400', '22.6800', '1.1658', '0.0000'),
    (55, 'HD', 'LOW', '2009-08-06', '01:54:00', '26.4000', '22.6600', '1.1650', '0.0000'),
    (56, 'HD', 'LOW', '2009-08-06', '01:55:00', '26.4000', '22.6500', '1.1656', '0.0000'),
    (57, 'HD', 'LOW', '2009-08-06', '01:56:00', '26.4100', '22.6500', '1.1660', '0.0000'),
    (58, 'HD', 'LOW', '2009-08-06', '01:57:00', '26.4100', '22.6300', '1.1670', '0.0000'),
    (59, 'HD', 'LOW', '2009-08-06', '01:58:00', '26.4100', '22.6300', '1.1670', '0.0000'),
    (60, 'HD', 'LOW', '2009-08-06', '01:59:00', '26.4000', '22.6200', '1.1671', '0.0000'),
    (61, 'HD', 'LOW', '2009-08-06', '02:00:00', '26.3800', '22.6100', '1.1667', '0.0000'),
    (62, 'HD', 'LOW', '2009-08-06', '02:01:00', '26.3400', '22.5600', '1.1676', '0.0000'),
    (63, 'HD', 'LOW', '2009-08-06', '02:02:00', '26.3300', '22.5500', '1.1676', '0.0000'),
    (64, 'HD', 'LOW', '2009-08-06', '02:03:00', '26.3300', '22.5600', '1.1671', '0.0000'),
    (65, 'HD', 'LOW', '2009-08-06', '02:04:00', '26.3400', '22.5600', '1.1676', '0.0000'),
    (66, 'HD', 'LOW', '2009-08-06', '02:05:00', '26.3300', '22.5200', '1.1692', '0.0000'),
    (67, 'HD', 'LOW', '2009-08-06', '02:06:00', '26.3100', '22.5200', '1.1683', '0.0000'),
    (68, 'HD', 'LOW', '2009-08-06', '02:07:00', '26.3100', '22.5300', '1.1678', '0.0000'),
    (69, 'HD', 'LOW', '2009-08-06', '02:08:00', '26.3100', '22.5600', '1.1662', '0.0000'),
    (70, 'HD', 'LOW', '2009-08-06', '02:09:00', '26.3100', '22.5600', '1.1662', '0.0000'),
    (71, 'HD', 'LOW', '2009-08-06', '02:10:00', '26.3100', '22.5500', '1.1667', '0.0000'),
    (72, 'HD', 'LOW', '2009-08-06', '02:11:00', '26.3000', '22.5400', '1.1668', '0.0000'),
    (73, 'HD', 'LOW', '2009-08-06', '02:12:00', '26.3000', '22.5400', '1.1668', '0.0000'),
    (74, 'HD', 'LOW', '2009-08-06', '02:13:00', '26.3000', '22.5400', '1.1668', '0.0000'),
    (75, 'HD', 'LOW', '2009-08-06', '02:14:00', '26.3100', '22.5600', '1.1662', '0.0000'),
    (76, 'HD', 'LOW', '2009-08-06', '02:15:00', '26.3100', '22.5400', '1.1673', '0.0000'),
    (77, 'HD', 'LOW', '2009-08-06', '02:16:00', '26.3400', '22.5400', '1.1686', '0.0000'),
    (78, 'HD', 'LOW', '2009-08-06', '02:17:00', '26.3500', '22.5400', '1.1690', '0.0000'),
    (79, 'HD', 'LOW', '2009-08-06', '02:18:00', '26.3500', '22.5300', '1.1696', '0.0000'),
    (80, 'HD', 'LOW', '2009-08-06', '02:19:00', '26.3400', '22.5300', '1.1691', '0.0000');

    UPDATE `IntradayPairs`
    SET `Prc_Rto` = `Sym_A_Cls` / `Sym_B_Cls;

  9. #9
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32
    I would like to calculate the SMA for column `Prc_Rto`

    The following query takes 0.0003 seconds to execute on this data set on 80 records:
    SELECT x.Id,
    AVG(y.`Prc_Rto`) as SMA
    FROM `IntradayPairs` as x,
    `IntradayPairs` as y
    WHERE x.Id>=5 AND x.Id BETWEEN y.Id AND y.Id+4
    GROUP BY x.Id
    ORDER BY x.Id


    I will have 30,000+ records in the actual table and would like the optimal way to code this calculation.

    Thanks.

  10. #10
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by Diesel Dud
    I will use the actual example.
    I see you want to store the SMA value in the table rather than calculate it on the fly - which is fine by the way assuming you cope with price updates etc altering the SMA. Because you're only updating the data infrequently it shouldn't really matter whether it takes 70 secs or not. It seems silly to spend your time trying to bring this time down unless there's a good reason. A side benefit will be that I won't need to work my way through your data and SQL

  11. #11
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32
    Quote Originally Posted by mike_bike_kite View Post
    I see you want to store the SMA value in the table rather than calculate it on the fly - which is fine by the way assuming you cope with price updates etc altering the SMA. Because you're only updating the data infrequently it shouldn't really matter whether it takes 70 secs or not. It seems silly to spend your time trying to bring this time down unless there's a good reason. A side benefit will be that I won't need to work my way through your data and SQL
    calculation of the SMA is one of several steps needed to determine if the pair is profitable; e.g. after SMA, a simple moving avg for the standard deviation will be calculated

    the process will be repeated for periods of 60, 120, 240, and 390 periods

    these are just some of the steps which will be executed on this workfile of 30,000 records; the workfile will then be flushed and the next pair will undergo the same process; i have hundreds of pairs to process.

    as you can see, if i can shave off some time calculating these SMAs, total process time savings will be huge

  12. #12
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by Diesel Dud View Post
    as you can see, if i can shave off some time calculating these SMAs, total process time savings will be huge
    The savings may well be huge but I joined the thread under the impression it was a scientific issue - you said originally that it was temperature you were modelling rather than prices. I make my living working with financial databases and just feel uncomfortable doing it for free. My apologies but I'm sure someone else will chip in soon to help you out.

    Mike

  13. #13
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32
    .....bump

  14. #14
    Join Date
    Feb 2010
    Location
    Montreal, CA
    Posts
    32
    ....... bump

  15. #15
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    bump
    exponential smoothing
    I'd rather be riding on the Tiger 800 or the Norton

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •