Results 1 to 6 of 6
  1. #1
    Join Date
    Feb 2012
    Location
    Austria
    Posts
    3

    Question Unanswered: stocks database design- slow query response

    Hello Community,

    I bought historic end-of-day data from stocks, indexes, currencies and commodities (1992 to present) from a online vendor. In order to work with this data using R, I wrote the price data into a mysql database by using a handmade script.

    The filled database now contains 15k symbols (stocks like AAPL, MSFT, ...) and 22,561,384 daily prices.

    The SQL code to retrieve all daily data from one symbol would be:
    Code:
    SELECT date,open,high,low,close,volume FROM stockname 
    LEFT JOIN stockprice ON (id=stock_id)  
    WHERE code='MSFT' AND date >='1992-01-01' ORDER BY `date`;
    
    5326 rows in set (4 min 32.58 sec)
    This delivers all data correctly, but takes 4 and a half minute, which I consider a looong time.

    Is there anything I can do to get better performance?

    I'm on Ubuntu 11.10, standard out- of the box MySQL server configuration, ntel® Core™2 Duo CPU E8200 @ 2.66GHz × 2, 2GB RAM.

    Thanks for your help,

    Andreas


    The database structure looks like this:
    Code:
    --
    -- Database: `va_stratlab`
    --
    
    CREATE TABLE IF NOT EXISTS `exchange` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `name` varchar(30) COLLATE latin1_general_ci NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM  DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
    
    -- --------------------------------------------------------
    
    --
    -- Table structure for table `stockname`
    --
    
    CREATE TABLE IF NOT EXISTS `stockname` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `name` varchar(100) COLLATE latin1_general_ci NOT NULL,
      `code` varchar(20) COLLATE latin1_general_ci NOT NULL,
      `exchange_id` int(10) unsigned NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM  DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
    
    -- --------------------------------------------------------
    
    --
    -- Table structure for table `stockprice`
    --
    
    CREATE TABLE IF NOT EXISTS `stockprice` (
      `stock_id` int(10) unsigned NOT NULL,
      `date` date NOT NULL,
      `open` float(12,4) DEFAULT NULL,
      `high` float(12,4) DEFAULT NULL,
      `low` float(12,4) DEFAULT NULL,
      `close` float(12,4) DEFAULT NULL,
      `volume` bigint(20) unsigned DEFAULT NULL,
      `oi` int(10) unsigned DEFAULT NULL,
      PRIMARY KEY (`stock_id`,`date`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    You specified condition for date in WHERE clause, like
    WHERE ... AND date >='1992-01-01' ...

    At that time, LEFT OUTER JOIN get equivalent to INNER JOIN.

  3. #3
    Join Date
    Jan 2012
    Posts
    84
    Create an combined index on code+id fields for stockname table:
    Code:
    create index code_id on stockname( code, id );
    You may also try to create an index on stock_id + date columns for stockprice table
    and check if this improve performance of the query further, but this index may slow down inserts of new records into this table:
    Code:
    create index stockid_date on stockprice( stock_id, date );

  4. #4
    Join Date
    Feb 2012
    Location
    Austria
    Posts
    3

    Thumbs up

    Thanks for your advice. I moved the database to my laptop (same OS, but different hardware: Intel® Core™2 Duo CPU P8600 @ 2.40GHz × 2, 2GB RAM, Solid State Disk)

    Then I tried the following queries. I always used different symbols to avoid query cache.

    Code:
    SELECT date,open,high,low,close,volume FROM stockname 
    LEFT JOIN stockprice ON (id=stock_id)  
    WHERE code='MSFT' AND date >='1992-01-01' ORDER BY `date`;
    
    5065 rows in set (2 min 50.99 sec)
    Compared to the 4 1/2 minutes at the other PC, the solid state disk in my laptop seems to be almost twice as fast.

    Code:
    SELECT date,open,high,low,close,volume FROM stockname  LEFT JOIN
    stockprice ON (id=stock_id)   WHERE code='AAPL'  ORDER BY `date`;
    
    5326 rows in set (1.22 sec)
    without the "AND date >='1992-01-01'" clause, the query is processed in just one second. Now let's add the index as kordirko suggested:

    Code:
    mysql> create index code_id on stockname( code, id );
    Query OK, 15447 rows affected (0.09 sec)
    Records: 15447  Duplicates: 0  Warnings: 0
    
    SELECT date,open,high,low,close,volume FROM stockname  LEFT JOIN
    stockprice ON (id=stock_id)   WHERE code='MO' AND date>='1992-01-01'  ORDER BY `date`;
    
    5067 rows in set (1.01 sec)
    
    SELECT date,open,high,low,close,volume FROM stockname  LEFT JOIN
    stockprice ON (id=stock_id)   WHERE code='DJ'  ORDER BY `date`;
    
    
    5244 rows in set (0.62 sec)
    whow, that's a whole lot faster! I can live with this performance.

  5. #5
    Join Date
    Feb 2012
    Location
    Austria
    Posts
    3
    @ kordirko

    the table stockprice has a primary key

    PRIMARY KEY (`stock_id`,`date`)

    isn't that a index already?

    Well I added the stockid_date index:
    Code:
    create index stockid_date on stockprice( stock_id, date );
    
    SELECT date,open,high,low,close,volume FROM stockname  LEFT JOIN stockprice ON (id=stock_id)   WHERE code='C' AND date>='1992-01-01'  ORDER BY `date`;
    
    5246 rows in set (0.67 sec)
    
    
    SELECT date,open,high,low,close,volume FROM stockname  LEFT JOIN stockprice ON (id=stock_id)   WHERE code='AA' ORDER BY `date`;
    
    5328 rows in set (1.19 sec)
    this time, the query without the AND date clause takes longer than the the query with. Overall, the additional index doesn't bring more speed.

  6. #6
    Join Date
    Jan 2012
    Posts
    84
    Quote Originally Posted by wotuzu17 View Post
    @ kordirko

    the table stockprice has a primary key

    PRIMARY KEY (`stock_id`,`date`)

    isn't that a index already?
    I apologize, I didn't notice this.
    In this case my advice is pointless.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •