Results 1 to 11 of 11
  1. #1
    Join Date
    May 2007
    Posts
    7

    How to design a database for 5000 time series of stock prices?

    Hi,

    I am a newbie to database. This is my first post. I am working on a project to create a database (databases) to store over 5000 time series of stocks, ETFs, futures and US economic data.

    We plan to get all historical data from Bloomberg first and then update them every day.

    The stock, ETF and futures time series basically have date, open, high, low, close, and volume. We may add new fields later if needed, such as open interest. Each time series may have different length because of different listing time and exchange open/holiday schedule. The US economic time series can be quarterly based, such as GDP, or monthly based, such as PPI.

    We need an efficient and extendable database for us when we retrieve some time series from other programs for research. should we have four differnet databases for stock, ETF, futures and Economic data? should we create a table for each time series inside the database or put similar time series in one big table? If choose the second one, how to handle the date?

    Can DB experts here give me some advice on the design of this database? Any suggestion is appreciated. Thanks.

  2. #2
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by stian1
    I am a newbie to database. This is my first post.
    Yeah, I figured.

    The stock, ETF and futures time series basically have date, open, high, low, close, and volume. We may add new fields later if needed, such as open interest. Each time series may have different length because of different listing time and exchange open/holiday schedule. The US economic time series can be quarterly based, such as GDP, or monthly based, such as PPI.
    This is very vague.

    We need an efficient and extendable database for us when we retrieve some time series from other programs for research. should we have four differnet databases for stock, ETF, futures and Economic data? should we create a table for each time series inside the database or put similar time series in one big table? If choose the second one, how to handle the date?
    I'm not sure you understand what "table" and "database" mean.

    My design advice: buy "Introduction to Database Systems" by CJ Date. Read the first few chapters to learn what all the words mean. Design something, and as you work write down your guesses and assumptions as you go. It doesn't matter if you get it hopelessly wrong, because:

    1. We'll be convinced that you're serious about this and not wasting our time.

    2. You'll understand enough terminology so that your questions and our answers will make sense.

    2. You'll have made mistakes and we'll be able to show you what they are using a model that you are familiar with, since you will have designed it.

  3. #3
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    if you are an absolute beginner then scoi08y suggestion is worth while.. there may be better or worse books out there

    if you have got some understanding of the basic concepts, then the bit on realtional design by paul Litwin on r937's site is well worth a read, and re-read, and did I mention well worth a read.
    I'd rather be riding on the Tiger 800 or the Norton

  4. #4
    Join Date
    May 2007
    Posts
    7
    Hi,sco08y

    Thanks for your comments. I will check the database book you recommended. Actually, I probably don't need to design a relational database as most database guys do. My requirement is not complicate. 5000 time series of stocks or other financial instruments need to be stored in a sql server database and be updated everyday. Can you give me some initial suggestion on this before I finish reading basic database book? Thanks for your great help.




    Quote Originally Posted by sco08y
    Yeah, I figured.



    This is very vague.



    I'm not sure you understand what "table" and "database" mean.

    My design advice: buy "Introduction to Database Systems" by CJ Date. Read the first few chapters to learn what all the words mean. Design something, and as you work write down your guesses and assumptions as you go. It doesn't matter if you get it hopelessly wrong, because:

    1. We'll be convinced that you're serious about this and not wasting our time.

    2. You'll understand enough terminology so that your questions and our answers will make sense.

    2. You'll have made mistakes and we'll be able to show you what they are using a model that you are familiar with, since you will have designed it.

  5. #5
    Join Date
    May 2007
    Posts
    7
    thank you, healdem.
    Quote Originally Posted by healdem
    if you are an absolute beginner then scoi08y suggestion is worth while.. there may be better or worse books out there

    if you have got some understanding of the basic concepts, then the bit on realtional design by paul Litwin on r937's site is well worth a read, and re-read, and did I mention well worth a read.

  6. #6
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by stian1
    Can you give me some initial suggestion on this before I finish reading basic database book?
    That was my initial suggestion. I'm not talking about weeks of research; you can read a few chapters in an afternoon.

  7. #7
    Join Date
    May 2007
    Posts
    7
    Quote Originally Posted by sco08y
    That was my initial suggestion. I'm not talking about weeks of research; you can read a few chapters in an afternoon.
    Hi, sco08y

    Can you check my two naive plans for this database?

    Plan 1 uses two tables.
    Table 1 -- Symbols
    Fields: SymbolName SymbolCode
    Table 2 -- Values
    Fields: SymbolCode Date Open High Low Close Volume OpenInterest Earnings

    Symbols represent stocks, futures and ETFs. There's a one-to-many relationship between Symbols and Values.
    If each symbol has 20 years history, meaning roughly 5000 records, there will be 25 million records in table 2. I am using sql server express edition, which limit the database size up to 4GB. Should I switch to sql server developer edition, which has no limit on the database size?
    I have no idea on the speed to retrieve all historical close prices of one single symbol from the huge table 2 and save them to other places,such as excel file. Will that be very slow?

    Plan 2 seperate symbols into three catogories: stocks, futures and ETFs. I plan to use six tables.
    Table 1--Stocks
    Fields: StockName StockCode
    Table 2--StockValues
    Fields: StockCode Date Open High Low Close Volume OpenInterest Earnings
    Table 3-- Futures
    Fields: FutureName FutureCode
    Table 4-- FutureValues
    Fields: FutureCode Date Open High Low Close Volume OpenInterest
    Table 5-- ETFs
    Fields: ETFName ETFCode
    Table 6-- ETFValues
    Fields: ETFCode Date Open High Low Close Volume OpenInterest

    Any suggestions and comments are appreciated. 3x.

  8. #8
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by stian1
    Hi, sco08y

    Can you check my two naive plans for this database?
    The best schema is the one that makes the best compromise between fidelity to the real world and engineering constraints. Presumably, you know how investments work and your business processes. And you know or need to learn the constraints of your hardware. I know neither, so I'm the one who's naive here.

    To determine fidelity, you need to determine what your predicate for each table is and use common sense and expertise to judge how realistically it represents the data. As the book I mentioned explains it, a predicate isn't much more complicated than an English (or your preferred language) sentence.

    Symbols
    Fields: SymbolName SymbolCode

    The predicate might be "there exists a symbol with name SymbolName and code SymbolCode."

    Table 1--Stocks
    Fields: StockName StockCode

    Here the predicate might be "A corporation named StockName exists that is recognized under the stock exchange by the code StockCode."

    Plan 1 is more general, but plan 2 refers to actual things. Making your design more concrete tends to make it easier to set up integrity constraints. (For example, stocks might have restrictions on what codes are valid whereas futures markets might have different restrictions.)

  9. #9
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    The issue with SQL Server development edition is that you can't use it in a production system. Yes, you can develop with it, but, when you deploy, the database will need to run on a standard server, at a minimum.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  10. #10
    Join Date
    Aug 2014
    Posts
    22
    I am also looking for similar database examples which stores the time series of stocks and futures. If anyone is aware of such example database, then please share that. I guess it would be easier to modify such database, rather then designing it from scratch.

    Thanks a lot

  11. #11
    Join Date
    Aug 2014
    Posts
    3

    This is how reporting tools densify certain dimensions.

    If you follow the SQL Query, you would see that other enhancements to SQL include the CASE expression and partitioned outer join. Moreover CASE expressions make it possible, if the logic useful in many situations. Partitioned outer join is an extension to ANSI outer and join syntax that makes users comfortable to selectively densify certain dimensions thus keeping others sparse.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •