| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

05-18-07, 11:46
|
|
Registered User
|
|
Join Date: May 2007
Posts: 7
|
|
|
How to design a database for 5000 time series of stock prices?
|
|
Hi,
I am a newbie to database. This is my first post. I am working on a project to create a database (databases) to store over 5000 time series of stocks, ETFs, futures and US economic data.
We plan to get all historical data from Bloomberg first and then update them every day.
The stock, ETF and futures time series basically have date, open, high, low, close, and volume. We may add new fields later if needed, such as open interest. Each time series may have different length because of different listing time and exchange open/holiday schedule. The US economic time series can be quarterly based, such as GDP, or monthly based, such as PPI.
We need an efficient and extendable database for us when we retrieve some time series from other programs for research. should we have four differnet databases for stock, ETF, futures and Economic data? should we create a table for each time series inside the database or put similar time series in one big table? If choose the second one, how to handle the date?
Can DB experts here give me some advice on the design of this database? Any suggestion is appreciated. Thanks. 
|
|

05-20-07, 12:48
|
|
Registered User
|
|
Join Date: Oct 2002
Location: Baghdad, Iraq
Posts: 697
|
|
Quote:
|
Originally Posted by stian1
I am a newbie to database. This is my first post.
|
Yeah, I figured.
Quote:
|
The stock, ETF and futures time series basically have date, open, high, low, close, and volume. We may add new fields later if needed, such as open interest. Each time series may have different length because of different listing time and exchange open/holiday schedule. The US economic time series can be quarterly based, such as GDP, or monthly based, such as PPI.
|
This is very vague.
Quote:
|
We need an efficient and extendable database for us when we retrieve some time series from other programs for research. should we have four differnet databases for stock, ETF, futures and Economic data? should we create a table for each time series inside the database or put similar time series in one big table? If choose the second one, how to handle the date?
|
I'm not sure you understand what "table" and "database" mean.
My design advice: buy "Introduction to Database Systems" by CJ Date. Read the first few chapters to learn what all the words mean. Design something, and as you work write down your guesses and assumptions as you go. It doesn't matter if you get it hopelessly wrong, because:
1. We'll be convinced that you're serious about this and not wasting our time.
2. You'll understand enough terminology so that your questions and our answers will make sense.
2. You'll have made mistakes and we'll be able to show you what they are using a model that you are familiar with, since you will have designed it.
|
|

05-21-07, 06:39
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,246
|
|
|
|
if you are an absolute beginner then scoi08y suggestion is worth while.. there may be better or worse books out there
if you have got some understanding of the basic concepts, then the bit on realtional design by paul Litwin on r937's site is well worth a read, and re-read, and did I mention well worth a read.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
|
|

05-22-07, 14:20
|
|
Registered User
|
|
Join Date: May 2007
Posts: 7
|
|
Hi,sco08y
Thanks for your comments. I will check the database book you recommended. Actually, I probably don't need to design a relational database as most database guys do. My requirement is not complicate. 5000 time series of stocks or other financial instruments need to be stored in a sql server database and be updated everyday. Can you give me some initial suggestion on this before I finish reading basic database book? Thanks for your great help.
Quote:
|
Originally Posted by sco08y
Yeah, I figured.
This is very vague.
I'm not sure you understand what "table" and "database" mean.
My design advice: buy "Introduction to Database Systems" by CJ Date. Read the first few chapters to learn what all the words mean. Design something, and as you work write down your guesses and assumptions as you go. It doesn't matter if you get it hopelessly wrong, because:
1. We'll be convinced that you're serious about this and not wasting our time.
2. You'll understand enough terminology so that your questions and our answers will make sense.
2. You'll have made mistakes and we'll be able to show you what they are using a model that you are familiar with, since you will have designed it.
|
|
|

05-22-07, 14:21
|
|
Registered User
|
|
Join Date: May 2007
Posts: 7
|
|
thank you, healdem.
Quote:
|
Originally Posted by healdem
if you are an absolute beginner then scoi08y suggestion is worth while.. there may be better or worse books out there
if you have got some understanding of the basic concepts, then the bit on realtional design by paul Litwin on r937's site is well worth a read, and re-read, and did I mention well worth a read.
|
|
|

05-28-07, 17:19
|
|
Registered User
|
|
Join Date: Oct 2002
Location: Baghdad, Iraq
Posts: 697
|
|
Quote:
|
Originally Posted by stian1
Can you give me some initial suggestion on this before I finish reading basic database book?
|
That was my initial suggestion. I'm not talking about weeks of research; you can read a few chapters in an afternoon.
|
|

05-29-07, 10:25
|
|
Registered User
|
|
Join Date: May 2007
Posts: 7
|
|
Quote:
|
Originally Posted by sco08y
That was my initial suggestion. I'm not talking about weeks of research; you can read a few chapters in an afternoon.
|
Hi, sco08y
Can you check my two naive plans for this database?
Plan 1 uses two tables.
Table 1 -- Symbols
Fields: SymbolName SymbolCode
Table 2 -- Values
Fields: SymbolCode Date Open High Low Close Volume OpenInterest Earnings
Symbols represent stocks, futures and ETFs. There's a one-to-many relationship between Symbols and Values.
If each symbol has 20 years history, meaning roughly 5000 records, there will be 25 million records in table 2. I am using sql server express edition, which limit the database size up to 4GB. Should I switch to sql server developer edition, which has no limit on the database size?
I have no idea on the speed to retrieve all historical close prices of one single symbol from the huge table 2 and save them to other places,such as excel file. Will that be very slow?
Plan 2 seperate symbols into three catogories: stocks, futures and ETFs. I plan to use six tables.
Table 1--Stocks
Fields: StockName StockCode
Table 2--StockValues
Fields: StockCode Date Open High Low Close Volume OpenInterest Earnings
Table 3-- Futures
Fields: FutureName FutureCode
Table 4-- FutureValues
Fields: FutureCode Date Open High Low Close Volume OpenInterest
Table 5-- ETFs
Fields: ETFName ETFCode
Table 6-- ETFValues
Fields: ETFCode Date Open High Low Close Volume OpenInterest
Any suggestions and comments are appreciated. 3x.
|
|

06-11-07, 00:02
|
|
Registered User
|
|
Join Date: Oct 2002
Location: Baghdad, Iraq
Posts: 697
|
|
Quote:
|
Originally Posted by stian1
Hi, sco08y
Can you check my two naive plans for this database?
|
The best schema is the one that makes the best compromise between fidelity to the real world and engineering constraints. Presumably, you know how investments work and your business processes. And you know or need to learn the constraints of your hardware. I know neither, so I'm the one who's naive here.
To determine fidelity, you need to determine what your predicate for each table is and use common sense and expertise to judge how realistically it represents the data. As the book I mentioned explains it, a predicate isn't much more complicated than an English (or your preferred language) sentence.
Symbols
Fields: SymbolName SymbolCode
The predicate might be "there exists a symbol with name SymbolName and code SymbolCode."
Table 1--Stocks
Fields: StockName StockCode
Here the predicate might be "A corporation named StockName exists that is recognized under the stock exchange by the code StockCode."
Plan 1 is more general, but plan 2 refers to actual things. Making your design more concrete tends to make it easier to set up integrity constraints. (For example, stocks might have restrictions on what codes are valid whereas futures markets might have different restrictions.)
|
|

06-11-07, 03:06
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
|
|
The issue with SQL Server development edition is that you can't use it in a production system. Yes, you can develop with it, but, when you deploy, the database will need to run on a standard server, at a minimum.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|