11-18-10, 18:59 #1Registered User
- Join Date
- Nov 2010
Unanswered: Building Operational Data Source(ODS)
I am Integrating 3 different database sources into an ODS. The ODS is going to have the exact tables and columns as the source system. I just tried to load one sample table (3 columns) with 2 million records, just to check the systems. It took around 30 minutes, I don’t want to go into asking you guys how to optimize it.
My plan is to create a schema, and the table structure for all the sources in the ODS in the first week and once the structure is ready I will start loading data from one database source at a time
Q1) Is there a golden rule or method to follow while creating the table structure or should I just start doing the create table scripts for all the tables in the ODS?
Q2) Since the tables are relational in the source system, and my guess is that ODS need not have any relation among tables. So I don’t have to create any constraints (like PK, FK) while writing the create scripts. Correct me if I am wrong.
Q3) I am going to use SSIS to load data, so how do I create the packages, should I have each package for each source or should I break up the tables, say 5 tables in one package. Does this affect the performance?
It would be really great if you could answer these questions and also give me some tips/advice.
I owe you guys. Thanks
11-18-10, 23:02 #2Resident Curmudgeon
Provided Answers: 54
- Join Date
- Feb 2004
- In front of the computer
A1) None that I know of, but that doesn't mean a whole lot.
A2) You only need RI if you want the results to be correct. If you only need an answer, but it doesn't have to be right then save yourself time and effort and just make one up.
A3) This answer depends on many factors. There is usually a profound difference, but which of the N ways is faster usually takes about 2N attempts to be certain.
If you are moving small amounts of data (under a gigabyte, the process and technique isn't too important. They are all roughly equivalent.
If you are moving large amounts of data (several terrabytes), then the process and techniques can become a deal maker or breaker. For ease of construction, you can't beat SSIS. If performance becomes critical, you may want to resort to one of the "old standby" methods that are harder to set up, but usually several orders of magnitude faster to run.
-PatPIn theory, theory and practice are identical. In practice, theory and practice are unrelated.
11-18-10, 23:14 #3Registered User
- Join Date
- Jun 2005
1 time load or recurring? A lot depends on your reporting needs and how much scrubbing you need to do.
To improve the import you can have multiple threads running simultaneously (I am not talking 3, I am talking maybe 5x3) but you will get much better performance if you can target separate tables for the import, limit indexes. Multi-core CPU helps, proper physical db design (filegroups w/ 1 file per processor each file located on it's own disk, pre-grow the db's turn off autogrow)
You don't need to use SSIS, you can do this just through scheduled jobs (assuming recurring import not 1 time). I would wait to put it into SSIS until after I am comfortable with my code (both performance and accuracy-wise).