10-27-07, 11:32 #1 Registered User
- Join Date
- Oct 2007
Best software to build large DB for meta-analysis of gene expression data?
The research lab I run at the Univ. of Pennsylvania is starting a large meta-analysis of gene expression data from about 80 existing databases. The final database will have about 2 billion entries, each with about 50 properties to track. I prefer to run this on Macs (x86) since we already have several of these, plus the university has a new cluster of 8 Mac Xserves that we can use.
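To get a feel for the scale, here is a rough back-of-envelope sketch. Only the row count and the number of properties come from the post; the average of 8 bytes per stored value is purely an assumption for illustration, and real storage will depend on the column types, indexes, and engine overhead.

```python
# Back-of-envelope size estimate for the planned database.
# ROWS and PROPERTIES come from the post; BYTES_PER_VALUE is an
# assumed average, not a measured figure.
ROWS = 2_000_000_000
PROPERTIES = 50
BYTES_PER_VALUE = 8  # assumption: roughly one double or bigint per property


def raw_size_bytes(rows, properties, bytes_per_value):
    """Return the raw data size in bytes, ignoring indexes and row overhead."""
    return rows * properties * bytes_per_value


total = raw_size_bytes(ROWS, PROPERTIES, BYTES_PER_VALUE)
print(f"{ROWS * PROPERTIES:,} values, about {total / 1e9:,.0f} GB before indexes and overhead")
```

Under that assumption the raw data alone is in the high hundreds of gigabytes, so disk layout and indexing strategy probably matter as much as the choice of database.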
As we start this project, I'm looking for feedback on the best DB to use. We routinely build databases in FileMaker, but for this project I believe it will be too slow, and it cannot directly perform many of the SQL queries (multiple joins, nested queries, etc.) that we are interested in. I built several Sybase SQL Server databases in the mid-90s and was satisfied with that program. However, I'm not familiar with most of the current databases.
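For illustration, the kind of query meant here (two joins plus a nested subquery) might look like the sketch below. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table names, column names, and values are hypothetical placeholders, not the lab's actual schema or data.

```python
import sqlite3

# In-memory SQLite database as a stand-in; schema and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_db (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE expression (
    gene_id   INTEGER REFERENCES gene(id),
    source_id INTEGER REFERENCES source_db(id),
    level     REAL
);
INSERT INTO source_db VALUES (1, 'db_a'), (2, 'db_b');
INSERT INTO gene VALUES (1, 'GENE1'), (2, 'GENE2');
INSERT INTO expression VALUES (1, 1, 5.0), (1, 2, 7.0), (2, 1, 1.0), (2, 2, 2.0);
""")

# Two joins resolve the gene and source names; the nested subquery keeps
# only measurements above the overall average expression level.
query = """
SELECT g.symbol, s.name, e.level
FROM expression AS e
JOIN gene AS g ON g.id = e.gene_id
JOIN source_db AS s ON s.id = e.source_id
WHERE e.level > (SELECT AVG(level) FROM expression)
ORDER BY g.symbol, s.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same SELECT statement is plain SQL-92 style and should carry over to most SQL databases with little or no change.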
The primary criteria would be speed, the ability to handle large data sets, native Mac x86 support, and the ability to handle SQL queries directly (close to SQL-92 compliant?). A reasonable level of support and the availability of strong GUI tools to speed up development would also be a big plus, since this is unpaid research. Moderate cost (<$1000) for software plus support is okay. I don't care much about Web interfaces, multiple clients, or security. We anticipate this being an in-house database with just 2-3 users.
After a bit of reading, I'm favoring MySQL AB Community Server (or maybe Enterprise; is the support worth it?) plus the MySQL Administrator application package. Or is CocoaMySQL better? Or should I select a different database altogether? Any advice will be greatly appreciated. Thanks, Tim
10-27-07, 13:18 #2 SQL Consultant
- Join Date
- Apr 2002
- Toronto, Canada
10-27-07, 15:34 #3 Resident Curmudgeon
- Join Date
- Feb 2004
- In front of the computer
If you are looking to match the CDC base genome implementation, that was done using MySQL 5.
Based on your description of what you plan to do and the amount of data that you expect to use, I'm making some assumptions. The following discussion depends on those assumptions. It sounds like you are looking for meta expressions at the molecular level. Probably looking for interesting RNA strings, possibly using either chaos or micro-string theory (I happen to be diabetic, so this fascinates me).
Be picky about the MySQL installation. Specifically, for this kind of implementation you want to use MySQL 5, and unless you are willing to devote a lot of human time to quality control, you want to stay with only InnoDB.
As a side note, if you are pursuing any of the research that was recently abandoned by B&D, some of the computer/database researchers that worked for Dr. Ginsberg are available and would love to continue that research in an academic setting. Their experience could save you thousands of hours of effort!
10-29-07, 14:47 #4 Registered User
- Join Date
- May 2005
- San Antonio, Texas
That sounds similar to bioinformatics DBs that other places have built.
One of the groups in college used PostgreSQL for their bioinformatics DB project... but again, that was a college project hehe
You could possibly do a search on the web for "bioinformatics database" and contact some of the institutions that have worked with this before specifically and get an idea maybe of the issues they had or things you may run into.
I guess the biggest worry would be search speed. For some DBs there are add-on tools that search through long text strings better. I am not sure what your needs are specifically, but some of the other places may have run into that.
You could probably talk to Lyle Ungar at Penn.
hopefully this is somewhat helpful :P maybe not! good luck!
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe