dBforums > General > Database Concepts & Design > Best software to build large DB for meta-analysis of gene expression data?

#1
10-27-07, 12:32
brazelton
Registered User

Join Date: Oct 2007
Posts: 1
Best software to build large DB for meta-analysis of gene expression data?

The research lab I run at the Univ. of Pennsylvania is starting a large meta-analysis of gene expression data from about 80 existing databases. The final database will have about 2 billion entries, each with about 50 properties to track. I prefer to run this on Macs (x86), since we already have several of these and the university has a new cluster of 8 Mac Xserves that we can use.
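
For a rough sense of the scale described above, here is a back-of-envelope sketch in Python. The row and property counts come from the post; the 8 bytes per value is purely an assumption (real column types, indexes, and row overhead would change the figure considerably).

```python
# Back-of-envelope storage estimate; the per-value size is an assumption.
ROWS = 2_000_000_000         # "about 2 billion entries" (from the post)
PROPERTIES_PER_ROW = 50      # "about 50 properties" (from the post)
ASSUMED_BYTES_PER_VALUE = 8  # hypothetical average, e.g. one 64-bit number


def raw_size_bytes(rows: int, props: int, bytes_per_value: int) -> int:
    """Raw payload size, ignoring indexes, row headers and free space."""
    return rows * props * bytes_per_value


total_values = ROWS * PROPERTIES_PER_ROW
total_bytes = raw_size_bytes(ROWS, PROPERTIES_PER_ROW, ASSUMED_BYTES_PER_VALUE)
print(f"{total_values:,} values, roughly {total_bytes / 1e9:,.0f} GB raw")
```

Under that assumption the raw data alone is on the order of hundreds of gigabytes before any indexes, which is why data volume weighs heavily in the choice of engine.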

As we start this project, I'm looking for feedback on the best DB to use. We routinely build databases in FileMaker, but for this project I believe it will be too slow, and it cannot directly perform many of the SQL queries (multiple joins, nested queries, etc.) that we are interested in. I built several Sybase SQL Server databases in the mid-90s and was satisfied with that program. However, I'm not familiar with most of the current databases.
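
As an illustration of the kind of query mentioned above (multiple joins plus a nested subquery), here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for whichever server is chosen, and the table names, column names, and sample rows are all hypothetical, not the lab's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical three-table layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, source_db TEXT NOT NULL);
CREATE TABLE expression (
    gene_id   INTEGER NOT NULL REFERENCES gene(gene_id),
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    level     REAL NOT NULL
);
INSERT INTO gene VALUES (1, 'GENE_A'), (2, 'GENE_B');
INSERT INTO sample VALUES (10, 'source_1'), (11, 'source_2');
INSERT INTO expression VALUES
    (1, 10, 5.0), (1, 11, 7.0), (2, 10, 1.0), (2, 11, 2.0);
""")

# Two joins plus a nested subquery: measurements above the overall mean level.
query = """
SELECT g.symbol, s.source_db, e.level
FROM expression AS e
JOIN gene   AS g ON g.gene_id   = e.gene_id
JOIN sample AS s ON s.sample_id = e.sample_id
WHERE e.level > (SELECT AVG(level) FROM expression)
ORDER BY g.symbol, s.source_db
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The SELECT statement itself is plain standard SQL, so the same query shape should carry over to MySQL or PostgreSQL with little or no change.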

The primary criteria would be speed, the ability to handle large data sets, native Mac x86 support, and the ability to handle SQL queries directly (close to SQL-92 compliant?). A reasonable level of support and the availability of strong GUI interfaces to speed up development would also be a big plus, since this is unpaid research. Moderate cost (<$1000) for software plus support is okay. We don't care much about Web interfaces, multiple clients, or security. We anticipate this being an in-house database with just 2-3 users.

After a bit of reading I'm favoring MySQL AB Community Server (or maybe Enterprise; is the support worth it?) plus the MySQL Administrator application package. Or is CocoaMySQL better? Or should I select a different database altogether? Any advice will be greatly appreciated. Thanks, Tim
#2
10-27-07, 14:18
r937
SQL Consultant

Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
strong gui interfaces, low number of users suggests mysql

volume suggests postgresql

can't really go wrong with either

luckily most of the sql you learned with sybase is still applicable

__________________
r937.com | rudy.ca

pre-order my book Simply SQL from Amazon
#3
10-27-07, 16:34
Pat Phelan
Resident Curmudgeon

Join Date: Feb 2004
Location: In front of the computer
Posts: 9,573
If you are looking to match the CDC base genome implementation, that was done using MySQL 5.

Based on your description of what you plan to do and the amount of data that you expect to use, I'm making some assumptions. The following discussion depends on those assumptions. It sounds like you are looking for meta expressions at the molecular level. Probably looking for interesting RNA strings, possibly using either chaos or micro-string theory (I happen to be diabetic, so this fascinates me).

Be picky about the MySQL installation. Specifically, for this kind of implementation you want to use MySQL 5, and unless you are willing to devote a lot of human time to quality control, you want to stay with InnoDB only.

As a side note, if you are pursuing any of the research that was recently abandoned by B&D, some of the computer/database researchers that worked for Dr. Ginsberg are available and would love to continue that research in an academic setting. Their experience could save you thousands of hours of effort!

-PatP
#4
10-29-07, 15:47
amthomas
Registered User

Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
That sounds similar to Bioinformatics dbs that other places have built.

One of the groups in college used PostgreSQL for the bioinformatics db project... but again, that was a college project hehe

You could possibly do a search on the web for "bioinformatics database" and contact some of the institutions that have worked on this specifically, to get an idea of the issues they had or things you may run into.

I guess the biggest worry would be search speed. For some dbs there are tools you can add to search through long text strings better. I am not sure what your needs are specifically, but some of the other places may have run into that.

You could probably talk to Lyle Ungar at Penn
http://www.pcbi.upenn.edu/

hopefully this is somewhat helpful :P maybe not! good luck!
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe