Hi, I'm currently in the design phase of a new database that will someday publicly share a lot of the research data my lab has accumulated over the past few years.
The data consists of 3D structural information about hearts.
In short, what we want to do is:
1. store our huge datasets (multiple files, 30 MB+ each)
2. process and query specific portions of these datasets (e.g., look at this portion of certain/all hearts and do a bunch of calculations)
3. provide a good interface that can take inputs and return numerical outputs
4. display graphs (e.g., histograms)
5. let the user view (translate, rotate, zoom, etc.) some of our 3D models
6. in the long term, make all of this available online, without downloading additional software (or, if an applet is needed, it should be fairly small, no larger than a couple of MB)
I've worked out a table structure for how I want the data organized, and I have some of the equations worked out for our analysis. My questions are:
1. Should I make processing server-side or client-side?
2. I may end up caching some of the processed data somehow. Any recommendations for a good way to do this without wasting space?
3. What language(s) should I use to build the interface?
4. With respect to (3), are there benchmarks comparing performance across these different languages? These datasets are huge and take a good amount of time to process.
5. I plan to query info from the DB using SQL. Should I do my calculations in Java, or are there other recommended languages?
I saw your posting today, but I didn't have the opportunity to think through my responses at the time, so I decided that late was better than haphazard.
1) Server-side processing is the only real choice for data manipulation. Given the quantities of data you are discussing, client-side processing could take hours or days just in transmission time (especially on wireless devices like the Samsung i700 or the Kyocera 7135).
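To make the point concrete, here is a minimal sketch of the server-side approach: compute a summary (a histogram, one of the graphs mentioned above) on the server and ship only the bin counts to the client, not the raw 30 MB+ dataset. The class and method names are made up for illustration.

```java
// Hypothetical sketch: bin the raw measurements server-side so only the
// bin counts (a few hundred bytes) cross the wire to the client.
public class HistogramService {

    // Bin `values` into `bins` equal-width buckets covering [min, max].
    public static int[] histogram(double[] values, double min, double max, int bins) {
        int[] counts = new int[bins];
        double width = (max - min) / bins;
        for (double v : values) {
            if (v < min || v > max) continue;  // ignore out-of-range samples
            int i = (int) ((v - min) / width);
            if (i == bins) i = bins - 1;       // v == max lands in the last bin
            counts[i]++;
        }
        return counts;
    }
}
```

The client then only has to draw the bars, which even a very thin GUI can manage.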
2) Caching strategy depends entirely on what you decide to cache, how long you need to cache it, and the performance difference between retrieving data from the cache versus recomputing it from scratch. It is impossible for me to tell you what is a good strategy until you know all three of those parameters.
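Once you know those parameters, a simple size-bounded cache is often enough to avoid "wasting space." As one possible sketch (not a recommendation for your specific workload), Java's `LinkedHashMap` can be turned into an LRU cache in a few lines; the class name here is made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a size-bounded LRU cache for processed results.
// Capping the entry count bounds the space used; the least recently
// used result is evicted once the cap is exceeded.
public class ResultCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public ResultCache(int maxEntries) {
        super(16, 0.75f, true);  // true = access-order iteration, i.e. LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the LRU entry when over capacity
    }
}
```

Whether the cap should be a count, a byte budget, or a time-to-live depends on exactly the three parameters above.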
3) Portability should probably be your first consideration for the GUI code. There is far too much dependence on device/platform characteristics in most present-day GUI tools to suit me. The devices that I really want to reach are small, highly personalized, and wildly portable. They also have only minimal features that are uniformly available across all the devices that I want to reach. My advice is to shoot very, very low in the GUI area, to reach the broadest possible selection of delivery devices.
4) Yes, there are benchmarks available for almost every platform. Every published benchmark from a vendor will show the target platform as the best possible choice that you could make. This is the nature of vendor benchmarks!
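The numbers you can actually trust are ones you measure yourself, on your own data. A minimal harness like the following (the class and method names are made up) is enough to compare candidate implementations of one of your calculations; note that on the JVM, warm-up runs matter because the JIT only compiles hot code after it has run a few times:

```java
// Hypothetical sketch: time a representative slice of your own workload
// rather than trusting vendor benchmarks.
public class MicroBenchmark {

    // Average wall-clock milliseconds per run of `work`, after warm-up.
    public static long avgMillis(Runnable work, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) work.run();  // let the JIT settle
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) work.run();
        return (System.nanoTime() - start) / 1_000_000 / runs;
    }
}
```

Run the same harness over the same slice of your datasets in each candidate language/environment and compare those results instead.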
5) SQL is a very good choice. Java might be, depending on how you choose to handle it. You might also consider proprietary packages like .NET, SAS, SPSS, etc., as well as language packages that offer customized features to support the kind(s) of manipulation that you plan to perform on your data. If the LISP or MUMPS communities have already developed industry-standard packages that do what you want, that makes a pretty strong case for considering them.
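If you do go the SQL-plus-Java route, the usual division of labor is to let SQL do the selection and joining it is good at, then do the numerical work in Java once the rows are in memory. A minimal sketch, assuming the measurements have already been fetched (e.g., via JDBC) into an array; the class name and the statistic chosen are illustrative, not anything from your schema:

```java
// Hypothetical sketch: numerical post-processing in Java on values
// selected by a SQL query. Assumes the rows were already fetched into
// `samples` (e.g., with JDBC's ResultSet).
public class MeasurementStats {

    // Returns { mean, sample standard deviation } of one measured column.
    public static double[] meanAndStdDev(double[] samples) {
        double sum = 0;
        for (double s : samples) sum += s;
        double mean = sum / samples.length;

        double sqDiffs = 0;
        for (double s : samples) sqDiffs += (s - mean) * (s - mean);
        double stdDev = Math.sqrt(sqDiffs / (samples.length - 1));

        return new double[] { mean, stdDev };
    }
}
```

The same split applies whatever language you end up choosing: push filtering into the database, keep the heavy math in compiled code close to the data.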