Hi there folks. Short time reader, first time poster.
I'm working on a relatively simple application for displaying real estate listings and doing some other processing on the listing data.
Our current MLS database of listings is about 100k rows. About 15k of those listing rows are active and of the most interest to folks and, to be honest, to my application.
Our MLS system allows up to 12 photos per listing. This means that I'll have about 65k photos of about 70k each if I were to download them all. Not every listing has all 12 photos.
The issue I'm having is trying to figure out how to store these files so that they are easy/fast for me to access programatically and also so that when I'm manipulating the data on the server that I don't have a folder with 65k files in it. The current ftp process that the MLS provides me dumps all the files into a single folder and it sucks. They are kind enough to use a simple naming system like [MLS#]-[picture index].jpg
I can do a new pull from the master MLS system anytime so the data does not need to be backup up or redundant, and it usually doesn't change, but on occasion it may if someone changes the photos for a given listing. The only time writes to the files are typically done is in a batch process as incremental updates occur daily.
I searched the forum and noted some other suggestions, but to be honest I'm not sure which system would be good for my application and for allowing it to grow as it does every single year with the addition of new listings.
What I came up with were the following:
1) Store all photos in the file system in a big directory. This sucks.
2) Store the photos in the file system, but do a mod100 on the mls# and then store the files in 100 separate folders hoping that the random MLS numbers generated are evenly distributed.
3) Store all files in the database
4) Store a reference to each file in the database and leave the files in the file system.
I am using mysql, php, and dojo to create my application if that helps with your giving me insight. My server seems plenty capable for the limited load that I'll have initially. It's a dual core Xenon @ 2.33GHz, 4 Segate SATAII drive of 400GB each, 4GB RAM, and we're running SUSE and Win2k3 64bit.
Any help would be great. And if I'm leaving key info out to help just let me know.
Thanks
Joe