Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Dec 2003
    Location
    South Africa
    Posts
    9

    Question Unanswered: Image Data - Where should it be stored

    Been pondering the idea of putting 43 - 56K Pdf documents into a SQL DB but everything I've read only goes as far as to explain how the data is stored or manipulated.

    I need to establish whether there are real performance issues or gains that will be experienced by doing this.

    If indeed, performance would be a big issue, would storing the files in a Folder on another drive outside the Database be all that much faster (Let windows handle the fetch and carry instead of SQL Server).

    The plan at the moment is to store all the image data in a separate data file or multiple data files. Using partitioned views is also a consideration.

    Number of records per day (pdf's) would be somewhere around a 1000.

  2. #2
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    You also need to think through the archiving requirements/possibilities. No matter where you end up storing it, no drive will be big enough aft certain period of time.

    If Image field vs. physical file are your only choices, I'd go with the first one. Your documents will be safer there. Consider SQLLiteSpeed as your backup tool, so that you can still perform your backups successfully when the database size grows to a considerable number.

  3. #3
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    My vote...

    Store them on the server and store the path in the database...keep each as separate files....

    MOO
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  4. #4
    Join Date
    Dec 2003
    Location
    South Africa
    Posts
    9
    Originally posted by rdjabarov
    You also need to think through the archiving requirements/possibilities. No matter where you end up storing it, no drive will be big enough aft certain period of time.

    If Image field vs. physical file are your only choices, I'd go with the first one. Your documents will be safer there. Consider SQLLiteSpeed as your backup tool, so that you can still perform your backups successfully when the database size grows to a considerable number.
    Agreed that doc's might be safer inside DB but the real issue is where long term performance will be the best. In or out?

    Archiving or even removal of the pdf's after a set time is definitely a must and SQL Litespeed would be the only option once the DB gets bigger.

    Maybe I should just put a few hundred thousand of these things into a test DB and see how it goes and then do the same with the Filesystem Option.

    Thanks

  5. #5
    Join Date
    Dec 2003
    Location
    South Africa
    Posts
    9
    Originally posted by Brett Kaiser
    My vote...

    Store them on the server and store the path in the database...keep each as separate files....

    MOO
    We've used this approach on another project and so far there haven't been any complaints or problems. Data size for the images on that one is at around 11 Gigs.

    Thanks

  6. #6
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Brett,

    I highly respect your opinion, but have to force myself to disagree with you this time.

    Having docs stored on the file system, while having their paths stored in a table(-s) not simplifies but rather complicates your application architecture. As per performance, - it will not be any different whether you have the actual document stored in the image field, or its path in a varchar/nvarchar field. In fact, in case you choose the latter, your performance WILL be worse because of the mere presence of that nvarchar field. Keep in mind, that field may have to be as big as 4000 in case with nvarchar (which will translate to the same size as varchar(8000)) because while the file name cannot exceed 128 characters, the total path may and possibly will.

    Also, I don't think anybody would do any mass operation on that table that would involve the Image field, so, again, the performance will actually be better.

  7. #7
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    I bow to your graciousness....

    Now to be fair, I've played with porting images in and out, and found it to be painful (MOO to that)

    Worked with a Contenet mangament app once and they stored all their images outside the db

    Then there's Bills Articel

    http://www.sqlteam.com/item.asp?ItemID=986

    And probably many more....

    EDIT:

    And a more balanced opinion

    http://www.aspfaqs.com/aspfaqs/ShowFAQ.asp?FAQID=172
    Last edited by Brett Kaiser; 02-18-04 at 17:42.
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  8. #8
    Join Date
    Nov 2003
    Location
    Phoenix,AZ
    Posts
    5

    Re: Image Data - Where should it be stored

    1000 PDF a day???!!!!! That's a fair number. Where I work, we image all the companies paper. SQL tables get large very fast, as we image several thousand pieces of paper a day. My whole point to this, if you can afford it, why not look into either a small, introductory imaging system (Kodax/kofax come to mind) or a small SAN?

    Our setup consists of 5 scanning stations, 3 quality control stations and 2 release stations. Part of the business consists of scanning pages, creating pdf's and putting those pdf's on line, to be viewed over the intranet. The process is fast and easy. We keep pointers to scanned images in approx 100-125 SQL tables (Each business unit gets it's own folder. Images are stored as bin files and written to optical media on a continious basis.) We have approx 100-125 users that view documents from a web application.


    Originally posted by glyve
    Been pondering the idea of putting 43 - 56K Pdf documents into a SQL DB but everything I've read only goes as far as to explain how the data is stored or manipulated.

    I need to establish whether there are real performance issues or gains that will be experienced by doing this.

    If indeed, performance would be a big issue, would storing the files in a Folder on another drive outside the Database be all that much faster (Let windows handle the fetch and carry instead of SQL Server).

    The plan at the moment is to store all the image data in a separate data file or multiple data files. Using partitioned views is also a consideration.

    Number of records per day (pdf's) would be somewhere around a 1000.

  9. #9
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    Hmm. Let's see what that recipe was....add one can gasoline...light match..

    I am actually a bit split on this idea. I can afford that position, since I have not had to deal with any of these systems to a large extent.

    Microsoft (in their finite wisdom) believes that files should be kept in the database. At least when it comes to Sharepoint Team Services. They claim it is easier to maintain one copy of the document, and have twnety server serve it up, than it is to have twenty webservers with the same document on it. This overlooks one small point, of course. Just how much network traffic is there on that one poor network card, if twenty servers are all clamoring for this one document? Also, Sharepoint is geared more for editing documents, so they do have a point there. Easier to maintain one copy of each living document, than try to keep a bunch of other servers up to date on even 1000 documents.

    Now, are these documents of yours static? Then you can drop a pile of network traffic, because the problem of keeping a server farm up to date becomes trivial (or left as an excercise for the interested reader).

    This is all fine and well, of course, if you are "swinging for the fence" (hey, baseball season starts in 47 days). If you are planning on only one web server and one DB server, you can go either way. Depending on your expected amount of traffic, scalability requirements, size of the disk on the web server, colour of your bosses toupee, etc..

    I do not know of any reason why you would need (or want) to keep paths as nvarchar. Also, I have never seen a path of 4,000 characters. I assume rdjabarov has been bit bad by one of these, though. Main thing to keep in mind at the design time will probably be how in heck are you going to organize all of these documents?

    One last thing to note. Microsoft put out the "TerraServer" way back in 98 or so. It was a terabyte database running on SQL 7.0. Naturally, it was all image data. I have no idea how much maintenance it required, but it was running everytime I went to see the website....at least, so they said ;-)


    Opinions expressed above may not reflect those of the owners/operators of this website, or indeed anything coming close to being rational.

  10. #10
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    I indeed may sound very pationate about the whole idea, but I not only experimented with both methods, but also had to switch from file-based to database-based mid-way through development (half of the app was already in prod) And it was done not on 2K or 7.0, but in 6.5, with VB client and RDO as a "data truck." And we too were storing images, TIFF scans of HCFA-1500 and UB-92 claims, along with all the values coming from them into corresponding tables. The volume exceeded 1000 total claims a day, but the membership was growing rapidly (rigfht up untill the company went broke, and not because of database-based image storage )

    Advantages: no need to search on image field, and no affect on performance; If your sysadmin decides to move folders around (except for data and log-related folders) you are not affected; if the whole server goes CAPUT and the only thing that is left in your hands is a last-night tape backup, - you lost NOTHING!!! Contrary to "path or URL" as suggested by one of the articles mentioned by Brett (cheers ), where not only you need a db backup, but also the folder(-s) backup in order to claim you're back in business.

  11. #11
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    I feel your pain. Only database corruption I ever faced (outside of a disk controller going screwy) was in image data in SQL 6.5. Although, the fool application was attempting to use the image data as an encrypted jumble of all sorts of order information, so it was constantly updating the image data.

    1000 claims per day does not sound too bad, though. For an 8 hour day, that comes out to 125 per hour....carry the one...2 per minute or so. Naturally, you would have gusts in there. Those users are never so polite as to put these things in in an orderly and uniform rate, eh? How did the application perform?

  12. #12
    Join Date
    Dec 2003
    Location
    South Africa
    Posts
    9
    Thanks People. Life seems a bit less obscure now.

    BTW the documents will all be read only.

  13. #13
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    It performed OK, the images were retrieved only when there was a dispute with a provider, or for case management purposes. Since the entire claim information was stored into appropriate table structure, there was never a need to even touch the original image, or do anything else with it. Purging was designed for a 2-year window back, after which the images were placed on microfilm, and the data was stored in an aggregate form of lag, MLR, Encounter summary, and other analytical reports.

  14. #14
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Originally posted by MCrowley
    This is all fine and well, of course, if you are "swinging for the fence" (hey, baseball season starts in 47 days). .

    Gotta be a sox fan.....

    They count the days you see.......

    Thank god for their football team, or the entire state would be in therapy...



    (Sorry Mcrowley, had to get a zinger in their)

    And as far as the topic goes...I've only seen one system deliver content from a blob...and that was home grown and on Oracle (major internal differences).

    Content management, Kodak imagining systems ( now weren't those platters expensive!), Self service apps, Peoplesodt (has the ability to store images)...and every implementation stored them on the server...

    As to the backup question, sure it's not in the db, just another part of the nightly back up process..

    Hey WHATEVER works for you...
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  15. #15
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    That is quite all right, Brett. I have had some fun at others' expense here, as well ;-).

    Besides, THIS year, is THE YEAR....just like last year..........and the year before that...........and uh, well, the year before that... ;-)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •