  1. #1
    Join Date
    Dec 2014

    Unanswered: To query or not to query- that.. (you know)

    I am looking at possibly high database load in the future for a system I am developing.
    Without getting into detail, I will just ask the question using an example scenario...

    Suppose I have a system that processes incoming data records and needs to run a SQL Server query to validate that each record belongs to a valid member, say by checking that the email address exists in a table.
    If my system gets busy and needs to do that, say, a million times per day, I am wondering whether I may be better off, resource-wise, using a different approach that avoids database queries.
    Again using email addresses as an easy example, instead of running a query for every data record to check whether a row containing that email address exists, I could do this:
    1. Whenever a new row is added to the database, write a file such as SomeEmailAddress.txt to the file system, and check for that file instead of querying the database.
    2. To keep the folders manageable, put each file starting with 'a' in a folder called 'a', and so on.
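
    For what it's worth, here is a minimal sketch of that idea in Python. All the names here (ROOT, add_member, member_exists) are made up for illustration; this is just the file-per-email scheme described above, not an existing API.

    ```python
    import os

    # Root folder for the hypothetical file-based "index".
    ROOT = "email_index"

    def _path_for(email: str) -> str:
        # Shard by the first character so no single folder grows huge.
        email = email.strip().lower()
        return os.path.join(ROOT, email[0], email + ".txt")

    def add_member(email: str) -> None:
        # Called whenever a new row is inserted into the members table.
        path = _path_for(email)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    def member_exists(email: str) -> bool:
        # Replaces the SQL lookup with a file-existence check.
        return os.path.exists(_path_for(email))
    ```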

    Am I thinking too far outside the box?

    Or maybe, with some kind of caching, I don't need to worry about this at all?

    Any comments appreciated.

  2. #2
    Join Date
    Jan 2003
    Provided Answers: 11
    Provided there is an index on the email column, a query of the form

    select 1 from bigtable where email = 'a string here'

    should be very quick. If the email addresses in the table are unique, you should see an index seek each time, resulting in only a few reads.

    Just remember, there is a 900-byte limit on index key sizes, so if you are planning on going international with your email addresses, you might get into trouble with addresses over 450 characters in length. Still, that seems like a pretty generous limit.

    I would say try it without the caching layer or the other extra complexities and see how it goes. If you see a problem when you go live (or, hopefully, during load testing), then you can indulge in some of the more exotic optimizations.
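
    To show the pattern concretely, here is a sketch using Python's sqlite3 module as a stand-in for SQL Server (the table and column names are invented for illustration). The point is the same: a unique index on the email column turns the existence check into an index seek rather than a table scan.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (email TEXT NOT NULL)")
    # A unique index lets each lookup be an index seek, not a scan.
    conn.execute("CREATE UNIQUE INDEX ix_members_email ON members (email)")
    conn.execute("INSERT INTO members (email) VALUES (?)",
                 ("alice@example.com",))
    conn.commit()

    def member_exists(conn: sqlite3.Connection, email: str) -> bool:
        # Equivalent of: select 1 from bigtable where email = '...'
        row = conn.execute(
            "SELECT 1 FROM members WHERE email = ?", (email,)
        ).fetchone()
        return row is not None
    ```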

  3. #3
    Join Date
    Feb 2004
    In front of the computer
    Provided Answers: 54
    Just to get a feel for the difference, I tried a sample with a bit over a million email addresses and used a PowerShell script to process a couple of data sets, checking for duplicates. The results were heavily weighted toward SQL Server rather than the file system... in only one case did the file-system check take less than 100 times as long as the SQL check.
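
    The shape of that comparison can be sketched as follows. This is not the PowerShell script used above; it is a rough Python/SQLite re-creation, with a much smaller data set, so absolute timings (and possibly even the winner) will differ from the SQL Server results reported here.

    ```python
    import os
    import sqlite3
    import tempfile
    import time

    N = 5_000  # kept small; the original test used over a million addresses
    emails = [f"user{i}@example.com" for i in range(N)]

    # File-system side: one empty file per address.
    root = tempfile.mkdtemp()
    for e in emails:
        open(os.path.join(root, e + ".txt"), "w").close()

    # Database side: one table with a unique (primary key) index on email.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (email TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO members VALUES (?)",
                     [(e,) for e in emails])
    conn.commit()

    start = time.perf_counter()
    fs_hits = sum(os.path.exists(os.path.join(root, e + ".txt"))
                  for e in emails)
    fs_time = time.perf_counter() - start

    start = time.perf_counter()
    db_hits = sum(
        conn.execute("SELECT 1 FROM members WHERE email = ?",
                     (e,)).fetchone() is not None
        for e in emails
    )
    db_time = time.perf_counter() - start

    print(f"file system: {fs_time:.3f}s, database: {db_time:.3f}s")
    ```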

    In theory, theory and practice are identical. In practice, theory and practice are unrelated.
