  1. #1
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912

    Unanswered: Updating n rows at once - best practice (again)

    I have put this thread on as a follow up discussion to some concepts that surfaced here:
    http://www.dbforums.com/showthread.php?t=1208316
    here:
    http://www.dbforums.com/showthread.php?t=1604936
    and also here:
    http://www.dbforums.com/showthread.php?t=1606555

    The gist is: is it ever correct, excusable or advisable to pass a non-normalised csv list of data to a stored procedure? When I first asked this I thought it was simply an implementation issue. Pat thinks it is, but he also holds an ideological objection to it.

    If I seem a little obsessed with this topic it is simply that
    a) I don't want to hijack yet another thread
    b) I am

    Anyway - I PMed Pat about this and he (quite rightly) suggested I post publicly. So here we go:

    I think I have struggled a little getting my head round the concepts you are using. And here (I think) is why:
    Normalisation starts with a logical model and ends with the physical implementation of the database (as I see it...). Relational theory pervades these two but extends further to SQL and database objects (views, sprocs etc). I work exclusively (with the exception of some legacy I stuff I simply must update) with the sproc database-API methodology. As such, my databases are abstracted from the other tiers. Now - I think this is the crux of it for me - I typically think of the sproc API as another tier (perhaps tier 1.5). Although the database itself and the objects that manipulate it are housed by a single application I don't actually think of the sprocs as part of the database. They are dependant upon it and inextricably intertwined with it but they are not a part of it in my eyes. As such, I don't worry about producing denormalised output and I have never (up until you got me thinking about it) worried about accepting non-normalised input. The sproc is not the database. The sproc is just a code procedure that acts upon the database. As such - if it accepts a typless, denormalised input but subsequently types (validates) it and normalises it before it makes any changes to the data within the database then I struggle to see the scale of the problem. I am writing as something of a stickler for normalisation myself - I think that rather than querying how closely one should adhere to normalisation principles I am pondering at what point these principles come into play.

    As a follow up question - if SQL Server sprocs were to accept (for example) a table variable as an argument, would this meet your requirement (since the data is verified and normalised before it gets to the sproc), or would you require a solution like this to be an ANSI standard too?
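    (For illustration, a minimal sketch of what such a signature might look like - hypothetical syntax, with a table type, sproc and column names invented purely for the example:)
    Code:
    CREATE TYPE dbo.WidgetRow AS TABLE
    (
        WidgetID   int         NOT NULL,
        WidgetName varchar(50) NOT NULL
    );
    GO

    -- The data arrives already typed and shaped as rows,
    -- so the sproc can act on it as a set straight away.
    CREATE PROCEDURE dbo.UpdateWidgets
        @Rows dbo.WidgetRow READONLY
    AS
        UPDATE w
        SET    w.WidgetName = r.WidgetName
        FROM   dbo.Widgets AS w
        JOIN   @Rows AS r
          ON   r.WidgetID = w.WidgetID;
    GO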

    So - that's the nub of it - comments welcome on
    a) The practicality of passing csv lists to sprocs and subsequently parsing them
    and
    b) How "correct" or "incorrect" this is.

    Blindman and Pat both object on both counts (I think).
    Thrasmachus didn't have a problem with it (though, to be fair, that was a very early response to a specific question I asked).

    I don't hold a particularly strong opinion either way - in fact the "loop through the changes at the client and fire the sproc n times" route is probably much easier to code in any case - I'm just interested in the idea that it violates relational theory in some way.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I don't consider sprocs to be a separate tier, but rather an (insulating) layer of the database.
    I'm not sure whether your concern is with the passing of data in CSV parameters, or the fact that the CSV data may be "non-normalized".
    "Non-normalized" implies that the CSV data represents values for multiple, different data elements. XML, perhaps. I would certainly balk at doing that, which is a step beyond passing mere comma delimited lists, but if you have to do it then stored procedures are the appropriate vehicle. As noted before, they serve as a layer of insulation between the database and the presentation layer.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Thanks Blindman

    Yeah - by csv lists I meant something like:
    Code:
    CREATE PROCEDURE MySproc
        @CSVListArg varchar(500)
    AS
        SET NOCOUNT ON;
        -- Split the csv list and do stuff with it, such as
        -- validate, normalise and insert/update
    GO
    And yes - the csv parameter would represent several rows. In fact, I wouldn't have considered it for a single row, since I would just use n parameters (one per field). Which kind of contradicts my thinking for it being acceptable for multiple records.
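    To make that concrete, here is a minimal sketch of the parse-then-act approach (illustrative only - it assumes a simple integer list with no quoting rules, and the names are invented):
    Code:
    CREATE PROCEDURE dbo.MySproc
        @CSVListArg varchar(500)
    AS
        SET NOCOUNT ON;

        DECLARE @Ids TABLE (Id int NOT NULL);
        DECLARE @Pos int;

        -- Peel one value off the front of the list per iteration.
        WHILE LEN(@CSVListArg) > 0
        BEGIN
            SET @Pos = CHARINDEX(',', @CSVListArg);
            IF @Pos = 0
            BEGIN
                INSERT INTO @Ids (Id) VALUES (CAST(@CSVListArg AS int));
                SET @CSVListArg = '';
            END
            ELSE
            BEGIN
                INSERT INTO @Ids (Id)
                VALUES (CAST(LEFT(@CSVListArg, @Pos - 1) AS int));
                SET @CSVListArg = SUBSTRING(@CSVListArg, @Pos + 1, LEN(@CSVListArg));
            END
        END

        -- By this point the input is typed and normalised;
        -- anything that touches real tables can happen as a set.
        SELECT Id FROM @Ids;
    GO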

    I don't know if the distinction between tiers and insulation layers means much here. I suppose you see sprocs as being "of the database", whereas I have just never really thought of them that way.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I make a strong distinction between "tiers" and "database layers". Stored procs are as much a part of the database as your skin is a part of you.
    And like Pat, I am persnickety about the form in which data is stored in the database, but I care much less about the form in which it is submitted to stored procedures. Sprocs are where the cleansing and validation occur.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I see data integrity as a "cradle to grave" issue... There are things you do on the client, things you do in transit (at a web or app server), and things you do on the database in order to guarantee the integrity of the data.

    I see data existing across a spectrum. At the hardware end, it exists as bytes, these are rigidly defined and have exact and provable values. The wetware (humans) involved see data in a much more abstract and malleable form, much of which is at best a swag. There's a full spectrum in between that follows the data as it becomes more and more discrete and mathematically/logically precise.

    In creating the Table object for .NET, Microsoft managed to move a fair bit of the integrity management to the client side. This makes the process more efficient in many ways, since it gets the response logically "closer" to the user, and also logically extends the "database" further into the process.

    It so happens that for SQL Server, this also allows the code to use the BCP library instead of having to schlep the data into text, fire it down the wire, and have the server convert it back to binary! Between avoiding two rather "noisy" translations and using a drastically more efficient communications method, this performs a lot better than using SQL syntax to do the same job.

    As Blindman pointed out, stored procedures are code that is part of the database itself. While stored procedures were used to implement a "middle tier" years ago, that practice has been deprecated. At least as far as I'm concerned, sprocs implement business logic that needs to be held "close to the data". Using a stored procedure to emulate a cursor (passing multiple rows of data in a single call, then using Transact-SQL to parse and submit that data one row at a time) seems hideously wasteful to me. You're taking a scripting language that isn't particularly fast or easy to interpret and burying it deeply inside some of your most time-critical code. Egad!

    -PatP

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Quote Originally Posted by Pat Phelan
    I see data integrity as a "cradle to grave" issue... There are things you do on the client, things you do in transit (at a web or app server), and things you do on the database in order to guarantee the integrity of the data.
    That's where we differ then. I view the database as a stand-alone application unto itself, totally responsible for monitoring and validating any data entered into it. Once I set up my sprocs, I don't give a whit what interface is being used to connect to it or what kind of validation they do. The interfaces may change or evolve over time, but as long as they restrict themselves to my sprocs, all will be well.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i don't have any problem with a csv passed to a stored proc

    for example, on the client side, to add a new widget you must type in some widget data, which gets handled (eventually) by INSERT INTO widgets ( ... ) VALUES ( ... ) (in a stored proc with passed parameters if you like)

    now let's say we have the classic junction table called widgetoutlets which is a many-to-many table to relate widgets to the outlets (stores) table they are sold in, and you get to choose which outlets for the new widget you're adding via drop-down list with the multiple option or a series of checkboxes, so when you press submit...

    ... the html form field is a comma-delimited list of FK values, which, together with the FK for the newly-added widget, are used to insert the multiple rows in the many-to-many widgetoutlets table

    i would do this with a stored proc and pass it the list of keys
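    something like this, say (just a rough sketch - dbo.SplitList is a hypothetical table-valued split helper that turns '3,7,19' into one row per value, not a built-in):
    Code:
    CREATE PROCEDURE dbo.AddWidgetOutlets
        @WidgetID   int,
        @OutletList varchar(500)
    AS
        -- one INSERT, multiple junction rows
        INSERT INTO widgetoutlets ( widget_id, outlet_id )
        SELECT @WidgetID
             , CAST(s.value AS int)
          FROM dbo.SplitList(@OutletList) AS s;
    GO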

    i have no problem with this concept at all

    in an ideal world (hah! send us a postcard when you get there!) maybe you would have object façades that abstract your business layer cake with isomorphic inheritance methods of bidirectional workflow via modular components, but i wouldn't

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by Pat Phelan
    Using a stored procedure to emulate a cursor (passing multiple rows of data in a single call, then using Transact-SQL to parse and submit that data one row at a time) seems hideously wasteful to me. You're taking a scripting language that isn't particularly fast or easy to interpret and burying it deeply inside some of your most time-critical code. Egad!
    One quickie on that point - whenever I have conceptualised (and tested) this, I have popped the input into a table var and updated the rows as a set. Agreed, I have to loop through the input in order to produce the normalised table, but the updating/inserting is a single execution.

    Although I can certainly accept the "bugger to maintain" and "do you not have the first clue about relational integrity?" arguments, I can't believe that this method is slower than updating/inserting a row at a time (assuming we are sticking with sprocs rather than BCP etc). In fact, performance was the specific reason I considered it in the first place.
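    i.e. something along these lines (a sketch only - the table and column names are invented):
    Code:
    DECLARE @Changes TABLE (WidgetID int NOT NULL, Qty int NOT NULL);

    -- ...loop through the csv input here, INSERTing one parsed,
    -- typed row into @Changes per iteration...

    -- then apply the whole batch in a single set-based statement
    UPDATE w
    SET    w.Qty = c.Qty
    FROM   dbo.Widgets AS w
    JOIN   @Changes AS c
      ON   c.WidgetID = w.WidgetID;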

    Quote Originally Posted by r937
    maybe you would have object façades that abstract your business layer cake with isomorphic inheritance methods of bidirectional workflow via modular components, but i wouldn't
    Lol - your penchant for techno babble knows no bounds eh Rudy?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    At least from my previous tests, the Table object was the fastest - more than thirteen times the throughput of RPC (stored procedure) calls - but keep in mind that I ran out of data long before I ran out of capacity to process it. I didn't specifically test using multiple rows per RPC call as you are doing, but I'd expect you could get as much as two or three times the performance of one call per row.

    I'd bet that you could get more performance from using the CSV list than you could from one RPC call per row. I'd be shocked if you could even approximate what you can get from using a Table object... I don't know where the "bottom" is using Table objects; I only established that it went way beyond what I needed.

    -PatP

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I don't want to get into a whole other debate but... I take it from your championing of the Table object that you are not of the "you talk to my sprocs or you talk to the hand" camp of DBAs then? I suppose it is related, as you are kind of like His Tonyness - "there is a third way".
    Testimonial:
    pootle flump
    ur codings are working excelent.

  11. #11
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Yes, you are correct.

    Using stored procedures to process rows forces serialization, just like cursors do. Both stored procedures and cursors deal with one logical row at a time, even though there are tricks that allow them both to deal with more than one physical row at a time. The serialization is probably the single largest block to scaling databases... You have to find ways to deal with groups of rows at once if you want to be able to scale solutions up.

    There are kinds of stored procedures (pipelines) that go somewhere beyond the way that MS-SQL triggers behave. They allow you to execute procedural code on "flows" or streams of rows. I don't know of any commercially successful database that implements them yet; they're probably still three to five years from seeing commercial production code, but they'll allow the kind of control that Blindman wants and the ability to scale that I need.

    -PatP

  12. #12
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    BOOOO direct access to tables.
    HOORAY beer!
    Red Stripe!
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by blindman
    BOOOO direct access to tables.
    Lol - in case there was ever any ambiguity over the years about your position on this.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  14. #14
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    Pat, did you try any tests with several processes using the Table objects? I am curious about the concurrency. Do any of the Table objects try to get a table lock?
