  1. #1
    Join Date
    Jun 2005
    Posts
    319

    Unanswered: partitioning data using the Schema?

    So we are using SQL Standard Edition and do not have access to table partitioning. Not even sure table partitioning in itself will help with this scenario.

    I'm throwing around some options to better organize our data as it comes in.

    We have a bunch of servers collecting data; we'll call them nodes. Data is split up by hour into hr_{hh}-type tables. There's also the concept of a GroupID, based on Client, that the data eventually ends up residing in, so we have tables of the format:

    Client_{cccc}_Group_{gggg}

    Currently all groups are dumped into the same hr_{hh} table depending on what time it is.

    Part of the challenge is aggregating all of this data using SQL jobs and keeping track of the last successful batch. We currently have 5 servers aggregating data from about 50 nodes. To simplify the process (read: performance on the nodes), I think partitioning the data beyond just the hour would be helpful.

    Options:
    1) Create a separate DB for each Group. -- Not feasible; some Groups are much larger than others, and predicting this and being able to pre-grow the DBs is not an option.
    2) Create a separate table for each Group/Hour combination. -- This is what my gut instinct is telling me. A peer I work with thinks it is too much maintenance for the nodes to manage in terms of creating, truncating, and dropping the tables. He might be right, but I think with proper coding it could be manageable.
    3) Use schemas to help organize the data better. Each schema would be something like G1234 or G2287, using the GroupID in the schema name. Then all tables could be standard: hr_00, hr_01, ... hr_23. I think we could get by with much less dynamic SQL compared to 2 above, which would have tables in the form of G1234_hr_00 instead of G1234.hr_00 (see the sketch below).

    I might be wrong about 3 and we would need the same amount of dynamic SQL to manage the data pull, but I am wondering if this is an option worth exploring or a really bad use of schemas (and why?).
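
    To make option 3 concrete, here is a minimal sketch of what the objects could look like (the group IDs, the hr_data column, and the repeated CREATE TABLE statements are just placeholders for the example):
    Code:
    -- one schema per group, all owned by dbo
    CREATE SCHEMA G1234 AUTHORIZATION dbo;
    GO
    CREATE SCHEMA G2287 AUTHORIZATION dbo;
    GO
    -- every group gets the same standard hourly tables, hr_00 through hr_23
    CREATE TABLE G1234.hr_00 (trackerId int, hr_data int);
    CREATE TABLE G1234.hr_01 (trackerId int, hr_data int);
    -- ... and so on through hr_23, repeated for each group schema
    GO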

  2. #2
    Join Date
    Jun 2005
    Posts
    319
    Quick follow-up: if I want to use schemas and not have to deal with dynamic SQL, can I set the current schema dynamically? I tried something like this, and it doesn't work, of course:


    Code:
    CREATE SCHEMA G1234 AUTHORIZATION dbo
        CREATE TABLE hr_0 (trackerId int, hr_0_data int)
    GO 
    
    declare @schema varchar(10)
    select @schema = 'G1234'
    
    select * from @schema.hr_0
    Msg 102, Level 15, State 1, Line 4
    Incorrect syntax near 'hr_0'.
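
    For reference, T-SQL won't accept a variable in place of a schema or table name, so a variable schema has to be spliced into the statement as a string and run as dynamic SQL. A minimal sketch against the hr_0 example above (QUOTENAME just keeps the generated name safe):
    Code:
    declare @schema sysname
    select @schema = 'G1234'

    declare @sql nvarchar(max)
    -- build the statement with the schema name spliced in, then execute it
    select @sql = N'select * from ' + quotename(@schema) + N'.hr_0'
    exec sp_executesql @sql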

  3. #3
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    If you choose a different approach, you might get where you want to be. Instead of trying to access the tables directly, build an access layer (stored procedures); then you can perform conditional execution based on parameter values:
    Code:
    create procedure [G1234].sp__Select (...) as ...
    go
    create procedure [G2287].sp__Select (...) as ...
    go
    create procedure dbo.sp__Select (@schema sysname) as
       -- build the two-part name of the per-schema procedure
       declare @exec varchar(max) = @schema + '.sp__Select'
       -- exec with a variable that holds a procedure name is an ordinary
       -- procedure call, not dynamic SQL
       exec @exec
       return (0)
    go
    It only looks like we're using dynamic SQL; in reality these are all precompiled execution plans.
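    Calling it from the existing procs would look something like this (passing parameters down to the per-schema procedures is left out, as in the sketch above):
    Code:
    -- the wrapper picks the per-group procedure by schema name at run time
    exec dbo.sp__Select @schema = 'G1234'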
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  4. #4
    Join Date
    Jun 2005
    Posts
    319
    Thanks rdja..

    So you are suggesting creating the different schemas and then creating a different version of the stored procedure for each schema? And then our current procs would stay the same but call the wrapper proc at the bottom?

  5. #5
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Gag..

    All I am saying is that since you started investigating variable-based schema assignment at the time of execution, you can take the route that I pointed you to.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  6. #6
    Join Date
    Sep 2005
    Posts
    161
    How much data are you talking about per node and per hour? What do you do with it after you have imported it? You may not need to partition your data.

  7. #7
    Join Date
    Jun 2005
    Posts
    319
    We have reports that need to query the lifetime of the group (it could be 90 days long; 60 days is the median), though there are a few stages the data passes through.

    I have seen about 8-10 million rows for one group in one hour's time. The insert performance degrades pretty significantly even after just a few hours; we'll see it go from 45% complete per hour all the way down to 10% complete per hour as the table fills up. That's a separate issue we're dealing with, and we're researching whether MySQL would be more cost effective.
    Last edited by Gagnon; 12-23-10 at 12:43.
