Thread: Join problem

  1. #1
    Join Date
    Feb 2004
    Posts
    8

    Unanswered: Join problem

    Hi - I have a problem with an UPDATE (with a join) that I'm attempting to run on a table containing 53 million records.

    Basically, I set this query off for the first time and it chugged away for 96 hours before I killed it - something was awry.
    I've restored the database to another server (just in case we have an issue with disks, compression, memory, etc.) and run some tests on samples of the 53 million rows (built roughly as sketched below). For small samples, the update runs in a reasonable time that increases linearly with sample size:

    Top-N sample test timings:
    1,000,000 - 36 seconds
    2,000,000 - 71 seconds
    3,000,000 - 122 seconds
    but once I try 4,000,000 it runs and runs - got up to 36 minutes before I cancelled it.
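
    The test samples were built along these lines (a sketch - the sample table name here is illustrative):

    SELECT TOP 1000000 *           -- 1, 2, 3 or 4 million per test
    INTO MailingHistory_Test1M     -- illustrative name for the sample copy
    FROM MailingHistory_Sample     -- the full 53-million-row table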


    Can anyone see any reason for this? I don't think the query time is going up exponentially - if you graph sample size against time from 1-3 million, the line is linear.

    I'm currently adding a record ID to the table so I can select the bottom 4,000,000 rows and be sure that there's not some weird data somewhere between 3 and 4 million that is making the query go whoopsie.
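
    A sketch of that (the RecordID column name is just what I'm calling it here):

    -- add a sequential ID to the existing table
    ALTER TABLE MailingHistory_Sample ADD RecordID INT IDENTITY(1,1)

    -- then restrict the test to the bottom 4,000,000 rows
    SELECT *
    FROM MailingHistory_Sample
    WHERE RecordID <= 4000000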

    Here are the tables & query I am attempting to run:


    UPDATE MailingHistory_Sample
    SET MailingHistory_Sample.ERIValue = ListCategoryHierarchy2.[ERI Value]
    FROM ListCategoryHierarchy2
    WHERE MailingHistory_Sample.[SourceCode] = ListCategoryHierarchy2.[SourceCode ID]


    Each table is clustered on [SourceCode ID]/[SourceCode]

    (varchars are COLLATE Latin1_General_CI_AS)

    CREATE TABLE [ListCategoryHierarchy2] (
    [Campaign ID] [varchar] (255) NULL,
    [Media ID] [varchar] (255) NULL,
    [Media Description] [varchar] (255) NULL ,
    [Media Selections] [varchar] (255) NULL ,
    [SourceCode ID] [varchar] (7) NULL ,
    [List ID] [varchar] (255) NULL ,
    [Mailing Date] [smalldatetime] NULL ,
    [List Category] [varchar] (255) NULL ,
    [Source Code Offer] [varchar] (255) NULL ,
    [ERI Value] [float] NULL
    ) ON [PRIMARY]
    (9,923 records)


    CREATE TABLE [MailingHistory_sample] (
    [MatchKey] [binary] (20) NULL ,
    [SourceCode] [varchar] (6) NULL ,
    [ListID] [varchar] (7) NULL ,
    [StationeryCode] [varchar] (5) NULL ,
    [PDYear] [varchar] (2) NULL ,
    [NaadID] [varchar] (10) NULL ,
    [OrderNumber] [char] (9) NOT NULL ,
    [CustomerNumber] [binary] (8) NULL ,
    [CampaignCode] [varchar] (4) NULL ,
    [ProductCode] [varchar] (4) NULL ,
    [ResponseType] [varchar] (1) NULL ,
    [HouseholdNumber] [bigint] NULL ,
    [IndividualNumber] [bigint] NULL ,
    [DataType] [varchar] (1) NULL ,
    [AddressNumber] [bigint] NULL ,
    [DateStamp] [char] (8) NULL ,
    [CountColumn] [int] NULL ,
    [MailingInstance] [int] NULL ,
    [PCPrizm] [char] (5) NULL ,
    [Postcode] [char] (7) NULL ,
    [PostalArea] [varchar] (2) NULL ,
    [TVRegion] [varchar] (3) NULL ,
    [ERIRange] [int] NULL ,
    [ERIValue] [float] NULL ,
    [MediaID] [varchar] (10) NULL
    ) ON [PRIMARY]


    SQL 2000

    Any thoughts? Could there be some critical mass of temp table size etc that I am hitting?
    thx
    w

  2. #2
    Join Date
    Feb 2004
    Posts
    8
    Ah, just got the ('PRIMARY' filegroup is full) message after trying to insert an ID into the large table - could this be the bigger problem?
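
    If so, presumably I need to check and grow the data file - something along these lines (the database and logical file names below are placeholders; sp_helpfile shows the real ones):

    EXEC sp_helpfile    -- lists logical file names, sizes and growth settings

    ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 100MB, MAXSIZE = UNLIMITED)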

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Well, I'm not too fond of your syntax. You should link your tables with a JOIN rather than in the WHERE clause, though I don't know whether this is impacting your execution time. It might.

    UPDATE MailingHistory_Sample
    SET MailingHistory_Sample.ERIValue = ListCategoryHierarchy2.[ERI Value]
    FROM MailingHistory_Sample
    INNER JOIN ListCategoryHierarchy2 ON MailingHistory_Sample.[SourceCode] = ListCategoryHierarchy2.[SourceCode ID]

    Also, drop any indexes on MailingHistory_Sample except one on [SourceCode]. This column should be indexed in both tables.
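
    Something like this (index names are just examples):

    CREATE INDEX IX_MHS_SourceCode ON MailingHistory_Sample ([SourceCode])
    CREATE INDEX IX_LCH2_SourceCodeID ON ListCategoryHierarchy2 ([SourceCode ID])
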
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by waldopepper
    Ah, just got the (PRIMARY' filegroup is full) message after trying to insert an ID into the large table - could this be the bigger problem?

    Well that's one problem....

    Where are the Indexes for these tables?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak Corral / Radio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  5. #5
    Join Date
    Feb 2004
    Posts
    8
    As I've mentioned, each table is indexed (clustered) on [SourceCode ID]/[SourceCode].

    We've actually managed to update all 54 million rows in this table by adding an identity column and running the update in batches of 3 million rows each - this took only a few hours.
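
    The batch loop looked roughly like this (a sketch from memory - RecordID is the identity column we added):

    DECLARE @lo INT, @hi INT
    SET @lo = 1
    SET @hi = 3000000
    WHILE @lo <= 54000000
    BEGIN
        UPDATE m
        SET m.ERIValue = l.[ERI Value]
        FROM MailingHistory_Sample m
        INNER JOIN ListCategoryHierarchy2 l
            ON m.[SourceCode] = l.[SourceCode ID]
        WHERE m.RecordID BETWEEN @lo AND @hi
        -- with Simple recovery, a CHECKPOINT here lets log space be reused
        SET @lo = @hi + 1
        SET @hi = @hi + 3000000
    END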

    It's definitely a volume issue that I hit somewhere past 3 million rows, where the query just runs for days and (as far as I can see) never completes.

    Any ideas what this might be?

  6. #6
    Join Date
    Nov 2003
    Posts
    94
    Quote Originally Posted by waldopepper
    As I've mentioned, each table is indexed (clustered) on [SourceCode ID]/[SourceCode].

    We've actually managed to update all 54 million rows in this table by adding an identity column and running the update in batches of 3 million rows each - this took only a few hours.

    It's definitely a volume issue that I hit somewhere past 3 million rows, where the query just runs for days and (as far as I can see) never completes.

    Any ideas what this might be?
    If your database is running in Full Recovery mode, a complete before-and-after image of the update must be written to the transaction log. If the log file is on the same physical drive as the primary data file, has too small an auto-grow increment, or is badly fragmented on disk, the overall update time can become very, very long.

    Options include: taking a backup and issuing an ALTER DATABASE command to put the database into Simple Recovery before the update; relocating the transaction log to a dedicated physical drive; or pre-sizing the transaction log very generously and not releasing its space back to the file system when backing up.
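
    For the first option, something like this (the database name and backup path are placeholders):

    -- full backup first, then drop to Simple recovery for the bulk update
    BACKUP DATABASE MyDatabase TO DISK = 'E:\Backups\MyDatabase.bak'
    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE

    -- ... run the big UPDATE here ...

    ALTER DATABASE MyDatabase SET RECOVERY FULL
    -- take a fresh full backup to restart the log backup chain
    BACKUP DATABASE MyDatabase TO DISK = 'E:\Backups\MyDatabase_after.bak'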

  7. #7
    Join Date
    Feb 2004
    Posts
    8
    Ah thanks - that makes sense. I only learnt this morning about putting data files and transaction logs on different physical disks. They're currently on the same drive for this database, so we are going to slap in a new drive and split the two.

    thanks
    w
