  1. #1
    Join Date
    Jan 2005
    Posts
    13

    Question Unanswered: Big table(?) or split between tables?

    Hi Guys

    I have an application that runs at several sites and has a table with 36 columns, mostly ints and small varchars.

    I currently have only one table that stores the data, plus five indexes. Since the table at one site (and soon others) holds about 18 million rows, I have been trying to come up with a better solution (but only if needed; I don't think I have to tell you that I am a programmer and not a DBA).
    The db file size with all the indexes is more than 10 GB, which in itself is not a problem, but is it a bad solution to have it that way?

    The questions are:

    Are there any big benefits if I split it into several smaller tables, or even smaller databases, and make the SPs that fetch the data aware that, say, 2006's data is in table A, and so on?
    It's quite important that SELECTs are fast; that need is far more important than decreasing the size of the database file.

    How many rows is it okay to have in one table (with 25 columns) before it gets too big?

    Thanks in advance.

    Best regards
    Johan, Sweden.

    Code:
    CREATE TABLE [dbo].[Cdr](
    	[Id] [int] IDENTITY(1,1) NOT NULL,
    	[Abandon] [varchar](7) NULL,
    	[Bcap] [varchar](2) NULL,
    	[BlId] [varchar](16) NULL,
    	[CallChg] [varchar](6) NULL,
    	[CallIdentifier] [uniqueidentifier] NULL,
    	[ChgInfo] [varchar](5) NULL,
    	[ClId] [varchar](16) NULL,
    	[CustNo] [smallint] NULL,
    	[Digits] [varchar](32) NULL,
    	[DigitType] [varchar](1) NULL,
    	[Dnis1] [varchar](6) NULL,
    	[Dnis2] [varchar](6) NULL,
    	[Duration] [int] NULL,
    	[FgDani] [varchar](13) NULL,
    	[HoundredHourDuration] [varchar](3) NULL,
    	[Name] [varchar](40) NULL,
    	[NameId] [int] NOT NULL,
    	[Npi] [varchar](2) NULL,
    	[OrigAuxId] [varchar](11) NULL,
    	[OrigId] [varchar](7) NULL,
    	[OrigMin] [varchar](16) NULL,
    	[Origten0] [varchar](3) NULL,
    	[RecNo] [int] NULL,
    	[RecType] [varchar](1) NOT NULL,
    	[Redir] [varchar](1) NULL,
    	[TerId] [varchar](7) NOT NULL,
    	[TermAuxId] [varchar](11) NULL,
    	[TermMin] [varchar](16) NULL,
    	[Termten0] [varchar](3) NULL,
    	[Timestamp] [datetime] NOT NULL,
    	[Ton] [varchar](1) NULL,
    	[Tta] [int] NULL,
    	[Twt] [int] NULL,
    	[DateValue] [int] NULL,
    	[TimeValue] [int] NULL,
    	[Level] [varchar](50) NOT NULL CONSTRAINT [DF_Cdr_Level]  DEFAULT ('x:'),
     CONSTRAINT [PK_Cdr] PRIMARY KEY CLUSTERED 
    (
    	[Id] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 10) ON [PRIMARY]
    ) ON [PRIMARY]
    Last edited by wendelstam; 11-20-07 at 11:09.

  2. #2
    Join Date
    Sep 2006
    Posts
    6

  3. #3
    Join Date
    Jan 2005
    Posts
    13

    not duplicate rows

    Thanks for the quick reply.
    Okay, I've read the articles, but the data in this table are records from a machine and are not duplicates. The name column is duplicated, but I figured it was overkill to move that one column out; all the other fields differ from row to row. And every column always holds just one value or NULL, so there is no need for a parent-child relation to another table. Or is there?

    The actual question was not whether I should split the columns across multiple tables, but whether I should split the rows across multiple tables.

  4. #4
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    What's the justification for splitting the table data into multiple tables with the exact same structure?
    EDIT: That should probably read - "What are your thoughts behind doing this?"
    Last edited by gvee; 11-21-07 at 08:08.
    George
    Home | Blog

  5. #5
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    It's called horizontal table partitioning, and it can be used to create partitioned views. It works best for increasing performance when you spread the partitions over multiple filegroups on multiple disks.
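    For illustration only, a minimal sketch of a partitioned view (the table and view names are assumed, not from this thread), using [DateValue]'s day-number semantics:
    Code:
    -- Hypothetical yearly member tables; the CHECK constraint on DateValue
    -- (days since 1900-01-01, i.e. DATEDIFF(day, 0, ts)) is what lets the
    -- optimizer skip member tables outside the queried range.
    CREATE TABLE dbo.Cdr2006 (
    	Id        int NOT NULL,
    	DateValue int NOT NULL CHECK (DateValue >= 38716 AND DateValue < 39081), -- year 2006
    	Duration  int NULL, -- remaining Cdr columns omitted for brevity
    	CONSTRAINT PK_Cdr2006 PRIMARY KEY (Id, DateValue)
    );
    
    CREATE TABLE dbo.Cdr2007 (
    	Id        int NOT NULL,
    	DateValue int NOT NULL CHECK (DateValue >= 39081 AND DateValue < 39446), -- year 2007
    	Duration  int NULL,
    	CONSTRAINT PK_Cdr2007 PRIMARY KEY (Id, DateValue)
    );
    GO
    
    -- The view presents the union as one table; a query filtered on DateValue
    -- only reads the member tables whose CHECK constraint can be satisfied.
    CREATE VIEW dbo.CdrAll AS
    SELECT Id, DateValue, Duration FROM dbo.Cdr2006
    UNION ALL
    SELECT Id, DateValue, Duration FROM dbo.Cdr2007;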
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Ernest Hemingway, A Farewell To Arms.

  6. #6
    Join Date
    Jan 2005
    Posts
    13
    The question is: does that increase performance in a single-disk, single-filegroup scenario?

    If I put it like this: are there any problems related to a table with 18 million rows?
    If not, I am quite happy with the current solution. But if I were to split the data, maybe the last month's data could go in one "active" table (since most queries are on the most recent data) and everything else in an "archive" table.

    And if a split would be a good choice, what is the best way to look in the two tables when a query spans several months? Should one SP handle all of this, or should my data layer handle the access, e.g. two SPs, one for active and one for historic data, combined with two calls if needed?
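    One common way to hide such a split (a sketch only; the table and procedure names here are assumed, not from this thread) is a single wrapper procedure, so the data layer never needs to know there are two tables:
    Code:
    -- Hypothetical wrapper: CdrActive holds recent rows, CdrArchive the rest.
    -- Callers always use this one procedure; it reads both tables and lets
    -- each WHERE clause eliminate the rows outside the requested range.
    CREATE PROCEDURE dbo.spGet_Cdr_Period
    	@beginDate DATETIME,
    	@endDate   DATETIME
    AS
    BEGIN
    	SET NOCOUNT ON;
    
    	SELECT Id, [Timestamp], Duration
    	FROM dbo.CdrActive
    	WHERE [Timestamp] >= @beginDate AND [Timestamp] < @endDate
    
    	UNION ALL
    
    	SELECT Id, [Timestamp], Duration
    	FROM dbo.CdrArchive
    	WHERE [Timestamp] >= @beginDate AND [Timestamp] < @endDate;
    END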

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by wendelstam
    If I put it like this: are there any problems related to a table with 18 million rows?
    No - the number of rows is irrelevant. You need to see how things perform and act accordingly. And although partitioning is an option, there are hundreds of things you'd want to consider first. Just to give you some perspective: I work with non-partitioned tables holding half a billion rows, and I expect a lot of the other posters here do too.

    Also, Don is right: your table would benefit from normalisation, and not just to first normal form. Remember that it is the number of pages read/written from disk, not the number of rows, that counts. So if you want a performance justification (rather than a logical one) for normalising: a normalised database distributes your data across multiple tables, reducing the row size and fitting more rows per page. YMMV of course, depending on your queries.
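    As one concrete illustration (a sketch, not a full design from the thread): [Name] repeats across rows and an int [NameId] already exists, so the name could live in its own lookup table, shaving up to 40 bytes off every Cdr row:
    Code:
    -- Hypothetical lookup table: store each distinct name once and keep
    -- only the int NameId in Cdr.
    CREATE TABLE dbo.CdrName (
    	NameId int         NOT NULL PRIMARY KEY,
    	Name   varchar(40) NOT NULL
    );
    
    -- After backfilling CdrName from the existing data:
    ALTER TABLE dbo.Cdr
    	ADD CONSTRAINT FK_Cdr_CdrName
    	FOREIGN KEY (NameId) REFERENCES dbo.CdrName (NameId);
    
    ALTER TABLE dbo.Cdr DROP COLUMN [Name];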

    What sort of queries are running on it? Single-row lookups, reports pulling in lots of data, or a combination?

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Woah - this is probably terrible:
    Code:
     CONSTRAINT [PK_Cdr] PRIMARY KEY CLUSTERED 
    (
    	[Id] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 10) ON [PRIMARY]
    ) ON [PRIMARY]
    Unless your rows get a lot of updates that grow the data in the columns after the initial insert, the fill factor on a monotonically increasing clustered index should be approaching 100. As it stands, all your clustered index leaf pages are 90% empty (give or take data modifications).
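    If that is the culprit, a rebuild fixes it in place (standard SQL Server 2005 syntax):
    Code:
    -- Rebuild the clustered index nearly full: an IDENTITY key always
    -- inserts at the end, so the free space reserved in existing pages
    -- by FILLFACTOR = 10 is simply wasted.
    ALTER INDEX [PK_Cdr] ON [dbo].[Cdr]
    REBUILD WITH (FILLFACTOR = 100);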

    This is the sort of thing I mean by partitioning being one of your last options - lots of things to consider first

  9. #9
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Fancy scripting out the other indexes?

  10. #10
    Join Date
    Jan 2005
    Posts
    13
    Thanks guys, it seems like I really need to sit down and study some database architecture.

    I currently have only one SP that handles the queries, which are always used for reporting, so once a row is in the db it "never" changes. But since this table stores telephone call data records that are collected at runtime, it gets a lot of small inserts quite often, say 10-200 records every five minutes; the insert interval differs between sites. But as I said, once a row is in there it's "never" going to change.
    The queries fetch everything from a couple of million rows down to about 100, depending on how long a period the report covers.

    How would you suggest I normalize this table, then? Or could you point me to some good resource where I can find out?

    I'm sorry if these are maybe newbie questions, but that's really what I am.
    I really appreciate your help.

    Here are the rest of the indexes:

    Code:
    /****** Object:  Index [IX_DateValue]    Script Date: 11/21/2007 15:22:20 ******/
    CREATE NONCLUSTERED INDEX [IX_DateValue] ON [dbo].[Cdr] 
    (
    	[DateValue] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
    
    /****** Object:  Index [IX_Level]    Script Date: 11/21/2007 15:22:38 ******/
    CREATE NONCLUSTERED INDEX [IX_Level] ON [dbo].[Cdr] 
    (
    	[Level] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
    
    /****** Object:  Index [IX_OrigId]    Script Date: 11/21/2007 15:22:48 ******/
    CREATE NONCLUSTERED INDEX [IX_OrigId] ON [dbo].[Cdr] 
    (
    	[OrigId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
    
    /****** Object:  Index [IX_TerId]    Script Date: 11/21/2007 15:22:56 ******/
    CREATE NONCLUSTERED INDEX [IX_TerId] ON [dbo].[Cdr] 
    (
    	[TerId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
    
    /****** Object:  Index [IX_TimeValue]    Script Date: 11/21/2007 15:23:08 ******/
    CREATE NONCLUSTERED INDEX [IX_TimeValue] ON [dbo].[Cdr] 
    (
    	[TimeValue] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
    Last edited by wendelstam; 11-21-07 at 10:50.

  11. #11
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Bump the fill factor for the clustered index to 100.

    This is my fave db design article at the mo:
    http://www.tonymarston.net/php-mysql...se-design.html

    Please post your proc. If that is literally all you do with this table then we can tune it rather well. I suspect we will want to change your clustered index.....
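    A hypothetical sketch of what such a change could look like (an assumption; the thread never spells it out): cluster on the date so period reports scan contiguous pages, and keep [Id] unique via a nonclustered primary key:
    Code:
    -- Assumed direction only, not confirmed advice from this thread.
    ALTER TABLE dbo.Cdr DROP CONSTRAINT [PK_Cdr];
    
    -- Date-range reports now read contiguous leaf pages.
    CREATE CLUSTERED INDEX IX_Cdr_Date
    	ON dbo.Cdr (DateValue, TimeValue)
    	WITH (FILLFACTOR = 100);
    
    -- Id stays unique, enforced by a nonclustered primary key.
    ALTER TABLE dbo.Cdr
    	ADD CONSTRAINT [PK_Cdr] PRIMARY KEY NONCLUSTERED (Id);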

  12. #12
    Join Date
    Jan 2005
    Posts
    13
    I have some reading to do

    This is the quite simple SP. The table function SplitDelimitedVarChar takes a varchar and a split char and returns a table, which I use to get the correct dates and [Level]s (called exchanges in the SP).

    I arrived at this code by testing different solutions, and this has been the fastest. As you can see it is a quite simple select query. The columns [DateValue] and [TimeValue] are ints created at insert time from the DATETIME column [Timestamp]; they are just the int representations of the date and the time, and exist because it's easier.

    Code:
    CREATE PROCEDURE [dbo].[spStudio_Get_Cdr]
    	@beginDate DATETIME,
    	@endDate DATETIME,
    	@beginTime INT,
    	@endTime INT,
    	@subscribers VARCHAR(MAX),
    	@exchanges VARCHAR(MAX) = '1:'
    AS
    BEGIN
    	SET NOCOUNT ON;
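    	-- Parse the pipe-delimited parameter strings into table variables;
    	-- SplitDelimitedVarChar returns one row per delimited item.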
    	
    	DECLARE @exch TABLE(Item Varchar(50))	
    	INSERT INTO @exch
    	SELECT Item FROM [SplitDelimitedVarChar] (@exchanges, '|') ORDER BY Item
    
    
    	DECLARE @subs TABLE(Item Varchar(19))
    	INSERT INTO @subs
    	SELECT Item FROM [SplitDelimitedVarChar] (@subscribers, '|') ORDER BY Item
    
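    	-- Main lookup: restrict to the requested exchanges ([Level]), the date
    	-- window, the time-of-day window, and rows where the subscriber appears
    	-- as originator (OrigId) or terminator (TerId).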
    	SELECT [id]
    		  ,[Abandon]
    		  ,[Bcap]
    		  ,[BlId]
    		  ,[CallChg]
    		  ,[CallIdentifier]
    		  ,[ChgInfo]
    		  ,[ClId]
    		  ,[CustNo]
    		  ,[Digits]
    		  ,[DigitType]
    		  ,[Dnis1]
    		  ,[Dnis2]
    		  ,[Duration]
    		  ,[FgDani]
    		  ,[HoundredHourDuration]
    		  ,[Name]
    		  ,[NameId]
    		  ,[Npi]
    		  ,[OrigAuxId]
    		  ,[OrigId]
    		  ,[OrigMin]
    		  ,[Origten0]
    		  ,[RecNo]
    		  ,[RecType]
    		  ,[Redir]
    		  ,[TerId]
    		  ,[TermAuxId]
    		  ,[TermMin]
    		  ,[Termten0]
    		  ,[Timestamp]
    		  ,[Ton]
    		  ,[Tta]
    		  ,[Twt]
    	      ,[Level]
    	FROM 
    		[dbo].[Cdr] AS C 
    		INNER JOIN @exch AS E
    		ON
    		C.[Level] = E.[Item]
    	WHERE
    		(C.[DateValue] BETWEEN FLOOR(CAST(@beginDate AS FLOAT)) AND FLOOR(CAST(@endDate AS FLOAT)))
    	AND
    	    (C.[TimeValue] BETWEEN @beginTime AND @endTime)
    	AND
    	    (EXISTS(SELECT * FROM @subs WHERE [Item] = C.[OrigId])
    	OR 
    	     EXISTS(SELECT * FROM @subs WHERE [Item] = C.[TerId]))
    	
    END
    Thanks in advance.

  13. #13
    Join Date
    Sep 2006
    Posts
    6
    Changing from a datetime to two separate int columns is NOT easier than doing a DATEDIFF or a DATEADD.
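    For illustration, one reading of the DATEDIFF/DATEADD approach mentioned here (a sketch against the existing [Timestamp] column; the classic SQL Server 2000/2005 idiom):
    Code:
    -- Both the day number and the time of day can be derived on the fly,
    -- so separate int columns are not needed for this kind of filter.
    SELECT [Timestamp],
           DATEDIFF(day, 0, [Timestamp]) AS DayNumber,  -- what DateValue stores
           DATEADD(day, -DATEDIFF(day, 0, [Timestamp]), [Timestamp]) AS TimeOfDay -- time part only
    FROM dbo.Cdr;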

  14. #14
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Agreed.

    Jeff has some pertinent comments here - at least two or three articles apply. http://weblogs.sqlteam.com/jeffs/category/283.aspx
    More reading

  15. #15
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    The idea of your sproc, then, is that you don't search for a range like "from 2:00pm 1st of Jan 2007 to 4:00pm 8th of April", but rather "1st of Jan 2007 to 8th of April, between the times of 2:00pm and 4:00pm only". Correct?
