Results 1 to 7 of 7
  1. #1
    Join Date
    Jan 2004
    Posts
    22

    Unanswered: Large Volumes of Varchar Data - Design Advice

    Hello all,

    I have recently been task with rewriting a database that holds large volumes of data, whilst ensuring that query can be run in optimal time. Having never really delved into this sort of thing before, I hoped you guys might be able to offer some advice and guidance.

    The design I have inherited is based around 2 main tables:

    Code:
    [captured_traps]
    [id] [int] IDENTITY (1, 1) NOT NULL
    [snmp_version] [int] NULL
    [community_name] [varchar] (255)
    [packet_type] [varchar] (50)
    [oid] [varchar] (500)
    [source_ip] [varchar] (15)
    [generic] [int] NULL
    [specific] [int] NULL
    [time_stamp] [varchar] (15)
    [trap_entered] [datetime] NULL
    [status] [int] NULL
    Code:
    [captured_varbinds]
    [id] [int] IDENTITY (1, 1) NOT NULL
    [captured_trap_id] [int] NOT NULL
    [varbind_oid] [varchar] (500)
    [varbind_text] [varchar (500)
    The relationship between the two tables is on the "captured_traps (id)" to "captured_varbinds (captured_trap_id)". Currently the "captured_traps" table contains around 350 million rows, the "captured_varbinds" table contains around 900 million rows.

    Now as you can probably gather this model runs like a....well it sort of hobbles more than runs hence the need to redesign.

    My current thoughts on this are:

    - Normalising all varchars - there is alot of duplicate values in most of the varchar fields.
    - Full Text Indexing

    However beyond that I am not sure which route to go down. After googling for most of today I have come across a number of "solutions" however I do not want to go steaming down the track of one of these to discover that it is fatally flawed somewhere.

  2. #2
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    what fields do you currently search and join on? are there indices there? if so have they been maintained? are they in seperate filegroups and on seperate disks than the data? Any thought given to partititioning?
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  3. #3
    Join Date
    Jan 2004
    Posts
    22
    As far as searches are concerned users may search any field or combination of fields. A typical search example might be:

    snmp_version = 2
    oid = 1.2.5.67345.12.34
    trap_entered > a UTC date time
    trap_entered < a UTC date time

    This would be expected to return all data from captured_traps and all associated captured_varbinds with the relationship on "captured_traps (id)" to "captured_varbinds (captured_trap_id)".

    or a user might search on the varbind table instead, and expect to see all traps with a specific varbind_oid value, for example

    varbind_oid = 2.3.4.123.6.4.5.8766
    trap_entered > a UTC date time
    trap_entered < a UTC date time

    This would be expected to return all traps within the given date range that have a varbind with of the given type

    The previous owner of this DB only put indexes on the two id columns

    As far as filegroups and partioning are concerned, I have encountered these whilst searching for solutions - and if viable may be implemented.

    Basically I have a clean slate here, so i am open to any and all suggestions

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    what version of SQL Server are you running?

    We will also need to see the indexing methodology....do you know how to script those?

    Also, how is all the data accessed? Stored Procedures only?

    What do you get when you do sp_depends on those 2 tables?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  5. #5
    Join Date
    Jan 2004
    Posts
    22
    As mentioned earlier I have a clean slate to work from, so version can be anything from 2000sp4 to 2005. Which ever is better suited to the task.

    Data will only ever be accessed via stored procedures, however due to the fact that searches can be on any fields or combination of fields the main search stored procedure will build and execute the SQL dynamically (unless anybody has any better ideas)

    As far as indexes are concerned, this are the full scripts for the current model

    Code:
    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[FK_snmp_captured_varbinds_snmp_captured_traps]') and OBJECTPROPERTY(id, N'IsForeignKey') = 1)
    ALTER TABLE [dbo].[snmp_captured_varbinds] DROP CONSTRAINT FK_snmp_captured_varbinds_snmp_captured_traps
    GO
    
    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[snmp_captured_traps]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[snmp_captured_traps]
    GO
    
    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[snmp_captured_varbinds]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[snmp_captured_varbinds]
    GO
    
    CREATE TABLE [dbo].[snmp_captured_traps] (
    	[id] [int] IDENTITY (1, 1) NOT NULL ,
    	[snmp_version] [int] NULL ,
    	[community_name] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    	[packet_type] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    	[oid] [varchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    	[source_ip] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    	[generic] [int] NULL ,
    	[specific] [int] NULL ,
    	[time_stamp] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    	[packet_data] [image] NULL ,
    	[trap_entered] [datetime] NULL ,
    	[status] [int] NULL ,
    	[support_reference] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    	[listener_event_id] [int] NULL 
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO
    
    CREATE TABLE [dbo].[snmp_captured_varbinds] (
    	[id] [int] IDENTITY (1, 1) NOT NULL ,
    	[trap_id] [int] NULL ,
    	[varbind_oid] [varchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    	[varbind_text] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL 
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO
    
    ALTER TABLE [dbo].[snmp_captured_traps] WITH NOCHECK ADD 
    	CONSTRAINT [PK_snmp_captured_traps] PRIMARY KEY  CLUSTERED 
    	(
    		[id]
    	)  ON [PRIMARY] 
    GO
    
    ALTER TABLE [dbo].[snmp_captured_varbinds] WITH NOCHECK ADD 
    	CONSTRAINT [PK_snmp_captured_varbinds] PRIMARY KEY  CLUSTERED 
    	(
    		[id]
    	)  ON [PRIMARY] 
    GO
    
    ALTER TABLE [dbo].[snmp_captured_traps] WITH NOCHECK ADD 
    	CONSTRAINT [DF_snmp_captured_traps_status] DEFAULT (1) FOR [status]
    GO
    
     CREATE  INDEX [IDX_snmp_captured_varbinds_trap_id] ON [dbo].[snmp_captured_varbinds]([trap_id]) ON [PRIMARY]
    GO
    
    ALTER TABLE [dbo].[snmp_captured_varbinds] ADD 
    	CONSTRAINT [FK_snmp_captured_varbinds_snmp_captured_traps] FOREIGN KEY 
    	(
    		[trap_id]
    	) REFERENCES [dbo].[snmp_captured_traps] (
    		[id]
    	) ON DELETE CASCADE 
    GO
    Edit : in the new model the packet_data column of the snmp_captured_traps table is not required

  6. #6
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    If you are using sprocs, why would you need dynamic sql?

    In any case, you only have 1 index...

    How do the rows get added to these tables, and what is the rate of inserts

    You will need more indexes, and probably need to utilize index intersection instead of of coverering indexes

    With the additional indexes you are probably scanning the tables....
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  7. #7
    Join Date
    Jan 2004
    Posts
    22
    Well the dynamic SQL is an assumption I am carrying across from the current design. Currently there is no stored proc for doing searches, the application that sits ontop of this database builds an SQL string based on user input and then executes it. Something along the lines of:

    SELECT x,y,z FROM snmp_captured_traps WHERE a=b AND c=d AND e>f AND g<h
    I dont like the fact that that the SQL string is controlled by the app, it also means that remote users have to have select priveledges on the necessary tables. Hence the idea of moving that code logic into a stored procedure. I am open to any ideas/suggestions on how to do this using an alternative system

    As far as getting data into the tables at the moment there is simply a series of simple insert stored procedures, e.g.

    Code:
    CREATE PROCEDURE add_captured_trap
    (
    	@snmp_version int,
    	@community_name varchar(255),
    	@packet_type varchar(50),
    	@oid varchar(500),
    	@source_ip varchar(15),
    	@generic int,
    	@specific int,
    	@trap_entered datetime,
    	@packet_data image,
    	@listener_event_id int
    )
    AS BEGIN TRANSACTION
    
    INSERT INTO snmp_captured_traps (
    	snmp_version,
    	community_name,
    	packet_type,
    	oid,
    	source_ip,
    	generic,
    	specific,
    	trap_entered,
    	packet_data,
    	listener_event_id
    )
    VALUES (
    	@snmp_version,
    	@community_name,
    	@packet_type,
    	@oid,
    	@source_ip,
    	@generic,
    	@specific,
    	@trap_entered,
    	@packet_data,
    	@listener_event_id
    )
    
    select scope_identity();
    
    COMMIT TRANSACTION
    GO
    Code:
    CREATE PROCEDURE add_captured_varbind
    (
    	@trap_id int,
    	@varbind_oid varchar(500),
    	@varbind_text text
    )
    AS BEGIN TRANSACTION
    
    INSERT INTO snmp_captured_varbinds (
    	trap_id,
    	varbind_oid,
    	varbind_text
    )
    VALUES (
    	@trap_id,
    	@varbind_oid,
    	@varbind_text
    );
    
    COMMIT TRANSACTION
    GO
    As you can see nothing to clever, and it worked ok until the load became to much.

    As far as volumes of inserts, on average the system receives 50 traps a second, each trap has an average of 2 varbinds. So thats 150 inserts

    One other thing, everything I have said so far seems to indicate that I want to redesign from scratch. Thats not entirely true, if there is something that I can apply to the current model in order to get things running quicker then that is my preferred short term option
    Last edited by MrEyes; 07-06-06 at 13:16.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •