Results 1 to 6 of 6
  1. #1
    Join Date
    Jan 2005
    Posts
    19

    Unanswered: SQL2005 outperforms SQL2000? Its over 4 times slower...

    I've found a very interesting case where SQL 2005 is more than 4 times slower than its predecessor, despite superior hardware.

    To compare, I have some logs from the ISA Server has stored into the database, which is then filtered against my blocklists. The result are the most popular sites I have not blocked or trusted.
    I have done some tuning on the query to utilize the data patterns that are in the database. This allows much better scaling towards large datasets. The engine is now capable to use hash-matches instead of nested loops over the entire datasets.

    It currently runs in 1 to 1.5 minutes on SQL 2000, which is powered by a VIA C3 Nehemiah at 1 GHz, with only 350 MB RAM too spare in optimistic conditions (The remainder of the 1 GB is used by many other applications). THe server is limited to 384 MB memory usage.
    The other box running SQL 2005 is a Athlon 800 MHz, which has superior processing power, larger cache and more memory bandwidth and has 1.25 GB RAM, where SQL server can use the needed 629 MB without any problem.
    Despite these facts, my query takes more than 4 times longer on the SQL 2005 box. 1.5 minutes compared to 7 to 8 minutes.

    SQL 2000 Database Schema:
    Code:
    CREATE TABLE [dbo].[WebProxyLog](
    	[ClientIP] [bigint] NOT NULL,
    	[ClientUserName] [nvarchar](514) NOT NULL,
    	[ClientAgent] [varchar](128) NOT NULL,
    	[ClientAuthenticate] [smallint] NOT NULL,
    	[logTime] [datetime] NOT NULL,
    	[service] [smallint] NOT NULL,
    	[servername] [nvarchar](32) NOT NULL,
    	[referredserver] [varchar](32) NOT NULL,
    	[DestHost] [varchar](255) NOT NULL,
    	[DestHostIP] [bigint] NOT NULL,
    	[DestHostPort] [int] NOT NULL,
    	[processingtime] [int] NOT NULL,
    	[bytesrecvd] [bigint] NOT NULL,
    	[bytessent] [bigint] NOT NULL,
    	[protocol] [varchar](12) NOT NULL,
    	[transport] [varchar](8) NOT NULL,
    	[operation] [varchar](24) NOT NULL,
    	[uri] [varchar](2048) NOT NULL,
    	[mimetype] [varchar](32) NOT NULL,
    	[objectsource] [smallint] NOT NULL,
    	[resultcode] [int] NOT NULL,
    	[CacheInfo] [int] NOT NULL,
    	[rule] [nvarchar](128) NOT NULL,
    	[FilterInfo] [nvarchar](128) NOT NULL,
    	[SrcNetwork] [nvarchar](128) NOT NULL,
    	[DstNetwork] [nvarchar](128) NOT NULL,
    	[ErrorInfo] [int] NOT NULL,
    	[Action] [varchar](32) NOT NULL,
    	[GmtLogTime] [datetime] NOT NULL
    )
    
    CREATE TABLE [dbo].[TrustedHosts](
    	[Hostname] [varchar](60) NOT NULL,
    	[Comment] [varchar](500) NULL,
     CONSTRAINT [PK_TrustedHosts] PRIMARY KEY CLUSTERED 
     (
    	[Hostname] ASC
     ) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
    
    CREATE TABLE [dbo].[DeniedHosts](
    	[ReasonId] [smallint] NOT NULL,
    	[Hostname] [varchar](80) NOT NULL,
    	[Path] [varchar](50) NOT NULL CONSTRAINT [DF_DeniedHosts_Path]  DEFAULT ('%'),
    	[Comment] [varchar](500) NULL,
    	CONSTRAINT [PK_DeniedHosts] PRIMARY KEY CLUSTERED 
    	(
    		[Hostname] ASC,
    		[ReasonId] ASC,
    		[Path] ASC
    	) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
    
    ALTER TABLE [dbo].[DeniedHosts]  WITH NOCHECK ADD  CONSTRAINT [FK_DeniedHosts_DenyReason] FOREIGN KEY([ReasonId]) REFERENCES [dbo].[DenyReason] ([ReasonId])
    GO
    ALTER TABLE [dbo].[DeniedHosts] CHECK CONSTRAINT [FK_DeniedHosts_DenyReason]
    GO
    ALTER TABLE [dbo].[DeniedHosts]  WITH NOCHECK ADD  CONSTRAINT [CK_DeniedHosts_HostName_NoTwoDots] CHECK  ((((not([Hostname] like '%..%')))))
    GO
    ALTER TABLE [dbo].[DeniedHosts] CHECK CONSTRAINT [CK_DeniedHosts_HostName_NoTwoDots]
    GO
    ALTER TABLE [dbo].[DeniedHosts]  WITH NOCHECK ADD  CONSTRAINT [CK_DeniedHosts_NoPercentDot] CHECK  (([Hostname] <> '%.'))
    GO
    ALTER TABLE [dbo].[DeniedHosts] CHECK CONSTRAINT [CK_DeniedHosts_NoPercentDot]
    GO
    ALTER TABLE [dbo].[DeniedHosts]  WITH NOCHECK ADD  CONSTRAINT [CK_DeniedHosts_NoWildcardMiddle] CHECK  ((((not([Hostname] like '_%[%]%')))))
    GO
    ALTER TABLE [dbo].[DeniedHosts] CHECK CONSTRAINT [CK_DeniedHosts_NoWildcardMiddle]
    GO
    ALTER TABLE [dbo].[DeniedHosts]  WITH NOCHECK ADD  CONSTRAINT [CK_DeniedHosts_ValidPath] CHECK  (([Path] is null or [Path] = '%' or [Path] like '/%' and [Path] <> '' and [Path] <> '/%'))
    GO
    ALTER TABLE [dbo].[DeniedHosts] CHECK CONSTRAINT [CK_DeniedHosts_ValidPath]
    GO
    ALTER TABLE [dbo].[DeniedHosts]  WITH NOCHECK ADD  CONSTRAINT [CK_DeniedHosts_WildcardStart] CHECK  ((((not([Hostname] like '[%]%'))) or ([Hostname] like '[%].%' or [Hostname] = '%')))
    GO
    ALTER TABLE [dbo].[DeniedHosts] CHECK CONSTRAINT [CK_DeniedHosts_WildcardStart]
    I'm not going to give you any data as:
    WebProxyLog contains 1 223 878 rows; 524 MB (few more on SQL 2000 database).
    DeniedHosts contains 52 338 rows; 3 MB
    TrustedHosts contains 2 183 rows; <1 MB

    The relevant query is:
    Code:
    CREATE        PROCEDURE [dbo].[GetTrustedHosts]
    AS
    	SELECT Hosts, Requests --, DistinctRequests
    	FROM	(	SELECT AA.Hosts, COUNT(*) AS Requests, COUNT(DISTINCT Path) AS DistinctRequests
    			FROM	(	SELECT	CASE
    						  WHEN CHARINDEX('':'', SUBSTRING(URI, 8, CHARINDEX(''/'', URI, 8)-8), 8) <> 0 THEN SUBSTRING(URI, 8, CHARINDEX('':'', URI, 8)-8)
    						  ELSE SUBSTRING(URI, 8, CHARINDEX(''/'', URI, 8)-8)
    						END AS Hosts,
    						SUBSTRING(URI, CHARINDEX(''/'', URI, 8), 50) AS Path
    					FROM dbo.WebProxyLog wpl
    					WHERE URI LIKE 'http://%/%'
    					AND ResultCode BETWEEN 200 AND 399
    					AND (Service = 1)  -- filter for only forward proxy
    				) AA
    			WHERE NOT EXISTS
    				(	SELECT *
    					FROM dbo.DeniedHosts dhp
    					WHERE Path <> ''%''
    					AND LEFT(AA.Path,3) = LEFT(dhp.Path COLLATE SQL_Latin1_General_CP1_CI_AS, 3)
    					AND AA.Hosts LIKE (dhp.Hostname COLLATE SQL_Latin1_General_CP1_CI_AS)
    					AND ( AA.Path LIKE (dhp.Path COLLATE SQL_Latin1_General_CP1_CI_AS) OR AA.Path LIKE ((dhp.Path COLLATE SQL_Latin1_General_CP1_CI_AS) + ''[?]%'') )
    				)
    			AND NOT Hosts IS NULL	-- this seems to give a speed advantage
    			GROUP BY AA.Hosts
    			HAVING COUNT(*) >= 25
    		) A
    	WHERE NOT Hosts IN
    		(	SELECT Hostname COLLATE SQL_Latin1_General_CP1_CI_AS
    			FROM dbo.TrustedHosts thc
    	--		WHERE NOT thc.Hostname LIKE ''%[%]%''
    		)
    	AND NOT Hosts IN
    		(	SELECT Hostname COLLATE SQL_Latin1_General_CP1_CI_AS
    			FROM dbo.DeniedHosts dhc
    			WHERE dhc.Path = ''%''
    		)
    	AND NOT EXISTS
    		(	SELECT *
    			FROM dbo.TrustedHosts thh
    			WHERE A.Hosts LIKE (thh.Hostname COLLATE SQL_Latin1_General_CP1_CI_AS)
    			AND thh.Hostname LIKE ''[%]%''
    			-- generates a hash join instead of a of nested loop
    			AND RIGHT(A.Hosts,6) = (RIGHT(thh.Hostname,6) COLLATE SQL_Latin1_General_CP1_CI_AS)
    		)
    	AND NOT EXISTS
    		(	SELECT *
    			FROM dbo.DeniedHosts dhh
    			WHERE dhh.Path = ''%''
    			-- this reduces the cost of the most expensive query
    			AND dhh.Hostname LIKE ''[%]%''
    			AND A.Hosts LIKE (dhh.Hostname COLLATE SQL_Latin1_General_CP1_CI_AS)
    			-- generates a hash join instead of a of nested loop
    			-- performance difference is significant due to volume
    			AND RIGHT(A.Hosts,6) = (RIGHT(dhh.Hostname,6) COLLATE SQL_Latin1_General_CP1_CI_AS)
    		)
    	ORDER BY Requests DESC
    There is some mess with collations, but these don't seem to hurt performance.

    The query plans provided differ only slightly.
    SQL 2000 provides a Clustered Index Scan over WebProxyLog with predicate on resultcode. After this a filter for the LIKE operator. Cost is 78% for the scan and 10% for the filter.
    SQL 2005 combines both. With the scan costing 93%.

    However, the key does not seem to be in this data, as the plans are nearly equavent, with the SQL2005 executing plan looking slightly better (table scan is a larger part of the execution).

    How can these differences, espcially of this magnitude, be explained?
    And further, how can the query be optimized for decent performance on SQL2005? What am I doing wrong?

  2. #2
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Well you have LIKE with a leading wild card which is gonna cause a scan anyway

    Also why do you have other applications on the box?

    All bets are off...your making sql server fight for resources
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  3. #3
    Join Date
    Jan 2005
    Posts
    19
    Yes, indeed, the LIKE is part of it, but watch it, they are NOT tablescans, they are hash-matches. Notice the "=" operator with the LEFT statements. That makes it scalable very well. It has been running very happy on the box for several years now.
    The table scans are not bad, I process the entire tables. They are not a problem at all, the optimizer does very well with the hints given.

    All bets are off...your making sql server fight for resources
    Not quite, what is bothering me, is not the fact that it takes a minute or two.
    Its the fact that the server that has to fight for resources is the fastest, outperforming the device with more power and plenty memory by over a factor 4.
    I want to know why SQL 2005 'screws up', while SQL 2000 does so very well.

    Indeed, its a experimental setup. I'm just a student with limited budget when it comes to hardware (like academic license and free software for that reason).

  4. #4
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    I keep hearing 2K5 is less tolerant of bad code. At the Microsoft Testing Center it took a bad view written by someone other than myself for a ride. I was not there but that is what I heard.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  5. #5
    Join Date
    Jan 2005
    Posts
    19
    Can you give any indications or suggestions for improvement?

    I did a compare with SELECT COUNT(*) FROM WebProxyLog:
    SQL 2000, slower box: ~ 1 minute
    SQL 2005, faster box: ~30 seconds
    This does seem likely, given different systems.

  6. #6
    Join Date
    Jan 2005
    Posts
    19
    Quote Originally Posted by Thrasymachus
    I keep hearing 2K5 is less tolerant of bad code.
    Or it doesn't take hints as easily as SQL2000. SQL2005 didn't take the "RIGHT(...,6) = RIGHT(..., 6)" and still made a nested loop based on the LIKE, instead of the hash matches as did SQL2000.
    Creating indexed views on the RIGHT(hostname,6) colums does the trick.
    Its now running at 32 seconds, which is in line with the time needed to process the 500 MB table.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •