[dba-SQLServer] SQL Server Speed Testing - Setting AddrValid to codes

Wed Sep 30 08:46:29 CDT 2009

I wrote this email yesterday in the middle of a very frustrating day.  After my testing this morning 
I now realize that there is this quite significant "overhead" for the first pass processing, 
overhead which is not a fixed amount but which increases with the size of the data set.  That 
overhead will likely explain the results I encountered and documented below.  I am sending this 
email because it explains in detail the process and the stored procedure and udf which actually does 
the processing.  I am about to do some testing similar to the last email where I do processing on 
specific size chunks of records in order to see numbers similar to the last email.

The idea is to update an AddrValid Char() field with various codes depending on how the address 
plays out.  PO if the address line has a "PO BOX" in it, "INV" (invalid) if it has a null in the 
zip4, "ANK" if it has a "77" in the ANK field and so forth.

These codes then tell me various things about the addresses and allow me to select the addresses for 
  ANK processing, filter out PO boxes etc.

Awhile back I created a udf into which I passed a set of fields which will be analyzed and which
returns a code up to three characters wide (or null).  I then have a stored procedure which is a
simple update, updating the AddrValid with the results passed back by this udf.

The table being updated is 20 million records (in this case).

I just went through the exercise of passing in a view where I selected the TOP() one million rows
where AddrValid IS NULL.  Processing one million rows takes 17 seconds.  I ran that about five
times, counting the remaining rows with AddrValid IS NULL.  Each time the time to process the
million rows was in fact about 17-18 seconds.  When I got down to 15 million rows, having
successfully processed about 6 million rows, I decided to pass in the table itself instead of the
view with the TOP(1000000) statement, figuring that it would take 15 (million rows remaining) x 17
seconds to complete, or 4.25 minutes.

WRONG.

11:40 later it is still processing.  Sigh.

However ATM I am not locked up!  BTW I manually set the IO affinity mask to the same three cores
which I assigned to SQL Server.  Previously I had told it to do the assignment itself.  I also set
Max Degree of Parallelism to 2 per Nancy Lytle's suggestion.

So as of right now, it is chunking away, 16:28 now.  But I am not locked up.  I am poking around
looking at stuff.  Windows task manager shows a single CPU processing... sometimes.  Mostly it is
flatlining, but it does occasionally spring to life.

It is interesting to me that when I move from "SELECT TOP(1000000) * FROM tblX) as a saved query
passed in to the stored procedure to just passing in tblX itself, I go from one million records
every 17 seconds to "we ain't finishing for some reason".  There are only 20 million records in the
table.  It should have been a quick and easy 4:30 task to finish up the 15 million records where
AddrValid is still null.

With the exception of the field being updated (AddrValid), all of the fields used in the stored
procedure are contained in an index.

The SP itself looks like this:

BEGIN TRY	
   SELECT @SQL = 'UPDATE [' + @DBName + '].[dbo].[' + @TblName + ']
   SET [AddrValid] = _DataMaster.dbo.udf_AddrValid(ank_, Addr, Zip4, dpv_)
   WHERE     (AddrValid IS NULL)'

   exec (@SQL)
   print 'success UPDATING AddrValid on ' + @DBName + '.' + @TblName
END TRY
BEGIN CATCH
   print 'There was an error UPDATING AddrValid ANK '
   print ERROR_MESSAGE()	
END CATCH

And the udf looks like this:

ALTER FUNCTION [dbo].[udf_AddrValid]
(
	-- Add the parameters for the function here
	@ANK varchar(50), @Address varchar(50), @Zip4 varchar(50), @dpv varchar(50)
)
RETURNS char(3)
AS
BEGIN
-- Declare the return variable here
DECLARE @Return char(3)
-- Add the T-SQL statements to compute the return value here
IF @ANK = '77'
SELECT @Return = 'ANK'
ELSE
IF left(@Address,5) = 'MOVED'
SELECT @Return = 'MOV'
ELSE	
if @Zip4 is null
SELECT @Return = 'INV'
else
IF left(@Address,2) = 'PO'
SELECT @Return = 'PO'
ELSE	
IF (@dpv = 'YN')
SELECT @Return = 'V'
ELSE
IF ((@dpv = 'DN') OR (@dpv = 'SN'))
SELECT @Return = 'E'
ELSE
SELECT @Return = 'INV'

-- Return the result of the function
RETURN @Return

END

WELL... it just finished.  26:09 to process a 20 million record table.

John W. Colby
www.ColbyConsulting.com