[dba-VB] SP to replace characters

jwcolby jwcolby at colbyconsulting.com
Wed Oct 21 10:47:13 CDT 2009


Just an FYI, I ran this against 1 million records, in a query that pulled the top(1000000) City 
fields (where this specific problem lies) and it took about 13 seconds.  This is with an index in 
place that includes the city field.  I then ran it against the entire table (62.8 million records) 
and it took 2 minutes 52 seconds.

This will become a standard part of my program for cleaning up my data.

Thanks Mark Breen!

John W. Colby
www.ColbyConsulting.com


jwcolby wrote:
> Mark,
> 
> Below is the SP I created to perform this magic.  I will be running this on 66 million records so we 
> shall see about performance.
> 
> set ANSI_NULLS ON
> set QUOTED_IDENTIFIER ON
> go
> 
> -- =============================================
> -- Author:		<Author,,Name>
> -- Create date: <Create Date,,>
> -- Description:	<Description,,>
> --
> --WHILE @@rowcount > 0
> --    UPDATE  tablename
> --    SET     columnname = REPLACE(columnname,
> --                                    SUBSTRING(columnname,
> --                                              PATINDEX('%[^a-zA-Z0-9]%',
> --                                                       columnname), 1), '')
> --    WHERE   PATINDEX('%[^a-zA-Z0-9]%', columnname) <> 0
> --
> -- =============================================
> ALTER PROCEDURE [dbo].[sp_ReplaceCharacters]
> 	-- Add the parameters for the stored procedure here
> 	@DBName varchar(50), @TblName varchar(50), @FldName varchar(50)
> AS
> BEGIN
> 	-- SET NOCOUNT ON added to prevent extra result sets from
> 	-- interfering with SELECT statements.
> 	SET NOCOUNT ON;
> declare @SQL varchar(1000)
> declare @PatIndex varchar(50)
> 
> 	BEGIN TRY	
> 		set @PatIndex = '%[^a-zA-Z0-9 .,#]%'
> 		SELECT @SQL = 'WHILE @@rowcount > 0
> 						UPDATE ' + @DBName + '.dbo.' + @TblName + '
> 						SET    ' + @FldName + ' = REPLACE(' + @FldName + ',
>                                      SUBSTRING(' + @FldName + ',
>                                                PATINDEX(''' + @PatIndex + ''',
>                                                         ' + @FldName + '), 1), '''')
> 						WHERE   PATINDEX(''' + @PatIndex + ''', ' + @FldName + ') <> 0'
> 		exec (@SQL)
> 		print 'success REPLACING CHARACTERS IN ' + @DBName + '.' + @TblName + '.' + @FldName
> 	END TRY
> 	BEGIN CATCH
> 		print 'There was an error REPLACING CHARACTERS IN ' + @DBName + '.' + @TblName + '.' + @FldName
> 		print @SQL
> 		print ERROR_MESSAGE()	
> 	END CATCH
> END
> 
> 
> John W. Colby
> www.ColbyConsulting.com
> 
> 
> Mark Breen wrote:
>> Hello John,
>> I used the following SQL today, and it works a treat, in my case, I wanted
>> to exclude all except alphas, I know that it is not C#, and I know you may
>> prefer C# as practice, so if it is no useful, just bin it.
>>
>> I got the code from this site, if you have a look, there is a little more
>> there that you may be able to make use of.
>>
>> Thanks to Nigel Rivett.
>>
>> http://www.nigelrivett.net/SQLTsql/RemoveNonNumericCharacters.html
>>
>> <snip>
>>
>> WHILE @@rowcount > 0
>>     UPDATE  tablename
>>     SET     columnname = REPLACE(columnname,
>>                                     SUBSTRING(columnname,
>>                                               PATINDEX('%[^a-zA-Z0-9]%',
>>                                                        columnname), 1), '')
>>     WHERE   PATINDEX('%[^a-zA-Z0-9]%', columnname) <> 0
>>
>> </snip>
>>
>>
>> Mark
> 
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
> 
> 



More information about the dba-VB mailing list