jwcolby
jwcolby at colbyconsulting.com
Wed Oct 21 10:47:13 CDT 2009
Just an FYI, I ran this against 1 million records, in a query that pulled the top(1000000) City
fields (where this specific problem lies) and it took about 13 seconds. This is with an index in
place that includes the city field. I then ran it against the entire table (62.8 million records)
and it took 2 minutes 52 seconds.
This will become a standard part of my program for cleaning up my data.
Thanks Mark Breen!
John W. Colby
www.ColbyConsulting.com
jwcolby wrote:
> Mark,
>
> Below is the SP I created to perform this magic. I will be running this on 66 million records so we
> shall see about performance.
>
> set ANSI_NULLS ON
> set QUOTED_IDENTIFIER ON
> go
>
> -- =============================================
> -- Author: <Author,,Name>
> -- Create date: <Create Date,,>
> -- Description: <Description,,>
> --
> --WHILE @@rowcount > 0
> -- UPDATE tablename
> -- SET columnname = REPLACE(columnname,
> -- SUBSTRING(columnname,
> -- PATINDEX('%[^a-zA-Z0-9]%',
> -- columnname), 1), '')
> -- WHERE PATINDEX('%[^a-zA-Z0-9]%', columnname) <> 0
> --
> -- =============================================
> ALTER PROCEDURE [dbo].[sp_ReplaceCharacters]
> -- Add the parameters for the stored procedure here
> @DBName varchar(50), @TblName varchar(50), @FldName varchar(50)
> AS
> BEGIN
> -- SET NOCOUNT ON added to prevent extra result sets from
> -- interfering with SELECT statements.
> SET NOCOUNT ON;
> declare @SQL varchar(1000)
> declare @PatIndex varchar(50)
>
> BEGIN TRY
> set @PatIndex = '%[^a-zA-Z0-9 .,#]%'
> SELECT @SQL = 'WHILE @@rowcount > 0
> UPDATE ' + @DBName + '.dbo.' + @TblName + '
> SET ' + @FldName + ' = REPLACE(' + @FldName + ',
> SUBSTRING(' + @FldName + ',
> PATINDEX(''' + @PatIndex + ''',
> ' + @FldName + '), 1), '''')
> WHERE PATINDEX(''' + @PatIndex + ''', ' + @FldName + ') <> 0'
> exec (@SQL)
> print 'success REPLACING CHARACTERS IN ' + @DBName + '.' + @TblName + '.' + @FldName
> END TRY
> BEGIN CATCH
> print 'There was an error REPLACING CHARACTERS IN ' + @DBName + '.' + @TblName + '.' + @FldName
> print @SQL
> print ERROR_MESSAGE()
> END CATCH
> END
>
>
> John W. Colby
> www.ColbyConsulting.com
>
>
> Mark Breen wrote:
>> Hello John,
>> I used the following SQL today, and it works a treat, in my case, I wanted
>> to exclude all except alphas, I know that it is not C#, and I know you may
>> prefer C# as practice, so if it is no useful, just bin it.
>>
>> I got the code from this site, if you have a look, there is a little more
>> there that you may be able to make use of.
>>
>> Thanks to Nigel Rivett.
>>
>> http://www.nigelrivett.net/SQLTsql/RemoveNonNumericCharacters.html
>>
>> <snip>
>>
>> WHILE @@rowcount > 0
>> UPDATE tablename
>> SET columnname = REPLACE(columnname,
>> SUBSTRING(columnname,
>> PATINDEX('%[^a-zA-Z0-9]%',
>> columnname), 1), '')
>> WHERE PATINDEX('%[^a-zA-Z0-9]%', columnname) <> 0
>>
>> </snip>
>>
>>
>> Mark
>
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
>
>