jwcolby
jwcolby at colbyconsulting.com
Wed Oct 21 10:47:13 CDT 2009
Just an FYI, I ran this against 1 million records, in a query that pulled the top(1000000) City fields (where this specific problem lies) and it took about 13 seconds. This is with an index in place that includes the city field. I then ran it against the entire table (62.8 million records) and it took 2 minutes 52 seconds. This will become a standard part of my program for cleaning up my data. Thanks Mark Breen! John W. Colby www.ColbyConsulting.com jwcolby wrote: > Mark, > > Below is the SP I created to perform this magic. I will be running this on 66 million records so we > shall see about performance. > > set ANSI_NULLS ON > set QUOTED_IDENTIFIER ON > go > > -- ============================================= > -- Author: <Author,,Name> > -- Create date: <Create Date,,> > -- Description: <Description,,> > -- > --WHILE @@rowcount > 0 > -- UPDATE tablename > -- SET columnname = REPLACE(columnname, > -- SUBSTRING(columnname, > -- PATINDEX('%[^a-zA-Z0-9]%', > -- columnname), 1), '') > -- WHERE PATINDEX('%[^a-zA-Z0-9]%', columnname) <> 0 > -- > -- ============================================= > ALTER PROCEDURE [dbo].[sp_ReplaceCharacters] > -- Add the parameters for the stored procedure here > @DBName varchar(50), @TblName varchar(50), @FldName varchar(50) > AS > BEGIN > -- SET NOCOUNT ON added to prevent extra result sets from > -- interfering with SELECT statements. > SET NOCOUNT ON; > declare @SQL varchar(1000) > declare @PatIndex varchar(50) > > BEGIN TRY > set @PatIndex = '%[^a-zA-Z0-9 .,#]%' > SELECT @SQL = 'WHILE @@rowcount > 0 > UPDATE ' + @DBName + '.dbo.' + @TblName + ' > SET ' + @FldName + ' = REPLACE(' + @FldName + ', > SUBSTRING(' + @FldName + ', > PATINDEX(''' + @PatIndex + ''', > ' + @FldName + '), 1), '''') > WHERE PATINDEX(''' + @PatIndex + ''', ' + @FldName + ') <> 0' > exec (@SQL) > print 'success REPLACING CHARACTERS IN ' + @DBName + '.' + @TblName + '.' + @FldName > END TRY > BEGIN CATCH > print 'There was an error REPLACING CHARACTERS IN ' + @DBName + '.' + @TblName + '.' + @FldName > print @SQL > print ERROR_MESSAGE() > END CATCH > END > > > John W. Colby > www.ColbyConsulting.com > > > Mark Breen wrote: >> Hello John, >> I used the following SQL today, and it works a treat, in my case, I wanted >> to exclude all except alphas, I know that it is not C#, and I know you may >> prefer C# as practice, so if it is no useful, just bin it. >> >> I got the code from this site, if you have a look, there is a little more >> there that you may be able to make use of. >> >> Thanks to Nigel Rivett. >> >> http://www.nigelrivett.net/SQLTsql/RemoveNonNumericCharacters.html >> >> <snip> >> >> WHILE @@rowcount > 0 >> UPDATE tablename >> SET columnname = REPLACE(columnname, >> SUBSTRING(columnname, >> PATINDEX('%[^a-zA-Z0-9]%', >> columnname), 1), '') >> WHERE PATINDEX('%[^a-zA-Z0-9]%', columnname) <> 0 >> >> </snip> >> >> >> Mark > > _______________________________________________ > dba-VB mailing list > dba-VB at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/dba-vb > http://www.databaseadvisors.com > >