Michael Mattys
michael at mattysconsulting.com
Mon Sep 3 23:04:34 CDT 2012
Hi Mark, No relation to a discussion about graphs. Take a flat projection of the earth (WGS84) to represent the four quadrants. In resolving addresses, our code can return many possible answers as to the location value of any given address string. If the first result does not have a good level of confidence (ie; the address doesn't exist in the database engine with which I am geocoding), then I can elect to have the code branch to another level of confidence for the postal code (census tract in some cases), city, county, state, or country. One can then visualize the same procedure for sifting through millions of records. I'm not sure how Hadoop and other Big Data players do it. Michael R Mattys Mattys Consulting, LLC www.mattysconsulting.com -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Mark Breen Sent: Monday, September 03, 2012 4:14 PM To: Access Developers discussion and problem solving Subject: Re: [AccessD] Using the CLR with SQL Server Hello Michael Your comment intrigues me { Your methods sound very close to address resolution and geocoding. I can almost visualize your records falling into one of four quadrants on a map. } Do you have any articles or short summary reading to explain that just a little more? Is this related to the 'Graph' database discussions we had a few weeks ago ? thanks Mark On 3 September 2012 04:42, Michael Mattys <michael at mattysconsulting.com>wrote: > John, > > One of Stuarts McLachlan's PB dll's could be fastest. > The mention of quit analysis brings Linq to mind, though. > A billion per month? I'll have to get my brother's take on that. > > Your methods sound very close to address resolution and geocoding. > I can almost visualize your records falling into one of four quadrants > on a map. > > Michael R Mattys > Mattys Consulting, LLC > www.mattysconsulting.com > > > -----Original Message----- > From: accessd-bounces at databaseadvisors.com > [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby > Sent: Sunday, September 02, 2012 2:31 PM > To: Access Developers discussion and problem solving > Subject: Re: [AccessD] Using the CLR with SQL Server > > Michael, > > I do a ton of SQL Server automation where I BCP files in and out, run > dynamic SQL to update records, control processes etc. However I have > run into one case where I need to split a CSV string in one field of a > record, and then based on the values in the string, update another > field with a value. > IOW the string has 0 to N "error codes", where N is practically > speaking about 8 maximum codes. > Depending on the error codes I need to create a "simpler" code to tell > me how to use the record. I need to do this to chunks of up to 2 > million records, and tables of hundreds of millions of records. > String handling directly in TSQL doesn't appear to be particularly > speedy so I was looking at building a function in C# to do this stuff, > returning the "simpler code" as a function return value which is then > assigned to the field being updated, directly in TSQL. IOW > > UPDATE [MyDB].[dbo].[MyTbl] > SET [AddrValid] = MyCSFunction(FieldAVal, FieldBVal) > WHERE (Whatever selection may be needed) > > Currently I do this in a UDF, i.e. MyCSVFunction() is actually MyUDF() > called from my dynamic TSQL, but the UDF does not do very thorough > error code analysis, and was designed back when I was doing a total of > a couple of hundred million records a month. I am now pushing a > billion records a month and climbing, so I am trying to rethink my > existing solution. Essentially each record update, even though done > in SQL, is evaluating a UDF in the process and I want to replace that > UDF (if possible) with a faster and more powerful C# solution. > > I have a table of these error codes which I just created, and there > are 60 of them. Some are fatal, some inconvenient, some irrelevant. > Ideally I would do an exhaustive analysis of each and every code in > the CSV string against the table to determine the "most severe error" > if you will and how that impacts the use of the record. As it is I > kinda punt and look for specific codes and ignore everything else... > not a good solution, but I am just not able to program an efficient TSQL solution to this problem. > > > I am not sure that I can program an efficient CLR solution either for > that manner but I stand a much better chance I think. However I have > never tried to use the CLR so there is the learning curve as well. > > I would like to something like: > > 1) Load the error table one time into a table in C#. > 2) Have the function parse the CSV string into a table each call and > then join to the Error table (in memory). > 3) Order by severity desc > 4) Return Max(Severity) > > Or something similar. > > Or... > > 1) Load the error table into a dictionary one time > 2) Have the function parse the CSV string and look up each error in > the dictionary > 3) Return the severity of the most severe error code found > 4) Short circuit (quit analysis) if more than a certain severity found > > Or something similar. > > I could of course do the whole thing in a UDF but I have never > discovered how to do debugging in TSQL equivalent to the C# / visual > studio environment. > > This CSV error string just has me hamstrung in terms of correctly > processing the errors, at least efficiently enough to handle millions > of records in a reasonable time frame. > > John W. Colby > Colby Consulting > > Reality is what refuses to go away > when you do not believe in it > > On 9/2/2012 11:50 AM, Michael Mattys wrote: > > Hi John, > > > > We've used the CLR for serializing, visualizing, and altering shapes > > from MapPoint using the geography capabilities in SQLS 2008. > > What are you going to be using it for? > > > > Michael R Mattys > > Mattys Consulting, LLC > > www.mattysconsulting.com > > > > > > -----Original Message----- > > From: accessd-bounces at databaseadvisors.com > > [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby > > Sent: Sunday, September 02, 2012 12:57 AM > > To: Sqlserver-Dba; Access Developers discussion and problem solving; > > VBA > > Subject: [AccessD] Using the CLR with SQL Server > > > > Just a poll to see if anyone on the list is using the CLR in SQL > > Server. I do a lot of C# work now and mostly to automate SQL Server. > > I have not been using the CLR in SQL Server but I am ready to figure > > it > out. > > > > Anyone? > > > > -- > > John W. Colby > > Colby Consulting > > > > Reality is what refuses to go away > > when you do not believe in it > > > > -- > > AccessD mailing list > > AccessD at databaseadvisors.com > > http://databaseadvisors.com/mailman/listinfo/accessd > > Website: http://www.databaseadvisors.com > > > > -- > AccessD mailing list > AccessD at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/accessd > Website: http://www.databaseadvisors.com > > -- > AccessD mailing list > AccessD at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/accessd > Website: http://www.databaseadvisors.com > -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com