John W Colby
jwcolby at gmail.com
Mon Mar 10 19:25:57 CDT 2014
>>Note: This is not Cloud computing specifically...it can be done on any server(s) you have in the office. I simply went to the ElasticSearch site and clicked every single menu item, poking around. The site is not particularly well designed, mostly marketing fluff. It is not cloud computing specifically, it is distributed computing. The data I deal with is NOT web page kind of stuff. It is not Google / Facebook / bing / search kind of stuff. It is pretty much straight up regular old data, properties about people used to select interests and therefore marketing. The vast majority of the data is single character fields. Has cat = 'Y'/'N' Has Dog = 'Y'/'N' Gender = 'M'/'F'/'U' Birthdate = some date Income = some long int ReligiousPreference = some single character - 'J' (Jewish) And of course FName Varchar(25) LName Varchar(25) Addr Varchar(50) City Varchar(50) St (char(2) Zip5 Char(5) Zip4 Char(4) Say what you will but the way I see this, this is straight up SQL stuff. Count (*) Where age (which is a dynamic calculation based on current date and the birthdate) > 30 and <=45 AND Gender = 'F' and HasDog = 'Y' and ReligiousPreference = 'J' Or alternately "Give me the names / addresses for the people with these properties in a specific zip list" These are precisely the kinds of queries I am asked to perform day in and day out. I have a third party Zip code list which has attributes of "Number of households in zip" and "Population in zip", as well as geo location (lat/lon). I am handed a list of 15 thousand zips and asked how many households and population in these zips. Or I am handed a single zip and asked for the population in a five mile radius of that one zip. These are very straight up SQL kind of questions. I know that you have a love of and hankering for these distributed processing systems but my data is very standard (and old fashioned) marketing stuff. SQL was designed precisely in order to manipulate these kinds of questions, and that is what I use. I am not searching a million paragraphs of input to blog pages looking for the word "KISS" or anything remotely like that. My data is huge, but only because the tables have anywhere from 5 million to 225 million records and anywhere from 20 fields to 640 fields. But the fields are mostly just Y/N or single character values 0-9/A-T (codes). Whatever database engine I use has to have a SQL kind of language behind it. I have to be able to pull name / address records based on these kinds of questions. SQL is NOT going away. I have read a ton of the interviews with execs from these data engines optimized for text search and to a man NONE of these folks say their systems work well for SQL type applications. They are not designed to do so. They do completely different kinds of things, and the things they do do not map to SQL apps, or what I am doing. SQL Server in fact does a great job of breaking down the queries into tasks which it feeds off to the cores. That is precisely why I went with a high core count machine. SQL Server very often pegs every core of my machine for minutes at a time as it performs the task of selecting records based on huge criteria lists. This is quite fascinating to watch as well. The biggest problem I face really is that it is simply impossible to index 640 fields in a meaningful way such that a question against any of these 640 fields always hits an index. What I do is index the most used fields, the zip, income, gender, religion etc, and then put up with the table scan of most of the other fields. I do occasionally create a cover index for a particularly large set of fields in very specific cases, but mostly not. And some of the smaller databases (smaller number of fields total) I may very well index more fields. John W. Colby Reality is what refuses to go away when you do not believe in it On 3/10/2014 7:20 PM, Jim Lawrence wrote: > Hi John: > > Note: This is not Cloud computing specifically...it can be done on any server(s) you have in the office. > --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com