[AccessD] ElasticSearch my hind leg...

John W Colby jwcolby at gmail.com
Mon Mar 10 00:02:48 CDT 2014


 >>"I was able to get (7) 200gb SSDs and form the raid array..." OMG...every home should have one. ;-)

LOL, this is a business, not a home.

 >>It indexes everything and it is quick; according to the webinar, one TB can be indexed in about 
90 seconds. The application can group millions of rows of data in milliseconds.

And read the fine print.  NOBODY does those kinds of numbers without enormous cloud compute (and 
enormous budgets).

Give me some credit, please, for what I have managed to do for a virtual company of about 7 people, 
with a total hardware budget of around $20K over 9 years.  I started with NO hardware and had never 
even seen SQL Server.  I (eventually) hand built a dual-processor, 16-core machine with 96 GB of 
RAM, 9 TB of main (rotating) storage (RAID 6), and a TB of SSD storage (RAID 5) to handle SQL Server, 
plus a second server with 6 cores, 32 GB of RAM, and 6 VMs running third-party software for the CAS 
and NCOA processing of 500 million addresses every month, AND handling the actual orders for the 
client as well.  AND I designed and executed a very complex system in C# that automates SQL Server: 
it pushes those 500 MILLION records out to CSV files every month (that's 1000 CSV files, BTW), pushes 
those files out to Accuzip on the virtual machines, babysits Accuzip (third-party software written in 
Visual FoxPro), and merges the 1000 result files back into SQL Server.
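For scale: 500 million records spread over 1000 CSV files works out to 500,000 rows per file. A minimal Python sketch of that even split, assuming the export simply divides the rows into contiguous ranges (the real system was written in C#, and its actual chunking logic is not described here; `partition_rows` is a hypothetical name):

```python
# Hypothetical sketch: split a large row count into a fixed number of export
# files (e.g. 500 million records -> 1000 CSV files). This is an illustration
# of even partitioning, not the author's actual C# code.

def partition_rows(total_rows: int, num_files: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) row ranges, one per output file.

    Rows are spread as evenly as possible: the first
    `total_rows % num_files` files each get one extra row.
    """
    base, extra = divmod(total_rows, num_files)
    ranges = []
    start = 0
    for i in range(num_files):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    chunks = partition_rows(500_000_000, 1000)
    print(len(chunks))            # 1000
    print(chunks[0], chunks[-1])  # (0, 500000) (499500000, 500000000)
```

Each range could then drive one `SELECT ... ORDER BY key` export, so every file stays a predictable size for the downstream processing step.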

With the exception of a student C# programmer (a 2-year graduate whom I met when I took my C# 
classes) helping me, I did this all BY MY SELF.

It is more than slightly annoying to have folks say "go look at xyz".  Buddy, I looked at a TON of 
stuff trying to find something that I could build and handle BY MY SELF, starting in 2004, when NONE 
of this hi-falutin crap you mention was even a gleam in its daddy's eye.

I hope you got the BY MY SELF reference.  This is NOT IBM or Google or Facebook with a 50 million 
dollar data center and a team of programmers.  This is Colby Consulting, with John W. Colby doing the 
whole damned thing.  When I say EVERYTHING I mean: researching and ordering hardware from Newegg; 
joining the Microsoft program to get my hands on the software; BUILDING the hardware (and 
maintaining it, and upgrading it); installing Windows Server 2003, then 2008, and SQL Server 2000 
/ 2005 / 2008; researching the Accuzip solution for CAS / NCOA, buying it, and learning how it 
worked and how to automate it; designing the methodology for getting these big tables (text files) 
into SQL Server databases; and designing and writing the C# application (with my assistant) over 
18 months, all while actually performing work on those same SQL Server databases, providing counts 
and fulfilling orders for my client.

You have no idea what it took to get where I am today, or what it would take to throw all this away 
just to use some other data store.  The data store is 1/4 of the business that I manage, maybe only 
1/10th.  I look back on the last nine years and wonder how I managed to get all that crap done.

So no, it seems unlikely I am going to do that ElasticSearch thing.  Not that it isn't fascinating 
and all, but being a one-man show, I have to pick my battles, and that isn't something I need.

Only $500 per year to monitor your first 5 nodes
$3,000 per year for each 5 node cluster thereafter

To get the numbers you mention, I'd probably only need a thousand nodes.  Uh, yeah... Or rather, no...
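Taking the two quoted price lines at face value, a thousand nodes is 200 five-node clusters, or $500 + 199 x $3,000 = $597,500 per year for monitoring alone. A small Python sketch of that arithmetic, assuming partial clusters are billed as whole ones (the quoted pricing doesn't say; `yearly_monitoring_cost` is a hypothetical helper):

```python
# Hypothetical cost model built only from the two quoted price lines:
# $500/year covers the first 5 nodes, and each further 5-node cluster
# costs $3,000/year. Rounding a partial cluster up is an assumption.

FIRST_CLUSTER_PRICE = 500
EXTRA_CLUSTER_PRICE = 3_000
NODES_PER_CLUSTER = 5


def yearly_monitoring_cost(nodes: int) -> int:
    """Yearly monitoring cost in dollars for the given node count."""
    if nodes <= 0:
        return 0
    # Integer ceiling division: number of 5-node clusters needed.
    clusters = (nodes + NODES_PER_CLUSTER - 1) // NODES_PER_CLUSTER
    return FIRST_CLUSTER_PRICE + (clusters - 1) * EXTRA_CLUSTER_PRICE


if __name__ == "__main__":
    print(yearly_monitoring_cost(1000))  # 597500
```

That figure covers monitoring only; it leaves out the hardware for the thousand nodes themselves.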

John W. Colby

Reality is what refuses to go away
when you do not believe in it

On 3/9/2014 8:18 PM, Jim Lawrence wrote:
> Hi John:
>
> "I was able to get (7) 200gb SSDs and form the raid array..." OMG...every home should have one. ;-)
>
> I know we have gone through this discussion before, but given the amount of data you are working with and the complexity of the searches required, I would be so bold as to suggest that you at least look at the following technology from Elasticsearch:
>
> http://www.elasticsearch.org ...and... http://www.elasticsearch.org/resources < check out the webinar...
>
> The system, in a nutshell, is text-based. The number of rows (documents) depends on the hardware, and it can handle thirty-thousand-plus columns. It indexes everything and it is quick; according to the webinar, one TB can be indexed in about 90 seconds. The application can group millions of rows of data in milliseconds. The data can be limited to a single directory, a hard drive, a computer, or a whole cluster.
>
> Jim
>


