[AccessD] SSD, The Game Changer - SQL Man of Mystery - SQLServerCentral.com

Sun Jul 3 22:10:49 CDT 2011

Inline:

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Sunday, July 03, 2011 2:41 PM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] SSD, The Game Changer - SQL Man of Mystery -
SQLServerCentral.com

Jim,

I read the first article and I don't see how it fits at all.

* It fits very well. First, in overview, this database does not require a
single PC to run it. It is distributed in nature, so you can just keep
adding nodes from a bunch of new and old PCs. The main problem with the
database as you have it now is that it is dependent on the hardware of a
single box. Just as in 2006, when the single CPU reached its maximum and
multi-core processors were created, Cassandra runs like a multi-core
PC. When the database is running on numerous PCs, it is like running a full
RAID.
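To make the "just add nodes" idea concrete, here is a toy sketch of consistent hashing, the general idea behind Cassandra's token ring. It is an illustration under assumptions, not Cassandra's code: the `Ring` class, the node names and the use of MD5 to make tokens are hypothetical choices for the example. What it shows is that adding a PC only takes over part of one neighbour's range, so most rows stay where they are.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to an integer position on the ring (MD5 chosen for this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: each node owns the tokens up to and including its own."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping around to the start.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]

    def add_node(self, node: str) -> None:
        # A new node only takes over part of the range of the node after it.
        bisect.insort(self._entries, (token(node), node))
        self._tokens = [t for t, _ in self._entries]


if __name__ == "__main__":
    ring = Ring(["old-pc-1", "old-pc-2", "new-pc-1"])
    keys = [f"person:{i}" for i in range(1000)]
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("new-pc-2")
    after = {k: ring.owner(k) for k in keys}
    moved = sum(1 for k in keys if before[k] != after[k])
    print(f"{moved} of {len(keys)} keys moved to the new node")
```

Running it prints how many of the 1,000 sample keys moved; every key that moved lands on the newly added node, and nothing else is reshuffled.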

1) My data is very much a normal table, with rows and columns.  In one case
(the people table) there are a small number of columns, and most of the
columns are fully populated.  In the second table, there are pushing 700
fields and the columns are rarely fully populated.

* Cassandra, for example, has a sparse data structure. This means a row can
be up to about 2 billion columns wide (and a table can hold a huge number of
rows), if required, and the 'fields' do not have to be populated. At any
time you can just insert another 'field' with no rebuild required and no
performance loss (it can all be done in real time). You can have your cake
and eat it too.
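A minimal in-memory sketch of what "sparse" means here, assuming a simple dict-of-dicts model (the `SparseTable` class and the column names are hypothetical, not Cassandra's API): each row stores only the columns it actually has, so a new 'field' is just one more entry in one row, with no table rebuild.

```python
from collections import defaultdict


class SparseTable:
    """Toy wide, sparse row store: each row keeps only the columns it actually has."""

    def __init__(self):
        # row key -> {column name -> value}; absent columns take no space at all
        self._rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        # A brand-new column name needs no ALTER TABLE or rebuild; it is just another entry.
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # .get() on the outer dict avoids creating empty rows on lookup
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


if __name__ == "__main__":
    t = SparseTable()
    t.insert("person:1", "name", "Smith")
    t.insert("person:1", "zip", "12345")
    t.insert("person:2", "name", "Jones")
    # Adding a new "field" to one row touches only that row.
    t.insert("person:2", "owns_boat", "Y")
    print(t.columns("person:1"))  # ['name', 'zip']
    print(t.columns("person:2"))  # ['name', 'owns_boat']
```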

2) The data is relational, though just barely.  One of my people tables is
directly related to the data table.  All of my people tables may have people
in the other tables.  Eventually I will be pulling all of my people into a
single table with pointers back to the non-people data in the original table.

* When you say "barely relational," that is what Cassandra is designed for.
What you are describing in item 2 is exactly the type of database Cassandra
is. It may not use SQL as we traditionally know it, but it uses a system
called 'map-reduce' which can very quickly obtain the same
results...it is a data-mining tool on steroids.
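For anyone who has not seen map-reduce, here is a single-process sketch of the pattern: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. It assumes a hypothetical `state` field on each person record; on a real cluster the map step would run on each node against its local data and only the small per-key results would be combined. Nothing here is Cassandra or Hadoop code.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Emit (key, 1) for each record; the key is the record's (hypothetical) state field."""
    yield record["state"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here a simple sum."""
    return key, reduce(lambda a, b: a + b, values, 0)


def map_reduce(records):
    pairs = (pair for r in records for pair in map_phase(r))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    people = [
        {"name": "Smith", "state": "NY"},
        {"name": "Jones", "state": "NC"},
        {"name": "Brown", "state": "NY"},
    ]
    print(map_reduce(people))  # {'NY': 2, 'NC': 1}
```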

3) I rarely update the data.  The people table is updated once per month as
I validate the addresses looking for moves.  I get about 1.5% / month moves.
The data table is never updated, at least the data in the existing columns.
Very occasionally I add new columns.

* Again, your requirements and description match Cassandra's design to a
tee.

4) I only have a single user (myself) and I never expect to have more.

* It does not matter that you are the only user. What does matter is that
you can pull any data out of a huge mass of mixed data quickly and easily.
You have probably noticed how Google can return 365,000,000 hits in around
0.03 seconds. That is not because they have huge, vertically scaled,
super-powerful computers...their computers are beaters in comparison, but
their database spans hundreds of computers...which is just what Cassandra
is capable of.

In comparison, the NoSQL databases claim to store documents and make them
searchable.  Blog posts, web pages etc.  They claim to be good as the
numbers of people simultaneously *writing* to them climbs into the hundreds
or thousands.

So while the technology is fascinating, it hardly seems like a fit for my
application.

* You are right in all the above, with the exception that Cassandra is a
near-perfect fit.

To that end, I will be setting up my own Cassandra database system just to
see how it plays out.

Here is a short video that does a lot to explain what Cassandra does and
how it does it:
http://video.disruptivecode.com/video/840645/what-makes-cassandra-trick. I
think you will be pleasantly surprised.

The product runs on any OS, as all it needs is a copy of Java and the Apache
Cassandra download, both of which are free.
http://www.quora.com/How-can-i-install-Apache-Cassandra-on-Windows-Operating-System

John W. Colby
www.ColbyConsulting.com

On 7/3/2011 3:13 PM, Jim Lawrence wrote:
> Hi John:
>
> As your databases do not need to manage transaction queues or locks, here
> is an example of a performance showdown between just one of the NoSQL
> databases, MongoDB, and SQL Server 2008.
>
> http://tinyurl.com/38cofzg
>
> The article shows speeds over 100 times as fast when managing a fairly
> large amount of data, and the performance gain just goes up exponentially
> when presented with even larger data sets (1000x and more).
>
> The Cassandra (NoSQL) database (http://cassandra.apache.org/) might be one
> of the best choices in this genre, as it has the support of big players
> like Facebook, IBM, Apache, etc.
>
> There is also an ever expanding group of experts and help forums
> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/) on this
> subject.
>
> To add to the functionality, there are the new super-scaling search
> tools called HPCC
> (http://gigaom.com/cloud/lexisnexis-open-sources-its-hadoop-killer/), which
> use a combination of SQL and NoSQL when searching distributed data and
> clusters of data servers.
>
> In your future plans, it might well be worth considering such an option,
> especially as your data requirements and expected results continue to
> grow.
>
>
> Jim