Jim Lawrence
accessd at shaw.ca
Sun Jul 3 22:10:49 CDT 2011
Inline:

-----Original Message-----
From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Sunday, July 03, 2011 2:41 PM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] SSD, The Game Changer - SQL Man of Mystery - SQLServerCentral.com

Jim,

I read the first article and I don't see how it fits at all.

* It fits very well. First, the overview: this database does not depend on a single PC. It is distributed by nature, so you can simply add nodes on a bunch of new and old PCs. The main problem with the database as you have it now is that it is dependent on the hardware of a single box. Around 2006 the single-core CPU reached its practical maximum and multi-core processors took over; Cassandra works the same way, like a multi-core PC. When the database is running on numerous PCs it is like running a full RAID.

1) My data is very much a normal table, with rows and columns. In one case (the people table) there are a small number of columns, and most of the columns are fully populated. In the second table, there are pushing 700 fields and the columns are rarely fully populated.

* Cassandra, for example, has a sparse data structure. A single row can be up to about 2 billion columns wide (and the row count can be just as large), if required, and the 'fields' do not have to be populated. At any time you can insert another 'field' with no rebuild required and no performance loss (it can all be done in real time). You can have your cake and eat it too.

2) The data is relational, though just barely. One of my people tables is directly related to the data table. All of my people tables may have people who are also in the other tables. Eventually I will be pulling all of my people into a single table with pointers back to the non-people data in the original table.

* "Barely relational" is exactly what Cassandra is designed for. What you describe in item 2 is exactly the type of database Cassandra is.
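As a rough illustration of the sparse-column idea from the reply to item 1, here is a minimal Python sketch. It assumes a toy in-memory model: the SparseTable class and its methods are hypothetical and are not Cassandra's actual API. Each row stores only the columns it actually has, so adding a new 'field' to one row touches nothing else and needs no rebuild.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sparse, wide-row store (hypothetical, not Cassandra's API)."""

    def __init__(self):
        # row key -> {column name -> value}; absent columns are simply not stored
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        # Adding a column to one row never touches any other row,
        # so there is no schema change and no table rebuild.
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # .get() on the outer dict avoids creating empty rows on lookup.
        return self.rows.get(row_key, {}).get(column, default)

    def stored_cells(self):
        # Only populated cells are counted, because only they exist.
        return sum(len(cols) for cols in self.rows.values())


table = SparseTable()
table.insert("person:1", "name", "Smith")
table.insert("person:1", "zip", "12345")
table.insert("person:2", "name", "Jones")
table.insert("person:2", "new_field_701", "Y")  # a brand-new column, added on the fly

print(table.stored_cells())                    # 4
print(table.get("person:1", "new_field_701"))  # None
```

The design choice is that a missing column is simply absent from the row rather than stored as an empty slot, which is what makes a 700-field, rarely populated layout cheap in this model.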
Cassandra may not use SQL as we traditionally know it, but it works with a system called 'map-reduce' which can obtain the same results quickly (very quickly)... it is a data-mining tool on steroids.

3) I rarely update the data. The people table is updated once per month as I validate the addresses looking for moves. I get about 1.5% moves per month. The data table is never updated, at least the data in the existing columns. Very occasionally I add new columns.

* Again, your requirements and description match Cassandra's design to a tee.

4) I only have a single user (myself) and I never expect to have more.

* It does not matter that you are the only user. What does matter is that you can pull any data out of a huge mass of mixed data quickly and easily. You have noticed how Google can pull 365,000,000 hits in somewhere around 0.03 seconds. That is not because they have huge, vertically scaled, super-powerful computers... their computers are beaters in comparison, but their database spans hundreds of computers... just as Cassandra is capable of doing.

In comparison, the NoSQL databases claim to store documents and make them searchable: blog posts, web pages, etc. They claim to perform well as the number of people simultaneously *writing* to them climbs into the hundreds or thousands. So while the technology is fascinating, it hardly seems like a fit for my application.

* You are right in all of the above, with the exception that Cassandra is a near-perfect fit. To that end I will be starting my own Cassandra database system, just to see how it plays out. Here is a short video that does a lot to explain what Cassandra does and how:
http://video.disruptivecode.com/video/840645/what-makes-cassandra-trick
I think you will be pleasantly surprised. The product runs on any OS, as all it needs is a copy of Java and the Apache Cassandra package, both free downloads:
http://www.quora.com/How-can-i-install-Apache-Cassandra-on-Windows-Operating-System

John W. Colby
www.ColbyConsulting.com

On 7/3/2011 3:13 PM, Jim Lawrence wrote:
> Hi John:
>
> As your databases do not need to manage transaction queues or locks, here is
> an example of just one of the NoSQL databases, MongoDB, in a performance
> showdown against SQL Server 2008.
>
> http://tinyurl.com/38cofzg
>
> The article shows speeds over 100 times as fast when managing a fairly
> large amount of data, and the gap grows dramatically when presented with
> even larger data sets (1000x and more...).
>
> The Cassandra (NoSQL) database (http://cassandra.apache.org/) might be one
> of the best choices in this genre, as it has the support of big players
> like Facebook, IBM, Apache, etc.
>
> There is also an ever-expanding group of experts and help forums
> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/) on this
> subject.
>
> To add to the functionality, there is a new super-scaling and searching
> tool called HPCC
> (http://gigaom.com/cloud/lexisnexis-open-sources-its-hadoop-killer/), which
> uses a combination of SQL and NoSQL when searching distributed data and
> clusters of data servers.
>
> In your future plans it might well be worth considering such an option,
> especially as your data requirements and expected results continue to grow.
>
> Jim
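The 'map-reduce' approach mentioned in the reply to item 2 can be sketched in Python as a single-process toy. It assumes the 'nodes' are just list slices, and the records and the 'state' field are made up for illustration. In a real cluster each map step would run on the machine that holds that slice of the data, and the grouped intermediate results would be shipped between machines before the reduce step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records):
    # Each "node" emits a (key, 1) pair for every record it holds locally.
    return [(rec["state"], 1) for rec in records]


def shuffle(mapped_outputs):
    # Group every emitted value by key, across the output of all nodes.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Combine the values for each key; here, a simple count.
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical people records, pre-split across three "nodes".
nodes = [
    [{"state": "NC"}, {"state": "NY"}],
    [{"state": "NC"}],
    [{"state": "TX"}, {"state": "NY"}, {"state": "NC"}],
]

mapped = [map_phase(chunk) for chunk in nodes]
result = reduce_phase(shuffle(mapped))
print(sorted(result.items()))  # [('NC', 3), ('NY', 2), ('TX', 1)]
```

Splitting the work this way is what lets a query scale out: adding another node adds another map worker rather than requiring a bigger single box.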