[dba-SQLServer] Microsoft is moving ahead

Hans-Christian Andersen hans.andersen at phulse.com
Sun Oct 16 22:14:05 CDT 2011



Ya, sure. I think I may have already mentioned a few details about our current implementation and the problems we are solving with it.

Essentially, our fundamental issue is that we always have to factor redundancy into any system we deploy. A single, powerful server with backups isn't good enough. Any amount of downtime costs us money, and we have run into many problems trying to implement old technologies that were never designed to solve problems of this scale and redundancy. They often inherently break in weird ways.

What caused us to look into this in the first place was that we needed a redundant and reliable distributed file system/storage layer. We played around with several distributed filesystem solutions (SMB, NFS, DFS, GlusterFS, etc.), but found each one lacking in some critical respect. Eventually we decided to take a step back and take a more holistic approach, one that could solve more problems than just file storage. So we looked into NoSQL and, in the end, settled on Cassandra.

We are running Cassandra in 3 Xen VMs on 3 different Dell PowerEdge servers (one of which is offsite, in another country). The servers themselves have Intel Xeon E5640 processors, for a total of 16 logical cores, and 24G of memory. Our Cassandra VMs use far less than that, however: each VM is allocated only 1 core and 2G of memory. We've given each Cassandra node roughly 500G of disk space (in a RAID 5 configuration).

In terms of data being handled, we are currently using about 25% of our total capacity, but bear in mind that some data is duplicated across nodes due to our chosen partitioning scheme.

Cost of implementation is a hard one to answer, since the nodes run as virtual machines on host servers that also host other virtual machines, but the load is pretty negligible compared to any of the other VMs we run (I even run a Cassandra database on my development laptop and it's still quite fast). Our load average (roughly, the average number of processes running or waiting to run; not sure if this is a concept Windows admins would understand, it's a Unix thing) rarely goes above 0.2, and CPU usage generally averages around 5% or lower, which is pretty impressive given that Cassandra is written in Java.
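To put rough numbers on the capacity figures above, here is a back-of-the-envelope sketch. It assumes the ~500G per node is usable space after RAID 5 overhead and that every key is stored `replication_factor` times; the actual replication factor isn't stated, so the value 2 below and the helper name `capacity_estimate` are hypothetical.

```python
def capacity_estimate(nodes: int, disk_per_node_gb: float,
                      used_fraction: float, replication_factor: int) -> dict:
    """Back-of-the-envelope cluster capacity (hypothetical helper).

    Assumes disk_per_node_gb is already usable space after RAID overhead
    and that every key is stored replication_factor times.
    """
    if not 1 <= replication_factor <= nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    raw_gb = nodes * disk_per_node_gb          # total space across the cluster
    used_gb = raw_gb * used_fraction           # space consumed, replicas included
    unique_gb = used_gb / replication_factor   # distinct data before duplication
    return {"raw_gb": raw_gb, "used_gb": used_gb, "unique_gb": unique_gb}


if __name__ == "__main__":
    # 3 nodes, ~500G each, ~25% used; replication_factor=2 is an assumption.
    print(capacity_estimate(3, 500, 0.25, 2))
    # {'raw_gb': 1500, 'used_gb': 375.0, 'unique_gb': 187.5}
```

With a replication factor of 1 the unique figure equals the used figure, so under these assumptions the real amount of distinct data lies somewhere between the two numbers.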

We do not currently use Cassandra as anything more than a key-value store. We use our Cassandra setup to store and serve everything from images to videos to documents and, for that, it works great and is really, really fast. We haven't yet delved into the realm of MapReduce, and I'm taking a more level-headed approach to it. NoSQL does not replace our need for an RDBMS, nor would I ever advocate NoSQL as a natural alternative to a traditional database, any more than a train is an alternative to a car. It depends entirely on your product and the problems you are trying to solve. There are cases where your data would live more effectively in an RDBMS, but if your ultimate requirement is something very scalable, it's possible to alter your data model and rethink your application's design so that you can use NoSQL exclusively. Such a migration does come at a significant cost in technical capital, however, so it's not always practical unless you are starting from scratch, and you obviously need to know the pros and cons, but that goes without saying.
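To make "alter your data model" a bit more concrete, here is a toy illustration that uses plain Python dicts in place of both databases; all names and data are hypothetical. The relational version joins two tables at read time, while the key-value version precomputes one value per lookup key when data is written, so a read becomes a single key lookup. The trade-off shows up in the code: the owner's name is copied into every entry, so renaming a user means rewriting every key that embeds it.

```python
from collections import defaultdict

# Hypothetical sample data, normalized the way an RDBMS would hold it.
users = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
documents = [
    {"doc_id": "d1", "owner_id": 1, "title": "logo.png"},
    {"doc_id": "d2", "owner_id": 2, "title": "intro.mp4"},
    {"doc_id": "d3", "owner_id": 1, "title": "contract.pdf"},
]


def docs_for_user_joined(user_id):
    """Relational style: filter documents and join in the owner at read time."""
    owner = users[user_id]["name"]
    return [
        {"doc_id": d["doc_id"], "title": d["title"], "owner": owner}
        for d in documents
        if d["owner_id"] == user_id
    ]


def build_kv_view(user_rows, doc_rows):
    """Key-value style: one precomputed value per lookup key.

    The owner's name is duplicated into every entry, which is what makes
    the read a single lookup (and what makes renaming a user expensive).
    """
    view = defaultdict(list)
    for d in doc_rows:
        owner = user_rows[d["owner_id"]]["name"]
        view[f"user:{d['owner_id']}:docs"].append(
            {"doc_id": d["doc_id"], "title": d["title"], "owner": owner}
        )
    return dict(view)


kv = build_kv_view(users, documents)
# Same answer, but the key-value read is a single dict lookup: no scan, no join.
print(kv["user:1:docs"] == docs_for_user_joined(1))  # True
```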

However, it's worth mentioning that NoSQL is more a solution to the problems of scaling and of handling lots of data, where RDBMSs fundamentally have issues, so it should be no surprise that it is popular in highly clustered environments (e.g. supercomputers) and at web companies, and less so in enterprise solutions. I often encounter a certain hostility towards NoSQL when it's mentioned in the company of DB admins, and I can understand why to a degree, since it challenges an orthodoxy in database design that goes back decades. But I tend to find that they are looking at it from a completely different perspective and don't quite understand what problem NoSQL databases are trying to solve, or that NoSQL isn't trying to invalidate traditional databases (I really hate it when articles portray it that way). NoSQL is what it is: a database that attempts to trim away the unnecessary layers of complexity that RDBMSs have and impose on your application design and/or system architecture, so that, in the end, it can be scalable, flexible, very fast and simple to use. If that is not what you need, then it is not what you want, but for many web companies out there, it is.

Which brings me back to what I said previously. A lot of companies and startups are looking at NoSQL precisely because they want something that will scale easily should they suddenly become massively successful, see their site traffic increase dramatically and hit the limits of their database and hardware. Throwing better hardware at the problem will only get you so far and, even if it could get you further, a live migration of data in an RDBMS is never easy. With Cassandra, for example, you can add and remove nodes on the fly with a few CLI commands. I've never seen anything that simple in a traditional database, but please correct me if I am wrong.
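Part of why adding a node can be that cheap is the partitioning scheme. As a simplified sketch of the idea (a toy consistent-hashing ring, not Cassandra's actual code), assume each node owns the range of hash tokens ending at its own token; a key goes to the first node whose token is at or after the key's token. Adding a node then only moves the keys that fall into the new node's range, and every other key stays put. The node names, the `TokenRing` class and the use of MD5 as an evenly spread hash are illustrative choices.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring as a 128-bit integer (MD5 used for spread only)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Minimal consistent-hashing ring: one token per node, no replication."""

    def __init__(self, nodes=()):
        self._tokens = []  # node tokens, kept sorted
        self._owners = {}  # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def remove_node(self, node: str) -> None:
        t = token(node)
        self._tokens.remove(t)
        del self._owners[t]

    def node_for(self, key: str) -> str:
        """Return the first node whose token is >= the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token(key))
        if i == len(self._tokens):
            i = 0  # past the highest token: wrap to the start of the ring
        return self._owners[self._tokens[i]]


if __name__ == "__main__":
    keys = [f"image-{n}" for n in range(10_000)]
    ring = TokenRing(["node-a", "node-b", "node-c"])
    before = {k: ring.node_for(k) for k in keys}

    ring.add_node("node-d")  # grow the cluster by one node
    after = {k: ring.node_for(k) for k in keys}

    moved = [k for k in keys if before[k] != after[k]]
    # Every key that moved now lives on the new node; nothing else was reshuffled.
    print(len(moved), all(after[k] == "node-d" for k in moved))
```

In a real cluster the data in the moved range also has to be streamed to the new node, and with only one token per node the ranges can be quite uneven, which is why production systems typically give each node many tokens.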

What makes me curious is how traditional database vendors are going to marry NoSQL concepts into their existing products. Microsoft looks to be sensible (what with adopting Hadoop now) and I have no doubt they will succeed. Oracle, on the other hand...

Anyway, I hope that answers some questions.


- Hans



On 2011-10-16, at 1:57 PM, Jim Lawrence wrote:

> Hi Hans:
> 
> Maybe you can give the facts and figures on a NOSQL implementation. These
> figures would be:
> 
> 1. The size and resources of the equipment/hardware you have experience
> with.
> 2. How much data is being handled with this equipment?
> 3. Some basic guesstimates of the costs of this implementation.
> 4. How successful is NoSQL in retrieving complex data requests?
> 5. Anything else that you think is relevant.
> 
> Once the facts and figures are put together, that should put an end to the
> controversy, which at the moment is just hearsay.
> 
> Jim
> 
> -----Original Message-----
> From: dba-sqlserver-bounces at databaseadvisors.com
> [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
> Hans-Christian Andersen
> Sent: Saturday, October 15, 2011 11:35 PM
> To: Discussion concerning MS SQL Server
> Subject: Re: [dba-SQLServer] Microsoft is moving ahead
> 
> 
> Hi John,
> 
> I don't think any of us had the intention to offend you. Obviously you know
> what your business requirements are better than anyone else.
> 
> - Hans
> 
> 
> On 2011-10-15, at 6:30 AM, jwcolby wrote:
> 
>> Hans,
>> 
>>>> To be more accurate, NoSQL is intended to be a solution for companies
>>>> that are expecting rapid growth and cannot rely on vertical scaling alone in
>>>> order to keep up with demands on resources.
>> 
>> I don't see this anywhere.  Point me to anywhere that any company is even
>> thinking about NoSQL to run their *business* side of the house.  Show me
>> *anything* where *anyone* is developing bookkeeping or banking or
>> manufacturing kind of databases using NoSQL.
>> 
>> To be more accurate, NOSQL is intended to be a solution for companies
>> expecting rapid growth in *document storage*, and needing to *search
>> documents*.
>> 
>>>> It's not nonsense, just because it doesn't apply to you. :)
>> 
>> I did not say "it" (NOSQL) was nonsense; I have been saying that it is
>> nonsense to keep trying to fit that square peg in this round hole.  It is
>> nonsense to keep telling me I need it when (as you are saying) it doesn't
>> apply to me!
>> 
>> I read an article by one of the founders of (I believe) Hadoop.  What he
>> said was that NOSQL was *NOT* a replacement for SQL based languages, but a
>> solution for places where SQL databases don't fit.  The things I do demand
>> relational data.  Relationships are the core of my business.  My data is
>> large, but it is not large individual chunks (paragraphs or pages or
>> documents); it is lots of records with lots of attributes.
>> 
>> I have 600 million records in about 30 table pairs.  The tables are pairs,
>> each table related to one other with a pk/fk.  One table contains name /
>> address / hash fields and a PK.  The other table has attributes about the
>> people in that first table.  I have (in 15 tables) 300 million records with
>> first name, last name, addr1, city, state, zip, plus a handful of other
>> fields discussing the validity of the address itself.  I have to index on
>> and pull addresses based on specific attributes of those addresses.  I have
>> indexes on and pull data about those people records based on attributes
>> (fields) in the attribute table.
>> 
>> As an example, I have to pull those 300 million addresses out every month
>> and run them through a third-party program to track people moving.  That
>> software requires the name / address fields and hands me back those same
>> fields plus a bunch more that discuss the validity of those addresses (is
>> the address still valid?  Is the address complete?) as well as move
>> information.  That third-party program requires the data in CSV and uses
>> FoxPro to process it.
>> 
>> How in the world does this sound like Hadoop?
>> 
>> I think my friend Jim just has some misconceptions about what my data is.
>> Given how much I have discussed my "database from hell" with its 600
>> attribute fields, I am a little puzzled how he could not understand what I
>> do.
>> 
>> Or perhaps he (or I) misunderstands what NoSQL is.  From what I am
>> reading, NoSQL is not about handling relational data or hundreds of tiny
>> attributes (fields) of an object and selecting records based on those
>> attributes.  NoSQL (AFAIU) is about storing documents and allowing you to
>> search those documents.  I don't have a single field in all of my data that
>> stores more than about 80 characters.  I have tables, related to other
>> tables, each of which may have literally hundreds of fields, each field
>> being anywhere from one character (yes / no) to 60 characters (email
>> address). In fact the email address is the single biggest field in all of my
>> data.  I have to select small (a few million) record sets based on "where"
>> clauses examining those fields.  I have to join the information in these 15
>> table pairs to select records based on commonality.
>> 
>> How does this sound like NoSQL?
>> 
>> Every time my friend Jim comes at me with "you need NoSQL" I spend more
>> time trying to see what it is about NOSQL that fits my situation.  I am not
>> blithely ignoring him.  I have spent hours now reading stuff about this
>> technology, and every time I keep reading stuff by the very people who
>> design NoSQL saying that *it is not a replacement for SQL*.  These people
>> say that NOSQL does not do SQL kind of stuff easily.  These people say NOSQL
>> is about spreading the load of searching millions of *documents* across an
>> entire server farm.  These people are saying that it is tough (requires entire
>> new languages, technology and knowledge base) to get the data split out
>> across that server farm and to reassemble the search results *but* that the
>> results are worth it *when* you are dealing with billions of *documents*.
>> 
>> I am a one-man show.  I don't own a server farm.  I don't have billions of
>> documents and I am not going to acquire billions of documents.
>> 
>> I am just tired of being told how my situation is going to be helped by a
>> technology specifically and intentionally designed to handle the storage and
>> search of *documents*, when I don't have a single document in my entire
>> database.
>> 
>> *THIS IS GETTING OLD!!!*
>> 
>> I am thrilled that NoSQL exists and that it helps those that it helps.
>> What I am *not* seeing is a single case study where they are taking an SAP
>> process and doing it on NoSQL.  Flattening 10 thousand tables in a massive
>> SQL based data processing system and making it run on NoSQL.  What I am
>> *not* seeing is anyone claiming that in 5 (or even 30) years the SQL
>> language will cease to exist because of NoSQL.
>> 
>> And what I am not thrilled about is a constant "you need NOSQL" when there
>> is never any explanation about how this very cool but not applicable
>> technology applies to me.
>> 
>> IOW, LMTFA!!!
>> 
>> John W. Colby
>> Colby Consulting
>> 
>> Reality is what refuses to go away
>> when you do not believe in it
>> 
>> On 10/15/2011 2:47 AM, Hans-Christian Andersen wrote:
>>> 
>>> 
>>> To be more accurate, NoSQL is intended to be a solution for companies
>>> that are expecting rapid growth and cannot rely on vertical scaling alone in
>>> order to keep up with demands on resources.
>>> 
>>> Trust me, John, if you are in a situation like this, no amount of memory,
>>> CPU or compression can save you. It's not nonsense just because it doesn't
>>> apply to you. :)
>>> 
>>> - Hans
>>> 
>>> 
>>> 
>>> On 2011-10-14, at 3:37 PM, jwcolby wrote:
>>> 
>>>> LOL.  And maybe you will eventually discover that NoSQL is targeted at
>>>> people with a million blade servers / million dollars in a data center and
>>>> will quit haranguing me?  ;)
>>>> 
>>>> I see NO NOSQL in my future.
>>>> 
>>>> I have already solved my issues the same way that these guys did.
>>>> Jillions of cores and 64 gigabytes of RAM, and compression.  My hour-long
>>>> processes are under a minute or two.
>>>> 
>>>> It would be interesting to actually show you what I do, Jim.  You would
>>>> instantly quit this nonsense.
>>>> 
>>>> John W. Colby
>> _______________________________________________
>> dba-SQLServer mailing list
>> dba-SQLServer at databaseadvisors.com
>> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>> http://www.databaseadvisors.com
>> 
> 
> 




