[dba-SQLServer] Microsoft is moving ahead

jwcolby jwcolby at colbyconsulting.com
Sun Oct 16 23:50:47 CDT 2011


Great video!

John W. Colby
Colby Consulting

Reality is what refuses to go away
when you do not believe in it

On 10/17/2011 12:03 AM, Hans-Christian Andersen wrote:
>
> Hi John,
>
> Ya, I understand what you mean. Hadoop could very well be the tool to suit that problem.
>
> Sometimes it's hard to put these things into words, given how much has been changing over the last decade: the huge growth in online usage, the pervasiveness of the internet, the types of applications we now use and the scale of data that comes with it. It even took me a while to get my head around the whole concept of cloud computing at first, and a lot of that has to do with the fact that these concepts are still changing and maturing. I hope that I made things a bit clearer. That said, I was just looking online for some information that could perhaps explain it a bit better, and I think this guy (Mike Olsen) does a better job.
>
> http://www.youtube.com/watch?v=S9xnYBVqLws
>
> Worth a watch, even if just for the first half of the video.
>
> One thing is for sure, though. We are currently in the middle of a massive, fundamental change in computing: from the hardware in the box on your desk being the computer to the network being a fundamental part of your computer. That has caused a rethink of traditional concepts and assumptions.
>
> - Hans
>
>
> On 2011-10-16, at 8:40 PM, jwcolby wrote:
>
>> Hans,
>>
>> Good info, realistic point of view, and observations re practicality and problems solved.
>>
>> Having listened closely to this discussion (I do pay attention), I am thinking about how this might solve a problem for my call center client.  They need a relational database for the call center side of things, but they are also drowning in documents.  IOW they have to obtain and store medical records, legal documents, IRS documents, claim reports, private investigation reports and so forth, all tied to claims in the claim system.  Each claim generates an entire box of documents.  They currently use a third-party system which scans everything into electronic form, but then ???  It is kept in something like PDF, I think, then cut to a DVD and sent off to Iron Mountain for permanent storage.  This certainly sounds like it could benefit them in terms of being able to store and search these documents.  We have never even tried to solve that side using an SQL database because it just doesn't fit.
>>
>> John W. Colby
>> Colby Consulting
>>
>> Reality is what refuses to go away
>> when you do not believe in it
>>
>> On 10/16/2011 11:14 PM, Hans-Christian Andersen wrote:
>>>
>>>
>>> Ya, sure. I think I might have mentioned a few details about our current implementation and what problems we are solving with it already.
>>>
>>> Essentially, our fundamental issue is that, in any system we deploy, we always have to factor in redundancy. Having a single, powerful server with backups isn't good enough. Any amount of downtime costs us money, and we have bumped into many problems trying to implement old technologies that were never designed to solve such problems of scale and redundancy. They often inherently break in weird ways.
>>>
>>> What caused us to look into this in the first place was that we needed redundant and reliable distributed file storage. We've played around with a few different distributed filesystem solutions (SMB, NFS, DFS, glusterfs, etc.), but found each one to be lacking in some critical respect. In the end, we decided to take a step back and take a more holistic approach, which could be used to solve more problems than just file storage. So we looked into NoSQL and decided on Cassandra.
>>>
>>> We are running Cassandra under 3 Xen VMs on 3 different Dell PowerEdge servers (one of which is offsite in another country). The servers themselves come with Intel Xeon E5640 processors for a total of 16 cores and 24G memory. However, our Cassandra VMs use less than that: only 1 core is dedicated to each VM and 2G of memory is allocated to it. We've given each Cassandra node roughly 500G of disk space (in a RAID 5 configuration). In terms of data being handled, we are currently using about 25% of our total capacity, but bear in mind that some data is being duplicated across some of our nodes due to our chosen partitioning scheme. Cost of implementation is a hard one to answer, since the nodes are being run as virtual machines on host servers which also host other virtual machines, but the load is pretty negligible compared to any other VMs we run (I even run a Cassandra database on my development laptop and it's still quite fast). Our load average rarely goes above 0.2 (not sure if this is a concept Windows admins would understand though, it's a Unix thing) and CPU usage generally averages out to around 5% or lower, which is pretty impressive, given that Cassandra is written in Java.
>>>
>>> We do not currently make use of Cassandra for anything more than a key-value store. Currently we use our Cassandra setup to store and serve everything from images to videos to documents and, for that, it works great and is really, really fast.  We haven't yet delved into the realm of map reduce and I'm taking more of a level-headed approach to it. NoSQL does not replace our need for an RDBMS, nor would I ever advocate that NoSQL is a natural alternative to a traditional database, any more than a train is an alternative to a car. It depends entirely on your product and what problems you are trying to solve. There are instances where your data could more effectively exist on an RDBMS, but if your ultimate requirement is something very scalable, it's possible to alter your data model and rethink your approach to your application so that you can use NoSQL exclusively. Such a migration, however, does come at a significant cost in terms of technical capital, so it's not always practical unless you are doing it from scratch, and you obviously need to know the pros and cons, but that goes without saying.
>>>
>>> However, it's worth just mentioning that NoSQL is more a solution to problems of scaling and having lots of data, where RDBMSs fundamentally have issues, so it should be no surprise that it is popular for highly clustered environments (i.e. supercomputers) and web companies and not necessarily as much in enterprise solutions. I often encounter a certain hostility towards NoSQL when it is mentioned in the company of DB admins and I can understand to a degree why, since it challenges a certain orthodoxy in database design that goes back decades. But I tend to find that they are looking at it from a completely different perspective and don't quite understand what problem NoSQL databases are trying to solve, or that NoSQL isn't trying to invalidate traditional databases (I really hate it when articles try to portray it that way). NoSQL is what it is: a database that attempts to trim all the unnecessary layers of complexity that RDBMSs have and impose on your application design and/or system architecture, so that in the end it can be scalable, flexible, very fast and simple to use. If this is not what you need, then it is not what you want, but, for many web companies out there, it is.
>>>
>>> Which goes back to what I've said previously. A lot of companies/startups are looking at NoSQL precisely because they want something which will scale easily, should they suddenly become massively successful, see their site traffic increase dramatically and hit some limit of the database and hardware. Throwing better hardware at the problem is only going to get you so far and, even if it could get you further, a live migration of data in an RDBMS is never easy. With Cassandra, for example, you can add and remove nodes on the fly with a few CLI commands. I've never seen anything that simple in a traditional database, but please correct me if I am wrong.
>>>
>>> What makes me curious is how traditional database vendors are going to marry NoSQL concepts into their existing products. Microsoft looks to be sensible (what with adopting Hadoop now) and I have no doubt they will succeed. Oracle, on the other hand...
>>>
>>> Anyways, I hope that answers some questions.
>>>
>>>
>>> - Hans
>> _______________________________________________
>> dba-SQLServer mailing list
>> dba-SQLServer at databaseadvisors.com
>> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>> http://www.databaseadvisors.com
>>
>
>
>
>


