[dba-Tech] VoltDB and its treatment of RAM on multiple servers

Jim Lawrence accessd at shaw.ca
Wed Feb 21 19:55:50 CST 2018


Hi Arthur:

My understanding, and I could be very wrong, is that any distributed process or data store pays a penalty for being distributed. Performance is gained only if the volumes holding the data and/or the CPU/RAM each have their own client that can independently service the driver requests. This is why small requests are much slower in a distributed system: the coordination overhead dominates. A request requiring access to huge amounts of data and extensive CPU and RAM use can be executed and retrieved about as quickly as a small one, because the work is spread out while the overhead stays roughly fixed.

The whole operation of a distributed system depends on message queuing and on managing the resultant retrieval.
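To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It assumes a fixed coordination overhead per distributed request and work that splits evenly across nodes; the function name and all the figures (10 ms overhead, 8 nodes, the two job sizes) are made up for illustration and are not measurements of VoltDB or any real cluster.

```python
def request_time(work_s, nodes, overhead_s):
    """Estimated wall time in seconds for one request.

    Assumption: a single box pays no coordination cost, while a cluster
    pays a fixed overhead and then splits the work evenly across nodes.
    """
    if nodes == 1:
        return work_s
    return overhead_s + work_s / nodes


OVERHEAD_S = 0.010   # hypothetical message-queuing and retrieval cost
NODES = 8            # hypothetical cluster size

for label, work_s in (("small job", 0.002), ("large job", 60.0)):
    local = request_time(work_s, 1, OVERHEAD_S)
    spread = request_time(work_s, NODES, OVERHEAD_S)
    print(f"{label}: single box {local:.4f}s, {NODES} nodes {spread:.4f}s")
```

Under these assumptions the small job is slower on the cluster than on one box, and the large job is faster, which is the trade-off described above.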

The Cloud is just a distributed system. For small jobs a fully loaded box will always be faster...the more RAM, cores and data space, the better. Small jobs sent to the cloud are just run on a single computer, just like at your home, except there is a connection hit. Of course, Cloud computers have the advantage that they are probably top-of-the-line servers and they have huge bandwidth.

I have checked out VoltDB and listened to a couple of their podcasts but have not discovered that the DB uses clients, on remote machines, to optimize its performance, so I suspect that VoltDB just uses the default local network structure...nothing hyper fancy. Some languages, like Go, are used to develop applications that can parse and distribute complex requests to multiple machines across diverse systems...so Go is one of the new Cloud-optimized languages capable of doing this.

Many years ago, I had a program that ran on an Access MDB. The application produced reports from a possible four databases: one MS SQL, two Oracle and one MDB. When accessing all this data, I had to be continually cautious of the potential data volume or it would result in unceremonious MS Access crashes. Each ADO-OLE request had to be very specific or the resulting dump would crash everything. There are also a couple of features in ADO that can help: the chunk option, great for managing blobs/large picture files, and the stream option, great for managing a potential data flood. I made the system send out a cluster of requests and had it wait until all the remote requests had been answered. If the total amount of data retrieved ever exceeded a threshold I would delete the recordsets to try to prevent a crash...it didn't always work, though, and that is why MS Access couldn't seriously compete in data reporting.
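For anyone curious, the "send out a cluster of requests, wait for all of them, then check the total" pattern looks roughly like this in Python. This is only a sketch of the idea, not the original ADO code: `gather_reports`, the source names and the row budget are all hypothetical, and the lambdas stand in for real database queries.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_reports(sources, max_rows):
    """Run every fetch concurrently, wait for all, then enforce a row budget.

    sources: dict mapping a source name to a zero-argument callable that
             returns a list of rows (stand-ins for real remote queries).
    Returns the dict of results, or None if the combined row count
    exceeds max_rows (the results are discarded in that case).
    """
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in sources.items()}
        # result() blocks, so this waits until every request has answered
        results = {name: fut.result() for name, fut in futures.items()}

    total = sum(len(rows) for rows in results.values())
    if total > max_rows:
        results.clear()  # drop the data rather than risk a crash
        return None
    return results


# Hypothetical stand-ins for the four databases
sources = {
    "mssql": lambda: [("a", 1), ("b", 2)],
    "oracle_1": lambda: [("c", 3)],
    "oracle_2": lambda: [("d", 4), ("e", 5), ("f", 6)],
    "mdb": lambda: [("g", 7)],
}

print(gather_reports(sources, max_rows=100))
print(gather_reports(sources, max_rows=5))
```

Note that the check only happens after everything has arrived, which is the weakness mentioned above: by then the data is already in memory.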

Any further revelations on how VoltDB works would be appreciated.

Jim

----- Original Message -----
From: "Arthur Fuller" gmail.com>
To: "Discussion of Hardware and Software issues" databaseadvisors.com>
Sent: Tuesday, February 20, 2018 4:26:45 AM
Subject: Re: [dba-Tech] VoltDB and its treatment of RAM on multiple servers

Jim,

I may be way off base here, but I think that an implementation in the cloud
might defeat the whole purpose of the VoltDB design, which is performance.
From what I gather, it's designed to handle massive amounts of input, such
as a Bloomberg feed or a worldwide gaming site or tracking the votes in an
election, polling site by polling site. My thought (and I could be wrong
about this, since I know little about cloud technologies) is that there
would be a performance hit due to talking to the cloud. SQL Server is
designed for much smaller loads than VoltDB. The latter can easily handle
50k TPS, and upwards.

The examples given in the VoltDB documentation imply a level of hardware
vastly beyond my feeble resources, such as a dozen servers in a cluster,
each housing 64 GB of RAM.

On the other hand, there is a free community edition which could be run on
a single server. There, the point would be to load the entire database
(smallish, say 50 GB) into RAM, with a few hundred simultaneous users.
Periodically and automatically, the RAM database updates the hard disk(s).
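As a rough illustration of the "RAM database periodically written to disk" idea, here is a small Python sketch. It is not how VoltDB implements its persistence; the `snapshot` and `restore` names, the JSON format and the file name are assumptions chosen only to show the pattern of writing a temporary file and swapping it in atomically.

```python
import json
import os
import tempfile


def snapshot(db, path):
    """Write the in-memory dict to disk atomically (temp file, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(db, f)
        f.flush()
        os.fsync(f.fileno())       # make sure the bytes reach the disk
    os.replace(tmp_path, path)     # readers see the old or new file, never half


def restore(path):
    """Load the last completed snapshot back into memory."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    snap_path = os.path.join(workdir, "snapshot.json")  # hypothetical name
    db = {"votes:site-17": 1200, "votes:site-18": 950}
    snapshot(db, snap_path)
    print(restore(snap_path) == db)
```

In a real system something like a timer or background thread would call `snapshot` at intervals; that scheduling is left out here to keep the sketch short.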

I certainly don't pretend to understand a lot of the VoltDB implementation.
Many of its concepts I've never encountered before. Lots of the concepts
assume a collection of servers, with partitioned tables and even
partitioned indexes, and smallish tables  (i.e. lookups) duplicated on each
server, and a lot more that is beyond my meager experience.
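My rough mental model of partitioned tables plus duplicated lookups, sketched in Python. This is a toy, not VoltDB's API or its actual hashing scheme: the `Node` and `Cluster` classes, the use of CRC32 and the sample data are all assumptions for illustration. The point is only that each server holds its own slice of the big table in its own RAM, and a key's hash decides which server to ask.

```python
import zlib


class Node:
    """One server: holds a slice of the big table and a full lookup copy."""

    def __init__(self, name):
        self.name = name
        self.rows = {}       # this node's partition of the large table
        self.lookups = {}    # small lookup table, duplicated on every node


class Cluster:
    def __init__(self, node_names, lookups):
        self.nodes = [Node(n) for n in node_names]
        for node in self.nodes:
            node.lookups = dict(lookups)   # replicate to each node

    def owner(self, key):
        # CRC32 is stable between runs, unlike Python's built-in hash() on str
        index = zlib.crc32(str(key).encode()) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, row):
        self.owner(key).rows[key] = row

    def get(self, key):
        return self.owner(key).rows.get(key)


cluster = Cluster(["box1", "box2", "box3"], {"CA": "Canada", "US": "United States"})
for order_id in range(1000):
    cluster.put(order_id, {"order_id": order_id, "country": "CA"})

for node in cluster.nodes:
    print(node.name, "holds", len(node.rows), "rows and",
          len(node.lookups), "lookup entries")
print(cluster.get(42))
```

Seen this way, no single machine ever addresses the combined RAM; the "one big unit" is a routing convention layered on top of separate boxes.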

Thanks for the links. As time permits, I will follow up, and further my
education.

Arthur

On Tue, Feb 20, 2018 at 1:29 AM, Jim Lawrence <accessd at shaw.ca> wrote:

> Arthur:
>
> PS I could suggest that you check out the Framework Apache Flink:
>
> https://en.wikipedia.org/wiki/Apache_Flink
>
> It is basically a structure that allows processing (CPU core and RAM use)
> across a network or cluster...ideal for Cloud computing.
>
> Jim
>
> ----- Original Message -----
> From: "accessd" <accessd at shaw.ca>
> To: "Discussion of Hardware and Software issues" <
> dba-tech at databaseadvisors.com>
> Sent: Monday, February 19, 2018 9:37:20 PM
> Subject: Re: [dba-Tech] VoltDB and its treatment of RAM on multiple servers
>
> You have to pick one of the most complex subjects on computers these
> days. There are a few applications that are free and open source for your
> Linux computer. One is MPI...a very old application:
>
> To do this the program(s) accessing the CPU/RAM resources must be
> specifically designed to access said resources. A system set up in this
> manner is called a cluster, and the typical way resources are shared is
> with a protocol called MPI (message passing interface). It is a free
> download and using it with Linux can yield a powerful cluster (possibly
> even a super computer) for minimal cost, but again it is useless unless you
> have programs that were specifically designed to take advantage of MPI.
> There are some good cluster tutorials out there, if you are still
> interested you should check one out.
>
> http://www.mcs.anl.gov/research/projects/mpi/
> ...and...
> http://www.mpich.org/
> ...and...
> http://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1-installguide.pdf
> http://bit.ly/2FaMpA4
>
> Then there is PVM3 or Parallel Virtual Machine, version 3. Here is where
> to download the package:
>
> http://www.tucows.com/preview/39014/PVM
>
> Then there are applications under full development, like this one (very
> interesting):
>
> https://ramcloud.atlassian.net/wiki/spaces/RAM/overview
>
> I have tried some of these products that allow the file system to extend
> to all RAM and/or hard drive space but have consistently run into issues
> that I could not find a solution for. It all boiled down to older and
> limited computers being added to a cluster. It became an issue of huge
> backups and restoring which took days... OTOH, I learned how to effectively
> do backups properly and now have twice as much hardware dedicated to
> backing up the network as in the network. Of course if someone wanted to
> give me $50K and let me have a free month... ;-)
>
> If you wanted to run an application that requires a substantial amount of
> RAM, it would be better to use a Cloud based source that has all that
> functionality built right in. I would recommend that you use the services
> of DigitalOcean as there is no fixed amount of resources...you pay for what
> you use and for the amount of time you use it. I know of some companies
> doing testing of their application on DO and only do testing for an hour or
> two. Example: 192GB RAM, 32CPUs and 12TB of SSD (plus high-speed bandwidth)
> is $1.43 an hour...then there are optimized Droplets, compute-optimized
> virtual machines with dedicated hyper-threads from best in class Intel CPUs
> for CPU-intensive applications like CI/CD, video encoding, machine
> learning, ad serving, batch processing and active front-end web
> servers...less RAM and HD space but only $0.95 per hour. I know of no other
> Cloud provider that offers that level of gradation.
>
> If VoltDB was written in one of the languages optimized for distributed
> work then it could use multiple cores across a network:
>
> https://en.wikipedia.org/wiki/List_of_concurrent_and_parallel_programming_languages
> http://bit.ly/2EDazSK
>
> The old favourites like Erlang and the modern Go are two such programming
> languages. I do think that Gustav and Shamil might have some more insights
> into this environment as well.
>
> * I do have another possible solution but will send you the information
> off line.
>
> Jim
>
>
> ----- Original Message -----
> From: "Arthur Fuller" gmail.com>
> To: "Discussion of Hardware and Software issues" databaseadvisors.com>
> Sent: Saturday, February 17, 2018 8:48:37 AM
> Subject: [dba-Tech] VoltDB and its treatment of RAM on multiple servers
>
> VoltDB, the (relatively) new database from Michael Stonebraker, intended
> for in-RAM databases needing to perform 50K+ transactions per second, has
> some way of treating a bunch of servers as a logical unit. IOW, 8 servers
> each equipped with 64GB of RAM can be regarded as a single 512GB unit of
> RAM. I understand how this approach can work with hard disks, but I have no
> clue how to make it work with RAM. Do you?
>
> The documentation and even packaging of VoltDB suggests that its ideal
> hardware environment is Linux (since the Windows version is installed and
> run using Docker). I don't know enough about Linux to know whether this
> "union" of servers' RAM is built-in or an innovation provided within
> VoltDB. (Speak up, Jim and other Linux gurus!)
>
> Whatever the answer, I can't see this working with the typical mix of
> Windows boxes, nor can I see most of us, other than the thick wallets,
> being able to cobble this together, especially since those of us with
> multiple boxes typically run different versions on different boxes, so we
> can support clients running various versions of Windows, Access, SQL,
> Office, etc. Add to that the problem that on Windows VoltDB runs only in
> Docker. So I think this is a strictly Linux solution.
>
> Assuming that I'm correct (Linux-only), I'm still faced with the problem of
> how to implement it. Assume modest hardware, say three boxes of varying
> vintage each running the same version of Linux, and all connected. Further
> assume that as yet, VoltDB is not installed. Is it possible to treat their
> collective RAM as a single logical unit of RAM? If so, how so?
>
> --
> Arthur
> _______________________________________________
> dba-Tech mailing list
> dba-Tech at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-tech
> Website: http://www.databaseadvisors.com
>



-- 
Arthur


