I have been in the past few weeks looking more deeply into my distributed software project, writing quite a bit of code for the logic of the software itself. I was trying to satisfy my first requirement : keeping track of events causality (with a simple vector clock for a start). However when I came about to write the interface with other instances, I found myself like stuck in mud, trying to write something I didnt plan for… I have been thinking in the abstract world, with bubbles and arrows, but when you come down to the wire, you have to send bits through the pipe… So just while I was struggling to find a good solution to send a message efficiently to multiple peers ( multicast, multi unicast, etc…) , I realized that I could use a Message Oriented Middleware…
I went to search into these type of technology, to find mostly AMQP. Other messaging middleware exists, starting with CORBA, some more open than others. One worth noting though is ZeroMQ, mostly because, being brokerless, it s very different from AMQP.
RabbitMQ seems to be also a very interesting implementation of AMQP, especially because of its use of erlang to achieve high availability and fault tolerance.
While many people seems to think that the slowness of the early HLA RTI implementation could be blamed on CORBA, it certainly provided a very handy fundation to build the future IEEE 1516 standard on top of it. Therefore I think that, following the path of reusing the wheel that has already been done, I should use a MOM.
I will give it a go, and rethink my previous layered architecture to :
– RabbitMQ or ZeroMQ ( which greatly reduce the scope of my work, yey ! )
– Event Causality Tracking ( my main focus right now )
– Distributed Data Interpolation ( Basic dead reckoning algorithm )
– Game Engine Layer (Time keeping, space maintenance, etc.)
Comparing ZeroMQ and RabbitMQ, I think I ll start with RabbitMQ, because it provides already good reliability. ZeroMQ is more flexible, but it might be also a bit trickier to grasp at first. And well, I started prototyping my code in erlang, so I might as well go erlang all the way there. More details in a future post, when I get some time to write some more code.
After looking more deeply into what does a DHT like Kademlia does or not, I started to write some useful code in Erlang…
First it seems ( quite intuitive I must say ) that a DHT doesn’t say anything about connections… That is who am I connecting to at first, how to choose my endpoint, etc. Everything that is not enforced by the DHT mechanism is an opportunity to tune the system towards more special needs and features other than the DHT possibility set. Therefore I am making the intuitive assumption that a good way would be to connect to a very close node. BATMAN has an interesting and quite simple way of handling connection, so I decided to follow the example 😉
I just finished a UDP broadcaster in erlang, pretty simple, that basically advertise to his online neighbors itself came online, and register its neighbors’ replies.
Then comparing This and That I got a bit confused about some stuff…
For example : nodeID seems to be up to the user, provided a few conditions… is it really ?
But anyway it is sure that the nodeID is going to be a long binary. So I decided to start implementing a key scheme for what I had in mind for the dynamic vector clocks algorithms, that is a key based on the order of connection of the different node.
However, I was quite disappointed to *not find* any simple way to deal with bitstring operations in erlang… Binary operations work on integer only, and that would be 8bits for convenience with bitstring apparently… so I started writing my own module for that, with bs_and, bs_or, bs_not, and so on, that work on a bitstring of indefinite size. It s pretty basic and not optimized at all, but it works. Not sure if there is some interest out there for it, but let me know if there is, I can always put it on github somewhere 😉
Other than that, I keep working on my little portable SDL-based Game Engine, which now has OpenGL enabled by default if available on your machine.
It s working pretty nicely for simple bitmap display. Now the refresh is optimized ( I should display fps up there 🙂 ), and the user doesnt have to manage the list of rectangles to know what has changed on the screen or not ( which wasnt supported before in SDL render mode). Also openGL can be disabled if not needed, nothing changes on the interface for the user 🙂 pretty handy 😀 That was some work, but now the most troubling point is the “Font” part, that doesnt behave exaclyt as you would expect… more work to be done there.
I recently looked around for the different distribution protocols that I mentioned in my previous post.
Apart from that, after reading a lot of research papers on distribution. I have selected a few that seem to be related to my interests.
One of the problems, in the way I see it, is that when you have one CPU, you can easily say that instruction happened “after” that one. And that becomes natural when you write a line of code after another. Now when you have multiple CPU running multiple processes in different places, with delays in transmissions, it become crucial to be able to determine if an instruction actually happened “before” or “after” ( or “at the same time” ) another instruction.
The most widely used algorithm to do that might be what is called Vector Clocks.
One problem though, is that vector clocks aren’t really dynamic, as you must know from the start how many concurrent processes are running, and you might be quite limited on the number of concurrent process you can have, as it will, with a naive implementation, change the size of your clock.
Second issue is that you need, most of the time, if you care about the size of the data you are sending over the wire, to have a central point to establish and increment that clock. Otherwise you need either :
– to duplicate your clock to actually end up with a Matrix Clock that can be in some context prohibitive because of its size…
– to send all your messages to all processes, so that the clock doesn’t become inconsistent between 2 nodes, and make sure they actually arrive…
This is not a simple issue to work around if you want to use your distribution redundancy as a safety net, for a process to replace another one if something fails for example, or for a “tunnel” to be setup when a connection in your net falls down.
All this is a little annoying as if you want to optimise your transfer over the network, while using the benefits of the distribution to write safe applications, without any single failure point.
An interesting alternative to vector clock has been proposed : Interval Tree Clocks
It deals with the dynamic problem, and I planned first to try to implement something similar to organize data over my distributed network.
I am not sure I understand all of it already, but from what I can catch so far, I think the size of data you need to send to the different nodes might change with a naive implementation at least… I am not so sure of how to handle that especially if I want to use UDP and not TCP, as I do not really need the total ordering TCP provides, but I will likely enjoy the UDP performance…
The idea to have an ProcessID that is used when you increment the clock seems to be the way to go to handle dynamic creation / failures / termination of processes ; and the way the number of concurrent processes is “compressed” is pretty effective and interesting. However I am not sure if it will actually work in an architecture without “central point” or total broadcast, or some node might wait forever for a message who is not coming…
I now think that I need something closer to the network, at a lower level. Something that can deal with network intricacies (latency, losses … ) if needed. Or maybe I will need to subdivide my feature set, and implement them in separate components… we’ll see.
When you have a distributed, network hungry, application, the best way to reduce messages transmitted from one place to the other is to actually filter the messages you are going to send. Therefore there is in many protocols a “publish/subscribe” system to actually determine what needs to be send to who. And that “selective messaging” while improving network usage, doesnt play nice with ordering algorithms…
What I want to try to get, is a completely distributed architecture, without the need for a central “reference” point at any moment, and with possible “selective sending”. While keeping the partial ordering in a dynamic context that is provided by ITC. I think this might raise a lot of interests in people towards distributed architecture, because of the safety you might gain ( no central failure point ) and the optimisation on the network that appears when you have many processes communicating the same information around…
A lot of other improvements have already been made around vector clocks, more or less publicly, with more or less success, and what I need to likely to be a “mix of those”. Yet I ll have to prove somehow that it s going to work, and that it will not impact the network in a bad way…
The search is far from over…
It s been quite a bit since I last blogged here, and I ve been busy looking for a job, fixing my little game engine, and writing prototypes erlang for the network side of it. Since I plan to implement quickly some type of distribution, it is useful to have a look at existing distributed applications and protocols, and how they are used and loved ( or not ) by people out there…
But I am talking about total, complete, distribution, meaning peer-to-peer distribution. No central servers, ie. no central point of failure, or slow-downs.
So first lets make a quick list of things, that might be worth looking into :
– bittorent, which has been around for quite a while. ( the tracker can be the server, however it is just required temporarily… )
– BOINC, which is right now probably the most used distributed platform ( do they need servers ? )
– erlang_OTP, which apparently can be used to write and run completely distributed applications.
– Git and Bazaar, Distributed CVS : only data is distributed, not the actual computation effort. However the distribution is complete here. Server or not, everything still works…
– still need to have a look at Mozart…
Mmmm after a quick look, completely distributed applications arent really wide spread… After all the problem is still an open one and there are still lots of research being done.
Famous applications, that may look distributed, such as Twitter for example, actually arent using anything more than the usual client-server architecture, and therefore require a server… However some might be distributable among different servers, without bottleneck effect : Can ejabberd be completely distributed for example ?
Any other ideas ?
In the following posts I ll have a look at how these applications works to help me write prototypes of distributed, server-less, code. My first application is likely to be a chat-like, probably similar to twitter, as it can use XMPP, and has pretty simple commands. Let’s see what I can come up with 😉