Erlang Bitstring Operators

2010/03/12 2 comments

After looking more deeply into what does a DHT like Kademlia does or not, I started to write some useful code in Erlang…

First it seems ( quite intuitive I must say ) that a DHT doesn’t say anything about connections… That is who am I connecting to at first, how to choose my endpoint, etc. Everything that is not enforced by the DHT mechanism is an opportunity to tune the system towards more special needs and features other than the DHT possibility set. Therefore I am making the intuitive assumption that a good way would be to connect to a very close node. BATMAN has an interesting and quite simple way of handling connection, so I decided to follow the example 😉

I just finished a UDP broadcaster in erlang, pretty simple, that basically advertise to his online neighbors itself came online, and register its neighbors’ replies.

Then comparing This and That I got a bit confused about some stuff…
For example : nodeID seems to be up to the user, provided a few conditions… is it really ?
But anyway it is sure that the nodeID is going to be a long binary. So I decided to start implementing a key scheme for what I had in mind for the dynamic vector clocks algorithms, that is a key based on the order of connection of the different node.

However, I was quite disappointed to *not find* any simple way to deal with bitstring operations in erlang… Binary operations work on integer only, and that would be 8bits for convenience with bitstring apparently… so I started writing my own module for that, with bs_and, bs_or, bs_not, and so on, that work on a bitstring of indefinite size. It s pretty basic and not optimized at all, but it works. Not sure if there is some interest out there for it, but let me know if there is, I can always put it on github somewhere 😉

Erlang Bitstring Operator test Screenshot

Erlang Bitstring Operator test Screenshot

Other than that, I keep working on my little portable SDL-based Game Engine, which now has OpenGL enabled by default if available on your machine.

SDLut_Refresh Test Screenshot

SDLut_Refresh Test Screenshot

It s working pretty nicely for simple bitmap display. Now the refresh is optimized ( I should display fps up there 🙂 ), and the user doesnt have to manage the list of rectangles to know what has changed on the screen or not ( which wasnt supported before in SDL render mode). Also openGL can be disabled if not needed, nothing changes on the interface for the user 🙂 pretty handy 😀 That was some work, but now the most troubling point is the “Font” part, that doesnt behave exaclyt as you would expect… more work to be done there.

Categories: Distribution, erlang Tags: ,

Towards a beginning of a design ?

2010/03/10 1 comment

I have been thinking for a very long time over this, gathering research papers, and browsing Internet to look for a possible way to implement what I was thinking of… If I wanted to name it, it would be something like : A Decentralized Distributed MMO(or not) Game(or more serious) Engine.

The idea seems simple, and quite intuitive, however one needs to be aware of the Fallacies of Distributed Computing

Here is what we have, from an very abstract and high point of view : Computers, and links between them.
Here is what we want to do with it : Store Data, Send Messages, Process Instructions
And we can see the problems we will have to face : data might not be available, data might be corrupt, message can disappear, messages can be in the wrong order, links can disappear, Network Topology can change, etc.

Here is what I want :
– No central point
– distributed topology, with node joining or leaving anytime
– resilient even if one node or link fails, or get out of the system at an unexpected moment. No data get lost, no connection gets broken.
– good performance ( real-time like would be great )

The trade-off between performance and resilience is pretty difficult to manage. Trying to build one on top of the other, which one would you start with first ?

Although many systems try to solve or alleviate one of these problems, none of them as far as I am aware, can deal with all of them while maintaining a decent performance. I thought after having a look at a few research proposal that one solution for one problem would be really interesting to implement, however, after trying it, I realized how important it was for some foundation to be laid down first. I made some small development in erlang, and quickly wondered how I could structure my software, given all the components that I would need to satisfy all the features I thought of… I wrote some of them, while others would have required much more expertise than my own to work. So I need to heavily reuse what has already been done to make my task a bit easier if I ever want to achieve my goal.

After all there is the “Researcher way”, who is an expert in his field, and can have enough funding to spend a lot of time developing one system, until it becomes as good as it can, in theory. And there is the “Entrepreneur way”, who has to make something working quickly, no matter how dirty and partially done it can be, with everything he can find, provided that people are interested and will sustain him to improve the system along the way…
Even if I am still tempted by the first way, I am no longer a student, nor seem to be able to secure any funding at the moment, and I therefore have to take the second path.

So I should :
– reuse what is already working elsewhere: DHT – p2p data sharing
– make something interesting out of the system ??? we ll see, depending on what it can do… probably trying to use it with my little open-source game engine
– plan for re-usability : structure the project, documents the different parts separately
– plan for improvements later on : divide to conquer, and specify interfaces between blocks

That why I decided on a basic layer architecture for a start :
– Implicit connection to the p2p network.
– DHT to keep “IP – nodeID” pairs distributively mostly, and other “global state data”…
– Routing Algorithm
– SCRIBE-like – manage groups and multicast
– Message Transport protocol ( overlay UDT ? or direct SCTPDCCP ? depending needs and performances… )
– Causality algorithms (Interval Tree Clock – like), which might need multicast for optimization, when there is no central system, depending on the type of implementation probably…
– Game Engine Layer, able to send state updates efficiently, with proper ordering, to a set of selected peers.

The choice to based the design on a DHT is, I think, the best for me. Despite my interest in AdHoc Networking protocols, and how much I would like to implement them on top of IP to get an increased fault tolerant network, I am not a Networks’ Algorithms Expert, and it would take me far too long to get to something decent working. Also DHT have now be quite extensively studied, and some implementation exists and are very usable, which enables me to reuse them, so I can focus on something else. Some improvements are likely to emerge in the years to come, and by using something already known, it will be easier to integrate evolutions.
Depending on which implementation I choose, I will have to check which features are available out of the box, and which one I will need to implement on top of it, to reach the feature set I want. Kademlia seems to be the more mature DHT algorithm from what I could gather around internet, but I will need to look at it more deeply. SCRIBE was implemented on top of the Pastry algorithm and I would need to reimplement it on top of Kademlia, as I didnt find any similar attempt…

Not an easy task but definitely an interesting research process 😉 Let s hope there will be something worth it at the end of the road.

Categories: Distribution Tags:

Peer-to-peer distributed, existing systems

Looking at GNU Social which is likely to be centralized, sadly, I found a list of other projects, much more distributed, that raised interests, and I should have a deeper look at them soon… Most of them concern file-sharing, but not only…

The Circle is a peer-to-peer distributed file system written mainly in Python. It is based on the Chord distributed hash table (DHT).
> But too bad : Development on the Circle has ceased in 2004. However the source is still available 😉

CSpace provides a platform for secure, decentralized, user-to-user communication over the internet. The driving idea behind the CSpace platform is to provide a connect(user,service) primitive, similar to the sockets API connect(ip,port). Applications built on top of CSpace can simply invoke connect(user,service) to establish a connection.
> That is pretty similar to what I want to achieve with my current developments, but if the “user view” will be similar, the intricacies will be quite different…

Tahoe-LAFS is a secure, decentralized, data store. All of the source code is available under a choice of two Free Software, Open Source licences. This filesystem is encrypted and spread over multiple peers in such a way that it remains available even when some of the peers are unavailable, malfunctioning, or malicious.
> Yeah so thats done. At least there is something I will not try to do 🙂 Still need to test it though…

GNUnet is a framework for secure peer-to-peer networking that does not use any centralized or otherwise trusted services. A first service implemented on top of the networking layer allows anonymous censorship-resistant file-sharing. Anonymity is provided by making messages originating from a peer indistinguishable from messages that the peer is routing. All peers act as routers and use link-encrypted connections with stable bandwidth utilization to communicate with each other. GNUnet uses a simple, excess-based economic model to allocate resources. Peers in GNUnet monitor each others behavior with respect to resource usage; peers that contribute to the network are rewarded with better service.
> Too bad they focus only on file sharing…

The ANGEL APPLICATION (a subproject of MISSION ETERNITY) aims to minimize, and ideally eliminate, the administrative and material costs of backing up. It does so by providing a peer-to-peer/social storage infrastructure where people collaborate to back up each other’s data. Its goals are (in order of descending relevance to this project)
> File sharing…

You can call Netsukuku a “scalable ad-hoc network architecture for cheap self-configuring Internets”. Scalable ad-hoc network architectures give the possibility to build and sustain a network as large as the Internet without any manual intervention. Netsukuku adopts a modified distance vector routing mechanism that is well integrated in different layers of its hierarchical network topology.
> Ad-Hoc alternative network 🙂 interesting… I still want to use internet though…

Syndie is an open source system for operating distributed forums (Why would you use Syndie?), offering a secure and consistent interface to various anonymous and non-anonymous content networks.
> Only forums… mmm…

I also found a blog that seems interesting although I am pretty sure it is mostly Amazon centric :
Might be worth to have a deeper look at to see what the big companies are coming up with…

Categories: Distribution Tags:

Distribution Protocols Specifications

I recently looked around for the different distribution protocols that I mentioned in my previous post.

Here is what I found :
– The Bittorrent protocol specification,
– The Erlang’s Distribution protocol,
– The BOINC System Architecture,

And I also found out some interesting initiatives, that seems to be pretty inactive at the moment, sadly : Gittorrent and its protocol : GTP
An interesting mix 🙂

Apart from that, after reading a lot of research papers on distribution. I have selected a few that seem to be related to my interests.

One of the problems, in the way I see it, is that when you have one CPU, you can easily say that instruction happened “after” that one. And that becomes natural when you write a line of code after another. Now when you have multiple CPU running multiple processes in different places, with delays in transmissions, it become crucial to be able to determine if an instruction actually happened “before” or “after” ( or “at the same time” ) another instruction.

The most widely used algorithm to do that might be what is called Vector Clocks.
One problem though, is that vector clocks aren’t really dynamic, as you must know from the start how many concurrent processes are running, and you might be quite limited on the number of concurrent process you can have, as it will, with a naive implementation, change the size of your clock.
Second issue is that you need, most of the time, if you care about the size of the data you are sending over the wire, to have a central point to establish and increment that clock. Otherwise you need either :
– to duplicate your clock to actually end up with a Matrix Clock that can be in some context prohibitive because of its size…
– to send all your messages to all processes, so that the clock doesn’t become inconsistent between 2 nodes, and make sure they actually arrive…
This is not a simple issue to work around if you want to use your distribution redundancy as a safety net, for a process to replace another one if something fails for example, or for a “tunnel”  to be setup when a connection in your net falls down.

All this is a little annoying as if you want to optimise your transfer over the network, while using the benefits of the distribution to write safe applications, without any single failure point.

An interesting alternative to vector clock has been proposed : Interval Tree Clocks
It deals with the dynamic problem, and I planned first to try to implement something similar to organize data over my distributed network.

I am not sure I understand all of it already, but from what I can catch so far, I think the size of data you need to send to the different nodes might change with a naive implementation at least… I am not so sure of how to handle that especially if I want to use UDP and not TCP, as I do not really need the total ordering TCP provides, but I will likely enjoy the UDP performance…
The idea to have an ProcessID that is used when you increment the clock seems to be the way to go to handle dynamic creation / failures / termination of processes ; and the way the number of concurrent processes is “compressed” is pretty effective and interesting. However I am not sure if it will actually work in an architecture without “central point” or total broadcast, or some node might wait forever for a message who is not coming…
I now think that I need something closer to the network, at a lower level. Something that can deal with network intricacies (latency, losses … ) if needed. Or maybe I will need to subdivide my feature set, and implement them in separate components… we’ll see.

When you have a distributed, network hungry, application, the best way to reduce messages transmitted from one place to the other is to actually filter the messages you are going to send. Therefore there is in many protocols a “publish/subscribe” system to actually determine what needs to be send to who. And that “selective messaging” while improving network usage, doesnt play nice with ordering algorithms…

What I want to try to get, is a completely distributed architecture, without the need for a central “reference” point at any moment, and with possible “selective sending”. While keeping the partial ordering in a dynamic context that is provided by ITC. I think this might raise a lot of interests in people towards distributed architecture, because of the safety you might gain ( no central failure point ) and the optimisation on the network that appears when you have many processes communicating the same information around…

A lot of other improvements have already been made around vector clocks, more or less publicly, with more or less success, and what I need to likely to be a “mix of those”. Yet I ll have to prove somehow that it s going to work, and that it will not impact the network in a bad way…
The search is far from over…

Distributed Computing, what is it used for ?

It s been quite a bit since I last blogged here, and I ve been busy looking for a job, fixing my little game engine, and writing prototypes erlang for the network side of it. Since I plan to implement quickly some type of distribution, it is useful to have a look at existing distributed applications and protocols, and how they are used and loved ( or not ) by people out there…

But I am talking about total, complete, distribution, meaning peer-to-peer distribution. No central servers, ie. no central point of failure, or slow-downs.

So first lets make a quick list of things, that might be worth looking into :
bittorent, which has been around for quite a while. ( the tracker can be the server, however it is just required temporarily… )
BOINC, which is right now probably the most used distributed platform ( do they need servers ? )
erlang_OTP, which apparently can be used to write and run completely distributed applications.
Git and Bazaar, Distributed CVS : only data is distributed, not the actual computation effort. However the distribution is complete here. Server or not, everything still works…
– still need to have a look at Mozart

Mmmm after a quick look, completely distributed applications arent really wide spread… After all the problem is still an open one and there are still lots of research being done.

Famous applications, that may look distributed, such as Twitter for example, actually arent using anything more than the usual client-server architecture, and therefore require a server… However some might be distributable among different servers, without bottleneck effect : Can ejabberd be completely distributed for example ?
Any other ideas ?

In the following posts I ll have a look at how these applications works to help me write prototypes of distributed, server-less, code. My first application is likely to be a chat-like, probably similar to twitter, as it can use XMPP, and has pretty simple commands. Let’s see what I can come up with 😉

Distributed Programming Language -> What is it exactly ?

There are already quite a few distributed programming languages out there. The most popular being probably Erlang, although it s not a “network transparent” or “distribution transparent” language. On the other hand Mozart ( which I didnt try yet, but probably will very soon ), seems to have some mechanics for transparent distribution, but with the cost of lot of semantics to be learned. And, well to be honest I never heard about it so far, although I am sure there must be clever systems in it… Why is that ?

The difficulty in finding a proper distributed language for your application comes from the fact, that you can write distributed application as soon as you have any “socket related” feature in your language. However the time you need to write such an application in a language that hasnt been designed for it, is humongous. In addition, in the real world, the necessity or distributed languages is arguable right now, because companies dont see much applications for it just yet. Without the wheel nobody thought about cars I bet 😉

The main question I want to discuss here is : “What does a distributed programming language need in its very core to be useful, widely used and accepted by usual developers ??”

=> To be used it needs to be useful, yet simple, so most developers used to imperative programming can grasp it quickly, and do something useful with it.
In traditional local programming, the execution is done on a CPU, one instruction after another.
In concurrent / distributed programming, the execution is done on multiple CPU, many instructions at the same time ( maybe? ) and we do need a way to harness that. It can be an extremely complex “concurrency graph” of relations between different instructions set in different locations. One of the way to harness it with a very fine-grained control system could be based on “Causality-tracking” algorithm, that are already in use by TeleCom Companies, for mobile phones related applications.

In traditional ( imperative and functional ) programming, we are writing instructions in a structured way, that we can represent as a decision tree. Problems arises when you have to mix the “causality graph” of relations between different instructions or events in the different location, and that decision tree.
If I try to write ( in C++ like syntax) instructions like :

var a = 5; if ( condition )
//I am asking remote processB to increment my variable "a"
{ remote_execute ( incr(a), processB); }

//I expect a to be incremented -> MISTAKE !!!
{ getfrom(a, processB ); }

I am making an obvious mistake, because in the “else” block, my variable a is not going to be incremented, and even more, it might not be known by processB at all. Even if it is a programmer error, we do need a way to tell him early on it s an error, or better, make it impossible for someone to write such a code. And this might be the most obvious error that can happen. If you have written multithreaded applications in the past, you know how troublesome it can quickly become.

The main thing behind compiled languages, is that they try to warn the user during compilation of all the possible errors. And they cant prevent most concurrency errors, because these are very dependent on other factors ( execution environment, ordering of instructions, etc. ) that the compiler cannot have knowledge of. The causality tree of a concurrent program is unveiling during the execution, there is no static representation of it that we can code. And even if we could, it would be a big hassle to the programmer who would probably not even want to think about it, since it s not related to his problem, as long as performance is satisfactory.

To conclude, to be useful, a distributed language needs to :
– remain simple and efficient, maybe simplifying the different ways to have concurrency( threads process, etc. )  and making it more transparent.
– harness concurrency with transparent causality tracking, ( simplicity : transparent, efficiency : allowing concurrent events – partial order )
– find simple and intuitive ways to mix that “dynamic” causality tree with the “static” decision tree. This is a very tricky part. Some system have concurrency controling imperative algorithms( BOINC ), some have imperative control over lightweight concurrency ( multithreads )
– be based on “evaluation”, just like script engines, because we already have a running system, and we can dynamically check for concurrency errors using it. It s also a good sandbox for concurrency algorithm tests.

=> To be widely accepted it needs to do *what we are already doing* but better, faster, stronger, even if we are not already doing distributed applications…
The most widespread “development” and “application writing” activity might be website development nowadays. It make sense in a way because it s related to distribution, people interaction, and just like with distributed languages, few years before the invention of the web, all the nice web applications we use on a daily basis right now would have been very hard to imagine.
To be able to compete with traditional imperative local languages, it should have decent performance on local CPU execution. Probably with implicit threading when it is detected possible, since nowadays most new machines have multiples CPU. It should also be quite familiar and intuitive to get used to it quickly. We probably need to keep and use most of the traditional control instructions ( if/else, for, while, etc. )

A good way for people to be able to use a quite low-level language for what they want is to allow “modules” or “plugins” to add high level functionality to it. the core language should however provide interface to most devices.

The road to an accepted distributed language seems still pretty long, but it s definitively an interesting one indeed…

Does someone already a knows a Distributed Language that fits all his needs ? Does anyone has other / better ideas for distributed language features ?
I wonder when the first truly transparent distributed language is going to arrive and be widely used…

RAID and external USB backup on NetBSD 5

2010/01/23 1 comment

I recently had to change my “storage policy”…

A failed drive in my RAID1 set of 2 disks, and brand new disks coming up forced me to 😉 So here I am sharing two facts that have wasted my time recently. Hopefully it would help some to avoid these hassles, or at least plan more time for it.

First : I followed the NetBSD guide to setup my RAID back when I was on NetBSD3 probably. Things where a bit different from now that I have a 5 fully upgraded version, but anyway it still works. Problems is to be that I have the right same geometry for my disks in the RAID1, I did as they say on the official guide, which is copy the disklabel of disk 1 onto disk 2. and it works perfectly. Problem is : When you want to change the disk, both disklabels have same name and identifier, and if on top of that you have for some logistic reason to change the way the disks are plugged into the machine, it get much trickier to find out which drive you wanted to keep and which one you wanted to remove… So I had to “fix” my disklabels, and put different meaningful identifiers, so I could use them to properly identify which disks I wanted to pull out of my RAIDset.

Second : For my backup strategy, as my RAID set was previously “under-used”, meaning there was a lot more free space than I actually needed, I used to backup stuff inside the RAID itself. Even if it is discouraged for obvious reason, a RAID1 mirroring was enough to ensure my data, so I just wanted to be able to go dig up old configuration files and revert some setup, so just keeping the files, even in the same disk, made some sense.
Now that my disk usage is getting higher, and hard drive prices are getting lower, it s becoming worth it to get an external USB drive. I naively thought that just plugin it, and pointing my rsnapshot config to the appropriate mount point would be enough. Error.
RSnapshot uses hardlinks in order to minimize the duplicates of files into the different backups, and therefore cannot work on the preconfigured FAT partition on that drive. I expected that, so I plug that USB dirve in my ubuntu laptop, uses gparted to make a big ext2fs partition, and unplug it. From there I expected i would work. Well it didnt. As stated in the mailing list, at the time I am writing this, ext2fs has problem being mounted on the netbsd-5-0 branch, although this has been fixed in netbsd-5. I really wanted to use ext2fs for my external backup just because it would make it so easier to retrieve data on another “non-BSD” computer, I now have to upgrade to a netbsd-5 branch, rebuild everything, install the newly built OS, and reboot. Extremely easy to do since I document and script all these tasks. Just time consuming…

Categories: Linux, NetBSD Tags: , , ,