• Simple ZooKeeper cluster

    Sometimes I need to run a ZooKeeper ensemble on my development box to test my application in a production-like environment. I found that recreating the whole ensemble from scratch is much faster than cleaning it up with the ZooKeeper CLI tool. To automate this process I created a bash script, which I want to share in this blog post. I hard-coded all the paths in the script using my regular conventions. You might need to change them to yours, which should be fairly straightforward.

    Before you can use the script, you need to install ZooKeeper on your box. Here is what I did on my machine:

    $ cd /opt
    $ sudo mkdir zookeeper
    $ sudo chown -R andrey:admin zookeeper
    $ cd zookeeper
    $ wget http://apache.mirror.rafal.ca/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
    $ tar xf zookeeper-3.4.5.tar.gz
    $ rm zookeeper-3.4.5.tar.gz
    $ ln -s zookeeper-3.4.5 zookeeper
    

    In the end you should have ZooKeeper installed in the /opt/zookeeper/zookeeper directory.

    Now download the script, make it executable with chmod, and run it. It will create the following files:

    /opt/zookeeper/zookeeper/cluster
    ├── server1
    │   ├── conf
    │   │   ├── log4j.properties
    │   │   └── zoo.cfg
    │   ├── data
    │   │   └── myid
    │   └── logs
    ├── server2
    │   ├── conf
    │   │   ├── log4j.properties
    │   │   └── zoo.cfg
    │   ├── data
    │   │   └── myid
    │   └── logs
    ├── server3
    │   ├── conf
    │   │   ├── log4j.properties
    │   │   └── zoo.cfg
    │   ├── data
    │   │   └── myid
    │   └── logs
    ├── start.sh
    └── stop.sh
    

    This is a minimal configuration for a 3-node ensemble (cluster), the smallest size recommended for production. To start the cluster, run the following command:

    $ cd /opt/zookeeper/zookeeper
    $ cluster/start.sh
    

    Check the log files to see whether the cluster has started successfully:

    $ tail -f cluster/server{1,2,3}/logs/zookeeper.out
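
    Besides reading the logs, each server can be probed with ZooKeeper's four-letter command "ruok", to which a running server replies "imok". Below is a minimal Python sketch of such a probe. The function name zk_is_ok and the timeout value are illustrative assumptions, and the client port must be taken from the clientPort setting in each server's zoo.cfg.

```python
import socket


def zk_is_ok(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send the four-letter command "ruok" and report whether the reply is "imok"."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError:
        # Connection refused, timed out, or reset: treat the server as not running.
        return False
    return reply == b"imok"
```

    Calling zk_is_ok("localhost", port) once per server gives a quick up/down check. Note that "imok" only says that the server process is running; it does not necessarily mean the server has joined the quorum.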
    

    When the cluster is up and running, you can test your application. After you are done, shut down the cluster and check that no ZooKeeper Java processes are left running:

    $ cluster/stop.sh
    $ ps -ef | grep java
    

    To recreate a clean cluster, just run the script again:

    $ ./zookeeper-init-ensemble.sh
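
    For readers wondering what the generated zoo.cfg and myid files might contain, here is a rough Python sketch that renders them for an ensemble running on a single host. It is not the bash script itself. The function names, the port numbers, and the tickTime, initLimit, and syncLimit values are all assumptions chosen for illustration; the only hard requirement is that servers sharing one host use distinct ports.

```python
def render_zoo_cfg(server_id: int, base_dir: str, n_servers: int = 3) -> str:
    """Render a zoo.cfg for one server of an ensemble living on a single host."""
    lines = [
        "tickTime=2000",   # assumed values, same as ZooKeeper's sample config
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={base_dir}/server{server_id}/data",
        f"clientPort={2180 + server_id}",  # assumed: 2181, 2182, 2183
    ]
    # Every server lists all ensemble members; the quorum and election ports
    # must differ per server because they all run on localhost.
    for i in range(1, n_servers + 1):
        lines.append(f"server.{i}=localhost:{2887 + i}:{3887 + i}")
    return "\n".join(lines) + "\n"


def render_myid(server_id: int) -> str:
    """The myid file holds just the server's numeric id."""
    return f"{server_id}\n"
```

    The number in data/myid has to match the N in the corresponding server.N line, which is how each server finds its own entry in the shared member list.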
    
  • Embrace Big Data

  • RabbitMQ, ActiveMQ, ZeroMQ, HornetQ

    Warning: In this post I’m going to compare RabbitMQ, ZeroMQ, ActiveMQ, and HornetQ. The basis of the comparison is not performance, scalability, or any other serious feature. The comparison is based purely on the popularity of those systems. Therefore, if you came here to see some performance metrics, you will be disappointed: there are none in this post.

    Note: To calculate popularity, I’m going to use MongoDB and Python, so if you don’t care about message brokers, but you want to see some examples of MongoDB scripts, this post might be interesting to you.

  • Code Retreat 2012

    Yesterday was the Global Day of CodeRetreat. Software engineers around the globe met together to learn from each other.

    [Photo: CR2012 1]

    There were several sessions where people were pair-programming Conway’s Game of Life.

    [Photo: CR2012 2]

    For each session you had to choose a new partner, so that both of you could learn something new.

    [Photo: CR2012 3]

    During the first session my partner and I decided to implement the Game in Java, mainly because it was the language she was most comfortable with. We implemented a procedural solution using a two-dimensional array and nested loops. At that moment it was the only solution I could think of. The main challenge was to cover all the edge cases and fix all the ArrayIndexOutOfBoundsExceptions. Java is a fairly verbose language, and with nested loops and if-else statements the final solution was pretty hard to read. You can see here how it might look.
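
    To give a feel for the array-based approach, here is a minimal sketch in Python rather than Java. It is not the code we wrote that day. It assumes a bounded grid of booleans where everything outside the grid counts as dead, and the function names are made up for illustration. The explicit bounds check is exactly where the out-of-bounds errors come from when it is forgotten.

```python
def count_neighbours(grid, row, col):
    """Count live cells among the up to eight neighbours of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            r, c = row + dr, col + dc
            # Bounds check: cells outside the grid are treated as dead.
            if 0 <= r < rows and 0 <= c < cols and grid[r][c]:
                count += 1
    return count


def step(grid):
    """Compute the next generation into a fresh grid of the same size."""
    new_grid = []
    for r in range(len(grid)):
        new_row = []
        for c in range(len(grid[0])):
            n = count_neighbours(grid, r, c)
            # A cell lives with exactly 3 neighbours, or survives with 2.
            new_row.append(bool(n == 3 or (grid[r][c] and n == 2)))
        new_grid.append(new_row)
    return new_grid
```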

    The first session was a warmup, during which most people realized that programming with arrays is tedious work. For the second session my new partner suggested an object-oriented approach, where you operate on Cell objects that encapsulate coordinates on the grid. In this case you move the game logic from the grid to the cell, making it easier to calculate a new state. This was my first acquaintance with C#. An interesting language: basically, Java with lambdas. Here is an example of a C# implementation. Our solution was very similar.
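
    The shape of the object-oriented version can be sketched as follows, again in Python rather than C#, and again as an illustration rather than our actual code. The method names neighbours and alive_next are assumptions. The point is that the cell knows its own coordinates and decides its own fate, so the grid and its bounds checks disappear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A live cell that encapsulates its coordinates on the grid."""
    x: int
    y: int

    def neighbours(self):
        """Return the eight cells surrounding this one."""
        return [Cell(self.x + dx, self.y + dy)
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0)]

    def alive_next(self, living):
        """Decide whether this cell is alive in the next generation."""
        n = sum(1 for c in self.neighbours() if c in living)
        return n == 3 or (n == 2 and self in living)


def step(living):
    """Take a list of live cells and return the list for the next generation."""
    # Only live cells and their neighbours can be alive in the next generation.
    candidates = set(living)
    for cell in living:
        candidates.update(cell.neighbours())
    return [c for c in candidates if c.alive_next(living)]
```

    The order of the returned list is arbitrary, because the candidates are collected in a set.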

    While the first session’s data structure was an array of booleans, in the second session it was replaced by a list of objects. The next step would be to relax the data structure even further. We decided to experiment with an unordered set of coordinate pairs. For the language we chose Clojure. Although we didn’t finish the implementation, by the end of the session we had a clear picture of how to solve the problem in a functional style.

    In the fourth session the facilitators imposed an interesting constraint: the coding had to be done in absolute silence. That was the most amazing experience of the day. Before we started I thought we couldn’t accomplish much without talking. As it turned out, we could. The key to effective silent coding is to use tools that both partners are familiar with. In our case we were both advanced users of Vim, and we both knew Lisp languages. Our Clojure implementation was based on the map/filter/reduce approach and spanned 20 lines of code. After the session Leo showed me Christophe Grand’s 7-line solution based on list comprehensions. It is so wonderful that I want to post it here:

    life.clj
    (defn neighbours [[x y]]
      (for [dx [-1 0 1] dy (if (zero? dx) [-1 1] [-1 0 1])]
        [(+ dx x) (+ dy y)]))
    (defn step [cells]
      (set (for [[loc n] (frequencies (mapcat neighbours cells))
                 :when (or (= n 3) (and (= n 2) (cells loc)))]
             loc)))

    For the last session we chose Erlang. Because we already knew how to implement the functional solution, it was just an exercise in translating the Clojure code into Erlang. Unfortunately we didn’t find an equivalent of the frequencies function in the standard library, so we implemented it ourselves. Other than that, the Erlang code is almost identical to the Clojure version.

    life.erl
    -module(life).
    -export([neighbours/1, step/1]).

    %% The eight cells surrounding {X, Y}.
    neighbours({X, Y}) ->
        [{X + DX, Y + DY} || DX <- [-1, 0, 1], DY <- [-1, 0, 1], {DX, DY} =/= {0, 0}].

    %% Cells is a set of live coordinates; returns the set for the next generation.
    step(Cells) ->
        Nbs = lists:flatmap(fun neighbours/1, sets:to_list(Cells)),
        NewCells = [C || {C, N} <- dict:to_list(frequencies(Nbs)),
                         (N == 3) orelse ((N == 2) andalso sets:is_element(C, Cells))],
        sets:from_list(NewCells).

    %% Count how many times each element occurs in List.
    frequencies(List) -> frequencies(List, dict:new()).
    frequencies([], Acc) -> Acc;
    frequencies([X|Xs], Acc) -> frequencies(Xs, dict:update_counter(X, 1, Acc)).

    Summary

    During this day I learnt a lot: a new language, new abstractions, new techniques, new ways of communication, new ideas. I met a bunch of smart people. I was so overwhelmed with all this cool stuff that I had to write this blog post to offload it from my head.

    If you are a programmer and you’ve never been to a CodeRetreat, I strongly encourage you to go next year. It’s an exciting experience.

    I want to thank all the people who organized and participated in this event.

    Photo Credits

    • Michael DiBernardo [1]
    • Kunal Gupta [2]
    • Carlo Barrettara [3]
  • Flexible language

    I’ve been learning Lisp for a few years now, and every Lisp book I read keeps saying that Lisp is a flexible language that you can extend to the point where it fits your domain naturally. It’s easy to say, but what exactly does this phrase mean? After all, when you program in your non-Lisp language, don’t you modify it for your domain problem? I’ve been thinking about it for a long time, and only recently have I started to understand what flexibility really means. There is a difference between using the language and changing the language to solve a problem. In this post I will try to show the difference with a simple example.

    Problem

    Suppose you have a process that listens to a message queue. The messages are just ordinary maps. If the map contains certain keys, one or more handlers must be invoked. Here is a matrix that shows which handler is invoked for which key.

    For example, if the map has key a, then DocHandler and AlertHandler need to be called. If it has key b, then NoteHandler and AlertHandler are called. In reality there might be more keys and more handlers, but for simplicity we limit our example to three keys and three handlers.
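
    As a plain, non-Lisp baseline, the two rules stated above could be written as a lookup table from keys to handlers. The Python sketch below is only an illustration under several assumptions: the function names are made up, the handlers are stubs that return their own names, a handler selected by more than one key is called once, and the third key is left out because its row is not reproduced in the text.

```python
def doc_handler(message):
    return "DocHandler"


def note_handler(message):
    return "NoteHandler"


def alert_handler(message):
    return "AlertHandler"


# Only the two rows described in the text; the third key is intentionally absent.
HANDLERS_BY_KEY = {
    "a": (doc_handler, alert_handler),
    "b": (note_handler, alert_handler),
}


def handlers_for(message):
    """Collect the handlers selected by the keys present in the message."""
    selected = []
    for key, handlers in HANDLERS_BY_KEY.items():
        if key in message:
            for handler in handlers:
                # Assumption: a handler shared by several keys runs only once.
                if handler not in selected:
                    selected.append(handler)
    return selected


def dispatch(message):
    """Invoke every selected handler and return their results in call order."""
    return [handler(message) for handler in handlers_for(message)]
```

    With this table, dispatch({"a": 1}) calls the stubs for DocHandler and AlertHandler, and a message with neither key calls nothing.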