Side Notes

never finished…

PyCon Canada 2012

This weekend I attended PyCon Canada, the first conference in Canada dedicated to the Python ecosystem. As you might guess from my blog, I’m not a Python guy; I’ve been using Python mostly as a scripting language. I went to this conference for fresh ideas, or, as Michael Feathers said, for cross-pollination from the Python community. This blog post is not a detailed review of the conference; I just want to share my general impressions.

Organization

Considering how little time the organizers had to prepare this conference (five months, I believe), they did an amazing job. They invited great speakers. They kept people well informed through the mailing list and Twitter. The official web site was clear and easy to navigate. The location was good. The food was decent. My only complaint is about the temperature in the rooms on the first day: it was so cold inside that I had to wear my jacket all the time. On the second day the problem was fixed.

Keynotes

The keynotes were absolutely fantastic. There were three of them. Jessica McKellar talked about the Python community: how they foster it, and how they attract new people to programming in general and to Python in particular. She shared her experience of organizing the Boston Python user group, the biggest Python user group in the world. The takeaway from her talk: the Python community is big, welcoming, and well supported by the Python Software Foundation.

The second keynote was Michael Feathers’ Why You Should(n’t) be Using a Functional Programming Language Instead. The main idea of his talk: don’t lock yourself inside one language. Go outside of your community to see what other languages exist out there and how they solve problems. Study those languages, learn their idioms and techniques, and then go back to your language and start using the ideas you’ve learnt. I completely agree with that, and that’s why I went to this conference in the first place. He gave a bunch of examples of functional programming in Haskell. Then he showed his Ruby code written in a functional style, where you could see the influence of Haskell. I liked his presentation because he verbalized ideas I have been thinking about for a while. When I started programming in Groovy, my Groovy code was basically Java code without semicolons. Now my Java code looks more like Groovy.

The closing keynote was by Fernando Pérez, a scientist from the University of California, Berkeley, and the creator of IPython. The talk, titled Science and Python, was really mind-blowing. When I was a student I did all my computations using mainly Fortran and some proprietary software I don’t even remember the name of. Later, I played with Mathematica and Octave a little bit. But I didn’t know that you can do very sophisticated scientific calculations using Python. Fernando gave some examples from neuroscience, astrophysics and biology, and they were really impressive. The discovery of Supernova PTF11kly is especially astonishing. From now on, if I need to do some math, I’ll be using Python libraries; no more expensive proprietary software. Another theme of the presentation was IPython. Initially I thought it was just a shell on top of standard Python, but it’s actually a whole ecosystem. I cannot explain in a few words how amazing it is. Just google for “ipython notebook” or read Fernando’s blog.

Talks

As happens at every conference, there were some great talks and some lousy talks, interesting talks and boring talks, geeky talks and academic talks. It’s all normal and fine. The good thing about this conference, though, is that the signal-to-noise ratio was pretty high; congratulations to the organizers on their choice of talks. Another thing I like is the diversity of formats. There were 45-minute presentations, 20-minute talks, 5-minute lightning talks, 90-minute tutorials, and a 3-hour workshop (there are also two full-day coding sessions, but I’m not attending them). This is a really good approach. Switching between different formats during the day helps your brain work more productively, in my opinion.

Pleasant discoveries

I found that many projects presented at the conference are using RabbitMQ, and that’s great. I wish people in the Java world would use AMQP more frequently instead of blindly choosing JMS for every new project.

Many people are using MongoDB properly. Nowadays NoSQL is a very popular buzzword, and many projects are using various NoSQL databases just because it’s cool, even if it makes no sense for the project at all. It was nice to see that there are developers out there who do their homework and adopt NoSQL because it fits their domain.

Unpleasant discoveries

There seems to be a trend in the Python community to despise Java. I actually see this trend in many communities outside of Java, so it’s not Python-specific, but at this conference I heard so many jokes about Java that it’s not funny anymore, especially coming from people who don’t write a line of code in Java.

Another thing that surprised me is the fanatic admiration of Mercurial and hatred of Git among some Python programmers. I know lots of people who hate Git, mainly because they are confused and scared by it. But disliking it because it is not written in Python is something new to me.

Problems in Python

Package and distribution management in Python is in pretty bad shape. Every person I talked to admitted that it’s a complete mess at the moment. I feel that pain myself every time I need to install a new library. Which tool should I use: pip, easy_install, pysetup? Some libraries installed using those tools don’t work, or work only partially. Many programmers use rpm or deb packages instead of Python tools, because OS packages usually work. I came to the same conclusion on Mac OS X: the only flawlessly working Python environment I have is the one installed via MacPorts. In Java we don’t have those problems. Maven solved them once and for all a long time ago, and now every JVM language benefits from it. The Python community should clean up this mess and standardize its tools. I was told that with PyPI and the packaging PEPs the situation is getting better; let’s see if that resolves all the issues.

What I’ve learnt

Here is the list of things I found pretty interesting, in no particular order.

Python libraries to use

numpy, matplotlib, pandas, scipy, sympy, quantities, collections. Thanks to the people who told me about these libraries.

Cool Python stuff

RunSnakeRun — GUI for Python profiler. Check out the screenshots on their web site. I wish Java profilers could draw such nice graphs.

bpython — Python REPL for geeks written in Urwid. Thanks to Ian Ward for the really nice presentation.

Interesting ideas

Print log statements in JSON format so that you can analyze them using powerful tools. You can also save logs in MongoDB, either offline or asynchronously, and do statistical analysis using its aggregation framework.
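To make the JSON logging idea concrete, here is a minimal sketch using only Python’s standard logging and json modules. The class name JsonFormatter, the helper make_json_logger, and the choice of fields are my own assumptions for illustration, not something shown at the conference.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical helper)."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_json_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output free of root-logger duplicates
    return logger


# Demo: log into an in-memory buffer, then parse the line back as JSON.
buffer = io.StringIO()
log = make_json_logger("demo", buffer)
log.info("user %s logged in", "alice")
record = json.loads(buffer.getvalue())
print(record["level"], record["message"])
```

Because every line is a self-contained JSON object, the same output can be loaded into a document store or filtered with any JSON-aware tool without writing a custom parser.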

Write stored procedures in PostgreSQL in Python (and some other languages). They look much better in Python than in PL/pgSQL.

Things to learn 

Here are some technologies and tools that have great potential, in my opinion, and are worth learning: ZeroMQ, IPython, OpenStack. They were mentioned multiple times during the conference, and I need to check them out in more detail.

Summary

The conference was great. I’m glad I attended it. The organizers did a great job. The conference was beneficial not only to the Python community but to the Toronto programming community in general. Thanks to all who made it happen.

P.S. Videos from the conference are available here.

Simple Web Application in Clojure

This blog entry is a micro-tutorial on how to build a simple web application in Clojure. The reason I call it micro will be clear when I introduce the framework we are going to use. This tutorial will be interesting to programmers who are relatively new to Clojure but have some experience with web frameworks in other languages, for instance Spring MVC. The goal of this tutorial is to help you get started with web development in Clojure. I also want to share my approach to web development in general and in Clojure in particular. This approach is by no means the best way to develop web applications, but because I like to watch how other people write software, I thought somebody might be interested to see how I do it.

Exporting Solr Documents

Recently I had to copy some documents from one Solr server to another. I expected Solr to already have an interface that would let me extract documents in the same format in which they were inserted. In that case I would pipe the output of one curl command into another and consider the job done. As it turned out, the format of a Solr input document is different from the output format. Here is what an input document looks like:

<add>
    <doc>
        <field name="id">12345</field>
        <field name="articlestate">published</field>
        <field name="articletype">news</field>
        <field name="body">Lorem ipsum dolor...</field>
        <field name="referenceid">175820</field>
        <field name="referenceid">163786</field>
        <field name="created">2011-02-15T14:57:54.766Z</field>
    </doc>
</add>

Notice the flat structure of this document: all element names are the same regardless of the field type, and arrays (referenceid) are not grouped. Now compare it to the output format. Here is what you get when you execute a query against a Solr server:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="q">id:12345</str>
        </lst>
    </lst>
    <result name="response" numFound="1" start="0">
        <doc>
            <str name="id">12345</str>
            <str name="articlestate">published</str>
            <str name="articletype">news</str>
            <str name="body">Lorem ipsum dolor...</str>
            <arr name="referenceid">
                <str>175820</str>
                <str>163786</str>
            </arr>
            <date name="created">2011-02-15T14:57:54.766Z</date>
        </doc>
    </result>
</response>

Even if we ignore the response header, the structure of response/result/doc is not the same as that of the input document: the element names reflect the types, and the arrays are grouped. If you try to add this document to a Solr server, you will, unsurprisingly, get an “unexpected XML tag” error. I googled for a couple of hours for a way to convert an output document into an input one and, to my surprise, didn’t find any solution. So I implemented my own converter in Groovy, which solved the problem. I post it here in case somebody needs it.
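The conversion itself is mechanical: drop the type information and flatten the arrays. The following is a rough sketch in Python, not the Groovy converter mentioned above. The function name response_to_add is hypothetical, and the sketch assumes that every field you need is stored and that typed elements nested in arr should each become their own field element.

```python
import xml.etree.ElementTree as ET


def response_to_add(response_xml):
    """Convert a Solr XML query response into an <add> update document."""
    response = ET.fromstring(response_xml)
    add = ET.Element("add")
    for doc in response.iter("doc"):
        new_doc = ET.SubElement(add, "doc")
        for field in doc:
            name = field.get("name")
            # <arr> groups a multi-valued field; flatten it into repeated <field>s.
            values = list(field) if field.tag == "arr" else [field]
            for value in values:
                out = ET.SubElement(new_doc, "field", {"name": name})
                out.text = value.text
    return ET.tostring(add, encoding="unicode")


# Demo with a trimmed-down version of the response shown above.
sample = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">12345</str>
      <arr name="referenceid"><str>175820</str><str>163786</str></arr>
      <date name="created">2011-02-15T14:57:54.766Z</date>
    </doc>
  </result>
</response>"""
converted = response_to_add(sample)
print(converted)
```

Fields that the target schema fills in on its own (for example copy-field targets) may need to be filtered out before posting the result; the sketch does not attempt that.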

Note: You can also use this script to re-index Solr.

Modulo Who?

When a programmer and a mathematician talk about modulus or modulo, there is often confusion about what the term means. For a programmer, modulo is an operator that finds the remainder of the division of one number by another, e.g. 5 mod 2 = 1. For a mathematician, modulo is a congruence relation between two numbers: a and b are said to be congruent modulo n, written a ≡ b (mod n), if their difference a − b is an integer multiple of n.

These two definitions are not equivalent. The former is a special case of the latter: if b mod n = a then a ≡ b (mod n). The converse is not true in general. For example, 5 mod 2 = 1, and 1 ≡ 5 (mod 2) because 1 - 5 = -4 is an integer multiple of 2. Likewise 5 ≡ 1 (mod 2), because 5 - 1 = 4 is evenly divisible by 2, but 1 mod 2 = 1, not 5.
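The difference is easy to check in code. Here is a small Python sketch; the function names remainder and congruent are made up for this illustration.

```python
def remainder(b, n):
    """The programmer's modulo: the remainder of dividing b by n."""
    return b % n


def congruent(a, b, n):
    """The mathematician's modulo: a and b differ by an integer multiple of n."""
    return (a - b) % n == 0


# The operator picks one representative; the relation holds in both directions.
five_mod_two = remainder(5, 2)
one_mod_two = remainder(1, 2)
print(five_mod_two, congruent(1, 5, 2), congruent(5, 1, 2), one_mod_two)
```

The relation is symmetric, while the operator always returns the single representative in the range from 0 to n - 1, which is why 1 mod 2 can never come out as 5.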

The biggest confusion happens when a programmer and a mathematician start arguing about Gauss’ famous Golden Theorem, where both definitions of modulus can be used.

Thomae's Function

Thomae’s function (a.k.a. Riemann function) is defined on the interval (0, 1) as follows
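The standard definition, written out in LaTeX (here p and q are assumed to be coprime integers with q > 0):

```latex
f(x) =
\begin{cases}
  \dfrac{1}{q} & \text{if } x = \dfrac{p}{q} \text{ is rational, in lowest terms}, \\[1ex]
  0            & \text{if } x \text{ is irrational}.
\end{cases}
```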

Here is the graph of this function with some points highlighted as plus symbols for better view.

This function has an interesting property: it’s continuous at every irrational point. It’s easy to see this if you notice that for any positive ε there is only a finite number of dots above the line y = ε. That means that for any irrational number x0 you can always construct a δ-neighbourhood that doesn’t contain any dot from the area above the line y = ε, so inside that neighbourhood the function stays within ε of its value at x0, which is 0.
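The finiteness argument can be tried out numerically. This is only an illustrative Python sketch: the names points_above and safe_delta are hypothetical, and the irrational point is approximated by a float.

```python
from fractions import Fraction
from math import gcd, sqrt


def points_above(eps):
    """Rationals p/q in (0, 1), in lowest terms, whose Thomae value 1/q exceeds eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    points = []
    q = 2
    # 1/q > eps only while q < 1/eps, so this loop always terminates.
    while Fraction(1, q) > eps:
        points.extend(Fraction(p, q) for p in range(1, q) if gcd(p, q) == 1)
        q += 1
    return points


def safe_delta(x0, eps):
    """Distance from x0 to the nearest dot above y = eps (positive for irrational x0)."""
    return min(abs(x0 - p) for p in points_above(eps))


x0 = sqrt(2) / 2  # float stand-in for an irrational point in (0, 1)
dots = points_above(Fraction(1, 4))
delta = safe_delta(x0, Fraction(1, 4))
print(dots, delta)
```

Any neighbourhood of x0 with a radius smaller than delta avoids every dot above the line, which is exactly the δ the argument needs.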

To generate the data file with the point coordinates I wrote a Common Lisp program:

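;; Collect every distinct rational p/q in (0, 1) with q <= max-denominator.
;; Lisp reduces ratios automatically (2/4 becomes 1/2), so PUSHNEW drops repeats.
(defun rational-numbers (max-denominator)
  (let ((result (list)))
    (loop for q from 2 to max-denominator do
      (loop for p from 1 to (1- q) do
        (pushnew (/ p q) result)))
    result))

;; Pair each rational x with its Thomae value 1/q, q being the reduced denominator.
(defun thomae-rational-points (abscissae)
  (mapcar (lambda (x) (list x (/ 1 (denominator x)))) abscissae))

;; Write the points to thomae.dat as "x y" lines, replacing any existing file.
(defun thomae (max-denominator)
  (let ((points (thomae-rational-points (rational-numbers max-denominator))))
    (with-open-file (stream "thomae.dat" :direction :output
                                         :if-exists :supersede)
      (loop for point in points do
        (format stream "~4$ ~4$~%" (first point) (second point))))))

(thomae 500)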

To create the images I used gnuplot commands:

plot "thomae.dat" using 1:2 with dots
plot "thomae.dat" using 1:2 with points

and Photoshop.

Math and Physics of Benderama

The most recent episode of Futurama involves an interesting formula. The entire plot is based on the Professor’s latest invention, the Banach-Tarski Dupla-Shrinker, a machine that produces two copies of any object at a 60% scale. It was just a matter of time before Bender found a proper use for this machine: replicating himself. Then the two small copies of Bender replicated themselves, making four smaller copies, and so forth. At some point the Professor horrified the crew by warning that if they didn’t stop this unlimited growth, the total mass of all the Benders would eventually be so big that the entire Earth would be consumed in the process of replication. As proof he demonstrated this formula for the mass of all generations of Bender

This is a perfect toy for a science geek. The first obvious question it brings: is this formula mathematically correct? As it turns out, it is not. Considering the scale of 60%, the cubic dependency of volume on linear dimension, and the constant density of all copies, the formula should be the following
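Under those three assumptions, each generation doubles the number of Benders while each copy keeps 0.6³ of its parent’s mass, so the sum is a geometric series. Written out in LaTeX (the symbols M for the total mass and M_0 for the mass of the original Bender are assumed here):

```latex
M = M_0 \sum_{n=0}^{\infty} 2^n \left(0.6^3\right)^n
  = M_0 \sum_{n=0}^{\infty} 0.432^n
  = \frac{M_0}{1 - 0.432}
  \approx 1.76\, M_0
```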

As you can see, the total mass of an infinite number of Benders actually converges to approximately 1.76 M0. So from the Math perspective there is nothing to worry about. But what if our assumption of constant density is invalid? Would it be a problem from the Physics perspective? Let’s see.

Knowing that every new copy has a size of 0.6 of the original it was made from, we have the following formula for the size of Bender in the nth generation
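In LaTeX notation, with L_0 assumed to denote the size of the original Bender and L_n the size of a copy in the nth generation:

```latex
L_n = 0.6^{\,n} \, L_0
```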

This exponential function becomes very small pretty quickly. In the 154th generation it already reaches the Planck length, after which further replication is physically impossible. If we calculate the total mass of 154 generations of Bender using the Professor’s formula, we get H(154) × 238 kg ≈ 1,337.56 kg, where H(n) is the nth harmonic number. That is nothing compared to the mass of the Earth.

So we have to admit that from both the Math and the Physics perspectives the Professor was wrong, and there was no real threat to the Earth.

Although the Professor’s formula doesn’t describe the replication process adequately, it’s still a beautiful piece of Math because it’s the formula of the harmonic series. If you want to know why the harmonic series is beautiful and which real processes it describes, read this nice article by John H. Webb.

And don’t miss the next episode of Futurama this Thursday :-)

Functional Groovy Switch Statement

In the previous post I showed how to replace chained if-else statements in Groovy with one concise switch. It was done for the special case of an if statement where every branch was evaluated using the same condition function. Today I want to generalize that technique by allowing a different condition in each branch.

Suppose your code looks like this:

if (param % 2 == 0) {
    'even'
} else if (param % 3 == 0) {
    'threeven'
} else if (0 < param) {
    'positive'
} else {
    'negative'
}

As long as every condition operates on the same parameter, you can replace the entire chain with a switch. In this scenario param becomes the switch parameter and the conditions become case parameters of type Closure. The only thing we need to do is override the Closure.isCase() method, as I described in the previous post. The safest way to do it is to create a category class:

class CaseCategory {
    static boolean isCase(Closure casePredicate, Object switchParameter) {
        casePredicate.call switchParameter
    }
}

Now we can replace the if statement with the following switch:

use (CaseCategory) {
    switch (param) {
        case { it % 2 == 0 } : return 'even'
        case { it % 3 == 0 } : return 'threeven'
        case { 0 < it }      : return 'positive'
        default              : return 'negative'
    }
}

We can actually go further and extract in-line closures:

def even = {
    it % 2 == 0
}
def threeven = {
    it % 3 == 0
}
def positive = {
    0 < it
}

After which the code becomes even more readable:

use (CaseCategory) {
    switch (param) {
        case even     : return 'even'
        case threeven : return 'threeven'
        case positive : return 'positive'
        default       : return 'negative'
    }
}

Nothing New Under the Sun

Every generation of software developers needs its own fad. For my generation it was Agile, for the generation before it was OOP, and before that it was another big thing. Gerald Weinberg, one of the most influential people in our industry, blogged yesterday about this issue. With over 50 years of experience in software development, he knows what he is talking about. Read his blog post; he makes a very good point.

P.S. I’m wondering what will be the next big thing. Will it be Cloud or Big Data?

Multimethods in Groovy

Every time I switch from Groovy to Java I have to remind myself that some things that seem so natural and work as expected in Groovy don’t work in Java. One such difference is method dispatch. Groovy supports multiple dispatch, while Java does not. Therefore the following code works differently in Groovy and Java:

public class A {
    public void foo(A a) { System.out.println("A/A"); }
    public void foo(B b) { System.out.println("A/B"); }
}
public class B extends A {
    public void foo(A a) { System.out.println("B/A"); }
    public void foo(B b) { System.out.println("B/B"); }
}
public class Main {
    public static void main(String[] args) {
        A a = new A();
        A b = new B();
        a.foo(a);
        b.foo(b);
    }
}

$ java Main
A/A
B/A

$ groovy Main.groovy
A/A
B/B