• PyCon Canada 2012

    This weekend I attended PyCon Canada, the first conference in Canada dedicated to the Python ecosystem. As you might gather from my blog, I’m not a Python guy. I’ve been using Python mostly as a scripting language. I went to this conference for fresh ideas, or, as Michael Feathers said, for cross-pollination from the Python community. This blog post is not a detailed review of the conference; I just want to share my general impressions.

    Organization

    Considering how little time the organizers had to prepare this conference, 5 months I believe, they did an amazing job. They invited great speakers. They kept people well informed through the mailing list and Twitter. The official web site was clear and easy to navigate. The location was good. The food was decent. My only complaint is about the temperature in the rooms on the first day. It was so cold inside that I had to wear my jacket all the time. But on the second day the problem was fixed.

    Keynotes

    The keynotes were absolutely fantastic. There were three of them. Jessica McKellar talked about the Python community: how its members foster it and how they attract new people to programming in general and to Python in particular. She shared her experience of organizing the Boston Python user group, the biggest Python user group in the world. The takeaway from her talk: the Python community is big, welcoming, and well supported by the Python Software Foundation.

    The second keynote was Michael Feathers’ Why You Should(n’t) be Using a Functional Programming Language Instead. The main idea of his talk: don’t lock yourself inside one language. Go outside of your community to see what other languages exist out there and how they solve problems. Study those languages, learn their idioms and techniques, and then go back to your language and start using the ideas you’ve learnt. I completely agree with that, and that’s why I went to this conference in the first place. He gave a bunch of examples of functional programming in Haskell. Then he showed his Ruby code written in a functional style, where you could see the influence of Haskell. I liked his presentation because he verbalized ideas I have been thinking about myself for a while. When I started programming in Groovy, my Groovy code was basically Java code without semicolons. Now my Java code looks more like Groovy.

    The closing keynote was by Fernando Pérez, a scientist at the University of California, Berkeley, and the creator of IPython. The talk, titled Science and Python, was really mind-blowing. When I was a student I did all my computations mainly in Fortran and in some proprietary software whose name I don’t even remember. Later, I played with Mathematica and Octave a little bit. But I didn’t know that you could do very sophisticated scientific calculations in Python. Fernando gave some examples from neuroscience, astrophysics, and biology, and they were really impressive. The discovery of supernova PTF11kly is especially astonishing. From now on, if I need to do some math, I’ll be using Python libraries; no more expensive proprietary software. Another theme of the presentation was IPython. Initially I thought it was just a shell on top of standard Python, but it’s actually a whole ecosystem. I cannot explain in a few words how amazing it is. Just google for “ipython notebook” or read Fernando’s blog.

    Talks

    As happens at every conference, there were some great talks and some lousy talks, interesting talks and boring talks, geeky talks and academic talks. It’s all normal and fine. The good thing about this conference, though, is that the signal-to-noise ratio was pretty high; congratulations to the organizers on their choice of talks. Another thing I like is the diversity of formats. There were 45-minute presentations, 20-minute talks, 5-minute lightning talks, 90-minute tutorials, and a 3-hour workshop (there are also two full-day coding sessions, but I’m not attending them). This is a really good approach. Switching between different formats during the day helps your brain work more productively, in my opinion.

    Pleasant discoveries

    I found that many projects presented at the conference use RabbitMQ, and that’s great. I wish people in the Java world would use AMQP more often instead of blindly choosing JMS for every new project.

    Many people are using MongoDB properly. Nowadays NoSQL is a very popular buzzword, and many projects are using various NoSQL databases just because it’s cool, even if it makes no sense for the project at all. It was nice to see that there are developers out there who do their homework and adopt NoSQL because it fits their domain.

    Unpleasant discoveries

    There seems to be a trend in the Python community to despise Java. I actually see this trend in many communities outside of Java, so it’s not Python-specific, but at this conference I heard so many jokes about Java that they stopped being funny, especially coming from people who don’t write a line of code in Java.

    Another thing that surprised me is the fanatical admiration of Mercurial and hatred of Git among some Python programmers. I know lots of people who hate Git, mainly because they are confused and scared by it. But disliking it because it is not written in Python is something new to me.

    Problems in Python

    Package and distribution management in Python is in pretty bad shape. Every person I talked to admitted that it’s a complete mess at the moment. I feel that pain myself every time I need to install a new library. Which tool should I use: pip, easy_install, pysetup? Some libraries installed using those tools don’t work, or work only partially. Many programmers use rpm or deb packages instead of Python tools, because OS packages usually work. I came to the same conclusion on my Mac. The only flawlessly working Python environment I have is the one installed via MacPorts. In Java we don’t have those problems. Maven solved them once and for all a long time ago. Now every JVM language benefits from it. The Python community should clean up this mess and standardize its tools. I was told that with the introduction of PyPI and PEPs the situation is getting better; well, let’s see if that resolves all the issues.

    What I’ve learnt

    Here is the list of things I found pretty interesting, in no particular order.

    Python libraries to use

    numpy, matplotlib, pandas, scipy, sympy, quantities, collections. Thanks to the people who told me about these libraries.

    Cool Python stuff

    RunSnakeRun: a GUI for the Python profiler. Check out the screenshots on their web site. I wish Java profilers could draw such nice graphs.

    bpython: a Python REPL for geeks, written in Urwid. Thanks to Ian Ward for the really nice presentation.

    Interesting ideas

    Print log statements in JSON format so that you can analyze them with powerful tools. You can also save logs in MongoDB, either offline or asynchronously, and do statistical analysis using its aggregation framework.

    Write stored procedures for PostgreSQL in Python (or some other languages). They look much better in Python than in PL/pgSQL.
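
    The JSON logging idea can be sketched with nothing but Python’s standard logging module. This is only a minimal illustration, not something shown at the conference: the JsonFormatter class, the make_json_logger helper, the "shop.orders" logger name, and the convention of passing structured data through extra={"fields": ...} are all assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured data arrives via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_json_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example's output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer so the result can be parsed back right away.
buffer = io.StringIO()
log = make_json_logger("shop.orders", buffer)
log.info("order placed", extra={"fields": {"order_id": 42, "total": 19.99}})

first_record = json.loads(buffer.getvalue().splitlines()[0])
print(first_record["message"], first_record["order_id"])  # order placed 42
```

    Because every line is a self-contained JSON object, the same output could presumably be loaded into MongoDB or any other tool that understands JSON without writing a custom parser.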

    Things to learn 

    Here are some technologies and tools that, in my opinion, have great potential and are worth learning: ZeroMQ, IPython, OpenStack. They were mentioned multiple times during the conference, and I need to check them out in more detail.

    Summary

    The conference was great. I’m glad I attended it. The organizers did a great job. The conference was beneficial not only to the Python community but to the Toronto programming community in general. Thanks to all who made it happen.

    P.S. Videos from the conference are available here.

  • Simple web application in Clojure

    This blog entry is a micro-tutorial on how to build a simple web application in Clojure. The reason I call it micro will be clear when I introduce the framework we are going to use. This tutorial will be interesting to programmers who are relatively new to Clojure but have some experience with web frameworks in other languages, for instance Spring MVC. The goal of this tutorial is to help you get started with web development in Clojure. I also want to share my approach to web development in general and in Clojure in particular. This approach is by no means the best way to develop web applications, but because I like to watch how other people write software, I thought somebody might be interested to see how I do it.

  • Exporting Solr documents

    Recently I had to copy some documents from one Solr server to another. I expected Solr to already have an interface that would let me extract documents in the same format in which they were inserted. In that case I would pipe the output of one curl command into another and consider the job done. As it turned out, the format of a Solr input document is different from the output format. Here is what an input document looks like:

    <add>
        <doc>
            <field name="id">12345</field>
            <field name="articlestate">published</field>
            <field name="articletype">news</field>
            <field name="body">Lorem ipsum dolor...</field>
            <field name="referenceid">175820</field>
            <field name="referenceid">163786</field>
            <field name="created">2011-02-15T14:57:54.766Z</field>
        </doc>
    </add>
    

    Notice the flat structure of this document: all element names are the same regardless of the field type, and arrays (referenceid) are not grouped. Now compare it to the output format. Here is what you get when you execute a query against a Solr server:

    <response>
        <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">1</int>
            <lst name="params">
                <str name="q">id:12345</str>
            </lst>
        </lst>
        <result name="response" numFound="1" start="0">
            <doc>
                <str name="id">12345</str>
                <str name="articlestate">published</str>
                <str name="articletype">news</str>
                <str name="body">Lorem ipsum dolor...</str>
                <arr name="referenceid">
                    <str>175820</str>
                    <str>163786</str>
                </arr>
                <date name="created">2011-02-15T14:57:54.766Z</date>
            </doc>
        </result>
    </response>
    

    Even if we ignore the response header, the structure of response/result/doc is not the same as that of the input document: the element names reflect the types, and the arrays are grouped. Unsurprisingly, if you try to add this document to a Solr server, you will get an “unexpected XML tag” error. I googled for a couple of hours on how to convert an output document to an input one and, to my surprise, didn’t find any solution. Therefore I implemented my own converter in Groovy, which solved the problem. I am posting it here in case somebody needs it.

    Note: You can also use this script to re-index Solr.
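
    For readers who want to see the shape of the conversion, here is a minimal Python sketch. It is not the Groovy converter mentioned above, only an illustration of the mapping between the two formats shown earlier: every typed element becomes a field element, and every value inside an arr becomes its own field with the array’s name. The function name response_to_add and the shortened sample response are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def response_to_add(response_xml):
    """Convert a Solr XML query response into an <add> update message."""
    response = ET.fromstring(response_xml)
    add = ET.Element("add")
    for doc in response.findall("result/doc"):
        out_doc = ET.SubElement(add, "doc")
        for child in doc:
            name = child.get("name")
            # <arr> groups a multi-valued field; any other element holds one value.
            values = list(child) if child.tag == "arr" else [child]
            for value in values:
                field = ET.SubElement(out_doc, "field", {"name": name})
                field.text = value.text
    return ET.tostring(add, encoding="unicode")


# Shortened version of the query response shown above.
SAMPLE_RESPONSE = """
<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">12345</str>
      <arr name="referenceid">
        <str>175820</str>
        <str>163786</str>
      </arr>
      <date name="created">2011-02-15T14:57:54.766Z</date>
    </doc>
  </result>
</response>
"""

converted = response_to_add(SAMPLE_RESPONSE)
print(converted)
```

    Keep in mind that a query response contains only stored fields, so a document rebuilt this way is complete only if every field you care about is stored.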

  • Ford marbles

    I found these marvelous renderings of Ford circles on Flickr. I can’t help but share them here.

  • Modulo who?

    When a programmer and a mathematician talk about modulus or modulo, there is often confusion about what the term means. For a programmer, modulo is an operator that finds the remainder of the division of one number by another, e.g. 5 mod 2 = 1. For a mathematician, modulo denotes a congruence relation between two numbers: a and b are said to be congruent modulo n, written a ≡ b (mod n), if their difference a − b is an integer multiple of n.

    These two definitions are not equivalent. The former is a special case of the latter: if b mod n = a then a ≡ b (mod n). The converse is not true in general. For example, 5 mod 2 = 1, and indeed 1 ≡ 5 (mod 2) because 1 − 5 = −4 is an integer multiple of 2. Likewise 5 ≡ 1 (mod 2), because 5 − 1 = 4 is evenly divisible by 2, but 1 mod 2 = 1, not 5.
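
    The difference is easy to check in code. The following Python sketch uses two helper functions, remainder and congruent, which are names made up for this illustration: the first is the programmer’s operator, the second the mathematician’s relation.

```python
def remainder(b, n):
    """Programmer's modulo: the remainder of dividing b by n."""
    return b % n


def congruent(a, b, n):
    """Mathematician's modulo: True if a - b is an integer multiple of n."""
    return (a - b) % n == 0


print(remainder(5, 2))        # 1
print(congruent(1, 5, 2))     # True: 1 - 5 = -4 is a multiple of 2
print(congruent(5, 1, 2))     # True: the relation is symmetric
print(remainder(1, 2) == 5)   # False: the operator is not symmetric
```

    The congruence holds in both directions, while the operator returns a single representative: with a positive n, Python’s % always yields a value between 0 and n − 1.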

    The biggest confusion happens when a programmer and a mathematician start arguing about Gauss’ famous Golden Theorem, where both definitions of modulus can be used.