How and Why We Switched from Erlang to Python

A core component of Mixpanel is the server that sits at http://api.mixpanel.com. This server is the entry point for all data that comes into the system: it’s hit every time an event is sent from a browser, phone, or backend server. Since it handles traffic from all of our customers’ customers, it must reliably handle thousands of requests per second. It implements the interface described in our API specification; essentially, it decodes each request, cleans it up, and puts it on a queue for further processing.
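To make that pipeline concrete, here is a minimal Python sketch of the decode, clean-up, and enqueue steps. It is not the production code. It assumes the event arrives as base64-encoded JSON in a data query-string parameter, it uses a queue.Queue as a stand-in for the real backend queue, and both the "clean-up" rule (require an event field) and the name process_request are hypothetical.

```python
import base64
import binascii
import json
import queue
from urllib.parse import parse_qs

# Hypothetical stand-in for the real backend queue.
event_queue = queue.Queue()


def process_request(query_string, out_queue=event_queue):
    """Decode one tracking request and enqueue it.

    Assumes the event is base64-encoded JSON in a ``data`` parameter.
    Returns a tuple (ok, error_message).
    """
    params = parse_qs(query_string)
    if "data" not in params:
        return False, "missing 'data' parameter"
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(params["data"][0], validate=True)
    except (binascii.Error, ValueError):
        return False, "invalid base64 encoding"
    try:
        event = json.loads(raw)
    except ValueError:  # includes JSONDecodeError and UnicodeDecodeError
        return False, "invalid JSON"
    # Hypothetical clean-up rule: require an object with an event name.
    if not isinstance(event, dict) or "event" not in event:
        return False, "missing 'event' field"
    event.setdefault("properties", {})
    out_queue.put(event)
    return True, None
```

Returning an error string alongside the boolean keeps the plain success/failure reply cheap while leaving room for more descriptive errors later.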

Because of these performance requirements, we originally wrote the server in Erlang (with MochiWeb) two years ago. After two years of iteration, the code has become difficult to maintain. No one on our team is an Erlang expert, and we have had trouble debugging downtime and performance problems. So we decided to rewrite it in Python, the de facto language at Mixpanel.

Given how crucial this service is to our product, you can imagine my surprise when I found out that this would be my first project as an intern on the backend team. I really enjoy working on scaling problems, and the cool thing about a startup like Mixpanel is that I got to dive into one immediately. Our backend architecture is modular, so as long as my service implemented the specification, I didn’t have to worry about ramping up on other Mixpanel infrastructure.

Libraries and Tradeoffs

The first thing to think about is which networking library and framework to use. This server needs to scale, which for Python means using asynchronous I/O. At Mixpanel, we use eventlet pretty widely [1], so I decided to stick with that. Furthermore, since the API server handles and responds with some unusual headers, I decided to use eventlet’s raw WSGI library rather than a higher-level framework.

Eventlet is actually built to emulate techniques pioneered by Erlang: its “green threads” are quite similar to Erlang’s “actors.” The main difference is that eventlet can’t change the Python runtime, whereas actors are built into Erlang at the language level, so the Erlang VM can do things like schedule actors across kernel threads (one per core) and preempt them. Green threads, by contrast, all run inside a single OS thread and only switch when one of them blocks on I/O. We get around this by launching one API server per core and load balancing across them with nginx.
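A minimal sketch of that layout, assuming eventlet is installed: a bare WSGI callable served by eventlet.wsgi.server (which handles each connection in its own green thread), plus helpers that start one process per core on consecutive ports for nginx to balance across. The app body, the port numbers, and the names api_app, serve, worker_ports, and launch_workers are hypothetical, and real request decoding is omitted.

```python
import os
import queue
from urllib.parse import parse_qs

# Hypothetical in-process stand-in for the backend queue.
pending = queue.Queue()


def api_app(environ, start_response):
    """Minimal WSGI app: accept a ``data`` parameter and reply '1' or '0'."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    data = params.get("data", [None])[0]
    if data is None:
        body = b"0"
    else:
        pending.put(data)  # real decoding and clean-up omitted in this sketch
        body = b"1"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port, host="127.0.0.1"):
    """Run one single-core server under eventlet's raw WSGI server.

    eventlet is imported lazily so this module imports without it.
    """
    import eventlet
    from eventlet import wsgi
    wsgi.server(eventlet.listen((host, port)), api_app)


def worker_ports(base_port=8001, cores=None):
    """One port per core; nginx would list these in an upstream block."""
    cores = cores or os.cpu_count() or 1
    return [base_port + i for i in range(cores)]


def launch_workers(base_port=8001):
    """Start one server process per core and wait for them to exit."""
    from multiprocessing import Process
    procs = [Process(target=serve, args=(p,)) for p in worker_ports(base_port)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Separate processes sidestep the single-OS-thread limit of green threads: each process gets its own interpreter and core, and nginx spreads incoming connections across them.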

Another consideration is the JSON library. Erlang has historically been weak at string processing, and string processing is very often the limiting factor in networked systems, because data has to be serialized every time it is transferred. There’s not much documentation online about mochijson’s performance, but on the Python side I knew that simplejson, whose core is implemented as a C extension, performs roughly 10x better than the default json library.
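A common way to act on that choice is to prefer simplejson and fall back to the standard library when it isn't installed. The sketch below does that and adds a tiny timeit helper for comparing libraries on your own machine; the sample event and the helper names are hypothetical. Note that newer CPython releases ship a C accelerator for the standard json module too, so the gap may be much smaller than the ratio quoted above; measure rather than assume.

```python
import timeit

try:
    # simplejson ships optional C speedups.
    import simplejson as json
except ImportError:
    # Fall back to the standard library.
    import json

# Hypothetical sample event, roughly the shape of a tracking payload.
SAMPLE = '{"event": "page view", "properties": {"token": "abc", "time": 1312345678}}'


def decode(payload):
    """Decode one JSON event with whichever library was imported."""
    return json.loads(payload)


def time_decode(n=10000):
    """Return seconds taken to decode SAMPLE n times."""
    return timeit.timeit(lambda: decode(SAMPLE), number=n)
```

Running time_decode() once with simplejson installed and once without gives a like-for-like comparison on the same payload.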

Finally, we use a few stateful, global data structures to track incoming requests and funnel them off to the right backend queues. In Erlang, the right way to do this is to spawn a separate set of actors to manage each data structure and pass messages to them to save and retrieve data. Our code was not set up this way at all, and it was clearly hobbled by a haphazard implementation in a functional style. It was quick and easy for me to implement a clear, easy-to-use analog in Python, and I was able to improve some of the backing algorithms in the process. It might not sound “cool” to implement a data structure this way, but it let me provide some important operations in constant time, along with other optimizations, where the Erlang version had been cripplingly slow.
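The post doesn't show the structure itself, so the following is only a hypothetical illustration of the idea: a router that maps a request key (assumed here to be a project token) to a backend queue name and buffers requests per queue. Built from a dict and collections.deque, its route, pop, and count operations are all O(1) on average. The class name RequestRouter and its methods are invented for this sketch.

```python
from collections import defaultdict, deque


class RequestRouter:
    """Hypothetical global structure that tracks requests per backend queue.

    Every operation is O(1) on average: dict lookups plus deque
    appends and pops at the ends.
    """

    def __init__(self, routes, default_queue="default"):
        self._routes = dict(routes)         # key (e.g. project token) -> queue name
        self._default = default_queue
        self._pending = defaultdict(deque)  # queue name -> buffered requests
        self._total = 0

    def route(self, key, request):
        """Buffer a request on the queue its key maps to; return the queue name."""
        name = self._routes.get(key, self._default)
        self._pending[name].append(request)
        self._total += 1
        return name

    def pop(self, name):
        """Remove and return the oldest buffered request for a queue, or None."""
        q = self._pending.get(name)
        if not q:
            return None
        self._total -= 1
        return q.popleft()

    def pending(self, name=None):
        """Count buffered requests for one queue, or across all queues."""
        if name is None:
            return self._total
        return len(self._pending.get(name, ()))
```

Because eventlet green threads only switch at blocking I/O, and none of these methods perform I/O, each call runs to completion without another green thread interleaving, so no lock is needed within a single server process.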

Performance

Of course, a major concern with Python is performance. We receive a few thousand requests per second, and it’s important that the API server can handle spikes in traffic. As a startup, we also need to be careful about our server costs. We could buy several servers to power the API and scale horizontally, but if we can write a fast enough server to begin with, that would be a waste of money. The optimizations I made were relatively minor; most of my speed came from leveraging the right Python libraries. The community is extremely active, so many of my questions had already been answered on Stack Overflow and in eventlet’s documentation.

Benchmarking

The benchmark setup was simple. I ran the API server (which used only a single core) and Kestrel (our queue) on the same machine, with 4 GB of RAM and a 2000 MHz AMD Opteron processor. The 4 client processes ran together on a quad-core machine with 1 GB of RAM and the same type of CPU. Each client ran 25 eventlet green threads, which replayed randomly chosen requests from access logs I had pre-processed. Everything ran on Rackspace. Here are the results:

[Figures: Requests Per Second vs. Time; Round Trip Latency]

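As you can see, we maintained roughly 1,000 to 1,200 requests per second (on a single core), with a latency almost always under 100 milliseconds. We plan to deploy the API on a quad-core machine with 4 GB of RAM; assuming throughput scales roughly linearly with one server process per core, that works out to about 4,000 to 4,800 requests per second, which should be fast enough for our current load of a few thousand.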
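The per-second throughput and round-trip latency numbers above come from raw client timings. The original harness isn't shown here; as a minimal sketch, assuming each client green thread records a (completion time, round-trip latency) pair per request, the samples could be reduced like this. The helper name summarize and the sample format are hypothetical.

```python
from collections import Counter


def summarize(samples, threshold_ms=100.0):
    """Summarize benchmark samples.

    ``samples`` is a list of (completion_time_s, latency_s) pairs.
    Returns ({second: requests completed in that second},
             fraction of requests faster than ``threshold_ms``).
    """
    if not samples:
        return {}, 0.0
    start = min(t for t, _ in samples)
    # Bucket completions into whole seconds since the first sample.
    per_second = Counter(int(t - start) for t, _ in samples)
    fast = sum(1 for _, lat in samples if lat * 1000.0 < threshold_ms)
    return dict(sorted(per_second.items())), fast / len(samples)
```

Plotting the first value over time gives a requests-per-second chart, and the second value is a quick check on a claim like "almost always under 100 milliseconds."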

A Little Treat

If you read our API spec, you’ll notice that we return ‘0’ on failure and ‘1’ on success. The failure could be anything from invalid base-64 encoding to an error on our end. Developers writing their own clients have complained about this, and we listened. With the rewrite you can set “verbose=1” in the query string and we’ll respond with a helpful error message if your request fails. We’ll post again when this feature is fully live.
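A minimal sketch of how the response body could depend on verbose=1. Only the bare '1'/'0' default comes from the spec described above; the JSON shape used for verbose replies ({"status": 0, "error": ...}) and the name build_response are assumptions for illustration.

```python
import json
from urllib.parse import parse_qs


def build_response(query_string, ok, error=None):
    """Return the response body for a tracking request.

    Without verbose=1: '1' on success, '0' on failure.
    With verbose=1: an assumed JSON object carrying an error message.
    """
    verbose = parse_qs(query_string).get("verbose", ["0"])[0] == "1"
    if not verbose:
        return "1" if ok else "0"
    # Assumed verbose format: a small JSON object instead of a bare digit.
    body = {"status": 1 if ok else 0}
    if not ok:
        body["error"] = error or "unknown error"
    return json.dumps(body)
```

Keeping the default reply unchanged means existing clients that only check for '1' or '0' keep working, while new clients can opt in to the extra detail.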

Reflections

I’ve written a few servers like this as personal projects, but I’d never had the opportunity to throw one against real scale. With actual traffic, the challenges are very different, and unlike the prototypes I’ve written before, the code has to work. Our customers rely on the API server to store their requests reliably, and we need to recover from every possible type of failure. The biggest challenge for me was pushing the server from working 99.9% of the time to 99.99% of the time, because those last few bugs were especially hard to find.

I’ve learned a lot about how to scale a real service in the couple of weeks I’ve been here. I went into Mixpanel thinking Erlang was a really cool and fast language, but after spending a significant amount of time sorting through a real implementation, I understand how important code clarity and maintainability are. Scalability is as much about being able to think through your code as it is about systems-level optimizations.

If you’re itching to learn more about real-world scaling problems, then you should work here. Mixpanel offers the unique combination of a fun startup environment with the scaling challenges that companies like Facebook and Twitter face every day.

[1] The other popular one is gevent. Avery wrote about some of the tradeoffs earlier: http://code.mixpanel.com/2010/10/29/gevent-the-good-the-bad-the-ugly/

29 thoughts on “How and Why We Switched from Erlang to Python”

  1. If you’re concerned about performance, have you considered deploying your Python code with PyPy rather than CPython? It can run most programs faster, some rather significantly faster, and it’s quite reliable as well. We’re running it over at twistedmatrix.com for our primary DNS and we’ve had no problems.

  2. @Glyph: While there is a stackless version of PyPy (and therefore I’m guessing eventlet either works as-is or could be ported without too much trouble), it does not currently work with the JIT compiler. I’m afraid this means you’d be losing most if not all of the benefits of using PyPy (especially if mixpanel uses libraries with excellent C extensions like psycopg2, simplejson, or Thrift).

  3. Name-dropping Erlang string performance without doing benchmarks is kind of irresponsible. Besides, you should be using mochijson2, which handles strings as binaries and is a lot faster.

  4. Your comments on the need for code clarity and maintainability extend to any language. The problem you had seems to be, simply stated, that you didn’t have access to good Erlang programmers.

  5. “Finally, we use a few stateful, global data structures to track incoming requests and funnel them off to the right backend queues. In Erlang, the right way to do this is to spawn off a separate set of actors to manage each data structure and message pass with them to save and retrieve data.”

    Nope, that’s not the right way. The way you were doing it ended up making all calls sequential and bound to single processes that could lose state. That’s not right.

    The best way to do it would have been to use ETS tables (which can be optimized either for parallel reads or writes), which also allow destructive updates, in order to have the best performance and memory usage possible. Note that you could then have had a memory-only Mnesia table (adding transactions, sharding and distribution on top of ETS) to do it.

    As for string performances, I’m wondering if you used lists-as-strings, binary strings or io-lists to do your things. This can have significant impact in performance and memory use.

    Then again, if you had a bunch of Python and no Erlang experts, I can’t really say anything truly convincing against a language switch. Go for what your team feels good with.

  6. From the benchmarks I’ve seen it seems like gevent typically outperforms eventlet. Given the similarities between the two I’m curious why eventlet over gevent?

  7. Why We Switched from Erlang to Python?

    No one on our team is an Erlang expert, and we have had trouble debugging downtime and performance problems.

  8. Hey Ankur,

    I’m disappointed that you guys didn’t stick with Erlang. I think you’d be VERY pleased with its performance if you had stuck with it. I’ve only recently been introduced to Erlang and have been using it over the last year professionally. We have been able to solve so many problems with Erlang that would have been nearly impossible to solve with any other language. Performance was one of the biggest things. Without going into proprietary details, at work, we are doing on one Mac Mini with Erlang what takes our competition 78 Windows servers to do.

    Before I go any farther, congratulations on inspiring me to write code in my free time. That doesn’t happen often. 🙂

    I read your problem description and was curious how Erlang, written by someone who has some professional experience with it in a performance-oriented world, would do in comparison to the Python results. I saw in your requirements that the intention was to hand the JSON object off to another process through RabbitMQ, but I didn’t understand why, since it seemed like all you wanted to do was log the event. So, I just wrote it to MongoDB instead. From the testing I’ve done at work, it’s faster to offload the message to Rabbit than it is to write it to Mongo. So, if anything, writing it to Mongo made my results worse.

    I ran the sample data you have in a single “thread” (called a light-weight process in Erlang). I used mochiweb to receive the request, parse out the “data” value from the query string, base64 decode it, JSON decode it, and write the JSON to Mongo DB. (BTW, doing the base64 and JSON decoding together took 63 microseconds.) With a single process, mind you, it was handling 4,863 messages per second for the 1,000,000 messages I ran. In that test, I was sending, receiving, processing, writing to the DB, responding, and receiving the response in the same Erlang VM on my laptop.

    I then wondered how much of that time was taken up by writing to the DB, so I removed that part of the test and re-ran it. That got me 6,700 messages per second.

    Needless to say, at over 5 times the performance of Python, it might be worth your company’s money to send you to an Erlang class or for you to invest the time in learning it yourself. I know it’s super weird when you first get into it, but you’ll definitely grow to love it for problems like this. Like I said, I’m writing code in my free time, and that says a lot! 🙂 Anyway, I just had to stretch Erlang’s legs in this context and see how it would do.

    Jeremy Hood
    jdhood1@gmail.com

  9. Few points.

    #1. Having no expert on staff in the language that major application X is written in is absolutely silly, and it basically nullifies all your other points about maintainability, speed and downtime. The same would happen in Ruby or C++ or YYY. Erlang is generally considered exceptionally maintainable due to its low number of lines of code, functional idioms and tons of real-world reuse.

    #2. The only major (valid?) point you had on performance seemed to be based around a single implementation (mochijson) of a JSON library and more generally strings. I would advise you to look at Jiffy (NIF C based JSON encoder / decoder) and binaries (general Erlang language feature). Most of the time you don’t need to muck with strings too much, just pass ’em around or compare them to something.

    #3. The other performance issue, as you mentioned, was a haphazard implementation done by non-experts. I think you can hardly hold this against Erlang, any more than you could hold a horrible C++ implementation of X against C++.

    … almost seems amazing that you ran a production app in a language you don’t have experts in for two years at high scale… I would give the language bonus points for that, triple-dog-bonus-points.

    Erlang as a functional language can seem daunting at first, but it is exceptionally maintainable at very large scale; it just has to have a proper architecture and experts on staff.

  10. So… How does this compare to the Erlang code? Did you benchmark it? Did you move global O(1) state into ets tables in Erlang?

    All I can reliably understand from this article is “we didn’t understand what we were doing in language A so we rewrote it in language B” which is sadly not very helpful outside your own particular environment.

    TAKE: I have developed and maintain servers in both Erlang and Python! I like them both, for different things.

  11. Pingback: Sysadmin Sunday #42 « Boxed Ice Blog

  12. Do you have similar benchmarks for the Erlang based API? You’ve made some specific claims about performance in Erlang, but no data 🙁

    Erlang certainly faces performance issues, but Python does as well. If you’re using a C-based JSON library in Python, you’d want to use a C-based JSON library in Erlang. To be fair and balanced.

    And boy I too have seen some messy, hard to maintain code in functional languages.

    Strangely though I’ve also seen messy, hard to maintain code in imperative languages.

  13. Pingback: Länksprutning – 8 August 2011 – Månhus

  14. Interesting post. Could MixPanel open-source this Erlang code, since they are no longer using it? Except for the understandable criterion of sticking with languages and platforms that the staff know well, there may be nothing wrong with Erlang in this case that couldn’t be fixed with better structure, better inline documentation, and the use of a high-performance C-coded Erlang NIF for JSON parsing, like Jiffy – https://github.com/davisp/jiffy

    Native JSON handling has been an Erlang Extension Proposal for quite a while now; who knows when it will happen, though.

  15. Pingback: » links for 2011-08-08 (Dhananjay Nene)

  16. Hi,
    Could you post the stats for running the same benchmark on the Erlang server? If possible, please also share some details on the issues you faced with Erlang that Python solved. Theoretically Erlang should scale better, but I don’t have enough practical experience with Erlang to know how this translates to a real-world scenario.

  17. Pingback: How and Why Mixpanel Switched from Erlang to Python | بهترین ها|behtarinha|مطالب جالب|دانلود نرم افزار|دانلود بازی|اس ام اس|جک|خفن|فال|همه چیز

  18. Pingback: How and Why Mixpanel Switched from Erlang to Python

  19. Pingback: How and Why Mixpanel Switched from Erlang to Python | Kaiusee.com

  20. Pingback: How and Why Mixpanel Switched from Erlang to Python | TechDiem.com

  21. Pingback: Seeking Scalablity Part 1: Resources - webJABr
