Lately I’ve been doing a lot of research that involves dissecting and analyzing network packets. Granted, we all know there’s one true answer for that – and that is Wireshark. However, in my case, I seriously want to consider alternatives:
- The majority of protocols I work with (beyond Ethernet, IPv4 and TCP/UDP) are binary and proprietary – obviously, Wireshark knows nothing about them. Writing your own dissectors for Wireshark is an option, but it’s not for the faint of heart. Your best bet would be coding them in Lua, but even there you’ll end up writing pretty cryptic code that deals with lots of Wireshark internals.
- Wireshark’s functionality is somewhat lacking for me. Sometimes I just want to access protocol fields programmatically from a normal, popular programming language – ideally Python, as 99% of our software is in Python.
- I need to process tons of traffic (think gigabytes), so I’d like it to be as fast as possible. Wireshark dissectors written in Lua are slow and, what’s even worse, very memory-hungry.
So, it boils down to the wonderful world of packet dissector frameworks for Python. Thankfully, Python’s vibrant ecosystem offers quite a few of them. So, let’s try them out – it’s not like anyone wants to re-implement and maintain all that stuff from scratch.
Before choosing a tool for the job, I’ve decided to run a few benchmarks on them to test their raw speed.
My benchmark consists of parsing Ethernet frames (and all inner layers – IPv4, TCP, etc). For the sake of simplicity and consistency, I’ll load a single Ethernet frame into memory from a file, parse it a zillion times, and measure the packets-per-second parse rate. To make it fair (in case some parser uses lazy parsing), I’ll access one critical field once: the source IPv4 address. This way the benchmark won’t be bound by I/O and will measure raw packet processing speed.
Thus, the overall core benchmark code looks like this:
```python
from timeit import default_timer as timer

# Load sample Ethernet frame to be used for parsing
with open("ethernet_frame.bin", "rb") as fh:
    buf = fh.read()

TIMES = 10000

t1 = timer()
for _ in range(TIMES):
    # parse Ethernet frame here
    # access source IPv4 address field here
    pass
t2 = timer()

pps = TIMES / (t2 - t1)
print("pps = %f" % (pps))
```
I’ve deliberately chosen to test only parser code written by the framework maintainers themselves, as I trust them to write the most optimal, best-written code for their particular framework – better than I could hope to achieve in the foreseeable future.
All tests were done on the same hardware and OS, so the exact specs don’t matter much, but I’ll mention them anyway: a ThinkPad T460 laptop with an i5-6200U and 16 GB of RAM, running Ubuntu 16.04 LTS. My production environment is close to this one, mostly consisting of Amazon EC2 C3/C4 large/xlarge instances running the same 16.04 LTS.
Let’s go 🙂
Scapy
Scapy is one of the oldest and best-known network packet libraries for Python (developed since ~2002). Its functionality stretches a bit beyond what I need: it can also create packets and send, receive and capture them over the ’net, but right now I’m interested in one particular part: packet dissection.
Installing Scapy is a breeze:

```
pip install scapy
```

does the trick. If you want the command-line tools, you’ll need a little extra, but in my case I’m totally OK with the libraries only.
The Scapy parsing code we’re going to benchmark is very simple:
```python
from scapy.all import Ether

# ...
pkt = Ether(buf)
dummy = pkt.getlayer(1).src
```
Running it on my notebook yields:
- 2,763 pps on average in Python 2.7
- 3,337 pps on average in Python 3.5.1 (kudos, that’s a 20% increase!)
Construct
Construct is also a well-known and mature Pythonic framework. Most people think of it as “struct-on-steroids”, and that’s partially true. However, Construct offers tons more features, like conditional parsing, repeated fields, tunneling, lazy parsing, etc.
As soon as you’ve got ipstack.py, the code to benchmark is simple:
```python
from ipstack import *

# ...
pkt = ip_stack.parse(buf)
dummy = pkt.next.header.source
```
Benchmark results are kinda disappointing:
- 1,486 pps on average in Python 2.7
- 1,420 pps on average in Python 3.5.1
The Python 3 performance drop is really odd. When upgrading from 2 to 3 I mostly expect a 10–20% performance gain, but in this case there’s even a slight loss. I’ve triple-checked my benchmarking process and re-run it a dozen times, but that’s it – I consistently get slightly lower results on Python 3.
On the upside, I’d like to praise Construct’s documentation. It’s concise, well written, and overall a good example of what decent documentation should look like.
Hachoir
Hachoir is the French word for a meat mincer, and it was written by French Red Hat CPython engineers. It offers an ambitious introduction and boasts a huge library of ready-made file format and network packet parsers. There’s a large set of tools around Hachoir: hachoir-metadata, hachoir-urwid, hachoir-grep, hachoir-strip, etc. Also, given that it’s written by CPython engineers, I expected top-notch performance.
However, reality turns out to be much crueler than the introduction suggests. The documentation is, well, lacking, to put it mildly. The readthedocs.io page might seem like a huge user manual, but in reality there’s only about a dozen paragraphs of text there, and that’s it. Most of the docs cover the command-line end-user tools, not the framework for fellow developers.
Also, Hachoir makes a hard distinction between “parsers” and “fields”, so you can’t just easily call an inner-layer parser inside a file format parser. Hachoir’s developers supply a .pcap file parser (called “hachoir.parser.network.tcpdump”), so I had to modify the core benchmark to accommodate that:
```python
stream = FileInputStream("%s/pcap_http.dat" % DATA_DIR)
r = TcpdumpFile(stream)
for i in range(TIMES):
    pkt = r['packet[%d]' % (i)]
    dummy = pkt['ipv4/src']
```
The API is pretty ugly and text-based. You have to use the [] operator with an internal Hachoir path-addressing language to reach particular fields in the object tree. I haven’t found a way to do that without messy string construction.
Surprisingly, performance is pretty good, on par with Scapy: I get 2,794 packets per second on average using Python 3. Looks like Red Hat’s CPython engineers totally know their trade 🙂
Unfortunately (or fortunately?), Hachoir 3a doesn’t seem to work on Python 2, failing with a Unicode error:
```
  File "lib/python2.7/site-packages/hachoir/stream/__init__.py", line 7, in <module>
    from hachoir.stream.input_helper import FileInputStream, guessStreamCharset  # noqa
  File "lib/python2.7/site-packages/hachoir/stream/input_helper.py", line 1, in <module>
    from hachoir.core.i18n import guessBytesCharset
  File "lib/python2.7/site-packages/hachoir/core/i18n.py", line 88, in <module>
    (set("©®éêè\xE0ç".encode("ISO-8859-1")), "ISO-8859-1"),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
```
Kaitai Struct
Kaitai Struct is the new kid on the block that I keep hearing about lately. It’s a mysterious new project built by some Russian and Hungarian developers that aims to be a universal binary parsing framework for everything. Here’s my chance to test it.
From the outside, it’s similar to the frameworks I’ve tested so far, but it sports one huge difference. In fact, it’s not a Python framework (although it includes a bit of Python runtime code, that part is relatively small), but a distinct domain-specific language with its own compiler that takes a packet specification as input and gives you Python parser code as output. Actually, it can output tons of other languages one might find helpful: C++, Java, JavaScript, Ruby, Perl, PHP, C#. This gives you a unique edge: you can develop a parser in the Kaitai Struct language, build a prototype in, say, Python, and if you find yourself in need of better performance, switch to C++ relatively easily, retaining the same protocol specifications you’ve spent so much time developing. That’s neat!
The learning curve is somewhat steep: first of all, you need to download and install the compiler to actually do any .ksy → .py compilation. The project is still new, so there’s no chance you’ll get it with a simple `apt-get install` from the Ubuntu repos; you’ll have to add Kaitai Struct’s own repository and install the .deb from there. That, in turn, requires you to pull their auth key first. Not exactly rocket science, so, generally, just stick to the installation instructions in their “Download” section and you’ll be fine.
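For reference, a .ksy spec is just a YAML file describing the format declaratively. A minimal, hypothetical sketch (not the official ethernet_frame.ksy – invented here just to show the shape of the language) might look like:

```yaml
meta:
  id: my_frame
  endian: be
seq:
  - id: dst_mac
    size: 6
  - id: src_mac
    size: 6
  - id: ether_type
    type: u2
```

The compiler turns each `seq` entry into an attribute on the generated parser class.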
Then, you need to download the relevant .ksy and run the compiler. That’s easy:
```
ksc -t python pcap.ksy
```
Voilà, you’ve got pcap.py. But then, to be able to run it, you’ll also need the Kaitai Struct runtime. That one is installed as a regular Python library:
```
pip install kaitaistruct
```
Fortunately, if you’re smart, you can opt to skip all that nuisance (well, except for the runtime) and just go to the [Kaitai Web IDE project](https://kt.pe/kaitai_struct_webide/). That’s yet another piece of awesomeness I’ve encountered:
It’s a web-based application (and it runs purely on the client, so no server needed), which sports a compiler and a visualizer that applies the compilation result to the binary dump you drop into it right away! It’s very similar to what advanced commercial hex editors like 010 Editor offer you, but:
- it’s free
- it’s web-based
- it can generate parsers in the language of one’s choice, e.g. Python!
Using Kaitai Struct’s generated library is also not super straightforward and reminded me slightly of Java’s IO libraries:
```python
from ethernet_frame import *

# first, we wrap our byte array into an IO stream using BytesIO
io = BytesIO(buf)

# next, we wrap regular pythonic IO into special KaitaiStream IO
ksio = KaitaiStream(io)

# finally, we run the parser and get the parsed packet
pkt = EthernetFrame(ksio)

# accessing the field is more or less the same as with competing frameworks
dummy = pkt.ipv4_body.src_ip_addr
```
Kaitai Struct’s authors declare that their product is “fast” because “it’s a compiler”, but it totally blew my mind when I saw the actual result:
- 32,816 pps on average in Python 2.7
- 31,925 pps on average in Python 3.5.1
Wow. Just wow. That’s more than a 10x improvement over the fastest competitor so far, i.e. Scapy. And it’s still 100% pure Python code: no native modules, no C compilation, no other tricks up the sleeve. Also note that, again, Python 3 performance is slightly lower, to my disappointment (as I’m a huge proponent of moving to Python 3 everywhere I can).
To be fair, I’d like to highlight a few downsides of KS:
- Relatively complex multi-step installation (install ksc + run ksc + install runtime).
- It’s a very new product, still at version 0.x.
- Documentation is not as mature as Scapy’s or excellent Construct manual (but still better than Hachoir’s).
- There is no packet generation support at all (although it seems to be planned), so it’s parsing-only for now.
Conclusion
TL;DR: Kaitai Struct beats every other Python packet dissection framework by roughly an order of magnitude. Second place goes to good ol’ Scapy and the mysterious Hachoir à la française. However, while Hachoir’s performance is not the worst, using it wasn’t a pleasant experience: I got tangled in poor documentation and a weird API. Construct was the slowest of them all and, unfortunately, I can’t recommend it.
My choice is clear: Kaitai Struct is definitely the way to go.
A few ideas for future analysis: I definitely want to compile the same network packet .ksy for C++ and test Kaitai Struct’s C++ output to see how much improvement I can get by switching to a closer-to-the-metal language.
Excellent article! I’d be curious to see a comparison of Kaitai vs Construct in Java: https://github.com/Sirtrack/construct/releases
Thanks! Unfortunately, I don’t really use Java, so probably someone else might want to take care of that. Besides, the project you’ve linked to seems kinda dead: the last commit was ~2 years ago.
You should test vstruct also if you have the time.
Good idea, thanks!
Great article. By any chance, do you know if all the parsers go to the same depth into the packets? If Kaitai Struct only parses the upper-level structures of the IP packet while the other parsers try to go much deeper (TCP and app layer), that could explain the huge difference in performance.
I used a packet that contains only layers 2-3-4 for testing, so you can’t go much deeper than layer 4 (TCP). To make sure the benchmark shows something resembling real-life performance, I accessed a field to trigger lazy parsing, if the framework had some sort of lazy parsing.
I’m only learning Kaitai right now, but as far as I can tell, all its networking packet parsers are eager. You need to use `instances` in Kaitai to do lazy parsing.
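For the curious, my current understanding (hedged – I’m still learning the language, and the field names below are invented) is that an `instances` entry is only parsed when its attribute is first accessed, something along these lines:

```yaml
# Hypothetical sketch: `body` is parsed lazily, on first access,
# from a position given by an `ofs_body` field declared in `seq`
instances:
  body:
    pos: ofs_body
    type: u2
```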
Hey,
Thanks for your analysis. However, you only compare the speed of each framework and don’t take into account several other aspects, like:
- diversity of supported protocols (Scapy is, I think, the king of them all)
- functionality for packet manipulation (packet crafting, automata, etc.)
- ways to handle complex protocols (I’ve never tested Kaitai, but Hachoir and Scapy can handle complex structures; they have their limits, of course)
Cheers!
Very nice article. Thank you for sharing it!
Great article, it made me want to try Kaitai but keep Scapy close just in case :). It would be nice to see whether these two work with PyPy, and how much faster they are on it. Just remember to warm up the JIT before comparing results, as your benchmark currently doesn’t throw away the first iterations (because that would make no difference in CPython).
They should work; there’s nothing magical inside. Kaitai’s generated files are actually just tons of calls to the Python stdlib’s struct module.
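To illustrate (a hand-written sketch, not actual generated code): extracting the IPv4 source address boils down to a fixed-offset struct.unpack call, assuming a plain untagged Ethernet frame.

```python
import struct

def parse_ipv4_src(frame):
    # 14-byte Ethernet header, then the IPv4 source address starts
    # at offset 12 within the IPv4 header (no VLAN tags assumed)
    return "%d.%d.%d.%d" % struct.unpack_from("!4B", frame, 14 + 12)

# toy frame: zeroed Ethernet + IPv4 headers with the source set to 192.168.0.1
frame = bytes(26) + bytes([192, 168, 0, 1]) + bytes(4)
print(parse_ipv4_src(frame))  # 192.168.0.1
```

Being a chain of such unpack calls, there’s nothing in the generated code that should upset PyPy’s JIT.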
Unfortunately, I don’t have any good experience with Java. Could you recommend me something to read about PyPy and JVM benchmarking? It actually might be super interesting, as we can also have Kaitai generate a native Java parser and thus compare PyPy vs native Java performance.
PyPy is an alternative Python interpreter with a JIT. It has no relation to Java or the JVM. You can install it on Ubuntu using:
```
wget https://bitbucket.org/squeaky/portable-pypy/downloads/pypy-5.6-linux_x86_64-portable.tar.bz2
tar jxf pypy-5.6-linux_x86_64-portable.tar.bz2
./pypy-5.6-linux_x86_64-portable/bin/virtualenv-pypy env
./env/bin/pip install scapy
./env/bin/pypy benchmark.py
```
I tried to reproduce your benchmark, but I couldn’t produce an ethernet_frame.bin that wouldn’t give me this error in Scapy:
```
Traceback (most recent call last):
  File "scapy_benchmark.py", line 13, in <module>
    dummy = pkt.getlayer(1).src
  File "/home/ubuntu/env/site-packages/scapy/packet.py", line 192, in __getattr__
    fld, v = self.getfield_and_val(attr)
  File "/home/ubuntu/env/site-packages/scapy/packet.py", line 189, in getfield_and_val
    return self.payload.getfield_and_val(attr)
  File "/home/ubuntu/env/site-packages/scapy/packet.py", line 1125, in getfield_and_val
    raise AttributeError(attr)
AttributeError: src
```
(both in PyPy and Python 2.7)
A simple (and maybe ineffective) way to warm up the JIT would be:
```python
from scapy.all import Ether
from timeit import default_timer as timer

def loop():
    for _ in range(TIMES):
        pkt = Ether(buf)
        dummy = pkt.getlayer(1).src

# Load sample Ethernet frame to be used for parsing
with open("ethernet_frame.bin", "rb") as fh:
    buf = fh.read()

TIMES = 100000

# run it once for warmup
loop()

t1 = timer()
loop()
t2 = timer()

pps = TIMES / (t2 - t1)
print("pps = %f" % (pps))
```
Thanks for a great explanation, will try it soon!
Very nice article…. Thanks for sharing.
Thank you for the article!
I worked extensively with Construct and Scapy. The former is, as you said, more like struct-on-steroids (binary-to-Python conversion), the latter is oriented towards network packet processing (with layers as the prominent concept).
Scapy is slow. It’s fine for offline analysis most of the time, but not suitable for real-time use (at all). On the other hand, adding new protocols is so easy…
Construct was only a bit faster, IMO. I don’t have any benchmarks for that, though.
But I’m a little disappointed not to see these two here:
1. dpkt (https://pypi.python.org/pypi/dpkt) – it’s quite well known and should be quite fast
2. tins (libtins – http://libtins.github.io/) – it promised to be blazingly fast, but that’s C++ code, and the Python bridge was problematic or not functional at the time I checked (a year or so ago)
I see this as a topic crying for a much more elaborate analysis, including a feature-set comparison, benchmarks, custom format definition, ease of use, documentation, …
Best regards,
Robert