How we rolled out one of the largest Python 3 migrations ever

Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew’s very first lines of code for Dropbox were written in Python for Windows using venerable libraries such as pywin32.

Though we’ve relied on Python 2 for many years (most recently, we used Python 2.7), we began moving to Python 3 back in 2015. This transition is now complete: If you’re using Dropbox today, the application is powered by a Dropbox-customized variant of Python 3.5. This post is the first in a series that explores how we planned, executed, and rolled out one of the largest Python 3 migrations ever.

Why Python 3?

Python 3 adoption has long been a subject of debate in the Python community. This is still somewhat true, though Python 3 has now reached widespread support, with some very popular projects such as Django dropping Python 2 support entirely. As for us, a few key factors influenced our decision to make the jump:

Exciting new features
Python 3 has seen rapid innovation. Apart from the (very) long list of general improvements (e.g. the str vs bytes rationalization), a few specific features caught our eye:

  • Type annotation syntax: Our codebase is quite large, so the ability to use type annotations has been important to developer productivity. We’re big fans of MyPy here at Dropbox, so the ability to natively support type annotations is naturally appealing to us.
  • Coroutine function syntax: We rely heavily on threading and message-passing—through variants of the Actor pattern and by using Futures—to build many of our features. The asyncio project and its async/await syntax could sometimes remove the need for callbacks, leading to cleaner code.
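To make the two features above concrete, here is a minimal sketch (not Dropbox code) that contrasts a callback-based helper with an annotated coroutine. All function names are hypothetical, asyncio.sleep(0) stands in for real asynchronous I/O, and asyncio.run assumes Python 3.7 or later.

```python
import asyncio
from typing import Callable, List


def fetch_with_callback(name: str, on_done: Callable[[str], None]) -> None:
    """Callback style: the result is handed to a function supplied by the caller."""
    on_done("contents of " + name)


async def fetch(name: str) -> str:
    """Coroutine style: the result is returned to whoever awaits it."""
    await asyncio.sleep(0)  # stand-in for real asynchronous I/O
    return "contents of " + name


async def fetch_all(names: List[str]) -> List[str]:
    # Awaiting in sequence reads like ordinary synchronous code.
    results = []
    for name in names:
        results.append(await fetch(name))
    return results


def run_callback_version(names: List[str]) -> List[str]:
    results = []  # filled in by the callback
    for name in names:
        fetch_with_callback(name, results.append)
    return results


def run_coroutine_version(names: List[str]) -> List[str]:
    return asyncio.run(fetch_all(names))
```

Both versions produce the same list, but the coroutine version keeps the result flowing through return values, which a type checker can follow from the annotations.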

Aging toolchains
As Python 2 has aged, the toolchains originally used to build and deploy it have largely become obsolete. As a result, continued use of Python 2 carried a growing maintenance burden:

  • The use of older compilers/runtimes was limiting our ability to upgrade some important dependencies.
    • For example, we use Qt on Windows and Linux: Recent versions of Qt require more modern compilers due to the inclusion of Chromium (via QtWebEngine).
  • As we continued to integrate deeply with the operating system, our inability to rely on more recent versions of these toolchains increased the cost of adoption for newer APIs.
    • For example, Python 2 still technically requires Visual Studio 2008. This version is no longer supported by Microsoft and is not compatible with the Windows 10 SDK.

Freezers and scripts

Initially, we relied on “freezer” scripts to create the native applications for each of our supported platforms. However, rather than use the native toolchains directly, such as Xcode for macOS, we delegated the creation of platform-compliant binaries to py2exe for Windows, py2app for macOS, and bbfreeze for Linux. This Python-focused build system was inspired by distutils: Our application was initially little more than a Python package, so we had a single setup.py-like script to build it.

Over time, our codebase became more and more heterogeneous. Today, Python is no longer the only language used for development. In fact, our code now consists of a mix of TypeScript/HTML, Rust, and Python, as well as Objective-C and C++ for some specific platform integrations. To support all these components, this setup.py script (internally named build-all.py) grew to be so large and messy that it became difficult to maintain.

The tipping point came from changes to how we integrate with each operating system: First, we began introducing increasingly advanced OS extensions—like Smart Sync’s kernel components—that can’t and often shouldn’t be written in Python. Second, vendors like Microsoft and Apple began introducing new requirements for deploying applications that imposed the use of new, more sophisticated and often proprietary tools (e.g. code signing).

On macOS, for example, version 10.10 introduced a new app extension for integrating with the Finder: [FinderSync]. Not merely an API, a FinderSync extension is a full-blown application package (.appex) with custom life cycle rules (i.e. it is launched by the OS) and more stringent requirements for inter-process communication. Put another way: Xcode makes leveraging these extensions easy, while py2app does not support them at all.

We were therefore faced with two problems:

  • Our use of Python 2 prevented us from using new toolchains, making it more costly to use new APIs (e.g. the Windows Runtime on Windows 10).
  • Our use of freezer scripts made deploying native code more costly (e.g. building app extensions on macOS).

While we knew that we wanted to migrate to Python 3, this left us with a choice: invest in the freezer dependencies to add support for Python 3 (and thus the modern compilers) and platform-specific features (like app extensions), or move away from a Python-centric build system, doing away with “freezers” altogether. We chose the latter.

A note on PyInstaller: We seriously considered using it in the early stages of the project, but it did not support Python 3 at the time, and more importantly, it suffers from limitations similar to those of other freezers. Regardless, it is an impressive project that we simply felt didn’t suit our needs.

Embedding Python

To solve this build and deploy problem, we decided on a new architecture to embed the Python runtime in our native application. Rather than delegate this process to the freezers, we would use tooling specific to each platform (e.g. Visual Studio on Windows) to build the various entry points ourselves. Further, we would abstract Python code behind a library, aiming to more directly support the “mixing and matching” of various languages.

This would allow us to make use of each platform’s IDEs/toolchain directly (e.g. to add native targets like FinderSync on macOS) while retaining the ability to conveniently write much of our application logic in Python.

We landed on the following rough structure:

  • Native entry points: These are compatible with each platform’s application model.
    • This includes application extensions, such as COM components on Windows or app extensions on macOS.
  • Shared libraries written in multiple languages (including Python).

On the surface, the application would more closely resemble what the platform expects, while behind various libraries, teams would have more flexibility to use their choice of programming language or tooling.

This architecture’s increased modularity would also provide a key side effect: It would now be possible to deploy both a Python 2 library and a Python 3 library side by side. Tying this back to the Python 3 migration, the process would thus require two steps: first, to implement the new architecture around Python 2, and second, to use it to “swap out” Python 2 in favor of Python 3.

Step 1: “Anti-freeze”

Our first step was to stop using the freezer scripts. Both bbfreeze and pywin32 lacked Python 3 support at this stage, leaving us little choice. Starting in 2016, we began to gradually make this change.

First, we abstracted away the work of configuring the Python runtime and starting Python threads to a new library named libdropbox_bootstrap. This library would replicate some of what the freezer scripts provided. Though we no longer needed to rely on these scripts wholesale, it was still necessary to provide a minimum basis to run Python code:

Packaging our code for on-device execution
This ensures we ship compiled Python “bytecode” rather than raw Python source. Where each freezer script previously had its own on-disk format, we used this opportunity to introduce a single format for bundling our code across all platforms:

  • For Python bytecode (.pyc), a single ZIP archive (e.g. python-packages-35.zip) contains all necessary Python modules.
  • For native extensions (.pyd/.so), as these are platform-native DLLs, they are installed in a location that guarantees the application can load them without interference.
    • On Windows, for example, they are alongside the entry points (i.e. Dropbox.exe).
  • Packaging is implemented using the excellent modulegraph (by Ronald Oussoren of py2app and PyObjC fame).
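As a rough illustration of the bytecode bundling described above, the following sketch compiles a directory of Python sources and stores only the resulting .pyc files in a ZIP archive, which the interpreter can then import from directly. This is not Dropbox's packaging code: it uses only the standard library (py_compile, zipfile) rather than modulegraph, and the function names, the demo module, and the archive name are assumptions made for the example.

```python
import os
import py_compile
import sys
import tempfile
import zipfile


def build_bytecode_archive(source_dir: str, archive_path: str) -> None:
    """Compile every .py file under source_dir and store only the .pyc in a ZIP."""
    with tempfile.TemporaryDirectory() as scratch, \
            zipfile.ZipFile(archive_path, "w") as archive:
        for root, _dirs, files in os.walk(source_dir):
            for filename in files:
                if not filename.endswith(".py"):
                    continue
                source = os.path.join(root, filename)
                relative = os.path.relpath(source, source_dir)
                compiled = os.path.join(scratch, relative + "c")
                os.makedirs(os.path.dirname(compiled), exist_ok=True)
                py_compile.compile(source, cfile=compiled, doraise=True)
                # The importer looks for "module.pyc" where "module.py" would be.
                archive.write(compiled, relative.replace(os.sep, "/") + "c")


def demo() -> str:
    """Build a tiny archive (hypothetical module) and import from it."""
    with tempfile.TemporaryDirectory() as workdir:
        source_dir = os.path.join(workdir, "src")
        os.makedirs(source_dir)
        with open(os.path.join(source_dir, "greeting.py"), "w") as handle:
            handle.write("def hello():\n    return 'hello from the archive'\n")
        archive_path = os.path.join(workdir, "python-packages-demo.zip")
        build_bytecode_archive(source_dir, archive_path)
        sys.path.insert(0, archive_path)
        try:
            import greeting  # resolved from the ZIP; no source file is shipped
            return greeting.hello()
        finally:
            sys.path.remove(archive_path)
```

Note that bytecode is specific to the interpreter version that produced it, which is one reason an archive name might carry a version suffix.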

Isolating our Python interpreter
This prevents our application from running other on-device Python source. Interestingly, Python 3 makes this type of embedding much simpler. The new [Py_SetPath] function, for example, allowed us to isolate our code without having to do some of the more complicated work of isolation the freezer scripts had to do on Python 2. To support this in Python 2, we back-ported this function to our custom fork.
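Py_SetPath is a C API, so it cannot be shown directly in Python. As a loose command-line analog (an assumption for illustration, not the mechanism described above), the sketch below starts a child interpreter with the -I flag, which makes it ignore PYTHONPATH and other environment-driven search paths. The helper name and the extra directory are hypothetical.

```python
import json
import os
import subprocess
import sys


def child_sys_path(isolated: bool, extra_dir: str) -> list:
    """Return sys.path as seen by a child interpreter run with PYTHONPATH=extra_dir."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    flags = ["-I"] if isolated else []  # -I: isolated mode, ignores PYTHON* variables
    output = subprocess.check_output(
        [sys.executable] + flags
        + ["-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
    )
    return json.loads(output.decode("utf-8"))


# A directory an outside party might try to inject (hypothetical path).
extra = os.path.join(os.getcwd(), "hypothetical-extra-dir")
isolated_path = child_sys_path(True, extra)
```

In isolated mode the injected directory does not appear in the child's search path. An embedding application gets a similar guarantee by setting the path explicitly before the runtime is initialized.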

Second, we introduced platform-specific entry points (Dropbox.exe, Dropbox.app, and dropboxd) to make use of this library. These entry points were built using each platform’s “standard” tooling: Visual Studio, Xcode, and make were used rather than distutils, allowing us to remove much of the custom patchwork imposed on the freezer scripts. For example, on Windows, this greatly simplified configuring DEP/NX for Dropbox.exe, embedding an application manifest, and including resources.

A note on Windows: At this point, continued use of Visual Studio 2008 was becoming highly costly. To transition properly, we needed a version capable of supporting both Python 2 and 3 simultaneously, so we settled on Visual Studio 2013. To support it, we extensively altered our custom fork of Python 2 to make it properly compile using that version. The cost of these changes further reinforced our belief that moving to Python 3 was the right decision.

Step 2: Hydra

Successfully making a transition of this size (our application contains over 1 million lines of Python code) and at our scale (hundreds of millions of installs) would require a gradual process: We couldn’t simply “flip a switch” in a single release. This was especially true given our release process, which deploys new versions to all our users every two weeks. There would have to be a way to expose a small but growing number of users to Python 3 in order to detect and fix bugs early.

To achieve this, we decided to make it possible to build Dropbox using both Python 2 and 3. This entailed:

  • The ability to ship both Python 2 and Python 3 “packages,” complete with bytecode and extensions, side by side.
  • The enforcement of a hybrid Python 2/3 syntax during the transition.
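As an illustration of what "hybrid" code can look like, here is a small sketch written to behave the same on Python 2.7 and Python 3. It is not taken from the Dropbox codebase, and the helper names are hypothetical. Projects in this situation also commonly add "from __future__" imports at the top of each module, which this sketch omits for brevity.

```python
import sys

PY3 = sys.version_info[0] >= 3

# Python 2 calls its text type "unicode"; Python 3 calls it "str".
# The name "unicode" is only evaluated when running on Python 2.
text_type = str if PY3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return text on both versions, decoding bytes when necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def halves(count):
    """Use // so integer division behaves the same on Python 2 and 3."""
    return count // 2
```

The point of helpers like these is that call sites never branch on the interpreter version themselves, so the same module can be compiled into both packages.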

We used the embedded design introduced through the previous step to our advantage: By abstracting away Python into a library and package, we could easily introduce another variant for another version. Choosing what Python version to use could then be controlled in the entry point itself (e.g. Dropbox.app) during early initialization.

This was achieved by making the entry point manually link against libdropbox_bootstrap. On macOS and Linux, for example, we used dlopen/dlsym once a version of Python was chosen. On Windows, we used LoadLibrary and GetProcAddress.
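The real entry points are native code, but the same idea of resolving symbols by name at runtime can be sketched in Python with ctypes, which wraps dlopen/dlsym on macOS and Linux and LoadLibrary/GetProcAddress on Windows. To stay portable, this sketch looks symbols up in the Python runtime already loaded in the process instead of loading libdropbox_bootstrap. The resolve helper and the missing symbol name are hypothetical.

```python
import ctypes
import sys


def resolve(library, symbol_name):
    """Look up a symbol by name at runtime; return None if it is not exported."""
    try:
        return getattr(library, symbol_name)
    except AttributeError:
        return None


# ctypes.pythonapi is a handle to the Python runtime loaded in this process.
get_version = resolve(ctypes.pythonapi, "Py_GetVersion")
get_version.restype = ctypes.c_char_p  # the C function returns a const char *
runtime_version = get_version().decode("utf-8", "replace")

# A symbol that does not exist resolves to None instead of failing at link time.
missing = resolve(ctypes.pythonapi, "Hypothetical_NoSuchSymbol")
```

Because nothing is bound at link time, the caller decides which library to open first and only then asks it for the functions it needs.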

The choice of a runtime Python interpreter needed to be made before Python was loaded, so we made it possible to influence it with both a command-line argument (/py3), for development purposes, and a persistent on-disk setting, so that it could be controlled by Stormcrow, our feature-gating system.
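The selection logic described above might look roughly like the following. The real decision is made in native code before any interpreter exists, so this Python version is only a sketch of the precedence rules. The JSON settings format, the use_python3 key, and the function name are assumptions; only the /py3 argument comes from the text.

```python
import json


def choose_python_version(argv, settings_path):
    """Return 2 or 3: the command-line flag wins, then the on-disk setting."""
    if "/py3" in argv:
        return 3
    try:
        with open(settings_path) as handle:
            settings = json.load(handle)
    except (OSError, ValueError):
        # Missing or unreadable settings fall back to the established default.
        return 2
    if isinstance(settings, dict) and settings.get("use_python3") is True:
        return 3
    return 2
```

Falling back to Python 2 on any error keeps a corrupted settings file from changing which runtime a client uses.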

With this in place, we were able to dynamically choose the Python version when launching the Dropbox client. This allowed us to set up additional jobs in our CI infrastructure to run unit and integration tests targeting Python 3. We also integrated automated checks into our commit queue to prevent changes from being pushed that would regress Python 3 support.

Once we had gained enough confidence through automated testing, we began rolling out Python 3 to real users. This was achieved by incrementally opting in clients through a remote feature gate. We first rolled out the change to Dropboxers, which allowed us to identify and correct a majority of the underlying issues. We later expanded this to a fraction of our Beta population—which is a lot more heterogeneous when it comes to OS versions—eventually expanding to our Stable channel: Within 7 months, all Dropbox installs were running Python 3. In order to maximize quality, we adopted a policy requiring that all bugs identified as migration-related be fully investigated and corrected before expanding the number of exposed users.

[Figure: Gradual rollout on the Beta channel]
[Figure: Gradual rollout on the Stable channel]
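One common way to implement this kind of incremental opt-in is to hash each client into a stable bucket and enable the feature for buckets below the current rollout percentage. The sketch below shows that general technique only. It is not a description of how Stormcrow works, and the function names, the feature name, and the client identifiers are all hypothetical.

```python
import hashlib


def bucket_for(client_id, feature):
    """Map a client to a stable bucket in the range 0..99 for a given feature."""
    key = (feature + ":" + client_id).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(client_id, feature, rollout_percent):
    """A client is opted in once the rollout percentage passes its bucket."""
    return bucket_for(client_id, feature) < rollout_percent


clients = ["client-%d" % n for n in range(1000)]
at_10 = {c for c in clients if is_enabled(c, "python3", 10)}
at_50 = {c for c in clients if is_enabled(c, "python3", 50)}
```

Because a client's bucket never changes, raising the percentage only adds clients: everyone enabled at 10% is still enabled at 50%, which keeps the exposed population stable while bugs are investigated.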

As of version 52, this process is complete: Python 2 has been removed altogether from Dropbox’s desktop client.

But wait, there’s more

There’s much more to tell about this process. In future posts, we’ll look at:

  • How we report crashes on Windows and macOS and use them to debug both native and Python code.
  • How we maintained a hybrid Python 2 and 3 syntax, and what tools helped.
  • Our very best bugs and stories from the Python 3 migration.

Acknowledgements

Special thanks to the Dropboxers who contributed to this project:

Aditya Jayaraman, Aisha Ferrazares, Allison Kaptur, Amandine Lee, Anaid Chacon, Angela Gong, Ben Newhouse, Benjamin Peterson, Billy Wood, Brandon Jue, Bryon Ross, Cary Yang, Case Larsen, Clarence Lee, Darsey Litzenberger, David Euresti, Denbeigh Stevens, Drew Haven, Eddy Escardo-Raffo, Elmer Le, Eric Swanson, Gautam Gupta, Geoff Song, Guido van Rossum, Isaac Goldberg, John Lai, Jonathan Chien, Joshua Warner, Michael Wu, Naphat Sanguansin, Nikhil Marathe, Nipunn Koorapati, Patrick Chenglo, Peter Vilim, Rafael Tello-Cabrales, Reginald Lips, Ritu Vincent, Ryan Kwon, Samer Masterson, Sean Stephens, Stefan Vainberg, Thomas Ballinger, Tony Grue, Will Anderson

Very special thanks to a few members of the Python community:

Steve Dower (@zooba): for his work behind Python 3’s excellent support for modern versions of Windows and Visual Studio.
Ronald Oussoren (@RonaldOussoren): for his work maintaining PyObjC and his many years of contributions to Python on macOS.
Zachary Ware: for his early work in supporting VS2013 in Python 2.