At PyCon in 2014, Guido van Rossum, the creator of Python and, at the time, the Benevolent Dictator For Life of the language, stood on stage in a shirt that had a large 2.8 written on it in block letters, with a big red no entry sign through it. “It’s time to move on to Python 3,” he said, telling the audience that they should start adopting the new version of the language into their workflows.
After many years of hard work towards that goal from the core committers, and surrounding community of libraries, Python 2 is finally at end of life. January 1, 2020, according to pythonclock.org, is the drop-dead date for support of Python 2.
For companies that made the change years ago, it won’t be an issue. However, there’s a whole range of companies that won’t be making the change anytime soon, for a number of reasons.
What does this change mean for companies heavily utilizing the language, particularly those who may not be ready to migrate? To understand the entire context of what’s going on, let’s take a stroll back through Python history.
A brief history of Python
The idea behind developing Python 3 was to implement a single big change that got rid of a legacy problem in Python: rendering all strings as Unicode behind the scenes. As Brett Cannon, one of the core developers of Python, writes,
People sometimes forget how old Python is; Guido started coding Python in December 1989 and was first released as open source in February 1991. This means that Python itself predates the first volume of the Unicode standard which came out in October 1991. Over the intervening years, languages created after Unicode standardized chose to base their implementation for strings on encodings that could support Unicode.
….
Supporting Unicode and text from any written language is important. Python is a language for the world, not just for those languages that support the Roman alphabet that ASCII covers. This is why Python 3 makes it “Unicode or bust” when it comes to text; it guarantees that all Python 3 code will support everyone in the world whether the developer who wrote the code explicitly meant for it to or not.
Unfortunately, the team assumed that everyone would make the big switchover immediately, so they made Python 3 backwards incompatible and set Python 2 as a maintenance branch. However, many people didn’t want to switch, because, as the PEP for the improvement said, Python 3 was “a relatively mild improvement on Python 2.” Many saw the switch as mostly an inconvenience. At that time, the most visible difference was the change of the print statement into a function call, which broke a lot of code.
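To make those two changes concrete, here is a minimal sketch that runs on Python 3. The sample string and the helper name is_valid_python3 are illustrative assumptions, not something from the PEP or from Brett Cannon's post.

```python
# Python 3: text is Unicode by default, and bytes are a separate type.
text = "na\u00efve caf\u00e9"      # str: a sequence of Unicode code points
data = text.encode("utf-8")        # bytes: the raw UTF-8 encoding of that text

char_count = len(text)             # counts characters
byte_count = len(data)             # counts encoded bytes (accented letters take two)
round_trip_ok = data.decode("utf-8") == text


def is_valid_python3(source):
    """Hypothetical helper: report whether source compiles on this interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# The Python 2 statement form no longer parses; the function form does.
old_style_ok = is_valid_python3('print "hello"')
new_style_ok = is_valid_python3('print("hello")')

print(char_count, byte_count, round_trip_ok, old_style_ok, new_style_ok)
```

The gap between the character count and the byte count is the distinction Python 2 let code ignore, and Python 3 forces code to handle explicitly.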
As a result, Python 2 continued to be in active development. In 2019, though, Python 3 has finally (mostly) become the default version of the language for new Python development, and many companies and projects are using the top features of Python 3: f-strings, Path, type hints, asyncio, and, of course, Unicode rendering.
A slow adoption process
It’s been a long road to get to Python 3 adoption since the new major version was announced all the way back in 2008. Dustin shows just how long adoption has taken:
At first, there were a lot of good reasons for not adopting Python 3. Most importantly, it wasn’t backwards compatible with Python 2. As a result, major libraries were hesitant to move to the platform, and, in a self-fulfilling prophecy, it was hard to port code because supporting tools were lacking (a gap eventually filled by tools like 2to3 and six).
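For a sense of what porting involved, the sketch below shows the kind of shim that code supporting both versions tended to carry. It is written in the spirit of compatibility libraries such as six, but it is not their implementation. The names PY2, text_type and ensure_text are chosen here for illustration.

```python
import sys

# True only when running under a Python 2 interpreter.
PY2 = sys.version_info[0] == 2

# Name the Unicode text type once so the rest of the code can stay version-neutral.
if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
else:
    text_type = str


def ensure_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


print(ensure_text(b"caf\xc3\xa9") == ensure_text("caf\u00e9"))
```

Every call site that touches text has to go through a helper like this, which is part of why porting large codebases was slow.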
The tipping point for conversion occurred somewhere around 2016 or so, after the release of Python 3.5, which featured a matrix multiplication operator, async and await syntax for asyncio, speed improvements to OrderedDict, and an implementation of type hints that brought some static language-like features to Python.
Later versions added even more features, like f-strings and expanded support for pathlib. With these changes, many libraries that people use (like scikit-learn for machine learning) started their migrations to Python 3.
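A few of these features fit in one short sketch (Python 3.6 or later). The Vec class, the describe function and the file name are hypothetical and exist only to show the syntax.

```python
from pathlib import Path
from typing import List


class Vec:
    """Tiny illustrative vector type that supports the @ operator."""

    def __init__(self, values: List[float]) -> None:
        self.values = values

    def __matmul__(self, other: "Vec") -> float:
        # Dot product of the two vectors.
        return sum(a * b for a, b in zip(self.values, other.values))


def describe(path: Path, score: float) -> str:
    # An f-string with a format spec, plus type hints on the signature.
    return f"{path.name}: {score:.1f}"


dot = Vec([1.0, 2.0]) @ Vec([3.0, 4.0])
label = describe(Path("reports") / "q3.csv", dot)
print(label)
```

None of this syntax is available on Python 2.7, so any library that adopts it has to drop Python 2 support.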
As more and more dependencies started upgrading, companies started moving, too.
So now that we’re close to the end, what does the cutoff of Python 2 from development mean for the ecosystem of developers dependent on it?
Judging by the state of things on the internet, you’d think that everyone had completed their migrations. In a survey from JetBrains, which makes IDEs like IntelliJ and PyCharm, 75% of individual respondents indicated that they had already migrated. A flurry of blog posts has shown the same. For example, Dropbox detailed their migration in the fall of 2018. Instagram migrated in 2017. Facebook started in 2014. Splunk, at the urging of their customers, also did so recently.
However, just because Python 2 is reaching end of life doesn’t mean companies will stop using it overnight. How do we know there’s still significant energy being invested into Python 2? We can check out what’s going on directly with PyPI, the Python package library. In 2016, the core developers behind PyPI started sending logs to Google’s BigQuery, for the ability to run SQL against them, which makes it much easier to make architectural decisions based on usage.
For example, if you want to see which libraries have been downloaded, by Python version, over the last 30 days, you can create a new project in BigQuery (the first 1 TB queried per month is free), and run:
SELECT
  REGEXP_EXTRACT(details.python, r"^([^\.]+\.[^\.]+)") as python_version,
  COUNT(*) as download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day")
  )
GROUP BY
  python_version
ORDER BY
  download_count DESC
LIMIT 100
Even though Python 3 has been the dominant version in the community for at least a year, the latest count of individual package downloads from PyPI shows that at least 40% of all downloads in September of 2019 came from Python 2.7. Granted, this is down from 60% at the beginning of the year, but it is still significant given that EOL is only a few months away.
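A percentage like that can be derived from the query's output by grouping the per-version counts by major version. The sketch below assumes the rows arrive as (python_version, download_count) pairs. The function name and the sample numbers are made up for illustration and are not real PyPI data.

```python
from collections import defaultdict


def share_by_major_version(rows):
    """Return {major_version: percent_of_downloads} from (version, count) rows."""
    totals = defaultdict(int)
    for version, count in rows:
        # Some downloads report no interpreter version at all.
        key = version.split(".")[0] if version else "unknown"
        totals[key] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {major: 100.0 * n / grand_total for major, n in totals.items()}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sample_rows = [("2.7", 400), ("3.6", 250), ("3.7", 300), (None, 50)]
shares = share_by_major_version(sample_rows)
print(shares)
```

Keeping an explicit "unknown" bucket matters: dropping rows without a version would inflate both the Python 2 and the Python 3 share.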
On a per-library basis, it gets a little trickier: Most Flask downloads are completed using the Python 3 version, but only 26% of botocore downloads (the AWS SDK for Python) are using Python 3.
There are also several projects that are going to hold off on the migration: Twisted, a networking framework, which has only partially been ported, and PyPy, a frequently used alternative Python implementation with a JIT compiler, which will keep version 2 around indefinitely.
End of life for any given piece of software usually doesn’t mean that the software is no longer available. It does mean that it no longer receives security patches or bug fixes. The PEP (Python Enhancement Proposal) covering the end of life specifies that,
This declaration does not guarantee that bugfix releases will be made on a regular basis, but it should enable volunteers who want to contribute bugfixes for Python 2.7 and it should satisfy vendors who still have to support Python 2 for years to come.
But there are a lot of risks in not updating to Python 3: most importantly, losing security updates, but also missing out on speed gains and on new features like type hints.
Why the adoption rate is so slow
So why aren’t we at a higher adoption rate this close to the deadline?
In a tongue-in-cheek post, I wrote that IT runs on Java 8 (which is ancient by today’s standards).
Java 8 is still the dominant development environment, according to the JVM ecosystem report of 2018.
This holds the answer: most large organizations, outside of the hype cycle of technical news posts, move much more slowly than the press or blogs would have you think. Most major banks are still running some variation of FORTRAN and COBOL under the covers, for example.
So while many companies are outlining their migration strategies, just as many, or more, will stay on Python 2 for a long time. Why is this the case?
In reading the accounts of people who have already migrated, it’s easy to see: migrating codebases takes a long time, is a highly political decision, and experiences inertia, even in the companies that are the highest tech, with the best intentions.
For example, in order to move to Python 3 at Facebook, Jason Fried started by rewriting a service in 2014. Along the way, he made a lot of mistakes, changed a lot of code, and did a lot of finagling to make it known that people were moving to Python 3 at Facebook, by doing things like inserting himself into new developer trainings. He then teamed up with Łukasz Langa, who had done the Instagram conversion to Python 3:
In 2016, he and Langa formed a brand new team in Facebook to shepherd Python within the company, which they dubbed “The Ministry of Silly Walks.” Because they were “the Python team,” the “perceived authority” he mentioned earlier worked; people assumed they could make decisions about Python at Facebook.
Instagram’s move itself took 10 months. At Dropbox, where Guido and Langa now work, the migration has taken three years and, as of Guido’s retirement several weeks ago, is still in progress. Granted, all of these are enormous codebases, but you have to wonder: if it takes that long with the top people in Python working on it, how long would it take for a regular company, maybe one where Python is not even the primary language?
In all the cases, politics played just as important a role as technical direction.
Second, security concerns are a problem. You would assume that not upgrading would be the bigger risk, but in larger organizations, many people are not allowed to upgrade Python by themselves: the admin or security team pushes updates to them. In some cases, pip downloads are also not allowed. If Python 2 is the default agreed upon by the security team, it can take a monumental effort to convince people to make the switch to 3, particularly in government and in heavily regulated settings such as healthcare or finance.
This brings us to the third reason: inertia. Although many Linux distributions, such as RHEL, include Python 3 alongside Python 2, it is by no means the default. Switching between 2 and 3 keeps turning up bugs, especially around pointers to the system version of Python, as has happened at Debian.
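When the python command may point at either major version, one defensive habit is for a script to check its interpreter at startup and fail with a clear message. This is a minimal sketch under that assumption. The helper name require_python is hypothetical, and its version_info parameter exists only so the check can be exercised without switching interpreters.

```python
import sys


def require_python(minimum=(3, 0), version_info=None):
    """Return the running (major, minor) version, or raise if it is below minimum."""
    current = tuple((version_info or sys.version_info)[:2])
    minimum = tuple(minimum)
    if current < minimum:
        raise RuntimeError(
            "Python %d.%d+ required, found %d.%d" % (minimum + current)
        )
    return current


# At the top of a script: stop early instead of failing later with a SyntaxError.
running = require_python((3, 0))
print("running on Python %d.%d" % running)
```

The message uses %-formatting rather than an f-string so that the file still parses on an old interpreter and can report the problem itself.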
Python has come a long way in moving from 2 to 3, and individuals and forward-looking startups have adopted the new version. The second great migration will occur when large enterprises start moving away from 2. With regard to Python 2, we’ll see that 40% number shrink further in 2020, but the changes will be incremental, and there will be companies running Python 2.7 well into the future.
Good analysis, but you only just touched on the other major way python is installed – from OS distributions.
PyPI won’t have any data on how many times apt or rpm install a python2 package vs. python3. The current LTS versions of all the major distributions include python2, and will continue to be supported next year.
Ehm, Mercurial? (I am not sure whether they finally have at least a passing test suite)
Just wanted to quickly let you know that Łukasz Langa does not work at Dropbox, nor has he ever to my knowledge.
I think you didn’t mean to refer to Jason Fried in this article (wrt the Facebook migration.)
s/Jason Fried/Jake Edge
My organisation has no plans to rewrite internal tools, as doing so will generate no “alpha” (income) at best and at worst will most likely create new bugs.
One thing to add: The download statistics for Python 2 are not representative in the sense that modern pip versions employ much heavier local caching. As such, Python 2 downloads are overrepresented relative to their use.
Ah, here we go again
The Debian issue you link is a non-issue. It was talking about packages relying on Python 3 packages which are non-existent. The people working on Debian packages are normal human beings; they couldn’t do everything in such a short period of time.
For adoption, all companies contribute the whole team to open source in the Linux server space for necessary packages. Problem solved.
If the package is not present at the OS level, compile it yourself. People are busy solving important problems.
You want to talk about security issues? That’s funny. You can code everything with security in mind by yourself too.
Do the logs for these PyPI statistics include information about where they were downloaded from?
It might be interesting to know how much of that 40% of 2.7 downloads are for automated tests (a good indicator of which would be the download client being a Gitlab/Travis CI/etc server). I certainly leave 2.7 tests around longer than I probably should in code bases I work on. If everyone else does that too, would that skew the figure substantially?
Just a small data point here: when I write a new script at work (even if it’s just for in-house stuff), I choose a language and version that is supported on all platforms our software currently runs on. That’s RHEL 6, 7, 8. RHEL 6 doesn’t even have Python 3 in its official repos. RHEL 7 doesn’t install it by default. So if I ever want to use that script on a RHEL 6 or 7 system, I would face a battle with Python 3 that I don’t face with Python 2. And since Python 2 is good enough for most tasks, there’s no incentive to move (except that Python 2 goes EOL).
I guess the general lesson is: if you want your language to spread, make sure it’s available in the official repos of the main distros. That’s the main way into large, slow-moving companies.
Sounds like JavaScript and ECMAScript 2015. In particular, modules, which are incompletely defined (no “registry”) and ultimately push devs to use weird workflows like webpack to bundle everything.