this post was submitted on 03 Apr 2024
36 points (92.9% liked)

Linux

48338 readers
475 users here now

From Wikipedia, the free encyclopedia

Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).

Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.

Rules

Related Communities

Community icon by Alpár-Etele Méder, licensed under CC BY 3.0

founded 5 years ago
MODERATORS
 

While the exact details of this vulnerability are still investigated (see here if you want to catch up on the topic), I wanted to share some of the thoughts I had regarding to what this incident means for the wider open source ecosystem.

TL;DR: To summarize, these are the main points I found remarkable in this entire development:

  • A backdoor was snuck relatively openly into an open source project
  • It was done by a somewhat trusted maintainer
  • The target was not even xz itself, but rather sshd through an obscure chain of dependencies
  • Luckily, it was discovered within a few weeks before the backdoored version was widely adopted

Obviously, there are many examples of security vulnerabilities occurring in open source software. But these are usually due to oversights or mistakes of most likely well-meaning developers that end up enabling the possibility for critical exploits. In the case of the xz backdoor however, it was obviously constructed with malicious intent and high effort towards a precise target. Does anybody know of another vulnerability ending up in a high-profile open source project that is similar in that sense?

This was only possible because the malicious actor under the pseudonym Jia Tan had direct write access to the xz repository as a maintainer. I don't think it is too unreasonable that with enough time and effort, anyone can get maintenance access to openly developed projects like xz. That is part of the beauty of the democratic process in open source. But what this incident shows is that for projects that are as widely used as xz, even changes coming from seemingly trusted maintainers should be properly reviewed. I don't mean to say that the original maintainer Lasse Collin has any fault in this matter, or that he should have prevented it, this is too much of a burden to expect from a single person. Instead I think the large tech corporations should put more resources into vetting these kind of open source projects that much of their infrastructure so heavily relies on (in fact, this backdoor seems to mainly target servers).

Even just looking at the source code, the backdoor was very cleverly hidden in testing binaries for the compression algorithm. These things are always easy to say in hindsight, but I do believe that a closer review of the build system shenanigans used to install the backdoor would have at least raised some questions. There was just too much luck involved in the discovery of the backdoor with someone noticing ssh access taking 0.5 seconds longer than usual.

This isn't really news, but this incident again shows that just like a chain is only as strong as its weakest link, a program is only as strong as its weakest dependency. The fact that the backdoor just hooks into the dynamic library loading process and completely hijacks authorization functions of ssh from inside xz is pretty scary. Maybe this will encourage developers to be more careful and sparing with adding dependencies. However to be honest, up until recently I would have pretty blindly trusted xz to be a very safe dependency due to its popularity and relatively simple use-case.

By opening a backdoor into ssh servers, this is a very critical issue, and there was clearly a lot of time and effort put into making it seem innocuous and hard to detect. I'm very glad that it got found and patched by the time it did, but it does leave me wondering what else is out there. It would be illusionary to think that such attack vectors always get found out eventually.

you are viewing a single comment's thread
view the rest of the comments
[–] chameleon@kbin.social 17 points 7 months ago (4 children)

These things are always easy to say in hindsight, but I do believe that a closer review of the build system shenanigans used to install the backdoor would have at least raised some questions.

Nobody noticed it because nobody is reviewing autotools spaghetti and especially not autotools spaghetti that only exists as shipped in a tarball. Minor differences in those files are perfectly normal as the contents of them are copied in from the shared autoconf-archive project, but every distro ships a different version of that, so what any given thing looks like will depend on the maintainer's computer. And nearly nobody has a good understanding of what any given line in a .m4 file is going to ultimately lead to the execution of regardless, so why bother investigating any differences? The maintainer of Meson has a good take on this.

Shipping tarballs without any form of generated files and having a process to validate release tarballs against the repo would be a good step, but is much easier said than done for a variety of reasons. Same thing can be said for shipping without any form of binary files in the repo, there's quite high value in integration tests and xz's README for the test blobs has correctly included this paragraph for 16 years:

Many of the files have been created by hand with a hex editor, thus there is no better "source code" than the files themselves.

[–] Corngood@lemmy.ml 2 points 7 months ago (2 children)

Many of the files have been created by hand with a hex editor, thus there is no better “source code” than the files themselves.

I don't buy that. There would have been some rationale behind the contents that could be automated, like "compressed file with bytes 3-7 in the header zeroed".

You also probably don't need these test files to be available in the environment where the library itself is built. There are various ways you could avoid that.

I do agree about the autotools stuff though.

Minor differences in those files are perfectly normal as the contents of them are copied in from the shared autoconf-archive project, but every distro ships a different version of that, so what any given thing looks like will depend on the maintainer’s computer.

This seems avoidable. We shouldn't be copying code around like that.

[–] chameleon@kbin.social 3 points 7 months ago (1 children)

Test files often represent states that can't be represented in the library proper. Things like "a tree where node A is a child of B and node B is a child of A", "the previous instruction repeated x times" where x was never set or there was no previous instruction, or weird combinations of mutually exclusive effects. More often than not, you can't really generate those using the library itself, as libraries tend to be written to reject those kinds of invalid states (there's only so much you can do in C but in functional programming land, "make invalid states unrepresentable" is a straight up mantra).

Even if you did manage to do that, using the system under test to generate test data for the system under test is generally not very useful by itself; you'd need some kind of extra protections on top to make sure the actual test files continue to be identical between revisions (like hashing them). Otherwise, a major incompatibility could be easily overlooked. But that also makes it hard to make any kind of valid changes to the library at all. Worse yet, some libraries don't implement everything needed to generate the test files: even xz is missing pieces, for example there's an lzip decompressor but not a compressor.

There's some arguments to be made for separating the test system from the main distribution, but the end result will likely be that nobody runs the testsuite at all. It's difficult enough to get distros to do it in the first place.

[–] Corngood@lemmy.ml 1 points 7 months ago

Yeah, that's fair. If you want to test that you can still decompress something compressed with some random old version, you either need to keep the old algorithm around, or the data.

load more comments (1 replies)