this post was submitted on 21 May 2026
14 points (100.0% liked)

Linux

65398 readers
780 users here now

From Wikipedia, the free encyclopedia

Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).

Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.

Rules

Related Communities

Community icon by Alpár-Etele Méder, licensed under CC BY 3.0

founded 7 years ago
MODERATORS
 

This is a follow up to: https://barthalion.blog/flathub-internals-cdn/

The blog was not loading for me earlier, so here's a copy:

It's probably not your fault.

On a cache miss, there are two things a reverse proxy (which Fastly is to us) can do. It can make the client wait until the proxy itself fetches the requested content and then serve it, with subsequent requests being served from the cache. From a user's perspective, it means staring at "hung" process, and people tend not to be understanding when a program is stuck seemingly doing nothing.

Instead, the proxy can stream the response from the origin, caching it at the end. This makes the client receive the data right away, although it's not without drawbacks.

In a streaming setup like Flathub's, an all-MISS path adds some upstream latency before the first byte, but also limits the download speed to what the slowest link can deliver. As we don't run servers in the same datacenter or on a single backbone network, the hop from Fastly through the caching proxy to the master server incurs a penalty that may affect how quickly the data gets back.

In order to cache files larger than 20MB, Fastly expects customers who use streaming misses to use segmented caching. Anything larger than that gets broken down into smaller chunks. When Fastly wants the data from us, it will add a Range header specifying which bytes we should respond with. Fastly will then serve the request after reconstructing the file from various chunks. Our caching proxies also use the value of the Range header in the caching key to avoid requesting the full file over and over again from the master server as well.

While great for caching, many concurrent range MISSes can turn what would be a sequential file read into scattered, random reads. It wouldn't matter with SSD or NVMe, but as the repository is stored on HDDs, when combined with streaming misses, it can turn cold transfer speed into min(network bottleneck, ZFS random-read bottleneck).

Counterintuitively, you may improve your download speeds by aborting the ongoing Flatpak operation and starting it again. While the initial request was slow, there's a non-zero chance it went through all the caching layers and it will become a cache hit in the meantime. Flatpak

Let's talk Flatpak. When installing or upgrading applications, Flatpak will try to use delta files. A typical delta is an update file that contains only the difference between versions. There are also from-scratch deltas, which effectively are an archive with all files required to install an app from scratch, thus the name.

Flathub generates a single upgrade delta and a from-scratch delta for the latest version. Delta generation is an expensive process in terms of disk reads and writes, but also disk space. Because our ZFS setup isn't exactly the fastest, generating more delta files also affects how quickly we can publish an update. Yes, in theory we could be doing this out of band but we don't. In hindsight, Titanic wasn't unsinkable after all.

What happens if you are not updating often enough? A lot of suffering. Flatpak will download each missing file between the version you are on and the one you want to upgrade to, separately. This is an almost certain cache miss causing even more random seeks on the master server. At this point Flatpak would be better off downloading the from-scratch delta but it can't. The behaviour is controlled by OSTree, which doesn't offer any knobs to affect it. It is the right choice if the goal is to limit the bandwidth used by the client to fetch updates, but an incredibly bad one for anyone on a reliable connection; downloading a single large file is almost always faster than fetching multiple smaller ones.

What do? Some brave soul could fix OSTree to apply a better heuristic on when to use from-scratch deltas for upgrades, or at least make it expose an API that lets Flatpak choose. For the rest of us mere mortals, we can only update regularly or wait patiently for the update to finish.

you are viewing a single comment's thread
view the rest of the comments