this post was submitted on 21 Dec 2023
244 points (96.2% liked)
Linux
59165 readers
385 users here now
From Wikipedia, the free encyclopedia
Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).
Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.
Rules
- Posts must be relevant to operating systems running the Linux kernel. GNU/Linux or otherwise.
- No misinformation
- No NSFW content
- No hate speech, bigotry, etc
Related Communities
Community icon by Alpár-Etele Méder, licensed under CC BY 3.0
founded 6 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
The main difficulty I have with Rust (what prevents me from using it), is that the maintainers insist on statically compiling everything. This is fine for small programs, and even large monolithic applications that are not expected to change very often.
But for the machine learning projects I work on, I might want to include a single algorithm from a fairly large library of algorithms. The amount of memory used is not trivial, I am talking about the difference between loading a single algorithm in 50 MB of compiled code for a dynamically loadable library, versus loading the entire 1.5 GB library of algorithms of statically linked code just to use that one algorithm. Then when distributing this code to a few dozen compute nodes, that 50 MB versus 1.5 GB is suddenly a very noticeable difference.
There are other problems with statically linking everything as well, for example, if you want your application to be written in a high-level language like Python, TypeScript, or Lisp, you might want to have a library of Rust code that you can dynamically load into the Python interpreter and establish foreign function bindings to the Rust APIs. But this is not possible with statically linked code.
And as I understand, it is a difficult technical problem to solve. Apparently, in order for Rust to optimize a program and guarantee type safety and performance, it needs the type information in the source code. This type information is not normally stored into the dynamically loadable libraries (the
.so
or.dll
files), so if you dynamically load a library into a Rust program its type safety and performance guarantees go out the window. So the Rust compiler developers have chosen to make everything as statically compiled as possible.This is why I don't see Rust replacing C any time soon. A language like Zig might have a better chance than Rust because it can produce dynamically loadable libraries that are fully ABI compatible with the libraries compiled by C compilers.
You can load Rust into Python just fine. In fact, several packages have started requiring a Rust compiler on platforms thst don't get prebuilt binaries. It's why I installed Rust on my phone.
The build files for Rust are bigger than you may expect, but they're not unreasonably big. Languages like Python and Java like to put their dependencies in system folders and cache folders outside of their project so you don't notice them as often, but I find the difference not that problematic. The binaries Rust generates are often huge but if you build in release mode rather than debug mode and strip the debug symbols, you can quickly remove hundreds of megabytes of "executable" data.
Rust can be told to export things in the C FFI, which is how Python bindings are generally accomplished (although you rarely deal with those because of all the helper crates).
Statically compiled code will also load into processes fine, they just take up more RAM than you may like. The OS normally deduplicates dynamically loaded libraries across running processes, but with statically compiled programs you only get the one blob (which itself then gets deduplicated, usually).
Rust can also load and access standard DLLs. The safety assertions do break, because these files are accessed through the C FFI which is marked
unsafe
automatically, but that doesn't need to be a problem.There are downsides and upsides to static compilation, but it doesn't really affect glue languages like Python or Typescript. Early versions of Rust lacked the C FFI and there are still issues with Rust programs dynamically loading other Rust programs without going through the C FFI, but I don't think that's a common issue at all.
I don't see Rust replace all of C either, because I think Rust is a better replacement for C++ than for C. The C parts it does replace (parsers, drivers, GUIs, complex command line tools) weren't really things I would write in C in the first place. There are still cars where Rust just fails (it can't deal with running out of memory, for one) so languages like Zig will always have their place.
Is it not possible for Rust to optimize out unused functions as with C? That seems ...like a strange choice if so.
No Rust can do dead code elimination. And I just checked, Rust can do indeed do FFI bindings from other languages when you ask the compiler to produce dynamically linking libraries, but I am guessing it has the same problems as Haskell when it produces
.so
or.dll
files. In Haskell, things like "monad transformers" depend pretty heavily on function inlining in order to achieve good performance.So I am talking more about how Rust makes use of the type system to make decisions about when to inline functions which is pretty important when it comes to performance. You usually can't inline across module boundaries unless modules are all statically linked. So as I understand it, if you enable dynamic linking in your Rust program, you might see performance suffer a lot as compared to static linking, and this is why most Rust people (as I understand it) just make everything statically linked by default.
I am not sure that is quite right. I dont think rust support just enabling dynamic linking of its dependencies. It can talk to dynamically linked libraries - which is how FFI works. And you can compile rust crates to be dynamically linked. But when you are going down this route you are talking over the C ABI. This requires some effort on the code author to make their APIs exportable to C types and means you lose all safety when talking over the C ABI.
I also dont think that rust inlines across a crate boundary unless the function is marked as inline or LTO is enabled - inlining across crate boundaries is expensive and so only done when explicitly needed or asked for it. It is more that you lose features like generics and traits and other things that are not supported over the C API.
Do you need inlining if you just use fixed monad transformers?
I am not sure what you mean by "fixed" monad transformers, if you mean writing your own
newtype
where the functor variable is the only type variable, essentially what you are doing is hand-inlining the monad transformer, and so no, if you inline by hand, then the compiler doesn't need to do it.Haskell inlines all
newtype
definitions automatically, so if your monad transformer has all of the type variables bound (except for the functor variable, because that is a special case the Haskell compiler is specifically designed to handle) the compiler will usually reduce those to ordinary lambda expressions automatically, and lambda expressions usually optimize to the most efficient machine code.The only time the compiler cannot reduce a
newtype
to an efficient lambda is if the non-functor variables, e.g. the state type variable or the exception type variable, are unbound. Those values could become anything at all at its call site, limited only by the constraints set by the type context. So the type context information, a lookup table of type class instances, must be associated with that lambda expression, and in order to do that, the compiler must create a closure around those values. Creating closures allocates values on the heap, and this is much, much slower than efficient lambda expressions, and no faster than allocating a data constructor as with Free Monads.Alexis King did a presentation on it where she explains all of this extremely well, if you are interested: https://youtu.be/0jI-AlWEwYI
It is a bit long, but at 17:40 or so she starts talking about strategies for how monads and effects can be implemented in the GHC intermediate code, and compares Free Monads and effects to monad transformers. At 21:15 or so she begins to explain how
newtype
types can be optimized away completely,newtype
constructors don't exist at all in the low-level code, they are a "zero-cost abstraction." On the other hand,data
constructors (used for Free monads and effects) always allocate something on the heap which is an order of magnitude slower.Then at around 27:45 she begins to show how
newtypes
with type variables cannot be inlined across module boundaries for the reason I explained above (type context tables associated with closures), and so monad transformers cannot be optimized across module boundaries.Yep, I mean like
newtype MyT m a = MyT (ReaderT MyEnv (StateT MyState m) a)
. But one can useReaderT MyEnv (State MyState m) a
directly as well.I found the MTL style (tagless final) a bit problematic anyway, so I wanted to comment about this.
It is and it does.
So you're working on your machine learning projects in Zig?
No, Python and C++, which were the languages chosen by both Google and Facebook for their AI frameworks.
I just think if a systems programming language like Rust does not provide a good way to facilitate dynamic linking the way C, C++ does, these languages will start running into issues as the size of the compiled binaries become ever larger and larger. I think we might all be a little too comfortable with the huge amount of memory, CPU cycles, and network bandwidth that we have nowadays, and it leads to problems when you want to scale-up a deployment. So I think Zig might make a more viable successor to C or C++ as a systems programming language than Rust does.
That said, I definitely think Rust and Haskell's type systems are much better than that of Zig.