this post was submitted on 12 Jun 2024
1315 points (98.7% liked)

Memes

45719 readers
1276 users here now

Rules:

  1. Be civil and nice.
  2. Try not to excessively repost, as a rule of thumb, wait at least 2 months to do it if you have to.

founded 5 years ago
MODERATORS
 
you are viewing a single comment's thread
view the rest of the comments
[–] barsoap@lemm.ee 5 points 5 months ago* (last edited 5 months ago)

Also, you need a supported card. I have a potato going by the name RX 5500, not on the supported list. I have the choice between three rocm versions:

  1. An age-old prebuilt, generally works, occasionally crashes the graphics driver, unrecoverably so... Linux tries to re-initialise everything but that fails, it needs a proper reset. I do need to tell it to pretend I have a different card.
  2. A custom-built one, which I fished out of a docker image I found on the net because I can't be arsed to build that behemoth. It's dog-slow, due to using all generic code and no specialised kernels.
  3. A newer prebuilt, any. Works fine for some, or should I say, very few workloads (mostly just BLAS stuff), otherwise it simply hangs. Presumably because they updated the kernels and now they're using instructions that my card doesn't have.

#1 is what I'm actually using. I can deal with a random crash every other day to every other week or so.

It really would not take much work for them to have a fourth version: One that's not "supported-supported" but "we're making sure this things runs": Current rocm code, use kernels you write for other cards if they happen to work, generic code otherwise.

Seriously, rocm is making me consider Intel cards. Price/performance is decent, plenty of VRAM (at least for its class), and apparently their API support is actually great. I don't need cuda or rocm after all what I need is pytorch.