You can run Stable Diffusion XL on 8 GB of VRAM to generate images. For beginners there's e.g. the open-source tool Fooocus, which handles a lot of the work for you: it sends your prompt through a GPT-2 model (running locally on your PC) to do some prompt engineering, then uses the expanded prompt to generate your images, and it ships with several presets etc. so it's easy to get going.
Jan (basically open-source software that resembles ChatGPT and lets you run various AI models) can run in 8 GB, but only with 3B models or quantized 7B models. The developers recommend at least 16 GB for unquantized 7B models (which they consider the "minimum usable models"). Larger, more sophisticated models require even more.
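To see why quantization makes the difference there, here's a rough back-of-the-envelope estimate (my own sketch; it only counts the weights and ignores context/KV-cache and runtime overhead):

```python
def model_ram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-memory estimate in GB: parameter count times bits per
    weight, divided by 8 bits per byte. Ignores KV cache and overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model at fp16 (16 bits/weight) needs ~14 GB just for the weights,
# so it won't fit in 8 GB:
print(model_ram_gb(7, 16))  # 14.0

# The same 7B model quantized to 4 bits/weight needs ~3.5 GB, which fits
# comfortably, with room left for the context:
print(model_ram_gb(7, 4))   # 3.5
```

That's why the 16 GB recommendation applies to the unquantized versions, while a 4-bit quant of the same model squeezes into 8 GB.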
Jan can also run on the CPU in regular RAM. Since you're just chatting with it, it's not too bad when it spits out words slowly, but a GPU would be nice here...
Have you found a way to split those mp3s into several files by chapter etc.? All the converters I've tried so far just yield a single mp3 that's several hours long...
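In case it helps: if the mp3 actually carries chapter markers (many audiobook files do, as embedded chapter metadata that ffmpeg can read), ffmpeg can cut along them without re-encoding. A minimal Python sketch of that idea, assuming `ffprobe`/`ffmpeg` are installed; the file name is just a placeholder:

```python
import json
import subprocess

def read_chapters(src: str) -> list:
    """Ask ffprobe for the chapter list embedded in the file (JSON output)."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_chapters", src],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out).get("chapters", [])

def build_split_commands(chapters: list, src: str) -> list:
    """Build one ffmpeg command per chapter (kept as a pure function so the
    command construction is easy to inspect before running anything)."""
    cmds = []
    for i, ch in enumerate(chapters, start=1):
        title = ch.get("tags", {}).get("title", f"chapter_{i:02d}")
        cmds.append([
            "ffmpeg", "-v", "quiet", "-y",
            "-ss", ch["start_time"], "-to", ch["end_time"],
            "-i", src,
            "-c", "copy",  # stream copy: no re-encode, fast and lossless
            f"{i:02d} - {title}.mp3",
        ])
    return cmds

# Usage (this part actually shells out to ffmpeg):
#   src = "audiobook.mp3"  # placeholder name
#   for cmd in build_split_commands(read_chapters(src), src):
#       subprocess.run(cmd, check=True)
```

If `ffprobe` reports no chapters, the file simply doesn't have markers embedded, and no converter will be able to find the boundaries; in that case splitting by silence detection or by fixed duration is the usual fallback.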