No offense intended, and it's possible that I misread your experience level:
I hear a user asking developer questions. You can either use the publicly available services (DALL-E and co.) or start digging into hosting the models yourself. The page you linked hosts trained models for you to use in your own setups. It is not a "click a button and it works" service.
As a starting point for self-hosting image generation, I suggest https://github.com/AUTOMATIC1111/stable-diffusion-webui.
For the training part, I'll be very blunt: if you don't intend to spend five- to six-digit sums on hardware or compute, forget it. And even then, you'd need the raw training data to pull it off.
Perhaps what you want to do is fine-tune a pretrained model. I only have a bit of experience with that, and only with LLMs (and even there, I don't have the hardware to get beyond a personal proof of concept).