Working with Stable Diffusion as an artist can be tough because arriving at the right look takes a long time. You have only indirect control over how the elements of an image come together, and you have to keep that control over a long stretch of work. It is a patient game: you wait for each result while taking care not to ruin the final piece. The challenge is keeping things looking good and controlled throughout the whole process.
I would start with ChatGPT, asking it to generate a script about a random topic. Whatever it delivered became both the prompts for Stable Diffusion and the lines for text-to-speech. I would then pair the resulting images and audio files and feed them into a Wav2Lip model, which synced the two so that the subject appeared to be actually talking. There were a lot of trials and tribulations with this project, a big one being time. Models like Stable Diffusion can take an incredible amount of processing power, which means a large portion of time is spent waiting. After running into GPU issues, I decided to look for browser-based models that were less GPU-intensive. This led to lower-quality images, but it allowed me to get more work done.
I believe there is a trade-off to working with AI as an artist: it is important to realize what we value more, our time as humans or our work as artists who utilize AI. You may spend 24 hours generating images that still may not be good enough. Even with such incredible technology, I don't believe the time trade-off is worth it, nor do I see AI as a replacement for human artists.
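
For anyone curious how the pieces of that workflow fit together, here is a minimal sketch of the bookkeeping step only: turning a script into a list of "shots", each with a matching image, audio, and video file name. It is not my actual project code. The names `Shot`, `split_script`, and `plan_shots`, the one-sentence-per-image split, and the file naming scheme are all assumptions made for illustration, and the calls to Stable Diffusion, text-to-speech, and Wav2Lip are deliberately left out.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Shot:
    """One line of the script and the files that would be generated from it."""
    index: int
    text: str          # used as both the image prompt and the spoken line
    image_path: Path   # where the generated image would be saved
    audio_path: Path   # where the text-to-speech clip would be saved
    video_path: Path   # where the lip-synced clip would be saved


def split_script(script: str) -> list[str]:
    """Split a script into sentences, dropping empty fragments."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def plan_shots(script: str, out_dir: Path) -> list[Shot]:
    """Give every sentence a shared file stem so image and audio stay paired.

    Assumption: one sentence maps to one image and one audio clip.
    """
    shots = []
    for i, sentence in enumerate(split_script(script)):
        stem = f"shot_{i:03d}"
        shots.append(
            Shot(
                index=i,
                text=sentence,
                image_path=out_dir / f"{stem}.png",
                audio_path=out_dir / f"{stem}.wav",
                video_path=out_dir / f"{stem}.mp4",
            )
        )
    return shots


if __name__ == "__main__":
    demo = "A fox walks into a forest. It meets an owl! What happens next?"
    for shot in plan_shots(demo, Path("out")):
        print(shot.index, shot.text, shot.image_path.name, shot.audio_path.name)
```

Sharing one file stem per shot is what keeps each image matched with its audio when both are handed to the lip-sync step, which matters when generation is slow and runs have to be resumed.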