Tag: Stable Diffusion

  • How I Guide Stable Diffusion with ControlNet and Composite Images

    GIMP showing a multi-layer image of Lynn Conway on the right and her co-authored textbook Introduction to VLSI Systems on the left.

    For the illustration of Lynn Conway and her co-authored textbook Introduction to VLSI Systems at the top of yesterday’s post, I used a locally hosted installation of Automatic1111’s stable-diffusion-webui, the fine-tuned model Dreamshaper 5 (based on StabilityAI’s Stable Diffusion 1.5 general model), and the ControlNet extension for A1111.

    Stable Diffusion is an image-generating AI model that can be used with different software. I used Automatic1111’s stable-diffusion-webui to instruct and configure the model to create images. In its most basic operation, I type what I want to see in the output image into the positive prompt box, type what I don’t want to see into the negative prompt box, and click “Generate.” Based on the prompts and default parameters, an image appears in the output area on the right that may or may not align with what I had in mind.
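    The same basic operation can also be driven programmatically. As a minimal sketch (assuming the webui was launched with the --api flag on its default local port), a positive and negative prompt can be submitted to the txt2img endpoint and the base64-encoded results saved to disk:

    ```python
    import base64
    import requests

    # Assumes stable-diffusion-webui is running locally with --api enabled.
    URL = "http://127.0.0.1:7860"

    payload = {
        "prompt": "illustration of a woman next to a textbook, highly detailed",
        "negative_prompt": "blurry, low quality, extra fingers",
        "steps": 30,
        "width": 512,
        "height": 512,
        "cfg_scale": 7,
    }

    response = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload)
    response.raise_for_status()

    # The API returns each generated image as a base64-encoded string.
    for i, image_b64 in enumerate(response.json()["images"]):
        with open(f"output_{i}.png", "wb") as f:
            f.write(base64.b64decode(image_b64))
    ```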

    Automatic1111's stable-diffusion-webui image-generating area

    For the positive prompt, I wrote:

    illustration of a 40yo woman smiling slightly with a nervous expression and showing her teeth with strawberry-blonde hair and bangs, highly detailed, next to a textbook titled introduction to VLSI systems with microprocessor circuits on the cover, neutral background, <lora:age_slider_v6:1>

    I began by focusing on the type of image (an illustration), then described its subject (a woman), other details (the textbook), and the background (neutral). The last part in angle brackets is a LoRA, or low-rank adaptation, which further tweaks the model that I’m using, in this case Dreamshaper 5. This particular LoRA is an age slider: the number at the end of the tag controls the physical appearance of the subject. A “1” yields roughly middle age; a higher number looks older, and a lower or negative number looks younger.
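    In A1111’s prompt syntax, that number is simply the LoRA’s weight, so sweeping the apparent age is just a matter of rewriting the tag. A small illustrative sketch (the values here are hypothetical, not ones I used):

    ```python
    # The <lora:name:weight> tag is A1111 prompt syntax; for the age
    # slider LoRA, the weight doubles as the age control.
    base_prompt = (
        "illustration of a woman smiling slightly, highly detailed, "
        "neutral background, <lora:age_slider_v6:{age}>"
    )

    # Negative values skew younger, ~1 is about middle age, higher is older.
    for age in (-2, 0, 1, 3):
        print(base_prompt.format(age=age))
    ```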

    Automatic1111's stable-diffusion-webui ControlNet extension area

    ControlNet is an extension to Automatic1111’s stable-diffusion-webui that helps guide the generative AI model to produce an output image more closely aligned with what the user had in mind. It employs different models focused on depth, shape, body poses, etc. to shape the output image’s composition.

    For the Lynn Conway illustration, I used three different ControlNet units: depth (detecting what is closer and what is farther away in an image), canny (one kind of edge detection for fine details), and lineart (another kind of edge detection for broader strokes). Giving each of these a different level of importance (control weight) and telling stable-diffusion-webui when to begin using a ControlNet (starting control step) and when to stop using it (ending control step) during each image creation changes how the final image will look.
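    For reference, a three-unit setup like this can also be expressed through the webui’s API. This is only a sketch: the weights and step ranges below are placeholders rather than my final settings, and the preprocessor and model names are stock ControlNet 1.1 names that may differ by installation. The input image is the composite described in the next paragraph:

    ```python
    import base64
    import requests

    # The composite input image (its creation is described below).
    with open("composite.png", "rb") as f:
        composite_b64 = base64.b64encode(f.read()).decode()

    def cn_unit(module, model, weight, start, end):
        # Field names follow the sd-webui-controlnet API; exact model
        # filenames depend on which ControlNet models are installed.
        return {
            "input_image": composite_b64,
            "module": module,
            "model": model,
            "weight": weight,          # control weight
            "guidance_start": start,   # starting control step (0 to 1)
            "guidance_end": end,       # ending control step (0 to 1)
        }

    payload = {
        "prompt": "illustration of a 40yo woman ...",  # the prompt from above
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    cn_unit("depth_midas", "control_v11f1p_sd15_depth", 1.0, 0.0, 0.8),
                    cn_unit("canny", "control_v11p_sd15_canny", 0.6, 0.0, 0.5),
                    cn_unit("lineart_realistic", "control_v11p_sd15_lineart", 0.6, 0.1, 1.0),
                ]
            }
        },
    }

    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
    r.raise_for_status()
    ```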

    Typically, each ControlNet unit uses an image as input for its guidance on the generative AI model. I used the GNU Image Manipulation Program (GIMP) to create a composite image with a photo of Lynn Conway on the right and a photo of her co-authored textbook on the left (see the screenshot at the top of this post). Thankfully, Charles Rogers added his photo of Conway to Wikipedia under a CC BY-SA 2.5 license, which gives others the right to remix the photo with credit to the original author, as I’ve done here. Because the photo cropped Conway’s right arm, I rebuilt the arm using the clone tool in GIMP.
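    I did this compositing by hand in GIMP, but the same side-by-side layout could be scripted. A rough Pillow equivalent, with hypothetical filenames standing in for the two source photos:

    ```python
    from PIL import Image

    # Hypothetical filenames for the two source photos.
    book = Image.open("vlsi_textbook.jpg")
    conway = Image.open("lynn_conway.jpg")

    # Scale both photos to a common height, preserving aspect ratio.
    height = min(book.height, conway.height)
    book = book.resize((book.width * height // book.height, height))
    conway = conway.resize((conway.width * height // conway.height, height))

    # Paste them side by side: textbook on the left, Conway on the right.
    canvas = Image.new("RGB", (book.width + conway.width, height), "white")
    canvas.paste(book, (0, 0))
    canvas.paste(conway, (book.width, 0))
    canvas.save("composite.png")
    ```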

    I fed the composite image into the three ControlNet units, and through trial and error with each unit’s settings, A1111’s stable-diffusion-webui produced an image that I was happy with and used in yesterday’s post. I used a similar workflow to create the Jef Raskin illustration for this post, too.

  • Joan Slonczewski Added to Yet Another Science Fiction Textbook (YASFT)

    An image of a woman walking through a tunnel toward an ocean's beach and a sky filled with stars inspired by Joan Slonczewski's novel A Door Into Ocean. Created with Stable Diffusion.

    I added a whole new section on the Hard SF writer Joan Slonczewski (they/them/theirs) to the Feminist SF chapter of the OER Yet Another Science Fiction Textbook (YASFT). It gives students an overview of their background as a scientist, writer, and Quaker, and it discusses three representative novels from their oeuvre: A Door Into Ocean (1986), Brain Plague (2000), and The Highest Frontier (2011). As in the Afrofuturism chapter, I brought in more cited critical analysis of Slonczewski’s writing, with each source cited parenthetically in full instead of in a works cited list or footnotes.

    Slonczewski’s A Door Into Ocean was the inspiration for the image above, which I created using Stable Diffusion. It took the better part of a day to create the basic structure of the image; then I inpainted specific details, such as the woman’s footprints in the sand; and finally, I fed the inpainted image back into SD’s ControlNet to produce the final image.
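    For anyone curious about the inpainting step, here is a minimal sketch of how it can be done through the webui’s img2img endpoint (the filenames are hypothetical; the mask is painted white over the region to regenerate):

    ```python
    import base64
    import requests

    def b64(path):
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    # Hypothetical files: the draft image plus a mask painted white over
    # the area to regenerate (e.g., the footprints in the sand).
    payload = {
        "prompt": "a woman's footprints in wet sand, detailed",
        "init_images": [b64("draft.png")],
        "mask": b64("footprints_mask.png"),
        "denoising_strength": 0.6,  # how far the masked area may drift
        "steps": 30,
    }

    r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
    r.raise_for_status()
    ```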

  • College Cat Studying in the Stacks, and Video Card Downgrade

    Anthropomorphic cat wearing a hoodie, sitting in a library, studying two open books. Image created in Stable Diffusion.

    I decided to sell my NVIDIA RTX A6000 video card and downgrade to an RTX 4060 Ti with 16GB GDDR6.

    I’ll miss loading large LLMs into the A6000’s 48GB of memory, but between the 16GB of VRAM on the 4060 Ti and my computer’s 128GB of DDR4 RAM, I can get my work done; it’ll just take orders of magnitude longer in some cases.
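    I won’t commit to a single runtime here, but as one example of that VRAM/RAM split, llama-cpp-python can offload part of a model to the GPU and keep the rest in system RAM (the model path and layer count below are hypothetical):

    ```python
    from llama_cpp import Llama

    # n_gpu_layers controls how many transformer layers live in VRAM;
    # the remainder runs from system RAM on the CPU. Tune it down until
    # the model fits in the 4060 Ti's 16GB.
    llm = Llama(
        model_path="models/example-13b.Q4_K_M.gguf",  # hypothetical model
        n_gpu_layers=30,
        n_ctx=4096,
    )

    print(llm("Q: What is VLSI? A:", max_tokens=64)["choices"][0]["text"])
    ```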

    The college cat studying image above was one of the last that I generated with Stable Diffusion on the A6000.

    Swapping out the video cards was completely painless on Debian 12 with NVIDIA drivers 525.147.05. I pulled out the A6000 and its power adapter, installed the 4060 Ti, and connected its single power cable.

  • Cyberpunk Help Desk Cat Made with Stable Diffusion

    A chubby anthropomorphic cat wearing a hoodie jacket is working at a cyberpunk help desk.

    When I saw this image of a cyberpunk computer technician anthropomorphic cat that I generated with Stable Diffusion, the first thing that came to mind was the Bastard Operator from Hell. Having worked at a help desk, I think it would be an interesting experience to be his co-worker. It certainly wouldn’t be boring!

  • Almost Done With a Sabbatical Side Project

    Anthropomorphic cat typing on a typewriter at a desk. City buildings seen in the window behind him. Image created with Stable Diffusion.

    These past two weeks, I’ve been working on a sabbatical side project. I put my primary research project on hold so that I could think about it some more before proceeding. In the meantime, I’m using generative AI to help accelerate my work on an open educational resource (OER) focused on Science Fiction (SF) that I plan to launch soon. The writing for the project is done; what I am doing now is using a large language model (LLM) running on my desktop workstation to help me with editing. I think the end product will be pretty cool, and it will be something anyone is free to use after it’s launched. Stay tuned!