Edit Your Images Easily with Inpainting and Diffusion
Wondering how modern AI can help with image modification and editing? It turns out diffusion models can help a lot. Among these techniques, image-to-image methods like inpainting are especially useful: instead of generating a whole new image, we modify only specific parts of an existing one.
Whether you use Stable Diffusion or one of its variants, each model has its strengths for particular kinds of edits.
In this tutorial, we will guide you through:
- Preparing the environment
- Installing the library
- Loading the model and performing inference in Python
- Creating an interactive UI with Gradio for a better experience
Requirements: To run this lab, we suggest a GPU with at least 12GB of VRAM; alternatively, Google Colab works well.
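If you are not sure how much memory your GPU has, you can check from a terminal (assuming an NVIDIA GPU with the driver installed):

```bash
# Show the GPU model, driver version, and available VRAM
nvidia-smi
```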
Prepare the Environment
In this tutorial, we run the diffusion model in a Python 3.9 environment using the Anaconda manager. If you are using Google Colab, you can skip this step.
Below is how you can set up the environment:
```bash
# Create a new conda environment
conda create -n diffusion python=3.9
conda activate diffusion
```
Then, we need to install Torch and Transformers. These libraries are essential for running the diffusion model.
```bash
# Install the transformers library
pip install transformers
```
Pay attention to the version of CUDA on your system. If you are using Google Colab, you don't need to install Torch. For example, our system runs CUDA 12.4, so the installation is:
```bash
# Install torch based on your CUDA version
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu124
```
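After installing, it is worth confirming that PyTorch can actually see your GPU; a quick sanity check looks like this:

```python
import torch

# Should print True and the CUDA version the wheel was built against
print(torch.cuda.is_available(), torch.version.cuda)
```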
Next, we will install Diffusers, the module that enables inpainting.
```bash
# Install the diffusers library
pip install diffusers
```
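Depending on your setup, Diffusers may also suggest installing accelerate for faster, more memory-efficient model loading (it is also needed for the CPU-offload option mentioned in the next section). Installing it is optional:

```bash
# Optional: faster model loading and CPU offloading support
pip install accelerate
```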
Load the Model
If you check out Hugging Face, several models are available. Remember to select an inpainting model.
In this tutorial, let's use diffusers/stable-diffusion-xl-1.0-inpainting-0.1 as our starting point.
The code to load a pipeline that applies this model is as follows:
```python
import torch
from diffusers import AutoPipelineForInpainting

# Load model and move it to GPU
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16
).to("cuda")
```
Explanation of the code above:
- We import torch and the AutoPipelineForInpainting from the Diffusers library.
- The from_pretrained method loads the specified model. We load the weights in torch.float16 (half precision) to reduce memory use, and .to("cuda") moves the pipeline onto the GPU for faster inference.
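If your GPU has less than the recommended VRAM, one optional tweak (not part of the original setup) is to let Diffusers offload pipeline components to the CPU between steps instead of keeping everything on the GPU, trading speed for memory:

```python
# Optional: lower VRAM usage at the cost of speed
# Call this instead of .to("cuda"); it requires the accelerate package
pipe.enable_model_cpu_offload()
```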
Load Image and Mask
The idea of inpainting is to modify parts of an image.
The image is the visual we want to edit, while the mask specifies the areas we want to change (white regions are regenerated, black regions are kept).
We will load an available image and mask for testing as shown below:
```python
import requests
from io import BytesIO
from PIL import Image

# Load an image to inpaint
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# Load the mask image
mask_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png"
mask_response = requests.get(mask_url)
mask_image = Image.open(BytesIO(mask_response.content)).convert("RGB")

# Resize images if necessary
image = image.resize((1024, 1024))
mask_image = mask_image.resize((1024, 1024))
```
Explanation of the code:
- We load the image and the mask using URLs.
- The images are converted to RGB format and resized to 1024x1024 pixels.
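If you would rather edit your own picture, a minimal sketch (using a hypothetical local file named my_photo.jpg) looks like this; you would still need a matching mask, or you can draw one with the Gradio interface later in this tutorial:

```python
# Load a local image instead of downloading one
image = Image.open("my_photo.jpg").convert("RGB").resize((1024, 1024))
```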
Let’s visualize the images we loaded from the above URLs:
```python
# Visualize both images in the same chart using matplotlib
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(image)
ax[0].set_title("Image")
ax[0].axis("off")
ax[1].imshow(mask_image)
ax[1].set_title("Mask")
ax[1].axis("off")
plt.show()
```
As you can see, the original image shows a road with a mountain in the center, and the mask covers the mountain area. Our aim is to use inpainting to replace this mountain with whatever we describe in a prompt.
Set Up the Prompt
A prompt is a textual description of what we want to generate in the masked area.
```python
generator = torch.Generator("cuda").manual_seed(92)
prompt = 'an elven castle, in the mountain mist'
```
We use a torch.Generator with a fixed seed so the output is reproducible whenever we run the pipeline with the same prompt and settings.
Next, we input the prompt, image, and mask into the pipeline:
```python
# Generate the inpainted image
inpainted_image = pipe(prompt=prompt, image=image, mask_image=mask_image, generator=generator).images[0]
```
It may take some time to process, depending on the speed of your GPU.
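If the result contains elements you don't want, the pipeline also accepts a negative_prompt argument. Here is a small variant of the call above (the negative prompt wording is only an example):

```python
# Optional: steer the model away from unwanted content in the masked area
inpainted_image = pipe(
    prompt=prompt,
    negative_prompt="blurry, low quality, distorted",
    image=image,
    mask_image=mask_image,
    generator=generator,
).images[0]
```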
Let’s visualize the results of inpainting:
```python
# Use matplotlib to visualize the image, mask, and inpainted image
fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(image)
ax[0].set_title("Image")
ax[0].axis("off")
ax[1].imshow(mask_image)
ax[1].set_title("Mask")
ax[1].axis("off")
ax[2].imshow(inpainted_image)
ax[2].set_title("Inpainted Image")
ax[2].axis("off")
plt.show()
```
You can see that the area we masked before has been changed, while the rest of the image remains intact.
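If you want to keep the result outside the notebook, the pipeline returns a regular PIL image, so you can write it to disk directly (the filename is just an example):

```python
# Save the inpainted result as a PNG file
inpainted_image.save("inpainted_result.png")
```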
Change the Parameters
We can adjust the parameters in the pipeline, such as guidance scale, number of inference steps, and strength.
```python
inpainted_image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # Steps between 15 and 30 work well for us
    strength=0.99,  # Ensure `strength` is below 1.0
).images[0]
```
Explanation of each parameter:
- guidance_scale: This controls how closely the generated image should match the prompt. Higher values make the image adhere more closely to the prompt.
- num_inference_steps: This determines how many steps the model takes to generate the image. More steps can lead to better results.
- strength: This adjusts how much of the original image is preserved. A value below 1.0 means some original features are retained; a quick comparison of different values follows after this list.
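To build intuition for how strength changes the output, one quick experiment (a sketch; the specific values are examples, and the same seed is reused so only strength varies) is to render the prompt at a few settings and compare them side by side:

```python
# Compare a few strength values with everything else held fixed
strengths = [0.6, 0.8, 0.99]
fig, ax = plt.subplots(1, len(strengths), figsize=(15, 5))
for i, s in enumerate(strengths):
    result = pipe(
        prompt=prompt,
        image=image,
        mask_image=mask_image,
        guidance_scale=8.0,
        num_inference_steps=20,
        strength=s,
        generator=torch.Generator("cuda").manual_seed(92),
    ).images[0]
    ax[i].imshow(result)
    ax[i].set_title(f"strength={s}")
    ax[i].axis("off")
plt.show()
```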
Interactive UI with Gradio
We can use Gradio to build a simple web app that lets users draw a mask directly on an image and apply inpainting.
```bash
# Install Gradio
pip install gradio
```
First, we create a function that receives the edited image (with its drawn mask layer), the prompt, and the parameters:
```python
import numpy as np
import gradio as gr
from PIL import Image

# Inpainting function that receives the editor output, prompt, and hyperparameters
def inpaint(img, prompt, num_steps, guidance_scale, strength):
    # Keep only the RGB channels of the background image
    background = img["background"][:, :, :3]

    # Extract the alpha channel of the drawn layer to create the inpainting mask
    alpha_channel = img["layers"][0][:, :, 3]
    mask = np.where(alpha_channel == 0, 0, 255).astype(np.uint8)

    background = Image.fromarray(background)
    mask = Image.fromarray(mask)

    # Apply inpainting with the given prompt and hyperparameters
    inpainted_image = pipe(
        prompt=prompt,
        image=background,
        mask_image=mask,
        num_inference_steps=num_steps,
        guidance_scale=guidance_scale,
        strength=strength,
    ).images[0]

    return inpainted_image
```
- The inpaint function takes in the image editor output, the prompt, and the hyperparameters.
- It extracts the background and the alpha channel to create a mask.
- The inpainting is performed with the given parameters, and the inpainted image is returned.
Next, we set up the Gradio interface:
```python
with gr.Blocks() as demo:

    with gr.Row():
        # Textbox to enter the inpainting prompt
        prompt_input = gr.Textbox(label="Inpainting Prompt", placeholder="Enter text prompt for inpainting")

    with gr.Row():
        # Sliders for the inpainting hyperparameters
        num_steps = gr.Slider(minimum=1, maximum=100, value=50, step=1, label="Number of Inference Steps")
        guidance_scale = gr.Slider(minimum=1.0, maximum=20.0, value=7.5, step=0.1, label="Guidance Scale")
        strength = gr.Slider(minimum=0.0, maximum=1.0, value=0.5, step=0.1, label="Strength")

    with gr.Row():
        img = gr.ImageEditor(
            crop_size="1:1",
            height="30vw"
        )
        im_preview = gr.Image(height="30vw")

    # Button that runs inpainting on the edited image and shows the result
    btn = gr.Button("Process Image")

    btn.click(inpaint, [img, prompt_input, num_steps, guidance_scale, strength], [im_preview])
```
- We set up the Gradio interface with input fields for the prompt and hyperparameters.
- The ImageEditor allows users to draw the mask directly on the image.
- When the Process Image button is clicked, the inpaint function processes the inputs and displays the inpainted image.
Finally, we run the code:
```python
demo.launch()
```
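Gradio serves the app at a local address printed in the console. If you are running on Google Colab or a remote server where that local address is not reachable, you can ask Gradio for a temporary public link instead (an optional tweak):

```python
# Create a shareable public URL (useful on Colab or remote machines)
demo.launch(share=True)
```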
To access the web app, go to: http://127.0.0.1:7860 in the browser. You can see the app interface as shown below:
In this interface, you can change a variety of input options and upload your own image.
After uploading, you can use the drawing tool to draw a mask.
After that, type your prompt, adjust the parameter values, and click the Process Image button.
After some time (depending on your GPU), the result appears in the right panel.
You can also change the prompt and other parameters to generate your desired output:
Finally, you can save the output image by clicking the download button.
Conclusion
In this tutorial, we explored how to use inpainting with diffusion models to modify specific parts of an image effectively. With libraries like Diffusers and Gradio, we can create interactive applications that make image editing easy and accessible.
Feel free to experiment with different prompts and parameters to see how they affect the inpainting results!
References
Hugging Face. (n.d.). Inpainting. In diffusers documentation. Retrieved December 16, 2024, from https://huggingface.co/docs/diffusers/using-diffusers/inpaint