DeOldify is an open-source deep learning model created by Jason Antic that is specifically designed to colorize and restore old images and film footage. It aims to transform black-and-white visual content into vibrant, realistic, and often photorealistic color versions, effectively giving them “new life”. While the project’s main repository was archived by its owner on October 19, 2024, indicating it is no longer actively maintained, its innovative approaches have significantly contributed to the field of AI image and video colorization.

How DeOldify Works: Technical Underpinnings

At its core, DeOldify leverages deep learning algorithms, primarily Generative Adversarial Networks (GANs), to achieve its impressive results. Image colorization is inherently challenging because a single grayscale image can correspond to multiple possible color versions. DeOldify tackles this “fundamentally ambiguous” problem by learning to predict and apply colors that closely resemble what the original scene might have looked like.

Key technical aspects of DeOldify include:

  • Generative Adversarial Networks (GANs): DeOldify utilizes a GAN architecture, which consists of two neural networks: a generator and a discriminator (critic).
    • The generator is trained to predict and apply colors, producing the colorized image.
    • The discriminator acts as a “critic,” trained to distinguish between images generated by the generator and real, original color images. This adversarial process encourages the generator to create increasingly realistic outputs.
  • NoGAN Training: This is a unique training method developed by Jason Antic for DeOldify, aimed at retaining the benefits of GAN training (realistic colorization) while mitigating common GAN issues like flickering and artifacts (see the training-schedule sketch after this list).
    • The process involves minimal direct GAN training. Most of the training time is spent pretraining the generator and critic separately using more conventional, reliable methods.
    • The generator is initially trained with a Perceptual Loss (or Feature Loss) based on a VGG16 network, which biases it to replicate the input image. While perceptual loss alone isn’t sufficient for good colorization (tending to produce generic brown/green/blue outputs), it gets the model “most of the way there” before GAN training “closes the gap on realism”.
    • After the generator is pretrained, images are generated from it, and the critic is trained to distinguish these from real images.
    • Finally, the generator and critic are trained together in a GAN setting for a very short period (as little as 1-3% of ImageNet data, amounting to 30-60 minutes). There’s an “inflection point” where the critic has transferred all useful knowledge to the generator, after which training can become unstable, leading to issues like “orangish skin”.
  • Network Architecture (U-Net with ResNet Backbone): The core colorizer network is a U-Net.
    • It uses a pretrained ResNet (either ResNet34 or ResNet101 depending on the model version) as its encoding part, with a decoding part added to reconstruct the color image.
    • Skip connections are used to directly integrate outputs from intermediate ResNet layers into the decoder, mixing features from different scales.
    • The network is trained on the ImageNet dataset.
    • Input Handling: Although the input is a grayscale image, it’s encoded as RGB (three identical channels) because the pretrained ResNet expects RGB inputs. A post-processing step converts the predicted RGB image to YUV, keeps the chrominance (U and V) from the prediction, and recombines it with the luminance (Y) of the original grayscale image so that luminance stays consistent (see the sketch after this list).
  • Self-Attention Layer: This layer, based on a “non-local operation,” introduces spatial consistency at a large scale. It allows any pixel’s position to influence the result at any other pixel, helping the network separate distinct image zones (e.g., sky and earth, faces and background).
  • Spectral Normalization: Applied to most convolutional layers, this technique stabilizes network training and reduces the appearance of “false colors”. Both techniques appear together in the attention sketch after this list.
  • Progressive Training: DeOldify often uses a progressive training approach in which the image size is gradually increased during training (e.g., from 64×64 to 192×192 pixels). This helps the network learn large-scale structures first and then progressively finer details, aiding convergence to better local minima; the size loop in the training-schedule sketch below illustrates the idea.
  • Temporal Consistency: For video colorization, an additional training phase can use noise-augmented images, inspired by style-transfer methods, to reduce “flickering” effects; the original implementation of this idea had flaws that were later addressed.
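
To make the NoGAN phases and the progressive-resizing idea concrete, here is a minimal, self-contained PyTorch sketch using tiny stand-in networks and random data. The model definitions, loss choice (plain L1 in place of the VGG16 perceptual loss), and step counts are illustrative assumptions, not DeOldify’s actual code.

```python
# Minimal NoGAN-style training schedule (an illustrative sketch with tiny
# stand-in networks and random data, not DeOldify's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins: a generator mapping 1-channel grayscale to 3-channel color,
# and a critic scoring 64x64 color images as real or fake.
generator = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1))
critic = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

def fake_batch(size, n=4):
    """Random stand-in for (grayscale input, ground-truth color) pairs."""
    color = torch.rand(n, 3, size, size)
    gray = color.mean(dim=1, keepdim=True)
    return gray, color

# Phase 1: pretrain the generator alone. DeOldify uses a VGG16-based
# perceptual (feature) loss here; plain L1 stands in for it. The size loop
# mimics progressive resizing (64x64 up to 192x192).
for size in (64, 128, 192):
    for _ in range(10):
        gray, color = fake_batch(size)
        loss = F.l1_loss(generator(gray), color)
        g_opt.zero_grad()
        loss.backward()
        g_opt.step()

# Phase 2: pretrain the critic to separate generator outputs from real images.
for _ in range(10):
    gray, color = fake_batch(64)
    with torch.no_grad():
        fake = generator(gray)
    loss = (F.binary_cross_entropy_with_logits(critic(color), torch.ones(4, 1))
            + F.binary_cross_entropy_with_logits(critic(fake), torch.zeros(4, 1)))
    c_opt.zero_grad()
    loss.backward()
    c_opt.step()

# Phase 3: brief joint GAN training (critic updates omitted for brevity).
# In DeOldify this phase is very short; past the "inflection point" the
# results start to degrade (e.g., orangish skin).
for _ in range(5):
    gray, _ = fake_batch(64)
    fake = generator(gray)
    g_loss = F.binary_cross_entropy_with_logits(critic(fake), torch.ones(4, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```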
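
The luminance-preserving post-processing step can be shown directly. This is a small NumPy sketch, assuming images scaled to [0, 1] and BT.601 conversion matrices; DeOldify’s own implementation may differ in the exact color-space convention.

```python
import numpy as np

# BT.601 RGB -> YUV matrix and its inverse (an assumed convention).
RGB2YUV = np.array([[ 0.299,    0.587,    0.114   ],
                    [-0.14713, -0.28886,  0.436   ],
                    [ 0.615,   -0.51499, -0.10001 ]])
YUV2RGB = np.linalg.inv(RGB2YUV)

def restore_luminance(predicted_rgb: np.ndarray, gray: np.ndarray) -> np.ndarray:
    """Keep the network's chrominance (U, V) but the original luminance (Y).

    predicted_rgb: (H, W, 3) colorized output in [0, 1]
    gray:          (H, W) original grayscale image in [0, 1]
    """
    yuv = predicted_rgb @ RGB2YUV.T   # per-pixel RGB -> YUV
    yuv[..., 0] = gray                # overwrite Y with the original luminance
    rgb = yuv @ YUV2RGB.T             # back to RGB
    return np.clip(rgb, 0.0, 1.0)

# Example: random stand-ins for a grayscale source and a colorized prediction.
gray = np.random.rand(8, 8)
prediction = np.random.rand(8, 8, 3)
result = restore_luminance(prediction, gray)
```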
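
For the self-attention and spectral-normalization pieces, here is a minimal PyTorch sketch of a SAGAN-style self-attention block with spectral normalization applied to its 1×1 convolutions. It illustrates the general technique rather than reproducing DeOldify’s own layer code.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention: every spatial position can attend to
    every other, which helps enforce large-scale color consistency."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs project features into query/key/value spaces.
        # Spectral normalization bounds each layer's Lipschitz constant,
        # which stabilizes adversarial training.
        self.query = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.key = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.value = spectral_norm(nn.Conv2d(channels, channels, 1))
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend-in weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C//8)
        k = self.key(x).flatten(2)                     # (B, C//8, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        attn = torch.softmax(q @ k, dim=-1)            # (B, HW, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection

# Example: plug into a 64-channel feature map.
layer = SelfAttention2d(64)
features = torch.randn(2, 64, 16, 16)
out = layer(features)  # same shape: (2, 64, 16, 16)
```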

DeOldify Models and Their Use Cases

DeOldify provides three distinct models, each optimized for different purposes:

  • Artistic Model:
    • Achieves the highest quality results in image coloration, with rich details and vibrancy.
    • Uses a ResNet34 backbone with a U-Net, emphasizing depth of layers on the decoder side.
    • Can be challenging to use, often requiring manual adjustment of the render_factor (rendering resolution) for optimal results.
    • May not perform as well in common scenarios like nature scenes and portraits compared to the “stable” model.
  • Stable Model:
    • Produces best results for landscapes and portraits.
    • Significantly reduces instances of “zombies” (where faces or limbs remain gray).
    • Generally has fewer unusual miscolorations than the “artistic” model but is also less colorful.
    • Uses a ResNet101 backbone with a U-Net, emphasizing width of layers on the decoder side.
  • Video Model:
    • Optimized for smooth, consistent, and flicker-free video.
    • It is generally the least colorful of the three models, though close to “stable” in vibrancy.
    • Shares the same architecture as the “stable” model but differs in its training.
    • Remarkably, it achieves smooth video colorization by processing individual frames, without explicit temporal modeling.

Strengths and Advantages

DeOldify has garnered significant attention due to its capabilities:

  • High-Quality, Realistic Colorization: Many users and reviewers praise DeOldify for producing realistic and natural-looking colors that make old photos appear as if they were originally captured in color.
  • Reduction of Artifacts and Glitches: The NoGAN training method specifically addresses and largely eliminates artifacts and glitches, leading to cleaner outputs compared to earlier models.
  • Improved Skin Tones: It has made advancements in rendering skin tones more naturally, reducing “zombie” effects where faces might remain gray.
  • Video Stability: Despite processing frames individually, DeOldify’s video model offers smooth and consistent results with minimal flickering, a significant challenge in video colorization.
  • Versatility: It can handle a wide range of images, from landscapes and portraits to cityscapes and even engravings/drawings.
  • Open-Source Availability: DeOldify is an open-source project, providing access to its code and allowing others to build upon it. This has led to various implementations, including desktop GUIs, browser-based versions, and plugins for other software.

Limitations and Challenges

Despite its strengths, DeOldify and AI colorization in general face limitations:

  • Historical Accuracy: A significant challenge is ensuring historical accuracy. AI models “make a guess” about colors based on patterns learned from large datasets. While outputs may look realistic, they aren’t necessarily “correct” without historical evidence. For example, DeOldify colorized a black-and-white image of the Golden Gate Bridge, but the AI-chosen colors were not historically accurate as the towers were already covered in red primer at the time of the photo. Historians like Jordan Lloyd emphasize that accurate recoloring requires meticulous research, which AI cannot currently replicate.
  • Ambiguity and Complex Patterns: AI colorization works best with clear, predictable patterns but can struggle with complex or unknown patterns. This can lead to issues like “false colors,” uncolored sections, or difficulty coloring objects with a wide range of acceptable colors (e.g., clothing).
  • Computational Demands: Training DeOldify models, especially the larger ones, requires significant computational resources, including powerful GPUs and substantial memory.
  • Instability and “Inflection Point”: The NoGAN training process, while effective, involves finding a precise “inflection point” where optimal training occurs. Past this point, results can become inconsistent.
  • Video Degradation: While the video model handles clean footage smoothly, colorizing archival or degraded footage may still produce flickering or sepia-toned output.
  • Accessibility for Non-Developers: As an open-source project, the core DeOldify implementation is primarily Linux-based and requires some technical background to set up locally. While third-party tools have made it more accessible, the original developers explicitly state they do not aim to provide a “ready to use free ‘product’ or ‘app’” or endless personalized support.

Usage and Accessibility

DeOldify can be accessed in several ways:

  • Google Colab: The easiest way to try DeOldify is through dedicated Google Colab notebooks for image (Artistic/Stable) and video.
  • Jupyter Notebook: Users can clone the GitHub repository, install dependencies, and download pretrained weights to run DeOldify on their own machines.
  • Desktop Applications/Plugins: Various third-party applications and plugins, such as the Stable Diffusion Web UI Plugin and ColorfulSoft Windows GUI, have been developed to make DeOldify accessible on desktops, some even without a GPU.
  • Online Tools: Services like DeepAI’s DeOldify implementation offer quick, free ways to colorize images directly in a browser. Other tools, such as Palette.fm, Canva, and Face Max Color, also provide AI colorization services with varying features and pricing models.

Users can adjust parameters like the render_factor (the resolution used at inference) and saturation (to tone down or enhance colors) to optimize results. Lower render_factor values mean faster processing and more vibrant colors but less stability, while higher values are better for high-resolution images and portraits. A saturation value of 1.5 to 2.0 is often recommended. A usage sketch follows below.
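
As a concrete starting point, here is a hedged sketch of local usage following the pattern in the repository’s example notebooks; exact module paths and function names may differ between versions, and the PIL-based saturation boost at the end is an assumption about where that adjustment happens rather than a DeOldify feature.

```python
# Hedged sketch of local DeOldify usage, modeled on the repository's example
# notebooks (exact names may differ between versions).
from deoldify import device
from deoldify.device_id import DeviceId

device.set(device=DeviceId.GPU0)  # DeviceId.CPU on machines without a GPU

from deoldify.visualize import get_image_colorizer

# artistic=True selects the vivid "artistic" model; False selects "stable".
colorizer = get_image_colorizer(artistic=True)

# Lower render_factor: faster, more vibrant, less stable.
# Higher render_factor: better for portraits and high-resolution sources.
result_path = colorizer.plot_transformed_image(
    'test_images/example.png', render_factor=35, compare=True)

# Assumed post-processing: boost saturation with PIL (not a DeOldify API).
from PIL import Image, ImageEnhance

img = Image.open(result_path)
ImageEnhance.Color(img).enhance(1.5).save('colorized_saturated.png')
```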

Broader Context of AI in Photography

DeOldify is part of a larger trend of AI transforming photography editing and culling. AI tools can perform automatic adjustments, cull images quickly, assist in creative storytelling, and manage large photo collections. However, experts like Penelope Diamantopoulos emphasize the importance of AI aligning with a photographer’s personal vision rather than replacing it. Concerns remain about AI potentially diluting a photographer’s unique style or causing over-reliance, making photos less “their own”.

Other AI tools in the broader field of photo restoration, like “PhotoRestorer,” demonstrate the use of CNNs to classify image damage (e.g., blurred, cracked) and then apply specialized pre-trained models (like GFP-GAN for blur or Image Inpainting for cracks) for restoration. While distinct from DeOldify’s primary focus on colorization and general restoration, this highlights the growing sophistication of AI in addressing various forms of image degradation.

Future Outlook

Despite the archiving of the main GitHub repository, the concepts pioneered by DeOldify remain influential. Jason Antic, the creator, expressed ambitions to continue improving the code, make it more user-friendly, and address issues like sound in videos and more types of degradation (e.g., JPEG artifacts). The long-term goal for DeOldify was to consolidate the three models into one unified model with all the desirable characteristics. The advancements in deep learning, particularly with models understanding semantic information and real-time image analysis, continue to push the boundaries of what’s possible in photo editing.

Analogy: Think of DeOldify AI as a master painter who specializes in breathing new life into old, faded photographs. Instead of just adding splashes of random color, this painter has studied millions of real-world scenes, learning the subtle nuances of light, shadow, and object recognition. When given a grayscale image, they don’t just “guess” colors; they apply their vast learned knowledge, much like a skilled artist would intuitively know the typical color of a sky or grass, even from a monochrome sketch. The “NoGAN” technique is like the painter’s secret training regimen—it allows them to quickly refine their technique to produce stunningly realistic results without the usual smudges and mistakes that often plague less disciplined artists. While they might occasionally get a historical detail wrong or struggle with an unusually patterned shirt, their overall ability to transform a relic of the past into a vibrant, compelling image is nothing short of artistic magic.