Background

Gatys, Ecker and Bethge (NIPS 2015) showed that the Gram matrices of intermediate VGG feature maps capture texture statistics well enough to synthesize novel images with matching mid-level statistics. This is related to but distinct from Neural Style Transfer: there is no content image and no content loss, just an image initialized with random noise and iteratively optimized to match the Gram-matrix statistics of a single texture donor across five VGG19 layers (the first conv layer plus the four pooling layers).
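The Gram matrix of a layer is the matrix of inner products between its channel activations, with spatial position averaged out. A minimal sketch of that computation (the function name and normalization choice are illustrative, not taken from the notebook):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel inner products of one layer's activations.

    features: (batch, channels, height, width) activation map from a VGG layer.
    Returns a (batch, channels, channels) matrix of second-order statistics;
    all spatial arrangement is discarded in the flattening step.
    """
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)          # flatten spatial dims
    gram = flat @ flat.transpose(1, 2)         # (b, c, c)
    # Normalize by the number of spatial positions so statistics from
    # differently sized layers are on comparable scales.
    return gram / (h * w)
```

Because spatial structure is integrated away, two images with very different layouts can share identical Gram matrices, which is exactly what makes the representation a texture descriptor rather than an image descriptor.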

Motivation

The original Caffe implementation is effectively unusable today without significant environment archaeology. PyTorch is the current standard for deep learning research, so this re-implementation exists to make the algorithm accessible without the dependency burden of a deprecated framework.

What It Does

Given an input texture, the algorithm synthesizes a new image that shares its statistical fingerprint at multiple scales of the feature hierarchy, without copying any specific spatial structure. The output is called a texform.

Implementation

The code is delivered as a Jupyter notebook rather than a packaged library, which fits the scope: this is primarily a clear walkthrough of the algorithm rather than production tooling. The loss is the mean squared error between Gram matrices at the five selected layers, optimized directly on the pixel values of the output image.
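The optimization loop can be sketched as follows. Everything here is a stand-in for the notebook's actual code: `extract_features` is a hypothetical callable returning activations at the chosen layers, and Adam is used for brevity where an implementation might prefer L-BFGS as in the original paper.

```python
import torch
import torch.nn.functional as F

def gram(f: torch.Tensor) -> torch.Tensor:
    # Gram matrix normalized by spatial size (see Background section).
    b, c, h, w = f.shape
    flat = f.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (h * w)

def synthesis_step(output_img, extract_features, target_grams, optimizer):
    """One gradient step matching the donor's Gram statistics.

    output_img:       pixel tensor with requires_grad=True (the optimized image)
    extract_features: callable returning a list of layer activations (assumed)
    target_grams:     precomputed Gram matrices of the donor texture
    """
    optimizer.zero_grad()
    loss = sum(
        F.mse_loss(gram(f), g)
        for f, g in zip(extract_features(output_img), target_grams)
    )
    loss.backward()   # gradients flow back to the pixels themselves
    optimizer.step()
    return loss.item()
```

Note that no network weights are trained: VGG stays frozen, and the only parameters handed to the optimizer are the pixels of the output image.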