Whether neural style transfer has much concrete business value is questionable, but it remains a fun thing to do and to look at. Its theoretical value, however, is considerable: it shows how ‘style’ can be captured in a matrix (the Gram matrix) and how the idea could potentially carry over to text and sound. It makes the notion of ‘style’ more tangible, and it is remarkably effective. Below you can find an implementation using PyTorch, but there are plenty of (TensorFlow) alternatives around.

Some terminology first. The image onto which a style is applied is called the content image, the image providing the style is called the style image, and the result is called the pastiche (19th-century French for imitation).

Style transfer is just another neural learning process, but with two pulling forces: it tries to match the content image and the style image at the same time, thus creating a tension, or mixture. This tension is captured by the loss function, which measures how much the result resembles each of the two images. The crucial (and magical) bit is that the style loss is not simply the difference between the images but the difference between their Gram matrices. Why this works is a bit of a mystery, but this article offers an alternative view on things. So, if C, S, P are the matrices representing the content, style, and pastiche respectively, the content loss is

\mathcal{L}_C = \|C-P\|

with respect to some norm, usually the Frobenius norm, and the style loss is

\mathcal{L}_S = \|G(S)-G(P)\|

with G(.) the Gram matrix of the style and pastiche. The total loss is then
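These two losses are straightforward to express in PyTorch. As a sketch only: in a real implementation C, S, and P would be feature maps extracted from a pretrained CNN (typically VGG), not raw images, and the Gram-matrix normalization below is one common convention rather than the only one.

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (channels, height, width) feature map
    c, h, w = features.shape
    flat = features.view(c, h * w)          # flatten the spatial dimensions
    return flat @ flat.t() / (c * h * w)    # channel-by-channel correlations, normalized

def content_loss(C: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    # Frobenius norm of the difference between content and pastiche
    return torch.norm(C - P)

def style_loss(S: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    # Compare Gram matrices rather than the raw features
    return torch.norm(gram_matrix(S) - gram_matrix(P))
```

Note that the Gram matrix discards all spatial information: only which channels co-activate survives, which is one intuition for why it captures ‘style’ rather than content.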

\mathcal{L} = \alpha\mathcal{L}_C + (1-\alpha)\mathcal{L}_S

where \alpha acts as a weight emphasizing either content or style. Below you can see how this parameter affects the resulting pastiche. With this in place, all the rest is basic neural network mechanics (i.e. SGD).
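To show how the pieces fit together, here is a toy version of that SGD loop, optimizing the pastiche pixels directly. This is an illustrative sketch: a faithful implementation would compute both losses on CNN feature maps, whereas here plain random tensors stand in for the images.

```python
import torch

torch.manual_seed(0)

def gram(m: torch.Tensor) -> torch.Tensor:
    # (channels, height, width) -> normalized channel correlation matrix
    c, h, w = m.shape
    f = m.view(c, h * w)
    return f @ f.t() / (c * h * w)

def total_loss(P, C, S, alpha=0.5):
    # alpha weights content against style, as in the formula above
    l_content = torch.norm(C - P)                 # Frobenius norm
    l_style = torch.norm(gram(S) - gram(P))
    return alpha * l_content + (1 - alpha) * l_style

# Stand-ins for the content, style, and pastiche tensors
C = torch.rand(3, 8, 8)
S = torch.rand(3, 8, 8)
P = torch.rand(3, 8, 8, requires_grad=True)   # only P is optimized
opt = torch.optim.SGD([P], lr=0.05)

losses = []
for _ in range(50):
    opt.zero_grad()
    loss = total_loss(P, C, S, alpha=0.7)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Because only P carries gradients, the network weights (in the real, VGG-based setup) stay fixed and the image itself is what gets learned.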

Animating the balance between content and style yields something like the animation below.



Full article