Universal Neural Style Transfer | Two Minute Papers #213



The paper “Universal Style Transfer via Feature Transforms” and its source code are available here:
https://arxiv.org/abs/1705.08086
https://github.com/Yijunmaverick/UniversalStyleTransfer

Recommended for you:
https://www.youtube.com/watch?v=Rdpbnd0pCiI – What is an Autoencoder?

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dave Rushton-Smith, Dennis Abts, Eric Haddad, Esa Turkulainen, Evan Breznyik, Kaben Gabriel Nanlohy, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Michael Orenstein, Raul Araújo da Silva, Robin Graham, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payments:
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/

Thumbnail background image credit: https://pixabay.com/photo-1978682/
Splash screen/thumbnail design: Felícia Fehér – http://felicia.hu

Károly Zsolnai-Fehér’s links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

Fahad Hameed

Fahad Hashmi is a well-known software engineer and blogger who likes to write about design resources. He is passionate about collecting awe-inspiring design tools to help designers. He blogs exclusively for designers and photographers.

38 thoughts on “Universal Neural Style Transfer | Two Minute Papers #213”

  • December 13, 2017 at 10:36 pm
    Permalink

    A couple of commenters have asked what's so special about this method as compared to what's been around for the past two years:

    The original algorithm from Gatys et al. (2015) still produces the most visually pleasing results as far as I know. It poses style transfer as an optimization problem: a noise image is fed forward through a pre-trained image classification net like VGG16, and the loss balances a content term and a style term that encourage it to match the VGG features of the given content/style images at multiple layers. The error is backpropagated and the noise image is updated with a gradient descent step. The forward/backward/update process is repeated up to hundreds of times, so this is typically slow and can't stylize in real time.
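
    To make that concrete, the loop looks roughly like this (a PyTorch sketch, not Gatys' actual code; the layer indices and weights are just illustrative choices):

    ```python
    # Sketch of the Gatys-style optimization loop: repeatedly update the image
    # itself so its VGG features match the content image at one layer and the
    # style image's Gram matrices at several layers.
    import torch
    import torch.nn.functional as F
    from torchvision import models

    vgg = models.vgg16(pretrained=True).features.eval()
    for p in vgg.parameters():
        p.requires_grad_(False)

    CONTENT_LAYERS = {15}          # relu3_3 (illustrative choice)
    STYLE_LAYERS = {3, 8, 15, 22}  # relu1_2 .. relu4_3

    def extract(x, layers):
        feats = {}
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in layers:
                feats[i] = x
        return feats

    def gram(f):
        b, c, h, w = f.shape
        f = f.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def stylize(content, style, steps=300, style_weight=1e6):
        # content/style: (1, 3, H, W) ImageNet-normalized tensors.
        # Gatys starts from noise; starting from the content image also works.
        img = content.clone().requires_grad_(True)
        opt = torch.optim.Adam([img], lr=0.02)
        c_feats = extract(content, CONTENT_LAYERS)
        s_grams = {i: gram(f) for i, f in extract(style, STYLE_LAYERS).items()}
        for _ in range(steps):
            opt.zero_grad()
            feats = extract(img, CONTENT_LAYERS | STYLE_LAYERS)
            c_loss = sum(F.mse_loss(feats[i], c_feats[i]) for i in CONTENT_LAYERS)
            s_loss = sum(F.mse_loss(gram(feats[i]), s_grams[i]) for i in STYLE_LAYERS)
            (c_loss + style_weight * s_loss).backward()
            opt.step()
        return img.detach()
    ```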

    jcjohnson in 2016 introduced a 'fast' style transfer approach that addresses the speed issue with somewhat lower quality results. It uses a separate 'image transformation network' that's trained to apply a single style to input content images with only a forward pass. It also uses VGG to calculate a similar loss during training, but at test time only the transformation net is needed. Most style transfer mobile apps likely use a variation of this.

    There are a couple of extensions to the 'fast' architecture that support multiple styles without needing to train a new net for each. One of these is Conditional Instance Normalization from Google Magenta https://magenta.tensorflow.org/2016/11/01/multistyle-pastiche-generator. However, this is limited to a pre-set # of styles that have to be supplied during training. Adding an unseen style to the net requires training with a new set of params.
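
    The conditional part is just a per-style scale/shift applied after instance normalization, something like this (illustrative PyTorch, not Magenta's actual code):

    ```python
    # Sketch of conditional instance normalization: one shared network, with a
    # (gamma, beta) pair per style selected at run time. Unseen styles need new
    # parameters, which is why the style set is fixed at training time.
    import torch
    import torch.nn as nn

    class ConditionalInstanceNorm(nn.Module):
        def __init__(self, num_channels, num_styles):
            super().__init__()
            self.norm = nn.InstanceNorm2d(num_channels, affine=False)
            self.gamma = nn.Parameter(torch.ones(num_styles, num_channels))
            self.beta = nn.Parameter(torch.zeros(num_styles, num_channels))

        def forward(self, x, style_id):
            # Normalize each channel of each sample, then scale/shift with the
            # parameters belonging to the requested style.
            x = self.norm(x)
            g = self.gamma[style_id].view(1, -1, 1, 1)
            b = self.beta[style_id].view(1, -1, 1, 1)
            return x * g + b
    ```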

    This paper is 'universal' in the sense of having the flexibility to generalize to arbitrary styles while also being fast. In fact, there is no explicit style transfer objective and style images aren't even needed for training. The architecture is simply an autoencoder trained to decode VGG features to reconstruct the image that's fed in.
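
    Training is plain image reconstruction, roughly like this (a sketch with a stand-in decoder; the paper actually uses VGG-19 and trains one decoder per encoder layer, with a pixel plus feature reconstruction loss):

    ```python
    # Sketch of the reconstruction-only training: a decoder learns to invert
    # fixed VGG features back to the image. No style images are involved.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import models

    encoder = models.vgg16(pretrained=True).features[:16].eval()  # up to relu3_3
    for p in encoder.parameters():
        p.requires_grad_(False)

    decoder = nn.Sequential(  # roughly mirrors the encoder; architecture is a stand-in
        nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='nearest'),
        nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='nearest'),
        nn.Conv2d(64, 3, 3, padding=1),
    )

    def reconstruction_loss(images):
        # images: (N, 3, H, W) with H, W divisible by 4.
        feats = encoder(images)
        recon = decoder(feats)
        # Pixel loss plus feature loss on the reconstruction.
        return F.mse_loss(recon, images) + F.mse_loss(encoder(recon), feats)
    ```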

    The style transfer magic is applied at test time by combining the content/style image features using a Whitening-Coloring Transform https://goo.gl/9eUnUW. This relies on the insight that style information is represented by the feature statistics, and so transferring the mean/covariance of the style features to the content with WCT is sufficient to preserve the structure of the content while applying style. The stylized result is then obtained by forwarding the transformed features through the decoder. The results still aren't as good as Gatys, but in the realm of fast approaches it feels like the conceptually cleanest.
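
    On a single layer's feature map, the transform looks roughly like this (a torch sketch of the standard whitening-coloring formulation, not the authors' exact code; I'm leaving out the content/style blending weight from the paper):

    ```python
    # Sketch of the whitening-coloring transform (WCT): whiten the content
    # features so their covariance becomes the identity, then color them with
    # the style features' covariance and mean.
    import torch

    def wct(content_feat, style_feat, eps=1e-5):
        # content_feat, style_feat: (C, H, W) feature maps from the same VGG layer.
        c, h, w = content_feat.shape
        fc = content_feat.reshape(c, -1)
        fs = style_feat.reshape(c, -1)

        # Whitening: remove the content mean, rotate/scale to unit covariance.
        mc = fc.mean(dim=1, keepdim=True)
        fc = fc - mc
        cov_c = fc @ fc.t() / (fc.shape[1] - 1) + eps * torch.eye(c)
        ec, vc = torch.linalg.eigh(cov_c)
        whiten = vc @ torch.diag(ec.clamp(min=eps).rsqrt()) @ vc.t()
        fc_white = whiten @ fc

        # Coloring: impose the style covariance, then add the style mean.
        ms = fs.mean(dim=1, keepdim=True)
        fs = fs - ms
        cov_s = fs @ fs.t() / (fs.shape[1] - 1) + eps * torch.eye(c)
        es, vs = torch.linalg.eigh(cov_s)
        color = vs @ torch.diag(es.clamp(min=eps).sqrt()) @ vs.t()
        fcs = color @ fc_white + ms
        return fcs.reshape(c, h, w)
    ```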

    And to plug my own work: I've implemented this paper in TensorFlow https://github.com/eridgd/WCT-TF

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    I didn't really understand the difference between the old method and the new one. Could you elaborate a bit on this in the comments?

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    What about the hypothesis that the essence of idealisation in the brain is merely (usually) the result of this "bottlenecking" (or just heavy data compression achieved by some means)?

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    wait, let me guess, throw random ANN at wall, see what sticks, publish paper. hey, i have an idea! just train an ANN to write papers about ANNs and have an ANN summarize those papers and create youtube video of them.

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    "Universal Neural Style Transfer " Oh cmon. How does that now mean transferring your consciousness into another mind?!?!

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    2:28 It is not clear at all how the two inputs are combined. How does this "Feature Transforms" step work?

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    Am I the only one who thinks that these are worse than the previous versions it was compared against? Or is the main improvement in speed rather than quality?

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    What happens if you connect it up the other way around, so the style transfers in the other direction?

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    Why not use this to interview an AI, just like you would a person? We can't tell what an AI is thinking because the tensors involved are too complex, but we can interview an AI and then start to map out the factors it uses to arrive at a given conclusion.

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    Amazing episode!!! Thank you so much, and I have no idea whether you read my comment on the last episode, but this one really had the perfect amount of explanation! Thank you Karoly from a Viennese Fellow Scholar/Data Scientist 😉 I only miss how the two bottleneck representations are combined, but -> paper reading it is 🙂

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    That masking technique is an interesting option. If you could have another neural network segment the source image semantically, so that the style input for any given area was also semantically linked, the results would be amazing. E.g. if the input image was a pencil sketch and the style inputs were photographs, the output would be a photorealistic interpretation of the pencil sketch. Add a third network for tweening between sketches and you would have a storyboard-to-movie system. I think this is where AI will really take off: entire pipelines or webs of networks working together.

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    The results don't seem to look as good as jcjohnson's Neural-Style. This seems to be more towards "Fast" style transfer, which produces lower quality, but faster outputs. Reminds me a lot of AdaIN, Style-swap, and Fast-Neural-Style. These sorts of style transfer networks seem best suited for devices like phones, and those without access to high end GPUs, but they still can't compete with the original Neural-Style.

    There are already a large number of ways to control the outputs produced by Neural-Style, and I've tried to list them all here: https://github.com/jcjohnson/neural-style/wiki/Scripts. You can transfer styles between different regions, using the style feature mean in addition to the gram matrix, gram matrix delta manipulation, layer channel manipulation, luminance transfer, histogram matching, photorealism, simultaneous DeepDream and style transfer, endless zoom, multiscale resolution, tiling, etc…

    So I don't think it's accurate to say that you couldn't tune the output artistically to your liking in previous style transfer algorithms. It's more accurate to say that you couldn't really tune "Fast" style transfer outputs to your liking as easily in previous "Fast" style transfer algorithms.

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    Waiting for the day style transfer is used as a post effect for a game. Maybe a game where you can jump around different artworks. While there's a lot of stuff you can do with post shaders, there's got to be a bunch of amazing effects and animations that can only be done with style transfers.

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    deepdreamgenerator.com – not sure if it uses the same principle, but the output is really good

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    Even if I see the interest of this work, depending on the style I still prefer (qualitatively) the other methods (:

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    This is very impressive 😮
    And thx for the detailed explanation.

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    There are apps that are performing real-time style transfer on phone already. Check out envision and dreamsnap! Both run on the GPU of iPhones using a framework called Bender

    Reply
  • December 13, 2017 at 10:36 pm
    Permalink

    How are these style things being considered "research" these days? These need to stop being mixed in with actual research and have their own fun blog or something.

    Reply
