Khue

Deep Residual Networks re-implemented by Facebook and CornellTech

The post was co-authored by Sam Gross from Facebook AI Research and Michael Wilber from CornellTech.

In this blog post we implement Deep Residual Networks (ResNets) and investigate them from a model-selection and optimization perspective. We also discuss multi-GPU optimizations and engineering best practices for training ResNets. Finally, we compare ResNets to GoogLeNet and VGG networks.

We release training code on GitHub, as well as pre-trained models for download with instructions for fine-tuning on your own datasets.

Our released pre-trained models have a higher accuracy than the models in the original paper.
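The released code and pre-trained models are for Torch (Lua), so the exact fine-tuning steps are in the repository's instructions. Purely as a rough illustration of the general recipe (load pre-trained weights, swap the classifier head for your dataset's classes, optionally freeze the backbone, then train), here is a hedged sketch using PyTorch/torchvision pre-trained weights; this is an assumption for illustration only, not the released models, and the class count and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet pre-trained on ImageNet (torchvision weights, not the released Torch models).
model = models.resnet18(pretrained=True)

# Replace the final classifier with one sized for a hypothetical 10-class dataset.
num_classes = 10  # placeholder
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optionally freeze the convolutional backbone and train only the new head.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.01, momentum=0.9,  # placeholder hyperparameters
)
criterion = nn.CrossEntropyLoss()
# ...then run a standard training loop over your own dataset.
```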

At the end of last year, Microsoft Research Asia released a paper titled "Deep Residual Learning for Image Recognition", authored by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. The paper achieved state-of-the-art results in image classification and detection, winning the ImageNet and COCO competitions.

The central idea of the paper itself is simple and elegant. They take a standard feed-forward ConvNet and add skip connections that bypass (or shortcut) a few convolution layers at a time. Each bypass gives rise to a residual block in which the convolution layers predict a residual that is added to the block's input tensor.

An example residual block is shown in the figure below.

[Figure: resnets_1.png — an example residual block]

Deep feed-forward conv nets tend to suffer from optimization difficulty. Beyond a certain depth, adding extra layers results in higher training error and higher validation error, even when batch normalization is used. The authors of the ResNet paper argue that this underfitting is unlikely to be caused by vanishing gradients, since this difficulty occurs even with batch normalized networks. The residual network architecture solves this by adding shortcut connections that are summed with the output of the convolution layers.
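To make the block structure concrete, here is a minimal sketch of a two-convolution residual block, written in PyTorch purely for illustration (the released implementation is Torch/Lua): the convolutions and batch normalization compute a residual F(x), which is then summed with the block's input through the shortcut connection.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic two-layer residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                 # shortcut (skip) connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))              # F(x): the predicted residual
        out = out + identity                         # sum the residual with the block's input
        return self.relu(out)

# Quick shape check with a dummy input (batch of one 64-channel 32x32 feature map).
block = ResidualBlock(64)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32]) -- channels and spatial size preserved
```

This sketch assumes the shortcut and the residual have matching shapes; when the number of channels or the spatial resolution changes between blocks, the paper uses a projection (1x1 convolution) or strided shortcut instead of the identity.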

This post gives some data points for anyone trying to understand residual networks in more detail from an optimization perspective. It also investigates the contribution of certain design decisions to the effectiveness of the resulting networks.

After the paper was published on arXiv, both of us who authored this post independently started investigating and reproducing its results. After learning about each other's efforts, we decided to write a single post combining our experiences.

Read more on the Torch blog...

The authors of the paper also released their original (Caffe) pre-trained models a few days ago: https://github.com/KaimingHe/deep-residual-networks.

We may have an opportunity to implement ResNets in the final assignment of CS231n. Andrej mentioned this in Lecture 6 or 7.

I don't have a GPU on my machine. It seems impossible for me to train larger models.

Just now, Wei Xue said:

I don't have GPU on my machine. It seems impossible for me to train larger models.

If they include ResNets in the assignment, they will probably make sure it's feasible on regular PCs. I guess most of the students don't have a GPU either :D

Does ResNet even run on a GTX 980 with 4 GB of RAM, or does it need a Titan X with 12 GB?

AlexNet needs 3 to 4 GB to run properly, and that is just 8 layers!

Do you know of any implications for the hardware requirements of this model?

Thanks in advance

There are cloud options like AWS, but you have to pay monthly or based on how many resources you use...
