In this project you will look at a general representation-learning method called Big Transfer (BiT), which was found to be particularly good at specialized downstream tasks with few labels.

(from the abstract) Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets.

Tasks / Rubric (100 points)
Write a 2-page (excluding pictures) summary of what transfer learning is (10 points)
Write a 2-page (excluding pictures) summary of what architectural improvements, if any, the authors have adopted on top of the ResNet-xyz v1 architecture, and why (20 points)
Write a 4-page (excluding figures) summary of the key points a non-expert needs in order to understand the differences between Group Normalization (GN), Weight Standardization (WS), and Batch Normalization (BN); see the first sketch after this task list. The top 3 comparisons will make it into my class notes, with attribution. (30 points)
Write a 2-page (excluding figures) summary of MixUp regularization and how it may help; see the second sketch after this task list. (30 points)
Explain all models that are used in producing the performance results (e.g., RetinaNet) (10 points)
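To help you get started on the GN/WS/BN comparison, here is a minimal sketch, assuming PyTorch; the tensor shapes, the number of groups, and the WSConv2d helper are illustrative assumptions, not code from the paper. It only shows where each technique computes its statistics, which is the core difference your summary should explain.

    # Illustrative sketch (PyTorch): where BN, GN, and WS compute their statistics.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(8, 32, 56, 56)  # (batch, channels, height, width)

    # Batch Normalization: per-channel statistics computed ACROSS the batch,
    # so its behaviour depends on the (per-device) batch size.
    bn = nn.BatchNorm2d(num_features=32)

    # Group Normalization: per-sample statistics over groups of channels,
    # so it is independent of the batch size.
    gn = nn.GroupNorm(num_groups=8, num_channels=32)

    print(bn(x).shape, gn(x).shape)  # both keep the input shape

    # Weight Standardization: standardizes the convolution WEIGHTS before use,
    # rather than normalizing the activations (hypothetical helper class).
    class WSConv2d(nn.Conv2d):
        def forward(self, x):
            w = self.weight
            w = (w - w.mean(dim=(1, 2, 3), keepdim=True)) / (w.std(dim=(1, 2, 3), keepdim=True) + 1e-5)
            return F.conv2d(x, w, self.bias, self.stride, self.padding, self.dilation, self.groups)

    conv = WSConv2d(32, 64, kernel_size=3, padding=1)
    print(conv(x).shape)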
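For the MixUp item, the following is a hedged sketch of the general idea (Zhang et al.'s MixUp), again assuming PyTorch; the function name, the alpha value, and the one-hot label format are assumptions for illustration, not the paper's exact implementation. Each training example is blended with a randomly paired example from the same batch, and the labels are blended with the same coefficient.

    # Illustrative sketch of MixUp regularization (not the paper's code).
    import torch

    def mixup(images, labels_onehot, alpha=0.1):
        """Blend each example with a randomly paired example from the same batch."""
        lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing coefficient in [0, 1]
        perm = torch.randperm(images.size(0))                  # random pairing within the batch
        mixed_images = lam * images + (1 - lam) * images[perm]
        mixed_labels = lam * labels_onehot + (1 - lam) * labels_onehot[perm]
        return mixed_images, mixed_labels

The model is then trained on the mixed images and mixed (soft) labels with the usual cross-entropy loss; your summary should discuss why this can help, e.g., by discouraging over-confident predictions when fine-tuning on small datasets.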
NOTE:

5 points will be subtracted if you don't issue a pull request of your work to the GitHub repository with a branch name that equals your UCID.
After your submission we may call you and ask you to explain what you wrote. Copying text and images to document your arguments is OK, but make sure you understand them. Up to 100% of the corresponding points will be subtracted if we find that you don't understand what you put in the report.