I was planning on doing a blog post about some cool deep learning papers that I have read in the last year or so. However, I kept finding that someone else had already written a way better blog post than what I could write. Instead, I have decided to write a very brief summary of some hot ideas and then provide a link to some other page where someone describes them way better than me.
The Lottery Ticket Hypothesis
This idea has to do with pruning a model, which is when you remove parts of your model to make it more computationally efficient while barely losing accuracy. The lottery ticket hypothesis also has to do with how weights are initialized in neural networks and why larger models often achieve better performance.
Anyways, the hypothesis says the following: “Dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations.” In their analogy, the random initialization of a model's weights is treated like a lottery, where some subset of these weights is already pretty close to the network you want to train (the winning ticket). For a better description and a summary of advances in this field I would recommend this blog post.
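To make the idea a bit more concrete, here is a minimal numpy sketch of the procedure the original paper uses to find a winning ticket: train the network, keep only the largest-magnitude trained weights, and rewind the survivors back to their initial values. The function name and the toy "trained" weights are my own illustration, not code from the paper.

```python
import numpy as np

def find_winning_ticket(init_weights, trained_weights, sparsity=0.8):
    """Magnitude pruning with rewinding: keep the largest-magnitude
    trained weights, then reset the survivors to their INITIAL values."""
    threshold = np.quantile(np.abs(trained_weights), sparsity)
    mask = (np.abs(trained_weights) > threshold).astype(float)
    # The winning ticket: the sparse subnetwork at its original initialization
    return init_weights * mask, mask

rng = np.random.default_rng(0)
w_init = rng.normal(size=(4, 4))                          # the "lottery" draw
w_trained = w_init + rng.normal(scale=0.5, size=(4, 4))   # stand-in for real training

ticket, mask = find_winning_ticket(w_init, w_trained)
print(mask.mean())  # fraction of weights kept (here roughly 1 - sparsity)
```

In the real experiments this prune-and-rewind loop is repeated iteratively, and the claim is that retraining just the ticket recovers the full network's accuracy.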
SAM: Sharpness-Aware Minimization
The key idea here has to do with choosing an optimizer that trains models that generalize well. According to this paper, a model that has converged to a sharp minimum will be less likely to generalize than one that has converged to a flatter minimum. They show the following plot to provide an intuition of why this may be the case.
In the SAM paper (and ASAM for adaptive) the authors implement an optimizer that is more likely to converge to a flat minimum. I found that this blog post by the authors of ASAM gives a very good description of the field.
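The core of SAM is a two-step update: first climb to the worst-case point within a small ball around the current weights, then apply the gradient computed *there* to the original weights. A rough numpy sketch on a toy quadratic loss (my own simplification; the paper does this per mini-batch inside a normal optimizer like SGD):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware update.
    1) Ascend to the approximate worst-case point within an L2 ball of radius rho.
    2) Evaluate the gradient at that perturbed point.
    3) Apply that gradient to the ORIGINAL weights."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, norm rho
    g_sharp = grad_fn(w + eps)                   # gradient at the sharp point
    return w - lr * g_sharp

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is just w
grad = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, grad)
print(np.linalg.norm(w))  # shrinks toward the minimum at 0
```

The intuition is that if a small perturbation can raise the loss a lot (a sharp minimum), the perturbed gradient steers you away from it, biasing training toward flatter regions.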