A possible hypothesis supporting this behaviour comes from the implicit regularisation built in by gradient descent methods when solving the Empirical Risk Minimisation problem. Weight decay is a popular regularisation strategy that penalises the squared L2 norm of the neurons' weights.

In the context of shallow ReLU networks, the resulting training problem can be formulated in terms of a probability measure over the neuron parameters, leading to a sparse regularised convex program over a 'continuous' dictionary, where Θ ⊆ R^d is the space of neuron parameters and φ_θ(x) = max(0, ⟨θ, x⟩) is a single neuron. Classic results from convex geometry assert that such convex programs admit sparse solutions with at most n atoms, the so-called Representer theorem. In fact, recent work has shown that all solutions of this program are sparse, therefore resulting in only a finite number of neurons, irrespective of the amount of overparametrisation.

In other words, training shallow ReLU networks in the overparametrised regime amounts to solving a linear program in finite dimension, given by the hyperplane arrangement generated by the dataset when identifying a point x ∈ R^d with a hyperplane in the dual. Although this reduction is not computationally useful (since the size of the hyperplane arrangement is O(n^d)), studying properties of the dataset that allow for substantially smaller arrangements is a promising direction for future research.

References

George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
Pedro Savarese, Itay Evron, Daniel Soudry, and Nathan Srebro. How do infinite width bounded norm networks look in function space? In Conference on Learning Theory, pages 2667–2690, 2019.
Francis Williams, Matthew Trager, Daniele Panozzo, Claudio Silva, Denis Zorin, and Joan Bruna. Gradient dynamics of shallow univariate ReLU networks. 2019.
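As a concrete reading of the weight-decay penalty mentioned above, the sketch below writes the penalised empirical risk of a width-m shallow ReLU network in plain NumPy. The squared loss, the choice to penalise both layers, and all names are illustrative assumptions, not details taken from the text.

```python
import numpy as np

def shallow_relu(X, W, a):
    """Width-m shallow ReLU network f(x) = sum_j a_j * max(0, <w_j, x>)."""
    return np.maximum(X @ W.T, 0.0) @ a          # shape (n,)

def weight_decay_objective(X, y, W, a, lam):
    """Empirical risk (squared loss) plus the squared L2 norm of all weights."""
    erm = 0.5 * np.mean((shallow_relu(X, W, a) - y) ** 2)
    penalty = 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))
    return erm + penalty
```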
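The excerpt does not reproduce the measure-space program itself. A common way to write such a sparse regularised convex program over the continuous dictionary {φ_θ : θ ∈ Θ}, following the convex neural network literature, is sketched below; the loss ℓ, the regularisation weight λ, and the use of a total-variation penalty on a (signed) measure μ are assumptions of this sketch rather than statements from the text.

```latex
% Sparse regularised convex program over a continuous dictionary (sketch).
% \ell, \lambda and the TV norm are assumed, not taken from the excerpt.
\min_{\mu \in \mathcal{M}(\Theta)} \;
  \frac{1}{n} \sum_{i=1}^{n}
    \ell\!\left( y_i,\ \int_{\Theta} \varphi_\theta(x_i)\, \mathrm{d}\mu(\theta) \right)
  \;+\; \lambda\, \|\mu\|_{\mathrm{TV}}
```

A sparse solution of such a program is a finite sum of Dirac masses, μ = Σ_j a_j δ_{θ_j}, which corresponds exactly to a finite-width ReLU network, consistent with the at-most-n-atoms Representer statement above.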
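To make the hyperplane-arrangement reduction above concrete, the following sketch empirically counts the ReLU activation patterns realisable on a toy dataset; each distinct pattern corresponds to a cell of the dual arrangement, and the number of cells is what makes the finite-dimensional program of size O(n^d). The bias term, the random-sampling strategy, and all numerical choices are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 6, 2                         # tiny dataset so the arrangement stays small
X = rng.standard_normal((n, d))     # n points in R^d, in general position a.s.

patterns = set()
for _ in range(50_000):             # random neurons; cheap, but may miss thin cells
    w = rng.standard_normal(d)
    b = rng.standard_normal()
    patterns.add(tuple((X @ w + b > 0).astype(int)))

print(f"{len(patterns)} distinct activation patterns for n={n}, d={d}")
# For points in general position, Cover's counting argument bounds the number of
# patterns by 2 * sum_{k=0}^{d} C(n-1, k)  (= 32 for n=6, d=2), which grows like
# O(n^d) for fixed d -- the size of the hyperplane arrangement quoted above.
```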