## Department Colloquium

### Friday, February 12th, 2021

**Time:** 2:30 p.m. **Place:** Online (via Zoom)

**Speaker:** Yihong Wu (Yale University)

**Title:** Self-regularizing Property of Nonparametric Maximum Likelihood Estimator in Mixture Models

**Abstract:** Introduced by Kiefer and Wolfowitz (1956), the nonparametric maximum likelihood estimator (NPMLE) is a widely used methodology for learning mixture models and empirical Bayes estimation. Sidestepping the non-convexity in the mixture likelihood, the NPMLE estimates the mixing distribution by maximizing the total likelihood over the space of probability measures, which can be viewed as an extreme form of overparameterization. In this work we discover a surprising property of the NPMLE solution. Consider, for example, a Gaussian mixture model on the real line with a subgaussian mixing distribution. Leveraging complex-analytic techniques, we show that with high probability the NPMLE based on a sample of size n has $O(\log n)$ atoms (mass points), significantly improving the deterministic upper bound of n due to Lindsay (1983). Notably, any such Gaussian mixture is statistically indistinguishable from a finite one with $O(\log n)$ components (and this is tight for certain mixtures). Thus, absent any explicit form of model selection, the NPMLE automatically chooses the right model complexity, a property we term self-regularization. Statistical applications and extensions to other exponential families will be given. Time permitting, we will discuss some recent results on optimal regret in empirical Bayes and the role of the NPMLE. This is based on joint work with Yury Polyanskiy (MIT).
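To make the setting concrete, here is a minimal numerical sketch (not from the talk) of computing the NPMLE for a one-dimensional Gaussian location mixture. It discretizes the candidate mixing distribution on a fine grid and maximizes the total log-likelihood over the mixing weights via the standard EM-style fixed-point iteration; the grid size, sample size, and atom-detection threshold are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations from a 2-component Gaussian location mixture
# with unit noise variance: x_i = theta_i + z_i, theta_i in {-2, +2}.
n = 500
true_means = rng.choice([-2.0, 2.0], size=n)
x = true_means + rng.standard_normal(n)

# Discretize the mixing distribution on a fine grid of candidate atoms.
grid = np.linspace(x.min(), x.max(), 200)

# Likelihood matrix: L[i, j] = N(x_i; grid[j], 1).
L = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)

# EM fixed-point iteration for the mixing weights; the objective
# (total log-likelihood) is concave in w, so EM climbs toward the
# grid-restricted NPMLE.
w = np.full(grid.size, 1.0 / grid.size)
for _ in range(2000):
    post = L * w                          # unnormalized posteriors over atoms
    post /= post.sum(axis=1, keepdims=True)
    w = post.mean(axis=0)                 # updated mixing weights

# The fitted weights concentrate on a small number of grid points,
# far fewer than the n (or grid-size) atoms allowed a priori.
support = grid[w > 1e-3]
print(f"atoms found: {support.size} (n = {n}, log n ≈ {np.log(n):.1f})")
```

Even though the optimization is over all distributions on the grid, the solution is supported on only a handful of atoms; the talk's result explains this phenomenon, bounding the number of atoms by $O(\log n)$ with high probability.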

**Yihong Wu** is an Associate Professor in the Department of Statistics and Data Science at Yale University. He obtained his Ph.D. in Electrical Engineering from Princeton University in 2011. He was a Postdoctoral Fellow at the University of Pennsylvania (2011–2012) and an Assistant Professor at the University of Illinois at Urbana-Champaign (2013–2016). He is broadly interested in theoretical and algorithmic aspects of high-dimensional statistics, information theory, and optimization. He has received many awards, including the Sloan Research Fellowship in Mathematics in 2018, the NSF CAREER award in 2017, the Simons-Berkeley Research Fellowship in 2015, and the Marconi Society Paul Baran Young Scholar Award in 2011.