# Huizhen (Janey) Yu (University of Alberta)

Date

Friday November 4, 2022, 2:30 pm - 3:30 pm

Location

Jeffery Hall, Room 234

## Math & Stats Department Colloquium

**Friday, November 4th, 2022**

**Time:** 2:30 p.m. **Place:** Jeffery Hall, Room 234

**Speaker:** Huizhen (Janey) Yu (University of Alberta)

**Title:** Average-Cost Markov Decision Processes with Borel Spaces and Universally Measurable Policies

**Abstract:** In this talk, I will present results for discrete-time Markov decision processes (MDPs) with infinite state and action spaces, specifically Borel-space MDPs, under the long-run average-cost criterion. While these MDPs have been extensively studied in the cases of discounted and total cost, the average-cost case is harder to analyze and still not fully understood. In formulating my results, I have adopted a general mathematical framework that, unlike most prior work on average cost, requires neither continuity of the state transition and one-stage cost functions nor compactness of the admissible action sets for each state. I will begin by introducing Borel-space MDPs and reviewing past results on average-cost optimality, which depend on these assumptions, and then devote the remainder of the talk to two recent optimality results in the more general setting. The first result establishes the average-cost optimality inequality (ACOI) for two classes of MDPs, one with nonnegative one-stage costs and the other with a Lyapunov-type stability property and unbounded one-stage costs. The ACOI is the inequality counterpart of the standard average-cost optimality equation (ACOE) and implies that the optimal average-cost function is constant and that there exist stationary, universally measurable, $\epsilon$-optimal policies. In deriving this result, I use the vanishing discount factor approach and a set of new conditions to handle discontinuous system dynamics and one-stage cost functions; these conditions are motivated by Egoroff's theorem and a relationship between the epi-limits and pointwise limits of sequences of functions. The second result concerns the more general case where the ACOI may not hold, the optimal average cost may depend on the initial state, and stationary $\epsilon$-optimal policies may not exist. Using submartingale arguments, I show that, for a given set of reachability and boundedness conditions, the optimal average-cost function is constant almost everywhere with respect to certain $\sigma$-finite measures. This result provides a characterization of the structure of the optimal average-cost functions for the class of Borel-space multichain MDPs satisfying these conditions.

Huizhen (Janey) Yu is a research associate at the Reinforcement Learning and Artificial Intelligence Group (RLAI) in the Department of Computing Science, University of Alberta. She received her Ph.D. from the Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology. Her research interests include computational methods for solving MDPs based on reinforcement learning and stochastic approximation, as well as theoretical properties of MDPs with general state and action spaces. She has served as an associate editor for several journals and is a recipient of this year's Top Reviewer Award from the journal Operations Research Letters.