References

[ADX10]

Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Proceedings of the 23rd Conference on Learning Theory (COLT), 28–40. 2010.

[CBL06]

Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[CesaBianchiGLS12]

Nicolò Cesa-Bianchi, Pierre Gaillard, Gábor Lugosi, and Gilles Stoltz. Mirror descent meets fixed share (and feels no regret). In Advances in Neural Information Processing Systems 25 (NIPS), 989–997. 2012.

[CesaBianchiMS07]

Nicolò Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2-3):321–352, 2007.

[CLW21a]

Liyu Chen, Haipeng Luo, and Chen-Yu Wei. Minimax regret for stochastic shortest path with adversarial costs and known transition. In Proceedings of the 34th Conference on Learning Theory (COLT), 1180–1215. 2021.

[CLW21b]

Liyu Chen, Haipeng Luo, and Chen-Yu Wei. Impossible tuning made possible: A new expert algorithm and its applications. In Proceedings of the 34th Conference on Learning Theory (COLT), 1216–1259. 2021.

[CYL+12]

Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Conference on Learning Theory (COLT), 6.1–6.20. 2012.

[DGSS15]

Amit Daniely, Alon Gonen, and Shai Shalev-Shwartz. Strongly adaptive online learning. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 1405–1411. 2015.

[FKM05]

Abraham Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 385–394. 2005.

[GyorgyLL12]

András György, Tamás Linder, and Gábor Lugosi. Efficient tracking of large classes of experts. IEEE Transactions on Information Theory, 58(11):6709–6725, 2012.

[Haz16]

Elad Hazan. Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.

[HS07]

Elad Hazan and C. Seshadhri. Adaptive algorithms for online decision problems. Electronic Colloquium on Computational Complexity (ECCC), 2007.

[HS09]

Elad Hazan and C. Seshadhri. Efficient learning algorithms for changing environments. In Proceedings of the 26th International Conference on Machine Learning (ICML), 393–400. 2009.

[JRSS15]

Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, and Karthik Sridharan. Online optimization: competing with dynamic comparators. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 398–406. 2015.

[JOWW17]

Kwang-Sung Jun, Francesco Orabona, Stephen Wright, and Rebecca Willett. Improved strongly adaptive online learning using coin betting. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 943–951. 2017.

[LW94]

Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.

[LS15]

Haipeng Luo and Robert E. Schapire. Achieving all with no parameters: AdaNormalHedge. In Proceedings of the 28th Conference on Learning Theory (COLT), 1286–1304. 2015.

[OPal18]

Francesco Orabona and Dávid Pál. Scale-free online learning. Theoretical Computer Science, 716:50–69, 2018.

[RS13]

Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Conference on Learning Theory (COLT), 993–1019. 2013.

[RM21]

Aviv Rosenberg and Yishay Mansour. Stochastic shortest path with adversarially changing costs. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), 2936–2942. 2021.

[SB18]

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018.

[WZZ18]

Guanghui Wang, Dakuan Zhao, and Lijun Zhang. Minimizing adaptive regret with one gradient per iteration. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), 2762–2768. 2018.

[ZLZ19]

Lijun Zhang, Tie-Yan Liu, and Zhi-Hua Zhou. Adaptive regret of convex and smooth functions. In Proceedings of the 36th International Conference on Machine Learning (ICML), 7414–7423. 2019.

[ZLZ18]

Lijun Zhang, Shiyin Lu, and Zhi-Hua Zhou. Adaptive online learning in dynamic environments. In Advances in Neural Information Processing Systems 31 (NeurIPS), 1330–1340. 2018.

[ZWTZ21]

Lijun Zhang, Guanghui Wang, Wei-Wei Tu, and Zhi-Hua Zhou. Dual adaptivity: a universal algorithm for minimizing the adaptive regret of convex functions. In Advances in Neural Information Processing Systems 34 (NeurIPS), 24968–24980. 2021.

[ZZZ20]

Yu-Jie Zhang, Peng Zhao, and Zhi-Hua Zhou. A simple online algorithm for competing with dynamic comparators. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), 390–399. 2020.

[Zha21]

Peng Zhao. Online Ensemble Theories and Methods for Robust Online Learning. PhD thesis, Nanjing University, Nanjing, China, 2021. Advisor: Zhi-Hua Zhou.

[ZLZ22]

Peng Zhao, Long-Fei Li, and Zhi-Hua Zhou. Dynamic regret of online Markov decision processes. In Proceedings of the 39th International Conference on Machine Learning (ICML), to appear. 2022.

[ZWZZ21]

Peng Zhao, Guanghui Wang, Lijun Zhang, and Zhi-Hua Zhou. Bandit convex optimization in non-stationary environments. Journal of Machine Learning Research, 22(125):1–45, 2021.

[ZWZ22]

Peng Zhao, Yu-Xiang Wang, and Zhi-Hua Zhou. Non-stationary online learning with memory and non-stochastic control. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2101–2133. 2022.

[ZZZZ20]

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Dynamic regret of convex and smooth functions. In Advances in Neural Information Processing Systems 33 (NeurIPS), 12510–12520. 2020.

[ZZZZ21]

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Adaptivity and non-stationarity: problem-dependent dynamic regret for online convex optimization. arXiv preprint, 2021.

[Zho12]

Zhi-Hua Zhou. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC Press, 2012.

[ZN13]

Alexander Zimin and Gergely Neu. Online learning in episodic Markovian decision processes by relative entropy policy search. In Advances in Neural Information Processing Systems 26 (NIPS), 1583–1591. 2013.

[Zin03]

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), 928–936. 2003.