Publications
(You can find my publications on my Google Scholar profile.)
Adapting to game trees in zero-sum imperfect information games Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet and Michal Valko, ICML 2023.
Fast Rates for Maximum Entropy Exploration Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko and Pierre Ménard, ICML 2023.
Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Mark Rowland, Michal Valko and Pierre Ménard, NeurIPS 2022.
Learning Generative Models with Goal-conditioned Reinforcement Learning Mariana Vargas Vieyra and Pierre Ménard, EWRL 2022.
From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses Daniil Tiapkin, Denis Belomestny, Eric Moulines, Rémi Munos, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko and Pierre Ménard, ICML 2022.
Adaptive Multi-Goal Exploration Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko and Alessandro Lazaric, AISTATS 2022.
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall Tadashi Kozuno, Pierre Ménard, Rémi Munos and Michal Valko, NeurIPS 2021.
Bandits with many optimal arms Rianne de Heide, James Cheshire, Pierre Ménard and Alexandra Carpentier, NeurIPS 2021.
Indexed Minimum Empirical Divergence for Unimodal Bandits Hassan Saber, Pierre Ménard and Odalric-Ambrym Maillard, NeurIPS 2021.
UCB Momentum Q-learning: Correcting the bias without forgetting Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang and Michal Valko, ICML 2021.
Problem Dependent View on Structured Thresholding Bandit Problems James Cheshire, Pierre Ménard and Alexandra Carpentier, ICML 2021.
Fast active learning for pure exploration in reinforcement learning Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent and Michal Valko, ICML 2021.
Regret bounds for kernel-based reinforcement learning Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann and Michal Valko, ICML 2021.
A kernel-based approach to non-stationary reinforcement learning in metric spaces Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann and Michal Valko, AISTATS 2021.
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann and Michal Valko, ALT 2021.
Adaptive reward-free exploration Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent and Michal Valko, ALT 2021.
Optimal Strategies for Graph-Structured Bandits Hassan Saber, Pierre Ménard and Odalric-Ambrym Maillard, preprint 2020.
Planning in Markov decision processes with gap-dependent sample complexity Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent and Michal Valko, NeurIPS 2020.
The Influence of Shape Constraints on the Thresholding Bandit Problem James Cheshire, Pierre Ménard and Alexandra Carpentier, COLT 2020.
A single algorithm for both restless and rested rotting bandits Julien Seznec, Pierre Ménard, Alessandro Lazaric and Michal Valko, AISTATS 2020.
Fixed-confidence guarantees for Bayesian best-arm identification Xuedong Shang, Rianne de Heide, Pierre Ménard, Emilie Kaufmann and Michal Valko, AISTATS 2020.
Gamification of pure exploration for linear bandits Rémy Degenne, Pierre Ménard, Xuedong Shang and Michal Valko, ICML 2020.
Thresholding Bandit for Dose-ranging: The Impact of Monotonicity Laurent Rossi, Pierre Ménard and Aurélien Garivier, ICMA 2020.
Fano’s inequality for random variables Sébastien Gerchinovitz, Pierre Ménard and Gilles Stoltz, Statistical Science 2020.
Non-asymptotic pure exploration by solving games Rémy Degenne, Wouter Koolen and Pierre Ménard, NeurIPS 2019.
Planning in entropy-regularized Markov decision processes and games Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos and Michal Valko, NeurIPS 2019.
Gradient Ascent for Active Exploration in Bandit Problems Pierre Ménard, preprint 2019.
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints Aurélien Garivier, Hédi Hadiji, Pierre Ménard and Gilles Stoltz, JMLR 2022.
Explore first, exploit next: the true shape of regret in bandit problems Aurélien Garivier, Pierre Ménard and Gilles Stoltz, MOR 2018.
A minimax and asymptotically optimal algorithm for stochastic bandits Pierre Ménard and Aurélien Garivier, ALT 2017.