The 2nd Bayes-Duality Workshop 2024

Bayes-Duality 2024 Schedule

June 12

Morning Session (chair: Emtiyaz Khan)

[10:00-10:10] Masashi Sugiyama: Introduction to AIP
[10:10-10:30]* Emtiyaz Khan: Logistics
*Short Coffee Break during the talk
[10:30-11:30] Vincent Fortuin: Use Cases for Bayesian Deep Learning in the Age of Foundation Models
Abstract: Many researchers have pondered the same existential questions in this day and age: Is scale really all you need? Will the future of machine learning rely exclusively on foundation models? Should we all drop our current research agenda and work on the next large language model instead? In this talk, I will try to make the case that the answer to all these questions should be a confident “no” and that now, perhaps more than ever, is the time to focus on fundamental questions in machine learning again. I will support this by presenting three modern use cases of Bayesian deep learning: interpretable additive modeling, neural network sparsification, and subspace inference for fine-tuning. Together, these will show that the research field of Bayesian deep learning is very much alive and thriving and that its potential for valuable real-world impact is only just unfolding.

Bio: Vincent Fortuin is a tenure-track research group leader at Helmholtz AI in Munich, leading the group for Efficient Learning and Probabilistic Inference for Science (ELPIS). He is also junior faculty at the Technical University of Munich, a Fellow of the Konrad Zuse School for Reliable AI, affiliated with the Munich Center for Machine Learning, and a Branco Weiss Fellow. His research focuses on reliable and data-efficient AI approaches leveraging Bayesian deep learning, deep generative modeling, meta-learning, and PAC-Bayesian theory. Previously, he did his PhD in Machine Learning at ETH Zürich and was a Research Fellow at the University of Cambridge. He is a member and unit faculty of ELLIS, a regular reviewer and area chair for all major machine learning conferences, and a co-organizer of the Symposium on Advances in Approximate Bayesian Inference (AABI) and the ICBINB initiative.
[11:30-12:30] Juho Lee: Toward scalable and generalizable Bayesian deep learning
Abstract: In the era of large-scale foundation models, Bayesian deep learning remains an indispensable tool due to its capability to quantify uncertainty in a principled manner and adapt continuously to dynamic environments. However, the well-known challenges of Bayesian learning, namely difficult posterior inference, the high cost of Bayesian model averaging (BMA), and the selection of appropriate prior distributions, are even more pronounced in modern AI models. In this talk, I will present our recent work addressing these challenges. First, for more efficient posterior inference in Bayesian neural networks (BNNs), I will discuss how meta-learning can enhance the mixing of stochastic gradient MCMC (SGMCMC) algorithms for various BNNs. Second, I will introduce our newly developed algorithm that reduces the cost of BMA using diffusion-based distribution matching techniques. Finally, I will present our work on meta-learning stochastic processes, which can serve as priors for a range of downstream tasks.

Bio: Dr. Juho Lee is an associate professor at the Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST). He earned his Ph.D. in Computer Science & Engineering from Pohang University of Science & Technology (POSTECH) and did his postdoc in the Computational Statistics & Machine Learning group at the University of Oxford, working with Professor François Caron. His research primarily focuses on Bayesian deep learning, with significant contributions to Bayesian nonparametrics, meta-learning, and generative modeling.
[12:30-14:00] Lunch Break (On Your Own)

Afternoon Session (chair: Vincent Fortuin)

[14:00-15:00] Tutorial by Eugene Ndiaye: Conversation on Conformal Prediction
Abstract: If you predict the label y of a new object as y_pred, how confident are you that "y = y_pred"? Conformal prediction provides an elegant framework for answering such a question: it builds a confidence set for the unobserved response of a new feature vector from previously observed feature-response pairs. This requires no assumptions about the data distribution beyond exchangeability of the observations. In this presentation, I will discuss this approach and assess its validity, strengths, and limitations. Last but not least, is there a "hidden" or "implicit" link with the classical Bayesian approach?

Bio: I am currently a researcher in the Machine Learning Group @Apple in Paris. I focus mainly on optimization and uncertainty quantification. I was previously a postdoctoral researcher at both the Georgia Institute of Technology (USA) and RIKEN AIP (Japan). I hold a PhD in Applied Mathematics from Université Paris-Saclay (France). My doctoral thesis focused on the design and analysis of faster and safer optimization algorithms for variable selection and hyperparameter calibration in high dimensions.
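
As a minimal illustration of the split variant of conformal prediction (a sketch, not material from the tutorial), the function below assumes an already-fitted point predictor `predict`, a held-out calibration set, and absolute residuals as the nonconformity score; all names are hypothetical. Its coverage guarantee of at least 1 - alpha holds in finite samples provided the calibration and test points are exchangeable.

```python
import math


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Split conformal prediction interval for a scalar regression target.

    predict: any fitted point predictor (hypothetical; trained on separate data).
    calib_x, calib_y: held-out calibration pairs, not used for training.
    Returns (lower, upper); under exchangeability, the true response of x_new
    lies in this interval with probability at least 1 - alpha.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Finite-sample corrected rank of the quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    y_hat = predict(x_new)
    return (y_hat - q, y_hat + q)


if __name__ == "__main__":
    # Toy check: predictor y = 2x, calibration residuals of size 1, 2, ..., 10.
    xs = list(range(10))
    ys = [2.0 * x + (x + 1) * (-1) ** x for x in xs]
    print(split_conformal_interval(lambda x: 2.0 * x, xs, ys, 3.0, alpha=0.2))  # (-3.0, 15.0)
```

The split variant trades some statistical efficiency (part of the data is reserved for calibration) for requiring only one model fit; full conformal prediction avoids the split at a much higher computational cost.
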
[15:00-16:00] Eugene Ndiaye: From Conformal Predictions to Confidence Regions
Abstract: Conformal prediction methods have significantly advanced uncertainty quantification for predictive models. Yet constructing confidence regions for model parameters remains a notable challenge, often requiring stringent assumptions on the data distribution or providing only asymptotic guarantees. We introduce a novel approach, termed CCR, which combines conformal prediction intervals for the model outputs to establish confidence regions for the model parameters. We provide coverage guarantees that hold under minimal assumptions on the noise and are valid in the finite-sample regime. Our approach applies to both split conformal prediction and black-box methods, including full and cross-conformal approaches. For linear models, the resulting confidence region is the feasible set of a Mixed-Integer Linear Program (MILP), from which confidence intervals for individual parameters can be derived and which enables robust optimization. We empirically compare CCR to recent methods in challenging settings, such as heteroskedastic and non-Gaussian noise.
[16:00-16:30] Coffee Break
[16:30-17:30] Alexander Immer: Advances in Bayesian Model Selection for Deep Learning
Abstract: Choosing optimal hyperparameters for deep learning can be highly expensive due to trial-and-error procedures and required expertise. Bayesian model selection can help to overcome such issues using gradient-based optimization and does not require a held-out validation set. However, it requires estimation and differentiation of the marginal likelihood, which is inherently intractable. In my talk, I will first discuss how scalable Laplace approximations enable Bayesian model selection for advanced applications. Further, I will demonstrate how to derive faster approximations using lower bounds and dualities.

Bio: I am a final-year PhD student at ETH Zürich, the Max Planck Institute for Intelligent Systems, and Google Research. I work on probabilistic deep learning with a focus on Bayesian model selection and scientific applications.
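
The talk hinges on estimating the marginal likelihood with a Laplace approximation. As a hedged toy sketch (not the speaker's method or code), consider a hypothetical one-parameter Bayesian linear regression with a Gaussian prior and known noise variance: the joint density is Gaussian in the weight, so the Laplace approximation is exact here and can be checked against the closed-form evidence. In deep networks the Hessian itself must be approximated, which is where scalable Laplace approximations become necessary.

```python
import math


def log_normal_pdf(x, mean, var):
    """log N(x | mean, var) for scalars."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def laplace_log_evidence(xs, ys, noise_var, prior_var):
    """Laplace approximation to log p(D) for y_i = w * x_i + noise, w ~ N(0, prior_var)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Hessian of the negative log joint at the mode (constant for this model).
    h = sxx / noise_var + 1.0 / prior_var
    w_map = (sxy / noise_var) / h
    log_joint = sum(log_normal_pdf(y, w_map * x, noise_var) for x, y in zip(xs, ys))
    log_joint += log_normal_pdf(w_map, 0.0, prior_var)
    # log p(D) ~= log p(D, w*) + (d/2) log(2*pi) - (1/2) log det H, with d = 1 here.
    return log_joint + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(h)


def exact_log_evidence(xs, ys, noise_var, prior_var):
    """Closed form: y ~ N(0, noise_var * I + prior_var * x x^T)."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    # Matrix determinant lemma and Sherman-Morrison for the rank-one covariance term.
    logdet = n * math.log(noise_var) + math.log1p(prior_var * sxx / noise_var)
    quad = (syy - prior_var * sxy ** 2 / (noise_var + prior_var * sxx)) / noise_var
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


if __name__ == "__main__":
    xs, ys = [0.5, 1.0, 1.5, 2.0], [1.1, 1.9, 3.2, 3.9]  # toy data
    # Evidence-based choice of the prior variance, without a validation split.
    grid = [0.01, 0.1, 1.0, 10.0, 100.0]
    best = max(grid, key=lambda v: laplace_log_evidence(xs, ys, 0.1, v))
    print(best, laplace_log_evidence(xs, ys, 0.1, best), exact_log_evidence(xs, ys, 0.1, best))
```

The grid search over `prior_var` is only a stand-in for the gradient-based optimization of the marginal likelihood described in the abstract; the grid values are arbitrary.
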

June 13

Morning Session (chair: Alexander Immer)

[10:00-11:00] Haavard Rue: Correcting Approximations using Variational Bayes
Abstract: I will discuss how to use variational Bayes (VB) to correct approximations, rather than using VB to obtain the approximations themselves.

Bio: Haavard Rue has been a professor of Statistics in the CEMSE Division at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia since 2017; before that, he was a professor at the Department of Mathematical Sciences at the Norwegian University of Science and Technology. He was named a Highly Cited Researcher by the Web of Science Group in 2019-2021, gave the Bahadur Memorial Lectures at the University of Chicago in 2018, and in 2021 was awarded the Royal Statistical Society (RSS) Guy Medal in Silver for his work on Integrated Nested Laplace Approximations (INLA) and the Stochastic Partial Differential Equation (SPDE) approach to representing and computing with Gaussian fields. His research is mainly centred around the "R-INLA project"; see www.r-inla.org.
[11:00-11:30] Coffee Break
[11:30-12:30] Tutorial by Thomas Moellenhoff on Convex Duality
Abstract: In this tutorial, I will motivate duality through various applications in machine learning. I will also give a gentle and self-contained introduction to convex duality, Fenchel conjugate functions and how dual variables are related to sensitivities to perturbations of the problem's parameters or data.

Bio: Thomas Möllenhoff received his PhD in Informatics from the Technical University of Munich in 2020. From 2020 to 2023, he was a postdoc in the Approximate Bayesian Inference Team at RIKEN. Since 2023, he has worked at RIKEN as a tenured research scientist. His current research focuses on optimization and Bayesian deep learning, and his work has received several awards, including the Best Paper Honorable Mention award at CVPR 2016 and first place in the NeurIPS 2021 Challenge on Approximate Inference.
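
To make the Fenchel conjugate f*(y) = sup_x [x y - f(x)] concrete, here is a brute-force sketch that maximizes over a one-dimensional grid (the grid range and spacing are arbitrary assumptions, and this is not code from the tutorial). It also returns the maximizing primal point: for a differentiable convex f, x* and the dual variable y are linked by y = f'(x*) and, where the maximizer is unique, x* = (f*)'(y), so each acts as the sensitivity of the other side's function, a small instance of the link between dual variables and sensitivities mentioned in the abstract.

```python
def fenchel_conjugate(f, y, xs):
    """Grid approximation of the Fenchel conjugate f*(y) = sup_x [x * y - f(x)].

    Returns (value, maximizer). The sup is only approximated on the grid xs,
    and it is truncated by the grid bounds when f*(y) is actually +infinity.
    """
    best_x = max(xs, key=lambda x: x * y - f(x))
    return best_x * y - f(best_x), best_x


def half_square(x):
    """f(x) = x^2 / 2 is its own conjugate: f*(y) = y^2 / 2."""
    return 0.5 * x * x


if __name__ == "__main__":
    # Hypothetical grid on [-10, 10] with spacing 1e-3.
    grid = [-10.0 + 1e-3 * i for i in range(20001)]
    print(fenchel_conjugate(half_square, 3.0, grid))  # value near 4.5, maximizer near 3.0
    # f(x) = |x| has f*(y) = 0 for |y| <= 1 and +infinity otherwise.
    print(fenchel_conjugate(abs, 0.5, grid))  # value near 0.0
```

The Fenchel-Young inequality f(x) + f*(y) >= x y follows directly from the definition and can be checked numerically with the same function.
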
[12:30-14:00] Lunch Break (On Your Own)

Afternoon Session (chair: Eugene Ndiaye)

[14:00-16:00] Tutorial by Emtiyaz Khan: Bayesian Learning Rule
Abstract: I argue that information processing has a deep-rooted connection with Bayes' rule, and therefore all good algorithms must have Bayesian roots. I will discuss a Bayesian idea, which I call conjugate computation, in which information processing reduces to simple addition. In general, such computations are realized through the Bayesian learning rule (BLR), from which a wide variety of algorithms can be derived (I will briefly outline the derivations of RMSprop and Adam). Time permitting, I will discuss the "dual" view of the BLR, which is the starting point for Bayes-duality.

Bio: Available here
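
The idea of conjugate computation is easiest to see in the simplest conjugate pair. As a hedged sketch (the model, the known noise variance, and all function names are illustrative assumptions, not the tutorial's code), a Gaussian prior on an unknown mean theta is updated with Gaussian observations: in natural parameters, each observation's contribution is simply added to the prior's. The BLR extends this view to non-conjugate models, which this sketch does not attempt.

```python
def to_natural(mean, var):
    """Natural parameters (eta1, eta2) of N(mean, var): (mean / var, -1 / (2 * var))."""
    return mean / var, -0.5 / var


def from_natural(eta1, eta2):
    """Inverse map from natural parameters back to (mean, var)."""
    var = -0.5 / eta2
    return eta1 * var, var


def likelihood_natural(y, noise_var):
    """Contribution of one observation y ~ N(theta, noise_var), as a function of theta.

    log N(y | theta, noise_var) = theta * y / noise_var - theta**2 / (2 * noise_var) + const.
    """
    return y / noise_var, -0.5 / noise_var


def conjugate_posterior(prior_mean, prior_var, ys, noise_var):
    """Posterior over theta obtained by adding natural parameters."""
    eta1, eta2 = to_natural(prior_mean, prior_var)
    for y in ys:
        l1, l2 = likelihood_natural(y, noise_var)
        eta1, eta2 = eta1 + l1, eta2 + l2  # Bayes' rule as addition
    return from_natural(eta1, eta2)


if __name__ == "__main__":
    # Prior N(0, 4), three observations with unit noise variance (toy values).
    print(conjugate_posterior(0.0, 4.0, [1.2, 0.7, 1.9], noise_var=1.0))
```

Because the update is a sum, the posterior does not depend on the order in which observations arrive, and it matches the familiar sequential (Kalman-style) mean and variance updates.
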
[16:00-16:30] Coffee Break
[16:30-17:30] Matt Jones: Bayesian Online Natural Gradient (BONG)
Abstract: We propose a novel approach to sequential Bayesian inference based on variational Bayes. The standard variational loss is the sum of the expected negative log-likelihood (NLL) and the KL divergence from the prior. Our key insight is that, in the online setting, we can drop the KL term and instead perform a single step of natural gradient descent on the expected NLL, starting from the prior predictive (which comes from the posterior at the previous timestep). Thus, instead of explicitly regularizing to the prior, we do so implicitly. We prove that this method recovers exact Bayesian updating if the model is conjugate, and show empirically that it outperforms other online VB methods in the non-conjugate setting, such as online learning for neural networks, especially when controlling for computational cost.

Bio: Professor of Psychology at the University of Colorado and recent Visiting Faculty at Google Brain/DeepMind.
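
The abstract's conjugate-case claim can be checked in a few lines. The sketch below is an illustration under assumptions made here, not the authors' code: a scalar linear-Gaussian model y = x * w + noise with known noise variance, static weights so that the prior at each step is the previous posterior, a Gaussian q(w), and a step size of 1. It uses the standard exponential-family fact that the natural gradient with respect to the natural parameters equals the ordinary gradient with respect to the mean parameters; one such step on the expected NLL, with no KL term, lands on the conjugate (Kalman) posterior.

```python
def expected_nll_grad(x, y, noise_var):
    """Gradient of E_q[-log N(y | x * w, noise_var)] w.r.t. q's mean parameters (E[w], E[w^2]).

    Up to a constant, the expected NLL is
        (y**2 - 2 * x * y * E[w] + x**2 * E[w^2]) / (2 * noise_var),
    which is linear in the mean parameters, so the gradient does not depend on q.
    """
    return -x * y / noise_var, x * x / (2.0 * noise_var)


def bong_step(prior_mean, prior_var, x, y, noise_var, lr=1.0):
    """One natural-gradient step on the expected NLL, starting from q = N(prior_mean, prior_var)."""
    eta1, eta2 = prior_mean / prior_var, -0.5 / prior_var  # natural parameters
    g1, g2 = expected_nll_grad(x, y, noise_var)  # = natural gradient w.r.t. (eta1, eta2)
    eta1, eta2 = eta1 - lr * g1, eta2 - lr * g2
    var = -0.5 / eta2
    return eta1 * var, var


def exact_bayes_step(prior_mean, prior_var, x, y, noise_var):
    """Closed-form conjugate (Kalman) update for y = x * w + N(0, noise_var) noise."""
    gain = prior_var * x / (x * x * prior_var + noise_var)
    return prior_mean + gain * (y - x * prior_mean), prior_var - gain * x * prior_var


if __name__ == "__main__":
    stream = [(0.5, 1.1), (1.0, 1.9), (-1.5, -3.2), (2.0, 3.9)]  # toy (x, y) pairs
    m_bong = m_exact = 0.0
    v_bong = v_exact = 1.0
    for x, y in stream:
        m_bong, v_bong = bong_step(m_bong, v_bong, x, y, noise_var=0.25)
        m_exact, v_exact = exact_bayes_step(m_exact, v_exact, x, y, noise_var=0.25)
    print(m_bong, m_exact, v_bong, v_exact)  # pairs agree up to rounding
```

With a step size other than 1 the two updates no longer coincide, which is why the exactness statement is tied to a single full step.
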

June 14

Morning Session (chair: Haavard Rue)

[10:00-11:00] Frank Nielsen: Some generalizations of Bregman divergences
Abstract: In this talk, I shall present several generalizations of Bregman divergences for machine learning with algorithmic and geometric considerations. In particular, I will describe duality structures and a few algorithms on Bregman manifolds, introduce the Bregman duo pseudo-divergences, and present a generalization of convexity which yields conformal Bregman divergences.

Bio: Frank Nielsen completed his PhD in computational geometry (1996) at INRIA Sophia-Antipolis (France). He is a Senior Researcher and Fellow of Sony Computer Science Laboratories Inc. (Sony CSL, Tokyo), where he currently conducts research on Structures, Dynamics, and Geometric Computing for AI and Information Theory. He serves on the editorial boards of the following journals: Information Geometry (Springer), Transactions on Information Theory (IEEE), and Entropy (MDPI). Frank Nielsen co-organizes, with Frederic Barbaresco, the biennial conference Geometric Science of Information.
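
For reference, the sketch below implements only the standard Bregman divergence that the talk generalizes, D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>, and none of the duo or conformal variants. The generators are illustrative choices: the squared Euclidean norm recovers the squared Euclidean distance, and negative entropy recovers the Kullback-Leibler divergence on probability vectors.

```python
import math


def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for convex, differentiable F."""
    return F(p) - F(q) - sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))


def sq_norm(x):
    return sum(v * v for v in x)


def sq_norm_grad(x):
    return [2 * v for v in x]


def neg_entropy(x):
    """Negative entropy, defined on the positive orthant."""
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1 for v in x]


if __name__ == "__main__":
    # Squared Euclidean norm: D(p, q) = ||p - q||^2 = 2**2 + 3**2.
    print(bregman(sq_norm, sq_norm_grad, [1, 2], [3, -1]))  # 13
    # Negative entropy: KL(p || q) on probability vectors, asymmetric in general.
    p, q = [0.2, 0.8], [0.5, 0.5]
    print(bregman(neg_entropy, neg_entropy_grad, p, q), bregman(neg_entropy, neg_entropy_grad, q, p))
```

For general positive vectors, the negative-entropy generator gives the generalized KL divergence sum(p log(p / q)) - sum(p) + sum(q), which reduces to the usual KL divergence when both vectors sum to one.
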
[11:00-11:30] Coffee Break
[11:30-12:30] Jonghyun Choi: Practical Set-ups and a Method for Continual Learning
Abstract: Prevalent setups in continual learning often fall short of practical applicability, imposing unrealistic assumptions and constraints. First, I will address these gaps by presenting several more realistic and feasible continual learning setups (mostly for class-incremental learning) on which we should evaluate our algorithms. Then, I will discuss one of our approaches to continual learning (e.g., class-incremental learning) in a more realistic setup: an online, continuous data stream and beyond.

Bio: Jonghyun Choi received the B.S. and M.S. degrees in electrical engineering and computer science from Seoul National University, Seoul, South Korea, in 2003 and 2008, respectively. He received a Ph.D. degree from the University of Maryland, College Park, in 2015, under the supervision of Prof. Larry S. Davis. He is currently an associate professor at Seoul National University, Seoul, South Korea. During his PhD, he worked as a research intern at a number of research labs, including the US Army Research Lab (2012), Adobe Research (2013), Disney Research Pittsburgh (2014), and Microsoft Research Redmond (2014). He was a senior researcher at Comcast Applied AI Research, Washington, DC, from 2015 to 2016. He was a research scientist at the Allen Institute for Artificial Intelligence (AI2), Seattle, WA, from 2016 to 2018, and is currently an affiliated research scientist there. He was an associate professor at Yonsei University, Seoul, South Korea, from 2022 to 2024, and an assistant professor at GIST, Gwangju, South Korea, from 2018 to 2022. He serves as an area chair for IEEE/CVF CVPR and WACV, NeurIPS, and BMVC; he was a paper review chair at CoLLAs 2023 and an associate editor of IEEE Transactions on PAMI. His research interests include visual recognition using weakly supervised data for semantic understanding of images and videos (e.g., continual learning, unlearning) and visual understanding for edge devices and household robots.
[12:30-14:00] Lunch Break (On Your Own)

Afternoon Session

[14:00-15:00] Tutorial by Martin Mundt on Continual Learning: Pillars of forgetting in continual updates and the road to lifelong learning
Abstract: Machine learning studies the design of models and training algorithms in order to learn how to solve tasks from data. Whereas traditional machine learning concentrates on predefined training datasets, the renaissance of continual learning also takes into account that the world is constantly evolving. In this tutorial, I will focus on the challenge of catastrophic interference when performing continual updates and summarize mechanisms to counteract it. To this end, I will survey three pillars of approaches, spanning the perspectives of data, optimization, and model choice. Finally, I will highlight how the challenge of sequential updates relates to broader machine learning and which further elements are required on the road to true lifelong learning systems.

Bio: Martin is an independent research group leader at TU Darmstadt and hessian.AI, where he leads the Open World Lifelong Learning lab. He is also a board member at the non-profit ContinualAI and a core organizer of Queer in AI, was the Diversity & Inclusion chair at AAAI-24, and currently serves as Review Process Chair for CoLLAs 2024. Previously, he was an interim professor and postdoctoral researcher at TU Darmstadt; he obtained a CS PhD from Goethe University Frankfurt and holds a Master's degree in Physics. The main vision behind his research is to transcend static machine learning systems towards adaptive and sustainable lifecycles.
[15:00-16:00] Tutorial by Tom Rainforth on Modern Bayesian Experimental Design
Abstract: Bayesian experimental design (BED) provides a powerful and general framework for optimizing the design of experiments. However, its deployment often poses substantial computational challenges that can undermine its practical use. In this tutorial, I will outline the Bayesian experimental design framework and explain how recent advances have transformed our ability to overcome these challenges and thus utilize BED effectively, before discussing some key areas for future development in the field. Related review paper

Bio: I am a Senior Researcher in Machine Learning (and from September, Associate Professor) in the Department of Statistics at the University of Oxford, where I run the RainML Research Lab. My research covers a wide range of topics in and around machine learning and experimental design, with areas of particular interest including Bayesian experimental design, deep learning, representation learning, generative models, Monte Carlo methods, active learning, probabilistic programming, and variational inference.
[16:00-16:30] Coffee Break
[16:30-18:00] Panel I: Uncertainty in AI (moderator: Vincent Fortuin)
Panelists: Juho Lee, Matt Jones, Haavard Rue, Eugene Ndiaye, Alexander Immer, Frank Nielsen, Jonghyun Choi, Srijith PK, Martin Mundt, Tom Rainforth

June 12

June 13

June 14

June 17

June 18 (The CREST-talk day)

June 19

June 20

June 21

Posters