Morning Session (chair: TBA)
Afternoon Session (chair: TBA)
Morning Session (chair: TBA)
Afternoon Session (chair: TBA)
Morning Session (chair: Emtiyaz Khan)
Abstract: Modern deep learning systems impress with their capabilities but, at the same time, face considerable challenges: they require massive datasets and enormous computational resources, and they raise growing concerns about transparency and trustworthiness. In this talk, we will ask whether it really has to be this way and discuss some of the major challenges that limit the success of deep learning at smaller scales. We will offer algorithmic and theoretical insights into sparsity, overparameterization, and their implicit biases. Exploiting these theoretical insights, we will adapt the implicit bias of standard optimizers according to a dynamic sparsity principle, which achieves performance boosts comparable to SAM, yet with a complementary, novel mechanism.
Bio: I am a tenure-track faculty member at the Helmholtz Center CISPA, where I lead the Relational Machine Learning Group. Our research combines robust algorithm design and complex network science with the quest for a theoretical understanding of deep learning. Based on theoretical and experimental insights, we develop efficient models and algorithms that are robust to noise, adapt to a changing environment, and integrate information available from small amounts of data and various forms of domain knowledge. This makes our approach well suited for the biomedical domain and the sciences in general. While we care about solving real-world problems in collaboration with domain experts, we have a special interest in problems related to glycans, gene regulation, and its alterations during cancer progression.
Abstract: In this talk, I describe how sparse modeling techniques can be extended and adapted to facilitate dynamic sparsity in neural models, where different neural pathways are activated depending on the input. The building block is a family of sparse transformations induced by Tsallis entropies called alpha-entmax, a drop-in replacement for softmax which contains sparsemax as a particular case. Entmax transformations are differentiable and, unlike softmax, can return sparse probability distributions, which are useful for routing, interpretability, efficiency, and length generalization, and are less prone to phenomena such as dispersion, oversquashing, and representational collapse. They can also be used to design new Fenchel-Young loss functions, replacing the cross-entropy loss. Variants of these sparse transformations and losses have been applied with success to machine translation, natural language inference, visual question answering, Hopfield networks, reinforcement learning, and other tasks. I will discuss AdaSplash, an efficient implementation of entmax attention (https://arxiv.org/abs/2502.12082), and recent applications of these sparse losses to conformal prediction (https://arxiv.org/abs/2502.14773) and to generalized Bayesian inference (https://arxiv.org/abs/2502.10295).
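As a concrete illustration of the sparsity these transformations provide, here is a minimal NumPy sketch of sparsemax (the alpha = 2 case of alpha-entmax, following Martins & Astudillo, 2016); the input logits are made-up values for illustration only.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of logits z onto the probability simplex.

    Unlike softmax, the result can contain exact zeros."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # sort logits in descending order
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z_sorted)
    # support: the largest k with 1 + k * z_sorted[k-1] > cumulative sum
    support = 1 + k * z_sorted > cssv
    k_z = k[support][-1]
    tau = (cssv[support][-1] - 1.0) / k_z  # threshold subtracted from logits
    return np.maximum(z - tau, 0.0)

p = sparsemax([1.5, 1.0, -1.0])
# p = [0.75, 0.25, 0.0]: a sparse distribution, whereas softmax would
# assign strictly positive mass to all three entries
```

The exact zeros in the output are what makes such transformations attractive for routing and interpretability, since low-scoring entries are pruned entirely rather than merely downweighted.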
Bio: I am an Associate Professor at the Computer Science Department (DEI) and the Electrical and Computer Engineering Department (DEEC) at Instituto Superior Técnico. I am also the VP of AI Research at Unbabel in Lisbon, Portugal, and a Senior Researcher at the Instituto de Telecomunicações, where I lead the SARDINE Lab. Until 2012, I was a PhD student in the joint CMU-Portugal program in Language Technologies, at Carnegie Mellon University and at Instituto Superior Técnico, where I worked under the supervision of Mário Figueiredo, Noah Smith, Pedro Aguiar and Eric Xing. My research interests revolve around natural language processing and machine learning, more specifically sparse and structured transformations, uncertainty quantification, interpretability, and multimodal processing applied to machine translation, natural language generation, quality estimation, and evaluation. My research has been funded by an ERC Starting Grant (DeepSPIN) and a Consolidator Grant (DECOLLAGE), among other grants, and has received several paper awards at ACL conferences. I co-founded and co-organize the Lisbon Machine Learning School (LxMLS). I am a Fellow of the ELLIS society and a co-director of the ELLIS Program in Natural Language Processing. I am a member of the Lisbon Academy of Sciences and of the Research & Innovation Advisory Group (RIAG) of the EuroHPC Joint Undertaking.
Afternoon Session (chair: Joe Austerweil)
Abstract: Sampling-based inference is often regarded as the gold standard for posterior inference in Bayesian neural networks (BNNs), yet it continues to face skepticism regarding its practicality in large-scale or complex models. This perception has been challenged by recent methodological and computational advances that significantly broaden the scope of feasible applications. The presentation examines how sampling operates in BNNs, how performance can be improved through targeted adaptations, and why not all sampling procedures are equally effective. It further explores the role of implicit regularization induced by both the network architecture and the sampling dynamics. The discussion points toward future opportunities where sampling may redefine Bayesian deep learning, contingent on addressing current challenges in scalability, efficiency, and inference cost.
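As a minimal illustration of the kind of sampling the talk examines, the following sketch runs stochastic-gradient Langevin dynamics (Welling & Teh, 2011) on a toy one-dimensional posterior; the target distribution, step size, and burn-in length are illustrative choices, not the specific methods covered in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target posterior: N(mu=2.0, var=0.5). A Langevin step is a
# gradient-ascent step on the log-posterior plus Gaussian noise.
mu, var = 2.0, 0.5

def grad_log_post(theta):
    return -(theta - mu) / var

lr = 1e-2
theta = 0.0
samples = []
for t in range(20000):
    noise = rng.normal(scale=np.sqrt(lr))          # injected noise, std sqrt(lr)
    theta += 0.5 * lr * grad_log_post(theta) + noise
    if t > 2000:                                   # discard burn-in
        samples.append(theta)

print(np.mean(samples), np.var(samples))  # ≈ 2.0 and ≈ 0.5
```

In a real BNN the gradient would be a minibatch estimate over network weights, which is exactly where the scalability and efficiency questions raised in the abstract come in.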
Bio: I am an Associate Professor at LMU Munich, heading the Munich Uncertainty Quantification AI Lab, an ELLIS Member, an Associated Fellow of the Konrad Zuse School of Excellence in Reliable AI (relAI), and a Principal Investigator of the Munich Center for Machine Learning (MCML). My current research involves the development of uncertainty quantification for deep learning approaches (using, e.g., a Bayesian paradigm), the unification of concepts from statistics and deep learning, and the study of overparametrization in neural networks. See also my Google Scholar profile for some of my recent research.
The day will feature talks on many recent works from the CREST team.
Morning Session (chair: Pierre Alquier)
Abstract: Research achievements of the ABI team and CREST.
Abstract: Variational learning with the Improved Variational Online Newton method (IVON) can consistently match or outperform Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam's, but its predictive uncertainty is better. The talk gives a broad overview of several projects that have used IVON since its publication last year. In particular, I will talk about a theoretical analysis of IVON's promising performance, connections to adaptive label smoothing, and its usefulness for multimodal models, language generation with LLMs, and federated learning. Link to the paper
Bio (Thomas): Thomas Möllenhoff received his PhD in Informatics from the Technical University of Munich in 2020. From 2020 to 2023, he was a post-doc in the Approximate Bayesian Inference Team at RIKEN. Since 2023 he has worked at RIKEN as a research scientist, and since 2025 as a senior research scientist. His research focuses on optimization and Bayesian deep learning and has received several awards, including the Best Paper Honorable Mention at CVPR 2016 and first place at the NeurIPS 2021 Challenge on Approximate Inference.
Abstract: Humans and animals have a natural ability to autonomously learn and continually adapt, but modern AI models, despite their amazing performance, cannot do so and remain extremely costly to train. I will present a new learning paradigm called Adaptive Bayesian Intelligence to bridge this gap. I will show that a wide range of adaptive methods can all be seen as different ways of “correcting” the approximate posteriors. Better posteriors lead to smaller corrections, which in turn imply faster and cheaper adaptation. The result is obtained by using a dual-perspective of the Bayesian Learning Rule (Khan and Rue, 2023), giving rise to a new Bayesian-Duality principle. I will demonstrate the effectiveness of the new principle on continual and federated deep learning, as well as merging and finetuning of LLMs. Link to the paper
Afternoon Session (chair: Jonghyun Choi)
Abstract: We provide new connections between two distinct federated learning approaches based on (i) ADMM and (ii) Variational Bayes (VB), and propose new variants by combining their complementary strengths. Specifically, we show that the dual variables in ADMM naturally emerge through the 'site' parameters used in VB with isotropic Gaussian covariances. Using this, we derive two versions of ADMM from VB that use flexible covariances and functional regularisation, respectively. Through numerical experiments, we validate the resulting performance improvements. The work shows a connection between two fields previously believed to be fundamentally different and combines them to improve federated learning. Link to the paper
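For readers unfamiliar with the ADMM side of this connection, here is a toy consensus-ADMM sketch on a synthetic quadratic "federated" problem; the client losses, penalty rho, and iteration count are illustrative choices, not the paper's setup. The scaled dual variables u below are the quantities the abstract relates to the VB 'site' parameters.

```python
import numpy as np

# Toy federated problem: client i holds f_i(w) = 0.5 * ||w - b_i||^2,
# so the consensus solution is the mean of the b_i.
b = np.array([[1.0, 2.0], [3.0, 0.0], [-1.0, 4.0]])  # one row per client
rho = 1.0
n, d = b.shape
w = np.zeros((n, d))   # local models
u = np.zeros((n, d))   # scaled dual variables
z = np.zeros(d)        # global consensus model

for _ in range(100):
    w = (b + rho * (z - u)) / (1 + rho)  # local step (closed form for quadratics)
    z = (w + u).mean(axis=0)             # server aggregation
    u += w - z                           # dual ascent

print(z)  # → converges to b.mean(axis=0) = [1., 2.]
```

For general deep-learning losses the local step becomes a few epochs of SGD rather than a closed-form solve, but the three-step structure (local update, aggregation, dual update) is the core that has remained unchanged.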
Abstract: ADMM is a popular method for federated deep learning. It originated in the 1970s and, even though many new variants have been proposed since then, its core algorithmic structure has remained unchanged. In this talk, we will introduce a structure called Bayesian Duality, which exploits a duality of the posterior distributions obtained by solving a variational-Bayesian reformulation of the original problem. We show that this naturally recovers the original ADMM when isotropic Gaussian posteriors are used, and yields non-trivial extensions for other posterior forms. For instance, full-covariance Gaussians lead to Newton-like variants of ADMM, while diagonal covariances result in a cheap Adam-like variant. This is especially useful for handling heterogeneity in federated deep learning, giving up to 7% accuracy improvements over recent baselines. Link to the paper
Bio: Thomas Möllenhoff received his PhD in Informatics from the Technical University of Munich in 2020. From 2020 to 2023, he was a post-doc in the Approximate Bayesian Inference Team at RIKEN. Since 2023 he has worked at RIKEN as a research scientist, and since 2025 as a senior research scientist. His research focuses on optimization and Bayesian deep learning and has received several awards, including the Best Paper Honorable Mention at CVPR 2016 and first place at the NeurIPS 2021 Challenge on Approximate Inference.
Abstract: In this talk, I will explain the role played by the Mathematical Science Team at RIKEN AIP and some of the things we have been working on. I will talk about my original research field of arithmetic theory and the application of such thinking to the field of statistical mechanics. Within the first part of my talk, Akinori Tanaka of our team will give a five-minute introduction to his research. Then I will give a short introduction to the theory of Lie Group Updates, developed by members of Emti's team and our team: Mehmet, Thomas, Koiichi, and Emti. Link to the paper
Abstract: In the Lie-group Bayesian learning framework, assuming a log-normal distribution over the weights leads to multiplicative weight dynamics, including both parameter updates and noise injection. We propose a novel Log-Normal Multiplicative Dynamics (LMD) optimizer that scales to models at the scale of Vision Transformer (ViT) and GPT-2. We show that leveraging LMD enables forward matrix multiplications during training to be executed in low-precision formats without significant performance degradation in large neural networks. Link to the paper
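To give a feel for multiplicative weight dynamics, here is an illustrative toy sketch under simplifying assumptions (this is not the LMD optimizer itself): a positive weight is parameterized log-normally, so additive noisy updates in log-space become multiplicative updates and multiplicative noise on the weight. The objective and all constants are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective L(w) = 0.5 * (w - 3.0)^2, minimized at w = 3.
# Log-normal parameterization: w = exp(m + s * eps), eps ~ N(0, 1),
# so noise enters multiplicatively and updates on m scale w multiplicatively.
m, s, lr = 0.0, 0.1, 0.05
for _ in range(2000):
    eps = rng.normal()
    w = np.exp(m + s * eps)   # multiplicative noise injection
    grad_w = w - 3.0          # dL/dw
    m -= lr * grad_w * w      # chain rule: dL/dm = dL/dw * dw/dm = grad_w * w
    # this multiplies exp(m) by exp(-lr * grad_w * w): a multiplicative update

print(np.exp(m))  # ≈ 3, up to stochastic fluctuation and a small noise-induced bias
```

Because the weight never changes sign and updates act on its logarithm, dynamics of this kind interact naturally with low-precision formats, which is the regime the abstract targets at ViT and GPT-2 scale.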
Abstract: Bayesian methods offer clear theoretical advantages and have performed well on modest-sized problems. Scaling them to today's billion-parameter language models, however, remains challenging. In this talk, I will review recent efforts to bring Bayesian ideas into LLM adaptation. I will then share our latest results on improving Low-Rank Adaptation (LoRA) fine-tuning of LLMs with the IVON optimizer.
Bio: I am a second-year Ph.D. student in the School of Computing at the Institute of Science Tokyo, working with Prof. Rio Yokota. I hold a B.E. in Biomedical Engineering from Shanghai Jiao Tong University and an M.Eng. in Computer Science from Tokyo Institute of Technology. I have research experience in high-performance computing, deep learning optimizers, and Bayesian deep learning, and I am currently interested in scaling Bayesian methods to large models. I am part of the IVON project and serve as a maintainer of the IVON repository on GitHub. Link to the paper
Morning Session (chair: Siddharth Swaroop)
Abstract: In this talk I will share a few insights into in-context learning from recent works I have been involved in, and take a step back to discuss the interactions between ICL and adaptation. While some of the hypotheses I will put forward are admittedly speculative, the more grounded part of the talk will focus on https://arxiv.org/pdf/2505.00661, https://arxiv.org/pdf/2503.21676 and related work. In particular, I will discuss early empirical evidence that the reversal curse behaves differently when resolved in context versus learned by gradient descent. I will also look at how transformers might learn and store facts, and how that can affect adapting or unlearning them. Overall, the hope is to stimulate questions and to argue that sequential models, and transformers specifically, can have particular learning dynamics that are worth studying and considering when we think about adaptation.
Bio: I'm currently a Research Scientist at DeepMind and an Affiliate Member at MILA. I grew up in Romania and studied computer science and electrical engineering as an undergraduate in Germany. I received my MSc from Jacobs University, Bremen in 2009 under the supervision of Prof. Herbert Jaeger, and my PhD from the University of Montreal in 2014 under the supervision of Prof. Yoshua Bengio. My PhD thesis can be found here. I was involved in developing Theano and helped write some of the deep learning tutorials for Theano. I've published several papers on topics surrounding deep learning and deep reinforcement learning (see my scholar page). I'm one of the organizers of EEML (www.eeml.eu) and part of the organizing team of AIRomania. As part of the AIRomania community, I have organized RomanianAIDays since 2020 and helped build a course on AI aimed at high school students.
Abstract: How large should a neural network be? We argue that adjusting neural network size to what the problem requires is important in continual learning settings, as well as for computational efficiency. We study this problem in the single-layer case and give philosophical, theoretical, and practical arguments for why approximations to Gaussian processes are a natural way to solve it. This leads to an elegant method for growing and shrinking the neuron count in approximate Gaussian processes, with principles that prescribe how most hyperparameters should be set, making the method very automatic. We show sophisticated behaviours of this method depending on dataset characteristics, e.g. saturation of the neuron count in continual learning when data are redundant, and a drop in the neuron count when "grokking" occurs. We believe these ideas will be helpful in the generalisation to deep models.
Bio: I am an Associate Professor in the Department of Computer Science at the University of Oxford, researching machine learning, and a Tutorial Fellow at Hertford College. Together with my research group, I work on three central questions: How do we find general patterns that allow generalization beyond the training set, without humans manually encoding them? (Equivariance, causality, continual learning…). How can we create neurons that automatically assemble their connectivity structure (architecture), while minimising the computational costs of the network as a whole? (Generalisation bounds, Bayesian model selection, MDL, meta-learning). How do we interact with the environment, while avoiding risk but learning as quickly as possible? (Bayesian optimisation, foundation models for industrial applications e.g. chemistry). These improvements are relevant for both small-data statistics (Gaussian processes), and large-data machine learning (neural networks). A wide range of research topics contribute towards these improvements, such as invariance, Bayesian inference, causality, meta-learning, local learning rules and generative modelling. My work has been presented at the leading machine learning conferences (e.g. NeurIPS and ICML), and includes a best paper award.
Afternoon Session (chair: Weiwei Pan)
Abstract: Processing multimodal signals reliably is (still) a challenge. In this talk, I will argue that we need strong multimodal representations, good reasoning, and, finally, models that can become self-aware. Along these dimensions, I will discuss various research directions we are currently pursuing, including, among others, video & language modeling, selective prediction for visual question answering, multimodal fact checking, and backdoor attacks for unlearning in diffusion models. I will conclude with a discussion of future work.
Bio: I have started a new lab, "Multimodal Reliable AI", at TU Darmstadt, Germany, as an Alexander von Humboldt Professor and with a LOEWE Professorship (W3 / Full Professor). Previously, I was a FAIR Research Scientist at Meta AI (2017-2023) and a PostDoc at the University of California, Berkeley (EECS and ICSI) with Trevor Darrell (2014-2017), and I did my PhD at the Max Planck Institute for Informatics with Bernt Schiele (2010-2014). My interests include computer vision, computational linguistics, and machine learning, as well as how to make models reliable so we can trust them.
Abstract: Driven by the goals of "augmenting diversity", increasing speed, and reducing cost, the use of synthetic data as a replacement for human participants is gaining traction in AI research and product development (Agnew et al., 2024). This talk critically examines the claim that synthetic data can "augment diversity", arguing that this notion is empirically unsubstantiated, conceptually flawed, and epistemically harmful. While speed and cost-efficiency may be achievable, they often come at the expense of rigor, insight, and robust science. Drawing on research from dataset audits, model evaluations, Black feminist scholarship, and complexity science, I argue that replacing human participants with synthetic data risks producing real-world and epistemic harms at worst, and superficial knowledge and cheap science at best.
Bio: I am a cognitive scientist researching human behaviour, social systems, and responsible and ethical Artificial Intelligence (AI). I recently finished my PhD, in which I explored the challenges and pitfalls of automating human behaviour through critical examination of existing computational models and audits of large-scale datasets. I am currently a Senior Fellow in Trustworthy AI at the Mozilla Foundation. I am also an Adjunct Lecturer/Assistant Professor in the School of Computer Science and Statistics at Trinity College Dublin, Ireland.