Goal: To develop a new learning paradigm for Artificial Intelligence (AI) that learns like humans in an adaptive, robust, and continuous fashion.
Summary: The new learning paradigm will be based on a new principle of machine learning, which we call the Bayes-Duality principle and which we will develop during this project. Conceptually, the principle hinges on the fundamental idea that an AI should be capable of efficiently preserving and acquiring relevant past knowledge so that it can adapt quickly in the future. We will apply the principle to the representation of past knowledge, its faithful transfer to new situations, and the collection of new knowledge whenever necessary. Current deep-learning methods lack these mechanisms and instead rely on brute-force data collection and training. Bayes-Duality aims to fix these deficiencies.
Funding and Duration:
|Name||University||Position||Team in project|
|Benoît Collins||Kyoto University||Professor||Math-Science team|
|Florence Forbes||Inria Grenoble Rhône-Alpes||Principal investigator||Stat-Theory team|
|Kei Hagihara||RIKEN AIP||Postdoctoral Researcher||Math-Science team|
|Samuel Kaski||Aalto University, Department of Computer Science||Professor||Approx-Bayes team|
|Takahiro Katagiri||Nagoya University, Information Technology Center||Professor||HPC team|
|Akihiro Ida||The University of Tokyo, Information Technology Center||Project Associate Professor||HPC team|
|Takeshi Iwashita||Hokkaido University, Information Initiative Center||Professor||HPC team|
|Julien Mairal||Inria Grenoble Rhône-Alpes||Research scientist||Stat-Theory team|
|Eren Mehmet Kiral||RIKEN AIP||Special Postdoctoral Researcher||Math-Science team|
|Kengo Nakajima||The University of Tokyo, Information Technology Center||Professor||HPC team|
|Takeshi Ogita||Tokyo Woman’s Christian University, School of Arts and Sciences||Professor||HPC team|
|Jan Peters||TU Darmstadt||Professor||Approx-Bayes team|
|Judith Rousseau||Université Paris-Dauphine & University of Oxford||Professor||Stat-Theory team|
|Haavard Rue||King Abdullah University of Science and Technology, CEMSE division||Professor||Approx-Bayes team|
|Akiyoshi Sannai||RIKEN AIP||Research Scientist||Math-Science team|
|Mark Schmidt||University of British Columbia, Department of Computer Science||Associate Professor||Approx-Bayes team|
|Arno Solin||Aalto University, Department of Computer Science||Assistant Professor||Approx-Bayes team|
|Siddharth Swaroop||Harvard University||Post-doc||Approx-Bayes team|
|Asuka Takatsu||Tokyo Metropolitan University||Associate Professor||Math-Science team|
|Koichi Tojo||RIKEN AIP||Special Postdoctoral Researcher||Math-Science team|
|Richard Turner||Cambridge University, UK, Department of Engineering||Associate Professor||Approx-Bayes team|
|Mariia Vladimirova||Inria Grenoble Rhône-Alpes||PhD Student||Stat-Theory team|
|Pierre Wolinski||Inria Grenoble Rhône-Alpes & University of Oxford||Post-doctoral fellow||Stat-Theory team|
|Shuji Yamamoto||Keio University/RIKEN AIP||Associate Professor||Math-Science team|
Our goal is to develop a new learning paradigm that enables adaptive, robust, and lifelong learning in AI systems. Deep learning methods are not sufficiently adaptive or robust: new knowledge cannot easily be added to a trained model and, when it is forced in, the old knowledge is easily forgotten. Given a new dataset, the whole model needs to be retrained from scratch on both the old and the new data, because training only on the new dataset leads to catastrophic forgetting of past knowledge. All of the data must therefore be available at the same time, which creates a dependency on large datasets and models that plagues almost all deep learning systems. Our main goal is to fix this by developing a new learning paradigm that supports adaptive and robust systems that learn throughout their lives.
We introduce a new learning principle for machine learning, which we call the Bayes-Duality principle or “Bayes-Duality”. The principle exploits the “dual perspectives” of approximate (Bayesian) posteriors to extend the concepts of duality (similar to convex duality) to nonconvex problems. It is based on a new discovery that the natural gradients used in approximate Bayesian methods automatically give rise to such dual representations. In the past, we have shown that natural gradients for Bayesian problems yield a majority of machine learning algorithms as special cases (see our paper on the Bayesian Learning Rule). Our goal now is to show that the same approach can be used to apply ideas of duality to nonconvex problems, such as those that arise in deep learning.
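To make the idea of preserving past knowledge in a posterior more concrete, here is a minimal toy sketch. It is not the project's method, only a textbook conjugate case: Bayesian linear regression with known noise variance, where the Gaussian posterior is stored through its natural parameters (the precision matrix and the precision times the mean). In this simple model the natural parameters of new data are just added to the stored ones, so updating with the new batch alone gives the same posterior as retraining on all data. The function names, the synthetic data, and the noise level are assumptions made for illustration.

```python
import numpy as np

def update(posterior, X, y, noise_var=0.25):
    """Add a batch of data to a Gaussian posterior over linear-regression weights.

    `posterior` is a pair (precision, precision @ mean), i.e. the natural
    parameters of the Gaussian. New data contribute additively to both.
    """
    precision, shift = posterior
    return precision + X.T @ X / noise_var, shift + X.T @ y / noise_var

def posterior_mean(posterior):
    """Recover the posterior mean from the natural parameters."""
    precision, shift = posterior
    return np.linalg.solve(precision, shift)

# Synthetic data (illustrative only): an old batch and a new batch.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
X_old = rng.normal(size=(40, 3))
y_old = X_old @ true_w + 0.5 * rng.normal(size=40)
X_new = rng.normal(size=(25, 3))
y_new = X_new @ true_w + 0.5 * rng.normal(size=25)

# Standard normal prior: identity precision, zero mean.
prior = (np.eye(3), np.zeros(3))

# Sequential learning: keep only the posterior, never revisit the old data.
post_old = update(prior, X_old, y_old)
post_seq = update(post_old, X_new, y_new)

# Reference: retrain from scratch on old and new data together.
post_joint = update(prior, np.vstack([X_old, X_new]), np.concatenate([y_old, y_new]))

print(np.allclose(posterior_mean(post_seq), posterior_mean(post_joint)))
```

The exact additivity shown here is specific to conjugate models. For nonconvex models such as deep networks no such exact summary of past data is available in general, which is the kind of setting the paragraph above refers to.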
Our main goals in the future include the following: