Three co-authored research papers from OMRON SINIC X presented at the International Conference on Machine Learning 2023
"One of the three papers accepted with the Outstanding Paper Award"
- August 09, 2023
OMRON SINIC X Corporation (HQ: Bunkyo-ku, Tokyo; President and CEO: Masaki Suwa; hereinafter "OSX") is pleased to announce that three research papers co-authored by OSX senior researcher Tadashi Kozuno and external collaborators were accepted for presentation at the International Conference on Machine Learning 2023 (hereinafter "ICML 2023"), held in Honolulu starting July 23.
Along with NeurIPS*1, ICML is one of the premier international conferences in the field of machine learning and related areas. More than 5,000 research papers were submitted to the conference this year, and only 27.9% were accepted.
*1: Neural Information Processing Systems
These papers present mathematical analyses aimed at improving the efficiency and performance of reinforcement learning algorithms and of algorithms for imperfect-information games.
In particular, "Adapting to game trees in zero-sum imperfect information games" won an Outstanding Paper Award, given to only six papers out of all submissions.
OSX continues to create value via technological innovation through collaboration with universities and external research institutions.
<Publication>
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Authors:
*1: The University of Tokyo, *2: OSX, *3: Google DeepMind, *4: Google Research, Brain team, *5: Peking University, *6: Otto von Guericke University Magdeburg, *7: University of Alberta
Background:
In this study, we extended Mirror Descent Value Iteration (MDVI) and proposed a method that achieves optimal sample efficiency even in settings where function approximation is required.
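As a rough illustration of the variance-weighted regression named in the title, the Python sketch below fits a ridge regression in which each sample is weighted by the inverse of its estimated variance. This is a minimal sketch under assumed inputs (phi, targets, and variances are hypothetical), not the estimator analyzed in the paper.

    import numpy as np

    def variance_weighted_ridge(phi, targets, variances, lam=1.0):
        # Hypothetical illustration: phi is an (n, d) feature matrix,
        # targets an (n,) vector, variances an (n,) vector of estimated
        # target variances. Samples with lower variance receive higher
        # weight, as in variance-weighted least squares; how the variances
        # are estimated and how the regularization interacts with MDVI are
        # details of the paper omitted here.
        w = 1.0 / np.maximum(variances, 1e-8)
        A = phi.T @ (w[:, None] * phi) + lam * np.eye(phi.shape[1])
        b = phi.T @ (w * targets)
        return np.linalg.solve(A, b)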
Method:
Future:
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
Authors:
*1: Google DeepMind, *2: OSX
Background:
Reinforcement learning methods can also be classified into single-step and multi-step learning. Single-step learning uses only the action and its immediate result at each time step during policy evaluation and improvement, while multi-step learning uses actions and their results over multiple consecutive time steps (a toy illustration follows below). Empirically, multi-step learning is known to be effective in improving performance.
However, previous methods that applied multi-step learning to policy improvement were limited to on-policy learning. In this study, we aimed to further improve reinforcement learning methods by proposing DoMo-VI and DoMo-AC, which apply off-policy multi-step learning to both policy evaluation and policy improvement.
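The following toy sketch contrasts the two kinds of learning targets. It illustrates only the single-step versus multi-step distinction, not the DoMo-VI/DoMo-AC operators themselves; rewards and values are hypothetical trajectory arrays, with rewards[k] and values[k] referring to the k-th step after the current time.

    def one_step_target(rewards, values, gamma=0.99):
        # Single-step (TD(0)) target: the immediate reward plus the
        # bootstrapped value of the next state.
        return rewards[0] + gamma * values[1]

    def n_step_target(rewards, values, n, gamma=0.99):
        # Multi-step target: accumulate n consecutive rewards before
        # bootstrapping. This propagates reward information faster, but
        # under off-policy data it requires corrections, which is where a
        # doubly multi-step construction like the paper's comes in.
        g = sum(gamma**k * rewards[k] for k in range(n))
        return g + gamma**n * values[n]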
Method:
We implemented DoMo-AC based on IMPALA, a distributed deep reinforcement learning algorithm, and tested it on the Atari-57 benchmark task 1). DoMo-AC achieved stable performance improvements over IMPALA. It also showed low sensitivity to the parameters that adjust the degree of multi-step learning in policy evaluation and policy improvement, which indicates that it is easy to use in practice.
1) Atari-57 is a benchmark suite consisting of 57 different Atari 2600 games, and it is often used to evaluate the performance of reinforcement learning methods.
Future:
Adapting to game trees in zero-sum imperfect information games
Authors:
*1: CREST, ENSAE, IP Paris, *2: ENS Lyon, *3: OSX, *4: Google DeepMind, *5: CRITEO AI Lab
Background:
There are two goals in imperfect-information games (IIGs). The first is to adapt to the opponent and choose the optimal strategy against them; this is difficult because the opponent also changes their strategy. The second is to compute a Nash equilibrium. Methods exist to compute Nash equilibria when the game structure, transition probabilities, and reward function are known beforehand, but they are computationally expensive and impractical.
We proposed a method that achieves both goals with low computational cost.
Method:
We proposed two methods. The first is Balanced FTRL (Follow the Regularized Leader), which achieves optimal sample efficiency and performance by using prior knowledge of the game structure. The second is Adaptive FTRL, which learns while estimating the game-structure knowledge that Balanced FTRL requires; it achieves near-optimal sample efficiency and performance without that prior knowledge.
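For intuition, the sketch below shows a generic FTRL update with an entropy regularizer, the classic exponential-weights strategy. It is not the balanced or adaptive variants from the paper, whose regularizers are tailored to the game tree; cum_losses is a hypothetical vector of cumulative losses observed for each action.

    import numpy as np

    def ftrl_entropy_step(cum_losses, eta=0.1):
        # Generic FTRL with a negative-entropy regularizer: play the mixed
        # strategy minimizing eta * (cumulative loss) - entropy, which has
        # the closed-form softmax solution below (stabilized by
        # subtracting the minimum loss before exponentiating).
        z = np.exp(-eta * (cum_losses - cum_losses.min()))
        return z / z.sum()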
We verified their performance empirically and confirmed that Adaptive FTRL performs almost as well as Balanced FTRL while being more practical, since it does not require prior knowledge of the game structure.
Future:
About OMRON SINIC X Corporation
OMRON SINIC X Corporation is a strategic subsidiary seeking to realize the "near future designs" that OMRON forecasts. OSX brings together researchers with cutting-edge knowledge and experience across many technological domains, including AI, robotics, IoT, and sensing. With the aim of solving social issues, they work to create near future designs by integrating innovative technologies with business models and with strategies in technology and intellectual property. The company also accelerates the creation of near future designs through joint research with universities and external research institutions. For more details, please visit https://www.omron.com/sinicx/en/
- For press inquiries related to this release, please contact the following:
Tech Communications and Collaboration Promo Dept.
Strategy Division
Technology and Intellectual Property H.Q.
OMRON Corporation
TEL: +81-774-74-2010