OMRON SINIC X Presents Latest Research Findings at NeurIPS 2025, Top-Tier Conference on AI and Machine Learning

OMRON SINIC X Corporation (HQ: Bunkyo-ku, Tokyo; President and CEO: Masaki Suwa; hereinafter “OSX”) will present its latest research findings at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).

NeurIPS 2025 is one of the largest and most influential conferences in the fields of machine learning and artificial intelligence (AI). In 2025, 5,290 of 21,575 submitted papers were accepted (an acceptance rate of approximately 24.52%), and the conference will be held in San Diego, USA, from December 2 to December 7 (local time).

The two research papers to be presented by OSX are listed below. Notably, the first paper was accepted as a Spotlight presentation.

NeurIPS 2025 presentations

■ 1) Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

Toshinori Kitamura (OSX), Arnob Ghosh (New Jersey Institute of Technology), Tadashi Kozuno (OSX), Wataru Kumagai (OSX), Kazumi Kasaura (OSX), Kenta Hoshino (Kyoto University), Yohei Hosoe (Kyoto University), Yutaka Matsuo (The University of Tokyo)

Constrained Markov Decision Processes (CMDPs) are a widely used formulation for reinforcement learning problems that must take safety into account. Most existing methods guarantee constraint satisfaction only on average over many episodes, so violations may still occur in individual episodes, which makes them unsafe and impractical for real-world applications. In this study, we propose a reinforcement learning algorithm that guarantees constraint satisfaction in every episode.

https://arxiv.org/abs/2502.10138
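
For context, the sketch below shows a generic episodic CMDP objective; the notation (reward r, cost c, horizon H, budget τ) is illustrative and not taken from the paper. The episode-wise safety studied in the paper strengthens the usual requirement: the cost constraint must hold in every episode, not merely on average across episodes.

```latex
% Generic episodic CMDP (illustrative notation, not the paper's):
% maximize expected return subject to a budget on expected cumulative cost.
\begin{equation*}
  \max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} r(s_h, a_h)\right]
  \quad \text{subject to} \quad
  \mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} c(s_h, a_h)\right] \le \tau .
\end{equation*}
% Episode-wise safety: the constraint above must be satisfied by the policy
% deployed in every episode k = 1, ..., K, not only on average over the K episodes.
```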

  

■ 2) Self Iterative Label Refinement via Robust Unlabeled Learning

Hikaru Asano (OSX), Tadashi Kozuno (OSX), Yukino Baba (The University of Tokyo)

In recent years, it has become increasingly common to use large language models (LLMs) to generate training data for their own learning. However, when the LLM lacks sufficient knowledge of a particular domain, the generated data may contain inaccuracies, and training on such data can degrade performance. To address this issue, we propose a novel approach that leverages robust learning from unlabeled data to mitigate the negative impact of inaccurate generated data.

https://arxiv.org/abs/2502.12565
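
As a rough illustration of the general idea of iteratively refining noisy self-generated labels (this is a minimal toy sketch, not the paper's algorithm; the robust unlabeled-learning component is replaced here by a simple confidence threshold, and all names and data are hypothetical):

```python
# Toy sketch of iterative pseudo-label refinement (illustration only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "unlabeled" data with noisy initial pseudo-labels standing in
# for labels generated by an LLM with imperfect domain knowledge.
X, y_true = make_classification(n_samples=1000, n_features=20, random_state=0)
noise = rng.random(len(y_true)) < 0.3          # 30% of pseudo-labels are wrong
pseudo = np.where(noise, 1 - y_true, y_true)

for it in range(5):
    # Fit a classifier on the current (noisy) pseudo-labels.
    clf = LogisticRegression(max_iter=1000).fit(X, pseudo)
    proba = clf.predict_proba(X).max(axis=1)
    confident = proba > 0.9                    # trust only confident predictions
    # Refine: overwrite pseudo-labels where the classifier is confident.
    pseudo[confident] = clf.predict(X)[confident]
    acc = (pseudo == y_true).mean()
    print(f"iteration {it}: pseudo-label accuracy = {acc:.3f}")
```

In the paper, the refinement signal comes from robust unlabeled learning rather than a simple confidence cutoff; the loop above only conveys the iterative self-refinement structure.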

  
※Author information is current as of the date of writing or submission and may have changed since then.
 


 
For any inquiries about OSX, please contact us here.
