
OMRON SINIC X Corporation (HQ: Bunkyo-ku, Tokyo; President and CEO: Masaki Suwa, hereinafter “OSX”) will present the latest research findings at the 2025 IEEE/CVF International Conference on Computer Vision (hereinafter “ICCV 2025”).
ICCV 2025 is one of the premier biennial international conferences in the field of computer vision. The conference will be held from October 19 to October 23, 2025 (local time), in Honolulu, Hawai’i. This year, 2,701 out of 11,239 submissions were accepted, an acceptance rate of approximately 24%.
The research paper to be presented by OSX has been selected as a Highlight1) in recognition of its exceptional quality and potential impact.
1) In 2025, 263 of the 2,701 accepted papers (approximately 9.7%) were selected as Highlights.
The following provides an overview of the paper.
ICCV 2025 presentations
■ CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito (OSX), Donghyun Kim (Korea University), Kwanyong Park (University of Seoul), Atsushi Hashimoto (OSX), Yoshitaka Ushiku (OSX)
CaptionSmiths is a controllable image captioning framework that allows smooth adjustment of caption properties, such as length, descriptiveness, and word uniqueness, within a single model.
Unlike existing models, which lack explicit conditioning and struggle with smooth transitions between styles, CaptionSmiths quantifies these properties as continuous scalar values and interpolates between learned endpoint representations (e.g., very short ↔ very long). This enables fine-grained control over caption styles.
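The idea of turning a caption property into a scalar and blending two learned endpoints can be illustrated with a minimal sketch. It assumes that a property such as caption length is first mapped to a value in [0, 1] by min-max normalization and that the model holds one learned vector for each endpoint (for example, "very short" and "very long"). The function names, the normalization bounds, and the linear blend are illustrative assumptions, not the paper's actual implementation.

```python
from typing import List, Sequence


def normalize_property(value: float, lo: float, hi: float) -> float:
    """Map a raw caption property (e.g., word count) to a scalar in [0, 1].

    lo and hi are hypothetical dataset-level bounds; out-of-range values are clipped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (value - lo) / (hi - lo)
    return min(max(t, 0.0), 1.0)


def interpolate_condition(
    t: float, short_end: Sequence[float], long_end: Sequence[float]
) -> List[float]:
    """Linearly blend two endpoint vectors: t=0 gives short_end, t=1 gives long_end."""
    if len(short_end) != len(long_end):
        raise ValueError("endpoint vectors must have the same dimension")
    return [(1.0 - t) * a + t * b for a, b in zip(short_end, long_end)]


if __name__ == "__main__":
    # Hypothetical endpoint vectors standing in for learned representations.
    short_vec = [0.0, 1.0, 0.0]
    long_vec = [1.0, 0.0, 2.0]
    # A 25-word target with assumed bounds of 5 and 45 words maps to t = 0.5.
    t = normalize_property(25, lo=5, hi=45)
    print(interpolate_condition(t, short_vec, long_vec))  # → [0.5, 0.5, 1.0]
```

Because the blend is continuous in t, any intermediate target (not just the two extremes) yields a distinct conditioning vector, which is what makes fine-grained, smooth control possible in this simplified picture.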
Experiments show that CaptionSmiths not only improves lexical alignment but also reduces caption length control error by over 500% compared with strong baselines.
https://arxiv.org/abs/2507.01409
https://ksaito-ut.github.io/captionsmiths_web/
※Author information is current as of the date of writing or submission. Please be advised that the information may become outdated after that point.
For any inquiries about OSX, please contact us here.