CapTune: Adapting Non-Speech Captions With Anchored Generative Models
Abstract
Non-speech captions are essential to the video experience of deaf and hard of hearing (DHH) viewers, yet conventional approaches often overlook the diversity of their preferences. We present CapTune, a system that enables customization of non-speech captions based on DHH viewers' needs while preserving creator intent. CapTune allows caption authors to define safe transformation spaces using concrete examples and empowers viewers to personalize captions across four dimensions: level of detail, expressiveness, sound representation method, and genre alignment. Evaluations with seven caption creators and twelve DHH participants showed that CapTune supported creators' creative control while enhancing viewers' emotional engagement with content. Our findings also reveal trade-offs between information richness and cognitive load, tensions between interpretive and descriptive representations of sound, and the context-dependent nature of caption preferences.
System Architecture
Given a video segment, its captions, and its audio track, CapTune's Creator Tool lets creators prepare video metadata and anchor captions that define the safe transformation boundary guiding the generative model's behavior, while the Context Extractor summarizes the segment's audio. The Viewer Client then lets viewers specify their viewing preferences. These preferences, together with the anchor captions and context information, are passed to a generative model that rewrites the caption for that time span while preserving the narrative intent. The rewritten caption is then displayed in sync with video playback.
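The data flow described above can be illustrated with a minimal Python sketch. It is not CapTune's actual implementation: the class names (`ViewerPreferences`, `CaptionSegment`), the function names (`build_prompt`, `rewrite_caption`), the prompt wording, and the preference values are all hypothetical, and the generative model is represented by a caller-supplied `generate` function rather than any specific API. The only details taken from the description are the inputs (viewer preferences across the four dimensions, anchor captions, audio context) and the output (a rewritten caption tied to the segment's time span).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class ViewerPreferences:
    """Hypothetical container for the four personalization dimensions."""
    level_of_detail: str
    expressiveness: str
    sound_representation: str
    genre_alignment: str


@dataclass
class CaptionSegment:
    """Hypothetical record for one non-speech caption and its creator inputs."""
    start: float            # segment start time, in seconds
    end: float              # segment end time, in seconds
    original: str           # the creator's original caption
    anchors: List[str]      # creator-approved example rewrites (anchor captions)
    audio_context: str      # summary of the segment's audio


def build_prompt(segment: CaptionSegment, prefs: ViewerPreferences) -> str:
    """Combine preferences, anchors, and context into one model prompt."""
    anchor_lines = "\n".join(f"- {a}" for a in segment.anchors)
    return (
        "Rewrite the non-speech caption for one video segment.\n"
        f"Original caption: {segment.original}\n"
        f"Audio context: {segment.audio_context}\n"
        "Anchor captions (stay within the range they define):\n"
        f"{anchor_lines}\n"
        "Viewer preferences:\n"
        f"- Level of detail: {prefs.level_of_detail}\n"
        f"- Expressiveness: {prefs.expressiveness}\n"
        f"- Sound representation: {prefs.sound_representation}\n"
        f"- Genre alignment: {prefs.genre_alignment}\n"
        "Preserve the narrative intent of the original caption."
    )


def rewrite_caption(
    segment: CaptionSegment,
    prefs: ViewerPreferences,
    generate: Callable[[str], str],
) -> Dict[str, Union[float, str]]:
    """Return a caption cue for the segment's time span.

    `generate` stands in for any text-generation backend. If it returns
    nothing usable, fall back to the creator's original caption.
    """
    text = generate(build_prompt(segment, prefs)).strip()
    if not text:
        text = segment.original
    return {"start": segment.start, "end": segment.end, "text": text}
```

Passing the model as a plain callable keeps the sketch independent of any particular provider, and the fallback to the original caption is one simple way to ensure a viewer never sees an empty cue. Reusing the segment's own start and end times is what keeps the rewritten caption aligned with playback.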
Video Presentation
BibTeX
@article{captune_2025,
title={CapTune: Adapting Non-Speech Captions With Anchored Generative Models},
author={Jeremy Zhengqi Huang and Caluã de Lacerda Pataca and Liang-Yuan Wu and Dhruv Jain},
journal={The 27th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2025)},
year={2025},
url={https://arxiv.org/abs/2508.19971}
}