[EMNLP 2022 Findings] Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
We first propose extracting relevant information from videos and converting it into reasoning paths in a form that pre-trained language models (PLMs) can take as input. We additionally propose a multi-agent reinforcement learning method that collaboratively performs reasoning over the different modalities (i.e., video and dialogue context).
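
To make the first idea more concrete, here is a minimal sketch of how a reasoning path over a semantic graph might be flattened into text for a PLM. It is an illustration under stated assumptions, not the paper's implementation: the `(head, relation, tail)` triple format, the function names `linearize_path` and `build_plm_input`, the `video:` / `dialogue:` / `question:` prefixes, and the example triples are all hypothetical. The multi-agent reinforcement learning step that selects the paths is not shown.

```python
from typing import List, Tuple

# Hypothetical representation: one graph edge as (head, relation, tail).
Triple = Tuple[str, str, str]


def linearize_path(path: List[Triple]) -> str:
    """Flatten a chain of triples into a single space-separated string.

    Assumes each triple's head equals the previous triple's tail, so the
    head is written only once, at the start. This is not validated here.
    """
    if not path:
        return ""
    tokens = [path[0][0]]
    for _head, relation, tail in path:
        tokens.extend([relation, tail])
    return " ".join(tokens)


def build_plm_input(video_path: List[Triple],
                    dialogue_path: List[Triple],
                    question: str) -> str:
    """Join one path per modality with the question into one text input.

    The segment prefixes and the " | " separator are illustrative choices.
    """
    segments = [
        "video: " + linearize_path(video_path),
        "dialogue: " + linearize_path(dialogue_path),
        "question: " + question,
    ]
    return " | ".join(segments)


# Made-up example paths, one per modality.
video_path = [("man", "holds", "cup"), ("cup", "on", "table")]
dialogue_path = [("he", "refers_to", "man")]
prompt = build_plm_input(video_path, dialogue_path, "What is he holding?")
print(prompt)
```

The resulting string could then be tokenized and passed to a text-only PLM, which is the sense in which a path becomes "acceptable" to the model in this sketch.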