Adaptive Multimodal Fusion for Document VQA

Best documents (REINFORCE wins most): #6, #11, #13, #16, #28

Document Index (0–99)


Custom Question (optional)

Compares: Equal · Fixed · Text-Only · CAFP+REINFORCE

Document Image

Fusion Weights Comparison

See exactly which OCR words each method keeps vs discards. 🟢 Green = kept and fed to the VQA model · 🔴 Red = compressed out

📌 Fixed Weights (α=0.5, β=0.3, γ=0.2)

🤖 CAFP+REINFORCE (Adaptive Weights)
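
As a rough illustration of the keep/discard behavior compared in these panels, the sketch below fuses three per-word scores with the fixed weights α=0.5, β=0.3, γ=0.2 and keeps the highest-scoring OCR words. The page does not say what each weight applies to, so the three score channels, the `fuse_and_select` name, and the `keep_ratio` parameter are all assumptions made for illustration. An adaptive method would presumably supply different weights per document rather than the fixed tuple used here.

```python
from typing import List, Sequence, Tuple


def fuse_and_select(
    words: Sequence[str],
    scores: Sequence[Tuple[float, float, float]],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # fixed α, β, γ from the panel title
    keep_ratio: float = 0.5,  # hypothetical: fraction of OCR words to keep
) -> Tuple[List[str], List[str]]:
    """Split OCR words into (kept, dropped) by a weighted sum of three scores.

    Each entry of `scores` holds three relevance signals for one word.
    What the signals represent is an assumption; the source only gives the weights.
    """
    alpha, beta, gamma = weights
    fused = [alpha * a + beta * b + gamma * c for a, b, c in scores]

    # Keep at least one word when there is anything to keep.
    n_keep = max(1, round(keep_ratio * len(words)))
    ranked = sorted(range(len(words)), key=lambda i: fused[i], reverse=True)
    top = set(ranked[:n_keep])

    # Preserve the original reading order in both output lists.
    kept = [w for i, w in enumerate(words) if i in top]
    dropped = [w for i, w in enumerate(words) if i not in top]
    return kept, dropped


if __name__ == "__main__":
    words = ["Invoice", "Total", "$42", "page"]
    scores = [(0.9, 0.1, 0.2), (0.8, 0.9, 0.5), (0.7, 0.8, 0.9), (0.1, 0.2, 0.1)]
    kept, dropped = fuse_and_select(words, scores)
    print("kept:", kept)
    print("dropped:", dropped)
```

In terms of the color legend above, `kept` corresponds to the green words fed to the VQA model and `dropped` to the red words that are compressed out.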