Adaptive Multimodal Fusion for Document VQA
Cross-Attention Fusion Predictor (CAFP) + REINFORCE Fine-tuning
Best docs (REINFORCE wins most): #6, #11, #13, #16, #28
Compares: Equal · Fixed · Text-Only · CAFP+REINFORCE
🎨 Word Selection Visualization
See exactly which OCR words each method keeps and which it discards. 🟢 Green = kept and fed to the VQA model · 🔴 Red = compressed out (dropped before the VQA model sees the input)
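A minimal sketch of how such a kept/discarded view could be rendered, assuming each method returns one boolean per OCR word. The function name `render_word_selection`, the color constants, and the mask format are hypothetical and used only for illustration; they are not taken from the demo's actual code.

```python
from html import escape

# Hypothetical colors for the two states shown in the legend.
KEPT_COLOR = "#2e7d32"     # green: word is kept and passed to the VQA model
DROPPED_COLOR = "#c62828"  # red: word is compressed out


def render_word_selection(words, keep_mask):
    """Return an HTML string with one colored <span> per OCR word.

    words     -- list of OCR word strings, in reading order
    keep_mask -- list of booleans, True where the method keeps the word
    """
    if len(words) != len(keep_mask):
        raise ValueError("words and keep_mask must have the same length")

    spans = []
    for word, kept in zip(words, keep_mask):
        color = KEPT_COLOR if kept else DROPPED_COLOR
        # Escape the word so OCR output containing '<' or '&' cannot break the markup.
        spans.append(f'<span style="color:{color}">{escape(word)}</span>')
    return " ".join(spans)


if __name__ == "__main__":
    # Made-up example input, not real OCR output.
    print(render_word_selection(["Invoice", "No.", "4471"], [True, False, True]))
```

Calling the function once per method (Equal, Fixed, Text-Only, CAFP+REINFORCE) with that method's mask would give one colored row per method for side-by-side comparison.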