Adaptive Multimodal Fusion for Document VQA

Cross-Attention Fusion Predictor (CAFP) + REINFORCE Fine-tuning

Best docs (REINFORCE wins most): #6, #11, #13, #16, #28


Compares: Equal · Fixed · Text-Only · CAFP+REINFORCE


🎨 Word Selection Visualization

See exactly which OCR words each method keeps and which it discards. 🟢 Green = kept and fed to the VQA model · 🔴 Red = compressed out
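The color coding above can be illustrated with a minimal sketch. It assumes each method produces a per-word boolean keep mask over the OCR words; the names `render_selection` and `kept_words`, and the sample words, are hypothetical and not taken from the actual demo code.

```python
from typing import List, Sequence

KEPT, DROPPED = "🟢", "🔴"  # same legend as the visualization


def render_selection(words: Sequence[str], keep_mask: Sequence[bool]) -> str:
    """Prefix each OCR word with a marker showing whether it was kept."""
    if len(words) != len(keep_mask):
        raise ValueError("words and keep_mask must have the same length")
    return " ".join(
        f"{KEPT if keep else DROPPED}{word}"
        for word, keep in zip(words, keep_mask)
    )


def kept_words(words: Sequence[str], keep_mask: Sequence[bool]) -> List[str]:
    """Return only the words that would be passed on to the VQA model."""
    return [word for word, keep in zip(words, keep_mask) if keep]


if __name__ == "__main__":
    # Hypothetical OCR words and a hypothetical mask from one method.
    words = ["Invoice", "No.", "4821", "Total", "due"]
    mask = [True, False, True, True, False]
    print(render_selection(words, mask))
    print(kept_words(words, mask))
```

Running the same words through one mask per method (Equal, Fixed, Text-Only, CAFP+REINFORCE) would give a side-by-side view of what each one keeps.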