MedFusion: A Unified Multimodal Framework for Visual Question Answering and Explainable Medical Recommendation
- 1 Schools of Computer Science, Odisha University of Technology and Research, Bhubaneswar, Odisha, India
- 2 Faculty of Computing and Software Engineering, Arba Minch University, Arba Minch, Ethiopia
Abstract
In clinical decision-making, the ability to ask visual questions about medical images and receive accurate, personalized, and interpretable recommendations can significantly enhance practitioner support systems. This paper presents MedFusion, a unified multimodal framework that integrates Visual Question Answering (VQA), personalized medical recommendation, and explainability within a single architecture. The proposed model employs co-attention–based visual–textual fusion augmented with retrieval-enhanced reasoning to improve answer grounding, while personalized recommendations are generated using a shared multimodal representation supported by GAN-guided feature augmentation. To enhance transparency, the framework provides attention-based heatmaps and natural-language rationales for both answers and recommendations. Extensive experiments on VQA-RAD, EHRXQA, and Med-RecX demonstrate that MedFusion outperforms state-of-the-art medical VQA and recommendation baselines, achieving a 7.4% improvement in VQA accuracy, reducing RMSE to 0.91, and improving human-rated interpretability to 4.5/5. Ablation studies confirm the effectiveness of retrieval augmentation, GAN-guided enhancement, and joint multi-task learning. These results indicate that MedFusion offers a robust and explainable decision-support solution, advancing the deployment of trustworthy, user-adaptive AI systems in real-world healthcare environments.
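As an illustration of the co-attention fusion and attention heatmaps mentioned above, the following is a minimal NumPy sketch of one common parallel co-attention formulation. It is not the authors' implementation: the function names, the bilinear weight `W`, the feature dimensions, the 7x7 region grid, and the max-pooling over the affinity matrix are all assumptions chosen for illustration, and the features are random placeholders rather than outputs of real image or text encoders.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(V, Q, W):
    """Hypothetical parallel co-attention over image regions and question tokens.

    V: (n_regions, d) image region features
    Q: (n_tokens, d) question token features
    W: (d, d) bilinear weight (learned in practice; random in this sketch)
    """
    # Affinity between every image region and every question token.
    affinity = V @ W @ Q.T                            # (n_regions, n_tokens)

    # Each region is scored by its best-matching token, and vice versa.
    attn_v = softmax(affinity.max(axis=1), axis=0)    # (n_regions,)
    attn_q = softmax(affinity.max(axis=0), axis=0)    # (n_tokens,)

    # Attention-weighted summaries of each modality.
    v_ctx = attn_v @ V                                # (d,)
    q_ctx = attn_q @ Q                                # (d,)

    # Simple fusion by concatenation (an assumption, not the paper's method).
    fused = np.concatenate([v_ctx, q_ctx])            # (2 * d,)
    return fused, attn_v, attn_q


# Placeholder features: a 7x7 grid of image regions and a 12-token question.
rng = np.random.default_rng(0)
V = rng.normal(size=(49, 64))
Q = rng.normal(size=(12, 64))
W = rng.normal(size=(64, 64)) / 8.0

fused, attn_v, attn_q = co_attention(V, Q, W)

# Reshaping the region attention back to the grid gives a coarse heatmap.
heatmap = attn_v.reshape(7, 7)
```

In a sketch like this, `fused` would feed a downstream answer or recommendation head, and `heatmap` could be upsampled and overlaid on the input image as a visual explanation.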
DOI: https://doi.org/10.3844/jcssp.2026.1539.1551
Copyright: © 2026 Satyajit Mahapatra, Jibitesh Mishra, Kumar Janardan Patra, Sanjit Kumar Dash and Aliazar Deneke Deferisha. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Multimodal Learning
- VQA
- Medical Recommendation
- XAI
- Co-Attention
- Retrieval-Augmented Reasoning
- CGAN
- Healthcare Informatics