Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness

Qiangqiang Wu, Grace McIlvain, Zhou Yu, Junhao Wen

Published May 11, 2026
Editorial review: 7.2
Relevance: 0.505
Freshness: 0.000

Why It Matters

What makes this one worth your time

This research addresses missing data in multimodal medical imaging. According to the abstract, real-world multimodal biomedical data are often missing not at random, which can reduce statistical power, limit generalizability, and introduce bias.

Pan-FM leverages Saliency-Guided Masking to improve robustness in multimodal medical imaging with missing data.

Summary

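The paper introduces Pan-FM, a pan-organ foundation model whose unified backbone handles missing organs during both training and inference. It is pre-trained with masking-based self-distillation on imaging from seven organs (brain, heart, adipose, liver, kidney, spleen, and pancreas). Because naive multimodal pre-training over-relies on dominant organs such as adipose and heart, the authors add Saliency-Guided Masking (SGM) to counter this shortcut. On the UK Biobank, Pan-FM outperforms single-organ and multi-organ baselines across 13 disease categories and 14 single disease entities, with improved robustness when organs are missing.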

Key contributions

  • Development of a pan-organ foundation model pre-trained on seven organs.
  • Introduction of Saliency-Guided Masking to address dominant-organ bias.
  • Demonstration of improved prediction performance under missing-organ scenarios.

Notable insights

  • Saliency-Guided Masking mitigates dominant-organ bias by using the model's attention distribution to adaptively mask dominant organs during pre-training.
  • The approach introduces negligible computational overhead, making it practical for integration into existing frameworks.
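
The idea behind Saliency-Guided Masking can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says that the model's attention distribution is used to adaptively mask dominant organs. The function names, the `base_rate` and `temperature` parameters, and the rule that turns attention mass into a masking probability are all assumptions made for illustration.

```python
import numpy as np

ORGANS = ["brain", "heart", "adipose", "liver", "kidney", "spleen", "pancreas"]


def saliency_guided_mask_probs(attention, present, base_rate=0.3, temperature=1.0):
    """Map per-organ attention mass to per-organ masking probabilities.

    Hypothetical rule (an assumption, not taken from the paper): organs that
    receive more attention are masked more often, and the average masking
    probability over present organs equals `base_rate` before clipping.
    """
    attention = np.asarray(attention, dtype=float)
    present = np.asarray(present, dtype=bool)
    if not present.any():
        raise ValueError("at least one organ must be present")

    # Missing organs get -inf logits, so they can never be selected for masking.
    logits = np.where(present, np.log(attention + 1e-12) / temperature, -np.inf)
    weights = np.exp(logits - logits[present].max())
    weights = weights / weights.sum()

    # Rescale so the mean over present organs is base_rate, then clip to [0, 1].
    return np.clip(weights * base_rate * present.sum(), 0.0, 1.0)


def sample_organ_mask(probs, rng):
    """Draw a boolean mask; True means the organ is hidden for this training step."""
    return rng.random(len(probs)) < probs


# Toy attention distribution in which adipose and heart dominate.
attention = [0.05, 0.30, 0.40, 0.05, 0.05, 0.10, 0.05]
present = [True] * len(ORGANS)
probs = saliency_guided_mask_probs(attention, present)
mask = sample_organ_mask(probs, np.random.default_rng(0))
```

In this toy setup the dominant organs (adipose, heart) receive the highest masking probabilities, which forces the remaining organs to carry the prediction more often. The only extra work per step is a softmax-style normalization over seven values, which is in line with the claim of negligible overhead.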

Possible limitations

  • Not stated in the abstract

Abstract

arXiv:2605.07055v1 (announce type: cross)

Foundation models (FMs) have shown great promise in medical imaging, but most FMs are trained on unimodal data within isolated domains, such as brain MRI alone. Human aging and disease arise through coordinated biological processes across organs, therefore motivating multimodal FMs that learn whole-body representations. A key challenge, however, is that real-world multimodal biomedical data are often missing not at random, which can reduce power, limit generalizability, and introduce bias. We propose Pan-FM, a pan-organ foundation model pre-trained on imaging from seven organs (Brain, Heart, Adipose, Liver, Kidney, Spleen, and Pancreas) under realistic missing-organ scenarios. Pan-FM uses a unified backbone that handles organ missingness during both training and inference, and is pre-trained with masking-based self-distillation. We find that naive multimodal pre-training leads to dominant-organ shortcut learning bias, with the model over-relying on dominant organs such as adipose and heart. To address this, we introduce Saliency-Guided Masking (SGM), which uses the model attention distribution to adaptively mask dominant organs during pre-training, thus encouraging more balanced cross-organ, whole-body learning. Notably, SGM introduces negligible computational overhead and can be seamlessly integrated into existing self-supervised learning frameworks to improve multi-organ representation learning. On the UK Biobank, Pan-FM achieves stronger prediction across 13 disease categories and 14 single disease entities than single-organ and multi-organ baselines, with improved robustness under missing-organ settings. Pan-FM serves as a scalable solution to realistic modality-missingness in multimodal learning in system neuroscience and as a step toward more generalizable whole-body FMs.
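
One common way for a single backbone to tolerate missing inputs is to exclude absent organs from attention so that they contribute nothing to the fused representation. The sketch below shows that generic technique with a masked softmax over per-organ tokens. The abstract does not describe Pan-FM's actual mechanism, so `masked_attention_pool` and its arguments are hypothetical and serve only as an illustration.

```python
import numpy as np


def masked_attention_pool(tokens, scores, present):
    """Pool per-organ embeddings while ignoring organs that are missing.

    tokens:  (n_organs, dim) array, one embedding per organ (hypothetical input).
    scores:  (n_organs,) unnormalized attention scores.
    present: (n_organs,) booleans, False where the organ scan is unavailable.
    Returns the pooled (dim,) vector and the attention weights that were used.
    """
    tokens = np.asarray(tokens, dtype=float)
    scores = np.asarray(scores, dtype=float)
    present = np.asarray(present, dtype=bool)
    if not present.any():
        raise ValueError("at least one organ must be present")

    # Missing organs get a score of -inf, so their softmax weight is exactly 0.
    masked = np.where(present, scores, -np.inf)
    weights = np.exp(masked - masked[present].max())
    weights = weights / weights.sum()

    # Zero out placeholder embeddings so NaNs in missing slots cannot leak through.
    safe_tokens = np.where(present[:, None], tokens, 0.0)
    return weights @ safe_tokens, weights


# Three organs with 3-dimensional embeddings; the second organ is missing.
tokens = np.eye(3)
pooled, weights = masked_attention_pool(tokens, scores=[0.0, 0.0, 0.0],
                                        present=[True, False, True])
```

Because the weights are renormalized over the present organs only, the same function applies unchanged whether one organ or all seven are available, at training or at inference time.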