3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography

Weicheng Zhu, Haoxu Huang, Huanze Tang, Rushabh Musthyala, Boyang Yu, Long Chen, Emilio Vega, Thomas O'Donnell, Seena Dehkharghani, Jennifer A. Frontera, Arjun V. Masurkar, Kara Melmed, Narges Razavian

Published Apr 22, 2026

Why It Matters

What makes this one worth your time

This research demonstrates the potential of self-supervised learning to improve medical imaging diagnostics, reducing the need for annotated data and potentially enhancing early disease detection in clinical settings.

A 3D foundation model for head CT scans enhances disease detection using self-supervised learning.

Summary

The paper introduces FM-CT, a 3D foundation model for head CT that is pre-trained with self-supervised learning, so pre-training requires no manual annotations. It is pre-trained on 361,663 non-contrast 3D head CT scans using two self-supervised objectives: discrimination with self-distillation and masked image modeling. Downstream classification performance is evaluated on internal and three external datasets covering both in-distribution and out-of-distribution data. The abstract reports significant improvements over models trained from scratch and over previous 3D CT foundation models when annotated data are scarce.

Key contributions

Introduction of a 3D foundation model for head CT scans.
Application of self-supervised learning techniques like self-distillation and masked image modeling.
Demonstration of improved performance on both in-distribution and out-of-distribution datasets.

Notable insights

Utilizing self-supervised learning to train on a large dataset without manual annotations.
Employing 3D modeling to capture comprehensive structural information from head CT scans.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2502.02779v3

Head computed tomography (CT) is a widely used imaging modality with many medical indications, particularly in assessing pathology of the brain, skull, and cerebrovascular system. It is commonly the first-line imaging in neurologic emergencies given its rapidity of image acquisition, safety, cost, and ubiquity. Deep learning models may facilitate detection of a wide range of diseases. However, the scarcity of high-quality labels and annotations, particularly among less common conditions, significantly hinders the development of powerful models. To address this challenge, we introduce FM-CT: a Foundation Model for Head CT for generalizable disease detection, trained using self-supervised learning. Our approach pre-trains a deep learning model on a large, diverse dataset of 361,663 non-contrast 3D head CT scans without the need for manual annotations, enabling the model to learn robust, generalizable features. To investigate the potential of self-supervised learning in head CT, we employ both discrimination with self-distillation and masked image modeling, and we construct our model in 3D rather than at the slice level (2D) to exploit the structure of head CT scans more comprehensively and efficiently. The model's downstream classification performance is evaluated using internal and three external datasets, encompassing both in-distribution (ID) and out-of-distribution (OOD) data. Our results demonstrate that the self-supervised foundation model significantly improves performance on downstream diagnostic tasks compared to models trained from scratch and previous 3D CT foundation models on scarce annotated datasets. This work highlights the effectiveness of self-supervised learning in medical imaging and sets a new benchmark for head CT image analysis in 3D, enabling broader use of artificial intelligence for head CT-based diagnosis.
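
The abstract names masked image modeling and self-distillation on 3D volumes but gives no implementation details. As a rough illustration only, the NumPy sketch below shows the generic building blocks such methods usually rely on: splitting a volume into 3D patches, randomly masking a fraction of them, and updating a teacher as an exponential moving average of a student. The function names, the patch size of 16, the 0.75 mask ratio, the 0.9 momentum, and the synthetic volume are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into non-overlapping cubic patches.

    Returns an array of shape (num_patches, patch**3), one flattened patch per row.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by the patch size")
    v = volume.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # bring the three patch-grid axes to the front
    return v.reshape(-1, patch ** 3)


def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask with round(mask_ratio * num_patches) entries set to True."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def ema_update(teacher, student, momentum):
    """Teacher update used in self-distillation: t <- m * t + (1 - m) * s."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


rng = np.random.default_rng(0)
# Synthetic stand-in for a head CT volume (hypothetical size, not real data).
volume = rng.normal(size=(32, 64, 64)).astype(np.float32)

patches = patchify_3d(volume, patch=16)            # 2 * 4 * 4 = 32 patches
mask = random_patch_mask(len(patches), 0.75, rng)  # hide 24 of the 32 patches
visible = patches[~mask]                           # what an encoder would see

# Toy parameter dictionaries standing in for network weights.
teacher = {"w": np.ones(2)}
student = {"w": np.zeros(2)}
teacher = ema_update(teacher, student, momentum=0.9)
```

In a masked image modeling setup, a network would receive only `visible` and be trained to predict targets for the masked patches, while in self-distillation the student would be trained to match the slowly updated teacher's outputs. Working on 3D patches rather than 2D slices lets each token carry context across adjacent slices, which is the motivation the abstract gives for building the model in 3D.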