Data-Centric Foundation Models in Computational Healthcare: A Survey

Yunkun Zhang, Jin Gao, Zheling Tan, Lingfeng Zhou, Kexin Ding, Mu Zhou, Shaoting Zhang, Dequan Wang

Published Apr 30, 2026

Editorial review: 6.8

Relevance: 0.517

Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding data-centric approaches in foundation models can improve healthcare AI systems by addressing data quality and ethical challenges.

A survey of data-centric foundation models in healthcare, highlighting data challenges and opportunities.

Summary

The paper surveys data-centric approaches to foundation models in computational healthcare, covering the pipeline from model pre-training to inference. It discusses challenges and opportunities in data quality, AI security, assessment, and alignment with human values, and provides a list of healthcare-related foundation models and datasets.

Key contributions

Survey of data-centric approaches in foundation models for healthcare.
Discussion of AI security, assessment, and alignment with human values.
Compilation of healthcare-related foundation models and datasets.

Notable insights

Because foundation models are interactive and guided by both pre-training data and human instructions, data characterization, quality, and scale become central concerns.
Foundation models can potentially enhance patient outcomes and clinical workflows.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2401.02458v3

The advent of foundation models (FMs) as an emerging suite of AI techniques has struck a wave of opportunities in computational healthcare. The interactive nature of these models, guided by pre-training data and human instructions, has ignited a data-centric AI paradigm that emphasizes better data characterization, quality, and scale. In healthcare AI, obtaining and processing high-quality clinical data records has been a longstanding challenge, encompassing data quantity, annotation, patient privacy, and ethics. In this survey, we investigate a wide range of data-centric approaches in the FM era (from model pre-training to inference) towards improving the healthcare workflow. We discuss key perspectives in AI security, assessment, and alignment with human values. Finally, we offer a promising outlook on FM-based analytics to enhance patient outcomes and clinical workflows in the evolving landscape of healthcare and medicine. We provide an up-to-date list of healthcare-related foundation models and datasets at https://github.com/Yunkun-Zhang/Data-Centric-FM-Healthcare.