fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding

Yuxiang Wei, Yanteng Zhang, Xi Xiao, Chengxuan Qian, Tianyang Wang, Vince D. Calhoun

Published May 16, 2026 | Featured #10 | In the daily list Apr 20, 2026

Daily score: 51.8

Editorial review: 6.8

Relevance: 0.491

Freshness: 0.056

Why It Matters

What makes this one worth your time

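For researchers working at the intersection of neuroscience and AI, this paper proposes a concrete way to link brain activity with language: fMRI signals are mapped into discrete tokens that a pretrained LLM can model alongside text. The authors frame this as a step toward connecting neural activity with semantic cognition and building cross-modal brain representations.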

fMRI-LM bridges brain imaging and language models for enhanced semantic understanding.

Summary

The paper introduces fMRI-LM, a foundational model that connects functional MRI (fMRI) data with language models through a three-stage framework: (1) learning a neural tokenizer that maps fMRI into discrete tokens in a language-consistent space, (2) adapting a pretrained LLM to jointly model fMRI tokens and text, and (3) multi-task, multi-paradigm instruction tuning for high-level semantic understanding. The authors report strong zero-shot and few-shot performance across benchmarks, along with efficient adaptation using LoRA.
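
The abstract does not say how the Stage 1 tokenizer is built, so the following is only an illustrative sketch of what "mapping fMRI into discrete tokens" could look like. It assumes a nearest-neighbor codebook lookup in the style of vector quantization; the function name `quantize`, the toy codebook, and the toy frames are all hypothetical and not taken from the paper.

```python
from math import dist

def quantize(frames, codebook):
    """Map each frame (a vector of region values) to the index of its nearest code.

    Assumption: a VQ-style lookup. The paper's actual tokenizer may differ.
    """
    tokens = []
    for frame in frames:
        distances = [dist(frame, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens

# Toy data (hypothetical): 3 time points, 2 brain regions, 3 codebook entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (-0.8, 0.7)]

tokens = quantize(frames, codebook)
print(tokens)  # one discrete token id per time point
```

Each time point becomes a single integer id, which is the kind of sequence a language model can consume next to ordinary text tokens. In a learned tokenizer the codebook would be trained rather than fixed by hand.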

Key contributions

Development of a neural tokenizer for fMRI data.
Adaptation of a pretrained language model to jointly model fMRI tokens and text.
Implementation of multi-task instruction tuning for semantic understanding.

Notable insights

The use of a neural tokenizer to map fMRI data into language-consistent tokens is a novel approach.
Constructing a descriptive corpus to translate imaging features into textual descriptors addresses the lack of natural fMRI-text pairs.
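
To make the descriptive-corpus idea concrete, here is a minimal sketch of turning one imaging-based feature into a structured textual descriptor. The abstract does not list which features or templates the authors use; the choice of Pearson correlation between two region time series, the 0.5 thresholds, the sentence template, and the names `ROI_A` and `ROI_B` are assumptions made purely for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / norm if norm else 0.0

def describe_pair(name_a, series_a, name_b, series_b):
    """Render a connectivity feature as a templated sentence (hypothetical template)."""
    r = pearson(series_a, series_b)
    if r > 0.5:
        label = "strongly positively coupled"
    elif r < -0.5:
        label = "strongly negatively coupled"
    else:
        label = "weakly coupled"
    return f"Activity in {name_a} and {name_b} is {label} (r = {r:.2f})."

# Toy time series (hypothetical values).
sentence = describe_pair("ROI_A", [1.0, 2.0, 3.0, 4.0], "ROI_B", [2.0, 4.0, 6.0, 8.0])
print(sentence)
```

Pairing sentences like this with the corresponding scan gives the model text supervision even when no human-written description of the scan exists.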

Possible limitations

Not stated in the abstract

Abstract

arXiv:2511.21760v4

Recent advances in multimodal large language models (LLMs) have enabled unified reasoning across images, audio, and video, but extending such capability to brain imaging remains largely unexplored. Bridging this gap is essential to link neural activity with semantic cognition and to develop cross-modal brain representations. To this end, we present fMRI-LM, a foundational model that bridges functional MRI (fMRI) and language through a three-stage framework. In Stage 1, we learn a neural tokenizer that maps fMRI into discrete tokens embedded in a language-consistent space. In Stage 2, a pretrained LLM is adapted to jointly model fMRI tokens and text, treating brain activity as a sequence that can be temporally predicted and linguistically described. To overcome the lack of natural fMRI-text pairs, we construct a large descriptive corpus that translates diverse imaging-based features into structured textual descriptors, capturing the low-level organization of fMRI signals. In Stage 3, we perform multi-task, multi-paradigm instruction tuning to endow fMRI-LM with high-level semantic understanding, supporting diverse downstream applications. Across various benchmarks, fMRI-LM achieves strong zero-shot and few-shot performance, and adapts efficiently with parameter-efficient tuning (LoRA), establishing a scalable pathway toward a language-aligned, universal model for structural and semantic understanding of fMRI.
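
The abstract mentions parameter-efficient tuning with LoRA without giving configuration details. As a reminder of what that technique does in general, the sketch below computes the effective weight W + (alpha / r) * B A for a frozen weight W and a low-rank adapter (B, A). The matrix sizes, the rank, and the value of `alpha` are toy assumptions, not settings from the paper, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A); W stays frozen, only A and B are trained."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

# Toy example (assumed sizes): 2x2 frozen weight with a rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight
b = [[1.0], [2.0]]            # 2 x r, trainable
a = [[0.5, 0.0]]              # r x 2, trainable

w_eff = lora_weight(w, a, b, alpha=2.0, rank=1)
print(w_eff)
```

Because only A and B are updated, the number of trainable parameters grows with the rank r rather than with the full size of W, which is why this style of tuning is described as parameter-efficient.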