TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics

Tobia Boschi, Andrea Loreti, Nicola C. Amorisco, Rodrigo H. Ordonez-Hurtado, Cécile Rousseau, George K. Holt, Eszter Székely, Alexander Whittle, Samuel Jackson, Adriano Agnello, Stanislas Pamela, Alessandra Pascale, Robert Akers, Juan Bernabe Moreno, Vassil Alexandrov, Mykhaylo Zayats

Published Jun 8, 2026. Featured #1 in the daily list, Jun 9, 2026.


Daily score: 73.6

Editorial review: 7.5

Relevance: 0.456

Freshness: 0.722

Why It Matters

What makes this one worth your time

The publicly released training code and pretrained weights give fusion researchers a reusable, extensible starting point for new plasma modeling tasks.

Fine-tuned TokaMind outperforms the strongest TokaMark baseline on all but one of the benchmark's 14 tasks, using a multi-modal transformer approach.

Summary

The paper introduces TokaMind, an open-source Multi-Modal Transformer for tokamak plasma dynamics, pretrained on time-series, 2D-profile, and video diagnostics from the public MAST dataset. When fine-tuned, it outperforms the strongest baseline on all but one of the 14 tasks in the TokaMark benchmark.

Key contributions

Introduction of TokaMind, which the authors describe as, to their knowledge, the first open-source foundation model for tokamak plasma dynamics.
Implementation of a Multi-Modal Transformer architecture that processes time-series, 2D profiles, and videos sampled at different rates.
Demonstration that the fine-tuned model outperforms the strongest TokaMark baseline on all but one of 14 reconstruction and forecasting tasks.

Notable insights

A lightweight fixed-basis Discrete Cosine Transform embedding (DCT3D) represents the multi-modal signals without a learned encoder, which may improve efficiency; the interface still allows alternative embeddings such as Variational Autoencoders.
The model's ability to handle missing signals robustly is particularly relevant for real-world applications where data may be incomplete.
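
The abstract does not say how TokaMind handles missing signals. As an illustration only, one common approach in transformer-style models is to mask absent inputs so they receive zero weight when signals are combined. The sketch below assumes that approach; the function name `masked_attention_pool` and its arguments are hypothetical and are not taken from the TokaMind code.

```python
import numpy as np

def masked_attention_pool(tokens, scores, present):
    """Softmax-weighted average of per-signal tokens that ignores missing signals.

    Hypothetical illustration, not TokaMind's actual mechanism.
    tokens:  (n_signals, d) array, one embedding per diagnostic signal
    scores:  (n_signals,) array of unnormalised attention scores
    present: (n_signals,) boolean array, False where a signal is missing
    """
    if not present.any():
        raise ValueError("at least one signal must be present")
    # Missing signals get a score of -inf, so their softmax weight is exactly 0.
    masked = np.where(present, scores, -np.inf)
    masked = masked - masked.max()  # shift for numerical stability
    weights = np.exp(masked)
    weights /= weights.sum()
    # Zero out missing rows so NaN placeholders cannot leak into the result.
    safe_tokens = np.where(present[:, None], tokens, 0.0)
    return weights @ safe_tokens, weights

tokens = np.array([[1.0, 2.0], [np.nan, np.nan], [3.0, 4.0]])
pooled, weights = masked_attention_pool(
    tokens, scores=np.zeros(3), present=np.array([True, False, True])
)
```

With equal scores, the pooled vector is simply the mean of the two signals that are present, and the missing row contributes nothing even though it holds NaN placeholders.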

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2602.15084v2

We present TokaMind, to our knowledge the first open-source foundation model for tokamak plasma dynamics, based on a Multi-Modal Transformer (MMT) and pretrained on heterogeneous diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model components. To represent multi-modal signals, we use a lightweight fixed-basis Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, which comprises 14 tasks with heterogeneous reconstruction and forecasting objectives. Our results show that fine-tuned TokaMind outperforms the strongest benchmark baseline on all but one task. Compared with training the same architecture from scratch under a matched epoch budget, warm-start adaptation is most beneficial on demanding downstream settings, including long-horizon forecasting and high-dimensional equilibrium objectives. These findings highlight the value of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights are publicly available at github.com/UKAEA-IBM-STFC-Fusion-FMs/tokamind and huggingface.co/UKAEA-IBM-STFC, respectively.
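
To make the idea of a fixed-basis DCT embedding concrete, here is a minimal sketch of how a (time, height, width) block could be projected onto its lowest-frequency 3D cosine coefficients. It is an assumption-laden illustration, not the paper's DCT3D implementation: the function names `dct_basis`, `dct3d_embed`, and `dct3d_reconstruct`, the orthonormal DCT-II convention, and the low-frequency truncation rule are all choices made here.

```python
import numpy as np

def dct_basis(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (n, n); row k is the k-th frequency."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis

def dct3d_embed(x: np.ndarray, keep: tuple) -> np.ndarray:
    """Project a (T, H, W) block onto the lowest keep=(kt, kh, kw) frequencies.

    Hypothetical helper: the basis is fixed, so nothing here is trained.
    """
    bt, bh, bw = (dct_basis(n)[:k] for n, k in zip(x.shape, keep))
    coeffs = np.einsum("at,bh,cw,thw->abc", bt, bh, bw, x)
    return coeffs.ravel()

def dct3d_reconstruct(z: np.ndarray, shape: tuple, keep: tuple) -> np.ndarray:
    """Invert dct3d_embed; exact when keep == shape, a smooth approximation otherwise."""
    bt, bh, bw = (dct_basis(n)[:k] for n, k in zip(shape, keep))
    return np.einsum("at,bh,cw,abc->thw", bt, bh, bw, z.reshape(keep))

rng = np.random.default_rng(0)
video = rng.normal(size=(4, 5, 6))             # toy (T, H, W) block
token = dct3d_embed(video, keep=(2, 2, 2))     # compact 8-dimensional embedding
full = dct3d_embed(video, keep=video.shape)    # lossless when nothing is truncated
restored = dct3d_reconstruct(full, video.shape, video.shape)
```

Under this sketch, a time-series or a 2D profile can be treated as a block with singleton axes, for example shape (T, 1, 1), so a single transform covers all three modalities named in the abstract.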