TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models
Ziwen Kan, Yishuo Chen, Kecheng Li, Andrew Wen, Xiaomeng Wang, Liwei Wang, Jihao Duan, Song Wang, Hongfang Liu, Tianlong Chen
Why It Matters
What makes this one worth your time
This work is relevant to AI researchers and engineers who work with real-world multimodal time series: it offers a way to keep models robust and accurate when modalities are partially missing or irregularly sampled.
TRACE enhances multimodal time series models by effectively handling missing data and irregular sampling.
Summary
The paper introduces TRACE, a conditional estimation framework designed to handle multimodal time series data with missing modalities and irregular sampling. It aims to improve the robustness of time series foundation models by inferring incomplete target modalities from available auxiliary modalities. The approach is evaluated on healthcare and affective computing benchmarks, showing superior performance over existing multimodal fusion methods.
Key contributions
- Introduction of TRACE, a conditional estimation paradigm for multimodal time series.
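- Demonstration of improved performance on the MIMIC-IV clinical dataset and the CMU-MOSI and CMU-MOSEI sentiment analysis benchmarks.
- Accounting for cross-modal dependencies under modality missingness, which naive imputation and masking strategies fail to capture.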
Notable insights
- Conditional estimation infers missing target modalities from the auxiliary modalities that are available, so the estimates can reflect cross-modal dependencies instead of relying on naive imputation or masking.
- TRACE's ability to handle irregular sampling and modality missingness could significantly improve the reliability of multimodal models in practical applications.
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2606.06285v1
Time series foundation models (TS-FMs) aim to learn generalizable temporal representations that can be adapted to a wide range of downstream tasks. In real-world multimodal settings, time series are frequently affected by temporal misalignment and partial modality missingness, where different modalities are observed at heterogeneous time scales or are partially absent. Existing approaches typically rely on naive imputation or masking strategies, which fail to account for cross-modal dependencies and often lead to misaligned or degraded representations. We propose TRACE, a conditional estimation paradigm for multimodal time series foundation model pipelines under missingness and irregular sampling, allowing incomplete target modalities to be systematically inferred from available auxiliary modalities. We evaluate TRACE on diverse multimodal benchmarks spanning healthcare and affective computing, including the MIMIC-IV clinical dataset and the CMU-MOSI and CMU-MOSEI benchmarks for multimodal sentiment analysis. Across a range of downstream prediction tasks and missing-modality settings, TRACE consistently outperforms prior multimodal fusion approaches, demonstrating improved robustness to severe modality missingness and more reliable cross-modal representations.
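The contrast the abstract draws between naive imputation and conditional estimation can be illustrated with a toy example. The sketch below is not TRACE itself, whose architecture the abstract does not describe. It assumes a synthetic two-modality series with a linear dependency and swaps in ordinary least squares as a stand-in conditional estimator. All names (`fit_line`, `mse`, `aux`, `target`) are hypothetical and chosen only for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of ys against xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def mse(pred, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


rng = random.Random(0)

# Assumption: a synthetic two-modality series in which the target modality
# depends linearly on the auxiliary modality, plus a little noise.
aux = [rng.gauss(0.0, 1.0) for _ in range(200)]
target = [2.0 * a + 1.0 + rng.gauss(0.0, 0.1) for a in aux]

# Hide 40% of the target time steps to simulate partial modality missingness.
missing = set(rng.sample(range(200), 80))
observed = [i for i in range(200) if i not in missing]
hidden = sorted(missing)

# Naive baseline: fill every gap with the mean of the observed target values.
obs_mean = sum(target[i] for i in observed) / len(observed)
mean_fill = [obs_mean for _ in hidden]

# Conditional estimate: fit target-given-auxiliary on the observed steps,
# then predict the hidden target values from the auxiliary modality.
slope, intercept = fit_line(
    [aux[i] for i in observed],
    [target[i] for i in observed],
)
cond_fill = [slope * aux[i] + intercept for i in hidden]

truth = [target[i] for i in hidden]
mse_mean = mse(mean_fill, truth)
mse_cond = mse(cond_fill, truth)
print(f"mean imputation MSE:      {mse_mean:.3f}")
print(f"conditional estimate MSE: {mse_cond:.3f}")
```

On data like this, where the target depends strongly on the auxiliary signal, the conditional fill tracks the hidden values much more closely than the mean fill. With weakly coupled modalities the gap would shrink, and real multimodal data would call for a far richer estimator than a straight line.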