KumoRFM-2: Scaling Foundation Models for Relational Learning
Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, Matthias Fey
Why It Matters
What makes this one worth your time
This paper introduces a foundation model that operates directly on relational data, the format behind many real-world applications. It reports gains of up to 8% over supervised and foundation-model baselines across 41 benchmarks and scales to billion-scale datasets.
Summary
KumoRFM-2 is a pre-trained foundation model for relational data that supports both in-context learning and fine-tuning. It processes one or more connected tables natively, without manual table flattening or target variable generation, while preserving temporal consistency, and it scales to billion-scale datasets. On 41 benchmarks it outperforms supervised and foundation-model baselines by up to 8%, and it remains strong under cold start and noisy data.
Key contributions
- Development of KumoRFM-2, a scalable foundation model for relational data that outperforms supervised approaches on benchmarks.
Notable insights
- Unlike its predecessor, KumoRFM-2 injects task information as early as possible, which sharpens its selection of task-relevant columns and improves robustness to noisy data.
Possible limitations
- The paper does not address the computational resources required for training and deploying KumoRFM-2 at scale.
Abstract
arXiv:2604.12596v1
We introduce KumoRFM-2, the next iteration of a pre-trained foundation model for relational data. KumoRFM-2 supports in-context learning as well as fine-tuning and is applicable to a wide range of predictive tasks. In contrast to tabular foundation models, KumoRFM-2 natively operates on relational data, processing one or more connected tables simultaneously without manual table flattening or target variable generation, all while preserving temporal consistency. KumoRFM-2 leverages a large corpus of synthetic and real-world data to pre-train across four axes: the row and column dimensions at the individual table level, and the foreign key and cross-sample dimensions at the database level. In contrast to its predecessor, KumoRFM-2 injects task information as early as possible, enabling sharper selection of task-relevant columns and improved robustness to noisy data. Through extensive experiments on 41 challenging benchmarks and analysis around expressivity and sensitivity, we demonstrate that KumoRFM-2 outperforms supervised and foundational approaches by up to 8%, while maintaining strong performance under extreme settings of cold start and noisy data. To our knowledge, this is the first time a few-shot foundation model has been shown to surpass supervised approaches on common benchmark tasks, with performance further improving upon fine-tuning. Finally, while KumoRFM-1 was limited to small-scale in-memory datasets, KumoRFM-2 scales to billion-scale relational datasets.
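To make "manual table flattening" and "temporal consistency" concrete, here is a minimal sketch of the preprocessing a single-table model typically needs and that the abstract says KumoRFM-2 avoids. It is not KumoRFM-2's API or method. The tables, column names, and the `flatten_user` helper are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical toy database: a `users` table and an `orders` table linked by
# the foreign key `user_id`. None of these names come from the paper.
users = [
    {"user_id": 1, "signup": date(2023, 1, 5)},
    {"user_id": 2, "signup": date(2023, 3, 9)},
]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 20.0, "ts": date(2023, 2, 1)},
    {"order_id": 11, "user_id": 1, "amount": 35.0, "ts": date(2023, 6, 1)},
    {"order_id": 12, "user_id": 2, "amount": 50.0, "ts": date(2023, 4, 15)},
]


def flatten_user(user, orders, cutoff):
    """Aggregate one user's orders into a single feature row.

    Only orders strictly before `cutoff` are used, so no information from
    after the prediction time leaks into the features.
    """
    past = [
        o for o in orders
        if o["user_id"] == user["user_id"] and o["ts"] < cutoff
    ]
    return {
        "user_id": user["user_id"],
        "n_orders": len(past),
        "total_amount": sum(o["amount"] for o in past),
    }


# Prediction time (assumed): features may only use rows before this date.
cutoff = date(2023, 5, 1)
features = [flatten_user(u, orders, cutoff) for u in users]
```

Every aggregate here (`n_orders`, `total_amount`) is a hand-picked design choice, and each new table or task needs its own version of this step. A model that reads the linked tables directly removes that step while still having to respect the same time cutoff.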