Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Models

Renjith Prasad, Chathurangi Shyalika, Anushka Pawar, Amit Sheth

Published Jun 6, 2026
Featured #9 in the daily list, Jun 7, 2026


Daily score: 67.6
Editorial review: 7.5
Relevance: 0.450
Freshness: 0.722

Why It Matters

What makes this one worth your time

This research tackles the reliability of generative models when their outputs must respect structured, domain-specific knowledge, a requirement for deploying them in safety-critical applications.

A novel framework for structured knowledge infusion in multimodal generative models.

Summary

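The paper proposes a layered framework for knowledge infusion in multimodal iterative generative models, identifying four intervention layers (surface, trajectory, latent, and parametric) defined by which component of the generative process the knowledge acts on. In a controlled safety-alignment experiment with two diffusion backbones, cumulatively applying three of the four layers reduced knowledge-violating outputs by 70.97% relative to vanilla generation.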

Key contributions

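Introduction of a layered framework that classifies knowledge infusion by where it acts in the generative process: surface, trajectory, latent, or parametric.
Empirical validation through a controlled safety-alignment experiment that cumulatively implements three of the four layers on two diffusion backbones.
Mapping of representative existing methods to the four intervention layers, with design principles for composing multiple layers.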

Notable insights

The framework categorizes knowledge infusion by the component of the generative process it modifies (its intervention layer) rather than by technique, giving a structured way to decide where knowledge should enter.
The empirical results indicate that each additional layer addresses failure classes that earlier layers cannot reach, supporting the framework's prediction that the layers are complementary.
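The complementarity argument can be made concrete with a toy calculation. This is a hypothetical sketch, not the paper's data or method: the failure-class labels, the layer-to-class coverage table, and the batch are all invented for illustration. It assumes the reported reduction is a relative reduction in the fraction of knowledge-violating outputs versus vanilla generation, and it models each layer as fixing only the failure classes it can reach, so stacking layers lowers the violation rate only when a new layer covers classes the earlier ones miss.

```python
from typing import Dict, List, Optional, Sequence, Set

# Toy batch of generations, each tagged with a (hypothetical) failure class;
# None means the output already respects the knowledge constraints.
BATCH: List[Optional[str]] = (
    ["prompt_concept"] * 2
    + ["mid_generation_drift"] * 2
    + ["output_artifact"]
    + ["uncovered"]  # hypothetical class that none of the implemented layers targets
    + [None] * 4
)

# Failure classes each hypothetical layer can fix (illustrative, not from the paper).
LAYER_COVERAGE: Dict[str, Set[str]] = {
    "surface_input": {"prompt_concept"},
    "surface_output": {"output_artifact"},
    "trajectory_latent": {"mid_generation_drift"},
}


def violation_rate(batch: Sequence[Optional[str]]) -> float:
    """Fraction of samples that still violate a knowledge constraint."""
    return sum(f is not None for f in batch) / len(batch)


def apply_layers(batch: Sequence[Optional[str]], layers: Sequence[str]) -> List[Optional[str]]:
    """Apply layers cumulatively: a sample is fixed if any active layer covers its class."""
    covered: Set[str] = set().union(*(LAYER_COVERAGE[name] for name in layers))
    return [None if f in covered else f for f in batch]


def relative_reduction(baseline_rate: float, rate: float) -> float:
    """Relative drop in violation rate versus vanilla (no-layer) generation."""
    return (baseline_rate - rate) / baseline_rate


if __name__ == "__main__":
    order = ["surface_input", "surface_output", "trajectory_latent"]
    base = violation_rate(BATCH)
    for k in range(1, len(order) + 1):
        rate = violation_rate(apply_layers(BATCH, order[:k]))
        label = " + ".join(order[:k])
        print(f"{label}: rate={rate:.2f}, reduction={relative_reduction(base, rate):.1%}")
```

In this invented batch, the single "uncovered" sample stands in for failures that none of the three implemented layers reaches; it is not a claim about what parametric infusion would fix.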

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2606.06356v1

Multimodal generative models produce fluent outputs but remain unreliable when generation must respect structured, domain-specific, or safety-critical knowledge. Existing methods incorporate knowledge through mechanisms such as prompt augmentation, guidance, latent editing, or fine-tuning, yet they are typically categorized by technique rather than by the component of the generative process they modify. We argue that knowledge infusion in iterative generative models is fundamentally an intervention-layer problem. Since the generative process unfolds as a trajectory of internal states, knowledge can act on four structurally distinct components of this process: the input/output boundary, the transition function, the intermediate state, and the model parameters. This maps to four intervention layers: surface, trajectory, latent, and parametric infusion. We instantiate the framework in diffusion models, map representative methods to all four layers, and derive design principles for multi-layer composition. In a controlled safety-alignment experiment using a multimodal knowledge graph with two diffusion backbones, we implement three of the four layers cumulatively: surface (input-side and output-side) and trajectory-latent (mid-generation). We show empirically that each additional layer addresses failure classes that prior layers cannot reach, reducing knowledge-violating outputs by 70.97% compared to vanilla generation and empirically confirming the framework's complementarity prediction.
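To make the four intervention points concrete, here is a minimal sketch of a generic iterative generator with one hook per layer. Everything in it is hypothetical: `ToyIterativeGenerator`, `InfusionHooks`, `generate`, and the toy constraint are invented names, the "model" is a one-parameter update rule rather than a diffusion denoiser, and none of it is the paper's implementation. Surface infusion rewrites the conditioning and checks the output, trajectory infusion adjusts each transition, latent infusion edits the intermediate state, and parametric infusion corresponds to changing the model's `weight` before sampling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class ToyIterativeGenerator:
    """Hypothetical stand-in for an iterative generator: x_next = step(x, cond)."""

    weight: float  # the "model parameters"; parametric infusion would change this

    def step(self, x: Vector, cond: Vector) -> Vector:
        # Transition function: move each coordinate a fraction of the way toward cond.
        return [xi + self.weight * (ci - xi) for xi, ci in zip(x, cond)]


@dataclass
class InfusionHooks:
    """One hypothetical hook per intervention layer; every default is a no-op."""

    # Surface layer, input side: rewrite the conditioning before generation.
    augment_input: Callable[[Vector], Vector] = lambda c: c
    # Trajectory layer: adjust each transition given the previous and proposed state.
    adjust_step: Callable[[Vector, Vector], Vector] = lambda prev, proposed: proposed
    # Latent layer: edit the intermediate state directly.
    edit_state: Callable[[Vector], Vector] = lambda x: x
    # Surface layer, output side: accept (True) or reject (False) the final output.
    check_output: Callable[[Vector], bool] = lambda x: True


def generate(
    model: ToyIterativeGenerator,
    x0: Vector,
    cond: Vector,
    steps: int,
    hooks: InfusionHooks,
) -> Optional[Vector]:
    """Run the iterative loop with the given hooks; return None if the output check fails."""
    cond = hooks.augment_input(cond)
    x = list(x0)
    for _ in range(steps):
        proposed = model.step(x, cond)
        x = hooks.edit_state(hooks.adjust_step(x, proposed))
    return x if hooks.check_output(x) else None


if __name__ == "__main__":
    model = ToyIterativeGenerator(weight=0.5)

    def rule(x: Vector) -> bool:
        # Toy knowledge constraint: the first coordinate must not exceed 1.0.
        return x[0] <= 1.0

    def cap(v: Vector) -> Vector:
        # Clip the first coordinate so it satisfies the toy constraint.
        return [min(v[0], 1.0)] + v[1:]

    def hold_if_violating(prev: Vector, proposed: Vector) -> Vector:
        # Keep the previous first coordinate when a step would break the constraint.
        return proposed if proposed[0] <= 1.0 else [prev[0]] + proposed[1:]

    x0, cond = [0.0, 0.0], [2.0, 1.0]
    print("vanilla:", generate(model, x0, cond, 10, InfusionHooks()))
    print("output check only:", generate(model, x0, cond, 10, InfusionHooks(check_output=rule)))
    print("input rewrite:", generate(model, x0, cond, 10, InfusionHooks(augment_input=cap, check_output=rule)))
    print("trajectory adjust:", generate(model, x0, cond, 10, InfusionHooks(adjust_step=hold_if_violating, check_output=rule)))
    print("latent edit:", generate(model, x0, cond, 10, InfusionHooks(edit_state=cap, check_output=rule)))
```

With the toy rule, an output-side check alone can only reject a violating sample, whereas an input-side rewrite, a per-step trajectory adjustment, or a mid-generation state edit steers generation so the final output satisfies the rule. Because every hook defaults to the identity, layers can be switched on one at a time, loosely mirroring the cumulative setup described in the abstract.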