Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo
Renjith Prasad, Chathurangi Shyalika, Anushka Pawar, Amit Sheth
Why It Matters
What makes this one worth your time
This research addresses the reliability of generative models when their outputs must respect structured, domain-specific knowledge, a requirement in safety-critical applications.
It proposes a framework that organizes knowledge infusion in multimodal generative models by the component of the generative process being modified.
Summary
The paper proposes a layered framework for knowledge infusion in multimodal iterative generative models, identifying four intervention layers: surface, trajectory, latent, and parametric. A controlled safety-alignment experiment implements three of the four layers cumulatively and reports a 70.97% reduction in knowledge-violating outputs compared to vanilla generation.
Key contributions
- Introduction of a layered framework for knowledge infusion in generative models.
- Empirical validation of the framework through a controlled safety-alignment experiment.
- Mapping of existing methods to the proposed intervention layers.
Notable insights
- The framework categorizes knowledge infusion by intervention layers rather than techniques, providing a structured approach to model improvement.
- The empirical results indicate that each additional layer addresses failure classes that earlier layers cannot reach, which supports the claim that the interventions are complementary.
Possible limitations
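- None stated explicitly in the abstract. Note that the experiment implements only three of the four layers (surface, trajectory, and latent), so parametric infusion is not evaluated there.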
Abstract
arXiv:2606.06356v1
Multimodal generative models produce fluent outputs but remain unreliable when generation must respect structured, domain-specific, or safety-critical knowledge. Existing methods incorporate knowledge through mechanisms such as prompt augmentation, guidance, latent editing, or fine-tuning, yet they are typically categorized by technique rather than by the component of the generative process they modify. We argue that knowledge infusion in iterative generative models is fundamentally an intervention-layer problem. Since the generative process unfolds as a trajectory of internal states, knowledge can act on four structurally distinct components of this process: the input/output boundary, the transition function, the intermediate state, and the model parameters. This maps to four intervention layers: surface, trajectory, latent, and parametric infusion. We instantiate the framework in diffusion models, map representative methods to all four layers, and derive design principles for multi-layer composition. In a controlled safety-alignment experiment using a multimodal knowledge graph with two diffusion backbones, we implement three of the four layers cumulatively: surface (input-side and output-side) and trajectory-latent (mid-generation). We show empirically that each additional layer addresses failure classes that prior layers cannot reach, reducing knowledge-violating outputs by 70.97% compared to vanilla generation and empirically confirming the framework's complementarity prediction.
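To make the four intervention points concrete, here is a minimal toy sketch of where each layer would attach in a generic iterative generation loop. It is not the authors' implementation and not a diffusion model: the "state" is a short list of numbers, and every name (`Hooks`, `transition`, `generate`, the `decay` parameter) is a hypothetical introduced only for illustration. The sketch assumes that surface infusion acts on the prompt and the final output, trajectory infusion alters each proposed update, latent infusion overwrites the intermediate state directly, and parametric infusion changes the parameters the transition function uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = List[float]  # toy stand-in for an internal generative state


@dataclass
class Hooks:
    """Optional knowledge hooks (hypothetical names), one per intervention point."""
    rewrite_input: Optional[Callable[[str], str]] = None          # surface, input side
    guide_step: Optional[Callable[[State, State], State]] = None  # trajectory
    edit_state: Optional[Callable[[State, int], State]] = None    # latent
    filter_output: Optional[Callable[[State], State]] = None      # surface, output side


def transition(state: State, params: Dict[str, float]) -> State:
    """Toy transition function: scale every component by params['decay']."""
    return [params["decay"] * x for x in state]


def generate(prompt: str, params: Dict[str, float], steps: int = 2,
             hooks: Optional[Hooks] = None) -> State:
    """Run a toy iterative generation, applying whichever hooks are set."""
    hooks = hooks or Hooks()

    # Surface (input side): knowledge changes what enters the process.
    if hooks.rewrite_input:
        prompt = hooks.rewrite_input(prompt)

    # Toy encoding: one number per word (its length).
    state = [float(len(word)) for word in prompt.split()]

    for t in range(steps):
        # Parametric: knowledge lives in `params`, which shape every step.
        proposed = transition(state, params)

        # Trajectory: knowledge adjusts the update the transition proposes.
        if hooks.guide_step:
            proposed = hooks.guide_step(state, proposed)
        state = proposed

        # Latent: knowledge edits the intermediate state itself.
        if hooks.edit_state:
            state = hooks.edit_state(state, t)

    # Surface (output side): knowledge filters the finished output.
    if hooks.filter_output:
        state = hooks.filter_output(state)
    return state


if __name__ == "__main__":
    base = {"decay": 0.5}
    print("vanilla:   ", generate("ab cd", base))
    print("surface:   ", generate("ab cd", base,
                                  hooks=Hooks(rewrite_input=lambda p: p + " safe")))
    print("trajectory:", generate("ab cd", base,
                                  hooks=Hooks(guide_step=lambda prev, new: [max(x, 1.0) for x in new])))
    print("latent:    ", generate("ab cd", base,
                                  hooks=Hooks(edit_state=lambda s, t: [0.0] * len(s) if t == 0 else s)))
    print("parametric:", generate("ab cd", {"decay": 1.0}))
```

Because each hook touches a different component of the loop, the hooks can be combined in a single `Hooks` instance, which loosely mirrors the cumulative, multi-layer setup the abstract describes.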