SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models
Chao Ding, Mouxiao Bian, Tianbin Li, Minjia Yuan, Yidong Jiang, Yankai Jiang, Jinru Ding, Jiayuan Chen, Zhuangzhi Gao, Pengcheng Chen, Zhao He, Rongzhao Zhang, Meiling Liu, Luyi Jiang, Jie Xu
Why It Matters
What makes this one worth your time
This work is relevant for AI researchers and engineers focused on deploying LLMs in healthcare, as it addresses critical safety and ethical concerns that are barriers to clinical adoption.
SafeMed-R1 enhances medical LLM safety and ethics through clinician-audited supervision.
Summary
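The paper introduces SafeMed-R1, a medical large language model trained with a traceable Clinical Trust Signals (CTS) pipeline that links each reasoning instance to clinician rubric scores and edit histories, and aligned through safety and ethics supervision and red team stress testing. It reaches a macro-averaged accuracy of 79.6% across clinical benchmarks, shows the lowest aggregated risk under adversarial safety testing, and reduces unsafe outputs by about 3 to 5% relative to its baseline. In a paired expert study of 30 medication safety vignettes, it matches PGY1 and PGY2 residents on medical correctness and scores higher on medication safety, guideline consistency, and clinical usefulness.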
Key contributions
- Development of the SafeMed-R1 model with a focus on safety and ethics alignment.
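- Introduction of a traceable Clinical Trust Signals (CTS) pipeline that links each reasoning instance to clinician rubric scores and edit histories.
- Reported results: 79.6% macro-averaged accuracy across clinical benchmarks, the lowest aggregated risk under adversarial safety testing, and a reduction in unsafe outputs of about 3 to 5% relative to the baseline.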
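To make the idea of linking a reasoning instance to clinician rubric scores and an edit history more concrete, here is a minimal sketch of what one such provenance record could look like. The abstract does not describe the actual CTS schema, so every class name, field name, and rubric criterion below is a hypothetical illustration, not the authors' design.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class ClinicianEdit:
    """One revision a clinician made to a reasoning trace (hypothetical schema)."""
    clinician_id: str
    before: str
    after: str


@dataclass
class ReasoningInstance:
    """A training example with its audit trail attached (hypothetical schema)."""
    instance_id: str
    reasoning: str
    # Maps a rubric criterion name to the clinician's score for it.
    rubric_scores: dict[str, int] = field(default_factory=dict)
    edit_history: list[ClinicianEdit] = field(default_factory=list)

    def apply_edit(self, clinician_id: str, new_reasoning: str) -> None:
        """Replace the reasoning text and record who changed what."""
        self.edit_history.append(
            ClinicianEdit(clinician_id, before=self.reasoning, after=new_reasoning)
        )
        self.reasoning = new_reasoning

    def is_audited(self) -> bool:
        """True once at least one clinician score is attached."""
        return bool(self.rubric_scores)

    def mean_rubric_score(self) -> float:
        """Average over rubric criteria; raises if no scores are recorded."""
        return mean(self.rubric_scores.values())


# Illustrative values only; none of this comes from the paper.
example = ReasoningInstance(
    instance_id="demo-001",
    reasoning="draft reasoning text",
    rubric_scores={"safety": 4, "correctness": 5},
)
example.apply_edit("clinician-01", "revised reasoning text")
```

Keeping the pre-edit text in each `ClinicianEdit` means the full revision chain can be replayed later, which is one plausible way to read "traceable" in this setting.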
Notable insights
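- Clinician-audited supervision provenance, combined with domain-tailored safety and ethics alignment, can strengthen governance-relevant evidence without relying on inference-time retrieval or citation grounding.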
- Integration of red team stress testing to evaluate and enhance model resilience against adversarial misuse.
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2605.28338v1
Large language models (LLMs) increasingly match expert performance on licensing examinations, yet routine clinical use remains limited because governance requires auditable reasoning, safety and ethics alignment, and resilience to adversarial misuse. Here we present SafeMed-R1, trained with a traceable Clinical Trust Signals (CTS) pipeline that links each reasoning instance to clinician rubric scores and edit histories, and aligned through safety and ethics supervision and red team stress testing. SafeMed-R1 attains a macro-averaged accuracy of 79.6% across clinical benchmarks. Under adversarial safety testing, it shows the lowest aggregated risk and reduces unsafe outputs by about 3 to 5% relative to its baseline. In a paired expert study of 30 medication safety vignettes, SafeMed-R1 matches PGY1 and PGY2 residents on medical correctness and scores higher for medication safety, guideline consistency, and clinical usefulness. Collectively, these results suggest that clinician-audited supervision provenance, together with domain-tailored safety and ethics alignment, can strengthen governance-relevant evidence without relying on inference-time retrieval or citation grounding.
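The abstract reports accuracy as a macro average. As a reminder of what that means, the sketch below computes a macro average (each benchmark counts equally) next to a micro average (each question counts equally). The benchmark names and counts are made up for illustration and are not results from the paper.

```python
def macro_accuracy(per_benchmark: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of per-benchmark accuracies.

    Each value is a (correct, total) pair for one benchmark.
    """
    accuracies = [correct / total for correct, total in per_benchmark.values()]
    return sum(accuracies) / len(accuracies)


def micro_accuracy(per_benchmark: dict[str, tuple[int, int]]) -> float:
    """Accuracy pooled over all questions, so larger benchmarks weigh more."""
    correct = sum(c for c, _ in per_benchmark.values())
    total = sum(t for _, t in per_benchmark.values())
    return correct / total


# Hypothetical counts: a small benchmark and a large one.
results = {"bench_a": (90, 100), "bench_b": (600, 1000)}
macro = macro_accuracy(results)
micro = micro_accuracy(results)
```

With these made-up counts the macro average is 0.75 while the micro average is about 0.627, because the larger benchmark dominates the pooled figure. A macro-averaged number therefore says nothing about how many questions each benchmark contributed.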