Weike Zhao

赵唯珂 · PhD Candidate

I'm a PhD candidate at Shanghai Jiao Tong University (SJTU), advised by Prof. Weidi Xie and Prof. Ya Zhang.

My research focuses on Artificial Intelligence for Medicine (AI4Med), with a primary interest in developing AI diagnostic systems. I explore the use of large language models and agentic frameworks to create more reliable and interpretable clinical tools, with broader applications in multimodal and multi-omics analysis across medical domains.

zwk0629[at]sjtu.edu.cn · Google Scholar · GitHub

News

Recent Updates

2026.06 One paper accepted at ECCV 2026.
2026.06 Invited talk at the Google Genomics Deep Dives series.
2026.05 Attended SAIL 2026 in Puerto Rico for a poster presentation.
2026.03 Invited talk at the Harvard Medical AI (HMAI) Speaker Series.
2026.02 One paper accepted at Nature! 🎉

Research

Selected Publications

* equal contribution · † corresponding author

Nature 2026

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Weike Zhao*, Chaoyi Wu*, Yanjie Fan*, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang†, Yongguo Yu†, Kun Sun†, Weidi Xie†

We introduce DeepRare, the first agentic system for rare disease diagnosis powered by a large language model (LLM), capable of processing heterogeneous clinical inputs.

In Submission 2026

End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang†, Weidi Xie†

Deep-DxSearch is an end-to-end agentic RAG system trained with reinforcement learning for traceable diagnostic reasoning. Built on a large-scale medical retrieval corpus with tailored rewards, it consistently outperforms prompt-engineering and training-free RAG approaches — achieving substantial gains over GPT-4o, DeepSeek-R1, and other medical frameworks on both common and rare disease diagnosis.

ECCV 2026

PhenoLIP: Integrating Phenotype Ontology Knowledge into Medical Vision-Language Pretraining

Cheng Liang, Chaoyi Wu, Weike Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie†

PhenoLIP is a medical vision-language model that integrates structured phenotype knowledge to improve medical image analysis, leveraging PhenoKG — a new large-scale knowledge graph of 520K+ image–text pairs linked to 3,000+ phenotypes. On the PhenoBench benchmark, PhenoLIP significantly outperforms existing models.

Nature Communications 2025

Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases

Pengcheng Qiu*, Chaoyi Wu*, Shuyu Liu, Weike Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie†

We quantitatively evaluate the free-text reasoning abilities of state-of-the-art LLMs, such as DeepSeek-R1 and OpenAI o3-mini, on assessment recommendation, diagnostic decision, and treatment planning.

EMNLP Main 2024

RaTEScore: A Metric for Radiology Report Generation

Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang†, Weidi Xie†

RaTEScore is an entity-aware metric for assessing AI-generated medical reports. It emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, is robust to complex medical synonyms, and is sensitive to negation — aligning more closely with human preference than existing metrics.

Nature Communications 2024

Large-scale Long-tailed Disease Diagnosis on Radiology Images

Qiaoyu Zheng*, Weike Zhao*, Chaoyi Wu*, Xiaoman Zhang, Ya Zhang, Yanfeng Wang†, Weidi Xie†

We build an academically accessible, large-scale diagnostic dataset covering 5,568 disorders linked to 930 unique ICD-10-CM codes (39,026 cases and 192,675 scans). We also present a novel architecture that processes an arbitrary number of input scans across imaging modalities, establishing a new benchmark for multi-modal, multi-anatomy long-tailed diagnosis.

Technical Report 2023

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

Chaoyi Wu*, Jiayu Lei*, Qiaoyu Zheng*, Weike Zhao*, Weixiong Lin*, Xiaoman Zhang*, Xiao Zhou*, Ziheng Zhao*, Ya Zhang, Yanfeng Wang, Weidi Xie†

We evaluate GPT-4V for multimodal medical diagnosis through case studies covering 17 human body systems across 8 clinical imaging modalities. As the cases show, GPT-4V remains far from ready for clinical use.

Beyond Research

Off Duty

When I'm not training models, you'll probably find me here:

⛷🏂⛸🎬🎯🎵🎻🏃🏊🏓🏸📸🤸🏔⛺🏑🎾