Runpeng Dai

I am a third-year Ph.D. candidate at the University of North Carolina at Chapel Hill, advised by Prof. Hongtu Zhu. Before that, I obtained my B.S. in Statistics from the Shanghai University of Finance and Economics, where I was advised by Prof. Fan Zhou. My research sits at the intersection of Reinforcement Learning and LLM Reasoning, bridging theory and practice in AI.

I am actively seeking summer 2026 internships. Let's connect!

Research Highlights

LLM Reasoning

Developing advanced reasoning training techniques and applying reasoning to real-world applications.
- Efficient Exploration: Curiosity-Driven Exploration for RLVR training.
- Parallel Reasoning: Parallel-R1, the first RL framework to teach LLMs parallel thinking.
- Information Extraction: R1-RE, pioneering LLM reasoning for out-of-distribution (OOD) relation extraction.

Reinforcement Learning

Developing intelligent agents and advanced post-training techniques for next-generation AI systems.
- Causal RL: Causal PIE, a novel framework for policy evaluation under interference in ride-sharing systems.
- Distributional RL: Non-crossing quantile distributional RL.

Research Experience

Tencent AI Lab (Seattle)

Research Scientist Intern

May 2025 - August 2025

Mentor: Dr. Linfeng Song

Developed Curiosity-Driven Exploration, which leverages a model's intrinsic sense of curiosity to guide exploration in RLVR.
Collaborated with fellow interns and colleagues on Parallel-R1 and VOGUE.

Baidu Qianfan

Research Intern

May 2024 - July 2024

Proposed a transformation-invariant sensitivity measure for LLMs and VLMs.
The measure can be applied to safeguard vulnerable parameters during quantization and model merging.

Selected Publications

*Equal contribution. Check out my full publication list.

Reasoning LLM

	CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models (ICLR), International Conference on Learning Representations, 2026. [Paper]
	Parallel-R1: Towards Parallel Thinking via Reinforcement Learning (ICLR), International Conference on Learning Representations, 2026. [Paper] [Code] [100+ Stars]
	VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning (arXiv), 2025. [Paper]
	StatEval: A Comprehensive Benchmark for Large Language Models in Statistics (arXiv), 2025. [Paper] [Project]
	R1-RE: Cross-Domain Relation Extraction with RLVR (arXiv), 2025. [Paper] [Code]
	Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models (EACL main), European Chapter of the Association for Computational Linguistics, 2025. [Paper]

Reinforcement Learning and Causal Inference

	Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences (arXiv), 2024. [Paper]
	Deep Distributional Learning with Non-crossing Quantile Network (arXiv), 2025. [Paper]

Applied Research

Spatio-temporal Prediction of Fine-Grained Origin-Destination Matrices with Applications in Ridesharing

Journal of Computational and Graphical Statistics, 1-17, 2026.
[Paper]

Teaching & Professional Service

Teaching Assistant for BIOS740 Deep Learning for Biomedical Applications, University of North Carolina at Chapel Hill, Fall 2024
Lecturer for Deep Learning Methods in Advanced Statistical Problems, JSM 2025, ICSA 2024

Life

Outside of research, I enjoy fishing in both freshwater and saltwater. My fishing photos are on Instagram.