Runpeng Dai

I am a third-year Ph.D. candidate at the University of North Carolina at Chapel Hill, advised by Prof. Hongtu Zhu. Before that, I obtained my B.S. in Statistics from the Shanghai University of Finance and Economics, where I was advised by Prof. Fan Zhou. My research sits at the intersection of Reinforcement Learning and LLM Reasoning, bridging theory and practice in AI.

I am actively seeking 2026 summer internships. Let's connect!


Research Highlights

  • LLM Reasoning
    Developing advanced reasoning training techniques and applying LLM reasoning to real-world problems.
    • Efficient Exploration: Curiosity-Driven Exploration for RLVR training.
    • Parallel Reasoning: Parallel-R1, the first RL framework to teach LLMs parallel thinking.
    • Information Extraction: R1-RE, pioneering LLM reasoning for out-of-distribution (OOD) relation extraction.
  • Reinforcement Learning
    Developing intelligent agents and advanced post-training techniques for next-generation AI systems.

Research Experience

Tencent AI Lab (Seattle)
Research Scientist Intern
May 2025 - August 2025
Mentor: Dr. Linfeng Song
  • Developed Curiosity-Driven Exploration (CDE), which leverages a model's intrinsic sense of curiosity to guide exploration in RLVR.
  • Collaborated with fellow interns and colleagues on Parallel-R1 and VOGUE.
Baidu Qianfan
Research Intern
May 2024 - July 2024
  • Proposed a transformation-invariant sensitivity measure for LLMs and VLMs.
  • The measure can be applied to safeguard vulnerable parameters during quantization and model merging.

Selected Publications

*Equal contribution. Check out my full publication list.

Reasoning LLM


CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

NeurIPS 2025 MATH-AI Workshop.
[Paper]

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

NeurIPS 2025 Efficient Reasoning Workshop.
[Paper] [Code] [100+ Stars]

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

(arXiv), 2025.
[Paper]

StatEval: A Comprehensive Benchmark for Large Language Models in Statistics

(arXiv), 2025.
[Paper] [Project]

R1-RE: Cross-Domain Relation Extraction with RLVR

(arXiv), 2025.
[Paper]

Reinforcement Learning and Causal Inference


Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

(arXiv), 2024.
[Paper]

Deep Distributional Learning with Non-crossing Quantile Network

(arXiv), 2025.
[Paper]

Teaching & Professional Service

Teaching Assistant for BIOS740 Deep Learning for Biomedical Applications, University of North Carolina at Chapel Hill, Fall 2024
Lecturer for Deep Learning Methods in Advanced Statistical Problems, JSM 2025, ICSA 2024

Life

Outside of research, I enjoy both freshwater and saltwater fishing. See my fishing photos on Instagram.