Runpeng Dai
I am a third-year Ph.D. candidate at the University of North Carolina at Chapel Hill, advised by Prof. Hongtu Zhu. Before that, I obtained my B.S. in Statistics from the Shanghai University of Finance and Economics, where I was advised by Prof. Fan Zhou. My research sits at the intersection of Reinforcement Learning and LLM Reasoning, bridging theory and practice in AI.
I am actively seeking 2026 summer internships. Let's connect!
Research Experience
May 2025 - August 2025
- Developed Curiosity-Driven Exploration (CDE), which leverages a model's intrinsic sense of curiosity to guide exploration in RLVR.
- Collaborated with fellow interns and colleagues on Parallel-R1 and VOGUE.
May 2024 - July 2024
- Proposed a transformation-invariant sensitivity measure for LLMs and VLMs.
- The measure can be applied to safeguard vulnerable parameters during quantization and model merging.
Selected Publications
Reasoning LLM
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
NeurIPS 2025 MATH-AI Workshop, 2025.
[Paper]
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
NeurIPS 2025 Efficient Reasoning Workshop, 2025.
[Paper]
[Code] [100+ Stars]
VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning
(arXiv), 2025.
[Paper]
StatEval: A Comprehensive Benchmark for Large Language Models in Statistics
(arXiv), 2025.
[Paper]
[Project]
R1-RE: Cross-Domain Relation Extraction with RLVR
(arXiv), 2025.
[Paper]
Reinforcement Learning and Causal Inference
Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences
(arXiv), 2024.
[Paper]
Deep Distributional Learning with Non-crossing Quantile Network
(arXiv), 2025.
[Paper]
Teaching & Professional Service