Blog
Longer notes on research practice, technical work, reading, and ideas in progress.
/15 min read
GRPO Post-Training for Math Reasoning on Qwen3-0.6B
From SFT to GRPO: a systematic exploration of post-training strategies that raised math reasoning accuracy on Qwen3-0.6B-Base from 38.2% to 55.2%, covering reward function design and None discrimination.
GRPO, RL, post-training
/6 min read
Project 1 Q2: Survival Analysis for Telco Customer Churn
A survival-analysis report for IBM Telco Customer Churn data, including Kaplan-Meier, log-rank tests, Cox PH, AFT, and CLV.
survival analysis, spark, mysql