Blog | Kaicheng Mao

Blog

Longer notes on research practice, technical work, reading, and ideas in progress.

May 11, 2026/15 min read

GRPO Post-Training for Math Reasoning on Qwen3-0.6B

From SFT to GRPO: a systematic exploration of post-training strategies that improved math reasoning accuracy from 38.2% to 55.2% on Qwen3-0.6B-Base, with reward function design and None discrimination.

GRPO, RL, post-training

Apr 28, 2026/6 min read

Project 1 Q2: Survival Analysis for Telco Customer Churn

A survival-analysis report for IBM Telco Customer Churn data, including Kaplan-Meier, log-rank tests, Cox PH, AFT, and CLV.

survival analysis, spark, mysql