Research

With the help of my friends, Codex, Claude Code, and others

Generative AI, LLM post-training, multi-source data processing

Selected projects

2026

GRPO Post-Training for Math Reasoning

Featured

Systematically explored SFT-to-GRPO post-training on Qwen3-0.6B-Base, improving math reasoning accuracy from 38.2% to 55.2% through reward function design and None discrimination.

GRPO, RL, post-training, math-reasoning

2026

Bilingual academic homepage

Featured

A file-driven profile, writing, and life archive built with Next.js, MDX, and localized content.

Next.js, MDX, personal infrastructure

2026

Research workflow notes

A living collection of notes about reading papers, designing experiments, and keeping research artifacts organized.

writing, workflow, research practice

Publications and outputs

Talks and activities