Research

With help from my friends, and from Codex, Claude Code, and other AI tools

Generative AI, LLM post-training, multi-source data processing

Selected projects

2026

GRPO Post-Training for Math Reasoning

Featured

Systematically explored SFT-to-GRPO post-training on Qwen3-0.6B-Base, improving math reasoning accuracy from 38.2% to 55.2% through reward function design and None discrimination.

GRPO · RL · post-training · math-reasoning

2026

Bilingual academic homepage

Featured

A file-driven profile, writing, and life archive built with Next.js, MDX, and localized content.

Next.js · MDX · personal infrastructure

2026

Research workflow notes

A living collection of notes about reading papers, designing experiments, and keeping research artifacts organized.

writing · workflow · research practice

Publications and outputs

Talks and activities