Posts by Category

Essays

Thoughts on vibe coding

4 minute read

Published: March 24, 2025

Recently, a trend called “vibe coding,” a term coined by Andrej Karpathy, has taken off. The idea is that anyone can build a standalone application just by telling an LLM what they want, without writing a single line of code. Many developers on X and Reddit are hyped about it, and I wanted to share my two cents.

LLM

A smooth transition from SFT to RLVR: an intuitive explanation of GRPO

1 minute read

Published: December 14, 2025

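This post explains, from an intuitive angle, the relationship between SFT and RL in large-model training. Unlike SFT, where the model learns as if “whatever the teacher says is right,” RL aims to make efficient use of “not-so-correct” samples and thereby increase the probability that the model is “correct.” These are my thoughts and notes from reading the report Understanding Reinforcement Learning for Model Training, and future directions with GRAPE, and I strongly recommend reading the original.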

Random

A weird debugging process for Jupyter Notebook on WSL 2

1 minute read

Published: May 25, 2025

It started like this: with Jupyter Notebook running under WSL 2, I could open it from Windows at 127.0.0.1:8888 but not at localhost:8888.

Resources

RL learning resources - normcore reads

less than 1 minute read

Published: March 18, 2025

This post compiles a list of links that I found useful when trying to learn RL for LLMs.

Build a link blog post