RL learning resources - normcore reads
Published:
This compiles a list of links that I found useful when trying to learn RL for LLMs.
Published:
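This post explains, from an intuitive angle, the relationship between SFT and RL in large language model training. Unlike SFT, which learns in a "whatever the teacher says is right" fashion, RL aims to make efficient use of "not quite correct" samples and thereby increase the probability that the model is "correct". This post collects my thoughts and notes from reading the report Understanding Reinforcement Learning for Model Training, and future directions with GRAPE; I strongly recommend reading the original as well.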
Published:
How this started: with Jupyter Notebook running under WSL 2, I could open it from Windows at 127.0.0.1:8888, but not at localhost:8888.
Published:
Inspired by Simon Willison, I decided to post links and useful resources on my blog, to push myself to finish reading them and actually absorb them.
Published:
Recently, there’s a trend called “vibe coding,” a term coined by Andrej Karpathy. It essentially means that by just telling an LLM what you want to build, without writing a single line of code, anyone can build a standalone application. Many developers on X and Reddit are currently hyped about it, and I wanted to share my two cents.