👋 Welcome to Rui’s blog

Hi, this is Rui. I'm documenting my learning notes in this blog. I come from a CS and Physics background and previously worked on image search systems and computer vision devtools. Now I am catching up on recent developments in LLMs & LMMs.

From HumanEval to SWE-bench

Dimensions

When we write code, we usually consider the following contexts: in-file references; in-repo references that span multiple files; code execution results, which guide us to update code iteratively; and GitHub issues, PRs, and discussions, which express requirements and other information. In the figure, I list some benchmarks specifically designed to test different aspects of code LMs' performance. Context scope is the biggest differentiator among them. Many more benchmarks distinguish themselves from these by taking other concerns into account, e....

May 28, 2024 · 16 min · Rui Zheng