In Zurich, early in my PhD. Large language models (LLMs) are clearly a big deal, and I do not understand why most of ML academia still assumes that we will go back to the pre-2020 ML paradigm.

Working on empirical research into how language models can lose chain-of-thought interpretability when trained via reinforcement learning or other outcome-based optimization methods, such as RLHF.

Reading most LLM failure modes and safety papers published recently, and summarizing some of them for my Twitter newsletter. Thinking about cross-posting to Mastodon.

I continue to believe that we passed peak data relevance in 2020, and that future models will draw most of their training signal from some form of reinforcement learning or self-distillation. I hope to be wrong.

Submitted two papers in the first month of my PhD.

Last updated November 2022.

What is a “now” page?