Now
I'm in the fourth year of my PhD at ETH Zurich. During November 2025, I'm participating in the Inkhaven Residency, writing 30 posts in 30 days. You can follow along on my newsletter or on my Inkhaven page.
Eleven papers in my PhD so far.
- Pitfalls in Evaluating Language Model Forecasters
- Consistency Checks for Language Model Forecasters (ICLR 2025 Oral)
- Refusal in Language Models Is Mediated by a Single Direction
- Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
- Stealing Part of a Production Language Model (Best Paper Award at ICML 2024)
- Foundational Challenges in Assuring Alignment and Safety of Large Language Models
- Evaluating Superhuman Models with Consistency Checks
- ARB: Advanced Reasoning Benchmark for Large Language Models
- Poisoning Web-Scale Training Datasets is Practical
- Red-Teaming the Stable Diffusion Safety Filter
- A Law of Adversarial Risk, Interpolation, and Label Noise
Last updated November 2025.