Now
I am in the fourth year of my PhD at ETH Zurich.
Twelve papers during my PhD so far:
- Large-scale online deanonymization with LLMs
- Pitfalls in Evaluating Language Model Forecasters
- Consistency Checks for Language Model Forecasters (ICLR 2025 Oral)
- Refusal in Language Models Is Mediated by a Single Direction
- Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
- Stealing Part of a Production Language Model (Best Paper Award at ICML 2024)
- Foundational Challenges in Assuring Alignment and Safety of Large Language Models
- Evaluating Superhuman Models with Consistency Checks
- ARB: Advanced Reasoning Benchmark for Large Language Models
- Poisoning Web-Scale Training Datasets is Practical
- Red-Teaming the Stable Diffusion Safety Filter
- A law of adversarial risk, interpolation, and label noise
Last updated February 2026.