I'm Miles Wang. I'm a researcher at OpenAI trying to build beneficial and safe AGI.
I'm on the RL team, but my interests span alignment, evaluations, reasoning, and science. Right now, I'm thinking about AI systems that can continually learn.
I’ve worked on a number of research directions, including:
- Scalable oversight of increasingly capable models, such as monitoring chains-of-thought for reward hacking.
- Frontier evaluations for high-compute RL runs.
- AI for science (especially biology) with agents that learn online.
- Frontier risk evaluations for models, including maximal capability elicitation.
- Alignment of behavior and understanding when misalignment generalizes.
- Adversarial robustness to jailbreaks.
- Continual learning, which is currently top of mind.
I studied Computer Science at Harvard before leaving to join OpenAI in March 2024. Feel free to contact me at milesw [at] openai [dot] com.
Selected Papers
- Monitoring Monitorability
- FrontierScience: Evaluating AI’s ability to perform scientific research tasks
- Measuring AI’s capability to accelerate biological research in the wet lab
- Estimating worst case frontier risks of open weight LLMs
- Persona Features Control Emergent Misalignment
- Forbidden Facts: An Investigation of Competing Objectives in Llama-2