I’m Miles Wang. I’m a researcher at OpenAI trying to build beneficial and safe AGI.
I’m on the RL team, but my interests span alignment, evaluations, reasoning, and science. I’ve worked on a number of research directions, including:
- Scalable oversight of increasingly capable models, such as monitoring chains-of-thought for reward hacking.
- Frontier evaluations for high-compute RL runs.
- AI for science (especially biology) with agents that learn online.
- Frontier risk evaluations for models, including maximal capability elicitation.
- Alignment of model behavior, including understanding when misalignment generalizes.
- Adversarial robustness to jailbreaks.
- Machines that learn over long horizons (currently top of mind).
I studied Computer Science at Harvard before leaving to join OpenAI in March 2024. Feel free to contact me at milesw [at] openai [dot] com.
Selected Papers
- Monitoring Monitorability
- FrontierScience: Evaluating AI’s ability to perform scientific research tasks
- Measuring AI’s capability to accelerate biological research in the wet lab
- Estimating worst case frontier risks of open weight LLMs
- Persona Features Control Emergent Misalignment
- Forbidden Facts: An Investigation of Competing Objectives in Llama-2