autoregressive-diffusion
2025
Worked on autoregressive diffusion for video generation @ Decart.
open-source-rlhf
2024
Automated RLHF pipelines to align LLMs w/ live human feedback @ MIT CSAIL. Advised by Prof. Jacob Andreas and Prof. Leshem Choshen. In collaboration with Cohere and Hugging Face.
reward-hacking
2025
Studied how awareness of CoT monitoring affects reward-hacking behavior in coding agents.
adversarial-feedback
2025
Researched how adversarial preference signals shift model alignment + behavior by stress-testing LLMs under adversarial feedback.
jenbenarye.com
2025
You're here.