Jen Ben Arye

autoregressive-diffusion

2025

computer vision

diffusion

data

Worked on autoregressive diffusion for video generation @ Decart.

open-source-rlhf

2024

rlhf

llm

nlp

Automated RLHF pipelines to align LLMs w/ live human feedback @ MIT CSAIL. Advised by Prof. Jacob Andreas and Prof. Leshem Choshen. In collaboration with Cohere and Hugging Face.

reward-hacking

2025

agents

reward hacking

CoT

Studied how CoT monitoring awareness affects reward hacking behavior in coding agents.

Links:

GitHub

adversarial-feedback

2025

alignment

eval

rlhf

Researched how adversarial preference signals shift model alignment + behavior; stress-tested LLMs under adversarial feedback.

jenbenarye.com

2025

nextjs

react

You're here.