Hey, I’m a PhD candidate at the MIT Media Lab. My research focuses on training and evaluating large language models, as well as their social impact and governance.
Previously:
2024.07: Three oral papers and one spotlight paper accepted to ICML 2024: (1) Safe Harbor, (2) Societal Impact of Open Foundation Models, (3) AI Autonomous Weapons Risk Geopolitical Instability, and (4) Data Authenticity, Consent, and Provenance for AI Are All Broken: What Will It Take to Fix Them?
2024.06: The Data Provenance Initiative was awarded the Mozilla Data Futures Lab grant. Presented at MozFest 2024.
2024.05: Co-authored the International Scientific Report on the Safety of Advanced AI.
2024.03: Our Open Letter on A Safe Harbor for Independent AI Evaluation & Red Teaming garnered 350+ signatures from leading researchers. Covered by the Washington Post, VentureBeat, and the Knight First Amendment Institute at Columbia University. Cited in the US Department of Justice’s letter to the US Copyright Office.
2024.01: The Data Provenance Initiative won the MIT Generative AI Impact Award, with $70,000 in funding.
2023.10: Launched the Data Provenance Initiative, covered by the Washington Post, VentureBeat, and IEEE Spectrum.
2023.09: New paper on the Foundation Model Transparency Index, covered by the NYT, The Atlantic, and VentureBeat.
2023.05: New paper on A Pretrainer’s Guide to Training Data.
2023.01-05: Invited talks on ‘Effective Instruction Tuning: Data, Methods, & New Abilities’ at Apple, Oracle, Kailua Labs, Databricks, and Amazon.
2023.02-06: Co-instructor for MIT’s Generative AI course MAS.S68.
2023.03: Co-lead for Cohere for AI’s (C4AI) community research effort on multilingual instruction tuning.
2023.01: New paper on The Flan Collection. See the Google AI Blog post.