Shayne Longpre

AI Research Scientist

MIT

About me

Hey, I’m a PhD Candidate at MIT. My research focuses on the intersection of AI and policy: responsibly training, evaluating, and governing general-purpose AI systems. I lead the Data Provenance Initiative, led the Open Letter on A Safe Harbor for Independent AI Evaluation & Red Teaming, and have contributed to training models like Bloom, Aya, and Flan-T5/PaLM. I’m thankful for the recognition my research has received: Best Paper Awards from ACL 2024 and NAACL 2024, as well as coverage by the NYT, Washington Post, Atlantic, 404 Media, Vox, and MIT Tech Review.

Prior:

Google Brain Student Researcher (2022), collaborating with Barret Zoph and Jason Wei.
Applied ML at Apple.
Research at Stanford NLP lab, advised by Chris Manning and Danqi Chen.
Research at Salesforce Research, supervised by Caiming Xiong and Richard Socher.
Stanford University: Economics BA, Computer Science MS, rowing, soccer, creative writing.

See my full resume here, and full list of publications here.

Recent News

2025.03: Our new position paper on Third-Party AI Flaw Disclosure and the accompanying blog post are released, covered by Wired.
2025.01: Core writing team for the International AI Safety Report.
2024.12: Lead organizer for The Future of Third-Party AI Evaluation Workshop, recorded here.
2024.10: Multimodal Data Provenance accepted to ICLR 2025. Covered by MIT Tech Review.
2024.08: Aya Model wins Best Paper Award at ACL 2024.
2024.08: Consent in Crisis accepted to NeurIPS 2024. Covered by the NYT, 404 Media, Vox, and Yahoo! Finance.
2024.07: A Pretrainer’s Guide to Training Data wins Outstanding Paper Award at NAACL 2024.
2024.07: 3 Oral papers and 1 Spotlight paper accepted to ICML 2024: (1) Safe Harbor, (2) Societal Impact of Open Foundation Models, (3) AI Autonomous Weapons Risk Geopolitical Instability, and (4) Data Authenticity, Consent, and Provenance for AI Are All Broken: What Will It Take to Fix Them?
2024.06: The Data Provenance Initiative was awarded the Mozilla Data Futures Lab grant and the MIT Generative AI Impact Award, funded for $70,000. Presented at MozFest 2024.

Select AI/ML Publications

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Jason Wei, Adam Roberts

ArXiv Google AI Blog

Scaling Instruction-Finetuned Language Models

Hyung Won Chung, Le Hou, Shayne Longpre, Jeff Dean, Adam Roberts, Quoc V Le, Jason Wei

You Reap What You Sow: On the Challenges of Bias Evaluation Under Multilingual Settings

ACL 2022 BigScience Workshop.

Zeerak Talat, Aurélie Névéol, Stella Biderman, Miruna Clinciu, Manan Dey, Shayne Longpre

OpenReview ACLAnthology

Active Learning Over Multiple Domains in Natural Language Tasks

NeurIPS Workshop on Distribution Shift 2022.

Shayne Longpre, Julia Reisler, Edward Huang, Yi Lu, Andrew Frank, Nikhil Ramesh, Chris DuBois

Entity-Based Knowledge Conflicts in Question Answering

Shayne Longpre, Kartik Perisetla, Anthony Chen, Nikhil Ramesh, Chris DuBois, Sameer Singh

Dataset ArXiv Apple ACLAnthology

Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP

Anthony Chen, Pallavi Gudipati, Shayne Longpre, Xiao Ling, Sameer Singh

Dataset ArXiv Apple ACLAnthology

See all publications

Economics Publications

Invigorating Competition in Social Networking: An Interoperability Remedy that Addresses Data Network Effects and Privacy Concerns

CPI Antitrust Chronicle, June 2021

Jun 15, 2021

Cristian Santesteban, Shayne Longpre

How Big Data Confers Market Power to Big Tech: Leveraging the Perspective of Data Science

The Antitrust Bulletin (Vol 65, Issue 3) – September 2020

Jun 24, 2020

Cristian Santesteban, Shayne Longpre

SSRN The Antitrust Bulletin