When a news event breaks, thousands of people tweet about it simultaneously — in African American English, Hispanic-aligned language, White-aligned English, and everything in between. If you ask an AI to summarize those tweets, it should represent all of those voices. It doesn't. Summarization models consistently over-represent some dialect groups and under-represent others, even when every group contributes equally to the input.
This research program asks why that happens and how to fix it. The lab built new datasets, exposed specific mechanisms of bias (such as input-ordering effects), developed algorithms that achieve equal representation without sacrificing summary quality, and uncovered a deeper problem in how LLMs use context at all.
Six papers over four years, each building on the last.
For detailed experimental results, data tables, and figures, refer to the original papers below.