The goal of this project is to identify emerging research topics over time using topic models and visualization techniques. The data are a corpus of research and development (R&D) abstracts that is publicly available from Federal RePORTER. We built on prior work by adding the 2019 data to our dataset and applying two topic modeling techniques: Latent Dirichlet Allocation and Nonnegative Matrix Factorization. Using the topic model results, we employed an emerging topic strategy to determine which topics are gaining (or waning) in popularity over time. We also created a dashboard that lets users interact with topic model results and create their own topic models for specific areas of interest, such as pandemics.
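As a rough illustration of what an emerging topic strategy can look like, the sketch below labels topics as gaining or waning by fitting a least-squares trend line to each topic's average weight per year. This is an assumption made for illustration, not necessarily the strategy the project used: the function names `trend_slope` and `classify_topics`, the `threshold` parameter, and the data values are all hypothetical, and the per-year topic weights are presumed to come from a topic model that has already been fitted.

```python
from statistics import mean


def trend_slope(years, weights):
    """Ordinary least-squares slope of weights regressed on years."""
    y_bar, w_bar = mean(years), mean(weights)
    num = sum((y - y_bar) * (w - w_bar) for y, w in zip(years, weights))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


def classify_topics(years, topic_weights, threshold=0.0):
    """Label each topic by the sign of its trend slope.

    topic_weights maps a topic name to its mean weight in each year,
    in the same order as `years`. `threshold` is a hypothetical cutoff
    below which a trend is treated as flat.
    """
    labels = {}
    for topic, weights in topic_weights.items():
        slope = trend_slope(years, weights)
        if slope > threshold:
            labels[topic] = ("gaining", slope)
        elif slope < -threshold:
            labels[topic] = ("waning", slope)
        else:
            labels[topic] = ("flat", slope)
    return labels


# Made-up values for illustration only; these are not project results.
years = [2016, 2017, 2018, 2019]
topic_weights = {
    "topic_a": [0.02, 0.03, 0.04, 0.05],
    "topic_b": [0.06, 0.05, 0.04, 0.03],
}

labels = classify_topics(years, topic_weights)
for topic, (label, slope) in labels.items():
    print(f"{topic}: {label} (slope per year = {slope:.4f})")
```

A slope-based rule is easy to interpret, but it is only one option; a real analysis would likely also check whether each trend is statistically distinguishable from noise.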

Teaser Video:

Research Project Webpage:

Click here for more details about the project including findings, data, and methods.


Lara Haase

Carnegie Mellon University, MS in Public Policy and Management – Data Analytics


Martha Czernuszenko

The University of Texas at Austin, Information Systems & Canfield Business Honors Program

Liz Miller

William & Mary, International Relations

Sean Pietrowicz

University of Notre Dame, Applied and Computational Mathematics and Statistics


Kathryn Linehan

Research Scientist (Project Lead), Biocomplexity Institute, University of Virginia

Eric Oh

Research Assistant Professor, Biocomplexity Institute, University of Virginia

Stephanie Shipp

Deputy Division Director and Research Professor, Biocomplexity Institute, University of Virginia

Joel Thurston

Senior Scientist, Biocomplexity Institute, University of Virginia


National Center for Science and Engineering Statistics, Research & Development Statistics Program:

  • John Jankowski, Program Director
  • Audrey Kindlon, Survey Statistician
  • Chris Pece, Senior Analyst
  • Ronda Britt, Senior Analyst
  • Gary Anderson, Senior Science Resources Analyst