At Home with Over 170 Million News Stories: My Assistantship with the Cline Center over COVID-19

James Steur





 “We make the data. You change the world.”


In the spring of 2020, COVID unraveled my world. I didn’t know when I’d next hug my family or drink coffee with my friends, and I worried about the looming financial and medical consequences facing the country. Perhaps most challengingly, my workload as a PhD student continued as normal: I wrote a dissertation proposal, attended committee meetings, and looked for a summer assistantship to refine my skills in data analysis. Amidst the chaos and exhaustion, I never expected to be offered, in the summer of 2020, a graduate research position at the Cline Center for Advanced Social Research (home of over 170 million news stories) that was partially funded by the Center for Global Studies. Without a doubt, my assistantship at the Cline Center has offered me a supportive community and a newfound sense of purpose in using large-scale data analytics to help change the world.


James Steur, PhD Student in Political Science, @JamesSteur


The mission of the Cline Center is simple: generate data to address key challenges that threaten human flourishing. We tackle important topics such as climate change, civil unrest, and public health by applying advanced computational techniques to extract structured data from unstructured texts at extreme scales. My focus has been supporting the Center’s unique Global News Index. This database contains metadata and extracted features from over 170 million news reports published between 1945 and the present (2021), representing news output from every country in the world. The web crawler operation, which adds roughly 30,000 news stories each day from over 5,000 English-language news sources around the world, has collected features from over 70 million news stories published between 2006 and today. Projects built on the Global News Index touch on a variety of topics, such as terrorism coverage, civil unrest, and media coverage of COVID-19.


Despite these impressive resources, the Cline Center staff has limited capacity to support the dissemination of these materials across campus. Furthermore, the sheer volume of data, the number of possible ways to analyze it, and the large assortment of variables can be daunting for even the most experienced researcher. This is one place where I have offered assistance. I have updated the Global News Index Codebook and other user guides, developed outreach materials by creating new user forms, written emails inviting researchers to try out the Archer system, and, most importantly, created a presentation and tutorial on using Archer, a user-friendly interface for querying and analyzing data from the Global News Index.


In the spring of 2021, we presented this accessible version of the Global News Index over Zoom to 25 faculty, staff, and students in a Savvy Researcher Workshop, and we gave another presentation with assistance from the Center for Social & Behavioral Sciences. Ultimately, these presentations introduce the broader campus to a rich dataset that many researchers may not know about and that can further their projects. They also increase the number of individuals using Archer and cement the Cline Center’s reputation as a leader in replicable data practices across campus.


I also serve as one of the main points of contact for individual consultations over email or Zoom with faculty and students who request additional assistance on their projects. Although these consultations are remote during COVID, the scope of the projects and their impact are impressive. Two notable examples are Dr. Jodi Schneider’s ongoing work to examine whether public health emergencies, like COVID-19 and the opioid crisis, are polarized across news outlets, and Dr. Alyssa Prorok’s work on generating new data that clarifies the determinants and effects of cooperation between belligerents in civil wars. I am eager to see how these research projects progress into high-quality datasets and academic articles. Past projects include an article published in the Proceedings of the National Academy of Sciences of the United States of America about the lack of media coverage on the decline of bees, and the Coup D’état Project, which was featured in the Washington Post, along with a critical statement about classifying the January 6, 2021 assault on the US Capitol as a coup.


A weekly communication team meeting with Dr. Jay Jennings, Dan Shalmon, Joe Bajjalieh, and James Steur.


Last, there is an analytical component to my role: I prepare and analyze data to inform the Cline Center on best practices around data replicability, access, and future decision-making. While data collection technology has advanced exponentially over the last decade, the rules governing the collection of digital news content continue to pose challenges. These issues can limit the replicability and utility of news archives. For example, web pages and news feeds expire, and paywall regulations change frequently. On an ongoing basis, I analyze different components of the Global News Index to better understand the nature of its data and to search for problems as they arise. In particular, I’ve performed data wrangling to analyze the validity of the Cline Center’s data collection process across multiple domains and RSS feeds, which informs future development of the Center’s web crawler systems. I’ve also helped publish a toy dataset from the Global News Index by working with staff at the Illinois Data Bank.


I would be remiss if I failed to mention the invaluable contributions of the GRA who held this position before me, from January 2019 to May 2020: Jenna Jordan, now an Illinois alumna with a Master’s in Library and Information Science. One of Jenna’s most notable contributions during her time at the Cline Center was writing the GNI Codebook, Quick Reference Tables, Archer User Guide, and Archer Quick Start Guide (published on the Illinois Data Bank). This set of documents contains detailed guides on how to use Archer, along with documentation that highlights the intricacies of the variables in the Global News Index. These resources, in turn, have increased the accessibility and usability of the Cline Center’s Global News Index for a community of users with a wide range of skills in large-scale data analysis.


Jenna Jordan, MS in Library & Information Science, @JennaJrdn


While COVID has changed the world around us, I can say without hesitation that my work at the Cline Center has been a grounding force for me during these challenging times. Beyond developing my skills in querying large databases and working in a remote environment, and beyond the myriad other ways this position has benefited me and the broader scholarly community, I’m most grateful for the opportunity and sense of community I’ve found at the Cline Center during the pandemic. In particular, Dr. Scott Althaus, Joe Bajjalieh, and Dan Shalmon have contributed immeasurably to my development and growth, and for that I am deeply thankful. Finally, I thank the Center for Global Studies for their generous funding of my position. This opportunity would not have been possible without their support.


James Steur is a doctoral student in political science at the University of Illinois at Urbana-Champaign. His research interests include quantitative methods, political psychology, and the role of emotions in citizens’ decision-making. You can connect with James on LinkedIn and on Twitter at @JamesSteur.


This piece is the author’s work, and neither it nor its components should be attributed to their workplace.