OII | Understanding news story chains using information retrieval and network clustering techniques

Published on
31 Jan 2018

Written by
Jonathan Bright

I have a new draft paper out with my colleague Tom Nicholls, entitled “Understanding news story chains using information retrieval and network clustering techniques”. In it we address what we perceive as an important technical challenge in news media research: how to group together articles that all address the same individual news event. This challenge is unmet by most current approaches in unsupervised machine learning as applied to the news, which tend to focus on the broader (also important!) problem of grouping articles into topic categories. It is in general a difficult problem, as we are looking for what are typically small “chains” of content on the same event (e.g. four or five different articles) within a corpus of tens of thousands of articles, most of which are unrelated to each other.

Our approach makes use of algorithms and insights drawn from the fields of both information retrieval [IR] and network clustering to develop a novel unsupervised method of news story chain detection. IR techniques (which are used to build things like search engines) in particular haven’t been much employed in the social sciences, where the focus has been more on machine learning. But these algorithms were much closer to our problem, as connecting small numbers of news stories is quite similar to the task of searching a huge corpus of documents in response to a specific user query.
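To give a flavour of the general idea, here is a minimal sketch in Python. It is not the method from the paper. It assumes plain TF-IDF cosine similarity as a stand-in for the IR scoring step and connected components as a stand-in for the network clustering step, and the function names, the threshold of 0.3 and the example headlines are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (a dict) per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    vectors = []
    for counts in counts_per_doc:
        # Smoothed IDF: a term found in every document gets weight 0.
        vec = {t: tf * math.log((1 + n) / (1 + doc_freq[t]))
               for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def story_chains(docs, threshold=0.3):
    """Link documents whose similarity meets the (assumed) threshold,
    then return the connected components as lists of document indices."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(docs)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Invented headlines, purely for illustration.
articles = [
    "Storm hits the coast with flooding and power cuts",
    "Power cuts and flooding continue after storm hits coast",
    "Bank raises interest rates for first time in a decade",
    "Interest rates rise as bank acts for first time in decade",
    "Local football club wins cup final on penalties",
]
print(story_chains(articles))
```

On these toy headlines the two storm pieces and the two interest rate pieces end up in separate chains, and the football story stays on its own. A real corpus would need a more careful scoring function, a time window and a properly tuned threshold, and the all-pairs comparison here would not scale to tens of thousands of articles.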

The resulting algorithm works pretty well, though it is very difficult to validate properly because of the nature of the data! We use it to pull out a couple of interesting first-order descriptive statistics about news stories in the UK. For example, the graphic above shows the typical evolution of news stories after the publication of an initial article.

Just a draft at the moment so all feedback welcome!

Author

Dr Jonathan Bright

Research Associate

In 2022 Jonathan Bright became the Head of AI for Public Services at the Turing Institute, having previously been a faculty member of the OII. A political scientist, he specialises in computational and ‘big data’ approaches to the social sciences.


Related People

Tom Nicholls

Former Research Associate

Tom Nicholls is a Research Associate at the Oxford Internet Institute studying the analysis of news texts using computational methods and the development of new methods for studying political behaviour online.

