Doing the math ‘predicts’ which movies will be box office hits

Published on
21 Aug 2013
A mathematical model devised by OII researcher Taha Yasseri and his colleagues has proven to be very effective in predicting the box office success of newly-released films by using an analysis of Wikipedia activity.

Researchers have devised a mathematical model that can predict which films will become blockbusters or flops at the box office, up to a month before the movie is released. Their model is based on an analysis of the activity on Wikipedia pages about American films released in 2009 and 2010. They examined 312 movies, taking into account the number of page views for each movie’s article, the number of human editors contributing to the article, the number of edits made and the diversity of online users. The researchers, from Oxford University, UK, and from the Central European University at Budapest and Budapest University of Technology and Economics, Hungary, have published their findings in the journal PLoS ONE.

The model was applied retrospectively: the researchers systematically charted the online buzz on Wikipedia around particular films and compared it with the box office takings from the first weekend after release. A comparison between the opening weekend revenue predicted by the model and the actual figures (as published on the Internet Movie Database, IMDb) showed a high degree of correlation.
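As a rough illustration of what comparing predicted and actual takings can look like, the sketch below computes a correlation coefficient between two lists of opening weekend revenues. It assumes Pearson's r as the measure, which the article does not specify, and the revenue figures and the function name `pearson_r` are hypothetical, not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical opening weekend revenues in millions of US dollars.
# These numbers are invented for illustration and are NOT from the study.
predicted = [120.0, 45.0, 8.0, 60.0]
actual = [128.0, 40.0, 15.0, 62.0]

r = pearson_r(predicted, actual)
print(f"correlation between predicted and actual: {r:.3f}")
```

A value of r close to 1 means the predicted and actual figures rise and fall together; it does not by itself say how close each individual prediction was.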

Their mathematical algorithm allowed them to predict box office revenues with an overall accuracy of around 77 per cent. The study authors say this level of accuracy is higher than that of the best existing predictive models applied by marketing firms (which they estimate to be around 57 per cent). For six of the 312 films, they could predict the box office takings with 99 per cent accuracy, meaning the predicted value was within one per cent of the real value. Some 23 movies were predicted with 90 per cent accuracy and 70 movies with an accuracy of 70 per cent and above.
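The per-film figures above can be read as one minus the relative error: a prediction within one per cent of the real value counts as 99 per cent accurate. The sketch below works through that arithmetic under this reading. The function names and the revenue pairs are hypothetical, and the article does not say how the overall 77 per cent figure was computed, so the sketch covers only the per-film measure.

```python
def prediction_accuracy(predicted, actual):
    """Accuracy as 1 minus the relative error of the prediction.

    Assumed reading of the article: a prediction within 1 per cent
    of the real value has an accuracy of at least 0.99.
    """
    if actual <= 0:
        raise ValueError("actual revenue must be positive")
    return 1.0 - abs(predicted - actual) / actual


def count_at_or_above(pairs, threshold):
    """Count (predicted, actual) pairs whose accuracy meets the threshold."""
    return sum(
        1 for predicted, actual in pairs
        if prediction_accuracy(predicted, actual) >= threshold
    )


# Hypothetical (predicted, actual) opening weekend revenues in millions
# of US dollars. Invented for illustration; NOT data from the study.
films = [(99.5, 100.0), (92.0, 100.0), (75.0, 100.0), (40.0, 100.0)]

for threshold in (0.99, 0.90, 0.70):
    n = count_at_or_above(films, threshold)
    print(f"{n} of {len(films)} films at {threshold:.0%} accuracy or above")
```

Because the thresholds are cumulative, a film counted at 99 per cent is also counted at 90 and 70 per cent in this sketch.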

The more successful the film, the more accurately the researchers were able to predict its box office takings. In the study, they explain that this is possibly due to the larger amount of online data generated by films that turn out to be successes. The model correctly forecast the commercial success of Iron Man 2, Alice in Wonderland, Toy Story 3 and Inception, but failed to accurately forecast the financial return on the less successful movies Never Let Me Go and Animal Kingdom.


Dr Taha Yasseri, from the Oxford Internet Institute at the University of Oxford, said: ‘These results can be of great value to marketing firms but, more importantly for us, we were able to demonstrate how we can use socially generated online data to predict a lot about future human behaviour. The predicting power of the Wikipedia-based model, despite its simplicity compared with Twitter, is that many of the editors of the Wikipedia pages about the movies are committed movie-goers who gather and edit relevant material well before the release date. By contrast, the ‘mass’ production of tweets occurs very close to the release time, and often these can be spun by marketing agencies rather than reflecting the feelings of the public.’

Co-author Prof. János Kertész, from the Central European University at Budapest, Hungary, said: ‘We have demonstrated for the first time that Wikipedia edit statistics provide us with another tool to predict social events. We studied the problem of predicting the financial success of movies and concluded that, in some aspects, forecasting based on Wikipedia outperforms tweets as Wikipedia activity has a longer timescale which enables earlier predictions.’

The study suggests that the efficiency of the predictions might be improved by applying more sophisticated statistical methods, such as including the controversy measure of an article. The mathematical model has not yet been applied to films that have not been released.

Access the paper

Mestyán, M., Yasseri, T., and Kertész, J. (2013) Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data. PLoS ONE 8 (8) e71226.