Big Data Analytics
“Big data is a very useful source of information for social research, but handling it is not that easy: it requires technical skills and analytical methods that are specific to this type of data.”
Big data, the real-time streams of transactional records of our daily activities, hold major promise for (computational) social science. However, collecting, cleaning, analysing, modelling, and interpreting these data requires a high level of technical skill. In many cases the analytical techniques can be adopted from the natural sciences; in others, they have to be invented within the framework of “Data Science” in order to “make sense” of the data under study.
In this course some of these techniques will be introduced and applied to real-world datasets. The main focus of the course, however, is not on data collection and data cleaning but on statistically analysing, manipulating, and making sense of already prepared data. Each week we introduce a set of related tools and techniques and provide students with appropriate datasets, so that they can apply the methods during the course itself and diagnose and discuss the problems and challenges that arise.
No prior computational or analytical skills are required; however, familiarity with basic mathematics and the basics of computer algorithms would be very helpful throughout the course.
Outcomes: Upon course completion students will: Have knowledge of a range of computational and numerical skills for analysing large amounts of socially generated data; Be familiar with the limitations and inherent shortcomings of data-driven social research, and with the tools for coping with them; Have hands-on familiarity with software for analysing and manipulating large datasets.
Past projects: Past projects on this course have analysed the distribution of Wikipedia article lengths to establish a better measure of language edition size, and so to better understand the growth and sustainability of the project, and have investigated the popularity patterns of posts on Instagram to model meme virality.