
Advanced Statistics for Internet Research I

Key Information

Course details
Digital Social Research Option Paper Group A; Hilary Term
Assessment
Projects; Exam
Tutor
Dr Grant Blank

About

Multiple regression is probably the most commonly used statistical technique in the social sciences. This course emphasizes its application to research on the Internet and society.

This course covers multiple regression techniques including the use of categorical variables, dummy variables, and interactions. It shows how to assess the adequacy of a model using supporting graphical tools both before and after the regression, and by analysis of residuals. It explores diagnostics and corrective techniques for the four major data problems: outliers, collinearity, heteroscedasticity and non-linearity.
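
To illustrate how dummy variables and interactions enter a regression, here is a minimal sketch in plain Python. It assumes a hypothetical categorical predictor `device` (levels desktop, mobile, tablet) and a continuous predictor `hours`. Desktop is coded as the reference category, and `hours` is interacted with the mobile dummy. The data are synthetic and generated from known coefficients without noise, so the fit can be checked. The solver uses the normal equations for brevity. Statistical packages generally use more numerically stable methods such as QR decomposition.

```python
from typing import List, Sequence, Tuple


def dummy_code(values: Sequence[str], reference: str) -> Tuple[List[str], List[List[float]]]:
    """Return the non-reference levels and one 0/1 dummy column per level."""
    levels = sorted(set(values) - {reference})
    columns = [[1.0 if v == level else 0.0 for v in values] for level in levels]
    return levels, columns


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns: List[List[float]], y: List[float]) -> List[float]:
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical synthetic data: hours online per day and device type.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] * 3
device = ["desktop"] * 6 + ["mobile"] * 6 + ["tablet"] * 6

levels, (mobile, tablet) = dummy_code(device, reference="desktop")
hours_x_mobile = [h * d for h, d in zip(hours, mobile)]  # interaction term

# Outcome generated from known coefficients (no noise), so the fit can be verified:
# y = 2 + 0.5*hours + 1.0*mobile - 0.5*tablet + 0.3*hours*mobile
y = [2 + 0.5 * h + 1.0 * mo - 0.5 * t + 0.3 * hm
     for h, mo, t, hm in zip(hours, mobile, tablet, hours_x_mobile)]

coefs = ols([hours, mobile, tablet, hours_x_mobile], y)
names = ["intercept", "hours"] + levels + ["hours:mobile"]
for name, b in zip(names, coefs):
    print(f"{name:>13}: {b:6.3f}")
```

Because the outcome has no noise, the fitted coefficients reproduce the generating values up to rounding. Under this coding the `hours` coefficient (0.5) is the slope for the reference group (desktop). The slope for mobile users is that value plus the interaction coefficient (0.5 + 0.3 = 0.8). Each dummy coefficient is a shift in intercept relative to desktop.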

The goal is to give students the tools to use regression and related techniques effectively in their own research. The course is based on four themes.

  1. The focus is on selecting and interpreting statistical techniques, reaching sensible conclusions, reasoning about causality, and making decisions. The course combines graphical, exploratory, and confirmatory approaches in ways that suggest how data can improve our understanding.
  2. This requires hands-on work with data through statistical software. All calculations are done using the software, not using hand calculations or calculators. Class lectures and discussions involve use of statistical software. Formative assignments require intensive statistical computing.
  3. A hands-on approach to understanding data directs attention away from the formal, theoretical, mathematical properties of statistical estimators, which statistics classes sometimes emphasize. This course instead emphasizes the ability to interpret the substantive significance of graphical and numerical computer output.
  4. The strong emphasis on data and software leads to a final theme: data almost never come to researchers in a form appropriate for analysis, so they must first be converted into a suitable form. The course therefore teaches common forms of data manipulation, which are incorporated into the formative assignments.

Outcomes

At the end of the course students will be able to:

  • Understand the strengths and limitations of multiple regression;
  • Understand and interpret multiple regression coefficients and significance levels, for both continuous and categorical independent variables;
  • Understand and interpret fit statistics for models;
  • Diagnose and correct the four major problems with regression: outliers, collinearity, heteroscedasticity and non-linearity.
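
As one concrete diagnostic, collinearity is often assessed with the variance inflation factor, VIF = 1 / (1 − R²). Here R² comes from regressing one predictor on all the other predictors. The VIF is the factor by which the variance of that predictor's coefficient is inflated relative to the case where the predictor is uncorrelated with the others. The sketch below is a deliberately narrow case that assumes a model with exactly two predictors. In that case R² is simply the squared Pearson correlation between them. The data values and function names are hypothetical and chosen only to illustrate the arithmetic.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 is their squared correlation."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfect collinearity: separate coefficients are not identified
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictor values, chosen only to illustrate the calculation.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0]
print(vif_two_predictors(x1, x2))
```

For these values the correlation is about 0.77 (R² = 0.6), giving a VIF of about 2.5. With more than two predictors, each R² has to come from a full auxiliary regression rather than a single correlation.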