
Basic Machine Learning with WEKA for Social Scientists

Date & Time:
Thursday 28 April to Thursday 12 May, 2016


The goal of data mining is to extract knowledge from datasets in human-understandable structures by applying machine-learning algorithms. In recent years, data mining has been used widely across many areas of science and engineering. The social sciences, however, still rely mostly on classical statistical methods for data analysis. One possible reason is that many of the available data mining tools require programming, which often lies outside the expertise of social researchers.

This three-day workshop aims to address this gap by introducing WEKA, a collection of machine learning algorithms for data mining tasks. WEKA lets social scientists mine their datasets in a straightforward way and gain new insights into their data. One advantage of WEKA is that it offers an intuitive way to explore data through visualizations and simple summary statistics for each attribute. In addition, for many small to mid-size datasets, WEKA makes it possible to apply learning algorithms and evaluate them quickly and with little effort.

Topics to be covered

  1. Introduction to Data Mining concepts
  2. Getting familiar with WEKA
    1. Exploring the Explorer
    2. Exploring datasets
    3. Building a classifier
    4. Using filters
    5. Visualizing data
  3. Clustering
    1. K-Means
    2. Expectation-Maximization
    3. Other methods
  4. Evaluation
    1. Training and testing
    2. Repeated training and testing
    3. Cross-validation
  5. Simple classifiers
    1. ZeroR and OneR
    2. Overfitting
    3. Probabilities
    4. Decision trees
    5. Nearest neighbour
    6. Linear and logistic regression

Who should take this course?

This workshop is designed specifically for beginners in data mining who have some basic knowledge of statistics (e.g., mean, standard deviation, variance). Ideally, participants will already have collected data and want to find new ways to explore and analyse it.

Participants are required to bring a laptop computer with WEKA installed so they can try out the practical examples in class (get in touch with the workshop organiser if you have problems installing WEKA).


A three-day workshop, with one three-hour session per week over three weeks:

  • Thursday, 28 April 2016 14h00-17h00
  • Thursday, 5 May 2016 14h00-17h00
  • Thursday, 12 May 2016 14h00-17h00

Optional readings

The following book may provide guidance for the course but is not required for the workshop:

Ian H. Witten and Eibe Frank. 2011. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

  • Name: Dr Ruth Olimpia García Gavilanes
  • Affiliation: Oxford Internet Institute, University of Oxford