Review of Module 10 of Data Analysis for Social Scientists (MITx, edX) – Intro to Machine Learning and Data Visualisation
This week’s topics were very interesting. In particular, I gained a better technical understanding of the difference between machine learning (prediction) and estimation. And I love making pretty graphs, so the data visualisation lecture was fun too!
Traditionally, the artificial intelligence approach to computational problems has been to imitate how humans complete the task (e.g. sentiment analysis in speech). This approach stalled because such tasks involve too many subtleties and variations to capture by hand.
Machine learning takes a very different approach. It turns any “intelligence” task into an empirical learning task by specifying what is to be predicted and what is used to predict it. Applications of machine learning include image classification, visual recognition and speech interpretation. It can also be useful for constructing measures of unobservable characteristics (e.g. measuring corruption) and for designing policies that rely on our ability to predict (e.g. poverty scorecards).
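Unlike in estimation, the coefficients obtained from machine learning are not meaningful in themselves. Machine learning algorithms are built to predict well, not to provide unbiased, consistent estimators of individual parameters: when predictors are correlated, an algorithm can shift weight between them without hurting its predictions. However, the coefficients can still provide clues as to which variables are worth examining in an estimation exercise.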
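To make the point about coefficients concrete, here is a minimal sketch in Python. It is my own illustration, not code from the course, and it assumes a made-up setup: y depends only on x1, while x2 is a near-copy of x1 with no effect of its own. Ridge regression stands in for a generic prediction-oriented method, and the penalty `lam=10.0` and the helper names `ridge_two_predictors` and `mse` are arbitrary, hypothetical choices.

```python
import random

def ridge_two_predictors(x1, x2, y, lam):
    """Ridge regression with two predictors and no intercept.

    Solves (X'X + lam * I) b = X'y for the 2x2 case using Cramer's rule.
    """
    a = sum(v * v for v in x1) + lam
    c = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    r1 = sum(u * v for u, v in zip(x1, y))
    r2 = sum(u * v for u, v in zip(x2, y))
    det = a * d - c * c
    return (d * r1 - c * r2) / det, (a * r2 - c * r1) / det

def mse(pred, y):
    """In-sample mean squared prediction error."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

random.seed(0)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical setup: x2 is a near-copy of x1 with no effect of its own on y.
x2 = [v + random.gauss(0, 0.05) for v in x1]
# True model: y = 1.0 * x1 + noise, with no intercept (so none is fitted).
y = [v + random.gauss(0, 0.5) for v in x1]

# Estimation view: OLS of y on the true regressor x1 alone.
b_ols = sum(u * v for u, v in zip(x1, y)) / sum(v * v for v in x1)

# Prediction view: a penalised fit using both correlated predictors.
b1, b2 = ridge_two_predictors(x1, x2, y, lam=10.0)

mse_ols = mse([b_ols * v for v in x1], y)
mse_ridge = mse([b1 * u + b2 * v for u, v in zip(x1, x2)], y)

print(f"OLS on x1: b = {b_ols:.2f}, MSE = {mse_ols:.3f}")
print(f"Ridge:     b1 = {b1:.2f}, b2 = {b2:.2f}, MSE = {mse_ridge:.3f}")
```

With this setup, b_ols should come out close to the true effect of 1, while the ridge fit splits the weight roughly evenly between x1 and x2 and still predicts almost as well. Reading b1 as "the effect of x1" would therefore be misleading, even though the model is perfectly good for prediction.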
The graphical representation of data is important, especially for communicating results. When people read papers, they tend to look at the graphs first because they are eye-catching. Therefore, graphs should be interpretable on their own. Besides communicating results, data visualisation also helps guide the analysis.
Robert Kosara defined data visualisation as follows:
- Based on (non-visual) data
- Produces an image
- Results must be readable and recognisable
During the lecture, Prof Duflo discussed principles of good data visualisation and common mistakes.