As quoted from Prof Duflo during the lecture…

**Review of Module 7 of Data Analysis for Social Scientists (MITx, edX) – Causality, Analysing Randomised Experiments, and Nonparametric Regression**

The lectures were very interesting and I thought I had grasped the overall concepts. However, I found the homework was extremely difficult to work through. I think not having problem sets to work through makes it hard to be comfortable/confident with the calculations.

Here’s a summary of what I learnt from the lectures:

**Causality**

We make causal statements all the time. Causality may be thought of as the effect of manipulating a cause: we compare what actually happened with our best approximation of what would have happened absent that cause. The Rubin causal model, for example, frames this in terms of potential outcomes, which forces us to think about the counterfactual. The problem of causal inference is that at most one of the potential outcomes can be realised for each unit, so the data on the other potential outcomes are always missing. Selection bias arises when there are underlying differences between those in the treatment group and those in the control group; complete randomisation eliminates it in expectation.
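To make the potential-outcomes idea concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration (the outcome distribution, the constant effect of 2.0, and the rule that units with a high untreated outcome opt in to treatment); none of it comes from the lectures. It compares the naive difference in means under self-selection with the one under complete randomisation.

```python
import random
import statistics


def simulate(n=10_000, effect=2.0, seed=0):
    """Return (naive, randomised) difference-in-means estimates of the effect."""
    rng = random.Random(seed)

    # Every unit has two potential outcomes; only one is ever observed.
    y0 = [rng.gauss(10.0, 2.0) for _ in range(n)]  # outcome if untreated
    y1 = [y + effect for y in y0]                  # outcome if treated

    def diff_in_means(is_treated):
        treated = [y1[i] for i in range(n) if is_treated[i]]
        control = [y0[i] for i in range(n) if not is_treated[i]]
        return statistics.mean(treated) - statistics.mean(control)

    # Self-selection (hypothetical rule): units with high y0 choose treatment,
    # so the two groups differ even before anyone is treated.
    cutoff = statistics.median(y0)
    self_selected = [y > cutoff for y in y0]

    # Complete randomisation: exactly n // 2 units are drawn at random.
    treated_ids = set(rng.sample(range(n), n // 2))
    randomised = [i in treated_ids for i in range(n)]

    return diff_in_means(self_selected), diff_in_means(randomised)


naive_estimate, randomised_estimate = simulate()
print(f"self-selected: {naive_estimate:.2f}")
print(f"randomised:    {randomised_estimate:.2f}")
```

Under these assumptions the self-selected comparison mixes the true effect with the pre-existing gap between the groups, while the randomised comparison should land close to the true effect.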

**Analysing randomised experiments**

Even without knowledge of regression, a completely randomised experiment is straightforward to analyse using the Fisher exact test or Neyman’s approach. When designing an experiment, a power calculation helps us determine the required sample size, although many assumptions are involved!
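As a rough sketch of these three ideas, the code below computes Neyman's difference-in-means estimate with its conservative standard error, a Monte Carlo approximation to Fisher's randomisation p-value under the sharp null of no effect for any unit, and a textbook two-arm sample-size formula based on the normal approximation. The outcome data, the function names, and the default settings (5% significance, 80% power, equal arms, known common standard deviation) are all assumptions made for illustration, not the course's own material.

```python
import math
import random
from statistics import NormalDist, mean, variance

# Hypothetical outcomes, invented purely for illustration.
treated = [12.1, 14.3, 11.8, 15.2, 13.9, 14.8, 12.7, 15.5]
control = [10.4, 11.9, 10.8, 12.5, 11.1, 12.2, 10.9, 11.6]


def neyman(treated, control):
    """Difference in means and Neyman's conservative standard error."""
    estimate = mean(treated) - mean(control)
    std_error = math.sqrt(variance(treated) / len(treated)
                          + variance(control) / len(control))
    return estimate, std_error


def fisher_p_value(treated, control, draws=10_000, seed=0):
    """Monte Carlo p-value for the sharp null of no effect on any unit."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = treated + control  # under the sharp null, labels are arbitrary
    n_t = len(treated)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(pooled)  # re-draw the treatment assignment
        diff = abs(sum(pooled[:n_t]) / n_t
                   - sum(pooled[n_t:]) / (len(pooled) - n_t))
        if diff >= observed:
            extreme += 1
    return extreme / draws


def sample_size_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Units per arm to detect `effect` with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / effect ** 2)


ate_hat, se = neyman(treated, control)
p_value = fisher_p_value(treated, control)
n_per_arm = sample_size_per_arm(effect=0.5, sd=1.0)
print(f"estimate {ate_hat:.3f}, SE {se:.3f}, randomisation p-value {p_value:.4f}")
print(f"units needed per arm: {n_per_arm}")
```

The sample-size function shows where the assumptions bite: the answer depends heavily on the guessed effect size and standard deviation, since halving the effect roughly quadruples the required sample.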

Randomised controlled trials (RCTs) are considered the gold standard and have traditionally been used in clinical trials. In practice, there are incentives for selective reporting; mitigating solutions include registries and pre-analysis plans. RCTs also have a long history in the social sciences, and randomisation is also used in web design and marketing.

**Non-parametric regressions**

Kernel regression is one common way to estimate the relationship between two variables without assuming a particular functional form.
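For a sense of how this works, here is a minimal sketch assuming the Nadaraya-Watson form of kernel regression with a Gaussian kernel: the fitted value at a point is a weighted average of the observed outcomes, with weights that shrink as observations get further away. The simulated sine-curve data and the bandwidth of 0.3 are arbitrary choices for illustration.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the weighting function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(x_train, y_train, x0, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x0]."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training points near x0; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, y_train)) / total


# Simulated data (an assumption for illustration): y = sin(x) plus noise.
rng = random.Random(0)
x_train = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(500)]
y_train = [math.sin(x) + rng.gauss(0.0, 0.3) for x in x_train]

fitted = kernel_regression(x_train, y_train, math.pi / 2, bandwidth=0.3)
print(f"estimate at pi/2: {fitted:.3f} (noise-free value is 1.0)")
```

The bandwidth controls the trade-off: a small value follows the data closely but is noisy, while a large value gives a smoother but more biased curve.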