Introducing key facts about water and sanitation services

Review of Week 1 of Water Supply and Sanitation Policy in Developing Countries Part 1: Understanding Complex Problems (University of Manchester, Coursera)

As I have an engineering background I think I tend to default to viewing water, sanitation and hygiene (WASH) issues from an engineering (and maybe pseudo-social) perspective. So I thought it was important to gain, through this course, a better policy perspective.

I thought it was great that they started off by emphasising that water and sanitation issues were complex, and it is interesting that a business school is teaching a course on water and sanitation policy!

The introductory week looked at key factors and current patterns of water and sanitation, but I had doubts about the data from the study they presented. I couldn’t access the full paper to check out the methodology, but I posted a question on the forum. If I do get the full paper I might review it in detail.

Clarissa Brocklehurst, formerly Chief of WASH at UNICEF, was interviewed and did acknowledge the shortcomings of the WHO/UNICEF Joint Monitoring Programme (JMP) data, for example that existing data do not measure whether water is safe to drink. I am glad that she did because I am generally very skeptical of WASH data and statistics, having observed numerous broken water and sanitation facilities (which are not necessarily captured by the data) as well as the inability of surveys to collect accurate and nuanced information. (Disclaimer: I have not studied the JMP methodology in detail so my judgement may be harsh. Something I should probably do one day!)

What surprised me, though, was that there were only a handful of mostly part-time staff working on the JMP. I am definitely in favour of dedicating more resources to collecting high-quality data on WASH!

Functions and equations weigh me down!

Hoping subsequent modules will be less dense!

Review of Module 4 of Data Analysis for Social Scientists (MITx, edX) – Functions and Moments of Random Variables and Intro to Regressions

I must say that I struggled with this module. There were theorems and equations that my rusty mathematics brain took time to process. Yet my current workload doesn’t afford me the patience to grasp the concepts in depth. (I’m not offering it as an excuse, just context.) I mean, just try to get a handle on this:

  • Law of iterated expectations: The expectation of the expectation of Y given X is equal to the expectation of Y
  • Law of total variance: The variance of Y is equal to the variance of the expectation of Y given X added to the expectation of the variance of Y given X

(To be fair, it’s easier to understand by writing the equation.)
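
For reference, the two laws can be written as follows. This is the standard textbook notation, not copied from the course slides, with E denoting expectation and Var denoting variance:

```latex
\begin{align}
\mathrm{E}\big[\mathrm{E}[Y \mid X]\big] &= \mathrm{E}[Y] \\
\mathrm{Var}(Y) &= \mathrm{Var}\big(\mathrm{E}[Y \mid X]\big) + \mathrm{E}\big[\mathrm{Var}(Y \mid X)\big]
\end{align}
```

In the first line the inner expectation is taken over Y with X held fixed, and the outer expectation is then taken over X.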


My sad score for the ‘Functions of Random Variables’ section

But at least, I think, I got the general gist. Much of the module was about transforming random variables and deriving their probability density functions (PDFs), followed by calculating moments and other summary measures (e.g. mean, variance, median, mode). At the end Sara gave an introduction to covariance/correlation, and the link between probability, random variables and data analysis is getting clearer.



Soldiering on!

I thought kernels were hard…

Joint distribution functions are brutal!

Review of Module 3 of Data Analysis for Social Scientists (MITx, edX) – Describing Data, Joint and Conditional Distributions of Random Variables

This week we learnt more about histograms and kernel functions, as well as joint distribution functions and how to calculate marginal distributions and conditional distributions.
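
To make the marginal and conditional calculations concrete, here is a minimal sketch in Python (the course itself uses R). The joint probabilities and the function names are made up for illustration and are not taken from the course homework. The idea is that a marginal distribution sums the joint distribution over the other variable, and a conditional distribution divides the joint probabilities by that marginal.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the values are invented for illustration.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}


def marginal_x(joint_pmf):
    """Sum the joint pmf over y to get the marginal P(X = x)."""
    totals = defaultdict(Fraction)  # Fraction() starts each total at 0
    for (x, _y), p in joint_pmf.items():
        totals[x] += p
    return dict(totals)


def conditional_y_given_x(joint_pmf, x):
    """Divide joint probabilities by P(X = x) to get P(Y = y | X = x)."""
    p_x = marginal_x(joint_pmf)[x]
    return {y: p / p_x for (xi, y), p in joint_pmf.items() if xi == x}


p_x = marginal_x(joint)
p_y_given_x1 = conditional_y_given_x(joint, 1)
print(p_x)
print(p_y_given_x1)
```

Exact fractions are used instead of floats so that the probabilities sum to exactly 1, which makes the arithmetic easy to check by hand.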

Somehow I don’t think I had learnt about kernels before. A kernel is a weighting function used to estimate a probability density function (pdf) from data, giving a smoothed version of a histogram. I got a couple of finger exercises wrong, which goes to show that you should not try to study probability and statistics when you are sleep deprived. Try processing the phrase “first-order stochastically dominates” when your brain is running in slow motion.
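
As a rough illustration of what a kernel does, here is a minimal Python sketch of a kernel density estimate (the course itself uses R). It assumes a Gaussian kernel, a made-up sample and an arbitrarily chosen bandwidth, and the function names are my own. The estimate at a point is the average of kernel weights centred on each observation, scaled by the bandwidth.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the weighting function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(x, data, bandwidth):
    """Kernel density estimate at x: mean kernel weight over all observations."""
    n = len(data)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return total / (n * bandwidth)


# Made-up sample, for illustration only.
sample = [1.0, 1.5, 2.0, 2.2, 3.1, 4.0]
bandwidth = 0.5  # assumed smoothing parameter; larger values give a smoother curve

# Evaluate the estimate on a grid from -5 to 10 in steps of 0.01.
step = 0.01
grid = [i * step for i in range(-500, 1001)]
density = [kde(x, sample, bandwidth) for x in grid]

# A density should integrate to roughly 1; check with a simple Riemann sum.
area = sum(density) * step
print(round(area, 3))
```

The bandwidth plays a role similar to the bin width of a histogram: too small and the estimate is spiky, too large and it hides the shape of the data.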

The homework was challenging, but mathematics and programming get me into a state of flow so I was happy to lose myself reproducing some cool graphs!


Liking the colours 😉


What’s the probability I remember about probability?

Things got more technical in the second week!

Review of Module 2 of Data Analysis for Social Scientists (MITx, edX) – Fundamentals of Probability, Random Variables, Joint Distributions and Collecting Data

My burning question for this module’s lectures was: What is Sara drinking?! Jokes aside, parts of this module took some thinking as we delved into Set Theory, Bayes’ Theorem, probability functions of random variables and things like that. These concepts were not unfamiliar to me. However, as I tried to remember when I learnt these concepts I started feeling rather old.

I have never seen this drink in my life! (Source: Screen capture of Sara’s lecture)

You might wonder how probability theory is related to data analysis, and I would have struggled to explain it too, until Sara mentioned that data analysis is about understanding the joint distribution between two or more variables. That made sense! I had never thought of it in that way before. So go learn the fundamentals of probability, people!

The module concluded with an introduction by Esther on collecting data. It’s simple really: you either use existing data or collect your own. The lecture provided a list of available databases, such as IPUMS, which includes harmonized census data from Indonesia (yay, useful!). I also got a clearer understanding of panel studies, where a group of participants is tracked over a period of time.

More on distributions next week… feels good to be exercising my mathematical brain!

The power of data and R

This week I started a data analysis MOOC!

Review of Module 1 of Data Analysis for Social Scientists (MITx, edX) – Introductory Lecture.

The introductory lecture of the course was really about showing how data could be visualised and interpreted to draw (or be wary about drawing) conclusions about the world. Prof Duflo used a number of real-life examples that demonstrated the power of data really well. I especially liked the network diagram, possibly because I’ve never had the opportunity to organise data in this way! Perhaps we should do one to map the volunteers at WISE. Would anybody like to take up the challenge? 😉

This module also included introductory exercises in R, which we will be using throughout the 12-week course. I taught myself R two years ago to analyse the survey data for my dissertation, but I had not used it since then. This course is a great opportunity to revisit, within a structured learning environment, quantitative data analysis methods and programming. I just heard a claim about how, in journalism, 90% of everybody forgets 90% of everything in 90 days (h/t Ezra Klein show with Ben Thompson). I think the same principle applies to any knowledge or skill we learn. Our brains get rusty if we don’t keep revising or practising!

The course will explore regression and econometrics, design of experiments, randomized controlled trials (and A/B testing), machine learning, and data visualization. I’ve touched on these topics before (somewhere) but they now seem like a distant memory. I look forward to delving deeper in future weeks!