I’m confident!

Into statistics proper

Review of Module 6 of Data Analysis for Social Scientists (MITx, edX) – Assessing and Deriving Estimators – Confidence Intervals, and Hypothesis Testing

The material felt much easier to grasp compared to the previous weeks. Maybe because I had been regularly using confidence intervals and hypothesis testing, so the concepts were familiar to me. It would be interesting to have the class’s average scores to check if the topic is just easier for the class in general.

Here is a brief summary of what was taught this week:

Criteria to consider when assessing estimators are bias, efficiency (given by the mean squared error), consistency, ease of computation and robustness (to the underlying assumptions of the distribution).

Frameworks for deriving estimators are the method of moments, maximum likelihood estimation (MLE) and dreaming them up 😉 MLE is efficient, but can be biased and difficult to compute, and is not as robust as the method of moments.
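To make the trade-off between bias and efficiency concrete, here is a rough sketch using a Uniform(0, θ) distribution, where the method of moments gives twice the sample mean and the MLE gives the largest observation. The function names, θ = 10, the sample size of 20 and the number of repetitions are all my own assumptions for illustration.

```python
import random
import statistics


def mom_estimator(sample):
    """Method of moments for Uniform(0, theta): E[X] = theta / 2, so theta_hat = 2 * mean."""
    return 2 * statistics.fmean(sample)


def mle_estimator(sample):
    """Maximum likelihood for Uniform(0, theta): the largest observation."""
    return max(sample)


def simulate(estimator, theta=10.0, n=20, reps=5000, seed=0):
    """Return (bias, mse) of an estimator over repeated simulated samples.

    theta, n, reps and seed are illustrative values, not taken from any real data.
    """
    rng = random.Random(seed)
    estimates = [
        estimator([rng.uniform(0, theta) for _ in range(n)])
        for _ in range(reps)
    ]
    bias = statistics.fmean(estimates) - theta
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return bias, mse


if __name__ == "__main__":
    for name, est in [("method of moments", mom_estimator), ("MLE", mle_estimator)]:
        bias, mse = simulate(est)
        print(f"{name}: bias = {bias:.3f}, MSE = {mse:.3f}")
```

In this particular setup the MLE should come out biased downwards (the sample maximum can never exceed θ) but with a smaller mean squared error than the method of moments estimator, which is unbiased but noisier.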

A confidence interval quantifies how reliable an estimate is. Typically, it is constructed based on the normal or t-distribution. Correspondingly, hypothesis testing is used to assess whether there is enough evidence to contradict some assertion about a population, given a random sample from the population. A test can be characterised by its significance level and power, which relate to Type I and Type II errors respectively.
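As a small illustration of both ideas, the sketch below builds a confidence interval for a mean using the normal approximation and computes a two-sided p-value for a hypothesised mean. The helper names `normal_ci` and `z_test_p_value` and the simulated data are hypothetical; for small samples the t-distribution would give a slightly wider interval.

```python
import math
import random
import statistics
from statistics import NormalDist


def normal_ci(sample, level=0.95):
    """Approximate confidence interval for the mean, using the normal distribution."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return mean - z * std_err, mean + z * std_err


def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: population mean equals mu0 (normal approximation)."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.fmean(sample) - mu0) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Simulated data for illustration only: 50 draws from a normal(5, 2).
    rng = random.Random(42)
    sample = [rng.gauss(5.0, 2.0) for _ in range(50)]
    low, high = normal_ci(sample)
    print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
    print(f"p-value for H0: mean = 7: {z_test_p_value(sample, 7.0):.4f}")
```

Rejecting the null hypothesis whenever the p-value falls below a chosen significance level (say 0.05) caps the probability of a Type I error at that level; power is the probability of rejecting when the null is in fact false.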



What is sociology?

I finally understand what sociology is!

Review of Video #1 of Crash Course Sociology.

This was a great introductory video because I felt like I finally genuinely understood what sociology was, although given that sociology is so broad, I would probably confuse sociology with the other social sciences. I also realised that I may be a sociologist at heart, ever since I realised at university that engineering was not the solution to many communities’ problems.

Here’s a summary of some of the concepts I learned:

Sociology is the study of society and human behaviour at and across every level, done objectively through controlled and repeated observation. A society is a group of people who share a culture and territory. This makes sociology really broad, because culture influences everything and everything is a product of society.

Therefore, sociology looks for patterns in all kinds of things in all kinds of places. The sociological perspective – my first time hearing of this – comprises trying to understand social behaviour by placing it in its wider social context (the general in the particular) and approaching the everyday world as though you were seeing it for the first time (seeing the strange in the familiar), in order to uncover patterns of behaviour in a culture.

Key concepts on social location, marginalisation, as well as power and inequality were also introduced, and I’m sure we will delve into those in future videos!

Things get easier (or so I thought)

The week we started on statistics!

Review of Module 5 of Data Analysis for Social Scientists (MITx, edX) – Special Distributions, the Sample Mean, the Central Limit Theorem, and Estimation

This week’s lectures and finger exercises seemed much easier to grasp than the previous few weeks (phew!). Then I got to the homework and it was disastrous. I think this is one of those modules that require lots of practice to properly understand.

Here’s a brief summary of what I learned this week.

Human subject research

Nazi human experimentation and the Tuskegee Syphilis Study raised the issue of the ethics of conducting research on human subjects. According to the Belmont Report, research is defined as “any investigation conducted with the goal of creating generalisable knowledge”, which means studies conducted for internal use are not considered research. Criteria for appropriate human subject research are based on beneficence, justice and respect for persons.

At this point I would note that many developing country institutions do not have an ethics approval board. I do not know the exact reasons why, and I am sure they are complex, but it affects the quality of research and the ability to publish in well-known journals.

Special distributions

Esther went through the characteristics of the Bernoulli, Binomial, Hypergeometric, Poisson, Uniform, and Exponential distributions.

Sample mean and Central Limit Theorem

The sample mean is useful because it allows you to estimate characteristics of a phenomenon’s underlying distribution.

Regarding the distribution of the sample mean, the Central Limit Theorem states that as the sample size gets bigger, the distribution of the standardised sample mean approaches a normal distribution. This means that we do not need to know much about the distribution we are sampling from in order to know about the behaviour of the sample mean. (And I agree with Sara that this is pretty cool).
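A quick simulation makes this easier to believe. The sketch below draws samples from a skewed exponential distribution, standardises each sample mean, and checks what share falls within ±1.96, which would be about 95% for a standard normal. The function names, sample sizes and number of repetitions are my own choices for illustration.

```python
import math
import random
import statistics


def standardised_means(n, reps=5000, rate=1.0, seed=0):
    """Simulate `reps` samples of size n from an Exponential(rate) distribution
    and return the standardised sample means sqrt(n) * (mean - mu) / sigma."""
    rng = random.Random(seed)
    mu = sigma = 1.0 / rate  # mean and standard deviation of an exponential
    results = []
    for _ in range(reps):
        sample_mean = statistics.fmean(rng.expovariate(rate) for _ in range(n))
        results.append(math.sqrt(n) * (sample_mean - mu) / sigma)
    return results


def share_within(values, bound=1.96):
    """Fraction of values in [-bound, bound]; about 0.95 for a standard normal."""
    return sum(abs(v) <= bound for v in values) / len(values)


if __name__ == "__main__":
    for n in (2, 10, 100):
        share = share_within(standardised_means(n))
        print(f"n = {n:3d}: share within ±1.96 = {share:.3f}")
```

Even though the exponential distribution looks nothing like a bell curve, the standardised means should end up with a mean near 0 and a standard deviation near 1, and the share within ±1.96 should sit close to 0.95 for larger sample sizes.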


Statistics is the study of estimation and inference, where estimation refers to estimating the parameters that govern an observed stochastic process or phenomenon which we know or assume follows a certain distribution.

An estimator is a function of a random sample, while an estimate is the value that function takes for the sample actually observed.

Political economy of water and sanitation policy in developing countries

Review of Week 2 of Water Supply and Sanitation Policy in Developing Countries Part 1: Understanding Complex Problems (University of Manchester, Coursera)

The lectures started off by saying that people in the water and sanitation sector were ‘full of passion but short of strategy’, something that I tend to agree with. Water and sanitation is complex, and you can’t address the situation without understanding the root causes. Developing a careful description of the problem will give insights into the solution.

The lectures focused a lot on corruption, and Prof Whittington gave six reasons why corruption is a particular problem for water and sanitation. The reason I probably agree with the most in this context is the price inelasticity of demand for water. In my mind, this reduces the bargaining power of customers to negotiate a fair price, allowing service providers and related stakeholders to take advantage.

The studies on corruption were interesting. However, the latest paper cited was from 2004. I’d be interested to see if there were any more recent studies as I would assume that situations would have evolved, at least in some countries, due to changes in political and economic situations, technologies and so on.


Introducing key facts about water and sanitation services

Review of Week 1 of Water Supply and Sanitation Policy in Developing Countries Part 1: Understanding Complex Problems (University of Manchester, Coursera)

As I have an engineering background I think I tend to default to viewing water, sanitation and hygiene (WASH) issues from an engineering (and maybe pseudo-social) perspective. So I thought it was important to gain, through this course, a better policy perspective.

I thought it was great that they started off by emphasising that water and sanitation issues were complex, and it is interesting that a business school is teaching a course on water and sanitation policy!

The introductory week looked at key factors and current patterns of water and sanitation, but I had doubts about the data from the study they presented. I couldn’t access the full paper to check out the methodology, but I posted a question on the forum. If I do get the full paper I might review it in detail.

Clarissa Brocklehurst, formerly Chief of WASH at UNICEF, was interviewed and did acknowledge the shortcomings of the WHO/UNICEF Joint Monitoring Program (JMP) data, for example that existing data do not measure whether water is safe to drink. I am glad that she did because I am generally very skeptical of WASH data and statistics, having observed numerous broken water and sanitation facilities (which are not necessarily captured by the data) as well as the inability of surveys to collect accurate and nuanced information. (Disclaimer: I have not studied the JMP methodology in detail so my judgement may be harsh. Something I should probably do one day!)

What surprised me, though, was that there were only a handful of mostly part-time staff working on the JMP. I am definitely in favour of dedicating more resources to collecting high quality data on WASH!

Functions and equations weigh me down!

Hoping subsequent modules will be less dense!

Review of Module 4 of Data Analysis for Social Scientists (MITx, edX) – Functions and Moments of Random Variables and Intro to Regressions

I must say that I struggled with this module. There were theorems and equations that my rusty mathematics brain took time to process. Yet my current workload doesn’t afford me the patience to grasp the concepts in depth. (I’m not offering it as an excuse, just context). I mean, just try to get a handle on this:

  • Law of iterated expectations: The expectation of the expectation of Y given X is equal to the expectation of Y
  • Law of total variance: The variance of Y is equal to the variance of the expectation of Y given X added to the expectation of the variance of Y given X

(To be fair, it’s easier to understand by writing the equation.)
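Written out in standard notation, with X and Y as two random variables, the two laws look like this:

```latex
\begin{align}
  \mathbb{E}[Y] &= \mathbb{E}\big[\,\mathbb{E}[Y \mid X]\,\big] \\
  \operatorname{Var}(Y) &= \operatorname{Var}\big(\mathbb{E}[Y \mid X]\big)
                          + \mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]
\end{align}
```

In each case the outer expectation or variance is taken over X, and the inner one over Y with X held fixed.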


My sad score for the ‘Functions of Random Variables’ section

But at least, I think, I got the general gist. Much of the module was about transforming random variables and deriving their probability density functions (PDFs), followed by calculating moments such as the mean and variance, alongside other summaries like the median and mode. At the end Sara gave an introduction to covariance/correlation, and the link between probability, random variables and data analysis is getting clearer.



Soldiering on!