3S Methodology: Scalability

Summary of Week 4 of Technology Evaluation for Global Development (edX – MITx)

CITE defines scalability as a firm’s capability to expand and meet customer demand, taking into account its supply chain configuration, costs, constraints, context and risks. Scalability is important because it creates a supply of goods, provides returns to investors, and grows the economy. Evaluation findings can provide insights into novel ideas for processes and business models.


The unit of analysis is the product, distinguished by brand and/or model, while the scalability unit of analysis is the set of processes that span all the parties involved in fulfilling a customer’s request (the supply chain). The supply chain can be characterised as follows:

  • Parties: Suppliers, manufacturers, distributors, retailers, and consumers. Brand owners are typically manufacturers, but can be retailers and sometimes distributors.
  • Stages: Procurement (e.g. sourcing, procurement management), production (facilities, production planning), distribution (e.g. inventory echelon, warehouse management, transportation management), sales (sales channels, manufacturer representative, commercial, mission, distributor, broker), and after sales (warranty duration and type)
  • Flows: Material flow, financial flow and information flow

The scalability criteria for evaluating supply chains are:

  • Affordability (including landed cost, retail price, total cost of ownership, financing)
  • Accessibility (i.e. customer reach)
  • Availability (i.e. throughput capacity and inventory)
  • After sales service


CITE’s approach to product evaluation

Summary of Week 2 of Technology Evaluation for Global Development (edX – MITx)

Evaluation is needed due to an oversupply of products.

History of evaluation

Emerging trends in evaluation within global development include: an emphasis on quality and standards to ensure that research and monitoring and evaluation are done effectively; capacity building of developing-country governments and entities so that they can evaluate; and a focus on impact as well as process evaluations. More experimental and non-experimental approaches are also being developed.

Criteria, Metrics, Weightings

To make an informed decision, consider who needs the information, which criteria are needed to make the decision, and how performance will be measured. The criteria (and sub-criteria) should be brought together in an easy-to-understand format that helps facilitate a decision.

The products chosen for evaluation are critical. Not everything has to be included, but market leaders and some emerging technologies should not be left out. Design of experiments, a discipline concerned with planning experiments and analysing data, can inform the choice of products.

Scoping studies

A scoping study aims to quickly ramp up knowledge of the product family and context, narrow the scope of the evaluation, identify the products that are available in the evaluation study area, and define the metrics that will be tested in the evaluation. It has seven components: detailed product description, research questions, target users, major stakeholders, context study, use cases, and criteria and metrics.

A scoping study can have both desk-based and field-based components.

Endogeneity, Instrumental Variables, and Experimental Design

Review of Module 11 of Data Analysis for Social Scientists (MITx, edX) – Endogeneity, Instrumental Variables, and Experimental Design

Endogeneity problems can occur when there is simultaneous causality (i.e. the outcome variable affects the regressor of interest). Health and exercise are an example: exercise affects health, but health also affects how much people exercise.

Instrumental variables are a way to indirectly measure causal relationships. For example, randomly assigned scholarships can be used as an instrument for education. One challenge with using instrumental variables is that the instrument should not have a direct effect on the outcome. For example, it can be argued that scholarships create confidence which then, together with years of education, increases test scores.
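
To make the scholarship example concrete, here is a toy simulation (all coefficients and variable names are my own invention, not from the course). Unobserved “ability” drives both education and test scores, so naive OLS is biased, while the Wald estimator using the randomly assigned scholarship recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: "ability" raises both
# education and test scores, making education endogenous.
ability = rng.normal(size=n)
scholarship = rng.integers(0, 2, size=n)            # randomly assigned instrument
education = 12 + 2 * scholarship + ability + rng.normal(size=n)
score = 10 + 3 * education + 5 * ability + rng.normal(size=n)  # true effect = 3

# Naive OLS slope of score on education: biased upward by omitted ability.
ols = np.cov(score, education)[0, 1] / np.var(education, ddof=1)

# Wald/IV estimate: ratio of covariances with the instrument.
iv = np.cov(score, scholarship)[0, 1] / np.cov(education, scholarship)[0, 1]
```

With this design the IV estimate lands near the true coefficient of 3, while the OLS slope is pulled well above it by the omitted ability term.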

When designing experiments, things to think about are: what is being randomised; who is being randomised; how is randomisation introduced; and how many units are being randomised. Randomisation could be simple, through stratification or by clustering. Experimental designs include phase-in, randomising at the cutoff, encouragement design, etc.
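
As a minimal sketch of one of these choices, the helper below (the function name and stratum labels are hypothetical) implements stratified randomisation: units are shuffled within each stratum and half are assigned to treatment, which guarantees balance on the stratifying variable:

```python
import random
from collections import defaultdict

def stratified_assignment(units, stratum_of, seed=42):
    """Randomly assign half of each stratum to treatment.

    units      : list of unit identifiers
    stratum_of : dict mapping unit -> stratum label (e.g. region)
    Returns a dict mapping unit -> 'treatment' or 'control'.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of[u]].append(u)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)              # randomise within each stratum
        half = len(members) // 2
        for u in members[:half]:
            assignment[u] = "treatment"
        for u in members[half:]:
            assignment[u] = "control"
    return assignment

units = list(range(8))
stratum_of = {u: ("north" if u < 4 else "south") for u in units}
assignment = stratified_assignment(units, stratum_of)
```

Simple randomisation would just shuffle all units together; clustering would assign whole groups (e.g. villages) rather than individuals.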

Machine Learning and Data Visualisation

No homework (secret cheer)

Review of Module 10 of Data Analysis for Social Scientists (MITx, edX) – Intro to Machine Learning and Data Visualisation

This week’s topics were very interesting. In particular, I gained a better technical understanding of machine learning (prediction) versus estimation. And I love making pretty graphs so the data visualisation lecture was fun too!

Machine learning

Traditionally, the artificial intelligence approach to computation problems has been to imitate how humans complete the task (e.g. sentiment analysis in speech). This approach stalled because of the subtleties and variations involved.

Machine learning takes a very different approach. It turns any “intelligence” task into an empirical learning task by specifying what is to be predicted and what is used to predict it. Applications of machine learning include image classification, visual recognition and speech interpretation. And it can also be useful for constructing measures of unobservable characteristics (e.g. measuring corruption) and for designing policies that rely on our ability to predict (e.g. poverty scorecards).

Unlike estimation, the coefficients obtained from machine learning are not meaningful in themselves. Machine learning algorithms do not provide unbiased, consistent estimators. However, they can still provide clues as to which variables are meaningful for estimation.


Data visualisation


The graphical representation of data is important, especially for communicating results. When people read papers, they tend to look at the graphs first because they are attractive, so graphs should be interpretable on their own. Besides communicating results, data visualisation also helps guide the analysis.


Robert Kosara defined data visualisation as follows:

  • Based on (non-visual) data
  • Produces an image
  • Results must be readable and recognisable

During the lecture, Prof Duflo discussed principles of good data visualisation and common mistakes.

Practical issues and omitted variable bias

Review of Module 9 of Data Analysis for Social Scientists (MITx, edX) – Practical Issues in Running Regressions, and Omitted Variable Bias

It has been challenging to fully understand the technical concepts taught in this course, as well as use R to complete the homework, given my intense workload. So, I have settled for understanding the general ideas, and I hope to revisit this (or a similar course) again in the future. Anyway, since I have already gained enough credit to pass, there is no need to work too hard 😛 (joking!)

A random selection of what I learnt this week…

Most statistical packages will provide the F-test (that all coefficients are jointly zero) and t-tests (that each individual coefficient is zero). The standard error is also provided; this allows a confidence interval for each coefficient to be constructed.
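
These quantities can also be computed by hand, which I find demystifies the package output. A sketch with simulated data (numpy only; the 1.96 critical value is the normal approximation for a 95% interval):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)        # true slope = 0.5 (simulated)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS coefficients

resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])              # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # standard errors

t_stats = beta / se                                     # t-test of coef = 0
ci_low, ci_high = beta - 1.96 * se, beta + 1.96 * se    # ~95% CI
```

The t-statistic on the slope comes out large here, so the zero-slope null is comfortably rejected.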

Prof Esther discussed some practical issues in running regressions, including regressions with categorical variables and interaction effects. With her examples, I got a much better feel for how to interpret linear regressions.

It is possible to use a linear regression framework even when the relationship between the independent and dependent variables is non-linear. For example, polynomial terms can capture non-linear relationships while keeping the model linear in its coefficients. Regression discontinuities were also discussed.
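
For instance, a model that is linear in its coefficients can still fit a quadratic relationship by including x² as a regressor (simulated data and coefficients below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 300)
y = 1.0 - 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)  # quadratic truth

# "Linear" regression on [1, x, x**2]: linear in the coefficients,
# non-linear in the original variable.
X = np.column_stack([np.ones_like(x), x, x**2])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]
```

The fitted coefficient on x² recovers the true curvature of −2 almost exactly.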

Finally, omitted variable bias occurs when a model created incorrectly leaves out one or more important factors. The “bias” is created when the model compensates for the missing factor by over- or underestimating the effect of one of the other factors (Source: Wikipedia).
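
A quick simulation (with invented coefficients) shows this compensation at work: omitting w inflates the estimated effect of x by roughly γ·cov(x, w)/var(x), the classic omitted-variable-bias formula, where γ is the effect of the omitted factor:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
w = rng.normal(size=n)                 # the omitted factor
x = 0.8 * w + rng.normal(size=n)       # regressor correlated with w
y = 1.0 + 2.0 * x + 3.0 * w + rng.normal(size=n)

# Short regression of y on x alone, omitting w.
short_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Bias formula: true slope + gamma * (regression of w on x).
delta = np.cov(x, w)[0, 1] / np.var(x, ddof=1)
expected = 2.0 + 3.0 * delta
```

The short-regression slope lands near 3.5 rather than the true 2: the model overestimates the effect of x because x stands in for the missing w.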

Single and multivariate linear models

Review of Module 8 of Data Analysis for Social Scientists (MITx, edX) – Single and multivariate linear models

Estimating the parameters of joint distributions can be used for prediction, determining causality and just understanding the world better. In linear regression, the regression coefficients can be estimated by using least squares, least absolute deviations or reverse least squares. By performing an analysis of variance, we can get a measure of the goodness-of-fit of the regression obtained. Linear regressions can also be used for non-linear relationships.
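
The analysis of variance mentioned above decomposes total variation into an explained and a residual part, and R² is their ratio. A small sketch with simulated data (my own example, not from the course):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - y_hat) ** 2)          # residual sum of squares
r_squared = ess / tss                   # goodness of fit
```

With a slope of 2 and unit noise, about 80% of the variance is explained, and TSS = ESS + RSS holds by construction.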

In the lectures, Prof Sara discussed the single and multivariate linear models and their assumptions in detail, but I will not get into that here!

I was on the road this week and rushed to get the module done. So, I didn’t quite absorb everything, but I think the general concept of fitting relationships between variables is quite straightforward and the homework was not too challenging (unlike the week on Functions of Random Variables!).

Randomisation is not a substitute for thinking

As quoted from Prof Duflo during the lecture…

Review of Module 7 of Data Analysis for Social Scientists (MITx, edX) – Causality, Analysing Randomised Experiments, and Nonparametric Regression

The lectures were very interesting and I thought I had grasped the overall concepts. However, I found the homework was extremely difficult to work through. I think not having problem sets to work through makes it hard to be comfortable/confident with the calculations.

Here’s a summary of what I learnt from the lectures:


We make causal statements all the time. Causality may be thought of as the effect of manipulating a cause, where we compare (our best approximation of) what would have happened absent that cause with what actually happened. The Rubin causal model, for example, considers potential outcomes, which forces us to think about the counterfactual. The fundamental problem of causal inference is that at most one of the potential outcomes can be realised, which means we are missing the data on all the other potential outcomes. Complete randomisation eliminates selection bias, where there are underlying differences between those in the treatment group and those in the control group.
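
A small simulation (all numbers invented) makes the selection-bias point concrete. When units with worse untreated outcomes self-select into treatment, the naive comparison of means can even get the sign of the effect wrong, while random assignment recovers it:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
y0 = rng.normal(loc=50, scale=10, size=n)   # potential outcome untreated
y1 = y0 + 5                                 # potential outcome treated (true effect = 5)

# Self-selection: units with low y0 are more likely to take the treatment.
takes = y0 + rng.normal(scale=10, size=n) < 50
naive = y1[takes].mean() - y0[~takes].mean()          # biased by selection

# Random assignment breaks the link between treatment and potential outcomes.
assigned = rng.random(n) < 0.5
randomized = y1[assigned].mean() - y0[~assigned].mean()
```

Here the naive difference is negative even though the true effect is +5, because the treated group started out worse off; the randomised comparison is close to 5.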

Analysing randomised experiments

Even without knowledge of regression, it is easy to analyse completely randomised experiments through the Fisher exact test and Neyman’s approach. In designing an experiment, the power calculation helps us determine the sample size required, although there are many assumptions involved!
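
The standard normal-approximation power formula can be sketched in a few lines using only the Python standard library (the function name and defaults are my own):

```python
from statistics import NormalDist

def sample_size_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Approximate n per arm for a two-sample comparison of means:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect) ** 2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)           # quantile for desired power
    return 2 * ((z_alpha + z_power) * sd / effect) ** 2

n = sample_size_per_arm(effect=0.2, sd=1.0)   # detect a 0.2 SD effect
```

Detecting a 0.2 standard-deviation effect at 80% power needs roughly 390 units per arm, which illustrates why small effects require large samples.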

Randomised controlled trials (RCTs) are considered the gold standard and have traditionally been used in clinical trials. In practice, there are incentives for selective reporting; mitigating solutions include registries and pre-analysis plans. RCTs also have a long history in the social sciences, and randomisation is also used in web design and marketing.

Non-parametric regressions

Kernel regression is one common way to estimate the relationship between two variables without assuming a functional form.
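
A minimal Nadaraya-Watson sketch with a Gaussian kernel (my own toy example): the estimate at each point is a locally weighted average of y, with weights that decay with distance in x:

```python
import numpy as np

def kernel_regression(x_grid, x, y, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel."""
    x_grid = np.asarray(x_grid)
    # Weight of each observation for each evaluation point.
    weights = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (weights * y).sum(axis=1) / weights.sum(axis=1)

rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, size=2000)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)   # noisy sine curve
fit = kernel_regression([0.0, np.pi / 2], x, y, bandwidth=0.3)
```

The bandwidth controls the bias-variance trade-off: a wider kernel gives a smoother but more biased fit, a narrower one a noisier but more flexible fit.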