
Some More Reflections on RCTs


In preparation for next year’s elections, the Swedish government recently chose to replace the Minister for International Development Cooperation. During her long mandate, former Minister Gunilla Carlsson championed the importance of aid evaluation and a focus on results, and turned aid from a matter of quiet consensus into a hotly debated topic. She also closed down the aid evaluation agency SADEV, following the publication of critical reviews of the agency’s work. Now, an expert group is in charge of rethinking and redesigning development policy evaluation and planning. One of the tools under consideration is randomized controlled trials (RCTs), an area in which Swedish development cooperation has no previous experience. Here are some reflections on RCTs.

In recent years, the methods of development economics have been fundamentally altered by the introduction of randomized controlled trials (RCTs). The idea behind RCTs is that development policies can be evaluated in the same way as clinical trials in medicine, where subjects are randomly assigned either to receive a treatment or to serve as a reference, or control, group. The main benefit of this approach is that random assignment allows the effect of the treatment (that is, the policy in question) to be estimated without bias from unobservable confounding factors or selection issues (see Banerjee and Duflo (2008) for more on the advantages of the method).
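To make the logic of random assignment concrete, here is a minimal simulation under made-up assumptions. Each unit has an unobserved trait (think of it as motivation) that raises the outcome, and the true treatment effect is set to a hypothetical value of 2. When units select themselves into treatment based on that trait, a simple difference in means is badly biased. When a coin flip decides, the same estimator recovers the true effect. All names and numbers are illustrative and are not taken from any study.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # hypothetical effect of the "policy", chosen for illustration


def outcome(trait: float, treated: bool, rng: random.Random) -> float:
    """Outcome driven by an unobserved trait, the treatment and noise."""
    return 3.0 * trait + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0.0, 1.0)


def random_assignment(trait: float, rng: random.Random) -> bool:
    """Coin flip: treatment status is independent of the unobserved trait."""
    return rng.random() < 0.5


def self_selection(trait: float, rng: random.Random) -> bool:
    """Units with a higher unobserved trait are more likely to opt in."""
    return trait + rng.gauss(0.0, 1.0) > 0.0


def difference_in_means(n: int, assign, seed: int = 0) -> float:
    """Simulate n units and return mean(treated outcome) - mean(control outcome)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        trait = rng.gauss(0.0, 1.0)  # never observed by the evaluator
        is_treated = assign(trait, rng)
        y = outcome(trait, is_treated, rng)
        (treated if is_treated else control).append(y)
    return statistics.mean(treated) - statistics.mean(control)


rct_estimate = difference_in_means(20_000, random_assignment)
naive_estimate = difference_in_means(20_000, self_selection)
print(f"true effect:           {TRUE_EFFECT:.2f}")
print(f"randomized assignment: {rct_estimate:.2f}")
print(f"self-selection:        {naive_estimate:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect by roughly 3.4 in expectation. That bias is the treated group's advantage in the unobserved trait multiplied by the trait's weight in the outcome. Randomization removes it because the trait is, on average, balanced across the two groups.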

The diffusion of experimental methods in development economics has undoubtedly been a revolution in the academic world and, if not yet fully, in the policy world. In the blogosphere there has even been talk of awarding Sveriges Riksbank’s Prize in Economic Sciences in Memory of Alfred Nobel, informally called the Nobel Prize in Economics, to the MIT couple Abhijit Banerjee and Esther Duflo. Given their young age and how recent their contribution is, this would be a “shock” prize meant to send a strong signal. Their creation, the Abdul Latif Jameel Poverty Action Lab (J-PAL), stands for a new approach to both scientific and policy work in development; it is a fantastic contribution and can fairly be called seminal.

However, it might be too early for the profession to sanction a method that has much to show for itself but also potentially undesired consequences. The camp of critics includes heavyweights such as Angus Deaton and Dani Rodrik of Princeton, and the World Bank’s Philip Keefer and Martin Ravallion. The core of their position is, of course, not to deny the merits of RCTs but to advocate their use in the right way and, in particular, as one tool among many, with important complementarities to the others.

Some points in this context are often made, well understood and widely accepted: the limits of the approach per se, in particular the problem of external validity (how generally applicable the findings from such studies are); the conflict between short-run and long-run implications, especially in some policy areas (support to institution-building, among others); and the incentives of policy actors. Another brief in this series by Anders Olofsgård spells out these points very clearly and provides references for further reading.

One aspect I find missing in the debate is a reflection on the impact this new method has on the three main actors involved, namely development researchers, development practitioners (and their ways of working), and the people living in the countries and regions where these studies take place. This will therefore be the focus of this brief.

The Impact on the Scholarly Profession

The creation of experimental infrastructures and the popularity of the RCT methodology have rubbed off on the rest of empirical practice in development economics and beyond, with ever-increasing demands and expectations on the econometric identification of new studies. However, when it comes to what is possibly the main weakness of RCTs compared with most observational studies, namely external validity, the corresponding demands and expectations on how it is dealt with seem to lag behind. As pointed out in Rodrik (2008), it is enough to compare the number of pages spent describing the identification strategy in an average observational study with the number spent on external validity in an average RCT-based paper. If the purpose is to learn “what works in development”, as opposed to “what worked once for a set of 25 primary schools in Uttar Pradesh faced with high dropout rates” [1], it is natural to expect a researcher who really wants to serve this purpose to show how general her findings are. Without generality, the findings may be of limited practical use to politicians and practitioners who need to choose a policy tool or make a decision in conditions that are likely to differ from the exact setting of the study.

During a recent presentation by one of the most active and prominent RCT researchers, the researcher stated at some point: “[t]his intervention was never thought for scaling up as a policy.” That made me pause. What is the purpose, then? In my view, these studies should fit into a “bigger-picture” understanding, or at least a hypothesis, of how development works, what the binding constraints and open challenges are, what might help overcome them, and how to proceed from there. Once some candidate policies are identified, RCTs might, depending on the setting, be used to evaluate and compare them before and after the preferred policy is implemented. Unfortunately, this attitude is far from common, beyond what has become standard in the ‘Introduction’ paragraphs.

Quite often RCT studies are extremely precise and accurate about “the impact of X on Y”, even when the effects are very small, but may be somewhat vague, or face larger uncertainties, on the ‘bigger’ question. A related consequence is that many more general (and very relevant) questions are not addressed by development economists simply because an RCT is not feasible. An example mentioned in a recent keynote lecture by David Laitin is the BetterBirth Project. This program, built around a WHO checklist, seems to be making a big difference for infant and maternal health in India’s poorest states through a list of 29 easy, low-cost, low-technology and well-known practices. The main lesson drawn by observers at the Harvard School of Public Health is that people follow the list more closely when it is spread through “human contact”. No mass media advertising campaign, no punishment or incentive schemes, just “nice” people visiting, explaining and demonstrating the list while, in the words of an interviewed nurse, “smiling a lot”. At first sight, this seems like something that could be randomized. However, the treatment is so diffuse and fuzzy that implementing it in practice would be very challenging. The person who meets the clinics’ personnel and spreads the information may have to act as a mentor for the transition to happen: be kind and pedagogical, repeat the visits indefinitely to make sure that the practices have been adopted, and do whatever else it takes for the staff to learn. All of this is very hard to observe with precision. Simply defining X as “presentation of the list in person” and comparing it with, for example, “diffusion of the list through an information campaign” would probably risk severely underestimating the impact, because it would bundle together different types of informers and different levels of human interaction. Such a study would therefore run a high risk of producing zero or insignificant results.

An RCT would need to be complemented by other investigations, for example surveys, in order to find out whether there really was an effect and how it came about. All of the above is likely to undermine the chances of publishing an academic paper on the issue, thereby discouraging development scholars from studying this program.

There are two main ways of extending the RCT methodology toward generalizability and external validity: the elbow-grease approach of replication, and a renewed concern for theoretical mechanisms. Replication studies are not very appealing from the perspective of a scholar who aspires to academic publications. Beyond clever new designs that establish a causal link in a specific case (and possibly, for each of them, a corresponding study establishing the absence of such a link in a different setting), journals have little interest in publishing more variations on the same theme. Replications with small variations should instead be highly attractive to development institutions and practitioners, precisely because, as mentioned above, they want to learn about the effectiveness of alternative strategies in as many specific contexts as possible. [2] In an ideal world, development institutions and aid bureaucracies would work in close cooperation with universities and academic institutions, involving young researchers (perhaps Ph.D. students?) in the design and evaluation of as many of their planned interventions as possible, before these researchers reach their career-concern-stress phase. Moreover, in an ideal world this would be reward enough for the young researchers. This wealth of replications would then make it possible to “take stock” and really learn some general truths. I do not, however, have a good recipe for making this happen.
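“Taking stock” across many replications is, in practice, a meta-analysis problem. The sketch below uses one standard option, which the brief itself does not prescribe. It pools hypothetical replication estimates with inverse-variance (fixed-effect) weights and reports Cochran's Q and the I² statistic. These two statistics show how much the estimates disagree beyond what sampling noise alone would explain. A high I² is informative in its own right: it signals that the effect depends on the setting, which is exactly the external-validity question. The estimates and standard errors in the example are invented.

```python
import math


def pool_replications(estimates, std_errors):
    """Fixed-effect (inverse-variance) pooling of replication estimates.

    Returns the pooled estimate, its standard error, Cochran's Q and I^2.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to heterogeneity across settings.
    i_squared = 0.0 if q <= df else (q - df) / q
    return pooled, pooled_se, q, i_squared


# Hypothetical replications of the same intervention in five settings.
effects = [0.12, 0.10, 0.02, 0.15, -0.01]
ses = [0.03, 0.04, 0.03, 0.05, 0.04]
pooled, pooled_se, q, i2 = pool_replications(effects, ses)
print(f"pooled effect {pooled:.3f} (SE {pooled_se:.3f}), Q = {q:.1f}, I^2 = {i2:.0%}")
```

A fixed-effect average assumes one common true effect. When I² is large, as it is with these invented numbers, a random-effects model, or better still a theory of why settings differ, is the more honest summary.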

Fortunately, some scholars are meanwhile working on swinging the pendulum back from pure empiricism toward engagement with theory. Here is a list of questions worth reflecting on, starting from a given RCT:

–       The macro problem. How does the estimated effect compare to the “bigger issue”, the one that most likely set the scene in the ‘Introduction’ paragraph of the study? Few studies return to this point after presenting their results. Numerical simulations or structural estimation of theoretical models might help answer this question (see examples in Buera et al. (2011) and Kaboski and Townsend (2011)).

–       The alternative hypothesis. What is the particular intervention compared against? If the set of circumstances or policy-relevant parameters that might be varied is too large or too dense for replications, perhaps a theoretical model can help vary them in a smooth and continuous way.

–       The strategic reaction. How are the economic agents involved likely to respond if the intervention is expanded in space, time or both? How would they have responded in the absence of the intervention?

The Impact on Development Practices

As stated above, RCTs may be a powerful tool for learning and decision-making in development institutions, public or private. However, this assumes a seldom-questioned willingness on their part to learn and change practices. Brigham et al. (2013) show, through an RCT, that these organizations might be subject to confirmation bias. They sent microfinance institutions an invitation offering a partnership to evaluate their programs. The invitation was randomly accompanied either by a survey of previous studies finding a positive impact of microcredit or by a survey of studies finding no impact. The second treatment elicited barely half as many responses as the first, which suggests that at least this type of organization might not be very interested in learning whether what it does is effective or can be improved. Coupled with the publication bias mentioned above, this might skew the distribution of reported, published and established findings even further.

The Impact on the Local Context

Individual studies can of course be affected by the so-called Hawthorne effect, or experimenter effect: the phenomenon by which the act of being experimented upon changes a subject’s behavior. It was first observed in industrial psychology studies in the 1920s and takes its name from the Hawthorne Works factory where those studies were carried out. Although it is hard to establish, it has for decades been a central criticism of the “participant observation” methodology in anthropology and ethnography. Behavioral economists, who more recently started using experiments both in the lab and in the field, are also explicitly careful about it.

Depending on the definition of causality that the researcher has in mind, the fact that knowing one is being treated affects outcomes might not be an issue at all for measuring the overall effect of an intervention. The overall effect should also include the (optimal) reaction of the agents (for example, a change in behavior or the adoption of complementary inputs), and this is actually considered one of the advantages of the method. However, it raises problems for interpreting the size of the effect and analyzing the channels that bring it about. This point is made very clearly by Bulte et al. (2012), who compare a double-blind RCT with a regular one. If all or most of the effect simply comes from participants knowing that they are “treated” and reacting to it, will the effect still be there when the intervention becomes a regular policy? Most authors and critics alike ignore this important question.
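The interpretation problem can be put as simple arithmetic. Assume, for simplicity, that effects are additive and that a double-blind design fully removes the response to knowing one is treated. These assumptions are ours, not necessarily those of Bulte et al. Under them, the gap between the two estimates measures the part of the regular-RCT effect that comes from participants' awareness. The class and method names are hypothetical, and the numbers are illustrative, not results from that study.

```python
from dataclasses import dataclass


@dataclass
class TrialEstimates:
    """Treatment effects from two versions of the same experiment (hypothetical)."""
    regular: float       # participants know whether they are treated
    double_blind: float  # neither participants nor field staff know who is treated

    def awareness_component(self) -> float:
        """Part of the regular-RCT effect attributed to knowing one is treated."""
        return self.regular - self.double_blind

    def awareness_share(self) -> float:
        """Awareness component as a share of the regular-RCT effect."""
        if self.regular == 0:
            raise ValueError("regular-RCT effect is zero; share is undefined")
        return self.awareness_component() / self.regular


# Illustrative numbers only: a gain of 0.30 (arbitrary units) in the regular trial,
# of which only 0.05 survives when participants cannot tell who is treated.
est = TrialEstimates(regular=0.30, double_blind=0.05)
print(f"awareness component: {est.awareness_component():.2f}")
print(f"share of regular-RCT effect: {est.awareness_share():.0%}")
```

With these made-up numbers, most of the measured effect would come from awareness. That is the case in which the question of what happens once the intervention becomes routine policy matters most.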

Beyond the perspective of a single study, a different concern comes to mind when considering how geographically clustered a substantial number of RCT studies are. The map below shows a snapshot of the J-PAL interventions in Africa and Asia, which are only a fraction, albeit a substantial one, of the total.

Figure 1. J-PAL Interventions in Africa and Asia


Reading study after study set in Kenya, or in some Indian state, I wonder whether people there are starting to get used to private organizations going around giving away assets, or to temporary local government programs with funky benefit schemes. To my knowledge, no study has yet reflected on the aggregate impact of experiments and randomized interventions in an area that hosts many of them. Might exposure to many treatments eventually result in “experimental fatigue”, or practice effects, which could influence the results of the studies and make their findings difficult to interpret?

Even more worrisome, given the frequency of and the resources involved in these interventions, perhaps we should expect an impact on the local political economy. As a parallel, I think about the agrarian reform and the later establishment of the welfare state in post-war Italy, and how they gave major local actors the ability to uphold their clientelistic systems. The newly established rights and entitlements, the various benefits and redistribution programs, were ”filtered” by the local elites and channeled through the traditional ties of family, kinship, friendship and neighborhood. According to comparative analyses of European welfare regimes, clientelism exists, in different forms and intensities, in all Mediterranean welfare states, and it appears to be linked to the process of political mobilization and the establishment of welfare state institutions in these nations.

A recent study by Ravallion et al. (2013) finds that unemployed people fail to act on information about the National Employment Guarantee Scheme (NEGS) in India. The authors hypothesize that the bottleneck lies with the local government institutions (Gram Panchayats, GPs). The GPs are supposed to receive the applications and apply for central government resources for planning and implementing projects, so as to guarantee 100 days of work per year to every rural household whose adult members are willing to do unskilled manual labor at the statutory minimum wage. But perhaps, the authors argue, given the strict controls on corruption, GP officials do not find anything in it for themselves and hence do not proceed. Of course, this is just one of the possible explanations, and moreover the NEGS is not an RCT. But in general, the involvement of local official or unofficial power structures in contexts where interventions of this type are increasingly common could be usefully related to the “Mediterranean welfare state” hypothesis outlined above. The idea definitely deserves investigation.


The popularity of RCTs among development scholars is finally spreading to practitioners. This is mostly good news: there is much to gain and learn from this approach, especially in contexts where it is grossly underexploited, as has been the case in Sweden. However, a near-monopoly for this approach is not warranted, given its non-negligible limitations, which are often downplayed in light of its numerous strengths. Spurring development “one experiment at a time” might take unnecessary extra time and effort, and bring about other undesirable consequences. Neither development scholars nor practitioners should forget the other arrows in their quiver.


  • Banerjee, A. and E. Duflo (2008). “The Experimental Approach to Development Economics.” NBER Working Paper 14467.
  • Brigham, M., M. Findley, W. Matthias, C. Petrey and D. Nelson (2013). “Aversion to Learning in Development? A Global Field Experiment on Microfinance Institutions.” Technical Report, Brigham Young University, March.
  • Buera, F. J., J. P. Kaboski and Y. Shin (2011). “The Macroeconomics of Microfinance.” BREAD Working Paper.
  • Bulte, E., L. Pan, J. Hella, G. Beekman and S. di Falco (2012). “Pseudo-Placebo Effects in Randomized Controlled Trials for Development: Evidence from a Double-Blind Field Experiment in Tanzania.” Working Paper.
  • Kaboski, J. P. and R. M. Townsend (2011). “A Structural Evaluation of a Large-Scale Quasi-Experimental Microfinance Initiative.” Econometrica 79, 1357–1406.
  • Olofsgård, A. (2011). “What Do Recent Insights From Development Economics Tell Us About Foreign Aid Policy?” FREE Policy Brief Series, October 3.
  • Ravallion, M., et al. (2013). “Try Telling People Their Rights? On Making India’s Largest Antipoverty Program Work in India’s Poorest State.” Department of Economics, Georgetown University, Washington, DC.
  • Rodrik, D. (2008). “The New Development Economics: We Shall Experiment, but How Shall We Learn?” Harvard Kennedy School Working Paper No. RWP08-055.

[1] The example is fictitious. Any resemblance to real studies is unintended and purely coincidental.

[2] At least in theory; this point is discussed further in the next section.

New Tools to Fight Corruption and the Need for Complementary Reform


Corruption remains a serious problem for most developing countries, undermining state capacity and incentives to invest, as well as social cohesion and democratic institutions. It is also an increasingly important problem for many highly developed countries. In Italy, for example, corruption has increased over the last decades, and parliament is now finally struggling to pass a (rather mild) “anti-corruption law”. Even in Sweden, a country consistently ranked among the least corrupt in the world, the problem seems to be increasing according to a recent report by the Agency for Public Management (Statskontoret), which also suggests that current legislation needs to be improved, for example by offering some form of protection to whistleblowers.

In most Central and Eastern European countries, however, the problem appears particularly serious. Corruption seems to have increased rapidly in the region over the last decade (The Economist, April 11, 2011; Nations in Transit, 2001–2012 editions), although there are some virtuous exceptions (for example, Georgia and Estonia).

Corruption is often both a consequence of, and an instrument for, political developments toward autocracy, such as those recently observed in some of these countries (limits on judicial autonomy, democratic participation and the free press). This suggests that in countries where such political developments are taking place, we may expect the corruption problem to worsen further in the coming years.

A country that apparently takes the fight against corruption seriously is India, where a strong grassroots anticorruption movement has developed. The issue has become central in recent political debates, and several proposals have been put forward and debated in parliament. Among these is one by Kaushik Basu, the finance minister’s Chief Economic Advisor. For a specific class of bribes, those paid to obtain a service to which one is entitled, he suggests treating bribe paying as legal while doubling the sanctions against bribe taking (Basu 2011). The logic behind the proposal is to create stronger incentives for bribe payers to report to law enforcers and expose corrupt civil servants: reporting should lead to the restitution of the bribe, in addition to the conviction of the bribe taker.
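The incentive logic of the proposal can be sketched with a deliberately stylized payoff comparison for someone who has already paid a bribe for a service they were entitled to. The assumptions are ours, not Basu's. A report leads to conviction with some probability, and conviction triggers restitution of the bribe. Reporting has a fixed cost (time, hassle, fear of retaliation). Under the status quo, the payer is also sanctioned whenever the taker is convicted. The class name and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReportingChoice:
    """Stylized choice of a bribe payer deciding whether to report (hypothetical)."""
    bribe: float           # amount paid for a service one is entitled to
    p_conviction: float    # probability that a report leads to conviction
    reporting_cost: float  # time, hassle and perceived risk of reporting
    payer_sanction: float  # payer's own sanction on conviction (0 if paying is legal)

    def net_gain(self) -> float:
        """Expected net gain from reporting, relative to staying silent."""
        refund = self.p_conviction * self.bribe
        own_sanction = self.p_conviction * self.payer_sanction
        return refund - own_sanction - self.reporting_cost

    def reports(self) -> bool:
        return self.net_gain() > 0


# Hypothetical numbers; the only difference is the legal status of bribe paying.
status_quo = ReportingChoice(bribe=100, p_conviction=0.6, reporting_cost=20, payer_sanction=200)
asymmetric = ReportingChoice(bribe=100, p_conviction=0.6, reporting_cost=20, payer_sanction=0)

for label, case in [("symmetric sanctions", status_quo), ("Basu-style asymmetry", asymmetric)]:
    print(f"{label}: expected gain {case.net_gain():+.0f}, reports: {case.reports()}")
```

In this toy setting, making bribe paying legal flips the payer's expected gain from reporting from negative to positive. A credible threat of being reported is what then raises the bribe taker's expected sanction.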

Since this proposal was made last year, there has been a lively debate both in India and internationally. The debate has, however, been rather informal and has involved some (voluntary and involuntary) misunderstanding of the proposal (see Dufwenberg and Spagnolo 2011 for a short account of this debate). The proposal was described as “radical” by its proponent and has sometimes been treated, and dismissed, as a theoretical curiosity. In fact, it is similar to legal provisions against corruption that have been in place for quite some time in several countries. It is also related to other legal provisions widely used around the world to fight related forms of illegal transactions: first and foremost the leniency policies now used by most antitrust authorities to fight price-fixing cartels, but also accomplice-witness amnesty and protection programs against mafia-like criminal organizations (see Spagnolo 2008 for an overview).

We know from academic research on these related revelation schemes that they can be very powerful if appropriately designed and administered, but that they may fail, or even be counterproductive, if they are poorly designed or run (see e.g. Spagnolo 2004, Buccirossi and Spagnolo 2006, Apesteguia et al. 2007, Miller 2009, Bigoni et al. 2009). The exact details of how these subtle mechanisms are designed and then implemented are crucial to their success.

Asymmetric Sanctions, Leniency and Whistleblowers

As mentioned earlier, the main idea behind Basu’s proposal for India, treating partners in corruption asymmetrically, is not a theoretical curiosity. It is already present in milder form in Russian, Japanese and German (violation-of-duty) legislation, where bribe payers face lower sanctions than bribe takers, as well as in the way prosecutorial discretion is used in Anglo-Saxon countries. An analogous provision also seems to have been introduced in China in 1997, and its effectiveness has recently been questioned by some observers, although only superficially. Unfortunately, we have no serious evidence of how these laws have affected corruption.

More generally, the idea of deterring a collaborative crime by shaping the incentives of criminal partners so that each has an incentive to betray the others and report information to law enforcers is well established. The Prisoner’s Dilemma story, in which each partner in crime is promised a light sentence in exchange for cooperating to convict the others, is familiar from standard law enforcement practice in most countries.
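The Prisoner's Dilemma logic can be made explicit with a textbook payoff table. The sentences (years in prison, so lower is better) are hypothetical. What matters is their ordering: a partner who cooperates with investigators while the other stays silent gets the lightest sentence. Under that ordering, informing is each partner's best response whatever the other does. Joint silence, the outcome both would prefer, is therefore not stable.

```python
# Years in prison for (me, partner), indexed by (my action, partner's action).
# Hypothetical sentences; only their ordering matters for the argument.
SENTENCES = {
    ("silent", "silent"): (2, 2),   # both convicted of a minor charge only
    ("silent", "inform"): (10, 1),  # I am convicted on my partner's testimony
    ("inform", "silent"): (1, 10),  # light sentence in exchange for cooperation
    ("inform", "inform"): (6, 6),   # both convicted, some leniency for both
}
ACTIONS = ("silent", "inform")


def best_response(partner_action: str) -> str:
    """Action that minimizes my sentence, given the partner's action."""
    return min(ACTIONS, key=lambda mine: SENTENCES[(mine, partner_action)][0])


def is_dominant(action: str) -> bool:
    """True if `action` is my best response to every possible partner action."""
    return all(best_response(p) == action for p in ACTIONS)


for partner in ACTIONS:
    print(f"partner plays {partner!r}: my best response is {best_response(partner)!r}")
print("informing is a dominant strategy:", is_dominant("inform"))
```

Both partners end up informing and serving 6 years each, although mutual silence would cost them only 2. This is exactly what makes the criminal partnership hard to sustain.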

These schemes have been the main and most successful tool in the fight against mafia and political terrorism in Italy and other countries, and they are currently regarded as the most important and effective instrument in the hands of competition authorities in their fight against cartels (US Department of Justice, Spagnolo 2008, Acconcia et al. 2009).

Apart from law enforcement, analogous “divide and conquer” schemes have been widely used since the Roman Empire to break up enemy coalitions in war. Many people dislike these tools on moral grounds, because they induce distrust and betrayal among partners, which some see as bad even when the betrayed partnership is a criminal one and the distrust prevents the criminal activity.

Related but somewhat different are whistleblower protection (from retaliation) and reward schemes aimed at inducing innocent witnesses to report a crime. Reward schemes for whistleblowers have been used in the US since the Civil War to limit corruption in federal procurement and to fight government fraud (through the False Claims Act, sometimes called the Lincoln Law after the president who signed it). They have more recently been introduced by the IRS against tax evasion and by the Dodd-Frank Act against financial fraud.

When witnesses work in the same organization as the wrongdoers, or when the latter are powerful individuals (besides being prone to commit illegal acts, such as violent retaliation), blowing the whistle typically has very harsh consequences for the witness. These range from various forms of harassment within the organization to loss of employment, isolation, and directly or indirectly induced death.[1] Legal action typically delivers results slowly and uncertainly, while its costs are immediate, certain and high, and whistleblower protection provisions are typically imperfect (if present at all). This is why, even with a relatively efficient legal enforcement system like that of the US, large rewards are seen as necessary and justified, both to induce more whistleblowing and to compensate whistleblowers for its consequences.

Trust, Distrust and Corruption

In some sense, one can see Basu’s proposal of legalizing bribe paying for services one is entitled to (while doubling sanctions for bribe taking) as transforming potential accomplice-witnesses into potential innocent whistleblowers. The question is then whether this scheme will induce more people to blow the whistle and consequently fewer bureaucrats to demand/accept bribes. Some observers have suggested that this provision might instead induce more people to pay bribes because it makes it legal and thereby may erode moral norms against bribe paying.

In Dufwenberg and Spagnolo (2011), we argued that amending Basu’s proposal along the lines of the leniency programs used in antitrust, where immunity is granted only if the wrongdoing is reported to the law enforcement agency, is one way to avoid sending the signal that bribe paying is now legal. The real question for these schemes is therefore whether they will ultimately induce bribe payers to report.

The way these revelation mechanisms deter corruption is by generating “distrust” among potential partners in crime (Bigoni et al. 2012). By making it very attractive to report to law enforcers for one party and very costly to be reported for the others, these schemes may deter illegal cooperation by ensuring that the parties cannot trust each other.

However, for these schemes to generate distrust and produce their potentially strong deterrence effects, the risk that accomplice-witnesses and other potential whistleblowers will report must be a real one. For this to be the case, whistleblowers must trust the law enforcement agency to which they report. The example of leniency policies in antitrust is illuminating. In the US, as long as the competition authorities retained discretion, colluding firms rarely applied for leniency. Firms started to report information on cartels only when the Department of Justice gave up discretion by making immunity “automatic” (subject to an explicit set of conditions being satisfied) and committed to this policy through published rules.

Besides a high risk of being reported, these schemes also require sufficiently strong sanctions for convicted parties in order to elicit reports and produce deterrence. To continue the parallel with antitrust enforcement: even after the authorities gave up discretion over the programs, leniency policies have not been inducing cartel members to report in countries other than the US.

Indeed, the most serious problem for the success of the Basu proposal, as well as for that of the leniency-based modification put forward in Dufwenberg and Spagnolo (2011), remains whether witnesses/bribe payers will trust the law enforcement agency to which they should report the crime. If the law enforcement agency is inefficient or also corrupt, reporting may lead to further harassment or worse, rather than protection and justice.

When protection programs are poorly administered and law enforcement agencies inefficient or corrupt, so that potential witnesses don’t trust law enforcement agencies, it becomes very difficult to induce whistleblowers to report, as well as dangerous for the whistleblower.

A second important reason why these schemes may fail to generate reports and produce the intended deterrence effects is, as mentioned, low sanctions against bribe takers. Recent experimental results (Bigoni et al. 2012) suggest that the reporting incentives provided by leniency programs deter collusion only if the sanctions for convicted partners are sufficiently strong. If they are not, these schemes may have no effect or even perverse ones: they reduce the sum of expected sanctions and, because of their complexity, can be manipulated (see e.g. Buccirossi and Spagnolo 2006). Basu did suggest doubling the sanctions for bribe takers. Whether this is enough for the case at hand, however, would require a more thorough evaluation.

Note that in the case of corruption there is an additional reason to reinforce sanctions, in particular by requiring that convicted bribe takers always be removed from office. If the bribe taker is not removed from office after a report, bribe payers may fear that the bribe taker will retaliate in future interactions.


Asymmetric sanctions as proposed by Basu (2011) and leniency conditional on reporting as proposed by Dufwenberg and Spagnolo (2011) have the potential to deter corruption systematically. Necessary conditions for this to happen, however, are that:

  1. Sanctions are strong enough that the increased risk of conviction following a whistleblower’s report outweighs the lenient treatment offered to induce reports;
  2. Potential whistleblowers trust that the law enforcement institutions will act on the report and protect them from retaliation by the corrupt and their friends, rather than harass them.
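The two conditions can be illustrated with a toy model in which neither condition alone is enough. All parameters and names are hypothetical. `p_act` stands in for trust: the probability, as perceived by the bribe payer, that the agency acts on a report and protects the reporter. When it does not, the reporter suffers a cost such as retaliation. The payer reports only if the expected refund exceeds that expected harm (condition 2). The taker is deterred only if reports are credible and the expected sanction exceeds the bribe (condition 1).

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    """Stylized reporting scheme; all parameters are hypothetical."""
    taker_sanction: float  # sanction on a convicted bribe taker
    p_act: float           # perceived probability the agency acts and protects the reporter
    bribe: float           # size of the bribe (refunded on conviction)
    reporting_cost: float  # reporter's loss if the agency fails to act (e.g. retaliation)

    def payer_reports(self) -> bool:
        """Condition 2: reporting pays only if the agency is trusted to act."""
        expected_refund = self.p_act * self.bribe
        expected_harm = (1 - self.p_act) * self.reporting_cost
        return expected_refund > expected_harm

    def taker_deterred(self) -> bool:
        """Condition 1: given credible reports, the expected sanction must exceed the bribe."""
        if not self.payer_reports():
            return False  # no credible reports, so no added risk for the taker
        return self.p_act * self.taker_sanction > self.bribe


for sanction in (100, 1000):
    for p_act in (0.2, 0.8):
        s = Scheme(taker_sanction=sanction, p_act=p_act, bribe=100, reporting_cost=200)
        print(f"taker_sanction={sanction:>5}, p_act={p_act}: "
              f"payer reports={s.payer_reports()}, taker deterred={s.taker_deterred()}")
```

In this grid, only the combination of a harsh sanction and a trusted agency deters the bribe taker. That mirrors the point that the schemes need complementary institutional reform.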

Countries with sufficiently independent and efficient law enforcement institutions should definitely consider introducing or reinforcing revelation schemes (asymmetric treatment or leniency conditional on reporting) to counter the current widespread increase in corruption.

Simply introducing these schemes in countries with weaker institutions, in particular where law enforcement agencies have little independence, may do more harm than good: after all, they imply reduced sanctions, and their complexity makes them easy to manipulate.

These schemes can be very useful for these countries, but only if they are introduced as part of a broader set of complementary reforms. These should include increased judicial independence and the creation of a specialized law enforcement unit with particularly high levels of accountability and independence, able to credibly offer whistleblowers at least confidentiality and protection from retaliation, if not monetary rewards.



[1] The sad recent stories of Sergei Magnitsky in Russia and S.P. Mahantesh in India make clear that these risks are real.