
Closing the Gender Data Gap


High-quality data plays a crucial role in enhancing our comprehension of evolving social phenomena and in designing effective policies to address existing and future challenges. This applies particularly to the gender dimension of data, given the profound impact of the pervasive so-called “gender data gap”. In recent decades, data recovered from archives, high-quality surveys, and census and administrative data, combined with innovative approaches to data analysis and identification, have become pivotal for progress in documenting structural gender differences. Nonetheless, before we can close the gender gaps in the labor market, within households, in politics, academia and other areas, researchers and policymakers must first close the gender data gap.



Any progress in our understanding of social phenomena hinges on the availability of data, and there is no doubt that recent advances in economics and other social sciences would not have been possible without countless high-quality data sources. As we argue in this policy brief, this applies also, and perhaps particularly, to the documentation of different dimensions of gender inequality and to the analysis identifying their causes. Over the last few decades, innovative ways to document historical developments, combined with improved access to existing data and new approaches to data collection, have become cornerstones of the progress made in our understanding of the various expressions of gender inequality. In the economic sphere this has covered themes such as labor market status, earnings and income levels, wealth accumulation over the life course, education investments, pensions, as well as consumption patterns and time allocation – in particular caregiving and household work. Researchers have also been able to empirically study gender inequalities in politics, culture, crime, the justice system and in academia itself.

Groundbreaking studies in gender economics, including those by Claudia Goldin, the recent Nobel Prize laureate, would not have been possible without high-quality data and innovative efforts to close the “gender data gap”, a term coined by Caroline Criado Perez in her bestseller “Invisible Women” (Criado Perez, 2020). In the introduction to the book she notes that “(…) the chronicles of the past have left little space for women’s role in the evolution of humanity, whether cultural or biological. Instead, the lives of men have been taken to represent those of humans overall.” (p. XI). The gender data gap is the result of a deficit of informative data sources on women, which has been compounded by the frequent lack of differentiation by sex/gender in the sources that are available. Closing the gender gaps along the dimensions already identified in existing studies will require continuous monitoring of the evidence, which in turn requires closing the gender data gap first. New studies focused on greater equality and on the effectiveness of various implemented policies will continue to rely on good data. Thankfully, few new datasets currently ignore the gender of the respondents. However, as our understanding of the biological and cultural aspects of sex and gender grows, the way data is collected will need to be modified.

As we prepare for the new challenges ahead of those designing data collection efforts and examining the data, we believe it is important to give credit to the authors of some of the groundbreaking studies that paved the way to the current pool of evidence on gender inequality. Around the time of the International Women’s Day, we recall several empirical studies in gender economics that, in our opinion, merit special attention due to either their innovative approaches to data collection, their unique access to original data sources, or their methodological novelty. These studies bring valuable insights into specific dimensions of gender inequality. This short list is naturally a subjective choice, but we believe that all of these studies deserve credit not only among researchers within gender economics, but also among those more broadly interested in the recent progress in the understanding of different aspects of gender inequality.

From Data to Policy Recommendations

Over the last few decades substantial efforts have been made to provide empirical evidence on historical trends in inequalities between men and women in the labor market. Seminal work in this field was conducted by Claudia Goldin in the 1970s and 80s, culminating in the publication of the path-breaking book Understanding the Gender Gap: An Economic History of American Women (Goldin, 1990). The book fundamentally changed the view of women’s role in the labor market. Empirically, Goldin shows that female labor force participation was significantly higher in historical times than previously believed. Before Goldin, researchers mainly studied twentieth-century data, on the basis of which it looked as if women’s participation in the labor market was positively correlated with economic growth. Goldin’s work showed instead that women were more likely to participate in the labor force prior to industrialization, and that the early expansion of factories made it more difficult to combine work and family. Seen over the full 200-year period, from before industrialization to today, the pattern of women’s labor market participation is in fact U-shaped, pointing to the importance of various societal changes that alter incentives and possibilities for women’s work. Goldin’s contribution is, however, not just about getting the empirical picture right. At least equally important is the recognition of women as individual economic agents who make forward-looking decisions under various institutional constraints and limitations related to social norms about identity and family, as well as education opportunities and labor market options. While some decisions can be modeled as taken by “the economic man” and others as taken by households, it may seem surprising that the study of women’s decisions was neglected for so long.

Institutional, cultural and economic factors behind historical trends have become the focus of much of the literature trying to identify the forces driving gender disparities. Some of the most original work considers the role that “chance” plays in determining individual decisions related to gender: how having a first-born son (e.g. Dahl and Moretti, 2008) or having twins (Angrist and Evans, 1998), both of which can be considered random, affects choices related to partnership, future fertility and the labor market. Others examine the influence of gender imbalances caused by major historical events. Brainerd (2017) investigates the consequences of extremely unbalanced sex ratios in cohorts particularly affected by the massive loss of lives during World War II in the Soviet Union. By exploiting a unique historical data source derived from the first postwar census, combined with statistics registry records from archives, Brainerd provides evidence that the war-induced scarcity of men profoundly affected women’s outcomes on the marriage market. Women were more likely to never get married, to give birth out of wedlock and to get divorced. On top of that, unbalanced sex ratios affected married women’s intrahousehold bargaining power and resulted in lower fertility rates and a higher rate of marriages with a large age gap between spouses. The post-war institutional setup increased the cost of divorce and withdrew legal obligations to support children fathered out of wedlock, which exacerbated the consequences of the shortage of men by further reducing the rates of registered marriages and increasing marital instability.

The examples above highlight how conditions beyond individuals’ control can contribute to social gender imbalances, or shed light on existing gender biases. How these ‘exogenous’ circumstances translate into economic inequalities, and what additional factors drive disparities, has been the focus of much academic work on gender inequalities. One of the most challenging questions has been that of demonstrating that discrimination against women, rather than women’s characteristics or choices, is behind the economic gender inequalities documented in the growing body of evidence. In this respect Black and Strahan (2001) provide convincing evidence by exploiting significant changes in the level of regulation in the US banking sector. Increasing competition between banks lowered banks’ profits and reduced the ability of managers to ‘divide the spoils’, and thus to discriminate between different types of employees. The authors used information on wages within specific industries (including banking) from one of the oldest ongoing surveys in the world – the US Current Population Survey (CPS). By exploiting detailed individual data covering a period of several decades, the authors show that the higher level of banking sector regulation prior to deregulation facilitated greater premia paid out to male compared to female employees. Thus, increased competition in the banking sector brought favorable changes to women’s pay conditions as well as to their position in banks’ management.

While long running surveys such as the CPS continue to serve as invaluable sources of information on the relative conditions of men and women, the growing availability of administrative data has opened new opportunities for documentation of inequalities and identification of the reasons behind these. For instance, the ability to track individuals throughout their work history before and after the arrival of their first child has allowed researchers to compare the trajectories of women’s and men’s earnings, wages and working hours. This comparison has revealed the existence of the so-called “child penalty”, with women experiencing a drop in their labor market position relative to their male partners after the birth of their first child, and with the gap persisting for many years. Strikingly, this penalty has been estimated in some of the most gender-equal countries in the world, such as Sweden (Angelov et al., 2016) and Denmark (Kleven et al., 2019), two countries which have spearheaded collecting and making rich administrative data available to researchers.

Another area where individual register data has proven invaluable is in the study of the so-called “glass ceiling”, i.e., the sharply increasing differences between men and women when it comes to pay as well as representation at the very top of the income distribution. In a seminal study by Albrecht et al. (2003), individual earnings for men and women were compared and differences were found to be markedly higher (with men earning much more) when comparing men at the top of the male income distribution with women at the top of the female income distribution. Also making use of Swedish registry data, Boschini et al. (2020) study a related question, namely the evolution of the share of women at the top of the income distribution. In line with other glass-ceiling results, they demonstrate that the share of women at the top is small, and that it gets smaller the higher one looks, although it has increased over time. Decomposing incomes into labor earnings and capital income, they also show that while women seem to be catching up in the labor income distribution, they clearly lag in the capital income distribution. Also, the income profiles of the partners of high-income men and high-income women are strikingly different. Most high-income women have high-income partners, while the opposite is not true for high-income men.

Differences in the economic position of men and women reflected in the above examples can have their origin long before individuals enter the labor market. They can be driven by differences in schooling opportunities, as well as other forms of early life investments, to the extent that even much of what is perceived as choices or preferences later in life is in fact the result of these subtle early life disadvantages for women. While these have largely diminished in the global North, a growing number of studies document these differences in the global South. Jayachandran and Pande (2017) examine the impact of son preference, a cultural practice widespread in, for example, India, on child health and development. The study leverages a simple, standardized, and broadly available indicator – the height of children – which is measured at routine health checks and included in many population surveys, such as the Demographic and Health Surveys (DHS). Additionally, their use of a natural experiment, based on the birth order of children, helps to establish a causal relationship between eldest son preference and nutritional disparities that have long-term developmental consequences among subsequent children, not only for girls but for Indian children on average. Findings like these underscore the importance of gender equality not only as a fundamental value but also as a crucial factor in promoting growth and development at the societal level.

The social costs of gender inequality have also motivated the growing research interest in gender-based violence and crime. Given the specific challenges associated with these topics – such as the clandestine and underreported nature of these acts but also the consideration for victims’ confidentiality and safety – studies in this area have required researchers to develop and apply innovative tools and data collection methods. In this context, list experiments have emerged as a methodology allowing respondents to disclose sensitive or socially undesirable attitudes indirectly, reducing the likelihood of so-called social desirability bias in survey reporting. In a list experiment, respondents are presented with a set of statements or behaviors and asked to indicate only how many of them they agree with or engage in, not which ones. Among the listed items, one is considered “sensitive” and is included only for a randomly selected subset of respondents. By comparing the average number of items reported by the group that received the sensitive item to the average in a control group that did not, researchers can estimate the proportion of respondents who agreed with or engaged in the sensitive behavior or opinion. Kuklinski et al. (1997) is one of the pioneering contributions in this area, estimating the proportion of voters who harbored racial prejudices but who may have been unwilling to admit it in a direct survey question. List experiments have since become a widely used tool in political science and economics and have helped advance our understanding of gender-based violence (Peterman et al., 2018). Given the strong assumptions underlying the analysis, the method has not become the “statistical truth serum” it was at some point considered to be. However, list experiments have broadened the analytical opportunities in an area plagued by significant informational and data challenges.
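The comparison of group averages described above can be illustrated with a minimal sketch. This is not code from any of the cited studies: the function names, the simulated respondents and all parameter values (four baseline items, a true prevalence of 0.3) are assumptions chosen purely for illustration, and the simulation assumes that adding the sensitive item does not change how respondents answer the baseline items.

```python
import random
from statistics import mean


def list_experiment_estimate(control_counts, treatment_counts):
    """Difference-in-means estimate of the share endorsing the sensitive item.

    control_counts: item counts reported by respondents who saw only the
    baseline list; treatment_counts: counts from respondents whose list
    also contained the sensitive item.
    """
    return mean(treatment_counts) - mean(control_counts)


def simulate_respondents(n=40000, n_baseline=4, p_baseline=0.5,
                         p_sensitive=0.3, seed=0):
    """Simulate a list experiment (all parameter values are hypothetical)."""
    rng = random.Random(seed)
    control, treatment = [], []
    for _ in range(n):
        # Number of baseline items this respondent agrees with.
        count = sum(rng.random() < p_baseline for _ in range(n_baseline))
        if rng.random() < 0.5:
            # Control group: baseline list only.
            control.append(count)
        else:
            # Treatment group: the sensitive item adds one if endorsed.
            treatment.append(count + (rng.random() < p_sensitive))
    return control, treatment


# Small hand-checkable example: means are 2.0 and 2.5.
toy_estimate = list_experiment_estimate([2, 2, 1, 3], [2, 3, 2, 3])

# Simulated example: the estimate should be close to p_sensitive.
control, treatment = simulate_respondents()
estimate = list_experiment_estimate(control, treatment)
print(round(estimate, 3))
```

Because only counts are recorded, no individual answer to the sensitive item is ever observed; the prevalence is recovered at the group level, which is also why the estimate is noisier than a direct question would be.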

While worldwide gender gaps in economic opportunities and especially in education and health have rapidly declined (and sometimes reversed) in the last decades, larger differences remain in political empowerment (see e.g., WEF Gender Gap Report 2023). Another Nobel Prize laureate in economics, Esther Duflo, in her joint work with Raghabendra Chattopadhyay (2004), has pioneered a highly prolific area of research on the impacts of women as policymakers. In their study, they leverage a unique policy experiment in India that randomized the gender of the leader of Village Councils, and a detailed dataset based on extensive surveys administered to both Village Council leaders and villagers. The surveys allowed for estimation of the investments in different public goods in 265 Village Councils, as well as the preferences over each of these public goods among female and male villagers. Combining the randomization and this rich dataset, the authors establish that political leaders prioritize public goods that are more relevant to the needs of their own gender, suggesting that women’s under-representation in politics might result in women’s and men’s preferences being unequally represented in policy decisions.

Conclusions and Recommendations

The narrowing gender gap in political representation across various levels of government, the growing influence of women in other areas such as public institutions and administration, and the heightened awareness of the crucial role gender equality plays in socio-economic progress all bode well for improvements in access to high-quality gender-differentiated data sources. Before we can recognize and close gender gaps identified from high-quality data, the gender data gap must first be closed. Governments and public institutions should make their increasing amounts of digitized information available for research purposes. Funding should be available to collect data through surveys, and these could in turn be combined with details available in administrative sources to take advantage of the breadth of survey data and the precision of official statistics. Information needs to be collected on a frequent and regular basis to make sure that the consequences of various major developments, such as legal changes, conflicts or natural disasters, can be identified. Innovative data sources, for instance information from mobile apps or social media, can provide additional useful insights into socio-economic trends and into old and new dimensions of inequality, as well as regular, timely updates on different aspects of gender disparities. These new data sources can become the basis for future innovative studies on gender inequalities, contributing to a better understanding of the mechanisms behind these inequalities, and providing evidence for policies and other efforts to effectively close the remaining gaps. There is already enough evidence to conclude that closing these gaps is not only just but also constitutes a fundamental basis for continued inclusive economic development.

Post Scriptum

Contributing to the existing pool of data sources we are happy to share a regional dataset with information on gender norms and gender-based violence: the FROGEE Survey 2021. The data was collected using the CATI method (phone interviews) in autumn 2021 in Belarus, Georgia, Latvia, Poland, Russia, Sweden and Ukraine. In each country interviews were conducted with between 925 and 1000 adults. The survey covered areas such as: basic demographics, material conditions, labor market status, gender norms, attitudes towards harassment and violence, awareness of violence against women and awareness of legal protection for gender violence victims.

The data collection was funded by the Swedish International Development Cooperation Agency (SIDA) as part of the FREE Network’s FROGEE project. The dataset and supporting materials are freely available for research purposes. For more information see: FROGEE Survey on Gender Equality.


  • Albrecht, J., Björklund, A., and Vroman, S. (2003). Is there a glass ceiling in Sweden? Journal of Labor Economics, 21(1), 145-177.
  • Angelov, N., Johansson, P., and Lindahl, E. (2016). Parenthood and the gender gap in pay. Journal of Labor Economics, 34(3), 545-579.
  • Angrist, J. D., and Evans, W. N. (1998). Children and their parents’ labor supply: Evidence from exogenous variation in family size. American Economic Review, 88(2), 450-477.
  • Black, S. E., and Strahan, P. E. (2001). The division of spoils: Rent-sharing and discrimination in a regulated industry. American Economic Review, 91(4), 814-831.
  • Boschini, A., Gunnarsson, K., and Roine, J. (2020). Women in top incomes: Evidence from Sweden 1971–2017. Journal of Public Economics, 181, 104115.
  • Brainerd, E. (2017). The lasting effect of sex ratio imbalance on marriage and family: Evidence from World War II in Russia. The Review of Economics and Statistics, 99(2), 229-242.
  • Chattopadhyay, R., and Duflo, E. (2004). Women as policymakers: Evidence from a randomized policy experiment in India. Econometrica, 72(5), 1409-1443.
  • Criado Perez, C. (2020). Invisible women. Vintage, London.
  • Dahl, G. B., and Moretti, E. (2008). The demand for sons. Review of Economic Studies, 75(4), 1085-1120.
  • Goldin, C. (1990). Understanding the Gender Gap: An Economic History of American Women. Oxford University Press.
  • Jayachandran, S., and Pande, R. (2017). Why are Indian children so short? The role of birth order and son preference. American Economic Review, 107(9), 2600-2629.
  • Kleven, H., Landais, C., and Søgaard, J. E. (2019). Children and gender inequality: Evidence from Denmark. American Economic Journal: Applied Economics, 11(4), 181-209.
  • Kuklinski, J. H., Sniderman, P. M., Knight, K., Piazza, T., Tetlock, P. E., Lawrence, G. R., and Mellers, B. (1997). Racial prejudice and attitudes toward affirmative action. American Journal of Political Science, 402-419.
  • Peterman, A., Palermo, T. M., Handa, S., Seidenfeld, D., and Zambia Child Grant Program Evaluation Team (2018). List randomization for soliciting experience of intimate partner violence: Application to the evaluation of Zambia’s unconditional child grant program. Health Economics, 27(3), 622-628.

Disclaimer: Opinions expressed in policy briefs and other publications are those of the authors; they do not necessarily reflect those of the FREE Network and its research institutes.

Paradise Leaked: An Analysis of Offshore Data Leaks


In recent years, there have been several high-profile leaks of documents related to the offshore financial industry, such as the Pandora Papers released last year. Some of the data contained in the leaked documents have now been made public. In this brief, we discuss the advantages and pitfalls of using these data for economic analysis. We show that despite some caveats, there are patterns in these data that can shed light on a secretive industry. For instance, the number of offshore entities linked to a country increases significantly when that country experiences a change in political leadership. By contrast, financial sanctions on a given country result in a reduction in the number of established offshore entities. In the immediate aftermath of the financial crisis, many countries signed bilateral treaties with tax havens in order to promote transparency. Our analysis of the leaked data shows that the overwhelming majority of offshore entities are not governed by these treaties.

“… that I may see and tell of things invisible to mortal sight.”

John Milton, Paradise Lost

Offshore Tax Haven Leaks

Zucman (2013) estimates that household wealth held in offshore tax havens is equivalent to 10% of world GDP. While there are many legitimate reasons for wealthy individuals to use offshore financial services, the secrecy surrounding offshore holdings has also enabled tax evasion and money laundering. The international community has launched several initiatives to increase the transparency of offshore wealth holdings. Over the past decade, several large collections of documents from offshore financial service providers have been leaked to the media: Pandora Papers (2021), Paradise Papers (2017/2018), Bahamas Leaks (2016), Panama Papers (2016), and Offshore Leaks (2013). Investigative journalists have used information from the leaks to expose many instances of secretive financial dealings linked to political leaders. Examples from FREE Network countries include the connections between a close ally of Belarusian President Alexander Lukashenko and a gold mining venture in Zimbabwe, the offshore business holdings of past and present Ukrainian presidents and their respective allies, and the wealth of Russian President Vladimir Putin’s close associates and childhood friends (see, for instance, Cosic 2021, Mylovanov and Mylovanova 2016).

The International Consortium of Investigative Journalists (ICIJ) has made public information on more than 800,000 offshore entities that are part of the offshore data leaks (see ICIJ Offshore Leaks database). The data contain information on the names of companies or people who set up offshore entities, their country of origin, the offshore jurisdiction, and the dates of incorporation and deactivation for offshore entities.

What Can We Learn from the Data?

Despite the wealth of information that this database contains, there has been relatively little academic research using the offshore leaks data. Two notable exceptions are Alstadsæter, Johannesen and Zucman (2019), and Londoño-Vélez and Ávila-Mahecha (2021), who link information from the Panama Papers to administrative records from Scandinavia and Colombia, respectively. They find that tax evasion is concentrated among the richest households. Guriev, Melnikov and Zhuravskaya (2021) use the revelation of the Panama Papers to study its effect on perceptions of corruption.

There are several challenges to using the offshore leaks data for systematic data analyses. First, there are both legitimate and illegal uses of offshore financial services, and without further information, it is not possible to distinguish between them. Second, as this information is obtained through leaks at specific offshore service providers, the data are unlikely to be representative of overall offshore financial activity. Third, there is no information on financial transactions, and we do not know the amounts of money involved in the offshore entities. Finally, more sophisticated offshore structures may make it impossible to deduce the ultimate owner of each entity and its country of origin. Especially for the second and third reasons, economists have tended to focus on balance of payments statistics and cross-border bank deposit data when estimating flows to offshore accounts. For example, Andersen, Johannesen, Lassen and Paltseva (2017) show how the oil wealth of countries with weak institutions is diverted into secret offshore accounts. Becker (2019) investigates recent trends in Russian capital flows and shows that a significant share of Russian money flows to Western European banks. See also Nyreröd and Spagnolo (2018, 2021) for discussions of the role of European banks in recent money laundering scandals.

With these caveats in mind, Figure 1 shows the correlation between the number of offshore entities in the data (on the y-axis) and the offshore wealth holdings of each country’s households (on the x-axis) as estimated by Alstadsæter, Johannesen and Zucman (2018). While the chart shows a positive correlation of 0.56 between these two measures, it also illustrates that the number of leaked entities may be a poor proxy for the stock of offshore wealth. Countries with a significant fraction of offshore wealth in European tax havens are underrepresented in the leaks (e.g., France, Germany, and Italy) while the UK, Russia, and Latvia account for a disproportionate share of leaked offshore entities.

Figure 1. Number of offshore entities and estimated offshore wealth

Source: ICIJ Offshore Leaks database, Alstadsæter, Johannesen and Zucman (2018) and authors’ calculations.

Timing of Offshore Entity Creation

While the number of overall leaked entities per country might not be a perfect measure of the amount of offshore wealth, we find that there are systematic patterns in the timing of the creation of offshore entities. In particular, more offshore entities are created when individuals face political uncertainty in their own countries and fewer offshore entities are created by individuals from countries under financial sanctions.

Elections and Change of Leadership

Figure 2 shows the average number of newly incorporated offshore entities linked to a given country (on the y-axis), depending on that country’s political situation. Panel A shows no clear pattern of offshore entities being created by companies or individuals around the time of elections. Elections are often predictable and frequently result in the reelection of the incumbent government. In contrast, Panel B shows a clear increase in the number of offshore entities linked to a country around the time when that country experiences a change in the de facto political leader. Around four months before there is a change in political leadership, the average number of entities created per country per month almost doubles. Offshore entity creation falls back to normal levels typically around half a year following the transition of power. This pattern suggests that wealth leaves countries at times of political uncertainty and is consistent with the findings of Andersen, Johannesen, Lassen and Paltseva (2017) and Earle, Shpak, Shirikov and Gehlbach (2021).

Figure 2. Offshore entity creation and national political situation

Panel A. Elections

Panel B. Change of political power

Source: ICIJ Offshore Leaks database, The Rulers, Elections, and Irregular Governance (REIGN) Dataset and authors’ calculations. A change of power is defined as a change in the de facto political leader (e.g., due to the incumbent losing an election or the collapse of a coalition government).
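The kind of averaging behind Figure 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors’ actual procedure: the function name, the integer month index (months counted from an arbitrary base date) and the toy records are all hypothetical, and the sketch ignores issues such as overlapping event windows.

```python
from collections import Counter


def average_creation_by_relative_month(incorporations, leadership_changes,
                                       window=6):
    """Average number of new entities per country, by month relative to a
    change of leadership.

    incorporations: iterable of (country, month_index), one per new entity.
    leadership_changes: iterable of (country, month_index), one per event.
    Returns {relative_month: average count across events}.
    """
    counts = Counter(incorporations)  # entities per (country, month)
    events = list(leadership_changes)
    if not events:
        raise ValueError("at least one leadership change is required")
    profile = {}
    for rel in range(-window, window + 1):
        # Missing (country, month) keys count as zero in a Counter.
        total = sum(counts[(country, month + rel)]
                    for country, month in events)
        profile[rel] = total / len(events)
    return profile


# Toy records with hypothetical country labels and month indices.
toy_incorporations = [("A", 10), ("A", 10), ("A", 11), ("B", 5), ("B", 7)]
toy_changes = [("A", 11), ("B", 6)]
toy_profile = average_creation_by_relative_month(
    toy_incorporations, toy_changes, window=1)
print(toy_profile)
```

Aligning every event at relative month zero is what makes a pre-event rise visible even when the changes of leadership occur at different calendar dates in different countries.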

International Sanctions

Figure 3 shows the impact of sanctions from the United Nations, European Union, and the United States on the average number of offshore entities linked to a given country (on the y-axis). Panel A shows that when a country is subject to financial sanctions, the number of linked offshore entities created falls to around 10 per year from an average of 25 before the introduction of sanctions. The impact of sanctions can already be seen in the year before the start of the sanctions, which could reflect measurement and reporting errors or anticipation of the sanctions. In contrast, Panel B shows that trade sanctions that are not accompanied by financial sanctions have no significant impact on offshore activities. These charts suggest that financial sanctions may have some impact on how much capital can be moved from countries under sanctions to offshore accounts.

Figure 3. Offshore entity creation and international sanctions

Panel A. Financial sanctions

Panel B. Trade (without financial) sanctions

Source: ICIJ Offshore Leaks database, Global Sanctions Data Base and authors’ calculations.

Promoting Transparency

In 2009, after the financial crisis, G20 countries compelled offshore tax havens, under the threat of economic sanctions, to sign bilateral treaties allowing for the exchange of banking information. More than 300 treaties were signed by tax havens that year. The effectiveness of this policy has been debated. For instance, Johannesen and Zucman (2014) show that the treaties led to a relocation of bank deposits from compliant to less compliant offshore tax havens.

The G20 crackdown required each tax haven to sign at least 12 bilateral treaties. Relative to a comprehensive multilateral agreement, this policy had two limitations. Firstly, it left room for the diversion of funds identified by Johannesen and Zucman (2014). Secondly, tax havens were able to choose freely among potential partner countries – regardless of the underlying financial flows. Figure 4 shows that only a small fraction of the entities in the offshore leaks database have a country of origin that signed a treaty with the tax haven in which they were incorporated. In addition, the small share of entities that are subject to treaties suggests that havens did not always sign treaties with the most important counterparts. While the leaked entities may not be representative of offshore finance as a whole, this picture appears inconsistent with the OECD’s claim that “the era of bank secrecy is over” (OECD 2011).

Figure 4. Entity creation by treaty status

Source: ICIJ Offshore Leaks database, treaty events from Johannesen and Zucman (2014) and authors’ calculations.
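The treaty coverage shown in Figure 4 amounts to checking, for each entity, whether its country of origin and its haven of incorporation form a pair that signed a treaty. A minimal sketch of that check is given below. The function name, the record layout and the placeholder labels are assumptions made for illustration; the sketch also ignores the timing of treaty signature relative to incorporation, which an actual analysis would likely need to handle.

```python
def treaty_coverage_share(entities, treaty_pairs):
    """Share of entities whose (origin, haven) pair is covered by a treaty.

    entities: list of (origin_country, haven) tuples, one per entity.
    treaty_pairs: set of (origin_country, haven) tuples with a signed treaty.
    """
    if not entities:
        raise ValueError("at least one entity is required")
    covered = sum((origin, haven) in treaty_pairs
                  for origin, haven in entities)
    return covered / len(entities)


# Toy records with placeholder labels; these are not real data.
toy_entities = [
    ("origin_1", "haven_A"),
    ("origin_2", "haven_A"),
    ("origin_2", "haven_B"),
    ("origin_3", "haven_A"),
]
toy_treaties = {("origin_2", "haven_A")}
toy_share = treaty_coverage_share(toy_entities, toy_treaties)
print(toy_share)
```

In this toy case only one of the four entities is covered: a haven can meet a minimum number of treaties while leaving its most active origin countries uncovered, which is the pattern the brief describes.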


A series of leaks over the past decade has exposed over 40 million documents related to the secretive offshore financial industry. Information related to over 800,000 offshore financial entities has been made public by the ICIJ. While a few high-profile cases received significant media coverage and gave rise to further investigations, the vast majority of references to networks of individuals, trusts, and shell corporations are difficult to decipher. This brief argues that, collectively, these leaked documents can be informative. They can be used to analyze the reasons for moving money offshore (such as domestic political uncertainty) as well as the constraints individuals face when doing so (such as international sanctions or bilateral treaties on bank secrecy).

In an effort to further increase transparency, 102 jurisdictions committed to a new standard for the automatic exchange of certain financial account information between tax authorities from 2019. Until such reforms are successful, leaks by whistleblowers are likely to remain a valuable source of information on the offshore financial industry.

