Labor tax evasion is a major policy issue that is especially salient in transition and post-transition countries. In this brief, we use firm-level administrative data, tax authorities’ audit data and machine learning techniques to detect firms likely to be involved in labor tax evasion in Latvia. First, we show that this approach could complement tax authorities’ regular practices, increasing audit success rate by up to 35%. Second, we estimate that about 30% of firms operating in Latvia between 2013 and 2020 are likely to underreport the wage of (some of) their employees, with a slightly negative trend.
Tax evasion is a major policy issue that is especially salient in transition and post-transition countries. In particular, the “envelope wage”, i.e., an unofficial part of the wage paid in cash, is a widespread phenomenon in Eastern Europe (European Commission, 2020). Putnins and Sauka (2021) estimate that the share of unreported wages in Latvia amounts to more than 20%. Fighting labor tax evasion is a key objective of tax authorities, which face two main challenges. The first is to make the best use of their resources. Audits are costly, so the choice of firms to audit is crucial. The second challenge is to track how the prevalence of labor tax evasion evolves over time. For this purpose, most of the existing literature relies on survey data.
In our forthcoming paper (Gavoille and Zasova, 2022), we propose a novel methodology aiming at detecting tax-evading firms, using administrative firm-level data, tax authorities’ audit data and machine learning techniques.
This study makes two main contributions. First, this approach can help tax authorities decide which firms to audit. Our results indicate that the audit success rate could increase by up to 20 percentage points, which corresponds to a relative increase of about 35%. Second, our methodology allows us to estimate the share of firms likely to be involved in labor tax evasion. To our knowledge, this paper is the first to provide such estimates, even though they are of primary importance in guiding anti-tax evasion policy. We estimate that over the 2013-2020 period, about 30% of firms operating in Latvia underreported (at least some of) their workers’ wages.
The general idea of our approach is to train an algorithm to classify firms as either compliant or tax-evading based on observed firm characteristics. Tax evasion, like any financial manipulation, results in artifacts in the balance sheet. These artifacts may be invisible to the human eye, but machine learning algorithms can detect these systematic patterns. Such methods have been applied to corporate fraud detection (see for instance Cecchini et al. 2010, Ravisankar et al. 2011, West and Bhattacharya 2016).
The machine learning approach requires a subsample of firms for which we know the “true” firm behavior (i.e., tax-evading or compliant) in order to train the algorithm. For this purpose, we use a dataset on tax audits provided by the Latvian State Revenue Service (SRS), which contains information about all personal income tax (PIT) and social security contributions (SSC) audits carried out by SRS during the period 2013-2020, including the outcome of each audit. The dataset also contains a set of firm characteristics and financial indicators (e.g., turnover, assets, profit), covering both audited and non-audited firms operating in Latvia. Assuming that auditors are highly likely to detect misconduct (e.g., wage underreporting) if present, audit outcomes provide information about a firm’s tax compliance. Firms sanctioned with a penalty for, say, personal income tax fraud are involved in tax evasion, whereas audited-but-not-sanctioned firms can be assumed compliant. The algorithm learns how to distinguish the two types of firms based on the information contained in their balance sheets. In practice, we randomly split the sample of audited firms into two parts, the training and the testing subsamples. We use the former to train the algorithm, and then evaluate its performance on the latter, i.e., on data that has not been used during the training stage. If the algorithm performs satisfactorily on the testing sample, we can then apply it to the whole universe of firms and obtain an estimate of the share of tax-evading firms.
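The train/test procedure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the firms are synthetic, the two features (`wage_ratio`, `cash_ratio`) are hypothetical stand-ins for balance-sheet indicators, and a small hand-written logit stands in for the algorithms used in the paper. It is not the authors' implementation.

```python
import math
import random

def make_synthetic_firms(n, rng):
    """Hypothetical audited firms: two made-up ratios and a 0/1 audit outcome."""
    firms = []
    for _ in range(n):
        evading = 1 if rng.random() < 0.44 else 0  # roughly 44% sanctioned
        # Illustrative assumption only: evaders report lower wage-to-turnover ratios.
        wage_ratio = rng.gauss(0.20 if evading else 0.30, 0.08)
        cash_ratio = rng.gauss(0.15 if evading else 0.10, 0.05)
        firms.append(([wage_ratio, cash_ratio], evading))
    return firms

def split_train_test(firms, test_share, rng):
    """Randomly split audited firms into training and testing subsamples."""
    shuffled = firms[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(train, epochs=200, lr=0.5):
    """Fit a logit by plain batch gradient descent on the training subsample."""
    n_features = len(train[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in train:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(train) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(train)
    return weights, bias

def predict(model, x):
    weights, bias = model
    return 1 if sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5 else 0

def accuracy(model, sample):
    return sum(predict(model, x) == y for x, y in sample) / len(sample)

rng = random.Random(0)
audited = make_synthetic_firms(1000, rng)
train, test = split_train_test(audited, test_share=0.3, rng=rng)
model = fit_logit(train)               # training uses the training subsample only
test_accuracy = accuracy(model, test)  # evaluation uses unseen firms only
print(f"out-of-sample accuracy: {test_accuracy:.2f}")
```

The point of the design is that `test` never enters `fit_logit`, so `test_accuracy` measures how the classifier behaves on firms it has not seen, which is the situation it faces when applied to non-audited firms.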
In this study, we successively implement four algorithms that differ in the way they learn from the data: (1) Random Forest, (2) Gradient Boosting, (3) Neural Networks, and (4) Logit (for a review of machine learning methods, see Athey and Imbens, 2019). These four data mining techniques have previously been used in the literature on corporate fraud detection (see Ravisankar et al. 2011 for a survey). Each of these four algorithms has specific strengths and weaknesses, motivating the implementation and comparison of several approaches.
Table 1 provides the out-of-sample performance of the four different algorithms. In other words, it shows how precise each algorithm is at classifying firms based on data that has not been included during the training stage. Accuracy is the percentage of firms correctly classified (i.e., the model prediction is consistent with the observed audit outcome). In our sample, about 44% of audited firms are required to pay extra personal income tax and social security contributions. This implies that a naive approach predicting all firms to be evading would be 44% accurate. Similarly, a classification predicting all firms to be tax compliant would be correct in 56% of the cases. This latter number can be used as a benchmark to evaluate the performance of the algorithms. ROC-AUC (the Area Under the Receiver Operating Characteristic Curve) is another widespread classification performance measure. It provides a measure of separability, i.e., how well the model is able to distinguish between the two types. This measure is bounded between 0 and 1; the closer to 1, the better the performance. A score above 0.8 can be considered largely satisfactory.
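The two performance measures and the naive benchmarks can be made concrete with a short sketch. The 44/56 split comes from the text; the six-firm scoring example is made up, and `roc_auc` here uses the pairwise-ranking definition of the measure (the probability that a randomly chosen evading firm receives a higher score than a randomly chosen compliant one), under which a value of 0.5 corresponds to random ranking.

```python
def accuracy(y_true, y_pred):
    """Share of firms whose predicted class matches the audit outcome."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Pairwise-ranking ROC-AUC; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 100 audited firms, 44 sanctioned (1) and 56 not sanctioned (0), as in the text.
y_true = [1] * 44 + [0] * 56
all_evading = accuracy(y_true, [1] * 100)    # naive rule: everyone evades
all_compliant = accuracy(y_true, [0] * 100)  # naive rule: everyone complies

# Made-up scores for six firms, to show how ROC-AUC rewards correct ranking.
toy_true = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = roc_auc(toy_true, toy_scores)      # 8 of 9 evader/compliant pairs ranked correctly

print(all_evading, all_compliant, round(toy_auc, 3))
```

Unlike accuracy, ROC-AUC does not depend on a particular classification threshold, which is why the two measures are reported side by side.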
Table 1. Performance measures
Random Forest is the algorithm providing the best out-of-sample performance, with more than 75% of the observations in the testing set correctly classified. Random Forest is also the best performing model according to the ROC-AUC measure, with performance slightly better than Gradient Boosting.
Our results imply that a naive benchmark prediction is outperformed by almost 20 percentage points by Random Forest and Gradient Boosting in terms of accuracy. It is important to emphasize that this improvement in performance is achieved using a relatively limited set of firm-level observable characteristics that we obtained from SRS (which is limited compared to what SRS has access to), and that mainly come from firms’ balance sheets. This highlights the potential gain of using data-driven approaches for the selection of firms to audit in addition to the regular practices used by the fiscal authorities. It also suggests a promising path for further improvements, as in addition to this set of readily available information the SRS is likely to possess more detailed limited-access firm-level data.
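The link between the gain of almost 20 percentage points and the 35% figure quoted earlier in this brief can be made explicit. The calculation below rests on two assumptions that the text does not spell out: that the relative gain is measured against the 56% naive benchmark, and that 76% is an acceptable round value for the best out-of-sample accuracy ("more than 75%").

```latex
\[
\frac{76\% - 56\%}{56\%} \;=\; \frac{20}{56} \;\approx\; 0.357 ,
\]
```

that is, a relative improvement of roughly 35% over the benchmark.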
Share of Tax-Evading Firms Over Time and Across NACE Sectors
We can now apply these algorithms to the whole universe of firms (i.e., to classify non-audited firms). Figure 1 shows the share of firms classified as tax-evading over the years 2014 to 2019 for our two preferred algorithms – Gradient Boosting and Random Forest. Random Forest (the best performing algorithm) predicts that 30-35% of firms are involved in tax evasion, while Gradient Boosting predicts a slightly higher share (around 40%). Both algorithms, especially Random Forest, suggest a slight reduction in the share of tax-evading firms since 2014.
Figure 1. Share of tax-evading firms over time
The identified reduction, however, does not necessarily imply that the overall share of unreported wages has declined. In fact, existing survey-based evidence (Putnins and Sauka, 2021) indicates that the size of the shadow economy as a share of GDP remained roughly constant over the 2013-2019 period, and that there was no reduction in the contribution of “envelope wages”. Our method estimates the share of firms likely to be involved in labor tax evasion. Unlike the survey approach, it does not allow the measurement of tax-evasion intensity. In other words, the share of non-compliant firms may have decreased while the size of the envelope increased in the firms that remain involved in this scheme.
Next, we disaggregate the share of tax-evading firms by the NACE sector. Figure 2 displays the results obtained with Random Forest, our best performing algorithm.
Figure 2. Share of tax-evading firms by NACE, based on Random Forest
First, the sector where tax evasion is the most prevalent is the accommodation/food industry, where the predicted share of tax-evading firms is 70-80%. Second, our results indicate that the overall decrease in the share of firms likely to evade is not uniform. It is mostly driven by the accommodation/food and manufacturing sectors. Other sectors remain nearly flat. This highlights the fact that labor tax evasion varies both in levels and in changes across sectors.
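The aggregation behind these shares is straightforward: once every firm-year has a predicted class, the share of tax-evading firms is the mean of the predictions within each year or year-sector cell. The sketch below uses eight made-up rows; the values are hypothetical and do not reproduce the figures in this brief.

```python
from collections import defaultdict

# Hypothetical classifier output: (year, NACE sector, predicted_evading).
predictions = [
    (2014, "Accommodation/food", 1),
    (2014, "Accommodation/food", 1),
    (2014, "Manufacturing", 1),
    (2014, "Manufacturing", 0),
    (2019, "Accommodation/food", 1),
    (2019, "Accommodation/food", 0),
    (2019, "Manufacturing", 0),
    (2019, "Manufacturing", 0),
]

def share_evading(rows, key):
    """Share of firms predicted as evading within each group defined by `key`."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        flagged[group] += row[2]
    return {group: flagged[group] / totals[group] for group in totals}

by_year = share_evading(predictions, key=lambda r: r[0])
by_year_sector = share_evading(predictions, key=lambda r: (r[0], r[1]))

for group, share in sorted(by_year_sector.items()):
    print(group, f"{share:.0%}")
```

Note that these shares count firms, not unreported wages, so they describe prevalence rather than intensity.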
We show that machine learning techniques can be successfully applied to administrative firm-level data to detect firms that are likely to be involved in (labor) tax evasion. These techniques can be used to improve the selection of firms to audit in order to maximize the probability of detecting tax-evading firms, in addition to the regular practices already used by SRS. Our preferred algorithms – Random Forest and Gradient Boosting – outperform the naive benchmark classification by almost 20 percentage points, which is a substantial improvement. Once implemented, these tools can improve audit effectiveness at virtually no extra cost.
Our findings also suggest a promising path for further improvements in the application of such methods. The improvement in predictive power achieved by our proposed algorithm is attained by using a limited set of variables readily available from the firms’ balance sheets. Given that SRS is likely to have access to more detailed firm-level information that cannot be provided to third parties, there is clear room for improving the performance of the algorithms by using such limited-access data.
Acknowledgement: The authors gratefully acknowledge funding from the Latvian State Research Programme “Reducing the Shadow Economy to Ensure Sustainable Development of the Latvian State”, Project “Researching the Shadow Economy in Latvia (RE:SHADE)”; project No VPP-FM-2020/1-0005.
- Athey, Susan, and Guido Imbens. 2019. “Machine Learning Methods That Economists Should Know About.” Annual Review of Economics 11: 685–725.
- Cecchini, Mark, Haldun Aytug, Gary J. Koehler, and Praveen Pathak, 2010. “Detecting management fraud in public companies”. Management Science 56, 1146-1160.
- European Commission, 2020. “Undeclared Work in the European Union. Special Eurobarometer 498” (Report)
- Gavoille, Nicolas and Anna Zasova, 2022. “Estimating labor tax evasion using tax audits and machine learning”, SSE Riga/BICEPS Research papers, forthcoming.
- Putnins, Talis, and Arnis Sauka, 2021. “Shadow Economy Index for the Baltic Countries 2009–2020” (Report), SSE Riga
- Ravisankar, Pediredla, Vadlamani Ravi, Gundumalla Raghava Rao, and Indranil Bose, 2011. “Detection of financial statement fraud and feature selection using data mining techniques”. Decision Support Systems, 50(2), 491-500.
- West, Jarrod, and Maumita Bhattacharya, 2016. “Intelligent financial fraud detection: a comprehensive review”. Computers & Security, 57, 47-66.
Disclaimer: Opinions expressed in policy briefs and other publications are those of the authors; they do not necessarily reflect those of the FREE Network and its research institutes.
Taxes and benefits create incentives for people to adopt or avoid certain behaviours. They create premiums for (socially) preferred states. A premium can be determined by either taxing unwanted behaviour or by subsidizing desired behaviour. The resulting economic incentive for changing one’s behaviour is nominally equivalent under both mechanisms. However, the choice of frame for an incentive to be either described in terms of a tax or as a benefit can strongly influence perceptions of what is fair treatment of different, e.g. income, groups. Using a survey-experiment with Flemish local politicians, we show policy-makers to be highly susceptible to such tax and benefit framing effects. As such effects may (even unintendedly) lead to sharply different treatment of the same group under the two mechanisms, important questions arise, particularly for the design of new tax and benefit schemes.
The design and implementation of redistributive policies usually evoke much discussion. Opinions, both in public and often also in political debate, tend to be driven by ethical and fairness considerations. However, such concerns can lead to unintended consequences and – at least in terms of ex-ante intended fairness – to ex-post imbalanced incentive structures for different (income) groups.
An important function of taxes and benefits is the creation of premiums for certain behaviours or actions. Either unwanted behaviour may be taxed and thereby sanctioned, or desired behaviour may be encouraged through benefits. Irrespective of the method chosen, an economic incentive is created for individuals to opt for the desired behaviour.
The way such premiums are defined can usually be thought of as a two-step process. First, a baseline for a given behaviour, action, or state is chosen as a reference-point. For instance, baseline behaviours could be to not have retirement savings, to not use safety-certified equipment or follow accepted standards at work, or to not have children. Arguably, these are cases warranting the creation of incentives to encourage people to adopt the socially desirable behaviours of saving money for their old age, working in a safe environment, and having children. The second step, then, requires a choice of mechanism to create an incentive. The mechanism can be to either punish the unwanted behaviour – such as not adhering to safety standards at work – or to grant (cost-reducing) subsidies and benefits for taking the desired action, such as saving for old age or having children.
Importantly, the combination of the chosen reference point and the mechanism to create the incentive can influence the way people think about the fairness of an incentive when the targets belong to different (income) groups. Schelling (1981) demonstrated this point in an in-class experiment, which, somewhat simplified, runs as follows:
Families typically receive some child benefit: they get a certain sum per child. Imagine there are two families, one poor and one rich, both with their first child. What amounts of child benefit should each family get? Should the poor get more than the rich, should both families get the same, or should the rich family get more for having a child than the poor family? Schelling’s students would tend to voice support for either the poor getting more or both families getting the same. After all the rich family is surely already affluent enough to support their child. At the extreme, the rich family would get nothing for having a child, and the poor family quite a lot.
Now think of a world where the standard is to have a child, and couples who do not have a child have this ‘socially undesirable’ behaviour ‘penalised’ through a fee, for instance in the form of a tax. Should the poor couple pay a higher fee, should both couples pay the same, or should the rich couple pay a higher fee? The students now overwhelmingly supported requiring the rich couple to pay more. After all, they have more disposable income. However, in this case, the rich couple receives a lot for having a child (they no longer need to pay the steep fee), whereas the poor family may get no (additional) economic incentive for having a child. The treatment of the same family thus obviously drastically differs between the two frames. At the extreme, the poor family gets quite a lot for changing from having no children to having one child in the first frame, but nothing in the second frame. For the rich family, the situation is the reverse: there is no premium for having a child in the first frame, but potentially quite a high premium for having a child in the second frame.
Does this thought-experiment matter outside the classroom (see also Traub 1999, McCaffery & Baron 2004), beyond the context of child benefit, and among those actually exposed to the design considerations of tax and benefit systems? In a recent paper (Kuehnhanss & Heyndels 2018), we test the occurrence of such framing effects with elected local politicians in Flanders, Belgium, who are involved in the budgetary decision-making in their municipalities.
We invited 5,928 local politicians to take part in an online survey on economic and social preferences in spring 2016. Participation was voluntary, not incentivised, and questions were not compulsory, allowing respondents to skip them if they so chose. In total, 869 responses to the survey were registered and (N1=) 608 participants provided usable answers to the questions relevant to the framing effect described above.
Participants were randomly allocated to one of two groups, each receiving a slightly different wording of the following question:
“In Belgium couples receive financial benefits from the state. Suppose that it is not relevant how the transfer is funded, and ignore any other benefits, which might come into play. How much [more / less] should a couple [with their first child / without children] receive per month than a couple [without children / with their first child]?”
One group saw the question in the benefit frame, with only the first phrase in each bracket displayed (italicised in the original questionnaire layout); the other group saw the question in the tax frame, with only the second phrase in each bracket displayed (in boldface in the original layout). In both groups, participants were then asked to fill in amounts they would consider appropriate for each of three couples with different monthly net incomes: €2,000, €4,000, or €6,000, respectively.
With framing effects – and in contrast to classic rational choice models – the expectation is that the three couples would be treated differently depending on the phrasing of the question. In the benefit version, the amount granted should decrease with the income of the family. In the tax version, the stated amount should increase with the family’s income.
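The two wordings and the random allocation can be sketched as follows. The mapping of bracketed phrases to frames is inferred from the wording of the question (a couple with a child receiving more is the benefit frame; a couple without children receiving less is the tax frame). The names `TEMPLATE`, `FRAMES`, and `assign_frames` are hypothetical, and the sketch is not the survey software used in the study.

```python
import random

# Final sentence of the survey question, with the bracketed phrases as slots.
TEMPLATE = (
    "How much {direction} should a couple {recipient} receive per month "
    "than a couple {comparison}?"
)

FRAMES = {
    "benefit": {
        "direction": "more",
        "recipient": "with their first child",
        "comparison": "without children",
    },
    "tax": {
        "direction": "less",
        "recipient": "without children",
        "comparison": "with their first child",
    },
}

INCOME_LEVELS = [2000, 4000, 6000]  # monthly net income in EUR, asked in both frames

def render_question(frame):
    """Fill the template with the phrases shown to one experimental group."""
    return TEMPLATE.format(**FRAMES[frame])

def assign_frames(participant_ids, rng):
    """Randomly allocate each participant to one of the two frames."""
    return {pid: rng.choice(["benefit", "tax"]) for pid in participant_ids}

rng = random.Random(2016)
allocation = assign_frames(range(1, 9), rng)
for frame in FRAMES:
    print(f"{frame}: {render_question(frame)}")
```

Both renderings describe the same difference between the two couples; only the reference point changes.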
Figure 1. Results child scenario
Source: Kuehnhanss & Heyndels (2018, p.32)
As Figure 1 shows, the results strongly conform to this pattern. The low-income (€2,000) couple is granted an average of €330 in the benefit frame, but only €178 in the tax frame (recall that the premium in the latter arises from no longer receiving less – or ‘paying a fee’ – once there is a child). For the high-income (€6,000) couple, the amounts granted average €132 in the benefit frame, but a much higher €368 in the tax frame.
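The size of the framing effect can be read directly off these averages. The sketch below uses only the four amounts quoted in the text; the €4,000 couple is left out because its averages are not reported here. The helper names are hypothetical.

```python
# Average monthly premium for having a first child (EUR), as quoted in the text.
reported = {
    2000: {"benefit": 330, "tax": 178},
    6000: {"benefit": 132, "tax": 368},
}

def framing_gap(amounts):
    """Premium in the tax frame minus premium in the benefit frame."""
    return amounts["tax"] - amounts["benefit"]

def income_slope(frame):
    """Change in the premium when moving from the low- to the high-income couple."""
    return reported[6000][frame] - reported[2000][frame]

gaps = {income: framing_gap(amounts) for income, amounts in reported.items()}
benefit_slope = income_slope("benefit")  # negative: premium falls with income
tax_slope = income_slope("tax")          # positive: premium rises with income

print(gaps, benefit_slope, tax_slope)
```

The opposite signs of the two slopes are the pattern the framing hypothesis predicts: the same premium decreases with income in one frame and increases with income in the other.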
Environmental taxes and benefits
Child benefit systems are usually a well-established part of countries’ tax and benefit systems. The design of new instruments is more common in policy areas undergoing, for instance, technological change or being newly regulated. A relevant example is policy on the promotion of environmentally friendly behaviour and technologies, e.g. through ‘green’ taxes and subsidies. To test the validity of the hypothesised framing effect, we also included a second scenario in our survey related to the municipal interests of our respondents, namely car taxes. Flemish municipalities receive income from a surcharge levied on the car taxes paid by motorists. Consequently, we asked our participants (N2 = 525, see the paper for details) to imagine the introduction of a new environmental certificate for cars in Belgium, and to provide amounts they would consider appropriate for the difference in annual tax paid on cars with or without the certificate. Specifically, roughly one half of participants was asked how much less the owner of a certified car should have to pay in annual car tax than the owner of a non-certified car (the subsidy frame). The other half was asked how much more the owner of a non-certified car should pay in annual car tax than the owner of a certified car (the tax frame). The question was again asked for three different levels, proxying wealth via the cost of the cars: €15,000, €30,000, and €45,000, respectively.
Figure 2. Results car scenario
Source: Kuehnhanss & Heyndels (2018, p.32)
Figure 2 shows the results. The effect is less pronounced in this scenario, as the slope for the granted amounts in the subsidy frame remains largely flat or slightly increases. Nonetheless, a substantial framing effect remains. In the tax frame, the amount of the premium (i.e. the amount of taxes no longer owed once a certificate is obtained) strongly increases with the cost of the car. Taking the most expensive car (€45,000) as an example, we thus observe differential treatment across frames also in this scenario. In the subsidy frame, the premium for having a certificate is €778, in the tax frame it is a much higher €1,333.
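For the most expensive car, the difference between frames follows directly from the two averages quoted above:

```latex
\[
1{,}333 - 778 = 555 \ \text{EUR per year},
\qquad
\frac{1{,}333}{778} \approx 1.71 .
\]
```

In other words, the premium for holding the certificate is roughly 70% larger when the same question is posed in the tax frame.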
These results suggest a strong and economically meaningful effect of framing among policy-makers with a stake in tax and benefit systems. While the exact mechanism driving the results invites further research, the strongly divergent premiums, and hence distribution of incentives, across baseline frames raise concerns of unintended effects in the design of taxes and benefits. Especially new schemes – e.g. ‘green’ policy, reform, or regulatory expansion – may benefit from increased scrutiny in the design process. Awareness of susceptibilities to framing and its potential influence on the formulation of individual tax and benefit instruments may help to align intended fairness, incentive structures, and redistributive outcomes.
- Kuehnhanss, Colin R.; and Bruno Heyndels, 2018. ‘All’s fair in taxation: A framing experiment with local politicians’ Journal of Economic Psychology, 65, 26–40.
- McCaffery, Edward J.; and Jonathan Baron, 2004. ‘Framing and taxation: Evaluation of tax policies involving household composition’ Journal of Economic Psychology, 25(6), 679–705.
- Schelling, Thomas C., 1981. ‘Economic reasoning and the ethics of policy’ Public Interest, 63, 37–61.
- Traub, Stefan, 1999. Framing Effects in Taxation. Heidelberg: Physica-Verlag