Appendix I. Strategic Choice in Compliance Planning
‘If everything is important, then nothing is.’ – Dwight Eisenhower.
Running CRM in a reactive, ‘business-as-usual’ mode is not effective. Preventing significant non-compliance through strategic interventions is usually far more cost-effective in improving long-term compliance. Business-as-usual compliance comprises the expected day-to-day compliance interventions undertaken at an operational level because a taxpayer has ‘hit’ some risk rule in the system – for example, VAT high-risk refund rules or mandatory audits for liquidating companies. Left to themselves, operating areas will ‘fill’ their workload to capacity with such work. The operating system evolves into a relatively steady-state equilibrium, and it is difficult to dramatically improve compliance by doing what you have always done.
Strategic CRM is a deliberate choice by the tax administration executive to invest in future compliance by deploying resources from operational work into strategic projects targeted at a particular risk, region or industry segment with a view to making a significant change in compliance levels.
Often strategic compliance interventions will entail a campaign approach – the considered use of multiple compliance approaches, preventative and corrective, service and enforcement, applied to the target risk population. A campaign may involve media awareness activities, industry seminars, and lower penalty rates for non-compliant taxpayers who come forward within particular timeframes, before enforcement efforts are ramped up. Ideally, ‘before and after’ research audits will be used to identify changes in underlying compliance levels. Planning and designing these specific interventions takes time, and this needs to be built into an annual strategic planning process.
Appendix II. Types of Machine Learning
Appendix III. Potential Uses of Predictive Modeling in Tax Administration
As the OECD19 and others20 point out, the emergence of advanced analytics – with its ability to examine data or content using sophisticated approaches such as pattern recognition, outlier detection, cluster analysis, experimental design, network analysis, and text mining – has opened new opportunities for the use of intelligence across all aspects of revenue administration. Other applications (apart from case selection) for advanced analytics include:
✓ Predict who should be registered.
✓ Predict revenue associated with a non-registered person.
✓ Predict who will file late before the event.
✓ Predict who will file once they are late (self-finalize).
✓ Predict revenue associated with late or non-filing.
✓ Predict who is non-compliant (likelihood) for each tax type.
✓ Predict size of potential adjustment (consequence).
✓ Predict high risk refunds.
✓ Predict who will object to an amended assessment.
✓ Support text and social network mining in audit cases.
✓ Predict who will pay late before the event.
✓ Predict who will pay late but before intervention (self-finalize).
✓ Predict who will pay given alternative interventions (phone, mail, visit, court action etc.).
✓ Predict capacity to pay and propensity to pay.
✓ Predict business viability (see BVAT21 model on the ATO website).
✓ Analyze taxpayer channel use to inform design decisions and identify self-service opportunities.
✓ Improve service delivery using proactive messaging, calling, and other interventions.
✓ Predict likelihood of smuggling contraband, drugs etc.
✓ Predict behavior of traders and passengers.
✓ Measure the tax gap.
✓ Assess or forecast the impact of changes in tax policy.
Appendix IV. Practical Tips Regarding Predictive Model Use
Practical Tips regarding predictive analytical model development and use:
Use a formal process, such as CRISP-DM, to document data mining efforts for better analysis, consistent reuse, and learning.
Where possible, use categorical data rather than free-text fields.
Where possible, use ordinal data (categories that have an order) rather than categorical data.
Where possible, use interval data (numeric with meaningful spacing) rather than ordinal data.
Manipulate and transform mirror data sets – not the original data.
Save SQL scripts for consistent data retrieval, reuse and process documentation.
If faced with extremely large data sets, use representative sampling to reduce the size analyzed.
If the target variable is rare (e.g., <10 percent of records), consider oversampling the target class, undersampling the negative class, or generating additional ‘synthetic’ examples using the SMOTE22 node in KNIME.
Always review the data and understand the distributions involved, particularly of target variables.
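The rebalancing idea above can be sketched in plain Python. The sketch below uses simple random duplication of minority-class records as a crude stand-in (SMOTE itself instead synthesizes new points by interpolating between minority-class nearest neighbors); the field name `is_noncompliant` is illustrative, not from any particular system:

```python
import random

def oversample_minority(records, label_key="is_noncompliant", seed=42):
    """Duplicate minority-class records until the two classes are balanced.

    A crude stand-in for SMOTE: SMOTE would synthesize new examples by
    interpolating between minority-class nearest neighbours rather than
    duplicating existing ones.
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    minority, majority = sorted([positives, negatives], key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = records + extra
    rng.shuffle(balanced)
    return balanced

# Toy data set with a ~10 percent target rate (2 of 20 records).
data = [{"is_noncompliant": i < 2} for i in range(20)]
balanced = oversample_minority(data)
```

After rebalancing, both classes are equally represented, which helps learners that otherwise ignore a rare target.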
Modeling / Mining
Use a good ‘out of the box’ algorithm, such as Random Forest, initially. As expertise develops, explore the use of other modeling approaches to see if they can improve predictions over part of the data. (It is usually hard to beat Random Forest in practice.)
Use ensemble approaches (taking the best predictions from multiple models) when appropriate.
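A minimal form of ensembling can be sketched as (weighted) score averaging across models. The two ‘models’ below are hypothetical placeholder functions, not part of any product; KNIME and similar tools provide dedicated ensemble nodes for production use:

```python
def ensemble_score(taxpayer, models, weights=None):
    """Average the risk scores produced by several models.

    `models` is a list of callables, each returning a probability-like
    score in [0, 1] for the taxpayer record.
    """
    weights = weights or [1.0] * len(models)
    total = sum(w * m(taxpayer) for m, w in zip(models, weights))
    return total / sum(weights)

# Two toy placeholder "models" scoring a taxpayer record.
model_a = lambda t: 0.8 if t["late_filings"] > 2 else 0.2
model_b = lambda t: min(1.0, t["adjustment_history"] / 10.0)

score = ensemble_score({"late_filings": 3, "adjustment_history": 5},
                       [model_a, model_b])
# score is the average of 0.8 and 0.5, i.e. 0.65
```

Averaging is the simplest ensemble; stacking or taking the best model per data segment are common refinements.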
To reduce ‘noise’, evaluate and then eliminate variables that do not provide predictive ability. (The Meta Node provided for Random Forest ‘variable importance’ indicates the relative use of variables in the random forest.)
Some modeling approaches work best with normalized interval data. Explore whether such transformations improve the predictions. (KNIME Normalizer node.)
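Min-max normalization, one of the rescalings the KNIME Normalizer node offers, can be sketched in plain Python for a single numeric column (the `turnover` values are illustrative):

```python
def min_max_normalize(values, lo=0.0, hi=1.0):
    """Rescale a numeric column linearly onto the [lo, hi] interval."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:                      # constant column: map to midpoint
        return [(lo + hi) / 2.0] * len(values)
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

# Toy turnover figures, rescaled to [0, 1].
turnover = [100.0, 250.0, 400.0, 1000.0]
normalized = min_max_normalize(turnover)
```

Distance-based methods (e.g., clustering, k-nearest neighbors) benefit most from such rescaling, since otherwise large-valued columns dominate the distance calculation.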
To reduce the number of low-value cases selected, iteratively raise the threshold for a ‘strike’ and evaluate the results until a suitable balance between strike rate and caseload is obtained. (The median strike value is often a good starting point.) If the threshold reduces the target percentage below ~10 percent, consider over-/undersampling or SMOTE (synthetically generating minority-class examples by interpolating between nearest neighbors) to rebalance the data set.
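The threshold search above can be sketched as a simple sweep over candidate cut-offs on a validation sample, trading strike rate against caseload. The scores and outcomes below are toy values; a ‘strike’ here means an audit that yielded an adjustment:

```python
def threshold_sweep(cases, thresholds):
    """For each candidate threshold, report caseload and strike rate.

    `cases` is a list of (predicted_score, actual_strike) pairs taken
    from a validation sample.
    """
    results = []
    for t in thresholds:
        selected = [(s, hit) for s, hit in cases if s >= t]
        caseload = len(selected)
        strikes = sum(1 for _, hit in selected if hit)
        strike_rate = strikes / caseload if caseload else 0.0
        results.append({"threshold": t, "caseload": caseload,
                        "strike_rate": strike_rate})
    return results

# Toy validation sample of (score, strike) pairs.
validation = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
              (0.4, False), (0.3, False), (0.2, False), (0.1, False)]
sweep = threshold_sweep(validation, [0.2, 0.5, 0.8])
```

In this toy sample, raising the threshold from 0.2 to 0.8 shrinks the caseload from seven cases to two while lifting the strike rate from under a half to 100 percent; the right balance depends on available audit capacity.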
Once a robust predictive model has been built and evaluated, for the deployment build push the ‘training partition’ to 100 percent and re-execute the learner node to maximize the deployed model’s predictive ability.
Use a single regression tree (CART) to provide a broad explanation of what the more accurate ‘black box’ model is doing. It will not be exact, nor always ‘correct’, but should provide a useful indication.
A model’s predictive ability degrades over time as the underlying economy changes. Revisit models every six months with additional data to see if they need to be rebuilt or enhanced.
Use a small random component (using stratified random sampling) to maintain intelligence on new risks and monitor model performance over time. A random component also assists in estimating tax gaps and prioritizing compliance campaigns.
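Stratified random selection of a small control component can be sketched as follows. The strata (`segment`) and the 10 percent sampling rate are illustrative assumptions; in practice the strata and rates would follow the administration’s segmentation model:

```python
import random

def stratified_random_sample(population, stratum_key, rate, seed=7):
    """Draw a fixed-rate random sample from each stratum.

    Every stratum contributes `rate` of its members (at least one),
    so rare segments are still represented in the random audit pool.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for record in population:
        by_stratum.setdefault(record[stratum_key], []).append(record)
    sample = []
    for members in by_stratum.values():
        k = max(1, round(rate * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Toy population: 10 'large' taxpayers and 100 'small' ones.
population = [{"id": i, "segment": "large" if i < 10 else "small"}
              for i in range(110)]
audit_pool = stratified_random_sample(population, "segment", rate=0.1)
```

Because results from this pool were selected at random within each stratum, they can be weighted back to the population to estimate tax gaps and to monitor model performance over time.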
Appendix V. Analysis of SRC Case Selection and how to improve it
Appendix VI. Glossary of Technical Terms
Refer to Koukpaizan et al. Improving Tax Compliance Risk Management, May 2018.
Refer to Grote et al. Growth-Friendly Rebalancing of Taxes, October 2018.
Grote 2018, op. cit.
The VAT threshold for 2018 was AMD 115 million but has been reduced to 58.3 million in 2019. The government plans to reverse the reduction of the VAT threshold in the forthcoming tax reform package.
“Responses of Firms to Tax, Administrative and Accounting Rules: Evidence from Armenia,” Zareh Asatryan and Andreas Peichl, CESifo Working Papers, November 2017.
Grote 2018, op. cit.
The mission was provided the draft strategy plan but was not able to discuss it with the SRC management during its stay.
While the term revenue administration covers both tax and customs administrations, some of the information in this Box is more specific to the features of tax administration.
For more information on the legal design of an advance tax ruling regime, see Waerzeggers, Christophe and Cory Hillier, 2016, “Introducing an advance tax ruling (ATR) regime—Design considerations for achieving certainty and transparency,” Tax Law IMF Technical Note Volume 1, 2/2016, IMF Legal Department.
Except as an outcome of bankruptcy procedure or liquidation of a business entity.
Koukpaizan 2018, op. cit.
OECD (2016), Advanced Analytics for Better Tax Administration: Putting Data to Work, OECD Publishing, Paris. http://dx.doi.org/10.1787/9789264256453-en.
Armenia, Azerbaijan, Belarus, Georgia, Moldova and Ukraine.
See MOF’s November 1, 2018 request and FAD’s response dated December 4, 2018.
OECD, Advanced Analytics for Better Tax Administration, op. cit.
IOTA (2017), Good Practice Guide – Applying Data and Analytics in Tax Administrations, IOTA, Budapest; and WCO News (February 2017), Data Analysis for Effective Border Management, WCO, Brussels.
Business Viability Assessment Tool. https://www.ato.gov.au/calculators-and-tools/business-viability-assessment-tool/.
Random Forest – an ML algorithm that creates multiple decision trees (the ‘forest’), which essentially ‘vote’ on the appropriate classification. The Random Forest algorithm performs well ‘out of the box’, tolerates missing data, uses both numerical and categorical data, and is resistant to overfitting. https://en.wikipedia.org/wiki/Random_forest