AI and Taxes — A Work in Progress: Part 1

Joyce Beebe


This is the first of two issue briefs by Joyce Beebe on the use of AI in the tax field. Part 2 can be found here.

Artificial intelligence (AI) has been getting a lot of attention, especially generative AI and its most well-known application, OpenAI’s ChatGPT.[1] Since its public debut in November 2022, ChatGPT has shown phenomenal growth: The chatbot reached a million users in five days and recorded over 100 million users within two months. In comparison, it took Instagram 2 1/2 years to reach the hundred million-user mark.[2]

ChatGPT’s capabilities are impressive. The latest version (GPT-4, released in March 2023) can understand scratchy handwriting, arrange meetings, and read and describe images. In addition, it scores higher on many standardized tests (including bar exams, the SAT, and graduate record exams) than its predecessor, GPT-3.5.[3] GPT-4 can even handle some tax-related tasks. If that is not impressive enough, it can exhibit a sense of humor and, when asked, can turn a tax question into a rhyming poem.[4]

Some believe these AI tools will boost labor productivity,[5] whereas others caution that AI development is out of control and government regulation is necessary.[6] There is no doubt that AI will have profound impacts on many aspects of daily life, and tax is no exception, as outlined in this two-part issue brief series. This first brief reviews the impact of current AI developments on tax research and compliance, and the second brief discusses recent efforts to use AI tools to inform tax policymaking and congressional attempts to regulate AI while still encouraging innovation in the U.S. Together the briefs will highlight the most disruptive perspectives of AI in these areas and what issues to pay attention to when adopting AI.

The Remarkable Growth of Generative AI

In non-technical terms, generative AI is a type of AI technology that can receive inputs and produce several kinds of outputs, including text, image, video, or synthetic data in response to various prompts. This technology is not new, but technical advances have enabled its proliferation and sophistication.[7] The outputs of these algorithms are presented in the form of natural language chatbots, which have the ability to generate human-like, conversational outputs. This feature greatly facilitates its popularity and makes it highly approachable.

It is important to bear in mind that developing a powerful generative AI algorithm requires massive amounts of training data. The training datasets let the model learn the underlying structure and context of language and recognize patterns from these datasets. (See an interactive illustration from The New York Times.[8]) Well-established AI models, such as ChatGPT, are trained on hundreds of billions of words of text. Once trained, an algorithm can generate new content by mimicking the patterns it has learned, and developers can continue to input new data to fine-tune a model.[9]

The massive investment in AI reflects the popularity of this technology. Some researchers estimate that in 2021, private investment in AI was $53 billion in the U.S. and $94 billion globally.[10] These amounts show a five-fold increase between 2017 and 2021. Besides ChatGPT, Google’s Bard, Meta’s Blenderbot, Microsoft’s Bing AI, and Writesonic’s ChatSonic are a few well-known alternatives.

With the advancement of AI, a common concern is whether ChatGPT and other similar technologies will replace human jobs. A frequently cited Goldman Sachs study estimates that if the current trend continues, AI may substitute for up to 25% of current jobs in the U.S. and Europe.[11] Although this prediction is dismal, the report states that there is a silver lining: The combination of significant labor cost savings, new job creation, and higher productivity for non-displaced workers will enhance overall labor productivity, which in turn will propel economic growth.

AI in Tax Research

Some in the tax profession wonder if AI can prepare tax returns and draft legal briefings, therefore replacing certified public accountants (CPAs) and tax attorneys. That could be the case in the future, but not yet.

In early 2023, a New York lawyer used ChatGPT for legal research and was provided with fake legal citations. The attorney failed to verify these results and submitted these non-existent cases in his briefing to the court. Although the judge recognized “the court [was] presented with an unprecedented circumstance,” the attorney was sanctioned.[12] This situation reveals that ChatGPT is not yet entirely trustworthy and should be used with caution. More of its limitations — and benefits — are highlighted below.

Limits

Several researchers recently tested generative AI’s tax technical knowledge by submitting a series of tax law questions to different generative AI models,[13] and the results highlighted the importance of using this technology carefully.[14] Common areas of concern were that generative AI could:

Fabricate information (commonly known as hallucination).
Present false information.
Not be fully transparent about the source of information.

Hallucination and False Information. The hallucination problem is one of the limitations recognized by OpenAI itself. Essentially, users can ask ChatGPT any question and it will always respond with something. Even if it does not know the answer, it will provide a response — which may be false. Because ChatGPT responds confidently in a fluent manner, users generally assume the answer is accurate. This is especially problematic for non-experts who lack subject matter knowledge and may not be able to discern inaccurate information. For example, when a group of researchers asked three different AI models tax research questions, some generative AI models confidently returned answers that were completely wrong. In certain cases, the technology presented a tax code section that does not exist, or simply grabbed sentences that appear close to given keywords.

Lack of Analysis and Transparency. In addition, some observers noted that generative AI’s responses to tax questions lacked proper analysis. After reviewing ChatGPT’s responses, practitioners realized it had simply assembled relevant sentences from sources across the internet.[15] In most cases, it did not provide analysis as to how the facts apply to relevant tax regulations. In situations where there were analyses, it was not always easy to identify where the information came from. This lack of transparency makes it difficult for users to verify the accuracy of information, so professionals need to exercise caution and avoid taking generative AI’s responses at face value.

In many ways, the results are not surprising. When a legal term has a particular meaning in a specific context, a general-purpose generative AI model is likely to produce results in the wrong context. In addition, the interactions between different tax code sections may be complex or not apparent. Researchers point out that some provisions may expand or provide definitions for other sections, while others may contain specifics that override general rules. Generative AI models may not be able to see these hidden or tangled relationships. Besides, various forms of guidance — for example, court rulings, codes, and IRS publications — may have differing levels of legal authority and carry different weight in tax and legal settings. Further training with specific knowledge is necessary to improve generative AI’s tax capabilities.

Benefits

Although ChatGPT is not yet capable of writing legal briefs or being a tax expert, it does not mean AI won’t affect how tax professionals and attorneys conduct their daily activities. Researchers recognize that generative AI has plenty of benefits to offer. It can:

Keep professionals updated on changing rules and regulations.
Review large volumes of complex documents.
Earmark inconsistencies that have gone unnoticed.
Simply function as a sanity check.

Keeping Track of Rules and Regulations. AI tax research tools can not only keep track of new tax rules, but they can also monitor the superseded ones. A tax return may not be audited until several years after filing, and tax laws may change between the filing date and the audit date. For instance, during the pandemic, tax provisions were created and expired within a few years. Practitioners indicate that AI can help produce profiles that track the rules relevant to taxpayers’ specific situations and timeframe, even if the tax rules have been overridden.[16]

AI Can Help (Or Even Be) A Junior Team Member. Generally, researchers believe AI can add the most value for junior team members, or even function as a junior team member. Early career lawyers and accountants can benefit the most from using AI research tools, allowing them to quickly achieve a high level of competency.

For instance, AI can be used to automate mundane but essential tasks normally performed by junior team members. In the indirect tax field (i.e., VAT, sales, etc.), high volume routine questions and checkups such as filing deadlines, whether a return has been filed for a certain jurisdiction this month, or how much VAT should be charged for a particular transaction, can be accomplished by AI.[17] Tax research tools or lengthy documents can also be presented as AI-powered chatbots to help professionals retrieve data and information more easily.

Overall, AI still cannot replace accountants and attorneys, but it is capable of substituting for some of the functions performed by these professionals. The best way to approach AI is to adopt the tools that complement human worker functions. Indeed, there is a global trend toward wider adoption.[18] With AI’s help, professionals can spend less time collecting and entering data, and more time on value-added services like decision-making or advising clients.

AI in Tax Administration and Compliance

Tax administrations around the world have been actively pursuing ways to use AI to improve the efficiency, accuracy, and quality of their work. Many hope that AI can help detect tax evasion and narrow the tax gap,[19] which in the U.S. was roughly $500 billion per year between 2014 and 2016.[20]

Tax authorities are particularly interested in machine learning (ML),[21] a subfield of AI, for its ability to decipher layers of seemingly unrelated information.[22] For instance, ML can parse through convoluted partnership structures and predict which entities are more likely to be non-compliant and underpay taxes.[23] Academic studies also identify the potential of using ML to better identify multinational companies’ tax haven operations.[24]

IRS Initiatives

Long History of Data Use. The Internal Revenue Service (IRS) has a long history of using data to fulfill its mission. Over the years, these efforts have included statistical methods, analytical tools, rule-based algorithms, and an AI lab.[25] As the complexity of business operations advances over time, the agency faces additional needs to identify hidden relationships in complex business structures, detect anomalies in data with limited transparency, and manage the growing number of tax returns. Addressing these issues will not only require tools such as advanced analytics and AI, but also human changes, including putting people with analytical skills into leadership roles, changing existing agents’ mindsets to be data-driven, and training non-IT professionals to understand the analytics.

Recent Success of Return Review Program (RRP). A recent success was the agency’s Return Review Program (RRP) for the 2017 filing season: It combined conventional approaches with both supervised and unsupervised machine learning methods to detect noncompliance (including questionable refunds on individual income tax returns).[26] Between 2009 and 2019, the IRS invested $597 million in the system. From 2015 to 2019, the RRP prevented the issuance of $11 billion in invalid refunds. As such, the researchers concluded the RRP harvested an 18-fold return during the first decade of the program, which is highly encouraging.
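The 18-fold figure can be checked with simple arithmetic. The sketch below assumes the return is just the ratio of invalid refunds prevented to cumulative investment; the variable names are illustrative, and the two amounts cover different periods, so this is a rough check rather than the researchers' own method.

```python
# Back-of-the-envelope check of the return cited above.
# Assumption: "return" = invalid refunds prevented / cumulative investment.
investment_usd = 597e6         # IRS investment in the system, 2009-2019
refunds_prevented_usd = 11e9   # invalid refunds prevented, 2015-2019

return_multiple = refunds_prevented_usd / investment_usd
print(f"Return multiple: {return_multiple:.1f}x")  # prints "Return multiple: 18.4x"
```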

A Low “No-Change Rate” as an Indicator of a Good Audit System. However, researchers also specified that a better indicator for a good algorithm-based audit system would not only consider a high return on investment, but also a low “no-change rate.” A high no-change rate means a large share of audited returns were settled in the taxpayers’ favor, so they created a burden for taxpayers and consumed resources for the IRS without contributing to tax collection.[27] This also implies the system may not be good at targeting or selecting cases for audits. The no-change rate associated with the RRP is not available, but this factor should be a metric for future considerations, the researchers argue.
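As a rough illustration, the no-change rate can be thought of as the share of closed audits that end with no adjustment to the tax owed. The sketch below uses made-up audit records and a hypothetical function name; it is not an IRS definition or IRS data.

```python
def no_change_rate(audit_adjustments):
    """Share of closed audits that ended with no change to the tax owed.

    audit_adjustments: list of dollar adjustments, one per closed audit;
    0 means the return was accepted as filed (hypothetical data layout).
    """
    if not audit_adjustments:
        raise ValueError("need at least one closed audit")
    no_change = sum(1 for adj in audit_adjustments if adj == 0)
    return no_change / len(audit_adjustments)


# Made-up sample: 10 closed audits, 4 of them with no adjustment.
sample = [0, 1200, 0, 350, 0, 0, 90, 4800, 15, 700]
print(f"No-change rate: {no_change_rate(sample):.0%}")  # prints "No-change rate: 40%"
```

Under this reading, a lower rate suggests the selection algorithm is targeting returns that actually need correction.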

Concerns About IRS Use of AI. Many believe that the IRS is in an ideal position to implement AI because it has access to massive amounts of taxpayer data. This is true, but the agency is aware of the uneven quality of data. In other words, relying on historical data as the training dataset for audit selection means the algorithm could be influenced by prior audit patterns. In addition, the IRS needs to address taxpayers’ transparency concerns when expanding the applications of AI. Because AI algorithms are based on the structures and patterns of the underlying datasets instead of economic principles, it is even more critical that IRS staff or data scientists are capable of conveying outcomes to taxpayers.

The Legislative and Administrative Trend. The overall legislative and administrative trend is supportive of using additional analytics-driven approaches under proper governance. In 2018, a new law was passed that requires federal agencies to improve data and analytical governance.[28] Subsequently in 2019, an executive order was issued outlining steps to maintain U.S. leadership in AI.[29] The National Artificial Intelligence Initiative Act of 2020 codified the establishment of a national AI initiative and associated federal offices and committees.[30]

All these considerations, combined with advancements in technology, mean the IRS can definitely benefit from AI-based tools. Although there will be a steep learning curve, if properly implemented, the overall outcome will be more efficient tax administration and better taxpayer experience.

OECD Efforts

An Organization for Economic Cooperation and Development (OECD) report on tax administration shows that, as of 2022, more than 40 tax administrations globally are either already using AI or planning to.[31] As tax administrations become more comfortable with handling large amounts of data and advanced technology, they are adopting these applications at an accelerated rate. Between 2018 and 2020, 16% more administrations were using tools with embedded AI and ML technologies, and 14% more were planning to do so.

Use of AI in Tax Administration. Tax administrations primarily leverage AI’s capability through deploying chatbots to provide information, using algorithms to detect irregular patterns, and prioritizing enforced collection when taxpayers show signs of default. Specifically, over 75% of tax administrations (out of 52 jurisdictions surveyed) report that they are already using AI and ML to exploit data in ways that can uncover previously hidden assets or identify new risks, with the ultimate goal of reducing tax evasion and fraud.

Some of these efforts, such as the Canada Revenue Agency’s “Charlie” chatbot, are still works in progress,[32] whereas others have already generated more quantifiable results. For example, the French tax administration used AI and satellite images to detect 20,000 undeclared swimming pools, resulting in 10 million euros of additional property tax revenue. The Italian Revenue Agency was authorized to use an algorithm that cross references financial data with tax filings, earnings, property records, bank accounts, and other electronic payment information to detect taxpayers with elevated risks of nonpayment. This led to identification of 1 million high-risk cases and prevented fraud amounting to $6.85 million in 2022.[33]

The Need for Caution

The European tax authorities welcome the benefits of AI; however, they also need to observe the European Union’s privacy law — the General Data Protection Regulation (GDPR) — when implementing these tools. In particular, when using AI-based algorithms to collect data, the procedure must be transparent, with consent, and have limited and specific purposes. When processing data, personal information must be anonymized, and procedures to ensure proper protection must be established.

Many tax authorities also learned from a recent experience in the Netherlands. In 2013, the Dutch tax authority began using a self-learning algorithm to process childcare benefit applications. The algorithm not only checked for proper form use and accuracy of data, but also reviewed the applications for the risk of fraud. The intent was to detect benefits fraud at an early stage so childcare subsidies would only go to the proper recipients.[34]

However, it was later discovered that the tax authority would penalize families with suspected fraud based simply on the algorithm’s risk profile indicator. Some recipients were asked to return their benefits without an opportunity to appeal. Because the system was not transparent, denied applicants had no way of knowing why they had been flagged. An investigation later revealed that the indicator classified a disproportionate number of low-income and dual-nationality applicants as high risk.

As a result, the tax administration was fined by the Dutch data protection agency, and the prime minister resigned in 2021. This incident reveals how solely relying on AI without proper human safeguards could lead to devastating results. Observers caution that while AI can be a useful tool to streamline tax compliance or risk evaluation processes, AI is only as good as the people who use it. AI cannot serve as a shield to evade accountability.[35]

Conclusion

Remarkable development in AI and similar technologies is changing the tax research and compliance landscape. AI can complement tax professionals’ abilities by updating them on regulation changes, helping them review voluminous complex documents, and doing basic tasks on their behalf. Currently, however, human workers need to use these AI functions with caution primarily due to the hallucination and transparency issues.

Global tax authorities are actively seeking to deploy or expand AI-type tools in their daily operations. Some tax agencies have seen measurable results, whereas others continue to polish their algorithms. They must stay alert to the limitations of AI: Its results depend on the training data used, and human oversight, and sometimes intervention, is still required. In addition, safeguarding taxpayer data and addressing privacy concerns are key considerations when it comes to establishing an algorithm-based audit system.

Endnotes

[1] GPT stands for generative pre-trained transformer. See the section below for a brief discussion about the technical functionality.

[2] Krystal Hu, “ChatGPT Sets Record for Fastest Growing User Base – Analyst Note,” Reuters, February 2, 2023, https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.

[3] OpenAI, GPT-4 Technical Report, 2023, https://cdn.openai.com/papers/gpt-4.pdf.

[4] OpenAI, GPT-4 Developer Livestream, March 2023, https://www.youtube.com/watch?v=outcGtbnMuQ. At 19:20 of this video, GPT-4 was provided 16 pages of tax code and calculated a couple’s tax liability, and starting at 22:30 it summarizes the tax problem in a rhyming poem.

[5] Jennifer Liu, “Stanford and MIT Study: A.I. Boosted Worker Productivity by 14% — Those Who Use It ‘Will Replace Those Who Don’t,’” CNBC, April 25, 2023, https://www.cnbc.com/2023/04/25/stanford-and-mit-study-ai-boosted-worker-productivity-by-14percent.html.

[6] Claire Duffy and Ramishah Maruf, “Elon Musk Warns AI Could Cause ‘Civilization Destruction’ Even as He Invests in It,” CNN, April 17, 2023, https://www.cnn.com/2023/04/17/tech/elon-musk-ai-warning-tucker-carlson/index.html.

[7] Examples of these technologies include transformers and large language models (LLMs). A transformer is a type of machine learning model that made it possible for researchers to train large models without having to label all of the data in advance. LLMs can perform a variety of natural language processing tasks (such as classifying text, translating language, and answering questions) and process massive amounts of data. See George Lawton, “What is Generative AI? Everything You Need to Know,” TechTarget, accessed July 15, 2023, https://www.techtarget.com/searchenterpriseai/definition/generative-AI.

[8] Aatish Bhatia, “Watch an AI Learn to Write by Reading Nothing but Jane Austen,” New York Times, April 27, 2023, https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html.

[9] Kristen E. Busch, “Generative Artificial Intelligence and Data Privacy: A Primer,” R47569, Congressional Research Service, May 23, 2023, https://crsreports.congress.gov/product/pdf/R/R47569.

[10] Daniel Zhang, Nestor Maslej, Erik Brynjolfsson, John Etchemendy, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Michael Sellitto, Ellie Sakhaee, Yoav Shoham, Jack Clark, and Raymond Perrault, “The AI Index 2022 Annual Report,” AI Index Steering Committee, Stanford Institute for Human-Centered AI, Stanford University, March 2022, https://aiindex.stanford.edu/wp-content/uploads/2022/03/2022-AI-Index-Report_Master.pdf.

[11] Joseph Briggs and Devesh Kodnani, “The Potentially Large Effects of Artificial Intelligence on Economic Growth,” Goldman Sachs, March 26, 2023, https://bit.ly/47DAnvR.

[12] Ramishah Maruf, “Lawyer Apologizes for Fake Citations from ChatGPT,” CNN, May 28, 2023, https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers/index.html.

[13] Benjamin Alarie, Kim Condon, Susan Massey, and Christopher Yan, “The Rise of Generative AI in Tax Research,” Tax Notes, May 29, 2023, https://www.taxnotes.com/tax-notes-federal/tax-technology/rise-generative-ai-tax-research/2023/05/29/7grqz (subscription required).

[14] These models are: GPT-3.5, GPT-4, and Ask Blue J. The last model is fine-tuned as a tax specialty AI model. (The specialty AI was trained on legal terms to ensure the model understands the language specific to the tax domain; it is also continuously updated with new information and trained not to hallucinate.)

[15] Daniel Mayo, “Can I Replace My Tax Advisor with ChatGPT?” Forbes, February 18, 2023, https://www.forbes.com/sites/danielmayo/2023/02/18/can-i-replace-my-tax-advisor-with-chatgpt/?sh=56e267802ad9.

[16] Alarie et al., “The Rise of Generative AI in Tax Research.”

[17] Liam Larke and Emmie Nygard, “Artificial Intelligence: Transforming the World of Indirect Tax,” Tax Advisor, April 20, 2023, https://www.taxadvisermagazine.com/article/artificial-intelligence-transforming-world-indirect-tax.

[18] A Dubai-based company recently launched TaxGPT to use AI that pulls information from UAE’s Ministry of Finance and Federal Tax Authority to help businesses navigate tax regulations and future changes. See Virtuzone, “Virtuzone Launches TaxGPT – the World’s First AI-Powered UAE Corporate Tax Assistant,” May 24, 2023, https://www.vz.ae/press-release/virtuzone-launches-taxgpt/.

[19] “The gross tax gap is the difference between true tax liability for a given tax year and the amount that is paid on time:” see https://www.irs.gov/statistics/irs-the-tax-gap.

[20] IRS, “Tax Gap Estimates for Tax Years 2014-2016,” last updated October 28, 2022, https://www.irs.gov/newsroom/the-tax-gap. The tax gap is expected to increase to $540 billion per year for 2017-2019.

[21] Many people have already experienced consumer-oriented applications of ML. For instance, in advertising, social media websites use ML to predict what consumers like.

[22] According to ChatGPT, “AI encompasses a broad range of techniques focused on creating intelligent systems, while machine learning specializes in developing algorithms that learn from data to make predictions or decisions, and Generative AI leverages machine learning to generate original and realistic content.” Generative AI for Rackspace Technology, “Understanding the Distinctions between Artificial Intelligence, Machine Learning and Generative AI,” June 14, 2023, https://www.rackspace.com/blog/distinctions-ai-ml-generative-ai.

[23] Alexis Leondis, “AI Can Help the IRS Catch Wealthy Tax Cheats,” Washington Post, March 9, 2023, https://www.washingtonpost.com/business/2023/03/09/ai-can-help-the-irs-catch-wealthy-tax-cheats/7bae3784-be8a-11ed-9350-7c5fccd598ad_story.html.

[24] Kelvin Law and Lillian F. Mills, “Taxes and Haven Activities: Evidence from Linguistic Cues,” The Accounting Review, September 1, 2022, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2768605. The authors find companies with tax haven operations that cannot be identified from reviewing Exhibit 21 of the annual reports. In Exhibit 21 companies are required to list all existing significant subsidiaries and their jurisdiction of incorporation.

[25] Jeff Butler, “Analytical Challenges in Modern Tax Administration: A Brief History of Analytics at the IRS,” Ohio State Technology Law Journal, May 15, 2020, https://kb.osu.edu/handle/1811/91830?show=full.

[26] Supervised learning generally focuses on a single known outcome, whereas unsupervised learning discovers patterns across datasets that might otherwise be missed. For details, see page 4 of Janet Holtzblatt and Alex Engler, “Machine Learning and Tax Enforcement,” Tax Policy Center, June 22, 2022, https://www.urban.org/sites/default/files/2022-06/Machine%20Learning%20and%20Tax%20Enforcement.pdf.

[27] Holtzblatt and Engler, “Machine Learning and Tax Enforcement.”

[28] Foundations for Evidence-Based Policymaking Act of 2018, Pub. L. 115-435, January 14, 2019, https://www.congress.gov/bill/115th-congress/house-bill/4174.

[29] Executive Office of the President, “Maintaining American Leadership in Artificial Intelligence,” Executive Order 13859, February 14, 2019, https://www.federalregister.gov/documents/2019/02/14/2019-02544/maintaining-american-leadership-in-artificial-intelligence.

[30] William M. (Mac) Thornberry National Defense Authorization Act for Fiscal Year 2021, Pub. L. 116-283, January 1, 2021, Division E — National Artificial Intelligence Initiative Act of 2020, https://www.congress.gov/bill/116th-congress/house-bill/6395/text.

[31] OECD, “Tax Administration 2022: Comparative Information on OECD and other Advanced and Emerging Economies,” June 23, 2022, https://www.oecd.org/ctp/tax-administration-23077727.htm.

[32] Nerissa McNaughton, “We Had a Chat with Charlie, CRA’s New Chatbot,” AF Accounting, October 28, 2021, https://afaccounting.ca/2021/10/28/we-had-a-chat-with-charlie-cras-new-bot/.

[33] Janna Brancolini, “Italy Turns to AI to Find Taxes in Cash-First, Evasive Culture,” Bloomberg Tax, October 31, 2022, https://news.bloombergtax.com/daily-tax-report-international/italy-turns-to-ai-to-find-taxes-in-cash-first-evasive-culture (subscription required).

[34] Melissa Heikkilä, “Dutch Scandal Serves as a Warning for Europe over Risks of Using Algorithms,” Politico, March 29, 2022, https://www.politico.eu/article/dutch-scandal-serves-as-a-warning-for-europe-over-risks-of-using-algorithms/.

[35] Andrew Leahey, “We Can All Learn a Thing or Two from the Dutch Tax Scandal,” Bloomberg Tax, July 12, 2022, https://news.bloombergtax.com/tax-insights-and-commentary/we-can-all-learn-a-thing-or-two-from-the-dutch-ai-tax-scandal (subscription required).

This material may be quoted or reproduced without prior permission, provided appropriate credit is given to the author and Rice University’s Baker Institute for Public Policy. The views expressed herein are those of the individual author(s), and do not necessarily represent the views of Rice University’s Baker Institute for Public Policy.