Five years ago, as part of Fara Group’s work in Anti-Money Laundering/Countering the Financing of Terrorism (AML/CFT), we audited a bank under liquidation that had previously been sanctioned by the U.S. Treasury. Our primary mission was to identify toxic accounts and related account holders to prevent further contamination of the wider financial system, both in the country where the bank was located and internationally. The project was a success: among other things, our efforts uncovered various networks and transactions, and helped shape government policy actions against financiers of a specially designated global terrorist organization.
Given the flawed record keeping and data maintenance practices of the bank, and the general environment in which it operated, the project proved to be an intensive, manual effort that required a dedicated team of linguists, data scientists, forensic accounting specialists, and other personnel based in multiple jurisdictions.
The engagement also involved significant security risks as senior management and analysts had to repeatedly travel to high-risk locations due to delays in acquiring and understanding the necessary data, most of which was kept in boxes of physical folders and/or on local computers.
In short, it was a costly and complex undertaking, burdened with challenges that, thanks to the AI revolution that has since taken place, may appear less daunting today. However, this case also illustrates the areas where AI will likely continue to fall short of nuanced human efforts, at least for the near future.
Difficulties with document exploitation
This engagement involved processing large volumes of unstructured data spread across a massive archive of PDF files and physical paper documents, which included, among other things, images of checks, hand-filled Know Your Customer (KYC) forms, loan documents, and Suspicious Activity Reports (SARs). All of this had to be thoroughly reviewed by more than a dozen Fara Group examiners and analysts and manually entered into structured datasets.
The challenge was compounded by the fact that many documents were in Arabic and included critical handwritten details. Even though we employed the latest Optical Character Recognition (OCR) software available at the time, the technology was insufficient. The software struggled with accuracy, often misinterpreting characters or skipping parts of documents altogether. It was slow, prone to errors, and incapable of managing complex layouts, making digitization tedious and time-consuming. Handling non-Latin scripts or mixed-document formats was nearly impossible.
Today, OCR technology has undergone a significant transformation. Advanced algorithms now deliver faster and more precise recognition, even for challenging layouts. The reliability of OCR has vastly improved, enabling smoother document digitization and data extraction. But there is still work to be done in this field, particularly for non-Latin scripts, which remain a critical area for improvement.
Challenges in AML analytics
We soon noticed that a large part of the structured bank transaction data, pulled from a dated IT system, lacked critical codes linking transactions between accounts, making reconciliation a complex task. The inconsistent application of fees across fields further compounded the challenge, while SWIFT datasets from a separate server failed to align with the structured bank transaction data.
Although Excel and various forensic analytical tools we experimented with offered some ability to identify patterns and triage high-volume accounts, they were overly rigid and fell short in efficiency and scalability. AI solutions, leveraging machine learning, predictive analytics, and robust cloud computing, could have automated data matching, flagged anomalies, and processed vast datasets with greater speed, accuracy, and adaptability, significantly streamlining the process.
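To make the data-matching idea concrete, the following is a minimal sketch of rule-based reconciliation, not the tooling used on the engagement. It assumes a hypothetical `Txn` record and pairs each debit with a credit on a different account when the amounts differ by no more than a fee allowance and the booking dates are close. The field names, the fee ceiling, and the day window are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record; field names are assumptions."""
    txn_id: str
    account: str
    booked: date
    amount: Decimal  # negative = debit, positive = credit


def reconcile(txns, max_fee=Decimal("25.00"), max_days=2):
    """Pair debits with credits that lack an explicit linking code.

    A credit is a candidate when it sits on a different account, arrives
    0..max_days after the debit, and is short by at most max_fee.
    Returns (pairs, unmatched_debits).
    """
    debits = sorted((t for t in txns if t.amount < 0), key=lambda t: t.booked)
    credits = [t for t in txns if t.amount > 0]
    used = set()
    pairs, unmatched = [], []
    for d in debits:
        best = None
        for c in credits:
            if c.txn_id in used or c.account == d.account:
                continue
            gap = -d.amount - c.amount          # amount apparently withheld as a fee
            lag = (c.booked - d.booked).days    # days between debit and credit
            if Decimal("0") <= gap <= max_fee and 0 <= lag <= max_days:
                # Prefer the smallest fee gap, then the shortest delay.
                if best is None or (gap, lag) < best[0]:
                    best = ((gap, lag), c)
        if best is None:
            unmatched.append(d)  # left for an analyst to review
        else:
            used.add(best[1].txn_id)
            pairs.append((d.txn_id, best[1].txn_id, best[0][0]))
    return pairs, unmatched
```

Debits that find no partner are returned rather than dropped, since in a setting like ours the unmatched residue is often where the analyst's attention is most needed.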
Innovating transliteration solutions to match data sets
A key task was to determine whether any individuals linked to a designated terrorist organization were present in our client’s database of account holders or among their transaction counterparties. One of the challenges was that an acquired dataset of suspected individuals was in Arabic script, which can be transliterated in a wide variety of ways, while the bank’s client list was in Latin script. (As one example, the highly common name “Mohammad” has over 20 different spelling variations in Latin script.)
We found that the best way to bridge this gap between the Arabic and English data sets and ensure accurate matching was for the native Arabic speakers on our team to manually create a dictionary of over 200,000 unique Arabic names, each with all of its possible English transliterations, that could be matched against the bank’s data. This painstaking process required thousands of hours of effort and scaling up computational processing through AWS cloud services. We also engaged Python developers to automate significant portions of the process, enhancing efficiency and reducing manual effort.
We succeeded in identifying thousands of matches of account holders with aggregate transaction volumes exceeding $6 billion.
While machine learning could assist with parts of this process, the expertise of native Arabic speakers was critical. They provided an in-depth understanding of name variations and could anticipate how names might be deliberately altered by money launderers to evade detection.
Today, with the right tools and queries, AI applications can accomplish this task much more efficiently. For instance, ChatGPT generates 22 variations of the name “Mohammad” in just 12 seconds, demonstrating the potential to complete an entire dictionary of Arabic names in hours, depending on processing power.
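The dictionary-based matching described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the code our developers wrote: the `VARIANTS` entries are a handful of sample spellings rather than the actual dictionary, and the helper names (`normalize`, `build_index`, `match_name`) are hypothetical. The idea is to invert the Arabic-to-Latin dictionary into a lookup keyed on a normalized spelling, then test each token of a client name against it.

```python
import unicodedata

# Illustrative entries only; a real dictionary would be compiled and
# reviewed by native speakers and would hold far more variants per name.
VARIANTS = {
    "محمد": ["Mohammad", "Mohammed", "Muhammad", "Mohamed", "Mohamad", "Muhammed"],
    "حسين": ["Hussein", "Hussain", "Husayn", "Hossein"],
}


def normalize(name):
    """Casefold and drop diacritics, hyphens, and other non-letters."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalpha()).casefold()


def build_index(variants):
    """Invert {arabic: [latin spellings]} into {normalized spelling: {arabic}}."""
    index = {}
    for arabic, spellings in variants.items():
        for spelling in spellings:
            index.setdefault(normalize(spelling), set()).add(arabic)
    return index


def match_name(full_name, index):
    """Return the Arabic names whose known spellings appear as tokens in full_name."""
    hits = set()
    for token in full_name.split():
        hits |= index.get(normalize(token), set())
    return hits


index = build_index(VARIANTS)
candidates = match_name("MOHAMED Hussain", index)  # both sample names are found
```

An exact lookup like this only finds spellings that someone has already anticipated, which is why the quality of the dictionary, and the judgment of the people who build or review it, matters more than the matching code itself.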
eDiscovery
Using an eDiscovery platform, we mapped relationships, transaction information and other relevant data found in hundreds of thousands of bank emails. Although this part of the investigation could be conducted on a digital platform, the process still required careful manual review by our analysts. The team relied on keyword searches, a method that may not capture nuanced or coded communications designed to evade detection (further complicating matters, some emails were written in Arabic). Additionally, with thousands of emails turning up for some searches, it was nearly impossible for analysts to review every single finding in a timely manner.
AI has revolutionized eDiscovery in money laundering investigations by moving beyond the limitations of older methodologies. Today, AI leverages natural language processing (NLP) and machine learning to analyze context, uncover patterns, and link suspicious communications to transaction records with remarkable precision. This shift has dramatically improved the speed and accuracy of investigations, enabling financial institutions as well as law enforcement departments to process vast amounts of data in minutes while reducing the risk of missing evidence.
eDiscovery platforms can integrate with structured transaction datasets to revolutionize financial investigations, enabling deeper insights into money laundering activities. By linking email communications with transaction records, these tools allow for advanced data correlation and pattern recognition, uncovering suspicious anomalies like unusual amounts or timing. This connection enhances investigations by tracing financial flows alongside communication trails, exposing hidden networks or schemes.
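As a rough illustration of linking communications to transaction records, the sketch below joins emails to transactions by participants and timing, and separately flags amounts that are large relative to a payer's own history. It is a toy example under stated assumptions: the dictionary keys, the three-day window, and the median-based threshold are all hypothetical, and it presumes email participants have already been resolved to account-holder identifiers.

```python
from datetime import date
from statistics import median


def link_emails_to_transactions(emails, txns, window_days=3):
    """Return (email_id, txn_id) pairs where a transaction between two
    participants of an email was booked within window_days of the message."""
    links = []
    for e in emails:
        for t in txns:
            close = abs((t["booked"] - e["sent"]).days) <= window_days
            if close and {t["payer"], t["payee"]} <= e["participants"]:
                links.append((e["id"], t["id"]))
    return links


def unusual_amounts(txns, factor=5):
    """Flag transactions larger than factor times the payer's median amount."""
    by_payer = {}
    for t in txns:
        by_payer.setdefault(t["payer"], []).append(t["amount"])
    return [
        t["id"] for t in txns
        if t["amount"] > factor * median(by_payer[t["payer"]])
    ]
```

Both rules are deliberately crude. In practice the thresholds would need tuning per jurisdiction and per account type, and every link or flag would still be a lead for an analyst rather than a finding.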
Our deliverables included various charts illustrating the transaction flow between individuals and entities, including dates and amounts, and in many cases accompanied by hard documentation as supporting evidence.
Why total reliance on AI may still be a bridge too far
AI has reduced the need for critical thought in many areas. It often provides results without revealing the logic behind them. While it mimics human processes, it operates at a much faster speed and on a far greater scale.
However, data is imperfect, and this is where subject matter experts still play a crucial role: they can recognize nuances and inconsistencies that AI may overlook. Data scientists may understand the science behind the numbers, but they may lack a deep understanding of the data itself. Human expertise remains essential to provide context, judgment, and insight. AI technology, which relies on imperfect data, works to complement rather than replace human efforts in many complex fields.
In other words, despite many advancements, AI falls short in understanding the finer details of banking dynamics, most notably in non-Western jurisdictions. Like some data scientists who can analyze raw numbers without deeper context, AI struggles to recognize the cultural, geographic or other nuances that are vital in processes such as KYC checks or loan approvals.
Moreover, it cannot (yet!) pick up on nonverbal cues obtained via human-source intelligence, such as during an in-person interview with a bank manager, nor does it have the ability to anticipate local political dynamics or sensitive relationships surrounding a specific institution and its clients. These gaps may lead to false flags, strained client relationships, or incorrect assessments of risk.
On the other hand, as noted above, OCR technology, whose limitations became glaringly apparent during our bank audit assignment five years ago, has since improved significantly. Advanced algorithms now deliver faster and more precise recognition, even for challenging layouts, and the reliability of today’s OCR enables smoother document digitization and data extraction. Keep in mind, however, that while these advancements mark substantial progress, bridging the gap for non-Latin scripts still needs work.
In our experience, preparing bespoke reports on potential terrorism finance links presented significant and unique challenges, primarily due to the inconsistent and incomplete nature of the available information. Problems included technological shortcomings of an old banking system, such as insecure data and character limitations, as well as cases where client due diligence was conducted verbally by bank personnel rather than through the system. Through the resourceful scrutiny of our analysts, who were able to cross-reference information from various data points, including hard copy unstructured forms, human source intelligence gathering and findings on eDiscovery platforms, we discovered significant missing, inconsistent and/or manually altered data in what should have been an automated banking system.
A further complicating factor that remains true today is that, to the extent they can, bad actors try to avoid traceable and detectable methods of moving their funds (such as direct wire transfers), coming up instead with ever-changing creative solutions. Depending on the specific weaknesses of the jurisdiction in which they operate, they also use unorthodox methods to bring funds into the financial system (such as deliberately overdrawing on accounts and settling in cash, to give just one example).
All of this means that on-the-ground expertise and an understanding of the seemingly obscure relationships between different actors is critically important.
In our case, the complexity and specificity of the analysis required human expertise to interpret and synthesize various types of data. While more modern AI could have served as a valuable tool for certain aspects of this process, such as identifying patterns in large datasets, it would likely have lacked the capability to fully generate the tailored outputs that required nuanced judgment, contextual understanding, and adaptability.
Together, AI and human insight could enhance efficiency. It seems likely, however, that human-led efforts will remain at the core of generating detailed, context-sensitive outputs for specific cases like this bank audit, at least for the near future.
Conclusion
Fara Group has integrated AI into its workflow and views it as a powerful, complementary tool rather than a one-size-fits-all solution. AI acts as a force multiplier, enabling tasks that once took thousands of hours to be completed in a fraction of the time.
We think of AI as a brilliant rookie joining our analytical team. It can surprise with its speed and ability to uncover valuable insights. However, it can also fall short, struggle to grasp context, overlook obvious themes, and misunderstand subtleties.
At the core of the analytical process remains the senior subject matter expert, whose experience brings critical understanding and the ability to exercise judgment throughout and evaluate results. At least for the time being, such experts are better than AI at recognizing the provenance of data and its limitations, filling intelligence gaps, and incorporating information from various sources, including human source intelligence, in order to create rich, meaningful analyses of what is truly important for our clients.