Bank examiners and intelligence officials fighting terrorism finance are confronted with arduous DOCEX (Document Exploitation) or SD (Source Document) processes that involve finding relevant data among mounds of often disorganized paper documents. Financial analysts and translators with relevant regional expertise and the necessary language proficiency are highly sought after but in short supply. Arabic language skills are most in demand, followed by Pashto, Urdu, and Persian.
New technologies that improve the processing and enhancement of text documents are badly needed to offset the shortfall in Arabic-speaking analysts. Larry Den, senior vice president of information technology at Vredenburg, said that the lack of available linguists increases the pressure to use machine translation for certain types of data, especially because the material cannot be tagged for archiving until it is translated. One commonly used technology for data entry is Optical Character Recognition (OCR), which converts images of text into machine-readable text, making it usable and searchable by other computer applications and giving users the option to translate it into another language.
Although OCR technology works well with languages written in Latin script (such as English, French, or Italian), it is not always reliable when it comes to Arabic. The main obstacle is the complexity of the highly stylized Arabic script. Each Arabic character can take up to four different shapes depending on its position in a word (see Fig. 1). Because position and style are less rigid than in a language like English, Arabic characters are more difficult to identify. Moreover, the four alternative shapes may include zigzag characters, loop characters, dot characters, and diacritics.
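The positional shapes described above are visible in the Unicode standard itself, which encodes separate isolated, final, initial, and medial presentation forms for Arabic letters. The short Python sketch below uses the letter beh as an example and shows that all four forms fold back to one base letter under NFKC normalization. The names `BEH_FORMS` and `base_letter` are illustrative only, and this is not a description of how any particular OCR engine works; it simply shows why a recognizer has to map several distinct glyphs onto a single character.

```python
import unicodedata

# Arabic letter BEH (U+0628) and its four positional presentation forms,
# taken from the Unicode "Arabic Presentation Forms-B" block.
BEH = "\u0628"
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def base_letter(glyph: str) -> str:
    """Fold a positional presentation form back to its base letter (NFKC)."""
    return unicodedata.normalize("NFKC", glyph)


if __name__ == "__main__":
    for position, glyph in BEH_FORMS.items():
        # Print the Unicode name of each shape and of the letter it maps to.
        print(
            f"{position:<9}"
            f"{unicodedata.name(glyph)} -> {unicodedata.name(base_letter(glyph))}"
        )
```

Four visually different shapes therefore correspond to one underlying letter, and an OCR system must make that many-to-one decision for most letters of the alphabet, usually from connected script that has not been cleanly segmented.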
Because of deficiencies in OCR technology, DOCEX and SD operations in the MENA region remain labour-intensive. With bilingual Arabic analysts in short supply, the personnel performing initial document reviews often have little or no knowledge of the documents’ original language. Critical information that would be useful for combatting terrorism finance can be overlooked when large volumes of complex documents are manually vetted by analysts unable to adequately assess the content. Even though machine translation software is improving, it is still nowhere near the capabilities of a fully qualified human linguist.
Until OCR technology is perfected for Arabic, DOCEX / SD will remain a human process. Fara Group has cultivated a competent team of bilingual Arabic financial analysts who can be deployed for projects in the MENA region as well as in other jurisdictions. The team employs a systematic approach to vetting both hard copy and digitized documents, identifying relevant data and recording it in a structured format; e-discovery platforms are also applied. This data can be complemented by intelligence from human sources and derogatory databases as needed, and is then integrated into a coherent forensic report. Fara Group has successfully implemented international DOCEX / SD projects, including in Middle Eastern countries.