Trojan Source Code Vulnerabilities: No More with ECLAIR
A very recent paper presents a new type of attack: source code can be maliciously encoded so that human readers and compilers interpret it differently.
In a hurry? Watch this video and come back later.
The Unicode slogan is “Everyone in the world should be able to use their own language on phones and computers.” That is a fine goal, but using Unicode in computer programs can have scary consequences. The Cyrillic capital letter En (Н) has the same shape as the Latin capital letter H, and the two are often rendered so that they are indistinguishable to the human eye; to a compiler, however, they are different characters. Invisible characters, such as the zero-width space, can also be embedded in identifiers, providing even more ways to defeat peer review.
But there is more: Unicode provides direction-changing control characters, such as LRI (Left-to-Right Isolate) and RLI (Right-to-Left Isolate), to support both left-to-right languages, such as English and Russian, and right-to-left languages, such as Hebrew and Arabic. The Unicode Bidirectional Algorithm (BiDi) uses these and other control characters to determine how text is actually displayed. As a result, embedding such control characters in source code can make the code a human reader sees differ from the code the compiler actually processes. For example, they can make code that appears to lie outside a comment actually be part of the comment; similar tricks work for string literals and identifiers. Attackers can easily craft such comments, string literals and identifiers to achieve their goals.
Are you thinking that attackers do not have access to your source code? Well, not directly, but your programmers may copy-and-paste stuff from random web sources. “Hmmm... how do I debounce a switch? Let me check with Google... here! Copy, paste, done.” For the security consequences of this increasingly common habit, see, e.g.:
“You get where you’re looking for: The impact of information sources,”
— Y. Acar, M. Backes, S. Fahl, D. Kim, M. L. Mazurek, and C. Stransky, in 2016 IEEE Symposium on Security and Privacy, 2016, pp. 289–305.
“Stack Overflow considered harmful? The impact of copy&paste on Android application security,”
— F. Fischer, K. Böttinger, H. Xiao, C. Stransky, Y. Acar, M. Backes, and S. Fahl, in 2017 IEEE Symposium on Security and Privacy, 2017, pp. 121–136.
So, how can you protect yourself from Trojan Source attacks? Some believe that it is not the business of the compiler, let alone the language standard, to forbid the abuse of Unicode. In any case, it will be some time before compilers and IDEs include defenses against such attacks, and, as usual, niche compilers and IDEs will be left behind.
ECLAIR 3.12.0, the forthcoming release of our static analysis platform, introduces a new service, B.TROJANSOURCE, which is included in all ECLAIR packages, including B, our entry-level package. It provides complete protection against Trojan Source, suitable for the development of safety- and/or security-critical applications. B.TROJANSOURCE flags all BiDi control characters and homoglyphs that occur in comments, string literals, and identifiers. (In contrast, clang-tidy flags only BiDi control characters, and only in comments and string literals.) The only homoglyphs tolerated by B.TROJANSOURCE are those within the ASCII character set, such as 1 (one) vs. l (ell) and 0 (zero) vs. O (capital letter O); in identifiers, these are caught by the MISRA C/C++ guidelines.
This short video on the BUGSENG YouTube channel shows how popular IDEs and editors are vulnerable to Trojan Source attacks and how ECLAIR 3.12.0 protects you against them. We also recommend this paper by the Institute for Defense Analyses for an in-depth discussion of this threat.
If you would like to book an exclusive demo of the tool with our ECLAIR Experts, drop us an email and let us know your availability. In these calls, we go through your main requirements and show you the best ECLAIR solution for your specific use case.
Join our LinkedIn community to keep up to date with all our news.