Obfuscation for Fun and Profit

One of the fun things to do with computer languages is abuse them. Confusing human readers of code can be pretty easy, but it takes a specially crafted program to be thoroughly incomprehensible to readers of the source code yet still be legal within the syntax of whatever language the program is written in.

Not dissimilar from building a well-obfuscated program is using esoteric languages and building quines. All of these things can be mind-bending but also provide excellent learning resources for some dark corners of language specification, as well as the occasional clever optimization.

Obfuscation

It’s not uncommon for malware source code to be pretty heavily obfuscated, but that’s nothing compared to properly obfuscated code. What follows is some publically-released Linux exploit code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
ver = wtfyourunhere_heee(krelease, kversion);
if(ver < 0)
    __yyy_tegdtfsrer("!!!  Un4bl3 t0 g3t r3l3as3 wh4t th3 fuq!\n");
__gggdfstsgdt_dddex("$$$ K3rn3l r3l3as3: %s\n", krelease);
if(argc != 1) {
   while( (ret = getopt(argc, argv, "siflc:k:o:")) > 0) {
      switch(ret) {
          case 'i':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_DGDGHHYTTFSR34353_FOPS;
              useidt=1; // u have to use -i to force IDT Vector
              break;
          case 'f':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_GGDYYTDFFACVFD_IDT;
              break;

It reads like gibberish, but examination of the numerous #define statements at beginning of that file and some find/replace action make quick work to deobfuscate the source. Beyond that, the sheer pointlessness of ‘1337 5p33k’ in status messages makes my respect for the author plummet, no matter how skilled they may be at creating exploits.

Let’s now consider an entry to the International Obfuscated C Code Contest (IOCCC) from 1986, submitted by Jim Hague:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#define    DIT (
#define DAH )
#define __DAH   ++
#define DITDAH  *
#define DAHDIT  for
#define DIT_DAH malloc
#define DAH_DIT gets
#define _DAHDIT char
_DAHDIT _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main           DIT         DAH{_DAHDIT
DITDAH          _DIT,DITDAH     DAH_,DITDAH DIT_,
DITDAH          _DIT_,DITDAH        DIT_DAH DIT
DAH,DITDAH      DAH_DIT DIT     DAH;DAHDIT
DIT _DIT=DIT_DAH    DIT 81          DAH,DIT_=_DIT
__DAH;_DIT==DAH_DIT DIT _DIT        DAH;__DIT
DIT'\n'DAH DAH      DAHDIT DIT      DAH_=_DIT;DITDAH
DAH_;__DIT      DIT         DITDAH
_DIT_?_DAH DIT      DITDAH          DIT_ DAH:'?'DAH,__DIT
DIT' 'DAH,DAH_ __DAH    DAH DAHDIT      DIT
DITDAH          DIT_=2,_DIT_=_DAH_; DITDAH _DIT_&&DIT
DITDAH _DIT_!=DIT   DITDAH DAH_>='a'?   DITDAH
DAH_&223:DITDAH     DAH_ DAH DAH;       DIT
DITDAH          DIT_ DAH __DAH,_DIT_    __DAH DAH
DITDAH DIT_+=       DIT DITDAH _DIT_>='a'?  DITDAH _DIT_-'a':0
DAH;}_DAH DIT DIT_  DAH{            __DIT DIT
DIT_>3?_DAH     DIT          DIT_>>1 DAH:'\0'DAH;return
DIT_&1?'-':'.';}__DIT DIT           DIT_ DAH _DAHDIT
DIT_;{DIT void DAH write DIT            1,&DIT_,1 DAH;}

What does it do? I couldn’t say without spending a while examining the code. Between clever abuse of the C preprocessor to redefine important language constructs and use of only a few language elements, it’s very difficult to decipher that program. According to the author’s comments, it seems to convert ASCII text on standard input to Morse code.

Aside from (ab)using the preprocessor extensively, IOCCC entries frequently use heavily optimized algorithms which do clever manipulation of data in only a few statements. For a good waste of time, I suggest browsing the list of IOCCC winners. At the least, C experts can work through some pretty good brain teasers, and C learners might pick up some interesting tricks or learn something new while puzzling through the code.

So what? Obfuscating code intentionally is fun and makes for an interesting exercise.

Quines

Another interesting sort of program is a quine- a program that prints its own source code when run. Wikipedia has plenty of information on quines as well as a good breakdown on how to create one. My point in discussing quines, however, is simply to point out a fun abuse of the quine ‘rules’, as it were. Consider the following:

1
#!/bin/cat

On a UNIX or UNIX-like system, that single line is a quine, because it’s abusing the shebang. The shebang (’#!’), when used in a plain-text file, indicates to the kernel when loading a file with intent to run it that the file is not itself executable, but should be interpreted.

The system then invokes the program given on the shebang line (in this case /bin/cat) and gives the name of the original file as an argument. Effectively, this makes the system do the following, assuming that line is in the file quine.sh:

1
$ /bin/cat quine.sh

As most UNIX users will know, cat takes all inputs and writes them back to output, and is useful for combining multiple files (invocation like cat file1 file2 > both) or just viewing the contents of a file as plain text on the terminal. Final result: cat prints the contents of quine.sh.

Is that an abuse of the quine rules? Possibly. Good for learning more about system internals? Most definitely.

Esoteric Languages

Finally in our consideration of mind-bending ways to (ab)use computer languages, we come to the general topic of esoteric languages. Put concisely, an esoteric language is one intended to be difficult to use or just be unusual in some way. Probably the most well-known one is brainfuck, which is.. aptly named, being Turing-complete but also nearly impossible to create anything useful with.

The Esoteric language site has a variety of such languages listed, few of which are of much use. However, the mostly arbitrary limitations imposed on programmers in such languages can make for very good logic puzzles and often require use of rarely-seen tricks to get anything useful done.

One of my personal favorites is Petrovich. More of a command interpreter than programming language, Petrovich does whatever it wants and must be trained to do the desired operations.