CA CTF 2022: Breaking decompilers for fun and profit - Shuffleme

Write-up covering the solution for the Hard Reversing challenge "Shuffleme" from Cyber Apocalypse CTF 2022.

clubby789,
Jul 22
2022

Description
Solution
Initial analysis
Shuffling and Unshuffling
Unshuffled analysis
- Blackhole

Description

Intelligence indicates that the ancient data storage device you've obtained contains schematics for a never-before-seen weapon. But there's a problem - it's locked, and strange symbiotic life forms on its surface are constantly removing parts and reinserting them elsewhere. Can you get a clear picture of what's going on?

Solution

This challenge first employed an obfuscation technique that shuffled the names of functions, then made use of opaque predicates to extract cryptography keys.

For this writeup, I will be using Binary Ninja, but the process should be similar in any decompiler such as Ghidra or IDA Pro.

Initial analysis

If we disassemble the program, we see this:

00001425  int32_t main(int32_t argc, char** argv, char** envp)
00001438      if (argc != 2)
0000143f          exit(status: 0xffffffff)
0000143f          noreturn
00001456      if (getenv(name: "LD_BIND_NOT") == 0)
00001471          setenv(name: "LD_BIND_NOT", value: &data_2366, replace: 1)
00001487          execv(path: *argv, argv: argv)
0000149a      go(argv[1])
000014a5      return 0

It requires both 2 command-line arguments, and the environment variable LD_BIND_NOT to be set. If it isn’t, it re-executes itself after setting this variable. man ld.so reveals this about the environment var:

If this environment variable is set to a nonempty string, do not update the GOT (global offset table) and PLT (procedure linkage table) after resolving a function symbol.

Typically, programs on systems using the GNU C library will, when they first call a function, jump into ld.so in order to resolve the real address of the function in the library using its name. As this is an expensive process, the address is saved into the GOT for the next time the function is called. This environment variable disables this caching, forcing it to look up every time.

We can deduce that this program is therefore doing something unusual with the dynamic symbol resolution process, which we should keep in mind going forward.

000014a6  int64_t go(char* arg1)
000014b8      void* fsbase
000014b8      int64_t rax = *(fsbase + 0x28)
000014dd      void var_88
000014dd      extract_blob(&key_blob, 0x20, &var_88)
000014f8      void var_68
000014f8      extract_blob(&data_blob, 0x50, &var_68)
0000152e      char const* const rax_4
0000152e      if (blackhole(&var_88, arg1, open(file: arg1), &var_68) == 0)
00001539          rax_4 = "No!"
00001530      else
00001530          rax_4 = "Yes!"
00001543      EVP_EncryptInit_ex(rax_4)
0000154d      int64_t rax_6 = rax - *(fsbase + 0x28)
00001556      if (rax == *(fsbase + 0x28))
0000155e          return rax_6
00001558      __stack_chk_fail()
00001558      noreturn

EVP_EncryptInit_ex is a function from the openssl library used for initialising a new crypto context. It is called with no arguments - here, it has a single pointer argument passed to it, which is definitely suspicious. We see similar unusual patterns going forward.

0000168b  int64_t blackhole(int64_t arg1, int64_t arg2, int64_t arg3, int64_t arg4)
00001697      int64_t var_68 = arg2
000016a3      void* fsbase
000016a3      int64_t rax = *(fsbase + 0x28)
000016b2      int64_t var_28 = 0
000016ba      int64_t var_20 = 0
000016c2      int64_t rax_2 = EVP_CIPHER_CTX_new()
000016d7      int32_t var_4c = EVP_CIPHER_CTX_free(rax_2)
000016e6      int32_t var_48 = EVP_aes_256_cbc(rax_2)
000016e9      uint64_t rax_7 = pthread_create()
00001720      int64_t rax_11 = calloc(n: arg3 + 0x20, elem_size: pthread_join(rax_7, rax_2, 0, arg1, &var_28))
00001745      memfrob(s: rax_7)
0000176c      malloc(bytes: rax_7)
0000177e      strlen(rax_7)
0000179e      int32_t var_50
0000179e      int64_t rax_22
0000179e      rax_22.b = EVP_EncryptFinal_ex(arg4, rax_11, sx.q(var_50 + var_50), rax_11).d == 0
000017a5      *(fsbase + 0x28)
000017ae      if (rax == *(fsbase + 0x28))
000017b6          return rax_22
000017b0      __stack_chk_fail()
000017b0      noreturn

Running the program it appears to function normally, so all is not as it seems.

If we trace through the symbol resolution process, we can see that completely different functions are being called. At this point, players could make use of ltrace (a utility for tracing calls into libc) to determine the real functions called. However, I will explain how to statically determine the truth.

Shuffling and Unshuffling

Some tracing of the process will reveal that the .rela.plt table has been tampered with. This table is used to map imported symbols to the address of the GOT entry. The dynamic loader will get the symbol at the given offset, resolve it by name, then store it back into the GOT. It turns out that most disassemblers and decompilers make use of this table to discover import names - the binary has had its GOT addresses in the relocations shuffled, which means that decompilers incorrectly identify the function being called. As address caching was disabled at the start, this incorrect GOT usage doesn’t prevent the program from running properly.

Luckily, the actual symbol offsets and their names haven’t been tampered with as they are required for proper execution. We can therefore recover the real function names, and rename them in our decompiler.

#!/usr/bin/env python3

from pwn import *
from elftools.elf.elffile import ELFFile
import sys
context.arch = "amd64"

e = ELF(sys.argv[1], checksec=False)

stream = open(sys.argv[1], 'rb')
file = ELFFile(stream)
dynsym = file.get_section_by_name(".dynsym")
dynstr = file.get_section_by_name(".dynstr")
dynamic = file.get_section_by_name('.dynamic')
jmprels = dynamic.get_relocation_tables()['JMPREL']

for name, addr in e.plt.items():
    stream.seek(addr + 7)
    off = stream.read(1)[0]
    reloc = jmprels.get_relocation(off)
    # This gives us the actual symbol this PLT entry corresponds to when looked up
    sym = dynsym.get_symbol(reloc['r_info_sym'])
    print(f"The PLT entry at {addr:#x} is {sym.name}")

We now can examine the program properly.

Unshuffled analysis

0000135a  int64_t go(char* arg1)

0000136c      void* fsbase
0000136c      int64_t rax = *(fsbase + 0x28)
00001391      void var_88
00001391      extract_blob(&key_blob, 0x20, &var_88)
000013ac      void var_68
000013ac      extract_blob(&data_blob, 0x50, &var_68)
000013e2      char const* const rax_4
000013e2      if (blackhole(&var_88, arg1, strlen(s: arg1), &var_68) == 0)
000013ed          rax_4 = "No!"
000013e4      else
000013e4          rax_4 = "Yes!"
000013f7      puts(s: rax_4)
00001401      int64_t rax_6 = rax - *(fsbase + 0x28)
0000140a      if (rax == *(fsbase + 0x28))
00001412          return rax_6
0000140c      __stack_chk_fail()
0000140c      noreturn

Our command-line argument is passed into blackhole along with its length and the results of extract_blob.

00001413  int64_t extract_blob(uint8_t* blob, int64_t bufsize, uint8_t* buffer)
00001427      void* fsbase
00001427      int64_t rax = *(fsbase + 0x28)
00001436      int32_t i = 0
00001522      while (bufsize u> sx.q(i))
00001442          pthread_t threads[0x4]
00001442          threads[0] = 0
0000144a          threads[1] = 0
00001452          threads[2] = 0
0000145a          threads[3] = 0
0000149c          for (int32_t j = 0; j s<= 3; j = j + 1)
0000148f              pthread_create(thread: &threads[sx.q(j)], attr: nullptr, start: get_byte, arg: nullptr)
0000149e          bool next_loop = false
0000150d          for (int32_t j = 0; j s<= 3; j = j + 1)
000014bf              int64_t ret
000014bf              pthread_join(thread: threads[sx.q(j)], ret: &ret)
000014c4              char rax_9 = ret.b
000014ec              if (rax_9 == blob[sx.q(i << 2)] && (next_loop ^ 1) != 0)
000014f8                  buffer[sx.q(i)] = rax_9
00001501                  next_loop = true
00001513          if (next_loop != 0)
00001515              i = i + 1
0000152d      int64_t rax_23 = rax - *(fsbase + 0x28)
00001536      if (rax == *(fsbase + 0x28))
0000153e          return rax_23
00001538      __stack_chk_fail()
00001538      noreturn

The program sets up an array of 4 threads, each running the function get_byte. If the return value of any of them equal blob[i*4], then the value is placed into the out-buffer and i is incremented.

0000166b  uint64_t get_byte()

00001673      void* fsbase
00001673      int64_t rax = *(fsbase + 0x28)
00001696      int32_t rax_2 = open(file: "/dev/urandom", mode: 0)
000016af      char byte
000016af      read(fd: rax_2, buf: &byte, size: 1)
000016b9      close(fd: rax_2)
000016be      uint64_t rax_5 = zx.q(byte)
000016c6      *(fsbase + 0x28)
000016cf      if (rax == *(fsbase + 0x28))
000016d7          return rax_5
000016d1      __stack_chk_fail()
000016d1      noreturn

get_byte simply returns a random byte using /dev/urandom.

This is known as an opaque predicate - it is an action which will always produce exactly the same output despite appearing dynamic. The term ‘opaque’ comes from the fact that the decompiler (and possibly humans) cannot ‘see through’ the logic to find the output.

Blackhole

0000153f  int64_t blackhole(uint8_t* key, char* arg, uint64_t arglen, uint8_t* databuf)

00001557      void* fsbase
00001557      int64_t rax = *(fsbase + 0x28)
00001566      uint8_t iv[0x10]
00001566      iv[0].q = 0
0000156e      iv[8].q = 0
00001576      struct EVP_CIPHER* cipher = EVP_aes_256_cbc()
0000158b      int32_t var_4c = EVP_CIPHER_key_length(e: cipher)
0000159a      int32_t var_48 = EVP_CIPHER_iv_length(e: cipher)
0000159d      struct EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new()
000015c4      EVP_EncryptInit_ex(ctx: ctx, type: cipher, impl: nullptr, key: key, iv: &iv)
000015d4      uint8_t* buffer = malloc(arglen + 0x20)
000015f9      int32_t outlen
000015f9      EVP_EncryptUpdate(ctx: ctx, out: buffer, outlen: &outlen, in: arg, inlen: arglen.d)
000015fe      int32_t total_len = outlen
00001620      EVP_EncryptFinal_ex(ctx: ctx, out: &buffer[sx.q(outlen)], outlen: &outlen)
00001628      int32_t final_len = total_len + outlen
00001632      EVP_CIPHER_CTX_free(ctx)
00001652      int64_t rax_18
00001652      rax_18.b = memcmp(databuf, buffer, sx.q(final_len), buffer).d == 0
00001659      *(fsbase + 0x28)
00001662      if (rax == *(fsbase + 0x28))
0000166a          return rax_18
00001664      __stack_chk_fail()
00001664      noreturn

Here openssl is used to set up an AES-256-CBC encryption context, and encrypt our argument with the given key and a blank IV. The result is then compared to the static databuf. Let's go the other way, and decrypt the data buffer using a script.

from pwn import *
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import sys

e = ELF(sys.argv[1])
iv = bytes([0 for _ in range(16)])
key = []
keybuf = e.read(e.sym['key_blob'], 32*4)
# Read every 4th byte
for i in range(0, 32*4, 4):
    key.append(keybuf[i])
data = []
databuf = e.read(e.sym['data_blob'], 0x50*4)
for i in range(0, len(databuf), 4):
    data.append(databuf[i])
dec = AES.new(bytes(key), AES.MODE_CBC, iv)

print(unpad(dec.decrypt(bytes(data)), 16).decode())

This allows us to decrypt the expected result of encryption, thereby extracting the expected input value, which turns out to be the flag: `HTB{sw4pp1ng_h3r3_4nd_sw1tch1ng_th3r3-1t5_m0r3_th4n_1_c4n_b34r!}`

PLAY NOW!