Overview

This problem is a rehash of the reeses problem from DEF CON 21 CTF Finals. In this case the problem is compiled for x64 instead of ARM. We were provided with a binary and 4 sample files.

Let me start by saying that I only had a small part in solving this problem during the competition and my teammates did a huge majority of the work. I vaguely remember this problem from DEF CON 21 CTF, and back then I think that I may have identified it was an MIPS emulator, but we were never able to get execution (MIPS nor ARM).

The problem starts by reading a 4 byte length and then a buffer of that size, as long as it is less than 100,000. This buffer is then processed as a custom executable file format which includes an RSA signature. Not having the private key means that the only binaries that can be loaded are the 4 sample programs included with the challenge.

File format

The file format is rather simple with a structure similar to an elf header with a count and offsets of the number of sections and number of register initializations.

struct file_header {
   uint32_t unk1;
   uint32_t unk2;
   uint32_t unk3;
   uint32_t section_info_offset;
   uint32_t reg_init_info_offset;
   uint16_t num_sections;
   uint16_t num_reg_init;
   uint32_t sig_offset;
   uint32_t entry;
}

The header had the following validations:

sig_offset had to be file_size - 256.
a max of 100 sections
a max of 33 reg inits
each section info was 20 bytes
each reg init was 8 bytes

The sections headers were fairly simple:

struct section_info {
   uint32_t load_addr;
   uint32_t unk;
   uint32_t length;
   uint32_t offset;
   uint32_t unk1;
}

And the register initialization was even simpler:

struct reg_init_info {
   uint8_t reg_num;
   uint8_t pad[3]
   uint32_t value;
}

Even though the file header has an entry, the emulator actually used register '32' as the entry (this doesn't matter because the samples had both values equal). One of my teammates took the time to write a loader script for IDA to be able to load programs with this file format.

MIPS

The emulator handled a few system calls:

Read
Write
Malloc
Time
Rand
Exit

Read and write were restricted to stdin and stdout respectively; and due to the lack of open this meant that not only would we need to get MIPS code execution, we would need to find a vulnerability in the emulator to get x64 code execution.

The problem included 4 sample programs. Each was signed appropriately to be loaded into the emulator. My teammates started by looking at sample1, which was lucky because sample1 is the one with the vulnerability that enables MIPS code execution. I quickly identified what the other 3 samples were doing while writing this (for completeness)

sample1 - compression/decompression
sample2 - echo service
sample3 - encrypt/decrypt service
sample4 - sha256 service

Sample1

Sample1 reads a single byte for the operation type and then a 4 byte length. The length must be less than 0x4001. After reversing it for a while sample1 was identified as performing compression or decompression based on the operation (0 or 1 respectively).
The compression algorithm was identified as lzss, however it did not use the standard parameters - EI was 13 and EJ was 5. My teammates quickly modified a version of lzss.c they were able to find to have the adjusted parameters.

I'm not sure if testing or reverse engineering lead to the realization that the uncompress operation does not properly bounds check. My teammates were able to identify that the uncompress buffer would eventual run into the current stack area enabling an overwrite of the saved ra value, so that when the uncompress function returns we would have control of the pc.

It turned out that the offset into the uncompressed buffer for this saved ra was 0x25edc. As long as the payload would compress to less than 16k, we were able to control MIPS execution. They were also able to identify that the buffer is uncompressed to 0x40a0b8, and due to the lack of ASLR and NX, we could include our MIPS shellcode at the start of the buffer and return to it.

X64 control

So now that we had control of the MIPS execution environment, the next step was to get control in x64 context.

One of my teammates identified that the the Halt and Spontaneously Combust instruction had a s64 simple stack overflow when executed in the branch delay slot. It would copy the registers onto the stack overflowing the return address. The exact opcode for the instruction had to be 0xcfe00000 in order for the overflow to occur. In addition in order for the exception handler to not exit, register $20 had to be 100 so that when copied the parameter passed would cause the handler to process it as an unknown exception.

Where to go?

Now that we can control rip, the question lies as to where to point it. reeses_revenge was compiled with full PIE and NX, and ASLR is enabled, meaning there are no known fixed addresses and a memory leak or other known address is needed.
This is the part where I started helping.. and sadly it took me far to long to finish. I think I looked at the implementation of every MIPS instruction in the code at least 3 times. My teammates also identified a bug in the system call instruction in that it was possible to use a negative index into the system call table - system calls were 4000 through 4015, but the check for range was val-4000 < 15 and it was a signed check. So system calls less that 4000 would use other memory locations for the information. I tried looking at all possible combinations of using that to leak information, but due to differences in how the system calls were done compared to instructions this proved useless.
I then investigated the coprocessor instructions. After reversing the two supported instructions I was able to determine that they implemented rc4, with one being the set key operation and the other being encrypt/decrypt. It turned out that the encrypt/decrypt could be called without calling the set key one, and it would use unitialized memory as the keystate, but the values leaked were not very helpful as it wasn't the full value of any addresses as the rc4 key state is only byte values.

Eventually I looked at the malloc system call again and realized that not only did it allocate an RWX page, it also used the internal random function to generate the address. So the next step was to figure out if we could predict the value returned by random. This is eventually what lead us to success.

allocate_page() { //0x4b60
  ...
  v8 = do_rand();
  v9 = mmap((void *)(((unsigned __int64)v8 << 12) | 0x400000000000LL), 0x1000uLL, 7, 50, -1, 0LL);
  ...
}

In order to predict the value for rand we need send some rand values back from our MIPS code:

.set noreorder
main:
# assemble with mips-linux-gnu-as -O0 --EL -o mips_sc.elf mips_sc.asm ; mips-linux-gnu-objcopy -O binary -j .text mips_sc.elf mips_sc.bin; mips-linux-gnu-objdump -d mips_sc.elf;

#ra contains the address of main
	add $s7, $ra,0

leak_data:
	add $s1, $s7, 0x1000
#time systemcall
	li $v0, 4013
	syscall
	sw  $v0, 0($s1)
	add $s1, 4


#leak 0x10 RAND values
	li $s2, 0x10
rand_again:
	li $v0, 4005
	syscall
	sw  $v0, 0($s1)
	add $s1, 4
	sub $s2, 1
	bne $s2, 0, rand_again
	nop

#WRITE leak
	li $v0, 4004
	li $a0, 1
	add $a1, $s7, 0x1000
	li $a2, 68
	syscall

Once we had the rand values we needed to both determine the rand algorithm and the seed value used. Reversing the algorithm was pretty easy, and I implemented it in python. The seed calculation was pretty simple to find as well.

main() {
  ...
  v4 = time(0LL);
  srand(((unsigned __int64)&printf >> 12) ^ v4); //0x54c2
  ...

One thing to note is that even though the address is treated as a 64 bit value, srand only uses 32 bits for the seed. At first I missed the shift right operation and thought that because the bottom 12 bits are not randomized and that because we can get the time it will be pretty simple to brute force the entire randomized space (20 bits). However when trying this I realized there was the shift 12... oops.

When looking at ALSR addressed of printf with the shift 12, it's usually 0xfxxxxxx or 0xexxxxxx with a high probability (in my testing) of 0xfxxxxxxx. And then when looking at the current unix time, it's around 0x59xxxxx, so combining these two things the seed is likely to be between 0xa0000000 and 0xafffffff.

Instead of just searching the space, which we would not have enough time to due to alarm(), I used random.randint to just try various seeds, and then would compare the first four values against what was leaked. . If they matched, the 17th random value would be calculated, and the same operations in malloc_page would be done (val <<12 | 0x400000000000) and then this value as well as the length of our x64 shellcode would be sent to the MIPS shellcode.

The MIPS shellcode would read those values, allocate a page of memory, and read the x64 shellcode into that newly allocated page, which thanks to the rand calculations we know the address of. The MIPS shellcode would then setup the x64 stack overwrite to return to that known address and perform the overwrite which would then give our x64 shellcode execution. Simple /bin/sh shellcode was used.

Because the brute force was a random try, I just ran the script in a loop until it found a successful match.

Full python script

Full mips shellcode

DEF CON 25 CTF Quals - reeses revenge

Kenny K