I’m a few years late to the party here, but I’ve recently managed to finish Microcorruption. I had finished most of the challenges back in 2014, when they were released, but not having IDA back then, I was stumped when the first challenge that relocated showed up, because it broke the online disassembler, and I was helpless without it :).
I recently took another look, having picked up some skill and knowledge in the intervening years. Hollywood, the last challenge, was a very interesting challenge, and I found that most writeups I found online glossed over the difficulty of getting something coherent out of it, so I figured I would try to add a writeup that goes into more detail.
The challenge is a simple prompt for the password, and the CPU just halts on improper input. Looking at the disassembly output was useless, since the web disassembler just disassembles linearly, and they used jumps to jump over a few, uesless bytes, but these bytes were disassembled as six-byte opcodes, hiding the actual instruction.
Linear web disassembly:
Using a script I found online, I was able to convert the memory dump into a binary blob, which I could then import into IDA, selecting Texas Instrument MSP430 as the CPU type. Following through this code you see that the code relocates and seems to unpack. No problem, I just ran it in the web debugger until it prompted me for input, then I copied that memory dump, created a new binary, and analyzed that in IDA.
There were two problems with this approach. First, it used the
rand syscall, so where the code was unpacked to was non-deterministic, and would change between runs. This meant that I couldn’t follow along in both IDA and the web disassembler. The second problem was that the program seems to relocate again. Stepping through until it jumps to the next spot, I figured I could just generate another dump when the program was fully unpacked. I generated a new binary, stepped through that, generated another, stepped through that… I did this twelve times before I figured out that there was probably no end in sight. Time to bust out the tooling.
I found a MSP430 emulator, specifically one that implements all the idiosyncrasies and syscalls of the Microcorruption MSP430. I could run the original dumped rom through this and generate a trace, then I could use another python script to disassemble the trace. Problem: the trace was over half a million instructions, and since it’s just a disassembly listing, I had to look at it linearly.
Figuring that this was a multi-stage unpacker, I had another idea: I would modify the emulator to write every executed instruction to a binary file, at the address of its PC. The goal being to end up with a rom that has each stage of the unpacker present. Since I was not committing writes to the file, but instead just what was executed, I would not wipe out old stages. A simple change to the emulator to create a sparse file, then for each instruction executed, seek to the PC, and write the opcode bytes to the file accomplished this goal.
Now with this full ROM I could get to reverse engineering the code that matters. Popping the binary into IDA showed an unpacking stage at 0x4400 (The entry point), 0x5000, and 0x8000. These nice addresses led me to figure out that my emulator always returned zero from the
rand syscall, definitely convenient. When running on the online debugger the stages were scattered somewhere on each of these pages.
Reversing the binary was mostly straight forward in IDA, since it isn’t a linear sweep disassembler it wasn’t tricked by jumps into instructions. There were many basic blocks that apparently had no path to them, but that was because of another trick: Instead of calling a function, they would push PC, branch to the function, then pop the PC, increment it, and jump back to it, effectively returning to where it had started. IDA didn’t recognize this, but it just meant that the control flow returned to the next basic block.
Reversing these three ‘unpackers’ showed about the same thing: It would copy some bytes to a random location, decrypt them, and then ultimately jump to them, then finally return and wipe out the instructions - Lucky for me I could see these ephemeral instructions because, remember, I only write executed bytes to the rom file, not written bytes. Whenever a stage of the unpacker was done, it would jump to
eXXX, where XXX is
rand & 0xFFE. Because the emulator always returned 0 from the
rand syscall, this was always exactly
0xe000. That meant that the few instructions at 0xe000 weere just the last few that executed, and I was missing the actual code.
Taking a step back, I realized that everything I had reversed was only the unpacked. There was no hint of the prompt or the password validation. Staring at this for some time it hit me: This isn’t an unpacker, it’s more of a VM! A few instructions are unpacked, jumped to, wiped out, and then this is repeated. Everything that mattered was executed at 0xeXXX.
Back to the emulator, I made another modification that prepended every opcode trace with the address it executes at, and then I modified the disassembler to account for this and print the line with the PC preceding it. I then filtered the output to only keep any address that started with 0xeXXX.
Boom! Down from half a million instructions to a few hundred! I can reverse this.
That’s more fucking like it. The
br R13 is just the return to the unpacker, and the
mov #0xXXXX, R12 was just another address used for unpacking. Filter them out!
What we are seeing here is the prompt “What’s the password?” being printed, one character at a time. After this is the validation:
Using this filtered output, I got the original bytes and converted them back to a binary that was only 732 bytes and running this in the emulator worked! That confirmed that I filtered it down to exactly what I needed. Looking at this tiny assembly it’s easy to see what’s happening: Two bytes of the input are checked at a time, and they are mixed into R4, R6, and R8. After the loop runs eight times, the value of R4 and R6 is checked, and then the status register is copied to R7, rotated around, and written back to SR. This is why the CPU was halting: The SR was set to some bogus value. We want 0x7F00 in R7 on the last instruction. The comparison to 0xfeb1 sets the status bits, and to preserve them they have to survive the AND with the status register after the comparison to 0x9298. This means that we need R4 and R6 to equal these values after the loop, meaning everything to do with R7 and R8 in the loop is trash!
That’s very reasonable. I was able to convert this to python easily:
I tried bruteforcing this, but with a 16 byte input this was not happening. Time for z3!
Then simply converting the solution from 8 16-bit integers to a byte string, and submitting that, I was treated to that delicious win.