I’m a few years late to the party here, but I’ve recently managed to finish Microcorruption. I had finished most of the challenges back in 2014, when they were released, but not having IDA back then, I was stumped when the first challenge that relocated showed up, because it broke the online disassembler, and I was helpless without it :).
I recently took another look, having picked up some skill and knowledge in the intervening years. Hollywood, the last challenge, was a very interesting challenge, and I found that most writeups I found online glossed over the difficulty of getting something coherent out of it, so I figured I would try to add a writeup that goes into more detail.
The challenge is a simple prompt for the password, and the CPU just halts on improper input. Looking at the disassembly output was useless, since the web disassembler just disassembles linearly, and they used jumps to jump over a few, uesless bytes, but these bytes were disassembled as six-byte opcodes, hiding the actual instruction.
Linear web disassembly:
4400: 013c jmp #0x4404 <main+0x4>
4402: d1a1 3140 0044 dadd.b 0x4031(sp), 0x4400(sp)
4408: 013c jmp #0x440c <main+0xc>
440a: d1a1 1542 5c01 dadd.b 0x4215(sp), 0x15c(sp)
IDA’s disassembly:
ROM:00004400 ; ---------------------------------------------------------------------------
ROM:00004400 jmp loc_4404
ROM:00004400 ; ---------------------------------------------------------------------------
ROM:00004402 .byte 0D1h
ROM:00004403 .byte 0A1h
ROM:00004404 ; ---------------------------------------------------------------------------
ROM:00004404
ROM:00004404 loc_4404: ; CODE XREF: ROM:00004400↑j
ROM:00004404 mov.w #4400h, SP
ROM:00004408 jmp loc_440C
ROM:00004408 ; ---------------------------------------------------------------------------
ROM:0000440A .byte 0D1h
ROM:0000440B .byte 0A1h
ROM:0000440C ; ---------------------------------------------------------------------------
Using a script I found online, I was able to convert the memory dump into a binary blob, which I could then import into IDA, selecting Texas Instrument MSP430 as the CPU type. Following through this code you see that the code relocates and seems to unpack. No problem, I just ran it in the web debugger until it prompted me for input, then I copied that memory dump, created a new binary, and analyzed that in IDA.
There were two problems with this approach. First, it used the rand
syscall, so where the code was unpacked to was non-deterministic, and would change between runs. This meant that I couldn’t follow along in both IDA and the web disassembler. The second problem was that the program seems to relocate again. Stepping through until it jumps to the next spot, I figured I could just generate another dump when the program was fully unpacked. I generated a new binary, stepped through that, generated another, stepped through that… I did this twelve times before I figured out that there was probably no end in sight. Time to bust out the tooling.
I found a MSP430 emulator, specifically one that implements all the idiosyncrasies and syscalls of the Microcorruption MSP430. I could run the original dumped rom through this and generate a trace, then I could use another python script to disassemble the trace. Problem: the trace was over half a million instructions, and since it’s just a disassembly listing, I had to look at it linearly.
Figuring that this was a multi-stage unpacker, I had another idea: I would modify the emulator to write every executed instruction to a binary file, at the address of its PC. The goal being to end up with a rom that has each stage of the unpacker present. Since I was not committing writes to the file, but instead just what was executed, I would not wipe out old stages. A simple change to the emulator to create a sparse file, then for each instruction executed, seek to the PC, and write the opcode bytes to the file accomplished this goal.
Now with this full ROM I could get to reverse engineering the code that matters. Popping the binary into IDA showed an unpacking stage at 0x4400 (The entry point), 0x5000, and 0x8000. These nice addresses led me to figure out that my emulator always returned zero from the rand
syscall, definitely convenient. When running on the online debugger the stages were scattered somewhere on each of these pages.
Reversing the binary was mostly straight forward in IDA, since it isn’t a linear sweep disassembler it wasn’t tricked by jumps into instructions. There were many basic blocks that apparently had no path to them, but that was because of another trick: Instead of calling a function, they would push PC, branch to the function, then pop the PC, increment it, and jump back to it, effectively returning to where it had started. IDA didn’t recognize this, but it just meant that the control flow returned to the next basic block.
ROM:00004508 ; ---------------------------------------------------------------------------
ROM:00004508 mov.w R15, 8(R12)
ROM:0000450C mov.w @R10+, R15
ROM:0000450E push.w PC
ROM:00004510 jmp loc_45BE ; Basically a call
ROM:00004512 ; ---------------------------------------------------------------------------
ROM:00004512 mov.w R15, 0Ah(R12)
ROM:00004516 mov.w @R10+, R15
ROM:00004518 push.w PC
ROM:0000451A jmp loc_45BE ; Basically a call
...
ROM:0000460A loc_460A: ; CODE XREF: sub_4400+206↑j
ROM:0000460A pop R14 ; Pop the PC that was pushed earlier
ROM:0000460C incd.w R14 ; Increment it to the next instruction
ROM:0000460E jmp loc_4612
ROM:0000460E ; ---------------------------------------------------------------------------
ROM:00004610 .byte 0
ROM:00004611 .byte 0
ROM:00004612 ; ---------------------------------------------------------------------------
ROM:00004612
ROM:00004612 loc_4612: ; CODE XREF: sub_4400+20E↑j
ROM:00004612 br R14 ; Return
Reversing these three ‘unpackers’ showed about the same thing: It would copy some bytes to a random location, decrypt them, and then ultimately jump to them, then finally return and wipe out the instructions - Lucky for me I could see these ephemeral instructions because, remember, I only write executed bytes to the rom file, not written bytes. Whenever a stage of the unpacker was done, it would jump to eXXX
, where XXX is rand & 0xFFE
. Because the emulator always returned 0 from the rand
syscall, this was always exactly 0xe000
. That meant that the few instructions at 0xe000 weere just the last few that executed, and I was missing the actual code.
Taking a step back, I realized that everything I had reversed was only the unpacked. There was no hint of the prompt or the password validation. Staring at this for some time it hit me: This isn’t an unpacker, it’s more of a VM! A few instructions are unpacked, jumped to, wiped out, and then this is repeated. Everything that mattered was executed at 0xeXXX.
Back to the emulator, I made another modification that prepended every opcode trace with the address it executes at, and then I modified the disassembler to account for this and print the line with the PC preceding it. I then filtered the output to only keep any address that started with 0xeXXX.
Boom! Down from half a million instructions to a few hundred! I can reverse this.
e000 9E1C: 3182 sub #8, SP
e002 9E1D: 3C40 EA49 mov #0x49ea, R12
e006 9E1E: 004D br R13
e4c6 A5B5: 3240 0080 mov #0x8000, SR
e4ca A5B6: 3C40 CA48 mov #0x48ca, R12
e4ce A5B7: 004D br R13
e000 AD4D: B140 5700 0600 mov #0x57, 0x6(SP)
e006 AD4E: 3C40 8448 mov #0x4884, R12
e00a AD4F: 004D br R13
e000 B4E6: B012 1000 call #0x10
e004 B4E8: 3C40 584D mov #0x4d58, R12
e008 B4E9: 004D br R13
e000 BC7F: B140 6800 0600 mov #0x68, 0x6(SP)
e006 BC80: 3C40 9C45 mov #0x459c, R12
e00a BC81: 004D br R13
...
That’s more fucking like it. The br R13
is just the return to the unpacker, and the mov #0xXXXX, R12
was just another address used for unpacking. Filter them out!
e000 9E1C: 3182 sub #8, SP
e4c6 A5B5: 3240 0080 mov #0x8000, SR
e000 AD4D: B140 5700 0600 mov #0x57, 0x6(SP)
e000 B4E6: B012 1000 call #0x10
e000 BC7F: B140 6800 0600 mov #0x68, 0x6(SP)
e000 C418: B012 1000 call #0x10
e000 CBB1: B140 6100 0600 mov #0x61, 0x6(SP)
e000 D34A: B012 1000 call #0x10
e000 DAE3: B140 7400 0600 mov #0x74, 0x6(SP)
e000 E27C: B012 1000 call #0x10
e000 EA15: B140 2700 0600 mov #0x27, 0x6(SP)
e000 F1AE: B012 1000 call #0x10
e000 F947: B140 7300 0600 mov #0x73, 0x6(SP)
e000 100E0: B012 1000 call #0x10
e000 10879: B140 2000 0600 mov #0x20, 0x6(SP)
e000 11012: B012 1000 call #0x10
e000 117AB: B140 7400 0600 mov #0x74, 0x6(SP)
e000 11F44: B012 1000 call #0x10
e000 126DD: B140 6800 0600 mov #0x68, 0x6(SP)
e000 12E76: B012 1000 call #0x10
...
What we are seeing here is the prompt “What’s the password?” being printed, one character at a time. After this is the validation:
...
B140 0026 0600 mov #0x2600, 0x6(SP)
B140 0001 0800 mov #0x100, 0x8(SP)
3240 0082 mov #0x8200, SR
B012 1000 call #0x10
3540 0026 mov #0x2600, R5
0643 clr R6
repeats 8 times {
2455 add @R5, R4
8410 swpb R4
36E5 xor @R5+, R6
06E4 xor R4, R6
04E6 xor R6, R4
06E4 xor R4, R6
8593 0000 tst 0x0(R5)
0742 mov SR, R7
27F3 and #2, R7
0711 rra R7
17E3 xor #1, R7
8710 swpb R7
0711 rra R7
8711 sxt R7
8710 swpb R7
8711 sxt R7
3840 184B mov #0x4b18, R8
08F7 and R7, R8
37E3 xor #-1, R7
37F0 AA47 and #0x47aa, R7
0857 add R7, R8
0743 clr R7
}
3490 B1FE cmp #0xfeb1, R4
0742 mov SR, R7
0443 clr R4
3690 9892 cmp #0x9298, R6
07F2 and SR, R7
0643 clr R6
0711 rra R7
17E3 xor #1, R7
8710 swpb R7
0711 rra R7
0711 rra R7
0711 rra R7
0711 rra R7
02D7 bis R7, SR
Using this filtered output, I got the original bytes and converted them back to a binary that was only 732 bytes and running this in the emulator worked! That confirmed that I filtered it down to exactly what I needed. Looking at this tiny assembly it’s easy to see what’s happening: Two bytes of the input are checked at a time, and they are mixed into R4, R6, and R8. After the loop runs eight times, the value of R4 and R6 is checked, and then the status register is copied to R7, rotated around, and written back to SR. This is why the CPU was halting: The SR was set to some bogus value. We want 0x7F00 in R7 on the last instruction. The comparison to 0xfeb1 sets the status bits, and to preserve them they have to survive the AND with the status register after the comparison to 0x9298. This means that we need R4 and R6 to equal these values after the loop, meaning everything to do with R7 and R8 in the loop is trash!
B140 0026 0600 mov #0x2600, 0x6(SP)
B140 0001 0800 mov #0x100, 0x8(SP)
3240 0082 mov #0x8200, SR
B012 1000 call #0x10
3540 0026 mov #0x2600, R5
0643 clr R6
repeats 8 times {
2455 add @R5, R4
8410 swpb R4
36E5 xor @R5+, R6
06E4 xor R4, R6
04E6 xor R6, R4
06E4 xor R4, R6
}
3490 B1FE cmp #0xfeb1, R4
0742 mov SR, R7
0443 clr R4
3690 9892 cmp #0x9298, R6
07F2 and SR, R7
...
That’s very reasonable. I was able to convert this to python easily:
r4 = 0
r6 = 0
for i in range(8):
r4 = (r4 + u16(s[i*2:i*2+2]) & 0xFFFF)
r4 = swpb(r4)
r6 ^= u16(s[i*2:i*2+2])
r4, r6 = r6, r4
I tried bruteforcing this, but with a 16 byte input this was not happening. Time for z3!
password = [BitVec("s0", 16), BitVec("s1", 16), BitVec("s2", 16), BitVec("s3", 16),
BitVec("s4", 16), BitVec("s5", 16), BitVec("s6", 16), BitVec("s7", 16)]
r4 = BitVecVal(0, 16)
r6 = BitVecVal(0, 16)
for i in range(8):
r4 = r4 + password[i]
r4 = RotateRight(r4, 8)
r6 ^= password[i]
r4, r6 = r6, r4
solver = Solver()
# Add constraints
solver.add(r4 == 0xfeb1)
solver.add(r6 == 0x9298)
# Solve
solver.check()
m = solver.model()
Then simply converting the solution from 8 16-bit integers to a byte string, and submitting that, I was treated to that delicious win.
Last modified on 2018-11-24