Plaid CTF 2018 coconut

First we’re given an address to connect to. Upon connecting with nc we see a bunch of (AT&T 🤮) x86_64 assembly, and we are told to delete lines to get under a certain threshold without changing the return value. Easy enough! I backtrack from the last write to eax and trace back until I see its final immediate value being written to the stack. Submit those lines, easy flag.

Let’s look at an example:

1    .globl    myfunction
2    myfunction:
3    pushq    %rbp
4    movq    %rsp, %rbp
5    movl    $-872808834, -20(%rbp)
6    movl    $1162927757, -16(%rbp)
7    movl    -16(%rbp), %eax
8    movl    -20(%rbp), %edx
9    andl    %edx, %eax
10    movl    %eax, -12(%rbp)
11    movl    -20(%rbp), %eax
12    movl    %eax, -8(%rbp)
13    movl    -16(%rbp), %eax
14    movl    -8(%rbp), %edx
15    andl    %edx, %eax
16    movl    %eax, -4(%rbp)
17    movl    -4(%rbp), %eax
18    popq    %rbp
19    ret

Let’s walk through it:

We see that eax is last written to on line 17, and it gets its value from -4(%rbp), which was written to from line 16, which gets its value from eax. That’s redundant, and we know we don’t need those.
Line 15 is now the latest line that writes eax, and it gets its value from edx and from a previous value of eax (since it’s an and instruction). We have to fork here and track both variables.
Following edx, we see it’s copied from -8(%rbp) on line 14.
-8(%rbp) is copied from eax on line 12
eax was last written to from -20(%rbp) on line 11
-20(%rbp) was copied from on line 8, but we don’t care about that. It was last written to on line 5, and it was an immediate value, so we know we’re done this path.
Going back to line 15, we track eax writes. eax was last written to on line 13, using the value at -16(%rbp).
-16(%rbp) was last written to on line 6, and it’s an immediate, so we’re done.

Now we obviously need the directives (lines 1 and 2), plus the stack setup (lines 3 and 4), plus the return (line 19). Other than these lines, and the lines mentioned above, we can delete everything else! If we submit these line numbers, we should get a flag.

Congrats! Your flag is PCTF{iN3ffic1eent__A5m_K1ll5}

Now on to the next one. This one, as expected, is more of the same. Connecting a few times I see that the assembly listing provided changes. @uafio and I try to download all of them and start working on them manually. In 1000 iterations we only found 100 unique assembly listings… That’s reasonable. While working through, though, the sense that this isn’t it won’t leave me. We’ll have to automate this.

Now I know in theory how this is done. I’ve read on Static single assignment form before. The idea is to re-version every variable every time it’s assigned to, and track which version of which variable was used in determining its current value.

So I start writing code!

There are many ways about this. I could use something like Capstone engine or binja’s new SSA form, but truly I don’t think I’m going to solve this problem, I just want to understand SSA (plus my binja license has expired and I don’t have the latest updates). I decide to parse the instructions using good ‘ol vanilla python. I write some typical fetching/parsing code with pwntools, and think about how to track this.

From all the challenges I’ve collected, there are only a handful of unique instructions. Mostly movs and arithmetic. All memory accesses are relative to rbp. All register access, other than rbp, is in the extended (32 bit) form. I decide to make it simple: Each register and each offset from rbp is a variable. I can use the name as a key in a dictionary. I can keep track of each version, and for each version, I can track which other variable(s) were used. This gives me a dictionary with a list (or dict) of versions, where for each version I keep a list of which variables were used to determine the new value. I also need to keep track of which line this version was created, so that if the version is relevant, I can make sure to keep the line it shows up on. Here is a nice short sample (using the same code listing as above):

[+] Opening connection to coconut.chal.pwning.xxx on port 6817: Done
1    .globl    myfunction
2    myfunction:
3    pushq    %rbp
4    movq    %rsp, %rbp
5    movl    $-872808834, -20(%rbp)
6    movl    $1162927757, -16(%rbp)
7    movl    -16(%rbp), %eax
8    movl    -20(%rbp), %edx
9    andl    %edx, %eax
10    movl    %eax, -12(%rbp)
11    movl    -20(%rbp), %eax
12    movl    %eax, -8(%rbp)
13    movl    -16(%rbp), %eax
14    movl    -8(%rbp), %edx
15    andl    %edx, %eax
16    movl    %eax, -4(%rbp)
17    movl    -4(%rbp), %eax
18    popq    %rbp
19    ret
{'%eax': {1: [('[rbp-16]', 1, '7')],
          2: [('%eax', 1, '9'), ('%edx', 1, '9')],
          3: [('[rbp-20]', 1, '11')],
          4: [('[rbp-16]', 1, '13')],
          5: [('%eax', 4, '15'), ('%edx', 2, '15')],
          6: [('[rbp-4]', 1, '17')],
          'last': 6},
 '%edx': {1: [('[rbp-20]', 1, '8')], 2: [('[rbp-8]', 1, '14')], 'last': 2},
 '[rbp-12]': {1: [('%eax', 2, '10')], 'last': 1},
 '[rbp-16]': {1: [('$1162927757', 0, '6')], 'last': 1},
 '[rbp-20]': {1: [('$-872808834', 0, '5')], 'last': 1},
 '[rbp-4]': {1: [('%eax', 5, '16')], 'last': 1},
 '[rbp-8]': {1: [('%eax', 3, '12')], 'last': 1}}

We see that there are six version of eax. Looking at the sixth version, we see that it used [rbp-4] to determine its new value (and not an old version of itself!), it used version 1 of that variable, and that happened on line 17. Looking at line 17 we can clearly see this is the case! Now going to the only version of [rbp-4], we see that it was written from a previous value of eax! We can track this back until we fully exhaust all paths. If we keep track of the lines these occurred on, then we know all of the lines that matter! The astute reader will notice that tracking back in this way is not necessarily optimal: Looking at the last two versions of eax, we see that it’s a useless write to [rbp-4], since it’s immediately read back and we don’t prune this! Luckily, though, that was not an issue, as my tactic seems to always prune enough to get within the required threshold.

Here’s some pseudo code for how to build this dictionary:

for line in code:
    get src(s), dest, and line number from line text
    newver = dest.lastver + 1
    update dest.lastver
    for each src:
        dest[newver].prereqs.append(src)

The back-tracking algorithm is a simple DFS. We track all the versions of all variables we need to visit in a queue, starting with the last version of eax (since this is the one that acts as the return value). From this first var we enqueue all variables that were used to determine this value, and repeat until this queue is empty. When we hit an immediate value, which has no prereq, we’ve exhausted this particular path. When the queue is completely empty, we’re built up a nice set of line numbers that are required to build this path, and we simply tell the server to delete everything else! This is exactly how I manually loved the first listing above!

Here’s some more pseudocode:

queue = [last version of eax]
while there are items in the queue:
    current = queue.dequeue()
    for each var that is used in cur:
        queue.enqueue(var)
    mark current as important

At this point my code was miraculously solving a few rounds, but I hit a snag. Turns out every five rounds they add a few instructions, and my original naive code only checked for a mov, and, and maybe a few others. While the debugging wasn’t super fun, it mostly just involved finding the problematic instruction and making sure we properly handle it, meaning we correctly parse out the destination and sources, and revision the destination. There were instructions like imull which has a 3-op form, and leal, which has what looks like a memory access but actually isn’t, these needed special handling.

With eight hours to go in the CTF my brain was dying and I still had a bug I could not figure out. My code was only passing about 15 rounds, and I had no idea how many total were needed to get the flag. I made the wise decision to go to bed.

I woke up again after seven hours, and with a clearer head (and a revealing question from @vakzz) I was able to spot my bug! For certain instructions, e.g. xor, the destination is also a source. I needed to make sure that new version of some variable marked the old version as a prereq! I had this correct for and but not for others. Advice: If you’re tired and batting your head against a problem, take a nap.

With this fixed, it was just a matter of cleaning my script up to make it reliable, and making sure to get the flag when the server finally spits it out.

Don’t judge my code too much! I hope it’s easy to understand.

import sys
import re
from pprint import pprint

from pwn import *

vars = {}

class dummyremote:
    def recv(self): return ""
    def recvline(self): return ""
    def recvuntil(self, s, **kwargs): return ""
    def send(self, s): print s
    def sendline(self, s): print s
    def interactive(self): pass

if len(sys.argv) > 1:
    r = remote('coconut.chal.pwning.xxx', 6817)
else:
    r = dummyremote()

def get_newver(var):
    if dest not in vars:
        vars[dest] = {
            "last": 1
        }
        newver = 1
    else:
        newver = vars[dest]["last"] + 1
        vars[dest]["last"] = newver
    return newver

buf = ""
success = 0
while True:
    if len(sys.argv) > 1: # Add random arg to connect to remote
        r.recvuntil('Function to optimize:\n')

        challenge = r.recvuntil('\n<<<EOF>>>', drop=True)
        r.recvuntil("Note:")
    else:
        with open("bla", "r") as f:
            challenge = f.read()

    # print challenge
    code = challenge.split("\n")

    seenlines = set()
    for ins in code:
        if ins == "":
            continue
        line, ins = ins.rstrip().split("\t", 1)
        # Prologue instructions must remain
        if "leave" in ins or "movq" in ins or "globl" in ins or "myfunction" in ins or "push" in ins or "pop" in ins or "ret" in ins or "subq" in ins:
            seenlines.add(line) # So that we do not remove prologue/epilogue
            continue

        op, args = ins.split("\t", 1)
        args = args.replace(",", "")

        try:
            src, dest = args.split(" ", 1)
        except: # unary ops, where src is dest
            dest = args
            src = dest

        if re.match(r"-?\d+\(", dest): # Mem access, assumed rbp offset
            dest = "[rbp"+dest.split("(")[0] + "]"
        if re.match(r"-?\d+\(", src):
            src = "[rbp"+src.split("(")[0] + "]"

        newver = get_newver(dest)
        if op == "movl":
            if src in vars:
                vars[dest][newver] = [(src, vars[src]["last"], line)]
            elif "$" in src:
                vars[dest][newver] = [(src, 0, line)]
        elif op in {"xorl", "andl", "imull", "addl", "orl", "subl"}:
            if " " in dest: # A 3-arg imull
                dest = dest.split()[0]
                newver = get_newver(dest) # The extra newver above, versioning a bogus var is harmless
                # Assuming always the same two regs for src and dest1
                vars[dest][newver] = [(dest, vars[dest]["last"]-1, line), (src, 0, line)]
            elif "$" not in src:
                vars[dest][newver] = [(dest, vars[dest]["last"]-1, line), (src, vars[src]["last"], line)]
            else:
                vars[dest][newver]  = [(dest, vars[dest]["last"]-1, line), (src, 0, line)]
        elif op in {"notl"}:
            vars[src][newver]  = [(dest, vars[dest]["last"]-1, line)]
        elif op == "leal":
            try:
                # leal (%rdx,%rax)
                op1, op2 = src.replace(")", "").replace("(%", "").split("%")
                op1 = "%e" + op1[1:] # rdx -> edx
                op2 = "%e" + op2[1:]
                vars[dest][newver] = [(op1, vars[op1]["last"], line), (op2, vars[op2]["last"], line)]
            except:
                # leal 1(%rax)
                # HACK: The src is already trashed by the regex looking for mem access. Just assume eax
                src = "%eax"
                vars[dest][newver] = [(src, vars[src]["last"], line)]
        else:
            print ins
            print "UNKNOWN OPCODE"

    # pprint(vars)
    # log.info("Done SSA")

    # Now track back from the newest eax
    last = vars["%eax"]["last"]
    cur, cur_ver, line = vars["%eax"][last][0]
    # print cur, cur_ver, line
    needed = [(cur, cur_ver, line)]
    seen = {(cur, cur_ver, line)}

    while needed:
        # Pop from needed
        cur, cur_ver, line = needed[0]
        needed = needed[1:]

        # Queue up its prereqs
        for op in vars[cur][cur_ver]:
            cur, cur_ver, line = op
            if (cur, cur_ver, line) not in seen:
                seen.add((cur, cur_ver, line))
                # print cur, cur_ver, line
                if "$" not in cur: # Immediate
                    needed.append((cur, cur_ver, line))

    # log.info("Done backtrace")

    seenlines = seenlines.union({_[2] for _ in seen})
    loc = code[4:-2][:]
    i = 0
    first = None
    prev = None
    while i < len(loc):
        start = int(loc[i].split("\t")[0])
        if str(start) not in seenlines:
            while i < len(loc) and loc[i].split("\t")[0] not in seenlines:
                end = int(loc[i].split("\t")[0])
                i += 1
            if start != end:
                r.sendline("{}-{}".format(start, end))
            else:
                r.sendline(str(start))
        i += 1
    r.sendline("#")

    r.recvuntil("Running...\n")
    buf = r.recvuntil("Result:", timeout=1)
    buf += r.recvline().rstrip()

    if "Success" in buf:
        success += 1
    else:
        # For post-mortem
        with open("bla", "w") as f:
            f.write(challenge)
    print buf + ": " + str(success)
    if success >= 55:
        print r.recv()
        r.close()
        break

Congrats! Your flag is PCTF{Y0u_Just_Imp!em3nt3D_A_LLVM_pass!}

Last modified on 2018-05-06