#uiuctf
Explore tagged Tumblr posts
Text
UIUCTF 2020: Nookstop 2.0
This is a very un-hacking solution: we basically just guess the flag!
We start with this page, which looks a lot like Nookstop 1.0. Just as before, we run
document.cookie="secret_backend_service=true";
to enable the "secret backend service" (as clued by the Japanese text in the source code of the page). Reloading the page gives a link to the real challenge, which is running Emscripten.
Grabbing your banking information...... Your routing number is: 0x9a0 2464 Your account number has been successfully transmitted using the latest XOR encryption, and is not shown here for security reasons. Please stand by while the ABD decrypts it....... Calling decrypt function.... wkwcwg{c3oo33os[Byte 7F here] Uh-oh! There's been an error!
One approach would be to dump out the WebAssembly file, as easily found by looking at the page's network requests; then it can be reversed with the help of tools like wasm or wabt. Unfortunately, Emscripten adds a bunch of external code that is hard to reason about within the context of just the .wasm file, and so these decompilers produce output that is next-to-useless. Let's skip the reversing.
The text above hints that there's some kind of XOR going on, and we have a corrupted key: "wkwcwg{c3oo33os\x7f". This corrupted key looks awfully similar to "uiuctf{", in the start. Let's XOR the two together and see what the difference is.
>>> os = "wkwcwg{c3oo33os\x7f" >>> bs = "uiuctf{" >>> print( [ord(os[i])^ord(bs[i]) for i in range(len(bs))] ) [2, 2, 2, 0, 3, 1, 0]
Aha, so all of the XORs are off by a number from 0 to 3 -- that is, just the bottom two bits are wrong. Let's dump out all the candiate characters in the flag:
>>> for b in os: ... print( chr(ord(b) ^ 0) + chr(ord(b) ^ 1) + chr(ord(b) ^ 2) + chr(ord(b) ^ 3) ) ... wvut kjih wvut cba` wvut gfed {zyx cba` <-- Start of flag content 3210 onml onml 3210 3210 onml srqp <-- End of flag content ~}|
The first 7 characters are the uiuctf{, and the last is the final }, so we need to guess the remaining 8. We figure that it will be English leetspeak, so take the letter options, and convert the numbers 3210 into the letters eszilo. Then search for an answer in a regex-based dictionary. Nutrimatic helpfully ranks by frequency as an English word, and the results give one big answer: "billions". Given that we're hacking a bank, this makes perfect sense.
Convert back to leetspeak to get the flag uiuctf{b1ll10ns}.
1 note
·
View note
Text
refun Writeup
Here is the original program they give you: https://ctf.acm.illinois.edu/archive/refun
This challenge seemed much harder than it really was once I figured out how to actually approach this. The refun program basically takes a key file (just a text file), does some magic to it, and spits out a value to you and says you’re wrong (unless you’re right, then congrats!). One thing you’ll notice if you start playing with different keys, is that all of the outputs are valid base64 encodings, but not straight up encodings of the key you give. If you run “strings” on the refun program, you’ll notice near the bottom is a base64 value “lylifb58djw9RIDxjK9jLOuJet==“. The nice folks at Illinois were kind enough to tell us what our goal is!
If you play with the key some more, you’ll notice that changing one value in the key will only change one or two values in the output of refun. What this means is that the closer we get to the answer, the closer the key we have is to the actual key. So, I wrote a script that does exactly that:
http://pastebin.com/Riuqc033
What’s going on is that the function ‘getDist’ checks each byte of the program output against the bytes of the answer, and from that will try to optimize the key byte per byte. So the output is not perfect, but it’s close. Since base64 kind of adds up the ascii from characters, the smaller the difference in the base64 encoding the smaller the difference in the original text. The program should get stuck at this “keyybasfv4Shufflet&“. Now, if you play with this a little you can get closer. What I did is changed the second the ‘b’ to ‘B’ (not sure why the program didn’t do this since it’s better), the ‘f’ to ‘e’, and the ‘v’ to ‘6′. If you keep on playing with the values and such, the closest I think you’ll is this: “key:Base64Shuffled” (there was some weird stuff going on between editing the file with python and through nano, not sure what was up there). Anyways, I’ll the reader decide what the flag is from here.
by AuPhishYellow member cookthebook for UIUCTF.
0 notes
Text
UIUCTF 2020: Cricket32
This writeup will be an example in using angr to “automatically” solve this problem. But along the way, we’ll find two deficiencies in angr, and show how to fix them! Beyond being a writeup of this problem -- there is very little information specific to this challenge -- this can serve as an angr use and development tutorial.
Challenge:
We are given a file cricket32.S, which contains 32-bit x86 assembly and instructions for assembling:
// gcc -m32 cricket32.S -o cricket32 .text str_usage: .string "Usage: ./cricket32 flag\nFlag is ascii with form uiuctf{...}\n" str_yes: .string "Flag is correct! Good job!\n" str_nope: .string "Flag is not correct.\n" .global main main: mov $str_usage, %ebx xor %esi, %esi SNIP jmp printf
Note that the -m32 flag will force gcc to compile a 32-bit executable. The assembly is pretty short, but has a few hard-to-interpret instructions, such as aaa (ASCII Adjust After Addition), sahf (Store AH into Flags), and so on. It's possible these are filler meant to distract; it's also possible that they're being used as part of some bizarre check, or as data (if the program is reading its own code).
Approach
Instead of trying to reverse engineer this ourselves, we'll be using angr to solve for the input. For background, angr is a tool that turns the problem of "get this program into this particular state" into a constraint problem. We interact with angr mostly through the functions explore(), which tells to seek out various states (such as printing "Success!"), and add_constraints(), which let us tell angr rules about the input or states of the program. Let's dive in!
After compiling cricket32, we open it up in IDA to see what addresses we might care about:
We see that if the code reaches 0x12BD, it will load str_nope, and then proceed straight to loc_12BC and printf. This is our failure condition. We'll load the binary in angr and tell to find an input that allows us to avoid that address, 0x12BD.
import angr import claripy project = angr.Project("./cricket32", auto_load_libs=True) flag_len = 32 arg1 = claripy.BVS('arg1', flag_len*8) initial_state = project.factory.entry_state(args=["./cricket32", arg1])
Here arg1 is our symbolic representation of the argument to the binary. We've allocated 32 bytes to (flag_len), hoping this is enough. Since arg1 can have zero-bytes, effectively being shorter than 32, this will work as long as the flag is at most 32 bytes long.
Running the above code produces the output:
WARNING | 2020-07-19 20:08:15,647 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.
so the address that we wanted to avoid, 0x12bd, will actually be 0x4012bd. Let's now explore for that address:
sm = project.factory.simulation_manager(initial_state) sm.explore(avoid=[0x40128D])
Unfortunately, we are left with the result <SimulationManager with 26 avoid (7 errored)>. When angr explores paths, it has a few different "stashes" that paths can be in. There are:
active: paths that have not been fully explored yet
deadended: which is paths that terminated the program
avoid: paths that reached somewhere we didn't want it to
found: paths that found an explicit destination request (we gave no such destination here)
errored: indicating that angr was unable to continue with a path.
Ideally we would have had a deadended path here, meaning that a program successfully terminated without reaching the avoided point of 0x4012bd. Since we didn't, let's take a look at the errors.
sm.errored[0] will get the first state in the errored stash, and on the console we get:
<State errored with "IR decoding error at 0x4012a2. You can hook this instruction with a python replacement using project.hook(0x4012a2, your_function, length=length_of_instruction).">
What's 0x4012a2? Looking in IDA again, we see the crc32 edx, dword ptr [esi] instruction. The crc32 instruction is used to compute a cyclic redundancy check. This is a somewhat complicated operation, and generally rare, so it's not terribly surprising that angr doesn't know how to translate it into constraints. As suggested, we can implement a hook to emulate this.
Angr Hooks
A hook in angr is used to modify or replace a piece of code. A syscall instruction may, for instance, invoke some complicated kernel behavior, and hooking it could allow us to substitute our own behavior; or a simple call might be hooked to modify the execution. When writing a hook, it is important to remember we are working with symbolic values, not actual integers, so that we ultimately return a mathematical expression representing the modified state.
Our hook will intercept the crc32 instruction, emulate it, and then return control flow at the end of the instruction. The crc32 instruction is 5 bytes long, so our hook starts:
def crc32_hook(state): pass project.hook(0x4012a2, crc32_hook, length=5)
Next we need to implement the correct behavior. Since all computations must be symbolic, we can't just use a library implementation of CRC32. Additionally, most CRC32 implementation use the "CRC-32" specification, with the 0x04C11DB7 polynomial (essentially a magic number). The x86 crc32 instruction instead uses the "CRC-32C", or "Castagnoli", specification, with the 0x1EDC6F41 polynomial. Some googling turns up this implementation of CRC-32C we can adapt.
CRC_TABLE = ( 0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4, # ... SNIP ... 0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351, ) table_dict = {i: claripy.ast.bv.BVV(CRC_TABLE[i],32) for i in range(256)} def get_crc32_table_BVV(i): i = i.to_claripy() return claripy.ast.bool.ite_dict(i, table_dict, 0xFFFFFFFF) def crc32c(dst,src): b32 = src crc = dst for i in [3,2,1,0]: b = b32.get_byte(i) shift = (crc >> 8) & 0x00FFFFFF onebyte = crc.get_byte(3) crc = get_crc32_table_BVV(onebyte ^ b) ^ shift return crc
Here we need to do a symbolic table lookup, which we can implement with ite_dict. ite_dict(i, table_dict, 0xFFFFFFFF) means that we use the key-value pairs in table_dict to look up i; the default value will be 0xFFFFFFFF is not in the dict (but in our case it always will be). Then our crc32c method computes the update to running CRC32 dst using the four bytes in src. Our hook is then:
def crc32_hook(state): crc = state.regs.edx addr = state.regs.esi b32 = state.memory.load(addr).reversed print('CRC32 accessing ',b32) state.regs.edx = crc32c(crc, b32)
This clearly not a drop-in replacement for any crc32 instruction: we've hard-coded that the source operand is dword ptr [esi] and our destination is edx. But that will work for our purposes here. After adding our hook, we reload the initial state and explore as before. The search continues for a few seconds, until we get a very long stack trace that eventually ends:
RecursionError: maximum recursion depth exceeded while calling a Python object
Urk.
Removing recursion
This turns out to stem from a shortcoming of how ite_dict is implemented. It passes the dictionary to ite_cases, which loops a claripy.If expression -- essentially a ternary operator. In our case, this produces an expression like
claripy.If(i == 0, 0x00000000, claripy.If(i == 1, 0xf26b8303, claripy.If(i == 2, 0xe13b70f7, claripy.If(i == 3, 0x1350f3f4, ... <255 levels like this> ))))
and when Z3 (the backend constraint solver) tries to analyze this, it balks at the depth of the expression. There are two options now: (1) implement a better version of ite_dict, or (2) use a table-free imeplemtation of CRC-32C. In the competition, we used option (2). Googling some more for how to generate the table leads us to this page on generating the table. Instead of generating a table and storing it, we can just "recompute" (symbolically) the value each our crc32 instruction is hit. This leads to the code,
def get_crc32_calc_BVV(i): crc = i.zero_extend(32 - i.size()); if isinstance(crc, angr.state_plugins.sim_action_object.SimActionObject): crc = crc.to_claripy() for j in range(8): shift = ((crc >> 1) & 0x7FFFFFFF) cond = crc & 1 > 0 crc = claripy.If(cond, shift ^ 0x82f63b78, shift); return crc
The crc.to_claripy() here is necessary in case angr passes us a SimActionObject instead of an actual symbolic value. Then the rest of the operations work just like the C code in the link above, with the claripy.If replacing C's ternary operator. Then we replace the appropriate line in our crc32c function to use get_crc32_calc_BVV instead of get_crc32_table_BVV. Looking at our code so far:
import angr import claripy #copy implementation from https://medium.com/@chenfelix/crc32c-algorithm-79e0a7e33f61 def get_crc32_calc_BVV(i): crc = i.zero_extend(32 - i.size()); if isinstance(crc, angr.state_plugins.sim_action_object.SimActionObject): crc = crc.to_claripy() for j in range(8): shift = ((crc >> 1) & 0x7FFFFFFF) cond = crc & 1 > 0 crc = claripy.If(cond, shift ^ 0x82f63b78, shift); return crc def crc32c(dst,src): b32 = src crc = dst for i in [3,2,1,0]: b = b32.get_byte(i) shift = (crc >> 8) & 0x00FFFFFF onebyte = crc.get_byte(3) crc = get_crc32_calc_BVV(onebyte ^ b) ^ shift return crc def crc32_hook(state): crc = state.regs.edx addr = state.regs.esi b32 = state.memory.load(addr).reversed print('CRC32 accessing ',b32) state.regs.edx = crc32c(crc, b32) project = angr.Project("./cricket32", auto_load_libs=True) flag_len = 32 arg1 = claripy.BVS('arg1', flag_len*8) initial_state = project.factory.entry_state(args=["./cricket32", arg1]) sm = project.factory.simulation_manager(initial_state) project.hook(0x4012a2, crc32_hook, length=5) sm.explore(avoid=[0x40128D])
At the end we are left with: <SimulationManager with 6 deadended, 33 avoid>. Fantastic! That means 6 paths successfully terminated without hitting 0x4012bd. Each of these SimStates are listed in sm.deadended. To get the input, we can call state.make_concrete_int(arg1), which will return arg1 as a big integer; then .to_bytes(32,"big") to turn it into a string:
>>> for state in sm.deadended: ... state.make_concrete_int(arg1).to_bytes(32,"big") ... b'uiuc\xdbK\xdf\x9d\xf0N\xd6\x95cket_a_c\xddL\xc7\x97it}\x00\x00\x00\x00\x00' b'\xdaD\xd1\x9ftf{a_cricket_a_c\xddL\xc7\x97\xc6Y\xd9\xfc\x00\x00\x00\x00' b'uiuctf{a\xf0N\xd6\x95\xccF\xc1\x88_a_crack\xc6Y\xd9\xfc\x02\x00\x00\x00' b'\xdaD\xd1\x9f\xdbK\xdf\x9d\xf0N\xd6\x95cket_a_c\xddL\xc7\x97\xc6Y\xd9\xfc\x10\x10\x00\x00' b'\xdaD\xd1\x9ftf{a_cri\xccF\xc1\x88\xf0L\xfb\x9frack\xc6Y\xd9\xfc \x01\x00' b'uiuctf{a\xf0N\xd6\x95\xccF\xc1\x88_a_c\xddL\xc7\x97\xc6Y\xd9\xfc\x01\x04\x01\x08'
We see that angr has found 6 solutions that technically would work as input to the program, and indeed several start with uiuctf{ -- a sign we're on the right track! But they're filled with non-printable characters. The astute eye might piece together the flag from the above malformed fragments, but the "right" thing to do is to do tell angr that each byte of input will be printable ASCII (or zero).
Cleanup
for b in arg1.chop(8): initial_state.add_constraints((b == 0) | ((b > 31) & (b < 127))) for i in range(len("uiuctf{")): b = arg1.chop(8)[i] initial_state.add_constraints(b == ord("uiuctf{"[i]))
It's important to note that -- perhaps a bit surprisingly -- compound constraints are formed using | and &, not Python's or and and keywords. arg1.chop(8) breaks the 256-bit vector arg1 into a list of 8-bit bytes, and we add a constraint for each byte. The second loop actually forces that the flag starts with uiuctf{. Probably not strictly necessary, but will definitely accelerate solving. This code gets inserted right before initial_state = project.factory.... The evaluation now takes less than a minute, and ends with
<SimulationManager with 1 deadended, 26 avoid> >>> >>> for sol in sm.deadended: ... print(sol.posix.dumps(1)) ... sol.make_concrete_int(arg1).to_bytes(32,"big") ... b'Flag is correct! Good job!\n' b'uiuctf{a_cricket_a_crackit}\x00 @@\x00'
Flag get!
A reality check
In practice, progress wasn't quite this linear. Some gotchas I encountered were:
Needing to use .to_claripy() in the hook -- if you don't, the hook runs fine, and then you get weird and inscrutable errors later.
Figuring out what the recursion error was coming from. Was determined by shrinking the dictionary size and watching the problem go away (although of course no solution was found).
Lots of byte-ordering issues. angr will often store the bytes the opposite order you expect, so e.g. forcing the string to start with uiuctf{ was something initally done backwards.
LOTS more byte-ordering issues in implementing the CRC. And sign-bit madness. If you think translating bit-fiddly C code to Python is bad, try translating bit-fiddly C code to symbolic Python with unusual endianness.
This program is small enough that it might have been very routine to solve with angr in just a few lines of code, or that IDA might have decompiled it into something very readable. angr was thwarted by the crc32 instruction. It is worth noting that, had the binary been 64-bit, angr would have known how to do a CRC32; 32-bit executables are less supported. And IDA refused to parse the main code chunk as a function, because it couldn't handle the jump-to-the-middle-of-an-instruction we see at 0x12ba. IDA already parsed an instruction at 0x1278, so a jump to 0x1279 broke it. Finally, the ASCII-Adjust operations (aaa) are pretty rare, and I was surprised that angr supported them. I doubt many decompilers would.
Postscript: A better ite_dict
During the competition, I use a loop implementation of CRC-32 that didn't use a table. In practice, there's little reason why a table couldn't be used, if the if-then-else statement went in a binary tree instead. So I wanted to try that too! Code:
#Improved version of ite_dict that uses a binary search tree instead of a "linear" search tree. #This improves Z3 search capability (eliminating branches) and decreases recursion depth: #linear search trees make Z3 error out on tables larger than a couple hundred elements.) # Compare with https://github.com/angr/claripy/blob/f2c1998731efca4838a4edb9dec77e0424c5f691/claripy/ast/bool.py#L164 def ite_dict(i, d, default): i = i.ast if type(i) is claripy.ast.base.ASTCacheKey else i #for small dicts fall back to the old implementation if len(d) < 4: return claripy.ast.bool.ite_cases([ (i == c, v) for c,v in d.items() ], default) #otherwise, binary search. #Find the median: keys = list(d.keys()) keys.sort() split_val = keys[len(keys)//2] #split the dictionary dictLow = {c:v for c,v in d.items() if c <= split_val} dictHigh = {c:v for c,v in d.items() if c > split_val} valLow = ite_dict(i, dictLow, default) valHigh = ite_dict(i, dictHigh, default) return claripy.If(i <= split_val, valLow, valHigh) def get_crc32_table_BVV(i): i = i.to_claripy() return ite_dict(i, table_dict, 0xFFFFFFFF)
With this modified version of get_crc32_table_BVV, the table-based code ran fine. Interestingly, it took several minutes to solve -- quite a bit slower than the "slow" and "complicated" loop-based implementation. In general though, I expect the binary tree would be faster for Z3 to analyze than a linear expression. I've created a pull request to get this improvement into angr, hopefully. :)
EDIT: It got merged -- hooray for CTFs leading to productive behavior!
0 notes
Text
QR writeup:
Here is the script they give you: http://pastebin.com/3HmfYfXM
And here is the image they give you:
This was one of my favorite challenges. If you read through the script, there’s two things going on with the picture. Either, the R, G, or B value is greater than 250 and it’s replaced with the randomly selected R, G, or B value which is fixed (defined near the top), or the value is less than 250 and it’s assigned to a unique random value (which is why the picture looks so funky). The first order of business for this challenge is to figure out what fixed random values for R, G, and B they got. The way I approached this is to check each (x, y) position and see if there is some R, G, or B that fits the get_color(x, y, r) function. If we put all the counts of values of R, G, and B we find into an array, there should be one value for each which is much higher than the rest.
You can find the script I wrote for this purpose here: http://pastebin.com/5KScb9Kd
(It takes a little while to run, don’t worry if it’s slow).
If you run that, you should get R = 179, G = 83, and B = 156, which have counts much much higher than the rest. Great! All we have to do now is go through every pixel and see which ones fit the first criteria of replacing the color value with get_color(x, y, r) and which don’t. Now, if the original picture was really a QR code, any pixel that gets a color replaced with get_color(x, y, r) will have it happen to all the colors since white has an (r,g,b) of (255,255,255) and black has an (r,g,b) of (0,0,0). Thus, using that logic I just replaced all the pixels that had each color value match the criteria with (255,255,255), and the ones that don’t with (0,0,0).
That script that I wrote can be found here: http://pastebin.com/mu6pr0ay
The output of that script is “dec.png”, which looks like this:
Wow! It’s a QR code! From here I just straight up scanned it with a phone and got the flag, but I’m sure you could clean it up with gimp and scan it through a website.
by AuPhishYellow member cookthebook for UIUCTF.
0 notes