Hey, let’s keep going on our exploitation journey by talking about off-by-one vulnerabilities. I’ll quickly cover the stack-based and the heap-based variants. It is a subject I understand in principle but had only ever scratched the surface of, even though I have already exploited stack off-by-one vulnerabilities before.
What is an off-by-one?
An off-by-one, in my own words, is a vulnerability where you can overwrite exactly one byte out of bounds, using a string you may or may not control. It generally happens when someone creates a buffer and copies data into it with a size limit that does not account for the appended null byte.
char buf[256];
strcpy(buf, s); // when strlen(s) is equal to 256
== OR ==
strncat(buf, s, sizeof(buf)); // strncat writes up to sizeof(buf) bytes + \x00
In these cases, the attacker writes one null byte just past the end of the allocated buffer, because of the null byte that terminates the string.
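To make the stray terminator visible without actually corrupting memory, here is a minimal sketch. It assumes a 256-byte buffer as in the snippet above, but copies into a slightly larger backing array (the SLACK constant and the function names are mine, purely for illustration) so that the demo itself stays in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 256  /* size the programmer believes the buffer has */
#define SLACK    16   /* hypothetical extra room so the demo stays in bounds */

/* Copies a string of src_len 'A' characters with strcpy and returns the
 * index at which the terminating NUL byte landed. */
size_t nul_index_after_strcpy(size_t src_len)
{
    char src[BUF_SIZE + SLACK];
    unsigned char backing[BUF_SIZE + SLACK];

    assert(src_len < sizeof src);           /* keep the demo memory-safe */

    memset(src, 'A', src_len);
    src[src_len] = '\0';
    memset(backing, 0xAA, sizeof backing);  /* marker for untouched bytes */

    strcpy((char *)backing, src);

    size_t i = 0;
    while (backing[i] != 0)                 /* stops at the NUL strcpy wrote */
        i++;
    return i;
}

/* True when the terminator falls exactly one byte past a BUF_SIZE buffer. */
int is_off_by_one(size_t src_len)
{
    return nul_index_after_strcpy(src_len) == BUF_SIZE;
}
```

With a 255-character string the terminator lands at index 255, the last valid byte of a 256-byte buffer. With 256 characters it lands at index 256, which is the first byte that no longer belongs to the buffer.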
Exploiting a classic stack overflow is pretty straightforward: generally you want to overwrite the saved EIP so that the ret instruction loads your controlled value into EIP. With an off-by-one, you can only overwrite the least significant byte of the saved EBP (base pointer), which is restored by the “leave” instruction when the function exits.
The leave instruction (which is “mov %ebp,%esp; pop %ebp;” in AT&T syntax) will put the saved EBP with its low byte zeroed (for example 0xbffff200, the 00 being the byte you overwrote with the off-by-one) into ebp, and the function then does a “ret;”. When the caller reaches its own “leave; ret;”, the corrupted ebp is copied into esp, effectively moving the stack pointer to an area you may control (there is a good chance of that when the buffer is 256 bytes or more, for example), and the ret then pops its return address from there.
With ESP pointing at a place you control, if you know by how much the saved EBP moved (example: a saved EBP of 0xbffff220 overwritten to 0xbffff200, a shift of -32), you can place the address you want at the right offset and effectively exploit this vulnerability.
If you can’t reach the saved EBP for whatever reason (for example if there is another variable between “buf” and the saved EBP), there are still interesting things you can do, such as overwriting that variable with the off-by-one.
You also need to avoid the alignment padding from gcc, which automatically aligns the stack pointer (esp) to a 16-byte boundary (and $0xfffffff0,%esp), meaning there could be up to 12 bytes between your buffer and the saved EBP. We can compile our vulnerable program with “-mpreferred-stack-boundary=2” to avoid this issue.
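The “up to 12 bytes” figure can be checked with a bit of arithmetic. This is a small sketch of what the and instruction does to esp, assuming esp is 4-byte aligned beforehand (the usual case on 32-bit x86); the function names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of "and $0xfffffff0,%esp": round esp down to a 16-byte boundary. */
uint32_t align_esp_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}

/* Number of padding bytes the alignment introduces below the old esp. */
uint32_t alignment_padding(uint32_t esp)
{
    assert((esp & 3u) == 0);  /* assumption: esp is already 4-byte aligned */
    return esp - align_esp_16(esp);
}
```

For a 4-byte aligned esp the padding can only be 0, 4, 8 or 12 bytes. For example, 0xbffff22c is rounded down to 0xbffff220, which leaves 12 bytes of padding.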
1. Before off-by-one
[saved eip] [saved ebp] [buf]
ebp: "ebp"
esp: "esp"
2. After off-by-one
[saved eip] [saved ebp+null byte] [controlled]
3. After leave; ret;
ebp: "saved ebp" + null byte
esp: "ebp"
4. After second leave; ret;
ebp: "whatever value is in the stack"
esp: "saved ebp" + null byte
eip: "controlled value inside buf"
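The four states above can be replayed with a tiny register and stack emulator. This is only a sketch under assumptions of mine: the frame addresses (inner frame at 0xbffff210, caller frame at 0xbffff220), the return address 0x08048500 and the word-granular memory model are all hypothetical, and the single-byte overwrite is modelled by masking the low byte of the saved EBP word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BASE  0xbffff000u  /* hypothetical start of the emulated stack */
#define STACK_WORDS 1024u        /* 4 KiB of emulated memory */

typedef struct {
    uint32_t ebp, esp, eip;
    uint32_t stack[STACK_WORDS];
} Cpu;

/* Maps an emulated address to its 32-bit slot. */
static uint32_t *slot(Cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && addr - STACK_BASE < 4u * STACK_WORDS);
    assert((addr & 3u) == 0);
    return &c->stack[(addr - STACK_BASE) / 4u];
}

static uint32_t pop(Cpu *c)
{
    uint32_t value = *slot(c, c->esp);
    c->esp += 4;
    return value;
}

static void leave(Cpu *c) { c->esp = c->ebp; c->ebp = pop(c); }
static void ret(Cpu *c)   { c->eip = pop(c); }

/* What the stray NUL does to the saved EBP on a little-endian machine. */
uint32_t clobber_lsb(uint32_t saved_ebp)
{
    return saved_ebp & 0xffffff00u;
}

/* Replays both "leave; ret" pairs and returns the final EIP. */
uint32_t eip_after_off_by_one(uint32_t payload_addr)
{
    Cpu c;
    memset(&c, 0, sizeof c);

    const uint32_t inner_ebp  = 0xbffff210u;  /* frame of the vulnerable function */
    const uint32_t caller_ebp = 0xbffff220u;  /* value stored as saved EBP */

    *slot(&c, inner_ebp)     = caller_ebp;    /* [saved ebp] */
    *slot(&c, inner_ebp + 4) = 0x08048500u;   /* [saved eip], back into the caller */

    /* Attacker data inside buf, which is assumed to cover 0xbffff200. */
    uint32_t pivot = clobber_lsb(caller_ebp);
    *slot(&c, pivot)     = 0x41414141u;       /* popped into ebp, value irrelevant */
    *slot(&c, pivot + 4) = payload_addr;      /* popped into eip */

    *slot(&c, inner_ebp) = pivot;             /* the off-by-one itself */

    c.ebp = inner_ebp;
    leave(&c); ret(&c);                       /* vulnerable function returns */
    assert(c.ebp == pivot);                   /* state 3 of the diagram */
    leave(&c); ret(&c);                       /* caller returns */
    return c.eip;                             /* state 4: controlled value */
}
```

In this layout the second ret reads its return address from the corrupted EBP + 4, so the address you want executed has to sit at that offset inside the buffer.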
I didn’t do a PoC for this one because I have already exploited it before, and the exploitation is pretty straightforward (send a cyclic string, get the offset of EIP, put a shellcode somewhere, either in the environment or inside the buffer, then exploit it).
This one, compared to all of the other articles, is a vulnerability I have never had to exploit before. I had to write a ret-to-libc exploit a long time ago, and for ROP I always used ROPGadget even though I understood how it worked. Stack-based off-by-ones are pretty easy to exploit, even if the complete explanation is a little more complicated than what I had in mind: I thought you just had to write “\x00” into EIP, which is kind of what the final exploit looks like but nowhere near what is actually happening.
The principles of a heap-based off-by-one are similar to the stack-based one, except that we will write one null byte into another chunk rather than into the saved EBP.
I’m going to exploit a code created by sploitfun.wordpress.com that contains an off-by-one vulnerability in the heap:
There is only one free in this code, at the end, on p, so we won’t have to worry about freelist data structures/bins for the moment. I’m also going to disable ASLR for now; in a real exploit I might need a leak. The off-by-one is at line 32: it copies a string of at most 1020 characters (see the check at line 21) into a buffer of size 1020 using strcpy, so it really writes up to 1021 bytes. In this example, the off-by-one in p2 should overflow into p3.
Since both chunks are allocated, the prev_size field of the malloc_chunk structure is not set but is used for user data, so the overflow writes one byte into the size field of p3’s malloc_chunk. Alignment does not get in the way because the requested size of 1020 is converted to a real internal size of 1024 by “checked_request2size” ((1020 + 4 + 7) & ~7). Normally the malloc_chunk header takes 8 bytes (two size_t), but here 4 of those bytes, the prev_size field of the next chunk, are given to the user data.
MALLOC_ALIGN_MASK = MALLOC_ALIGNMENT - 1 = SIZE_SZ * 2 - 1 = 7 (at least in x86)
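The size conversion can be reproduced with a few lines of C. This sketch mirrors the request2size computation for 32-bit x86 only: SIZE_SZ is hard-coded to 4 and MINSIZE to 16, and request2size_32 is a name I made up so it doesn’t get confused with the real glibc macro.

```c
#include <assert.h>

#define SIZE_SZ            4u                     /* sizeof(size_t) on 32-bit x86 */
#define MALLOC_ALIGNMENT   (2u * SIZE_SZ)         /* 8 */
#define MALLOC_ALIGN_MASK  (MALLOC_ALIGNMENT - 1u) /* 7 */
#define MINSIZE            16u                    /* smallest chunk on 32-bit x86 */

/* Converts a requested size into the internal chunk size. */
unsigned request2size_32(unsigned req)
{
    assert(req < 0xffffff00u);  /* avoid wrap-around in this sketch */
    unsigned padded = req + SIZE_SZ + MALLOC_ALIGN_MASK;
    return padded < MINSIZE ? MINSIZE : (padded & ~MALLOC_ALIGN_MASK);
}
```

A request of 1020 bytes gives (1020 + 4 + 7) & ~7 = 1024, so the user data runs right up to the next chunk’s size field with no padding in between. A request of 1021 bytes would already need a 1032-byte chunk.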
Overwriting the least significant byte of the next chunk’s size field clears its flag bits (N, M, P). Because the P flag (PREV_INUSE) is now cleared, the chunk we wrote from (p2) is considered free. This change of state means that other fields inside the chunk come into play, such as fd and bk, which are only used on free chunks, and that if p (the chunk before) gets freed, glibc will unlink p2 (there should never be two adjacent free chunks; unlink here means removing it from its bin list) and try to coalesce with it.
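Here is a small sketch of what the null byte does to a size field stored in little-endian order. The flag values are the usual ptmalloc2 ones; the example size field 0x111 is hypothetical, since the real value depends on how big p3 is.

```c
#include <assert.h>
#include <stdint.h>

#define PREV_INUSE     0x1u  /* P */
#define IS_MMAPPED     0x2u  /* M */
#define NON_MAIN_ARENA 0x4u  /* N */
#define SIZE_BITS      (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Lays the size field out as little-endian bytes, zeroes the first one
 * (the byte the overflow reaches) and reads the field back. */
uint32_t size_after_nul_overflow(uint32_t size_field)
{
    unsigned char raw[4];
    for (int i = 0; i < 4; i++)
        raw[i] = (unsigned char)(size_field >> (8 * i));

    raw[0] = 0x00;  /* the stray string terminator */

    uint32_t out = 0;
    for (int i = 0; i < 4; i++)
        out |= (uint32_t)raw[i] << (8 * i);

    assert((out & 0xffu) == 0);
    return out;
}

int prev_inuse(uint32_t size_field)
{
    return (size_field & PREV_INUSE) != 0;
}

uint32_t chunk_size(uint32_t size_field)
{
    return size_field & ~SIZE_BITS;
}
```

With a size field of 0x111 (size 0x110 with P set), the overflow leaves 0x100: the P bit is gone, and the size itself shrinks as well whenever its low byte was not already zero.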
A long time ago, in a galaxy far far away, there was a way to exploit simple unlink vulnerabilities to gain remote code execution.
If an attacker was able to overwrite the flag bits to unset PREV_INUSE, they could force a chunk to coalesce either backward (checking the current chunk’s PREV_INUSE) or forward (checking the next-to-next chunk’s PREV_INUSE flag) with a chunk that was in reality in use. The consolidation is done by removing the backward or forward chunk from its bin list, adding the sizes together, then inserting the new free chunk into the unsorted bin.
Imagine two chunks, [chunk 1][chunk 2]. If we could overflow chunk 1 to set chunk 2’s prev_size to an even value (so that the PREV_INUSE bit is 0), its size to -4, its fd to the address of free’s GOT entry – 12 and its bk to a shellcode address, and if there was a free(chunk1) then a free(chunk2), here’s what would happen:
- During the consolidation check, the forward check, which normally reads the next-to-next chunk’s flags at (ptr chunk1 + size chunk1 + size chunk2), changes because chunk 2’s size was overwritten with -4. This tricks glibc into reading chunk 2’s prev_size as the next-to-next size; since we set prev_size to an even value, PREV_INUSE is not set and glibc thinks that chunk 2 is free.
- Since the second chunk looks free, it will try to consolidate by unlinking it.
- To unlink it, it first copies the second chunk’s fd and bk into two variables, FD and BK.
- It then does FD->bk = BK and BK->fd = FD (where FD->bk is 12 bytes past FD: 8 bytes for prev_size and size, and 4 for fd).
- Since FD is equal to free – 12, FD->bk is the GOT entry of free, which gets overwritten with BK, BK being the shellcode address <- Pwned
- It then adds the consolidated chunk to the unsorted bin.
At the next call to free, it should execute our shellcode. The shellcode should start with a jump instruction, since BK->fd = FD overwrites part of the shellcode (at offset 8) with the address of free – 12.
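The two writes done by the old unlink are easier to follow on a toy memory. The sketch below models a flat 32-bit address space with a fake GOT slot, a fake chunk 2 and a fake shellcode area; every address in it (MEM_BASE, GOT_FREE, CHUNK2, SHELLCODE, the “real” free address 0xb7e5a000) is made up for illustration, and nothing here touches the real allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BASE   0x0804a000u  /* hypothetical base of the toy memory */
#define MEM_WORDS  64u
#define FD_OFFSET  8u           /* prev_size 4 + size 4 */
#define BK_OFFSET  12u          /* prev_size 4 + size 4 + fd 4 */

#define GOT_FREE   (MEM_BASE + 0x20u)  /* fake GOT entry of free */
#define CHUNK2     (MEM_BASE + 0x40u)  /* fake overwritten chunk */
#define SHELLCODE  (MEM_BASE + 0x80u)  /* fake shellcode location */

typedef struct { uint32_t words[MEM_WORDS]; } Mem;

typedef struct {
    uint32_t got_free;          /* content of the GOT slot after unlink */
    uint32_t shellcode_plus_8;  /* word at shellcode + 8 after unlink */
} UnlinkResult;

static uint32_t *cell(Mem *m, uint32_t addr)
{
    assert(addr >= MEM_BASE && addr - MEM_BASE < 4u * MEM_WORDS);
    assert((addr & 3u) == 0);
    return &m->words[(addr - MEM_BASE) / 4u];
}

/* The old unlink, without any integrity check. */
static void old_unlink(Mem *m, uint32_t p)
{
    uint32_t FD = *cell(m, p + FD_OFFSET);
    uint32_t BK = *cell(m, p + BK_OFFSET);
    *cell(m, FD + BK_OFFSET) = BK;  /* FD->bk = BK */
    *cell(m, BK + FD_OFFSET) = FD;  /* BK->fd = FD */
}

UnlinkResult simulate_old_unlink(void)
{
    Mem m;
    memset(&m, 0, sizeof m);

    *cell(&m, GOT_FREE) = 0xb7e5a000u;               /* hypothetical address of free */
    *cell(&m, CHUNK2 + FD_OFFSET) = GOT_FREE - 12u;  /* forged fd */
    *cell(&m, CHUNK2 + BK_OFFSET) = SHELLCODE;       /* forged bk */

    old_unlink(&m, CHUNK2);

    UnlinkResult r = { *cell(&m, GOT_FREE), *cell(&m, SHELLCODE + 8u) };
    return r;
}
```

After the call, the fake GOT slot holds the shellcode address and the word at shellcode + 8 holds free – 12, which is why the real shellcode had to jump over that clobbered word.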
Unfortunately, this technique doesn’t work anymore because multiple checks have been added to ptmalloc2:
- Double free: it is not possible anymore. In our example we effectively did a “double free” by freeing chunk 1: since we overwrote the size of chunk 2 with -4, PREV_INUSE reads as 0, meaning chunk 1 looks already freed and shouldn’t be freed twice. Doing the free(chunk1) after overwriting chunk 2’s header would now raise an error.
- Invalid next size: a check has been added so that the next chunk’s size must be greater than 2x SIZE_SZ (prev_size and size); an error is raised if it is 8 or less (on x86 systems).
- Corrupted double linked list:
if (__builtin_expect (FD->bk != P || BK->fd != P, 0))
  malloc_printerr (check_action, "corrupted double-linked list", P);
This check means that the forward chunk’s backward pointer and the backward chunk’s forward pointer must both point to the current chunk. In our attack example, FD->bk would be the GOT entry of free and BK->fd would be the word at shellcode+8, neither of which contains the address of the current chunk, so the check triggers and raises an error.
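To see the check reject the old attack while still accepting a genuine list, here is a sketch on the same kind of toy memory. All addresses and helper names are hypothetical, and returning 0 stands in for the malloc_printerr call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BASE   0x0804a000u  /* hypothetical base of the toy memory */
#define MEM_WORDS  64u
#define FD_OFFSET  8u
#define BK_OFFSET  12u

#define GOT_FREE   (MEM_BASE + 0x20u)
#define CHUNK_P    (MEM_BASE + 0x40u)
#define SHELLCODE  (MEM_BASE + 0x80u)
#define CHUNK_A    (MEM_BASE + 0xa0u)  /* genuine neighbour, forward */
#define CHUNK_B    (MEM_BASE + 0xc0u)  /* genuine neighbour, backward */

typedef struct { uint32_t words[MEM_WORDS]; } Mem;

static uint32_t *cell(Mem *m, uint32_t addr)
{
    assert(addr >= MEM_BASE && addr - MEM_BASE < 4u * MEM_WORDS);
    assert((addr & 3u) == 0);
    return &m->words[(addr - MEM_BASE) / 4u];
}

/* Returns 1 when the chunk was unlinked, 0 when the list looks corrupted. */
static int checked_unlink(Mem *m, uint32_t p)
{
    uint32_t FD = *cell(m, p + FD_OFFSET);
    uint32_t BK = *cell(m, p + BK_OFFSET);
    if (*cell(m, FD + BK_OFFSET) != p || *cell(m, BK + FD_OFFSET) != p)
        return 0;                   /* "corrupted double-linked list" */
    *cell(m, FD + BK_OFFSET) = BK;
    *cell(m, BK + FD_OFFSET) = FD;
    return 1;
}

int forged_unlink_is_rejected(void)
{
    Mem m;
    memset(&m, 0, sizeof m);
    *cell(&m, GOT_FREE) = 0xb7e5a000u;               /* hypothetical address of free */
    *cell(&m, CHUNK_P + FD_OFFSET) = GOT_FREE - 12u;
    *cell(&m, CHUNK_P + BK_OFFSET) = SHELLCODE;
    int unlinked = checked_unlink(&m, CHUNK_P);
    return !unlinked && *cell(&m, GOT_FREE) == 0xb7e5a000u;
}

int genuine_unlink_is_accepted(void)
{
    Mem m;
    memset(&m, 0, sizeof m);
    *cell(&m, CHUNK_P + FD_OFFSET) = CHUNK_A;  /* P->fd = A */
    *cell(&m, CHUNK_P + BK_OFFSET) = CHUNK_B;  /* P->bk = B */
    *cell(&m, CHUNK_A + BK_OFFSET) = CHUNK_P;  /* A->bk = P */
    *cell(&m, CHUNK_B + FD_OFFSET) = CHUNK_P;  /* B->fd = P */
    int unlinked = checked_unlink(&m, CHUNK_P);
    return unlinked
        && *cell(&m, CHUNK_A + BK_OFFSET) == CHUNK_B
        && *cell(&m, CHUNK_B + FD_OFFSET) == CHUNK_A;
}
```

In the forged case FD->bk reads the current content of the GOT slot, which is the address of free and not the address of the chunk, so the unlink is refused before any write happens.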
New Google Project Zero technique from 2014: The poisoned NUL byte
Alright, now that we have seen that unlink can no longer be attacked with the old method, there is a newer technique created by Google Project Zero (the small but top-tier security team at Google) to bypass the corrupted double linked list condition. The technique is called The poisoned NUL byte, 2014 edition.
Here’s a code snippet from unlink. In this code we can see the check for the corrupted double linked list condition at line 5. There are two other malloc_chunk fields I didn’t know about, fd_nextsize and bk_nextsize, which are only used for large chunks (the check is at line 11) and also form a doubly linked list. This time, however, the integrity check is not a runtime check but a debug assert, which doesn’t get compiled into production builds (at least on Fedora x86). Because of that, we can overwrite an address through lines 27 and 28, using a technique similar to the old unlink method.
We could try to exploit the vulnerability using these conditions:
- fd and bk have to point to the current chunk
- fd_nextsize has to point to free – 0x14 (or -20) because prev_size 4 + size 4 + fd 4 + bk 4 + fd_nextsize 4 = 20
- bk_nextsize has to point to system
However, lines 27 and 28 of unlink require both free (fine, since it is in the GOT) and system + 16 to be writable. The latter isn’t possible, because system is located inside the text segment of libc.so. There is a bypass, however: overwriting “tls_dtor_list”.
tls_dtor_list is a thread-local variable holding a list of function pointers that get called during exit(). “__call_tls_dtors” walks through the list and invokes the functions one by one. This means that if we are able to overwrite the list head with a heap address containing a forged element (function pointer and argument), we should be able to execute system, kind of like in our ret-to-libc article!
Image taken from sploitfun.wordpress.com
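To get a feel for why controlling the list head is enough, here is a sketch of a destructor list walker. It is only loosely modelled on my understanding of glibc’s dtor_list (a function pointer, its argument, a link_map pointer and a next pointer); the names demo_dtor_list, demo_call_dtors and fake_system are mine, and fake_system merely records its argument instead of running anything.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*dtor_func)(void *);

/* Simplified stand-in for one element of the destructor list. */
struct dtor_entry {
    dtor_func func;           /* function to call */
    void *obj;                /* argument passed to it */
    void *map;                /* placeholder for the link_map pointer, unused */
    struct dtor_entry *next;
};

static struct dtor_entry *demo_dtor_list;  /* stand-in for tls_dtor_list */

/* Walks the list and calls func(obj) for every element. */
void demo_call_dtors(void)
{
    while (demo_dtor_list != NULL) {
        struct dtor_entry *cur = demo_dtor_list;
        demo_dtor_list = cur->next;
        assert(cur->func != NULL);
        cur->func(cur->obj);
    }
}

static const char *last_command;
static int calls;

/* Harmless stand-in for system(): it only records what it was given. */
static void fake_system(void *arg)
{
    last_command = arg;
    calls++;
}

/* Points the list head at a forged element and runs the walker. */
int hijack_reaches_fake_system(void)
{
    static char command[] = "/bin/sh";
    static struct dtor_entry forged;

    forged.func = fake_system;
    forged.obj  = command;
    forged.map  = NULL;
    forged.next = NULL;

    last_command = NULL;
    calls = 0;
    demo_dtor_list = &forged;  /* what the overwrite would achieve */

    demo_call_dtors();
    return calls == 1 && last_command == command;
}
```

The walker never checks where the element lives, so a forged element sitting on the heap is treated like any legitimate one: its function pointer gets called with its obj field as the only argument.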
With that in mind, here are the modified conditions:
- fd and bk should still point to the current chunk
- fd_nextsize should point to tls_dtor_list – 20 (or -0x14)
- bk_nextsize should point to a heap address containing a dtor_list element.
Doing that bypasses our “writable” condition, because tls_dtor_list is inside a writable section of libc.so and the heap address is writable as well.
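The same toy-memory approach shows where the two nextsize writes land under these conditions. As before this is a sketch with made-up addresses: the tls_dtor_list slot, the heap element and the large chunk all live in one flat array here, whereas in reality they sit in different mappings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BASE            0x10000000u  /* hypothetical flat toy memory */
#define MEM_WORDS           64u
#define FD_NEXTSIZE_OFFSET  16u  /* prev_size 4 + size 4 + fd 4 + bk 4 */
#define BK_NEXTSIZE_OFFSET  20u  /* previous offset + fd_nextsize 4 */

#define TLS_DTOR_LIST  (MEM_BASE + 0x40u)  /* fake slot holding the list head */
#define HEAP_ELEMENT   (MEM_BASE + 0x80u)  /* fake forged dtor_list element */
#define LARGE_CHUNK    (MEM_BASE + 0xc0u)  /* fake overwritten large chunk */

typedef struct { uint32_t words[MEM_WORDS]; } Mem;

typedef struct {
    uint32_t list_head;        /* content of the tls_dtor_list slot */
    uint32_t element_plus_16;  /* word at heap element + 16 */
} NextsizeResult;

static uint32_t *cell(Mem *m, uint32_t addr)
{
    assert(addr >= MEM_BASE && addr - MEM_BASE < 4u * MEM_WORDS);
    assert((addr & 3u) == 0);
    return &m->words[(addr - MEM_BASE) / 4u];
}

/* The two unchecked writes done for large chunks. */
static void unlink_nextsize(Mem *m, uint32_t p)
{
    uint32_t FDN = *cell(m, p + FD_NEXTSIZE_OFFSET);
    uint32_t BKN = *cell(m, p + BK_NEXTSIZE_OFFSET);
    *cell(m, FDN + BK_NEXTSIZE_OFFSET) = BKN;  /* fd_nextsize->bk_nextsize */
    *cell(m, BKN + FD_NEXTSIZE_OFFSET) = FDN;  /* bk_nextsize->fd_nextsize */
}

NextsizeResult simulate_nextsize_unlink(void)
{
    Mem m;
    memset(&m, 0, sizeof m);

    *cell(&m, LARGE_CHUNK + FD_NEXTSIZE_OFFSET) = TLS_DTOR_LIST - 20u;
    *cell(&m, LARGE_CHUNK + BK_NEXTSIZE_OFFSET) = HEAP_ELEMENT;

    unlink_nextsize(&m, LARGE_CHUNK);

    NextsizeResult r = { *cell(&m, TLS_DTOR_LIST),
                         *cell(&m, HEAP_ELEMENT + 16u) };
    return r;
}
```

The first write puts the heap address into the tls_dtor_list slot, and the second one stores tls_dtor_list – 20 at heap address + 16, so both destinations have to be writable, as required above.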
By disassembling “__call_tls_dtors()” we can get the address of tls_dtor_list; on Kali/Debian the function is named __GI___call_tls_dtors().
I will do the exploitation in the next article since there is still a lot I need to grasp; I’m not too familiar with how tls_dtor_list really works, for example.
What did I learn today?
Thanks for reading, it was a lengthy article once again but I learned a lot today. Still no real demonstration but I think I’ll be ready to do it for my next post. Here’s a list of what I learned:
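- strncpy does not null-terminate when the source fills the whole size limit (fgets always does), while strcpy and strncat always append a null byte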
- The prerequisite to exploit a stack-based off-by-one with saved ebp (being inside a function two-level deep, no alignment space)
- How to exploit a heap-based off-by-one
- How to calculate the alignment based on size_sz size
- Review of the outdated unlink exploitation method
- Learning about new checks in ptmalloc2 (invalid next size)
- The poisoned NUL byte 2014 edition (by Google Project Zero)
- The presence of fd_nextsize and bk_nextsize for large chunks
- ulimit -s unlimited and ulimit -d 1 to fix a binary image, libraries and the heap to bypass ASLR
- What tls_dtor_list is and how to use it to execute libc functions
- Presence of __exit_funcs, doing something similar but needing an enum with a low number + null bytes
- Using tls_dtor_list to bypass the writing restriction