Bug 971900
Summary: | BUG: unable to handle kernel paging request at 000000067b556720 on v3.8-rt | | |
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | John Kacur <jkacur> |
Component: | realtime-kernel | Assignee: | John Kacur <jkacur> |
Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | Development | CC: | bhu, lgoncalv, srostedt, williams |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-04-10 19:16:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
John Kacur
2013-06-07 14:14:42 UTC
Steven Rostedt suggests that this may be due to stack corruption, and that this is fixed in mainline and needs to be backported to 3.8:

    commit 091d0d55b286c9340201b4ed4470be87fc568228
    Author: Li Zefan <lizefan>
    Date:   Thu May 9 15:08:15 2013 +0800

        shm: fix null pointer deref when userspace specifies invalid hugepage size

Luis Claudio has been working to reproduce this, but it is apparently caught more easily by our automated testing system.

No, this one is not fixed in mainline. I just said it might be the fix, but there's nothing in that output that suggests that the commit I posted fixes this issue. Beth posted several crashes; this was just the first one, which looks to be a stack corruption on the wake up.

I'll explain what happened in this crash:

    [12017.143924] RIP: 0010:[<ffffffff81089bb3>] [<ffffffff81089bb3>] try_to_wake_up+0xb3/0x2d0

Looking at the disassembled code:

    0xffffffff81089b82 <+130>: mov 0x10(%rbx),%rax
    0xffffffff81089b86 <+134>: test %r10d,%r10d
    0xffffffff81089b89 <+137>: mov 0x18(%rax),%r15d
    0xffffffff81089b9a <+154>: mov %r15d,%eax
    0xffffffff81089bb3 <+179>: mov -0x7e509ba0(,%rax,8),%rax   <-- crash here

The location in code was here:

    rq = task_rq(p);

But we need to look at task_rq(p):

    #define task_rq(p) cpu_rq(task_cpu(p))

    static inline unsigned int task_cpu(const struct task_struct *p)
    {
            return task_thread_info(p)->cpu;
    }

Where:

    #define task_thread_info(task) ((struct thread_info *)(task)->stack)

and:

    #define cpu_rq(cpu) (&per_cpu(runqueues, (cpu)))

I won't go into the ugliness of the per_cpu() but that's the:

    mov -0x7e509ba0(,%rax,8),%rax

When starting into this, we have the task structure (saved in %rbx):

    RBX: ffff880493f08000

Which looks like a legit pointer to a task_struct. The 0x10(%rbx), %rax is:

    struct task_struct {
            volatile long state;
            volatile long saved_state;
            void *stack;

As this is 64bit, each long there is 8 bytes, and that leaves us with the offset of 0x10 being "stack".
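As a quick sanity check of that 0x10 offset, here is a minimal userspace sketch. It assumes an LP64 build (8-byte long and pointers, as on x86_64 Linux); `mock_task_struct`, `mock_stack_offset()` and `load_at_0x10()` are hypothetical names that mirror only the three leading fields quoted above, not the real kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mock: only the three leading fields quoted from task_struct. */
struct mock_task_struct {
    volatile long state;        /* offset 0x00 on LP64 */
    volatile long saved_state;  /* offset 0x08 on LP64 */
    void *stack;                /* offset 0x10 on LP64 */
};

/* Offset of "stack" inside the mock structure. */
static size_t mock_stack_offset(void)
{
    return offsetof(struct mock_task_struct, stack);
}

/* Userspace analogue of "mov 0x10(%rbx),%rax": read 8 bytes at p + 0x10. */
static void *load_at_0x10(const struct mock_task_struct *p)
{
    void *value;

    memcpy(&value, (const char *)p + 0x10, sizeof(value));
    return value;
}
```

On such a build, `mock_stack_offset()` is 0x10 and `load_at_0x10()` returns whatever was stored in `stack`, which is the pointer the next instruction treats as a thread_info.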
Thus we moved the stack pointer into %rax, and then took the offset of that:

    mov 0x18(%rax),%r15d

    struct thread_info {
            struct task_struct *task;
            struct exec_domain *exec_domain;
            __u32 flags;
            __u32 status;
            __u32 cpu;

The task and exec_domain pointers are 8 bytes each, the flags and status are 4 bytes each:

    8 + 8 + 4 + 4 = 24 = 0x18

Thus we moved the cpu # of the task into %r15:

    R15: 00000000df34c058

??? Unless we have more than 3,744,776,280 CPUs something totally went bad here.

    mov %r15d,%eax

Just chops it down to 32 bits.

    mov -0x7e509ba0(,%rax,8),%rax

Loads the per_cpu value based off of the cpu (which is a really huge number). As I highly doubt we have a box with over 3 billion CPUs, the offset generated points into some unknown memory and we take a kernel page fault and crash.

As the task cpu (task_cpu) is stored at the bottom of the task's stack, this can be corrupted if the stack gets too big, which could have been what happened. Usually when this happens the task with the big stack will crash and we might get a clue to what happened. But unfortunately, here it was another task reading the corrupted task's stack that died, and we have no idea why that task had a corrupted stack.

But that was the above reported bug, and has really nothing to do with the fix.
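The arithmetic in this walkthrough can be replayed in userspace. The sketch below assumes an LP64 build and uses hypothetical names (`mock_thread_info`, `percpu_slot_address()`); it mirrors only the fields quoted above and models the effective address of the faulting `mov` as a sign-extended displacement plus index times 8, wrapping at 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the leading thread_info fields quoted above (LP64). */
struct mock_thread_info {
    void    *task;         /* 8 bytes */
    void    *exec_domain;  /* 8 bytes */
    uint32_t flags;        /* 4 bytes */
    uint32_t status;       /* 4 bytes */
    uint32_t cpu;          /* lands at 8 + 8 + 4 + 4 = 0x18 */
};

/*
 * Effective address of "mov disp(,%rax,8),%rax":
 * sign-extended 32-bit displacement + index * 8, modulo 2^64.
 */
static uint64_t percpu_slot_address(int32_t disp, uint32_t cpu)
{
    uint64_t base = (uint64_t)(int64_t)disp;  /* sign-extend the displacement */

    return base + (uint64_t)cpu * 8;          /* unsigned math wraps like the CPU */
}
```

With the displacement from the disassembly and the bogus cpu value from R15, `percpu_slot_address(-0x7e509ba0, 0xdf34c058)` evaluates to 0x000000067b556720, which is the faulting address in the bug summary. That is consistent with the corrupted cpu field being the direct cause of the page fault.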
For the suggested fix, 091d0d55b "shm: fix null pointer deref when userspace specifies invalid hugepage size", this fixes the other crashes that Beth reported in that same email:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [<ffffffff8113ec85>] sys_mmap_pgoff+0x145/0x260
    PGD 424340067 PUD 424341067 PMD 0
    Oops: 0000 [#11] PREEMPT SMP
    Modules linked in: sunrpc ipv6 acpi_power_meter gpio_ich iTCO_wdt iTCO_vendor_support joydev coretemp hwmon crc32c_intel ghash_clmulni_intel microcode sg serio_raw pcspkr hpwdt hpilo lpc_ich i7core_edac edac_core be2iscsi iscsi_boot_sysfs libiscsi scsi_transport_iscsi be2net ext4 jbd2 mbcache sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul hpsa mgag200 ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod [last unloaded: hwlat_detector]
    CPU 14
    Pid: 27314, comm: scrashme Tainted: G D 3.8.13-rt9.3.el6rt.x86_64 #1 HP ProLiant BL460c G7
    RIP: 0010:[<ffffffff8113ec85>] [<ffffffff8113ec85>] sys_mmap_pgoff+0x145/0x260
    RSP: 0018:ffff88049616bf08 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 1fed4a6cd07d13ec RCX: 0000000000000009
    RDX: 0000000000100000 RSI: 0000000000001000 RDI: 0000000000100000
    RBP: ffff88049616bf68 R08: 0000000000200000 R09: 0000000000000034
    R10: 272d3d7983b6f408 R11: 0000000000000206 R12: 0000000000000009
    R13: 0000000000000000 R14: 1654e0c99499bc30 R15: 0000000000000000
    FS: 00007fb6640d5700(0000) GS:ffff880c0bce0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 00000003c68d3000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process scrashme (pid: 27314, threadinfo ffff88049616a000, task ffff8805d5552fc0)
    Stack:
     ffff8805d5552fc0 0000000000100000 0000000000000034 272d3d7983b6f408
     00007fffdc63e30e 0000000000000000 ffff88049616bf48 0000000000000360
     0000000000000009 0000000000000007 00007fffdc63e30e 0000000000402eab
    Call Trace:
     [<ffffffff810070d9>] sys_mmap+0x29/0x30
     [<ffffffff81553482>] system_call_fastpath+0x16/0x1b
    Code: 4c 8b 7d f8 c9 c3 44 89 c9 ba 01 00 00 00 44 89 4d b0 d3 e2 4c 89 55 b8 48 63 d2 48 89 d7 48 89 55 a8 e8 cf 18 01 00 48 8b 55 a8 <8b> 48 08 be 00 10 00 00 48 89 f0 48 89 75 a8 48 d3 e0 48 89 d7
    RIP [<ffffffff8113ec85>] sys_mmap_pgoff+0x145/0x260
     RSP <ffff88049616bf08>

Where the crash happened here:

    sys_mmap_pgoff (mm/mmap.c:1304)

    1301         } else if (flags & MAP_HUGETLB) {
    1302                 struct user_struct *user = NULL;
    1303
    1304                 len = ALIGN(len, huge_page_size(hstate_sizelog(
    1305                         (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK)));

Disassembling this:

    0xffffffff8113ec7c <+316>: callq 0xffffffff81150550 <size_to_hstate>
    0xffffffff8113ec81 <+321>: mov -0x58(%rbp),%rdx
    0xffffffff8113ec85 <+325>: mov 0x8(%rax),%ecx

rax is the return value of size_to_hstate, which is hidden in the hstate_sizelog():

    static inline struct hstate *hstate_sizelog(int page_size_log)
    {
            if (!page_size_log)
                    return &default_hstate;
            return size_to_hstate(1 << page_size_log);
    }

As you can see, it returns the result of size_to_hstate(). If we look at what was returned:

    RAX: 0000000000000000

It is NULL, which is something that size_to_hstate() can return. The current code doesn't handle a NULL pointer being returned and crashes.
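To make the NULL return concrete, here is a minimal userspace model of that lookup. Everything prefixed `mock_` is hypothetical: it assumes 4 KiB base pages and a single registered huge page size of 2 MiB, and it shifts `1UL` rather than `1` to keep the toy code free of undefined behaviour. It is a sketch of the control flow only, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hstate: just a page-size order. */
struct mock_hstate {
    unsigned int order;  /* huge page size is (4096 << order) bytes */
};

/* Assumption: one registered huge page size, 2 MiB (order 9 on 4 KiB pages). */
static struct mock_hstate mock_default_hstate = { 9 };

static unsigned long mock_huge_page_size(const struct mock_hstate *h)
{
    return 4096UL << h->order;
}

/* Like size_to_hstate(): NULL when no registered size matches. */
static struct mock_hstate *mock_size_to_hstate(unsigned long size)
{
    if (mock_huge_page_size(&mock_default_hstate) == size)
        return &mock_default_hstate;
    return NULL;
}

/* Same control flow as the hstate_sizelog() quoted above. */
static struct mock_hstate *mock_hstate_sizelog(int page_size_log)
{
    if (!page_size_log)
        return &mock_default_hstate;
    return mock_size_to_hstate(1UL << page_size_log);
}

/* A caller that tests for NULL before use: aligned length, or -EINVAL. */
static long mock_align_len(unsigned long len, int page_size_log)
{
    struct mock_hstate *hs = mock_hstate_sizelog(page_size_log);
    unsigned long size;

    if (!hs)
        return -EINVAL;
    size = mock_huge_page_size(hs);
    return (long)((len + size - 1) & ~(size - 1));
}
```

In this model, `mock_hstate_sizelog(21)` finds the 2 MiB entry, while `mock_hstate_sizelog(20)` returns NULL because no 1 MiB size is registered. Passing that NULL straight to a size accessor is the userspace equivalent of the `mov 0x8(%rax),%ecx` fault above.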
The commit I showed has this:

    @@ -1367,9 +1367,13 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
                     len = ALIGN(len, huge_page_size(hstate_file(file)));
             } else if (flags & MAP_HUGETLB) {
                     struct user_struct *user = NULL;
    +                struct hstate *hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) &
    +                                                   SHM_HUGE_MASK);

    -                len = ALIGN(len, huge_page_size(hstate_sizelog(
    -                        (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK)));
    +                if (!hs)
    +                        return -EINVAL;
    +
    +                len = ALIGN(len, huge_page_size(hs));
                     /*
                      * VM_NORESERVE is used because the reservations will be
                      * taken when vm_ops->mmap() is called

As you can see above, it checks the return value of hstate_sizelog(), and returns with a -EINVAL if it returns NULL and does not crash the system.

Now did this cause the original bug in this BZ? I don't know. But I never said that the commit would fix it.

This issue has not been updated in a while and is using an older, unsupported kernel. This BZ is being closed WONTFIX.