Bug 1350457
| Summary: | [crash] [exception RIP: unknown or invalid address] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | PaulB <pbunyan> |
| Component: | crash | Assignee: | Dave Anderson <anderson> |
| Status: | CLOSED ERRATA | QA Contact: | Emma Wu <xiawu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.3 | CC: | asavkov, bpeck, cye, jburke, jstancek, pbunyan, xiawu |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | crash-7.1.8-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-01 22:04:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1394638, 1404314 | ||
Description
PaulB
2016-06-27 13:47:54 UTC
Dave,
As we spoke about, the issue was reproduced with a Beaker job.
https://beaker.engineering.redhat.com/jobs/1383562
I have sent you a separate email with the connection info.
best,
-pbunyan

The problem seen here is that the crash utility is mistaking the
"vhost-<pid>" kernel thread for a user task, because it has a
full user-space virtual address space attached to it. Because
it thinks it's a user task, the "bt" command displays a bogus
kernel-entry exception frame, recognizes that it contains an
invalid user-space RIP, and therefore displays "exception RIP:
unknown or invalid address":
crash> bt 13809
PID: 13809 TASK: ffff88022becb980 CPU: 0 COMMAND: "vhost-13798"
#0 [ffff880230b1fdf0] __schedule at ffffffff8163a26d
#1 [ffff880230b1fe58] schedule at ffffffff8163a909
#2 [ffff880230b1fe68] vhost_worker at ffffffffa069e625 [vhost]
#3 [ffff880230b1fec8] kthread at ffffffff810a5aef
#4 [ffff880230b1ff50] ret_from_fork at ffffffff81645858
[exception RIP: unknown or invalid address]
RIP: 0000000000000000 RSP: ffff880230b1ff58 RFLAGS: 00000202
RAX: 0000000000000000 RBX: ffffffff810a5a20 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88022f21bcd8 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
crash>
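One way to see that the frame above is not a genuine user-to-kernel entry frame is its saved CS value. On x86_64 the low two bits of CS hold the privilege level, so 0x0010 is ring 0, while a real entry from user space carries ring 3 (0x0033, for example). A minimal sketch of that check follows; the struct and function names are hypothetical and are not taken from the crash sources:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a saved x86_64 exception frame. */
struct saved_frame {
	uint64_t rip;
	uint64_t cs;
};

/* The low two bits of CS are the privilege level; 3 means user mode. */
static bool frame_is_from_user_mode(const struct saved_frame *f)
{
	return (f->cs & 3) == 3;
}
```

Applied to the frame above (CS: 0010), this returns false, which is consistent with the frame being bogus rather than a real system-call or exception entry.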
Each "vhost-<pid>" task gets created by a qemu-kvm task, which
encodes its pid number into the vhost kernel thread name. For
example, the "vhost-13798" kernel thread above was created by
the "qemu-kvm" task with pid 13798:
crash> bt 13798
PID: 13798 TASK: ffff88022dc92e00 CPU: 3 COMMAND: "qemu-kvm"
#0 [ffff88022f21b9d0] __schedule at ffffffff8163a26d
#1 [ffff88022f21ba38] schedule at ffffffff8163a909
#2 [ffff88022f21ba48] schedule_hrtimeout_range_clock at ffffffff81639a22
#3 [ffff88022f21bae0] schedule_hrtimeout_range at ffffffff81639ad3
#4 [ffff88022f21baf0] poll_schedule_timeout at ffffffff811f2bb5
#5 [ffff88022f21bb20] do_sys_poll at ffffffff811f413d
#6 [ffff88022f21bf40] sys_poll at ffffffff811f42f4
#7 [ffff88022f21bf80] system_call_fastpath at ffffffff81645909
RIP: 00007fcb92d93b7d RSP: 00007ffcf3ab9700 RFLAGS: 00000293
RAX: 0000000000000007 RBX: ffffffff81645909 RCX: ffffffffffffffff
RDX: 00000000000003e7 RSI: 0000000000000007 RDI: 00007fcb9d1a1dc0
RBP: 00007ffcf3ab9714 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000003 R11: 0000000000000293 R12: 00007fcb9aae8e00
R13: 00000000000003e7 R14: 00007fcb9ce460c0 R15: 000000006f579fd7
ORIG_RAX: 0000000000000007 CS: 0033 SS: 002b
crash>
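Because the creator's pid is encoded in the thread name, it can be recovered from the COMMAND string alone. The helper below is a hypothetical illustration of that naming convention, not part of crash; it returns -1 for any name that is not exactly "vhost-" followed by decimal digits:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the pid encoded in a "vhost-<pid>" command name,
 * or -1 if the name does not have that form.
 */
static long vhost_creator_pid(const char *comm)
{
	static const char prefix[] = "vhost-";
	const size_t plen = sizeof(prefix) - 1;
	char *end;
	long pid;

	if (strncmp(comm, prefix, plen) != 0)
		return -1;
	/* Require a digit so that signs and whitespace are rejected. */
	if (!isdigit((unsigned char)comm[plen]))
		return -1;
	pid = strtol(comm + plen, &end, 10);
	if (*end != '\0')
		return -1;
	return pid;
}
```

For the thread shown earlier, vhost_creator_pid("vhost-13798") yields 13798, the pid of the "qemu-kvm" task.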
The "qemu-kvm" task 13798 created the "vhost-13798" kernel thread
and, as part of that process, passed its own mm_struct address
to it:
dev->mm = get_task_mm(current);
worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);
And the vhost_worker() function called use_mm() to set the qemu-kvm user space
mm_struct as its own, leaving it in place until the thread exits:
static int vhost_worker(void *data)
{
	struct vhost_dev *dev = data;
	struct vhost_work *work = NULL;
	unsigned uninitialized_var(seq);
	mm_segment_t oldfs = get_fs();

	set_fs(USER_DS);
	use_mm(dev->mm);

	for (;;) {
		/* mb paired w/ kthread_stop */
		set_current_state(TASK_INTERRUPTIBLE);

		spin_lock_irq(&dev->work_lock);
		if (work) {
			work->done_seq = seq;
			if (work->flushing)
				wake_up_all(&work->done);
		}

		if (kthread_should_stop()) {
			spin_unlock_irq(&dev->work_lock);
			__set_current_state(TASK_RUNNING);
			break;
		}
		if (!list_empty(&dev->work_list)) {
			work = list_first_entry(&dev->work_list,
						struct vhost_work, node);
			list_del_init(&work->node);
			seq = work->queue_seq;
		} else
			work = NULL;
		spin_unlock_irq(&dev->work_lock);

		if (work) {
			__set_current_state(TASK_RUNNING);
			work->fn(work);
			if (need_resched())
				schedule();
		} else
			schedule();
	}
	unuse_mm(dev->mm);
	set_fs(oldfs);
	return 0;
}
The fix requires the crash utility to recognize a kernel thread
that has a user-space virtual memory region attached to it.
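To illustrate why a test based only on the mm_struct pointer fails here, the sketch below compares it with a test on the PF_KTHREAD bit of task_struct.flags. This is one possible approach under stated assumptions, not necessarily what the crash fix does; the struct and function names are hypothetical, and only the PF_KTHREAD value comes from the Linux headers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of PF_KTHREAD in Linux's include/linux/sched.h. */
#define PF_KTHREAD 0x00200000

/* Hypothetical per-task summary; not crash's own task context. */
struct task_info {
	uint64_t flags;	/* task_struct.flags */
	uint64_t mm;	/* task_struct.mm, 0 when NULL */
};

/* Naive test: any task with an mm_struct is assumed to be a user task. */
static bool is_kernel_thread_by_mm(const struct task_info *t)
{
	return t->mm == 0;
}

/* Flag-based test: unaffected by a borrowed user-space mm_struct. */
static bool is_kernel_thread_by_flag(const struct task_info *t)
{
	return (t->flags & PF_KTHREAD) != 0;
}
```

A vhost worker has PF_KTHREAD set and a non-NULL mm, so the naive test classifies it as a user task while the flag-based test still identifies it as a kernel thread.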
Patch posted upstream:
https://github.com/crash-utility/crash/commit/15994b89b9cbbd5f95f78cb9cdb927f406dff511

Fix to recognize a kernel thread that has user space virtual memory
attached to it. While kernel threads typically do not have an mm_struct
referencing a user-space virtual address space, they can either
temporarily reference one for a user-space copy operation, or in the
case of KVM "vhost" kernel threads, keep a reference to the user space
of the "qemu-kvm" task that created them. Without the patch, they will
be mistaken for user tasks; the "bt" command will display an invalid
kernel-entry exception frame that indicates "[exception RIP: unknown or
invalid address]", the "ps" command will not enclose the command name
with brackets, and the "ps -[uk]" and "foreach [user|kernel]" options
will show the kernel thread as a user task.
(anderson)

Since the problem described in this bug report should be resolved in a
recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files,
follow the link below. If the solution does not work for you, open a
new bug report.
https://access.redhat.com/errata/RHBA-2017:2019