Bug 1350457

Summary: [crash] [exception RIP: unknown or invalid address]
Product: Red Hat Enterprise Linux 7 Reporter: PaulB <pbunyan>
Component: crash Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA QA Contact: Emma Wu <xiawu>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3 CC: asavkov, bpeck, cye, jburke, jstancek, pbunyan, xiawu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: crash-7.1.8-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 22:04:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1394638, 1404314    

Description PaulB 2016-06-27 13:47:54 UTC
Description of problem:
When a RHEL-7.2 KVM host is crashed while a RHEL-7.2 KVM guest is running,
the crash utility reports the following issue when analyzing the vmcore file
on the "KVM Host":
 [exception RIP: unknown or invalid address]

Version-Release number of selected component (if applicable):
 distro: RHEL-7.2 Server x86_64
 crash: 7.1.2-2.el7.x86_64
 kexec-tools: 2.0.7-38.el7.x86_64
 kernel: 3.10.0-327.el7.x86_64

How reproducible:
 Consistently

Steps to Reproduce:
1. Install a HOST system with RHEL-7.2 Server x86_64
2. Create a KVM-Guest on the HOST (also installed with RHEL-7.2 Server x86_64)
3. Trigger a crash on the "HOST" with the KVM-Guest running.
   echo c > /proc/sysrq-trigger

Actual results:
https://beaker.engineering.redhat.com/jobs/1383562
https://beaker.engineering.redhat.com/recipes/2824992#task42413480
http://lab-02.rhts.eng.bos.redhat.com/beaker/logs/tasks/42413+/42413480/crash.vmcore.log
---<-snip->---
PID: 13809  TASK: ffff88022becb980  CPU: 0   COMMAND: "vhost-13798"
 #0 [ffff880230b1fdf0] __schedule at ffffffff8163a26d
 #1 [ffff880230b1fe58] schedule at ffffffff8163a909
 #2 [ffff880230b1fe68] vhost_worker at ffffffffa069e625 [vhost]
 #3 [ffff880230b1fec8] kthread at ffffffff810a5aef
 #4 [ffff880230b1ff50] ret_from_fork at ffffffff81645858
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff880230b1ff58  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: ffffffff810a5a20  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88022f21bcd8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
---<-snip->---

Expected results:
 No "unknown or invalid address" errors reported by the crash utility

Additional info:

Comment 1 PaulB 2016-06-27 14:12:41 UTC
Dave,
As we spoke about, the issue was reproduced with a Beaker job.
https://beaker.engineering.redhat.com/jobs/1383562

I have sent you a separate email with the connection info.

best,
-pbunyan

Comment 3 Dave Anderson 2016-06-27 18:37:43 UTC
The problem seen here is that the crash utility mistakes the
"vhost-<pid>" kernel thread for a user task, because the thread has a
full user-space virtual address space attached to it.  Because crash
thinks it's a user task, the "bt" command displays a bogus
kernel-entry exception frame, recognizes that it contains an
invalid user-space RIP, and therefore displays "exception RIP:
unknown or invalid address":

  crash> bt 13809
  PID: 13809  TASK: ffff88022becb980  CPU: 0   COMMAND: "vhost-13798"
   #0 [ffff880230b1fdf0] __schedule at ffffffff8163a26d
   #1 [ffff880230b1fe58] schedule at ffffffff8163a909
   #2 [ffff880230b1fe68] vhost_worker at ffffffffa069e625 [vhost]
   #3 [ffff880230b1fec8] kthread at ffffffff810a5aef
   #4 [ffff880230b1ff50] ret_from_fork at ffffffff81645858
      [exception RIP: unknown or invalid address]
      RIP: 0000000000000000  RSP: ffff880230b1ff58  RFLAGS: 00000202
      RAX: 0000000000000000  RBX: ffffffff810a5a20  RCX: 0000000000000000
      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
      RBP: ffff88022f21bcd8   R8: 0000000000000000   R9: 0000000000000000
      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
      R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  crash>
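To make the failure mode concrete, here is a minimal sketch of a classifier that treats any task with a non-NULL mm_struct as a user task, which matches the behavior described above. The `struct task_model` type and `naive_is_kernel_thread()` are hypothetical simplifications for illustration only; they are not the crash utility's actual data structures or code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder for a user address space (mm_struct). */
struct mm_model {
    unsigned long start_code;
};

/* Hypothetical, heavily simplified task_struct: only the fields
 * needed to illustrate the classification problem. */
struct task_model {
    int pid;
    const char *comm;
    struct mm_model *mm;   /* NULL for an ordinary kernel thread */
};

/* Naive heuristic: a task without an mm is a kernel thread.
 * A vhost worker that has adopted qemu-kvm's mm defeats this test. */
static bool naive_is_kernel_thread(const struct task_model *t)
{
    return t->mm == NULL;
}
```

With a `qemu-kvm` task and its `vhost-13798` worker sharing the same `mm_model`, the naive test reports both as user tasks, while a thread with no mm is still reported as a kernel thread.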

Each "vhost-<pid>" kernel thread is created by a qemu-kvm task, which
encodes its own pid number in the vhost kernel thread name.  For
example, the "vhost-13798" kernel thread above was created by
the "qemu-kvm" task with pid 13798:
  
  crash> bt 13798
  PID: 13798  TASK: ffff88022dc92e00  CPU: 3   COMMAND: "qemu-kvm"
   #0 [ffff88022f21b9d0] __schedule at ffffffff8163a26d
   #1 [ffff88022f21ba38] schedule at ffffffff8163a909
   #2 [ffff88022f21ba48] schedule_hrtimeout_range_clock at ffffffff81639a22
   #3 [ffff88022f21bae0] schedule_hrtimeout_range at ffffffff81639ad3
   #4 [ffff88022f21baf0] poll_schedule_timeout at ffffffff811f2bb5
   #5 [ffff88022f21bb20] do_sys_poll at ffffffff811f413d
   #6 [ffff88022f21bf40] sys_poll at ffffffff811f42f4
   #7 [ffff88022f21bf80] system_call_fastpath at ffffffff81645909
      RIP: 00007fcb92d93b7d  RSP: 00007ffcf3ab9700  RFLAGS: 00000293
      RAX: 0000000000000007  RBX: ffffffff81645909  RCX: ffffffffffffffff
      RDX: 00000000000003e7  RSI: 0000000000000007  RDI: 00007fcb9d1a1dc0
      RBP: 00007ffcf3ab9714   R8: 0000000000000000   R9: 0000000000000000
      R10: 0000000000000003  R11: 0000000000000293  R12: 00007fcb9aae8e00
      R13: 00000000000003e7  R14: 00007fcb9ce460c0  R15: 000000006f579fd7
      ORIG_RAX: 0000000000000007  CS: 0033  SS: 002b
  crash> 
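The naming convention can be illustrated with a small sketch that builds a "vhost-<pid>" name using the same "vhost-%d" format string as the vhost driver's kthread_create() call, and recovers the owning qemu-kvm pid from such a name. The helpers `vhost_thread_name()` and `vhost_owner_pid()` are hypothetical; the 16-byte buffer mirrors the kernel's TASK_COMM_LEN.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define COMM_LEN 16  /* same size as the kernel's TASK_COMM_LEN */

/* Build the thread name the way kthread_create(..., "vhost-%d", pid) does. */
static void vhost_thread_name(char buf[COMM_LEN], int owner_pid)
{
    snprintf(buf, COMM_LEN, "vhost-%d", owner_pid);
}

/* Return the owner pid encoded in a "vhost-<pid>" name, or -1 if the
 * name does not follow that pattern. */
static int vhost_owner_pid(const char *comm)
{
    static const char prefix[] = "vhost-";
    const char *digits;
    char *end;
    long pid;

    if (strncmp(comm, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    digits = comm + sizeof(prefix) - 1;
    if (!isdigit((unsigned char)*digits))   /* rejects "", "+1", " 1" */
        return -1;
    pid = strtol(digits, &end, 10);
    if (*end != '\0' || pid <= 0 || pid > INT_MAX)
        return -1;
    return (int)pid;
}
```

Applied to the tasks above, `vhost_owner_pid("vhost-13798")` yields 13798, the pid of the "qemu-kvm" task whose address space the worker carries.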

When the "qemu-kvm" task 13798 created the "vhost-13798" kernel thread,
it passed its own mm_struct address to it:

        dev->mm = get_task_mm(current);
        worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);
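get_task_mm() returns the task's mm_struct with its user count (mm_users) raised, so the qemu-kvm address space stays alive while the vhost device holds it; the reference is released later with mmput(). The sketch below models only that counting. `struct mm_ref_model`, `struct task_ref_model` and the `model_*` helpers are hypothetical stand-ins, not kernel types, and the locking done by the real functions is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an mm_struct's user reference count. */
struct mm_ref_model {
    int mm_users;
};

struct task_ref_model {
    struct mm_ref_model *mm;
};

/* Model of get_task_mm(): take a reference on the task's address space
 * so it stays alive for the new holder (here, the vhost device).
 * The real function also returns NULL for PF_KTHREAD tasks. */
static struct mm_ref_model *model_get_task_mm(struct task_ref_model *t)
{
    if (t->mm == NULL)
        return NULL;
    t->mm->mm_users++;
    return t->mm;
}

/* Model of mmput(): drop a reference; returns true when the last
 * user is gone and the address space would be torn down. */
static bool model_mmput(struct mm_ref_model *mm)
{
    return --mm->mm_users == 0;
}
```

In this model, after the vhost device takes its reference the count is 2; dropping it returns the count to 1 without tearing down the address space, which qemu-kvm still owns.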

The vhost_worker() function then called use_mm() to adopt the qemu-kvm
user-space mm_struct as its own, leaving it in place until the thread exits:

  static int vhost_worker(void *data)
  {
  	struct vhost_dev *dev = data;
  	struct vhost_work *work = NULL;
  	unsigned uninitialized_var(seq);
  	mm_segment_t oldfs = get_fs();
  
  	set_fs(USER_DS);
  	use_mm(dev->mm);
  
  	for (;;) {
  		/* mb paired w/ kthread_stop */
  		set_current_state(TASK_INTERRUPTIBLE);
  
  		spin_lock_irq(&dev->work_lock);
  		if (work) {
  			work->done_seq = seq;
  			if (work->flushing)
  				wake_up_all(&work->done);
  		}
  
  		if (kthread_should_stop()) {
  			spin_unlock_irq(&dev->work_lock);
  			__set_current_state(TASK_RUNNING);
  			break;
  		}
  		if (!list_empty(&dev->work_list)) {
  			work = list_first_entry(&dev->work_list,
  						struct vhost_work, node);
  			list_del_init(&work->node);
  			seq = work->queue_seq;
  		} else
  			work = NULL;
  		spin_unlock_irq(&dev->work_lock);
  
  		if (work) {
  			__set_current_state(TASK_RUNNING);
  			work->fn(work);
  			if (need_resched())
  				schedule();
  		} else
  			schedule();
  
  	}
  	unuse_mm(dev->mm);
  	set_fs(oldfs);
  	return 0;
  }
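The effect of use_mm() and unuse_mm() on the thread's mm pointers can be modeled in a few lines. In this simplified sketch (hypothetical `struct kthread_model`; the real use_mm() also handles reference counts, locking and the MMU context switch), the thread's mm is non-NULL for the whole time it sits in the for (;;) loop, which is exactly when a vmcore would capture it.

```c
#include <stddef.h>

/* Hypothetical placeholder for an mm_struct. */
struct mm_model {
    int owner_pid;
};

/* Hypothetical simplified task: just the two mm pointers that
 * use_mm()/unuse_mm() manipulate. */
struct kthread_model {
    struct mm_model *mm;         /* user address space, if any */
    struct mm_model *active_mm;  /* address space currently loaded */
};

/* Model of use_mm(): the kernel thread adopts a user address space. */
static void model_use_mm(struct kthread_model *t, struct mm_model *mm)
{
    t->active_mm = mm;
    t->mm = mm;
}

/* Model of unuse_mm(): drop the user mm but leave active_mm in place
 * (lazy TLB), as the kernel does. */
static void model_unuse_mm(struct kthread_model *t)
{
    t->mm = NULL;
}
```

For simplicity the model starts with both pointers NULL; a real kernel thread starts with mm NULL and an active_mm borrowed from the previously running task.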
  
Fixing this requires that the crash utility recognize a kernel
thread that has a user-space virtual address space attached to it.
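One way to recognize such threads, sketched below under the assumption that the task's flags word can be read from the dump, is to test the kernel's PF_KTHREAD flag instead of relying on mm being NULL; use_mm() does not change that flag. This illustrates a possible approach and is not taken from the crash patch. `struct task_flags_model`, `model_is_kernel_thread()` and its `have_flags` parameter are hypothetical; 0x00200000 is the PF_KTHREAD value in 3.10-era include/linux/sched.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of PF_KTHREAD in include/linux/sched.h for 3.10-era kernels. */
#define MODEL_PF_KTHREAD 0x00200000UL

/* Hypothetical per-task data as read from a dump. */
struct task_flags_model {
    unsigned long flags;  /* copy of task_struct.flags */
    void *mm;             /* may be non-NULL even for a kernel thread */
};

/* Prefer PF_KTHREAD, which stays set no matter which mm a kernel
 * thread adopts; fall back to the mm test only when the flags word
 * is unavailable (hypothetical 'have_flags' parameter). */
static bool model_is_kernel_thread(const struct task_flags_model *t,
                                   bool have_flags)
{
    if (have_flags)
        return (t->flags & MODEL_PF_KTHREAD) != 0;
    return t->mm == NULL;
}
```

The fallback branch shows why the flag matters: with only the mm test available, a vhost worker is still misclassified.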

Comment 4 Dave Anderson 2016-06-28 18:11:23 UTC
Patch posted upstream:

https://github.com/crash-utility/crash/commit/15994b89b9cbbd5f95f78cb9cdb927f406dff511

  Fix to recognize a kernel thread that has user space virtual memory
  attached to it.  While kernel threads typically do not have an
  mm_struct referencing a user-space virtual address space, they can
  either temporarily reference one for a user-space copy operation, or
  in the case of KVM "vhost" kernel threads, keep a reference to the
  user space of the "qemu-kvm" task that created them.  Without the
  patch, they will be mistaken for user tasks; the "bt" command will
  display an invalid kernel-entry exception frame that indicates
  "[exception RIP: unknown or invalid address]", the "ps" command
  will not enclose the command name with brackets, and the "ps -[uk]"
  and "foreach [user|kernel]" options will show the kernel thread as
  a user task.
  (anderson)
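The user-visible symptoms listed in the commit message can be modeled with a short sketch: once a task is classified as a kernel thread, a ps-style listing encloses its name in brackets and a user/kernel filter sorts it accordingly. `struct ps_entry`, `format_comm()` and `count_matching()` are hypothetical and are not crash's internal functions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task summary as a ps-style listing might hold it. */
struct ps_entry {
    int pid;
    const char *comm;
    bool kernel_thread;   /* result of the kernel-thread test */
};

/* Format the command column as the commit message describes:
 * kernel threads in brackets, user tasks bare.
 * Returns snprintf's result (the untruncated length). */
static int format_comm(char *buf, size_t len, const struct ps_entry *e)
{
    return e->kernel_thread ? snprintf(buf, len, "[%s]", e->comm)
                            : snprintf(buf, len, "%s", e->comm);
}

/* Count entries that a kernel-only (want_kernel = true) or
 * user-only (want_kernel = false) filter would keep. */
static size_t count_matching(const struct ps_entry *v, size_t n,
                             bool want_kernel)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++)
        if (v[i].kernel_thread == want_kernel)
            hits++;
    return hits;
}
```

With the vhost worker correctly flagged, it is listed as "[vhost-13798]" and counted with the kernel threads rather than the user tasks.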

Comment 8 errata-xmlrpc 2017-08-01 22:04:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2019