1350457 – [crash] [exception RIP: unknown or invalid address]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	crash
Sub Component:
Version:	7.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Dave Anderson
QA Contact:	Emma Wu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1394638 1404314

Reported:	2016-06-27 13:47 UTC by PaulB
Modified:	2019-02-19 21:40 UTC
CC List:	7 users
Fixed In Version:	crash-7.1.8-1.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-08-01 22:04:38 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:2019	0	normal	SHIPPED_LIVE	crash bug fix and enhancement update	2017-08-01 19:31:13 UTC

Description PaulB 2016-06-27 13:47:54 UTC

Description of problem:
After crashing a RHEL-7.2 KVM host that has a RHEL-7.2 KVM guest running,
the crash utility reports the following issue when analysing the vmcore
file captured on the KVM host:
 [exception RIP: unknown or invalid address]

Version-Release number of selected component (if applicable):
 distro: RHEL-7.2 Server x86_64
 crash: 7.1.2-2.el7.x86_64
 kexec-tools: 2.0.7-38.el7.x86_64
 kernel: 3.10.0-327.el7.x86_64

How reproducible:
 Consistently

Steps to Reproduce:
1. Install a HOST system with RHEL-7.2 Server x86_64
2. Create a KVM-Guest on the HOST (also installed with RHEL-7.2 Server x86_64)
3. Trigger a crash on the "HOST" with the KVM-Guest running.
   echo c > /proc/sysrq-trigger

Actual results:
https://beaker.engineering.redhat.com/jobs/1383562
https://beaker.engineering.redhat.com/recipes/2824992#task42413480
http://lab-02.rhts.eng.bos.redhat.com/beaker/logs/tasks/42413+/42413480/crash.vmcore.log
---<-snip->---
PID: 13809  TASK: ffff88022becb980  CPU: 0   COMMAND: "vhost-13798"
 #0 [ffff880230b1fdf0] __schedule at ffffffff8163a26d
 #1 [ffff880230b1fe58] schedule at ffffffff8163a909
 #2 [ffff880230b1fe68] vhost_worker at ffffffffa069e625 [vhost]
 #3 [ffff880230b1fec8] kthread at ffffffff810a5aef
 #4 [ffff880230b1ff50] ret_from_fork at ffffffff81645858
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff880230b1ff58  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: ffffffff810a5a20  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88022f21bcd8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
---<-snip->---

Expected results:
 no unknown issues reported when running crash utility

Additional info:

Comment 1 PaulB 2016-06-27 14:12:41 UTC

Dave,
As we spoke about, the issue was reproduced with a Beaker job.
https://beaker.engineering.redhat.com/jobs/1383562

I have sent you a separate email with the connection info.

best,
-pbunyan

Comment 3 Dave Anderson 2016-06-27 18:37:43 UTC

The problem seen here is that the crash utility is mistaking the
"vhost-<pid>" kernel thread for a user task, because it has a 
full user-space virtual address space attached to it.  Because 
it thinks it's a user task, the "bt" command displays a bogus
kernel-entry exception frame, recognizes that it contains an 
invalid user-space RIP, and therefore displays "exception RIP: 
unknown or invalid address":

  crash> bt 13809
  PID: 13809  TASK: ffff88022becb980  CPU: 0   COMMAND: "vhost-13798"
   #0 [ffff880230b1fdf0] __schedule at ffffffff8163a26d
   #1 [ffff880230b1fe58] schedule at ffffffff8163a909
   #2 [ffff880230b1fe68] vhost_worker at ffffffffa069e625 [vhost]
   #3 [ffff880230b1fec8] kthread at ffffffff810a5aef
   #4 [ffff880230b1ff50] ret_from_fork at ffffffff81645858
      [exception RIP: unknown or invalid address]
      RIP: 0000000000000000  RSP: ffff880230b1ff58  RFLAGS: 00000202
      RAX: 0000000000000000  RBX: ffffffff810a5a20  RCX: 0000000000000000
      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
      RBP: ffff88022f21bcd8   R8: 0000000000000000   R9: 0000000000000000
      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
      R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  crash>

Each "vhost-<pid>" task gets created by a qemu-kvm task, which
encodes its pid number into the vhost kernel thread name.  For 
example, the "vhost-13798" kernel thread above was created by
the "qemu-kvm" task with pid 13798:
  
  crash> bt 13798
  PID: 13798  TASK: ffff88022dc92e00  CPU: 3   COMMAND: "qemu-kvm"
   #0 [ffff88022f21b9d0] __schedule at ffffffff8163a26d
   #1 [ffff88022f21ba38] schedule at ffffffff8163a909
   #2 [ffff88022f21ba48] schedule_hrtimeout_range_clock at ffffffff81639a22
   #3 [ffff88022f21bae0] schedule_hrtimeout_range at ffffffff81639ad3
   #4 [ffff88022f21baf0] poll_schedule_timeout at ffffffff811f2bb5
   #5 [ffff88022f21bb20] do_sys_poll at ffffffff811f413d
   #6 [ffff88022f21bf40] sys_poll at ffffffff811f42f4
   #7 [ffff88022f21bf80] system_call_fastpath at ffffffff81645909
      RIP: 00007fcb92d93b7d  RSP: 00007ffcf3ab9700  RFLAGS: 00000293
      RAX: 0000000000000007  RBX: ffffffff81645909  RCX: ffffffffffffffff
      RDX: 00000000000003e7  RSI: 0000000000000007  RDI: 00007fcb9d1a1dc0
      RBP: 00007ffcf3ab9714   R8: 0000000000000000   R9: 0000000000000000
      R10: 0000000000000003  R11: 0000000000000293  R12: 00007fcb9aae8e00
      R13: 00000000000003e7  R14: 00007fcb9ce460c0  R15: 000000006f579fd7
      ORIG_RAX: 0000000000000007  CS: 0033  SS: 002b
  crash> 
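
The pid encoding described above can be sketched in a few lines of C.
This is only an illustration, not code from the crash utility or the
kernel: the helper name vhost_creator_pid() is hypothetical, and it
assumes the thread name always has the exact form "vhost-<pid>".

```c
#include <stdio.h>

/*
 * Hypothetical helper: recover the pid of the creating task from a
 * "vhost-<pid>" kernel thread name. Returns -1 if the name does not
 * have that exact form.
 */
static int vhost_creator_pid(const char *comm)
{
	int pid;
	char trailing;

	/* The trailing %c rejects names with extra characters after the pid. */
	if (sscanf(comm, "vhost-%d%c", &pid, &trailing) != 1)
		return -1;

	/* A creator pid must be a positive number. */
	return pid > 0 ? pid : -1;
}
```

Passing the COMMAND string from the backtrace above ("vhost-13798")
yields 13798, the pid of the "qemu-kvm" task shown in the second
backtrace.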

The "qemu-kvm" task 13798 created the "vhost-13798" kernel thread
and, as part of that process, passed its own mm_struct address
to it:

        dev->mm = get_task_mm(current);
        worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);

And the vhost_worker() function called use_mm() to set the qemu-kvm user space
mm_struct as its own, leaving it in place until the thread exits:

  static int vhost_worker(void *data)
  {
  	struct vhost_dev *dev = data;
  	struct vhost_work *work = NULL;
  	unsigned uninitialized_var(seq);
  	mm_segment_t oldfs = get_fs();
  
  	set_fs(USER_DS);
  	use_mm(dev->mm);
  
  	for (;;) {
  		/* mb paired w/ kthread_stop */
  		set_current_state(TASK_INTERRUPTIBLE);
  
  		spin_lock_irq(&dev->work_lock);
  		if (work) {
  			work->done_seq = seq;
  			if (work->flushing)
  				wake_up_all(&work->done);
  		}
  
  		if (kthread_should_stop()) {
  			spin_unlock_irq(&dev->work_lock);
  			__set_current_state(TASK_RUNNING);
  			break;
  		}
  		if (!list_empty(&dev->work_list)) {
  			work = list_first_entry(&dev->work_list,
  						struct vhost_work, node);
  			list_del_init(&work->node);
  			seq = work->queue_seq;
  		} else
  			work = NULL;
  		spin_unlock_irq(&dev->work_lock);
  
  		if (work) {
  			__set_current_state(TASK_RUNNING);
  			work->fn(work);
  			if (need_resched())
  				schedule();
  		} else
  			schedule();
  
  	}
  	unuse_mm(dev->mm);
  	set_fs(oldfs);
  	return 0;
  }
  
The fix will require the crash utility to recognize a kernel
thread that has a user-space virtual address space attached to it.
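
One way such a check could look is sketched below. This is not the
upstream patch. It assumes the classification can rely on the kernel's
PF_KTHREAD bit in task_struct.flags instead of on the mm pointer alone,
and the struct task_view type and the function name are hypothetical
stand-ins for whatever the tool reads out of the vmcore.

```c
#include <stdbool.h>
#include <stdint.h>

/* PF_KTHREAD as defined in the Linux kernel's include/linux/sched.h. */
#define PF_KTHREAD 0x00200000UL

/* Hypothetical, simplified view of the two task_struct fields of interest. */
struct task_view {
	uint64_t flags;	/* task_struct.flags */
	uint64_t mm;	/* task_struct.mm, 0 if no user address space */
};

/*
 * Sketch: a task flagged PF_KTHREAD is a kernel thread even when it
 * has borrowed a user-space mm_struct, as vhost_worker() does via
 * use_mm(). Without the flag, fall back to the "no mm" heuristic.
 * A real tool also has to handle exiting user tasks, whose mm is
 * already gone; that case is ignored here.
 */
static bool looks_like_kernel_thread(const struct task_view *t)
{
	if (t->flags & PF_KTHREAD)
		return true;

	return t->mm == 0;
}
```

With this ordering, a "vhost-<pid>" thread that has both PF_KTHREAD set
and a non-zero mm is classified as a kernel thread, while an ordinary
user task such as "qemu-kvm" is not.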

Comment 4 Dave Anderson 2016-06-28 18:11:23 UTC

Patch posted upstream:

https://github.com/crash-utility/crash/commit/15994b89b9cbbd5f95f78cb9cdb927f406dff511

  Fix to recognize a kernel thread that has user space virtual memory
  attached to it.  While kernel threads typically do not have an
  mm_struct referencing a user-space virtual address space, they can
  either temporarily reference one for a user-space copy operation, or
  in the case of KVM "vhost" kernel threads, keep a reference to the
  user space of the "qemu-kvm" task that created them.  Without the
  patch, they will be mistaken for user tasks; the "bt" command will
  display an invalid kernel-entry exception frame that indicates
  "[exception RIP: unknown or invalid address]", the "ps" command
  will not enclose the command name with brackets, and the "ps -[uk]"
  and "foreach [user|kernel]" options will show the kernel thread as
  a user task.
  (anderson)

Comment 8 errata-xmlrpc 2017-08-01 22:04:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2019