839966 – Trigger RHEL7 crash in guest domU, host don't generate core file

Summary: Trigger RHEL7 crash in guest domU, host don't generate core file

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	7.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Vitaly Kuznetsov
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:	xen
Depends On:
Blocks:	741684

Reported:	2012-07-13 10:15 UTC by Wei Shi
Modified:	2015-03-05 11:28 UTC
CC List:	6 users
Fixed In Version:	kernel-3.10.0-137.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-03-05 11:28:50 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:0290	0	normal	SHIPPED_LIVE	Important: kernel security, bug fix, and enhancement update	2015-03-05 16:13:58 UTC

Description Wei Shi 2012-07-13 10:15:44 UTC

Description of problem:
After triggering a crash on a RHEL7 guest, the crash information is dumped to the guest's console, but the host does not seem to catch the crash: no core dump file is generated in the /var/lib/xen/dump directory.

I also tried this case on the same host, with the same xend-config.sxp, using a RHEL6.3 (2.6.32-278.el6.x86_64) HVM guest. RHEL6.3 works fine, and a core file is generated on the host.

Version-Release number of selected component (if applicable):
Host: RHEL5.8 2.6.18-318.el5xen x86_64
Guest: RHEL7.0 3.3.0-0.20.el7 HVM x86_64

How reproducible:
100%

Steps to Reproduce:
1. Check config items
xend-config.sxp
(enable-dump yes)

xen-hvm-guest-el7.cfg
on_crash = "restart"

2. Launch the rhel7 guest

3. Trigger a guest crash (guest)
[root@rhel7 ~]# echo c > /proc/sysrq-trigger
[   43.625628] SysRq : Trigger a crash
[   43.626007] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   43.626007] IP: [<ffffffff813e8256>] sysrq_handle_crash+0x16/0x20
[   43.626007] PGD 3998f067 PUD 394b4067 PMD 0
[   43.626007] Oops: 0002 [#1] SMP
[   43.626007] CPU 0
[   43.626007] Modules linked in: 8139too xen_netfront 8139cp pcspkr i2c_piix4 mii i2c_core ata_generic pata_acpi xen_blkfront ata_piix libata floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[   43.626007]
[   43.626007] Pid: 639, comm: bash Not tainted 3.3.0-0.20.el7.x86_64 #1 Red Hat HVM domU
[   43.626007] RIP: 0010:[<ffffffff813e8256>]  [<ffffffff813e8256>] sysrq_handle_crash+0x16/0x20
[   43.626007] RSP: 0018:ffff880039e85e28  EFLAGS: 00010096
[   43.626007] RAX: 0000000000000010 RBX: ffffffff819dcfa0 RCX: 0000000000000001
[   43.626007] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000063
[   43.626007] RBP: ffff880039e85e28 R08: 0000000000000000 R09: 0000000000000000
[   43.626007] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000063
[   43.626007] R13: 0000000000000282 R14: 0000000000000000 R15: 0000000000000007
[   43.626007] FS:  00007f127540f740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   43.626007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.626007] CR2: 0000000000000000 CR3: 000000003998d000 CR4: 00000000000006f0
[   43.626007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.626007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   43.626007] Process bash (pid: 639, threadinfo ffff880039e84000, task ffff88003befcd20)
[   43.626007] Stack:
[   43.626007]  ffff880039e85e68 ffffffff813e89b7 ffff880039e85e68 0000000000000002
[   43.626007]  ffff88003950a840 ffffffff813e8a20 ffff880038998080 ffff880039e85f50
[   43.626007]  ffff880039e85e98 ffffffff813e8a6a ffff880039e85e98 00007f1275414000
[   43.626007] Call Trace:
[   43.626007]  [<ffffffff813e89b7>] __handle_sysrq+0x127/0x190
[   43.626007]  [<ffffffff813e8a20>] ? __handle_sysrq+0x190/0x190
[   43.626007]  [<ffffffff813e8a6a>] write_sysrq_trigger+0x4a/0x50
[   43.626007]  [<ffffffff812279f0>] proc_reg_write+0x80/0xc0
[   43.626007]  [<ffffffff811bd35f>] vfs_write+0xaf/0x190
[   43.626007]  [<ffffffff811bd69d>] sys_write+0x4d/0x90
[   43.626007]  [<ffffffff8166ba29>] system_call_fastpath+0x16/0x1b
[   43.626007] Code: d0 88 81 63 2a 95 82 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 c7 05 1d 64 56 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 55 48 89 e5 53 48 83 ec 08 66 66
[   43.626007] RIP  [<ffffffff813e8256>] sysrq_handle_crash+0x16/0x20
[   43.626007]  RSP <ffff880039e85e28>
[   43.626007] CR2: 0000000000000000
[   43.626007] ---[ end trace 9cd54253aac3e4d4 ]---
[   43.626007] Kernel panic - not syncing: Fatal exception

4. No core dump file is generated (host)
[root@dhcp-8-204 ~]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     4858     8 r-----   2657.5
hvm-guest-el7                             57     1032     1 r-----    144.2
[root@dhcp-8-204 ~]# ls /var/lib/xen/dump/
[root@dhcp-8-204 ~]# 

Actual results:
No dump file is generated, and the guest domain is still in the running state.

Expected results:
The host generates a core dump file, and the guest action matches the on_crash value in the cfg file.

Additional info:
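
The two settings from step 1 can be checked mechanically before running the reproducer. The following is a minimal sketch, not part of the original report; the function names are hypothetical, and it assumes that both files use `#` for comments and that each setting sits on its own line, as in the excerpts above.

```python
import re


def xend_dump_enabled(sxp_text):
    """Return True if an uncommented "(enable-dump yes)" entry is present.

    Assumption: the entry occupies a single line, as in the excerpt above.
    """
    for line in sxp_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        m = re.fullmatch(r"\(\s*enable-dump\s+([^\s)]+)\s*\)", line)
        if m:
            return m.group(1).lower() == "yes"
    return False


def guest_on_crash(cfg_text):
    """Return the quoted on_crash value from a guest config, or None."""
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()
        m = re.fullmatch(r"on_crash\s*=\s*[\"']([^\"']*)[\"']", line)
        if m:
            return m.group(1)
    return None
```

With the values from this report, `xend_dump_enabled("(enable-dump yes)")` is true and `guest_on_crash('on_crash = "restart"')` returns `"restart"`, so a core file in /var/lib/xen/dump is the expected outcome.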

Comment 2 Laszlo Ersek 2012-07-13 11:30:02 UTC

I'm sorry but precisely as reported this seems NOTABUG.

(In reply to comment #0)

> xen-hvm-guest-el7.cfg
> on_crash = "restart"

> 4. no dump core file generated(host)
> [root@dhcp-8-204 ~]# xm li
> Name                                      ID Mem(MiB) VCPUs State   Time(s)
> Domain-0                                   0     4858     8 r-----   2657.5
> hvm-guest-el7                             57     1032     1 r-----    144.2
> [root@dhcp-8-204 ~]# ls /var/lib/xen/dump/
> [root@dhcp-8-204 ~]# 
> 
> Actual results:
> no dump file generated, and guest domain is still in running status

The domid is quite high (57) which does not exclude at all that the domain was simply restarted (= new domain booted with the same guest config).

> Expected results:
> host generate a dump core file, guest action is match on_crash's value in
> cfg file

These two are contradictory in this exact case (see on_crash="restart" above); the second requirement is fulfilled (xend action matches on_crash setting).

Comment 3 Laszlo Ersek 2012-07-13 11:39:19 UTC

Hmmm, I may be wrong. enable-dump seems orthogonal.

Comment 4 Wei Shi 2012-07-16 01:45:11 UTC

(In reply to comment #2)
> I'm sorry but precisely as reported this seems NOTABUG.
> 
> (In reply to comment #0)
> 
> > xen-hvm-guest-el7.cfg
> > on_crash = "restart"
> 
> > 4. no dump core file generated(host)
> > [root@dhcp-8-204 ~]# xm li
> > Name                                      ID Mem(MiB) VCPUs State   Time(s)
> > Domain-0                                   0     4858     8 r-----   2657.5
> > hvm-guest-el7                             57     1032     1 r-----    144.2
> > [root@dhcp-8-204 ~]# ls /var/lib/xen/dump/
> > [root@dhcp-8-204 ~]# 
> > 
> > Actual results:
> > no dump file generated, and guest domain is still in running status
> 
> The domid is quite high (57) which does not exclude at all that the domain
> was simply restarted (= new domain booted with the same guest config).
> 
> > Expected results:
> > host generate a dump core file, guest action is match on_crash's value in
> > cfg file
> 
> These two are contradictory in this exact case (see on_crash="restart"
> above); the second requirement is fulfilled (xend action matches on_crash
> setting).

Sorry, I forgot to mention that no reboot happens: domid 57 is the original crashed domU, and no new domU is launched.
That's why I said it seems dom0 never catches the crash signal from the domU.
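
The domain-ID argument above (a restarted guest gets a new ID, an undetected crash leaves the ID unchanged) can be sketched as a comparison of two `xm list` outputs. This is an illustrative helper only; the function names are hypothetical, and it assumes the column layout shown in comment 0 and domain names without spaces.

```python
def parse_xm_list(output):
    """Map domain name -> domain ID from `xm list` text output."""
    ids = {}
    for line in output.splitlines():
        fields = line.split()
        # The header row has "ID" in the second column and is skipped here.
        if len(fields) >= 2 and fields[1].isdigit():
            ids[fields[0]] = int(fields[1])
    return ids


def was_restarted(before, after, name):
    """True if `name` appears in both listings with different domain IDs."""
    old_id = parse_xm_list(before).get(name)
    new_id = parse_xm_list(after).get(name)
    return old_id is not None and new_id is not None and old_id != new_id
```

For the listing in comment 0, `parse_xm_list` gives ID 57 for hvm-guest-el7; if the listing taken before the crash shows the same ID, `was_restarted` returns False, which matches the behaviour described here.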

Comment 6 Andrew Jones 2014-05-02 14:52:10 UTC

Assigning to Vitaly. I recommend trying this on Fedora 20 xen. If it doesn't reproduce, then we can close as wont-fix. If it does reproduce and it looks like a host problem, we should open a bug against Fedora; if it's a guest problem, we should fix it.

Comment 7 Lingfei Kong 2014-05-05 01:24:48 UTC

I can reproduce it on Fedora 20 xen (xen-4.3.2-2.fc20). rhel6.5 and rhel5.11 guests generate a core file when a crash is triggered in the guest, but a rhel7.0 guest does not. So it is probably a guest problem.

Comment 8 Vitaly Kuznetsov 2014-05-09 16:08:08 UTC

This issue is present in upstream 3.11.10 but was fixed in 3.12. Here is the commit:
commit 669b0ae961e87c824233475e987b2d39996d4849
Author: Vaughan Cao <vaughan.cao>
Date:   Fri Aug 16 16:10:56 2013 +0800

    xen/pvhvm: Initialize xen panic handler for PVHVM guests
    
    kernel use callback linked in panic_notifier_list to notice others when panic
    happens.
    NORET_TYPE void panic(const char * fmt, ...){
        ...
        atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
    }
    When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to
    send out an event with reason code - SHUTDOWN_crash.
    
    xen_panic_handler_init() is defined to register on panic_notifier_list but
    we only call it in xen_arch_setup which only be called by PV, this patch is
    necessary for PVHVM.
    
    Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config
    file won't lead a vmcore to be generate when the guest panics. It can be
    reproduced with 'echo c > /proc/sysrq-trigger'.
    
    Signed-off-by: Vaughan Cao <vaughan.cao>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk>
    Acked-by: Joe Jin <joe.jin>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index b5a22fa..15939e8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1713,6 +1713,8 @@ static void __init xen_hvm_guest_init(void)
 
        xen_hvm_init_shared_info();
 
+       xen_panic_handler_init();
+
        if (xen_feature(XENFEAT_hvm_callback_vector))
                xen_have_vector_callback = 1;
        xen_hvm_smp_init();
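
The effect of the one-line fix can be illustrated with a small userspace model of the notifier chain described in the commit message. This is a simplified Python sketch, not kernel code: the `Guest` class, its `mode` and `patched` parameters, and the `events` list are hypothetical stand-ins, while the method names mirror the kernel functions named in the commit.

```python
SHUTDOWN_CRASH = "SHUTDOWN_crash"


class Guest:
    """Toy model of a Xen guest's panic notifier chain (illustration only)."""

    def __init__(self, mode, patched=True):
        self.panic_notifier_list = []  # callbacks run by panic()
        self.events = []               # shutdown reasons the host would see
        if mode == "pv":
            self.xen_arch_setup()
        elif mode == "pvhvm":
            self.xen_hvm_guest_init(patched)
        else:
            raise ValueError("mode must be 'pv' or 'pvhvm'")

    def xen_panic_event(self, msg):
        # Stands in for xen_reboot(SHUTDOWN_crash) in the real handler.
        self.events.append(SHUTDOWN_CRASH)

    def xen_panic_handler_init(self):
        self.panic_notifier_list.append(self.xen_panic_event)

    def xen_arch_setup(self):
        # PV path: the handler was always registered here.
        self.xen_panic_handler_init()

    def xen_hvm_guest_init(self, patched):
        # PVHVM path: the registration is what the patch adds.
        if patched:
            self.xen_panic_handler_init()

    def panic(self, msg):
        # Stands in for atomic_notifier_call_chain(&panic_notifier_list, ...).
        for callback in self.panic_notifier_list:
            callback(msg)
```

In this model an unpatched PVHVM guest panics with an empty notifier list, so no SHUTDOWN_crash reason is ever reported and the host has nothing to act on, which is consistent with the guest staying in the running state in comment 0.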

Comment 11 Jarod Wilson 2014-07-18 14:24:09 UTC

Patch(es) available on kernel-3.10.0-137.el7

Comment 14 Lingfei Kong 2015-01-13 07:01:32 UTC

Verified with kernel-3.10.0-221.el7.

Steps to verify:
1. Enable core-dumps in /etc/xen/xend-config.sxp
# grep enable-dump /etc/xen/xend-config.sxp
(enable-dump yes)
 
2. Create rhel7 hvm guest with on_crash = "restart"
# grep on_crash hvm-7.1-64-1.cfg 
on_crash = "restart"

# xm create hvm-7.1-64-1.cfg

# xm list 
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    13954    16 r-----   3891.7
hvm-7.1-64-1                              52     1032     4 r-----     21.4

3. Trigger guest crash
# echo c > /proc/sysrq-trigger 
[  149.299511] SysRq : Trigger a crash
[  149.300030] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  149.300030] IP: [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[  149.300030] PGD 3b9d4067 PUD 3aa11067 PMD 0 
[  149.300030] Oops: 0002 [#1] SMP 
[  149.300030] Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr serio_raw i2c_piix4 i2c_core xfs libcrc32c sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi ata_piix libata xen_blkfront xen_netfront floppy dm_mirror dm_region_hash dm_log dm_mod
[  149.454922] CPU: 1 PID: 2463 Comm: bash Not tainted 3.10.0-221.el7.x86_64 #1
[  149.454922] Hardware name: Red Hat HVM domU, BIOS 3.1.2-402.el5 05/07/2013
[  149.454922] task: ffff88003ce3e660 ti: ffff88003b234000 task.ti: ffff88003b234000
[  149.454922] RIP: 0010:[<ffffffff81398026>]  [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[  149.454922] RSP: 0018:ffff88003b237e80  EFLAGS: 00010046
[  149.454922] RAX: 000000000000000f RBX: ffffffff819c5660 RCX: 0000000000000000
[  149.454922] RDX: 0000000000000000 RSI: ffff88003fc8d488 RDI: 0000000000000063
[  149.454922] RBP: ffff88003b237e80 R08: 0000000000000092 R09: 00000000000001ff
[  149.454922] R10: 00000000000001fe R11: 0000000000000003 R12: 0000000000000063
[  149.454922] R13: 0000000000000246 R14: 0000000000000004 R15: 0000000000000000
[  149.454922] FS:  00007f87fafe0740(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[  149.454922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  149.454922] CR2: 0000000000000000 CR3: 000000003cdb1000 CR4: 00000000000006e0
[  149.454922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  149.454922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  149.454922] Stack:
[  149.454922]  ffff88003b237eb8 ffffffff813987d2 0000000000000002 00007f87fafe9000
[  149.454922]  ffff88003b237f48 0000000000000002 0000000000000000 ffff88003b237ed0
[  149.454922]  ffffffff81398caf ffff88003683e480 ffff88003b237ef0 ffffffff8122de6d
[  149.454922] Call Trace:
[  149.454922]  [<ffffffff813987d2>] __handle_sysrq+0xa2/0x170
[  149.454922]  [<ffffffff81398caf>] write_sysrq_trigger+0x2f/0x40
[  149.454922]  [<ffffffff8122de6d>] proc_reg_write+0x3d/0x80
[  149.454922]  [<ffffffff811c66dd>] vfs_write+0xbd/0x1e0
[  149.454922]  [<ffffffff811c7128>] SyS_write+0x58/0xb0
[  149.454922]  [<ffffffff816152a9>] system_call_fastpath+0x16/0x1b
[  149.454922] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 66 66 66 66 90 55 c7 05 b0 0b 5a 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 2e 
[  149.454922] RIP  [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[  149.454922]  RSP <ffff88003b237e80>
[  149.454922] CR2: 0000000000000000
[  149.454922] ---[ end trace 6f476705252cca2c ]---
[  149.454922] Kernel panic - not syncing: Fatal exception

4. After a few seconds the guest restarts with a new domain ID, and a core file is generated in /var/lib/xen/dump/
# xm list 
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    13954    16 r-----   3914.7
hvm-7.1-64-1                              53     1032     1 r-----      3.2

# ls /var/lib/xen/dump/
2015-0113-2257.13-hvm-7.1-64-1.52.core

5. The core file is usable (opened with crash):
crash> bt
PID: 2463   TASK: ffff88003ce3e660  CPU: 1   COMMAND: "bash"
 #0 [ffff88003b237ae8] xen_panic_event at ffffffff81003533
 #1 [ffff88003b237af8] notifier_call_chain at ffffffff81610c6c
 #2 [ffff88003b237b30] atomic_notifier_call_chain at ffffffff81610cca
 #3 [ffff88003b237b40] panic at ffffffff815fece8
 #4 [ffff88003b237bc0] oops_end at ffffffff8160da9b
 #5 [ffff88003b237be8] no_context at ffffffff815fe501
 #6 [ffff88003b237c38] __bad_area_nosemaphore at ffffffff815fe597
 #7 [ffff88003b237c80] bad_area at ffffffff815fe915
 #8 [ffff88003b237ca8] __do_page_fault at ffffffff816109f5
 #9 [ffff88003b237da8] do_page_fault at ffffffff81610aca
#10 [ffff88003b237dd0] page_fault at ffffffff8160cd08
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff81398026  RSP: ffff88003b237e80  RFLAGS: 00010046
    RAX: 000000000000000f  RBX: ffffffff819c5660  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff88003fc8d488  RDI: 0000000000000063
    RBP: ffff88003b237e80   R8: 0000000000000092   R9: 00000000000001ff
    R10: 00000000000001fe  R11: 0000000000000003  R12: 0000000000000063
    R13: 0000000000000246  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff88003b237e88] __handle_sysrq at ffffffff813987d2
#12 [ffff88003b237ec0] write_sysrq_trigger at ffffffff81398caf
#13 [ffff88003b237ed8] proc_reg_write at ffffffff8122de6d
#14 [ffff88003b237ef8] vfs_write at ffffffff811c66dd
#15 [ffff88003b237f38] sys_write at ffffffff811c7128
#16 [ffff88003b237f80] system_call_fastpath at ffffffff816152a9
    RIP: 00007f87fa6c29e0  RSP: 00007fff6efc3208  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff816152a9  RCX: 0000000000000063
    RDX: 0000000000000002  RSI: 00007f87fafe9000  RDI: 0000000000000001



So the bug is fixed.
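
The check in step 4, that the dump belongs to the domain that crashed, can be sketched by parsing the dump file name. This is a minimal sketch with a hypothetical function name; the `<timestamp>-<name>.<domid>.core` layout is an assumption inferred from the single file name listed above, not a documented format.

```python
import re

# Assumed layout, inferred from "2015-0113-2257.13-hvm-7.1-64-1.52.core".
_DUMP_NAME = re.compile(
    r"^(?P<stamp>\d{4}-\d{4}-\d{4}\.\d{2})-(?P<name>.+)\.(?P<domid>\d+)\.core$"
)


def parse_dump_name(filename):
    """Return (domain_name, domain_id) for a dump file name, or None."""
    m = _DUMP_NAME.match(filename)
    if m is None:
        return None
    return m.group("name"), int(m.group("domid"))
```

For the file above this yields the name hvm-7.1-64-1 and domain ID 52, the ID the guest had before the crash in step 2, while the restarted guest runs as ID 53.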

Comment 16 errata-xmlrpc 2015-03-05 11:28:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html
