Bug 839966
| Summary: | Trigger RHEL7 crash in guest domU, host don't generate core file | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Wei Shi <wshi> |
| Component: | kernel | Assignee: | Vitaly Kuznetsov <vkuznets> |
| kernel sub component: | Xen | QA Contact: | Virtualization Bugs <virt-bugs> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | drjones, leiwang, lersek, lkong, qwan, shwang |
| Version: | 7.0 | Keywords: | EC2 |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | xen | ||
| Fixed In Version: | kernel-3.10.0-137.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-05 11:28:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 741684 | ||
Description
Wei Shi
2012-07-13 10:15:44 UTC
I'm sorry, but precisely as reported, this seems to be NOTABUG.

(In reply to comment #0)
> xen-hvm-guest-el7.cfg
> on_crash = "restart"

> 4. no dump core file generated(host)
> [root@dhcp-8-204 ~]# xm li
> Name            ID  Mem(MiB) VCPUs  State   Time(s)
> Domain-0         0      4858     8  r-----   2657.5
> hvm-guest-el7   57      1032     1  r-----    144.2
> [root@dhcp-8-204 ~]# ls /var/lib/xen/dump/
> [root@dhcp-8-204 ~]#
>
> Actual results:
> no dump file generated, and guest domain is still in running status

The domid is quite high (57), which does not at all exclude that the domain was simply restarted (= a new domain booted with the same guest config).

> Expected results:
> host generate a dump core file, guest action is match on_crash's value in
> cfg file

These two are contradictory in this exact case (see on_crash="restart" above); the second requirement is fulfilled (the xend action matches the on_crash setting).

Hmm, I may be wrong: enable-dump seems orthogonal to the on_crash setting.

(In reply to comment #2)
> The domid is quite high (57) which does not exclude at all that the domain
> was simply restarted (= new domain booted with the same guest config).
Sorry, I forgot to mention that no reboot is happening: domid 57 is the original crashed domU, and no new domU is launched. That's why I said it seems dom0 never catches the crash signal from the domU.

Assigning to Vitaly.

I recommend trying this on Fedora 20 Xen. If it doesn't reproduce, then we can close this as WONTFIX. If it does reproduce and it looks like a host problem, we should open a bug against Fedora; if it's a guest problem, we should fix it.

I can reproduce it on Fedora 20 Xen (xen-4.3.2-2.fc20). RHEL 6.5 and RHEL 5.11 guests generate a core file when a crash is triggered in the guest, but the RHEL 7.0 guest does not. So it is probably a guest problem.

This issue is present in upstream 3.11.10 but was fixed in 3.12. Here is the commit:
```
commit 669b0ae961e87c824233475e987b2d39996d4849
Author: Vaughan Cao <vaughan.cao>
Date:   Fri Aug 16 16:10:56 2013 +0800

    xen/pvhvm: Initialize xen panic handler for PVHVM guests

    kernel use callback linked in panic_notifier_list to notice others when panic
    happens.
    NORET_TYPE void panic(const char * fmt, ...){
        ...
        atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
    }
    When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to
    send out an event with reason code - SHUTDOWN_crash.

    xen_panic_handler_init() is defined to register on panic_notifier_list but
    we only call it in xen_arch_setup which only be called by PV, this patch is
    necessary for PVHVM.

    Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config
    file won't lead a vmcore to be generate when the guest panics. It can be
    reproduced with 'echo c > /proc/sysrq-trigger'.

    Signed-off-by: Vaughan Cao <vaughan.cao>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk>
    Acked-by: Joe Jin <joe.jin>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index b5a22fa..15939e8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1713,6 +1713,8 @@ static void __init xen_hvm_guest_init(void)
 
 	xen_hvm_init_shared_info();
 
+	xen_panic_handler_init();
+
 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;
 	xen_hvm_smp_init();
```
Patch(es) available on kernel-3.10.0-137.el7

Verified with kernel-3.10.0-221.el7.
Steps to verify:
1. Enable core dumps in /etc/xen/xend-config.sxp:
```
# grep enable-dump /etc/xen/xend-config.sxp
(enable-dump yes)
```
2. Create a RHEL 7 HVM guest with on_crash = "restart":
```
# grep on_crash hvm-7.1-64-1.cfg
on_crash = "restart"
# xm create hvm-7.1-64-1.cfg
# xm list
Name                ID  Mem(MiB) VCPUs  State   Time(s)
Domain-0             0     13954    16  r-----   3891.7
hvm-7.1-64-1        52      1032     4  r-----     21.4
```
3. Trigger a crash in the guest:
```
# echo c > /proc/sysrq-trigger
[ 149.299511] SysRq : Trigger a crash
[ 149.300030] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 149.300030] IP: [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[ 149.300030] PGD 3b9d4067 PUD 3aa11067 PMD 0
[ 149.300030] Oops: 0002 [#1] SMP
[ 149.300030] Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr serio_raw i2c_piix4 i2c_core xfs libcrc32c sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi ata_piix libata xen_blkfront xen_netfront floppy dm_mirror dm_region_hash dm_log dm_mod
[ 149.454922] CPU: 1 PID: 2463 Comm: bash Not tainted 3.10.0-221.el7.x86_64 #1
[ 149.454922] Hardware name: Red Hat HVM domU, BIOS 3.1.2-402.el5 05/07/2013
[ 149.454922] task: ffff88003ce3e660 ti: ffff88003b234000 task.ti: ffff88003b234000
[ 149.454922] RIP: 0010:[<ffffffff81398026>] [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[ 149.454922] RSP: 0018:ffff88003b237e80 EFLAGS: 00010046
[ 149.454922] RAX: 000000000000000f RBX: ffffffff819c5660 RCX: 0000000000000000
[ 149.454922] RDX: 0000000000000000 RSI: ffff88003fc8d488 RDI: 0000000000000063
[ 149.454922] RBP: ffff88003b237e80 R08: 0000000000000092 R09: 00000000000001ff
[ 149.454922] R10: 00000000000001fe R11: 0000000000000003 R12: 0000000000000063
[ 149.454922] R13: 0000000000000246 R14: 0000000000000004 R15: 0000000000000000
[ 149.454922] FS: 00007f87fafe0740(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[ 149.454922] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 149.454922] CR2: 0000000000000000 CR3: 000000003cdb1000 CR4: 00000000000006e0
[ 149.454922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 149.454922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 149.454922] Stack:
[ 149.454922] ffff88003b237eb8 ffffffff813987d2 0000000000000002 00007f87fafe9000
[ 149.454922] ffff88003b237f48 0000000000000002 0000000000000000 ffff88003b237ed0
[ 149.454922] ffffffff81398caf ffff88003683e480 ffff88003b237ef0 ffffffff8122de6d
[ 149.454922] Call Trace:
[ 149.454922] [<ffffffff813987d2>] __handle_sysrq+0xa2/0x170
[ 149.454922] [<ffffffff81398caf>] write_sysrq_trigger+0x2f/0x40
[ 149.454922] [<ffffffff8122de6d>] proc_reg_write+0x3d/0x80
[ 149.454922] [<ffffffff811c66dd>] vfs_write+0xbd/0x1e0
[ 149.454922] [<ffffffff811c7128>] SyS_write+0x58/0xb0
[ 149.454922] [<ffffffff816152a9>] system_call_fastpath+0x16/0x1b
[ 149.454922] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 66 66 66 66 90 55 c7 05 b0 0b 5a 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 2e
[ 149.454922] RIP [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[ 149.454922] RSP <ffff88003b237e80>
[ 149.454922] CR2: 0000000000000000
[ 149.454922] ---[ end trace 6f476705252cca2c ]---
[ 149.454922] Kernel panic - not syncing: Fatal exception
```
4. After a few seconds, the guest restarts with a new domain ID, and a core file is generated in /var/lib/xen/dump/:
```
# xm list
Name                ID  Mem(MiB) VCPUs  State   Time(s)
Domain-0             0     13954    16  r-----   3914.7
hvm-7.1-64-1        53      1032     1  r-----      3.2
# ls /var/lib/xen/dump/
2015-0113-2257.13-hvm-7.1-64-1.52.core
```
5. The core file is usable with the crash utility; the backtrace shows the panic notifier chain reaching xen_panic_event:
```
crash> bt
PID: 2463   TASK: ffff88003ce3e660  CPU: 1   COMMAND: "bash"
 #0 [ffff88003b237ae8] xen_panic_event at ffffffff81003533
 #1 [ffff88003b237af8] notifier_call_chain at ffffffff81610c6c
 #2 [ffff88003b237b30] atomic_notifier_call_chain at ffffffff81610cca
 #3 [ffff88003b237b40] panic at ffffffff815fece8
 #4 [ffff88003b237bc0] oops_end at ffffffff8160da9b
 #5 [ffff88003b237be8] no_context at ffffffff815fe501
 #6 [ffff88003b237c38] __bad_area_nosemaphore at ffffffff815fe597
 #7 [ffff88003b237c80] bad_area at ffffffff815fe915
 #8 [ffff88003b237ca8] __do_page_fault at ffffffff816109f5
 #9 [ffff88003b237da8] do_page_fault at ffffffff81610aca
#10 [ffff88003b237dd0] page_fault at ffffffff8160cd08
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff81398026  RSP: ffff88003b237e80  RFLAGS: 00010046
    RAX: 000000000000000f  RBX: ffffffff819c5660  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff88003fc8d488  RDI: 0000000000000063
    RBP: ffff88003b237e80   R8: 0000000000000092   R9: 00000000000001ff
    R10: 00000000000001fe  R11: 0000000000000003  R12: 0000000000000063
    R13: 0000000000000246  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff88003b237e88] __handle_sysrq at ffffffff813987d2
#12 [ffff88003b237ec0] write_sysrq_trigger at ffffffff81398caf
#13 [ffff88003b237ed8] proc_reg_write at ffffffff8122de6d
#14 [ffff88003b237ef8] vfs_write at ffffffff811c66dd
#15 [ffff88003b237f38] sys_write at ffffffff811c7128
#16 [ffff88003b237f80] system_call_fastpath at ffffffff816152a9
    RIP: 00007f87fa6c29e0  RSP: 00007fff6efc3208  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff816152a9  RCX: 0000000000000063
    RDX: 0000000000000002  RSI: 00007f87fafe9000  RDI: 0000000000000001
```
So the bug is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html