Bug 839966
| Summary: | Trigger RHEL7 crash in guest domU, host doesn't generate core file | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Wei Shi <wshi> |
| Component: | kernel | Assignee: | Vitaly Kuznetsov <vkuznets> |
| kernel sub component: | Xen | QA Contact: | Virtualization Bugs <virt-bugs> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | drjones, leiwang, lersek, lkong, qwan, shwang |
| Version: | 7.0 | Keywords: | EC2 |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | xen | | |
| Fixed In Version: | kernel-3.10.0-137.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-05 11:28:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 741684 | | |
Description Wei Shi 2012-07-13 10:15:44 UTC
I'm sorry, but precisely as reported this seems NOTABUG.

(In reply to comment #0)
> xen-hvm-guest-el7.cfg
> on_crash = "restart"
>
> 4. no dump core file generated (host)
> [root@dhcp-8-204 ~]# xm li
> Name            ID  Mem(MiB)  VCPUs  State   Time(s)
> Domain-0         0      4858      8  r-----   2657.5
> hvm-guest-el7   57      1032      1  r-----    144.2
> [root@dhcp-8-204 ~]# ls /var/lib/xen/dump/
> [root@dhcp-8-204 ~]#
>
> Actual results:
> no dump file generated, and guest domain is still in running status

The domid is quite high (57), which does not exclude at all that the domain was simply restarted (i.e., a new domain was booted with the same guest config).

> Expected results:
> host generate a dump core file, guest action is match on_crash's value in
> cfg file

These two are contradictory in this exact case (see on_crash="restart" above); the second requirement is fulfilled (the xend action matches the on_crash setting).

Hmm, I may be wrong. enable-dump seems orthogonal.

(In reply to comment #2)
> I'm sorry but precisely as reported this seems NOTABUG.
> [...]
> These two are contradictory in this exact case (see on_crash="restart"
> above); the second requirement is fulfilled (xend action matches on_crash
> setting).
Sorry, I forgot to mention that no reboot is happening; domid 57 is still the original crashed domU, and no new domU was launched. That is why I said it seems dom0 never catches the crash signal from the domU.

Assigning to Vitaly. I recommend trying this on Fedora 20 xen. If it doesn't reproduce, then we can close as WONTFIX. If it does reproduce and it looks like a host problem, we should open a bug against Fedora; if it's a guest problem, we should fix it.

I can reproduce it on Fedora 20 xen (xen-4.3.2-2.fc20). RHEL 6.5 and RHEL 5.11 guests can generate a core file when a crash is triggered in the guest, but the RHEL 7.0 guest did not generate one. So it is probably a guest problem.

This issue is present in upstream 3.11.10 but was fixed in 3.12. Here is the commit:

```
commit 669b0ae961e87c824233475e987b2d39996d4849
Author: Vaughan Cao <vaughan.cao>
Date:   Fri Aug 16 16:10:56 2013 +0800

    xen/pvhvm: Initialize xen panic handler for PVHVM guests

    The kernel uses callbacks linked in panic_notifier_list to notify
    others when a panic happens:

        NORET_TYPE void panic(const char * fmt, ...) {
            ...
            atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
        }

    When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash)
    to send out an event with reason code SHUTDOWN_crash.

    xen_panic_handler_init() is defined to register on panic_notifier_list,
    but we only call it in xen_arch_setup, which is only called for PV
    guests; this patch is necessary for PVHVM.

    Without this patch, setting 'on_crash=coredump-restart' in a PVHVM
    guest config file won't lead to a vmcore being generated when the
    guest panics. It can be reproduced with 'echo c > /proc/sysrq-trigger'.

    Signed-off-by: Vaughan Cao <vaughan.cao>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk>
    Acked-by: Joe Jin <joe.jin>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index b5a22fa..15939e8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1713,6 +1713,8 @@ static void __init xen_hvm_guest_init(void)
 
 	xen_hvm_init_shared_info();
 
+	xen_panic_handler_init();
+
 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;
 	xen_hvm_smp_init();
```

Patch(es) available on kernel-3.10.0-137.el7.

Verified with kernel-3.10.0-221.el7. Steps to verify:

1. Enable core dumps in /etc/xen/xend-config.sxp:

```
# grep enable-dump /etc/xen/xend-config.sxp
(enable-dump yes)
```

2. Create a RHEL 7 HVM guest with on_crash = "restart":

```
# grep on_crash hvm-7.1-64-1.cfg
on_crash = "restart"
# xm create hvm-7.1-64-1.cfg
# xm list
Name            ID  Mem(MiB)  VCPUs  State   Time(s)
Domain-0         0     13954     16  r-----   3891.7
hvm-7.1-64-1    52      1032      4  r-----     21.4
```

3. Trigger a guest crash:

```
# echo c > /proc/sysrq-trigger
[  149.299511] SysRq : Trigger a crash
[  149.300030] BUG: unable to handle kernel NULL pointer dereference at (null)
[  149.300030] IP: [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[  149.300030] PGD 3b9d4067 PUD 3aa11067 PMD 0
[  149.300030] Oops: 0002 [#1] SMP
[  149.300030] Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr serio_raw i2c_piix4 i2c_core xfs libcrc32c sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi ata_piix libata xen_blkfront xen_netfront floppy dm_mirror dm_region_hash dm_log dm_mod
[  149.454922] CPU: 1 PID: 2463 Comm: bash Not tainted 3.10.0-221.el7.x86_64 #1
[  149.454922] Hardware name: Red Hat HVM domU, BIOS 3.1.2-402.el5 05/07/2013
[  149.454922] task: ffff88003ce3e660 ti: ffff88003b234000 task.ti: ffff88003b234000
[  149.454922] RIP: 0010:[<ffffffff81398026>]  [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[  149.454922] RSP: 0018:ffff88003b237e80  EFLAGS: 00010046
[  149.454922] RAX: 000000000000000f RBX: ffffffff819c5660 RCX: 0000000000000000
[  149.454922] RDX: 0000000000000000 RSI: ffff88003fc8d488 RDI: 0000000000000063
[  149.454922] RBP: ffff88003b237e80 R08: 0000000000000092 R09: 00000000000001ff
[  149.454922] R10: 00000000000001fe R11: 0000000000000003 R12: 0000000000000063
[  149.454922] R13: 0000000000000246 R14: 0000000000000004 R15: 0000000000000000
[  149.454922] FS:  00007f87fafe0740(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[  149.454922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  149.454922] CR2: 0000000000000000 CR3: 000000003cdb1000 CR4: 00000000000006e0
[  149.454922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  149.454922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  149.454922] Stack:
[  149.454922]  ffff88003b237eb8 ffffffff813987d2 0000000000000002 00007f87fafe9000
[  149.454922]  ffff88003b237f48 0000000000000002 0000000000000000 ffff88003b237ed0
[  149.454922]  ffffffff81398caf ffff88003683e480 ffff88003b237ef0 ffffffff8122de6d
[  149.454922] Call Trace:
[  149.454922]  [<ffffffff813987d2>] __handle_sysrq+0xa2/0x170
[  149.454922]  [<ffffffff81398caf>] write_sysrq_trigger+0x2f/0x40
[  149.454922]  [<ffffffff8122de6d>] proc_reg_write+0x3d/0x80
[  149.454922]  [<ffffffff811c66dd>] vfs_write+0xbd/0x1e0
[  149.454922]  [<ffffffff811c7128>] SyS_write+0x58/0xb0
[  149.454922]  [<ffffffff816152a9>] system_call_fastpath+0x16/0x1b
[  149.454922] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 66 66 66 66 90 55 c7 05 b0 0b 5a 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 2e
[  149.454922] RIP  [<ffffffff81398026>] sysrq_handle_crash+0x16/0x20
[  149.454922]  RSP <ffff88003b237e80>
[  149.454922] CR2: 0000000000000000
[  149.454922] ---[ end trace 6f476705252cca2c ]---
[  149.454922] Kernel panic - not syncing: Fatal exception
```

4. After a few seconds the guest restarts with a new domain ID, and a core file is generated in /var/lib/xen/dump/:

```
# xm list
Name            ID  Mem(MiB)  VCPUs  State   Time(s)
Domain-0         0     13954     16  r-----   3914.7
hvm-7.1-64-1    53      1032      1  r-----      3.2
# ls /var/lib/xen/dump/
2015-0113-2257.13-hvm-7.1-64-1.52.core
```

5. The core file is usable with the crash utility in the guest:

```
crash> bt
PID: 2463   TASK: ffff88003ce3e660  CPU: 1   COMMAND: "bash"
 #0 [ffff88003b237ae8] xen_panic_event at ffffffff81003533
 #1 [ffff88003b237af8] notifier_call_chain at ffffffff81610c6c
 #2 [ffff88003b237b30] atomic_notifier_call_chain at ffffffff81610cca
 #3 [ffff88003b237b40] panic at ffffffff815fece8
 #4 [ffff88003b237bc0] oops_end at ffffffff8160da9b
 #5 [ffff88003b237be8] no_context at ffffffff815fe501
 #6 [ffff88003b237c38] __bad_area_nosemaphore at ffffffff815fe597
 #7 [ffff88003b237c80] bad_area at ffffffff815fe915
 #8 [ffff88003b237ca8] __do_page_fault at ffffffff816109f5
 #9 [ffff88003b237da8] do_page_fault at ffffffff81610aca
#10 [ffff88003b237dd0] page_fault at ffffffff8160cd08
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff81398026  RSP: ffff88003b237e80  RFLAGS: 00010046
    RAX: 000000000000000f  RBX: ffffffff819c5660  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff88003fc8d488  RDI: 0000000000000063
    RBP: ffff88003b237e80   R8: 0000000000000092   R9: 00000000000001ff
    R10: 00000000000001fe  R11: 0000000000000003  R12: 0000000000000063
    R13: 0000000000000246  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff88003b237e88] __handle_sysrq at ffffffff813987d2
#12 [ffff88003b237ec0] write_sysrq_trigger at ffffffff81398caf
#13 [ffff88003b237ed8] proc_reg_write at ffffffff8122de6d
#14 [ffff88003b237ef8] vfs_write at ffffffff811c66dd
#15 [ffff88003b237f38] sys_write at ffffffff811c7128
#16 [ffff88003b237f80] system_call_fastpath at ffffffff816152a9
    RIP: 00007f87fa6c29e0  RSP: 00007fff6efc3208  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff816152a9  RCX: 0000000000000063
    RDX: 0000000000000002  RSI: 00007f87fafe9000  RDI: 0000000000000001
```

So the bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html
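For context, the mechanism the fix relies on can be sketched in user-space C. This is a hypothetical, simplified model of the kernel's panic notifier chain, not the actual `notifier.h` implementation: `panic()` walks a list of registered callbacks, and `xen_panic_event()` is just one such callback. The bug was that PVHVM guests never registered it, so the "panic" fired with nothing on the chain. All names below (`notifier_chain`, `fake_xen_panic_event`, etc.) are illustrative.

```c
#include <stddef.h>

#define MAX_NOTIFIERS 8

/* A callback invoked when the "panic" event fires, analogous to an
 * entry on the kernel's panic_notifier_list. */
typedef int (*notifier_fn)(const char *msg);

struct notifier_chain {
	notifier_fn handlers[MAX_NOTIFIERS];
	int count;
};

/* Analog of atomic_notifier_chain_register(): add a callback. */
int notifier_chain_register(struct notifier_chain *chain, notifier_fn fn)
{
	if (chain->count >= MAX_NOTIFIERS)
		return -1;
	chain->handlers[chain->count++] = fn;
	return 0;
}

/* Analog of atomic_notifier_call_chain() as invoked from panic():
 * walk every registered handler; returns how many were called. */
int notifier_call_chain(struct notifier_chain *chain, const char *msg)
{
	int called = 0;
	for (int i = 0; i < chain->count; i++) {
		chain->handlers[i](msg);
		called++;
	}
	return called;
}

/* Stand-in for xen_panic_event(): in the real kernel this issues
 * xen_reboot(SHUTDOWN_crash) so the toolstack can honor on_crash=. */
static int crash_events;

static int fake_xen_panic_event(const char *msg)
{
	(void)msg;
	crash_events++;
	return 0;
}
```

The fixed kernel simply adds the registration call to the PVHVM init path (`xen_hvm_guest_init()`); without it, the chain walk in `panic()` finds no Xen handler and the hypervisor never sees SHUTDOWN_crash, which is why no core was dumped.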