Bug 845471

Summary: xen: kdump fails for HVM guests (even without pv-ness) when guests have 4G or more
Product: Red Hat Enterprise Linux 7 Reporter: Andrew Jones <drjones>
Component: kernel Assignee: Vitaly Kuznetsov <vkuznets>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 CC: bsarathy, hhuang, ketuzsezr, leiwang, lkong, ruyang, wshi, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: xen
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 974114 (view as bug list) Environment:
Last Closed: 2014-05-02 15:21:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 741684, 974114    

Description Andrew Jones 2012-08-03 07:47:20 UTC
Currently kdump doesn't work upstream with pvdrivers. However, we can still use kdump on HVM guests as long as the kdump kernel only uses emulated devices. Other issues can also pop up if the guest is PV-aware at all and tries to [re]init things with the HV. In rhel6 we added

        if (is_kdump_kernel())
                return;

to xen_hvm_guest_init() to avoid those issues. If we don't have a proper solution for rhel7 (i.e. kdump with pvdrivers), then we'll probably want to patch xen_hvm_platform() to return false when we're a kdump kernel.
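
The proposed rhel7 approach can be modelled in user space as a minimal sketch. It is not the actual kernel patch: is_kdump_kernel_stub(), xen_hvm_platform_sketch() and the two state variables are hypothetical stand-ins for is_kdump_kernel() and the real Xen platform probe.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state. In a real kernel the first would
 * come from is_kdump_kernel() and the second from the Xen CPUID probe. */
static bool in_kdump_kernel;    /* true when booted as the capture kernel */
static bool xen_cpuid_present;  /* true when the hypervisor is detected */

static bool is_kdump_kernel_stub(void)
{
    return in_kdump_kernel;
}

/* Sketch of the idea: report "not a Xen HVM platform" in the kdump kernel,
 * so that no PV setup is attempted and only emulated devices are used. */
static bool xen_hvm_platform_sketch(void)
{
    if (is_kdump_kernel_stub())
        return false;
    return xen_cpuid_present;
}
```

With in_kdump_kernel set, the probe reports false even when the hypervisor is present, which is the behaviour the description asks for.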

Comment 4 Lingfei Kong 2013-09-09 01:53:48 UTC
Kdump works well with an HVM RHEL7.0 guest.

guest kernel: 3.10.0-9.el7.x86_64 (RHEL7.0)
host kernel: 2.6.18-370.el5xen (RHEL5.10)
xen: xen-3.0.3-144.el5

Steps:
Without PV driver
1. Install the kexec-tools package, if it's not already installed.
2. Add the parameters xen_emul_unplug=never crashkernel=300M to the kernel's command line and boot
3. Check whether the paravirt driver modules have been loaded.
 [guest]# lsmod|grep xen
4. Start the kdump service in DomU
[guest]# service kdump start
Redirecting to /bin/systemctl start  kdump.service

5. Check the kdump status
[guest]# service kdump status
Redirecting to /bin/systemctl status  kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; disabled)
   Active: active (exited) since Fri 2013-09-06 17:13:33 HKT; 6s ago
  Process: 769 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 769 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump.service

Sep 06 17:13:33  kdumpctl[769]: kexec: loaded kdump ...
Sep 06 17:13:33 kdumpctl[769]: Starting kdump: [OK]
Sep 06 17:13:33 systemd[1]: Started Crash recovery ...

6. [guest]# echo c > /proc/sysrq-trigger 
7. Check the vmcore file 
[guest]# ls /var/crash/
127.0.0.1-2013.09.06-17:15:48

With PV driver
1. Change the kernel command line to use xen_emul_unplug=unnecessary and reboot
2. Check whether the paravirt driver modules have been loaded.
[guest]# lsmod|grep xen
xen_netfront           26503  0 
xen_blkfront           22528  2 
3. Start the kdump service in DomU
[guest]# service kdump start
Redirecting to /bin/systemctl start  kdump.service

4. Check the kdump status
[guest]# service kdump status
Redirecting to /bin/systemctl status  kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; disabled)
   Active: active (exited) since Fri 2013-09-06 17:20:12 HKT; 29s ago
  Process: 880 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 880 (code=exited, status=0/SUCCESS)

Sep 06 17:20:12  systemd[1]: Starting Crash recovery...
Sep 06 17:20:12 kdumpctl[880]: kexec: loaded kdump ...
Sep 06 17:20:12 kdumpctl[880]: Starting kdump: [OK]
Sep 06 17:20:12 systemd[1]: Started Crash recovery ...

5. [guest]# echo c > /proc/sysrq-trigger 
6. Check the vmcore file 
[guest]# ls /var/crash/
127.0.0.1-2013.09.06-17:15:48  127.0.0.1-2013.09.06-17:21:53

Comment 5 Andrew Jones 2013-09-11 10:27:13 UTC
(In reply to Lingfei Kong from comment #4)
> Kdump works well with an HVM RHEL7.0 guest.
> 

I'm glad to see it works. How much memory did the guest have? I know of a problem (that may have disappeared) with guests configured with 4G or more memory. So we should make sure we test with that.

Also, we should test over a xen4 host such as can be installed with a recent Fedora.

thanks,
drew

Comment 7 Lingfei Kong 2014-01-07 09:09:51 UTC
Hi drew,
I tested kdump on RHEL-7.0-20131222.0. This version of kdump works well with guest memory=1024M, but when I increased the memory to 7G, dumping the memory still took a very long time: after 3 hours only 43% had been dumped.
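
For scale, the figures above can be extrapolated with a small helper. This is only a rough sketch: it assumes dump progress is linear in time, which the report does not confirm, and projected_total_hours() is a hypothetical name.

```c
/* Projected total duration in hours, assuming progress is linear in time.
 * fraction_done must be in (0, 1]; returns -1.0 for invalid input. */
static double projected_total_hours(double elapsed_hours, double fraction_done)
{
    if (fraction_done <= 0.0 || fraction_done > 1.0)
        return -1.0;
    return elapsed_hours / fraction_done;
}
```

Under that assumption, projected_total_hours(3.0, 0.43) comes to roughly 7 hours for the full dump of the 7G guest.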

I also ran the test on Fedora 18 (xen-4.2.3-12); the guest hung after I triggered a crash (just like bug 1007328). Here is the output:
[guest]# echo c > /proc/sysrq-trigger 
[  121.900181] SysRq : Trigger a crash
[  121.901059] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  121.901059] IP: [<ffffffff8135ba66>] sysrq_handle_crash+0x16/0x20
[  121.901059] PGD 3b198067 PUD 37364067 PMD 0 
[  121.901059] Oops: 0002 [#1] SMP 
[  121.901059] Modules linked in: kvm sg crc32c_intel xen_netfront ppdev i2c_piix4 parport_pc i2c_core parport serio_raw pcspkr mperf microcode nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi ata_piix xen_blkfront libata floppy dm_mirror dm_region_hash dm_log dm_mod
[  121.901059] CPU: 3 PID: 2176 Comm: bash Not tainted 3.10.0-64.el7.x86_64 #1
[  121.901059] Hardware name: Xen HVM domU, BIOS 4.2.3 12/11/2013
[  121.901059] task: ffff8800368fb610 ti: ffff88003c3e0000 task.ti: ffff88003c3e0000
[  121.901059] RIP: 0010:[<ffffffff8135ba66>]  [<ffffffff8135ba66>] sysrq_handle_crash+0x16/0x20
[  121.901059] RSP: 0018:ffff88003c3e1e88  EFLAGS: 00010082
[  121.901059] RAX: 000000000000000f RBX: ffffffff8195b980 RCX: ffff88003fc70000
[  121.901059] RDX: 0000000000000000 RSI: ffff88003fc6e3e8 RDI: 0000000000000063
[  121.901059] RBP: ffff88003c3e1e88 R08: 0000000000000096 R09: 000000000000023c
[  121.901059] R10: 000000000000023b R11: 0000000000000003 R12: 0000000000000063
[  121.901059] R13: 0000000000000246 R14: 0000000000000007 R15: 0000000000000000
[  121.901059] FS:  00007f7a3ee92740(0000) GS:ffff88003fc60000(0000) knlGS:0000000000000000
[  121.901059] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  121.901059] CR2: 0000000000000000 CR3: 000000003b1e6000 CR4: 00000000000007e0
[  121.901059] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  121.901059] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  121.901059] Stack:
[  121.901059]  ffff88003c3e1ec0 ffffffff8135c1c2 0000000000000002 00007f7a3ee9e000
[  121.901059]  ffff88003c3e1f50 0000000000000002 0000000000000000 ffff88003c3e1ed8
[  121.901059]  ffffffff8135c69f ffff88003b29f900 ffff88003c3e1ef8 ffffffff812031dd
[  121.901059] Call Trace:
[  121.901059]  [<ffffffff8135c1c2>] __handle_sysrq+0xa2/0x170
[  121.901059]  [<ffffffff8135c69f>] write_sysrq_trigger+0x2f/0x40
[  121.901059]  [<ffffffff812031dd>] proc_reg_write+0x3d/0x80
[  121.901059]  [<ffffffff8119fc8d>] vfs_write+0xbd/0x1e0
[  121.901059]  [<ffffffff811a0659>] SyS_write+0x49/0xa0
[  121.901059]  [<ffffffff815cc959>] system_call_fastpath+0x16/0x1b
[  121.901059] Code: 65 34 75 e5 4c 89 ef e8 f9 f7 ff ff eb db 0f 1f 80 00 00 00 00 66 66 66 66 90 55 c7 05 f0 55 57 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 4e 
[  121.901059] RIP  [<ffffffff8135ba66>] sysrq_handle_crash+0x16/0x20
[  121.901059]  RSP <ffff88003c3e1e88>
[  121.901059] CR2: 0000000000000000


-------------------------------------------------------------------------------------------------
The following are the steps I used to test guest RHEL-7.0-20131222.0 on a rhel5.10 host.
kdump works well with guest memory=1024M.

component version:
guest kernel: kernel-3.10.0-64.el7(RHEL7.0)
host kernel: kernel-xen-2.6.18-376.el5 (RHEL5.10)
xen: xen-3.0.3-144.el5

Steps(Memory=1024M):
With PV driver
1. Install the kexec-tools package, if it's not already installed.
2. Change the kernel command line to use xen_emul_unplug=unnecessary and reboot
3. Check whether the paravirt driver modules have been loaded.
[guest]# lsmod|grep xen
xen_netfront           26679  0 
xen_blkfront           26864  2 
4. Start the kdump service in DomU
[guest]# service kdump start
Redirecting to /bin/systemctl start  kdump.service

5. Check the kdump status
[guest]# service kdump status
Redirecting to /bin/systemctl status  kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled)
   Active: active (exited) since Tue 2014-01-07 20:53:58 CST
 Main PID: 1075 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump.service

Jan 07 20:53:58 rhel7libguestfs kdumpctl[1075]: kexec: loaded kdump kernel
Jan 07 20:53:58 rhel7libguestfs kdumpctl[1075]: Starting kdump: [OK]
Jan 07 20:53:58 rhel7libguestfs systemd[1]: Started Crash recovery kernel arming.
Jan 07 20:55:50 rhel7libguestfs systemd[1]: Started Crash recovery kernel arming.

6. [guest]# echo c > /proc/sysrq-trigger
7. When the guest has rebooted, use `xm dump-core domname` to get the vmcore file.
[host]# xm dump-core 8
Dumping core of domain: 8 ...
[host]# ls /var/lib/xen/dump/
2014-0107-2118.34-hvm-7.0-64-1.8.core

8. Check the vmcore file generated by `echo c > /proc/sysrq-trigger`
[guest]# ls /var/crash/
127.0.0.1-2014.01.07-20:58:05

9. Check the two vmcore files with the crash tool
[guest]# crash /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux /var/crash/127.0.0.1-2014.01.07-20\:58\:05/vmcore
....
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2014.01.07-20:58:05/vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Tue Jan  7 12:57:59 2014
      UPTIME: 00:04:14
LOAD AVERAGE: 0.01, 0.07, 0.05
       TASKS: 154
    NODENAME: rhel7libguestfs
     RELEASE: 3.10.0-64.el7.x86_64
     VERSION: #1 SMP Tue Dec 17 16:46:38 EST 2013
     MACHINE: x86_64  (2128 Mhz)
      MEMORY: 1 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 2256
     COMMAND: "bash"
        TASK: ffff88003b385680  [THREAD_INFO: ffff88003d1bc000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)
crash> 
crash> bt
PID: 2256   TASK: ffff88003b385680  CPU: 0   COMMAND: "bash"
 #0 [ffff88003d1bdab8] machine_kexec at ffffffff8103ef82
 #1 [ffff88003d1bdb08] crash_kexec at ffffffff810c6c73
 #2 [ffff88003d1bdbd0] oops_end at ffffffff815c5268
 #3 [ffff88003d1bdbf8] no_context at ffffffff815b62de
 #4 [ffff88003d1bdc40] __bad_area_nosemaphore at ffffffff815b635e
 #5 [ffff88003d1bdc88] bad_area at ffffffff815b66d9
 #6 [ffff88003d1bdcb0] __do_page_fault at ffffffff815c809c
 #7 [ffff88003d1bdda8] do_page_fault at ffffffff815c816a
 #8 [ffff88003d1bddd0] page_fault at ffffffff815c4508
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff8135ba66  RSP: ffff88003d1bde88  RFLAGS: 00010082
    RAX: 000000000000000f  RBX: ffffffff8195b980  RCX: ffff88003fc10000
    RDX: 0000000000000000  RSI: ffff88003fc0e3e8  RDI: 0000000000000063
    RBP: ffff88003d1bde88   R8: 0000000000000096   R9: 00000000000001fe
    R10: 00000000000001fd  R11: 0000000000000003  R12: 0000000000000063
    R13: 0000000000000246  R14: 0000000000000007  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88003d1bde90] __handle_sysrq at ffffffff8135c1c2
#10 [ffff88003d1bdec8] write_sysrq_trigger at ffffffff8135c69f
#11 [ffff88003d1bdee0] proc_reg_write at ffffffff812031dd
#12 [ffff88003d1bdf00] vfs_write at ffffffff8119fc8d
#13 [ffff88003d1bdf40] sys_write at ffffffff811a0659
#14 [ffff88003d1bdf80] system_call_fastpath at ffffffff815cc959
    RIP: 00007fb06a6a9b00  RSP: 00007ffff2303fa8  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff815cc959  RCX: 0000000000000063
    RDX: 0000000000000002  RSI: 00007fb06afc7000  RDI: 0000000000000001
    RBP: 00007fb06afc7000   R8: 000000000000000a   R9: 00007fb06afc2740
    R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000002  R14: 00007fb06a97d400  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

[guest]# crash /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux 2014-0107-2118.34-hvm-7.0-64-1.8.core
...

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux
    DUMPFILE: 2014-0107-2118.34-hvm-7.0-64-1.8.core
        CPUS: 4
        DATE: Tue Jan  7 21:18:32 2014
      UPTIME: 00:01:47
LOAD AVERAGE: 0.10, 0.07, 0.03
       TASKS: 151
    NODENAME: rhel7libguestfs
     RELEASE: 3.10.0-64.el7.x86_64
     VERSION: #1 SMP Tue Dec 17 16:46:38 EST 2013
     MACHINE: x86_64  (2127 Mhz)
      MEMORY: 1 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff818af440  (1 of 4)  [THREAD_INFO: ffffffff8189c000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)
     WARNING: panic task not found

crash> bt
PID: 0      TASK: ffffffff818af440  CPU: 0   COMMAND: "swapper/0"
 #0 [ffffffff8189de78] __schedule at ffffffff815c1b6d
 #1 [ffffffff8189dec0] default_idle at ffffffff8101acef
 #2 [ffffffff8189dee0] arch_cpu_idle at ffffffff8101b5b6
 #3 [ffffffff8189def0] cpu_startup_entry at ffffffff810acc2e
 #4 [ffffffff8189df40] rest_init at ffffffff8159fd37
 #5 [ffffffff8189df50] start_kernel at ffffffff819e1f3d
 #6 [ffffffff8189df90] x86_64_start_reservations at ffffffff819e15de
 #7 [ffffffff8189dfa0] x86_64_start_kernel at ffffffff819e171e


Without PV driver
1. Add the parameters xen_emul_unplug=never crashkernel=300M to the kernel's command line and boot
2. Check whether the paravirt driver modules have been loaded.
 [guest]# lsmod|grep xen
3. Restart the kdump service in DomU
[guest]# service kdump restart
Redirecting to /bin/systemctl start  kdump.service
4. Check the kdump status
[guest]# service kdump status
Redirecting to /bin/systemctl status  kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled)
   Active: active (exited) since Tue 2014-01-07 21:15:25 CST; 4s ago
  Process: 1997 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 2002 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 2002 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump.service

Jan 07 21:15:25 rhel7libguestfs kdumpctl[2002]: kexec: loaded kdump kernel
Jan 07 21:15:25 rhel7libguestfs kdumpctl[2002]: Starting kdump: [OK]
Jan 07 21:15:25 rhel7libguestfs systemd[1]: Started Crash recovery kernel ar....
Hint: Some lines were ellipsized, use -l to show in full.


5. [guest]# echo c > /proc/sysrq-trigger 
6. When the guest has rebooted, use `xm dump-core domname` to get the vmcore file.
[host]# xm dump-core 9
Dumping core of domain: 8 ...
[host]# ls /var/lib/xen/dump/
2014-0107-2118.34-hvm-7.0-64-1.8.core  2014-0107-2121.57-hvm-7.0-64-1.9.core
7. Check the vmcore file generated by `echo c > /proc/sysrq-trigger`
[guest]# ls /var/crash/
127.0.0.1-2014.01.07-20:58:05  127.0.0.1-2014.01.07-21:16:26
8. Check the two vmcore files with the crash tool
[guest]# crash /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux /var/crash/127.0.0.1-2014.01.07-21\:16\:26/vmcore
...
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2014.01.07-21:16:26/vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Tue Jan  7 21:16:18 2014
      UPTIME: 00:01:34
LOAD AVERAGE: 0.24, 0.13, 0.05
       TASKS: 151
    NODENAME: rhel7libguestfs
     RELEASE: 3.10.0-64.el7.x86_64
     VERSION: #1 SMP Tue Dec 17 16:46:38 EST 2013
     MACHINE: x86_64  (2126 Mhz)
      MEMORY: 1 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 1948
     COMMAND: "bash"
        TASK: ffff88003d25cbb0  [THREAD_INFO: ffff88003be76000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 1948   TASK: ffff88003d25cbb0  CPU: 3   COMMAND: "bash"
 #0 [ffff88003be77ab8] machine_kexec at ffffffff8103ef82
 #1 [ffff88003be77b08] crash_kexec at ffffffff810c6c73
 #2 [ffff88003be77bd0] oops_end at ffffffff815c5268
 #3 [ffff88003be77bf8] no_context at ffffffff815b62de
 #4 [ffff88003be77c40] __bad_area_nosemaphore at ffffffff815b635e
 #5 [ffff88003be77c88] bad_area at ffffffff815b66d9
 #6 [ffff88003be77cb0] __do_page_fault at ffffffff815c809c
 #7 [ffff88003be77da8] do_page_fault at ffffffff815c816a
 #8 [ffff88003be77dd0] page_fault at ffffffff815c4508
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff8135ba66  RSP: ffff88003be77e88  RFLAGS: 00010082
    RAX: 000000000000000f  RBX: ffffffff8195b980  RCX: ffff88003fd90000
    RDX: 0000000000000000  RSI: ffff88003fd8e3e8  RDI: 0000000000000063
    RBP: ffff88003be77e88   R8: 0000000000000096   R9: 00000000000001f7
    R10: 00000000000001f6  R11: 0000000000000003  R12: 0000000000000063
    R13: 0000000000000246  R14: 0000000000000007  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88003be77e90] __handle_sysrq at ffffffff8135c1c2
#10 [ffff88003be77ec8] write_sysrq_trigger at ffffffff8135c69f
#11 [ffff88003be77ee0] proc_reg_write at ffffffff812031dd
#12 [ffff88003be77f00] vfs_write at ffffffff8119fc8d
#13 [ffff88003be77f40] sys_write at ffffffff811a0659
#14 [ffff88003be77f80] system_call_fastpath at ffffffff815cc959
    RIP: 00007fe2de38eb00  RSP: 00007ffffc439bf8  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff815cc959  RCX: 0000000000000063
    RDX: 0000000000000002  RSI: 00007fe2decb3000  RDI: 0000000000000001
    RBP: 00007fe2decb3000   R8: 000000000000000a   R9: 00007fe2deca7740
    R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000002  R14: 00007fe2de662400  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> 

[guest]# crash /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux 2014-0107-2121.57-hvm-7.0-64-1.9.core
...

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-64.el7.x86_64/vmlinux
    DUMPFILE: 2014-0107-2121.57-hvm-7.0-64-1.9.core
        CPUS: 4
        DATE: Tue Jan  7 21:21:55 2014
      UPTIME: 00:01:37
LOAD AVERAGE: 0.31, 0.21, 0.08
       TASKS: 151
    NODENAME: rhel7libguestfs
     RELEASE: 3.10.0-64.el7.x86_64
     VERSION: #1 SMP Tue Dec 17 16:46:38 EST 2013
     MACHINE: x86_64  (2128 Mhz)
      MEMORY: 1 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff818af440  (1 of 4)  [THREAD_INFO: ffffffff8189c000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)
     WARNING: panic task not found

crash> bt
PID: 0      TASK: ffffffff818af440  CPU: 0   COMMAND: "swapper/0"
 #0 [ffffffff8189de78] __schedule at ffffffff815c1b6d
 #1 [ffffffff8189de88] native_safe_halt at ffffffff81044136
 #2 [ffffffff8189dec0] default_idle at ffffffff8101acef
 #3 [ffffffff8189dee0] arch_cpu_idle at ffffffff8101b5b6
 #4 [ffffffff8189def0] cpu_startup_entry at ffffffff810acc2e
 #5 [ffffffff8189df40] rest_init at ffffffff8159fd37
 #6 [ffffffff8189df50] start_kernel at ffffffff819e1f3d
 #7 [ffffffff8189df90] x86_64_start_reservations at ffffffff819e15de
 #8 [ffffffff8189dfa0] x86_64_start_kernel at ffffffff819e171e
crash> 
All vmcore files are accessible with the crash tool.
-----------------------------------

Comment 8 Andrew Jones 2014-01-07 09:28:52 UTC
(In reply to Lingfei Kong from comment #7)
> Hi drew,
> I tested kdump on RHEL-7.0-20131222.0. This version of kdump works well
> with guest memory=1024M, but when I increased the memory to 7G, dumping
> the memory still took a very long time: after 3 hours only 43% had been
> dumped.

Hi Lingfei,

Thanks for the testing. At one point we found this problem to start at the 4G boundary, i.e. configs w/ less than 4G memory would work fine, but with 4G or more they would exhibit this bug. I was hoping that problem would go away with other changes to the kernel/kexec, as I don't see how it's related to xen. Unfortunately, it looks like that problem is still here. I'll convert this bug to track it. We can try to get more information from the xen side first, but likely we'll need to send this bug to kexec folk eventually.

It also looks like we've missed the boat for 7.0 at this point, so I'm bumping to 7.1. As there are no partner/customer requests for this feature at this time, that should be OK.

Comment 11 Andrew Jones 2014-05-02 15:11:08 UTC
Need to try this again over Fedora 20 xen and see what happens for the < 4G, 4G, and > 4G cases. It'd be nice if we could tell customers that they can enable kexec in EC2. If things are still broken for the >= 4G cases, then we'll have to dig into this.

Comment 13 Konrad Rzeszutek Wilk 2014-05-02 19:30:25 UTC
It probably is this one:

commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f
Author: Olaf Hering <olaf>
Date:   Thu Nov 1 22:02:30 2012 +0100

    xen PVonHVM: use E820_Reserved area for shared_info

which was reverted because it broke under Xen 4.1 during migration:
commit e9daff24a266307943457086533041bd971d0ef9
Author: Konrad Rzeszutek Wilk <konrad.wilk>
Date:   Thu Feb 14 21:29:31 2013 -0500

    Revert "xen PVonHVM: use E820_Reserved area for shared_info"
    
    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
    
    We are doing this b/c on 32-bit PVonHVM with older hypervisors
    (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
    we use it for the time keeping, and the timekeeping code loops
    forever - as the version field never changes. Olaf says to
    revert it, so lets do that.


Olaf never got around to figuring out why. Any volunteers?
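
The endless loop mentioned in the revert message can be illustrated with a simplified model of a version-protected read. This is a sketch under assumptions, not the kernel's timekeeping code: struct time_info_model and read_time_model() are hypothetical, memory barriers are omitted, and it assumes the failure mode is a version field stuck at an odd ("update in progress") value.

```c
#include <stdint.h>

/* Simplified model of a version-protected time record. The field names are
 * illustrative and do not reproduce the real Xen ABI layout. */
struct time_info_model {
    volatile uint32_t version;  /* odd while the writer is mid-update */
    uint64_t system_time;
};

/* Returns 0 and stores a consistent snapshot in *out, or -1 if no stable
 * snapshot was seen within max_tries. The retry bound exists only in this
 * sketch; an unbounded retry loop would spin forever on a stuck version. */
static int read_time_model(const struct time_info_model *src,
                           uint64_t *out, unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        uint32_t v = src->version;
        uint64_t t = src->system_time;

        /* Accept only if no update was in progress and none intervened. */
        if (!(v & 1) && v == src->version) {
            *out = t;
            return 0;
        }
    }
    return -1;
}
```

If the guest reads from a page the hypervisor no longer updates and the version there is odd, every retry fails; without the bound used here the reader never returns.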