Bug 688504

Summary: Kernel panic when switch kernel with command "kexec -e" on AMD host
Product: Red Hat Enterprise Linux 6 Reporter: Mike Cao <bcao>
Component: kernel Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 6.1 CC: amwang, bcao, michen, phan, qcai, vgoyal
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-03-18 03:35:51 UTC Type: ---

Description Mike Cao 2011-03-17 09:13:27 UTC
Description of problem:


Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-167.el6.x86_64
kernel-2.6.32-71.18.1.el6.x86_64
kernel-2.6.32-120.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start the VM on kernel-2.6.32-120.el6.x86_64
2. # kexec -l /boot/vmlinuz-2.6.32-71.18.1.el6.x86_64 --initrd=/boot/initramfs-2.6.32-71.18.1.el6.x86_64.img --command-line="$(cat /proc/cmdline)"
3. # kexec -e
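
For reference, step 2 can be sketched as a small dry-run helper that only prints the kexec load command for a given kernel version, so the paths and command line can be reviewed before anything is loaded. This is a sketch, not part of the original report: build_kexec_load is a hypothetical helper name, and the command line "ro root=/dev/sda1" is a placeholder (the report uses the contents of /proc/cmdline).

```shell
#!/bin/sh
# Sketch only: prints the kexec load command from step 2 instead of running it.
# build_kexec_load is a hypothetical helper, not part of kexec-tools.
build_kexec_load() {
    kver="$1"      # kernel version, e.g. 2.6.32-71.18.1.el6.x86_64
    cmdline="$2"   # kernel command line, e.g. "$(cat /proc/cmdline)"
    printf 'kexec -l /boot/vmlinuz-%s --initrd=/boot/initramfs-%s.img --command-line="%s"\n' \
        "$kver" "$kver" "$cmdline"
}

# Placeholder command line; substitute "$(cat /proc/cmdline)" on the real host.
build_kexec_load "2.6.32-71.18.1.el6.x86_64" "ro root=/dev/sda1"
```

The printed command would then be run by hand as root, followed by kexec -e as in step 3.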
  
Actual results:
host kernel panic

Expected results:
switch to kernel-2.6.32-71.18

Additional info:
1. Works well on an Intel host
2. Panic info:
login: Starting new kernel
[drm] nouveau 0000:02:00.0: Pointer to BIT loadval table invalid
irq 18: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff813841a0>] (usb_hcd_irq+0x0/0x90)
[<ffffffff813841a0>] (usb_hcd_irq+0x0/0x90)
[<ffffffff813841a0>] (usb_hcd_irq+0x0/0x90)
Disabling IRQ #18
could not read byte from child: Success
irq 18: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff813841a0>] (usb_hcd_irq+0x0/0x90)
[<ffffffff813841a0>] (usb_hcd_irq+0x0/0x90)
[<ffffffff813841a0>] (usb_hcd_irq+0x0/0x90)
[<ffffffffa00bdde0>] (nouveau_irq_handler+0x0/0xba0 [nouveau])
Disabling IRQ #18

udevadm settle - timeout of 180 seconds reached, the event queue contains:
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/PNP0C02:00 (778)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/PNP0C02:01 (779)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00 (788)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/IFX0102:00 (789)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0000:00 (790)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0003:00 (791)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0100:00 (792)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0103:00 (793)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0200:00 (794)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0303:00 (795)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0501:00 (796)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0800:00 (797)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0B00:00 (798)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0C04:00 (799)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/PNP0F13:00 (800)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:01 (801)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:01/device:02 (802)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:03 (803)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:04 (804)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:04/device:05 (805)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:06 (806)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:06/device:07 (807)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08 (808)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08/device:09 (809)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0a (810)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0a/device:0b (811)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0c (812)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0d (813)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e (814)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0f (815)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:10 (816)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:11 (817)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:12 (818)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:13 (819)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:13/device:14 (820)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:13/device:15 (821)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C01:00 (822)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0 (824)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0/event0 (825)
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C14:00 (826)
  /sys/devices/LNXSYSTM:00/LNXTHERM:00 (827)
  /sys/devices/pci0000:00/0000:00:11.0 (848)
  /sys/devices/pci0000:00/0000:00:12.0/usb3/3-2/3-2.1/3-2.1:1.0/input/input3 (857)
  /sys/devices/pci0000:00/0000:00:12.0/usb3/3-2/3-2.1/3-2.1:1.0/input/input3/event3 (858)
  /sys/devices/pci0000:00/0000:00:12.0/usb3/3-2/3-2.1/3-2.1:1.1/input/input4 (862)
  /sys/devices/pci0000:00/0000:00:12.0/usb3/3-2/3-2.1/3-2.1:1.1/input/input4/event4 (863)
  /sys/devices/pci0000:00/0000:00:12.0/usb3/3-2/3-2.1/3-2.1:1.1/input/input4/mouse1 (864)
  /sys/devices/platform/i8042/serio1 (906)
  /sys/devices/platform/pcspkr (907)
  /sys/devices/virtual/dmi/id (983)
  /sys/devices/pci0000:00/0000:00:11.0/host0 (1094)
  /sys/devices/pci0000:00/0000:00:11.0/host0/scsi_host/host0 (1095)
  /sys/devices/pci0000:00/0000:00:11.0/host1 (1096)
  /sys/devices/pci0000:00/0000:00:11.0/host1/scsi_host/host1 (1097)
  /sys/devices/pci0000:00/0000:00:11.0/host2 (1098)
  /sys/devices/pci0000:00/0000:00:11.0/host2/scsi_host/host2 (1099)
  /sys/devices/pci0000:00/0000:00:11.0/host3 (1100)

init: Corrupted page table at address 2106f48
PGD 212ff6067 PUD 212ff5067 PMD 212803067 PTE 5f96d791e006065
Bad pagetable: 000f [#1] SMP
last sysfs file: /sys/module/ahci/initstate
CPU 1
Modules linked in: ahci nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core dm_mod
Modules linked in: ahci nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core dm_mod
Pid: 1, comm: init Not tainted 2.6.32-71.18.1.el6.x86_64 #1 HP Compaq 6005 Pro MT PC
RIP: 0033:[<00007fab73a04f94>]  [<00007fab73a04f94>] 0x7fab73a04f94

Comment 5 Qian Cai 2011-03-18 03:35:51 UTC
OK, I am fairly confident that if kdump is working, there is not much to worry about for the kexec -l case, since with kexec -l you need to get your command-line options right and avoid the limitations that kdump is set up to work around.
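
On getting the command line right: the console log in the description prints the kernel's own hint, try booting with the "irqpoll" option. A minimal sketch of adding such an option to a command line before passing it to kexec -l --command-line follows. append_opt is a hypothetical helper and the sample command line is a placeholder; whether irqpoll avoids the panic on this host was not tested in this report.

```shell
#!/bin/sh
# Sketch only: append a kernel option to a command line unless it is already there.
# append_opt is a hypothetical helper, not part of kexec-tools.
append_opt() {
    cmdline="$1"   # existing kernel command line
    opt="$2"       # option to add, e.g. irqpoll
    case " $cmdline " in
        *" $opt "*) printf '%s\n' "$cmdline" ;;         # already present: unchanged
        *)          printf '%s %s\n' "$cmdline" "$opt" ;;
    esac
}

# Placeholder command line; on the real host use "$(cat /proc/cmdline)".
append_opt "ro root=/dev/sda1" irqpoll
```

The result would be used in place of "$(cat /proc/cmdline)" in the kexec -l command from the steps to reproduce.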

As far as I am aware, there is no REAL use case for kexec -l among our customers and partners. Even if you found a corner case where kexec on panic works but kexec by direct loading does not, it would likely be low priority, since it is unlikely to be hit by our customers and partners.

So, if it is working, that is great. Otherwise, let's take a few more minutes to reboot, or use virt guests to test new kernels. That is likely a quicker solution than fixing kexec -l alone.

Therefore, I am going to close this as NOTABUG.