Bug 442670

Summary: [5.2][kdump][xen] kdump on Dom0 Kernel not work properly on ibm-x3200m2-01
Product: Red Hat Enterprise Linux 5 Reporter: Qian Cai <qcai>
Component: kernel-xen Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED NOTABUG QA Contact: Martin Jenner <mjenner>
Severity: low Docs Contact:
Priority: low    
Version: 5.2   
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-10-22 12:32:58 UTC Type: ---

Description Qian Cai 2008-04-16 07:06:43 UTC
Description of problem:
Kdump on the Dom0 kernel does not work properly on
ibm-x3200m2-01.rhts.boston.redhat.com. Each time the kdump service is started
on this box, the following suspicious messages appear:

printk: 27263 messages suppressed.
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
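
To gauge how noisy these messages are, the excerpt above can be tallied per process. The following is only a minimal sketch and not part of the original report: the function name `summarize_fixups` and the `SAMPLE` text are hypothetical, and the regular expressions assume the message format is exactly as shown above.

```python
import re
from collections import Counter

# Assumed formats, taken from the console excerpt in this report.
FIXUP_RE = re.compile(
    r"4gb seg fixup, process (?P<comm>\S+) \(pid (?P<pid>\d+)\), "
    r"cs:ip (?P<cs>[0-9a-f]+):(?P<ip>[0-9a-f]+)"
)
SUPPRESSED_RE = re.compile(r"printk: (?P<n>\d+) messages suppressed\.")


def summarize_fixups(log_text):
    """Count logged fixup messages per (process, pid) and sum the
    number of messages that printk reported as suppressed."""
    counts = Counter()
    suppressed = 0
    for line in log_text.splitlines():
        match = FIXUP_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
            continue
        match = SUPPRESSED_RE.search(line)
        if match:
            suppressed += int(match.group("n"))
    return counts, suppressed


# Hypothetical sample: the four lines quoted in the description.
SAMPLE = """\
printk: 27263 messages suppressed.
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
"""

if __name__ == "__main__":
    counts, suppressed = summarize_fixups(SAMPLE)
    for (comm, pid), logged in counts.items():
        print(f"{comm} (pid {pid}): {logged} logged")
    print(f"suppressed by printk: {suppressed}")
```

The suppressed total is kept separate from the per-process counts because the printk line does not say which process the dropped messages belonged to.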

I have seen several random failures. The capture kernel can Oops:
http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72151

mptbase: ioc0: Initiating bringup
ioc0: LSISAS1064E B1: Capabilities={Initiator}
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
ca847022
*pde = 02302001
Oops: 0002 [#1]
SMP 
last sysfs file: 
Modules linked in: mptsas scsi_transport_sas mptscsih sd_mod scsi_mod mptbase
CPU:    0
EIP:    0060:[<ca847022>]    Not tainted VLI
EFLAGS: 00010206   (2.6.18-89.el5PAE #1) 
EIP is at mpt_findImVolumes+0x4d3/0x525 [mptbase]
eax: 00000000   ebx: c9f11000   ecx: c227a800   edx: c227a88c
esi: c9f11040   edi: c9eec800   ebp: 09f11000   esp: c9e18b04
ds: 007b   es: 007b   ss: 0068
Process exe (pid: 430, ti=c9e18000 task=c9e19aa0 task.ti=c9e18000)
Stack: c9eec800 00000000 c9f12000 0000008c 00000282 c202ddce c9e18b44 ffffffff 
       00100100 00200200 00000000 00200200 fffbb1d4 ca848255 c9eec800 c239d080 
       c9e18c24 09f11000 00000000 00000001 00000000 c2000001 00000000 c9eec800 
Call Trace:
 [<c202ddce>] lock_timer_base+0x15/0x2f
 [<ca848255>] mpt_timer_expired+0x0/0x4e [mptbase]
 [<c202e1d8>] msleep+0x17/0x1c
 [<ca8436c9>] WaitForDoorbellInt+0x37/0x95 [mptbase]
 [<ca843a43>] mpt_handshake_req_reply_wait+0x298/0x3d0 [mptbase]
 [<ca8443c6>] SendIocInit+0x2ce/0x3ba [mptbase]
 [<ca848255>] mpt_timer_expired+0x0/0x4e [mptbase]
 [<ca8477fa>] mpt_do_ioc_recovery+0x786/0x107e [mptbase]
 [<c20e4f50>] __delay+0x6/0x7
 [<c2206f54>] schedule+0x920/0x9cd
 [<c21a3d24>] pci_read+0x1c/0x21
 [<c2021ab6>] __cond_resched+0x16/0x34
 [<c220702b>] cond_resched+0x2a/0x31
 [<c201789b>] smp_call_function+0x23/0xc3
 [<c206494e>] __get_vm_area_node+0xa6/0x165
 [<c2017a86>] do_flush_tlb_all+0x0/0x5a
 [<c202a551>] on_each_cpu+0x17/0x1f
 [<c21a2b47>] pci_conf1_read+0xa4/0xad
 [<c21a3d24>] pci_read+0x1c/0x21
 [<c2038c81>] down_read+0x8/0x11
 [<ca849e3e>] mpt_attach+0xa4e/0xb2e [mptbase]
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<ca86fa52>] mptsas_probe+0x10/0x3fb [mptsas]
 [<c20eda28>] pci_match_device+0x10/0xac
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<c20edb10>] pci_device_probe+0x36/0x57
 [<c214cc00>] driver_probe_device+0x42/0x92
 [<c214ccf1>] __driver_attach+0x44/0x6b
 [<c214c6fe>] bus_for_each_dev+0x37/0x59
 [<c214cb6a>] driver_attach+0x11/0x13
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<c214c406>] bus_add_driver+0x64/0xfd
 [<c20edc35>] __pci_register_driver+0x3e/0x58
 [<ca83b0b5>] mptsas_init+0xb5/0xc9 [mptsas]
 [<c203e859>] sys_init_module+0x18b5/0x1a60
 [<c207ae01>] permission+0xa2/0xb5
 [<ca82af52>] sas_release_transport+0x0/0x47 [scsi_transport_sas]
 [<c200946a>] sys_mmap2+0x99/0xa3
 [<c2004eff>] syscall_call+0x7/0xb
 =======================
Code: 94 24 21 01 00 00 ff b4 24 14 01 00 00 0f 45 c1 ff b4 24 14 01 00 00 89 d9
c1 e2 ff ff ff ff ff ff 00 07 e9 d8 17 98 08 06 00 01 <08> 00 06 04 00 01 00 07
e9 d8 17 98 c0 a8 4f 93 ff ff ff ff ff 
EIP: [<ca847022>] mpt_findImVolumes+0x4d3/0x525 [mptbase] SS:ESP 0068:c9e18b04
 <0>Kernel panic - not syncing: Fatal exception

The capture kernel can also hit a soft lockup:
http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72152

BUG: soft lockup - CPU#0 stuck for 10s! [ifconfig:1151]

Pid: 1151, comm:             ifconfig
EIP: 0060:[<c2208701>] CPU: 0
EIP is at _spin_lock_bh+0x12/0x18
 EFLAGS: 00000286    Not tainted  (2.6.18-89.el5PAE #1)
EAX: c89be000 EBX: c9c1c860 ECX: 00000000 EDX: 00203100
ESI: 00000000 EDI: 00000218 EBP: 00000860 DS: 007b ES: 007b
CR0: 80050033 CR2: 08205698 CR3: 0231f9c0 CR4: 000006f0
 [<c21c453e>] rt_run_flush+0x47/0x8f
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c21ebe62>] ip_mc_inc_group+0x168/0x194
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c22f317c>] powernow_cpu_init+0x3b9/0x568
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c21ebec7>] ip_mc_up+0x39/0x4e
 [<c22f3124>] powernow_cpu_init+0x361/0x568
 [<c21e7eac>] inetdev_init+0xe5/0x101
 [<c21e89a5>] devinet_ioctl+0x3a8/0x542
 [<c21a4dd7>] sock_ioctl+0x191/0x1b3
 [<c21a4c46>] sock_ioctl+0x0/0x1b3
 [<c207ecec>] do_ioctl+0x1c/0x5d
 [<c207ef77>] vfs_ioctl+0x24a/0x25c
 [<c207efd1>] sys_ioctl+0x48/0x5f
 [<c2004eff>] syscall_call+0x7/0xb

Even if the capture kernel happens to boot successfully, it may still fail to
save the vmcore to a network target.

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080409.0
kexec-tools-1.102pre-20.el5
kernel-2.6.18-89.el5

How reproducible:
Always

Steps to Reproduce:
1. Reserve intel-s6e5231-01.rhts.boston.redhat.com.
2. Export intel-s6e5231-01.rhts.boston.redhat.com:/mnt as an NFS share with
options *(rw,no_root_squash).
3. Run the automated test /kernel/distribution/kexec-tools/net.

Comment 1 Qian Cai 2008-04-16 07:11:23 UTC
Sometimes the capture kernel panics with:

NET: Registered protocol family 1
NET: Registered protocol family 17
Using IPI No-Shortcut mode
ACPI: (supports<6>Time: tsc clocksource has been installed.
 S0 S1 S4 S5)
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
crc error
VFS: Cannot open root device "VolGroup00/LogVol00" or unknown-block(0,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72153
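
Because the failure differs from run to run, it can help to sort saved console logs by failure mode. The sketch below is an illustration only and not something used in the original testing: the names `SIGNATURES` and `classify_console_log` are hypothetical, and the patterns are assumed from the three console excerpts quoted in this report (NULL pointer Oops, soft lockup, root filesystem mount panic).

```python
import re

# Failure signatures assumed from the console excerpts in this report.
SIGNATURES = [
    ("null-deref oops",
     re.compile(r"BUG: unable to handle kernel NULL pointer dereference")),
    ("soft lockup",
     re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s!")),
    ("root fs mount panic",
     re.compile(r"Kernel panic - not syncing: VFS: Unable to mount root fs")),
]

# Matches e.g. "EIP is at mpt_findImVolumes+0x4d3/0x525 [mptbase]".
# The trailing "[module]" part is optional (absent for built-in code).
EIP_RE = re.compile(
    r"EIP is at (?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def classify_console_log(text):
    """Return (list of matched failure modes, (symbol, module) or None)."""
    modes = [name for name, pattern in SIGNATURES if pattern.search(text)]
    match = EIP_RE.search(text)
    location = (match.group("sym"), match.group("mod")) if match else None
    return modes, location


if __name__ == "__main__":
    oops = (
        "BUG: unable to handle kernel NULL pointer dereference "
        "at virtual address 00000000\n"
        "EIP is at mpt_findImVolumes+0x4d3/0x525 [mptbase]\n"
        " <0>Kernel panic - not syncing: Fatal exception\n"
    )
    print(classify_console_log(oops))
```

Only the first `EIP is at` line is reported, which is sufficient for logs like the ones here where each run contains a single failure.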

Comment 2 Qian Cai 2008-10-22 12:05:11 UTC
I'll close this, as it has been fixed magically in the latest RHEL 5.3 Beta candidate tree.