Bug 442670 - [5.2][kdump][xen] kdump on Dom0 kernel does not work properly on ibm-x3200m2-01
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Xen Maintainance List
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-16 07:06 UTC by Qian Cai
Modified: 2008-10-22 12:32 UTC
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-10-22 12:32:58 UTC
Target Upstream Version:
Embargoed:



Description Qian Cai 2008-04-16 07:06:43 UTC
Description of problem:
Kdump on the Dom0 kernel does not work properly on
ibm-x3200m2-01.rhts.boston.redhat.com. Each time the kdump service is started on
this box, suspicious messages like the following appear:

printk: 27263 messages suppressed.
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd

I have seen several random failures. The capture kernel can oops:
http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72151

mptbase: ioc0: Initiating bringup
ioc0: LSISAS1064E B1: Capabilities={Initiator}
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
ca847022
*pde = 02302001
Oops: 0002 [#1]
SMP 
last sysfs file: 
Modules linked in: mptsas scsi_transport_sas mptscsih sd_mod scsi_mod mptbase
CPU:    0
EIP:    0060:[<ca847022>]    Not tainted VLI
EFLAGS: 00010206   (2.6.18-89.el5PAE #1) 
EIP is at mpt_findImVolumes+0x4d3/0x525 [mptbase]
eax: 00000000   ebx: c9f11000   ecx: c227a800   edx: c227a88c
esi: c9f11040   edi: c9eec800   ebp: 09f11000   esp: c9e18b04
ds: 007b   es: 007b   ss: 0068
Process exe (pid: 430, ti=c9e18000 task=c9e19aa0 task.ti=c9e18000)
Stack: c9eec800 00000000 c9f12000 0000008c 00000282 c202ddce c9e18b44 ffffffff 
       00100100 00200200 00000000 00200200 fffbb1d4 ca848255 c9eec800 c239d080 
       c9e18c24 09f11000 00000000 00000001 00000000 c2000001 00000000 c9eec800 
Call Trace:
 [<c202ddce>] lock_timer_base+0x15/0x2f
 [<ca848255>] mpt_timer_expired+0x0/0x4e [mptbase]
 [<c202e1d8>] msleep+0x17/0x1c
 [<ca8436c9>] WaitForDoorbellInt+0x37/0x95 [mptbase]
 [<ca843a43>] mpt_handshake_req_reply_wait+0x298/0x3d0 [mptbase]
 [<ca8443c6>] SendIocInit+0x2ce/0x3ba [mptbase]
 [<ca848255>] mpt_timer_expired+0x0/0x4e [mptbase]
 [<ca8477fa>] mpt_do_ioc_recovery+0x786/0x107e [mptbase]
 [<c20e4f50>] __delay+0x6/0x7
 [<c2206f54>] schedule+0x920/0x9cd
 [<c21a3d24>] pci_read+0x1c/0x21
 [<c2021ab6>] __cond_resched+0x16/0x34
 [<c220702b>] cond_resched+0x2a/0x31
 [<c201789b>] smp_call_function+0x23/0xc3
 [<c206494e>] __get_vm_area_node+0xa6/0x165
 [<c2017a86>] do_flush_tlb_all+0x0/0x5a
 [<c202a551>] on_each_cpu+0x17/0x1f
 [<c21a2b47>] pci_conf1_read+0xa4/0xad
 [<c21a3d24>] pci_read+0x1c/0x21
 [<c2038c81>] down_read+0x8/0x11
 [<ca849e3e>] mpt_attach+0xa4e/0xb2e [mptbase]
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<ca86fa52>] mptsas_probe+0x10/0x3fb [mptsas]
 [<c20eda28>] pci_match_device+0x10/0xac
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<c20edb10>] pci_device_probe+0x36/0x57
 [<c214cc00>] driver_probe_device+0x42/0x92
 [<c214ccf1>] __driver_attach+0x44/0x6b
 [<c214c6fe>] bus_for_each_dev+0x37/0x59
 [<c214cb6a>] driver_attach+0x11/0x13
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<c214c406>] bus_add_driver+0x64/0xfd
 [<c20edc35>] __pci_register_driver+0x3e/0x58
 [<ca83b0b5>] mptsas_init+0xb5/0xc9 [mptsas]
 [<c203e859>] sys_init_module+0x18b5/0x1a60
 [<c207ae01>] permission+0xa2/0xb5
 [<ca82af52>] sas_release_transport+0x0/0x47 [scsi_transport_sas]
 [<c200946a>] sys_mmap2+0x99/0xa3
 [<c2004eff>] syscall_call+0x7/0xb
 =======================
Code: 94 24 21 01 00 00 ff b4 24 14 01 00 00 0f 45 c1 ff b4 24 14 01 00 00 89 d9
c1 e2 ff ff ff ff ff ff 00 07 e9 d8 17 98 08 06 00 01 <08> 00 06 04 00 01 00 07
e9 d8 17 98 c0 a8 4f 93 ff ff ff ff ff 
EIP: [<ca847022>] mpt_findImVolumes+0x4d3/0x525 [mptbase] SS:ESP 0068:c9e18b04
 <0>Kernel panic - not syncing: Fatal exception
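
The oops above faults in mpt_findImVolumes in the mptbase module, and the trace suggests this happens while the mptsas driver is being probed (mptsas_init, mptsas_probe, mpt_attach). When comparing oopses from several RHTS runs, it can help to pull the "symbol+offset/size [module]" frames out of the console log mechanically. The following is a minimal, hypothetical triage helper (not part of kexec-tools or RHTS); it assumes the i386 RHEL 5 trace format shown above, and parse_frames, modules_in and Frame are names made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Matches frames such as " [<ca849e3e>] mpt_attach+0xa4e/0xb2e [mptbase]"
# or "EIP is at mpt_findImVolumes+0x4d3/0x525 [mptbase]".
FRAME_RE = re.compile(
    r"(?:\[<(?P<addr>[0-9a-f]+)>\][ \t]+)?"            # optional "[<address>]"
    r"(?P<sym>[A-Za-z_][A-Za-z0-9_.]*)"                # function symbol
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"    # +offset/size in hex
    r"(?:[ \t]+\[(?P<mod>[A-Za-z0-9_]+)\])?"           # optional "[module]"
)


class Frame(NamedTuple):
    symbol: str
    offset: int
    size: int
    module: Optional[str]   # None: symbol lives in the core kernel image
    address: Optional[int]  # None: line had no "[<address>]" prefix


def parse_frames(text: str) -> List[Frame]:
    """Return every symbol+offset/size frame found in an oops excerpt, in order."""
    frames = []
    for m in FRAME_RE.finditer(text):
        addr = int(m.group("addr"), 16) if m.group("addr") else None
        frames.append(Frame(
            symbol=m.group("sym"),
            offset=int(m.group("off"), 16),
            size=int(m.group("size"), 16),
            module=m.group("mod"),
            address=addr,
        ))
    return frames


def modules_in(frames: List[Frame]) -> List[str]:
    """Sorted, de-duplicated list of modules that appear in the frames."""
    return sorted({f.module for f in frames if f.module is not None})


if __name__ == "__main__":
    sample = """EIP is at mpt_findImVolumes+0x4d3/0x525 [mptbase]
 [<c202ddce>] lock_timer_base+0x15/0x2f
 [<ca849e3e>] mpt_attach+0xa4e/0xb2e [mptbase]
 [<ca86fa52>] mptsas_probe+0x10/0x3fb [mptsas]"""
    frames = parse_frames(sample)
    print(frames[0].symbol, hex(frames[0].offset), frames[0].module)
    # prints: mpt_findImVolumes 0x4d3 mptbase
    print(modules_in(frames))
    # prints: ['mptbase', 'mptsas']
```

Note that i386 kernels of this vintage print every stack word that looks like a text address, so entries such as powernow_cpu_init in the soft-lockup trace may be stale; the parsed list is a starting point for triage, not an exact call chain.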

The capture kernel can also hit a soft lockup:
http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72152

BUG: soft lockup - CPU#0 stuck for 10s! [ifconfig:1151]

Pid: 1151, comm:             ifconfig
EIP: 0060:[<c2208701>] CPU: 0
EIP is at _spin_lock_bh+0x12/0x18
 EFLAGS: 00000286    Not tainted  (2.6.18-89.el5PAE #1)
EAX: c89be000 EBX: c9c1c860 ECX: 00000000 EDX: 00203100
ESI: 00000000 EDI: 00000218 EBP: 00000860 DS: 007b ES: 007b
CR0: 80050033 CR2: 08205698 CR3: 0231f9c0 CR4: 000006f0
 [<c21c453e>] rt_run_flush+0x47/0x8f
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c21ebe62>] ip_mc_inc_group+0x168/0x194
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c22f317c>] powernow_cpu_init+0x3b9/0x568
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c21ebec7>] ip_mc_up+0x39/0x4e
 [<c22f3124>] powernow_cpu_init+0x361/0x568
 [<c21e7eac>] inetdev_init+0xe5/0x101
 [<c21e89a5>] devinet_ioctl+0x3a8/0x542
 [<c21a4dd7>] sock_ioctl+0x191/0x1b3
 [<c21a4c46>] sock_ioctl+0x0/0x1b3
 [<c207ecec>] do_ioctl+0x1c/0x5d
 [<c207ef77>] vfs_ioctl+0x24a/0x25c
 [<c207efd1>] sys_ioctl+0x48/0x5f
 [<c2004eff>] syscall_call+0x7/0xb

Even when the capture kernel happens to boot successfully, it may still fail to
save the vmcore to a network target.

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080409.0
kexec-tools-1.102pre-20.el5
kernel-2.6.18-89.el5

How reproducible:
Always

Steps to Reproduce:
1. Reserve intel-s6e5231-01.rhts.boston.redhat.com.
2. Export intel-s6e5231-01.rhts.boston.redhat.com:/mnt as an NFS share with the
   options *(rw,no_root_squash).
3. Run the automated test /kernel/distribution/kexec-tools/net.
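
The automated test itself is not shown in this report. As a rough sketch of the configuration that steps 2 and 3 imply, the helper below renders an /etc/exports entry for the NFS server and an /etc/kdump.conf line for the crashing host. The host name, path and export options come from the steps above; the assumption that the test uses the RHEL 5 kdump.conf "net <server>:<path>" NFS directive, and the function names, are hypothetical.

```python
# Hypothetical helper: renders the two configuration lines implied by the
# reproduction steps. Host, path and options are taken from the bug report;
# using /etc/exports plus the kdump.conf "net" NFS target is an assumption
# about how the automated test sets things up.

NFS_SERVER = "intel-s6e5231-01.rhts.boston.redhat.com"
EXPORT_PATH = "/mnt"


def exports_line(path: str, options: str = "rw,no_root_squash") -> str:
    """Entry for /etc/exports on the NFS server: export `path` to all hosts."""
    return f"{path} *({options})"


def kdump_net_line(server: str, path: str) -> str:
    """Entry for /etc/kdump.conf on the host under test (NFS dump target)."""
    return f"net {server}:{path}"


if __name__ == "__main__":
    print(exports_line(EXPORT_PATH))
    # prints: /mnt *(rw,no_root_squash)
    print(kdump_net_line(NFS_SERVER, EXPORT_PATH))
    # prints: net intel-s6e5231-01.rhts.boston.redhat.com:/mnt
```

After placing these lines, the export would typically be reloaded with "exportfs -ra" on the server and the kdump service restarted on the host under test.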

Comment 1 Qian Cai 2008-04-16 07:11:23 UTC
Sometimes the capture kernel panics as follows:

NET: Registered protocol family 1
NET: Registered protocol family 17
Using IPI No-Shortcut mode
ACPI: (supports S0 S1 S4 S5)
Time: tsc clocksource has been installed.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
crc error
VFS: Cannot open root device "VolGroup00/LogVol00" or unknown-block(0,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72153

Comment 2 Qian Cai 2008-10-22 12:05:11 UTC
I'll close this, as the problem no longer occurs with the latest RHEL 5.3 Beta candidate tree, although no specific fix has been identified.

