Description of problem:
Kdump on the Xen Dom0 kernel does not work properly on ibm-x3200m2-01.rhts.boston.redhat.com. Each time the kdump service is started on this box, suspicious messages like these appear:

printk: 27263 messages suppressed.
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd
4gb seg fixup, process ldd (pid 5538), cs:ip 73:001d53dd

I have seen several random failures.

The capture kernel may oops; see http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72151

mptbase: ioc0: Initiating bringup
ioc0: LSISAS1064E B1: Capabilities={Initiator}
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
ca847022
*pde = 02302001
Oops: 0002 [#1]
SMP
last sysfs file:
Modules linked in: mptsas scsi_transport_sas mptscsih sd_mod scsi_mod mptbase
CPU: 0
EIP: 0060:[<ca847022>] Not tainted VLI
EFLAGS: 00010206 (2.6.18-89.el5PAE #1)
EIP is at mpt_findImVolumes+0x4d3/0x525 [mptbase]
eax: 00000000 ebx: c9f11000 ecx: c227a800 edx: c227a88c
esi: c9f11040 edi: c9eec800 ebp: 09f11000 esp: c9e18b04
ds: 007b es: 007b ss: 0068
Process exe (pid: 430, ti=c9e18000 task=c9e19aa0 task.ti=c9e18000)
Stack: c9eec800 00000000 c9f12000 0000008c 00000282 c202ddce c9e18b44 ffffffff
       00100100 00200200 00000000 00200200 fffbb1d4 ca848255 c9eec800 c239d080
       c9e18c24 09f11000 00000000 00000001 00000000 c2000001 00000000 c9eec800
Call Trace:
 [<c202ddce>] lock_timer_base+0x15/0x2f
 [<ca848255>] mpt_timer_expired+0x0/0x4e [mptbase]
 [<c202e1d8>] msleep+0x17/0x1c
 [<ca8436c9>] WaitForDoorbellInt+0x37/0x95 [mptbase]
 [<ca843a43>] mpt_handshake_req_reply_wait+0x298/0x3d0 [mptbase]
 [<ca8443c6>] SendIocInit+0x2ce/0x3ba [mptbase]
 [<ca848255>] mpt_timer_expired+0x0/0x4e [mptbase]
 [<ca8477fa>] mpt_do_ioc_recovery+0x786/0x107e [mptbase]
 [<c20e4f50>] __delay+0x6/0x7
 [<c2206f54>] schedule+0x920/0x9cd
 [<c21a3d24>] pci_read+0x1c/0x21
 [<c2021ab6>] __cond_resched+0x16/0x34
 [<c220702b>] cond_resched+0x2a/0x31
 [<c201789b>] smp_call_function+0x23/0xc3
 [<c206494e>] __get_vm_area_node+0xa6/0x165
 [<c2017a86>] do_flush_tlb_all+0x0/0x5a
 [<c202a551>] on_each_cpu+0x17/0x1f
 [<c21a2b47>] pci_conf1_read+0xa4/0xad
 [<c21a3d24>] pci_read+0x1c/0x21
 [<c2038c81>] down_read+0x8/0x11
 [<ca849e3e>] mpt_attach+0xa4e/0xb2e [mptbase]
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<ca86fa52>] mptsas_probe+0x10/0x3fb [mptsas]
 [<c20eda28>] pci_match_device+0x10/0xac
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<c20edb10>] pci_device_probe+0x36/0x57
 [<c214cc00>] driver_probe_device+0x42/0x92
 [<c214ccf1>] __driver_attach+0x44/0x6b
 [<c214c6fe>] bus_for_each_dev+0x37/0x59
 [<c214cb6a>] driver_attach+0x11/0x13
 [<c214ccad>] __driver_attach+0x0/0x6b
 [<c214c406>] bus_add_driver+0x64/0xfd
 [<c20edc35>] __pci_register_driver+0x3e/0x58
 [<ca83b0b5>] mptsas_init+0xb5/0xc9 [mptsas]
 [<c203e859>] sys_init_module+0x18b5/0x1a60
 [<c207ae01>] permission+0xa2/0xb5
 [<ca82af52>] sas_release_transport+0x0/0x47 [scsi_transport_sas]
 [<c200946a>] sys_mmap2+0x99/0xa3
 [<c2004eff>] syscall_call+0x7/0xb
 =======================
Code: 94 24 21 01 00 00 ff b4 24 14 01 00 00 0f 45 c1 ff b4 24 14 01 00 00 89 d9 c1 e2 ff ff ff ff ff ff 00 07 e9 d8 17 98 08 06 00 01 <08> 00 06 04 00 01 00 07 e9 d8 17 98 c0 a8 4f 93 ff ff ff ff ff
EIP: [<ca847022>] mpt_findImVolumes+0x4d3/0x525 [mptbase] SS:ESP 0068:c9e18b04
 <0>Kernel panic - not syncing: Fatal exception

The capture kernel may hit a soft lockup; see http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72152

BUG: soft lockup - CPU#0 stuck for 10s! [ifconfig:1151]

Pid: 1151, comm: ifconfig
EIP: 0060:[<c2208701>] CPU: 0
EIP is at _spin_lock_bh+0x12/0x18
 EFLAGS: 00000286 Not tainted (2.6.18-89.el5PAE #1)
EAX: c89be000 EBX: c9c1c860 ECX: 00000000 EDX: 00203100
ESI: 00000000 EDI: 00000218 EBP: 00000860 DS: 007b ES: 007b
CR0: 80050033 CR2: 08205698 CR3: 0231f9c0 CR4: 000006f0
 [<c21c453e>] rt_run_flush+0x47/0x8f
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c21ebe62>] ip_mc_inc_group+0x168/0x194
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c22f317c>] powernow_cpu_init+0x3b9/0x568
 [<c22f30c0>] powernow_cpu_init+0x2fd/0x568
 [<c21ebec7>] ip_mc_up+0x39/0x4e
 [<c22f3124>] powernow_cpu_init+0x361/0x568
 [<c21e7eac>] inetdev_init+0xe5/0x101
 [<c21e89a5>] devinet_ioctl+0x3a8/0x542
 [<c21a4dd7>] sock_ioctl+0x191/0x1b3
 [<c21a4c46>] sock_ioctl+0x0/0x1b3
 [<c207ecec>] do_ioctl+0x1c/0x5d
 [<c207ef77>] vfs_ioctl+0x24a/0x25c
 [<c207efd1>] sys_ioctl+0x48/0x5f
 [<c2004eff>] syscall_call+0x7/0xb

Even if the capture kernel happens to boot successfully, it may still fail to save the vmcore to a network target.

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080409.0
kexec-tools-1.102pre-20.el5
kernel-2.6.18-89.el5

How reproducible:
Always

Steps to Reproduce:
1. Reserve intel-s6e5231-01.rhts.boston.redhat.com.
2. Export intel-s6e5231-01.rhts.boston.redhat.com:/mnt as an NFS share with the options *(rw,no_root_squash).
3. Run the automated test /kernel/distribution/kexec-tools/net.
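For anyone reproducing this by hand instead of through RHTS, steps 2 and 3 roughly correspond to the configuration sketched below. This is an assumption: the report does not say how /kernel/distribution/kexec-tools/net sets up the dump target, so the kdump.conf line is a presumed manual equivalent and is not taken from the test itself.

```
# On the NFS server (intel-s6e5231-01.rhts.boston.redhat.com), /etc/exports:
/mnt *(rw,no_root_squash)
# Reload the export table afterwards with: exportfs -ra

# On the Dom0 box under test, /etc/kdump.conf (assumed equivalent of the test's setup):
net intel-s6e5231-01.rhts.boston.redhat.com:/mnt
# Restart the kdump service afterwards so it picks up the change: service kdump restart
```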
Sometimes the capture kernel panics while mounting the root filesystem; see http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=72153

NET: Registered protocol family 1
NET: Registered protocol family 17
Using IPI No-Shortcut mode
ACPI: (supports<6>Time: tsc clocksource has been installed. S0 S1 S4 S5)
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
crc error
VFS: Cannot open root device "VolGroup00/LogVol00" or unknown-block(0,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
I'll close this, as the problem no longer reproduces with the latest RHEL 5.3 Beta candidate tree; it appears to have been fixed as a side effect of other changes.