Bug 594505 - Install Kernel Panics on Dell PowerEdge 1750
Summary: Install Kernel Panics on Dell PowerEdge 1750
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: i386
OS: Linux
Target Milestone: rc
Assignee: Shyam Iyer
QA Contact: Red Hat Kernel QE team
Depends On:
Reported: 2010-05-20 21:05 UTC by Joseph Mann
Modified: 2015-04-28 04:18 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2010-07-15 21:26:53 UTC
Target Upstream Version:

Attachments
core dump (31 bytes, text/plain)
2010-05-23 04:00 UTC, Emanuel Rietveld
Wait for 15s on Doorbell ack to detect IOC READY State (516 bytes, patch)
2010-07-09 16:39 UTC, Shyam Iyer

Description Joseph Mann 2010-05-20 21:05:53 UTC
Description of problem:
A kernel panic occurs during the install of RHEL6 Snap 4 on a Dell PowerEdge 1750. The issue has been observed on at least two servers.

Version-Release number of selected component (if applicable):

How reproducible:
Attempt an install on this type of server

Actual results:
Kernel Panic

Expected results:
No Kernel Panic

Additional info:
Here is the kernel trace:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<e0ad1a16>] SendIocReset+0x46/0x110 [mptbase]
*pdpt = 000000001a212001 *pde = 000000001a208067 *pte = 0000000000000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/module/sr_mod/initstate
Modules linked in: sg(U) sr_mod(U) cdrom(U) lpfc(U) mptspi(U) mptscsih(U) ata_g)

Pid: 287, comm: scsi_scan_3 Not tainted (2.6.32-25.el6.i686 #1) PowerEdge 1750  
EIP: 0060:[<e0ad1a16>] EFLAGS: 00010246 CPU: 1
EIP is at SendIocReset+0x46/0x110 [mptbase]
EAX: 00000000 EBX: df099000 ECX: de07c000 EDX: 00000000
ESI: 20000000 EDI: 00000001 EBP: 00001389 ESP: df017b60
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process scsi_scan_3 (pid: 287, ti=df016000 task=df8a9560 task.ti=df016000)
 df8a9560 fffbd320 00000286 00000000 df099000 00000001 000003e7 00000001
<0> e0ad2157 00000000 00000000 00000000 00000000 92928b80 00000000 df099000
<0> 20000000 00000003 df099000 00000000 00000001 e0ad2449 e0adb9e4 df099008
Call Trace:
 [<e0ad2157>] ? KickStart+0x677/0x8f0 [mptbase]
 [<e0ad2449>] ? MakeIocReady+0x79/0x380 [mptbase]
 [<e0ad3469>] ? mpt_do_ioc_recovery+0x299/0x18d0 [mptbase]
 [<c043fe29>] ? finish_task_switch+0x39/0xa0
 [<c0809cb3>] ? schedule+0x433/0xad0
 [<c0471efb>] ? up+0xb/0x40
 [<c044da7e>] ? release_console_sem+0x19e/0x1f0
 [<c045c860>] ? process_timeout+0x0/0x10
 [<e0b4ac57>] ? mptspi_ioc_reset+0x17/0x50 [mptspi]
 [<e0ad4b53>] ? mpt_HardResetHandler+0xb3/0x220 [mptbase]
 [<e0ad5220>] ? mpt_config+0x380/0x550 [mptbase]
 [<e0b4bb50>] ? mptspi_write_spi_device_pg1+0x160/0x450 [mptspi]
 [<c0449dd5>] ? check_preempt_wakeup+0x285/0x380
 [<c05dd4d4>] ? vsnprintf+0xd4/0x400
 [<e0b4be9d>] ? mptspi_write_width+0x5d/0x70 [mptspi]
 [<e0b4c020>] ? mptspi_target_alloc+0x170/0x270 [mptspi]
 [<c069c791>] ? attribute_container_add_device+0x51/0x180
 [<c06b0e28>] ? scsi_alloc_target+0x248/0x2b0
 [<c06b1df6>] ? __scsi_scan_target+0x66/0x6d0
 [<c0441187>] ? update_curr+0x167/0x2c0
 [<c04493f9>] ? dequeue_entity+0x1a9/0x200
 [<c0408147>] ? __switch_to+0xd7/0x1a0
 [<c06b24d7>] ? scsi_scan_channel+0x77/0x90
 [<c06b25d1>] ? scsi_scan_host_selected+0xe1/0x170
 [<c06b26d6>] ? do_scsi_scan_host+0x76/0x80
 [<c06b26f1>] ? do_scan_async+0x11/0x120
 [<c043b330>] ? complete+0x40/0x60
 [<c06b26e0>] ? do_scan_async+0x0/0x120
 [<c046d154>] ? kthread+0x74/0x80
 [<c046d0e0>] ? kthread+0x0/0x80
 [<c040a4a7>] ? kernel_thread_helper+0x7/0x10
Code: f2 8b 83 e8 00 00 00 c1 e6 18 89 30 89 fa 89 d8 e8 a0 e7 ff ff 31 ed 85 c 
EIP: [<e0ad1a16>] SendIocReset+0x46/0x110 [mptbase] SS:ESP 0068:df017b60
CR2: 0000000000000000
---[ end trace c82a65b75ffe35ff ]---
Kernel panic - not syncing: Fatal exception

Comment 2 Emanuel Rietveld 2010-05-23 04:00:31 UTC
Created attachment 415914 [details]
core dump

I have the same problem on a Dell PowerEdge 2600. These are the steps I took to produce the attached crash dump:

1) Install CentOS 5.3 on the Dell PowerEdge 2600
2) yum install kexec-tools
3) service kdump start
4) Download the RHEL 6 beta PXE images
5) Modify the RHEL 6 beta initrd:
  * add the CentOS 5.3 vmlinuz
  * add the CentOS 5.3 kdump initrd
  * remove the init in the root of the initrd
  * create a new init that says
      mount -t proc /proc /proc
      kexec -p vmlinuz-kdump --initrd=initrd-kdump.img
      umount /proc
      exec /sbin/init
6) Modify grub.conf to boot the RHEL 6 beta kernel and the modified initrd, with the kernel command-line argument crashkernel=128M
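
For anyone repeating step 5, the replacement init could be generated roughly as follows. This is only a sketch: the helper name write_kdump_init is hypothetical, and the "#!/bin/sh" first line of the generated file is an assumption, since the steps above list only the four commands. The file names vmlinuz-kdump and initrd-kdump.img are taken from the steps as written.

```shell
#!/bin/sh
# Sketch only: writes the replacement init described in step 5.
# write_kdump_init is a hypothetical helper, not part of kexec-tools.
write_kdump_init() {
    target="$1"
    # Quoted heredoc delimiter: the body is written literally, no expansion.
    cat > "$target" <<'EOF'
#!/bin/sh
# (assumed shebang; the original steps list only the commands below)
mount -t proc /proc /proc
# Load the CentOS 5.3 kernel and kdump initrd as the panic kernel
kexec -p vmlinuz-kdump --initrd=initrd-kdump.img
umount /proc
# Hand control to the regular init
exec /sbin/init
EOF
    # The kernel can only run init if it is executable
    chmod 0755 "$target"
}

# Example (run from the root of the unpacked initrd):
#   write_kdump_init ./init
```

Note that kexec -p only loads a kernel to be started on a panic; it relies on the memory reserved by the crashkernel= argument from step 6, so the two steps have to be used together.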

A related bug, perhaps? https://bugzilla.redhat.com/show_bug.cgi?id=544220

If you need any more information let me know.

Comment 3 Prarit Bhargava 2010-06-02 18:46:58 UTC
We don't support this type of install.


Comment 4 Prarit Bhargava 2010-06-02 18:57:28 UTC
Oops.  Sorry Joseph, I mixed up comments #3 and #1 -- my bad.


Comment 5 RHEL Program Management 2010-06-07 15:54:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for

Comment 6 Emanuel Rietveld 2010-06-07 21:57:28 UTC
Just to clarify, I originally experienced the problem when attempting a DVD install. The PXE procedure is only how I captured that core dump. If it helps, I can try the same procedure with the initrd and vmlinuz from the DVD.

Comment 7 Joseph Mann 2010-06-14 16:08:00 UTC
The issue still exists with RHEL6 Snap6 (2.6.32-30.el6.i686), with the same NULL pointer dereference error.

Comment 8 Emanuel Rietveld 2010-06-17 13:19:50 UTC
As a temporary workaround, you can use the upstream driver.

I have been able to install RHEL 6 beta on a Dell PowerEdge 2600 by replacing the modules in the PXE initrd.img, and once again on disk after installation. The system has since been stable.

However, please be very careful when using this driver, as it has not had much testing so far as I know.

Comment 9 Shyam Iyer 2010-07-09 16:39:35 UTC
Created attachment 430717 [details]
Wait for 15s on Doorbell ack to detect IOC READY State

This patch should fix the issue.

I will try to get a test kernel rpm.

Comment 10 RHEL Program Management 2010-07-15 13:59:12 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 11 Joseph Mann 2010-07-15 19:48:42 UTC
This issue appears to have been fixed in Snap 7.
Feel free to close this bug.

