Red Hat Bugzilla – Bug 144111
SMP kernel hangs when loading sym53c8xx.ko module
Last modified: 2015-01-04 17:14:41 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Description of problem:
Fedora Core 3 hangs when trying to boot the SMP kernel somewhere in
the SCSI initialization process. The non-SMP kernel boots with no
problem. This system (a dual Xeon HP Kayak XU) used to run the RedHat
7.2 SMP kernel with no problem but I've changed the SCSI device chain
since then (added an external disk box) so I'm not sure how useful
that information is.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot and allow grub to start the default (SMP) kernel
Actual Results: The system hangs
Expected Results: The boot process should have completed
Created attachment 109334
dmesg output from non-SMP boot
Created attachment 109335
Contents of /proc/pci
Created attachment 109336
Contents of /proc/scsi/scsi
Created attachment 109337
Hand copied output from hung SMP kernel
This looks like it could be a duplicate of bug 144246.
I'm having almost the same problem, but my SMP system (dual 700MHz
Pentium 3) with a Symbios 53c895 boots up to the text login prompt and
then gets stuck in a loop spitting out these messages (hand copied):
sym0:0:0: ABORT operation started.
sym0:0:0: ABORT operation timed-out.
sym0:0:1: ABORT operation started.
sym0:0:1: ABORT operation timed-out.
X never starts, and though I can type a username at the login: prompt,
I never get a Password: prompt, just the above messages continuously.
My boot disks (software raid1) are both on the sym53c895 scsi bus.
(Paul: Could be. Very hard for me to say.)
I took the opportunity of rebooting to try the SMP kernel without the
external disk box connected, but it still hangs so it appears that
this is a regression from RedHat 7.2 after all.
I am seeing this as well on an old dual P3 733MHz VA Linux box.
I honestly don't understand why this is marked as a kernel issue when killing
the haldaemon serves as a workaround. Seems like a haldaemon bug to me.
> I honestly don't understand why this is marked as a kernel issue when killing
> the haldaemon serves as a workaround. Seems like a haldaemon bug to me.
What makes you think that killing hald solves the problem? Plus, how do you do
that when the system hangs on boot?
Remove haldaemon from all run level startups, reboot, problem goes away. I've
verified this on two different systems.
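For anyone who wants to confirm the workaround is really in place before rebooting, here is a minimal Python sketch that lists the runlevels in which a service still has a start link. It assumes the usual SysV init layout (`/etc/rc.d/rcN.d/SNNname`); the function name `enabled_runlevels` and its `rc_root` parameter are made up for this illustration and do not come from the report.

```python
import os
import re


def enabled_runlevels(service, rc_root="/etc/rc.d"):
    """Return the sorted runlevels whose rcN.d directory holds an S??<service> link."""
    # Start links look like S98haldaemon; K?? links (kill) do not count.
    start_link = re.compile(r"^S\d\d" + re.escape(service) + r"$")
    levels = []
    for level in range(7):  # runlevels 0 through 6
        rc_dir = os.path.join(rc_root, "rc%d.d" % level)
        if not os.path.isdir(rc_dir):
            continue
        if any(start_link.match(name) for name in os.listdir(rc_dir)):
            levels.append(level)
    return levels


if __name__ == "__main__":
    # An empty list means haldaemon is no longer started in any runlevel.
    print(enabled_runlevels("haldaemon"))
```

On Fedora, `chkconfig haldaemon off` is the usual way to turn the start links into kill links; by default it only touches runlevels 2 through 5, so a check like this can be a useful sanity test.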
I don't have the references at the moment, but a short time ago I did a little
googling around and did find some discussion of this problem on lkml and maybe
elsewhere. It seems it's been around for some time but has only recently become
more serious due to haldaemon. As best as I can remember, it has to do with
buggy hardware: on certain SCSI cards, merely *reading* some registers,
believe it or not, triggers a parity error.
Sorry I don't have more details, but from what I can remember, this problem is
not uniquely caused by hal and very likely cannot be fixed in that code. I don't
know enough about the details of how it can be fixed in the kernel, but I can't
think of anywhere else that problem could possibly be addressed.
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem. Please update to this new kernel, and
report whether or not it fixes your problem.
If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.
It no longer hangs in FC4. Instead, it now panics:
sym0: <896> rev 0x7 at pci 0000:01:05.0 irq 145
sym0: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: open drain IRQ line driver
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.0
Vendor: HITACHI Model: DK32CJ-36MW Rev: JBBB
Type: Direct-Access ANSI SCSI revision: 03
target0:0:0: tagged command queuing enabled, command queue depth 16.
target0:0:0: Beginning Domain Validation
Debug: sleeping function called from invalid context at mm/slab.c:2126
[<f887cf31>] sym_alloc_lcb_tags+0x6a/0x106 [sym53c8xx]
[<f887c9b6>] sym_get_ccb+0x272/0x282 [sym53c8xx]
[<f88381b7>] scsi_get_command+0xb2/0xc5 [scsi_mod]
[<f8874bbb>] sym_queue_command+0x89/0xd3 [sym53c8xx]
[<f8874f60>] sym53c8xx_queue_command+0x64/0x6c [sym53c8xx]
[<f8838782>] scsi_dispatch_cmd+0x170/0x31a [scsi_mod]
[<f883e4e1>] scsi_request_fn+0x1d5/0x3ba [scsi_mod]
[<f883d22b>] scsi_insert_special_req+0x2b/0x32 [scsi_mod]
[<f883d46b>] scsi_wait_req+0x82/0xb3 [scsi_mod]
[<f883d36a>] scsi_wait_done+0x0/0x66 [scsi_mod]
[<f881c0cb>] spi_wait_req+0x3b/0x64 [scsi_transport_spi]
[<f881d23a>] spi_dv_device_compare_inquiry+0xb9/0x102 [scsi_transport_spi]
[<f881d4e6>] spi_dv_device_internal+0x6a/0x280 [scsi_transport_spi]
[<f881d7db>] spi_dv_device+0xdf/0x14a [scsi_transport_spi]
[<f8875610>] sym53c8xx_slave_configure+0xb3/0xf0 [sym53c8xx]
[<f883f996>] scsi_add_lun+0x19e/0x32a [scsi_mod]
[<f883fc8a>] scsi_probe_and_add_lun+0x168/0x230 [scsi_mod]
[<f884046c>] scsi_scan_target+0xc3/0x13b [scsi_mod]
[<f8840564>] scsi_scan_channel+0x80/0x95 [scsi_mod]
[<f884061f>] scsi_scan_host_selected+0xa6/0x11e [scsi_mod]
[<f88394ec>] scsi_add_host+0x185/0x1a7 [scsi_mod]
[<f88406b8>] scsi_scan_host+0x21/0x25 [scsi_mod]
[<f8876936>] sym2_probe+0xe0/0xfd [sym53c8xx]
[<f880d028>] sym2_init+0x28/0x40 [sym53c8xx]
WIDTH IS 1
target0:0:0: wide asynchronous.
target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
This was on:
Linux version 2.6.12-1.1398_FC4smp (firstname.lastname@example.org) (gcc
version 4.0.0 20050519 (Red Hat 4.0.0-8)) #1 SMP Fri Jul 15 01:30:13 EDT 2005
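To make the call path in a trace like the one above easier to read, here is a small Python sketch that parses the frames and collapses them into the sequence of modules involved, outermost caller first. The helper names `parse_frames` and `module_path` and the `vmlinux` label for frames without a module are made up for this illustration. The sample frames are a subset copied from the trace in this comment.

```python
import re

# One frame of the call trace: [<address>] symbol+0xoffset/0xsize [module]
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def parse_frames(text):
    """Return (symbol, module) pairs in trace order, innermost call first."""
    # Frames with no [module] suffix are labelled "vmlinux" (an assumed label).
    return [(m.group(2), m.group(3) or "vmlinux") for m in FRAME.finditer(text)]


def module_path(frames):
    """Collapse consecutive frames from the same module, outermost caller first."""
    path = []
    for _symbol, module in reversed(frames):
        if not path or path[-1] != module:
            path.append(module)
    return path


# A few frames copied from the trace in this comment.
TRACE = """\
 [<f887cf31>] sym_alloc_lcb_tags+0x6a/0x106 [sym53c8xx]
 [<f8874f60>] sym53c8xx_queue_command+0x64/0x6c [sym53c8xx]
 [<f8838782>] scsi_dispatch_cmd+0x170/0x31a [scsi_mod]
 [<f881d7db>] spi_dv_device+0xdf/0x14a [scsi_transport_spi]
 [<f8875610>] sym53c8xx_slave_configure+0xb3/0xf0 [sym53c8xx]
 [<f88406b8>] scsi_scan_host+0x21/0x25 [scsi_mod]
 [<f8876936>] sym2_probe+0xe0/0xfd [sym53c8xx]
"""

if __name__ == "__main__":
    print(" -> ".join(module_path(parse_frames(TRACE))))
```

Read this way, the warning is raised in `sym_alloc_lcb_tags` inside the sym53c8xx driver, reached from the driver's own probe via domain validation in scsi_transport_spi.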
2.6.12-1.1372_FC3 also hangs for me even with acpi turned off (which works as a
workaround for 2.6.11-1.35_FC3). Hand copied output:
dm_snapshot: Unknown symbol dm_vcalloc
dm_snapshot: Unknown symbol dm_table_get_mode
dm_snapshot: Unknown symbol dm_get_device
Reading all physical volumes. This may take a while...
insmod: error inserting 'insmod: error inserting '/lib/dm-zero.ko': -1 Unknown symbol in module
/lib/dm-mirror.ko': -1 Unknown symbol in module
insmod: error inserting '/lib/dm-snapshot.ko': -1 Unknown symbol in module
Activating local volumes
Making device nodes
ERROR: /bin/lvm exited abnormally!
Creating root device
Mounting root filesystem
mount: error 19 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
This doesn't look to be related, but it means I can't test.
Aargh, that should be 2.6.12-1.1372_FC3smp and 2.6.11-1.35_FC3smp, sorry.
I am seeing the same error message with 2.6.12-1.1398_FC4smp as in
except that my system is an old Dual PII/266 with
sym0: <895> rev 0x1 at pci 0000:00:09.0 irq 177
sym0: Tekram NVRAM, ID 7, Fast-40, LVD, NO parity
sym0: SCSI BUS has been reset.
Comment 13 & Comment 16: That bug was fixed upstream in 2.6.13-rc4 (or maybe
-rc3); see http://bugzilla.kernel.org/show_bug.cgi?id=4786. It will probably
take a while to make it to the Fedora kernel.
Comment 14 & Comment 15: That bug was fixed a couple days ago. See bug 163407.
*** Bug 164995 has been marked as a duplicate of this bug. ***
Fixed in CVS, will be in the next build.