Bug 144111 - SMP kernel hangs when loading sym53c8xx.ko module
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Dave Jones
QA Contact: Brian Brock
Duplicates: 164995
 
Reported: 2005-01-04 12:09 EST by J. Ali Harlow
Modified: 2015-01-04 17:14 EST
Doc Type: Bug Fix
Last Closed: 2005-08-29 21:47:31 EDT


Attachments
dmesg output from non-SMP boot (15.05 KB, text/plain)
2005-01-04 12:11 EST, J. Ali Harlow
Contents of /proc/pci (2.66 KB, text/plain)
2005-01-04 12:11 EST, J. Ali Harlow
Contents of /proc/scsi/scsi (1.10 KB, text/plain)
2005-01-04 12:12 EST, J. Ali Harlow
Hand copied output from hung SMP kernel (837 bytes, text/plain)
2005-01-04 12:13 EST, J. Ali Harlow
Description J. Ali Harlow 2005-01-04 12:09:57 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
Fedora Core 3 hangs when trying to boot the SMP kernel somewhere in
the SCSI initialization process. The non-SMP kernel boots with no
problem. This system (a dual Xeon HP Kayak XU) used to run the RedHat
7.2 SMP kernel with no problem but I've changed the SCSI device chain
since then (added an external disk box) so I'm not sure how useful
that information is.

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.724_FC3

How reproducible:
Always

Steps to Reproduce:
1. Boot and allow grub to start the default (SMP) kernel

Actual Results:  The system hangs

Expected Results:  The boot process should have completed

Additional info:
Comment 1 J. Ali Harlow 2005-01-04 12:11:04 EST
Created attachment 109334
dmesg output from non-SMP boot
Comment 2 J. Ali Harlow 2005-01-04 12:11:55 EST
Created attachment 109335
Contents of /proc/pci
Comment 3 J. Ali Harlow 2005-01-04 12:12:29 EST
Created attachment 109336
Contents of /proc/scsi/scsi
Comment 4 J. Ali Harlow 2005-01-04 12:13:35 EST
Created attachment 109337
Hand copied output from hung SMP kernel
Comment 5 Paul Iadonisi 2005-01-06 08:29:25 EST
This looks like it could be a duplicate of bug 144246.

I'm having almost the same problem, but my SMP system (dual 700MHz
Pentium 3) with a Symbios 53c895 boots up to the text login prompt and
then gets stuck in a loop spitting out these messages (hand copied):

sym0:0:0: ABORT operation started.
sym0:0:0: ABORT operation timed-out.
sym0:0:1: ABORT operation started.
sym0:0:1: ABORT operation timed-out.

X never starts, and though I can type a username at the login: prompt,
I never get the Password: prompt, just the above messages continuously.

My boot disks (software raid1) are both on the sym53c895 scsi bus.
Comment 6 J. Ali Harlow 2005-01-06 12:41:26 EST
(Paul: Could be. Very hard for me to say.)

I took the opportunity of rebooting to try the SMP kernel without the
external disk box connected, but it still hangs, so it appears that
this is a regression from RedHat 7.2 after all.
Comment 7 Tom Duffy 2005-03-08 14:30:24 EST
I am seeing this as well on an old dual P3 733MHz VA Linux box.
Comment 8 Lonni J Friedman 2005-03-08 14:50:47 EST
I honestly don't understand why this is marked as a kernel issue when killing
the haldaemon serves as a workaround.  Seems like a haldaemon bug to me.
Comment 9 J. Ali Harlow 2005-03-09 06:18:22 EST
> I honestly don't understand why this is marked as a kernel issue when killing
> the haldaemon serves as a workaround.  Seems like a haldaemon bug to me.

What makes you think that killing hald solves the problem? Plus, how do you do
that when the system hangs on boot?
Comment 10 Lonni J Friedman 2005-03-09 09:05:43 EST
Remove haldaemon from all run level startups, reboot, problem goes away.  I've
verified this on two different systems.
Comment 11 Paul Iadonisi 2005-03-09 11:54:38 EST
  I don't have the references at the moment, but a short time ago I did a little
googling around and found some discussion of this problem on lkml and maybe
elsewhere.  It seems it's been around for some time but has only recently become
more serious due to haldaemon.  As best as I can remember, it has to do with
buggy hardware: on certain SCSI cards, merely *reading* some registers, believe
it or not, triggers a parity error.
  Sorry I don't have more details, but from what I can remember, this problem is
not uniquely caused by hal and very likely cannot be fixed in that code.  I don't
know enough about the details of how it can be fixed in the kernel, but I can't
think of anywhere else that the problem could possibly be addressed.
Comment 12 Dave Jones 2005-07-15 15:39:01 EDT
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.
Comment 13 Tom Duffy 2005-07-15 16:26:21 EDT
It no longer hangs in FC4.  Instead, it now panics:

sym0: <896> rev 0x7 at pci 0000:01:05.0 irq 145
sym0: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: open drain IRQ line driver
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.0
  Vendor: HITACHI   Model: DK32CJ-36MW       Rev: JBBB
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:0: tagged command queuing enabled, command queue depth 16.
 target0:0:0: Beginning Domain Validation
Debug: sleeping function called from invalid context at mm/slab.c:2126
in_atomic():0, irqs_disabled():1
 [<c014ce0a>] __kmalloc+0x89/0x8b
 [<c014cf2f>] kcalloc+0x16/0x4d
 [<f887cf31>] sym_alloc_lcb_tags+0x6a/0x106 [sym53c8xx]
 [<f887c9b6>] sym_get_ccb+0x272/0x282 [sym53c8xx]
 [<c01493bc>] __alloc_pages+0xcd/0x401
 [<f88381b7>] scsi_get_command+0xb2/0xc5 [scsi_mod]
 [<f8874bbb>] sym_queue_command+0x89/0xd3 [sym53c8xx]
 [<f8874f60>] sym53c8xx_queue_command+0x64/0x6c [sym53c8xx]
 [<f8838782>] scsi_dispatch_cmd+0x170/0x31a [scsi_mod]
 [<c024147e>] elv_next_request+0x53/0x153
 [<f883e4e1>] scsi_request_fn+0x1d5/0x3ba [scsi_mod]
 [<c0243f1b>] blk_insert_request+0x8f/0xbb
 [<f883d22b>] scsi_insert_special_req+0x2b/0x32 [scsi_mod]
 [<f883d46b>] scsi_wait_req+0x82/0xb3 [scsi_mod]
 [<f883d36a>] scsi_wait_done+0x0/0x66 [scsi_mod]
 [<f881c0cb>] spi_wait_req+0x3b/0x64 [scsi_transport_spi]
 [<f881d23a>] spi_dv_device_compare_inquiry+0xb9/0x102 [scsi_transport_spi]
 [<f881d4e6>] spi_dv_device_internal+0x6a/0x280 [scsi_transport_spi]
 [<f881d7db>] spi_dv_device+0xdf/0x14a [scsi_transport_spi]
 [<f8875610>] sym53c8xx_slave_configure+0xb3/0xf0 [sym53c8xx]
 [<f883f996>] scsi_add_lun+0x19e/0x32a [scsi_mod]
 [<f883fc8a>] scsi_probe_and_add_lun+0x168/0x230 [scsi_mod]
 [<f884046c>] scsi_scan_target+0xc3/0x13b [scsi_mod]
 [<f8840564>] scsi_scan_channel+0x80/0x95 [scsi_mod]
 [<f884061f>] scsi_scan_host_selected+0xa6/0x11e [scsi_mod]
 [<f88394ec>] scsi_add_host+0x185/0x1a7 [scsi_mod]
 [<f88406b8>] scsi_scan_host+0x21/0x25 [scsi_mod]
 [<f8876936>] sym2_probe+0xe0/0xfd [sym53c8xx]
 [<c01de1f5>] pci_device_probe_static+0x25/0x31
 [<c01de221>] __pci_device_probe+0x20/0x30
 [<c01de24c>] pci_device_probe+0x1b/0x32
 [<c023cc98>] driver_probe_device+0x21/0x55
 [<c023cdb6>] driver_attach+0x4f/0x85
 [<c01d2ed3>] kobject_register+0x2e/0x59
 [<c023d1c6>] bus_add_driver+0x88/0xb6
 [<c01de3f7>] pci_register_driver+0x73/0x94
 [<f880d028>] sym2_init+0x28/0x40 [sym53c8xx]
 [<c013b87b>] sys_init_module+0xd8/0x20f
 [<c0161868>] filp_close+0x4f/0x6d
 [<c0104025>] syscall_call+0x7/0xb
 target0:0:0: asynchronous.
WIDTH IS 1
 target0:0:0: wide asynchronous.
 target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)

This was on:

Linux version 2.6.12-1.1398_FC4smp (bhcompile@tweety.build.redhat.com) (gcc
version 4.0.0 20050519 (Red Hat 4.0.0-8)) #1 SMP Fri Jul 15 01:30:13 EDT 2005
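
For triage, a trace like the one above can be split into frames and filtered by module. The following is a minimal Python sketch and not part of the original report: `parse_frames`, the regular expression, and the "vmlinux" label for frames without a module suffix are illustrative assumptions, and the sample lines are copied from the trace above.

```python
import re

# Matches frames such as:
#  [<f887cf31>] sym_alloc_lcb_tags+0x6a/0x106 [sym53c8xx]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_frames(text):
    """Return a list of dicts, one per call-trace frame found in text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "addr": int(m.group("addr"), 16),
                "symbol": m.group("symbol"),
                "offset": int(m.group("offset"), 16),
                "size": int(m.group("size"), 16),
                # Assumption: frames without a [module] suffix are core kernel.
                "module": m.group("module") or "vmlinux",
            })
    return frames

SAMPLE = """\
 [<c014ce0a>] __kmalloc+0x89/0x8b
 [<c014cf2f>] kcalloc+0x16/0x4d
 [<f887cf31>] sym_alloc_lcb_tags+0x6a/0x106 [sym53c8xx]
 [<f887c9b6>] sym_get_ccb+0x272/0x282 [sym53c8xx]
 [<f8874bbb>] sym_queue_command+0x89/0xd3 [sym53c8xx]
"""

frames = parse_frames(SAMPLE)
# Innermost driver frame: the sym53c8xx function that requested the allocation.
first_driver = next(f for f in frames if f["module"] == "sym53c8xx")
print(first_driver["symbol"])
```

In this trace the innermost sym53c8xx frame is sym_alloc_lcb_tags, which calls kcalloc while irqs_disabled() is 1; that is the combination the debug message reports.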
Comment 14 J. Ali Harlow 2005-07-18 07:11:44 EDT
2.6.12-1.1372_FC3 also hangs for me even with acpi turned off (which works as a
workaround for 2.6.11-1.35_FC3). Hand copied output:

dm_snapshot: Unknown symbol dm_vcalloc
dm_snapshot: Unknown symbol dm_table_get_mode
dm_snapshot: Unknown symbol dm_get_device
  Reading all physical volumes.  This may take a while...
insmod: error inserting 'insmod: error inserting '/lib/dm-zero.ko': -1 Unknown s
ymbol in module
/lib/dm-mirror.ko': -1 Unknown symbol in module
insmod: error inserting '/lib/dm-snapshot.ko': -1 Unknown symbol in module
Activating local volumes
Making device nodes
ERROR: /bin/lvm exited abnormally!
Creating root device
Mounting root filesystem
mount: error 19 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
 [<c0120e85>] panic+0x42/0x1ca
 [<c0121ff1>] profile_task_exit+0x31/0x45
 [<c0123d8d>] do_exit+0x252/0x35a
 [<c0123eb5>] next_thread+0x0/0xc
 [<c0103fd9>] syscall_call+0x7/0xb

This doesn't look to be related, but it means I can't test.
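
The numeric codes in the output above look like errno values (an assumption; the report does not say so). A small Python sketch to decode them, where `codes` and `describe` are illustrative names:

```python
import errno
import os

# Codes hand-copied from the boot output above, keyed by the message that printed them.
codes = {
    "mount: error 19 mounting ext3": 19,
    "mount: error 2 mounting none": 2,
    "switchroot: mount failed: 22": 22,
}

def describe(code):
    """Return 'NAME (message)' for an errno value, using this platform's tables."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"

for message, code in codes.items():
    print(f"{message} -> {describe(code)}")
```

On Linux, 19 is ENODEV, 2 is ENOENT and 22 is EINVAL, which would be consistent with a root filesystem that could not be set up after the module load failures earlier in the output.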
Comment 15 J. Ali Harlow 2005-07-18 07:13:02 EDT
Aargh, that should be 2.6.12-1.1372_FC3smp and 2.6.11-1.35_FC3smp, sorry.
Comment 16 Ralf Corsepius 2005-07-22 17:59:38 EDT
I am seeing the same error message with 2.6.12-1.1398_FC4smp  as in
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144111#13
except that my system is an old Dual PII/266 with

sym0: <895> rev 0x1 at pci 0000:00:09.0 irq 177
sym0: Tekram NVRAM, ID 7, Fast-40, LVD, NO parity
sym0: SCSI BUS has been reset.
Comment 17 Dan Carpenter 2005-07-30 04:56:34 EDT
Comment 13 & Comment 16:  That bug was fixed upstream in 2.6.13-rc4 (or maybe
-rc3).  http://bugzilla.kernel.org/show_bug.cgi?id=4786  It will probably take a
while to make it to the Fedora kernel.

Comment 14 & Comment 15:  That bug was fixed a couple days ago.  See bug 163407.




Comment 18 Dave Jones 2005-08-04 01:00:41 EDT
*** Bug 164995 has been marked as a duplicate of this bug. ***
Comment 19 Dave Jones 2005-08-26 03:00:53 EDT
Fixed in CVS, will be in the next build.
