Bug 781625 - kernel-3.1.7 crash due to sym53c8xx module
Summary: kernel-3.1.7 crash due to sym53c8xx module
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-01-13 21:27 UTC by Ferdinand Badescu
Modified: 2012-01-24 19:57 UTC (History)

Fixed In Version: kernel-2.6.41.10-3.fc15
Clone Of:
Environment:
Last Closed: 2012-01-24 07:56:01 UTC
Type: ---
Embargoed:


Attachments
The frozen kernel screen with trace codes. (129.34 KB, image/jpeg)
2012-01-13 21:27 UTC, Ferdinand Badescu

Description Ferdinand Badescu 2012-01-13 21:27:28 UTC
Created attachment 555148
The frozen kernel screen with trace codes.

Description of problem:

After the system update and reboot, the newest kernel, kernel-3.1.7-1.fc16.x86_64, crashes at boot time. To troubleshoot the problem, I removed the SCSI card from the computer. The SCSI card is a Tekram-DC390U3, and is based on a Symbios Logic 53C1010 chipset. The card uses the sym53c8xx kernel module. Without the card, the kernel boots just fine, GUI and all, and I could log into my account. Re-inserting the card makes the kernel freeze again at boot time.

As a footnote, this problem has existed ever since kernel 3.1.4, but I didn't report it until now because I thought it would be fixed in a subsequent kernel release. I am currently running kernel-3.1.2-1.fc16.x86_64.

Version-Release number of selected component (if applicable):
kernel-3.1.7-1.fc16.x86_64 (and all other versions after 3.1.2-1.fc16.x86_64)

How reproducible:
Every single time.

Steps to Reproduce:
1. Update the system to the newest kernel - kernel-3.1.7-1
2. Reboot; the kernel freezes at boot time.
3. Power OFF the computer and remove the SCSI card (chipset Symbios 53C1010, kernel module sym53c8xx). 
4. Reboot; the boot process completes, and the computer can be used.
5. Power OFF the computer and re-insert the SCSI card.
6. Reboot; the kernel freezes again.
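
On a kernel that does boot with the card installed (3.1.2-1 in this report), one way to confirm that the driver is actually loaded is to look for it in /proc/modules. The following is only a minimal sketch: the helper names `loaded_modules` and `driver_is_loaded` are hypothetical, and the code assumes the usual /proc/modules layout in which the first whitespace-separated field on each line is the module name.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # Assumption: the first field on each line is the module name.
            names.add(fields[0])
    return names


def driver_is_loaded(proc_modules_text, driver="sym53c8xx"):
    """Report whether the given driver appears in the module list."""
    return driver in loaded_modules(proc_modules_text)
```

On a running system, the text to pass in would be the contents of /proc/modules, for example `driver_is_loaded(open("/proc/modules").read())`.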

Actual results:
The kernel freezes at boot time whenever the SCSI card is installed.

Expected results:
The kernel boots without a problem with the SCSI card inserted, and the system can be used.

Additional info:
I searched the bugs database, and it looks like this bug has not been submitted before.
I am attaching a jpeg picture of the frozen kernel screen with trace codes.

Comment 1 Stanislaw Gruszka 2012-01-17 12:39:05 UTC
Between 3.1.2 and 3.1.4, I see only one suspicious commit:

commit bf6f111b5e891b4cfbd4f966488fd824543ba2aa
Author: James Bottomley <James.Bottomley>
Date:   Mon Nov 7 08:51:24 2011 -0600

    fix WARNING: at drivers/scsi/scsi_lib.c:1704

Let's try reverting it. Here is a kernel build with the patch reverted:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3708451
Please test it once the build finishes (it is still compiling at the moment).

Comment 2 Stanislaw Gruszka 2012-01-19 10:28:22 UTC
Info from Ferdinand:

I downloaded and installed two kernels: kernel-3.1.9-1 and kernel-3.1.9-3.781625.
kernel-3.1.9-1 still crashes - I attached a picture of my screen with the
trace codes.
However, kernel-3.1.9-3.781625 boots correctly; I am currently using it.

But the "fixed" kernel generates lots of warnings like the one below:

> Jan 17 10:33:22 bloemberger kernel: [    1.528148] WARNING: at drivers/scsi/scsi_lib.c:1704 scsi_free_queue+0x69/0x70()
> Jan 17 10:33:22 bloemberger kernel: [    1.528151] Hardware name: System Product Name
> Jan 17 10:33:22 bloemberger kernel: [    1.528153] Modules linked in: firewire_ohci(+) firewire_core pata_via(+) crc_itu_t sata_via(+) sym53c8xx(+) scsi_transport_spi
> Jan 17 10:33:22 bloemberger kernel: [    1.528163] Pid: 278, comm: scsi_scan_1 Not tainted 3.1.9-3.781625.fc16.x86_64 #1
> Jan 17 10:33:22 bloemberger kernel: [    1.528165] Call Trace:
> Jan 17 10:33:22 bloemberger kernel: [    1.528172]  [<ffffffff8106b7ef>] warn_slowpath_common+0x7f/0xc0
> Jan 17 10:33:22 bloemberger kernel: [    1.528176]  [<ffffffff8106b84a>] warn_slowpath_null+0x1a/0x20
> Jan 17 10:33:22 bloemberger kernel: [    1.528180]  [<ffffffff813aad39>] scsi_free_queue+0x69/0x70
> Jan 17 10:33:22 bloemberger kernel: [    1.528183]  [<ffffffff813ab629>] scsi_alloc_sdev+0x239/0x2a0
> Jan 17 10:33:22 bloemberger kernel: [    1.528187]  [<ffffffff813abaa8>] scsi_probe_and_add_lun+0x418/0xda0
> Jan 17 10:33:22 bloemberger kernel: [    1.528191]  [<ffffffff81387719>] ? get_device+0x19/0x20
> Jan 17 10:33:22 bloemberger kernel: [    1.528196]  [<ffffffff8138fa52>] ? internal_container_klist_get+0x12/0x20
> Jan 17 10:33:22 bloemberger kernel: [    1.528204]  [<ffffffffa000349a>] ? spi_host_match+0x1a/0x80 [scsi_transport_spi]
> Jan 17 10:33:22 bloemberger kernel: [    1.528210]  [<ffffffff812aa1ca>] ? kobject_get+0x1a/0x30
> Jan 17 10:33:22 bloemberger kernel: [    1.528214]  [<ffffffff813ac9cd>] __scsi_scan_target+0x12d/0x7a0
> Jan 17 10:33:22 bloemberger kernel: [    1.528218]  [<ffffffff813ad0a7>] scsi_scan_channel.part.2+0x67/0x90
> Jan 17 10:33:22 bloemberger kernel: [    1.528222]  [<ffffffff813ad47a>] scsi_scan_host_selected+0x15a/0x1b0
> Jan 17 10:33:22 bloemberger kernel: [    1.528226]  [<ffffffff813ad570>] ? do_scsi_scan_host+0xa0/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528229]  [<ffffffff813ad561>] do_scsi_scan_host+0x91/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528233]  [<ffffffff813ad595>] do_scan_async+0x25/0x150
> Jan 17 10:33:22 bloemberger kernel: [    1.528236]  [<ffffffff813ad570>] ? do_scsi_scan_host+0xa0/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528240]  [<ffffffff8108df6c>] kthread+0x8c/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528245]  [<ffffffff815df374>] kernel_thread_helper+0x4/0x10
> Jan 17 10:33:22 bloemberger kernel: [    1.528249]  [<ffffffff8108dee0>] ? kthread_worker_fn+0x190/0x190
> Jan 17 10:33:22 bloemberger kernel: [    1.528253]  [<ffffffff815df370>] ? gs_change+0x13/0x13
> Jan 17 10:33:22 bloemberger kernel: [    1.528255] ---[ end trace c29c979256923a2c ]---
> Jan 17 10:33:22 bloemberger kernel: [    1.528307] scsi target1:0:1: Scan at boot disabled in NVRAM
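
For anyone comparing several such traces, the warning above can be reduced to its key facts with a short script. This is only a sketch: `summarize_warning` and the regular expressions are hypothetical helpers written against the log format quoted in this comment, not part of any kernel tooling. It extracts the warning site, the module names from the "Modules linked in:" line, and the call-trace frames that are not marked with "?".

```python
import re

# Matches a call-trace entry such as
#   [<ffffffff813aad39>] scsi_free_queue+0x69/0x70
# An optional "? " before the symbol marks a less reliable frame.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Matches "WARNING: at <file>:<line> <function>+..."
WARN_RE = re.compile(r"WARNING: at (\S+):(\d+) (\w+)\+")
MODULES_RE = re.compile(r"Modules linked in: (.*)$")


def summarize_warning(lines):
    """Return the warning site, module names, and unmarked frames."""
    summary = {"site": None, "modules": [], "frames": []}
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            summary["site"] = (m.group(1), int(m.group(2)), m.group(3))
            continue
        m = MODULES_RE.search(line)
        if m:
            # Drop state markers such as "(+)" that follow a module name.
            summary["modules"] = [
                re.sub(r"\(.*?\)", "", name) for name in m.group(1).split()
            ]
            continue
        m = FRAME_RE.search(line)
        if m and not m.group(1):
            # Keep only frames without the "?" marker.
            summary["frames"].append(m.group(2))
    return summary
```

Passing the quoted log lines to `summarize_warning` should give a site of drivers/scsi/scsi_lib.c, line 1704, in scsi_free_queue, with sym53c8xx among the modules.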

Comment 3 Stanislaw Gruszka 2012-01-19 11:45:25 UTC
I'm not quite sure what the proper fix should look like; for now I have posted a query to the SCSI maintainer: http://marc.info/?l=linux-scsi&m=132697345914537&w=2

Comment 4 Stanislaw Gruszka 2012-01-20 07:44:17 UTC
The SCSI maintainer pointed to another fix (not yet in -stable):

commit cced5041ed5a2d1352186510944b0ddfbdbe4c0b
Author: Stratos Psomadakis <psomas>
Date:   Sun Dec 4 02:23:54 2011 +0200

    [SCSI] sym53c8xx: Fix NULL pointer dereference in slave_destroy

Ferdinand, here is a kernel build with that fix; please test it once it finishes compiling:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3716955

Comment 5 Ferdinand Badescu 2012-01-21 00:50:58 UTC
kernel-3.2.1-1.bz781625.fc16.x86_64 boots correctly. All the warnings are gone; the "messages" logfile lists only the normal messages - not one single warning. (Yes!)

As a side note, I encountered yet another problem using the previous (i.e., buggy) kernels, that (I think) may be related to this bug: I could not scan on my USB scanner. As soon as I tried to preview the future scan, each one of the three scanning programs I currently use would freeze.

It was not coincidental: Using kernel-3.2.1-1.bz781625 seems to solve that problem, too. The scanner now works just fine - and I'm so happy about it!
Rebooting into the other kernels (3.1.2-1 and 3.1.9-3.781625) makes the scanning programs freeze again. (The log file for those kernels lists many modules, including usb_storage and sym53c8xx, being linked in.)

Comment 6 Stanislaw Gruszka 2012-01-21 13:33:50 UTC
The patch from comment 4, which fixes this bug, was CCed to -stable but seems to have been missed in the 3.2 -stable queue, at least for now, so it seems reasonable to apply it to Fedora.

Moving status to POST.

Comment 7 Josh Boyer 2012-01-23 14:32:54 UTC
Thanks Stanislaw.  I'll get this rolled into f16 today.

Comment 8 Josh Boyer 2012-01-23 14:56:16 UTC
Patch applied in Fedora git.  Should be in the next update submitted.

Comment 9 Fedora Update System 2012-01-23 18:31:42 UTC
kernel-3.2.1-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.2.1-3.fc16

Comment 10 Fedora Update System 2012-01-23 18:35:49 UTC
kernel-2.6.41.10-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.41.10-3.fc15

Comment 11 Fedora Update System 2012-01-24 01:42:24 UTC
Package kernel-2.6.41.10-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.41.10-3.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0861/kernel-2.6.41.10-3.fc15
then log in and leave karma (feedback).

Comment 12 Fedora Update System 2012-01-24 07:56:01 UTC
kernel-3.2.1-3.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2012-01-24 19:57:44 UTC
kernel-2.6.41.10-3.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

