Bug 781625

Summary: kernel-3.1.7 crash due to sym53c8xx module
Product: [Fedora] Fedora
Reporter: Ferdinand Badescu <fb.commerce>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 16
CC: gansalmon, itamar, jonathan, jwboyer, kernel-maint, madhu.chinakonda, sgruszka
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Fixed In Version: kernel-2.6.41.10-3.fc15
Doc Type: Bug Fix
Last Closed: 2012-01-24 07:56:01 UTC
Attachments:
The frozen kernel screen with trace codes. (Flags: none)

Description Ferdinand Badescu 2012-01-13 21:27:28 UTC
Created attachment 555148
The frozen kernel screen with trace codes.

Description of problem:

After the system update and reboot, the newest kernel, kernel-3.1.7-1.fc16.x86_64, crashes at boot time. In order to troubleshoot the problem, I removed the SCSI card from the computer. The SCSI card is a Tekram-DC390U3, and is based on a Symbios Logic 53C1010 chipset. The card uses the sym53c8xx kernel module. Without the card, the kernel boots just fine, GUI and all, and I could log into my account. Re-inserting the card makes the kernel freeze again at boot time.

As a footnote, this problem has existed ever since kernel 3.1.4, but I didn't report it until now as I thought it would be fixed in a subsequent kernel release. I am currently running kernel-3.1.2-1.fc16.x86_64.

Version-Release number of selected component (if applicable):
kernel-3.1.7-1.fc16.x86_64 (and all other versions after 3.1.2-1.fc16.x86_64)

How reproducible:
Every single time.

Steps to Reproduce:
1. Update the system to the newest kernel - kernel-3.1.7-1
2. Reboot; the kernel freezes at boot time.
3. Power OFF the computer and remove the SCSI card (chipset Symbios 53C1010, kernel module sym53c8xx). 
4. Reboot; the boot process goes through, and can use the computer.
5. Power OFF the computer and re-insert the SCSI card.
6. Reboot; the kernel freezes again.

Actual results:
The kernel freezes at boot time whenever the SCSI card is inserted (see the attached screenshot).

Expected results:
The kernel boots without a problem with the SCSI card inserted, and the system can be used.

Additional info:
I searched the bugs database, and it looks like this bug has not been submitted before.
I am attaching a jpeg picture of the frozen kernel screen with trace codes.

Comment 1 Stanislaw Gruszka 2012-01-17 12:39:05 UTC
Between 3.1.2 and 3.1.4, I see only one suspicious commit:

commit bf6f111b5e891b4cfbd4f966488fd824543ba2aa
Author: James Bottomley <James.Bottomley>
Date:   Mon Nov 7 08:51:24 2011 -0600

    fix WARNING: at drivers/scsi/scsi_lib.c:1704

Let's try to revert it. Here is a kernel build with the patch reverted:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3708451
Please test it once the build finishes (it is still compiling at the moment).

Comment 2 Stanislaw Gruszka 2012-01-19 10:28:22 UTC
Info from Ferdinand:

I downloaded and installed two kernels: kernel-3.1.9-1 and kernel-3.1.9-3.781625.
kernel-3.1.9-1 still crashes - I attached a picture of my screen with the trace codes.
However, kernel-3.1.9-3.781625 boots correctly; I am currently using it.

But the "fixed" kernel generates lots of warnings like the one below:

> Jan 17 10:33:22 bloemberger kernel: [    1.528148] WARNING: at drivers/scsi/scsi_lib.c:1704 scsi_free_queue+0x69/0x70()
> Jan 17 10:33:22 bloemberger kernel: [    1.528151] Hardware name: System Product Name
> Jan 17 10:33:22 bloemberger kernel: [    1.528153] Modules linked in: firewire_ohci(+) firewire_core pata_via(+) crc_itu_t sata_via(+) sym53c8xx(+) scsi_transport_spi
> Jan 17 10:33:22 bloemberger kernel: [    1.528163] Pid: 278, comm: scsi_scan_1 Not tainted 3.1.9-3.781625.fc16.x86_64 #1
> Jan 17 10:33:22 bloemberger kernel: [    1.528165] Call Trace:
> Jan 17 10:33:22 bloemberger kernel: [    1.528172]  [<ffffffff8106b7ef>] warn_slowpath_common+0x7f/0xc0
> Jan 17 10:33:22 bloemberger kernel: [    1.528176]  [<ffffffff8106b84a>] warn_slowpath_null+0x1a/0x20
> Jan 17 10:33:22 bloemberger kernel: [    1.528180]  [<ffffffff813aad39>] scsi_free_queue+0x69/0x70
> Jan 17 10:33:22 bloemberger kernel: [    1.528183]  [<ffffffff813ab629>] scsi_alloc_sdev+0x239/0x2a0
> Jan 17 10:33:22 bloemberger kernel: [    1.528187]  [<ffffffff813abaa8>] scsi_probe_and_add_lun+0x418/0xda0
> Jan 17 10:33:22 bloemberger kernel: [    1.528191]  [<ffffffff81387719>] ? get_device+0x19/0x20
> Jan 17 10:33:22 bloemberger kernel: [    1.528196]  [<ffffffff8138fa52>] ? internal_container_klist_get+0x12/0x20
> Jan 17 10:33:22 bloemberger kernel: [    1.528204]  [<ffffffffa000349a>] ? spi_host_match+0x1a/0x80 [scsi_transport_spi]
> Jan 17 10:33:22 bloemberger kernel: [    1.528210]  [<ffffffff812aa1ca>] ? kobject_get+0x1a/0x30
> Jan 17 10:33:22 bloemberger kernel: [    1.528214]  [<ffffffff813ac9cd>] __scsi_scan_target+0x12d/0x7a0
> Jan 17 10:33:22 bloemberger kernel: [    1.528218]  [<ffffffff813ad0a7>] scsi_scan_channel.part.2+0x67/0x90
> Jan 17 10:33:22 bloemberger kernel: [    1.528222]  [<ffffffff813ad47a>] scsi_scan_host_selected+0x15a/0x1b0
> Jan 17 10:33:22 bloemberger kernel: [    1.528226]  [<ffffffff813ad570>] ? do_scsi_scan_host+0xa0/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528229]  [<ffffffff813ad561>] do_scsi_scan_host+0x91/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528233]  [<ffffffff813ad595>] do_scan_async+0x25/0x150
> Jan 17 10:33:22 bloemberger kernel: [    1.528236]  [<ffffffff813ad570>] ? do_scsi_scan_host+0xa0/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528240]  [<ffffffff8108df6c>] kthread+0x8c/0xa0
> Jan 17 10:33:22 bloemberger kernel: [    1.528245]  [<ffffffff815df374>] kernel_thread_helper+0x4/0x10
> Jan 17 10:33:22 bloemberger kernel: [    1.528249]  [<ffffffff8108dee0>] ? kthread_worker_fn+0x190/0x190
> Jan 17 10:33:22 bloemberger kernel: [    1.528253]  [<ffffffff815df370>] ? gs_change+0x13/0x13
> Jan 17 10:33:22 bloemberger kernel: [    1.528255] ---[ end trace c29c979256923a2c ]---
> Jan 17 10:33:22 bloemberger kernel: [    1.528307] scsi target1:0:1: Scan at boot disabled in NVRAM
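
For anyone triaging a trace like the one above, here is a minimal sketch (not part of the original report) that pulls the warning site, the loaded modules, and the reliable call chain out of syslog-style lines. The name `summarize_warning` and the regular expressions are illustrative assumptions derived only from the line format quoted in this comment; the sample lines are abridged from the log above.

```python
import re

# Patterns are assumptions based on the log format quoted in this bug.
WARNING_RE = re.compile(r"WARNING: at (\S+):(\d+) (\w+)\+")
MODULES_RE = re.compile(r"Modules linked in: (.*)")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_warning(lines):
    """Collect the warning site, module list, and call chain from log lines."""
    summary = {"site": None, "modules": [], "frames": []}
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            # (source file, line number, function that triggered the warning)
            summary["site"] = (m.group(1), int(m.group(2)), m.group(3))
            continue
        m = MODULES_RE.search(line)
        if m:
            # Strip load-state suffixes such as "(+)" from module names.
            summary["modules"] = [
                re.sub(r"\(.*\)$", "", name) for name in m.group(1).split()
            ]
            continue
        m = FRAME_RE.search(line)
        if m and not m.group(1):
            # Frames marked "?" are unreliable stack entries; skip them.
            summary["frames"].append(m.group(2))
    return summary


# Abridged sample taken from the trace quoted above.
SAMPLE = [
    "kernel: [    1.528148] WARNING: at drivers/scsi/scsi_lib.c:1704 scsi_free_queue+0x69/0x70()",
    "kernel: [    1.528153] Modules linked in: firewire_ohci(+) firewire_core sym53c8xx(+) scsi_transport_spi",
    "kernel: [    1.528180]  [<ffffffff813aad39>] scsi_free_queue+0x69/0x70",
    "kernel: [    1.528183]  [<ffffffff813ab629>] scsi_alloc_sdev+0x239/0x2a0",
    "kernel: [    1.528191]  [<ffffffff81387719>] ? get_device+0x19/0x20",
]

summary = summarize_warning(SAMPLE)
print(summary)
```

On the trace in this comment, the reliable frames show the warning firing in scsi_free_queue, called from scsi_alloc_sdev during the asynchronous bus scan, with sym53c8xx among the modules still loading.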

Comment 3 Stanislaw Gruszka 2012-01-19 11:45:25 UTC
I'm not quite sure what the proper fix should look like. For now I have posted a query to the SCSI maintainer: http://marc.info/?l=linux-scsi&m=132697345914537&w=2

Comment 4 Stanislaw Gruszka 2012-01-20 07:44:17 UTC
The SCSI maintainer pointed to another fix (not yet in -stable):

commit cced5041ed5a2d1352186510944b0ddfbdbe4c0b
Author: Stratos Psomadakis <psomas>
Date:   Sun Dec 4 02:23:54 2011 +0200

    [SCSI] sym53c8xx: Fix NULL pointer dereference in slave_destroy

Ferdinand, here is a kernel build with that fix; please test it once it finishes compiling:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3716955

Comment 5 Ferdinand Badescu 2012-01-21 00:50:58 UTC
kernel-3.2.1-1.bz781625.fc16.x86_64 boots correctly. All the warnings are gone; the "messages" logfile lists only the normal messages - not one single warning. (Yes!)

As a side note, I encountered yet another problem using the previous (i.e., buggy) kernels, that (I think) may be related to this bug: I could not scan on my USB scanner. As soon as I tried to preview the future scan, each one of the three scanning programs I currently use would freeze.

It was not coincidental: Using kernel-3.2.1-1.bz781625 seems to solve that problem, too. The scanner now works just fine - and I'm so happy about it!
Rebooting into the other kernels (3.1.2-1 and 3.1.9-3.781625) makes the scanning program freeze again. (The log file for those kernels lists many modules, including usb_storage and sym53c8xx, being linked in.)

Comment 6 Stanislaw Gruszka 2012-01-21 13:33:50 UTC
The patch from comment 4, which fixes this bug, was CCed to -stable but seems to have been missed in the 3.2 -stable queue, at least for now, so it seems reasonable to apply it to Fedora.

Moving status to POST.

Comment 7 Josh Boyer 2012-01-23 14:32:54 UTC
Thanks Stanislaw.  I'll get this rolled into f16 today.

Comment 8 Josh Boyer 2012-01-23 14:56:16 UTC
Patch applied in Fedora git.  Should be in the next update submitted.

Comment 9 Fedora Update System 2012-01-23 18:31:42 UTC
kernel-3.2.1-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.2.1-3.fc16

Comment 10 Fedora Update System 2012-01-23 18:35:49 UTC
kernel-2.6.41.10-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.41.10-3.fc15

Comment 11 Fedora Update System 2012-01-24 01:42:24 UTC
Package kernel-2.6.41.10-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.41.10-3.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0861/kernel-2.6.41.10-3.fc15
then log in and leave karma (feedback).

Comment 12 Fedora Update System 2012-01-24 07:56:01 UTC
kernel-3.2.1-3.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2012-01-24 19:57:44 UTC
kernel-2.6.41.10-3.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.
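
As a rough aid for checking whether an installed Fedora kernel package is at or beyond the builds named in this bug, the sketch below compares a package name against kernel-3.2.1-3.fc16 and kernel-2.6.41.10-3.fc15. The helper names `parse_kernel_nvr` and `has_fix` are hypothetical, and the comparison is a simplified numeric one, not RPM's real version-comparison algorithm. It also assumes that later official builds keep the fix, and it does not recognise scratch builds such as kernel-3.2.1-1.bz781625, which carried the patch under a lower release number.

```python
import re

# Builds reported in this bug as carrying the fix, keyed by dist tag.
FIXED_BUILDS = {
    "fc16": ((3, 2, 1), 3),
    "fc15": ((2, 6, 41, 10), 3),
}

# Simplified pattern: kernel-<version>-<release>.<anything>fcNN
NVR_RE = re.compile(r"^kernel-(\d+(?:\.\d+)*)-(\d+)\..*?(fc\d+)")


def parse_kernel_nvr(nvr):
    """Split a kernel package name into (version tuple, release, dist tag)."""
    m = NVR_RE.match(nvr)
    if m is None:
        raise ValueError(f"unrecognised kernel package name: {nvr!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2)), m.group(3)


def has_fix(nvr):
    """Return True if nvr is at or beyond the fixed build for its release."""
    version, release, dist = parse_kernel_nvr(nvr)
    if dist not in FIXED_BUILDS:
        raise ValueError(f"no fixed build recorded in this bug for {dist}")
    # Tuples compare element by element: version first, then release.
    return (version, release) >= FIXED_BUILDS[dist]


for name in (
    "kernel-3.1.7-1.fc16.x86_64",
    "kernel-3.2.1-3.fc16",
    "kernel-2.6.41.10-3.fc15",
):
    print(name, has_fix(name))
```

On a running system, the package name to feed in can be built from the output of `uname -r` by prefixing it with `kernel-`.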