Bug 144246

Summary: Kernel oops due to cranky 53c810 scsi card
Product: [Fedora] Fedora Reporter: Rob Kearey <rkearey>
Component: kernel Assignee: Dave Jones <davej>
Status: CLOSED CANTFIX QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3 CC: pfrields, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-10-03 01:12:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rob Kearey 2005-01-05 10:16:48 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
Upon booting, the sym53c8xx driver receives SCSI parity errors from my old NCR/Symbios 53c810 card, and I get the following kernel oops:

sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0
sym0:6: ERROR (81:0) (8-10-0) (0/3/0) @ (scripta 20:10000000).
sym0: script cmd = da001034
sym0: regdump: da 00 40 13 47 00 06 1f 00 08 0f 10 80 00 0f 12 ff b0 4a 16 02 ff ff ff.
sym0: suspicious SCSI data while resetting the BUS.
sym0: dp0,d7-0,rst,req,ack,bsy,sel,atn,msg,c/d,i/o = 0x110, expecting 0x100
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
sym0:0: ERROR (a0:0) (8-10-0) (0/3/0) @ (mem 10000008:10000000).
sym0: regdump: ca 00 40 13 47 00 00 1f 00 08 0f 10 80 00 0f 12 00 b0 4a 16 20 ff ff ff.
sym0: PCI STATUS = 0xb000
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
[drm] Initialized mga 3.1.0 20021029 on minor 0:
agpgart: Found an AGP 1.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 1x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 1x mode
sym0:6:0: ABORT operation started.
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
sym0:6:0: ABORT operation complete.
sym0:6:0: ABORT operation started.
sym0:6:0: ABORT operation failed.
sym0:6:0: DEVICE RESET operation started.
sym0:6:0: DEVICE RESET operation complete.
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
sym0:6:0: ABORT operation started.
sym0:6:0: ABORT operation failed.
sym0:6:0: BUS RESET operation started.
sym0: suspicious SCSI data while resetting the BUS.
sym0: dp0,d7-0,rst,req,ack,bsy,sel,atn,msg,c/d,i/o = 0x1ffff, expecting 0x100
sym0:6:0: BUS RESET operation complete.
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
sym0:6:0: HOST RESET operation started.
sym0: suspicious SCSI data while resetting the BUS.
sym0: dp0,d7-0,rst,req,ack,bsy,sel,atn,msg,c/d,i/o = 0x1ffff, expecting 0x100
sym0: SCSI BUS has been reset.
sym0:6:0: HOST RESET operation complete.
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 6 lun 0
scsi0 (6:0): rejecting I/O to offline device
scsi0 (6:0): rejecting I/O to offline device
scsi0 (6:0): rejecting I/O to offline device
SCSI error: host 0 id 6 lun 0 return code = 4000000
        Sense class 0, sense error 0, extended sense 0
Unable to handle kernel paging request at virtual address d503ef80
 printing eip:
d88a362b
*pde = 00054067
Oops: 0000 [#1]
DEBUG_PAGEALLOC
Modules linked in: mga md5 ipv6 parport_pc lp parport autofs4 rfcomm l2cap bluetooth sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables microcode dm_mod uhci_hcd i2c_piix4 i2c_core snd_ens1371 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc gameport 8139too mii de2104x floppy sr_mod ext3 jbd sym53c8xx scsi_transport_spi sd_mod scsi_mod
CPU:    0
EIP:    0060:[<d88a362b>]    Not tainted VLI
EFLAGS: 00010286   (2.6.10-1.727_FC3)
EIP is at sr_block_open+0xad/0xc8 [sr_mod]
eax: 00000002   ebx: fffffffa   ecx: 0e4ef000   edx: 00000640
esi: d503ef78   edi: c9f1ae44   ebp: c9d4bf54   esp: cb5feefc
ds: 007b   es: 007b   ss: 0068
Process hald (pid: 3617, threadinfo=cb5fe000 task=ceb45aa0)
Stack: c9f1adb0 c9d4bf54 c9f1adb0 d88a7200 c016779b d5deddf8 c9d4bf54 00000000
       c9d4bf54 c9d4bf54 c9f1adb0 ffffffe9 c0167a51 c9d4bf54 d4da5e44 c137ab20
       c015ddde d4d6af58 cb5fef58 00008880 d72f6000 cb5fe000 c015dd1b d4d6af58
Call Trace:
 [<c016779b>] do_open+0x8f/0x2c3
 [<c0167a51>] blkdev_open+0x1a/0x42
 [<c015ddde>] dentry_open+0xbd/0x180
 [<c015dd1b>] filp_open+0x36/0x3c
 [<c01d4b94>] strncpy_from_user+0x37/0x56
 [<c015e1e5>] sys_open+0x31/0x7d
 [<c0103337>] syscall_call+0x7/0xb
Code: c3 74 3c ba 6b 00 00 00 b8 15 52 8a d8 e8 3f 4f 87 e7 ff 0d 60 70 8a d8 0f 88 bc 0a 00 00 ba 74 3f 8a d8 8d 46 68 e8 4b ed 92 e7 <8b> 46 08 e8 dd 5c fb ff ff 05 60 70 8a d8 0f 8e ab 0a 00 00 89

Possibly just a dying card, but the oops is included here for completeness.



Version-Release number of selected component (if applicable):
2.6.10-1.727_FC3

How reproducible:
Sometimes

Steps to Reproduce:
1. Install card.
2. Reboot.
3. Prang.
  

Actual Results:  Kernel oops, but the SCSI device still seems usable.

Expected Results:  No kernel oops.

Additional info:

Comment 1 Rob Kearey 2005-01-05 10:58:11 UTC
And a further oops when trying to rmmod sym53c8xx, which segfaulted:

Unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c019bc1d
*pde = 00000000
Oops: 0000 [#2]
DEBUG_PAGEALLOC
Modules linked in: mga md5 ipv6 parport_pc lp parport autofs4 rfcomm l2cap bluetooth sunrpc microcode dm_mod uhci_hcd i2c_piix4 i2c_core snd_ens1371 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc gameport 8139too mii de2104x floppy sr_mod ext3 jbd sym53c8xx scsi_transport_spi sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c019bc1d>]    Not tainted VLI
EFLAGS: 00210282   (2.6.10-1.727_FC3)
EIP is at del_gendisk+0x2/0xa9
eax: 00000000   ebx: d503ef78   ecx: d88a7008   edx: d88a4076
esi: d64a2dac   edi: d88a6fa4   ebp: caff6000   esp: caff6eb0
ds: 007b   es: 007b   ss: 0068
Process rmmod (pid: 5297, threadinfo=caff6000 task=cee5eaa0)
Stack: d503ef78 d88a6fa4 d88a4082 d64a2dd0 c023978c d8875120 d887516c d64a2dac
       c0239957 d64a2dac c03601a8 d64fedfc c0238c55 d64a2bf8 d64a2dac d64d9bf8
       d8861201 d64d9bf8 00200282 d64d9bf0 d88606ad d64d9bf8 d64d9bf8 d64ab000
Call Trace:
 [<d88a4082>] sr_remove+0xc/0x44 [sr_mod]
 [<c023978c>] device_release_driver+0x4c/0x57
 [<c0239957>] bus_remove_device+0x51/0x8a
 [<c0238c55>] device_del+0x66/0x87
 [<d8861201>] scsi_remove_device+0x5b/0x8e [scsi_mod]
 [<d88606ad>] scsi_forget_host+0xc6/0x19c [scsi_mod]
 [<d8859813>] scsi_remove_host+0x8/0x4b [scsi_mod]
 [<d8879d8e>] sym2_remove+0x18/0x3e [sym53c8xx]
 [<c01db29c>] pci_device_remove+0x16/0x28
 [<c023978c>] device_release_driver+0x4c/0x57
 [<c02397af>] driver_detach+0x18/0x1f
 [<c0239b33>] bus_remove_driver+0x41/0x75
 [<c0239f63>] driver_unregister+0x8/0x23
 [<c01db44a>] pci_unregister_driver+0xb/0x13
 [<d88804ea>] sym2_exit+0xa/0x1a [sym53c8xx]
 [<c0133c38>] sys_delete_module+0x125/0x15d
 [<c015105c>] unmap_vma_list+0xe/0x17
 [<c015140b>] do_munmap+0x1c8/0x1d2
 [<c0103337>] syscall_call+0x7/0xb
Code: e8 af f9 fa ff 89 fa 83 e2 07 c1 e2 09 01 d0 eb 13 89 d8 e8 56 ba fa ff 8b 54 24 14 31 c0 c7 02 00 00 00 00 5b 5e 5f 5d c3 57 53 <8b> 78 08 89 c3 4f 85 ff 7e 14 89 fa 89 d8 e8 45 70 0a 00 89 fa

This may well be an it-hurts-when-I-do-this thing.


Comment 2 Dave Jones 2005-07-15 19:02:01 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 3 Dave Jones 2005-10-03 01:12:00 UTC
This bug has been automatically closed as part of a mass update.
It had been in NEEDINFO state since July 2005.
If this bug still exists in current errata kernels, please reopen this bug.

There are a large number of inactive bugs in the database, and this is the only
way to purge them.

Thank you.