From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2

Description of problem:
I noticed that in Update 1, your linux-2.6.9-scsi-inverted-refcounting.patch fixes an "inverted refcounting" problem in sd.c. The same problem also exists in sr.c. Here's a patch that fixes it.

--- sr.c.orig	2005-06-23 12:38:10.000000000 -0400
+++ sr.c	2005-06-13 17:32:29.000000000 -0400
@@ -155,9 +155,11 @@ static inline struct scsi_cd *scsi_cd_ge
 static inline void scsi_cd_put(struct scsi_cd *cd)
 {
+	struct scsi_device *sdev = cd->device;
+
 	down(&sr_ref_sem);
 	kref_put(&cd->kref, sr_kref_release);
-	scsi_device_put(cd->device);
+	scsi_device_put(sdev);
 	up(&sr_ref_sem);
 }

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Didn't try

Steps to Reproduce:

Additional info:
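To illustrate why the ordering in the patch matters, here is a minimal userspace sketch. It is only a model, not kernel code: the struct layouts, `model_release()`, `model_kref_put()`, `put_buggy()` and `put_fixed()` are all hypothetical stand-ins. The real `sr_kref_release()` frees the `scsi_cd`; the model clears the `device` field instead so that the stale read stays well defined in a test program.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; NOT the real definitions. */
struct scsi_device {
    int refs;
};

struct scsi_cd {
    int kref;                     /* models the struct kref counter */
    struct scsi_device *device;
};

/*
 * Models the release callback. The kernel would free the scsi_cd here;
 * this model only clears the field so a later read is observable
 * without touching freed memory.
 */
static void model_release(struct scsi_cd *cd)
{
    cd->device = NULL;
}

/* Models kref_put(): drop one reference, release on the last one. */
static void model_kref_put(struct scsi_cd *cd)
{
    assert(cd->kref > 0);
    if (--cd->kref == 0)
        model_release(cd);
}

/*
 * Pre-patch ordering: cd->device is read after the reference is dropped.
 * Returns the pointer that would have been handed to scsi_device_put().
 */
struct scsi_device *put_buggy(struct scsi_cd *cd)
{
    model_kref_put(cd);
    return cd->device;            /* stale if that was the last reference */
}

/* Patched ordering: save the device pointer before dropping the reference. */
struct scsi_device *put_fixed(struct scsi_cd *cd)
{
    struct scsi_device *sdev = cd->device;

    model_kref_put(cd);
    return sdev;
}
```

With a reference count of 1, `put_fixed()` still returns the original device pointer, while `put_buggy()` returns whatever the released object now holds. With a count of 2 both behave the same, which would be consistent with the problem only showing up on the final put.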
Ok, slab debugging was what was needed. This popped up in dmesg after attempting to reproduce on a kernel with CONFIG_DEBUG_SLAB. I'll build a kernel with the above patch and make certain that it fixes the problem here:

scsi0 (6:0): rejecting I/O to dead device
SCSI error: host 0 id 6 lun 0 return code = 4000000
	Sense class 0, sense error 0, extended sense 0
scsi0 (6:0): rejecting I/O to dead device
sr0: CDROM (ioctl) error, command: Xpwrite, Read disk info 00 00 00 00 00 00 00 02 00
sr: old sense key No Sense
Non-extended sense class 0 code 0x0
Unable to handle kernel paging request at virtual address 6b6b6b6b
 printing eip:
e0145ce1
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: i915 nfsd exportfs lockd nfs_acl parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc md5 ipv6 button battery ac uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore e1000 sr_mod aic7xxx scsi_mod dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
CPU:    1
EIP:    0060:[<e0145ce1>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9-41.EL.TEST.bz161591.1smp)
EIP is at scsi_device_put+0x3/0x40 [scsi_mod]
eax: 6b6b6b6b   ebx: 6b6b6b6b   ecx: 6b76f680   edx: c17f86e0
esi: c17f86e4   edi: cc425964   ebp: df65aab4   esp: cc204e48
ds: 007b   es: 007b   ss: 0068
Process sr_open (pid: 10536, threadinfo=cc204000 task=df12b330)
Stack: e007cde8 e00796c0 cc204000 cc425964 c0162c56 cc4259d8 00000000 d3f69784
       c15699c0 def23190 ded2cf08 c015c8da d3f69784 00000000 df069db0 00000001
       c015b4f9 df069db0 00000001 0000000c c0123b5b df12b870 df069db0 df12b330
Call Trace:
 [<e00796c0>] sr_block_release+0x59/0x6d [sr_mod]
 [<c0162c56>] blkdev_put+0x8d/0x18f
 [<c015c8da>] __fput+0x55/0x100
 [<c015b4f9>] filp_close+0x59/0x5f
 [<c0123b5b>] put_files_struct+0x57/0xc0
 [<c012476f>] do_exit+0x245/0x404
 [<c0124a19>] sys_exit_group+0x0/0xd
 [<c012cd46>] get_signal_to_deliver+0x31e/0x346
 [<c0105bd4>] do_signal+0x55/0xd9
 [<c0129f38>] del_timer+0x5d/0x65
 [<c0129fe4>] del_singleshot_timer_sync+0x8/0x21
 [<c02d3d48>] schedule_timeout+0x140/0x154
 [<c012a6de>] process_timeout+0x0/0x5
 [<c012a85a>] sys_nanosleep+0x167/0x1a1
 [<c0105c80>] do_notify_resume+0x28/0x38
 [<c02d55ba>] work_notifysig+0x13/0x15
Code: 8b 40 10 74 0e c1 e0 07 8d 04 02 ff 80 00 01 00 00 eb 0e 89 f0 e8 7c 9d 0d e0 ba fa ff ff ff eb 02 31 d2 5b 89 d0 5e c3 53 89 c3 <8b> 00 8b 40 74 8b 10 85 d2 74 26 b8 00 f0 ff ff 21 e0 8b 40 10
 <0>Fatal exception: panic in 5 seconds
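For anyone reading the oops: the faulting address and the eax/ebx values are all 6b6b6b6b, and 0x6b is the byte the slab allocator writes into freed objects when CONFIG_DEBUG_SLAB is enabled. That would be consistent with scsi_device_put() having been handed a pointer read out of an already-freed scsi_cd. A small sketch of that check follows; the helper name `is_slab_free_poison()` is made up for illustration and is not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte pattern written into freed slab objects under CONFIG_DEBUG_SLAB. */
#define POISON_FREE 0x6bu

/*
 * Hypothetical helper: returns true when every byte of a 32-bit register
 * or address value from an oops equals the slab free-poison byte.
 */
bool is_slab_free_poison(uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        if (((value >> (8 * i)) & 0xffu) != POISON_FREE)
            return false;
    }
    return true;
}
```

Applied to the registers above, the eax and ebx values match the pattern in every byte, while ecx (6b76f680) and the EIP (e0145ce1) do not.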
Patch seems to fix problem so I'll propose it internally.
Created attachment 132379
patch that seems to correct problem

Here's the patch sent by IBM that seems to correct the issue.
Committed in stream U5 build 42.7. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html