Bug 139665
Summary: | External USB DVD-RW causes a kernel OOPS | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Srihari Vijayaraghavan <noldoli> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED CANTFIX | CC: | nixuser, pfrields, wtogami |
Severity: | high | Priority: | medium |
Version: | 3 | Hardware: | x86_64 |
OS: | Linux | Doc Type: | Bug Fix |
Last Closed: | 2005-10-03 01:18:27 UTC | | |
Description
Srihari Vijayaraghavan
2004-11-17 10:39:27 UTC
Same thing happens to me with an external USB CD writer.

With vanilla 2.6.10-rc2 on FC3, Kernel does not oops due to on and off of this device. Also I did a test DVD burn (FC3 DVD image itself), and it works great too.

Thank you.
Hari.

I think I spoke a bit too soon. Although kernel.org's 2.6.10-rc2 did not crash during on/off, burning a DVD etc., it did crash once, which I am unable to reproduce despite my sincere efforts. Here is that oops message:

##### Starts Here #####
usb 1-2: USB disconnect, address 4
target4:0:0: Illegal state transition <NULL>->cancel
Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1717

Call Trace:
<ffffffffa0006828>{:scsi_mod:scsi_device_set_state+264}
<ffffffffa00010d9>{:scsi_mod:scsi_device_cancel+41}
<ffffffff8018ae27>{simple_rmdir+55}
<ffffffffa00011f0>{:scsi_mod:scsi_device_cancel_cb+0}
<ffffffff80214aa1>{device_for_each_child+81}
<ffffffffa000122f>{:scsi_mod:scsi_host_cancel+47}
<ffffffff80214a09>{device_del+105}
<ffffffffa0008690>{:scsi_mod:scsi_remove_device+160}
<ffffffffa00012f3>{:scsi_mod:scsi_remove_host+19}
<ffffffffa024da84>{:usb_storage:storage_disconnect+116}
<ffffffff80244c72>{usb_unbind_interface+82}
<ffffffff802158c7>{device_release_driver+119}
<ffffffff80215ab9>{bus_remove_device+153}
<ffffffff802149f8>{device_del+88}
<ffffffff8024b71b>{usb_disable_device+123}
<ffffffff80246d05>{usb_disconnect+197}
<ffffffff802480c7>{hub_thread+759}
<ffffffff80144100>{autoremove_wake_function+0}
<ffffffff80144100>{autoremove_wake_function+0}
<ffffffff80133563>{do_exit+2819}
<ffffffff8010ebe3>{child_rip+8}
<ffffffff80247dd0>{hub_thread+0}
<ffffffff8010ebdb>{child_rip+0}

Unable to handle kernel NULL pointer dereference at 0000000000000d68 RIP:
<ffffffffa00010e7>{:scsi_mod:scsi_device_cancel+55}
PML4 30aef067 PGD 2c7a8067 PMD 0
Oops: 0000 [1]
CPU 0
Modules linked in: reiserfs sr_mod usb_storage radeon ipt_LOG ipt_limit ipt_MASQUERADE ipt_multiport ipt_conntrack ip_nat_ftp ip_conntrack_ftp iptable_nat nfsd exportfs lockd autofs4
sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button ohci1394 ieee1394 uhci_hcd ehci_hcd snd_via82xx snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore via_rhine mii r8169 floppy ext3 mbcache jbd sata_via libata sd_mod scsi_mod
Pid: 102, comm: khubd Not tainted 2.6.10-rc2
RIP: 0010:[<ffffffffa00010e7>] <ffffffffa00010e7>{:scsi_mod:scsi_device_cancel+55}
RSP: 0018:000001003fd6fc58 EFLAGS: 00010016
RAX: 00000000ffffffea RBX: 000001002ea3d228 RCX: 0000000000020000
RDX: 0000000000000d68 RSI: 00000000000106e6 RDI: ffffffff803260a0
RBP: 0000000000000d48 R08: 00000000fffffffa R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 000001003fd6fc68
R13: 0000000000000000 R14: 000001003fd6fce4 R15: 000001003f5a7c00
FS: 0000002a95d90020(0000) GS:ffffffff803d5c80(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000d68 CR3: 0000000000101000 CR4: 00000000000006e0
Process khubd (pid: 102, threadinfo 000001003fd6e000, task 000001003fed2070)
Stack: 00000100342b7638 0000000000000216 000001003fd6fc68 000001003fd6fc68
0000010030fb0940 0000010039577158 000001002ea3d408 0000000000000000
ffffffffa00011f0 ffffffff80214aa1

Call Trace:
<ffffffffa00011f0>{:scsi_mod:scsi_device_cancel_cb+0}
<ffffffff80214aa1>{device_for_each_child+81}
<ffffffffa000122f>{:scsi_mod:scsi_host_cancel+47}
<ffffffff80214a09>{device_del+105}
<ffffffffa0008690>{:scsi_mod:scsi_remove_device+160}
<ffffffffa00012f3>{:scsi_mod:scsi_remove_host+19}
<ffffffffa024da84>{:usb_storage:storage_disconnect+116}
<ffffffff80244c72>{usb_unbind_interface+82}
<ffffffff802158c7>{device_release_driver+119}
<ffffffff80215ab9>{bus_remove_device+153}
<ffffffff802149f8>{device_del+88}
<ffffffff8024b71b>{usb_disable_device+123}
<ffffffff80246d05>{usb_disconnect+197}
<ffffffff802480c7>{hub_thread+759}
<ffffffff80144100>{autoremove_wake_function+0}
<ffffffff80144100>{autoremove_wake_function+0}
<ffffffff80133563>{do_exit+2819}
<ffffffff8010ebe3>{child_rip+8}
<ffffffff80247dd0>{hub_thread+0}
<ffffffff8010ebdb>{child_rip+0}

Code: 48 8b 45 20 0f 18 08 48 83 c3 38 48 39 da 74 4a 48 8b 85 10
RIP <ffffffffa00010e7>{:scsi_mod:scsi_device_cancel+55} RSP <000001003fd6fc58>
CR2: 0000000000000d68
##### Ends Here #####

Of course it closely resembles that of FC3's kernel, I think. Should I escalate that to LKML?

Would it be unfair of me to expect FC guys to look at FC3 kernel's issue when kernel.org's kernel exhibits the same oops, albeit under different circumstances (which I do not completely understand yet, as it is not as easy to trigger as it is in FC3)?

Thank you.
Hari.

PS: It seems another gentleman has already reported this (or very similar) problem to LKML today:
http://marc.theaimsgroup.com/?l=linux-kernel&m=110081002103288&w=2
I think my oops message looks very similar.

just for you guys' information, the bug persists in 2.6.9-1.678_FC3 when i remove my own CD-RW, following is my output from dmesg

usb 2-1.3: USB disconnect, address 4
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
sr 0:0:0:0: Illegal state transition cancel->offline
Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1688
[<161fb645>] scsi_device_set_state+0xc8/0xd3 [scsi_mod]
[<161f8b8b>] scsi_eh_offline_sdevs+0x49/0x5e [scsi_mod]
[<161f9146>] scsi_unjam_host+0x22d/0x23e [scsi_mod]
[<161f9291>] scsi_error_handler+0x13a/0x191 [scsi_mod]
[<0211b3d5>] schedule_tail+0xc/0x37
[<161f9157>] scsi_error_handler+0x0/0x191 [scsi_mod]
[<021041d9>] kernel_thread_helper+0x5/0xb
Unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
02250207
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: nls_utf8 vfat fat i915 md5 ipv6 parport_pc lp parport i8k ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod sd_mod sr_mod usb_storage scsi_mod button battery ac joydev
yenta_socket uhci_hcd hw_random snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore orinoco_cs ds pcmcia_core orinoco hermes 3c59x floppy ext3 jbd
CPU: 0
EIP: 0060:[<02250207>] Not tainted VLI
EFLAGS: 00010046 (2.6.9-1.678_FC3)
EIP is at cfq_insert_request+0x45/0xdf
eax: 15569290 ebx: 1304f6b0 ecx: 00000001 edx: 1304f6b0
esi: 00000001 edi: 00000000 ebp: 00000000 esp: 12735efc
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 1890, threadinfo=12735000 task=120c81f0)
Stack: 15569290 15569290 00000001 1304f6b0 00000202 022469e3 15569290 00000001
1304f6b0 022469a5 00000000 02248b52 12465c40 13247000 12770000 00001057
161f9576 12465c40 00000001 12465c40 12735f74 12735f74 12735f7c 161f8ec8
Call Trace:
[<022469e3>] __elv_add_request+0x3c/0x71
[<022469a5>] elv_requeue_request+0x29/0x2b
[<02248b52>] blk_insert_request+0xba/0x18b
[<161f9576>] scsi_queue_insert+0x84/0x8d [scsi_mod]
[<161f8ec8>] scsi_eh_flush_done_q+0x7d/0xce [scsi_mod]
[<161f914f>] scsi_unjam_host+0x236/0x23e [scsi_mod]
[<161f9291>] scsi_error_handler+0x13a/0x191 [scsi_mod]
[<0211b3d5>] schedule_tail+0xc/0x37
[<161f9157>] scsi_error_handler+0x0/0x191 [scsi_mod]
[<021041d9>] kernel_thread_helper+0x5/0xb
Code: 74 29 eb 51 83 f9 03 74 33 eb 4a 8b 04 24 89 fa e8 f8 fa ff ff 85 c0 75 f2 8b 47 08 8b 50 04 89 03 89 58 04 89 1a 89 53 04 eb 3f <8b> 47 08 8b 10 89 5a 04 89 13 89 43 04 89 18 eb 2e f6 42 08 10

i should add that https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=138755 seems to be a duplicate of this one...?
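
For readers who want to check how similar the two traces quoted above really are, here is a minimal Python sketch that pulls the function names out of both trace styles and lists the ones they share. The regular expressions are assumptions based only on the frames quoted in this report (the `<addr>{:module:function+offset}` style from 2.6.10-rc2 on x86_64 and the `[<addr>] function+0x.../0x... [module]` style from 2.6.9-1.678_FC3), and the names `frames` and `shared_functions` are made up for illustration. The sample strings are a few frames copied from the traces, not the full output.

```python
import re

# x86_64 style seen in the 2.6.10-rc2 trace: <addr>{:module:function+offset}
X86_64_FRAME = re.compile(r"<[0-9a-f]{16}>\{(?::(\w+):)?(\w+)\+\d+\}")
# i386 style seen in the 2.6.9-1.678_FC3 trace: [<addr>] function+0xoff/0xsize [module]
I386_FRAME = re.compile(
    r"\[<[0-9a-f]{8}>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

# A few frames copied from the two traces in this report.
X86_64_SAMPLE = (
    "<ffffffffa0006828>{:scsi_mod:scsi_device_set_state+264} "
    "<ffffffffa00010d9>{:scsi_mod:scsi_device_cancel+41} "
    "<ffffffff8018ae27>{simple_rmdir+55}"
)
I386_SAMPLE = (
    "[<161fb645>] scsi_device_set_state+0xc8/0xd3 [scsi_mod] "
    "[<161f8b8b>] scsi_eh_offline_sdevs+0x49/0x5e [scsi_mod] "
    "[<161f9146>] scsi_unjam_host+0x22d/0x23e [scsi_mod] "
    "[<0211b3d5>] schedule_tail+0xc/0x37"
)


def frames(trace):
    """Return (module, function) pairs in the order they appear in the text.

    Frames without a module tag are reported under the label "kernel".
    """
    found = []
    for m in X86_64_FRAME.finditer(trace):
        found.append((m.start(), m.group(1) or "kernel", m.group(2)))
    for m in I386_FRAME.finditer(trace):
        found.append((m.start(), m.group(2) or "kernel", m.group(1)))
    return [(module, function) for _, module, function in sorted(found)]


def shared_functions(trace_a, trace_b):
    """Return the sorted function names that occur in both traces."""
    names_a = {function for _, function in frames(trace_a)}
    names_b = {function for _, function in frames(trace_b)}
    return sorted(names_a & names_b)


if __name__ == "__main__":
    for module, function in frames(X86_64_SAMPLE):
        print(module, function)
    print("shared:", shared_functions(X86_64_SAMPLE, I386_SAMPLE))
```

A shared function name only shows that both traces pass through the same code; it does not by itself prove that the two reports are the same bug.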
one more thing, i can't seem to reproduce this with 2.6.10-rc2-mm1 but i haven't tried any of linus' kernels yet and i don't know if the bug was in vanilla 2.6.9 and if so if the patch to fix it is in linus's or morton's tree

just tried kernel-2.6.9-1.681_FC3 and it does the same thing

this bug has been fixed in kernel-2.6.9-1.715_FC3 you can grab it from http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/3/i386/

Unfortunately, kernel-2.6.9-1.715_FC3 is too buggy. I have oopses and system freezes when I tried to exit from Xserver. And the kernel-2.6.9-1.1047_FC4 (which have this bug fixed according to its changelog) is even worse - it crashed on boot.

Kernel-2.6.9-1.715_FC3 does fix this problem, but unfortunately it has introduced this problem:

[root@desktop ~]# ps -eo state,pid,cmd,wchan|egrep '^[D]'
D 29 [khubd] scsi_wait_req
D 2582 hald usb_device_read
D 6774 [scsi_eh_16] -

Thank you.
Hari

PS: While I was turning on/off the external USB DVD-RW to simulate the kernel bug, I came across D state processes involving USB/SCSI.

Seems, that in kernel-2.6.9-1.1049_FC4 these and agpgart bugs were fixed for good! You can get it from http://cvs.fedora.redhat.com/ or wait for it to appear in rawhide.

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.

This bug has been automatically closed as part of a mass update. It had been in NEEDINFO state since July 2005. If this bug still exists in current errata kernels, please reopen this bug. There are a large number of inactive bugs in the database, and this is the only way to purge them. Thank you.
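
For anyone re-testing a newer kernel, the `ps -eo state,pid,cmd,wchan|egrep '^[D]'` check quoted in this report can be scripted. The following is a minimal Python sketch that filters the same `ps` columns for processes in uninterruptible sleep (state D). The function name `d_state` is made up for illustration, the parsing assumes the column order `state,pid,cmd,wchan` used in the report, and the header line and the `S` row in the sample are illustrative filler, not output from the reporter's machine.

```python
import subprocess

# The three D rows are the ones quoted in this report; the header line and
# the S row are illustrative filler to show that non-D lines are skipped.
SAMPLE = """\
S   PID CMD                         WCHAN
S     1 init                        -
D    29 [khubd]                     scsi_wait_req
D  2582 hald                        usb_device_read
D  6774 [scsi_eh_16]                -
"""


def d_state(ps_output):
    """Return (pid, cmd, wchan) for every row whose state column is D.

    Expects text in the column order produced by `ps -eo state,pid,cmd,wchan`.
    """
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or fields[0] != "D" or not fields[1].isdigit():
            continue
        # cmd may contain spaces, so take wchan as the last token of the row.
        rest = fields[2].rsplit(None, 1)
        cmd, wchan = rest if len(rest) == 2 else (rest[0], "")
        rows.append((int(fields[1]), cmd, wchan))
    return rows


if __name__ == "__main__":
    # Same ps invocation as in the report, run on the local machine.
    live = subprocess.run(
        ["ps", "-eo", "state,pid,cmd,wchan"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, cmd, wchan in d_state(live):
        print(pid, cmd, wchan)
```

Running the script after power-cycling the drive a few times and finding `khubd` or a `scsi_eh_*` thread in the output would match the hang described above.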