Description of problem:
I have a Zonet ZUC2700 Cardbus USB2.0/IEEE1394 adapter in my laptop. Connected to the two IEEE1394 ports are two external disk drives (arranged as a software RAID1). One of the disk drives went offline:

scsi 2:0:0:0: rejecting I/O to dead device
raid1: sda1: rescheduling sector 794648
scsi 2:0:0:0: rejecting I/O to dead device
scsi 2:0:0:0: rejecting I/O to dead device

To recover, I:

umount /media/raid1
modprobe -r sbp2
modprobe sbp2

The device did not come back, so I tried:

modprobe -r sbp2
modprobe -r ohci1394

At which point the following occurred:

list_del corruption. prev->next should be f717d840, but was f89abea0
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:67!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
Modules linked in: iptable_filter ip_tables cbc blkcipher irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm fuse aes cryptoloop loop joydev autofs4 hidp l2cap bluetooth rt2500(U) nf_conntrack_netbios_ns nf_conntrack nfnetlink ipt_REJECT xt_tcpudp x_tables dm_mirror dm_multipath dm_mod raid1 video toshiba_acpi sbs i2c_ec dock button battery asus_acpi backlight ac lp parport sg snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq nvidia(P)(U) snd_seq_device snd_pcm_oss i2c_core snd_mixer_oss snd_pcm snd_timer snd e100 mii serio_raw iTCO_wdt iTCO_vendor_support soundcore ohci1394 synaptics_usb(U) ieee1394 snd_page_alloc smsc_ircc2 ide_cd irda crc_ccitt cdrom ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0060:[<c04ee109>] Tainted: P VLI
EFLAGS: 00210096 (2.6.20-1.2944.fc6 #1)
EIP is at list_del+0x21/0x5d
eax: 00000048 ebx: f717d840 ecx: 00000000 edx: 00200086
esi: f6d76000 edi: f899c0a0 ebp: 00200286 esp: d1aeae98
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 18843, ti=d1aea000 task=f6cf63f0 task.ti=d1aea000)
Stack: c06bcd5f f717d840 f89abea0 f717d840 f8966419 f6ca7ec8 f8966504 00200246
       00000000 c0743860 00000000 f899c0a0 f6d76000 f8920e54 f6d77a4c f896677f
       f6d76000 f8920e54 f896617e c1b4f000 f891b369 c1b4f048 f8920ec4 f6d760b4
Call Trace:
 [<f8966419>] __delete_addr+0x8/0x18 [ieee1394]
 [<f8966504>] __unregister_host+0x3a/0x98 [ieee1394]
 [<f896677f>] highlevel_remove_host+0x21/0x42 [ieee1394]
 [<f896617e>] hpsb_remove_host+0x38/0x56 [ieee1394]
 [<f891b369>] ohci1394_pci_remove+0x47/0x1ec [ohci1394]
 [<c04f58df>] pci_device_remove+0x16/0x35
 [<c05589f0>] __device_release_driver+0x71/0x87
 [<c0558eb3>] driver_detach+0xa7/0xe8
 [<c05585e3>] bus_remove_driver+0x5a/0x78
 [<c0558f18>] driver_unregister+0x8/0x13
 [<c04f5a30>] pci_unregister_driver+0xe/0x4c
 [<c0440e3d>] sys_delete_module+0x18a/0x1b1
 [<c044db01>] audit_syscall_entry+0x111/0x143
 [<c0403f64>] syscall_call+0x7/0xb
 [<c0620033>] rt_mutex_slowlock+0x439/0x44f
 =======================
Code: 63 00 00 00 89 c3 eb e8 90 90 53 83 ec 0c 8b 48 04 8b 11 39 c2 74 18 89 54 24 08 89 44 24 04 c7 04 24 5f cd 6b c0 e8 78 99 f3 ff <0f> 0b eb fe 8b 10 8b 5a 04 39 c3 74 18 89 5c 24 08 89 44 24 04
EIP: [<c04ee109>] list_del+0x21/0x5d SS:ESP 0068:d1aeae98
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043a142>] down_read+0x12/0x28
 [<c044639e>] acct_collect+0x38/0x13e
 [<c04299fc>] do_exit+0x1b1/0x6f6
 [<c0427aa0>] printk+0x1f/0x95
 [<c0405530>] die+0x21b/0x240
 [<c040593b>] do_invalid_op+0x0/0xab
 [<c04059dd>] do_invalid_op+0xa2/0xab
 [<c04ee109>] list_del+0x21/0x5d
 [<c0427447>] release_console_sem+0x1ba/0x1c2
 [<c042b109>] on_each_cpu+0x1f/0x27
 [<c0417c59>] flush_tlb_all+0x1b/0x1d
 [<c04670c1>] __vunmap+0xcf/0xe3
 [<c0417d00>] do_flush_tlb_all+0x0/0x57
 [<c042b109>] on_each_cpu+0x1f/0x27
 [<c0620744>] error_code+0x7c/0x84
 [<c04100d8>] powernowk8_target+0x45f/0x946
 [<c04ee109>] list_del+0x21/0x5d
 [<f8966419>] __delete_addr+0x8/0x18 [ieee1394]
 [<f8966504>] __unregister_host+0x3a/0x98 [ieee1394]
 [<f896677f>] highlevel_remove_host+0x21/0x42 [ieee1394]
 [<f896617e>] hpsb_remove_host+0x38/0x56 [ieee1394]
 [<f891b369>] ohci1394_pci_remove+0x47/0x1ec [ohci1394]
 [<c04f58df>] pci_device_remove+0x16/0x35
 [<c05589f0>] __device_release_driver+0x71/0x87
 [<c0558eb3>] driver_detach+0xa7/0xe8
 [<c05585e3>] bus_remove_driver+0x5a/0x78
 [<c0558f18>] driver_unregister+0x8/0x13
 [<c04f5a30>] pci_unregister_driver+0xe/0x4c
 [<c0440e3d>] sys_delete_module+0x18a/0x1b1
 [<c044db01>] audit_syscall_entry+0x111/0x143
 [<c0403f64>] syscall_call+0x7/0xb
 [<c0620033>] rt_mutex_slowlock+0x439/0x44f
 =======================

Version-Release number of selected component (if applicable):
kernel-2.6.20-1.2944.fc6

How reproducible:
Further attempts to modprobe -r ohci1394 failed, so I rebooted. Attempting the same sequence again caused the same failure, so it looks repeatable.

Steps to Reproduce:
1. Insert ZUC2700 in PCMCIA slot:
   05:00.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 61)
   05:00.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 61)
   05:00.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 63)
   05:00.3 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
2. Connect two hard drives to the IEEE1394 ports
3. modprobe -r sbp2, modprobe -r ohci1394

Actual results:
list_del corruption.

Expected results:
Error free removal of the ohci1394 module.

Additional info:
Just a correction; the initial recovery attempt was actually:

umount /media/raid1
mdadm -S /dev/md0
modprobe -r sbp2
modprobe sbp2

I neglected to note that the array was stopped after umounting.
Is it an SMP machine? Does it also happen if you wait a few seconds between modprobe -r sbp2 and modprobe -r ohci1394?
It is a UP machine (Intel(R) Pentium(R) 4 Mobile CPU 1.70GHz) - a 6-year-old Toshiba laptop. I waited 15 seconds between the two modprobe -r commands (sleep 15), with no change. If there's an outstanding operation that needs to complete, shouldn't modprobe -r have delayed returning?
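For anyone who wants to repeat the test with a different pause, the sequence used in this report can be sketched as a small script. The DRY_RUN and DELAY variables and the run helper are hypothetical names introduced here for illustration; the mount point and array device are the ones from this report. By default it only prints the commands, since running them for real needs root and may trigger the BUG described above.

```shell
#!/bin/sh
# Sketch of the removal sequence from this report, with a configurable pause.
# DRY_RUN and DELAY are hypothetical knobs, not part of any existing tool.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default, safe)
DELAY="${DELAY:-15}"      # seconds to wait between the two module removals

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_firewire_stack() {
    run umount /media/raid1      # mount point used in this report
    run mdadm -S /dev/md0        # stop the RAID1 array
    run modprobe -r sbp2
    run sleep "$DELAY"           # pause suggested in the comment thread
    run modprobe -r ohci1394     # the step that hit the list_del BUG here
}

remove_firewire_stack
```

Setting DRY_RUN=0 executes the commands instead of printing them; DELAY=0 approximates the back-to-back removal from the original description.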
Does the P4M include hyperthreading and does kernel-2.6.20-1.2944.fc6 enable it? My questions are because modprobe -r sbp2 and modprobe -r ohci1394 immediately afterwards always worked for me on an AMD single processor machine, but occasionally resulted in a kernel panic on a dual core machine. That panic seemed different from yours (occurred in the block layer's softIRQ AFAICT); and I haven't investigated it further yet. The panic could be completely avoided by a few seconds pause between modprobe -r sbp2 and modprobe -r ohci1394. Anyway, your bug is different, and now I know that I won't be able to reproduce with what I have here. I'll see if I find something from looking at the source. (I'm the upstream maintainer of the IEEE 1394 drivers.)
The P4M apparently supports hyperthreading:

flags : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm up

However /proc/cpuinfo is only reporting a single CPU, so it does not appear to be enabled in the BIOS. This machine has a "legacy free" BIOS, so there is no user interface for configuration. It does appear to be enabled in the kernel:

$ grep -i x86_HT /boot/config-2.6.20-1.2944.fc6
CONFIG_X86_HT=y
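The same check can be scripted. This is a minimal sketch, assuming a /proc/cpuinfo in the usual x86 layout and a grep that supports -m (GNU and BSD grep do); the function names and the CPUINFO override are made up for this example. Note that the ht flag on its own does not show that a second logical CPU is active, which is why the processor count is reported alongside it.

```shell
#!/bin/sh
# Sketch: report the "ht" CPU flag and the number of logical CPUs the kernel sees.
# CPUINFO is a hypothetical override so the functions can run on a saved copy.
CPUINFO="${CPUINFO:-/proc/cpuinfo}"

# Succeeds if the first "flags" line of the given file lists the ht flag.
has_ht_flag() {
    grep -m1 '^flags' "$1" | grep -qw 'ht'
}

# Counts "processor : N" entries, i.e. the logical CPUs listed in the file.
count_logical_cpus() {
    grep -c '^processor' "$1"
}

if [ -r "$CPUINFO" ]; then
    if has_ht_flag "$CPUINFO"; then
        echo "ht flag: present"
    else
        echo "ht flag: absent"
    fi
    echo "logical CPUs: $(count_logical_cpus "$CPUINFO")"
fi
```

On the machine in this report the flag would be reported as present with a single logical CPU, matching the manual inspection above.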
I retested with kernel-2.6.22.7-57.fc6, and the problem no longer occurs. Closing.