Bug 238175 - modprobe -r ohci1394 causes list_del corruption
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2007-04-27 13:54 EDT by Mace Moneta
Modified: 2007-11-30 17:12 EST
Fixed In Version: kernel-2.6.22.7-57.fc6
Doc Type: Bug Fix
Last Closed: 2007-09-26 16:29:26 EDT
Description Mace Moneta 2007-04-27 13:54:44 EDT
Description of problem:

I have a Zonet ZUC2700 Cardbus USB2.0/IEEE1394 adapter in my laptop.  Connected
to the two IEEE1394 ports are two external disk drives (arranged as a software
RAID1).

One of the disk drives went offline:

scsi 2:0:0:0: rejecting I/O to dead device
raid1: sda1: rescheduling sector 794648
scsi 2:0:0:0: rejecting I/O to dead device
scsi 2:0:0:0: rejecting I/O to dead device

To recover, I ran:

umount /media/raid1
modprobe -r sbp2
modprobe sbp2

The device did not come back, so I tried:

modprobe -r sbp2
modprobe -r ohci1394

At which point the following occurred:

list_del corruption. prev->next should be f717d840, but was f89abea0
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:67!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
Modules linked in: iptable_filter ip_tables cbc blkcipher irnet ppp_generic slhc
irtty_sir sir_dev ircomm_tty ircomm fuse aes cryptoloop loop joydev autofs4 hidp
l2cap bluetooth rt2500(U) nf_conntrack_netbios_ns nf_conntrack nfnetlink
ipt_REJECT xt_tcpudp x_tables dm_mirror dm_multipath dm_mod raid1 video
toshiba_acpi sbs i2c_ec dock button battery asus_acpi backlight ac lp parport sg
snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq nvidia(P)(U) snd_seq_device snd_pcm_oss i2c_core
snd_mixer_oss snd_pcm snd_timer snd e100 mii serio_raw iTCO_wdt
iTCO_vendor_support soundcore ohci1394 synaptics_usb(U) ieee1394 snd_page_alloc
smsc_ircc2 ide_cd irda crc_ccitt cdrom ata_piix libata sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<c04ee109>]    Tainted: P      VLI
EFLAGS: 00210096   (2.6.20-1.2944.fc6 #1)
EIP is at list_del+0x21/0x5d
eax: 00000048   ebx: f717d840   ecx: 00000000   edx: 00200086
esi: f6d76000   edi: f899c0a0   ebp: 00200286   esp: d1aeae98
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 18843, ti=d1aea000 task=f6cf63f0 task.ti=d1aea000)
Stack: c06bcd5f f717d840 f89abea0 f717d840 f8966419 f6ca7ec8 f8966504 00200246 
00000000 c0743860 00000000 f899c0a0 f6d76000 f8920e54 f6d77a4c f896677f 
f6d76000 f8920e54 f896617e c1b4f000 f891b369 c1b4f048 f8920ec4 f6d760b4 
Call Trace:
[<f8966419>] __delete_addr+0x8/0x18 [ieee1394]
[<f8966504>] __unregister_host+0x3a/0x98 [ieee1394]
[<f896677f>] highlevel_remove_host+0x21/0x42 [ieee1394]
[<f896617e>] hpsb_remove_host+0x38/0x56 [ieee1394]
[<f891b369>] ohci1394_pci_remove+0x47/0x1ec [ohci1394]
[<c04f58df>] pci_device_remove+0x16/0x35
[<c05589f0>] __device_release_driver+0x71/0x87
[<c0558eb3>] driver_detach+0xa7/0xe8
[<c05585e3>] bus_remove_driver+0x5a/0x78
[<c0558f18>] driver_unregister+0x8/0x13
[<c04f5a30>] pci_unregister_driver+0xe/0x4c
[<c0440e3d>] sys_delete_module+0x18a/0x1b1
[<c044db01>] audit_syscall_entry+0x111/0x143
[<c0403f64>] syscall_call+0x7/0xb
[<c0620033>] rt_mutex_slowlock+0x439/0x44f
=======================
Code: 63 00 00 00 89 c3 eb e8 90 90 53 83 ec 0c 8b 48 04 8b 11 39 c2 74 18 89 54
24 08 89 44 24 04 c7 04 24 5f cd 6b c0 e8 78 99 f3 ff <0f> 0b eb fe 8b 10 8b 5a
04 39 c3 74 18 89 5c 24 08 89 44 24 04 
EIP: [<c04ee109>] list_del+0x21/0x5d SS:ESP 0068:d1aeae98
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
[<c043a142>] down_read+0x12/0x28
[<c044639e>] acct_collect+0x38/0x13e
[<c04299fc>] do_exit+0x1b1/0x6f6
[<c0427aa0>] printk+0x1f/0x95
[<c0405530>] die+0x21b/0x240
[<c040593b>] do_invalid_op+0x0/0xab
[<c04059dd>] do_invalid_op+0xa2/0xab
[<c04ee109>] list_del+0x21/0x5d
[<c0427447>] release_console_sem+0x1ba/0x1c2
[<c042b109>] on_each_cpu+0x1f/0x27
[<c0417c59>] flush_tlb_all+0x1b/0x1d
[<c04670c1>] __vunmap+0xcf/0xe3
[<c0417d00>] do_flush_tlb_all+0x0/0x57
[<c042b109>] on_each_cpu+0x1f/0x27
[<c0620744>] error_code+0x7c/0x84
[<c04100d8>] powernowk8_target+0x45f/0x946
[<c04ee109>] list_del+0x21/0x5d
[<f8966419>] __delete_addr+0x8/0x18 [ieee1394]
[<f8966504>] __unregister_host+0x3a/0x98 [ieee1394]
[<f896677f>] highlevel_remove_host+0x21/0x42 [ieee1394]
[<f896617e>] hpsb_remove_host+0x38/0x56 [ieee1394]
[<f891b369>] ohci1394_pci_remove+0x47/0x1ec [ohci1394]
[<c04f58df>] pci_device_remove+0x16/0x35
[<c05589f0>] __device_release_driver+0x71/0x87
[<c0558eb3>] driver_detach+0xa7/0xe8
[<c05585e3>] bus_remove_driver+0x5a/0x78
[<c0558f18>] driver_unregister+0x8/0x13
[<c04f5a30>] pci_unregister_driver+0xe/0x4c
[<c0440e3d>] sys_delete_module+0x18a/0x1b1
[<c044db01>] audit_syscall_entry+0x111/0x143
[<c0403f64>] syscall_call+0x7/0xb
[<c0620033>] rt_mutex_slowlock+0x439/0x44f
=======================
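
The first line of the trace reports two addresses. Going by the message text and the BUG location in lib/list_debug.c, the list debugging check found that the previous element's next pointer did not point back at the entry being removed: the first address is the entry itself, the second is what was found instead. A minimal sketch for pulling those two values out of a saved log follows; the function name parse_list_del is hypothetical and not part of any kernel tooling.

```shell
# Hypothetical helper (not part of any kernel tooling).
# Reads kernel log text on stdin and prints the two pointers from each
# "list_del corruption" line as: expected=<addr> actual=<addr>
parse_list_del() {
    sed -n 's/^.*list_del corruption\. prev->next should be \([0-9a-f]*\), but was \([0-9a-f]*\).*$/expected=\1 actual=\2/p'
}
```

It could be used as `dmesg | parse_list_del`. For the line in this report it prints `expected=f717d840 actual=f89abea0`.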


Version-Release number of selected component (if applicable):

kernel-2.6.20-1.2944.fc6

How reproducible:

Further attempts to modprobe -r ohci1394 failed, so I rebooted.  Attempting the
same sequence again caused the same failure, so it looks repeatable.

Steps to Reproduce:
1. Insert ZUC2700 in PCMCIA slot:

05:00.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 61)
05:00.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 61)
05:00.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 63)
05:00.3 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)

2. Connect two hard drives to the IEEE1394 ports
3. modprobe -r sbp2, modprobe -r ohci1394
  
Actual results:

list_del corruption.

Expected results:

Error free removal of the ohci1394 module.

Additional info:
Comment 1 Mace Moneta 2007-04-27 16:01:52 EDT
Just a correction; the initial recovery attempt was actually:

umount /media/raid1
mdadm -S /dev/md0
modprobe -r sbp2
modprobe sbp2

I neglected to note that the array was stopped after umounting.
Comment 2 Stefan Richter 2007-04-28 09:49:24 EDT
Is it an SMP machine?

Does it also happen if you wait a few seconds between modprobe -r sbp2 and
modprobe -r ohci1394?
Comment 3 Mace Moneta 2007-04-28 10:54:06 EDT
It is a UP machine (Intel(R) Pentium(R) 4 Mobile CPU 1.70GHz) - a 6-year-old
Toshiba laptop.  I've waited 15 seconds between the two modprobe -r commands
(sleep 15) with no change.  If there's an outstanding operation that needs to
complete, shouldn't the modprobe -r have delayed returning?
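
The timed test described above can be sketched as a small shell function. This is only an illustration under assumptions: the function name unload_firewire and the MODPROBE override are hypothetical, the real commands need root, and on an affected kernel the second unload is expected to trigger the reported BUG.

```shell
# Hypothetical helper: unload sbp2, pause, then unload ohci1394.
# $1 is the pause in seconds (default 15, as in the test above).
# MODPROBE may be overridden, e.g. MODPROBE="echo modprobe", for a dry run
# that only prints the commands instead of executing them.
unload_firewire() {
    delay=${1:-15}
    ${MODPROBE:-modprobe} -r sbp2 || return 1
    sleep "$delay"
    ${MODPROBE:-modprobe} -r ohci1394
}
```

A dry run with `MODPROBE="echo modprobe" unload_firewire 0` prints the two commands without touching any module.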
Comment 4 Stefan Richter 2007-04-28 12:08:43 EDT
Does the P4M include hyperthreading and does kernel-2.6.20-1.2944.fc6 enable it?

My questions are because modprobe -r sbp2 and modprobe -r ohci1394 immediately
afterwards always worked for me on an AMD single processor machine, but
occasionally resulted in a kernel panic on a dual core machine.  That panic
seemed different from yours (occurred in the block layer's softIRQ AFAICT); and
I haven't investigated it further yet.  The panic could be completely avoided by
a few seconds pause between modprobe -r sbp2 and modprobe -r ohci1394.

Anyway, your bug is different, and now I know that I won't be able to reproduce
with what I have here.  I'll see if I find something from looking at the source.
(I'm the upstream maintainer of the IEEE 1394 drivers.)
Comment 5 Mace Moneta 2007-04-28 12:33:48 EDT
The P4M apparently supports hyperthreading:

flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm up

However /proc/cpuinfo is only reporting a single CPU, so it does not appear to
be enabled in the BIOS.  This machine has a "legacy free" BIOS, so there is no
user interface for configuration.  It does appear to be enabled in the kernel:

$ grep -i x86_HT /boot/config-2.6.20-1.2944.fc6 
CONFIG_X86_HT=y
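
The two checks above (the ht flag and the number of CPUs that /proc/cpuinfo reports) can be combined in a short sketch. The function names are hypothetical, and the only assumption about the input is the usual /proc/cpuinfo layout of "processor" and "flags" lines. The ht flag alone does not show that a second logical processor is active, which is why the processor count is checked separately.

```shell
# Hypothetical helpers that read /proc/cpuinfo-style text on stdin.

# Print the number of "processor" entries, i.e. the logical CPUs the
# kernel is actually using.
count_processors() {
    grep -c '^processor[[:space:]]*:'
}

# Succeed if the flag named in $1 appears as a whole word in a flags line.
has_cpu_flag() {
    grep '^flags[[:space:]]*:' | tr ' \t' '\n\n' | grep -x "$1" >/dev/null
}
```

For example, `count_processors < /proc/cpuinfo` prints the count, and `has_cpu_flag ht < /proc/cpuinfo && echo "ht flag present"` tests for the flag.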
Comment 6 Mace Moneta 2007-09-26 16:29:26 EDT
I retested with kernel-2.6.22.7-57.fc6, and the problem no longer occurs.  Closing.
