Description of problem:

[ 8306.301671] ------------[ cut here ]------------
[ 8306.306567] WARNING: at lib/debugobjects.c:262 debug_print_object+0x7c/0x8d()
[ 8306.313968] Hardware name: X8DTH-i/6/iF/6F
[ 8306.318337] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x2b
[ 8306.328741] Modules linked in: binfmt_misc ses enclosure mlx4_ib mlx4_en microcode joydev serio_raw iTCO_wdt i2c_i801 iTCO_vendor_support mpt2sas(-) scsi_transport_sas ioatdma raid_class mlx4_core igb i7core_edac dca edac_core w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: scsi_wait_scan]
[ 8306.371293] Pid: 20082, comm: rmmod Tainted: G W 3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[ 8306.380135] Call Trace:
[ 8306.382861] [<ffffffff8105c528>] warn_slowpath_common+0x83/0x9b
[ 8306.389133] [<ffffffff8105c5e3>] warn_slowpath_fmt+0x46/0x48
[ 8306.395150] [<ffffffff812593bf>] debug_print_object+0x7c/0x8d
[ 8306.401252] [<ffffffff81075729>] ? __queue_work+0x2c2/0x2c2
[ 8306.407182] [<ffffffff81259bab>] debug_check_no_obj_freed+0x96/0x177
[ 8306.413894] [<ffffffff814fc4be>] ? __slab_free+0x16f/0x24c
[ 8306.419733] [<ffffffff81328e47>] ? scsi_host_dev_release+0xbd/0xc2
[ 8306.426268] [<ffffffff8112efb0>] slab_free_hook+0x6b/0x74
[ 8306.432024] [<ffffffff81130bfc>] kfree+0xb4/0x131
[ 8306.437088] [<ffffffff81328e47>] scsi_host_dev_release+0xbd/0xc2
[ 8306.443451] [<ffffffff8131328d>] device_release+0x4b/0x7f
[ 8306.449207] [<ffffffff8124981b>] kobject_release+0x11d/0x154
[ 8306.455222] [<ffffffff812496fe>] ? kobject_del+0x36/0x36
[ 8306.460885] [<ffffffff8124ac2f>] kref_put+0x43/0x4d
[ 8306.466122] [<ffffffff81249661>] kobject_put+0x45/0x49
[ 8306.471610] [<ffffffff81313097>] put_device+0x17/0x19
[ 8306.477011] [<ffffffff81328e78>] scsi_host_put+0x15/0x17
[ 8306.482678] [<ffffffffa01a34f7>] _scsih_remove+0x199/0x1a8 [mpt2sas]
[ 8306.489382] [<ffffffff8126ccae>] pci_device_remove+0x3d/0x8f
[ 8306.495398] [<ffffffff813167cf>] __device_release_driver+0x86/0xd2
[ 8306.501932] [<ffffffff81316ed3>] driver_detach+0x99/0xc2
[ 8306.507601] [<ffffffff81316693>] bus_remove_driver+0xba/0xdf
[ 8306.513617] [<ffffffff81317579>] driver_unregister+0x6a/0x75
[ 8306.519624] [<ffffffff8126ce89>] pci_unregister_driver+0x44/0x8d
[ 8306.525983] [<ffffffffa01a352b>] _scsih_exit+0x25/0xafa [mpt2sas]
[ 8306.532437] [<ffffffff81098a58>] sys_delete_module+0x1dd/0x251
[ 8306.538625] [<ffffffff810b4e1b>] ? audit_syscall_entry+0x11c/0x148
[ 8306.545159] [<ffffffff812536fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 8306.551866] [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b
[ 8306.558140] ---[ end trace fab739e6cd1dc530 ]---

Version-Release number of selected component (if applicable):
kernel-3.1.0-0.rc3.git0.0.fc16.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. rmmod mpt2sas

Additional info:
Another strange warning while loading/unloading mpt2sas:

[ 8778.456040] ------------[ cut here ]------------
[ 8778.461103] WARNING: at drivers/pci/msi.c:794 pci_enable_msix+0xae/0x349()
[ 8778.468261] Hardware name: X8DTH-i/6/iF/6F
[ 8778.472656] Modules linked in: mpt2sas(+) binfmt_misc ses enclosure mlx4_ib mlx4_en microcode joydev serio_raw iTCO_wdt i2c_i801 iTCO_vendor_support scsi_transport_sas ioatdma raid_class mlx4_core igb i7core_edac dca edac_core w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: mpt2sas]
[ 8778.514871] Pid: 27632, comm: work_for_cpu Tainted: G W 3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[ 8778.524265] Call Trace:
[ 8778.526989] [<ffffffff8105c528>] warn_slowpath_common+0x83/0x9b
[ 8778.533271] [<ffffffff8105c55a>] warn_slowpath_null+0x1a/0x1c
[ 8778.539373] [<ffffffff8127cfdd>] pci_enable_msix+0xae/0x349
[ 8778.545305] [<ffffffff8113132e>] ? __kmalloc+0xfa/0x10c
[ 8778.550891] [<ffffffffa0188725>] ? kcalloc.constprop.7+0x32/0x34 [mpt2sas]
[ 8778.558128] [<ffffffffa018930d>] mpt2sas_base_map_resources+0x2e8/0x4dd [mpt2sas]
[ 8778.566185] [<ffffffffa018b633>] mpt2sas_base_attach+0x46/0x140d [mpt2sas]
[ 8778.573442] [<ffffffff8108eed1>] ? lock_release+0x1a4/0x1d1
[ 8778.579399] [<ffffffff815042d3>] ? _raw_spin_unlock+0x28/0x3b
[ 8778.585528] [<ffffffff81076ae6>] ? __alloc_workqueue_key+0x29c/0x2ce
[ 8778.592270] [<ffffffffa0192c82>] _scsih_probe+0x429/0x60c [mpt2sas]
[ 8778.598924] [<ffffffff81504297>] ? _raw_spin_unlock_irqrestore+0x4d/0x61
[ 8778.606012] [<ffffffff81073b06>] ? move_linked_works+0x6e/0x6e
[ 8778.612236] [<ffffffff8126c78f>] local_pci_probe+0x44/0x75
[ 8778.618192] [<ffffffff81073b1c>] do_work_for_cpu+0x16/0x28
[ 8778.624053] [<ffffffff8107a18d>] kthread+0xa8/0xb0
[ 8778.629238] [<ffffffff8150d284>] kernel_thread_helper+0x4/0x10
[ 8778.635451] [<ffffffff815046f4>] ? retint_restore_args+0x13/0x13
[ 8778.641840] [<ffffffff8107a0e5>] ? __init_kthread_worker+0x5a/0x5a
[ 8778.648407] [<ffffffff8150d280>] ? gs_change+0x13/0x13
[ 8778.653934] ---[ end trace fab739e6cd1dc531 ]---
Also, sometimes modprobe mpt2sas fails with the following error, but the next attempt works again:

[ 8947.624754] mpt2sas version 09.100.00.00 loaded
[ 8947.630768] scsi7 : Fusion MPT SAS Host
[ 8947.636974] mpt2sas 0000:04:00.0: BAR 1: can't reserve [mem 0xfaf3c000-0xfaf3ffff 64bit]
[ 8947.645549] mpt2sas0: pci_request_selected_regions: failed
[ 8947.651332] mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:7625/_scsih_probe()!
[ 8953.905824] mpt2sas version 09.100.00.00 unloading
[ 8955.075560] mpt2sas version 09.100.00.00 loaded
[ 8955.081465] scsi8 : Fusion MPT SAS Host
[ 8955.087667] mpt2sas 0000:04:00.0: setting latency timer to 64
[ 8955.093726] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (198086684 kB)
Can you recreate this with the latest f16 kernel?
rmmod mpt2sas is completely broken and causes the kernel to panic every time.
Tomas, can you CC an LSI developer here?

Albert, is this mpt2sas rmmod panic still reproducible on the latest kernels, 3.2 (and/or 3.3-rc)? If so, can you install kernel-debug and see whether it prints a warning and continues execution instead of just panicking?
I just retested with 3.2.3-2.fc16.x86_64, and I now have the opposite problem: even with all my file systems unmounted (machine is booted with an initramfs), I can't rmmod mpt2sas:

ERROR: Module mpt2sas is in use.

/sys/module/mpt2sas/refcnt says 26, if that makes a difference.
Hi Nandigama, this is an upstream/Fedora issue; I think it could be interesting for you.
(In reply to comment #6)
> I just retested with 3.2.3-2.fc16.x86_64, and I now have the opposite problem:
> even with all my file systems unmounted (machine is booted with an initramfs),
> I can't rmmod mpt2sas:
> ERROR: Module mpt2sas is in use.
> /sys/module/mpt2sas/refcnt says 26, if that makes a difference.

Please provide the debug logs for the kernel you are testing.
Will try to do it.
Tested with 3.2.3-2.fc16.x86_64.debug. It turned up this warning on first rmmod mpt2sas:

WARNING: at fs/sysfs/inode.c:323 sysfs_hash_and_remove+0xa9/0xb0()
sysfs: can not remove 'bsg', no directory

If I then modprobe and rmmod mpt2sas a few more times, I get:

BUG: unable to handle kernel paging request at ffffffffa01874b8
IP: [<ffffffff8141d1e1>] do_scsi_scan_host+0x61/0xa0

Will attach complete traces now.
Created attachment 566262 [details] paging request BUG trace
Created attachment 566263 [details] sysfs warning trace
> WARNING: at fs/sysfs/inode.c:323 sysfs_hash_and_remove+0xa9/0xb0() > sysfs: can not remove 'bsg', no directory This is already addressed (bug 787862). Please use updated kernel 3.2.7-1 or newer.
*** Bug 737085 has been marked as a duplicate of this bug. ***
Attaching a trace with 3.2.7-1 debug. The trace starts where I mdadm --stop /dev/md127, which is probably the RAID-1 over the first two disks attached to the mpt2sas controller. I then did:

rmmod mpt2sas
modprobe mpt2sas
rmmod mpt2sas

BOOM
Created attachment 566836 [details] 3.2.7-1 trace of paging request BUG
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
after modprobe and rmmod about 5 or 6 times:

[ 1544.748626] BUG: unable to handle kernel paging request at ffffffffa02094b8
[ 1544.755851] IP: [<ffffffff81439631>] do_scsi_scan_host+0x61/0xa0
[ 1544.762034] PGD 1c07067 PUD 1c0b063 PMD 2da2b15067 PTE 0
[ 1544.767757] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1544.772640] CPU 4
[ 1544.774545] Modules linked in: mpt2sas(+) binfmt_misc netconsole ses enclosure mlx4_ib mlx4_en ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad microcode ib_mad ib_core serio_raw joydev ipmi_poweroff ipmi_watchdog ipmi_devintf i2c_i801 i7core_edac i2c_core mlx4_core iTCO_wdt scsi_transport_sas iTCO_vendor_support raid_class edac_core ioatdma igb dca ipmi_si ipmi_msghandler [last unloaded: mpt2sas]
[ 1544.813049]
[ 1544.814656] Pid: 8019, comm: scsi_scan_7 Tainted: G W 3.3.0-4.fc16.x86_64.debug #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1544.826352] RIP: 0010:[<ffffffff81439631>] [<ffffffff81439631>] do_scsi_scan_host+0x61/0xa0
[ 1544.835033] RSP: 0018:ffff882da1cede50 EFLAGS: 00010246
[ 1544.840467] RAX: ffffffffa0209420 RBX: ffff88151347c290 RCX: 0000000000000001
[ 1544.847698] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000286
[ 1544.854932] RBP: ffff882da1cede60 R08: 0000000000000002 R09: 0000000000000001
[ 1544.862170] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000100130853
[ 1544.869414] R13: ffff881588112ac0 R14: 0000000000000000 R15: 0000000000000000
[ 1544.876647] FS: 0000000000000000(0000) GS:ffff8817dee00000(0000) knlGS:0000000000000000
[ 1544.884887] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1544.890732] CR2: ffffffffa02094b8 CR3: 0000000001c05000 CR4: 00000000000006e0
[ 1544.897970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1544.905222] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1544.912459] Process scsi_scan_7 (pid: 8019, threadinfo ffff882da1cec000, task ffff882d3cbb0000)
[ 1544.921314] Stack:
[ 1544.923438] ffff882c96f0bcf8 ffff881588112ac0 ffff882da1cede90 ffffffff8143968c
[ 1544.931280] ffff882c96f0bcf8 ffff881588112ac0 ffffffff81439670 0000000000000000
[ 1544.939156] ffff882da1cedf40 ffffffff8108b7c7 ffff882d3cbb0000 ffff882d00000000
[ 1544.947036] Call Trace:
[ 1544.949598] [<ffffffff8143968c>] do_scan_async+0x1c/0x160
[ 1544.955194] [<ffffffff81439670>] ? do_scsi_scan_host+0xa0/0xa0
[ 1544.961223] [<ffffffff8108b7c7>] kthread+0xb7/0xc0
[ 1544.966209] [<ffffffff816acf74>] kernel_thread_helper+0x4/0x10
[ 1544.972240] [<ffffffff816a28f0>] ? _raw_spin_unlock_irq+0x30/0x50
[ 1544.978527] [<ffffffff816a3274>] ? retint_restore_args+0x13/0x13
[ 1544.984724] [<ffffffff8108b710>] ? __init_kthread_worker+0x70/0x70
[ 1544.991096] [<ffffffff816acf70>] ? gs_change+0x13/0x13
[ 1544.996423] Code: d2 48 8b 83 00 02 00 00 48 8b 80 98 00 00 00 eb 21 66 0f 1f 84 00 00 00 00 00 bf 0a 00 00 00 e8 46 c3 c3 ff 48 8b 83 00 02 00 00 <48> 8b 80 98 00 00 00 48 8b 35 c1 c9 8b 00 48 89 df 4c 29 e6 ff
[ 1545.020485] RIP [<ffffffff81439631>] do_scsi_scan_host+0x61/0xa0
[ 1545.026749] RSP <ffff882da1cede50>
[ 1545.030340] CR2: ffffffffa02094b8
[ 1545.033769] ---[ end trace 5220511d4851e1c9 ]---

followed by

[ 1545.038490] BUG: sleeping function called from invalid context at kernel/rwsem.c:21
[ 1545.046291] in_atomic(): 0, irqs_disabled(): 1, pid: 8019, name: scsi_scan_7
[ 1545.053439] INFO: lockdep is turned off.
[ 1545.057470] irq event stamp: 0
[ 1545.060631] hardirqs last enabled at (0): [< (null)>] (null)
[ 1545.068257] hardirqs last disabled at (0): [<ffffffff8105fac0>] copy_process+0x690/0x1860
[ 1545.076667] softirqs last enabled at (0): [<ffffffff8105fac0>] copy_process+0x690/0x1860
[ 1545.085078] softirqs last disabled at (0): [< (null)>] (null)
[ 1545.092713] Pid: 8019, comm: scsi_scan_7 Tainted: G D W 3.3.0-4.fc16.x86_64.debug #1
[ 1545.101296] Call Trace:
[ 1545.103858] [<ffffffff810cb650>] ? print_irqtrace_events+0xd0/0xe0
[ 1545.110228] [<ffffffff81097d35>] __might_sleep+0x135/0x1f0
[ 1545.115902] [<ffffffff816a0326>] down_read+0x26/0x98
[ 1545.121066] [<ffffffff8107bdb4>] exit_signals+0x24/0x130
[ 1545.126579] [<ffffffff81066d2f>] do_exit+0xdf/0xae0
[ 1545.131671] [<ffffffff81063f02>] ? kmsg_dump+0x182/0x270
[ 1545.137175] [<ffffffff81063e1c>] ? kmsg_dump+0x9c/0x270
[ 1545.142615] [<ffffffff816a415c>] oops_end+0xac/0xf0
[ 1545.147687] [<ffffffff81696442>] no_context+0x27b/0x28a
[ 1545.153111] [<ffffffff8169662a>] __bad_area_nosemaphore+0x1d9/0x1f8
[ 1545.159579] [<ffffffff81695c91>] ? pmd_offset+0x1a/0x20
[ 1545.164997] [<ffffffff8169665c>] bad_area_nosemaphore+0x13/0x15
[ 1545.171111] [<ffffffff816a704b>] do_page_fault+0x42b/0x590
[ 1545.176791] [<ffffffff810ce89d>] ? trace_hardirqs_on+0xd/0x10
[ 1545.182731] [<ffffffff8133303e>] ? free_object+0x8e/0xc0
[ 1545.188232] [<ffffffff813339c8>] ? debug_object_free+0xe8/0x140
[ 1545.194348] [<ffffffff81073f62>] ? del_timer_sync+0xa2/0xe0
[ 1545.200110] [<ffffffff81072cf5>] ? destroy_timer_on_stack+0x15/0x20
[ 1545.206569] [<ffffffff8132c43d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1545.213197] [<ffffffff816a34f5>] page_fault+0x25/0x30
[ 1545.218451] [<ffffffff81439631>] ? do_scsi_scan_host+0x61/0xa0
[ 1545.224476] [<ffffffff8143968c>] do_scan_async+0x1c/0x160
[ 1545.230065] [<ffffffff81439670>] ? do_scsi_scan_host+0xa0/0xa0
[ 1545.236084] [<ffffffff8108b7c7>] kthread+0xb7/0xc0
[ 1545.241070] [<ffffffff816acf74>] kernel_thread_helper+0x4/0x10
[ 1545.247096] [<ffffffff816a28f0>] ? _raw_spin_unlock_irq+0x30/0x50
[ 1545.253379] [<ffffffff816a3274>] ? retint_restore_args+0x13/0x13
[ 1545.259576] [<ffffffff8108b710>] ? __init_kthread_worker+0x70/0x70
[ 1545.265948] [<ffffffff816acf70>] ? gs_change+0x13/0x13
[ 1545.275665] mpt2sas version 12.100.00.00 loaded
[ 1545.281373] scsi8 : Fusion MPT SAS Host
[ 1545.287155] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (197413856 kB)
[ 1545.295530] mpt2sas 0000:04:00.0: irq 111 for MSI/MSI-X
[ 1545.301126] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 111
[ 1545.306416] mpt2sas0: iomem(0x00000000faf3c000), mapped(0xffffc9002aa80000), size(16384)
[ 1545.314786] mpt2sas0: ioport(0x000000000000e800), size(256)
[ 1545.441201] mpt2sas0: Allocated physical memory: size(3993 kB)
[ 1545.447212] mpt2sas0: Current Controller Queue Depth(1754), Max Controller Queue Depth(2015)
[ 1545.455937] mpt2sas0: Scatter Gather Elements per IO(128)
[ 1545.520147] mpt2sas0: LSISAS2008: FWVersion(07.00.00.00), ChipRevision(0x03), BiosVersion(07.11.00.00)
[ 1545.529658] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[ 1545.543575] mpt2sas0: sending port enable !!
[ 1545.553525] mpt2sas version 12.100.00.00 unloading
[ 1545.558773] mpt2sas0: sending diag reset !!
[ 1546.587607] mpt2sas0: diag reset: SUCCESS
(In reply to comment #20)
> after modprobe and rmmod about 5 or 6 times:
>

I think what you are seeing is a race between rmmod and an async scan thread. After a modprobe, a scan thread is started, and if you are fast enough, you remove the driver, together with the scanning functions the scan thread is still trying to use. The mpt2sas driver is only a victim; the problem is in the scan thread. You can follow the scan thread's work via 'tail -f /var/log/messages'.

A workaround is to switch the async scan off with the scsi_mod module option scan (backed by the scsi_scan_type variable, see http://lxr.linux.no/#linux+v3.3.1/drivers/scsi/scsi_scan.c#L102), or to add a significant delay between modprobe and rmmod.

Some discussion is also here: http://www.spinics.net/lists/linux-scsi/msg58578.html
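For anyone who wants to try the workaround described above, here is a minimal sketch of both options. The helper names (scan_mode, settle_delay), the SETTLE_SECONDS variable, and the 10-second default are hypothetical and chosen only for illustration; the sysfs path assumes scsi_mod exposes its scan parameter under /sys/module. On the kernels discussed here the parameter is probably only settable at boot (scsi_mod.scan=sync on the kernel command line), so the script reads the mode and picks a delay rather than changing it.

```shell
#!/bin/sh
# Sketch only: helper names, the default delay and the sysfs path are assumptions.

# Print the SCSI scan mode reported by the running kernel (e.g. sync or async).
# $1 optionally overrides the parameter file, which makes the helper testable.
scan_mode() {
    param_file="${1:-/sys/module/scsi_mod/parameters/scan}"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo unknown
    fi
}

# Print how many seconds to wait between modprobe and rmmod.
# Assumption: with a sync scan no extra settle time is added; in every other
# case wait SETTLE_SECONDS (default 10) to give the scan thread time to finish.
settle_delay() {
    case "$1" in
        sync) echo 0 ;;
        *)    echo "${SETTLE_SECONDS:-10}" ;;
    esac
}

# Example use (run as root; left commented out because it unloads the driver):
#   modprobe mpt2sas
#   sleep "$(settle_delay "$(scan_mode)")"
#   rmmod mpt2sas
```

The delay is only a guess at "significant"; watching the log until the scan messages stop is a more reliable signal than any fixed number of seconds.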
Tomas, did this ever get fully fixed? I see that:

commit f07d3f59e35eb0fc8847587f601f84b8cfa8dd38
Author: Dan Williams <dan.j.williams>
Date: Thu Jun 21 23:47:28 2012 -0700

SCSI: fix hot unplug vs async scan race

commit 3b661a92e869ebe2358de8f4b3230ad84f7fce51 upstream.

That commit was added in 3.4.8, and it is one of the patches you said were needed, but I'm not sure whether another one was ever needed or included.
(In reply to comment #22)
> SCSI: fix hot unplug vs async scan race
>
> commit 3b661a92e869ebe2358de8f4b3230ad84f7fce51 upstream.
>
>
> Was added in 3.4.8 and that is one of the patches you said were needed, but
> I'm not sure if another ever was needed or included.

I'm not sure this fixes the race we see here. Right now it's not easy for me to test, because my 'prototype' board doesn't work with the latest upstream. I had the feeling there was another patch which fixed this, though. So a tester is needed :)
# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.