737083 – mpt2sas rmmod panic

Summary: mpt2sas rmmod panic

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	16
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Tomas Henzl
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	737085
Depends On:
Blocks:

Reported:	2011-09-09 14:39 UTC by Albert Strasheim
Modified:	2012-11-14 17:04 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-11-14 17:04:12 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
paging request BUG trace (2.96 KB, text/plain) 2012-02-28 09:16 UTC, Albert Strasheim
sysfs warning trace (6.10 KB, text/plain) 2012-02-28 09:16 UTC, Albert Strasheim
3.2.7-1 trace of paging request BUG (13.71 KB, text/plain) 2012-03-01 12:43 UTC, Albert Strasheim

Description Albert Strasheim 2011-09-09 14:39:26 UTC

Description of problem:

[ 8306.301671] ------------[ cut here ]------------
[ 8306.306567] WARNING: at lib/debugobjects.c:262 debug_print_object+0x7c/0x8d()
[ 8306.313968] Hardware name: X8DTH-i/6/iF/6F
[ 8306.318337] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x2b
[ 8306.328741] Modules linked in: binfmt_misc ses enclosure mlx4_ib mlx4_en microcode joydev serio_raw iTCO_wdt i2c_i801 iTCO_vendor_support mpt2sas(-) scsi_transport_sas ioatdma raid_class mlx4_core igb i7core_edac dca edac_core w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: scsi_wait_scan]
[ 8306.371293] Pid: 20082, comm: rmmod Tainted: G        W   3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[ 8306.380135] Call Trace:
[ 8306.382861]  [<ffffffff8105c528>] warn_slowpath_common+0x83/0x9b
[ 8306.389133]  [<ffffffff8105c5e3>] warn_slowpath_fmt+0x46/0x48
[ 8306.395150]  [<ffffffff812593bf>] debug_print_object+0x7c/0x8d
[ 8306.401252]  [<ffffffff81075729>] ? __queue_work+0x2c2/0x2c2
[ 8306.407182]  [<ffffffff81259bab>] debug_check_no_obj_freed+0x96/0x177
[ 8306.413894]  [<ffffffff814fc4be>] ? __slab_free+0x16f/0x24c
[ 8306.419733]  [<ffffffff81328e47>] ? scsi_host_dev_release+0xbd/0xc2
[ 8306.426268]  [<ffffffff8112efb0>] slab_free_hook+0x6b/0x74
[ 8306.432024]  [<ffffffff81130bfc>] kfree+0xb4/0x131
[ 8306.437088]  [<ffffffff81328e47>] scsi_host_dev_release+0xbd/0xc2
[ 8306.443451]  [<ffffffff8131328d>] device_release+0x4b/0x7f
[ 8306.449207]  [<ffffffff8124981b>] kobject_release+0x11d/0x154
[ 8306.455222]  [<ffffffff812496fe>] ? kobject_del+0x36/0x36
[ 8306.460885]  [<ffffffff8124ac2f>] kref_put+0x43/0x4d
[ 8306.466122]  [<ffffffff81249661>] kobject_put+0x45/0x49
[ 8306.471610]  [<ffffffff81313097>] put_device+0x17/0x19
[ 8306.477011]  [<ffffffff81328e78>] scsi_host_put+0x15/0x17
[ 8306.482678]  [<ffffffffa01a34f7>] _scsih_remove+0x199/0x1a8 [mpt2sas]
[ 8306.489382]  [<ffffffff8126ccae>] pci_device_remove+0x3d/0x8f
[ 8306.495398]  [<ffffffff813167cf>] __device_release_driver+0x86/0xd2
[ 8306.501932]  [<ffffffff81316ed3>] driver_detach+0x99/0xc2
[ 8306.507601]  [<ffffffff81316693>] bus_remove_driver+0xba/0xdf
[ 8306.513617]  [<ffffffff81317579>] driver_unregister+0x6a/0x75
[ 8306.519624]  [<ffffffff8126ce89>] pci_unregister_driver+0x44/0x8d
[ 8306.525983]  [<ffffffffa01a352b>] _scsih_exit+0x25/0xafa [mpt2sas]
[ 8306.532437]  [<ffffffff81098a58>] sys_delete_module+0x1dd/0x251
[ 8306.538625]  [<ffffffff810b4e1b>] ? audit_syscall_entry+0x11c/0x148
[ 8306.545159]  [<ffffffff812536fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 8306.551866]  [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b
[ 8306.558140] ---[ end trace fab739e6cd1dc530 ]---

Version-Release number of selected component (if applicable):

kernel-3.1.0-0.rc3.git0.0.fc16.x86_64

How reproducible:

Sometimes

Steps to Reproduce:
1. rmmod mpt2sas
  
Additional info:

Comment 1 Albert Strasheim 2011-09-09 14:45:38 UTC

Another strange warning while loading/unloading mpt2sas:


[ 8778.456040] ------------[ cut here ]------------
[ 8778.461103] WARNING: at drivers/pci/msi.c:794 pci_enable_msix+0xae/0x349()
[ 8778.468261] Hardware name: X8DTH-i/6/iF/6F
[ 8778.472656] Modules linked in: mpt2sas(+) binfmt_misc ses enclosure mlx4_ib mlx4_en microcode joydev serio_raw iTCO_wdt i2c_i801 iTCO_vendor_support scsi_transport_sas ioatdma raid_class mlx4_core igb i7core_edac dca edac_core w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: mpt2sas]
[ 8778.514871] Pid: 27632, comm: work_for_cpu Tainted: G        W   3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[ 8778.524265] Call Trace:
[ 8778.526989]  [<ffffffff8105c528>] warn_slowpath_common+0x83/0x9b
[ 8778.533271]  [<ffffffff8105c55a>] warn_slowpath_null+0x1a/0x1c
[ 8778.539373]  [<ffffffff8127cfdd>] pci_enable_msix+0xae/0x349
[ 8778.545305]  [<ffffffff8113132e>] ? __kmalloc+0xfa/0x10c
[ 8778.550891]  [<ffffffffa0188725>] ? kcalloc.constprop.7+0x32/0x34 [mpt2sas]
[ 8778.558128]  [<ffffffffa018930d>] mpt2sas_base_map_resources+0x2e8/0x4dd [mpt2sas]
[ 8778.566185]  [<ffffffffa018b633>] mpt2sas_base_attach+0x46/0x140d [mpt2sas]
[ 8778.573442]  [<ffffffff8108eed1>] ? lock_release+0x1a4/0x1d1
[ 8778.579399]  [<ffffffff815042d3>] ? _raw_spin_unlock+0x28/0x3b
[ 8778.585528]  [<ffffffff81076ae6>] ? __alloc_workqueue_key+0x29c/0x2ce
[ 8778.592270]  [<ffffffffa0192c82>] _scsih_probe+0x429/0x60c [mpt2sas]
[ 8778.598924]  [<ffffffff81504297>] ? _raw_spin_unlock_irqrestore+0x4d/0x61
[ 8778.606012]  [<ffffffff81073b06>] ? move_linked_works+0x6e/0x6e
[ 8778.612236]  [<ffffffff8126c78f>] local_pci_probe+0x44/0x75
[ 8778.618192]  [<ffffffff81073b1c>] do_work_for_cpu+0x16/0x28
[ 8778.624053]  [<ffffffff8107a18d>] kthread+0xa8/0xb0
[ 8778.629238]  [<ffffffff8150d284>] kernel_thread_helper+0x4/0x10
[ 8778.635451]  [<ffffffff815046f4>] ? retint_restore_args+0x13/0x13
[ 8778.641840]  [<ffffffff8107a0e5>] ? __init_kthread_worker+0x5a/0x5a
[ 8778.648407]  [<ffffffff8150d280>] ? gs_change+0x13/0x13
[ 8778.653934] ---[ end trace fab739e6cd1dc531 ]---

Comment 2 Albert Strasheim 2011-09-09 14:47:35 UTC

Also, sometimes modprobe mpt2sas fails with the following error, but the next attempt works again:

[ 8947.624754] mpt2sas version 09.100.00.00 loaded
[ 8947.630768] scsi7 : Fusion MPT SAS Host
[ 8947.636974] mpt2sas 0000:04:00.0: BAR 1: can't reserve [mem 0xfaf3c000-0xfaf3ffff 64bit]
[ 8947.645549] mpt2sas0: pci_request_selected_regions: failed
[ 8947.651332] mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:7625/_scsih_probe()!
[ 8953.905824] mpt2sas version 09.100.00.00 unloading
[ 8955.075560] mpt2sas version 09.100.00.00 loaded
[ 8955.081465] scsi8 : Fusion MPT SAS Host
[ 8955.087667] mpt2sas 0000:04:00.0: setting latency timer to 64
[ 8955.093726] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (198086684 kB)

Comment 3 Josh Boyer 2011-10-24 19:51:06 UTC

Can you recreate this with the latest f16 kernel?

Comment 4 Albert Strasheim 2011-10-25 07:38:33 UTC

rmmod mpt2sas is completely broken and causes the kernel to panic every time.

Comment 5 Stanislaw Gruszka 2012-02-26 11:44:08 UTC

Tomas, can you CC LSI developer here?

Albert, is this mpt2sas rmmod panic still reproducible on the latest 3.2 (and/or 3.3-rc) kernels? If so, can you install kernel-debug and see whether it prints a warning and continues execution instead of just panicking?

Comment 6 Albert Strasheim 2012-02-26 14:34:51 UTC

I just retested with 3.2.3-2.fc16.x86_64, and I now have the opposite problem: even with all my file systems unmounted (machine is booted with an initramfs), I can't rmmod mpt2sas:

ERROR: Module mpt2sas is in use.

/sys/module/mpt2sas/refcnt says 26, if that makes a difference.
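
For anyone retracing this step, here is a minimal sketch of how the reference count and any dependent modules could be inspected through sysfs. The function name module_usage and the SYSFS_ROOT override are made up for illustration (the override only exists so the function can be pointed at a test directory); /sys/module/<name>/refcnt and /sys/module/<name>/holders/ are the standard sysfs entries. Note that holders/ only lists other modules, so references held through open devices would not show up there.

```shell
#!/bin/sh
# Sketch only: print the reference count and dependent modules of a
# loaded kernel module.
# module_usage and SYSFS_ROOT are hypothetical names for this example.
module_usage() {
    mod=$1
    dir="${SYSFS_ROOT:-/sys}/module/$mod"

    if [ ! -d "$dir" ]; then
        echo "$mod: not loaded"
        return 1
    fi

    # A non-zero refcnt is why rmmod reports "Module ... is in use"
    echo "refcnt: $(cat "$dir/refcnt")"

    # holders/ contains one entry per module that depends on this one
    holders=$(ls "$dir/holders" 2>/dev/null | tr '\n' ' ')
    echo "holders: ${holders:-none}"
}
```

Usage: module_usage mpt2sas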

Comment 7 Tomas Henzl 2012-02-27 12:56:00 UTC

Hi Nandigama,
this is an upstream/Fedora issue; I think it could be of interest to you.

Comment 8 Nagalakshmi 2012-02-28 07:25:20 UTC

(In reply to comment #6)
> I just retested with 3.2.3-2.fc16.x86_64, and I now have the opposite problem:
> even with all my file systems unmounted (machine is booted with an initramfs),
> I can't rmmod mpt2sas:
> ERROR: Module mpt2sas is in use.
> /sys/module/mpt2sas/refcnt says 26, if that makes a difference.

Please provide the debug logs for the kernel you are testing.

Comment 9 Albert Strasheim 2012-02-28 07:33:25 UTC

Will try to do it.

Comment 10 Albert Strasheim 2012-02-28 09:15:56 UTC

Tested with 3.2.3-2.fc16.x86_64.debug.

It turned up this warning on first rmmod mpt2sas:

WARNING: at fs/sysfs/inode.c:323 sysfs_hash_and_remove+0xa9/0xb0()
sysfs: can not remove 'bsg', no directory

If I then modprobe and rmmod mpt2sas a few more times, I get:

BUG: unable to handle kernel paging request at ffffffffa01874b8
IP: [<ffffffff8141d1e1>] do_scsi_scan_host+0x61/0xa0

Will attach complete traces now.

Comment 11 Albert Strasheim 2012-02-28 09:16:27 UTC

Created attachment 566262
paging request BUG trace

Comment 12 Albert Strasheim 2012-02-28 09:16:46 UTC

Created attachment 566263
sysfs warning trace

Comment 13 Stanislaw Gruszka 2012-02-28 10:51:55 UTC

> WARNING: at fs/sysfs/inode.c:323 sysfs_hash_and_remove+0xa9/0xb0()
> sysfs: can not remove 'bsg', no directory
This is already addressed (bug 787862). Please use updated kernel 3.2.7-1 or newer.

Comment 14 Josh Boyer 2012-02-28 21:52:06 UTC

*** Bug 737085 has been marked as a duplicate of this bug. ***

Comment 15 Albert Strasheim 2012-03-01 12:42:23 UTC

Attaching a trace with 3.2.7-1 debug.

The trace starts where I run mdadm --stop /dev/md127, which is probably the RAID-1 over the first two disks attached to the mpt2sas controller.

I then did:

rmmod mpt2sas
modprobe mpt2sas
rmmod mpt2sas
BOOM

Comment 16 Albert Strasheim 2012-03-01 12:43:16 UTC

Created attachment 566836
3.2.7-1 trace of paging request BUG

Comment 17 Dave Jones 2012-03-22 16:55:16 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 20 Albert Strasheim 2012-03-23 16:08:04 UTC

after modprobe and rmmod about 5 or 6 times:

[ 1544.748626] BUG: unable to handle kernel paging request at ffffffffa02094b8
[ 1544.755851] IP: [<ffffffff81439631>] do_scsi_scan_host+0x61/0xa0
[ 1544.762034] PGD 1c07067 PUD 1c0b063 PMD 2da2b15067 PTE 0
[ 1544.767757] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1544.772640] CPU 4
[ 1544.774545] Modules linked in: mpt2sas(+) binfmt_misc netconsole ses enclosure mlx4_ib mlx4_en ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad microcode ib_mad ib_core serio_raw joydev ipmi_poweroff ipmi_watchdog ipmi_devintf i2c_i801 i7core_edac i2c_core mlx4_core iTCO_wdt scsi_transport_sas iTCO_vendor_support raid_class edac_core ioatdma igb dca ipmi_si ipmi_msghandler [last unloaded: mpt2sas]
[ 1544.813049]
[ 1544.814656] Pid: 8019, comm: scsi_scan_7 Tainted: G        W    3.3.0-4.fc16.x86_64.debug #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1544.826352] RIP: 0010:[<ffffffff81439631>]  [<ffffffff81439631>] do_scsi_scan_host+0x61/0xa0
[ 1544.835033] RSP: 0018:ffff882da1cede50  EFLAGS: 00010246
[ 1544.840467] RAX: ffffffffa0209420 RBX: ffff88151347c290 RCX: 0000000000000001
[ 1544.847698] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000286
[ 1544.854932] RBP: ffff882da1cede60 R08: 0000000000000002 R09: 0000000000000001
[ 1544.862170] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000100130853
[ 1544.869414] R13: ffff881588112ac0 R14: 0000000000000000 R15: 0000000000000000
[ 1544.876647] FS:  0000000000000000(0000) GS:ffff8817dee00000(0000) knlGS:0000000000000000
[ 1544.884887] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1544.890732] CR2: ffffffffa02094b8 CR3: 0000000001c05000 CR4: 00000000000006e0
[ 1544.897970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1544.905222] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1544.912459] Process scsi_scan_7 (pid: 8019, threadinfo ffff882da1cec000, task ffff882d3cbb0000)
[ 1544.921314] Stack:
[ 1544.923438]  ffff882c96f0bcf8 ffff881588112ac0 ffff882da1cede90 ffffffff8143968c
[ 1544.931280]  ffff882c96f0bcf8 ffff881588112ac0 ffffffff81439670 0000000000000000
[ 1544.939156]  ffff882da1cedf40 ffffffff8108b7c7 ffff882d3cbb0000 ffff882d00000000
[ 1544.947036] Call Trace:
[ 1544.949598]  [<ffffffff8143968c>] do_scan_async+0x1c/0x160
[ 1544.955194]  [<ffffffff81439670>] ? do_scsi_scan_host+0xa0/0xa0
[ 1544.961223]  [<ffffffff8108b7c7>] kthread+0xb7/0xc0
[ 1544.966209]  [<ffffffff816acf74>] kernel_thread_helper+0x4/0x10
[ 1544.972240]  [<ffffffff816a28f0>] ? _raw_spin_unlock_irq+0x30/0x50
[ 1544.978527]  [<ffffffff816a3274>] ? retint_restore_args+0x13/0x13
[ 1544.984724]  [<ffffffff8108b710>] ? __init_kthread_worker+0x70/0x70
[ 1544.991096]  [<ffffffff816acf70>] ? gs_change+0x13/0x13
[ 1544.996423] Code: d2 48 8b 83 00 02 00 00 48 8b 80 98 00 00 00 eb 21 66 0f 1f 84 00 00 00 00 00 bf 0a 00 00 00 e8 46 c3 c3 ff 48 8b 83 00 02 00 00 <48> 8b 80 98 00 00 00 48 8b 35 c1 c9 8b 00 48 89 df 4c 29 e6 ff
[ 1545.020485] RIP  [<ffffffff81439631>] do_scsi_scan_host+0x61/0xa0
[ 1545.026749]  RSP <ffff882da1cede50>
[ 1545.030340] CR2: ffffffffa02094b8
[ 1545.033769] ---[ end trace 5220511d4851e1c9 ]---
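
A short worked check of the register dump above, offered as a sketch rather than a definitive decode. The marked bytes in the Code line, <48> 8b 80 98 00 00 00, decode as mov 0x98(%rax),%rax, so the faulting load reads from RAX + 0x98. With RAX = ffffffffa0209420 that gives ffffffffa02094b8, which matches CR2. Both addresses lie in the module mapping range where the mpt2sas symbols in the earlier traces are located (ffffffffa0...), which suggests the scan thread was reading memory that belonged to a module that had already been unloaded. The helper name add_hex is hypothetical, and only the low 32 bits are added to stay clear of shell integer limits.

```shell
#!/bin/sh
# add_hex is a hypothetical helper: add two hex values, print the sum in hex.
add_hex() {
    printf '%x\n' $(( 0x$1 + 0x$2 ))
}

# Low 32 bits of RAX plus the displacement from the faulting instruction.
# Prints a02094b8, the low 32 bits of CR2.
add_hex a0209420 98
```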

followed by


[ 1545.038490] BUG: sleeping function called from invalid context at kernel/rwsem.c:21
[ 1545.046291] in_atomic(): 0, irqs_disabled(): 1, pid: 8019, name: scsi_scan_7
[ 1545.053439] INFO: lockdep is turned off.
[ 1545.057470] irq event stamp: 0
[ 1545.060631] hardirqs last  enabled at (0): [<          (null)>]           (null)
[ 1545.068257] hardirqs last disabled at (0): [<ffffffff8105fac0>] copy_process+0x690/0x1860
[ 1545.076667] softirqs last  enabled at (0): [<ffffffff8105fac0>] copy_process+0x690/0x1860
[ 1545.085078] softirqs last disabled at (0): [<          (null)>]           (null)
[ 1545.092713] Pid: 8019, comm: scsi_scan_7 Tainted: G      D W    3.3.0-4.fc16.x86_64.debug #1
[ 1545.101296] Call Trace:
[ 1545.103858]  [<ffffffff810cb650>] ? print_irqtrace_events+0xd0/0xe0
[ 1545.110228]  [<ffffffff81097d35>] __might_sleep+0x135/0x1f0
[ 1545.115902]  [<ffffffff816a0326>] down_read+0x26/0x98
[ 1545.121066]  [<ffffffff8107bdb4>] exit_signals+0x24/0x130
[ 1545.126579]  [<ffffffff81066d2f>] do_exit+0xdf/0xae0
[ 1545.131671]  [<ffffffff81063f02>] ? kmsg_dump+0x182/0x270
[ 1545.137175]  [<ffffffff81063e1c>] ? kmsg_dump+0x9c/0x270
[ 1545.142615]  [<ffffffff816a415c>] oops_end+0xac/0xf0
[ 1545.147687]  [<ffffffff81696442>] no_context+0x27b/0x28a
[ 1545.153111]  [<ffffffff8169662a>] __bad_area_nosemaphore+0x1d9/0x1f8
[ 1545.159579]  [<ffffffff81695c91>] ? pmd_offset+0x1a/0x20
[ 1545.164997]  [<ffffffff8169665c>] bad_area_nosemaphore+0x13/0x15
[ 1545.171111]  [<ffffffff816a704b>] do_page_fault+0x42b/0x590
[ 1545.176791]  [<ffffffff810ce89d>] ? trace_hardirqs_on+0xd/0x10
[ 1545.182731]  [<ffffffff8133303e>] ? free_object+0x8e/0xc0
[ 1545.188232]  [<ffffffff813339c8>] ? debug_object_free+0xe8/0x140
[ 1545.194348]  [<ffffffff81073f62>] ? del_timer_sync+0xa2/0xe0
[ 1545.200110]  [<ffffffff81072cf5>] ? destroy_timer_on_stack+0x15/0x20
[ 1545.206569]  [<ffffffff8132c43d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1545.213197]  [<ffffffff816a34f5>] page_fault+0x25/0x30
[ 1545.218451]  [<ffffffff81439631>] ? do_scsi_scan_host+0x61/0xa0
[ 1545.224476]  [<ffffffff8143968c>] do_scan_async+0x1c/0x160
[ 1545.230065]  [<ffffffff81439670>] ? do_scsi_scan_host+0xa0/0xa0
[ 1545.236084]  [<ffffffff8108b7c7>] kthread+0xb7/0xc0
[ 1545.241070]  [<ffffffff816acf74>] kernel_thread_helper+0x4/0x10
[ 1545.247096]  [<ffffffff816a28f0>] ? _raw_spin_unlock_irq+0x30/0x50
[ 1545.253379]  [<ffffffff816a3274>] ? retint_restore_args+0x13/0x13
[ 1545.259576]  [<ffffffff8108b710>] ? __init_kthread_worker+0x70/0x70
[ 1545.265948]  [<ffffffff816acf70>] ? gs_change+0x13/0x13
[ 1545.275665] mpt2sas version 12.100.00.00 loaded
[ 1545.281373] scsi8 : Fusion MPT SAS Host
[ 1545.287155] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (197413856 kB)
[ 1545.295530] mpt2sas 0000:04:00.0: irq 111 for MSI/MSI-X
[ 1545.301126] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 111
[ 1545.306416] mpt2sas0: iomem(0x00000000faf3c000), mapped(0xffffc9002aa80000), size(16384)
[ 1545.314786] mpt2sas0: ioport(0x000000000000e800), size(256)
[ 1545.441201] mpt2sas0: Allocated physical memory: size(3993 kB)
[ 1545.447212] mpt2sas0: Current Controller Queue Depth(1754), Max Controller Queue Depth(2015)
[ 1545.455937] mpt2sas0: Scatter Gather Elements per IO(128)
[ 1545.520147] mpt2sas0: LSISAS2008: FWVersion(07.00.00.00), ChipRevision(0x03), BiosVersion(07.11.00.00)
[ 1545.529658] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[ 1545.543575] mpt2sas0: sending port enable !!
[ 1545.553525] mpt2sas version 12.100.00.00 unloading
[ 1545.558773] mpt2sas0: sending diag reset !!
[ 1546.587607] mpt2sas0: diag reset: SUCCESS

Comment 21 Tomas Henzl 2012-04-06 14:37:58 UTC

(In reply to comment #20)
> after modprobe and rmmod about 5 or 6 times:
> 
I think what you are seeing is a race between rmmod and an async scan thread. A modprobe starts a scan thread, and if you are fast enough, you remove the driver together with the scanning functions that the scan thread is still trying to use. The mpt2sas driver is only a victim; the problem is in the scan thread. You can follow the scan thread's progress via 'tail -f /var/log/messages'.

A workaround is to switch the async scan off with the module option scsi_scan_type (http://lxr.linux.no/#linux+v3.3.1/drivers/scsi/scsi_scan.c#L102), or to add a significant delay between modprobe and rmmod.

There is also some discussion here: http://www.spinics.net/lists/linux-scsi/msg58578.html
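
Both workarounds can be sketched roughly as follows. This is an illustration under assumptions, not a tested fix: as far as the 3.3 source shows, the scsi_scan_type variable in scsi_scan.c is exposed as the scan parameter of scsi_mod (values async, sync, none), so it would be given as scsi_mod.scan=sync on the kernel command line when scsi_mod is built in. The cycle_module helper, the MODPROBE/RMMOD/DELAY overrides and the 30 second default are hypothetical; the delay only narrows the race window and does not close it.

```shell
#!/bin/sh
# Workaround 1 (assumption: scsi_scan_type is exposed as scsi_mod's
# "scan" parameter). Check the current mode with:
#   cat /sys/module/scsi_mod/parameters/scan
# and select synchronous scanning at boot with:
#   scsi_mod.scan=sync

# Workaround 2: leave time for the scan thread between load and unload.
# cycle_module, MODPROBE, RMMOD and DELAY are hypothetical names.
cycle_module() {
    mod=$1
    count=${2:-1}
    delay=${DELAY:-30}   # seconds; a guess, not a verified-safe value

    i=0
    while [ "$i" -lt "$count" ]; do
        ${MODPROBE:-modprobe} "$mod" || return 1
        sleep "$delay"   # let the async scan finish before unloading
        ${RMMOD:-rmmod} "$mod" || return 1
        i=$((i + 1))
    done
}
```

Usage (as root): cycle_module mpt2sas 5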

Comment 22 Josh Boyer 2012-09-06 14:06:58 UTC

Tomas, did this ever get fully fixed?  I see that:

commit f07d3f59e35eb0fc8847587f601f84b8cfa8dd38
Author: Dan Williams <dan.j.williams>
Date:   Thu Jun 21 23:47:28 2012 -0700

    SCSI: fix hot unplug vs async scan race
    
    commit 3b661a92e869ebe2358de8f4b3230ad84f7fce51 upstream.
    

Was added in 3.4.8 and that is one of the patches you said were needed, but I'm not sure if another ever was needed or included.

Comment 23 Tomas Henzl 2012-09-06 14:34:20 UTC

(In reply to comment #22)
>     SCSI: fix hot unplug vs async scan race
>     
>     commit 3b661a92e869ebe2358de8f4b3230ad84f7fce51 upstream.
>     
> 
> Was added in 3.4.8 and that is one of the patches you said were needed, but
> I'm not sure if another ever was needed or included.

I'm not sure this fixes the race we see here. Right now it's not easy for me to test, because my 'prototype' board doesn't work with the latest upstream.
I have the feeling there was another patch that fixed this, though.
So a tester is needed :)

Comment 24 Dave Jones 2012-10-23 15:40:40 UTC

# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).

Comment 25 Justin M. Forbes 2012-11-14 17:04:12 UTC

With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.
