Bug 737085

Summary: BUG: unable to handle kernel paging request at ffffc9002b75e15c in pci_enable_msix()
Product: [Fedora] Fedora
Reporter: Albert Strasheim <fullung>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 16
CC: fullung, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-02-28 21:52:06 UTC

Description Albert Strasheim 2011-09-09 14:51:42 UTC
Description of problem:

[ 9079.973667] BUG: unable to handle kernel paging request at ffffc9002b75e15c
[ 9079.980973] IP: [<ffffffff8127d19f>] pci_enable_msix+0x270/0x349
[ 9079.987276] PGD 17de088067 PUD 2fdc810067 PMD 15740b4067 PTE 0
[ 9079.993577] Oops: 0000 [#1] SMP
[ 9079.997163] CPU 0
[ 9079.999040] Modules linked in: mpt2sas(+) binfmt_misc ses enclosure mlx4_ib mlx4_en microcode joydev serio_raw iTCO_wdt i2c_i801 iTCO_vendor_support scsi_transport_sas ioatdma raid_class mlx4_core igb i7core_edac dca edac_core w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: mpt2sas]
[ 9080.040917]
[ 9080.042654] Pid: 32717, comm: work_for_cpu Tainted: G        W   3.1.0-0.rc3.git0.0.fc16.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 9080.055075] RIP: 0010:[<ffffffff8127d19f>]  [<ffffffff8127d19f>] pci_enable_msix+0x270/0x349
[ 9080.063995] RSP: 0018:ffff882ccbef9c20  EFLAGS: 00010286
[ 9080.069545] RAX: ffffc9002b75e15c RBX: ffff8817a0ec1148 RCX: 0000000000000000
[ 9080.076908] RDX: 0000000000060005 RSI: 0000000000000001 RDI: 0000000000000282
[ 9080.084272] RBP: ffff882ccbef9c90 R08: 0000000000000002 R09: 0000000000000001
[ 9080.091628] R10: 0000000000000000 R11: ffffffff82986f10 R12: ffff8815711e9260
[ 9080.098993] R13: ffff882ccbef9cc0 R14: 0000000000000003 R15: ffff8817a0ec1c40
[ 9080.106358] FS:  0000000000000000(0000) GS:ffff8817de600000(0000) knlGS:0000000000000000
[ 9080.114871] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 9080.120843] CR2: ffffc9002b75e15c CR3: 0000000001a05000 CR4: 00000000000006f0
[ 9080.128208] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9080.135572] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 9080.142929] Process work_for_cpu (pid: 32717, threadinfo ffff882ccbef8000, task ffff882f967224d0)
[ 9080.152218] Stack:
[ 9080.154474]  00000000000080d0 0000000000000000 0000000000000000 ffff8817a0ec1c40
[ 9080.162531]  000000c2cbef9c90 ffffc9000008815c 00000000000000c4 c00e881500002000
[ 9080.170594]  ffff8817a0ec1148 ffff8815706eace0 ffff8817a0ec1148 0000000000004000
[ 9080.178652] Call Trace:
[ 9080.181335]  [<ffffffffa018930d>] mpt2sas_base_map_resources+0x2e8/0x4dd [mpt2sas]
[ 9080.189331]  [<ffffffffa018b633>] mpt2sas_base_attach+0x46/0x140d [mpt2sas]
[ 9080.196517]  [<ffffffff8108eed1>] ? lock_release+0x1a4/0x1d1
[ 9080.202413]  [<ffffffff815042d3>] ? _raw_spin_unlock+0x28/0x3b
[ 9080.208471]  [<ffffffff81076ae6>] ? __alloc_workqueue_key+0x29c/0x2ce
[ 9080.215140]  [<ffffffffa0192c82>] _scsih_probe+0x429/0x60c [mpt2sas]
[ 9080.226666]  [<ffffffff81504297>] ? _raw_spin_unlock_irqrestore+0x4d/0x61
[ 9080.233676]  [<ffffffff81073b06>] ? move_linked_works+0x6e/0x6e
[ 9080.239831]  [<ffffffff8126c78f>] local_pci_probe+0x44/0x75
[ 9080.245638]  [<ffffffff81073b1c>] do_work_for_cpu+0x16/0x28
[ 9080.251448]  [<ffffffff8107a18d>] kthread+0xa8/0xb0
[ 9080.256555]  [<ffffffff8150d284>] kernel_thread_helper+0x4/0x10
[ 9080.262700]  [<ffffffff815046f4>] ? retint_restore_args+0x13/0x13
[ 9080.269020]  [<ffffffff8107a0e5>] ? __init_kthread_worker+0x5a/0x5a
[ 9080.275519]  [<ffffffff8150d280>] ? gs_change+0x13/0x13
[ 9080.280973] Code: 00 0f b7 42 04 c1 e0 04 83 c0 0c 89 45 b8 41 8b 44 24 0c 89 02 41 8b 7c 24 0c 89 4d a0 e8 02 76 e4 ff 48 63 45 b8 49 03 44 24 20 <8b> 00 be 01 00 00 00 41 89 44 24 08 4c 89 e7 e8 d3 f2 ff ff 4d
[ 9080.304317] RIP  [<ffffffff8127d19f>] pci_enable_msix+0x270/0x349
[ 9080.310698]  RSP <ffff882ccbef9c20>
[ 9080.314424] CR2: ffffc9002b75e15c
[ 9080.317976] ---[ end trace fab739e6cd1dc536 ]---

Version-Release number of selected component (if applicable):

kernel-3.1.0-0.rc3.git0.0.fc16.x86_64

How reproducible:

Always

Steps to Reproduce:
1. rmmod mpt2sas
2. modprobe mpt2sas
3. Go back to step 1 and repeat until the kernel crashes.
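The reproduction steps above amount to a reload loop. The following is a minimal POSIX sh sketch of that loop, assuming it is run as root on the affected machine; the `reload_loop` function name and its default of 100 iterations are hypothetical and not part of the original report. If the kernel oopses, the loop will typically not return cleanly, so watch the console or `dmesg` output rather than the script's exit status.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): repeatedly unload and reload
# a kernel module, following the "rmmod; modprobe; repeat" steps.
# Must be run as root on the affected machine.

# Usage: reload_loop MODULE [ITERATIONS]
# Returns 0 after ITERATIONS unload/reload cycles, or 1 as soon as
# rmmod or modprobe reports a failure.
reload_loop() {
    local module="$1"
    local iterations="${2:-100}"   # assumed default, chosen arbitrarily
    local i=1
    while [ "$i" -le "$iterations" ]; do
        # Step 1: unload the driver.
        if ! rmmod "$module"; then
            echo "rmmod $module failed on iteration $i" >&2
            return 1
        fi
        # Step 2: load it again (this re-runs the driver's PCI probe).
        if ! modprobe "$module"; then
            echo "modprobe $module failed on iteration $i" >&2
            return 1
        fi
        echo "iteration $i: $module reloaded"
        i=$((i + 1))
    done
    return 0
}
```

For example, after saving the function to a file and sourcing it in a root shell, `reload_loop mpt2sas 1000` would cycle the driver up to 1000 times.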

Comment 1 Albert Strasheim 2011-09-26 10:57:32 UTC
It seems the crash can happen in many places. Here's another partial trace with 3.1.0-0.rc6.git0.3.fc16.x86_64:

BUG: unable to handle kernel paging request at ffffffffa00f11be
IP: [<ffffffffa00f11be>] 0xffffffffa00f11bd
PGD 1a07067 PUD 1a0b063 PMD 2d8d08d067 PTE 0
Oops: 0010 [#1] SMP
CPU 0
Modules linked in: raid0 ses enclosure mlx4_ib mlx4_en ioatdma microcode igb joydev serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support i7core_edac edac_core mlx4_core scsi_transport_sas raid_class dca w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: mpt2sas]
Pid: 141, comm: kworker/u:2 Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
RIP: 0010:[<ffffffffa00f11be>]  [<ffffffffa00f11be>] 0xffffffffa00f11bd
RSP: 0018:ffff88179a789e48  EFLAGS: 00010246
RAX: ffffffff81cc7dc8 RBX: ffff882fa235d000 RCX: ffff88157dd4a738
RDX: ffff88157dd4a738 RSI: ffffffff81cc7dc8 RDI: ffff88157dd4a738
RBP: ffff88179a789ea0 R08: ffff88157dd4a740 R09: 0000000000d395e0
R10: 0000000000d395e0 R11: ffffffff81a01fd8 R12: ffffffff81cc7dc0
R13: ffff88178efffa00 R14: ffffffffa00f11be R15: ffff88178efffa05
FS:  0000000000000000(0000) GS:ffff8817dfc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa00f11be CR3: 0000000001a05000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:2 (pid: 141, threadinfo ffff88179a788000, task ffff88179f5d4590)
Stack:
 ffffffff8106ee4c 0000000000000004 ffff88157dd4a740 0000000081cc6f90
 ffff88157dd4a738 ffffffff81cc7dc0 ffff882fa235d000 ffffffff81cc7dc0
 ffff882fa235d020 ffff88179f5d4590 ffff882fa235d020 ffff88179a789ee0
Call Trace:
 [<ffffffff8106ee4c>] ? process_one_work+0x176/0x2a9
 [<ffffffff8106f95a>] worker_thread+0xda/0x15d
 [<ffffffff8106f880>] ? manage_workers+0x176/0x176
 [<ffffffff81072da7>] kthread+0x84/0x8c
 [<ffffffff814be134>] kernel_thread_helper+0x4/0x10
 [<ffffffff81072d23>] ? kthread_worker_fn+0x148/0x148

Comment 2 Chuck Ebbert 2011-09-28 14:54:58 UTC
(In reply to comment #1)
> It seems the crash can happen in many places. Here's another partial trace with
> 3.1.0-0.rc6.git0.3.fc16.x86_64:
> 
That appears to be a completely different problem.
Does the original one happen with 3.1-rc8? rc3 is very old now.

Comment 3 Albert Strasheim 2011-09-29 11:08:22 UTC
[  248.639216] BUG: unable to handle kernel paging request at ffffffffa01561be
[  248.646536] IP: [<ffffffffa01561be>] 0xffffffffa01561bd
[  248.652050] PGD 1a07067 PUD 1a0b063 PMD 2f8d7fb067 PTE 0
[  248.657832] Oops: 0010 [#1] SMP
[  248.661422] CPU 12
[  248.663390] Modules linked in: ses enclosure mlx4_ib mlx4_en microcode serio_raw joydev i2c_i801 iTCO_wdt iTCO_vendor_support i7core_edac ioatdma edac_core scsi_transport_sas raid_class igb mlx4_core dca w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: mpt2sas]
[  248.703143]
[  248.704881] Pid: 1690, comm: kworker/u:7 Not tainted 3.1.0-0.rc8.git0.0.fc16.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
[  248.716255] RIP: 0010:[<ffffffffa01561be>]  [<ffffffffa01561be>] 0xffffffffa01561bd
[  248.724396] RSP: 0018:ffff8815868dfe48  EFLAGS: 00010246
[  248.729946] RAX: ffffffff81cc7dc8 RBX: ffff882f8d55f980 RCX: ffff88179face738
[  248.737302] RDX: ffff88179face738 RSI: ffffffff81cc7dc8 RDI: ffff88179face738
[  248.744657] RBP: ffff8815868dfea0 R08: ffff88179face740 R09: ffff881580bf6a20
[  248.752021] R10: ffff881580bf6a20 R11: ffff8817a3741fd8 R12: ffffffff81cc7dc0
[  248.759377] R13: ffff881580a96400 R14: ffffffffa01561be R15: ffff881580a96405
[  248.766733] FS:  0000000000000000(0000) GS:ffff8817dfcc0000(0000) knlGS:0000000000000000
[  248.775255] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  248.781227] CR2: ffffffffa01561be CR3: 0000000001a05000 CR4: 00000000000006e0
[  248.788583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  248.795947] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  248.803303] Process kworker/u:7 (pid: 1690, threadinfo ffff8815868de000, task ffff88158bb95cc0)
[  248.812421] Stack:
[  248.814669]  ffffffff8106ee94 ffffffffffffffa1 ffff88179face740 00000000a373d410
[  248.822723]  ffff88179face738 ffffffff81cc7dc0 ffff882f8d55f980 ffffffff81cc7dc0
[  248.830779]  ffff882f8d55f9a0 ffff88158bb95cc0 ffff882f8d55f9a0 ffff8815868dfee0
[  248.838842] Call Trace:
[  248.841537]  [<ffffffff8106ee94>] ? process_one_work+0x176/0x2a9
[  248.847775]  [<ffffffff8106f9a2>] worker_thread+0xda/0x15d
[  248.853489]  [<ffffffff8106f8c8>] ? manage_workers+0x176/0x176
[  248.859557]  [<ffffffff81072def>] kthread+0x84/0x8c
[  248.864666]  [<ffffffff814be1f4>] kernel_thread_helper+0x4/0x10
[  248.870819]  [<ffffffff81072d6b>] ? kthread_worker_fn+0x148/0x148
[  248.877166]  [<ffffffff814be1f0>] ? gs_change+0x13/0x13
[  248.882649] Code:  Bad RIP value.
[  248.886329] RIP  [<ffffffffa01561be>] 0xffffffffa01561bd
[  248.891930]  RSP <ffff8815868dfe48>
[  248.895657] CR2: ffffffffa01561be
[  248.899209] ---[ end trace 5a062bfd9b2bdd0c ]---

Comment 4 Albert Strasheim 2011-09-29 11:15:02 UTC
It seems rc3 crashes in one place while rc6 and rc8 crash in another, but the crash is always triggered by unloading mpt2sas with rmmod.

Comment 5 Josh Boyer 2012-02-28 21:52:06 UTC
This seems to be additional fallout from bug 737083. I'm going to mark this bug as a duplicate of that one for now.

*** This bug has been marked as a duplicate of bug 737083 ***