Bug 761508

Summary: kernel BUG at drivers/iommu/intel-iommu.c:1767 while registering memory for InfiniBand
Product: [Fedora] Fedora
Reporter: Albert Strasheim <fullung>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 16
CC: dwmw2, fullung, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-01-20 19:03:47 UTC

Description Albert Strasheim 2011-12-08 13:48:12 UTC
Description of problem:

Kernel panic when registering 144 * 32 MB buffers as InfiniBand memory regions.

This looks a lot like bug 712806 again:

https://bugzilla.redhat.com/show_bug.cgi?id=712806

Version-Release number of selected component (if applicable):

kernel-3.1.1-2.fc16.x86_64
libibverbs-1.1.5-5.fc16.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Register 144 * 32 MB buffers as memory regions.
Actual results:

kernel panic

Expected results:


Additional info:

[  597.407974] ------------[ cut here ]------------
[  597.412843] kernel BUG at drivers/iommu/intel-iommu.c:1767!
[  597.418652] invalid opcode: 0000 [#1] SMP
[  597.423114] CPU 0
[  597.424993] Modules linked in: binfmt_misc ses enclosure mlx4_ib mlx4_en microcode serio_raw joydev i2c_i801 iTCO_wdt iTCO_vendor_support ioatdma igb mpt2sas mlx4_core scsi_transport_sas raid_class i7core_edac edac_core dca w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: scsi_wait_scan]
[  597.467309]
[  597.469040] Pid: 3789, comm: foo Not tainted 3.1.1-2.fc16.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
[  597.479379] RIP: 0010:[<ffffffff813c0542>]  [<ffffffff813c0542>] __domain_mapping+0x41/0x251
[  597.488304] RSP: 0018:ffff8814ac599bf8  EFLAGS: 00010206
[  597.493849] RAX: 000000000fffffff RBX: ffff881674b93018 RCX: 0000000000000024
[  597.501210] RDX: ffff881674b93018 RSI: ffffffffffffff80 RDI: ffff88178dc04e00
[  597.508575] RBP: ffff8814ac599c68 R08: 000000000000007f R09: 0000000000000003
[  597.515936] R10: 00000000000162b7 R11: 0000000000016268 R12: ffff881674b93018
[  597.523297] R13: 000000000000007f R14: ffffffffffffff80 R15: 000000000000007f
[  597.530663] FS:  00007f6e3cb67700(0000) GS:ffff8817dfc00000(0000) knlGS:0000000000000000
[  597.539170] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  597.545154] CR2: 00007f6e34003038 CR3: 0000002e4b14f000 CR4: 00000000000006f0
[  597.552510] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  597.559870] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  597.567230] Process flowrouter (pid: 3789, threadinfo ffff8814ac598000, task ffff88168c82c590)
[  597.576263] Stack:
[  597.578513]  ffff88168c82c590 000000000000007f ffff88178dc04e00 ffff88178dc04e00
[  597.586585]  0000fffffffff000 000000000000007f 0000000000000000 ffffffffffffff80
[  597.594647]  000000000000007f ffff881674b93018 ffff88178dc04e00 000000000000007f
[  597.602700] Call Trace:
[  597.605387]  [<ffffffff813c1bef>] intel_map_sg+0x15c/0x1d6
[  597.611108]  [<ffffffffa003bbe6>] ib_umem_get+0x317/0x42d [ib_core]
[  597.617612]  [<ffffffffa0162305>] mlx4_ib_reg_user_mr+0x79/0x15b [mlx4_ib]
[  597.624710]  [<ffffffff81043ff3>] ? should_resched+0xe/0x2d
[  597.630512]  [<ffffffff814b5ad5>] ? _cond_resched+0xe/0x22
[  597.636232]  [<ffffffffa0062fe6>] ib_uverbs_reg_mr+0x144/0x29a [ib_uverbs]
[  597.643334]  [<ffffffffa00613c1>] ib_uverbs_write+0xb6/0xc1 [ib_uverbs]
[  597.650177]  [<ffffffff81129186>] vfs_write+0xac/0xf3
[  597.655463]  [<ffffffff81129375>] sys_write+0x4a/0x6e
[  597.660744]  [<ffffffff814bd902>] system_call_fastpath+0x16/0x1b
[  597.666973] Code: 48 89 4d c0 48 89 7d a8 49 89 d4 6b 4f 4c 09 48 89 75 c8 4d 89 c7 83 c1 12 83 f9 3f 7f 0f 4a 8d 44 06 ff 48 d3 e8 48 85 c0 74 02 <0f> 0b 41 f6 c1 03 0f 84 e9 01 00 00 41 81 e1 03 08 00 00 45 31
[  597.690327] RIP  [<ffffffff813c0542>] __domain_mapping+0x41/0x251
[  597.696704]  RSP <ffff8814ac599bf8>
[  597.700601] ---[ end trace bd543b01b0d3c89e ]---
[  597.705549] ------------[ cut here ]------------

Comment 1 Chuck Ebbert 2011-12-09 01:52:15 UTC
> [  597.412843] kernel BUG at drivers/iommu/intel-iommu.c:1767!

That's:

    BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);

which appears to be saying you've simply tried to allocate too much memory?
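
To make the check concrete, here is a minimal user-space sketch of the same expression, fed with values read from the register dump in the description. The helper names overflow_bits and mapping_exceeds_width are made up for this illustration and are not kernel functions. Mapping RSI to iov_pfn and R8 to nr_pages assumes the usual x86_64 argument-passing order for __domain_mapping, and taking RCX (0x24) as addr_width assumes the "shr %cl" visible in the Code bytes is the shift in the BUG_ON; treat both as a reading of the dump, not a confirmed analysis.

```c
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_LONG 64  /* x86_64, as in the report */

/*
 * Bits of the last page frame number that lie above addr_width.
 * Only meaningful for addr_width < 64 (a larger shift is undefined in C).
 */
static uint64_t overflow_bits(uint64_t iov_pfn, uint64_t nr_pages, int addr_width)
{
    return (iov_pfn + nr_pages - 1) >> addr_width;
}

/* User-space equivalent of the condition inside the BUG_ON quoted above. */
static bool mapping_exceeds_width(uint64_t iov_pfn, uint64_t nr_pages, int addr_width)
{
    return addr_width < BITS_PER_LONG &&
           overflow_bits(iov_pfn, nr_pages, addr_width) != 0;
}
```

With the dump's values, overflow_bits(0xffffffffffffff80, 0x7f, 0x24) evaluates to 0x0fffffff, which matches RAX in the oops, so mapping_exceeds_width() is true and the BUG fires. An iov_pfn that close to 2^64 may suggest that the I/O virtual address handed to __domain_mapping was itself bogus, rather than the request merely being too large, but that is a guess from the registers alone.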

Comment 2 Josh Boyer 2011-12-09 02:02:46 UTC
This is being discussed upstream:

http://thread.gmane.org/gmane.linux.drivers.rdma/10450

Comment 3 Albert Strasheim 2012-01-20 19:03:47 UTC
Seems to have been fixed in 3.2.1.