Bug 688646 - intel_iommu domain id exhaustion
Summary: intel_iommu domain id exhaustion
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Red Hat Kernel Manager
QA Contact: Virtualization Bugs
Depends On:
Blocks: Rhel5KvmTier2
Reported: 2011-03-17 16:02 UTC by Alex Williamson
Modified: 2013-01-09 23:40 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 619455
Last Closed: 2011-07-21 10:26:22 UTC

System: Red Hat Product Errata
ID: RHSA-2011:1065
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update
Last Updated: 2011-07-21 09:21:37 UTC

Comment 1 RHEL Product and Program Management 2011-03-19 22:49:07 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.

Comment 2 Chris Ward 2011-03-21 09:18:23 UTC

Please confirm your intentions to validate this fix if included in 5.7.0.


Comment 6 Jarod Wilson 2011-03-23 21:45:17 UTC
Patch(es) available in kernel-2.6.18-250.el5
Detailed testing feedback is always welcomed.

Comment 8 Chao Yang 2011-05-31 10:26:04 UTC
Reproduced on kernel 2.6.18-238.el5: repeatedly detaching and attaching a NIC that uses the tg3 kernel driver, more than 244 times via a script, results in a host kernel panic:
IOMMU: no free domain ids
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: 
 [<ffffffff80157b56>] list_del+0x1/0x71
PGD 3183d7067 PUD 31869c067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /bus/pci/drivers/tg3/bind
CPU 0 
Modules linked in: tun autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge ipt_REJECT xt_tcpudp ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy ksm(U) kvm_intel(U) kvm(U) joydev snd_hda_intel snd_seq_dummy sr_mod cdrom snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep sg igb snd serio_raw pcspkr shpchp 8021q soundcore i7core_edac edac_mc dca tg3 tpm_tis tpm tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod mptsas mptscsih scsi_transport_sas mptbase ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5814, comm: repeated-unbind Tainted: G      2.6.18-238.el5 #1
RIP: 0010:[<ffffffff80157b56>]  [<ffffffff80157b56>] list_del+0x1/0x71
RSP: 0018:ffff8102974e3c18  EFLAGS: 00010007
RAX: ffff8102977996d0 RBX: 0000000000000000 RCX: ffffffff80319f28
RDX: ffffffff80319f28 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000286 R08: ffffffff80319f28 R09: 000000000000003d
R10: ffff8102974e38e8 R11: 0000000000000080 R12: ffff8102977996c0
R13: 0000000000002000 R14: ffff81032f0a2800 R15: 0000000000000000
FS:  00002ae33b24cf50(0000) GS:ffffffff80425000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000318361000 CR4: 00000000000026e0
Process repeated-unbind (pid: 5814, threadinfo ffff8102974e2000, task ffff810326aec040)
Stack:  0000000000000000 ffffffff80169d6c ffff81032f0a2800 ffff8102977996c0
 0000000000000030 ffffffff8016a20b ffff81032f0a2800 ffff8102977996c0
 0000000000000030 0000000000002000 ffff81032f0a2800 ffffffff8016b089
Call Trace:
 [<ffffffff80169d6c>] domain_remove_dev_info+0x16/0xab
 [<ffffffff8016a20b>] domain_exit+0x19/0x14b
 [<ffffffff8016b089>] get_domain_for_dev+0x30d/0x536
 [<ffffffff8016b2c5>] __get_valid_domain_for_dev+0x13/0x6d
 [<ffffffff8016b42a>] __intel_map_single+0x5d/0x172
 [<ffffffff8016b9e1>] intel_alloc_coherent+0xb3/0xd8
 [<ffffffff88215b52>] :tg3:tg3_init_one+0xa21/0x14a4
 [<ffffffff8016168a>] pci_device_probe+0x104/0x184
 [<ffffffff801cad74>] driver_helper+0x0/0x1b
 [<ffffffff80287e04>] klist_del+0x1d/0x2a
 [<ffffffff801cbab4>] driver_probe_device+0x52/0xaa
 [<ffffffff801cb84e>] driver_bind+0x9f/0x11b
 [<ffffffff8010fee2>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016a81>] vfs_write+0xce/0x174
 [<ffffffff80017339>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 48 8b 47 08 48 89 fb 48 8b 10 48 39 fa 74 1b 48 89 fe 31 c0 
RIP  [<ffffffff80157b56>] list_del+0x1/0x71
 RSP <ffff8102974e3c18>
CR2: 0000000000000008
 <0>Kernel panic - not syncing: Fatal exception

-------Verified on kernel 2.6.18-264.el5 with the same NIC and the same script: after more than one thousand detach/attach cycles, the host still works fine.

-------nic card info:
lspci -vvv -s 02:00.0
	Kernel driver in use: tg3
	Kernel modules: tg3

-------script used to detach/attach nic card:

i=1
while echo 0000:01:00.0 > /sys/bus/pci/drivers/tg3/unbind; do
    echo $i; i=$((i+1)); sleep 0.5
    echo 0000:01:00.0 > /sys/bus/pci/drivers/tg3/bind; sleep 0.5
done
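
A bounded variant of the loop above might look like the following. This is only a sketch, not the script used for verification: the function name rebind_loop, its argument order, and the optional delay argument are assumptions made for illustration. It stops at the first failed unbind or bind instead of looping until unbind fails, and it takes the driver directory as a parameter so it can be pointed at a scratch directory for a dry run.

```shell
#!/bin/bash
# Hypothetical helper (not from the original report):
#   rebind_loop DRIVER_DIR DEVICE COUNT [DELAY]
# DRIVER_DIR is e.g. /sys/bus/pci/drivers/tg3, DEVICE e.g. 0000:01:00.0.
rebind_loop() {
    local driver_dir=$1 device=$2 count=$3 delay=${4:-0.5}
    local i
    for ((i = 1; i <= count; i++)); do
        # Detach the device from its driver; give up on the first failure.
        echo "$device" > "$driver_dir/unbind" || {
            echo "unbind failed at cycle $i" >&2
            return 1
        }
        sleep "$delay"
        # Re-attach the device to the same driver.
        echo "$device" > "$driver_dir/bind" || {
            echo "bind failed at cycle $i" >&2
            return 1
        }
        sleep "$delay"
        # Print the number of completed cycles, as the original loop does.
        echo "$i"
    done
}
```

Run as root, something like `rebind_loop /sys/bus/pci/drivers/tg3 0000:01:00.0 1000` would correspond to the verification run described above (over one thousand cycles); on the affected 2.6.18-238.el5 kernel the panic was reported after more than 244 cycles.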

Based on the above, I think this issue has been fixed.

Comment 9 Chao Yang 2011-05-31 10:30:16 UTC
Additional info:
# lspci|grep Eth
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)

Comment 10 juzhang 2011-06-01 04:18:41 UTC
According to comment 9, setting this issue as verified.

Comment 11 errata-xmlrpc 2011-07-21 10:26:22 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

