This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
@Intel, please confirm your intention to validate this fix if it is included in 5.7.0. Thanks.
Patch(es) available in kernel-2.6.18-250.el5. Detailed testing feedback is always welcome.
Reproduced on kernel 2.6.18-238.el5: repeatedly detaching/attaching a NIC that uses tg3 as its kernel driver, over 244 times via a script, results in a host kernel panic:

IOMMU: no free domain ids
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
 [<ffffffff80157b56>] list_del+0x1/0x71
PGD 3183d7067 PUD 31869c067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /bus/pci/drivers/tg3/bind
CPU 0
Modules linked in: tun autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge ipt_REJECT xt_tcpudp ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy ksm(U) kvm_intel(U) kvm(U) joydev snd_hda_intel snd_seq_dummy sr_mod cdrom snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep sg igb snd serio_raw pcspkr shpchp 8021q soundcore i7core_edac edac_mc dca tg3 tpm_tis tpm tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod mptsas mptscsih scsi_transport_sas mptbase ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5814, comm: repeated-unbind Tainted: G 2.6.18-238.el5 #1
RIP: 0010:[<ffffffff80157b56>] [<ffffffff80157b56>] list_del+0x1/0x71
RSP: 0018:ffff8102974e3c18 EFLAGS: 00010007
RAX: ffff8102977996d0 RBX: 0000000000000000 RCX: ffffffff80319f28
RDX: ffffffff80319f28 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000286 R08: ffffffff80319f28 R09: 000000000000003d
R10: ffff8102974e38e8 R11: 0000000000000080 R12: ffff8102977996c0
R13: 0000000000002000 R14: ffff81032f0a2800 R15: 0000000000000000
FS: 00002ae33b24cf50(0000) GS:ffffffff80425000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000318361000 CR4: 00000000000026e0
Process repeated-unbind (pid: 5814, threadinfo ffff8102974e2000, task ffff810326aec040)
Stack: 0000000000000000 ffffffff80169d6c ffff81032f0a2800 ffff8102977996c0
 0000000000000030 ffffffff8016a20b ffff81032f0a2800 ffff8102977996c0
 0000000000000030 0000000000002000 ffff81032f0a2800 ffffffff8016b089
Call Trace:
 [<ffffffff80169d6c>] domain_remove_dev_info+0x16/0xab
 [<ffffffff8016a20b>] domain_exit+0x19/0x14b
 [<ffffffff8016b089>] get_domain_for_dev+0x30d/0x536
 [<ffffffff8016b2c5>] __get_valid_domain_for_dev+0x13/0x6d
 [<ffffffff8016b42a>] __intel_map_single+0x5d/0x172
 [<ffffffff8016b9e1>] intel_alloc_coherent+0xb3/0xd8
 [<ffffffff88215b52>] :tg3:tg3_init_one+0xa21/0x14a4
 [<ffffffff8016168a>] pci_device_probe+0x104/0x184
 [<ffffffff801cad74>] driver_helper+0x0/0x1b
 [<ffffffff80287e04>] klist_del+0x1d/0x2a
 [<ffffffff801cbab4>] driver_probe_device+0x52/0xaa
 [<ffffffff801cb84e>] driver_bind+0x9f/0x11b
 [<ffffffff8010fee2>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016a81>] vfs_write+0xce/0x174
 [<ffffffff80017339>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 48 8b 47 08 48 89 fb 48 8b 10 48 39 fa 74 1b 48 89 fe 31 c0
RIP [<ffffffff80157b56>] list_del+0x1/0x71
 RSP <ffff8102974e3c18>
CR2: 0000000000000008
 <0>Kernel panic - not syncing: Fatal exception

-------Verified on kernel 2.6.18-264.el5 with the same NIC using the same script: detach/attach over one thousand times, host works fine.

-------nic card info:
lspci -vvv -s 02:00.0
Kernel driver in use: tg3
Kernel modules: tg3

-------script used to detach/attach nic card:
#!/bin/bash
i=1
while echo 0000:01:00.0 > /sys/bus/pci/drivers/tg3/unbind; do
    echo $i
    i=$((i+1))
    sleep 0.5
    echo 0000:01:00.0 > /sys/bus/pci/drivers/tg3/bind
    sleep 0.5
done

-----conclusion: Based on the above, I think this issue has been fixed.
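For anyone re-running this verification, a bounded variant of the detach/attach loop may be more convenient than the open-ended one above. This is only a sketch, not the script used for the results in this comment: the function name rebind_loop, its three arguments, and the REBIND_DELAY variable are hypothetical names introduced here. The sysfs driver directory and PCI address are passed in rather than hard-coded, so the values from this report (/sys/bus/pci/drivers/tg3 and 0000:01:00.0) are just one possible invocation.

```shell
#!/bin/sh
# Sketch only: rebind_loop and REBIND_DELAY are hypothetical names,
# not part of the original reproducer.
#
# rebind_loop DRIVER_DIR PCI_ADDR MAX_ITERATIONS
#   DRIVER_DIR      driver directory containing "bind" and "unbind"
#   PCI_ADDR        PCI address to write, e.g. 0000:01:00.0
#   MAX_ITERATIONS  number of unbind/bind cycles to attempt
#
# The body runs in a subshell (note the parentheses), so its variables
# do not leak into the caller and "exit" only ends the function.
rebind_loop() (
    driver_dir=$1
    addr=$2
    max=$3
    delay=${REBIND_DELAY:-0.5}   # seconds to wait after each write

    i=1
    while [ "$i" -le "$max" ]; do
        # Stop with a non-zero status as soon as either write fails.
        echo "$addr" > "$driver_dir/unbind" || exit 1
        sleep "$delay"
        echo "$addr" > "$driver_dir/bind" || exit 1
        sleep "$delay"
        echo "$i"                # count of completed cycles
        i=$((i + 1))
    done
)
```

Run as root, something like "rebind_loop /sys/bus/pci/drivers/tg3 0000:01:00.0 1000" would attempt one thousand cycles and print the number of each completed cycle. Unlike the original loop, the counter is printed after the bind rather than after the unbind. Checking dmesg afterwards for the "IOMMU: no free domain ids" message seen above is a reasonable extra step.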
Additional info:
# lspci|grep Eth
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)
According to comment 9, setting this issue as verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html