=================================================
---Problem Description---
When attempting to enable PCI passthrough for an Emulex BE3 10Gb Ethernet card, the system hangs. This was on RHEL 6.1 Beta Snap 3.

Contact Information = John Whetzel/whetzel.ibm.com, Ram Pai/linuxram.com

---Additional Hardware Info---
Emulex Virtual Fabric Adapter Advanced (CFFh)
82:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 02)
82:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 02)

---uname output---
Linux elm3a186.beaverton.ibm.com 2.6.32-130.el6.x86_64 #1 SMP Tue Apr 5 19:58:31 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Machine Type = BladeCenter HX5 7872AC1

---System Hang---
The system and all interfaces (network, console, etc.) lock up. The system must be rebooted to regain access.

---Steps to Reproduce---
With an Emulex VNIC expansion card, load the driver with a non-zero number of VFs (e.g., 'modprobe be2net num_vfs=1'), then run 'virsh nodedev-dettach' on one of the VFs.
(e.g., virsh nodedev-dettach pci_0000_82_04_0)

---Kernel Component Data---
Stack trace output:
Apr 15 10:43:20 elm3a186 kernel: ------------[ cut here ]------------
Apr 15 10:43:20 elm3a186 kernel: WARNING: at drivers/pci/intel-iommu.c:1722 __domain_mapping+0x200/0x230() (Not tainted)
Apr 15 10:43:20 elm3a186 kernel: Hardware name: -[7872AC1]-
Apr 15 10:43:20 elm3a186 kernel: Modules linked in: be2net ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm_intel kvm microcode serio_raw ghes hed sg cdc_ether usbnet mii i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support bnx2 ioatdma dca i7core_edac edac_core shpchp sr_mod cdrom ext4 mbcache jbd2 sd_mod crc_t10dif ums_cypress usb_storage mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: be2net]
Apr 15 10:43:20 elm3a186 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-130.el6.x86_64 #1
Apr 15 10:43:20 elm3a186 kernel: Call Trace:
Apr 15 10:43:20 elm3a186 kernel: <IRQ> [<ffffffff81067157>] ? warn_slowpath_common+0x87/0xc0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff810671aa>] ? warn_slowpath_null+0x1a/0x20
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff81299d20>] ? __domain_mapping+0x200/0x230
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8129b971>] ? __intel_map_single+0x111/0x210
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8129bab1>] ? intel_map_page+0x41/0x50
Apr 15 10:43:20 elm3a186 kernel: [<ffffffffa011e29a>] ? bnx2_poll_work+0x8fa/0x1270 [bnx2]
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8129ba70>] ? intel_map_page+0x0/0x50
Apr 15 10:43:20 elm3a186 kernel: [<ffffffffa011ec4d>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff810921a2>] ? enqueue_hrtimer+0x82/0xd0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff813f0c36>] ? dma_issue_pending_all+0x76/0xa0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff814222d3>] ? net_rx_action+0x103/0x2f0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8106f737>] ? __do_softirq+0xb7/0x1e0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff810d6960>] ? handle_IRQ_event+0x60/0x170
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8106f525>] ? irq_exit+0x85/0x90
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff814e3255>] ? do_IRQ+0x75/0xf0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
Apr 15 10:43:20 elm3a186 kernel: <EOI> [<ffffffff812bb77e>] ? intel_idle+0xde/0x170
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff812bb761>] ? intel_idle+0xc1/0x170
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff813ecaa7>] ? cpuidle_idle_call+0xa7/0x140
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff81009e96>] ? cpu_idle+0xb6/0x110
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff814d4604>] ? start_secondary+0x202/0x245
Apr 15 10:43:20 elm3a186 kernel: ---[ end trace dde8db0efc2878c5 ]---
Apr 15 10:43:20 elm3a186 kernel: ERROR: DMA PTE for vPFN 0xffe1a already set (to 46bb76002 not 106cce7002)

/etc/selinux/config output:
[root@elm3a186 ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

rpm -qa | grep -i selinux output:
[root@elm3a186 ~]# rpm -qa | grep -i selinux
libselinux-utils-2.0.94-5.el6.x86_64
selinux-policy-targeted-3.7.19-82.el6.noarch
libselinux-devel-2.0.94-5.el6.x86_64
selinux-policy-3.7.19-82.el6.noarch
libselinux-2.0.94-5.el6.x86_64

System Dump Info:
The system is not configured to capture a system dump.

getsebool output:
[root@elm3a186 ~]# getsebool -a
abrt_anon_write --> off
allow_console_login --> on
allow_cvs_read_shadow --> off
allow_daemons_dump_core --> on
allow_daemons_use_tcp_wrapper --> off
allow_daemons_use_tty --> on
allow_domain_fd_use --> on
allow_execheap --> off
allow_execmem --> on
allow_execmod --> on
allow_execstack --> on
allow_ftpd_anon_write --> off
allow_ftpd_full_access --> off
allow_ftpd_use_cifs --> off
allow_ftpd_use_nfs --> off
allow_gssd_read_tmp --> on
allow_guest_exec_content --> off
allow_httpd_anon_write --> off
allow_httpd_mod_auth_ntlm_winbind --> off
allow_httpd_mod_auth_pam --> off
allow_httpd_sys_script_anon_write --> off
allow_java_execstack --> off
allow_kerberos --> on
allow_mount_anyfile --> on
allow_mplayer_execstack --> off
allow_nfsd_anon_write --> off
allow_nsplugin_execmem --> on
allow_polyinstantiation --> off
allow_postfix_local_write_mail_spool --> on
allow_ptrace --> off
allow_rsync_anon_write --> off
allow_saslauthd_read_shadow --> off
allow_smbd_anon_write --> off
allow_ssh_keysign --> off
allow_staff_exec_content --> on
allow_sysadm_exec_content --> on
allow_unconfined_nsplugin_transition --> off
allow_unconfined_qemu_transition --> off
allow_user_exec_content --> on
allow_user_mysql_connect --> off
allow_user_postgresql_connect --> off
allow_write_xshm --> off
allow_xguest_exec_content --> off
allow_xserver_execmem --> off
allow_ypbind --> off
allow_zebra_write_config --> on
authlogin_radius --> off
cdrecord_read_content --> off
clamd_use_jit --> off
cobbler_anon_write --> off
cobbler_can_network_connect --> off
cobbler_use_cifs --> off
cobbler_use_nfs --> off
cron_can_relabel --> off
dhcpc_exec_iptables --> off
domain_kernel_load_modules --> off
exim_can_connect_db --> off
exim_manage_user_files --> off
exim_read_user_files --> off
fcron_crond --> off
fenced_can_network_connect --> off
ftp_home_dir --> off
ftpd_connect_db --> off
git_session_bind_all_unreserved_ports --> off
git_system_enable_homedirs --> off
git_system_use_cifs --> off
git_system_use_nfs --> off
global_ssp --> off
gpg_agent_env_file --> off
gpg_web_anon_write --> off
httpd_builtin_scripting --> on
httpd_can_check_spam --> off
httpd_can_network_connect --> off
httpd_can_network_connect_cobbler --> off
httpd_can_network_connect_db --> off
httpd_can_network_memcache --> off
httpd_can_network_relay --> off
httpd_can_sendmail --> off
httpd_dbus_avahi --> on
httpd_enable_cgi --> on
httpd_enable_ftp_server --> off
httpd_enable_homedirs --> off
httpd_execmem --> off
httpd_read_user_content --> off
httpd_setrlimit --> off
httpd_ssi_exec --> off
httpd_tmp_exec --> off
httpd_tty_comm --> on
httpd_unified --> on
httpd_use_cifs --> off
httpd_use_gpg --> off
httpd_use_nfs --> off
icecast_connect_any --> off
init_upstart --> on
irssi_use_full_network --> off
mmap_low_allowed --> off
mozilla_read_content --> off
mysql_connect_any --> off
named_write_master_zones --> off
ncftool_read_user_content --> off
nfs_export_all_ro --> on
nfs_export_all_rw --> on
nscd_use_shm --> on
nsplugin_can_network --> on
openvpn_enable_homedirs --> on
piranha_lvs_can_network_connect --> off
pppd_can_insmod --> off
pppd_for_user --> off
privoxy_connect_any --> on
puppet_manage_all_files --> off
puppetmaster_use_db --> off
qemu_full_network --> on
qemu_use_cifs --> on
qemu_use_comm --> off
qemu_use_nfs --> on
qemu_use_usb --> on
racoon_read_shadow --> off
rgmanager_can_network_connect --> off
rsync_client --> off
rsync_export_all_ro --> off
samba_create_home_dirs --> off
samba_domain_controller --> off
samba_enable_home_dirs --> off
samba_export_all_ro --> off
samba_export_all_rw --> off
samba_run_unconfined --> off
samba_share_fusefs --> off
samba_share_nfs --> off
secure_mode --> off
secure_mode_insmod --> off
secure_mode_policyload --> off
sepgsql_enable_users_ddl --> on
sepgsql_unconfined_dbadm --> on
sftpd_anon_write --> off
sftpd_enable_homedirs --> off
sftpd_full_access --> off
sftpd_write_ssh_home --> off
smartmon_3ware --> off
spamassassin_can_network --> off
spamd_enable_home_dirs --> on
squid_connect_any --> on
squid_use_tproxy --> off
ssh_sysadm_login --> off
telepathy_tcp_connect_generic_network_ports --> off
tftp_anon_write --> off
tor_bind_all_unreserved_ports --> off
unconfined_login --> on
unconfined_mmap_zero_ignore --> off
use_fusefs_home_dirs --> off
use_lpd_server --> off
use_nfs_home_dirs --> on
use_samba_home_dirs --> off
user_direct_dri --> on
user_direct_mouse --> off
user_ping --> on
user_rw_noexattrfile --> on
user_setrlimit --> on
user_tcp_server --> off
user_ttyfile_stat --> off
varnishd_connect_any --> off
vbetool_mmap_zero_ignore --> off
virt_use_comm --> off
virt_use_fusefs --> off
virt_use_nfs --> off
virt_use_samba --> off
virt_use_sysfs --> on
virt_use_usb --> on
virt_use_xserver --> off
webadm_manage_user_files --> off
webadm_read_user_files --> off
wine_mmap_zero_ignore --> off
xdm_exec_bootloader --> off
xdm_sysadm_login --> off
xen_use_nfs --> off
xguest_connect_network --> on
xguest_mount_media --> on
xguest_use_bluetooth --> on
xserver_object_manager --> off
=================================================
1. Server architecture(s) (please list all affected) (x86/POWER6/Z/etc.): x86-64
2. Server type (9117-MMA/HS20/s390/etc.): IBM BladeCenter HX5 7872AC1
3. General component (desktop/kernel/base OS/dev tools/etc.): Kernel
4. Other components involved (ixgbe/java/emulex/etc.): NA
5. Does the server have the latest GA firmware? Yes
6. What is the latest official distro build on which this bug has been seen? RHEL 6.1 Snap 3
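The ERROR line at the end of the stack trace output packs three values into one message. The following is a minimal Python sketch for decoding it, assuming the 2.6.32-era intel-iommu conventions: the message prints the I/O virtual page number (vPFN), then the page-table entry already present, then the entry the driver tried to install; in each entry, bit 0 is the read permission, bit 1 is the write permission, and the bits from 12 upward hold the host physical address of a 4 KiB page. The helpers parse_conflict, decode_pte and Pte are hypothetical names for illustration, not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed VT-d page-table layout (2.6.32-era intel-iommu driver):
# 4 KiB pages, bit 0 = read permission, bit 1 = write permission.
VTD_PAGE_SHIFT = 12
DMA_PTE_READ = 1 << 0
DMA_PTE_WRITE = 1 << 1


class Pte(NamedTuple):
    phys_pfn: int   # host physical page frame number
    readable: bool  # device may read this page
    writable: bool  # device may write this page


def decode_pte(value: int) -> Pte:
    """Split a raw page-table entry into its frame number and permission bits.

    Every bit above bit 11 is treated as address; none of the high-order
    flag bits are set in the values logged in this report.
    """
    return Pte(
        phys_pfn=value >> VTD_PAGE_SHIFT,
        readable=bool(value & DMA_PTE_READ),
        writable=bool(value & DMA_PTE_WRITE),
    )


# Matches: "DMA PTE for vPFN 0x<iova pfn> already set (to <existing> not <attempted>)"
_CONFLICT_RE = re.compile(
    r"DMA PTE for vPFN (0x[0-9a-f]+) already set \(to ([0-9a-f]+) not ([0-9a-f]+)\)"
)


def parse_conflict(line: str) -> Optional[Tuple[int, Pte, Pte]]:
    """Return (iova_pfn, existing_pte, attempted_pte), or None if the line does not match."""
    m = _CONFLICT_RE.search(line)
    if m is None:
        return None
    iova_pfn = int(m.group(1), 16)
    existing = decode_pte(int(m.group(2), 16))
    attempted = decode_pte(int(m.group(3), 16))
    return iova_pfn, existing, attempted


if __name__ == "__main__":
    line = "ERROR: DMA PTE for vPFN 0xffe1a already set (to 46bb76002 not 106cce7002)"
    iova_pfn, existing, attempted = parse_conflict(line)
    print(f"IOVA 0x{iova_pfn << VTD_PAGE_SHIFT:x}")
    print(f"existing:  pfn 0x{existing.phys_pfn:x} r={existing.readable} w={existing.writable}")
    print(f"attempted: pfn 0x{attempted.phys_pfn:x} r={attempted.readable} w={attempted.writable}")
```

Under these assumptions, both entries decode as write-only mappings (read bit clear, write bit set) of I/O virtual page 0xffe1a (address 0xffe1a000), but to different host pages, 0x46bb76 and 0x106cce7. That suggests the same I/O virtual address was handed out a second time while the first mapping was still live; the call trace shows the second request arrived from bnx2_poll_work through intel_map_page.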
Created attachment 492835: Messages file from across both lockups
Created attachment 492836: log of commands before lockup
Created attachment 492837: sosreport output
with this guiltyfunc: bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008
Too late for 6.1, moving to the list for review for 6.2 inclusion and possible 6.1.z stream.
------- Comment From tpnoonan.com 2011-04-19 16:44 EDT------- Hi Red Hat. Once this is fixed in RHEL 6.2, please consider it for RHEL 6.1.z. Thanks.
------- Comment From linuxram.com 2011-04-20 16:04 EDT------- This is a blocker bug. It ***cannot be deferred*** to 6.2, since it is a key feature targeted for 6.1. BTW: the problem does not exist with the upstream kernel. We are in the process of identifying the patch that fixes the problem. RP
------- Comment From linuxram.com 2011-04-25 21:55 EDT------- John and I started digging deeper to narrow down the upstream patch that fixed the problem. After some 25-30 iterations with incremental changes, git bisects, and mixing and matching the upstream driver with rc4 kernels, the problem suddenly disappeared. We can reproduce the problem on neither rc2 nor rc4. We are now scratching our heads for a theory that can explain this mystery... Stay tuned. BTW: I no longer have any reason to consider this bug a blocker.
Since RHEL 6.2 External Beta has begun and this bug remains unresolved, it has been rejected, as it has not been proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Any news?
------- Comment From whetzel.com 2011-10-12 11:39 EDT------- This bug was closed on the IBM side due to our inability to reproduce the issue after multiple attempts.