Bug 697438

Summary: System locks up when pci-passthru performed on be2net VF
Product: Red Hat Enterprise Linux 6
Reporter: IBM Bug Proxy <bugproxy>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.2
CC: balkov, ivecera, jjarvis, jkachuck, kstansel, laurie.barry, nobody+PNT0273897, peterm, whetzel
Target Milestone: beta
Target Release: 6.2
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-12 12:32:21 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 684953
Attachments:
  Messages file from across both lockups (flags: none)
  log of commands before lockup (flags: none)
  sosreport output (flags: none)

Description IBM Bug Proxy 2011-04-18 09:31:26 UTC
=================================================


---Problem Description---
When attempting to enable PCI passthru for an Emulex BE3 10Gb ethernet card, the system will hang. 
This was on RH6.1 Beta Snap 3.
 
Contact Information = John Whetzel/whetzel.ibm.com, Ram Pai/linuxram.com 
 
---Additional Hardware Info---
Emulex Virtual Fabric Adapter Advanced (CFFh)  
82:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 02)
82:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 02) 

 
---uname output---
Linux elm3a186.beaverton.ibm.com 2.6.32-130.el6.x86_64 #1 SMP Tue Apr 5 19:58:31  EDT 2011 x86_64
x86_64 x86_64 GNU/Linux
 
Machine Type = BladeCenter HX5 7872AC1 

 
---System Hang---
 The system and all interfaces (network, console, etc.) lock up.  The system must be rebooted for
access.

 
---Steps to Reproduce---
 With an Emulex VNIC expansion card, load the driver with a non-zero number of VFs (e.g., 'modprobe
be2net num_vfs=1'), then run 'virsh nodedev-dettach' on one of the VFs (e.g., 'virsh nodedev-dettach
pci_0000_82_04_0').
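
The two steps above can be sketched as a small shell function. The function name and the DRY_RUN switch are hypothetical, added here only for illustration; the commands themselves and the VF name pci_0000_82_04_0 are taken from the report. Because the real commands may hang the host, the sketch defaults to printing them instead of running them.

```shell
# Hypothetical wrapper around the reproduction steps from this report.
# DRY_RUN=1 (the default, an assumption made for safety) only prints the
# commands; set DRY_RUN=0 to actually run them as root on a test machine.
repro_be2net_vf_detach() {
    vf="${1:-pci_0000_82_04_0}"   # VF node device name used in the report
    run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi
    # Step 1: load the driver with one virtual function enabled.
    $run modprobe be2net num_vfs=1
    # Step 2: detach the VF from the host (spelling as used in the report).
    $run virsh nodedev-dettach "$vf"
}
```

Calling repro_be2net_vf_detach with no arguments prints the two commands; passing a different node device name as the first argument targets another VF.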

 
---Kernel Component Data--- 
Stack trace output:
 Apr 15 10:43:20 elm3a186 kernel: ------------[ cut here ]------------
Apr 15 10:43:20 elm3a186 kernel: WARNING: at drivers/pci/intel-iommu.c:1722
__domain_mapping+0x200/0x230() (Not tainted)
Apr 15 10:43:20 elm3a186 kernel: Hardware name:  -[7872AC1]-
Apr 15 10:43:20 elm3a186 kernel: Modules linked in: be2net ebtable_nat ebtables ipt_MASQUERADE
iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand
acpi_cpufreq freq_table xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
ip6_tables ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm_intel kvm
microcode serio_raw ghes hed sg cdc_ether usbnet mii i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support
bnx2 ioatdma dca i7core_edac edac_core shpchp sr_mod cdrom ext4 mbcache jbd2 sd_mod crc_t10dif
ums_cypress usb_storage mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: be2net]
Apr 15 10:43:20 elm3a186 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-130.el6.x86_64 #1
Apr 15 10:43:20 elm3a186 kernel: Call Trace:
Apr 15 10:43:20 elm3a186 kernel: <IRQ>  [<ffffffff81067157>] ? warn_slowpath_common+0x87/0xc0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff810671aa>] ? warn_slowpath_null+0x1a/0x20
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff81299d20>] ? __domain_mapping+0x200/0x230
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8129b971>] ? __intel_map_single+0x111/0x210
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8129bab1>] ? intel_map_page+0x41/0x50
Apr 15 10:43:20 elm3a186 kernel: [<ffffffffa011e29a>] ? bnx2_poll_work+0x8fa/0x1270 [bnx2]
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8129ba70>] ? intel_map_page+0x0/0x50
Apr 15 10:43:20 elm3a186 kernel: [<ffffffffa011ec4d>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff810921a2>] ? enqueue_hrtimer+0x82/0xd0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff813f0c36>] ? dma_issue_pending_all+0x76/0xa0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff814222d3>] ? net_rx_action+0x103/0x2f0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8106f737>] ? __do_softirq+0xb7/0x1e0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff810d6960>] ? handle_IRQ_event+0x60/0x170
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8106f525>] ? irq_exit+0x85/0x90
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff814e3255>] ? do_IRQ+0x75/0xf0
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
Apr 15 10:43:20 elm3a186 kernel: <EOI>  [<ffffffff812bb77e>] ? intel_idle+0xde/0x170
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff812bb761>] ? intel_idle+0xc1/0x170
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff813ecaa7>] ? cpuidle_idle_call+0xa7/0x140
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff81009e96>] ? cpu_idle+0xb6/0x110
Apr 15 10:43:20 elm3a186 kernel: [<ffffffff814d4604>] ? start_secondary+0x202/0x245
Apr 15 10:43:20 elm3a186 kernel: ---[ end trace dde8db0efc2878c5 ]---
Apr 15 10:43:20 elm3a186 kernel: ERROR: DMA PTE for vPFN 0xffe1a already set (to 46bb76002 not
106cce7002)
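
When scanning a messages file for further occurrences, a small filter like the one below may help pull the key values out of the "DMA PTE ... already set" line. This is only a sketch: the function name and the output labels are hypothetical, and reading the first value as the existing entry and the second as the one being written is an interpretation of the message wording. It expects the message on a single line, as it would appear in a log file (it is wrapped in the paste above).

```shell
# Hypothetical helper: reads log lines on stdin and prints the vPFN and the
# two PTE values from each "DMA PTE for vPFN ... already set" message.
# The labels "existing" and "requested" are an assumed reading of the message.
parse_dma_pte_error() {
    sed -n 's/.*DMA PTE for vPFN \(0x[0-9a-f]*\) already set (to \([0-9a-f]*\) not \([0-9a-f]*\)).*/vpfn=\1 existing=\2 requested=\3/p'
}
```

For example, `parse_dma_pte_error < /var/log/messages` would print one summary line per matching message and nothing for other lines.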

 
  
/etc/selinux/config output: [root@elm3a186 ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
 
 rpm -qa | grep -i selinux output: [root@elm3a186 ~]# rpm -qa | grep -i selinux
libselinux-utils-2.0.94-5.el6.x86_64
selinux-policy-targeted-3.7.19-82.el6.noarch
libselinux-devel-2.0.94-5.el6.x86_64
selinux-policy-3.7.19-82.el6.noarch
libselinux-2.0.94-5.el6.x86_64
 
System Dump Info:
  The system is not configured to capture a system dump.
  
getsebool output:  [root@elm3a186 ~]# getsebool -a
abrt_anon_write --> off
allow_console_login --> on
allow_cvs_read_shadow --> off
allow_daemons_dump_core --> on
allow_daemons_use_tcp_wrapper --> off
allow_daemons_use_tty --> on
allow_domain_fd_use --> on
allow_execheap --> off
allow_execmem --> on
allow_execmod --> on
allow_execstack --> on
allow_ftpd_anon_write --> off
allow_ftpd_full_access --> off
allow_ftpd_use_cifs --> off
allow_ftpd_use_nfs --> off
allow_gssd_read_tmp --> on
allow_guest_exec_content --> off
allow_httpd_anon_write --> off
allow_httpd_mod_auth_ntlm_winbind --> off
allow_httpd_mod_auth_pam --> off
allow_httpd_sys_script_anon_write --> off
allow_java_execstack --> off
allow_kerberos --> on
allow_mount_anyfile --> on
allow_mplayer_execstack --> off
allow_nfsd_anon_write --> off
allow_nsplugin_execmem --> on
allow_polyinstantiation --> off
allow_postfix_local_write_mail_spool --> on
allow_ptrace --> off
allow_rsync_anon_write --> off
allow_saslauthd_read_shadow --> off
allow_smbd_anon_write --> off
allow_ssh_keysign --> off
allow_staff_exec_content --> on
allow_sysadm_exec_content --> on
allow_unconfined_nsplugin_transition --> off
allow_unconfined_qemu_transition --> off
allow_user_exec_content --> on
allow_user_mysql_connect --> off
allow_user_postgresql_connect --> off
allow_write_xshm --> off
allow_xguest_exec_content --> off
allow_xserver_execmem --> off
allow_ypbind --> off
allow_zebra_write_config --> on
authlogin_radius --> off
cdrecord_read_content --> off
clamd_use_jit --> off
cobbler_anon_write --> off
cobbler_can_network_connect --> off
cobbler_use_cifs --> off
cobbler_use_nfs --> off
cron_can_relabel --> off
dhcpc_exec_iptables --> off
domain_kernel_load_modules --> off
exim_can_connect_db --> off
exim_manage_user_files --> off
exim_read_user_files --> off
fcron_crond --> off
fenced_can_network_connect --> off
ftp_home_dir --> off
ftpd_connect_db --> off
git_session_bind_all_unreserved_ports --> off
git_system_enable_homedirs --> off
git_system_use_cifs --> off
git_system_use_nfs --> off
global_ssp --> off
gpg_agent_env_file --> off
gpg_web_anon_write --> off
httpd_builtin_scripting --> on
httpd_can_check_spam --> off
httpd_can_network_connect --> off
httpd_can_network_connect_cobbler --> off
httpd_can_network_connect_db --> off
httpd_can_network_memcache --> off
httpd_can_network_relay --> off
httpd_can_sendmail --> off
httpd_dbus_avahi --> on
httpd_enable_cgi --> on
httpd_enable_ftp_server --> off
httpd_enable_homedirs --> off
httpd_execmem --> off
httpd_read_user_content --> off
httpd_setrlimit --> off
httpd_ssi_exec --> off
httpd_tmp_exec --> off
httpd_tty_comm --> on
httpd_unified --> on
httpd_use_cifs --> off
httpd_use_gpg --> off
httpd_use_nfs --> off
icecast_connect_any --> off
init_upstart --> on
irssi_use_full_network --> off
mmap_low_allowed --> off
mozilla_read_content --> off
mysql_connect_any --> off
named_write_master_zones --> off
ncftool_read_user_content --> off
nfs_export_all_ro --> on
nfs_export_all_rw --> on
nscd_use_shm --> on
nsplugin_can_network --> on
openvpn_enable_homedirs --> on
piranha_lvs_can_network_connect --> off
pppd_can_insmod --> off
pppd_for_user --> off
privoxy_connect_any --> on
puppet_manage_all_files --> off
puppetmaster_use_db --> off
qemu_full_network --> on
qemu_use_cifs --> on
qemu_use_comm --> off
qemu_use_nfs --> on
qemu_use_usb --> on
racoon_read_shadow --> off
rgmanager_can_network_connect --> off
rsync_client --> off
rsync_export_all_ro --> off
samba_create_home_dirs --> off
samba_domain_controller --> off
samba_enable_home_dirs --> off
samba_export_all_ro --> off
samba_export_all_rw --> off
samba_run_unconfined --> off
samba_share_fusefs --> off
samba_share_nfs --> off
secure_mode --> off
secure_mode_insmod --> off
secure_mode_policyload --> off
sepgsql_enable_users_ddl --> on
sepgsql_unconfined_dbadm --> on
sftpd_anon_write --> off
sftpd_enable_homedirs --> off
sftpd_full_access --> off
sftpd_write_ssh_home --> off
smartmon_3ware --> off
spamassassin_can_network --> off
spamd_enable_home_dirs --> on
squid_connect_any --> on
squid_use_tproxy --> off
ssh_sysadm_login --> off
telepathy_tcp_connect_generic_network_ports --> off
tftp_anon_write --> off
tor_bind_all_unreserved_ports --> off
unconfined_login --> on
unconfined_mmap_zero_ignore --> off
use_fusefs_home_dirs --> off
use_lpd_server --> off
use_nfs_home_dirs --> on
use_samba_home_dirs --> off
user_direct_dri --> on
user_direct_mouse --> off
user_ping --> on
user_rw_noexattrfile --> on
user_setrlimit --> on
user_tcp_server --> off
user_ttyfile_stat --> off
varnishd_connect_any --> off
vbetool_mmap_zero_ignore --> off
virt_use_comm --> off
virt_use_fusefs --> off
virt_use_nfs --> off
virt_use_samba --> off
virt_use_sysfs --> on
virt_use_usb --> on
virt_use_xserver --> off
webadm_manage_user_files --> off
webadm_read_user_files --> off
wine_mmap_zero_ignore --> off
xdm_exec_bootloader --> off
xdm_sysadm_login --> off
xen_use_nfs --> off
xguest_connect_network --> on
xguest_mount_media --> on
xguest_use_bluetooth --> on
xserver_object_manager --> off



=================================================


1. Server architecture(s) (please list all affected) (x86/POWER6/Z/etc.): x86-64
2. Server type (9117-MMA/HS20/s390/etc.): IBM BladeCenter HX5 7872AC1
3. General component (desktop/kernel/base OS/dev tools/etc.): Kernel
4. Other components involved (ixgbe/java/emulex/etc.): NA
5. Does the server have the latest GA firmware? Yes
6. What is the latest official distro build on which this bug has been seen? RHEL 6.1 Snap 3

Comment 1 IBM Bug Proxy 2011-04-18 09:31:35 UTC
Created attachment 492835 [details]
Messages file from across both lockups

Comment 2 IBM Bug Proxy 2011-04-18 09:31:38 UTC
Created attachment 492836 [details]
log of commands before lockup

Comment 3 IBM Bug Proxy 2011-04-18 09:31:46 UTC
Created attachment 492837 [details]
sosreport output

Comment 4 KernelOops Bot 2011-04-18 09:36:08 UTC
 with this guiltyfunc:  bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008

Comment 6 KernelOops Bot 2011-04-18 09:36:14 UTC
 with this guiltyfunc:  bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008

Comment 10 John Jarvis 2011-04-19 15:44:45 UTC
Too late for 6.1, moving to the list for review for 6.2 inclusion and possible
6.1.z stream.

Comment 11 IBM Bug Proxy 2011-04-19 20:52:51 UTC
------- Comment From tpnoonan.com 2011-04-19 16:44 EDT-------
Hi Red Hat. Once fixed in rhel6.2, please consider for rhel6.1.z. Thanks

Comment 12 IBM Bug Proxy 2011-04-20 20:12:14 UTC
------- Comment From linuxram.com 2011-04-20 16:04 EDT-------
This is a blocker bug. It ***cannot be deferred*** to 6.2 since it is a key feature targeted for 6.1.

BTW: the problem does not exist with the upstream kernel. We are in the process of identifying the patch that fixes the problem.

RP

Comment 13 IBM Bug Proxy 2011-04-26 02:01:37 UTC
------- Comment From linuxram.com 2011-04-25 21:55 EDT-------
John and I started digging deeper to narrow down the patch that fixed the problem upstream. After some 25-30 iterations with incremental changes, git bisects, mixing and matching the upstream driver with rc4 kernels, etc., the problem suddenly disappeared. We can reproduce it neither on rc2 nor on rc4. We are now scratching our heads for a theory that can explain this mystery...

Stay tuned.  BTW: I no longer have any reason to call this bug a blocker.

Comment 14 RHEL Program Management 2011-10-07 15:31:05 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 15 Ivan Vecera 2011-10-12 12:06:09 UTC
Any news?

Comment 16 IBM Bug Proxy 2011-10-12 15:41:36 UTC
------- Comment From whetzel.com 2011-10-12 11:39 EDT-------
This bug was closed on the IBM side due to our inability to reproduce the issue after multiple attempts.