Bug 734815
Summary: kernel: NETDEV WATCHDOG: eth2 (bnx2): transmit queue 5 timed out

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 6 | Reporter | Robert Stroetgen <stroetgen> |
| Component | kernel | Assignee | Neil Horman <nhorman> |
| Status | CLOSED NOTABUG | QA Contact | Network QE <network-qe> |
| Severity | urgent | Priority | urgent |
| Version | 6.1 | CC | chorn, jeder, jwest, kzhang, nhorman, rdassen |
| Target Milestone | rc | Hardware | x86_64 |
| OS | Linux | Doc Type | Bug Fix |
| Last Closed | 2011-10-14 13:06:47 UTC | | |
Description
Robert Stroetgen
2011-08-31 14:32:54 UTC
Can you try the 6.2 kernel, please? I fixed a few bugs there that affected how and when we got TX timeouts.

(In reply to comment #2)
> Can you try the 6.2 kernel, please? I fixed a few bugs there that affected how
> and when we got TX timeouts.

Sorry, stupid question: where do I find the 6.2 kernel?

RHN; it should be in the latest RHEL6 beta channel. If it's not there, I can get you a build.

(In reply to comment #4)
> RHN; it should be in the latest RHEL6 beta channel. If it's not there, I can get
> you a build.

I did not find it in the RHEL6 beta channel, maybe my fault. Could you please give me a build?

Could be related.

Jeremy, can you give this customer a copy of the latest build with the bnx2 updates in it?

Just out of curiosity, are you using the iSCSI CNA features of the bnx2 card? It looks like you might be. Does this happen if you just use the device as a NIC?

Not intentionally. We use Brocade Fibre Channel adapters, and we use iSCSI, but without enabling any extra features. The error happens with the eth interfaces in general, not only with the interface used for iSCSI.

(In reply to comment #11)
> Just out of curiosity, are you using the iSCSI CNA features of the bnx2 card?
> It looks like you might be. Does this happen if you just use the device as a
> NIC?

Created attachment 521884: patch to disable carrier early in bnx2_netif_stop

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3612684

This is a build including the attached patch, which should disable carrier early enough to prevent timeouts during device changes. If this is the root cause of the problem, this patch should fix it. Please test and let me know the results. Thanks.

Sorry, I cannot reach the host brewweb.devel.redhat.com (DNS: not found).

Yes, you won't be able to, as it's an internal build system. I was expecting Christian would provide you with a copy of the resulting RPMs when the build completed.
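The test build works by turning the carrier off earlier in bnx2_netif_stop. As far as we understand the generic watchdog (dev_watchdog in net/sched/sch_generic.c, which shows up in the backtraces later in this report), it only inspects the TX queues of a device that is present, running and reporting carrier, so a device that has already dropped carrier while being reconfigured should not be flagged. The sketch below only reads the carrier and operstate values the kernel exposes in sysfs for an interface; the function name link_state and the SYSFS_NET override are hypothetical, added for illustration and testing.

```bash
#!/usr/bin/env bash
# Sketch: show the carrier and operstate values the kernel exposes for an
# interface. SYSFS_NET is a hypothetical override (default /sys/class/net)
# that exists only so the function can be tried against a fake tree.
link_state() {
    local ifname=$1
    local base="${SYSFS_NET:-/sys/class/net}/$ifname"
    local carrier operstate
    if [ ! -d "$base" ]; then
        echo "$ifname: no such interface" >&2
        return 1
    fi
    # Reading "carrier" fails while the interface is down, so fall back to n/a.
    carrier=$(cat "$base/carrier" 2>/dev/null) || carrier="n/a"
    operstate=$(cat "$base/operstate" 2>/dev/null) || operstate="unknown"
    echo "$ifname carrier=$carrier operstate=$operstate"
}
```

On the affected host this would be run as `link_state eth0`.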
Robert: you should be able to access https://access.redhat.com/support/cases/00527421 and get the -195 kernel from there. You can contact me directly via email if that does not work for you.

I downloaded and installed the test kernel:
Linux vmhost4.gei.de 2.6.32-195.el6.test.x86_64 #1 SMP Wed Sep 7 10:32:23 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Sep 8 11:25:15 vmhost4 kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Jun 15, 2010)
Sep 8 11:25:15 vmhost4 kernel: iscsi: registered transport (bnx2i)
Sep 8 11:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: using MSIX
Sep 8 11:25:15 vmhost4 kernel: bnx2 0000:0b:00.1: eth1: using MSIX
Sep 8 11:25:15 vmhost4 kernel: bnx2 0000:0b:00.1: eth1: NIC Copper Link is Up, 1000 Mbps full duplex
Sep 8 11:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Sep 8 11:25:15 vmhost4 kernel: bnx2 0000:10:00.0: eth2: using MSIX
Sep 8 11:25:15 vmhost4 kernel: bnx2 0000:10:00.0: eth2: NIC Copper Link is Up, 1000 Mbps full duplex

The error cannot be reproduced on demand, but it usually happens once or twice a week. I will let you know what happens.

Thanks and best regards,
Robert

Ok, thank you.
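Because the timeout cannot be triggered on demand and shows up only about once or twice a week, it helps to pull every occurrence out of syslog in one pass. A minimal sketch, assuming the classic syslog line layout and the NETDEV WATCHDOG message format shown in this report (the function name watchdog_events is hypothetical):

```bash
#!/usr/bin/env bash
# Sketch: list NETDEV WATCHDOG timeouts from syslog-style text on stdin as
# "<month> <day> <time> <host> <interface> <driver> queue=<n>".
# Assumes the classic syslog prefix ("Sep  8 20:25:15 vmhost4 kernel: ...")
# and the message format seen in this report.
watchdog_events() {
    awk '/NETDEV WATCHDOG:/ {
        # Find the "WATCHDOG:" token; the interface and "(driver):" follow it.
        for (i = 1; i <= NF; i++) if ($i == "WATCHDOG:") break
        iface = $(i + 1)
        drv = $(i + 2); gsub(/[():]/, "", drv)
        # The queue index is the token after "queue".
        q = "?"
        for (j = i; j < NF; j++) if ($j == "queue") { q = $(j + 1); break }
        print $1, $2, $3, $4, iface, drv, "queue=" q
    }'
}
```

For example, `watchdog_events < /var/log/messages`; appending `| awk '{print $5}' | sort | uniq -c` counts the timeouts per interface.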
The error happened again:
Sep 8 20:25:15 vmhost4 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit queue 4 timed out
Sep 8 20:25:15 vmhost4 kernel: Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle autofs4 coretemp ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_round_robin vhost_net macvtap macvlan tun kvm_intel kvm microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii ch osst st sg bnx2 ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix bfa(U) scsi_transport_fc scsi_tgt megaraid_sas dm
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01ef0010]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: PBA[00000000]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: NIC Copper Link is Down
Sep 8 20:25:18 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex

Ok, I'm out of ideas. Christian, can you give me access to the local reproducer, please? I'll start poking around to see what else I can find.

Robert, can you please post the complete error? You seem to have cut out the backtrace for some reason.

Would it help to see if the problem appears with -195 and the driver from the Broadcom website? It could at least help us distinguish between "bnx2 only" and "all other areas could be affected".

(In reply to comment #25)
> Robert, can you please post the complete error? You seem to have cut out
> the backtrace for some reason.

Sorry, I grepped "bnx". The complete log:
Sep 8 20:25:15 vmhost4 kernel: ------------[ cut here ]------------
Sep 8 20:25:15 vmhost4 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Sep 8 20:25:15 vmhost4 kernel: Hardware name: System x3550 M3 -[7944K1G]-
Sep 8 20:25:15 vmhost4 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit queue 4 timed out
Sep 8 20:25:15 vmhost4 kernel: Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle autofs4 coretemp ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_round_robin vhost_net macvtap macvlan tun kvm_intel kvm microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii ch osst st sg bnx2 ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix bfa(U) scsi_transport_fc scsi_tgt megaraid_sas dm
Sep 8 20:25:15 vmhost4 kernel: _multipath dm_mod scsi_dh_emc [last unloaded: scsi_wait_scan]
Sep 8 20:25:15 vmhost4 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-195.el6.test.x86_64 #1
Sep 8 20:25:15 vmhost4 kernel: Call Trace:
Sep 8 20:25:15 vmhost4 kernel: <IRQ> [<ffffffff81069b17>] ? warn_slowpath_common+0x87/0xc0
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81069c06>] ? warn_slowpath_fmt+0x46/0x50
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81449acd>] ? dev_watchdog+0x26d/0x280
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff8107d0f4>] ? mod_timer+0x144/0x220
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81449860>] ? dev_watchdog+0x0/0x280
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff8107c8f7>] ? run_timer_softirq+0x197/0x340
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81072101>] ? __do_softirq+0xc1/0x1d0
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff810d9410>] ? handle_IRQ_event+0x60/0x170
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff8100c20c>] ? call_softirq+0x1c/0x30
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff8100de45>] ? do_softirq+0x65/0xa0
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81071ee5>] ? irq_exit+0x85/0x90
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff814f40f5>] ? do_IRQ+0x75/0xf0
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff8100ba13>] ? ret_from_intr+0x0/0x11
Sep 8 20:25:15 vmhost4 kernel: <EOI> [<ffffffff812c3f2e>] ? intel_idle+0xde/0x170
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff812c3f11>] ? intel_idle+0xc1/0x170
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff813f9767>] ? cpuidle_idle_call+0xa7/0x140
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81009de6>] ? cpu_idle+0xb6/0x110
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff814d367a>] ? rest_init+0x7a/0x80
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
Sep 8 20:25:15 vmhost4 kernel: [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
Sep 8 20:25:15 vmhost4 kernel: ---[ end trace 45f28c736a30ea38 ]---
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01ef0010]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: DEBUG: PBA[00000000]
Sep 8 20:25:15 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: NIC Copper Link is Down
Sep 8 20:25:15 vmhost4 kernel: br0: port 1(eth0) entering disabled state
Sep 8 20:25:18 vmhost4 kernel: bnx2 0000:0b:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Sep 8 20:25:18 vmhost4 kernel: br0: port 1(eth0) entering forwarding state
Sep 8 20:25:31 vmhost4 auditd[9026]: Audit daemon rotating log files
Sep 8 20:25:36 vmhost4 abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Sep 8 20:25:36 vmhost4 abrtd: Directory 'kerneloops-1315506336-8583-1' creation detected
Sep 8 20:25:36 vmhost4 abrtd: Crash is in database already (dup of /var/spool/abrt/kerneloops-1308049562-7262-1)
Sep 8 20:25:36 vmhost4 abrtd: Deleting crash kerneloops-1315506336-8583-1 (dup of kerneloops-1308049562-7262-1), sending dbus signal

(In reply to comment #27)
> Would it help to see if the problem appears with -195 and the driver from the
> Broadcom website? It could at least help us distinguish between "bnx2 only" and "all other
> areas could be affected".
I just downloaded the Broadcom drivers:
Broadcom NetXtreme II Driver iSCSI version 2.6.2.4c (Feb 01, 2011)
Broadcom bnx2 Linux Driver: bnx2 v2.0.23b (Feb 01, 2011), cnic v2.2.13b (Feb 01, 2011)

To test the original Broadcom driver, I would need the -195 kernel sources.

Since the problems already occurred with -131, using this for testing should work, I think (and in case we see that the Broadcom driver works with -131, we also have fewer diffs to the working -71 than from our -193).

Both affected environments run quite similar hardware:

environment a)
Vendor: IBM Corp.
Version: -[D6E149AUS-1.09]-
Release Date: 09/21/2010
System Information
Manufacturer: IBM
Product Name: System x3550 M3 -[7944K1G]-

environment b)
Vendor: IBM Corp.
Version: -[D6E153AUS-1.12]-
Release Date: 06/30/2011
System Information
Manufacturer: IBM
Product Name: System x3650 M2 -[7947PCV]-

environment a)
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)

environment b)
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)

The affected bnx2 devices are identical in both cases:
0b:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
0b:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
10:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
10:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

I find it quite interesting that both environments showing this problem run such similar hardware. Maybe we should also look at chipset/BIOS versions (make sure we run the latest released versions, check for outstanding bugs)?
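To compare the two environments mechanically rather than by eye, the relevant part of the lspci output can be reduced to a counted, sorted list of device descriptions (keeping the "(rev N)" suffix) and then diffed between hosts. A sketch under that assumption; the function name pci_summary is hypothetical, and it relies on lspci's default "<slot> <class>: <description>" line layout:

```bash
#!/usr/bin/env bash
# Sketch: condense lspci output into "count description (rev N)" lines for
# the host bridge and Ethernet controllers, so two hosts can be diffed.
pci_summary() {
    # Split on ": " and keep only the description after the device class.
    awk -F': ' '/ Host bridge: | Ethernet controller: / { print $2 }' |
        sort | uniq -c
}
```

Running `lspci | pci_summary > pci-$(hostname).txt` on each host and diffing the two files would, based on the excerpts above, show the host bridge revision (13 versus 22) as the difference, while the four BCM5709 (rev 20) functions match.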
(In reply to comment #30)
> Since the problems already occurred with -131, using this for testing should work,
> I think (and in case we see that the Broadcom driver works with -131, we
> also have fewer diffs to the working -71 than from our -193).

I have just built and installed the original Broadcom driver for the 131 kernel, replacing v2.7.0.3 (Jun 15, 2010):
Sep 12 14:44:07 vmhost4 kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.6.2.4c (Feb 01, 2011)
Sep 12 14:44:07 vmhost4 kernel: iscsi: registered transport (bnx2i)
Sep 12 14:44:07 vmhost4 kernel: bnx2: eth0: using MSIX
Sep 12 14:44:07 vmhost4 kernel: bnx2i: dev eth0 does not support iSCSI
Sep 12 14:44:07 vmhost4 kernel: bnx2i: eth0 free_hba done after 0 retries
Sep 12 14:44:07 vmhost4 kernel: bnx2: eth1: using MSIX
Sep 12 14:44:07 vmhost4 kernel: bnx2i: dev eth1 does not support iSCSI
Sep 12 14:44:07 vmhost4 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Sep 12 14:44:07 vmhost4 kernel: bnx2i: eth1 free_hba done after 0 retries
Sep 12 14:44:07 vmhost4 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Sep 12 14:44:07 vmhost4 kernel: bnx2: eth2: using MSIX
Sep 12 14:44:07 vmhost4 kernel: bnx2i: dev eth2 does not support iSCSI
Sep 12 14:44:07 vmhost4 kernel: bnx2i: eth2 free_hba done after 0 retries
Sep 12 14:44:07 vmhost4 kernel: bnx2: eth2 NIC Copper Link is Up, 1000 Mbps full duplex

Let's see whether the error happens again within the next few days.

Ok, please let us know.

Any update here?
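When the in-kernel bnx2, cnic and bnx2i modules are swapped for the Broadcom builds, it is worth confirming which versions the running kernel actually has loaded; the boot log excerpt above only shows the bnx2i banner. A minimal sketch, assuming the modules declare a version, which the kernel then exposes as /sys/module/<name>/version; the function name loaded_versions and the SYSFS_MODULE override are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: print the version of each named module as seen by the running
# kernel (/sys/module/<name>/version), i.e. what is loaded, not what is
# installed on disk. SYSFS_MODULE is a hypothetical override for testing.
loaded_versions() {
    local root="${SYSFS_MODULE:-/sys/module}"
    local mod
    for mod in "$@"; do
        if [ -r "$root/$mod/version" ]; then
            printf '%s %s\n' "$mod" "$(cat "$root/$mod/version")"
        elif [ -d "$root/$mod" ]; then
            printf '%s loaded (no version attribute)\n' "$mod"
        else
            printf '%s not loaded\n' "$mod"
        fi
    done
}
```

For example, `loaded_versions bnx2 cnic bnx2i`. Comparing the result with `modinfo -F version bnx2` (which reads the module file on disk for the running kernel) shows whether the installed and loaded drivers agree, and `ethtool -i eth0` reports the driver and firmware versions bound to a given port.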
I've run the latest 197 kernel here all weekend with heavy traffic and did not encounter a hang.

Still watching; no incident yet with the 131 kernel and the original Broadcom driver:
[root@vmhost4 ~]# uname -a
Linux vmhost4.gei.de 2.6.32-131.12.1.el6.x86_64 #1 SMP Sun Jul 31 16:44:56 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Sep 12 14:44:07 vmhost4 kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.6.2.4c (Feb 01, 2011)

Another system with the 195 kernel has had one error in 12 days:
[root@vmhost-pbx ~]# uname -a
Linux vmhost-pbx.gei.de 2.6.32-195.el6.test.x86_64 #1 SMP Wed Sep 7 10:32:23 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Sep 8 14:01:30 vmhost-pbx kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Jun 15, 2010)
Sep 18 13:02:48 vmhost-pbx kernel: bnx2 0000:0b:00.0: eth0: NIC Copper Link is Down
Sep 18 13:02:51 vmhost-pbx kernel: bnx2 0000:0b:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex

Just out of curiosity, what BIOS level is the system in question running, and what version of the kernel-firmware package do you have installed?

Both machines have the same kernel-firmware and BIOS (UEFI):
[root@vmhost4 ~]# rpmquery kernel-firmware
kernel-firmware-2.6.32-195.el6.test.noarch
[root@vmhost4 ~]# dmidecode | grep UEFI
String 1: $MV Min UEFI Version -[D6E123AUS-1.00]-
[root@vmhost-pbx ~]# rpmquery kernel-firmware
kernel-firmware-2.6.32-195.el6.test.noarch
[root@vmhost-pbx ~]# dmidecode | grep UEFI
String 1: $MV Min UEFI Version -[D6E123AUS-1.00]-

We have prepared a BIOS/UEFI upgrade, but first we wanted to watch the kernel/driver tests.

Ok, thanks. Let us know what the driver tests produce.
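The first excerpt of the Sep 8 timeout was filtered with grep "bnx" and therefore lost the backtrace. A sketch that prints whole kernel WARNING reports instead, from the "cut here" marker through the "end trace" marker, optionally followed by a fixed number of extra lines (for example the bnx2 DEBUG register dump that follows the trace). The function name warning_blocks and its line-count argument are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: print complete kernel WARNING reports from syslog-style text on
# stdin, from "[ cut here ]" through "---[ end trace ... ]---", plus an
# optional number of following lines (default 0). Reports are separated
# by a blank line.
warning_blocks() {
    local extra="${1:-0}"
    awk -v extra="$extra" '
        /\[ cut here \]/ { inblock = 1; tail = 0 }
        inblock {
            print
            if ($0 ~ /---\[ end trace [0-9a-f]+ \]---/) {
                inblock = 0
                tail = extra
                if (tail == 0) print ""
            }
            next
        }
        # Trailing context after the end-trace marker, if requested.
        tail > 0 { print; if (--tail == 0) print "" }
    '
}
```

For example, `warning_blocks 30 < /var/log/messages` keeps roughly the trace plus the FTQ and DEBUG dump that bnx2 logs right after it.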
Error with the original Broadcom driver and the 131 kernel:
Sep 23 21:50:08 vmhost4 kernel: ------------[ cut here ]------------
Sep 23 21:50:08 vmhost4 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Tainted: G ---------------- T)
Sep 23 21:50:08 vmhost4 kernel: Hardware name: System x3550 M3 -[7944K1G]-
Sep 23 21:50:08 vmhost4 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit queue 1 timed out
Sep 23 21:50:08 vmhost4 kernel: Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle autofs4 coretemp hwmon ipmi_si ipmi_msghandler nfs lockd fscache(T) nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i(U) cnic(U) uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_round_robin vhost_net macvtap macvlan tun kvm_intel kvm microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support osst st ch cdc_ether usbnet mii sg bnx2(U) ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif megaraid_sas pata_acpi ata_generic ata_piix bfa(U) scsi_transport_fc
Sep 23 21:50:08 vmhost4 kernel: scsi_tgt dm_multipath dm_mod scsi_dh_emc [last unloaded: scsi_wait_scan]
Sep 23 21:50:08 vmhost4 kernel: Pid: 0, comm: swapper Tainted: G ---------------- T 2.6.32-131.12.1.el6.x86_64 #1
Sep 23 21:50:08 vmhost4 kernel: Call Trace:
Sep 23 21:50:08 vmhost4 kernel: <IRQ> [<ffffffff810670f7>] ? warn_slowpath_common+0x87/0xc0
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff810671e6>] ? warn_slowpath_fmt+0x46/0x50
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8143a39d>] ? dev_watchdog+0x26d/0x280
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff81088aed>] ? insert_work+0x6d/0xb0
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8143a130>] ? dev_watchdog+0x0/0x280
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff81079ef7>] ? run_timer_softirq+0x197/0x340
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff81268c5d>] ? rb_insert_color+0x9d/0x160
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8102a00d>] ? lapic_next_event+0x1d/0x30
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8106f6e1>] ? __do_softirq+0xc1/0x1d0
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff81092cc0>] ? hrtimer_interrupt+0x140/0x250
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8106f4c5>] ? irq_exit+0x85/0x90
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff814e3030>] ? smp_apic_timer_interrupt+0x70/0x9b
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
Sep 23 21:50:08 vmhost4 kernel: <EOI> [<ffffffff812bb7ce>] ? intel_idle+0xde/0x170
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff812bb7b1>] ? intel_idle+0xc1/0x170
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff813ec987>] ? cpuidle_idle_call+0xa7/0x140
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff81009e86>] ? cpu_idle+0xb6/0x110
Sep 23 21:50:08 vmhost4 kernel: [<ffffffff814d438a>] ? start_secondary+0x202/0x245
Sep 23 21:50:08 vmhost4 kernel: ---[ end trace 2ce7b3b8d8f26d8b ]---
Sep 23 21:50:08 vmhost4 kernel: bnx2: <--- start FTQ dump on eth0 --->
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_RV2P_PFTQ_CTL 10002
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_RV2P_TFTQ_CTL 20000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_RV2P_MFTQ_CTL 4000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_TBDR_FTQ_CTL 4002
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_TDMA_FTQ_CTL 10000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_TXP_FTQ_CTL 10000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_TPAT_FTQ_CTL 10000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_RXP_CFTQ_CTL 8000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_RXP_FTQ_CTL 100000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_COM_COMXQ_FTQ_CTL 10000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_COM_COMTQ_FTQ_CTL 20000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_COM_COMQ_FTQ_CTL 10000
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: BNX2_CP_CPQ_FTQ_CTL 4002
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: TXP mode b84c state 80001000 evt_mask 500 pc 8001284 pc 800128c instr 38640001
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: TPAT mode b84c state 80001000 evt_mask 500 pc 8000a4c pc 8000a5c instr 10400016
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: RXP mode b84c state 80001000 evt_mask 500 pc 8004c1c pc 8004c1c instr 10a0fffd
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: COM mode b8cc state 80000000 evt_mask 500 pc 8000a98 pc 8000aa4 instr 3c020800
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0: CP mode b8cc state 80000000 evt_mask 500 pc 8000c50 pc 8000c48 instr 3e00008
Sep 23 21:50:08 vmhost4 kernel: bnx2: <--- end FTQ dump on eth0 --->
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 DEBUG: intr_sem[0]
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 DEBUG: intr_sem[0] PCI_CMD[00100446]
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 RPM_MGMT_PKT_CTRL[40000088]
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01fc0003]
Sep 23 21:50:08 vmhost4 kernel: bnx2: eth0 DEBUG: PBA[00000000]
Sep 23 21:50:09 vmhost4 kernel: cnic: cnic_stop_bnx2_ooo_hw: hw rx_cons=0 != sw rx_cons=0 rx_prod=511
Sep 23 21:50:09 vmhost4 kernel: bnx2: eth0 NIC Copper Link is Down
Sep 23 21:50:09 vmhost4 kernel: br0: port 1(eth0) entering disabled state
Sep 23 21:50:12 vmhost4 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Sep 23 21:50:12 vmhost4 kernel: br0: port 1(eth0) entering forwarding state

Ok, looks like we're back to the firmware/UEFI upgrade question...

I just updated the server firmware:

Old:
Type  Version  Release Date
----  -------  ------------
IMM   YUOO84C  09/28/2010
UEFI  D6E149A  09/21/2010
DSA   DSYT75X  09/17/2010

New:
Type  Version  Release Date
----  -------  ------------
IMM   YUOOB7C  06/11/2011
UEFI  D6E153A  06/30/2011
DSA   DSYT89G  06/21/2011

I updated the Broadcom firmware, too:

Old:
ADAPTER MAC          BOOT   IPMI  ASF  PXE  UMP  NCSI    iSCSI  EFI
-------------------  -----  ----  ---  ---  ---  ------  -----  ---
E41F136D0B64 (5709)  5.2.2  NA    NA   NA   NA   2.0.10  NA     NA
E41F136D0B66 (5709)  5.2.2  NA    NA   NA   NA   2.0.10  NA     NA
E41F13D60EDC (5709)  4.6.4  NA    NA   NA   NA   1.0.3   NA     NA
E41F13D60EDE (5709)  NA     NA    NA   NA   NA   NA      NA     NA

New:
ADAPTER MAC          BOOT   IPMI  ASF  PXE  UMP  NCSI    iSCSI  EFI
-------------------  -----  ----  ---  ---  ---  ------  -----  ---
E41F136D0B64 (5709)  6.2.0  NA    NA   NA   NA   2.0.11  NA     NA
E41F136D0B66 (5709)  6.2.0  NA    NA   NA   NA   2.0.11  NA     NA
E41F13D60EDC (5709)  6.2.0  NA    NA   NA   NA   2.0.11  NA     NA
E41F13D60EDE (5709)  NA     NA    NA   NA   NA   NA      NA     NA

I'll keep watching...

Ok, thank you.

Just to keep you up to date: no incident since the firmware update yet.
Sep 26 14:40:57 vmhost4 kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.6.2.4c (Feb 01, 2011)
Sep 26 14:40:57 vmhost4 kernel: iscsi: registered transport (bnx2i)
Sep 26 14:40:57 vmhost4 kernel: bnx2: eth0: using MSIX
Sep 26 14:40:57 vmhost4 kernel: bnx2i: dev eth0 does not support iSCSI
Sep 26 14:40:57 vmhost4 kernel: bnx2i: eth0 free_hba done after 0 retries
Sep 26 14:40:57 vmhost4 kernel: bnx2: eth1: using MSIX
Sep 26 14:40:57 vmhost4 kernel: bnx2i: dev eth1 does not support iSCSI
Sep 26 14:40:57 vmhost4 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Sep 26 14:40:57 vmhost4 kernel: bnx2i: eth1 free_hba done after 0 retries
Sep 26 14:40:57 vmhost4 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Sep 26 14:40:57 vmhost4 kernel: bnx2: eth2: using MSIX
Sep 26 14:40:57 vmhost4 kernel: bnx2i: dev eth2 does not support iSCSI
Sep 26 14:40:57 vmhost4 kernel: bnx2i: eth2 free_hba done after 0 retries
Sep 26 14:40:57 vmhost4 kernel: bnx2: eth2 NIC Copper Link is Up, 1000 Mbps full duplex
Kernel 2.6.32-131.12.1.el6.x86_64

Copy that, thank you for the update. Are the tests continuing to run, or have you concluded that this is the cause of the problem?

We are keeping the tests running, but more than two weeks without an error gives us hope.

Ok, at 2 weeks, I'll say this is fixed. Please reopen if the problem resurfaces. Thanks!