Bug 696337 - Bond interface flapping and increasing rx_missed_errors
Summary: Bond interface flapping and increasing rx_missed_errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 6.0
Assignee: Andy Gospodarek
QA Contact: Liang Zheng
URL:
Whiteboard:
Depends On:
Blocks: 698109
Reported: 2011-04-13 20:44 UTC by Neal Kim
Modified: 2018-11-27 19:32 UTC
CC List: 16 users

Fixed In Version: kernel-2.6.32-131.0.9.el6
Doc Type: Bug Fix
Doc Text:
During light or no network traffic, an active-backup bond interface using ARP monitoring with validation could go down and come back up due to an overflow or underflow of system timer interrupt ticks (jiffies). With this update, the jiffies calculation problems have been fixed and the bond interface works as expected.
Clone Of:
Environment:
Last Closed: 2011-05-19 12:02:17 UTC
Target Upstream Version:
Embargoed:
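
The Doc Text above attributes the flapping to overflow or underflow in jiffies arithmetic. As a rough illustration only (this is not the attached patch), the sketch below simulates a wrapping tick counter in Python and contrasts a naive comparison with a wraparound-safe one in the style of the kernel's time_after() macro. The 32-bit width, the helper names, and the sample tick values are assumptions chosen to keep the numbers small.

```python
# Illustration only: a 32-bit tick counter is assumed to keep the numbers small.
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x: int) -> int:
    """Interpret an unsigned BITS-wide value as a two's-complement signed value."""
    x &= MASK
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def time_after(a: int, b: int) -> bool:
    """True if tick a is later than tick b, even across a counter wrap.

    Same idea as the kernel macro: ((long)(b - a) < 0).
    """
    return to_signed(b - a) < 0


def naive_after(a: int, b: int) -> bool:
    """Plain comparison, which gives the wrong answer once the counter wraps."""
    return a > b


# Hypothetical values: the last ARP reply arrives just before the counter wraps.
last_rx = MASK - 5
now = (last_rx + 10) & MASK        # 10 ticks later, the counter has wrapped

elapsed = (now - last_rx) & MASK   # modular subtraction recovers the true gap

print("naive says now is after last_rx:", naive_after(now, last_rx))
print("wrap-safe says now is after last_rx:", time_after(now, last_rx))
print("elapsed ticks:", elapsed)
```

A monitor that used the naive comparison would conclude that no time had passed (or that the last reply lies in the future) and could mark a healthy slave as down, which is the general class of error the Doc Text describes.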


Attachments
IRQ data before and after the test (18.02 KB, application/x-gzip)
2011-04-14 21:28 UTC, Ashwani Wason
missing sysreport data (/etc/modprobe.d/, /sys/module/ixgbe/, and /sys/module/bnx2/) (4.50 KB, application/x-gzip)
2011-04-18 20:50 UTC, Ashwani Wason
bonding-fix-jiffie-issues.patch (4.57 KB, patch)
2011-04-19 15:38 UTC, Andy Gospodarek
See MGMT tab for management blade info and DATA tab for data blade info (23.42 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2011-04-19 16:31 UTC, Ashwani Wason
IRQ data from a run with irqbalance running (19.07 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2011-04-19 21:12 UTC, Ashwani Wason
Attachment referred to in comment #62 (12.96 KB, application/x-gzip)
2011-04-21 21:18 UTC, Ashwani Wason


Links
System: Red Hat Product Errata
ID: RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Neal Kim 2011-04-13 20:44:56 UTC
Description of problem:

Customer is experiencing an issue with bond interface flapping and an increasing number of rx_missed_errors with a dual-port 10Gb Intel NIC (ixgbe) installed in an IBM HS22V blade.

The issues arise even while a relatively small amount of network traffic passes through the system (~300Mbps).

They have tried a later upstream version of the Intel ixgbe driver (3.2.10) and have seen their issues go away completely.

The current RHEL 6.0 ixgbe driver version is 2.0.62-k2, so something must have changed between that release and upstream to cause the change in behaviour, although I would imagine the delta between these releases is rather large.


[ Network Configuration ]

Both ports on the NIC are bonded together. Traffic comes in on one VLAN and goes out on another VLAN and vice versa.

The bonded interface utilizes ARP monitoring as well.

So, bond1 = eth0 + eth1
Client-side: bond1.1010
Server-side: bond1.2020

Clients do TCP connections to servers and then they exchange data back and forth. All data is being IP forwarded at the blade for this test.

Only IP forwarding is in use; TProxy is not involved in this case.


[ Statistics ]

[root@s01b02 webgrp]# ethtool -S eth0 | egrep 'missed|restart'
     rx_missed_errors: 109687
     tx_restart_queue: 1823

[root@s01b02 webgrp]# ifconfig bond1
bond1     Link encap:Ethernet  HWaddr 00:1B:21:63:37:F4  
          inet addr:169.254.144.2  Bcast:169.254.144.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe63:37f4/64 Scope:Link
          inet6 addr: fdfd:6b2a:f17e::a9fe:9002/120 Scope:Global
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:126777918 errors:0 dropped:109687 overruns:0 frame:0
          TX packets:126284364 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:101602942581 (94.6 GiB)  TX bytes:101439721673 (94.4 GiB)
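
To put the counters above in proportion, here is a minimal sketch that parses `ethtool -S` style lines and relates rx_missed_errors to the RX packet count reported by ifconfig. The helper name `parse_ethtool_stats` is hypothetical, and treating received plus missed frames as the denominator is an assumption about how the two counters relate.

```python
import re

# Excerpt of the `ethtool -S eth0 | egrep 'missed|restart'` output above.
ETHTOOL_SAMPLE = """\
     rx_missed_errors: 109687
     tx_restart_queue: 1823
"""


def parse_ethtool_stats(text):
    """Turn 'name: value' lines from `ethtool -S` into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\w+):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


stats = parse_ethtool_stats(ETHTOOL_SAMPLE)

# "RX packets" from the ifconfig bond1 output above.
rx_packets = 126777918
missed = stats["rx_missed_errors"]

# Assumption: missed frames are not included in the RX packet count.
loss_pct = 100.0 * missed / (rx_packets + missed)
print(f"rx_missed_errors={missed} ({loss_pct:.3f}% of offered frames)")
```

On these numbers the miss ratio comes out below 0.1%, so the absolute counter is large mainly because of the packet volume.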

Version-Release number of selected component (if applicable):

kernel-2.6.32-71.15.1.el6


How reproducible:

Every time.


Actual results:

Bond interface flapping and increasing number of rx_missed_errors.


Expected results:

No interface flapping and no packet issues.


Additional info:

[root@s01b08 ~]# ethtool -S eth0
NIC statistics:
     rx_packets: 18960451663
     tx_packets: 18736735790
     rx_bytes: 15268718655676
     tx_bytes: 15219341217704
     rx_pkts_nic: 18958826853
     tx_pkts_nic: 18846190484
     rx_bytes_nic: 15413937608036
     tx_bytes_nic: 15376470397295
     lsc_int: 34
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 124
     broadcast: 660493
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 297218
     fdir_miss: 15208547228
     rx_fifo_errors: 0
     rx_missed_errors: 28662099
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 7
     tx_restart_queue: 666751
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     fcoe_bad_fccrc: 0
     rx_fcoe_dropped: 0
     rx_fcoe_packets: 0
     rx_fcoe_dwords: 0
     tx_fcoe_packets: 0
     tx_fcoe_dwords: 0
     tx_queue_0_packets: 18736875503
     tx_queue_0_bytes: 15113556475076
     tx_queue_1_packets: 6967649
     tx_queue_1_bytes: 7036138900
     tx_queue_2_packets: 5426822
     tx_queue_2_bytes: 5449835183
     tx_queue_3_packets: 3893681
     tx_queue_3_bytes: 3819772495
     tx_queue_4_packets: 827328
     tx_queue_4_bytes: 713898104
     tx_queue_5_packets: 558634
     tx_queue_5_bytes: 447302250
     tx_queue_6_packets: 6564501
     tx_queue_6_bytes: 6725737428
     tx_queue_7_packets: 6250693
     tx_queue_7_bytes: 6369278054
     tx_queue_8_packets: 4653423
     tx_queue_8_bytes: 4712033102
     tx_queue_9_packets: 2317523
     tx_queue_9_bytes: 2345069705
     tx_queue_10_packets: 637354
     tx_queue_10_bytes: 565168608
     tx_queue_11_packets: 530739
     tx_queue_11_bytes: 449370913
     tx_queue_12_packets: 8159317
     tx_queue_12_bytes: 8340261146
     tx_queue_13_packets: 11553688
     tx_queue_13_bytes: 11794139518
     tx_queue_14_packets: 9338246
     tx_queue_14_bytes: 9647538192
     tx_queue_15_packets: 6626543
     tx_queue_15_bytes: 6759077470
     tx_queue_16_packets: 1211369
     tx_queue_16_bytes: 1230185462
     tx_queue_17_packets: 876535
     tx_queue_17_bytes: 897862116
     tx_queue_18_packets: 10433776
     tx_queue_18_bytes: 10873753220
     tx_queue_19_packets: 9774745
     tx_queue_19_bytes: 10074590129
     tx_queue_20_packets: 8196281
     tx_queue_20_bytes: 8510884529
     tx_queue_21_packets: 3974129
     tx_queue_21_bytes: 4181293432
     tx_queue_22_packets: 1026887
     tx_queue_22_bytes: 1064929581
     tx_queue_23_packets: 1073829
     tx_queue_23_bytes: 1102979407
     rx_queue_0_packets: 1186693572
     rx_queue_0_bytes: 952614636420
     rx_queue_1_packets: 1185716493
     rx_queue_1_bytes: 955619570194
     rx_queue_2_packets: 1184065747
     rx_queue_2_bytes: 952224453002
     rx_queue_3_packets: 1185187762
     rx_queue_3_bytes: 953443947589
     rx_queue_4_packets: 1185489871
     rx_queue_4_bytes: 954773794470
     rx_queue_5_packets: 1182358421
     rx_queue_5_bytes: 952878203985
     rx_queue_6_packets: 1183625850
     rx_queue_6_bytes: 953566093961
     rx_queue_7_packets: 1186857231
     rx_queue_7_bytes: 956218485913
     rx_queue_8_packets: 1185492094
     rx_queue_8_bytes: 954046487349
     rx_queue_9_packets: 1183735969
     rx_queue_9_bytes: 952826497398
     rx_queue_10_packets: 1184993319
     rx_queue_10_bytes: 955604947893
     rx_queue_11_packets: 1185822376
     rx_queue_11_bytes: 955475867103
     rx_queue_12_packets: 1186094377
     rx_queue_12_bytes: 956116190198
     rx_queue_13_packets: 1184231997
     rx_queue_13_bytes: 953408899120
     rx_queue_14_packets: 1185826419
     rx_queue_14_bytes: 955513858960
     rx_queue_15_packets: 1184259880
     rx_queue_15_bytes: 954386508161
     rx_queue_16_packets: 41
     rx_queue_16_bytes: 43704
     rx_queue_17_packets: 7
     rx_queue_17_bytes: 424
     rx_queue_18_packets: 28
     rx_queue_18_bytes: 16116
     rx_queue_19_packets: 134
     rx_queue_19_bytes: 143235
     rx_queue_20_packets: 68
     rx_queue_20_bytes: 11489
     rx_queue_21_packets: 3
     rx_queue_21_bytes: 200
     rx_queue_22_packets: 2
     rx_queue_22_bytes: 126
     rx_queue_23_packets: 3
     rx_queue_23_bytes: 180

Comment 3 Andy Gospodarek 2011-04-13 21:24:14 UTC
Looking at the sysreport it looks like some irq affinity must have been configured?  Is that correct?

Comment 4 Andy Gospodarek 2011-04-13 21:55:52 UTC
Any additional configuration information would also be helpful.  Even TCP parameters that are set.  I can see what is in sysctl.conf, and will apply those during my next testing run.

Comment 5 Ashwani Wason 2011-04-14 02:27:55 UTC
(In reply to comment #3)
> Looking at the sysreport it looks like some irq affinity must have been
> configured?  Is that correct?

No IRQ affinity is being configured, at least not that I know of.

Comment 6 Ashwani Wason 2011-04-14 02:28:21 UTC
(In reply to comment #4)
> Any additional configuration information would also be helpful.  Even TCP
> parameters that are set.  I can see what is in sysctl.conf, and will apply
> those during my next testing run.

net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 180
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 512000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 0
net.core.netdev_max_backlog = 5000

Westwood+ is used as the TCP congestion control algorithm.
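
For anyone replicating this configuration on a test box, the settings above can be handled programmatically. The following is a minimal sketch that parses sysctl.conf style lines and maps each key to its /proc/sys path; the function names are hypothetical, and the simple dot-to-slash mapping is an assumption that does not cover keys containing interface names with dots.

```python
# The settings quoted in this comment, in sysctl.conf syntax.
SYSCTL_LINES = """\
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 180
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 512000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 0
net.core.netdev_max_backlog = 5000
"""


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a sysctl key to its /proc/sys file (simplified dot-to-slash rule)."""
    return "/proc/sys/" + key.replace(".", "/")


settings = parse_sysctl(SYSCTL_LINES)
for key, value in settings.items():
    print(f"{proc_path(key)} <- {value}")
```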

Comment 7 Ashwani Wason 2011-04-14 02:47:30 UTC
Results from a test with the tx ring changed from 512 (default) to 4096.

In summary, it did not seem to have an impact on bonding, which still flapped. It did seem to have a positive impact on the tx_restart_queue count: with the default ring parameters it increased by ~800 during a test, but with the increased ring parameter it increased by only ~50.

[root@s01b01 64]# ethtool -i eth0
driver: ixgbe
version: 2.0.62-k2
firmware-version: 1.0-3
bus-info: 0000:15:00.0


[root@s01b01 64]# uname -r
2.6.32-71.25.1.el6.bytemobile.1.x86_64


INITIAL STATE OF THE SYSTEM:

[root@s01b01 64]# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		512
RX Mini:	0
RX Jumbo:	0
TX:		512


[root@s01b01 64]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85
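
Since the per-slave Link Failure Count is the figure tracked across the tests below, a small parser for this file can save manual comparison. This is a sketch under the assumption that the file keeps the "Key: value" layout shown above; `parse_bonding` is a hypothetical helper, not part of any bonding tool.

```python
# Copy of the /proc/net/bonding/bond1 output shown above.
BOND_SAMPLE = """\
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85
"""


def parse_bonding(text):
    """Split the file into bond-level fields and a per-slave dict."""
    info = {"slaves": {}}
    current = None  # dict of the slave section being read, if any
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = info["slaves"].setdefault(value, {})
        elif current is not None:
            current[key] = value
        else:
            info[key] = value
    return info


info = parse_bonding(BOND_SAMPLE)
for name, slave in info["slaves"].items():
    print(name, slave["MII Status"], "failures:", slave["Link Failure Count"])
```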


[root@s01b01 64]# ethtool -S eth0
NIC statistics:
     rx_packets: 66673
     tx_packets: 9821
     rx_bytes: 4656432
     tx_bytes: 538580
     rx_pkts_nic: 66667
     tx_pkts_nic: 9821
     rx_bytes_nic: 5114832
     tx_bytes_nic: 669066
     lsc_int: 1
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 391
     broadcast: 57649
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 1030
     fdir_miss: 8853
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     fcoe_bad_fccrc: 0
     rx_fcoe_dropped: 0
     rx_fcoe_packets: 0
     rx_fcoe_dwords: 0
     tx_fcoe_packets: 0
     tx_fcoe_dwords: 0
     tx_queue_0_packets: 6741
     tx_queue_0_bytes: 389788
     tx_queue_1_packets: 6
     tx_queue_1_bytes: 468
     tx_queue_2_packets: 0
     tx_queue_2_bytes: 0
     tx_queue_3_packets: 0
     tx_queue_3_bytes: 0
     tx_queue_4_packets: 0
     tx_queue_4_bytes: 0
     tx_queue_5_packets: 0
     tx_queue_5_bytes: 0
     tx_queue_6_packets: 7
     tx_queue_6_bytes: 294
     tx_queue_7_packets: 2376
     tx_queue_7_bytes: 100754
     tx_queue_8_packets: 82
     tx_queue_8_bytes: 3444
     tx_queue_9_packets: 0
     tx_queue_9_bytes: 0
     tx_queue_10_packets: 5
     tx_queue_10_bytes: 210
     tx_queue_11_packets: 0
     tx_queue_11_bytes: 0
     tx_queue_12_packets: 552
     tx_queue_12_bytes: 40326
     tx_queue_13_packets: 2
     tx_queue_13_bytes: 132
     tx_queue_14_packets: 0
     tx_queue_14_bytes: 0
     tx_queue_15_packets: 0
     tx_queue_15_bytes: 0
     tx_queue_16_packets: 0
     tx_queue_16_bytes: 0
     tx_queue_17_packets: 0
     tx_queue_17_bytes: 0
     tx_queue_18_packets: 6
     tx_queue_18_bytes: 468
     tx_queue_19_packets: 26
     tx_queue_19_bytes: 1264
     tx_queue_20_packets: 9
     tx_queue_20_bytes: 766
     tx_queue_21_packets: 0
     tx_queue_21_bytes: 0
     tx_queue_22_packets: 5
     tx_queue_22_bytes: 378
     tx_queue_23_packets: 4
     tx_queue_23_bytes: 288
     rx_queue_0_packets: 58195
     rx_queue_0_bytes: 3508411
     rx_queue_1_packets: 425
     rx_queue_1_bytes: 46158
     rx_queue_2_packets: 391
     rx_queue_2_bytes: 33464
     rx_queue_3_packets: 522
     rx_queue_3_bytes: 50152
     rx_queue_4_packets: 843
     rx_queue_4_bytes: 233933
     rx_queue_5_packets: 711
     rx_queue_5_bytes: 68119
     rx_queue_6_packets: 498
     rx_queue_6_bytes: 48164
     rx_queue_7_packets: 593
     rx_queue_7_bytes: 71306
     rx_queue_8_packets: 474
     rx_queue_8_bytes: 57960
     rx_queue_9_packets: 626
     rx_queue_9_bytes: 48984
     rx_queue_10_packets: 410
     rx_queue_10_bytes: 39893
     rx_queue_11_packets: 488
     rx_queue_11_bytes: 48700
     rx_queue_12_packets: 578
     rx_queue_12_bytes: 62312
     rx_queue_13_packets: 821
     rx_queue_13_bytes: 229420
     rx_queue_14_packets: 475
     rx_queue_14_bytes: 39462
     rx_queue_15_packets: 623
     rx_queue_15_bytes: 69994
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0



[root@s01b01 64]# netstat -stw
Transfering required files
Starting netstat
Ip:
    219671 total packets received
    80 forwarded
    0 incoming packets discarded
    216076 incoming packets delivered
    216459 requests sent out
    57 dropped because of missing route
Icmp:
    14 ICMP messages received
    5 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        InType9: 5
        OutType8: 9
Tcp:
    8898 active connections openings
    8061 passive connection openings
    5 failed connection attempts
    2161 connection resets received
    42 connections established
    213797 segments received
    214397 segments send out
    5 segments retransmited
    0 bad segments received.
    1782 resets sent
UdpLite:
TcpExt:
    1190 TCP sockets finished time wait in fast timer
    1 time wait sockets recycled by time stamp
    5349 TCP sockets finished time wait in slow timer
    5020 delayed acks sent
    1 delayed acks further delayed because of locked socket
    11576 packets directly queued to recvmsg prequeue.
    2049 packets directly received from backlog
    10730892 packets directly received from prequeue
    72744 packets header predicted
    1150 packets header predicted and directly queued to user
    45201 acknowledgments not containing data received
    63776 predicted acknowledgments
    1 congestion windows recovered after partial ack
    0 TCP data loss events
    1 retransmits in slow start
    2 other TCP timeouts
    1 connections reset due to unexpected data
    1474 connections reset due to early user close
IpExt:
    InNoRoutes: 4
    InMcastPkts: 25
    InBcastPkts: 219
    InOctets: 168176227
    OutOctets: 157252203
    InMcastOctets: 740
    InBcastOctets: 31942


TEST #1: DON'T CHANGE RING PARAMS. JUST REPORT THE DEFAULT BEHAVIOR.

STATE AFTER TEST #1: Note the bonding flaps (the eth0 Link Failure Count went from 0 to 5). Note the tx_restart_queue jump from 0 to ~800. (The good part about this kernel is that the rx_missed_errors count is still 0; with the older RHEL 6 kernel even that would go up. So that erratum patch is doing some good.)

[root@s01b01 64]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 5
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85


bonding: bond1: link status definitely down for interface eth0, disabling it
bonding: bond1: now running without any active interface !
bonding: bond1: link status definitely up for interface eth0.
bonding: bond1: making interface eth0 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: link status definitely down for interface eth0, disabling it
bonding: bond1: now running without any active interface !
bonding: bond1: link status definitely up for interface eth0.
bonding: bond1: making interface eth0 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: link status definitely down for interface eth0, disabling it
bonding: bond1: now running without any active interface !
bonding: bond1: link status definitely up for interface eth0.
bonding: bond1: making interface eth0 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: link status definitely down for interface eth0, disabling it
bonding: bond1: now running without any active interface !
bonding: bond1: link status definitely up for interface eth0.
bonding: bond1: making interface eth0 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: link status definitely down for interface eth0, disabling it
bonding: bond1: now running without any active interface !
bonding: bond1: link status definitely up for interface eth0.
bonding: bond1: making interface eth0 the new active one.
bonding: bond1: first active interface up!
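
The flap count can also be cross-checked against the kernel log rather than read off /proc/net/bonding. The sketch below counts "link status definitely down/up" messages per interface; the regular expression is written against the message format shown above and is an assumption about other bonding driver versions.

```python
import re
from collections import Counter

# One down/up cycle copied from the log above; the log shows five identical cycles.
CYCLE = (
    "bonding: bond1: link status definitely down for interface eth0, disabling it\n"
    "bonding: bond1: now running without any active interface !\n"
    "bonding: bond1: link status definitely up for interface eth0.\n"
    "bonding: bond1: making interface eth0 the new active one.\n"
    "bonding: bond1: first active interface up!\n"
)
LOG = CYCLE * 5

EVENT = re.compile(
    r"bonding: (\S+): link status definitely (up|down) for interface (\w+)"
)


def count_link_events(log_text):
    """Count up/down transitions keyed by (bond, interface, state)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EVENT.search(line)
        if m:
            bond, state, iface = m.groups()
            counts[(bond, iface, state)] += 1
    return counts


events = count_link_events(LOG)
print("eth0 down events:", events[("bond1", "eth0", "down")])
print("eth0 up events:", events[("bond1", "eth0", "up")])
```

The number of down events for eth0 should agree with the Link Failure Count of 5 reported for that slave above.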


[root@s01b01 64]# netstat -stw
Ip:
    30364827 total packets received
    30022305 forwarded
    0 incoming packets discarded
    336887 incoming packets delivered
    30360861 requests sent out
    57 dropped because of missing route
Icmp:
    17 ICMP messages received
    8 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        InType9: 8
        OutType8: 9
Tcp:
    14064 active connections openings
    12755 passive connection openings
    12 failed connection attempts
    3453 connection resets received
    46 connections established
    333306 segments received
    335400 segments send out
    5 segments retransmited
    0 bad segments received.
    2765 resets sent
UdpLite:
TcpExt:
    6 resets received for embryonic SYN_RECV sockets
    1894 TCP sockets finished time wait in fast timer
    1 time wait sockets recycled by time stamp
    8476 TCP sockets finished time wait in slow timer
    7958 delayed acks sent
    1 delayed acks further delayed because of locked socket
    18166 packets directly queued to recvmsg prequeue.
    3074 packets directly received from backlog
    16964130 packets directly received from prequeue
    109774 packets header predicted
    1758 packets header predicted and directly queued to user
    72152 acknowledgments not containing data received
    96496 predicted acknowledgments
    1 congestion windows recovered after partial ack
    0 TCP data loss events
    1 retransmits in slow start
    2 other TCP timeouts
    1 connections reset due to unexpected data
    2336 connections reset due to early user close
IpExt:
    InNoRoutes: 4
    InMcastPkts: 40
    InBcastPkts: 333
    InOctets: 24561359330
    OutOctets: 24546546883
    InMcastOctets: 1184
    InBcastOctets: 47558



[root@s01b01 64]# ethtool -S eth0
NIC statistics:
     rx_packets: 30149350
     tx_packets: 29983651
     rx_bytes: 24727261216
     tx_bytes: 24713790494
     rx_pkts_nic: 30149344
     tx_pkts_nic: 29983651
     rx_bytes_nic: 24968336964
     tx_bytes_nic: 24953744210
     lsc_int: 1
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 624
     broadcast: 93577
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 1716
     fdir_miss: 27874297
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 808
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     fcoe_bad_fccrc: 0
     rx_fcoe_dropped: 0
     rx_fcoe_packets: 0
     rx_fcoe_dwords: 0
     tx_fcoe_packets: 0
     tx_fcoe_dwords: 0
     tx_queue_0_packets: 29968460
     tx_queue_0_bytes: 24712756332
     tx_queue_1_packets: 776
     tx_queue_1_bytes: 60472
     tx_queue_2_packets: 884
     tx_queue_2_bytes: 68732
     tx_queue_3_packets: 375
     tx_queue_3_bytes: 29112
     tx_queue_4_packets: 0
     tx_queue_4_bytes: 0
     tx_queue_5_packets: 3
     tx_queue_5_bytes: 126
     tx_queue_6_packets: 1456
     tx_queue_6_bytes: 113028
     tx_queue_7_packets: 4198
     tx_queue_7_bytes: 189158
     tx_queue_8_packets: 1292
     tx_queue_8_bytes: 97788
     tx_queue_9_packets: 374
     tx_queue_9_bytes: 28956
     tx_queue_10_packets: 574
     tx_queue_10_bytes: 44448
     tx_queue_11_packets: 1032
     tx_queue_11_bytes: 80388
     tx_queue_12_packets: 1963
     tx_queue_12_bytes: 146842
     tx_queue_13_packets: 96
     tx_queue_13_bytes: 7356
     tx_queue_14_packets: 1
     tx_queue_14_bytes: 42
     tx_queue_15_packets: 0
     tx_queue_15_bytes: 0
     tx_queue_16_packets: 0
     tx_queue_16_bytes: 0
     tx_queue_17_packets: 0
     tx_queue_17_bytes: 0
     tx_queue_18_packets: 324
     tx_queue_18_bytes: 25272
     tx_queue_19_packets: 253
     tx_queue_19_bytes: 18394
     tx_queue_20_packets: 1581
     tx_queue_20_bytes: 123382
     tx_queue_21_packets: 0
     tx_queue_21_bytes: 0
     tx_queue_22_packets: 5
     tx_queue_22_bytes: 378
     tx_queue_23_packets: 4
     tx_queue_23_bytes: 288
     rx_queue_0_packets: 1981926
     rx_queue_0_bytes: 1562859674
     rx_queue_1_packets: 1930232
     rx_queue_1_bytes: 1555175234
     rx_queue_2_packets: 1856354
     rx_queue_2_bytes: 1516051655
     rx_queue_3_packets: 1894827
     rx_queue_3_bytes: 1577806109
     rx_queue_4_packets: 1879466
     rx_queue_4_bytes: 1542961037
     rx_queue_5_packets: 1874220
     rx_queue_5_bytes: 1550872091
     rx_queue_6_packets: 1932878
     rx_queue_6_bytes: 1561825168
     rx_queue_7_packets: 1842200
     rx_queue_7_bytes: 1491242498
     rx_queue_8_packets: 1861876
     rx_queue_8_bytes: 1586886435
     rx_queue_9_packets: 1898826
     rx_queue_9_bytes: 1573190999
     rx_queue_10_packets: 1919328
     rx_queue_10_bytes: 1590789345
     rx_queue_11_packets: 1862172
     rx_queue_11_bytes: 1534656155
     rx_queue_12_packets: 1864577
     rx_queue_12_bytes: 1553796060
     rx_queue_13_packets: 1764945
     rx_queue_13_bytes: 1411355045
     rx_queue_14_packets: 1916482
     rx_queue_14_bytes: 1605562773
     rx_queue_15_packets: 1869041
     rx_queue_15_bytes: 1512230938
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0
[root@s01b01 64]# 





TEST #2: SAME TEST. SAME DURATION. SAME TRAFFIC LOAD/PROFILE. BUT WITH INCREASED TX RING.

NOTE THAT IT IS THE SAME BOOT SESSION, SO READ THE STATISTICS AS DELTAS FROM TEST #1, NOT AS ABSOLUTE VALUES.

[root@s01b01 64]# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096

AFTER TEST #2: Note that the eth0 link failure count jumped from 5 to 17. However, tx_restart_queue did not increase as drastically; it only went from ~800 to ~850. So there is some improvement here.
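
Because both tests ran in the same boot session, the counters are cumulative and the per-test figures have to be derived by subtraction. A minimal sketch of that arithmetic follows, using values copied from the outputs in this comment (eth0 Link Failure Count, tx_restart_queue, and the "forwarded" line of netstat); the dictionary keys are hypothetical labels.

```python
# Cumulative counter snapshots taken from the outputs in this comment.
initial = {"link_failures_eth0": 0, "tx_restart_queue": 0, "ip_forwarded": 80}
after_test1 = {"link_failures_eth0": 5, "tx_restart_queue": 808, "ip_forwarded": 30022305}
after_test2 = {"link_failures_eth0": 17, "tx_restart_queue": 852, "ip_forwarded": 59789772}


def delta(before, after):
    """Per-counter increase between two cumulative snapshots."""
    return {key: after[key] - before[key] for key in after}


test1 = delta(initial, after_test1)      # default ring sizes
test2 = delta(after_test1, after_test2)  # ring sizes raised to 4096

for name, d in (("test 1", test1), ("test 2", test2)):
    # Normalise by forwarded packets so the two runs are comparable.
    per_million = 1e6 * d["tx_restart_queue"] / d["ip_forwarded"]
    print(
        f"{name}: flaps +{d['link_failures_eth0']}, "
        f"tx_restart_queue +{d['tx_restart_queue']} "
        f"({per_million:.2f} per million forwarded packets)"
    )
```

Read this way, test 2 added 12 flaps and 44 queue restarts over a forwarded packet count close to that of test 1, which matches the observation that the larger ring helped tx_restart_queue but not the flapping.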


[root@s01b01 proc]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 17
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85




[root@s01b01 proc]# ethtool -S eth0
NIC statistics:
     rx_packets: 59987069
     tx_packets: 59718329
     rx_bytes: 49150597813
     tx_bytes: 49126940415
     rx_pkts_nic: 59987044
     tx_pkts_nic: 59718322
     rx_bytes_nic: 49630344193
     tx_bytes_nic: 49604794231
     lsc_int: 2
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 779
     broadcast: 116995
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 2168
     fdir_miss: 55604863
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 852
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     fcoe_bad_fccrc: 0
     rx_fcoe_dropped: 0
     rx_fcoe_packets: 0
     rx_fcoe_dwords: 0
     tx_fcoe_packets: 0
     tx_fcoe_dwords: 0
     tx_queue_0_packets: 59701537
     tx_queue_0_bytes: 49125823277
     tx_queue_1_packets: 1441
     tx_queue_1_bytes: 88402
     tx_queue_2_packets: 888
     tx_queue_2_bytes: 68900
     tx_queue_3_packets: 379
     tx_queue_3_bytes: 29312
     tx_queue_4_packets: 3
     tx_queue_4_bytes: 126
     tx_queue_5_packets: 5
     tx_queue_5_bytes: 210
     tx_queue_6_packets: 1457
     tx_queue_6_bytes: 113070
     tx_queue_7_packets: 4589
     tx_queue_7_bytes: 205580
     tx_queue_8_packets: 1297
     tx_queue_8_bytes: 97998
     tx_queue_9_packets: 378
     tx_queue_9_bytes: 29124
     tx_queue_10_packets: 577
     tx_queue_10_bytes: 44574
     tx_queue_11_packets: 1033
     tx_queue_11_bytes: 80430
     tx_queue_12_packets: 2469
     tx_queue_12_bytes: 183780
     tx_queue_13_packets: 99
     tx_queue_13_bytes: 7498
     tx_queue_14_packets: 1
     tx_queue_14_bytes: 42
     tx_queue_15_packets: 0
     tx_queue_15_bytes: 0
     tx_queue_16_packets: 0
     tx_queue_16_bytes: 0
     tx_queue_17_packets: 1
     tx_queue_17_bytes: 42
     tx_queue_18_packets: 324
     tx_queue_18_bytes: 25272
     tx_queue_19_packets: 260
     tx_queue_19_bytes: 18688
     tx_queue_20_packets: 1581
     tx_queue_20_bytes: 123382
     tx_queue_21_packets: 0
     tx_queue_21_bytes: 0
     tx_queue_22_packets: 5
     tx_queue_22_bytes: 378
     tx_queue_23_packets: 5
     tx_queue_23_bytes: 330
     rx_queue_0_packets: 3863715
     rx_queue_0_bytes: 3119701745
     rx_queue_1_packets: 3802825
     rx_queue_1_bytes: 3130994162
     rx_queue_2_packets: 3736701
     rx_queue_2_bytes: 3053940114
     rx_queue_3_packets: 3733773
     rx_queue_3_bytes: 3100673697
     rx_queue_4_packets: 3787255
     rx_queue_4_bytes: 3130782369
     rx_queue_5_packets: 3728692
     rx_queue_5_bytes: 3036426313
     rx_queue_6_packets: 3801584
     rx_queue_6_bytes: 3084845481
     rx_queue_7_packets: 3705539
     rx_queue_7_bytes: 2974999570
     rx_queue_8_packets: 3703772
     rx_queue_8_bytes: 3100380228
     rx_queue_9_packets: 3759349
     rx_queue_9_bytes: 3083157525
     rx_queue_10_packets: 3708914
     rx_queue_10_bytes: 3026970175
     rx_queue_11_packets: 3696920
     rx_queue_11_bytes: 3008120973
     rx_queue_12_packets: 3754849
     rx_queue_12_bytes: 3111984189
     rx_queue_13_packets: 3673464
     rx_queue_13_bytes: 3009417912
     rx_queue_14_packets: 3824254
     rx_queue_14_bytes: 3189822779
     rx_queue_15_packets: 3705463
     rx_queue_15_bytes: 2988380581
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0
[root@s01b01 proc]# 



[root@s01b01 proc]# netstat -stw
Ip:
    60231993 total packets received
    59789772 forwarded
    0 incoming packets discarded
    434951 incoming packets delivered
    60227237 requests sent out
    57 dropped because of missing route
Icmp:
    19 ICMP messages received
    10 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        InType9: 10
        OutType8: 9
Tcp:
    18114 active connections openings
    16428 passive connection openings
    26 failed connection attempts
    4459 connection resets received
    44 connections established
    430346 segments received
    433395 segments send out
    5 segments retransmited
    0 bad segments received.
    3540 resets sent
UdpLite:
TcpExt:
    18 resets received for embryonic SYN_RECV sockets
    2437 TCP sockets finished time wait in fast timer
    10 time wait sockets recycled by time stamp
    10929 TCP sockets finished time wait in slow timer
    10146 delayed acks sent
    1 delayed acks further delayed because of locked socket
    23289 packets directly queued to recvmsg prequeue.
    6149 packets directly received from backlog
    21862504 packets directly received from prequeue
    141715 packets header predicted
    2226 packets header predicted and directly queued to user
    93443 acknowledgments not containing data received
    125015 predicted acknowledgments
    1 congestion windows recovered after partial ack
    0 TCP data loss events
    1 retransmits in slow start
    2 other TCP timeouts
    1 connections reset due to unexpected data
    3014 connections reset due to early user close
IpExt:
    InNoRoutes: 4
    InMcastPkts: 51
    InBcastPkts: 432
    InOctets: 48636231243
    OutOctets: 48618311121
    InMcastOctets: 1508
    InBcastOctets: 60938

Comment 8 RHEL Program Management 2011-04-14 06:01:02 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Andy Gospodarek 2011-04-14 20:30:31 UTC
A few questions:

Can you send me the contents of all the files and
directories in /proc/irq after one of these tests has been run?
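
One way to gather this is sketched below, assuming a POSIX shell on the test host. The helper name dump_tree is made up for illustration and is not an existing tool. It prints every regular file under a directory (by default /proc/irq), each preceded by a marker line with its path, so the result can be attached as a single text file.

```shell
dump_tree() {
    # $1: directory to dump; falls back to /proc/irq when omitted.
    find "${1:-/proc/irq}" -type f | sort | while IFS= read -r f; do
        printf '== %s\n' "$f"
        # Some files may be unreadable; skip their errors quietly.
        cat "$f" 2>/dev/null
    done
}
```

For example, `dump_tree /proc/irq > irq-after.txt` after a test run; the output file name is arbitrary.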

I was also tracing through things and my concerns about bonding only
using a single queue are probably not valid.  The bonding code in 6.0
should just act as a pass-through as far as output queues are concerned
rather than dumping everything in queue 0 as I initially thought.  I
still find it interesting that queue 0 has the most transmit traffic,
though.

I also wanted to check on one more thing.  Am I correct to see that you
are actually using only 1 physical interface (eth0 in this case) in bond1 to route between two VLANs that are actually associated with bond0?  The proverbial 'router on a stick' [0], or in this case 'proxy on a stick'?

I was actually forwarding between the two physical interfaces for my testing before, but it looks like that might not have been what you were testing.

0. http://en.wikipedia.org/wiki/One-armed_router

Comment 11 Ashwani Wason 2011-04-14 20:53:27 UTC
(In reply to comment #9)
> A few questions:
> 
> Can you send me the contents of all the files and
> directories in /proc/irq after one of these tests has been run?
> 

Running a test right now. I will attach the information in the next 1/2 hour or so.

> I was also tracing through things and my concerns about bonding only
> using a single queue are probably not valid.  The bonding code in 6.0
> should just act as a pass-through as far as output queues are concerned
> rather than dumping everything in queue 0 as I initially thought.  I
> still find it interesting that queue 0 has the most transmit traffic,
> though.
> 
> I also wanted to check on one more thing.  Am I correct to see that you
> are actually using only physical 1 interface (eth0 in this case) in bond1 to
> route between two VLANs that are actually associated with bond0?  The
> proverbial 'router on a stick' [0], or in this case 'proxy on a stick?'
> 

Yes, that is right. It is actually a router on a stick right now because I am simply forwarding the packets from one VLAN to another, but ultimately it would be a proxy on a stick.

Comment 12 Ashwani Wason 2011-04-14 21:28:01 UTC
Before and after data from a new test. Will also attach before and after IRQ information.

BEFORE:


[root@s01b01 ~]# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096




[root@s01b01 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85



[root@s01b01 ~]# ethtool -S eth0
NIC statistics:
     rx_packets: 4788
     tx_packets: 467
     rx_bytes: 327480
     tx_bytes: 27194
     rx_pkts_nic: 4754
     tx_pkts_nic: 460
     rx_bytes_nic: 359552
     tx_bytes_nic: 32798
     lsc_int: 2
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 17
     broadcast: 4378
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 47
     fdir_miss: 490
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     fcoe_bad_fccrc: 0
     rx_fcoe_dropped: 0
     rx_fcoe_packets: 0
     rx_fcoe_dwords: 0
     tx_fcoe_packets: 0
     tx_fcoe_dwords: 0
     tx_queue_0_packets: 308
     tx_queue_0_bytes: 17959
     tx_queue_1_packets: 8
     tx_queue_1_bytes: 617
     tx_queue_2_packets: 0
     tx_queue_2_bytes: 0
     tx_queue_3_packets: 0
     tx_queue_3_bytes: 0
     tx_queue_4_packets: 0
     tx_queue_4_bytes: 0
     tx_queue_5_packets: 0
     tx_queue_5_bytes: 0
     tx_queue_6_packets: 0
     tx_queue_6_bytes: 0
     tx_queue_7_packets: 115
     tx_queue_7_bytes: 5730
     tx_queue_8_packets: 0
     tx_queue_8_bytes: 0
     tx_queue_9_packets: 0
     tx_queue_9_bytes: 0
     tx_queue_10_packets: 0
     tx_queue_10_bytes: 0
     tx_queue_11_packets: 0
     tx_queue_11_bytes: 0
     tx_queue_12_packets: 6
     tx_queue_12_bytes: 468
     tx_queue_13_packets: 1
     tx_queue_13_bytes: 90
     tx_queue_14_packets: 14
     tx_queue_14_bytes: 1144
     tx_queue_15_packets: 6
     tx_queue_15_bytes: 468
     tx_queue_16_packets: 0
     tx_queue_16_bytes: 0
     tx_queue_17_packets: 0
     tx_queue_17_bytes: 0
     tx_queue_18_packets: 0
     tx_queue_18_bytes: 0
     tx_queue_19_packets: 4
     tx_queue_19_bytes: 340
     tx_queue_20_packets: 0
     tx_queue_20_bytes: 0
     tx_queue_21_packets: 0
     tx_queue_21_bytes: 0
     tx_queue_22_packets: 5
     tx_queue_22_bytes: 378
     tx_queue_23_packets: 0
     tx_queue_23_bytes: 0
     rx_queue_0_packets: 4301
     rx_queue_0_bytes: 258536
     rx_queue_1_packets: 17
     rx_queue_1_bytes: 2024
     rx_queue_2_packets: 28
     rx_queue_2_bytes: 4141
     rx_queue_3_packets: 28
     rx_queue_3_bytes: 2529
     rx_queue_4_packets: 37
     rx_queue_4_bytes: 11062
     rx_queue_5_packets: 96
     rx_queue_5_bytes: 9102
     rx_queue_6_packets: 28
     rx_queue_6_bytes: 2914
     rx_queue_7_packets: 30
     rx_queue_7_bytes: 6332
     rx_queue_8_packets: 20
     rx_queue_8_bytes: 2589
     rx_queue_9_packets: 25
     rx_queue_9_bytes: 1584
     rx_queue_10_packets: 22
     rx_queue_10_bytes: 1428
     rx_queue_11_packets: 26
     rx_queue_11_bytes: 4263
     rx_queue_12_packets: 38
     rx_queue_12_bytes: 5414
     rx_queue_13_packets: 35
     rx_queue_13_bytes: 9008
     rx_queue_14_packets: 23
     rx_queue_14_bytes: 2430
     rx_queue_15_packets: 34
     rx_queue_15_bytes: 4124
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0
[root@s01b01 ~]# 



AFTER:


[root@s01b01 logs]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 32
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85
[root@s01b01 logs]# ethtool -S eth0
NIC statistics:
     rx_packets: 100543278
     tx_packets: 100189125
     rx_bytes: 82262630023
     tx_bytes: 82231094517
     rx_pkts_nic: 100543244
     tx_pkts_nic: 100189118
     rx_bytes_nic: 83066897111
     tx_bytes_nic: 83032667121
     lsc_int: 2
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 366
     broadcast: 83959
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 1246
     fdir_miss: 92810609
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 167
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     fcoe_bad_fccrc: 0
     rx_fcoe_dropped: 0
     rx_fcoe_packets: 0
     rx_fcoe_dwords: 0
     tx_fcoe_packets: 0
     tx_fcoe_dwords: 0
     tx_queue_0_packets: 100174989
     tx_queue_0_bytes: 82230096217
     tx_queue_1_packets: 355
     tx_queue_1_bytes: 27458
     tx_queue_2_packets: 1304
     tx_queue_2_bytes: 101456
     tx_queue_3_packets: 1362
     tx_queue_3_bytes: 106046
     tx_queue_4_packets: 982
     tx_queue_4_bytes: 76496
     tx_queue_5_packets: 4
     tx_queue_5_bytes: 184
     tx_queue_6_packets: 1213
     tx_queue_6_bytes: 94034
     tx_queue_7_packets: 1309
     tx_queue_7_bytes: 71514
     tx_queue_8_packets: 11
     tx_queue_8_bytes: 462
     tx_queue_9_packets: 1699
     tx_queue_9_bytes: 126470
     tx_queue_10_packets: 1438
     tx_queue_10_bytes: 86168
     tx_queue_11_packets: 965
     tx_queue_11_bytes: 40530
     tx_queue_12_packets: 1120
     tx_queue_12_bytes: 83614
     tx_queue_13_packets: 461
     tx_queue_13_bytes: 35658
     tx_queue_14_packets: 43
     tx_queue_14_bytes: 3370
     tx_queue_15_packets: 143
     tx_queue_15_bytes: 11118
     tx_queue_16_packets: 194
     tx_queue_16_bytes: 15088
     tx_queue_17_packets: 1
     tx_queue_17_bytes: 42
     tx_queue_18_packets: 0
     tx_queue_18_bytes: 0
     tx_queue_19_packets: 1442
     tx_queue_19_bytes: 111712
     tx_queue_20_packets: 55
     tx_queue_20_bytes: 4254
     tx_queue_21_packets: 29
     tx_queue_21_bytes: 2206
     tx_queue_22_packets: 5
     tx_queue_22_bytes: 378
     tx_queue_23_packets: 1
     tx_queue_23_bytes: 42
     rx_queue_0_packets: 6350695
     rx_queue_0_bytes: 5128989255
     rx_queue_1_packets: 6284315
     rx_queue_1_bytes: 5188983212
     rx_queue_2_packets: 6195507
     rx_queue_2_bytes: 5020757521
     rx_queue_3_packets: 6343558
     rx_queue_3_bytes: 5229906625
     rx_queue_4_packets: 6291467
     rx_queue_4_bytes: 5174275132
     rx_queue_5_packets: 6323478
     rx_queue_5_bytes: 5200131534
     rx_queue_6_packets: 6203879
     rx_queue_6_bytes: 4974353475
     rx_queue_7_packets: 6236079
     rx_queue_7_bytes: 4987702452
     rx_queue_8_packets: 6280823
     rx_queue_8_bytes: 5160680828
     rx_queue_9_packets: 6333555
     rx_queue_9_bytes: 5238911576
     rx_queue_10_packets: 6256809
     rx_queue_10_bytes: 5152094483
     rx_queue_11_packets: 6368079
     rx_queue_11_bytes: 5241391039
     rx_queue_12_packets: 6382258
     rx_queue_12_bytes: 5238224014
     rx_queue_13_packets: 6352196
     rx_queue_13_bytes: 5261181684
     rx_queue_14_packets: 6188024
     rx_queue_14_bytes: 5019302130
     rx_queue_15_packets: 6152556
     rx_queue_15_bytes: 5045745063
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0
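
To compare the two snapshots above without reading them line by line, a small helper along these lines could print only the counters that changed. This is a sketch, not part of the original report: the function name ethtool_delta is hypothetical, and it assumes each `ethtool -S` output was saved to its own file.

```shell
ethtool_delta() {
    # $1: file holding the earlier "ethtool -S" output, $2: the later one.
    awk '
        # Counter lines look like "     name: value"; skip everything else.
        NF != 2 || $1 !~ /:$/ { next }
        { key = $1; sub(/:$/, "", key) }
        # First file: remember the starting value of each counter.
        NR == FNR { before[key] = $2; next }
        # Second file: print the counter name and how much it moved.
        (key in before) && $2 != before[key] {
            printf "%s %.0f\n", key, $2 - before[key]
        }
    ' "$1" "$2"
}
```

Run against the BEFORE and AFTER data above, this would list tx_restart_queue with a delta of 167, while rx_missed_errors (0 in both snapshots) would not be printed at all.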

Comment 13 Ashwani Wason 2011-04-14 21:28:32 UTC
Created attachment 492236 [details]
IRQ data before and after the test

Comment 14 Andy Gospodarek 2011-04-15 05:16:03 UTC
Thanks, Ashwani!

Comment 15 Andy Gospodarek 2011-04-18 16:55:39 UTC
I've been testing this a bit more and though I only have the equipment to push 1Gbps and only a single port in the bond, I've noticed a few interesting things.

Every 2.0s: ethtool -S eth0 | grep -v :\ 0$ ; cat /proc/net/bonding/bond0
                                                                                                                                                            
NIC statistics:
     rx_packets: 177133586
     tx_packets: 177179413
     rx_bytes: 240473694525
     tx_bytes: 240536435099
     rx_pkts_nic: 177180001
     tx_pkts_nic: 177180540
     rx_bytes_nic: 241952509937
     tx_bytes_nic: 241953968727
     lsc_int: 1
     multicast: 132
     broadcast: 84
     fdir_match: 364
     fdir_miss: 177269158
     tx_queue_0_packets: 809076
     tx_queue_0_bytes: 1113433485
     tx_queue_1_packets: 14
     tx_queue_1_bytes: 3268
     tx_queue_2_packets: 90971798
     tx_queue_2_bytes: 114887921986
     tx_queue_3_packets: 85399649
     tx_queue_3_bytes: 124535137512
     rx_queue_0_packets: 69209367
     rx_queue_0_bytes: 95141467338
     rx_queue_1_packets: 79082688
     rx_queue_1_bytes: 111646968545
     rx_queue_2_packets: 20428586
     rx_queue_2_bytes: 30781611215
     rx_queue_3_packets: 8459355
     rx_queue_3_bytes: 2966371273
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.2.200

Slave Interface: eth0
MII Status: up
Link Failure Count: 3
Permanent HW addr: 00:1b:21:37:b7:20

I have seen bonding link failures.  I was able to run tcpdump on the host that is the ARP IP target and when I mapped the timestamps, the failure occurred at 12:16:03...

12:16:01.668179 ARP, Request who-has zotac tell 10.0.2.201, length 46
12:16:01.668204 ARP, Reply zotac is-at 00:01:2e:bc:11:5a (oui Unknown), length 28
12:16:02.668403 ARP, Request who-has zotac tell 10.0.2.201, length 46
12:16:02.668427 ARP, Reply zotac is-at 00:01:2e:bc:11:5a (oui Unknown), length 28
12:16:03.681330 ARP, Request who-has zotac tell 10.0.2.201, length 46
12:16:03.681352 ARP, Reply zotac is-at 00:01:2e:bc:11:5a (oui Unknown), length 28
12:16:04.705850 ARP, Reply 10.0.2.201 is-at 00:1b:21:37:b7:20 (oui Unknown), length 46
12:16:04.709915 ARP, Request who-has zotac tell 10.0.2.201, length 46
12:16:04.709930 ARP, Reply zotac is-at 00:01:2e:bc:11:5a (oui Unknown), length 28
12:16:05.709770 ARP, Request who-has zotac tell 10.0.2.201, length 46
12:16:05.709795 ARP, Reply zotac is-at 00:01:2e:bc:11:5a (oui Unknown), length 28

I noticed there was essentially no delay in sending the ARP response on the wire, so a delay in processing the response on the bonding host would explain why the link dropped.
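
The "no delay on the wire" observation can be checked mechanically. The sketch below assumes tcpdump output in the format shown above, with all timestamps falling on the same day. The function name arp_reply_latency is hypothetical. It pairs each ARP request with the next reply for the same host and prints the reply timestamp followed by the request-to-reply latency in microseconds.

```shell
arp_reply_latency() {
    awk '
        # Convert the leading HH:MM:SS.uuuuuu timestamp to microseconds.
        function to_us(ts,    t) {
            split(ts, t, /[:.]/)
            return ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000000 + t[4]
        }
        $3 == "Request" && $4 == "who-has" {
            target = $5
            sent = to_us($1)
            pending = 1
            next
        }
        # Only pair a reply with the request that asked for the same host.
        $3 == "Reply" && pending && $4 == target {
            printf "%s %.0f\n", $1, to_us($1) - sent
            pending = 0
        }
    '
}
```

Fed the capture above, every request/reply pair comes out between 15 and 25 microseconds apart, and the unsolicited reply from 10.0.2.201 at 12:16:04.705850 is ignored because no request for that address is pending.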

I also want to note that when sending on 1Gbps interfaces (which in this case nets about 880Mbps of throughput) with 256 ring buffer entries I am not seeing the tx_queue_stopped messages incrementing.  This may be an indicator that the stress from my 1Gbps of traffic is not significant enough.

Comment 16 Andy Gospodarek 2011-04-18 16:57:38 UTC
(In reply to comment #15)
> 
> I also want to note that when sending on 1Gbps interfaces (which in this case
> nets about ~880Mbps of throughput) with 256 ring buffer entries I am not seeing
> the tx_queue_stopped messages incrementing.  This may be an indicator that the
> stress from my 1Gbps of traffic is not significant enough.

Sorry, the default is 512 rx and tx ring buffer entries.

Comment 17 Andy Gospodarek 2011-04-18 17:05:35 UTC
(In reply to comment #15)
> 
> Every 2.0s: ethtool -S eth0 | grep -v :\ 0$ ; cat /proc/net/bonding/bond0
> 
> NIC statistics:
>      rx_packets: 177133586
>      tx_packets: 177179413
>      rx_bytes: 240473694525
>      tx_bytes: 240536435099
>      rx_pkts_nic: 177180001
>      tx_pkts_nic: 177180540
>      rx_bytes_nic: 241952509937
>      tx_bytes_nic: 241953968727
>      lsc_int: 1
>      multicast: 132
>      broadcast: 84
>      fdir_match: 364
>      fdir_miss: 177269158
>      tx_queue_0_packets: 809076
>      tx_queue_0_bytes: 1113433485
>      tx_queue_1_packets: 14
>      tx_queue_1_bytes: 3268
>      tx_queue_2_packets: 90971798
>      tx_queue_2_bytes: 114887921986
>      tx_queue_3_packets: 85399649
>      tx_queue_3_bytes: 124535137512
>      rx_queue_0_packets: 69209367
>      rx_queue_0_bytes: 95141467338
>      rx_queue_1_packets: 79082688
>      rx_queue_1_bytes: 111646968545
>      rx_queue_2_packets: 20428586
>      rx_queue_2_bytes: 30781611215
>      rx_queue_3_packets: 8459355
>      rx_queue_3_bytes: 2966371273
> Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
> 

I also wanted to note that I'm seeing a much more even distribution of traffic across transmit queues than Bytemobile is seeing.  Their tests indicate that over 90% of the traffic is being sent by tx queue 0, whereas mine is much more distributed.  I do not know if that is a function of the traffic in the test being run, but I find that interesting.
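
The per-queue split can be put into numbers with a short filter over `ethtool -S` output. This is only a sketch: the function name tx_queue_share is hypothetical, and it assumes the tx_queue_N_packets counter naming shown in the output above. It prints each transmit queue, its packet count, and its share of all queued transmit packets.

```shell
tx_queue_share() {
    awk '
        # Match lines such as "     tx_queue_2_packets: 90971798".
        $1 ~ /^tx_queue_[0-9]+_packets:$/ {
            name = $1
            sub(/_packets:$/, "", name)
            names[++n] = name
            count[n] = $2
            total += $2
        }
        END {
            for (i = 1; i <= n; i++) {
                pct = (total > 0) ? 100 * count[i] / total : 0
                printf "%s %s %.2f%%\n", names[i], count[i], pct
            }
        }
    '
}
```

Usage would be `ethtool -S eth0 | tx_queue_share`. On the quoted statistics, queues 2 and 3 each carry roughly half of the packets and queue 0 well under 1%.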

Comment 18 Andy Gospodarek 2011-04-18 18:09:28 UTC
When I move around my smp_affinity to only use CPU0 for eth0's interrupts, I'm seeing the same type of tx_queue usage seen by Bytemobile.  This is good confirmation that the driver is doing what it should be doing, but why is CPU0 the only one receiving any interrupts when smp_affinity for those irqs is ffffff?
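
For reference, an smp_affinity value is a hexadecimal CPU bitmask, so ffffff permits CPUs 0 through 23. The helper below is an illustrative sketch for decoding such a mask; the name affinity_cpus is made up. It relies on shell arithmetic, so it assumes the mask fits in a signed 64-bit integer (at most 63 CPUs).

```shell
affinity_cpus() (
    # $1: hex mask as shown in /proc/irq/<N>/smp_affinity, e.g. "ffffff".
    # Drop the commas that separate 32-bit groups on larger systems.
    hex=$(printf '%s' "$1" | tr -d ',')
    mask=$((0x$hex))
    cpu=0
    out=""
    # Walk the mask bit by bit, collecting the index of every set bit.
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    printf '%s\n' "$out"
)
```

For example, `affinity_cpus 1` prints `0` (CPU0 only), which matches the pinned configuration described in this comment.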

Comment 19 Andy Gospodarek 2011-04-18 18:35:46 UTC
When I move around my smp_affinity to only use CPU0 for eth0's interrupts, I'm seeing the same type of tx_queue usage seen by Bytemobile.  This is good confirmation that the driver is doing what it should be doing, but why is CPU0 the only one receiving any interrupts when smp_affinity for those irqs is ffffff?

Comment 20 Neil Horman 2011-04-18 19:29:28 UTC
I'm looking at the sosreport, and it appears to be missing several items (or
they were edited out).  Most notably the ixgbe and bnx2 subdirectories in
/sys/modules are missing, as is /etc/modprobe.d.  Were these left out
intentionally?

Comment 21 Andy Gospodarek 2011-04-18 20:19:17 UTC
To test 6.1 and work around the panic from the bonding module, do the following:

$ echo options bonding tx_queues=32 > /etc/modprobe.d/local.conf

This will create local.conf and the tx_queues should be passed to bonding any time it is loaded with modprobe.

You can also set this manually each time the driver is loaded with

$ modprobe bonding tx_queues=32
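
Before reloading the module it may be worth confirming that the options line actually landed in the file. The sketch below is a simplified check, not a full modprobe.d parser (it ignores line continuations and `install`/`alias` directives). The function name module_option_value is hypothetical.

```shell
module_option_value() {
    # $1: modprobe config file, $2: module name, $3: parameter name.
    # Prints the configured value; exit status is 1 if it is not set.
    awk -v mod="$2" -v param="$3" '
        $1 == "options" && $2 == mod {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == param) { print kv[2]; found = 1 }
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

With the file created above, `module_option_value /etc/modprobe.d/local.conf bonding tx_queues` should print 32.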

Comment 22 Andy Gospodarek 2011-04-18 20:34:39 UTC
Just had a failure with the Intel ixgbe driver from sourceforge:

[root@localhost src]# ethtool -i eth0
driver: ixgbe
version: 3.2.10-NAPI
firmware-version: 0.9-2
bus-info: 0000:01:00.0
[root@localhost src]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.2.200

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:37:b7:20

I will reboot and try the latest RHEL6.1 kernel.

Comment 23 Ashwani Wason 2011-04-18 20:49:01 UTC
(In reply to comment #20)
> I'm looking at the sosreport, and it appears to be missing several items (or
> they were edited out).  Most notably the ixgbe and bnx2 subdirectories in
> /sys/modules are missing, as is /etc/modprobe.d.  Were these left out
> intentionally?

The sosreport was sent as it was generated. Not sure why it did not have those directories. I am attaching a file with the contents of those directories. This was collected using 2.6.32-71.25.1.el6.bytemobile.1.x86_64.

Comment 24 Ashwani Wason 2011-04-18 20:50:00 UTC
Created attachment 493005 [details]
missing sysreport data (/etc/modprobe.d/, /sys/module/ixgbe/, and /sys/module/bnx2/)

Comment 25 Ashwani Wason 2011-04-18 23:14:16 UTC
I ran a test with the 6.1 kernel plus Andy's recommendation of 32 tx_queues to work around the kernel panic. In a test with ~900 Mbps, I saw only two flaps, which is better than the previous kernel. In a test with ~300 Mbps, I saw no flaps, which is unlike the previous kernel. Here is the "before and after" data from the ~900 Mbps test.

The restart queue count keeps going up regardless. 

[root@s01b01 ~]# uname -r
2.6.32-130.el6.x86_64

[root@s01b01 ~]# ethtool -G eth0 rx 4096 tx 4096

[root@s01b01 ~]# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096

BEFORE:

[root@s01b01 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84
Slave queue ID: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85
Slave queue ID: 0

[root@s01b01 ~]# netstat -stw
Transfering required files
Starting netstat
Ip:
    11476 total packets received
    3 forwarded
    0 incoming packets discarded
    11273 incoming packets delivered
    11330 requests sent out
    58 dropped because of missing route
Icmp:
    10 ICMP messages received
    1 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        InType9: 1
        OutType8: 9
Tcp:
    575 active connections openings
    474 passive connection openings
    5 failed connection attempts
    104 connection resets received
    44 connections established
    11124 segments received
    11254 segments send out
    5 segments retransmited
    0 bad segments received.
    130 resets sent
UdpLite:
TcpExt:
    34 TCP sockets finished time wait in fast timer
    322 TCP sockets finished time wait in slow timer
    329 delayed acks sent
    855 packets directly queued to recvmsg prequeue.
    32 packets directly received from backlog
    663827 packets directly received from prequeue
    2738 packets header predicted
    71 packets header predicted and directly queued to user
    2519 acknowledgments not containing data received
    2417 predicted acknowledgments
    0 TCP data loss events
    3 retransmits in slow start
    1 other TCP timeouts
    1 DSACKs received
    11 connections reset due to unexpected data
    72 connections reset due to early user close
    TCPDSACKIgnoredNoUndo: 1
    TCPSackShiftFallback: 2
IpExt:
    InNoRoutes: 4
    InMcastPkts: 2
    InBcastPkts: 21
    InOctets: 11993597
    OutOctets: 7822733
    InMcastOctets: 64
    InBcastOctets: 3138


[root@s01b01 ~]# ethtool -S eth0 | grep -v :\ 0$
NIC statistics:
     rx_packets: 4682
     tx_packets: 618
     rx_bytes: 319365
     tx_bytes: 34988
     rx_pkts_nic: 4603
     tx_pkts_nic: 616
     rx_bytes_nic: 345786
     tx_bytes_nic: 43058
     multicast: 26
     broadcast: 4080
     fdir_match: 45
     fdir_miss: 517
     tx_queue_0_packets: 431
     tx_queue_0_bytes: 24942
     tx_queue_1_packets: 37
     tx_queue_1_bytes: 1554
     tx_queue_7_packets: 2
     tx_queue_7_bytes: 684
     tx_queue_8_packets: 104
     tx_queue_8_bytes: 4368
     tx_queue_13_packets: 17
     tx_queue_13_bytes: 1318
     tx_queue_20_packets: 16
     tx_queue_20_bytes: 1276
     tx_queue_21_packets: 11
     tx_queue_21_bytes: 846
     rx_queue_0_packets: 4196
     rx_queue_0_bytes: 253762
     rx_queue_1_packets: 35
     rx_queue_1_bytes: 3138
     rx_queue_2_packets: 31
     rx_queue_2_bytes: 2992
     rx_queue_3_packets: 25
     rx_queue_3_bytes: 2427
     rx_queue_4_packets: 58
     rx_queue_4_bytes: 15979
     rx_queue_5_packets: 21
     rx_queue_5_bytes: 1491
     rx_queue_6_packets: 13
     rx_queue_6_bytes: 906
     rx_queue_7_packets: 26
     rx_queue_7_bytes: 1704
     rx_queue_8_packets: 20
     rx_queue_8_bytes: 1748
     rx_queue_9_packets: 64
     rx_queue_9_bytes: 7642
     rx_queue_10_packets: 21
     rx_queue_10_bytes: 2129
     rx_queue_11_packets: 31
     rx_queue_11_bytes: 2255
     rx_queue_12_packets: 22
     rx_queue_12_bytes: 1877
     rx_queue_13_packets: 46
     rx_queue_13_bytes: 13358
     rx_queue_14_packets: 36
     rx_queue_14_bytes: 2921
     rx_queue_15_packets: 37
     rx_queue_15_bytes: 5036
[root@s01b01 ~]# 



AFTER:


[root@s01b01 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 2
Permanent HW addr: 00:1b:21:63:be:84
Slave queue ID: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85
Slave queue ID: 0


[root@s01b01 ~]# netstat -stw
Ip:
    291854857 total packets received
    2 with invalid headers
    291577951 forwarded
    0 incoming packets discarded
    272014 incoming packets delivered
    291852926 requests sent out
    58 dropped because of missing route
Icmp:
    26 ICMP messages received
    17 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        InType9: 17
        OutType8: 9
Tcp:
    11405 active connections openings
    10286 passive connection openings
    64 failed connection attempts
    2775 connection resets received
    44 connections established
    268902 segments received
    272433 segments send out
    5 segments retransmited
    0 bad segments received.
    2395 resets sent
UdpLite:
TcpExt:
    58 resets received for embryonic SYN_RECV sockets
    1472 TCP sockets finished time wait in fast timer
    1 time wait sockets recycled by time stamp
    6867 TCP sockets finished time wait in slow timer
    6074 delayed acks sent
    5 delayed acks further delayed because of locked socket
    14428 packets directly queued to recvmsg prequeue.
    9360 packets directly received from backlog
    13937167 packets directly received from prequeue
    90961 packets header predicted
    1554 packets header predicted and directly queued to user
    55187 acknowledgments not containing data received
    85703 predicted acknowledgments
    0 TCP data loss events
    3 retransmits in slow start
    1 other TCP timeouts
    1 DSACKs received
    106 connections reset due to unexpected data
    1895 connections reset due to early user close
    TCPDSACKIgnoredNoUndo: 1
    TCPSackShiftFallback: 2
IpExt:
    InNoRoutes: 4
    InMcastPkts: 43
    InBcastPkts: 474
    InOctets: 232487400868
    OutOctets: 232474667581
    InMcastOctets: 1340
    InBcastOctets: 72730


[root@s01b01 ~]# ethtool -S eth0 | grep -v :\ 0$
NIC statistics:
     rx_packets: 291696416
     tx_packets: 290986626
     rx_bytes: 236363205272
     tx_bytes: 236292342470
     rx_pkts_nic: 291696337
     tx_pkts_nic: 290986624
     rx_bytes_nic: 238698154754
     tx_bytes_nic: 238620317286
     multicast: 467
     broadcast: 94209
     fdir_match: 1416
     fdir_miss: 211473538
     rx_missed_errors: 19094
     tx_restart_queue: 555
     tx_queue_0_packets: 290922206
     tx_queue_0_bytes: 236287428312
     tx_queue_1_packets: 5231
     tx_queue_1_bytes: 398422
     tx_queue_2_packets: 3530
     tx_queue_2_bytes: 273456
     tx_queue_3_packets: 421
     tx_queue_3_bytes: 31548
     tx_queue_4_packets: 2341
     tx_queue_4_bytes: 182598
     tx_queue_5_packets: 3180
     tx_queue_5_bytes: 248040
     tx_queue_6_packets: 4981
     tx_queue_6_bytes: 383886
     tx_queue_7_packets: 6243
     tx_queue_7_bytes: 443820
     tx_queue_8_packets: 4078
     tx_queue_8_bytes: 314310
     tx_queue_9_packets: 730
     tx_queue_9_bytes: 56940
     tx_queue_10_packets: 2333
     tx_queue_10_bytes: 181974
     tx_queue_11_packets: 2402
     tx_queue_11_bytes: 187336
     tx_queue_12_packets: 15299
     tx_queue_12_bytes: 1161924
     tx_queue_13_packets: 2633
     tx_queue_13_bytes: 198512
     tx_queue_14_packets: 1160
     tx_queue_14_bytes: 90434
     tx_queue_15_packets: 1
     tx_queue_15_bytes: 42
     tx_queue_17_packets: 828
     tx_queue_17_bytes: 64584
     tx_queue_18_packets: 5369
     tx_queue_18_bytes: 414284
     tx_queue_19_packets: 1695
     tx_queue_19_bytes: 128904
     tx_queue_20_packets: 689
     tx_queue_20_bytes: 53740
     tx_queue_21_packets: 710
     tx_queue_21_bytes: 55266
     tx_queue_22_packets: 196
     tx_queue_22_bytes: 15288
     tx_queue_23_packets: 370
     tx_queue_23_bytes: 28850
     rx_queue_0_packets: 18257466
     rx_queue_0_bytes: 14682650788
     rx_queue_1_packets: 18192598
     rx_queue_1_bytes: 14565117131
     rx_queue_2_packets: 18124128
     rx_queue_2_bytes: 14613656382
     rx_queue_3_packets: 18111687
     rx_queue_3_bytes: 14786470637
     rx_queue_4_packets: 18208241
     rx_queue_4_bytes: 14736447332
     rx_queue_5_packets: 18482515
     rx_queue_5_bytes: 15226056475
     rx_queue_6_packets: 18189796
     rx_queue_6_bytes: 14690088848
     rx_queue_7_packets: 18057284
     rx_queue_7_bytes: 14434175469
     rx_queue_8_packets: 18271238
     rx_queue_8_bytes: 14895254489
     rx_queue_9_packets: 18240166
     rx_queue_9_bytes: 14809410889
     rx_queue_10_packets: 18210778
     rx_queue_10_bytes: 14778378340
     rx_queue_11_packets: 18197291
     rx_queue_11_bytes: 14721739448
     rx_queue_12_packets: 18245856
     rx_queue_12_bytes: 14827948246
     rx_queue_13_packets: 18247311
     rx_queue_13_bytes: 14714933243
     rx_queue_14_packets: 18344321
     rx_queue_14_bytes: 14995975923
     rx_queue_15_packets: 18315740
     rx_queue_15_bytes: 14884901632
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]#
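
To compare the two ethtool -S dumps without eyeballing them, something like the following Python sketch could be used. It is only an illustration: the function names are hypothetical, the sample strings contain just a few lines copied from the BEFORE and AFTER output above, and a counter missing from a dump is assumed to be zero because the grep in the command filters out zero-valued lines.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip headers, prompts, and anything without a numeric value.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def counter_deltas(before, after):
    """Difference after - before for every counter seen in either dump.

    Counters absent from a dump are treated as 0 (assumption: they were
    hidden by the zero-value grep filter).
    """
    names = set(before) | set(after)
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}


# A few lines copied from the two dumps in this comment.
BEFORE = """\
NIC statistics:
     rx_packets: 4682
     tx_packets: 618
"""

AFTER = """\
NIC statistics:
     rx_packets: 291696416
     tx_packets: 290986626
     rx_missed_errors: 19094
     tx_restart_queue: 555
"""

if __name__ == "__main__":
    deltas = counter_deltas(parse_ethtool_stats(BEFORE),
                            parse_ethtool_stats(AFTER))
    for name in sorted(deltas):
        print(f"{name}: {deltas[name]}")
```

Feeding it the full dumps instead of the trimmed samples would also give per-queue deltas.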

Comment 26 Ashwani Wason 2011-04-18 23:15:24 UTC
Interrupt distribution *after* the prior test had finished.
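
A rough Python sketch for summing the per-CPU columns of /proc/interrupts for one interface's queues is below. The helper name is hypothetical, and the sample is cut down to two CPU columns and three rows taken from the 24-CPU output in this comment, so treat it as an illustration rather than a finished tool.

```python
def per_cpu_totals(interrupts_text, name_prefix):
    """Sum interrupt counts per CPU for rows whose last field
    (the device/queue name) starts with name_prefix."""
    lines = interrupts_text.splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        fields = line.split()
        # Need: IRQ label, one count per CPU, and at least a name.
        if len(fields) < len(cpus) + 2:
            continue
        if not fields[-1].startswith(name_prefix):
            continue
        counts = fields[1:1 + len(cpus)]
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


# Two-CPU excerpt; the real output has 24 CPU columns.
SAMPLE = """\
            CPU0       CPU1
  56:   17701519          0   PCI-MSI-edge      eth0-TxRx-0
  57:    7390099          0   PCI-MSI-edge      eth0-TxRx-1
  81:      80916          0   PCI-MSI-edge      eth1-TxRx-0
"""

if __name__ == "__main__":
    print(per_cpu_totals(SAMPLE, "eth0-TxRx"))
```

In the rows pasted here, every eth0-TxRx count sits in the CPU0 column.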

[root@s01b01 ~]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15      CPU16      CPU17      CPU18      CPU19      CPU20      CPU21      CPU22      CPU23      
   0:        198          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      timer
   1:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      i8042
   3:        669          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      serial
   8:          1          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
   9:        433          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
  16:         32          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
  20:         22          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb4, uhci_hcd:usb5
  21:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
  22:         47          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
  28:         42          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ioc0
  48:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  49:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  50:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  51:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  52:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  53:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  54:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  55:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
  56:   17701519          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-0
  57:    7390099          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-1
  58:    7353816          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-2
  59:    6763228          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-3
  60:    7401546          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-4
  61:    7417458          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-5
  62:    7259788          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-6
  63:    7435445          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-7
  64:    7362636          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-8
  65:    7466439          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-9
  66:    7433681          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-10
  67:    7427769          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-11
  68:    7380698          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-12
  69:    7415619          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-13
  70:    7327236          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-14
  71:    7537552          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-15
  72:       1776          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-16
  73:       2402          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-17
  74:       6268          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-18
  75:       3298          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-19
  76:       2405          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-20
  77:       2304          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-21
  78:       1924          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-22
  79:       2103          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-23
  80:        573          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0:lsc
  81:      80916          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-0
  82:       2266          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-1
  83:       2024          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-2
  84:       1956          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-3
  85:       2244          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-4
  86:       1796          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-5
  87:       2376          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-6
  88:       1853          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-7
  89:       1856          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-8
  90:       2422          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-9
  91:       2075          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-10
  92:       1949          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-11
  93:       2172          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-12
  94:       2068          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-13
  95:       2157          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-14
  96:       2157          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-15
  97:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-16
  98:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-17
  99:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-18
 100:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-19
 101:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-20
 102:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-21
 103:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-22
 104:       1770          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-23
 105:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1:lsc
 106:      13459          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-0
 107:      16190          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-1
 108:       5100          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-2
 109:       4804          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-3
 110:      53269          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-4
 111:     177743          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-5
 112:      65319          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-6
 113:       6534          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-7
 115:      12293          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-0
 116:         59          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-1
 117:         29          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-2
 118:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-3
 119:          2          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-4
 120:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-5
 121:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-6
 122:          5          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-7
 NMI:       1762         32         13          8          5          4         56         22         11          5          4          4         39         70         47         31         12         17         38         73         47         28         15         14   Non-maskable interrupts
 LOC:     999234     736787     285921     187931     116761     111476    1375217     504464     187395     136101      65584     107327    1064736     556335     569528     215451      75003      71634    1193783     605338     572661     213914      93099      91605   Local timer interrupts
 SPU:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Spurious interrupts
 PMI:       1762         32         13          8          5          4         56         22         11          5          4          4         39         70         47         31         12         17         38         73         47         28         15         14   Performance monitoring interrupts
 PND:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Performance pending work
 RES:      51846        407         70         11          9          3       2261        260        124          9          4          3       4398        408        423        539         37         14       4000        379        144        107         46         12   Rescheduling interrupts
 CAL:        593       2864       5686       6630       4572       6449        953       2306       3389       4619       8585       6050       4174       2375       1039       2233       1676       2554       2781       1626        413        956       1828       2763   Function call interrupts
 TLB:      19227       9332       6455       1319       1573       1403      17346       8396       3406       1400       2621       2520       7571      43937      19170       5839       3320       2402       8880      48275      18637       6503       3334       3600   TLB shootdowns
 TRM:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Machine check exceptions
 MCP:         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13         13   Machine check polls
 ERR:         23
 MIS:          0
[root@s01b01 ~]#

Comment 27 Ashwani Wason 2011-04-18 23:16:20 UTC
Even though I have run it many times now, I am going to run the same test as the last one using the 3.2.10 driver and RHEL6 kernel (which per my claim works perfectly) and post the data here soon.

Comment 28 Andy Gospodarek 2011-04-19 02:07:11 UTC
First, all of these tests were run with irqbalance enabled, so each queue was tied to a different CPU.  It seemed that no matter what kernel I was using, if I bound all 4 queues to the same CPU I saw link flap pretty quickly.
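For anyone who wants to reproduce the "all queues on one CPU" case, here is a minimal sketch of how the affinity masks could be generated. It assumes the usual /proc/irq/<irq>/smp_affinity interface, which takes a hexadecimal CPU bitmask, and a machine with fewer than 32 CPUs. The IRQ numbers are hypothetical placeholders, and the script only prints the commands; it does not run them. irqbalance would need to be stopped first, otherwise it may rewrite the masks.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string selecting the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_commands(irqs, cpu):
    """Build (but do not run) shell commands pinning each IRQ to one CPU."""
    mask = affinity_mask([cpu])
    return [f"echo {mask} > /proc/irq/{irq}/smp_affinity" for irq in irqs]


if __name__ == "__main__":
    # Hypothetical IRQ numbers for four queues of one NIC.
    for line in pin_commands([55, 56, 57, 58], cpu=0):
        print(line)
```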

I ran tests with RHEL6.1 and with RHEL6.0 + Intel's Sourceforge driver version 3.2.10 and saw link flap on both at roughly 875Mbps of traffic (I'm just running netperf on 1Gbps links).

I had the capability to run an upstream kernel, so I tried 2.6.39-rc2 and saw no link flap.  I compared the two bonding drivers and will experiment with at least one patch with an overnight run.  If it is successful on RHEL6.1, I will then try the same patch on RHEL6.0 and will post the patch to this bug.  I'm not sure if this patch will make a difference, so I refuse to get excited.

As for Ashwani's tests, it still seems like all of your PCI devices are only getting interrupts on CPU0 (though the table is pretty hard to read when pasted to the bug).  Any luck figuring out why that is happening?
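Since the pasted table is hard to read, a small script can sum the per-CPU counts for one interface. This is only a sketch: it assumes the standard /proc/interrupts layout (a header row of CPU names, then one row per IRQ with one count per CPU followed by the chip type and action name), and the sample text is a hypothetical excerpt trimmed to four CPUs for illustration.

```python
def per_cpu_totals(text, name_prefix):
    """Sum interrupt counts per CPU for rows whose action name starts with name_prefix."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # e.g. ['CPU0', 'CPU1', ...]
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # Expect: 'IRQ:' + one count per CPU + at least one trailing label.
        if len(fields) < len(cpus) + 2:
            continue  # short rows such as ERR: and MIS:
        counts = fields[1:1 + len(cpus)]
        if not all(c.isdigit() for c in counts):
            continue
        labels = fields[1 + len(cpus):]
        if any(label.startswith(name_prefix) for label in labels):
            for i, count in enumerate(counts):
                totals[i] += int(count)
    return dict(zip(cpus, totals))


# Hypothetical excerpt, trimmed to four CPUs for readability.
sample = """\
            CPU0       CPU1       CPU2       CPU3
  55:   30020984          0          0          0   PCI-MSI-edge      eth0-TxRx-0
  56:    9450976          0          0          0   PCI-MSI-edge      eth0-TxRx-1
  80:     242179          0          0          0   PCI-MSI-edge      eth1-TxRx-0
 ERR:         23
"""

if __name__ == "__main__":
    for cpu, total in per_cpu_totals(sample, "eth0").items():
        print(cpu, total)
```

On a live system the input would be the contents of /proc/interrupts. A single non-zero column in the result means every queue of that interface is being serviced by one CPU.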

Comment 29 Ashwani Wason 2011-04-19 02:09:25 UTC
This was a test with RHEL6.0 kernel 2.6.32-71.15.1.el6.x86_64 plus 3.2.10 ixgbe. No other modifications were done, such as increased ring buffers.

As you will see, interrupt distribution is still unfair and tx_restart_queue is still high, but there are no bonding errors.
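To compare the BEFORE and AFTER `ethtool -S` dumps in this comment without reading every line, the counters can be diffed with a short script. This is a rough sketch that assumes each counter line has the form "name: value"; the helper names are made up for illustration, and the sample strings carry only three counters copied from the dumps.

```python
def parse_counters(text):
    """Parse 'name: value' lines from ethtool -S output into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(": ")
        if sep and value.isdigit():
            counters[name] = int(value)
    return counters


def deltas(before, after):
    """Return after minus before for every counter that changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


before_text = """NIC statistics:
     rx_packets: 103893034
     tx_restart_queue: 1385
     fdir_overflow: 237
"""
after_text = """NIC statistics:
     rx_packets: 389293555
     tx_restart_queue: 2342
     fdir_overflow: 960
"""

if __name__ == "__main__":
    changed = deltas(parse_counters(before_text), parse_counters(after_text))
    for name, diff in changed.items():
        print(f"{name}: +{diff}")
```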


[root@s01b01 ~]# uname -r
2.6.32-71.15.1.el6.x86_64

[root@s01b01 ~]# ethtool -i eth0
driver: ixgbe
version: 3.2.10-NAPI
firmware-version: 1.0-3
bus-info: 0000:15:00.0

BEFORE:

[root@s01b01 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]# ethtool -S eth0 | grep -v :\ 0$
NIC statistics:
     rx_packets: 103893034
     tx_packets: 103624659
     rx_bytes: 84230698466
     tx_bytes: 84213318477
     multicast: 986
     rx_pkts_nic: 103893029
     tx_pkts_nic: 103624659
     rx_bytes_nic: 85061635810
     tx_bytes_nic: 85042464759
     broadcast: 226389
     tx_restart_queue: 1385
     fdir_match: 14584
     fdir_miss: 104319330
     fdir_overflow: 237
     tx_queue_0_packets: 103596679
     tx_queue_0_bytes: 84211401209
     tx_queue_1_packets: 1994
     tx_queue_1_bytes: 155314
     tx_queue_2_packets: 1396
     tx_queue_2_bytes: 108420
     tx_queue_3_packets: 424
     tx_queue_3_bytes: 32568
     tx_queue_4_packets: 566
     tx_queue_4_bytes: 43814
     tx_queue_5_packets: 1564
     tx_queue_5_bytes: 121644
     tx_queue_6_packets: 5204
     tx_queue_6_bytes: 285060
     tx_queue_7_packets: 1326
     tx_queue_7_bytes: 102960
     tx_queue_8_packets: 1691
     tx_queue_8_bytes: 94422
     tx_queue_9_packets: 346
     tx_queue_9_bytes: 26520
     tx_queue_10_packets: 2889
     tx_queue_10_bytes: 138474
     tx_queue_11_packets: 593
     tx_queue_11_bytes: 46002
     tx_queue_12_packets: 3671
     tx_queue_12_bytes: 272670
     tx_queue_13_packets: 2362
     tx_queue_13_bytes: 183412
     tx_queue_14_packets: 398
     tx_queue_14_bytes: 30996
     tx_queue_15_packets: 768
     tx_queue_15_bytes: 59904
     tx_queue_16_packets: 2
     tx_queue_16_bytes: 96
     tx_queue_17_packets: 6
     tx_queue_17_bytes: 468
     tx_queue_18_packets: 75
     tx_queue_18_bytes: 5814
     tx_queue_19_packets: 62
     tx_queue_19_bytes: 2772
     tx_queue_20_packets: 1731
     tx_queue_20_bytes: 134874
     tx_queue_21_packets: 911
     tx_queue_21_bytes: 71022
     tx_queue_23_packets: 1
     tx_queue_23_bytes: 42
     rx_queue_0_packets: 6648741
     rx_queue_0_bytes: 5267668514
     rx_queue_1_packets: 6632506
     rx_queue_1_bytes: 5424566021
     rx_queue_2_packets: 6496930
     rx_queue_2_bytes: 5228715732
     rx_queue_3_packets: 6384608
     rx_queue_3_bytes: 5224710914
     rx_queue_4_packets: 6674731
     rx_queue_4_bytes: 5455887167
     rx_queue_5_packets: 6440825
     rx_queue_5_bytes: 5179107948
     rx_queue_6_packets: 6376508
     rx_queue_6_bytes: 5172928131
     rx_queue_7_packets: 6501621
     rx_queue_7_bytes: 5288332325
     rx_queue_8_packets: 6407504
     rx_queue_8_bytes: 5141716457
     rx_queue_9_packets: 6477232
     rx_queue_9_bytes: 5258138855
     rx_queue_10_packets: 6377835
     rx_queue_10_bytes: 5159208932
     rx_queue_11_packets: 6243267
     rx_queue_11_bytes: 4968908031
     rx_queue_12_packets: 6520803
     rx_queue_12_bytes: 5307702481
     rx_queue_13_packets: 6507862
     rx_queue_13_bytes: 5344620501
     rx_queue_14_packets: 6687641
     rx_queue_14_bytes: 5516246199
     rx_queue_15_packets: 6514420
     rx_queue_15_bytes: 5292240258
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]# netstat -stw
Transfering required files
Starting netstat
Ip:
    104243520 total packets received
    4 with invalid headers
    103637861 forwarded
    0 incoming packets discarded
    594339 incoming packets delivered
    104236519 requests sent out
    57 dropped because of missing route
Icmp:
    25 ICMP messages received
    16 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        InType9: 16
        OutType8: 9
Tcp:
    24787 active connections openings
    22411 passive connection openings
    136 failed connection attempts
    5819 connection resets received
    42 connections established
    587858 segments received
    593150 segments send out
    0 segments retransmited
    0 bad segments received.
    5174 resets sent
UdpLite:
TcpExt:
    3365 TCP sockets finished time wait in fast timer
    3 time wait sockets recycled by time stamp
    14918 TCP sockets finished time wait in slow timer
    13613 delayed acks sent
    3 delayed acks further delayed because of locked socket
    31827 packets directly queued to recvmsg prequeue.
    9229 packets directly received from backlog
    29778368 packets directly received from prequeue
    191492 packets header predicted
    3018 packets header predicted and directly queued to user
    127489 acknowledgments not containing data received
    171427 predicted acknowledgments
    0 TCP data loss events
    3 connections reset due to unexpected data
    4092 connections reset due to early user close
IpExt:
    InNoRoutes: 4
    InMcastPkts: 64
    InBcastPkts: 852
    InOctets: 83223541378
    OutOctets: 83205071153
    InMcastOctets: 1920
    InBcastOctets: 134937

AFTER:


[root@s01b01 ~]# netstat -tsw
Ip:
    389777232 total packets received
    4 with invalid headers
    388935902 forwarded
    0 incoming packets discarded
    827854 incoming packets delivered
    389770686 requests sent out
    57 dropped because of missing route
Icmp:
    30 ICMP messages received
    21 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        InType9: 21
        OutType8: 9
Tcp:
    34952 active connections openings
    31105 passive connection openings
    702 failed connection attempts
    8276 connection resets received
    40 connections established
    818816 segments received
    826987 segments send out
    0 segments retransmited
    0 bad segments received.
    7917 resets sent
UdpLite:
TcpExt:
    4773 TCP sockets finished time wait in fast timer
    4 time wait sockets recycled by time stamp
    20497 TCP sockets finished time wait in slow timer
    18533 delayed acks sent
    4 delayed acks further delayed because of locked socket
    44700 packets directly queued to recvmsg prequeue.
    14354 packets directly received from backlog
    42108826 packets directly received from prequeue
    264145 packets header predicted
    4254 packets header predicted and directly queued to user
    177870 acknowledgments not containing data received
    236881 predicted acknowledgments
    0 TCP data loss events
    119 connections reset due to unexpected data
    5799 connections reset due to early user close
IpExt:
    InNoRoutes: 4
    InMcastPkts: 92
    InBcastPkts: 1092
    InOctets: 311696291237
    OutOctets: 311670620218
    InMcastOctets: 2744
    InBcastOctets: 168691
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]# 
[root@s01b01 ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85
[root@s01b01 ~]# ethtool -S eth0 | grep -v :\ 0$
NIC statistics:
     rx_packets: 389293555
     tx_packets: 388919606
     rx_bytes: 316515917607
     tx_bytes: 316491739357
     multicast: 1394
     rx_pkts_nic: 389293550
     tx_pkts_nic: 388919606
     rx_bytes_nic: 319629973151
     tx_bytes_nic: 319603309411
     broadcast: 318330
     tx_restart_queue: 2342
     fdir_match: 40188
     fdir_miss: 394023402
     fdir_overflow: 960
     tx_queue_0_packets: 388886330
     tx_queue_0_bytes: 316489528097
     tx_queue_1_packets: 2402
     tx_queue_1_bytes: 172450
     tx_queue_2_packets: 3855
     tx_queue_2_bytes: 211698
     tx_queue_3_packets: 435
     tx_queue_3_bytes: 33030
     tx_queue_4_packets: 576
     tx_queue_4_bytes: 44234
     tx_queue_5_packets: 1569
     tx_queue_5_bytes: 121854
     tx_queue_6_packets: 5229
     tx_queue_6_bytes: 286110
     tx_queue_7_packets: 1337
     tx_queue_7_bytes: 103422
     tx_queue_8_packets: 1695
     tx_queue_8_bytes: 94590
     tx_queue_9_packets: 350
     tx_queue_9_bytes: 26688
     tx_queue_10_packets: 2893
     tx_queue_10_bytes: 138642
     tx_queue_11_packets: 597
     tx_queue_11_bytes: 46182
     tx_queue_12_packets: 5982
     tx_queue_12_bytes: 441280
     tx_queue_13_packets: 2372
     tx_queue_13_bytes: 183832
     tx_queue_14_packets: 398
     tx_queue_14_bytes: 30996
     tx_queue_15_packets: 769
     tx_queue_15_bytes: 59946
     tx_queue_16_packets: 3
     tx_queue_16_bytes: 138
     tx_queue_17_packets: 6
     tx_queue_17_bytes: 468
     tx_queue_18_packets: 75
     tx_queue_18_bytes: 5814
     tx_queue_19_packets: 89
     tx_queue_19_bytes: 3906
     tx_queue_20_packets: 1732
     tx_queue_20_bytes: 134916
     tx_queue_21_packets: 911
     tx_queue_21_bytes: 71022
     tx_queue_23_packets: 1
     tx_queue_23_bytes: 42
     rx_queue_0_packets: 24359759
     rx_queue_0_bytes: 19532543443
     rx_queue_1_packets: 24442613
     rx_queue_1_bytes: 19853241784
     rx_queue_2_packets: 24279377
     rx_queue_2_bytes: 19684090398
     rx_queue_3_packets: 24006037
     rx_queue_3_bytes: 19467318579
     rx_queue_4_packets: 24660712
     rx_queue_4_bytes: 20276792056
     rx_queue_5_packets: 24368719
     rx_queue_5_bytes: 19722498372
     rx_queue_6_packets: 24200478
     rx_queue_6_bytes: 19750374451
     rx_queue_7_packets: 24240430
     rx_queue_7_bytes: 19664375181
     rx_queue_8_packets: 24334625
     rx_queue_8_bytes: 19688925671
     rx_queue_9_packets: 24262385
     rx_queue_9_bytes: 19668652634
     rx_queue_10_packets: 24254656
     rx_queue_10_bytes: 19804347023
     rx_queue_11_packets: 24129255
     rx_queue_11_bytes: 19598368096
     rx_queue_12_packets: 24127265
     rx_queue_12_bytes: 19547612595
     rx_queue_13_packets: 24571579
     rx_queue_13_bytes: 20105837415
     rx_queue_14_packets: 24506068
     rx_queue_14_bytes: 20023205900
     rx_queue_15_packets: 24549597
     rx_queue_15_bytes: 20127734009





[root@s01b01 ~]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15      CPU16      CPU17      CPU18      CPU19      CPU20      CPU21      CPU22      CPU23      
   0:        231          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      timer
   1:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      i8042
   3:        315          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      serial
   8:          1          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
   9:       1189          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
  16:         33          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
  20:         22          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb4, uhci_hcd:usb5
  21:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
  22:         50          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
  55:   30020984          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-0
  56:    9450976          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-1
  57:    9454807          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-2
  58:    9405209          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-3
  59:    9485093          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-4
  60:    9462193          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-5
  61:    9422017          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-6
  62:    9429916          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-7
  63:    9437250          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-8
  64:    9433228          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-9
  65:    9436007          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-10
  66:    9422641          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-11
  67:    9429053          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-12
  68:    9473697          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-13
  69:    9466163          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-14
  70:    9462627          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-15
  71:       5869          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-16
  72:       5872          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-17
  73:       5941          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-18
  74:       5955          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-19
  75:       7583          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-20
  76:       6762          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-21
  77:       5866          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-22
  78:       5867          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-TxRx-23
  79:        960          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0:lsc
  80:     242179          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-0
  81:       5132          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-1
  82:       5744          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-2
  83:       5541          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-3
  84:       5704          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-4
  85:       5015          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-5
  86:       5725          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-6
  87:       5098          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-7
  88:       5142          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-8
  89:       6427          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-9
  90:       5911          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-10
  91:       5322          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-11
  92:       5933          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-12
  93:       5015          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-13
  94:       5914          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-14
  95:       5867          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-15
  96:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-16
  97:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-17
  98:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-18
  99:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-19
 100:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-20
 101:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-21
 102:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-22
 103:       4900          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-23
 104:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1:lsc
 105:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 106:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 107:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 108:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 109:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 110:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 111:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 112:          3          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ioat-msix
 113:      39854          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-0
 114:     122866          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-1
 115:      19610          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-2
 116:      12959          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-3
 117:      44179          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-4
 118:     255876          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-5
 119:      12055          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-6
 120:      19552          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-7
 122:      34203          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-0
 123:        163          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-1
 124:         82          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-2
 125:          2          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-3
 126:          1          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-4
 127:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-5
 128:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-6
 129:          6          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth3-7
 NMI:       5251        217        201        174        165        156        175        218        196        178        156        172        223        445        332        246        182        167        145        432        337        246        185        158   Non-maskable interrupts
 LOC:    2564746    2976878    2850883    1768993    1405429    1209384    1564087    2619751    2617615    1758193    1191458    1448045     431593    1052363    1384903     474503     177619     156992     141899    1262714    1515288     476665     189885     159746   Local timer interrupts
 SPU:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Performance pending work
 RES:       3283       4412       1635       3288        807        492       2076       5859       1724       2770        978       2375       1279       7430       9230      19332        922        621        732       8061       8832      18343       1261        501   Rescheduling interrupts
 CAL:        164        202        203        203        183        203        203        166        203        191        203        156        203        192        194        200        202        203        203        181        203        203        203        202   Function call interrupts
 TLB:      33679      61918      66261      37821      41542      30941      74365      71102      65982      46545      50869      23449      14841     134455      53047      31254      13082       9127       8876     135278      50148      32826      14067       8121   TLB shootdowns
 TRM:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Machine check exceptions
 MCP:         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34         34   Machine check polls
 ERR:         23
 MIS:          0
[root@s01b01 ~]#

Comment 30 Ashwani Wason 2011-04-19 02:11:16 UTC
(In reply to comment #28)
> As for Ashwani's tests, it still seems like all of your PCI devices are only
> getting interrupts on CPU0 (though the table is pretty hard to read when pasted
> to the bug).  Any luck figuring out why that is happening?

I checked the BIOS and did not see anything evident that would have such an effect. I have asked IBM. Not sure when I will hear back though.

Comment 31 Andy Gospodarek 2011-04-19 02:21:49 UTC
I booted the system that IBM sent us (an HS22V) to check if interrupts for PCI devices were being spread across all CPUs.  They are.  I can attach the file if needed.

I also checked the system that IBM sent us and see the following BIOS information in the dmidecode:

Handle 0x0009, DMI type 0, 24 bytes
BIOS Information
        Vendor: IBM
        Version: -[P9E151BUS-1.12]-
        Release Date: 02/07/2011
        ROM Size: 4096 kB
        Characteristics:

which appears to be exactly what you are running.

Interestingly, some of the DMI information does not match what I see in your sosreport.  Is this expected?

Handle 0x000A, DMI type 1, 27 bytes
System Information
        Manufacturer: IBM
        Product Name: BladeCenter HS22V -[7871AC1]-
        Version: 03
[...]
Handle 0x000B, DMI type 2, 16 bytes
Base Board Information
        Manufacturer: IBM
        Product Name: 69Y3769

Comment 32 Andy Gospodarek 2011-04-19 02:25:35 UTC
(In reply to comment #30)
> (In reply to comment #28)
> > As for Ashwani's tests, it still seems like all of your PCI devices are only
> > getting interrupts on CPU0 (though the table is pretty hard to read when pasted
> > to the bug).  Any luck figuring out why that is happening?
> 
> I checked the BIOS and did not see anything evident that would have such an
> effect. I have asked IBM. Not sure when I will hear back though.

Steve, can you check on this for us and try to get an answer by early Tuesday morning?

See comment #25, comment #29, and, if you like, attachment #492236 [details], which contains /proc/interrupts and /proc/<irq>/* from before and after a test run.  I am concerned that the smp_affinity doesn't match what we are seeing in /proc/interrupts.
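
For anyone who wants to repeat the comparison between smp_affinity and /proc/interrupts, here is a rough Python sketch. It assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ) and that smp_affinity is a hex bitmask, optionally split into comma-separated groups. The function names and the four-CPU sample rows are only illustrative; the rows are abbreviated from output pasted in this bug and are not a complete capture.

```python
def parse_interrupts(text):
    """Return {irq_label: (per_cpu_counts, description)} from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def single_cpu_irqs(table, min_total=1):
    """Return IRQs whose interrupts were all delivered to a single CPU."""
    pinned = {}
    for irq, (counts, desc) in table.items():
        total = sum(counts)
        # Rows such as ERR/MIS carry only one counter and are skipped.
        if len(counts) > 1 and total >= min_total and max(counts) == total:
            pinned[irq] = (counts.index(max(counts)), desc)
    return pinned


def cpus_in_mask(mask_hex):
    """Decode an smp_affinity style hex mask into a list of CPU numbers."""
    value = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


# Illustrative sample only: four CPU columns, rows abbreviated from this bug.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 36:  127727887         62     899889      54552   PCI-MSI-edge      eth0-TxRx-0
 40:          7          0          0          0   PCI-MSI-edge      eth0:lsc
 82:       5744          0          0          0   PCI-MSI-edge      eth1-TxRx-2
ERR:         23
"""

if __name__ == "__main__":
    for irq, (cpu, desc) in single_cpu_irqs(parse_interrupts(SAMPLE)).items():
        print(f"IRQ {irq} ({desc}): all interrupts on CPU{cpu}")
    print("mask ffffff allows CPUs", cpus_in_mask("ffffff"))
```

If an IRQ shows up as pinned to CPU0 while its affinity mask decodes to many CPUs, that is the mismatch I am asking about.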

Comment 33 Steve Best 2011-04-19 14:01:25 UTC
(In reply to comment #32)
> (In reply to comment #30)
> > (In reply to comment #28)
> > > As for Ashwani's tests, it still seems like all of your PCI devices are only
> > > getting interrupts on CPU0 (though the table is pretty hard to read when pasted
> > > to the bug).  Any luck figuring out why that is happening?
> > 
> > I checked the BIOS and did not see anything evident that would have such an
> > effect. I have asked IBM. Not sure when I will hear back though.
> 
> Steve, can you check on this for us and try to get an answer by early Tuesday
> morning?
> 
> See comment #25, comment #29, and if you like the attachment #492236 [details] that
> contains /proc/interrupts and /proc/<irq>/* from before and after a test run. 
> I am concerned that the smp_affinity doesn't match what we are seeing in
> /proc/interrupts.


Ashwani,

Is the irqbalance RPM installed on your system? I'm not sure how you are installing RHEL 6.0, but I don't believe the package is installed if you do a text install. This might explain what we are seeing here.

-Steve

Comment 34 Andy Gospodarek 2011-04-19 15:30:41 UTC
(In reply to comment #33)
> 
> Ashwani,
> 
> is irqbalance rpm installed on your system? not sure how you are installing
> RHEL 6.0, but I don't believe the package is installed if you do a text
> install. This might explain what we are seeing here.
> 
> -Steve

Steve, please look at the attachment I mentioned in comment #32 and you will see that this is not an issue with whether irqbalance is being used or not.

Comment 35 Andy Gospodarek 2011-04-19 15:38:19 UTC
Created attachment 493218 [details]
bonding-fix-jiffie-issues.patch

Is it jiffie or jiffy? :-)

Who knows, but this patch on RHEL6.0.z should resolve your issue.  I've run for close to 3 hours on 6.0.z with no link flap and ran 9 hours overnight on RHEL6.1 with no link-flap.

You still need to resolve your issues with interrupt allocation and scheduling, but this should resolve the link-flap issue.  I'm going to work to get this into RHEL6.1 and RHEL6.0.z, but any feedback you can give me would be helpful.

Here are the results on my system after running ~850Mbps through it for ~3 hours on 2.6.32-71.15.1.el6.x86_64 + the attached patch:

# ethtool -S eth0 | grep -v :\ 0$ ; cat /proc/net/bonding/bond0 ; grep eth0 /proc/interrupts
NIC statistics:
     rx_packets: 1306817160
     tx_packets: 1306819529
     rx_bytes: 1798872902422
     tx_bytes: 1798877030765
     rx_pkts_nic: 1306211317
     tx_pkts_nic: 1306208789
     rx_bytes_nic: 1808488252393
     tx_bytes_nic: 1808486101795
     lsc_int: 7
     multicast: 1176
     broadcast: 494
     fdir_miss: 1306846440
     tx_queue_0_packets: 1296880190
     tx_queue_0_bytes: 1785202385097
     tx_queue_1_packets: 55460
     tx_queue_1_bytes: 3661395
     tx_queue_2_packets: 8476047
     tx_queue_2_bytes: 12827270297
     tx_queue_3_packets: 1407821
     tx_queue_3_bytes: 843700218
     rx_queue_0_packets: 696060790
     rx_queue_0_bytes: 1053809735334
     rx_queue_1_packets: 559723805
     rx_queue_1_bytes: 741664949556
     rx_queue_2_packets: 63
     rx_queue_2_bytes: 7271
     rx_queue_3_packets: 51037559
     rx_queue_3_bytes: 3404980790
Ethernet Channel Bonding Driver: v3.5.0.1 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.2.200

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:37:b7:20
 36:  127727887         62     899889      54552   PCI-MSI-edge      eth0-TxRx-0
 37:  123174795     106184         45     880620   PCI-MSI-edge      eth0-TxRx-1
 38:       7267          0       2457     918179   PCI-MSI-edge      eth0-TxRx-2
 39:   50437997          0     949300       2850   PCI-MSI-edge      eth0-TxRx-3
 40:          7          0          0          0   PCI-MSI-edge      eth0:lsc
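
The number to watch in the bonding output above is "Link Failure Count". As a minimal sketch, assuming the "Key: value" layout shown above with one "Slave Interface" section per slave, something like the following Python could pull that counter out of /proc/net/bonding/bond0 between test runs. The function names are made up for illustration, and the sample text is copied from the output above.

```python
def parse_bonding(text):
    """Split /proc/net/bonding/<bond> text into bond-level and per-slave settings."""
    bond, slaves, current = {}, {}, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a "Key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is None:
            bond[key] = value
        else:
            current[key] = value
    return bond, slaves


def link_failures(slaves):
    """Return {slave_name: link failure count}."""
    return {name: int(info.get("Link Failure Count", 0))
            for name, info in slaves.items()}


# Sample text copied from the bond0 output in this comment.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.5.0.1 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.2.200

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:37:b7:20
"""

if __name__ == "__main__":
    bond, slaves = parse_bonding(SAMPLE)
    print("active slave:", bond["Currently Active Slave"])
    print("link failures:", link_failures(slaves))
```

A count that stays at 0 across a multi-hour run is what I am treating as "no link flap" here.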

Comment 36 Ashwani Wason 2011-04-19 16:30:58 UTC
(In reply to comment #31)
> I booted the system that IBM sent us (an HS22V) to check if interrupts for PCI
> devices were being spread across all CPUs.  They are.

Thanks for this info. I will check. In our platform/chassis two blades are management blades, which have an on-disk installation of RHEL6 with irqbalance running. (These data blades run using a ramdisk.) The management blades only use the bnx2 and their distribution indeed looks better. I am attaching an XLS document with that info.

Comment 37 Ashwani Wason 2011-04-19 16:31:53 UTC
Created attachment 493225 [details]
See MGMT tab for management blade info and DATA tab for data blade info

Comment 38 Ashwani Wason 2011-04-19 16:35:07 UTC
(In reply to comment #35)
> Created attachment 493218 [details]
> bonding-fix-jiffie-issues.patch
> 
> It is jiffie or jiffy? :-)
> 
> Who knows, but this patch on RHEL6.0.z should resolve your issue.  I've run for
> close to 3 hours on 6.0.z with no link flap and ran 9 hours overnight on
> RHEL6.1 with no link-flap.
> 

Thanks for the tremendous effort!

Of course, would love to test it out but is it possible for you to recompile the bonding driver only so that I can put it on the test system?

Also, for testing please confirm the following:

1. Should I revert back to RHEL 6 ixgbe driver?

build-x64-rh6-ga ~> modinfo ixgbe
filename:       /lib/modules/2.6.32-71.15.1.el6.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko
version:        2.0.62-k2

2. Any other special tuning that should be used, such as ring buffers.

3. ??

Comment 39 Andy Gospodarek 2011-04-19 21:06:24 UTC
(In reply to comment #38)
> (In reply to comment #35)
> > Created attachment 493218 [details]
> > bonding-fix-jiffie-issues.patch
> > 
> > It is jiffie or jiffy? :-)
> > 
> > Who knows, but this patch on RHEL6.0.z should resolve your issue.  I've run for
> > close to 3 hours on 6.0.z with no link flap and ran 9 hours overnight on
> > RHEL6.1 with no link-flap.
> > 
> 
> Thanks for the tremendous effort!
> 
> Of course, would love to test it out but is it possible for you to recompile
> the bonding driver only so that I can put it on the test system?
> 
> Also, for testing please confirm the following:
> 
> 1. Should I revert back to RHEL 6 ixgbe driver?
> 
> build-x64-rh6-ga ~> modinfo ixgbe
> filename:      
> /lib/modules/2.6.32-71.15.1.el6.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko
> version:        2.0.62-k2
> 
> 2. Any other special tuning that should be used, such as ring buffers.
> 
> 3. ??

Neal should have a kernel for you as well as a description of what is included sometime soon.

Comment 40 Ashwani Wason 2011-04-19 21:11:58 UTC
Ran another test with ~900Mbps + irqbalance on the data blade. Looks much better, not only from the IRQ distribution perspective (see attachment) but also from the perspective that there were no tx_restart_queue errors.

[root@s01b01 ~]# ethtool -S eth0 | grep -v :\ 0$
NIC statistics:
     rx_packets: 289429728
     tx_packets: 289313805
     rx_bytes: 235234174022
     tx_bytes: 235226651854
     multicast: 621
     rx_pkts_nic: 289429721
     tx_pkts_nic: 289313805
     rx_bytes_nic: 237549480070
     tx_bytes_nic: 237541258238
     broadcast: 117485
     fdir_match: 18802
     fdir_miss: 293902663
     fdir_overflow: 784
     tx_queue_0_packets: 3298
     tx_queue_0_bytes: 659118
     tx_queue_1_packets: 28469
     tx_queue_1_bytes: 28764974
     tx_queue_2_packets: 26442
     tx_queue_2_bytes: 21369817
     tx_queue_3_packets: 33187
     tx_queue_3_bytes: 32444908
     tx_queue_4_packets: 29781
     tx_queue_4_bytes: 17393532
     tx_queue_5_packets: 24745
     tx_queue_5_bytes: 18744257
     tx_queue_6_packets: 36089023
     tx_queue_6_bytes: 29296372732
     tx_queue_7_packets: 36191836
     tx_queue_7_bytes: 29496251241
     tx_queue_8_packets: 36306794
     tx_queue_8_bytes: 29468301530
     tx_queue_9_packets: 36222324
     tx_queue_9_bytes: 29589167260
     tx_queue_10_packets: 17811256
     tx_queue_10_bytes: 14404856372
     tx_queue_11_packets: 18091865
     tx_queue_11_bytes: 14786698218
     tx_queue_12_packets: 5214
     tx_queue_12_bytes: 4124321
     tx_queue_13_packets: 37606
     tx_queue_13_bytes: 40563500
     tx_queue_14_packets: 18267
     tx_queue_14_bytes: 13175901
     tx_queue_15_packets: 43502
     tx_queue_15_bytes: 42107414
     tx_queue_16_packets: 19873
     tx_queue_16_bytes: 15974905
     tx_queue_17_packets: 36971
     tx_queue_17_bytes: 36681682
     tx_queue_18_packets: 17970236
     tx_queue_18_bytes: 14460737165
     tx_queue_19_packets: 18151420
     tx_queue_19_bytes: 14807266835
     tx_queue_20_packets: 18155526
     tx_queue_20_bytes: 14715999352
     tx_queue_21_packets: 17919530
     tx_queue_21_bytes: 14545479860
     tx_queue_22_packets: 18127165
     tx_queue_22_bytes: 14792866661
     tx_queue_23_packets: 17969475
     tx_queue_23_bytes: 14590650299
     rx_queue_0_packets: 18218140
     rx_queue_0_bytes: 14680801193
     rx_queue_1_packets: 18232063
     rx_queue_1_bytes: 14844571031
     rx_queue_2_packets: 18112940
     rx_queue_2_bytes: 14744153029
     rx_queue_3_packets: 18100393
     rx_queue_3_bytes: 14672690673
     rx_queue_4_packets: 17988865
     rx_queue_4_bytes: 14610347288
     rx_queue_5_packets: 18148819
     rx_queue_5_bytes: 14812180053
     rx_queue_6_packets: 17932941
     rx_queue_6_bytes: 14551633539
     rx_queue_7_packets: 18177333
     rx_queue_7_bytes: 14740695807
     rx_queue_8_packets: 18161804
     rx_queue_8_bytes: 14813482601
     rx_queue_9_packets: 17991199
     rx_queue_9_bytes: 14475881989
     rx_queue_10_packets: 18111649
     rx_queue_10_bytes: 14802783237
     rx_queue_11_packets: 17828470
     rx_queue_11_bytes: 14422820816
     rx_queue_12_packets: 18156880
     rx_queue_12_bytes: 14948334681
     rx_queue_13_packets: 18127271
     rx_queue_13_bytes: 14683189436
     rx_queue_14_packets: 18113169
     rx_queue_14_bytes: 14775185790
     rx_queue_15_packets: 18022795
     rx_queue_15_bytes: 14652415301
     rx_queue_16_packets: 164
     rx_queue_16_bytes: 10332
     rx_queue_17_packets: 218
     rx_queue_17_bytes: 13776
     rx_queue_18_packets: 653
     rx_queue_18_bytes: 291157
     rx_queue_19_packets: 653
     rx_queue_19_bytes: 328281
     rx_queue_20_packets: 1216
     rx_queue_20_bytes: 1050974
     rx_queue_21_packets: 700
     rx_queue_21_bytes: 384714
     rx_queue_22_packets: 484
     rx_queue_22_bytes: 227984
     rx_queue_23_packets: 909
     rx_queue_23_bytes: 700340
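
To compare queue balance between runs without eyeballing the whole list, a small Python sketch along these lines may help. It assumes the "rx_queue_N_packets" / "tx_queue_N_packets" counter names shown above; the function names are illustrative, and the sample contains only a handful of the counters from this run.

```python
import re

# Matches lines such as "     rx_queue_0_packets: 18218140"
QUEUE_RE = re.compile(r"^\s*(rx|tx)_queue_(\d+)_packets:\s*(\d+)\s*$")


def queue_packets(text):
    """Return {'rx': {queue: packets}, 'tx': {queue: packets}} from ethtool -S text."""
    queues = {"rx": {}, "tx": {}}
    for line in text.splitlines():
        match = QUEUE_RE.match(line)
        if match:
            direction, queue, packets = match.groups()
            queues[direction][int(queue)] = int(packets)
    return queues


def shares(per_queue):
    """Return each queue's fraction of the total packet count."""
    total = sum(per_queue.values())
    return {q: n / total for q, n in per_queue.items()} if total else {}


# A few counters copied from the output above (not the full list).
SAMPLE = """\
NIC statistics:
     rx_packets: 289429728
     tx_queue_0_packets: 3298
     tx_queue_0_bytes: 659118
     tx_queue_6_packets: 36089023
     rx_queue_0_packets: 18218140
     rx_queue_16_packets: 164
"""

if __name__ == "__main__":
    queues = queue_packets(SAMPLE)
    for direction in ("rx", "tx"):
        for queue, share in sorted(shares(queues[direction]).items()):
            print(f"{direction} queue {queue}: {share:.2%}")
```

Run against the full output, this makes it easy to see which queues carry most of the traffic and which are nearly idle.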

Comment 41 Ashwani Wason 2011-04-19 21:12:40 UTC
Created attachment 493292 [details]
IRQ data from a run with irqbalance running

Comment 43 Andy Gospodarek 2011-04-19 21:25:01 UTC
(In reply to comment #40)
> Ran another test with ~900mbps + irqbalance on the data blade. Looks much
> better, not only from the IRQ distribution perspective (see attachment) but
> also from the perspective that there were no tx_restart_queue errors.
> 

Nice!  I'm a bit surprised, but I'll take this kind of surprise.

Comment 45 Neal Kim 2011-04-19 21:30:13 UTC
Very encouraging results!

I am working with Flavio (the engineer who provided us with the original kernel build 2.6.32-71.25.1.el6.bytemobile.1.x86_64) on building the same kernel, but with the added patch that Andy worked on.

We should have the build started today, but will more than likely have the kernel ready for you tomorrow.

Comment 46 Andy Gospodarek 2011-04-19 21:34:38 UTC
(In reply to comment #43)
> (In reply to comment #40)
> > Ran another test with ~900mbps + irqbalance on the data blade. Looks much
> > better, not only from the IRQ distribution perspective (see attachment) but
> > also from the perspective that there were no tx_restart_queue errors.
> > 

Ashwani, were you seeing link-flap in any measurable quantity?

Comment 47 RHEL Program Management 2011-04-19 21:39:44 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 48 Ashwani Wason 2011-04-19 22:02:08 UTC
(In reply to comment #46)
> (In reply to comment #43)
> > (In reply to comment #40)
> > > Ran another test with ~900mbps + irqbalance on the data blade. Looks much
> > > better, not only from the IRQ distribution perspective (see attachment) but
> > > also from the perspective that there were no tx_restart_queue errors.
> > > 
> 
> Ashwani, were you seeing link-flap in any measurable quantity?

No link flaps - this was with 2.6.32-71.15.1.el6.x86_64 + ixgbe 3.2.10. It would be interesting to test it in other setups as well where I see flaps to identify if it somehow masks the bonding issue. Will see what I can do.

Ultimately I am aiming towards all these pieces coming together - irqbalance + your patches into a single image that I will test with. Looks like all pieces are lining up:

1. rx_missed_errors handled by Intel errata patch to ixgbe.
2. bonding flaps handled by your bonding driver patch.
3. tx_restart_queue and interrupt distribution handled by irqbalance.

Comment 49 Roger Mach 2011-04-19 23:22:14 UTC
I have reproduced the interrupt distribution issue on an HS22v blade running RHEL6.1 snap3.  Enabling the irqbalance daemon did cause the interrupts to be distributed.  From reading the latest comments, it appears that a solution is coming together, but I thought I'd report my findings and add myself to the CC list so I can monitor the progress of this issue.

Comment 50 Andy Gospodarek 2011-04-20 00:32:55 UTC
(In reply to comment #48)
> 
> No link flaps - this was with 2.6.32-71.15.1.el6.x86_64 + ixgbe 3.2.10. It
> would be interesting to test it in other setups as well where I see flaps to
> identify if it somehow masks the bonding issue. Will see what I can do.
> 
> Ultimately I am aiming towards all these pieces coming together - irqbalance +
> your patches into a single image that I will test with. Looks like all pieces
> are lining up:
> 
> 1. rx_missed_errors handled by Intel errata patch to ixgbe.
> 2. bonding flaps handled by your bonding driver patch.
> 3. tx_restart_queue and interrupt distribution handled by irqbalance.

I agree that it seems things are aligning well.  With all 3 included, I think you will be in good shape for your product.

Comment 54 Neal Kim 2011-04-20 22:02:46 UTC
I have some test results from Ashwani:

Looks like they are still seeing issues with the rx_missed_errors and the bond flapping...

I am running without irqbalance.

1. rx_missed_errors count is up, which did not happen with the previous kernel with errata fix.
2. I still see the bond flapping.
3. tx_restart_queue is also up, although it is expected to be so because of lack of irqbalance.

How can we verify that I am indeed running the right bits? I have put the MD5 sums below. Can you please verify that these are expected?

[root@s01b01 ~]# uname -r
2.6.32-71.25.1.el6.bytemobile.2.x86_64

[root@s01b01 ~]# modinfo ixgbe
filename:       /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko
version:        2.0.62-k2
license:        GPL
description:    Intel(R) 10 Gigabit PCI Express Network Driver
author:         Intel Corporation, <linux.nics>
srcversion:     5231125124F0BB51E13BCD2
alias:          pci:v00008086d000010F8sv*sd*bc*sc*i*
alias:          pci:v00008086d000010F9sv*sd*bc*sc*i*
alias:          pci:v00008086d00001514sv*sd*bc*sc*i*
alias:          pci:v00008086d00001507sv*sd*bc*sc*i*
alias:          pci:v00008086d000010FBsv*sd*bc*sc*i*
alias:          pci:v00008086d00001517sv*sd*bc*sc*i*
alias:          pci:v00008086d000010FCsv*sd*bc*sc*i*
alias:          pci:v00008086d000010F7sv*sd*bc*sc*i*
alias:          pci:v00008086d00001508sv*sd*bc*sc*i*
alias:          pci:v00008086d000010DBsv*sd*bc*sc*i*
alias:          pci:v00008086d000010F4sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E1sv*sd*bc*sc*i*
alias:          pci:v00008086d000010F1sv*sd*bc*sc*i*
alias:          pci:v00008086d000010ECsv*sd*bc*sc*i*
alias:          pci:v00008086d000010DDsv*sd*bc*sc*i*
alias:          pci:v00008086d0000150Bsv*sd*bc*sc*i*
alias:          pci:v00008086d000010C8sv*sd*bc*sc*i*
alias:          pci:v00008086d000010C7sv*sd*bc*sc*i*
alias:          pci:v00008086d000010C6sv*sd*bc*sc*i*
alias:          pci:v00008086d000010B6sv*sd*bc*sc*i*
depends:        mdio,dca
vermagic:       2.6.32-71.25.1.el6.bytemobile.2.x86_64 SMP mod_unload modversions 
parm:           max_vfs:Maximum number of virtual functions to allocate per physical function (uint)

[root@s01b01 ~]# modinfo bonding
filename:       /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis and many others
description:    Ethernet Channel Bonding Driver, v3.5.0.bytemobile.2
version:        3.5.0.bytemobile.2
license:        GPL
srcversion:     A74C26A50B47A78367E0CC2
depends:        ipv6
vermagic:       2.6.32-71.25.1.el6.bytemobile.2.x86_64 SMP mod_unload modversions 
parm:           max_bonds:Max number of bonded devices (int)
parm:           num_grat_arp:Number of gratuitous ARP packets to send on failover event (int)
parm:           num_unsol_na:Number of unsolicited IPv6 Neighbor Advertisements packets to send on failover event (int)
parm:           miimon:Link check interval in milliseconds (int)
parm:           updelay:Delay before considering link up, in milliseconds (int)
parm:           downdelay:Delay before considering link down, in milliseconds (int)
parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
parm:           mode:Mode of operation : 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
parm:           primary:Primary network device to use (charp)
parm:           lacp_rate:LACPDU tx rate to request from 802.3ad partner (slow/fast) (charp)
parm:           ad_select:803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2) (charp)
parm:           xmit_hash_policy:XOR hashing method: 0 for layer 2 (default), 1 for layer 3+4 (charp)
parm:           arp_interval:arp interval in milliseconds (int)
parm:           arp_ip_target:arp targets in n.n.n.n form (array of charp)
parm:           arp_validate:validate src/dst of ARP probes: none (default), active, backup or all (charp)
parm:           fail_over_mac:For active-backup, do not set all slaves to the same MAC.  none (default), active or follow (charp)


[root@s01b01 ~]# md5sum /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko 
d423e66d274294f23f6bca1c0b34de8f  /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko

[root@s01b01 ~]# md5sum /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/bonding/bonding.ko
fa71b105a92e583ba408ec75a5294e21  /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/bonding/bonding.ko
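
One way to answer the "right bits" question is to compare each module's MD5 against a value supplied by whoever produced the build. Below is a minimal Python sketch of that comparison. The helper names md5_of_file and matches_expected are hypothetical, and the expected checksum has to come from the build side, since it is not stated in this thread.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's MD5 with a reference value (case/whitespace tolerant)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, matches_expected("/lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko", reference) would be called with the reference value from the build host. A match only shows that the installed file is byte-identical to the reference; it says nothing about which patches that reference contains.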




Here are the test results from ~300 mbps load:


BEFORE:

[root@s01b01 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0.bytemobile.2 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85




[root@s01b01 ~]# ethtool -S eth0 | grep -v :\ 0$
NIC statistics:
     rx_packets: 1921
     tx_packets: 946
     rx_bytes: 116492
     tx_bytes: 53150
     rx_pkts_nic: 1913
     tx_pkts_nic: 946
     rx_bytes_nic: 123768
     tx_bytes_nic: 65690
     lsc_int: 1
     broadcast: 1120
     fdir_match: 66
     fdir_miss: 489
     tx_queue_0_packets: 670
     tx_queue_0_bytes: 39282
     tx_queue_1_packets: 240
     tx_queue_1_bytes: 10980
     tx_queue_7_packets: 6
     tx_queue_7_bytes: 468
     tx_queue_13_packets: 2
     tx_queue_13_bytes: 132
     tx_queue_14_packets: 5
     tx_queue_14_bytes: 378
     tx_queue_15_packets: 3
     tx_queue_15_bytes: 298
     tx_queue_19_packets: 3
     tx_queue_19_bytes: 298
     tx_queue_20_packets: 6
     tx_queue_20_bytes: 468
     tx_queue_21_packets: 11
     tx_queue_21_bytes: 846
     rx_queue_0_packets: 1469
     rx_queue_0_bytes: 88172
     rx_queue_1_packets: 25
     rx_queue_1_bytes: 1572
     rx_queue_2_packets: 32
     rx_queue_2_bytes: 2004
     rx_queue_3_packets: 33
     rx_queue_3_bytes: 2064
     rx_queue_4_packets: 28
     rx_queue_4_bytes: 1752
     rx_queue_5_packets: 34
     rx_queue_5_bytes: 2136
     rx_queue_6_packets: 35
     rx_queue_6_bytes: 2172
     rx_queue_7_packets: 26
     rx_queue_7_bytes: 1644
     rx_queue_8_packets: 32
     rx_queue_8_bytes: 2004
     rx_queue_9_packets: 29
     rx_queue_9_bytes: 1812
     rx_queue_10_packets: 31
     rx_queue_10_bytes: 1956
     rx_queue_11_packets: 27
     rx_queue_11_bytes: 1692
     rx_queue_12_packets: 29
     rx_queue_12_bytes: 1812
     rx_queue_13_packets: 21
     rx_queue_13_bytes: 1332
     rx_queue_14_packets: 30
     rx_queue_14_bytes: 1872
     rx_queue_15_packets: 40
     rx_queue_15_bytes: 2496

[root@s01b01 ~]# netstat -tsw
Transfering required files
Starting netstat
Ip:
    30756 total packets received
    6 forwarded
    0 incoming packets discarded
    30738 incoming packets delivered
    28989 requests sent out
    57 dropped because of missing route
Icmp:
    9 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        OutType8: 9
Tcp:
    1260 active connections openings
    1060 passive connection openings
    33 failed connection attempts
    226 connection resets received
    40 connections established
    30473 segments received
    28777 segments send out
    0 segments retransmited
    0 bad segments received.
    274 resets sent
UdpLite:
TcpExt:
    135 TCP sockets finished time wait in fast timer
    1 time wait sockets recycled by time stamp
    727 TCP sockets finished time wait in slow timer
    724 delayed acks sent
    1637 packets directly queued to recvmsg prequeue.
    71 packets directly received from backlog
    1261082 packets directly received from prequeue
    11402 packets header predicted
    152 packets header predicted and directly queued to user
    5804 acknowledgments not containing data received
    8375 predicted acknowledgments
    0 TCP data loss events
    15 connections reset due to unexpected data
    160 connections reset due to early user close
IpExt:
    InNoRoutes: 4
    InBcastPkts: 2
    InOctets: 22075180
    OutOctets: 17459385
    InBcastOctets: 656


AFTER:


bonding: bond1: link status definitely down for interface eth0, disabling it
bonding: bond1: now running without any active interface !
bonding: bond1: link status definitely up for interface eth0.
bonding: bond1: making interface eth0 the new active one.
bonding: bond1: first active interface up!

[root@s01b01 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0.bytemobile.2 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85



[root@s01b01 ~]# netstat -stw
Ip:
    57007211 total packets received
    56849808 forwarded
    0 incoming packets discarded
    157391 incoming packets delivered
    57007267 requests sent out
    57 dropped because of missing route
Icmp:
    9 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        echo replies: 9
    9 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo request: 9
IcmpMsg:
        InType0: 9
        OutType8: 9
Tcp:
    6678 active connections openings
    5966 passive connection openings
    62 failed connection attempts
    1504 connection resets received
    42 connections established
    155894 segments received
    156021 segments send out
    0 segments retransmited
    0 bad segments received.
    1578 resets sent
UdpLite:
TcpExt:
    12 resets received for embryonic SYN_RECV sockets
    870 TCP sockets finished time wait in fast timer
    1 time wait sockets recycled by time stamp
    3988 TCP sockets finished time wait in slow timer
    3577 delayed acks sent
    2 delayed acks further delayed because of locked socket
    8513 packets directly queued to recvmsg prequeue.
    1101 packets directly received from backlog
    7797022 packets directly received from prequeue
    50693 packets header predicted
    806 packets header predicted and directly queued to user
    33454 acknowledgments not containing data received
    43559 predicted acknowledgments
    0 TCP data loss events
    189 connections reset due to unexpected data
    1067 connections reset due to early user close
IpExt:
    InNoRoutes: 4
    InBcastPkts: 2
    InOctets: 45647988102
    OutOctets: 45639761134
    InBcastOctets: 656



[root@s01b01 ~]# ethtool -S eth0 | grep -v :\ 0$
NIC statistics:
     rx_packets: 56864409
     tx_packets: 56805584
     rx_bytes: 46322806771
     tx_bytes: 46315693894
     rx_pkts_nic: 56864401
     tx_pkts_nic: 56805584
     rx_bytes_nic: 46779122118
     tx_bytes_nic: 46770181818
     lsc_int: 1
     broadcast: 7857
     fdir_match: 770
     fdir_miss: 52008451
     rx_missed_errors: 18969
     tx_restart_queue: 1229
     tx_queue_0_packets: 56802427
     tx_queue_0_bytes: 46315522612
     tx_queue_1_packets: 255
     tx_queue_1_bytes: 11610
     tx_queue_2_packets: 1643
     tx_queue_2_bytes: 69100
     tx_queue_3_packets: 5
     tx_queue_3_bytes: 210
     tx_queue_4_packets: 8
     tx_queue_4_bytes: 522
     tx_queue_5_packets: 1
     tx_queue_5_bytes: 58
     tx_queue_6_packets: 2
     tx_queue_6_bytes: 84
     tx_queue_7_packets: 12
     tx_queue_7_bytes: 736
     tx_queue_8_packets: 7
     tx_queue_8_bytes: 294
     tx_queue_9_packets: 3
     tx_queue_9_bytes: 126
     tx_queue_10_packets: 6
     tx_queue_10_bytes: 252
     tx_queue_11_packets: 3
     tx_queue_11_bytes: 126
     tx_queue_12_packets: 1164
     tx_queue_12_bytes: 84972
     tx_queue_13_packets: 15
     tx_queue_13_bytes: 678
     tx_queue_14_packets: 5
     tx_queue_14_bytes: 378
     tx_queue_15_packets: 3
     tx_queue_15_bytes: 298
     tx_queue_16_packets: 1
     tx_queue_16_bytes: 42
     tx_queue_19_packets: 6
     tx_queue_19_bytes: 440
     tx_queue_20_packets: 7
     tx_queue_20_bytes: 510
     tx_queue_21_packets: 11
     tx_queue_21_bytes: 846
     rx_queue_0_packets: 3518502
     rx_queue_0_bytes: 2833197537
     rx_queue_1_packets: 3578689
     rx_queue_1_bytes: 2940638996
     rx_queue_2_packets: 3557084
     rx_queue_2_bytes: 2922821571
     rx_queue_3_packets: 3520230
     rx_queue_3_bytes: 2865288507
     rx_queue_4_packets: 3594444
     rx_queue_4_bytes: 2939537504
     rx_queue_5_packets: 3507775
     rx_queue_5_bytes: 2824993141
     rx_queue_6_packets: 3543211
     rx_queue_6_bytes: 2866590601
     rx_queue_7_packets: 3482985
     rx_queue_7_bytes: 2802755870
     rx_queue_8_packets: 3533260
     rx_queue_8_bytes: 2893816323
     rx_queue_9_packets: 3632360
     rx_queue_9_bytes: 2990479384
     rx_queue_10_packets: 3562045
     rx_queue_10_bytes: 2928397502
     rx_queue_11_packets: 3464077
     rx_queue_11_bytes: 2775250226
     rx_queue_12_packets: 3635877
     rx_queue_12_bytes: 2950594538
     rx_queue_13_packets: 3603826
     rx_queue_13_bytes: 2912348131
     rx_queue_14_packets: 3541037
     rx_queue_14_bytes: 2923877489
     rx_queue_15_packets: 3589007
     rx_queue_15_bytes: 2952219451
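
The BEFORE/AFTER comparison above can be automated by parsing two "ethtool -S" dumps and subtracting the counters of interest. The following is a rough Python sketch, not a tested tool; parse_ethtool_stats and counter_deltas are hypothetical names. Because the dumps were filtered with grep -v to hide zero-valued counters, a counter missing from a dump is assumed here to be 0.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue  # no colon on this line
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # e.g. the "NIC statistics:" header has no numeric value
    return stats


def counter_deltas(before, after, names):
    """Return after - before for each named counter; missing counters count as 0."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}
```

Applied to the two eth0 dumps in this comment with the names rx_missed_errors and tx_restart_queue, this gives deltas of 18969 and 1229, since neither counter appears in the filtered BEFORE output.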

Comment 56 Neal Kim 2011-04-20 23:33:57 UTC
It appears there is something amiss with the new build...

The test with 2.6.32-71.25.1.el6.bytemobile.1.x86_64 just finished showing results identical to before. So indeed it looks like something is amiss in this new build.

1. Flapping.
2. No rx_missed_errors
3. tx_restart_queue errors still there


[root@s01b01 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 21
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85

[root@s01b01 ~]# ethtool -S eth0 | egrep 'missed_err|restart_q'
     rx_missed_errors: 0
     tx_restart_queue: 831

[root@s01b01 ~]# netstat -stw
Ip:
    58079358 total packets received
    12 with invalid headers
    57843214 forwarded
    0 incoming packets discarded
    236101 incoming packets delivered
    58080170 requests sent out
    57 dropped because of missing route
Icmp:
    259 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 250
        echo replies: 9
    259 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 250
        echo request: 9
IcmpMsg:
        InType0: 9
        InType3: 250
        OutType3: 250
        OutType8: 9
Tcp:
    10022 active connections openings
    8990 passive connection openings
    73 failed connection attempts
    2457 connection resets received
    44 connections established
    233584 segments received
    234233 segments send out
    266 segments retransmited
    0 bad segments received.
    2016 resets sent
UdpLite:
TcpExt:
    23 resets received for embryonic SYN_RECV sockets
    1315 TCP sockets finished time wait in fast timer
    1 time wait sockets recycled by time stamp
    6003 TCP sockets finished time wait in slow timer
    5301 delayed acks sent
    12851 packets directly queued to recvmsg prequeue.
    5125 packets directly received from backlog
    11909125 packets directly received from prequeue
    75888 packets header predicted
    1244 packets header predicted and directly queued to user
    50340 acknowledgments not containing data received
    66058 predicted acknowledgments
    1 congestion windows recovered after partial ack
    0 TCP data loss events
    4 other TCP timeouts
    95 connections reset due to unexpected data
    1632 connections reset due to early user close
    2 connections aborted due to timeout
IpExt:
    InNoRoutes: 4
    InBcastPkts: 2
    InOctets: 46700481289
    OutOctets: 46689751360
    InBcastOctets: 656
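
Bond flaps in these reports are read off the "Link Failure Count" lines of /proc/net/bonding/bond1. A small Python sketch for pulling those counts out of a saved dump is shown here; link_failure_counts is a hypothetical helper name, and the parsing assumes the "Slave Interface:" / "Link Failure Count:" layout seen in the dumps in this thread.

```python
def link_failure_counts(text):
    """Map each slave interface to its Link Failure Count from a bonding dump."""
    counts = {}
    current = None  # slave whose section we are currently inside
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a colon
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Link Failure Count" and current is not None:
            counts[current] = int(value)
    return counts
```

For the bond1 dump in this comment the result would be {"eth0": 21, "eth1": 1}. Taking the difference between a dump saved before a test and one saved after gives the number of flaps during that run.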

Comment 57 Neil Horman 2011-04-20 23:43:14 UTC
Neal, why are you running without irqbalance?  Comment 40 indicated the problem was resolved after the application of the patch and the enabling of irqbalance.

Comment 58 Neal Kim 2011-04-20 23:47:44 UTC
Hi Neil,

Having irqbalance running fixes the tx_restart_queue issue.

Running the same test on the previous kernel without irqbalance enabled results in no rx_missed_errors.

Comment 59 Andy Gospodarek 2011-04-21 10:55:50 UTC
I saw zero rx_missed_errors or tx_restart_queue errors when running irqbalance on my 4-core/4-queue system.

Testing needs to be done with irqbalance enabled as it would be unwise to go into production without irqs spread across multiple cores.
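
Whether the IRQs really are spread across cores can be checked from /proc/interrupts. The Python sketch below sums the per-CPU interrupt counts for one interface. It assumes the usual layout (a header row of CPU labels, then one row per IRQ with per-CPU counts and the handler name last) and that the interface's IRQ names start with the interface name, such as "eth0". Both helper names are hypothetical.

```python
def per_cpu_totals(interrupts_text, name_prefix):
    """Sum per-CPU interrupt counts over IRQ rows whose name starts with name_prefix."""
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * cpu_count
    for line in lines[1:]:
        fields = line.split()
        # Need the IRQ label, one count per CPU, and at least a trailing name.
        if len(fields) < cpu_count + 2:
            continue
        if not fields[-1].startswith(name_prefix):
            continue
        try:
            counts = [int(f) for f in fields[1:1 + cpu_count]]
        except ValueError:
            continue  # row does not carry one numeric count per CPU
        for cpu, count in enumerate(counts):
            totals[cpu] += count
    return totals


def busiest_share(totals):
    """Fraction of all counted interrupts handled by the single busiest CPU."""
    total = sum(totals)
    return max(totals) / total if total else 0.0
```

On a Linux host, per_cpu_totals(open("/proc/interrupts").read(), "eth0") returns one total per CPU. A busiest_share close to 1.0 would suggest that nearly all of the interface's interrupts land on one core.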

Comment 60 Steve Best 2011-04-21 12:41:38 UTC
(In reply to comment #59)
> I saw zero rx_missed_errors or tx_restart_queue errors when running irqbalance on my
> 4-core/4-queue system.
> 
> Testing needs to be done with irqbalance enabled as it would be unwise to go
> into production without irqs spread across multiple cores.

I agree with Andy on this. It is very unwise to run without irqbalance on these systems.

-Steve

Comment 61 Ashwani Wason 2011-04-21 16:23:02 UTC
Guys, I understand, and in production we will indeed use irqbalance. However, I want to make sure that we are not masking other issues just by running irqbalance.

For instance, the rx_missed_errors count is 0 on the previous kernel with the errata patch, even though irqbalance was not running.

Andy, can you please give me your bonding module binary that has your patch?

Comment 62 Neal Kim 2011-04-21 21:02:48 UTC
Bytemobile came back with some test results with the new kernel and irqbalance enabled. Results are not looking too good...

Results from testing with irqbalance.

Summary:

1. Test with ~300 mbps showed both rx_missed_errors and tx_restart_queue errors, but did not show any bonding flaps.

     rx_missed_errors: 1566
     tx_restart_queue: 328

2. Test with ~900 mbps showed all problems - huge jump in error counts plus more than 100 bonding flaps plus error reports (timeouts) from clients/servers.

     rx_missed_errors: 2780843
     tx_restart_queue: 3819


Details:

[root@s01b01 ~]# uname -r
2.6.32-71.25.1.el6.bytemobile.2.x86_64

INITIAL:

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85

NIC statistics:
     rx_packets: 8908
     tx_packets: 4946
     rx_bytes: 553348
     tx_bytes: 283370
     rx_pkts_nic: 8899
     tx_pkts_nic: 4946
     rx_bytes_nic: 591988
     tx_bytes_nic: 345004
     lsc_int: 1
     broadcast: 4470
     fdir_match: 473
     fdir_miss: 2909
     tx_queue_0_packets: 3525
     tx_queue_0_bytes: 211541
     tx_queue_1_packets: 5
     tx_queue_1_bytes: 875
     tx_queue_6_packets: 2
     tx_queue_6_bytes: 84
     tx_queue_7_packets: 1051
     tx_queue_7_bytes: 44358
     tx_queue_9_packets: 1
     tx_queue_9_bytes: 42
     tx_queue_12_packets: 326
     tx_queue_12_bytes: 23798
     tx_queue_13_packets: 10
     tx_queue_13_bytes: 640
     tx_queue_17_packets: 5
     tx_queue_17_bytes: 378
     tx_queue_19_packets: 1
     tx_queue_19_bytes: 42
     tx_queue_20_packets: 6
     tx_queue_20_bytes: 468
     tx_queue_21_packets: 6
     tx_queue_21_bytes: 468
     tx_queue_22_packets: 3
     tx_queue_22_bytes: 298
     tx_queue_23_packets: 5
     tx_queue_23_bytes: 378
     rx_queue_0_packets: 6072
     rx_queue_0_bytes: 364988
     rx_queue_1_packets: 128
     rx_queue_1_bytes: 8028
     rx_queue_2_packets: 124
     rx_queue_2_bytes: 7752
     rx_queue_3_packets: 130
     rx_queue_3_bytes: 8148
     rx_queue_4_packets: 144
     rx_queue_4_bytes: 14276
     rx_queue_5_packets: 125
     rx_queue_5_bytes: 7836
     rx_queue_6_packets: 162
     rx_queue_6_bytes: 10092
     rx_queue_7_packets: 125
     rx_queue_7_bytes: 7836
     rx_queue_8_packets: 120
     rx_queue_8_bytes: 7500
     rx_queue_9_packets: 159
     rx_queue_9_bytes: 9924
     rx_queue_10_packets: 908
     rx_queue_10_bytes: 62520
     rx_queue_11_packets: 130
     rx_queue_11_bytes: 8148
     rx_queue_12_packets: 190
     rx_queue_12_bytes: 11808
     rx_queue_13_packets: 127
     rx_queue_13_bytes: 7956
     rx_queue_14_packets: 140
     rx_queue_14_bytes: 8736
     rx_queue_15_packets: 124
     rx_queue_15_bytes: 7800

root     28626     1  0 11:16 ?        00:00:00 irqbalance


AFTER 300 MBPS TEST (ALSO SEE ATTACHMENT AFTER10K.LOG):

Ethernet Channel Bonding Driver: v3.5.0.bytemobile.2 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85



NIC statistics:
     rx_packets: 62324319
     tx_packets: 62712480
     rx_bytes: 51055046132
     tx_bytes: 51399831234
     rx_pkts_nic: 62726594
     tx_pkts_nic: 62712480
     rx_bytes_nic: 51903097995
     tx_bytes_nic: 51901606570
     lsc_int: 1
     broadcast: 13565
     fdir_match: 1490
     fdir_miss: 57822331
     rx_missed_errors: 1566
     tx_restart_queue: 328
     tx_queue_0_packets: 6575
     tx_queue_0_bytes: 2641391
     tx_queue_1_packets: 63628
     tx_queue_1_bytes: 61771013
     tx_queue_2_packets: 79771
     tx_queue_2_bytes: 78388746
     tx_queue_3_packets: 54688
     tx_queue_3_bytes: 45788135
     tx_queue_4_packets: 46114
     tx_queue_4_bytes: 34135640
     tx_queue_5_packets: 38103
     tx_queue_5_bytes: 27086875
     tx_queue_6_packets: 7776835
     tx_queue_6_bytes: 6334685408
     tx_queue_7_packets: 7736717
     tx_queue_7_bytes: 6340339291
     tx_queue_8_packets: 7736082
     tx_queue_8_bytes: 6400198315
     tx_queue_9_packets: 7763552
     tx_queue_9_bytes: 6281147229
     tx_queue_10_packets: 3929379
     tx_queue_10_bytes: 3255370817
     tx_queue_11_packets: 3903852
     tx_queue_11_bytes: 3230642314
     tx_queue_12_packets: 7395
     tx_queue_12_bytes: 3447302
     tx_queue_13_packets: 60124
     tx_queue_13_bytes: 66017736
     tx_queue_14_packets: 69340
     tx_queue_14_bytes: 63268384
     tx_queue_15_packets: 53605
     tx_queue_15_bytes: 55030184
     tx_queue_16_packets: 79399
     tx_queue_16_bytes: 74684849
     tx_queue_17_packets: 78207
     tx_queue_17_bytes: 71520028
     tx_queue_18_packets: 3848639
     tx_queue_18_bytes: 3115954944
     tx_queue_19_packets: 3774229
     tx_queue_19_bytes: 3077246786
     tx_queue_20_packets: 3805603
     tx_queue_20_bytes: 3021839218
     tx_queue_21_packets: 3953322
     tx_queue_21_bytes: 3283866444
     tx_queue_22_packets: 3965541
     tx_queue_22_bytes: 3297759626
     tx_queue_23_packets: 3881780
     tx_queue_23_bytes: 3177000559
     rx_queue_0_packets: 3917610
     rx_queue_0_bytes: 3144194796
     rx_queue_1_packets: 3869442
     rx_queue_1_bytes: 3199274917
     rx_queue_2_packets: 3889262
     rx_queue_2_bytes: 3175886218
     rx_queue_3_packets: 3934441
     rx_queue_3_bytes: 3237799626
     rx_queue_4_packets: 3914908
     rx_queue_4_bytes: 3205887477
     rx_queue_5_packets: 3999706
     rx_queue_5_bytes: 3329196466
     rx_queue_6_packets: 3985070
     rx_queue_6_bytes: 3305566651
     rx_queue_7_packets: 3837681
     rx_queue_7_bytes: 3050773153
     rx_queue_8_packets: 3818759
     rx_queue_8_bytes: 3114684210
     rx_queue_9_packets: 3884149
     rx_queue_9_bytes: 3146424689
     rx_queue_10_packets: 3935287
     rx_queue_10_bytes: 3258626529
     rx_queue_11_packets: 3967227
     rx_queue_11_bytes: 3290780486
     rx_queue_12_packets: 3942912
     rx_queue_12_bytes: 3224159656
     rx_queue_13_packets: 3957696
     rx_queue_13_bytes: 3291882552
     rx_queue_14_packets: 3938668
     rx_queue_14_bytes: 3243292247
     rx_queue_15_packets: 3933543
     rx_queue_15_bytes: 3182375494
     rx_queue_16_packets: 6
     rx_queue_16_bytes: 372
     rx_queue_17_packets: 15
     rx_queue_17_bytes: 924
     rx_queue_18_packets: 25
     rx_queue_18_bytes: 1864
     rx_queue_19_packets: 77
     rx_queue_19_bytes: 7284
     rx_queue_20_packets: 36
     rx_queue_20_bytes: 13175
     rx_queue_21_packets: 21
     rx_queue_21_bytes: 1272
     rx_queue_22_packets: 32
     rx_queue_22_bytes: 1968
     rx_queue_23_packets: 30
     rx_queue_23_bytes: 2530
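
In the statistics above, rx queues 0-15 each carry several million packets, while queues 16-23 carry fewer than a hundred each. A short Python sketch for flagging such near-idle queues in an "ethtool -S" dump is given here. The names rx_queue_packets and idle_queues, and the 0.1% threshold, are illustrative assumptions rather than anything defined by the driver.

```python
import re

# Matches lines such as "     rx_queue_16_packets: 6".
_RX_QUEUE = re.compile(r"^\s*rx_queue_(\d+)_packets:\s*(\d+)\s*$")


def rx_queue_packets(text):
    """Return {queue index: packet count} for every rx_queue_N_packets line."""
    queues = {}
    for line in text.splitlines():
        match = _RX_QUEUE.match(line)
        if match:
            queues[int(match.group(1))] = int(match.group(2))
    return queues


def idle_queues(queues, min_share=0.001):
    """List queues whose share of all received packets is below min_share."""
    total = sum(queues.values())
    if total == 0:
        return sorted(queues)  # nothing received: every queue is idle
    return sorted(q for q, count in queues.items() if count / total < min_share)
```

Run against the dump above, idle_queues would be expected to list queues 16 through 23. The same pattern with "tx_queue" in place of "rx_queue" would cover the transmit side.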



AFTER 900 MBPS TEST (ALSO SEE ATTACHMENT AFTER30K.LOG):


Ethernet Channel Bonding Driver: v3.5.0.bytemobile.2 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 103
Permanent HW addr: 00:1b:21:63:be:84

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:63:be:85


NIC statistics:
     rx_packets: 336849775
     tx_packets: 338917722
     rx_bytes: 274445104795
     tx_bytes: 277391846730
     rx_pkts_nic: 340538829
     tx_pkts_nic: 338917722
     rx_bytes_nic: 281238963659
     tx_bytes_nic: 280103482438
     lsc_int: 1
     broadcast: 36139
     fdir_match: 4551
     fdir_miss: 235008967
     rx_missed_errors: 2780843
     tx_restart_queue: 3819
     tx_queue_0_packets: 8832
     tx_queue_0_bytes: 3632539
     tx_queue_1_packets: 108573
     tx_queue_1_bytes: 99519963
     tx_queue_2_packets: 120821
     tx_queue_2_bytes: 114128230
     tx_queue_3_packets: 95608
     tx_queue_3_bytes: 80035210
     tx_queue_4_packets: 82101
     tx_queue_4_bytes: 59397236
     tx_queue_5_packets: 70329
     tx_queue_5_bytes: 51100025
     tx_queue_6_packets: 25141985
     tx_queue_6_bytes: 20479644198
     tx_queue_7_packets: 25184055
     tx_queue_7_bytes: 20695827576
     tx_queue_8_packets: 42169647
     tx_queue_8_bytes: 34673413944
     tx_queue_9_packets: 42377175
     tx_queue_9_bytes: 34675323260
     tx_queue_10_packets: 38497382
     tx_queue_10_bytes: 31512260551
     tx_queue_11_packets: 38025838
     tx_queue_11_bytes: 30949604823
     tx_queue_12_packets: 11390
     tx_queue_12_bytes: 5157808
     tx_queue_13_packets: 82304
     tx_queue_13_bytes: 91656681
     tx_queue_14_packets: 98408
     tx_queue_14_bytes: 95796901
     tx_queue_15_packets: 78372
     tx_queue_15_bytes: 79036160
     tx_queue_16_packets: 92477
     tx_queue_16_bytes: 84583488
     tx_queue_17_packets: 109149
     tx_queue_17_bytes: 91023342
     tx_queue_18_packets: 20882499
     tx_queue_18_bytes: 17055332799
     tx_queue_19_packets: 20926719
     tx_queue_19_bytes: 17087471244
     tx_queue_20_packets: 21063712
     tx_queue_20_bytes: 17084251428
     tx_queue_21_packets: 21191948
     tx_queue_21_bytes: 17469488036
     tx_queue_22_packets: 21513857
     tx_queue_22_bytes: 17776305129
     tx_queue_23_packets: 20984541
     tx_queue_23_bytes: 17077856159
     rx_queue_0_packets: 21019087
     rx_queue_0_bytes: 16889582096
     rx_queue_1_packets: 21123307
     rx_queue_1_bytes: 17151698621
     rx_queue_2_packets: 21293746
     rx_queue_2_bytes: 17378592541
     rx_queue_3_packets: 21265298
     rx_queue_3_bytes: 17346367315
     rx_queue_4_packets: 21487526
     rx_queue_4_bytes: 17609268944
     rx_queue_5_packets: 21479679
     rx_queue_5_bytes: 17502449811
     rx_queue_6_packets: 21198023
     rx_queue_6_bytes: 17230586371
     rx_queue_7_packets: 21507379
     rx_queue_7_bytes: 17572862206
     rx_queue_8_packets: 21172268
     rx_queue_8_bytes: 17327191064
     rx_queue_9_packets: 21263451
     rx_queue_9_bytes: 17246993917
     rx_queue_10_packets: 21201705
     rx_queue_10_bytes: 17295837738
     rx_queue_11_packets: 21116910
     rx_queue_11_bytes: 17260249135
     rx_queue_12_packets: 21227946
     rx_queue_12_bytes: 17268783706
     rx_queue_13_packets: 21517690
     rx_queue_13_bytes: 17669696928
     rx_queue_14_packets: 21389910
     rx_queue_14_bytes: 17497790472
     rx_queue_15_packets: 21273147
     rx_queue_15_bytes: 17413189754
     rx_queue_16_packets: 42
     rx_queue_16_bytes: 2592
     rx_queue_17_packets: 83
     rx_queue_17_bytes: 5148
     rx_queue_18_packets: 206
     rx_queue_18_bytes: 141206
     rx_queue_19_packets: 222
     rx_queue_19_bytes: 60938
     rx_queue_20_packets: 204
     rx_queue_20_bytes: 62862
     rx_queue_21_packets: 731
     rx_queue_21_bytes: 51239
     rx_queue_22_packets: 107
     rx_queue_22_bytes: 9457
     rx_queue_23_packets: 171
     rx_queue_23_bytes: 16679"

Comment 64 Andy Gospodarek 2011-04-21 21:12:15 UTC
I'll take a closer look at Flavio's test kernels.

Comment 65 Ashwani Wason 2011-04-21 21:18:50 UTC
Created attachment 494002 [details]
Attachment referred to in comment #62

Comment 66 Andy Gospodarek 2011-04-22 00:24:40 UTC
I just ran for a few hours with Flavio's test kernel and it appears to work fine for me.  This is odd.

[root@localhost ~]# ethtool -S eth0 | grep -v :\ 0$ ; cat /proc/net/bonding/bond0 ; grep eth0 /proc/interrupts ; uname -a 
NIC statistics:
     rx_packets: 902863401
     tx_packets: 902862962
     rx_bytes: 1241182796831
     tx_bytes: 1241182653994
     rx_pkts_nic: 902719558
     tx_pkts_nic: 902719045
     rx_bytes_nic: 1248206560399
     tx_bytes_nic: 1248206497636
     lsc_int: 2
     multicast: 472
     broadcast: 170
     fdir_match: 2
     fdir_miss: 902708086
     tx_queue_0_packets: 482287
     tx_queue_0_bytes: 661088900
     tx_queue_1_packets: 26
     tx_queue_1_bytes: 2460
     tx_queue_2_packets: 10087
     tx_queue_2_bytes: 427845
     tx_queue_3_packets: 902370562
     tx_queue_3_bytes: 1240521134789
     rx_queue_0_packets: 12575
     rx_queue_0_bytes: 761063
     rx_queue_1_packets: 96
     rx_queue_1_bytes: 20097
     rx_queue_2_packets: 17
     rx_queue_2_bytes: 2031
     rx_queue_3_packets: 902850823
     rx_queue_3_bytes: 1241182190400
Ethernet Channel Bonding Driver: v3.5.0.bytemobile.2 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.2.200

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:37:b7:20
 36:         32      49347      19386          0   PCI-MSI-edge      eth0-TxRx-0
 37:         22          0       5633         10   PCI-MSI-edge      eth0-TxRx-1
 38:         22          0      15615         17   PCI-MSI-edge      eth0-TxRx-2
 39:      49511          0          0   91856880   PCI-MSI-edge      eth0-TxRx-3
 40:          2          0          0          0   PCI-MSI-edge      eth0:lsc
Linux localhost.localdomain 2.6.32-71.25.1.el6.bytemobile.2.x86_64 #1 SMP Tue Apr 19 23:29:55 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
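
The `grep -v :\ 0$` filter in the command above hides every counter whose value is 0. For anyone post-processing saved `ethtool -S` output, here is a minimal Python sketch of the same filtering. It assumes the `name: value` layout shown above; the function name `nonzero_counters` and the sample text are illustrative only and are not part of any tool used in this bug.

```python
def nonzero_counters(ethtool_output):
    """Return {counter: value} for every counter whose value is not 0."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(": ")
        if not sep or not value.strip().isdigit():
            continue  # skips the "NIC statistics:" header and blank lines
        if int(value) != 0:
            counters[name] = int(value)
    return counters


# Illustrative sample; the zero-valued line is made up to show the filtering.
sample = """NIC statistics:
     rx_packets: 902863401
     rx_missed_errors: 0
     lsc_int: 2
"""
print(nonzero_counters(sample))
```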

Comment 67 Ashwani Wason 2011-04-22 00:35:46 UTC
Andy, can you please cross-check the md5sums below?

[root@s01b01 ~]# md5sum /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko
d423e66d274294f23f6bca1c0b34de8f /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko

[root@s01b01 ~]# md5sum /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/bonding/bonding.ko
fa71b105a92e583ba408ec75a5294e21 /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/bonding/bonding.ko

Comment 68 Andy Gospodarek 2011-04-25 18:09:03 UTC
# md5sum /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/bonding/bonding.ko /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko
fa71b105a92e583ba408ec75a5294e21  /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/bonding/bonding.ko
d423e66d274294f23f6bca1c0b34de8f  /lib/modules/2.6.32-71.25.1.el6.bytemobile.2.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko

Looks the same.

Comment 69 Aristeu Rozanski 2011-04-27 14:17:31 UTC
Patch(es) available on kernel-2.6.32-131.0.9.el6

Comment 72 Neal Kim 2011-04-28 23:35:17 UTC
I have been doing some testing with the hardware that we received from IBM. In short, I have been unable to recreate the bond flapping (with irqbalancing enabled) or the increasing rx_missed_errors, although I have been able to recreate the tx_restart_queue behaviour.

I have 3 IBM blades in my testing environment, one HS22V and two HS21 blades in the same chassis with identical Intel 10GB dual-port NICs (82599EB), all connected to the same 10GB switch (in-chassis). Unfortunately we do not have an additional switch, so there is only one active 10GB link per blade.

[ VLAN Configuration ]

Alias Port Tag Fast PVID      NAME                  VLAN(s)
----- ---- --- ---- ---- -------------- -------------------------------
HS22V   1   y   n    130 INT1           130 1000 2000 
HS21    4   n   n   1000 INT4           1000 
HS21    5   n   n   2000 INT5           2000 

HS22V acting as router-on-a-stick (ibm-hs22v-01)
eth0(backup no-link)/eth1(active) bonded bond0

With the following VLANs configured on bond0:
VLAN 1000 (192.168.1.0/24)
VLAN 2000 (192.168.2.0/24)

Server/Client machines connected to their respective VLAN:
ibm-hs21-03 (VLAN 1000) 192.168.1.2 (server)
ibm-hs21-04 (VLAN 2000) 192.168.2.2 (client)

For my testing I created 197 vips on both the server and client systems. I have a script on the server host that starts 197 netserver instances, listening on each respective vip. I have another script on the client host that starts 197 netperf processes, which maps to a corresponding netserver instance on the server host.

My netperf instances ran with a modified duration of 60sec.

netperf -L 192.168.2.${i} -H 192.168.1.${i} -l 60 &
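
The client-side launcher script itself is not attached to this bug. As a rough sketch of what such a script could look like, the Python below builds one netperf command line per VIP from the command shown above. Only the `-L`, `-H`, and `-l 60` options come from the comment; the host-octet range (2 through 198, giving 197 addresses) and the function name `netperf_commands` are assumptions made for illustration.

```python
def netperf_commands(first_octet=2, count=197, duration=60):
    """Build one netperf argv list per VIP pair (client VIP -> server VIP)."""
    commands = []
    for i in range(first_octet, first_octet + count):
        commands.append([
            "netperf",
            "-L", f"192.168.2.{i}",   # local (client) VIP on VLAN 2000
            "-H", f"192.168.1.{i}",   # remote (server) VIP on VLAN 1000
            "-l", str(duration),      # test length in seconds
        ])
    return commands


commands = netperf_commands()
print(len(commands))
print(" ".join(commands[0]))
```

Each list could then be handed to `subprocess.Popen` to run the instances in parallel, which mirrors the trailing `&` in the shell form.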

I tested the following kernel versions:

kernel-2.6.32-71.el6.x86_64
kernel-2.6.32-71.25.1.el6.bytemobile.1.x86_64
kernel-2.6.32-71.25.1.el6.bytemobile.2.x86_64

With each kernel version I ran the following test iterations:

----------------------------------------
default rx/tx ring buffer size of 512
irqbalancing disabled
irqbalancing enabled
----------------------------------------
modified rx/tx ring buffer size of 4096
irqbalancing disabled
irqbalancing enabled
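
The kernel list and the iterations above amount to a 3 x 2 x 2 test matrix. A small Python sketch that enumerates it, using only the kernel names, ring sizes, and irqbalance states listed in this comment (the variable names are illustrative):

```python
from itertools import product

KERNELS = [
    "kernel-2.6.32-71.el6.x86_64",
    "kernel-2.6.32-71.25.1.el6.bytemobile.1.x86_64",
    "kernel-2.6.32-71.25.1.el6.bytemobile.2.x86_64",
]
RING_SIZES = [512, 4096]              # rx/tx ring buffer sizes
IRQBALANCE = ["disabled", "enabled"]

# Every combination of kernel, ring size, and irqbalance state.
test_matrix = list(product(KERNELS, RING_SIZES, IRQBALANCE))

for kernel, ring, irq in test_matrix:
    print(f"{kernel}  ring={ring}  irqbalance={irq}")
```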

* With all kernel versions and irqbalancing disabled, I had numerous bond interface flapping events.
* With all kernel versions and irqbalancing enabled, I could not replicate the bond interface flapping.
* With all kernel versions and irqbalancing enabled/disabled, I could not replicate rx_missed_errors.
* With all kernel versions and irqbalancing enabled/disabled, the tx_restart_queue value increased.

One thing I noticed that may or may not be relevant: with every test I had some rx queues that didn't seem to be utilized:

     rx_queue_0_packets: 3089870
     rx_queue_0_bytes: 1804042009
     rx_queue_1_packets: 4005624
     rx_queue_1_bytes: 4061120895
     rx_queue_2_packets: 4235923
     rx_queue_2_bytes: 3443838722
     rx_queue_3_packets: 3822445
     rx_queue_3_bytes: 1869651553
     rx_queue_4_packets: 4344733
     rx_queue_4_bytes: 3427483087
     rx_queue_5_packets: 4576337
     rx_queue_5_bytes: 3675215658
     rx_queue_6_packets: 3455725
     rx_queue_6_bytes: 1670861973
     rx_queue_7_packets: 3086839
     rx_queue_7_bytes: 1730918582
     rx_queue_8_packets: 4405062
     rx_queue_8_bytes: 3208447459
     rx_queue_9_packets: 3679678
     rx_queue_9_bytes: 2242165199
     rx_queue_10_packets: 4691184
     rx_queue_10_bytes: 3730180979
     rx_queue_11_packets: 4096033
     rx_queue_11_bytes: 3477832483
     rx_queue_12_packets: 2813715
     rx_queue_12_bytes: 1795775431
     rx_queue_13_packets: 2474769
     rx_queue_13_bytes: 1623111649
     rx_queue_14_packets: 2392125
     rx_queue_14_bytes: 1670762842
     rx_queue_15_packets: 4507918
     rx_queue_15_bytes: 3347957742
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0
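
Idle queues like the ones above can be picked out of saved `ethtool -S` text automatically. This is a minimal Python sketch, assuming the `rx_queue_N_packets: value` naming shown in the output; the function name `idle_queues` is hypothetical, and the sample is a short excerpt of the listing above.

```python
import re

# Matches lines such as "     rx_queue_16_packets: 0".
QUEUE_RE = re.compile(r"^\s*(rx|tx)_queue_(\d+)_packets:\s*(\d+)\s*$")


def idle_queues(ethtool_output, direction="rx"):
    """Return the sorted queue indices whose packet counter is 0."""
    idle = []
    for line in ethtool_output.splitlines():
        match = QUEUE_RE.match(line)
        if match and match.group(1) == direction and int(match.group(3)) == 0:
            idle.append(int(match.group(2)))
    return sorted(idle)


sample = """     rx_queue_15_packets: 4507918
     rx_queue_15_bytes: 3347957742
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
"""
print(idle_queues(sample))  # [16, 17]
```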

Comment 74 Ashwani Wason 2011-04-29 17:10:28 UTC
(In reply to comment #72)
> For my testing I created 197 vips on both the server and client systems. I have
> a script on the server host that starts 197 netserver instances, listening on
> each respective vip. I have another script on the client host that starts 197
> netperf processes, which maps to a corresponding netserver instance on the
> server host.
> 
> My netperf instances ran with a modified duration of 60sec.
> 
> netperf -L 192.168.2.${i} -H 192.168.1.${i} -l 60 &
> 

Neal, can you please quantify what this means in terms of data rate, connection rate, packet size, etc?

Comment 75 Neal Kim 2011-04-29 17:34:03 UTC
Hi Ashwani,

In terms of throughput, I was able to achieve roughly 4800 Mbps.

Comment 76 Ashwani Wason 2011-04-29 17:45:16 UTC
Thanks Neal - that is way over what I was running with. Let us discuss further options offline.

Comment 77 Ashwani Wason 2011-05-04 04:40:45 UTC
Neal, Andy, I am able to reproduce the exact same problem with a standard RHEL installation on a hard disk. It has RHEL 6.0 installed, and on top of it I installed the new kernel (2.6.32-71.25.1.el6.bytemobile.2.x86_64).

I am seeing all problems:

[root@s01b11 ~]# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.5.0.bytemobile.2 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 169.254.144.20

Slave Interface: eth0
MII Status: up
Link Failure Count: 19
Permanent HW addr: 00:1b:21:72:9f:24

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:72:9f:25



[root@s01b11 ~]# ethtool -S eth0 | egrep 'rx_missed|tx_restart_q'
     rx_missed_errors: 2439591
     tx_restart_queue: 2521
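
For tracking flapping over time, the per-slave link failure counts can be pulled out of the `/proc/net/bonding/bond1` text shown above. This is a minimal Python sketch that assumes the `Slave Interface:` / `Link Failure Count:` layout in that output; the function name `link_failure_counts` is hypothetical.

```python
def link_failure_counts(bonding_status):
    """Map each slave interface name to its Link Failure Count."""
    counts = {}
    current = None
    for line in bonding_status.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Link Failure Count" and current is not None:
            counts[current] = int(value)
    return counts


sample = """Slave Interface: eth0
MII Status: up
Link Failure Count: 19
Permanent HW addr: 00:1b:21:72:9f:24

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1b:21:72:9f:25
"""
print(link_failure_counts(sample))
```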


Per our discussion I am in the process of providing this lab to you via VPN. Should be done sometime tomorrow. I will let Neal know when it is ready via our conversation on the support portal.

Comment 78 Martin Prpič 2011-05-05 08:53:10 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
During light or no network traffic, the active-backup interface bond using ARP monitoring with validation could go down and return due to an overflow or underflow of system timer interrupt ticks (jiffies). With this update, the jiffies calculation problems have been fixed and a bond interface works as expected.
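
The patch itself is not reproduced in this bug, so the following is only a general illustration of the class of problem the note describes, not the actual fix. It assumes a 32-bit tick counter for readability and compares a plain `>` check with a wraparound-tolerant check in the spirit of the kernel's `time_after()` helper; the Python function names are made up for this sketch.

```python
BITS = 32                    # counter width assumed for illustration
MASK = (1 << BITS) - 1


def naive_after(a, b):
    """Plain comparison: gives the wrong answer once the counter wraps."""
    return a > b


def wrap_safe_after(a, b):
    """True if tick a is after tick b, based on the sign of (b - a)."""
    diff = (b - a) & MASK
    return diff >= (1 << (BITS - 1))   # sign bit set means (b - a) < 0


last_rx = MASK - 5   # timestamp taken just before the counter wraps
now = 5              # current tick count, just after the wrap
elapsed = (now - last_rx) & MASK       # modular subtraction survives the wrap

print(naive_after(now, last_rx), wrap_safe_after(now, last_rx), elapsed)
```

With the naive check, `now` appears to be earlier than `last_rx`, so a monitor relying on it could misjudge how recently traffic was seen; the masked subtraction still reports the true 11-tick gap.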

Comment 80 errata-xmlrpc 2011-05-19 12:02:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

