Bug 653832 - bonding failover in every monitor interval with virtio-net driver
Summary: bonding failover in every monitor interval with virtio-net driver
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jiri Olsa
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 653828
Blocks:
Reported: 2010-11-16 09:29 UTC by Mark Wu
Modified: 2011-01-04 14:47 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 653828
Environment:
Last Closed: 2011-01-04 14:47:26 UTC
Target Upstream Version:
Embargoed:


Description Mark Wu 2010-11-16 09:29:19 UTC
+++ This bug was initially created as a clone of Bug #653828 +++

Description of problem:


Version-Release number of selected component (if applicable):
guest OS:
kernel-2.6.18-194
host:
KVM host and version not related.

How reproducible:
100%

Steps to Reproduce:
1. create a virtual guest with two virtio network interfaces
2. configure a bonding interface with the option BONDING_OPTS="mode=active-backup arp_interval=2000 arp_ip_target=192.168.122.1" 

  
Actual results:
The bond fails over on every ARP monitor inspection:

Nov  4 15:56:09 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov  4 15:56:11 localhost kernel: bonding: bond0: link status definitely up for interface eth1.
Nov  4 15:56:13 localhost kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Nov  4 15:56:13 localhost kernel: bonding: bond0: making interface eth1 the new active one.
Nov  4 15:56:15 localhost kernel: bonding: bond0: link status definitely up for interface eth0.
Nov  4 15:56:17 localhost kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
Nov  4 15:56:17 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov  4 15:56:19 localhost kernel: bonding: bond0: link status definitely up for interface eth1.
Nov  4 15:56:21 localhost kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Nov  4 15:56:21 localhost kernel: bonding: bond0: making interface eth1 the new active one.
Nov  4 15:56:23 localhost kernel: bonding: bond0: link status definitely up for interface eth0.
Nov  4 15:56:25 localhost kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
Nov  4 15:56:25 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov  4 15:56:27 localhost kernel: bonding: bond0: link status definitely up for interface eth1.
Nov  4 15:56:29 localhost kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Nov  4 15:56:29 localhost kernel: bonding: bond0: making interface eth1 the new active one.
Nov  4 15:56:31 localhost kernel: bonding: bond0: link status definitely up for interface eth0.
Nov  4 15:56:33 localhost kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
Nov  4 15:56:33 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov  4 15:56:35 localhost kernel: bonding: bond0: link status definitely up for interface eth1.


Expected results:
Bonding works correctly with the virtio-net driver.

Additional info:
Because neither ethtool nor mii-tool can detect the link state of a virtio interface, the ARP monitor is used to detect link failure. Link-state detection is not meaningful for a virtio interface.

This issue occurs because the virtio driver doesn't update the rx and tx timestamps.
The current active slave is marked down by the following check, because trans_start is never updated:
                /*
                 * Active slave is down if:
                 * - more than 2*delta since transmitting OR
                 * - (more than 2*delta since receive AND
                 *    the bond has an IP address)
                 */
                if ((slave->state == BOND_STATE_ACTIVE) &&
                    (time_after_eq(jiffies, slave->dev->trans_start +
                                    2 * delta_in_ticks) ||
                      (time_after_eq(jiffies, slave_last_rx(bond, slave)
                                     + 2 * delta_in_ticks)))) {
                        slave->new_link = BOND_LINK_DOWN;
                        commit++;
                }
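
The check quoted above can be exercised outside the kernel. What follows is a minimal user-space sketch, not the bonding driver's code: `struct slave_model`, `ticks_after_eq` and `active_slave_is_down` are hypothetical names, and the 32-bit tick counter is an assumption standing in for jiffies. It shows that a transmit timestamp which is never refreshed trips the condition even when receives are recent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is at or after b", modelled on the kernel's time_after_eq().
 * Assumes the usual two's-complement conversion to int32_t. */
static bool ticks_after_eq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Hypothetical stand-in for the fields the quoted check reads. */
struct slave_model {
    bool     active;       /* models slave->state == BOND_STATE_ACTIVE */
    uint32_t trans_start;  /* last transmit time, in ticks */
    uint32_t last_rx;      /* last receive time, in ticks */
};

/* Mirrors the quoted condition: the active slave is declared down when
 * either timestamp is at least 2 * delta ticks old. */
static bool active_slave_is_down(const struct slave_model *s,
                                 uint32_t now, uint32_t delta)
{
    assert(delta > 0);
    return s->active &&
           (ticks_after_eq(now, s->trans_start + 2 * delta) ||
            ticks_after_eq(now, s->last_rx + 2 * delta));
}
```

With `trans_start` left at 0, `last_rx` equal to the current tick, and `delta` of 200 ticks, the function returns true as soon as the tick counter reaches 400, which is the situation described for the unpatched driver.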

Although virtio-net doesn't update last_rx either, it is updated by the following code:

static inline int skb_bond_should_drop(struct sk_buff *skb)
{
        struct net_device *dev = skb->dev;
        struct net_device *master = dev->master;

        if (master) {
                if (master->priv_flags & IFF_MASTER_ARPMON)
                        dev->last_rx = jiffies;
        ...

So the failed interface can come back up at the next ARP inspection and then be chosen at the next failover, because this check only looks at last_rx:
                if (slave->link != BOND_LINK_UP) {
                        if (time_before_eq(jiffies, slave_last_rx(bond, slave) +
                                           delta_in_ticks)) {
                                slave->new_link = BOND_LINK_UP;
                                commit++;
                        }

I then updated the rx and tx timestamps in the virtio driver, and the patched virtio driver works correctly with bonding.
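
The interaction between the two checks can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the bonding driver: time is counted in whole monitor intervals, both slaves are assumed to receive frames every interval (so the bonding receive path keeps last_rx fresh), and `count_failovers` and `struct slave_model` are hypothetical names. It is not meant to reproduce the exact cadence seen in the log, only the mechanism: a stale trans_start takes the active slave down, and a fresh last_rx brings it straight back.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLAVES 2

/* Hypothetical per-slave state; times are in monitor intervals. */
struct slave_model {
    bool up;                  /* link state as seen by the ARP monitor */
    unsigned int trans_start; /* last transmit time */
    unsigned int last_rx;     /* last receive time */
};

/* Returns how many failovers happen over `intervals` monitor runs. */
static int count_failovers(bool driver_updates_trans_start,
                           unsigned int intervals)
{
    struct slave_model s[NUM_SLAVES] = {
        { true, 0, 0 },
        { true, 0, 0 },
    };
    int active = 0;
    int failovers = 0;

    assert(intervals > 0);

    for (unsigned int now = 1; now <= intervals; now++) {
        bool new_up[NUM_SLAVES];

        /* Traffic: the bonding receive path refreshes last_rx on every
         * slave; only a driver that maintains trans_start refreshes it. */
        for (int i = 0; i < NUM_SLAVES; i++)
            s[i].last_rx = now;
        if (driver_updates_trans_start)
            s[active].trans_start = now;

        /* Inspect: a down slave comes up on a recent receive; the active
         * slave goes down when either timestamp is 2 intervals old. */
        for (int i = 0; i < NUM_SLAVES; i++) {
            new_up[i] = s[i].up;
            if (!s[i].up) {
                if (now - s[i].last_rx <= 1)
                    new_up[i] = true;
            } else if (i == active &&
                       (now - s[i].trans_start >= 2 ||
                        now - s[i].last_rx >= 2)) {
                new_up[i] = false;
            }
        }

        /* Commit: apply the new link states, then fail over if needed. */
        for (int i = 0; i < NUM_SLAVES; i++)
            s[i].up = new_up[i];
        if (!s[active].up) {
            for (int i = 0; i < NUM_SLAVES; i++) {
                if (s[i].up) {
                    active = i;
                    failovers++;
                    break;
                }
            }
        }
    }
    return failovers;
}
```

In this model, calling `count_failovers(false, 10)` flaps between the two slaves on nearly every interval, while `count_failovers(true, 10)` never fails over, which matches the behaviour reported for the unpatched and patched drivers respectively.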

--- Additional comment from dwu on 2010-11-16 04:28:22 EST ---

Created attachment 460793
update timestamp of rx and tx in virtio-net driver

Comment 2 Neil Horman 2010-11-18 15:48:01 UTC
Triage assignment. If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment.

Comment 3 Jiri Olsa 2011-01-03 17:23:42 UTC
I cannot reproduce this one in RHEL6, while I can on the same setup with RHEL5 (which is taken care of via BZ 653828).. so I believe this is not an issue for RHEL6

as for RHEL5:
it seems the last_rx timestamp should be taken care of by the skb_bond_should_drop
function.. which is already present in RHEL5.. so my guess is that the proposed
patch is probably a workaround, and we might need a different fix for this issue
on RHEL5

if I don't hear from you otherwise, I'll close this one in a week..

thanks,
jirka

Comment 4 Jiri Olsa 2011-01-04 14:47:26 UTC
closing, as the reason/fix was found for RHEL5 BZ 653828

