Description of problem:
A bond in active-backup mode with the ARP monitor fails over on every ARP inspection when the slaves are virtio network interfaces.

Version-Release number of selected component (if applicable):
guest OS: kernel-2.6.18-194
host: KVM host and version not related.

How reproducible:
100%

Steps to Reproduce:
1. create a virtual guest with two virtio network interfaces
2. configure a bonding interface with the option BONDING_OPTS="mode=active-backup arp_interval=2000 arp_ip_target=192.168.122.1"

Actual results:
The bond fails over on every ARP inspection:

Nov 4 15:56:09 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov 4 15:56:11 localhost kernel: bonding: bond0: link status definitely up for interface eth1.
Nov 4 15:56:13 localhost kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Nov 4 15:56:13 localhost kernel: bonding: bond0: making interface eth1 the new active one.
Nov 4 15:56:15 localhost kernel: bonding: bond0: link status definitely up for interface eth0.
Nov 4 15:56:17 localhost kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
Nov 4 15:56:17 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov 4 15:56:19 localhost kernel: bonding: bond0: link status definitely up for interface eth1.
Nov 4 15:56:21 localhost kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Nov 4 15:56:21 localhost kernel: bonding: bond0: making interface eth1 the new active one.
Nov 4 15:56:23 localhost kernel: bonding: bond0: link status definitely up for interface eth0.
Nov 4 15:56:25 localhost kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
Nov 4 15:56:25 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov 4 15:56:27 localhost kernel: bonding: bond0: link status definitely up for interface eth1.
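For reference, a minimal RHEL 5-style configuration matching step 2 might look like the sketch below. Only the BONDING_OPTS line comes from this report; the file paths, IP address, and the eth0 stanza are illustrative assumptions:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative sketch)
DEVICE=bond0
IPADDR=192.168.122.100      # example address, not from the report
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=active-backup arp_interval=2000 arp_ip_target=192.168.122.1"

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

Note that the bond must carry an IP address for the ARP monitor's receive check to apply, per the comment in the bonding code quoted below.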
Nov 4 15:56:29 localhost kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Nov 4 15:56:29 localhost kernel: bonding: bond0: making interface eth1 the new active one.
Nov 4 15:56:31 localhost kernel: bonding: bond0: link status definitely up for interface eth0.
Nov 4 15:56:33 localhost kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
Nov 4 15:56:33 localhost kernel: bonding: bond0: making interface eth0 the new active one.
Nov 4 15:56:35 localhost kernel: bonding: bond0: link status definitely up for interface eth1.

Expected results:
Bonding works fine with the virtio-net driver.

Additional info:
Because neither ethtool nor mii-tool is available to detect the link state of a virtio interface, the ARP monitor is used to detect link failure; MII-style link detection does not make sense for a virtio interface. This issue is caused by the virtio driver not updating the rx and tx timestamps. The current slave cannot pass the following check, because trans_start is never updated:

/*
 * Active slave is down if:
 * - more than 2*delta since transmitting OR
 * - (more than 2*delta since receive AND
 *    the bond has an IP address)
 */
if ((slave->state == BOND_STATE_ACTIVE) &&
    (time_after_eq(jiffies, slave->dev->trans_start +
                   2 * delta_in_ticks) ||
     (time_after_eq(jiffies, slave_last_rx(bond, slave) +
                    2 * delta_in_ticks)))) {
        slave->new_link = BOND_LINK_DOWN;
        commit++;
}

Although last_rx is also not maintained by virtio-net itself, it is updated in the following code:

static inline int skb_bond_should_drop(struct sk_buff *skb)
{
        struct net_device *dev = skb->dev;
        struct net_device *master = dev->master;

        if (master) {
                if (master->priv_flags & IFF_MASTER_ARPMON)
                        dev->last_rx = jiffies;
        ...
So the failed interface can come back up on the next ARP inspection and then be chosen on the next failover, because the up check only looks at last_rx:

if (slave->link != BOND_LINK_UP) {
        if (time_before_eq(jiffies,
                           slave_last_rx(bond, slave) + delta_in_ticks)) {
                slave->new_link = BOND_LINK_UP;
                commit++;
        }

I then updated the rx and tx timestamps in the virtio driver, and the patched virtio driver works fine with bonding.
Created attachment 460793 [details] update timestamp of rx and tx in virtio-net driver
Is it a Windows guest? You opened the report against virtio-win, not kvm. We would rather fix this on the RHEL 6 host, so please retest on RHEL 6.
Dor, it should be a bug in the virtio-net driver. I am not sure how the component became virtio-win; sorry for the confusion. According to the code, it should also affect RHEL 6. Anyway, I will retest on a RHEL 6 guest.
Thanks for the patch. The part for last_rx is not needed, as it is already updated in skb_bond_should_drop, while trans_start is indeed missing and needs to be taken care of. Please test the following kernel; it fixes the issue for me. http://people.redhat.com/jolsa/653828/ I'll post the change soon, thanks.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-243.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Reproduced with guest kernel-2.6.18-238.el5.

1. Boot guest with 2 virtio NICs:

/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet \
    -rtc-td-hack -startdate now -name rhel5-verify -smp 2,cores=2 -k en-us \
    -m 1G -boot c \
    -net nic,vlan=1,macaddr=00:1a:2a:42:29:10,model=virtio \
    -net tap,vlan=1,script=/etc/qemu-ifup,downscript=no \
    -net nic,vlan=2,macaddr=00:1a:2a:42:25:12,model=virtio \
    -net tap,vlan=2,script=/etc/qemu-ifup,downscript=no \
    -drive file=/media/rhel5.6-64.qcow2,media=disk,if=virtio,cache=off,boot=on,format=qcow2,werror=stop \
    -cpu qemu64,+sse2 -M rhel5.6.0 -notify all -balloon none -monitor stdio -vnc :10

2. Configure a bond0 for eth0 and eth1.
3. Restart guest network.

Result: dmesg in guest:

bonding: bond0: link status definitely down for interface eth1, disabling it
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth0, disabling it
bonding: bond0: making interface eth1 the new active one.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: link status definitely down for interface eth1, disabling it
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth0, disabling it
bonding: bond0: making interface eth1 the new active one.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: link status definitely down for interface eth1, disabling it
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth0, disabling it
bonding: bond0: making interface eth1 the new active one.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: link status definitely down for interface eth1, disabling it
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth0, disabling it
bonding: bond0: making interface eth1 the new active one.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: link status definitely down for interface eth1, disabling it
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth0, disabling it
bonding: bond0: making interface eth1 the new active one.

Verified pass with guest kernel-2.6.18-262.el5, tested with the same steps. In guest dmesg:

bonding: bond0: making interface eth0 the new active one.
bonding: bond0: first active interface up!
bonding: bond0: enslaving eth0 as an active interface with an up link.
bonding: bond0: Adding slave eth1.
bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
bonding: bond0: enslaving eth1 as a backup interface with an up link.

Then I used set_link to bring down one network interface and pinged arp_ip_target; the ping still succeeded. After set_link down on the second network interface, arp_ip_target could not be pinged. After set_link up on one network interface, arp_ip_target could be pinged again. So this bug is verified as passed.
According to comment 10, setting this issue as verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html