Bug 1317457

Summary: [RFE] Engine should warn admin about bad 802.3ad status
Product: [oVirt] ovirt-engine Reporter: Yaniv Lavi <ylavi>
Component: RFEs Assignee: Marcin Mirecki <mmirecki>
Status: CLOSED CURRENTRELEASE QA Contact: Mor <mkalfon>
Severity: medium Docs Contact:
Priority: high    
Version: 3.6.0 CC: bgraveno, bugs, danken, gklein, gveitmic, inetkach, lsurette, mburman, mkalinin, mmirecki, myakove, rbalakri, srevivo, ykaul, ylavi
Target Milestone: ovirt-4.0.2 Keywords: FutureFeature
Target Release: --- Flags: rule-engine: ovirt-4.0.z+
mburman: testing_plan_complete+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
For a bond in mode 4 (link aggregation mode), all slaves must be configured properly on the switch side. If none of them are configured on the switch, the host-side kernel reports the ad_partner_mac as 00:00:00:00:00:00. This update retrieves the partner MAC address and warns the Manager user if the bond is configured incorrectly. No warning is given if only one of the slaves is up and running.
Story Points: ---
Clone Of: 1281666 Environment:
Last Closed: 2016-08-12 14:25:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1240719    
Bug Blocks: 902971, 1281666, 1397265    

Description Yaniv Lavi 2016-03-14 09:55:40 UTC
LACP Bond Bad Status Warning

Description of problem:
Many different issues arise from missing configuration on the switch side of LACP bonds. It is very easy to set up nodes in bond mode 4, or to change some configuration, and forget to make the required change on the switch side.

Various different outcomes of bad LACP bonds:
- Applications on the VMs running slow
- Storage domain connection problems
- Flipping states for Hypervisors (non-operational, non-responsive...)
- Missing pings
- TCP resets, reordering, retransmissions
- Various timeouts
- VMs unable to communicate, flipping communication for VMs.

cat /proc/net/bonding/bond0
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00 <---------If config is ok, this
                                              should be the switch MAC address

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:b1:1c:2a:81:0f
Aggregator ID: 1    <---------- Aggregator ID should be the same for all ports
Slave queue ID: 0

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:b1:1c:2a:81:11
Aggregator ID: 2     <--------- Aggregator ID should be the same for all ports
Slave queue ID: 0

Aggregator ID could be different in this case:
http://unix.stackexchange.com/questions/82569/bonds-vs-aggregators/172232#172232

Comment 1 Mike McCune 2016-03-28 23:08:19 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 2 Sandro Bonazzola 2016-05-02 09:51:49 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 3 Michael Burman 2016-06-06 08:58:23 UTC
[root@orchid-vds2 ~]# vdsClient -s 0 getVdsCaps |grep aggreg
                              'ad_aggregator_id': '3',
                              'ad_aggregator_id': '1',
        nics = {'dummy_0': {'ad_aggregator_id': '3',
                'dummy_1': {'ad_aggregator_id': '4',
                'dummy_3': {'ad_aggregator_id': '1',
                'dummy_4': {'ad_aggregator_id': '2',

[root@orchid-vds2 ~]# vdsClient -s 0 getVdsCaps |grep partner 
                              'ad_partner_mac': '00:00:00:00:00:00',
                              'ad_partner_mac': '00:00:00:00:00:00',

- A partner MAC of all zeros should be considered a bad bond status.
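
A consumer of the reported capabilities could apply the same rule to the ad_partner_mac and ad_aggregator_id values. The sketch below is illustrative only: the function name find_bad_bonds is hypothetical, and the dictionary layout (a "bondings" map whose entries carry "slaves", "ad_partner_mac" and "ad_aggregator_id", plus a "nics" map carrying "ad_aggregator_id") is an assumption inferred from the grep output above, not a documented schema.

```python
ZERO_MAC = "00:00:00:00:00:00"


def find_bad_bonds(caps):
    """Return {bond name: [problem, ...]} for bonds that look misconfigured.

    `caps` is an assumed, simplified view of the host capabilities:
    {"bondings": {...}, "nics": {...}}.
    """
    bad = {}
    nics = caps.get("nics", {})
    for bond, attrs in caps.get("bondings", {}).items():
        problems = []
        # An all-zero partner MAC means no LACP partner answered.
        if attrs.get("ad_partner_mac") == ZERO_MAC:
            problems.append("zero partner MAC")
        # Flag slaves whose aggregator ID differs from the bond's.
        bond_agg = attrs.get("ad_aggregator_id")
        for slave in attrs.get("slaves", []):
            nic_agg = nics.get(slave, {}).get("ad_aggregator_id")
            if (bond_agg is not None and nic_agg is not None
                    and nic_agg != bond_agg):
                problems.append(
                    "%s uses aggregator %s, bond uses %s"
                    % (slave, nic_agg, bond_agg))
        if problems:
            bad[bond] = problems
    return bad
```

Bonds without the 802.3ad keys are simply skipped, so the check is harmless for other bonding modes.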

Comment 5 Yaniv Kaul 2016-07-14 12:59:55 UTC
(In reply to Dan Kenigsberg from comment #4)
> no,
> https://gerrit.ovirt.org/#/q/status:open+project:ovirt-engine+branch:
> master+topic:%22Bad+bond+aggregator%22 is not merged yet.

Time to move it to 4.1?

Comment 6 Gil Klein 2016-08-03 13:47:39 UTC
Verified based on:
https://bugzilla.redhat.com/show_bug.cgi?id=1281666#c35

Comment 7 Dan Kenigsberg 2017-01-25 21:04:11 UTC
What a bugzilla mess. This bug is the clone of bug 1281666. bug 1281666 should have been targeted to 4.1, and this bug to 4.0.2. Instead, both ended up being closed in 4.0.2.

But never mind that now. bug 1413381 and bug 1413380 track this "bad bond" feature in 4.1.