Bug 1317457 - [RFE] Engine should warn admin about bad 802.3ad status
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: RFEs
3.6.0
All Linux
Priority: high  Severity: medium
: ovirt-4.0.2
: ---
Assigned To: Marcin Mirecki
Mor
: FutureFeature
Depends On: 1240719
Blocks: 902971 1281666 1397265
 
Reported: 2016-03-14 05:55 EDT by Yaniv Lavi (Dary)
Modified: 2017-01-25 16:04 EST (History)
15 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
For a bond in mode 4 (link aggregation mode), all slaves must be configured properly on the switch side. If none of them is configured on the switch, the host-side kernel reports the ad_partner_mac as 00:00:00:00:00:00. This update retrieves the partner MAC address and warns the Manager user if the bond is configured incorrectly. No warning is given if only one of the slaves is up and running.
Story Points: ---
Clone Of: 1281666
Environment:
Last Closed: 2016-08-12 10:25:55 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.0.z+
mburman: testing_plan_complete+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 53100 None None None 2016-03-14 05:55 EDT
oVirt gerrit 53852 master MERGED engine: Adding information about aggregate link status to Bond 2016-05-08 04:14 EDT
oVirt gerrit 54588 ovirt-3.6 ABANDONED Advertise aggregator ID in bonding interfaces 2016-08-24 06:21 EDT

Description Yaniv Lavi (Dary) 2016-03-14 05:55:40 EDT
LACP Bond Bad Status Warning

Description of problem:
Many different issues stem from missing configuration on the switch side of LACP bonds. It is very easy to set up hosts in bond mode 4, or to change some configuration, and then forget to make the matching change on the switch side.

Various outcomes of a bad LACP bond:
- Applications on the VMs running slow
- Storage domain connection problems
- Flipping states for Hypervisors (non-operational, non-responsive...)
- Missing pings
- TCP resets, reordering, retransmissions
- Various timeouts
- VMs unable to communicate, or VM connectivity flapping.

cat /proc/net/bonding/bond0
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00 <--- if the config is ok, this should be the switch's MAC address

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:b1:1c:2a:81:0f
Aggregator ID: 1    <---------- Aggregator ID should be the same for all ports
Slave queue ID: 0

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:b1:1c:2a:81:11
Aggregator ID: 2     <--------- Aggregator ID should be the same for all ports
Slave queue ID: 0

Note that aggregator IDs can legitimately differ in some setups; see:
http://unix.stackexchange.com/questions/82569/bonds-vs-aggregators/172232#172232
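The two failure signs shown in the bond0 dump above (an all-zero partner MAC, and slaves landing in different aggregators) can be detected by parsing /proc/net/bonding/<bond>. A minimal Python sketch; the function name and warning texts are illustrative only, not the actual oVirt/VDSM implementation:

```python
import re

ZERO_MAC = "00:00:00:00:00:00"

def check_bond_status(proc_text):
    """Flag the two mode-4 failure signs discussed above, given the
    contents of /proc/net/bonding/<bond>.  Illustrative helper only."""
    warnings = []

    # An all-zero partner MAC means no LACP partner answered on the
    # switch side.
    m = re.search(r"Partner Mac Address:\s*(\S+)", proc_text)
    if m and m.group(1) == ZERO_MAC:
        warnings.append("partner MAC is all zeros: no LACP partner on the switch")

    # The first "Aggregator ID" belongs to the active aggregator; the
    # rest are per-slave.  All slaves should normally share one ID.
    agg_ids = re.findall(r"Aggregator ID:\s*(\d+)", proc_text)
    slave_ids = agg_ids[1:]
    if len(set(slave_ids)) > 1:
        warnings.append("slaves report different aggregator IDs: %s"
                        % sorted(set(slave_ids)))

    return warnings
```

On the bond0 dump above this would report both the zero partner MAC and the em1-vs-em3 aggregator ID split.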
Comment 1 Mike McCune 2016-03-28 19:08:19 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune@redhat.com with any questions.
Comment 2 Sandro Bonazzola 2016-05-02 05:51:49 EDT
Moving from 4.0 alpha to 4.0 beta, since 4.0 alpha has already been released and the bug is not ON_QA.
Comment 3 Michael Burman 2016-06-06 04:58:23 EDT
[root@orchid-vds2 ~]# vdsClient -s 0 getVdsCaps |grep aggreg
                              'ad_aggregator_id': '3',
                              'ad_aggregator_id': '1',
        nics = {'dummy_0': {'ad_aggregator_id': '3',
                'dummy_1': {'ad_aggregator_id': '4',
                'dummy_3': {'ad_aggregator_id': '1',
                'dummy_4': {'ad_aggregator_id': '2',

[root@orchid-vds2 ~]# vdsClient -s 0 getVdsCaps |grep partner 
                              'ad_partner_mac': '00:00:00:00:00:00',
                              'ad_partner_mac': '00:00:00:00:00:00',

- A partner MAC of all zeros should be considered a bad bond status.
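The same rule can be applied to getVdsCaps output. A minimal sketch, assuming the capabilities have already been parsed into a dict whose 'bondings' entries carry the 'ad_partner_mac' attribute quoted above; the dict shape itself is an assumption, not the VDSM API:

```python
ZERO_MAC = "00:00:00:00:00:00"

def bad_bonds(caps):
    """Return names of bonds whose reported LACP partner MAC is all
    zeros, given a getVdsCaps-style dict.  The 'bondings' key and
    'ad_partner_mac' attribute follow the output quoted above; the
    overall dict shape is an assumption for illustration."""
    return sorted(
        name
        for name, attrs in caps.get("bondings", {}).items()
        if attrs.get("ad_partner_mac") == ZERO_MAC
    )
```

A bond listed here is exactly the case the engine should warn about.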
Comment 5 Yaniv Kaul 2016-07-14 08:59:55 EDT
(In reply to Dan Kenigsberg from comment #4)
> no,
> https://gerrit.ovirt.org/#/q/status:open+project:ovirt-engine+branch:
> master+topic:%22Bad+bond+aggregator%22 is not merged yet.

Time to move it to 4.1?
Comment 6 Gil Klein 2016-08-03 09:47:39 EDT
Verified based on:
https://bugzilla.redhat.com/show_bug.cgi?id=1281666#c35
Comment 7 Dan Kenigsberg 2017-01-25 16:04:11 EST
What a bugzilla mess. This bug is the clone of bug 1281666. bug 1281666 should have been targeted to 4.1, and this bug to 4.0.2. Instead, both ended up being closed in 4.0.2.

But never mind that now. bug 1413381 and bug 1413380 track this "bad bond" feature in 4.1.
