Bug 1317457 - [RFE] Engine should warn admin about bad 802.3ad status
Summary: [RFE] Engine should warn admin about bad 802.3ad status
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RFEs
Version: 3.6.0
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.0.2
Target Release: ---
Assignee: Marcin Mirecki
QA Contact: Mor
URL:
Whiteboard:
Depends On: 1240719
Blocks: 902971 1281666 1397265
 
Reported: 2016-03-14 09:55 UTC by Yaniv Lavi
Modified: 2017-01-25 21:04 UTC
CC List: 15 users

Fixed In Version:
Clone Of: 1281666
Environment:
Last Closed: 2016-08-12 14:25:55 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.0.z+
mburman: testing_plan_complete+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+




Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 53100 | 0 | None | None | None | 2016-03-14 09:55:40 UTC
oVirt gerrit | 53852 | 0 | master | MERGED | engine: Adding information about aggregate link status to Bond | 2016-05-08 08:14:59 UTC
oVirt gerrit | 54588 | 0 | ovirt-3.6 | ABANDONED | Advertise aggregator ID in bonding interfaces | 2016-08-24 10:21:41 UTC

Description Yaniv Lavi 2016-03-14 09:55:40 UTC
LACP Bond Bad Status Warning

Description of problem:
Many problems are caused by missing configuration on the switch side of LACP bonds. It is very easy to set up nodes in bond mode 4 (802.3ad), or to change some configuration, and forget to make the required changes on the switch side.

Possible outcomes of a misconfigured LACP bond:
- Applications on the VMs running slowly
- Storage domain connection problems
- Hypervisors flapping between states (non-operational, non-responsive, ...)
- Missing pings
- TCP resets, reordering, retransmissions
- Various timeouts
- VMs unable to communicate, or VM connectivity flapping

cat /proc/net/bonding/bond0
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00 <--------- If the config is OK, this should be the switch MAC address

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:b1:1c:2a:81:0f
Aggregator ID: 1    <---------- Aggregator ID should be the same for all ports
Slave queue ID: 0

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:b1:1c:2a:81:11
Aggregator ID: 2     <--------- Aggregator ID should be the same for all ports
Slave queue ID: 0

For why the slaves can end up in different aggregators (different Aggregator IDs) in this case, see:
http://unix.stackexchange.com/questions/82569/bonds-vs-aggregators/172232#172232
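The two symptoms annotated in the output above (an all-zero partner MAC and slaves assigned to different aggregators) can be detected by parsing /proc/net/bonding/<bond>. The following is a minimal Python sketch, not the engine's implementation: the function name check_bond_status and the warning texts are hypothetical, and it assumes the field labels look exactly as in the output above ("Bonding Mode", "Partner Mac Address", "Slave Interface", "Aggregator ID").

```python
from typing import Dict, List, Optional

ZERO_MAC = "00:00:00:00:00:00"


def check_bond_status(proc_text: str) -> List[str]:
    """Return warnings for a bad 802.3ad bond (hypothetical helper).

    proc_text is the content of /proc/net/bonding/<bond>.
    Assumption: field labels match the sample output in this bug;
    other kernel versions may format the file differently.
    """
    mode: Optional[str] = None
    partner_mac: Optional[str] = None
    slave_aggregators: Dict[str, str] = {}
    current_slave: Optional[str] = None  # None until the first slave section

    for raw in proc_text.splitlines():
        # Split "Key: value" at the first colon only (MACs contain colons).
        key, sep, value = raw.strip().partition(":")
        if not sep or not value.strip():
            continue
        key = key.strip()
        token = value.split()[0]  # first token; ignores trailing annotations
        if key == "Bonding Mode":
            mode = value.strip()
        elif key == "Slave Interface":
            current_slave = token
        elif key == "Partner Mac Address" and current_slave is None:
            # Partner MAC from the "Active Aggregator Info" section.
            partner_mac = token
        elif key == "Aggregator ID" and current_slave is not None:
            # Per-slave aggregator; the bond-level ID is skipped above.
            slave_aggregators[current_slave] = token

    warnings: List[str] = []
    if mode is None or "802.3ad" not in mode:
        return warnings  # these checks only apply to mode 4 (LACP) bonds
    if partner_mac == ZERO_MAC:
        warnings.append("partner MAC is 00:00:00:00:00:00; "
                        "the switch is probably not configured for LACP")
    if len(set(slave_aggregators.values())) > 1:
        detail = ", ".join(f"{s}={a}" for s, a in sorted(slave_aggregators.items()))
        warnings.append(f"slaves are in different aggregators ({detail})")
    return warnings
```

On a host, the function would be called with the file contents, e.g. check_bond_status(open("/proc/net/bonding/bond0").read()); for the output in this bug it would report both problems. Non-802.3ad bonds return no warnings, because an all-zero partner MAC is only meaningful when LACP is in use.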

Comment 1 Mike McCune 2016-03-28 23:08:19 UTC
This bug was accidentally moved from POST to MODIFIED by an error in automation; please contact mmccune with any questions.

Comment 2 Sandro Bonazzola 2016-05-02 09:51:49 UTC
Moving from 4.0 alpha to 4.0 beta, since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 3 Michael Burman 2016-06-06 08:58:23 UTC
[root@orchid-vds2 ~]# vdsClient -s 0 getVdsCaps |grep aggreg
                              'ad_aggregator_id': '3',
                              'ad_aggregator_id': '1',
        nics = {'dummy_0': {'ad_aggregator_id': '3',
                'dummy_1': {'ad_aggregator_id': '4',
                'dummy_3': {'ad_aggregator_id': '1',
                'dummy_4': {'ad_aggregator_id': '2',

[root@orchid-vds2 ~]# vdsClient -s 0 getVdsCaps |grep partner 
                              'ad_partner_mac': '00:00:00:00:00:00',
                              'ad_partner_mac': '00:00:00:00:00:00',

- A partner MAC of all zeros should be considered a bad bond status.
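The same rule can be applied to the capabilities data shown above instead of to /proc. The following is a minimal sketch under stated assumptions: that getVdsCaps returns a 'bondings' dict whose entries carry 'ad_partner_mac' and a 'slaves' list, and a 'nics' dict whose entries carry 'ad_aggregator_id', as the grep output suggests. The function name bad_bonds is hypothetical and this is not the engine's actual code.

```python
from typing import Any, Dict, List

ZERO_MAC = "00:00:00:00:00:00"


def bad_bonds(caps: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each bond name to its problems; healthy bonds are omitted.

    Assumption: the caps layout is bondings[name] = {'ad_partner_mac', 'slaves'}
    and nics[name] = {'ad_aggregator_id'}, inferred from getVdsCaps output.
    """
    nics = caps.get("nics", {})
    result: Dict[str, List[str]] = {}
    for name, bond in caps.get("bondings", {}).items():
        if "ad_partner_mac" not in bond:
            continue  # assumption: only 802.3ad bonds report ad_* keys
        problems: List[str] = []
        if bond["ad_partner_mac"] == ZERO_MAC:
            problems.append("partner MAC is all zeros")
        # Collect the aggregator ID reported for each slave NIC.
        ids = {nics.get(s, {}).get("ad_aggregator_id") for s in bond.get("slaves", [])}
        ids.discard(None)
        if len(ids) > 1:
            problems.append("slaves in aggregators " + ", ".join(sorted(ids)))
        if problems:
            result[name] = problems
    return result
```

Checking both conditions per bond lets the warning name the specific cause (switch not answering LACP versus slaves split across aggregators), which is what an administrator needs to fix the switch side.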

Comment 5 Yaniv Kaul 2016-07-14 12:59:55 UTC
(In reply to Dan Kenigsberg from comment #4)
> no,
> https://gerrit.ovirt.org/#/q/status:open+project:ovirt-engine+branch:master+topic:%22Bad+bond+aggregator%22
> is not merged yet.

Time to move it to 4.1?

Comment 6 Gil Klein 2016-08-03 13:47:39 UTC
Verified based on:
https://bugzilla.redhat.com/show_bug.cgi?id=1281666#c35

Comment 7 Dan Kenigsberg 2017-01-25 21:04:11 UTC
What a bugzilla mess. This bug is the clone of bug 1281666. bug 1281666 should have been targeted to 4.1, and this bug to 4.0.2. Instead, both ended up being closed in 4.0.2.

But never mind that now. bug 1413381 and bug 1413380 track this "bad bond" feature in 4.1.

