Bug 1887518

Summary: When adding host to RHV-M with LACP bond, 'ad_actor_system=00:00:00:00:00:00' is appended to bond configuration
Product: Red Hat Enterprise Linux 8
Reporter: amashah
Component: nmstate
Assignee: Gris Ge <fge>
Status: CLOSED ERRATA
QA Contact: Mingyu Shi <mshi>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.4
CC: adevolder, amusil, aperotti, daniel.henchoz, dholler, ferferna, fge, jiji, jishi, jmaxwell, jortialc, kshukla, lsurette, matteo.panella, mperina, network-qe, nsurati, patrizio.bassi, sbonazzo, srevivo, till, ycui
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: 8.4
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: nmstate-0.4.1-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1890497 (view as bug list)
Environment:
Last Closed: 2021-05-18 15:17:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1541529, 1890497
Attachments: verification (flags: none)

Description amashah 2020-10-12 17:14:51 UTC
Description of problem:
When adding host to RHV-M with LACP bond, 'ad_actor_system=00:00:00:00:00:00' is appended to bond configuration

Version-Release number of selected component (if applicable):
4.4.1

How reproducible:
Unknown

Steps to Reproduce:
1. Install RHV-H 4.4.1 and configure bond0 as LACP (required in this environment)
2. Add the host to RHV-M
3. 'ad_actor_system=00:00:00:00:00:00' is part of the bonding configuration on the host

Actual results:
Host installation fails to complete.

Expected results:
If appending this option is expected, then a valid MAC address should be used, since a null (all-zero) or multicast value is invalid.

Additional info:

engine.log
~~~
2020-10-07 13:48:54,145+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (EE-ManagedThreadFactory-engine-Thread-251) [54a1bfdb] Command 'HostSetupNetworksVDSCommand(HostName = rhv.xxxxx.xxxxx, HostSetupNetworksVdsCommandParameters:{hostId='544155bc-bd1b-4de4-8bc1-56889614aca7', vds='Host[rhv.xxxxx.xxxxx,544155bc-bd1b-4de4-8bc1-56889614aca7]', rollbackOnFailure='true', commitOnSuccess='true', connectivityTimeout='360', networks='[HostNetwork:{defaultRoute='true', bonding='true', networkName='ovirtmgmt', vdsmName='ovirtmgmt', nicName='bond0', vlan='null', vmNetwork='true', stp='false', properties='null', ipv4BootProtocol='STATIC_IP', ipv4Address='10.244.4.20', ipv4Netmask='255.255.240.0', ipv4Gateway='10.244.0.1', ipv6BootProtocol='NONE', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', nameServers='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY', managementNetworkChanged='true'})' execution failed: VDSGenericException: VDSErrorException: Failed to HostSetupNetworksVDS, error = Internal JSON-RPC error: {'reason': '\ndesired\n=======\n---\nname: bond0\ntype: bond\nstate: up\nipv4:\n  enabled: false\nipv6:\n  enabled: false\nlink-aggregation:\n  mode: 802.3ad\n  options:\n    ad_actor_system: 00:00:00:00:00:00\n    lacp_rate: fast\n    miimon: 100\n    xmit_hash_policy: layer3+4\n  slaves:\n  - p2p1\n  - p2p2\nmac-address: 34:80:0D:85:28:34\nmtu: 9000\n\ncurrent\n=======\n---\nname: bond0\ntype: bond\nstate: up\nipv4:\n  enabled: false\nipv6:\n  enabled: false\nlink-aggregation:\n  mode: 802.3ad\n  options:\n    ad_actor_system: 00:00:00:00:00:00\n    lacp_rate: fast\n    miimon: 100\n    xmit_hash_policy: layer3+4\n  slaves: []\nmac-address: 34:80:0D:85:28:34\nmtu: 9000\n\ndifference\n==========\n--- desired\n+++ current\n@@ -13,8 +13,6 @@\n     lacp_rate: fast\n     miimon: 100\n     xmit_hash_policy: layer3+4\n-  slaves:\n-  - p2p1\n-  - p2p2\n+  slaves: []\n mac-address: 34:80:0D:85:28:34\n mtu: 9000\n\n'}, code = -32603

...

2020-10-07 13:48:54,151+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-251) [54a1bfdb] EVENT_ID: SETUP_NETWORK_FAILED_FOR_MANAGEMENT_NETWORK_
CONFIGURATION(1,120), Failed to configure management network on host rhv.xxxx.xxxxx due to setup networks failure.
2020-10-07 13:48:54,151+02 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-251) [54a1bfdb] Exception: org.ovirt.engine.core.bll.network.NetworkConfigurator$NetworkConfiguratorException: Failed to configure management network
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.network.NetworkConfigurator.configureManagementNetwork(NetworkConfigurator.java:245)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.network.NetworkConfigurator.createManagementNetworkIfRequired(NetworkConfigurator.java:91)
...
~~~

On the host side:

~~~
Oct  9 10:34:57 rhvxxxxx-cdm kernel: bond0: Invalid ad_actor_system MAC address.
Oct  9 10:34:57 rhvxxxxx kernel: bond0: option ad_actor_system: invalid value (00:00:00:00:00:00)
Oct  9 10:34:57 rhvxxxxx NetworkManager[34789]: <error> [1602232497.0945] platform-linux: sysctl: failed to set 'bonding/ad_actor_system' to '00:00:00:00:00:00': (22) Invalid argument
Oct  9 10:34:57 rhvxxxxx NetworkManager[34789]: <warn>  [1602232497.0945] device (bond0): failed to set bonding attribute 'ad_actor_system' to '00:00:00:00:00:00'
~~~
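The "Invalid ad_actor_system MAC address" messages above come from the kernel's validity check on this bond option. As an illustration only (a Python sketch of the rule applied by the kernel's `is_valid_ether_addr()`, not kernel source), the address must be neither all-zero nor multicast:

```python
# Sketch of the validity rule the bonding driver applies to ad_actor_system:
# reject the all-zero address and any multicast address (low bit of the
# first octet set). This mirrors the kernel's is_valid_ether_addr() check.
def is_valid_ad_actor_system(mac: str) -> bool:
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    is_zero = all(o == 0 for o in octets)
    is_multicast = bool(octets[0] & 0x01)  # I/G bit of the first octet
    return not is_zero and not is_multicast

print(is_valid_ad_actor_system("00:00:00:00:00:00"))  # False: the value from the logs
print(is_valid_ad_actor_system("34:80:0d:85:2d:d4"))  # True: the bond's actual MAC
```

This is why writing `00:00:00:00:00:00` back fails with EINVAL even though the kernel itself reports that value as the default.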


~~~
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 34:80:0d:85:2d:d4
Active Aggregator Info:
	Aggregator ID: 3
	Number of ports: 2
	Actor Key: 15
	Partner Key: 52
	Partner Mac Address: 00:01:41:42:00:33

Slave Interface: enp98s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:85:2d:d4
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:85:2d:d4
    port key: 15
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 00:01:41:42:00:33
    oper key: 52
    port priority: 127
    port number: 2
    port state: 63

Slave Interface: enp98s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:85:2d:d5
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:85:2d:d4
    port key: 15
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 00:01:41:42:00:33
    oper key: 52
    port priority: 127
    port number: 2
    port state: 63
~~~

Logs will be added to the case in a comment.

Comment 2 Patrizio Bassi 2020-10-19 06:40:49 UTC
The same applies to 4.4.2: the kernel issues a warning about the invalid configuration, but it is still able to bring the bond up.

Comment 3 Ales Musil 2020-10-19 14:19:34 UTC
Hi,

I cannot find the stated error in any of those reports. Can you please provide the relevant part of the log?

Comment 9 Patrizio Bassi 2020-10-20 08:24:31 UTC
Hi,

We added the logs to private case 02760845, as they contain information such as IP addresses and hostnames that we would like to keep private.

Comment 15 Gris Ge 2020-10-21 12:38:36 UTC
nmstate 0.2 requires bond subordinates/ports to be included in the desired state manually; if they are not, it fails with a verification error like the one above.

Without supervdsm.log, I cannot verify this for sure.
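The verification error dumped in the engine.log above ("desired/current/difference") follows the pattern described here. A minimal Python sketch of the mechanism (a hypothetical illustration, not actual nmstate source): after applying a state, the desired state is compared with what the system reports, and any mismatch aborts the operation.

```python
# Hypothetical sketch of nmstate-style post-apply verification: compare each
# desired key against the current state and fail on any difference, similar
# to the desired/current/difference dump in the engine.log above.
def verify(desired: dict, current: dict) -> None:
    diff = {k: (v, current.get(k)) for k, v in desired.items() if current.get(k) != v}
    if diff:
        raise RuntimeError(f"verification failed, differing keys: {diff}")

# The failing case from the log: ports (slaves) were requested but never attached.
desired = {"name": "bond0", "slaves": ["p2p1", "p2p2"]}
current = {"name": "bond0", "slaves": []}
try:
    verify(desired, current)
except RuntimeError as err:
    print(err)
```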

Comment 17 Gris Ge 2020-10-22 05:47:54 UTC
Changed the component to nmstate, as this is confirmed to be an nmstate bug.

Comment 20 Gris Ge 2020-10-22 06:28:31 UTC
Reproducer:

echo '
---
interfaces:
- name: eth1
  state: up
- name: eth2
  state: up
- name: bond99
  type: bond
  state: up
  link-aggregation:
    mode: 802.3ad
    options:
      ad_actor_system: 00:00:00:00:00:00
    slaves:
    - eth2
    - eth1' | sudo nmstatectl set - 

nmcli --fields bond.options c show bond99


The output should not have `ad_actor_system`.

Comment 22 Dominik Holler 2020-10-22 06:37:48 UTC
(In reply to Gris Ge from comment #20)
> Reproducer:
> 
> echo '
> ---
> interfaces:
> - name: eth1
>   state: up
> - name: eth2
>   state: up
> - name: bond99
>   type: bond
>   state: up
>   link-aggregation:
>     mode: 802.3ad
>     options:
>       ad_actor_system: 00:00:00:00:00:00

Please note that the ad_actor_system attribute is not included in the reproducer in attachment 1723250 [details] .

>     slaves:
>     - eth2
>     - eth1' | sudo nmstatectl set - 
> 
> nmcli --fields bond.options c show bond99
> 
> 
> The output should not have `ad_actor_system`.

Comment 31 Mingyu Shi 2020-11-09 08:01:42 UTC
Verified with versions:
NetworkManager-1.28.0-0.1.el8.x86_64
nispor-0.6.1-2.el8.x86_64
nmstate-0.4.1-2.el8.noarch
DISTRO=RHEL-8.4.0-20201103.d.0
Linux hp-dl380g10-02.rhts.eng.pek2.redhat.com 4.18.0-241.el8.dt1.x86_64 #1 SMP Mon Nov 2 08:24:31 EST 2020 x86_64 x86_64 x86_64 GNU/Linux

Notice:
1. 'nmstatectl show' will still print 'ad_actor_system: 00:00:00:00:00:00', as this value is read from the kernel
2. When 'ad_actor_system: 00:00:00:00:00:00' is set with nmstate, it removes the ad_actor_system property from the corresponding NM profile (connection)
3. Setting 'ad_actor_system: 00:00:00:00:00:00' won't change the current ad_actor_system value in the kernel
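The fixed behavior in points 2 and 3 amounts to a normalization step. As a hypothetical illustration (not nmstate source code): a desired ad_actor_system equal to the all-zero kernel default is simply dropped before the bond options reach the NetworkManager profile, so nothing invalid is ever written back.

```python
# Hypothetical sketch of the fix: drop ad_actor_system from the desired bond
# options when it equals the all-zero kernel default, since writing that
# value back to the kernel is rejected as invalid.
ZERO_MAC = "00:00:00:00:00:00"

def normalize_bond_options(options: dict) -> dict:
    options = dict(options)  # do not mutate the caller's dict
    if options.get("ad_actor_system") == ZERO_MAC:
        del options["ad_actor_system"]
    return options

desired = {
    "ad_actor_system": ZERO_MAC,
    "lacp_rate": "fast",
    "miimon": 100,
    "xmit_hash_policy": "layer3+4",
}
# ad_actor_system is removed; the other options pass through unchanged.
print(normalize_bond_options(desired))
```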

Comment 32 Mingyu Shi 2020-11-09 08:04:23 UTC
Created attachment 1727672 [details]
verification

Comment 39 Gris Ge 2021-02-23 07:53:31 UTC
Hi Nirav Surati,

I have confirmed this is an issue in NetworkManager: https://bugzilla.redhat.com/show_bug.cgi?id=1923999#c12

The error message is harmless; the value (00:00:00:00:00:00) is already the default value in the kernel.
The kernel is in the state we requested; NetworkManager is just showing a misleading message.

Please use that bug to track the efforts or feedback.

Thank you!

Comment 41 errata-xmlrpc 2021-05-18 15:17:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1748