Bug 1810506

Summary: NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED on bond slave when switching bond mode
Product: Red Hat Enterprise Linux 8 Reporter: Gris Ge <fge>
Component: NetworkManager Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED NOTABUG QA Contact: Desktop QE <desktop-qa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.2 CC: acardace, atragler, bgalvani, dholler, lrintel, pasik, rkhan, sukulkar, thaller, till
Target Milestone: rc   
Target Release: 8.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-09 09:55:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1738136, 1809330    
Attachments:
bond_mode_1.yml (flags: none)
bond_mode_5.yml (flags: none)
System logs with NM trace enabled (NetworkManager-1.22.9-24733.7a004ef0bb.el8.x86_64) (flags: none)

Description Gris Ge 2020-03-05 12:01:53 UTC
Description of problem:

With a two-slave bond in active-backup (1) mode,
switching to balance-tlb (5) causes an activation failure:

NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED

Version-Release number of selected component (if applicable):
NetworkManager-1.22.9-24733.7a004ef0bb.el8.x86_64
NetworkManager-1.22.8-3.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. sudo nmstatectl set bond_mode_1.yml
2. sudo nmstatectl set bond_mode_5.yml

Actual results:

Failure

Expected results:

No failure

Additional info:

I checked again after 5 seconds (with the main context iterating); the NM.ActiveConnection still reports NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED.

Comment 1 Gris Ge 2020-03-05 12:03:03 UTC
Created attachment 1667715
bond_mode_1.yml

Comment 2 Gris Ge 2020-03-05 12:04:15 UTC
Created attachment 1667716
bond_mode_5.yml

Comment 3 Gris Ge 2020-03-05 12:05:11 UTC
Created attachment 1667717
System logs with NM trace enabled (NetworkManager-1.22.9-24733.7a004ef0bb.el8.x86_64)

Comment 4 Dominik Holler 2020-03-06 15:34:17 UTC
Please note that this issue is the reason for bug 1810550 on RHV.

Comment 5 Beniamino Galvani 2020-03-06 21:25:32 UTC
In the second invocation nmstate creates two slave connections with
the same cloned-mac-address. This is not allowed for bonds using modes
ALB or TLB, so the enslavement of the second interface fails with:

 platform-linux: do-change-link[7]: failure changing link: failure 14 (Bad address)

A simple reproducer using iproute2:

 # addr=00:99:88:77:66:55

 # ip link add bond99 type bond mode 5
 # ip link set eth0 addr $addr
 # ip link set eth1 addr $addr

 # ip link set eth0 master bond99
 # ip link set eth1 master bond99
 RTNETLINK answers: Bad address

Kernel complains with:

  bond99: (slave eth1): the slave hw address is in use by the bond; couldn't find a slave with a free hw address to give it (this should not have happened)

Gris, do you know why nmstate is setting duplicate
cloned-mac-addresses on the slaves?
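
The condition described in this comment can be checked before activation. Below is a minimal sketch in Python, assuming the slave MAC addresses are already available as a plain dictionary. The function name find_duplicate_slave_macs and the slave_macs mapping are hypothetical and are not part of nmstate or NetworkManager.

```python
from collections import defaultdict

# Bond modes for which, per this comment, slaves must not share a MAC address.
# Both the mode names and their numeric IDs are accepted.
UNIQUE_MAC_MODES = {"balance-tlb", "balance-alb", "5", "6"}


def find_duplicate_slave_macs(bond_mode, slave_macs):
    """Return {mac: [slave names]} for MACs shared by more than one slave.

    bond_mode  -- bond mode name or number, e.g. "balance-tlb" or 5
    slave_macs -- hypothetical mapping: slave name -> cloned MAC (or None)
    """
    if str(bond_mode).lower() not in UNIQUE_MAC_MODES:
        # Other modes are not covered by this check.
        return {}
    by_mac = defaultdict(list)
    for name, mac in slave_macs.items():
        if mac:
            # MAC addresses are case-insensitive, so normalize before grouping.
            by_mac[mac.lower()].append(name)
    return {mac: sorted(names) for mac, names in by_mac.items() if len(names) > 1}


if __name__ == "__main__":
    slaves = {"eth0": "00:99:88:77:66:55", "eth1": "00:99:88:77:66:55"}
    print(find_duplicate_slave_macs("balance-tlb", slaves))
    print(find_duplicate_slave_macs("active-backup", slaves))
```

With the addresses from the iproute2 reproducer, the check reports eth0 and eth1 as conflicting for balance-tlb and reports nothing for active-backup.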

Comment 6 Gris Ge 2020-03-09 09:55:27 UTC
(In reply to Beniamino Galvani from comment #5)
> In the second invocation nmstate creates two slaves connections with
> the same cloned-mac-address. This is not allowed for bonds using modes
> ALB or TLB and so the enslavement of the second interface fails with:
> 
>  platform-linux: do-change-link[7]: failure changing link: failure 14 (Bad
> address)
> 
> A simple reproducer using iproute2:
> 
>  # addr=00:99:88:77:66:55
> 
>  # ip link add bond99 type bond mode 5
>  # ip link set eth0 addr $addr
>  # ip link set eth1 addr $addr
> 
>  # ip link set eth0 master bond99
>  # ip link set eth1 master bond99
>  RTNETLINK answers: Bad address
> 
> Kernel complains with:
> 
>   bond99: (slave eth1): the slave hw address is in use by the bond; couldn't
> find a slave with a free hw address to give it (this should not have
> happened)
> 
> Gris, do you know why nmstate is setting duplicate
> cloned-mac-addresses on the slaves?

Yes, I found the root cause as well.

nmstate simply hardcodes the MAC address in the profile, which is incorrect in this case.
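
One possible way to avoid the conflict is sketched below in Python. This is only an illustration under stated assumptions, not the actual nmstate change: the function strip_shared_cloned_macs and the slave_profiles dictionary layout are hypothetical, and the "cloned-mac-address" key merely mirrors the property name used in this report.

```python
import copy

# Bond modes for which, per comment 5, slaves must not share a MAC address.
UNIQUE_MAC_MODES = {"balance-tlb", "balance-alb", "5", "6"}


def strip_shared_cloned_macs(bond_mode, slave_profiles):
    """Return a copy of slave_profiles with shared cloned MACs removed.

    slave_profiles -- hypothetical mapping: slave name -> dict of settings,
                      which may contain a "cloned-mac-address" entry.
    """
    profiles = copy.deepcopy(slave_profiles)  # never mutate the caller's data
    if str(bond_mode).lower() not in UNIQUE_MAC_MODES:
        return profiles
    # Group slave names by their (case-normalized) cloned MAC address.
    seen = {}
    for name, settings in profiles.items():
        mac = settings.get("cloned-mac-address")
        if mac:
            seen.setdefault(mac.lower(), []).append(name)
    # Drop the hardcoded MAC from every slave that shares it with another one.
    for names in seen.values():
        if len(names) > 1:
            for name in names:
                profiles[name].pop("cloned-mac-address", None)
    return profiles
```

Slaves with distinct addresses, and profiles for other bond modes, are returned unchanged.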


Closing as not a bug in NM.

Comment 7 Thomas Haller 2020-04-08 07:33:36 UTC
Dropping from RPL-8.3, as this is CLOSED.