Bug 1434555

Summary: Better handling of bonds with TYPE=Ethernet
Product: Red Hat Enterprise Linux 7
Reporter: Dan Williams <dcbw>
Component: NetworkManager
Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.4
CC: aloughla, atragler, bgalvani, fgiudici, jeder, lrintel, rkhan, sukulkar, thaller, vbenes
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: aos-scalability-35
Fixed In Version: NetworkManager-1.8.0-0.4.rc2.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 09:24:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
[PATCH] ifcfg-rh: also check BONDING_OPTS to determine the connection type (Flags: none)
[PATCH v2] ifcfg-rh: also check BONDING_OPTS to determine the connection type (Flags: none)

Description Dan Williams 2017-03-21 17:47:43 UTC
Came across a machine the other day that had a bonding file with TYPE=Ethernet and BONDING_OPTS="<valid stuff>" and BONDING_MASTER=yes.  NM failed to bring up the bond because (legitimately) TYPE != Bond and printed a message to the journal.

However, the Fedora/RHEL scripts don't seem to care about TYPE for bond masters.  They:

1) call ifup-eth on the interface (because there is no 'ifup-bond')
2) call source_config from network-functions, which sets DEVICETYPE=bond and REALDEVICE=bond0
3) call is_available from network-functions, which sees BONDING_OPTS and then loads the bonding module, creating the bond0 device
4) call is_bonding_device from network-functions, which succeeds because the device was created by the modprobe in step #3
5) proceed to ifup each slave of the bond

So basically, the initscripts don't care about TYPE for bonds, and in this specific case things work out because the interface is named bond0 and is created when the bonding module is loaded.

Maybe NM should just look for BONDING_OPTS or BONDING_MASTER and just assume TYPE=Bond if previously TYPE=Ethernet?  I think this NM codepath has existed since 2011 when tgraf added bonding support in a2a0d788.
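
The proposed rule can be sketched in shell as follows. This is only an illustration, not NetworkManager code (the real logic lives in the C ifcfg-rh plugin), and infer_ifcfg_type is a hypothetical helper name. It sources the file the way initscripts do and follows the suggestion above of checking either BONDING_OPTS or BONDING_MASTER; the attached patches, going by their title, check BONDING_OPTS.

```shell
# Hypothetical helper, not NetworkManager code: print the connection type
# that the proposed rule would infer from an ifcfg file.
infer_ifcfg_type() (
    # The function body is a subshell, so variables set by the sourced
    # file do not leak into the caller's environment.
    unset TYPE BONDING_OPTS BONDING_MASTER
    . "$1" || exit 1

    # Treat TYPE=Ethernet as Bond when bonding keys are present.
    if [ "$TYPE" = "Ethernet" ] &&
       { [ -n "$BONDING_OPTS" ] || [ "$BONDING_MASTER" = "yes" ]; }; then
        echo Bond
    else
        echo "$TYPE"
    fi
)
```

For example, `infer_ifcfg_type /etc/sysconfig/network-scripts/ifcfg-bond0` would print Bond for a file that sets TYPE=Ethernet together with BONDING_OPTS.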

Comment 2 Dan Williams 2017-03-21 17:49:02 UTC
For the record, /etc/sysconfig/network-scripts/ifcfg-bond0 was:

DEVICE=bond0
NM_CONTROLLED=yes
TYPE=Ethernet
BONDING_OPTS="miimon=100 mode=4 lacp_rate=1"
BONDING_MASTER=yes
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
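
For comparison, here is the same file with only the TYPE line changed. Since the description says NM rejected the file because TYPE != Bond, this variant should presumably be accepted by NM versions without the fix; that is an inference from the report and was not tested in this bug.

```shell
# Variant of ifcfg-bond0 with the type declared explicitly (untested assumption).
DEVICE=bond0
NM_CONTROLLED=yes
TYPE=Bond
BONDING_OPTS="miimon=100 mode=4 lacp_rate=1"
BONDING_MASTER=yes
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
```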

Comment 3 Beniamino Galvani 2017-03-23 16:46:14 UTC
Created attachment 1265836 [details]
[PATCH] ifcfg-rh: also check BONDING_OPTS to determine the connection type

Comment 4 Thomas Haller 2017-03-23 17:10:01 UTC
(In reply to Beniamino Galvani from comment #3)
> Created attachment 1265836 [details]
> [PATCH] ifcfg-rh: also check BONDING_OPTS to determine the connection type

+        && svGetValueStr_cp (parsed, "BONDING_OPTS")

the "_cp" stands for copy. The patch leaks a string.

rest lgtm

Comment 5 Thomas Haller 2017-03-23 17:25:45 UTC
It seems the steps from comment 1 only work because
  1) the device is called bond0
  2a) because the bonding module gets actually loaded and doesn't have 
      max_bonds=0
  2b) or, for some other reason a bond with name bond0 already exists.

So, initscripts look at the current situation of interfaces and determine that this is a bond. I really wish that the settings plugin would make decisions based on the ifcfg-rh file alone, not looking at the system.... but that is already violated in other places and contradicts initscripts.

Anyway, checking for BONDING_OPTS seems like a safe bet still. So +1.

Comment 6 Beniamino Galvani 2017-03-23 18:04:23 UTC
Created attachment 1265855 [details]
[PATCH v2] ifcfg-rh: also check BONDING_OPTS to determine the connection type

(In reply to Thomas Haller from comment #4)
> the "_cp" stands for copy. The patch leaks a string.

Whoops, fixed in v2.


(In reply to Thomas Haller from comment #5)
> It seems the steps from comment 1 only work because
>   1) the device is called bond0
>   2a) because the bonding module gets actually loaded and doesn't have 
>       max_bonds=0
>   2b) or, for some other reason a bond with name bond0 already exists.
> 
> So, initscripts look at the current situation of interfaces and determine
> that this is a bond. I really wish that the settings plugin would make
> decisions based on the ifcfg-rh file alone, not looking at the system....
> but that is already violated in other places and contradicts initscripts.

The configuration file in comment 0 works for every device name and
not only for bond0, because install_bonding_driver(), which is called
when BONDING_OPTS is set, also creates the bond interface through
sysfs:

 [ -n "$BONDING_OPTS" ] && install_bonding_driver $1

--

 install_bonding_driver ()
 {
    [ ! -f /sys/class/net/bonding_masters ] && ( modprobe bonding || return 1 )
    if ! fgrep -sqx "$1" /sys/class/net/bonding_masters; then
        echo "+$1" > /sys/class/net/bonding_masters 2>/dev/null
 ...

So, it doesn't depend on system status.
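
To confirm on a test machine that a bond with an arbitrary name was registered this way, a small check along these lines could be used. It is only a sketch: bond_listed is a hypothetical helper, not part of initscripts, and it assumes the masters file holds whitespace-separated bond names. The optional second argument exists so the helper can be pointed at a regular file instead of the kernel's sysfs entry.

```shell
# Hypothetical helper: succeed if bond "$1" is listed in the masters file
# "$2" (defaults to the kernel's sysfs file).
bond_listed() {
    name=$1
    masters=${2:-/sys/class/net/bonding_masters}

    # No readable masters file means the bonding module is not loaded.
    [ -r "$masters" ] || return 1

    # Put one name per line, then require an exact whole-line match.
    tr -s ' \t' '\n' < "$masters" | grep -Fxq -- "$name"
}
```

For example, after bringing up a bond named mybond, `bond_listed mybond && echo "mybond exists"` should print the message.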

Comment 9 errata-xmlrpc 2017-08-01 09:24:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299