Bug 1355656

Summary: Deleting a bridge with a slave attached leaves the slave with a nonexistent master
Product: Red Hat Enterprise Linux 7
Reporter: Huijuan Zhao <huzhao>
Component: NetworkManager
Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Docs Contact:
Priority: medium
Version: 7.2
CC: atragler, bgalvani, bmcclain, bugs, cshao, dfediuck, dguo, fdeutsch, fgiudici, huzhao, jiawu, leiwang, lrintel, mgoldboi, mvollmer, rbarry, rkhan, thaller, vbenes, weiwang, yaniwang, ycui, yzhao
Target Milestone: pre-dev-freeze
Keywords: Extras
Target Release: 7.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 19:23:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1304509, 1329957, 1400961
Attachments:
Description | Flags
kickstart file | none
All logs | none
After remove bridge0, all files in /etc/sysconfig/network-scripts | none
[PATCH] ifcfg-rh: ensure master is cleared when updating a connection | none
[PATCH v2] ifcfg-rh: ensure master is cleared when updating a connection | none

Description Huijuan Zhao 2016-07-12 07:31:16 UTC
Created attachment 1178782 [details]
kickstart file

Description of problem:
After adding a bridge (bridge0) on NIC em1 via Cockpit and then deleting it, em1 cannot be brought up. "ifup em1" reports:
Error: Connection activation failed: Master connection not found or invalid

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160708.0.x86_64
imgbased-0.7.2-0.1.el7ev.noarch
cockpit-0.108-1.el7.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7_2.5.x86_64
glib-networking-2.42.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install redhat-virtualization-host-4.0-20160708.0.x86_64 with the attached kickstart file.
2. Log in to the Cockpit web interface at hostIP:9090 as root.
3. Open the Networking page and create bridge0 on em1.
4. Delete bridge0 on em1 via Cockpit.


Actual results:
1. After step 4, bridge0 is deleted, but em1 cannot be brought up with "ifup em1", which reports:
Error: Connection activation failed: Master connection not found or invalid

Expected results:
1. After step 4, em1 can be brought up with "ifup em1".

Additional info:

Comment 1 Huijuan Zhao 2016-07-12 07:32:19 UTC
Created attachment 1178783 [details]
All logs

Comment 3 Marius Vollmer 2016-08-03 11:55:49 UTC
I cannot reproduce this with cockpit 0.116 and NetworkManager 1.2.2 on Fedora 24 with an otherwise unused network interface instead of em1.

After deleting the bridge, the network interface can be brought up with ifup, nmcli, and Cockpit.

I'll try your kickstart file, but I guess this bug has been fixed already somewhere.

Comment 4 Marius Vollmer 2016-08-03 11:59:06 UTC
> 1. Install redhat-virtualization-host-4.0-20160708.0.x86_64 with kickstart file in attachment    

Can you give exact instructions for how to do this?  I am not sufficiently familiar with kickstart files, unfortunately.

Comment 5 Huijuan Zhao 2016-08-04 03:07:30 UTC
(In reply to Marius Vollmer from comment #4)
> > 1. Install redhat-virtualization-host-4.0-20160708.0.x86_64 with kickstart file in attachment    
> 
> Can you give exact instructions for how to do this?  I am not sufficiently
> familiar with kickstart files, unfortunately.

Put liveimg.squashfs and kickstart.ks on an HTTP server, then install liveimg.squashfs via a PXE server, which uses kickstart.ks to drive the installation.

But please don't worry about this; I don't think the issue is related to the kickstart file, since it can also be reproduced with an ISO install.

I reproduced this bug with cockpit-0.114-2.el7.x86_64 and NetworkManager-1.0.6-30.el7_2.x86_64.

I think the crucial issue may be the NIC's ifcfg file, ifcfg-em1.

1. After removing bridge0, ifcfg-em1 looks like this, and em1 cannot be brought up with "ifup em1":

# cat ifcfg-em1
NAME="em1"
DEVICE="em1"
ONBOOT=yes
NETBOOT=yes
UUID="22a273e8-3c1e-4d1c-abdf-7da8df63178b"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
BRIDGE=aa680956-01e5-4c90-9193-dac38cf8ab03
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no

2. After fixing ifcfg-em1 by deleting the "BRIDGE=aa680956-01e5-4c90-9193-dac38cf8ab03" line, em1 can be brought up successfully.
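
The check described above can be sketched in Python. This is only an illustration, not part of NetworkManager or Cockpit: parse_ifcfg() and dangling_masters() are hypothetical helpers, the parser ignores shell quoting rules beyond stripping surrounding quote characters, and the set of "existing" connections is supplied by hand rather than queried from NetworkManager. The file content is abbreviated from the ifcfg-em1 shown above.

```python
# Keys mentioned in this report that point a slave at its master.
MASTER_KEYS = ("BRIDGE", "MASTER", "TEAM_MASTER")


def parse_ifcfg(text):
    """Parse KEY=value lines into a dict, skipping comments and blanks.

    Simplified on purpose: only surrounding quote characters are stripped.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def dangling_masters(ifcfg_text, known_ids):
    """Return {key: value} for master references missing from known_ids.

    known_ids is a hand-supplied set of UUIDs/names of the connections
    that still exist (an assumption of this sketch).
    """
    values = parse_ifcfg(ifcfg_text)
    return {
        key: values[key]
        for key in MASTER_KEYS
        if key in values and values[key] not in known_ids
    }


IFCFG_EM1 = '''NAME="em1"
DEVICE="em1"
ONBOOT=yes
UUID="22a273e8-3c1e-4d1c-abdf-7da8df63178b"
BOOTPROTO=dhcp
TYPE=Ethernet
BRIDGE=aa680956-01e5-4c90-9193-dac38cf8ab03
'''

# bridge0 has been deleted, so only em1's own connection is left.
existing = {"22a273e8-3c1e-4d1c-abdf-7da8df63178b", "em1"}
print(dangling_masters(IFCFG_EM1, existing))
```

With the bridge's UUID absent from the set of existing connections, the leftover BRIDGE= entry is reported; adding that UUID to the set makes the result empty.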

Comment 6 Huijuan Zhao 2016-08-04 03:14:13 UTC
Created attachment 1187332 [details]
After remove bridge0, all files in /etc/sysconfig/network-scripts

Comment 7 Huijuan Zhao 2016-08-04 03:16:56 UTC
(In reply to Marius Vollmer from comment #3)
> I can not reproduce this with cockpit 0.116 and NetworkManager 1.2.2 on
> Fedora 24 with an otherwise unused network interface instead of em1.
> 
> After deleting the bridge, the network interface can be brought up with
> ifup, nmcli, and Cockpit.
> 
> I'll try your kickstart file, but I guess this bug has been fixed already
> somewhere.

It may have been fixed already somewhere, but the latest packages in the RHVH 4.0 build are cockpit-0.114-2.el7.x86_64 and NetworkManager-1.0.6-30.el7_2.x86_64, so the issue still occurs there.

Comment 8 Marius Vollmer 2016-08-04 08:18:39 UTC
(In reply to Marius Vollmer from comment #3)
> I can not reproduce this with cockpit 0.116 and NetworkManager 1.2.2 on
> Fedora 24 with an otherwise unused network interface instead of em1.

I can reproduce it now.

NetworkManager gets itself into an inconsistent state (see below), and I somehow managed to stay on the good side of that inconsistency.  Doing a "nmcli c reload" at the right time will expose the inconsistency.

(In reply to Huijuan Zhao from comment #5)
> 2. Fix ifcfg-em1, delete "BRIDGE=aa680956-01e5-4c90-9193-dac38cf8ab03", then
> can up em1 successful.

Yes, that's the inconsistency, thanks for making me see it.  I'll write up reproduction steps without Cockpit or RHEV and then assign to NetworkManager.

Comment 9 Marius Vollmer 2016-08-04 08:53:04 UTC
NetworkManager leaves a bogus BRIDGE= line in the ifcfg file when a bridge slave is removed from a bridge.  A "nmcli c reload" will spontaneously turn that interface into a slave again.

Steps:

- Take an otherwise unused network interface, "ens14" in my case.

- Make sure there are regular non-slave connection settings for that interface.

# nmcli dev con ens14
Device 'ens14' successfully activated with 'e0ff484e-c60c-426f-802e-96ba83832728'.
# nmcli dev dis ens14
Device 'ens14' successfully disconnected.
# nmcli c s ens14 | grep 'master:\|slave-type:'
connection.master:                      --
connection.slave-type:                  --
  
- Create a bridge.

# nmcli con add type bridge
Connection 'bridge' (3eb8e7cd-357f-431d-9941-d45e5d093db5) successfully added.

- Make ens14 a slave of the bridge.

# nmcli con mod ens14 connection.master 3eb8e7cd-357f-431d-9941-d45e5d093db5 connection.slave-type bridge
# nmcli c s ens14 | grep 'master:\|slave-type:'
connection.master:                      3eb8e7cd-357f-431d-9941-d45e5d093db5
connection.slave-type:                  bridge
# grep BRIDGE /etc/sysconfig/network-scripts/ifcfg-ens14
BRIDGE=3eb8e7cd-357f-431d-9941-d45e5d093db5

- Liberate ens14 again:

# nmcli con mod ens14 connection.master "" connection.slave-type ""
# nmcli c s ens14 | grep 'master:\|slave-type:'
connection.master:                      --
connection.slave-type:                  --
# grep BRIDGE /etc/sysconfig/network-scripts/ifcfg-ens14
BRIDGE=3eb8e7cd-357f-431d-9941-d45e5d093db5

[ BRIDGE is still there. ]

- Delete bridge

# nmcli c del bridge
Connection 'bridge' (3eb8e7cd-357f-431d-9941-d45e5d093db5) successfully deleted.

- Bring ens14 up

# nmcli dev con ens14
Device 'ens14' successfully activated with 'e0ff484e-c60c-426f-802e-96ba83832728'.
# nmcli dev dis ens14
Device 'ens14' successfully disconnected.

[ This is where I stopped earlier and thought I can't reproduce the bug. ]

- Reload configuration

# nmcli con reload
# nmcli c s ens14 | grep 'master:\|slave-type:'
connection.master:                      3eb8e7cd-357f-431d-9941-d45e5d093db5
connection.slave-type:                  bridge

[ ens14 is a slave of a non-existing bridge. ]

- Try to bring ens14 up

# nmcli dev con ens14
Error: Device activation failed: Can not find a master for ens14: Master connection not found or invalid

- Clean up

# sed -i /etc/sysconfig/network-scripts/ifcfg-ens14 -e '/^BRIDGE=/d'
# nmcli con reload
# nmcli c s ens14 | grep 'master:\|slave-type:'
connection.master:                      --
connection.slave-type:                  --
# nmcli dev con ens14
Device 'ens14' successfully activated with 'e0ff484e-c60c-426f-802e-96ba83832728'.
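
The sequence above can be modelled with a toy Python sketch. It assumes, in line with the symptom but without reference to the actual ifcfg-rh source, that the writer only wrote BRIDGE= when a master was set and never removed an existing line. All function names are hypothetical; reload_from_file() stands in for "nmcli con reload".

```python
def write_without_clearing(file_values, settings):
    """Model of the faulty writer: writes BRIDGE when set, never removes it."""
    updated = dict(file_values)
    if settings.get("master"):
        updated["BRIDGE"] = settings["master"]
    return updated


def write_with_clearing(file_values, settings):
    """Model of a corrected writer: drops BRIDGE first, rewrites it if set."""
    updated = dict(file_values)
    updated.pop("BRIDGE", None)
    if settings.get("master"):
        updated["BRIDGE"] = settings["master"]
    return updated


def reload_from_file(file_values):
    """Model of a reload: rebuild the in-memory settings from the file."""
    return {"master": file_values.get("BRIDGE")}


BRIDGE_UUID = "3eb8e7cd-357f-431d-9941-d45e5d093db5"

for writer in (write_without_clearing, write_with_clearing):
    on_disk = {"DEVICE": "ens14"}
    # Enslave ens14 to the bridge, then liberate it again.
    on_disk = writer(on_disk, {"master": BRIDGE_UUID})
    in_memory = {"master": None}
    on_disk = writer(on_disk, in_memory)
    # In memory ens14 is free; a reload shows what the file really says.
    print(writer.__name__, in_memory, reload_from_file(on_disk))
```

In this model the in-memory settings look correct after liberating ens14 either way, which matches the observation that the problem only shows once the file is read back.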

Comment 10 Marius Vollmer 2016-08-04 08:57:49 UTC
Oops, the version.  I used

    NetworkManager-1.2.2-2.fc24.x86_64

on Fedora 24.

Comment 11 Beniamino Galvani 2016-08-04 13:19:53 UTC
Created attachment 1187505 [details]
[PATCH] ifcfg-rh: ensure master is cleared when updating a connection

Comment 12 Marius Vollmer 2016-08-05 07:08:34 UTC
Great, that was fast!  Thanks!

Comment 13 Huijuan Zhao 2016-08-15 11:01:40 UTC
This issue is fixed in redhat-virtualization-host-4.0-20160812.0.

Test version:
redhat-virtualization-host-4.0-20160812.0
imgbased-0.8.4-1.el7ev.noarch
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
redhat-virtualization-host-image-update-placeholder-4.0-1.el7.noarch

Steps to Reproduce:
1. Install redhat-virtualization-host-4.0-20160812.0.x86_64 with the attached kickstart file.
2. Log in to the Cockpit web interface at hostIP:9090 as root.
3. Open the Networking page and create bridge0 on em1.
4. Delete bridge0 on em1 via Cockpit.

Test results:
1. After step 4, em1 comes up automatically.

Comment 14 Huijuan Zhao 2016-08-16 08:25:54 UTC
(In reply to Huijuan Zhao from comment #13)
> This issue is fixed in redhat-virtualization-host-4.0-20160812.0.
> 
> Test version:
> redhat-virtualization-host-4.0-20160812.0
> imgbased-0.8.4-1.el7ev.noarch
> cockpit-ws-0.114-2.el7.x86_64
> cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
> redhat-virtualization-host-image-update-placeholder-4.0-1.el7.noarch
> 
> Steps to Reproduce:
> 1. Install redhat-virtualization-host-4.0-20160812.0.x86_64 with kickstart
> file in attachment    
> 2. Login cockpit website hostIP:9090 with root account
> 3. Select Networking page, create bridge0 on em1.
> 4. Delete bridge0 on em1 via cockpit.
> 
> Test results:
> 1. After step4, em1 can be up automatically

Update:
After rebooting RHVH, the slave NIC em1 again cannot be brought up.
So this issue is not completely fixed.

Comment 15 Thomas Haller 2016-08-18 08:42:11 UTC
(In reply to Beniamino Galvani from comment #11)
> Created attachment 1187505 [details]
> [PATCH] ifcfg-rh: ensure master is cleared when updating a connection

+ svSetValue (ifcfg, "DEVICETYPE", NULL, FALSE);

This first clears the DEVICETYPE setting, but write_connection_setting() may later set it again.

Not sure, but doesn't ifcfg-writer aim to preserve comments and positions?
So that means, a file like:

   # set the DEVICETYPE
   DEVICETYPE=TeamPort

   #some other property
   UUID=uuid1


changes to:

   # set the DEVICETYPE
 
   #some other property
   UUID=uuid1
   DEVICETYPE=TeamPort



Maybe we could avoid clearing "DEVICETYPE" in write_connection() and clear it only in write_connection_setting(), if (and only if) it actually needs to be cleared?

In write_connection_setting(), say we hit the condition:
    if (nm_setting_connection_is_slave_type (s_con, NM_SETTING_BOND_SET...
        svSetValue (ifcfg, "MASTER", master, FALSE);
        svSetValue (ifcfg, "SLAVE", "yes", FALSE);
I think we should then also unset "BRIDGE", "TEAM_MASTER" (and "DEVICETYPE").
The same goes for the other branches.


(otherwise, patch looks right)
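
Both review points can be illustrated with a small Python sketch that treats the ifcfg file as a list of lines. It is a model only: set_value() and write_master() are hypothetical names and do not reproduce svSetValue() or the ifcfg-rh writer, and the mapping of slave types to keys simply follows the keys named in this comment.

```python
def set_value(lines, key, value):
    """Set KEY=value in place; value=None removes the key.

    An existing line is rewritten where it stands, so nearby comments keep
    their meaning. A key that is not present yet is appended at the end.
    """
    prefix = key + "="
    result = []
    found = False
    for line in lines:
        if line.strip().startswith(prefix):
            if value is not None and not found:
                result.append(prefix + value)
            found = True
        else:
            result.append(line)
    if not found and value is not None:
        result.append(prefix + value)
    return result


# Master key per slave type, as named in the review comment.
MASTER_KEYS = {"bond": "MASTER", "bridge": "BRIDGE", "team": "TEAM_MASTER"}


def write_master(lines, slave_type, master):
    """Write the key for slave_type and unset the keys of the other types."""
    for kind, key in MASTER_KEYS.items():
        lines = set_value(lines, key, master if kind == slave_type else None)
    # SLAVE=yes accompanies MASTER= in the bond branch quoted above.
    return set_value(lines, "SLAVE", "yes" if slave_type == "bond" else None)


ORIGINAL = [
    "# set the DEVICETYPE",
    "DEVICETYPE=TeamPort",
    "",
    "#some other property",
    "UUID=uuid1",
]

# Clearing first and setting later moves the line away from its comment.
moved = set_value(set_value(ORIGINAL, "DEVICETYPE", None), "DEVICETYPE", "TeamPort")
# Setting directly keeps the line where it was.
kept = set_value(ORIGINAL, "DEVICETYPE", "TeamPort")
print(moved)
print(kept)
print(write_master(["DEVICE=ens14", "BRIDGE=bridge0"], "bond", "bond0"))
```

The clear-then-set path reproduces the reordering shown in the example files above, while the in-place path leaves the file unchanged. write_master() shows the second suggestion: switching a former bridge slave to a bond removes the leftover BRIDGE= line in the same pass.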

Comment 16 Beniamino Galvani 2016-08-19 08:34:31 UTC
Created attachment 1192070 [details]
[PATCH v2] ifcfg-rh: ensure master is cleared when updating a connection

(In reply to Thomas Haller from comment #15)
> Not sure, but doesn't ifcfg-writer aim to preserve comments and positions?

How about v2?

> maybe, we can not clear "DEVICETYPE" in write_connection(), but only in
> write_connection_setting() if (and only if) it is to be actually cleared?

Fixed.

Comment 17 Thomas Haller 2016-08-19 10:42:14 UTC
(In reply to Beniamino Galvani from comment #16)
> Created attachment 1192070 [details]
> [PATCH v2] ifcfg-rh: ensure master is cleared when updating a connection

v2 lgtm

Comment 18 Francesco Giudici 2016-08-19 16:39:19 UTC
lgtm

Comment 21 Huijuan Zhao 2016-08-29 02:37:35 UTC
Due to Bug 1367669, this bug cannot be fully verified now.
I will verify it once RHVH supports networking in Cockpit.

Comment 22 Vladimir Benes 2016-09-09 15:43:40 UTC
I can convert an ethernet connection to a bridge slave and vice versa without residue in the ifcfg file.

Comment 24 errata-xmlrpc 2016-11-03 19:23:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2581.html

Comment 25 Marius Vollmer 2016-11-10 13:02:19 UTC
*** Bug 1362358 has been marked as a duplicate of this bug. ***

Comment 26 Marius Vollmer 2017-01-02 12:23:19 UTC
*** Bug 1401416 has been marked as a duplicate of this bug. ***