Bug 1381314

Summary: ZONE being blanked in ifcfg on reboot
Product: Red Hat Enterprise Linux 7 Reporter: Robert Locke <rlocke>
Component: firewalld Assignee: Thomas Woerner <twoerner>
Status: CLOSED ERRATA QA Contact: Tomas Dolezal <todoleza>
Severity: urgent Docs Contact: Mirek Jahoda <mjahoda>
Priority: urgent    
Version: 7.3 CC: ajohn, antoine.tran, aperotti, brubisch, crlb, cww, daniel_johnson1, devin, dmoessne, fj-lsoft-oss, hannsj_uhl, jreznik, mabrown, mgrepl, mjahoda, msugaya, ndev, pasik, pasteur, Petaris, ptalbert, pvrabec, radoslaw.piliszek, rlocke, rmanes, scott.spyrison, songshuaishuai2, sunnyy, todoleza, twoerner, yuvald
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: Regression
Fixed In Version: firewalld-0.4.3.2-9 Doc Type: Bug Fix
Doc Text:
Previously, firewalld created the ifcfg backup file with the .old extension when the ZONE setting of an interface with ifcfg was modified by the firewalld D-Bus interface or the command-line tools. As a consequence, the ZONE setting in the ifcfg file was set to a blank value by firewalld on reboot if NetworkManager and the network service were both enabled. With this update, firewalld creates the ifcfg backup file with the .bak extension, and the ZONE setting remains correct.
Story Points: ---
Clone Of:
: 1410860 (view as bug list) Environment:
Last Closed: 2017-08-01 16:22:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1298243, 1317092, 1377248, 1400961, 1410860, 1414046    
Attachments:
Description Flags
test proposed fixes none

Description Robert Locke 2016-10-03 16:28:58 UTC
Description of problem:
I have ZONE=trusted in my ifcfg-br0 file. When I reboot, this is somehow changed to ZONE=, which puts the interface in the default public zone and breaks my VMs.

Version-Release number of selected component (if applicable):
firewalld-0.4.3.2-8.el7
NetworkManager-1.4.0-12.el7
systemd-219-30.el7

How reproducible:
Every reboot

Steps to Reproduce:
1. Have ZONE=trusted in ifcfg file
2. Reboot system
3. View ifcfg file

Actual results:
ZONE is blanked in ifcfg file

Expected results:
ZONE should be respected and kept in ifcfg file

Additional info:

Comment 2 Robert Locke 2016-10-03 18:45:40 UTC
This appears to be a regression when compared to 7.2 which does not exhibit this problem.

It also appears to be a function of having both NetworkManager and network active and enabled.

In my scenario, I want NetworkManager available to control certain interfaces and need to use network because I have some openvswitch interfaces.

Comment 3 Thomas Woerner 2016-10-04 10:19:15 UTC
Is NM_CONTROLLED=no set in this ifcfg file?

Comment 4 Robert Locke 2016-10-04 12:01:54 UTC
NM_CONTROLLED has not existed in the files in 7.2 nor in 7.3.

Since this is a Linux Bridge, does NM_CONTROLLED need to be set in the bridge ifcfg *and* the underlying physical interface?

If it helps and doesn't break 7.2, I'd be happy to put it in.

Comment 5 Thomas Woerner 2016-10-04 12:36:56 UTC
NM_CONTROLLED=no tells NetworkManager not to use the ifcfg file. This is not explained in the Networking Guide, somehow.

There are several things that might interfere here. RHEL-7.3 ships a new firewalld version (0.4.3.2) and also a new NetworkManager version (1.4.0). Please check if the configuration of the bridge in NM shows the correct zone. I think the new NM version will try to take care of the bridge. Then it might either use another configuration (ifcfg or its own), or the bridge ends up controlled by both NM and the network service. This could then also result in issues.

Can you try to set NM_CONTROLLED=no in the ifcfg file of that bridge to test whether this fixes your issue? Or use NM to control the bridge.

I think this needs to be verified and therefore it would be good to get the sos information of that system.

Comment 7 Robert Locke 2016-10-04 13:32:04 UTC
Created attachment 1207196 [details]
sosreport after initial install when it still works

This is the sosreport after the initial install. Evidently network is the one that brought up the bridge:

[root@foundation0 ~]# systemctl -l status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: active (exited) since Tue 2016-10-04 10:18:42 EDT; 6min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 963 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS)

Oct 04 10:18:42 foundation0.ilt.example.com systemd[1]: Starting LSB: Bring up/down networking...
Oct 04 10:18:42 foundation0.ilt.example.com network[963]: Bringing up loopback interface:  [  OK  ]
Oct 04 10:18:42 foundation0.ilt.example.com network[963]: Bringing up interface enp2s0:  [  OK  ]
Oct 04 10:18:42 foundation0.ilt.example.com network[963]: Bringing up interface br0:  [  OK  ]
Oct 04 10:18:42 foundation0.ilt.example.com systemd[1]: Started LSB: Bring up/down networking.
[root@foundation0 ~]# firewall-cmd --get-active-zones
external
  interfaces: eno1
trusted
  interfaces: br0
public
  interfaces: enp2s0

After I reboot, the ifcfg-br0 was moved to ifcfg-br0.old and the "new" ifcfg-br0 has the blank ZONE=

[root@foundation0 ~]# firewall-cmd --get-active-zones
public
  interfaces: br0 enp2s0 eno1

Comment 8 Robert Locke 2016-10-04 13:42:42 UTC
As an aside, what is responsible for creating the ifcfg-<iface>.old files in /etc/sysconfig/network-scripts/? That seems to be the culprit of pulling out the ZONE value.

Comment 9 Robert Locke 2016-10-04 17:03:25 UTC
Putting in NM_CONTROLLED=no has not helped.

It really appears to be related to whatever process/service is creating those .old versions of the ifcfg- file. That process is not preserving the ZONE= value.
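
The state described here (an ifcfg-<iface>.old backup that still carries the zone next to a live ifcfg-<iface> whose ZONE= is blank) can be checked for with a short script. This is a minimal sketch, not part of firewalld or initscripts: the function names `read_zone` and `find_blanked_zones` are hypothetical, and it assumes the simple KEY=value layout shown in this report.

```python
import os
import tempfile


def read_zone(path):
    """Return the value of the last uncommented ZONE= line, or None if absent."""
    zone = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("ZONE="):
                zone = line[len("ZONE="):].strip("\"'")
    return zone


def find_blanked_zones(cfg_dir):
    """Map interface -> old zone where the .old backup has a zone but the live file does not."""
    blanked = {}
    for name in sorted(os.listdir(cfg_dir)):
        if not (name.startswith("ifcfg-") and name.endswith(".old")):
            continue
        live = os.path.join(cfg_dir, name[:-len(".old")])
        old_zone = read_zone(os.path.join(cfg_dir, name))
        # A blank or missing ZONE in the live file counts as "lost".
        if old_zone and os.path.exists(live) and not read_zone(live):
            blanked[name[len("ifcfg-"):-len(".old")]] = old_zone
    return blanked


if __name__ == "__main__":
    # Demo on a temporary directory that mimics the reported state.
    with tempfile.TemporaryDirectory() as demo:
        with open(os.path.join(demo, "ifcfg-br0.old"), "w") as f:
            f.write("DEVICE=br0\nZONE=trusted\n")
        with open(os.path.join(demo, "ifcfg-br0"), "w") as f:
            f.write("DEVICE=br0\nZONE=\n")
        print(find_blanked_zones(demo))
```

On an affected system, the function would be pointed at /etc/sysconfig/network-scripts instead of the temporary directory.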

Comment 10 Robert Locke 2016-10-04 20:58:19 UTC
So, with NetworkManager disabled during the installation (as part of kickstart), something is still moving ifcfg-br0 to ifcfg-br0.old where "ZONE" is being "blanked"....

So, something with "network.service" or "firewalld" is generating the .old file and failing to replicate the ZONE value.

Comment 11 Thomas Woerner 2016-10-06 13:31:58 UTC
Please try to apply this patch to firewall/core/fw_zone.py in /usr/lib/python2.7/site-packages/:

--- firewall/core/fw_zone.py.old
+++ firewall/core/fw_zone.py
@@ -671,7 +671,7 @@ class FirewallZone(object):
         zone_transaction.add_post(self.__unregister_interface, _obj,
                                   interface_id)
 
-        zone_transaction.add_post(ifcfg_set_zone_of_interface, "", interface)
+        #zone_transaction.add_post(ifcfg_set_zone_of_interface, "", interface)
 
         if use_zone_transaction is None:
             zone_transaction.execute(True)

This will stop firewalld from adapting the ifcfg file in the remove_interface method on zones.

Since the ifcfg file is still adapted in add_interface, this should be sufficient for people who are not using NetworkManager at all.
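
The effect of the commented-out line can be illustrated with a toy model. This is not firewalld's code: `ToyTransaction`, `FAKE_IFCFG`, `remove_interface`, and `simulate_ifdown` are hypothetical stand-ins, and only the names `add_post`, `execute`, and `ifcfg_set_zone_of_interface` are taken from the diff above. The sketch also assumes that taking an interface down leads to its removal from the zone, which matches the ifdown behaviour reported in this bug.

```python
# Stand-in for the ZONE= values stored in the ifcfg files (hypothetical).
FAKE_IFCFG = {}


class ToyTransaction:
    """Collects callbacks and runs them when the transaction is executed."""

    def __init__(self):
        self.post = []

    def add_post(self, func, *args):
        self.post.append((func, args))

    def execute(self, enable):
        # The 'enable' argument is ignored in this toy model.
        for func, args in self.post:
            func(*args)
        self.post = []


def ifcfg_set_zone_of_interface(zone, interface):
    """Toy replacement: record the zone instead of rewriting a file."""
    FAKE_IFCFG[interface] = zone


def remove_interface(interface, patched):
    """Toy removal of an interface from its zone."""
    transaction = ToyTransaction()
    if not patched:
        # Unpatched behaviour: blank the zone in the ifcfg data on removal.
        transaction.add_post(ifcfg_set_zone_of_interface, "", interface)
    transaction.execute(True)


def simulate_ifdown(patched):
    """Start with ZONE=trusted, remove the interface, return what is left."""
    FAKE_IFCFG["br0"] = "trusted"
    remove_interface("br0", patched)
    return FAKE_IFCFG["br0"]


if __name__ == "__main__":
    print("unpatched:", repr(simulate_ifdown(patched=False)))
    print("patched:  ", repr(simulate_ifdown(patched=True)))
```

In the model, the unpatched path leaves an empty zone behind, while the patched path keeps the configured value because nothing rewrites it on removal.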

Comment 12 Robert Locke 2016-10-06 19:10:10 UTC
That patch has worked for the scenario where I have entirely disabled NetworkManager, am using just network.service, and still using firewalld for rules.

This would work for me.

I will test tomorrow with the original configuration of mish-mash NetworkManager and network.service both being active.

Is there a timeline for incorporating this before release, or am I looking at a post-release 7.3 rpm (perhaps not until 7.4)?

Comment 13 Robert Locke 2016-10-08 00:37:56 UTC
This is also working with the mishmash of NetworkManager and network.service both active.

Though I have a separate bone to pick with network.service not respecting NM_CONTROLLED=yes of late.

Comment 14 daniel 2016-11-16 11:25:43 UTC
Either I'm doing something wrong or this is not working:

[root@dhcp165 ~]# uname -a
Linux dhcp165.coe.muc.redhat.com 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp165 ~]# 

[root@dhcp165 ~]# diff -u /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py.old /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py
--- /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py.old	2016-11-16 12:12:24.191360989 +0100
+++ /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py	2016-11-16 12:12:58.941175540 +0100
@@ -671,7 +671,7 @@
         zone_transaction.add_post(self.__unregister_interface, _obj,
                                   interface_id)
 
-        zone_transaction.add_post(ifcfg_set_zone_of_interface, "", interface)
+        #zone_transaction.add_post(ifcfg_set_zone_of_interface, "", interface)
 
         if use_zone_transaction is None:
             zone_transaction.execute(True)
[root@dhcp165 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
[root@dhcp165 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
IPV6INIT=no
BOOTPROTO=none
ONBOOT=yes
#ZONE=public
DNS1=10.32.96.1
TYPE=Ethernet
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
IPADDR=10.32.111.165
PREFIX=20
GATEWAY=10.32.111.254
ZONE=test
[root@dhcp165 ~]# nmcli con reload eth0
[root@dhcp165 ~]# nmcli con show 'System eth0' |grep -i zone
connection.zone:                        test
GENERAL.ZONE:                           test
[root@dhcp165 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
IPV6INIT=no
BOOTPROTO=none
ONBOOT=yes
#ZONE=public
DNS1=10.32.96.1
TYPE=Ethernet
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
IPADDR=10.32.111.165
PREFIX=20
GATEWAY=10.32.111.254
ZONE=test
[root@dhcp165 ~]# 
[root@dhcp165 ~]# firewall-cmd --zone=test --list-all
test (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: ssh
  ports: 
  protocols: 
  masquerade: no
  forward-ports: 
  sourceports: 
  icmp-blocks: 
  rich rules: 
	
[root@dhcp165 ~]# 
[root@dhcp165 ~]# systemctl restart network
[root@dhcp165 ~]# 
[root@dhcp165 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
IPV6INIT=no
BOOTPROTO=none
ONBOOT=yes
#ZONE=public
DNS1=10.32.96.1
TYPE=Ethernet
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
IPADDR=10.32.111.165
PREFIX=20
GATEWAY=10.32.111.254
[root@dhcp165 ~]# nmcli con show 'System eth0' |grep -i zone
connection.zone:                        --
GENERAL.ZONE:                           --
[root@dhcp165 ~]# 
[root@dhcp165 ~]# firewall-cmd --zone=test --list-all
test
  target: default
  icmp-block-inversion: no
  interfaces: 
  sources: 
  services: ssh
  ports: 
  protocols: 
  masquerade: no
  forward-ports: 
  sourceports: 
  icmp-blocks: 
  rich rules: 
	
[root@dhcp165 ~]# 
[root@dhcp165 ~]# 



NetworkManager and network are both enabled. Even with the fix applied (if I'm not mistaken), the configuration is gone after a network restart or a reboot of the system. This makes my customer's network unusable, and we need a fix here ASAP.

Comment 15 daniel 2016-11-16 12:09:56 UTC
btw, when doing just an ifdown eth0 the ZONE=test becomes ZONE=

Comment 16 daniel 2016-11-16 12:37:19 UTC
Further testing:

Playing around, I found that if I disable NetworkManager on the above system, add the ZONE=test entry again, reboot, enable NetworkManager, and reboot again, it still holds the ZONE=test configuration.

On a system where I disabled NetworkManager before the upgrade to 7.3, there are no problems: the ZONE entry in the ifcfg file is always valid and the interface stays in the defined zone. Enabling NetworkManager after the update does not influence the ZONE entry in the ifcfg file, so the interface stays in the firewall zone it should be in.

Comment 17 daniel 2016-11-16 14:38:18 UTC
OK, I cannot find any method to set a zone for an interface in 7.3 using NetworkManager:

neither via ifcfg (ZONE= ) nor via 'firewall-cmd --permanent --zone=test --add-interface=eth0', which aren't supported anyway, nor via

nmcli c modify 'System eth0' connection.zone test, as this is reset after reboot...

and also not with the fix from comment #11:

[root@dhcp165 ~]# 
[root@dhcp165 ~]# diff -u /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py.old
--- /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py	2016-11-16 15:13:21.111542507 +0100
+++ /usr/lib/python2.7/site-packages/firewall/core/fw_zone.py.old	2016-11-16 15:12:56.851717795 +0100
@@ -671,7 +671,7 @@
         zone_transaction.add_post(self.__unregister_interface, _obj,
                                   interface_id)
 
-#        zone_transaction.add_post(ifcfg_set_zone_of_interface, "", interface)
+        zone_transaction.add_post(ifcfg_set_zone_of_interface, "", interface)
 
         if use_zone_transaction is None:
             zone_transaction.execute(True)
[root@dhcp165 ~]# nmcli con show 'System eth0' |grep -i zone
connection.zone:                        --
GENERAL.ZONE:                           --
[root@dhcp165 ~]# nmcli c modify 'System eth0' connection.zone test
[root@dhcp165 ~]# 
[root@dhcp165 ~]# nmcli con show 'System eth0' |grep -i zone
connection.zone:                        test
GENERAL.ZONE:                           test
[root@dhcp165 ~]# 
[root@dhcp165 ~]# reboot

$ ssh root.muc.redhat.com
Last login: Wed Nov 16 15:10:28 2016 from 10.32.64.93
[root@dhcp165 ~]# 
[root@dhcp165 ~]# nmcli con show 'System eth0' |grep -i zone
connection.zone:                        --
GENERAL.ZONE:                           --
[root@dhcp165 ~]# 

and the same happens when I just do a ifdown eth0.

the only ways I got this working is either:

- disable network.service
  (after the upgrade I nevertheless had to set the zone once again 
   via  nmcli c modify 'System eth0' connection.zone test   to make
   it persistent again)
or
- disable NetworkManager
   (After that the ZONE=.. in ifcfg file worked in RHEL7.2 and
    after upgrading to 7.3 w/o any further issues)

before or after the upgrade. As this worked in RHEL7.2, I consider this a regression, or am I missing something here?

Comment 18 daniel 2016-11-16 14:39:39 UTC
Created attachment 1221174 [details]
protocoll of tests and issues

Comment 20 Thomas Woerner 2016-11-17 13:49:32 UTC
The ifcfg .old file is generated by firewalld. This has been fixed upstream with 

https://github.com/t-woerner/firewalld/commit/fe6cf16e5a5ef3e49cdb554af8cf18024371554a

Comment 21 Thomas Woerner 2016-11-17 14:22:17 UTC
@daniel

With the firewalld version in 7.3, the zone setting for connections is not saved in firewalld as long as NetworkManager is in control of the interface. This applies to firewall-cmd .. --change-interface= and the GUIs. This is done so that there is only one place where the zone setting for this connection/interface is stored.

Please add the patch from comment 11 after doing the update.

The additional patch in comment 20 makes sure that the ifcfg backup file uses the .bak extension and not the .old extension. ifcfg files with the .bak extension are ignored by NetworkManager and the network service.

With these two patches I do not see any issues any more after reboots and ifdowns/ifups and network service restarts. Please verify if this is also the case for you.

If needed I will create a test package with these two patches applied.

Comment 22 daniel 2016-11-17 17:40:49 UTC
Created attachment 1221612 [details]
test proposed fixes

Comment 23 daniel 2016-11-17 17:44:13 UTC
Thomas,

please find attached my test of the suggested fixes, as mentioned in the previous update.

Test plan was:


Steps: RHEL7.2:
1) configure with interface in zone trust :

# systemctl list-units --all|egrep -i " network\.|NetworkManager"
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# firewall-cmd --zone=public --list-all
# firewall-cmd --zone=test  --list-all
# nmcli con show 'System eth0' |grep -i zone
-------


# nmcli c modify 'System eth0' connection.zone test
# nmcli con show 'System eth0' |grep -i zone
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# firewall-cmd --zone=test  --list-all

2)reboot---

# nmcli con show 'System eth0' |grep -i zone
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# firewall-cmd --zone=test  --list-all

--> expected: configuration persistent ==> OK

3) update to RHEL7.3 and reboot

# nmcli con show 'System eth0' |grep -i zone
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# firewall-cmd --zone=test  --list-all

--> expected: configuration gone  ==> OK

4) implement fixes, reconfigure ==> done


# nmcli c modify 'System eth0' connection.zone test
# nmcli con show 'System eth0' |grep -i zone
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# firewall-cmd --zone=test  --list-all

5) reboot

# nmcli con show 'System eth0' |grep -i zone
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# firewall-cmd --zone=test  --list-all

--> expected: configuration persistent ==> OK

-> second reboot,expected ok ==> OK 

6)  second test 
6.1 revert to buggy scripts in RHEL7.3 and reboot
  --> expected is to have issue again ==> OK
6.2 put back in fixes and edit just ifcfg file, reboot
   (will this work w/o nmcli con reload?) 
  --> expected, will do ==>OK

So I'd say your fix is working indeed, but the renaming did not work for me, see attached protocol, perhaps I made a mistake.

Cheers,
Daniel

Comment 25 Thomas Woerner 2016-12-01 10:35:27 UTC
(In reply to daniel from comment #23)
> Thomas,
> 
...
> 
> So I'd say your fix is working indeed, but the renaming did not work for me,
> see attached protocol, perhaps I made a mistake.
> 
> Cheers,
> Daniel

The renaming is working, there is no ifcfg-X.old in use even after changing the ifcfg file. Or are you talking about another rename?

Comment 26 Thomas Woerner 2016-12-01 11:07:42 UTC
Here is the upstream patch:

https://github.com/t-woerner/firewalld/commit/636e01137515f3830c655619096e9642651a674c

Comment 33 Petaris 2016-12-21 00:47:23 UTC
I can confirm that the solution offered in Comment 11 worked on three servers affected at my site so far.

One server had NO existing ZONE=, the two others both had ZONE= specified for each network interface.

In my environment this only seems to be happening to systems that are using a SCAP baseline.  Could those others of you who have had this occur check if you are utilizing a SCAP baseline as well?

Thank you,

Justin

Comment 35 Radosław Piliszek 2017-01-03 17:52:38 UTC
From what I discovered this bug affects ifcfg's matched by DEVICE but not by HWADDR.

I fixed it by disabling the network service as I use NetworkManager anyway.
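
To see which ifcfg files fall into the category mentioned above, one could list those that set DEVICE but not HWADDR. This is a minimal sketch with hypothetical function names (`parse_ifcfg`, `device_only_ifcfgs`); it assumes plain KEY=value lines and only skips backup files ending in .old or .bak.

```python
import os
import tempfile


def parse_ifcfg(path):
    """Parse KEY=value lines, skipping blank lines and comments."""
    settings = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip().strip("\"'")
    return settings


def device_only_ifcfgs(cfg_dir):
    """Return names of ifcfg files that set DEVICE but not HWADDR."""
    matches = []
    for name in sorted(os.listdir(cfg_dir)):
        if not name.startswith("ifcfg-") or name.endswith((".old", ".bak")):
            continue
        settings = parse_ifcfg(os.path.join(cfg_dir, name))
        if "DEVICE" in settings and "HWADDR" not in settings:
            matches.append(name)
    return matches


if __name__ == "__main__":
    # Demo data only; the MAC address below is made up.
    with tempfile.TemporaryDirectory() as demo:
        with open(os.path.join(demo, "ifcfg-eth0"), "w") as f:
            f.write("DEVICE=eth0\nZONE=test\n")
        with open(os.path.join(demo, "ifcfg-eth1"), "w") as f:
            f.write("HWADDR=00:11:22:33:44:55\nZONE=test\n")
        print(device_only_ifcfgs(demo))
```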

Comment 41 Thomas Woerner 2017-01-18 11:01:03 UTC
*** Bug 1385521 has been marked as a duplicate of this bug. ***

Comment 44 Tomas Dolezal 2017-01-30 14:23:09 UTC
This bug was mistakenly closed, it's in the previous state now.

Comment 46 Sunny 2017-02-16 23:49:23 UTC
I found the following workaround without using the patch. I'm not sure how reliable it is or how it really works, but I thought I'd leave it here in case it uncovers weird problems.

For example, I have eth1 in the public zone and want to put it into the trusted zone.

1) ifdown eth1
2) Add ZONE=trusted to ifcfg-eth1.
3) ifup eth1
4) firewall-cmd --zone=trusted --list-all --> eth1 is now in trusted zone.
5) reboot
6) firewall-cmd --zone=trusted --list-all --> eth1 still in trusted zone.

If I do 2) and then 1), ifdown eth1 wipes ZONE in ifcfg-eth1. So ifdown is not called when the server is rebooted?

NetworkManager is disabled and not running, network.service is enabled and running.

So as a result I ended up with something like this in Puppet to manage it:

      augeas { "${interface}_ZONE":
        context => "/files/etc/sysconfig/network-scripts/ifcfg-${interface}",
        changes => 'set ZONE trusted',
      }~>
      exec { "profile__network_stop_${interface}":
        command     => "ifdown ${interface}",
        path        => '/usr/sbin',
        refreshonly => true,
      }->
      augeas { "${interface}_ZONE_again":
        context => "/files/etc/sysconfig/network-scripts/ifcfg-${interface}",
        changes => 'set ZONE trusted',
      }~>
      exec { "profile__network_start_${interface}":
        command     => "ifup ${interface}",
        path        => '/usr/sbin',
        refreshonly => true,
      }

Not ideal but works.
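
The ordering that this workaround relies on (ifdown first, then write ZONE, then ifup) can also be sketched outside Puppet. This is an illustration only: `set_zone_in_ifcfg` and `move_interface_to_zone` are hypothetical names, the `run` parameter exists so the commands can be replaced by a fake in a dry run, and a real invocation would need root and the network service scripts.

```python
import os
import subprocess
import tempfile


def set_zone_in_ifcfg(path, zone):
    """Drop any uncommented ZONE= line and append ZONE=<zone>."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    kept = [line for line in lines if not line.strip().startswith("ZONE=")]
    kept.append("ZONE=" + zone)
    with open(path, "w") as handle:
        handle.write("\n".join(kept) + "\n")


def move_interface_to_zone(interface, zone,
                           cfg_dir="/etc/sysconfig/network-scripts",
                           run=subprocess.check_call):
    """Take the interface down, write ZONE, then bring it up again."""
    path = os.path.join(cfg_dir, "ifcfg-" + interface)
    # ifdown runs first, so any blanking of ZONE happens before the edit.
    run(["ifdown", interface])
    set_zone_in_ifcfg(path, zone)
    run(["ifup", interface])


if __name__ == "__main__":
    # Dry run against a temporary file with a fake command runner.
    with tempfile.TemporaryDirectory() as demo:
        cfg = os.path.join(demo, "ifcfg-eth1")
        with open(cfg, "w") as f:
            f.write("DEVICE=eth1\nZONE=public\n")
        move_interface_to_zone("eth1", "trusted", cfg_dir=demo,
                               run=lambda cmd: print("would run:", cmd))
        print(open(cfg).read(), end="")
```

Writing ZONE only after ifdown mirrors the observation above that doing the edit first lets ifdown wipe the value.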

Comment 49 crlb@uvic.ca 2017-07-14 21:26:48 UTC
Here's a simple workaround based on the fact that firewalld and ifdown (/usr/sbin/usernetctl) both
save the old configuration when they write a new configuration:

  [root@beaver-ocata network-scripts]# mv /usr/sbin/ifdown /usr/sbin/ifdown-original
  [root@beaver-ocata network-scripts]# vi /usr/sbin/ifdown
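  #!/bin/bash
    # Run the real ifdown with the original arguments.
    /usr/sbin/ifdown-original "$@"

    # If a .old backup was left behind, move it back over the rewritten ifcfg file.
    if [ -f "/etc/sysconfig/network-scripts/ifcfg-${1}.old" ]; then
      mv "/etc/sysconfig/network-scripts/ifcfg-${1}.old" "/etc/sysconfig/network-scripts/ifcfg-${1}"
    fi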

Comment 50 errata-xmlrpc 2017-08-01 16:22:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1934

Comment 51 Antoine TRAN 2018-06-05 08:55:47 UTC
From https://github.com/moby/moby/issues/16137#issuecomment-394630145, I can confirm this issue is not fixed yet in some environments. I can see at least 3 projects (mine, Angelinsky7's, and PMarci's) where our CentOS/Red Hat version is >= 7.3 and this still happens.