Bug 1544211

Summary: iptables is dropping rules on package update
Product: Red Hat OpenStack
Reporter: Yolanda Robla <yroblamo>
Component: openstack-tripleo-heat-templates
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA
QA Contact: Raviv Bar-Tal <rbartal>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 10.0 (Newton)
CC: apevec, aschultz, atelang, augol, fbaudin, iptables-maint-list, jraju, lbezdick, lhh, mbultel, mburns, mcornea, nyechiel, pablo.iranzo, psutter, rhel-osp-director-maint, sathlang, srevivo, todoleza, yprokule, yroblamo
Target Milestone: z8
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-5.3.8-9.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2018-05-17 15:40:56 UTC
Type: Bug
Attachments:
iptables -L output (flags: none)

Description Yolanda Robla 2018-02-11 11:12:07 UTC
Created attachment 1394529 [details]
iptables -L output

When performing a yum update of iptables, I saw my rules being dropped after the update, causing a service disruption on my deployment.

This is the yum.log, so you can see the version of the package:

sudo cat /var/log/yum.log 
Feb 09 17:08:16 Updated: tuned-profiles-cpu-partitioning-2.8.0-5.el7_4.2.noarch
Feb 10 08:13:59 Updated: iptables-1.4.21-18.2.el7_4.x86_64
Feb 10 08:13:59 Updated: iptables-services-1.4.21-18.2.el7_4.x86_64


I also attach yum history info, so you can see the original package that was installed:

1221 packages excluded due to repository priority protections
    Updated iptables-1.4.21-18.0.1.el7.centos.x86_64          @?base
    Update           1.4.21-18.2.el7_4.x86_64                 @updates
    Updated iptables-services-1.4.21-18.0.1.el7.centos.x86_64 @base
    Update                    1.4.21-18.2.el7_4.x86_64        @updates
history info

This is the fragment of /var/log/messages showing the failure:

Feb 10 08:13:39 overcloud-novacompute-1 su: (to root) heat-admin on pts/1
Feb 10 08:13:59 overcloud-novacompute-1 yum[206603]: Updated: iptables-1.4.21-18.2.el7_4.x86_64
Feb 10 08:13:59 overcloud-novacompute-1 yum[206603]: Updated: iptables-services-1.4.21-18.2.el7_4.x86_64
Feb 10 08:13:59 overcloud-novacompute-1 systemd: Reloading.
Feb 10 08:13:59 overcloud-novacompute-1 systemd: [/usr/lib/systemd/system/ip6tables.service:3] Failed to add dependency on syslog.target,iptables.service, ignoring: Invalid argument
Feb 10 08:14:00 overcloud-novacompute-1 systemd: Stopping IPv4 firewall with iptables...
Feb 10 08:14:00 overcloud-novacompute-1 iptables.init: iptables: Setting chains to policy ACCEPT: raw filter [  OK  ]
Feb 10 08:14:00 overcloud-novacompute-1 iptables.init: iptables: Flushing firewall rules: [  OK  ]
Feb 10 08:14:00 overcloud-novacompute-1 iptables.init: iptables: Unloading modules: [  OK  ]
Feb 10 08:14:00 overcloud-novacompute-1 systemd: Starting IPv4 firewall with iptables...
Feb 10 08:14:00 overcloud-novacompute-1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Feb 10 08:14:00 overcloud-novacompute-1 iptables.init: iptables: Applying firewall rules: [  OK  ]
Feb 10 08:14:00 overcloud-novacompute-1 systemd: Started IPv4 firewall with iptables.
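
The pattern in the log fragment above (a yum update of iptables followed by a rule flush) can be searched for on other nodes. The following is only a minimal sketch: `find_flush_after_update` is a hypothetical helper name, and it assumes the syslog message wording matches the lines quoted in this report.

```shell
#!/bin/sh
# Sketch: read syslog-style lines on stdin and print every
# "Flushing firewall rules" message that was logged after yum
# updated an iptables package. Exit status is 0 if at least one
# such line was found, 1 otherwise.
# find_flush_after_update is a hypothetical name, not an existing tool.
find_flush_after_update() {
    awk '
        /yum\[[0-9]+\]: Updated: iptables/ { updated = 1 }
        updated && /iptables: Flushing firewall rules/ { print; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example (needs read access to the log):
#   find_flush_after_update < /var/log/messages
```

A node where the helper prints nothing either was not updated or was updated without the service being stopped.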


I also attach the iptables -L output from before and after the package update.

Comment 2 Tomas Dolezal 2018-02-12 13:42:11 UTC
This is fixed in bz1380141. Unfortunately, the bug is in the package that is being uninstalled, so the fix only takes effect when upgrading from a version that already contains it.

iptables-1.4.21-18.2.el7_4 is actually the fixed package version.
* Mon Sep 18 2017 Phil Sutter - 1.4.21-18.2
- Prevent iptables.service and ip6tables.service from running in parallel
  (RHBZ#1491963)
- Don't restart services upon upgrade (RHBZ#1491961)
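
Since the restart comes from the package being replaced, one way to tell whether a node is exposed is to compare its installed build against the 1.4.21-18.2 changelog entry above. This is a rough sketch only: `version_lt` and `FIXED` are hypothetical names, and GNU `sort -V` is used as an approximation of RPM version ordering, not a substitute for it.

```shell
#!/bin/sh
# Sketch: decide whether an installed iptables-services build predates
# the 1.4.21-18.2 changelog entry quoted in this report.
# Assumption: GNU `sort -V` ordering is close enough to RPM ordering
# for these release strings. version_lt and FIXED are hypothetical names.
FIXED="1.4.21-18.2.el7_4"

version_lt() {
    # True when $1 sorts strictly before $2.
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example on a node (left commented out so the sketch has no side effects):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' iptables-services)
#   version_lt "$installed" "$FIXED" && echo "update from $installed may restart the service"
```

Both starting versions reported in this bug (1.4.21-17.el7 and 1.4.21-18.0.1.el7.centos) sort before the fixed build under this comparison.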

Comment 3 Yolanda Robla 2018-02-12 14:44:37 UTC
OK, moving this to Upgrades in TripleO to see whether a workaround can be proposed.

Comment 4 Marius Cornea 2018-02-12 15:08:23 UTC
Adding a note here: this bug breaks data plane connectivity during a minor update. The update succeeds, but overcloud instances become unreachable while it runs.

Comment 5 Yolanda Robla 2018-02-12 15:14:22 UTC
A minor clarification: it's not the overcloud nodes themselves that become unreachable, but the VMs created in the overcloud.
This is how I noticed it:
- deploy the overcloud
- create an external network with a range of floating IPs (FIPs)
- create a VM and attach a FIP
- start a continuous ping test against the FIP
- trigger a minor update; the data plane connection breaks as soon as iptables is updated
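
The continuous ping test in the steps above can be post-processed to locate the outage. The sketch below assumes the ping output was captured to a file in the usual Linux `ping` format (lines containing `bytes from` and `icmp_seq=N`); `report_ping_gaps` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read captured ping output on stdin and report every gap in
# the icmp_seq numbers of successful replies.
# report_ping_gaps is a hypothetical name, not an existing tool.
report_ping_gaps() {
    awk '
        /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; keep only the number.
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (have_prev && seq > prev + 1)
                printf "lost %d replies after icmp_seq=%d\n", seq - prev - 1, prev
            prev = seq
            have_prev = 1
        }
    '
}

# Example (capture with timestamps, then analyse the file):
#   ping -D 10.0.0.219 | tee ping_results
#   report_ping_gaps < ping_results
```

Each reported gap can then be matched against the yum and iptables.init timestamps in /var/log/messages on the compute node.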

Comment 6 Yolanda Robla 2018-02-12 16:05:42 UTC
I was able to avoid the disruption by executing:
 iptables-save > /etc/sysconfig/iptables

Compare the content saved on compute-0 (where I ran iptables-save) with the content on compute-1 (where I did not):

http://pastebin.test.redhat.com/554976

Comment 7 Phil Sutter 2018-02-13 12:51:08 UTC
Hi Yolanda,

(In reply to Yolanda Robla from comment #6)
> I was able to avoid the disruption by executing :
>  iptables-save > /etc/sysconfig/iptables

Note that this will make all temporary iptables rules persist across reboots if the iptables service is enabled on the system. Did you make sure this won't upset VM management services during startup due to unexpected iptables rules?

> See the content that is saved in compute-0 (where i saved the iptables), vs
> the content that is on compute-1 (that was not saved)
> 
> http://pastebin.test.redhat.com/554976

Assuming the issue is fixed in current iptables package and your workaround doesn't cause negative side-effects, are you fine with closing this ticket then?

Cheers, Phil
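
To see which rules the workaround would make persistent, the running ruleset can be compared with the file the service loads at boot. The sketch below is illustrative only: `normalize_rules` is a hypothetical helper that strips the comment lines and counters that `iptables-save` emits, so that a plain `diff` shows only rule differences.

```shell
#!/bin/sh
# Sketch: normalise iptables-save formatted text read from stdin.
# Comment lines are dropped and chain counters such as [12:3456]
# are reset to [0:0], so two dumps can be compared with diff.
# normalize_rules is a hypothetical name, not an existing tool.
normalize_rules() {
    sed -e '/^#/d' -e 's/\[[0-9]*:[0-9]*]/[0:0]/'
}

# Example (needs root; paths under /tmp are arbitrary choices):
#   iptables-save | normalize_rules > /tmp/running.rules
#   normalize_rules < /etc/sysconfig/iptables > /tmp/persisted.rules
#   diff /tmp/persisted.rules /tmp/running.rules
```

Lines that appear only in the running dump are the ones that would start surviving reboots after `iptables-save > /etc/sysconfig/iptables`.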

Comment 8 Yolanda Robla 2018-02-13 13:08:14 UTC
I added that as a hint on how it could be solved, but I don't want to rely on it as a manual workaround, since most customers who deployed older OSP versions will be affected. I'd like the upgrade team to consider this and add the iptables-save, or any other specific fix, as part of the minor update.

Comment 9 Marius Cornea 2018-02-13 20:52:01 UTC
I tried this scenario on my environment and I was able to reproduce the issue:

on compute node iptables got updated:
[root@compute-0 ~]# grep iptables /var/log/yum.log 
Feb 13 09:25:40 Updated: iptables-1.4.21-18.2.el7_4.x86_64
Feb 13 09:26:38 Updated: iptables-services-1.4.21-18.2.el7_4.x86_64

FIP ping results during the entire run of the minor update:

[stack@undercloud-0 ~]$ tail -10 ping_results_201802131435 
[1518554529.301842] 64 bytes from 10.0.0.219: icmp_seq=4022 ttl=63 time=1.05 ms
[1518554530.302780] 64 bytes from 10.0.0.219: icmp_seq=4023 ttl=63 time=0.952 ms
[1518554531.303541] 64 bytes from 10.0.0.219: icmp_seq=4024 ttl=63 time=0.840 ms
[1518554532.303817] 64 bytes from 10.0.0.219: icmp_seq=4025 ttl=63 time=1.01 ms
[1518554533.304779] 64 bytes from 10.0.0.219: icmp_seq=4026 ttl=63 time=0.979 ms
[1518554534.304992] 64 bytes from 10.0.0.219: icmp_seq=4027 ttl=63 time=0.939 ms

--- 10.0.0.219 ping statistics ---
4027 packets transmitted, 3975 received, 1% packet loss, time 4029978ms
rtt min/avg/max/mdev = 0.445/0.929/4.717/0.203 ms

The difference is that I cannot see 'Stopping IPv4 firewall with iptables' in /var/log/messages, only the start messages:

[root@compute-0 heat-admin]# grep 'IPv4 firewall with iptables' /var/log/messages  
Feb 13 09:53:31 localhost systemd: Starting IPv4 firewall with iptables...
Feb 13 09:53:31 localhost systemd: Started IPv4 firewall with iptables.

Comment 10 Marius Cornea 2018-02-13 20:52:54 UTC
(In reply to Marius Cornea from comment #9)
> I tried this scenario on my environment and I was able to reproduce the
> issue:

^ Sorry, I was _not_ able to reproduce the issue.

Comment 11 Yolanda Robla 2018-02-14 07:24:32 UTC
What is your original iptables package? See mine:

    Updated iptables-1.4.21-18.0.1.el7.centos.x86_64          @?base
    Update           1.4.21-18.2.el7_4.x86_64                 @updates

From 18.0.1 to 18.2. Can you check yours?

Comment 12 Marius Cornea 2018-02-15 22:42:57 UTC
(In reply to Yolanda Robla from comment #11)
> What is your original iptables package? See mine:
> 
>     Updated iptables-1.4.21-18.0.1.el7.centos.x86_64          @?base
>     Update           1.4.21-18.2.el7_4.x86_64                 @updates
> 
> From 18.0.1 to 18.2 , can you check yours?

Apologies for the delay. I was indeed able to reproduce this issue when using an older overcloud image (RHEL 7.3):

updating from iptables-1.4.21-17.el7.x86_64 to iptables-1.4.21-18.2.el7_4.x86_64

As mentioned in comment #7, I think iptables-save will also save the temporary rules set by Neutron, which we should avoid.

Comment 13 Yolanda Robla 2018-02-16 10:39:47 UTC
What are the alternatives? Can we select which rules get saved?
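
One conceivable way to select rules is to filter the `iptables-save` output before writing it out. The sketch below is not what this bug ended up using. It assumes that the rules to exclude are exactly those living in, or jumping to, chains whose names start with `neutron-`, and that assumption would need checking on a real node. `strip_neutron_rules` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: filter iptables-save formatted text read from stdin,
# removing chain declarations (":neutron-...") and any rule that
# mentions a chain prefixed with "neutron-".
# Assumption: that prefix identifies the temporary rules to skip.
# strip_neutron_rules is a hypothetical name, not an existing tool.
strip_neutron_rules() {
    grep -v -e '^:neutron-' -e ' neutron-'
}

# Example (needs root; prints the filtered ruleset without saving it):
#   iptables-save | strip_neutron_rules
```

Table headers (`*filter`) and `COMMIT` lines pass through unchanged, so the filtered text keeps the `iptables-save` format.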

Comment 14 Phil Sutter 2018-02-16 13:54:18 UTC
Hi Yolanda,

(In reply to Yolanda Robla from comment #8)
> I added that as a hint on how it could be solved, but i don't like to use
> this workaround, as most of the customers that deployed older OSP will be
> affected. I'd like that this comment is considered by the upgrade team, and
> add the iptables-save or any other specific fix, as part of the minor update.

I played around a bit with spec file triggers and it looks like I have a
solution:

| %triggerun -n iptables-services -- iptables-services < 1.4.21-20%{?dist}
| mkdir -p /var/run/iptables
| iptables-save > /var/run/iptables/iptables.save
| ip6tables-save > /var/run/iptables/ip6tables.save
| 
| %triggerpostun -n iptables-services -- iptables-services < 1.4.21-20%{?dist}
| iptables-restore -c -w 10 /var/run/iptables/iptables.save
| ip6tables-restore -c -w 10 /var/run/iptables/ip6tables.save
| rm -rf /var/run/iptables

These triggers run before and after uninstallation of the old
iptables-services package. As you can see, I'm using your workaround, namely
saving the old ruleset and restoring it after the assumed service restart.
I've successfully tested this on a local test machine.

Given the importance of this ticket, I guess we might still get it into
RHEL 7.5; for 7.4, the z-stream process applies, of course.

Cheers, Phil

Comment 18 Sofer Athlan-Guyot 2018-03-20 17:58:58 UTC
Very early implementation of a workaround applied during the upgrade, based on the OVS one.
It fixes update *and* upgrade.

TODO: as it's RHEL-related, we need to "upport" it to master. The upgrade uses Ansible from Ocata on, so the implementation there is completely different.

Comment 32 errata-xmlrpc 2018-05-17 15:40:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1593