1643304 – firewalld reload causes namespace wide egress IP to stop working

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.7.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Dan Winship
QA Contact:	Meng Bo
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-10-25 21:15 UTC by Taneem Ibrahim
Modified:	2019-06-04 10:40 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: Egress IP-related iptables rules were not recreated if they got deleted. Consequence: If a user restarted firewalld or iptables.service on a node that hosted egress IPs, then those egress IPs would stop working. (Traffic that should have used the egress IP would use the node's normal IP instead.) Fix: Egress IP iptables rules are now recreated if they are removed. Result: Egress IPs work reliably.
Clone Of:
Clones:	1653380 1653381 1653382 1653384
Environment:
Last Closed:	2019-06-04 10:40:52 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Origin (Github)	21441	0	None	None	None	2018-11-08 14:15:55 UTC
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:40:58 UTC

Description Taneem Ibrahim 2018-10-25 21:15:09 UTC

Description of problem:

Running firewall-cmd --reload (even when there are no rule changes) causes iptables reload errors and removes the egress IP rules. To recover, we have to run oc patch hostsubnet to remove the egress IP and add it back for the individual namespaces.
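
The remove-and-re-add workaround mentioned above could be scripted along these lines. This is only a minimal sketch, not a supported procedure: it builds the two `oc patch hostsubnet` command lines and prints them rather than running them. It assumes the egress IP lives in the HostSubnet `egressIPs` field, as in the 3.7 static egress IP documentation, and that patching the field to an empty list clears it. The node name and address are hypothetical placeholders.

```python
import json
import shlex


def hostsubnet_patch_cmd(node, egress_ips):
    """Build an `oc patch hostsubnet` argv that sets egressIPs on a node."""
    patch = json.dumps({"egressIPs": list(egress_ips)})
    return ["oc", "patch", "hostsubnet", node, "-p", patch]


def reapply_egress_ip_cmds(node, egress_ips):
    """Return the two workaround commands: clear the list, then restore it."""
    return [
        hostsubnet_patch_cmd(node, []),          # remove the egress IPs
        hostsubnet_patch_cmd(node, egress_ips),  # add them back
    ]


if __name__ == "__main__":
    # Hypothetical node name and address; commands are printed, not executed.
    for argv in reapply_egress_ip_cmds("node1.example.com", ["192.0.2.10"]):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Returning argv lists instead of shell strings keeps quoting of the JSON patch out of the picture until the caller decides how to run the commands.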

Version-Release number of selected component (if applicable):

v3.7.46

How reproducible:

Always

Steps to Reproduce:
1. Follow the instructions below to enable static egress IP:
https://docs.openshift.com/container-platform/3.7/admin_guide/managing_networking.html#enabling-static-ips-for-external-project-traffic

2. Run: firewall-cmd --reload


Actual results:

The following iptables errors are logged:

Oct 24 18:46:59  firewalld[1071]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -n -L DOCKER' failed: iptables: No chain/target/match by that name.
...
...
Oct 24 18:46:59  firewalld[1071]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C POSTROUTING -s <redacted>/16 ! -o docker0 -j MASQUERADE' failed: iptables: No chain/target/match by that name.



Expected results:

Egress IP should work when firewalld is enabled.


Additional info:

Comment 4 Dan Winship 2018-11-07 15:22:11 UTC

@Meng Bo: can you try this again? It won't fail completely, but the egress traffic will end up using the node's normal IP rather than the egress IP:

1. Set up a cluster with firewalld running on the nodes
2. Set up an egress IP, test that it works
3. On the node with the egress IP, run "firewall-cmd --reload"
4. Try egress from a pod again, see that it uses the node IP rather than the egress IP
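
One way to confirm what happened between steps 2 and 4 is to check whether a SNAT rule for the egress IP is still present in the nat table. The sketch below scans `iptables-save`-style text for a rule that jumps to SNAT with `--to-source` equal to the egress IP. The chain, mark, CIDR, and address in the sample are hypothetical, and the exact rule layout that OpenShift SDN installs is not taken from this report.

```python
def has_snat_to(iptables_save_text, egress_ip):
    """Return True if any appended rule SNATs traffic to egress_ip."""
    for line in iptables_save_text.splitlines():
        tokens = line.split()
        # Only rule lines ("-A CHAIN ...") are of interest.
        if not tokens or tokens[0] != "-A":
            continue
        if "SNAT" in tokens and "--to-source" in tokens:
            idx = tokens.index("--to-source")
            if idx + 1 < len(tokens) and tokens[idx + 1] == egress_ip:
                return True
    return False


# Hypothetical nat table dump with an egress IP rule in place.
SAMPLE_WITH_RULE = """*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s 10.128.0.0/14 -m mark --mark 0x1 -j SNAT --to-source 192.0.2.10
COMMIT
"""

# The same dump after the rule has been flushed.
SAMPLE_WITHOUT_RULE = """*nat
:POSTROUTING ACCEPT [0:0]
COMMIT
"""

if __name__ == "__main__":
    print(has_snat_to(SAMPLE_WITH_RULE, "192.0.2.10"))
    print(has_snat_to(SAMPLE_WITHOUT_RULE, "192.0.2.10"))
```

On a node, the input text could come from `iptables-save -t nat`, captured before and after the reload.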

Comment 6 Meng Bo 2018-11-08 07:27:14 UTC

Hmm...

Yes, I can reproduce the problem now.

After firewall-cmd --reload, the pod uses the node's IP as the source IP instead of the egress IP.

The cause is likely the condition that Weibin discovered.

Thanks, Weibin!

Comment 7 Dan Winship 2018-11-08 14:15:56 UTC

(In reply to Weibin Liang from comment #5)
> But the egress IP rule can be restored in iptables if you then run systemctl
> restart openvswitch/docker/atomic-openshift-node.

Sure, but you're not supposed to have to do that.

Fixed by https://github.com/openshift/origin/pull/21441. I'll do backports after that merges.
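
The general idea of recreating rules that have been removed can be illustrated with a small reconcile helper. This is not the code from the pull request above, only a sketch of the pattern: the `exists` and `add` callables are hypothetical stand-ins for whatever checks and installs a rule (for example, thin wrappers around `iptables -C` and `iptables -A`), and the rule string is made up.

```python
def ensure_rules(desired, exists, add):
    """Re-add every desired rule that is missing; return the rules added.

    `exists(rule)` reports whether a rule is currently installed and
    `add(rule)` installs it. Both are supplied by the caller.
    """
    added = []
    for rule in desired:
        if not exists(rule):
            add(rule)
            added.append(rule)
    return added


if __name__ == "__main__":
    # In-memory stand-in for the installed rule set (hypothetical rule text).
    installed = set()
    rules = ["POSTROUTING -j SNAT --to-source 192.0.2.10"]

    print(ensure_rules(rules, installed.__contains__, installed.add))
    installed.clear()  # simulate a firewalld reload flushing the rules
    print(ensure_rules(rules, installed.__contains__, installed.add))
```

A caller would run such a helper whenever the rules may have been flushed, for example periodically or when a firewall reload is detected. Because it only adds what is missing, repeated calls are harmless.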

Comment 8 Dan Winship 2018-11-19 18:42:40 UTC

So do we need this backported to 3.7 or is the customer happy with their current workaround? (Or planning to upgrade to something newer than 3.7 soon?)

Comment 13 Meng Bo 2018-12-03 09:29:04 UTC

Tested on OCP 3.11.50.
The issue has been fixed.

Comment 16 errata-xmlrpc 2019-06-04 10:40:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
