Bug 1502602 - [3.6] Iptables not getting updated with correct endpoints
Summary: [3.6] Iptables not getting updated with correct endpoints
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.6.z
Assignee: Dan Williams
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks: 1520926 1521151 1522935
Reported: 2017-10-16 10:17 UTC by Vladislav Walek
Modified: 2018-01-23 17:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When a Network Egress DNS policy was used, a bug may have prevented further correct operation of the proxy, resulting in new pods not handling service requests. That bug is fixed and Egress DNS policies can now be used without triggering this bug.
Clone Of:
Clones: 1520926 1521151
Environment:
Last Closed: 2018-01-23 17:58:09 UTC
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift ose pull 958 | None | None | None | 2020-03-24 06:31:01 UTC
Origin (Github) | 17584 | None | None | None | 2017-12-05 12:07:09 UTC
Red Hat Product Errata | RHBA-2018:0113 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.7 and 3.6 bug fix and enhancement update | 2018-01-23 22:55:59 UTC

Description Vladislav Walek 2017-10-16 10:17:05 UTC
Description of problem:

iptables shows an incorrect IP address for an endpoint. The iptables rules are not getting updated.
Restarting the openvswitch service fixes the issue.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.6

How reproducible:
can't reproduce

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ben Bennett 2017-10-16 12:43:30 UTC
Please grab the node logs and the output from iptables-save.

Comment 3 Ben Bennett 2017-10-24 17:51:46 UTC
Also, please grab the logs from the affected node.

Comment 4 Vladislav Walek 2017-10-26 10:46:32 UTC
Hello,

I got a reply from the customer. The iptables rules are not getting updated:

The main reason is that the nodes somehow lose commands from the master. Nodes don't update the iptables rules for pods, even though the system knows what should be updated. etcd stores the information correctly, and it can be retrieved correctly with the client, but all the nodes stop receiving iptables updates. The pod works and gets the right IP, and local traffic to it works, but iptables doesn't contain the forward/NAT rules needed to forward traffic to and from the pod. The logs contain a message about the iptables-restore version not being detected, which could perhaps cause this:

Oct 26 08:49:35 oc-master-1-0 atomic-openshift-node: I1026 08:49:35.510552  120469 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
Oct 26 08:49:35 oc-master-1-0 atomic-openshift-node: I1026 08:49:35.510552  120469 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
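
For illustration only, the kind of probe behind that log message can be sketched in Go as follows. This is a minimal sketch, not the node's actual code: the function name parseRestoreVersion, the regular expression, and both sample output strings are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionPattern matches a dotted version such as "v1.4.21".
// The exact pattern is an assumption for this sketch.
var versionPattern = regexp.MustCompile(`v([0-9]+(\.[0-9]+)+)`)

// parseRestoreVersion (hypothetical helper) extracts the dotted version from
// the text printed by `iptables-restore --version`. The second result is
// false when no version can be found in the output.
func parseRestoreVersion(output string) (string, bool) {
	m := versionPattern.FindStringSubmatch(output)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Both strings are assumed sample outputs, not captured from a real host.
	samples := []string{
		"iptables-restore v1.4.21",
		"unrecognized option '--version'",
	}
	for _, out := range samples {
		if v, ok := parseRestoreVersion(out); ok {
			fmt.Println("detected version", v)
		} else {
			fmt.Println("couldn't get version; assuming no --wait support")
		}
	}
}
```

When the version cannot be parsed, the caller falls back to the conservative assumption, which matches the wording of the log line above.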

Comment 5 Vladislav Walek 2017-10-26 10:58:59 UTC
Hello,

I may have found the issue. I have a 3.5 lab with iptables:
iptables-1.4.21-17.el7.x86_64

and there iptables-restore --version doesn't work, the same as on the customer's systems.
I checked on 3.6, where iptables runs with version

iptables-1.4.21-18.el7.x86_64

and it works. I suggested updating. However, shouldn't 3.6 require that version of iptables?
Thx

Comment 6 Vladislav Walek 2017-10-26 11:01:35 UTC
Hello Ben,

It is definitely an issue with the version. After updating to release 18 of iptables, it works.
The next step is to add a dependency between OpenShift 3.6 and iptables release 18.

Thx

Comment 59 Ben Bennett 2017-12-05 12:07:09 UTC
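The problem was the deadlock that Miheer identified.  A thread held a lock and never released it: the release was a deferred action, and a for loop never triggers deferred actions (in Go they run only when the enclosing function returns, not at the end of each loop iteration).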
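
The general hazard can be illustrated with a small Go sketch. It is not the code from the actual fix; the function names are hypothetical, and sync.Mutex.TryLock (Go 1.18 or later) is used only so that the demonstration reports the held lock instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// lockStillHeldInLoop (hypothetical) defers Unlock inside a for loop.
// The deferred call does not run at the end of the iteration, so the
// second iteration finds the mutex still locked.
func lockStillHeldInLoop() bool {
	var mu sync.Mutex
	held := false
	for i := 0; i < 2; i++ {
		if !mu.TryLock() {
			// With a plain Lock() this iteration would block forever.
			held = true
			break
		}
		defer mu.Unlock() // runs only when the function returns
	}
	return held
}

// lockReleasedPerIteration (hypothetical) wraps the loop body in a function
// literal, so the deferred Unlock runs at the end of every iteration.
func lockReleasedPerIteration() bool {
	var mu sync.Mutex
	blocked := false
	for i := 0; i < 2; i++ {
		func() {
			if !mu.TryLock() {
				blocked = true
				return
			}
			defer mu.Unlock() // runs when this function literal returns
		}()
	}
	return blocked
}

func main() {
	fmt.Println("lock still held on second iteration:", lockStillHeldInLoop())
	fmt.Println("blocked when body is wrapped in a function:", lockReleasedPerIteration())
}
```

Wrapping the loop body in a function, or calling Unlock explicitly before the next iteration, are the two usual ways to avoid holding the lock across iterations.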

Comment 61 Hongan Li 2018-01-08 03:19:52 UTC
Verified in atomic-openshift-3.6.173.0.94-1.git.0.8525e8f.el7.x86_64; the issue has been fixed.

Comment 64 errata-xmlrpc 2018-01-23 17:58:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0113

