Bug 1502602 - [3.6] Iptables not getting updated with correct endpoints
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified OS: Unspecified
Priority: urgent Severity: urgent
Target Milestone: ---
Target Release: 3.6.z
Assigned To: Dan Williams
QA Contact: Meng Bo
Docs Contact:
Depends On:
Blocks: 1520926 1521151 1522935
Reported: 2017-10-16 06:17 EDT by Vladislav Walek
Modified: 2018-01-23 12:58 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When a Network Egress DNS policy was used, a bug may have prevented further correct operation of the proxy, resulting in new pods not handling service requests. That bug is fixed and Egress DNS policies can now be used without triggering this bug.
Story Points: ---
Clone Of:
: 1520926 1521151 (view as bug list)
Environment:
Last Closed: 2018-01-23 12:58:09 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Origin (Github) 17584 None None None 2017-12-05 07:07 EST
Github openshift/ose/pull/958 None None None 2017-12-05 08:25 EST
Red Hat Product Errata RHBA-2018:0113 normal SHIPPED_LIVE OpenShift Container Platform 3.7 and 3.6 bug fix and enhancement update 2018-01-23 17:55:59 EST

Description Vladislav Walek 2017-10-16 06:17:05 EDT
Description of problem:

iptables shows an incorrect IP address for an endpoint. The iptables rules are not being updated.
Restarting the openvswitch service fixes the issue.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.6

How reproducible:
can't reproduce

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 2 Ben Bennett 2017-10-16 08:43:30 EDT
Please grab the node logs and the output from iptables-save.
Comment 3 Ben Bennett 2017-10-24 13:51:46 EDT
Also, please grab the logs from the affected node.
Comment 4 Vladislav Walek 2017-10-26 06:46:32 EDT
Hello,

I got a reply from the customer. iptables is not being updated:

The main reason is that the nodes somehow lose commands from the master. Nodes don't update iptables rules for pods, even though the system knows what should be updated. etcd stores the information correctly, and it can be retrieved correctly using the client, but all the nodes stop receiving iptables updates. The pod works and gets the right IP, and local traffic to it works, but iptables doesn't implement the forward/NAT rules needed to forward traffic to and from the pod. The logs contain a message about the iptables version not being detected, which could perhaps be the cause:

Oct 26 08:49:35 oc-master-1-0 atomic-openshift-node: I1026 08:49:35.510552  120469 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
Oct 26 08:49:35 oc-master-1-0 atomic-openshift-node: I1026 08:49:35.510552  120469 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
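
For context, the log message above suggests that the node tries to read the version of iptables-restore and falls back to not using --wait when it can't. The following is only a minimal sketch of that kind of check, not the actual code in iptables.go. The helper name parseRestoreVersion, the regular expression, and the sample output string "iptables-restore v1.4.21" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRE matches a dotted version such as "v1.4.21".
// The output format is an assumption for this sketch.
var versionRE = regexp.MustCompile(`v([0-9]+(\.[0-9]+)+)`)

// parseRestoreVersion is a hypothetical helper. It extracts the dotted
// version from the text printed by "iptables-restore --version" and
// reports whether a version was found.
func parseRestoreVersion(output string) (string, bool) {
	m := versionRE.FindStringSubmatch(output)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// The first sample is an assumed successful output; the second
	// simulates a binary that prints nothing usable.
	for _, out := range []string{"iptables-restore v1.4.21", ""} {
		if v, ok := parseRestoreVersion(out); ok {
			fmt.Printf("detected iptables-restore %s\n", v)
		} else {
			// Fall back to the conservative assumption, as in the log above.
			fmt.Println("couldn't get iptables-restore version; assuming it doesn't support --wait")
		}
	}
}
```

Note that the message is logged at info level, so on its own it does not show that rule updates have stopped.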
Comment 5 Vladislav Walek 2017-10-26 06:58:59 EDT
Hello,

I may have found the issue. I have a 3.5 lab with iptables:
iptables-1.4.21-17.el7.x86_64

and iptables-restore --version doesn't work there, the same as on the customer's system.
I checked on 3.6, where iptables is at version

iptables-1.4.21-18.el7.x86_64

and it works. I suggested updating. However, shouldn't 3.6 require that version of iptables?
Thx
Comment 6 Vladislav Walek 2017-10-26 07:01:35 EDT
Hello Ben,

This is definitely an issue with the version. After updating iptables to the -18 release, it works.
The next step is to add a dependency between OpenShift 3.6 and the -18 release of iptables.

Thx
Comment 59 Ben Bennett 2017-12-05 07:07:09 EST
The problem was the deadlock that Miheer identified. A thread held a lock and never released it, because deferred actions run only when the enclosing function returns, not at the end of a for loop iteration.
Comment 61 hongli 2018-01-07 22:19:52 EST
Verified in atomic-openshift-3.6.173.0.94-1.git.0.8525e8f.el7.x86_64; the issue has been fixed.
Comment 64 errata-xmlrpc 2018-01-23 12:58:09 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0113
