Bug 1502602

Summary: [3.6] Iptables not getting updated with correct endpoints
Product: OpenShift Container Platform Reporter: Vladislav Walek <vwalek>
Component: Networking Assignee: Dan Williams <dcbw>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.6.0 CC: aos-bugs, bbennett, bleanhar, bmeng, dcbw, emahoney, erich, hongli, misalunk, stwalter, vwalek
Target Milestone: ---   
Target Release: 3.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
When a Network Egress DNS policy was used, a bug could prevent the proxy from operating correctly afterwards, resulting in new pods not handling service requests. The bug is fixed, and Egress DNS policies can now be used without triggering it.
Story Points: ---
Clone Of:
: 1520926 1521151 Environment:
Last Closed: 2018-01-23 17:58:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1520926, 1521151, 1522935    

Description Vladislav Walek 2017-10-16 10:17:05 UTC
Description of problem:

iptables shows an incorrect IP address for an endpoint. The iptables rules are not getting updated.
Restarting the openvswitch service fixes the issue.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.6

How reproducible:
can't reproduce

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ben Bennett 2017-10-16 12:43:30 UTC
Please grab the node logs and the output from iptables-save.

Comment 3 Ben Bennett 2017-10-24 17:51:46 UTC
Also, please grab the logs from the affected node.

Comment 4 Vladislav Walek 2017-10-26 10:46:32 UTC
Hello,

I got a reply from the customer. The iptables rules are not getting updated:

The main reason is that the nodes somehow lose commands from the master. The nodes don't update the iptables rules for pods, even though the system knows what should be updated. etcd stores the information correctly, and it can be retrieved correctly using the client, but all the nodes stop receiving iptables updates. A pod works and gets the right IP, and local traffic to it works, but iptables doesn't implement the forward/NAT rules needed to forward traffic to and from the pod. The logs contain a message about the iptables version not being detected, which could perhaps be the cause:

Oct 26 08:49:35 oc-master-1-0 atomic-openshift-node: I1026 08:49:35.510552  120469 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait

Comment 5 Vladislav Walek 2017-10-26 10:58:59 UTC
Hello,

I may have found the issue. I have a 3.5 lab with this iptables package:
iptables-1.4.21-17.el7.x86_64

There, iptables-restore --version doesn't work, the same as on the customer's system.
I checked on 3.6, where iptables is at version

iptables-1.4.21-18.el7.x86_64

and it works. I suggested updating. However, shouldn't 3.6 make that iptables version mandatory?
Thanks

Comment 6 Vladislav Walek 2017-10-26 11:01:35 UTC
Hello Ben,

It is definitely an issue with the version. After updating iptables to 1.4.21-18, it works.
The next step is to add a dependency between OpenShift 3.6 and iptables 1.4.21-18.

Thanks

Comment 59 Ben Bennett 2017-12-05 12:07:09 UTC
The problem was the deadlock that Miheer identified. A thread held a lock and never released it, because a for loop never triggers deferred actions (they run only when the enclosing function returns).

Comment 61 Hongan Li 2018-01-08 03:19:52 UTC
Verified in atomic-openshift-3.6.173.0.94-1.git.0.8525e8f.el7.x86_64; the issue has been fixed.

Comment 64 errata-xmlrpc 2018-01-23 17:58:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0113