Bug 1380167

Summary: Latency on iptables rules update after atomic-openshift-node service restart
Product: OpenShift Container Platform
Reporter: Miheer Salunke <misalunk>
Component: Networking
Assignee: Dan Winship <danw>
Status: CLOSED ERRATA
QA Contact: Meng Bo <bmeng>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.1
CC: aos-bugs, bbennett, eparis, hongli, ndordet, tdawson
Target Milestone: ---
Keywords: Reopened
Target Release: 3.2.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: OpenShift nodes initialized some of their data structures incorrectly at startup.
Consequence: After restarting a node, pods on that node would be unable to access some service IP addresses until a change was made to that service or a resync occurred.
Fix: The buggy initialization code was fixed.
Result: All services should be accessible as expected after restarting a node.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-04 14:28:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Miheer Salunke 2016-09-28 20:23:49 UTC
Description of problem:
Latency on iptables rules update after atomic-openshift-node service restart
After the atomic-openshift-node service is restarted, the iptables rules (KUBE-SERVICES chain) are incorrect for a few minutes.

Version-Release number of selected component (if applicable):
3.2.0.20-1

How reproducible:
Reproduced in the customer's environment

Steps to Reproduce:
1.
2.
3.

Actual results:
After the atomic-openshift-node service is restarted, the iptables rules (KUBE-SERVICES chain) are incorrect for a few minutes.

Expected results:
After the atomic-openshift-node service is restarted, the iptables rules (KUBE-SERVICES chain) should be correct right from the beginning.

Additional info:

Comment 6 Ben Bennett 2016-10-17 17:50:38 UTC
Ok, so the problem is at:
  https://github.com/openshift/origin/blob/v1.2.0/Godeps/_workspace/src/github.com/openshift/openshift-sdn/plugins/osdn/registry.go#L133

We need to take a pointer to the list item itself, not to the loop variable (pod) used to iterate over the list. Otherwise we are just pointing to that one variable, and we end up using the same pointer for all items.
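
A minimal sketch of this class of bug, not the actual registry.go code. The names Pod, collectShared, collectByIndex and names are hypothetical and exist only for illustration. The shared variable is declared outside the loop so that the aliasing reproduces on any Go version; as far as I know, Go 1.22 and later give a `for _, pod := range` loop a fresh variable per iteration, so the original form only misbehaves on older toolchains such as the one used for 3.2.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for the real pod type; only the name matters here.
type Pod struct {
	Name string
}

// collectShared mimics the buggy pattern: every stored pointer is the address
// of the single variable pod, which is overwritten on each iteration.
func collectShared(pods []Pod) []*Pod {
	var out []*Pod
	var pod Pod // one variable shared by all iterations
	for _, pod = range pods {
		out = append(out, &pod)
	}
	return out
}

// collectByIndex takes the address of each list element instead, so every
// pointer refers to a distinct item.
func collectByIndex(pods []Pod) []*Pod {
	var out []*Pod
	for i := range pods {
		out = append(out, &pods[i])
	}
	return out
}

// names dereferences each pointer and returns the pod names in order.
func names(ptrs []*Pod) []string {
	out := make([]string, 0, len(ptrs))
	for _, p := range ptrs {
		out = append(out, p.Name)
	}
	return out
}

func main() {
	pods := []Pod{{Name: "a"}, {Name: "b"}, {Name: "c"}}
	fmt.Println(names(collectShared(pods)))  // [c c c]
	fmt.Println(names(collectByIndex(pods))) // [a b c]
}
```

With the shared variable, every entry ends up describing the last pod in the list, which would be consistent with only some data being tracked correctly until the next resync.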

Comment 9 Ben Bennett 2016-10-18 14:26:51 UTC
A little more info:
 - This will happen any time the atomic-openshift-node software is restarted
 - It will self-correct after 5-10 minutes when the data structures refresh; only the initial initialization is incorrect
 - This is resolved in 3.3 because the way all of this is tracked was completely redone

Comment 10 Ben Bennett 2016-10-24 14:38:27 UTC
Miheer: Can you open a new bug for the new issue they are seeing with 3.3?  It is different from this one that they originally hit (on 3.2).

Comment 11 Ben Bennett 2016-10-24 14:39:04 UTC
Dropping the priority since it self-corrects and is fixed in 3.3.

Comment 12 Ben Bennett 2016-10-26 20:13:23 UTC
This is fixed in 3.3.  There is a PR ready for 3.2, but a merge was rejected because the urgency seemed low.

Comment 13 Miheer Salunke 2016-10-27 15:34:19 UTC
@Ben: Opened https://bugzilla.redhat.com/show_bug.cgi?id=1389451

Comment 14 Troy Dawson 2017-03-01 19:51:20 UTC
Re-Opened.
Pull Request for fix:
https://github.com/openshift/ose/pull/641

Comment 17 Hongan Li 2017-03-23 06:55:20 UTC
Verified in OCP 3.2.1.28; the issue has been fixed.

After the atomic-openshift-node service is restarted, the iptables rules (KUBE-SERVICES chain) are OK within about 15 seconds.

[root@host-8-175-119 ~]# openshift version
openshift v3.2.1.28
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
[root@host-8-175-119 ~]# 
[root@host-8-175-119 ~]# systemctl restart atomic-openshift-node
[root@host-8-175-119 ~]# 
[root@host-8-175-119 ~]# iptables -L KUBE-SERVICES
Chain KUBE-SERVICES (1 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             172.30.147.75        /* install-test/cakephp-mysql-example:web has no endpoints */ tcp dpt:webcache reject-with icmp-port-unreachable
[root@host-8-175-119 ~]# 
[root@host-8-175-119 ~]# 
[root@host-8-175-119 ~]# 
[root@host-8-175-119 ~]# iptables -L KUBE-SERVICES
Chain KUBE-SERVICES (1 references)
target     prot opt source               destination         
[root@host-8-175-119 ~]#

Comment 19 errata-xmlrpc 2017-04-04 14:28:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0865