Bug 1367199

Summary: iptablesSyncPeriod should default to 30s OOTB
Product: OpenShift Container Platform
Component: Installer
Version: 3.3.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Mike Fiedler <mifiedle>
Assignee: Samuel Munilla <smunilla>
QA Contact: Mike Fiedler <mifiedle>
CC: aos-bugs, ghuang, jeder, jokerman, mifiedle, mmccomas, tdawson, tstclair, xtian
Target Milestone: ---
Target Release: ---
Type: Bug
Last Closed: 2016-09-27 09:44:12 UTC

Description Mike Fiedler 2016-08-15 20:22:43 UTC
Description of problem:

iptablesSyncPeriod was made configurable in 

https://github.com/openshift/openshift-docs/issues/1051
https://github.com/openshift/openshift-ansible/pull/743

The default value (5 seconds) in the code and the installer is too aggressive. Based on scalability testing during 3.3 with over 7K pods, iptables consumes too much CPU at this setting.
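
For anyone overriding this today, the knob surfaces in two places. The node-config.yaml field is confirmed in comment 6 below; the inventory variable name is inferred from the usual openshift_facts naming convention (role prefix plus fact name), so treat it as an assumption:

# Ansible inventory override (variable name assumed from the openshift_facts convention):
openshift_node_iptables_sync_period=30s

# Rendered into /etc/origin/node/node-config.yaml as:
iptablesSyncPeriod: "30s"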

Version-Release number of selected component (if applicable): 

3.3.0.18


How reproducible: Always


Steps to Reproduce:
1. Install an HA cluster (3 masters, 3 etcd, 2 router/registry, 300 nodes).
2. Run cluster-loader from the SVT Git repo to create 5000 projects (20K pods) using the node vertical test configuration.


Actual results:

Not all pods run; only about 7K can start.
iptables consumes a full core and stays pegged for most of the test.
Many errors like this show up in the system log:

Aug 15 16:00:33 mvirt-m-1 atomic-openshift-node: E0815 16:00:33.497082   49875 node_iptables.go:64] Syncing openshift iptables failed: Failed to ensure rule {nat POSTROUTING [-s 10.128.0.0/10 ! -d 10.128.0.0/10 -j MASQUERADE]} exists: error checking rule: exit status 4: iptables: Resource temporarily unavailable.



Expected results:

All 20K pods start with no errors.

Comment 1 Timothy St. Clair 2016-08-15 20:24:39 UTC
Default upstream sync period is 30s.

Comment 2 Mike Fiedler 2016-08-15 20:33:15 UTC
The default should be 30s. Kubernetes originally had 5s upstream but has since changed it to 30s in the code.
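
For reference, the same knob upstream is kube-proxy's --iptables-sync-period flag; a minimal sketch pinning it to the 30s upstream default:

# upstream kube-proxy; 30s matches the current upstream default
kube-proxy --iptables-sync-period=30s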

Comment 3 openshift-github-bot 2016-08-16 17:22:48 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/735729b08506be08b2a5215a8c1d628cac6d7741
Bug 1367199 - iptablesSyncPeriod should default to 30s OOTB

Update the default to thirty seconds.

https://github.com/openshift/openshift-ansible/commit/dcfddb882554b7f1a9aa1f4024ba9eb2ebf07204
Merge pull request #2306 from smunilla/BZ1367199

Bug 1367199 - iptablesSyncPeriod should default to 30s OOTB

Comment 5 Gan Huang 2016-08-23 02:14:44 UTC
Verified with atomic-openshift-utils-3.3.13-1.git.0.7435ce7.el7.noarch

$ grep "iptables_sync_period" /usr/share/ansible/openshift-ansible/roles/openshift_facts/library/openshift_facts.py
                                    iptables_sync_period='30s',

Comment 6 Gan Huang 2016-08-23 02:46:29 UTC
iptablesSyncPeriod is set to 30s by default in node-config.yaml.

# grep "iptablesSyncPeriod" /etc/origin/node/node-config.yaml 
iptablesSyncPeriod: "30s"
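
On a node that is already installed, a changed value takes effect after restarting the node service; a quick sketch, assuming the service name seen in the log in the description:

# edit iptablesSyncPeriod in /etc/origin/node/node-config.yaml, then:
systemctl restart atomic-openshift-node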

Comment 8 Mike Fiedler 2016-08-23 12:32:18 UTC
I will verify, but this is actually hard to verify. The 30 seconds is a MAX limit on the resync; changes to services or endpoints will still force resyncs at more frequent intervals. tstclair, I think there should be a separate bz/issue opened to establish a MIN sync. Agree?
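
(For context on the MIN side: upstream kube-proxy later grew a complementary --iptables-min-sync-period flag that rate-limits the event-driven resyncs. The pairing would look roughly like this; the flag postdates this bug, is upstream only rather than something the installer sets here, and the values are illustrative:)

# upstream kube-proxy; values illustrative
kube-proxy --iptables-sync-period=30s --iptables-min-sync-period=10s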

Comment 9 Mike Fiedler 2016-09-02 11:55:07 UTC
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1371971 to follow up on the min resync interval.

Verified in 3.3.0.28 that iptablesSyncPeriod is correctly set to 30 seconds during install.

Comment 11 errata-xmlrpc 2016-09-27 09:44:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933