Bug 1371971 - Need a minimum counterpart to iptablesSyncPeriod
Summary: Need a minimum counterpart to iptablesSyncPeriod
Keywords:
Status: CLOSED DUPLICATE of bug 1387149
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: Meng Bo
URL:
Whiteboard: aos-scalability-34
Depends On:
Blocks: OSOPS_V3
 
Reported: 2016-08-31 14:57 UTC by Mike Fiedler
Modified: 2016-10-28 16:02 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-28 16:02:29 UTC
Target Upstream Version:
Embargoed:



Description Mike Fiedler 2016-08-31 14:57:26 UTC
Description of problem:

On large clusters that are changing quickly (new endpoints, pods, services, etc.), iptables synchronization can consume a full core on every node.   This was seen when scaling the CNCF cluster up to 300 nodes and 5K projects.   Each project had 3 deployments, 3 services, 4 pods, and more.

iptablesSyncPeriod provides an upper bound (default 30s) on how often iptables will sync, but when many changes are occurring there is no lower bound, which can lead to a state where iptables consumes a full core on the nodes.

This bz requests a new parameter that sets a lower bound other than "whenever changes occur".   The proposal is to default it to 0 (current behavior) but allow it to be raised on large, dynamic clusters.
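
A rough sketch of the requested semantics (Go, illustrative only; the boundedSyncer type and minSyncPeriod name are made up for this bz, not an existing knob): change-driven syncs are suppressed while less than minSyncPeriod has elapsed since the last rewrite, and the existing iptablesSyncPeriod keeps acting as the upper bound via the periodic resync.

package main

import (
	"fmt"
	"time"
)

// boundedSyncer is a hypothetical illustration of the requested behavior.
type boundedSyncer struct {
	syncPeriod    time.Duration // upper bound: today's iptablesSyncPeriod (default 30s)
	minSyncPeriod time.Duration // proposed lower bound; 0 keeps the current "sync on every change"
	lastSync      time.Time
}

// maybeSync rewrites iptables only if minSyncPeriod has elapsed since the last
// rewrite; otherwise the change waits for a later sync (e.g. the periodic resync).
func (b *boundedSyncer) maybeSync(doSync func()) {
	if b.minSyncPeriod > 0 && time.Since(b.lastSync) < b.minSyncPeriod {
		return // coalesce: too soon since the last rewrite
	}
	doSync()
	b.lastSync = time.Now()
}

func main() {
	s := &boundedSyncer{syncPeriod: 30 * time.Second, minSyncPeriod: 10 * time.Second}
	rewrite := func() { fmt.Println("rewriting iptables rules at", time.Now().Format("15:04:05")) }

	// Simulate a burst of endpoint/service churn: only the first change in each
	// 10-second window actually triggers a rewrite; the rest are absorbed.
	for i := 0; i < 20; i++ {
		s.maybeSync(rewrite)
		time.Sleep(2 * time.Second)
	}
}

With the lower bound set to a few seconds, a burst of endpoint changes results in at most one rules rewrite per period instead of one per change, which is what would keep iptables from pegging a core on busy nodes.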

Comment 2 Mike Fiedler 2016-09-01 12:43:11 UTC
I saw it during cluster load-up and during project deletion.   Pretty sure it was in conjunction with create/delete activity.

Comment 3 Timothy St. Clair 2016-09-01 12:53:47 UTC
OK, it should be batching on 30-second intervals.  We'll need to dig.

Comment 4 Timothy St. Clair 2016-09-28 21:17:40 UTC
During a large load, or bulk rectification, the issue is that continuous service/endpoint updates are computationally expensive because the rules are constantly being rewritten.  What is more disconcerting is that any high degree of endpoint churn causes a broadcast to every node in the cluster telling it to refresh its tables.

We may need to modify this to be a bulk time-windowed operation.
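
A minimal sketch of what such a bulk, time-windowed operation could look like (an assumption for discussion, not the actual proxy code): updates arriving inside a window are accumulated and applied with a single iptables rewrite when the window closes.

package main

import (
	"fmt"
	"time"
)

func main() {
	window := 5 * time.Second // hypothetical batching window
	updates := make(chan string, 100)

	// Simulated endpoint/service churn.
	go func() {
		for i := 0; i < 6; i++ {
			updates <- fmt.Sprintf("endpoints-change-%d", i)
			time.Sleep(time.Second)
		}
		close(updates)
	}()

	pending := []string{}
	ticker := time.NewTicker(window)
	defer ticker.Stop()

	for {
		select {
		case u, ok := <-updates:
			if !ok {
				if len(pending) > 0 {
					fmt.Println("final bulk rewrite:", pending)
				}
				return
			}
			pending = append(pending, u) // accumulate instead of rewriting per change
		case <-ticker.C:
			if len(pending) == 0 {
				continue // nothing changed in this window, no rewrite needed
			}
			fmt.Printf("one iptables rewrite covering %d changes: %v\n", len(pending), pending)
			pending = pending[:0]
		}
	}
}

Each node would then do at most one rules rewrite per window regardless of how many endpoint updates are broadcast to it, which directly addresses the churn described above.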

Comment 5 Timothy St. Clair 2016-09-28 21:43:06 UTC
xref: https://github.com/kubernetes/kubernetes/issues/33693

Comment 6 Timothy St. Clair 2016-10-21 22:00:26 UTC
Working on some fixes - https://github.com/kubernetes/kubernetes/pull/35334

