Bug 1727441 - kube-proxy periodic iptables reloads are extremely disruptive in large clusters
Summary: kube-proxy periodic iptables reloads are extremely disruptive in large clusters
Keywords:
Status: CLOSED DUPLICATE of bug 1803149
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.4.0
Assignee: Juan Luis de Sousa-Valadas
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks: 1801743 1801744
 
Reported: 2019-07-06 10:04 UTC by Miheer Salunke
Modified: 2020-04-03 12:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1801737 1801742 1801743 1801744
Environment:
Last Closed: 2020-02-20 11:52:55 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github openshift origin pull 23872 'None' closed Bug 1727441: kube-proxy periodic iptables reloads are extremely disruptive in large clusters 2020-09-14 11:15:52 UTC

Comment 17 Dan Winship 2019-07-15 14:55:43 UTC
Belated update from Friday: it looks like the problem is:

      1. They have so many iptables rules that even things like
         "iptables -C" take a long time because of iptables API
         awfulness, and:

      2. Because of oddities in RHEL backporting and k8s iptables
         feature detection, OCP on RHEL 7 decides that
         "iptables-restore" supports "--wait=2", but "iptables" only
         supports "--wait" (forever).

      3. So the random periodic /sbin/iptables resync calls, which wait
         forever and then run slowly, hold the xtables lock long enough
         that kube-proxy's iptables-restore calls time out and fail.

      4. Fix: bump iptablesSyncPeriod up to something ridiculously high.

This should hopefully get the customer's cluster stable enough that they can progress with their upgrade plans. We should look into having this work better out-of-the-box in 3.11 at least.

Comment 18 Ryan Howe 2019-07-15 20:48:44 UTC
So in the ose enterprise code we should set the variable WaitSecondsMinVersion = "1.4.21" instead of the current value, "1.4.22", so that we make use of the --wait=seconds feature, since it was backported via bug 1438597:

https://github.com/openshift/ose/blob/enterprise-3.11/vendor/k8s.io/kubernetes/pkg/util/iptables/iptables.go#L127
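
As a rough illustration of why that one constant matters, the sketch below gates the wait flag on a minimum version. It is not the vendored Kubernetes code: `parseVersion`, `atLeast`, and `waitArgs` are hypothetical helpers, the flag strings follow the notation used in comment 17, and the detected version "1.4.21" is assumed from the proposed value above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns a dotted version such as "1.4.21" into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q: %v", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version have is greater than or equal to min.
func atLeast(have, min string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(min)
	if err != nil {
		return false, err
	}
	for i := range h {
		if h[i] != m[i] {
			return h[i] > m[i], nil
		}
	}
	return true, nil
}

// waitArgs picks the wait flag for the "iptables" binary (hypothetical helper).
// Below the threshold it falls back to a bare wait, which blocks forever.
func waitArgs(detected, waitSecondsMin string) ([]string, error) {
	ok, err := atLeast(detected, waitSecondsMin)
	if err != nil {
		return nil, err
	}
	if ok {
		return []string{"--wait=2"}, nil
	}
	return []string{"--wait"}, nil
}

func main() {
	for _, min := range []string{"1.4.22", "1.4.21"} {
		args, err := waitArgs("1.4.21", min)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("threshold %s -> %v\n", min, args)
	}
}
```

With the threshold at "1.4.22" a detected "1.4.21" falls back to the unbounded wait; lowering the threshold to "1.4.21" selects the bounded form, which is only safe if the installed binary really carries the backport.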

Comment 30 Anurag saxena 2019-12-10 00:30:36 UTC
Although it's ON_QA, this needs to be backported to 4.3, as confirmed with Dan Winship.

Comment 31 Dan Winship 2019-12-10 13:11:27 UTC
Sorry, yeah, this didn't merge until after 4.3 split off so the bug should have been moved to 4.4. Ignore the comments from the errata system; it's lying.

Comment 32 Dan Winship 2019-12-10 13:13:27 UTC
And actually, there were two parts, one in origin and one in sdn, and only the origin half merged, so this isn't fully fixed even in 4.4.

Comment 34 Juan Luis de Sousa-Valadas 2020-02-20 11:52:55 UTC
This was merged in the 1.17 rebase.

*** This bug has been marked as a duplicate of bug 1803149 ***

