Description of problem: On large clusters that are changing quickly (new endpoints, pods, services, etc.), iptables processing can consume a full core on every node. This was seen when scaling the CNCF cluster up to 300 nodes and 5K projects; each project had 3 deployments, 3 services, 4 pods, and more. iptablesSyncPeriod provides an upper bound (default 30s) on how often iptables will sync, but when many changes are occurring there is no lower bound, which can leave iptables consuming a full core on the nodes. This bz requests a parameter to set a lower bound > "whenever changes occur". Propose defaulting it to 0 but allowing it to be raised in large, dynamic clusters.
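For illustration, the request amounts to a second bound on kube-proxy's sync loop. A sketch of what the invocation might look like (the min-sync flag name follows the upstream work linked below in this bug and should be treated as illustrative, as should the values):

```shell
# Upper bound (existing): a full iptables resync happens at least every 30s.
# Lower bound (requested): resyncs triggered by service/endpoint churn are
# rate-limited to at most one per 10s, so churn can no longer pin a core.
kube-proxy \
  --iptables-sync-period=30s \
  --iptables-min-sync-period=10s
```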
I saw it during cluster load up and during project deletion. Pretty sure it was in conjunction with create/delete activity.
k, it should be batching on 30 sec intervals. We'll need to dig.
During a large load, or bulk rectification, the issue is that continuous service/endpoint updates are computationally expensive because the rules are constantly being rewritten. More disconcerting, any high degree of endpoint churn causes broadcasts to every node in the cluster to refresh their tables. We may need to make this a bulk, time-windowed operation.
xref: https://github.com/kubernetes/kubernetes/issues/33693
Working on some fixes - https://github.com/kubernetes/kubernetes/pull/35334