from bug 1734509:
It looks like 10.0.135.165 is consistently busy, 10.0.143.184
is consistently somewhat less busy, and 10.0.155.216 is
suspiciously slacking off almost the whole time. Counting the
number of kube-apiserver log messages in each 10-minute period:
        .165  .184  .216
15:0x   1924  1489  1635
15:1x   1696   995   249
15:2x   1368   654    62
15:3x   1406   700    95
15:4x   1053   534   103
15:5x    440   184     8
16:0x     92    40     8
This corresponds to the ordering of the endpoints in the iptables
rules, so it seems like iptables isn't actually balancing the
connections evenly.
This is not a new bug. It appears to have always been this way and we just didn't notice. (Or at least, it also shows up in the logs of a randomly-selected test run from January.)
The iptables rules:
-A KUBE-SERVICES -d 172.30.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-ZDFKTDCPOS2CD6PV
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-GKTS3YR2HYOAX2GL
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-DJPSN5YGTYNXPKQR
-A KUBE-SEP-ZDFKTDCPOS2CD6PV -p tcp -m tcp -j DNAT --to-destination 10.0.135.165:6443
-A KUBE-SEP-GKTS3YR2HYOAX2GL -p tcp -m tcp -j DNAT --to-destination 10.0.143.184:6443
-A KUBE-SEP-DJPSN5YGTYNXPKQR -p tcp -m tcp -j DNAT --to-destination 10.0.155.216:6443
Ignoring the rounding error, this *should* work: the first KUBE-SVC-NPX4... rule matches 1/3 of packets, the second matches 1/2 of the packets that didn't match the first rule, and the last matches all of the packets that didn't match either of the first two rules. That should give us 1/3 / 1/3 / 1/3. But apparently it doesn't.
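As a sanity check (not from the bug itself, just illustrative), the rule cascade can be simulated in a few lines of Python. Each rule is tried in order with its `--mode random --probability p` value, and the counts should come out close to 1/3 each:

```python
import random

def pick_endpoint():
    # Mimic the KUBE-SVC chain: rules are tried in order, each
    # matching with its --mode random --probability value.
    if random.random() < 1/3:      # first rule: 1/3 of all traffic
        return ".165"
    if random.random() < 1/2:      # second rule: 1/2 of the remaining 2/3
        return ".184"
    return ".216"                  # final rule: everything left over

trials = 300_000
counts = {".165": 0, ".184": 0, ".216": 0}
for _ in range(trials):
    counts[pick_endpoint()] += 1

for ep, n in counts.items():
    print(ep, round(n / trials, 3))  # each close to 0.333
```

So per *connection* the math works out; the skew in the logs has to come from somewhere else.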
This is probably just a long-lived-connection problem: the API server is disrupted, everyone reconnects to the 1 or 2 available endpoints, and then never disconnects.
Perhaps client-go should have an exponentially-distributed random reconnection interval?
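A minimal sketch of what that could look like (hypothetical, not existing client-go behavior): draw the wait before re-dialing from an exponential distribution, so a herd of clients disconnected at the same instant doesn't reconnect in lockstep:

```python
import random

def reconnect_delay(mean_seconds=30.0):
    # Exponentially-distributed jitter for the reconnect interval.
    # mean_seconds is an assumed tunable, not a real client-go knob.
    return random.expovariate(1.0 / mean_seconds)

samples = [reconnect_delay(30.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 30.0
```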
Oh, interesting. That should be easy to prove if so. (See if a service with lots of short connections shows the same distribution.) I guess if that is what the problem is, then the next question is "does it really matter or are we fine with the fact that some apiservers work harder than others?"
I "tested" this some time ago, when we threw apachebench against a throw-away service. Well, more precisely, a colleague walked up to my desk, asking why load-balancing was broken. They had seen very uneven load-balancing despite issuing many homogeneous requests.
It turned out that apachebench uses keep-alive by default, and never ever reconnects. Random load-balancing is only effective for a large number of requests, of course. So, their 10-or-so load-generation connections were balanced unevenly since there just weren't enough coin-flips to regress to the mean.
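The "not enough coin flips" effect is easy to see in a toy simulation (illustrative only): assign each connection to a uniformly random backend, the way the iptables statistic rules do per connection, and compare a handful of connections against many:

```python
import random

def balance(num_connections, num_backends=3):
    # Each new connection independently picks a random backend.
    counts = [0] * num_backends
    for _ in range(num_connections):
        counts[random.randrange(num_backends)] += 1
    return counts

# A handful of keep-alive connections: the split is often far from even.
print(balance(10))
# Many short connections: the law of large numbers takes over.
print(balance(10_000))
```

With 10 connections a 5/4/1 or worse split is routine; with 10,000 the per-backend counts land within a few percent of each other.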
Marking this as NOTABUG - I think we're fine here.