Description of problem:

Kube-proxy creates a REJECT rule in the KUBE-SERVICES chain of the filter table, which is reached from the OUTPUT chain. The rule looks like:

0  0 REJECT  tcp  --  *  *  0.0.0.0/0  172.30.239.59  /* telar/hello-openshift:8888-tcp has no endpoints */ tcp dpt:8888 reject-with icmp-port-unreachable

However, because the packet originates in a different network namespace, it never traverses the host's filter table via OUTPUT, so iptables never matches it against this rule and the connection is not rejected immediately. Instead, ARP resolution fails, which is much slower, and the effect is aggravated when many connections are attempted; compare the actual results against the expected results below.

Version-Release number of selected component (if applicable):
3.11
Most certainly 4.4 is also affected...

How reproducible:
Always

Steps to Reproduce:
1. oc new-project test-filter
2. oc new-app openshift/hello-openshift
3. oc scale dc/hello-openshift --replicas=0 (so the service has no endpoints)
4. oc new-app httpd
5. ssh to the node hosting the httpd pod
6. nsenter the httpd pod's net namespace (nsenter -n -t <pid>)
7. echo "GET http://<svc ip>:<svc port>" | /tmp/vegeta attack -duration=5s | tee results.bin | /tmp/vegeta report

Actual results:

$ sudo nsenter -n -t 3978
# echo "GET http://172.30.239.59:8080" | /tmp/vegeta attack -duration=5s | tee results.bin | /tmp/vegeta report
Requests      [total, rate, throughput]   250, 50.20, 0.00
Duration      [total, attack, wait]       21.046354763s, 4.979783809s, 16.066570954s
Latencies     [mean, 50, 90, 95, 99, max] 7.281432594s, 7.915426174s, 17.769387549s, 17.919384172s, 18.019570465s, 18.03946945s
Bytes In      [total, mean]               0, 0.00
Bytes Out     [total, mean]               0, 0.00
Success       [ratio]                     0.00%
Status Codes  [code:count]                0:250
Error Set:
Get http://172.30.239.59:8080: dial tcp 0.0.0.0:0->172.30.239.59:8080: connect: no route to host

Expected results:

I created an iptables rule in the pod's net namespace to compare timing.

$ sudo nsenter -n -t 3978
# iptables -A OUTPUT -d 172.30.239.59/32 -p tcp -m comment --comment "telar/hello-openshift:8080-tcp has no endpoints" -m tcp --dport 8080 -j REJECT --reject-with icmp-port-unreachable
# echo "GET http://172.30.239.59:8080" | /tmp/vegeta attack -duration=5s | tee results.bin | /tmp/vegeta report
Requests      [total, rate, throughput]   250, 50.20, 0.00
Duration      [total, attack, wait]       5.981782675s, 4.979960498s, 1.001822177s
Latencies     [mean, 50, 90, 95, 99, max] 1.0017498s, 1.001660707s, 1.001843035s, 1.002594677s, 1.003528148s, 1.005008586s
Bytes In      [total, mean]               0, 0.00
Bytes Out     [total, mean]               0, 0.00
Success       [ratio]                     0.00%
Status Codes  [code:count]                0:250
Error Set:
Get http://172.30.239.59:8080: dial tcp 0.0.0.0:0->172.30.239.59:8080: connect: connection refused

With the rule in place the connections are refused immediately, which is dramatically faster.

Additional info:
1. I'm using vegeta because it's what the customer provided; we don't support it, and we could probably use ab (Apache Bench) instead.
2. We still need to check the impact on /proc/sys/net/ipv4/icmp_msgs_burst.
3. REJECT is only accepted in the filter table, so I'm not quite sure how to better implement this; perhaps doing a DNAT to the service IP fixes it, but I'd need to check.
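A quick way to confirm where the packet is (and is not) being matched, sketched here as a suggestion rather than something taken from the report, assuming the rule comment format shown above: watch the packet counters of the host-side REJECT rule while the pod hammers the service IP; they should stay at zero, while the pod's neighbour table shows the failing resolution instead.

# on the node, host network namespace: counters should stay at 0 during the attack
iptables -t filter -L KUBE-SERVICES -n -v | grep 'has no endpoints'

# inside the pod's net namespace (nsenter -n -t <pid>): failed neighbour entries
ip neigh show | grep -i failed

If ab is used instead of vegeta (per additional info item 1), a roughly equivalent invocation (hypothetical, numbers chosen to mirror the vegeta run) would be:

ab -n 250 -c 50 http://172.30.239.59:8080/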
This is a known upstream bug; https://github.com/kubernetes/kubernetes/pull/72534 is the fix. That's in v1.14. We could backport that if there's demand.
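A quick way to check whether a node already carries that fix, assuming (this is an assumption about the shape of the upstream change, to be verified against the PR) it works by also sending pod-originated traffic through the filter-table KUBE-SERVICES chain from FORWARD rather than only from OUTPUT:

# on the node, host network namespace
iptables -t filter -S FORWARD | grep KUBE-SERVICES   # expected on a fixed node (assumption)
iptables -t filter -S OUTPUT | grep KUBE-SERVICES    # present either way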
The customer closed the case and has not requested a backport. If a customer needs this we can backport it, but I'm closing this unless someone requests it.