Dan Winship has been helping me find the root cause of this issue. Slack thread for reference: https://coreos.slack.com/archives/CDCP2LA9L/p1603141588175200

Summary:
> We see 'net/http: TLS handshake timeout' errors in the kas log, and oas has 'TLS handshake error from 10.130.0.1:36798: EOF'

Dan's suggestion: if the VXLAN MTU is set to the wrong value, then when the server tries to send its certificate the packets won't get delivered, and both sides end up waiting for the other one to talk. So we need to ask the customer whether they have tweaked the MTU settings.

Also, sople is working to get us a fresh set of data captured from the cluster: must-gather and a Prometheus metrics dump.
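In case it helps narrow that down, a quick spot check on a node (via oc debug node/<name> or SSH) is to compare the underlay NIC MTU with the SDN overlay MTU. This is only a sketch and assumes openshift-sdn with the usual eth0/tun0 interface names; with VXLAN the overlay MTU should normally be 50 bytes smaller than the underlay (e.g. 1450 vs 1500):

$ ip link show eth0 | grep -o 'mtu [0-9]*'    # underlay NIC MTU
$ ip link show tun0 | grep -o 'mtu [0-9]*'    # SDN overlay MTU
$ oc get network.operator.openshift.io cluster -o jsonpath='{.spec.defaultNetwork.openshiftSDNConfig.mtu}{"\n"}'    # overlay MTU, if it was set explicitly in the operator config

If the customer has tweaked the MTU, the 50-byte headroom for the VXLAN header is the first thing that tends to go missing.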
This is likely the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1825219

There is a Knowledge Base article in progress at https://access.redhat.com/solutions/5252831 that documents a daemonset you can deploy to work around the issue.

The kernel team has investigated this extensively and found that we are erroneously getting a "needs fragmentation" packet from the underlay (either the kernel or the network) with the IP address of an OpenShift node. This causes the host kernel to lower the PMTU, which causes packets to get fragmented, which breaks the VXLAN networking. There is a kernel change to the PMTU code to make it handle this corner case, but the actual source of the packet has not yet been identified.
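If I understand the workaround correctly, the daemonset essentially watches for and clears the bogus per-destination PMTU exceptions that the spurious "needs fragmentation" packet leaves behind. A rough manual equivalent on an affected node (plain iproute2, not the actual daemonset from the KB article) would be:

$ ip route show cache     # list cached PMTU exceptions, e.g. "... via ... dev eth0 cache expires 574sec mtu 1450"
$ ip route flush cache    # drop them so full-size packets flow again (until the next bogus ICMP arrives)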
Assigning to 4.7 to identify the issue and the fix. If appropriate, we will backport to an earlier release.
@bbennett
> There is a Knowledge Base article in progress at https://access.redhat.com/solutions/5252831 that documents a daemonset you can deploy to work around the issue.

I worked with the customer today, following the instructions in solution #5252831, but could not reproduce the problem. We connected via SSH to each master node and crafted ICMP messages according to the instructions. Here is an excerpt showing that the nodes do not appear to be affected by the MTU problem:

$ ping -M do -c4 -s 1500 10.13.86.71
ping: local error: Message too long, mtu=1500
$ ping -M do -c4 -s 1473 10.13.86.71
ping: local error: Message too long, mtu=1500
$ ping -M do -c4 -s 1472 10.13.86.71
1480 bytes from 10.13.86.71: icmp_seq=1 ttl=64 time=0.866 ms

However, we observed an anomaly when pinging certain master nodes: in some cases, messages roughly ten times larger than the expected 1500-byte MTU limit still got a successful response. This only happened in specific directions:

From Master 0:
- ping Master 1, size 15K => unexpected success (no size limit)
- ping Master 2, size 15K => expected failure (MTU 1500)

From Master 1:
- ping Master 0, size 15K => expected failure (MTU 1500)
- ping Master 2, size 15K => unexpected success (no size limit)

From Master 2:
- ping Master 0, size 15K => expected failure (MTU 1500)
- ping Master 1, size 15K => expected failure (MTU 1500)

Is this expected behaviour or an anomaly?
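One thing that might help interpret runs like this: before pinging, check whether there is already a cached PMTU exception toward the peer, since a stale entry (or its absence right after a reboot) changes what ping reports. Using 10.13.86.71 from the excerpt above as the peer IP:

$ ip route get 10.13.86.71

On a clean path this prints just the route; on an affected node it shows a cached entry along the lines of "... dev eth0 cache expires <N>sec mtu 1450".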
We had a call to sync about this. The test in the comment above was run after the node had been rebooted, so the PMTU cache had been cleared and the commands would not have shown the problem.
Hello @bbennett, the DaemonSet has produced these logs (echoed lines have been removed for brevity):

Hi Rigel, here are the logs requested in today's call:

[root@aznpi000070 ~]# for name in $(oc get pod -n openshift-network-operator | grep "cachefix" | awk '{ print $1}'); do oc logs $name -n openshift-network-operator | grep -v "+"; done
I1026 12:08:02.781740835 - cachefix - start cachefix ocp-np-b8blg-worker-northeurope2-hr5lw
I1026 12:07:57.924032965 - cachefix - start cachefix ocp-np-b8blg-master-2
I1026 12:08:02.989800775 - cachefix - start cachefix ocp-np-b8blg-master-0
10.13.94.6 via 10.13.86.65 dev eth0 cache expires 574sec mtu 1450
10.13.94.6 via 10.13.86.65 dev eth0 cache expires 574sec mtu 1450
10.13.94.6 via 10.13.86.65 dev eth0 cache expires 574sec mtu 1450
10.13.94.5 via 10.13.86.65 dev eth0 cache expires 566sec mtu 1450
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.94.5 via 10.13.86.65 dev eth0 cache expires 566sec mtu 1450
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.94.5 via 10.13.86.65 dev eth0 cache expires 566sec mtu 1450
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.94.5 via 10.13.86.65 dev eth0 cache
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.86.71 dev eth0 cache expires 569sec mtu 1450
10.13.94.5 via 10.13.86.65 dev eth0 cache
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.94.5 via 10.13.86.65 dev eth0 cache
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.94.5 via 10.13.86.65 dev eth0 cache
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.86.71 dev eth0 cache expires 560sec mtu 1450
10.13.94.5 via 10.13.86.65 dev eth0 cache
10.13.94.6 via 10.13.86.65 dev eth0 cache
10.13.94.5 via 10.13.86.65 dev eth0 cache
10.13.94.6 via 10.13.86.65 dev eth0 cache
I1026 12:08:02.970139025 - cachefix - start cachefix ocp-np-b8blg-master-1
10.13.94.6 via 10.13.86.65 dev eth0 cache expires 581sec mtu 1450
10.13.94.6 via 10.13.86.65 dev eth0 cache expires 581sec mtu 1450
10.13.94.6 via 10.13.86.65 dev eth0 cache expires 581sec mtu 1450
I1026 12:08:02.617898458 - cachefix - start cachefix ocp-np-b8blg-worker-northeurope1-v8rxq
I1026 12:08:02.564567057 - cachefix - start cachefix ocp-np-b8blg-worker-northeurope3-pkgpg

What is your opinion? Are we seeing the MTU problem manifesting itself here?
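A hedged reading, assuming eth0 on these nodes has the standard 1500-byte MTU: the "cache ... mtu 1450" lines are per-destination PMTU exceptions on the host interface toward other node IPs, which looks like the lowered-PMTU symptom described earlier in this bug. A quick check on one of the masters might be:

$ ip link show eth0 | grep -o 'mtu [0-9]*'    # expected: mtu 1500 on the underlay NIC
$ ip route get 10.13.94.6                     # should show the same "cache ... mtu 1450" exception as in the daemonset log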
*** This bug has been marked as a duplicate of bug 1825219 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days