Red Hat Bugzilla – Bug 1309881
[DOCS] Logging improvement: TLS handshake timeout errors when deploying pods may be due to MTU sizes.
Last modified: 2016-08-06 19:57:19 EDT
Description of problem:
Deploying a pod failed. The deployer pod's Docker log, viewed with:
docker logs <deployer container ID>
showed the error:
1 deployer.go:65] couldn't get deployment <namespace>/cakephp-example-6: Get https://internal.api.clustername.openshift.com:443/api/v1/namespaces/<namespace>/replicationcontrollers/cakephp-example-6: net/http: TLS handshake timeout
This ultimately turned out to be an MTU issue. The MTU of eth0 was 9000 (jumbo frames), while the MTU of tun0 (OVS) was 1500. A tcpdump showed that packets larger than 1500 bytes arrived on eth0 but never crossed tun0. They were dropped, which caused the error shown above.
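A mismatch like this could be spotted on the node with a small check. The Go sketch below reads the MTU of eth0 and tun0 (the interface names from this report) through the standard library's `net.InterfaceByName` and flags any gap. The `assumedOverhead` constant is an assumption standing in for tunnel encapsulation cost (50 bytes is the usual figure for VXLAN over IPv4); `mtuGap` is a hypothetical helper, not part of OpenShift.

```go
package main

import (
	"fmt"
	"net"
)

// assumedOverhead is a hypothetical allowance, in bytes, for the tunnel's
// encapsulation headers. Adjust it to match your SDN configuration.
const assumedOverhead = 50

// mtuGap returns hostMTU minus (tunnelMTU + overhead). Zero means the MTUs
// line up under the assumed overhead; a positive value means the host
// interface accepts frames too large to cross the tunnel, as in this report.
func mtuGap(hostMTU, tunnelMTU, overhead int) int {
	return hostMTU - (tunnelMTU + overhead)
}

func main() {
	// Interface names come from this report and may differ on other nodes.
	host, err := net.InterfaceByName("eth0")
	if err != nil {
		fmt.Println("cannot read eth0:", err)
		return
	}
	tunnel, err := net.InterfaceByName("tun0")
	if err != nil {
		fmt.Println("cannot read tun0:", err)
		return
	}
	if gap := mtuGap(host.MTU, tunnel.MTU, assumedOverhead); gap != 0 {
		fmt.Printf("possible MTU mismatch: eth0=%d tun0=%d (gap %d bytes)\n", host.MTU, tunnel.MTU, gap)
	} else {
		fmt.Printf("MTUs consistent: eth0=%d tun0=%d\n", host.MTU, tunnel.MTU)
	}
}
```

With the values from this report (eth0 = 9000, tun0 = 1500), the gap is 7450 bytes.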
Clayton suggested that this error could be bubbled up to alert the user that an MTU issue may be present. Tracking down this MTU problem was very involved; alerting the user earlier would help.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Configure an OpenShift node's eth0 with an MTU of 9000.
2. Configure tun0 with an MTU of 1500.
3. Attempt to deploy a pod to the node.
4. If the deployment fails, run "docker logs <id of deploy container>" on the node where it failed.
Actual results:
TLS handshake timeout
Expected results:
The deployment should communicate with the master API without errors.
There is probably an underlying bug that prevents tun0 from passing traffic larger than its configured MTU. The oversized packets should be fragmented so that the traffic passes instead of being dropped.
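To make the fragmentation expectation concrete, the Go sketch below works through the IPv4 arithmetic, assuming a 20-byte header with no options. Every fragment except the last must carry a payload that is a multiple of 8 bytes. This only applies when the Don't Fragment bit is clear; TCP on Linux normally sets it for path MTU discovery, in which case an oversized packet is not fragmented and the sender instead relies on an ICMP "fragmentation needed" reply. `fragmentCount` is a hypothetical helper.

```go
package main

import "fmt"

// ipv4HeaderLen assumes a 20-byte IPv4 header with no options.
const ipv4HeaderLen = 20

// fragmentCount returns how many IPv4 fragments a packet of totalLen bytes
// (header included) needs to cross a link with the given MTU, assuming the
// Don't Fragment bit is clear. It returns 0 if the MTU is too small to
// carry any payload at all.
func fragmentCount(totalLen, mtu int) int {
	if totalLen <= mtu {
		return 1
	}
	payload := totalLen - ipv4HeaderLen
	// Payload per fragment, rounded down to a multiple of 8 bytes.
	perFragment := (mtu - ipv4HeaderLen) / 8 * 8
	if perFragment <= 0 {
		return 0
	}
	// Ceiling division: the last fragment carries the remainder.
	return (payload + perFragment - 1) / perFragment
}

func main() {
	// A 9000-byte jumbo packet crossing a 1500-byte MTU link:
	// 8980 payload bytes at 1480 bytes per fragment, rounded up.
	fmt.Println(fragmentCount(9000, 1500)) // prints 7
}
```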
In general, when a Go client on the platform reports a connection error, we should probably suggest this cause. That covers any client running on the cluster, and possibly the masters.
Yes, this should be a docs issue:
- Add a section to "Troubleshooting OpenShift SDN" that describes how an MTU mismatch between interfaces (for example, tun0 and eth0) can cause authentication (TLS handshake) errors.
- Link to the new section from various places, including "Master and Node Configuration" and "Configuring the SDN".
Matt, WDYT of the doc plan (comment #4)?
Commit pushed to master at https://github.com/openshift/openshift-docs
Merge pull request #2145 from tnguyen-rh/bz1309881
Add "TLS Handshake Timeout" section
Closes bug 1309881
Moving to RELEASE_PENDING.
Moving to CLOSED CURRENTRELEASE.