Bug 1762137 - Get error: "TLS handshake timeout"
Summary: Get error: "TLS handshake timeout"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Stefan Schimanski
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-16 02:27 UTC by zhou ying
Modified: 2020-01-23 11:08 UTC
CC List: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:07:48 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:08:15 UTC)

Description zhou ying 2019-10-16 02:27:48 UTC
Description of problem:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-proxy-4.2/85

Version-Release number of selected component (if applicable):
release-openshift-ocp-installer-e2e-aws-proxy-4.2


[sig-apps] CronJob should delete successful finished jobs with limit of one successful job [Suite:openshift/conformance/parallel] [Suite:k8s]  17s
fail [k8s.io/kubernetes/test/e2e/e2e.go:104]: Unexpected error:
    <*url.Error | 0xc002e3a540>: {
        Op: "Get",
        URL: "https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0",
        Err: {},
    }
    Get https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0: net/http: TLS handshake timeout
occurred
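
For reference, the failing GET corresponds to a plain client-go node list with a field selector. A minimal sketch of the equivalent call (written against a recent client-go; the kubeconfig path and variable names are illustrative, not taken from the test code):

    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Illustrative kubeconfig path; the e2e framework builds its config differently.
        config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
        if err != nil {
            panic(err)
        }
        client, err := kubernetes.NewForConfig(config)
        if err != nil {
            panic(err)
        }
        // Issues GET /api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0,
        // the same request that times out in the failures above.
        nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
            FieldSelector:   "spec.unschedulable=false",
            ResourceVersion: "0",
        })
        if err != nil {
            panic(err) // surfaces here as a *url.Error: "net/http: TLS handshake timeout"
        }
        fmt.Printf("%d schedulable nodes\n", len(nodes.Items))
    }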

[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: dir-bindmounted] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with defaults [Suite:openshift/conformance/parallel] [Suite:k8s]  12s
fail [k8s.io/kubernetes/test/e2e/e2e.go:104]: Unexpected error:
    <*url.Error | 0xc003670330>: {
        Op: "Get",
        URL: "https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0",
        Err: {},
    }
    Get https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0: net/http: TLS handshake timeout
occurred
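
The error string itself comes from Go's HTTP transport: net/http.Transport enforces TLSHandshakeTimeout (10s on http.DefaultTransport, which client-go inherits) and returns "net/http: TLS handshake timeout" wrapped in a *url.Error when the TCP connection succeeds but the TLS handshake does not complete in time. A minimal sketch of how that failure shape arises (the target URL and the deliberately tiny timeout are illustrative):

    package main

    import (
        "errors"
        "fmt"
        "net/http"
        "net/url"
        "time"
    )

    func main() {
        client := &http.Client{
            Transport: &http.Transport{
                // http.DefaultTransport uses 10 * time.Second; an absurdly small
                // value forces the same failure mode against any HTTPS endpoint.
                TLSHandshakeTimeout: 1 * time.Millisecond,
            },
        }
        _, err := client.Get("https://example.com/")
        var uerr *url.Error
        if errors.As(err, &uerr) {
            // Prints the same three fields seen in the failures above:
            // Op "Get", the request URL, and "net/http: TLS handshake timeout".
            fmt.Println(uerr.Op, uerr.URL, uerr.Err)
        }
    }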

Oct 15 21:29:23.655 E ns/openshift-ingress pod/router-prometheus-897669695-xhrnh node/ip-10-0-142-143.ec2.internal container=router container exited with code 2 (Error):
go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:11.125738       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:16.140130       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:21.135372       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:26.139040       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:32.626815       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:37.605538       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:42.605837       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:47.621741       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:52.608262       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:28:57.619228       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:29:02.605424       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:29:07.606204       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:29:13.774262       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I1015 21:29:18.771430       1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).

Comment 1 Daneyon Hansen 2019-10-22 00:27:09 UTC
Similar to [1], I believe the test failure is due to a lack of resource capacity for the proxy created by the e2e-aws-proxy job. Eric Wolinetz has since increased that proxy's resource capacity. Please retest and update the bug with your findings.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1761677

Comment 2 Dan Mace 2019-10-31 20:01:05 UTC
Ingress doesn't own or manage the API load balancer; that's created by the installer. The similarity to https://bugzilla.redhat.com/show_bug.cgi?id=1765276 is interesting.

Moving over to the installer team. Could be the apiserver endpoint (as is possible in https://bugzilla.redhat.com/show_bug.cgi?id=1765276) or a networking issue.

Comment 3 Abhinav Dahiya 2019-11-04 19:58:39 UTC
The error is from hitting the kube-apiserver.

Comment 6 Daneyon Hansen 2019-11-11 21:54:16 UTC
It appears that the external apiserver URL is being used by the two failing tests. [1] removed the external apiserver from the default noProxy list. `proxyconnect` does not appear in either "TLS handshake timeout" failure, so the calls are not being proxied as expected (see the sketch after the links below). [2] was recently merged to revert [1]. Can you rerun the test with a payload that includes [2] and report back?

[1] https://github.com/openshift/cluster-network-operator/pull/328
[2] https://github.com/openshift/cluster-network-operator/pull/388
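
For context on the proxyconnect remark: when a Go client dials through a proxy and the CONNECT to the proxy fails, the transport wraps the error as a net.OpError with Op "proxyconnect", so that string shows up in the *url.Error text; its absence is why the failures above read as direct, unproxied connections. A minimal sketch of the noProxy decision using golang.org/x/net/http/httpproxy (the proxy host is illustrative; the apiserver host is taken from the failures above):

    package main

    import (
        "fmt"
        "net/url"

        "golang.org/x/net/http/httpproxy"
    )

    func main() {
        cfg := &httpproxy.Config{
            HTTPSProxy: "http://proxy.example.com:3128", // illustrative
            // Hosts listed here are dialed directly, bypassing the proxy.
            // [1] dropped the external apiserver from this list; [2] reverts that.
            NoProxy: "api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com",
        }
        proxyFor := cfg.ProxyFunc()
        target, _ := url.Parse("https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes")
        proxyURL, err := proxyFor(target)
        if err != nil {
            panic(err)
        }
        // A nil result means a direct connection, so "proxyconnect" can never
        // appear in its errors; non-nil means the request is tunneled via the proxy.
        if proxyURL == nil {
            fmt.Println("direct connection to apiserver")
        } else {
            fmt.Println("tunneled via", proxyURL)
        }
    }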

Comment 7 Michal Fojtik 2019-11-21 12:34:04 UTC
Moving to MODIFIED to rerun the test and verify this as fixed.

Comment 9 zhou ying 2019-11-26 07:51:16 UTC
Built a cluster on AWS with a proxy and ran the automation locally; the issue could not be reproduced.

[zhouying@dhcp-140-138 origin]$ openshift-tests run-test "[sig-apps] CronJob should delete successful/failed finished jobs with limit of one job [Suite:openshift/conformance/parallel] [Suite:k8s]"
......
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-apps] CronJob
  /home/golang/src/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/cronjob.go:55
[It] should delete successful/failed finished jobs with limit of one job [Suite:openshift/conformance/parallel] [Suite:k8s]
  /home/golang/src/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/cronjob.go:233
STEP: Creating a AllowConcurrent cronjob with custom successful-jobs-history-limit
STEP: Ensuring a finished job exists
STEP: Ensuring a finished job exists by listing jobs explicitly
STEP: Ensuring this job and its pods does not exist anymore
STEP: Ensuring there is 1 finished job by listing jobs explicitly
STEP: Removing cronjob
STEP: Creating a AllowConcurrent cronjob with custom failed-jobs-history-limit
STEP: Ensuring a finished job exists
STEP: Ensuring a finished job exists by listing jobs explicitly
STEP: Ensuring this job and its pods does not exist anymore
STEP: Ensuring there is 1 finished job by listing jobs explicitly
STEP: Removing cronjob
[AfterEach] [sig-apps] CronJob
  /home/golang/src/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:152
Nov 26 15:23:31.096: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-cronjob-2666" for this suite.
Nov 26 15:23:32.372: INFO: Running AfterSuite actions on all nodes
Nov 26 15:23:32.372: INFO: Running AfterSuite actions on node 1

Comment 11 errata-xmlrpc 2020-01-23 11:07:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

