Description of problem:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-proxy-4.2/85

Version-Release number of selected component (if applicable):

release-openshift-ocp-installer-e2e-aws-proxy-4.2

Two conformance tests failed with the same TLS handshake timeout against the apiserver:

[sig-apps] CronJob should delete successful finished jobs with limit of one successful job [Suite:openshift/conformance/parallel] [Suite:k8s] (17s)

fail [k8s.io/kubernetes/test/e2e/e2e.go:104]: Unexpected error:
    <*url.Error | 0xc002e3a540>: {
        Op: "Get",
        URL: "https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0",
        Err: {},
    }
    Get https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0: net/http: TLS handshake timeout occurred

[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: dir-bindmounted] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with defaults [Suite:openshift/conformance/parallel] [Suite:k8s] (12s)

fail [k8s.io/kubernetes/test/e2e/e2e.go:104]: Unexpected error:
    <*url.Error | 0xc003670330>: {
        Op: "Get",
        URL: "https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0",
        Err: {},
    }
    Get https://api.ci-op-pxrdxk91-9c5bf.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0: net/http: TLS handshake timeout occurred

Oct 15 21:29:23.655 E ns/openshift-ingress pod/router-prometheus-897669695-xhrnh node/ip-10-0-142-143.ec2.internal container=router container exited with code 2 (Error):

go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:11.125738 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:16.140130 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:21.135372 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:26.139040 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:32.626815 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:37.605538 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:42.605837 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:47.621741 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:52.608262 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:28:57.619228 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:29:02.605424 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:29:07.606204 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:29:13.774262 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1015 21:29:18.771430 1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
Similar to [1], I believe the test failure is due to a lack of resource capacity for the proxy created by the e2e-aws-proxy job. Eric Wolinetz has since increased that proxy's resource capacity. Please retest and update the bug based on your findings. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1761677
Ingress doesn't own or manage the API load balancer; that is created by the installer. The similarity to https://bugzilla.redhat.com/show_bug.cgi?id=1765276 is interesting. Moving over to the installer team. This could be the apiserver endpoint (as is possible in https://bugzilla.redhat.com/show_bug.cgi?id=1765276) or a networking issue.
The error is from hitting the kube-apiserver.
It appears that the external apiserver URL is being used by the two failing tests. [1] removed the external apiserver from the default noProxy list. `proxyconnect` does not appear in either TLS handshake timeout failure, so the calls are not being proxied as expected. [2] was recently merged to revert [1]. Can you rerun the test with a payload that includes [2] and report back? [1] https://github.com/openshift/cluster-network-operator/pull/328 [2] https://github.com/openshift/cluster-network-operator/pull/388
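To illustrate why the absence of `proxyconnect` is diagnostic: Go clients decide per request, via `http.ProxyFromEnvironment`, whether to CONNECT through the proxy (whose failures are reported with a `proxyconnect` prefix) or to dial the host directly, based on NO_PROXY matching. A small sketch of that decision; the hostnames and proxy address below are hypothetical, not the real cluster's:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// Illustrative proxy environment (made-up hosts): NO_PROXY exempts only
// the internal apiserver name, so the external name should be proxied.
func init() {
	os.Setenv("HTTPS_PROXY", "http://proxy.example.com:3128")
	os.Setenv("NO_PROXY", "api.internal.example.com")
}

// routeFor reports whether a request to target would connect directly or
// go through the configured proxy, using the same environment-based logic
// (http.ProxyFromEnvironment) that Go HTTP clients use.
func routeFor(target string) (string, error) {
	req, err := http.NewRequest("GET", target, nil)
	if err != nil {
		return "", err
	}
	proxyURL, err := http.ProxyFromEnvironment(req)
	if err != nil {
		return "", err
	}
	if proxyURL == nil {
		return "direct", nil
	}
	return "proxy " + proxyURL.String(), nil
}

func main() {
	for _, target := range []string{
		"https://api.internal.example.com:6443/api/v1/nodes", // on NO_PROXY: dials directly
		"https://api.external.example.com:6443/api/v1/nodes", // not exempt: goes via proxy
	} {
		route, _ := routeFor(target)
		fmt.Println(target, "->", route)
	}
}
```

In the failing runs the external apiserver calls behaved like the first case (direct dial, then TLS handshake timeout) when, after [1], they should have behaved like the second.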
Moving to MODIFIED to rerun the test and verify this as fixed.
Built a cluster on AWS with a proxy and ran the automation locally; the issue could not be reproduced:

[zhouying@dhcp-140-138 origin]$ openshift-tests run-test "[sig-apps] CronJob should delete successful/failed finished jobs with limit of one job [Suite:openshift/conformance/parallel] [Suite:k8s]"
......
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-apps] CronJob
  /home/golang/src/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/cronjob.go:55
[It] should delete successful/failed finished jobs with limit of one job [Suite:openshift/conformance/parallel] [Suite:k8s]
  /home/golang/src/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/cronjob.go:233
STEP: Creating a AllowConcurrent cronjob with custom successful-jobs-history-limit
STEP: Ensuring a finished job exists
STEP: Ensuring a finished job exists by listing jobs explicitly
STEP: Ensuring this job and its pods does not exist anymore
STEP: Ensuring there is 1 finished job by listing jobs explicitly
STEP: Removing cronjob
STEP: Creating a AllowConcurrent cronjob with custom failed-jobs-history-limit
STEP: Ensuring a finished job exists
STEP: Ensuring a finished job exists by listing jobs explicitly
STEP: Ensuring this job and its pods does not exist anymore
STEP: Ensuring there is 1 finished job by listing jobs explicitly
STEP: Removing cronjob
[AfterEach] [sig-apps] CronJob
  /home/golang/src/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:152
Nov 26 15:23:31.096: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-cronjob-2666" for this suite.
Nov 26 15:23:32.372: INFO: Running AfterSuite actions on all nodes
Nov 26 15:23:32.372: INFO: Running AfterSuite actions on node 1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062