Description of problem:

Failed job:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/53
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/61

Failed error:
Aug 21 07:52:50.591 E kube-apiserver Kube API started failing: Get https://api.ci-op-5brzliy5-282fe.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/kube-system?timeout=3s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Version-Release number of selected component (if applicable):
redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-4.2

How reproducible:
Sometimes
Aug 21 07:40:45.998: INFO: stderr: "+ set -euo pipefail\n+ rc=0\n+ curl -X GET -H Host:www.google.com -s -S -o /tmp/body -D /tmp/headers https://13.89.139.152 -w '{\"code\":%{http_code}}' -k\n+ rc=7\n++ echo 7\n++ cat /tmp/output\n++ cat /tmp/error\n++ json_escape\n++ python -c 'import json,sys; print json.dumps(sys.stdin.read())'\n++ cat /tmp/body\ncat: /tmp/body: No such file or directory\n++ base64 -w 0 -\n++ cat /tmp/headers\n++ json_escape\n++ python -c 'import json,sys; print json.dumps(sys.stdin.read())'\n+ echo '{\"test\":0,\"rc\":7,\"curl\":{\"code\":000},\"error\":\"curl: (7) Failed connect to 13.89.139.152:443; Connection timed out\\n\",\"body\":\"\",\"headers\":\"\"}'\n"

Aug 21 07:40:45.998: INFO: stdout: "{\"test\":0,\"rc\":7,\"curl\":{\"code\":000},\"error\":\"curl: (7) Failed connect to 13.89.139.152:443; Connection timed out\\n\",\"body\":\"\",\"headers\":\"\"}\n"

There seem to be plenty of networking errors in those test runs; I wonder if Casey or somebody from the networking team can weigh in.
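For reference, here is a minimal sketch of the connectivity check the test pod appears to run, reconstructed from the stderr above; the target IP and Host header are taken from this failed run, and the simplified structure (no /tmp/output, /tmp/error, or base64 handling) is my own.

#!/bin/bash
# Reconstructed sketch of the test's curl probe (not the verbatim test script).
set -euo pipefail

target="13.89.139.152"        # address the test was handed in this run
host_header="www.google.com"  # Host header the test sends

rc=0
# -k: skip TLS verification; -D: dump response headers; -w: emit the HTTP code as JSON
curl -X GET -H "Host: ${host_header}" -s -S \
     -o /tmp/body -D /tmp/headers \
     "https://${target}" -w '{"code":%{http_code}}' -k || rc=$?

echo "curl exit code: ${rc}"

curl exit code 7 means the TCP connection never completed, which matches the "Failed connect to 13.89.139.152:443; Connection timed out" error above; this is a connectivity failure rather than an HTTP-level one.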
That's weird... the test should be going through the router's Service IP [1]. According to the logs, though, it's hitting 13.89.139.152, which doesn't look like a Service IP at all; presumably that's because it points to a load balancer. I remember Azure external-facing load balancers having issues when accessed from inside the cluster, but I can't find anything about that in the documentation now. So, is it possible that WaitForRouterServiceIP() is returning the wrong IP? Over to Routing.

[1]: https://github.com/openshift/origin/blob/master/test/extended/router/router.go#L52
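One quick way to check which address is in play (a sketch, assuming the default router Service is openshift-ingress/router-default as in 4.x; adjust the namespace/name if the cluster differs):

oc -n openshift-ingress get svc router-default \
  -o jsonpath='clusterIP: {.spec.clusterIP}{"\n"}loadBalancer: {.status.loadBalancer.ingress[0].ip}{"\n"}'

If 13.89.139.152 matches the loadBalancer ingress IP rather than the ClusterIP, the test is being pointed at the external Azure LB instead of the internal Service address, which would explain the intermittent timeouts if that LB can't reliably be reached from inside the cluster.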
Whatever's wrong here is probably the same issue as the other Azure flakes, but we can keep this issue open and distinct for now.
https://bugzilla.redhat.com/show_bug.cgi?id=1741532
https://bugzilla.redhat.com/show_bug.cgi?id=1741534
https://bugzilla.redhat.com/show_bug.cgi?id=1725259
We believe this was solved by https://github.com/openshift/origin/pull/23688. Will open new bugs as necessary.
Confirmed with the latest jobs; the same issue no longer appears, so moving this to verified:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/155
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/157
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/156
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/153
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922