Bug 1855055
| Summary: | 4.4.11->4.5.rc7 upgrade fails with console route not reachable for health check | | |
| --- | --- | --- | --- |
| Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> |
| Component: | Networking | Assignee: | Ben Bennett <bbennett> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | hongli, xtian, yanpzhan |
| Version: | 4.5 | Keywords: | Upgrades |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-09 07:08:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Mike Fiedler, 2020-07-08 19:16:11 UTC)
Also seen upgrading 4.3.27 -> 4.4.11 -> 4.5.0.rc7 for the profile "UPI on Azure with RHEL7.8 (FIPS off) & Etcd Encryption on".

Also reproduced in 4.5.11 -> 4.5.0.rc7 for the profile "Disconnected UPI on OSP13 with RHCOS & RHEL7.8 (FIPS off)".

This is not simply a console issue; I think there is a networking issue. From the console pod log, the request to oauth failed, and in the oauth and dns pod logs there are errors about timeouts and connection refused:

dns pod log:

    2020-07-08T14:15:37.218659014-04:00 [ERROR] plugin/errors: 2 quay.io. A: read udp 10.130.2.3:40336->192.168.2.126:53: i/o timeout
    2020-07-08T14:15:53.247144697-04:00 [ERROR] plugin/errors: 2 quay.io. AAAA: read udp 10.130.2.3:42965->192.168.2.126:53: i/o timeout
    2020-07-08T14:16:21.315802564-04:00 [ERROR] plugin/errors: 2 quay.io. A: read udp 10.130.2.3:41473->192.168.2.126:53: i/o timeout
    2020-07-08T14:17:27.512081613-04:00 [ERROR] plugin/errors: 2 quay.io. A: read udp 10.130.2.3:56983->192.168.2.126:53: i/o timeout
    2020-07-08T14:17:32.512434661-04:00 [ERROR] plugin/errors: 2 quay.io. AAAA: read udp 10.130.2.3:44750->192.168.2.126:53: i/o timeout
    2020-07-08T14:17:38.531933356-04:00 [ERROR] plugin/errors: 2 quay.io. A: read udp 10.130.2.3:57556->192.168.2.126:53: i/o timeout
    2020-07-08T14:17:49.555756251-04:00 [ERROR] plugin/errors: 2 quay.io. AAAA: read udp 10.130.2.3:50139->192.168.2.126:53: i/o timeout
    2020-07-08T14:18:27.652333915-04:00 [ERROR] plugin/errors: 2 quay.io. A: read udp 10.130.2.3:38392->192.168.2.126:53: i/o timeout
    2020-07-08T14:18:27.652333915-04:00 [ERROR] plugin/errors: 2 quay.io. AAAA: read udp 10.130.2.3:51525->192.168.2.126:53: i/o timeout
    2020-07-08T14:19:06.82095886-04:00 [ERROR] plugin/errors: 2 quay.io. A: read udp 10.130.2.3:33067->192.168.2.126:53: i/o timeout
    2020-07-08T14:19:17.97797534-04:00 [ERROR] plugin/errors: 2 quay.io. AAAA: read udp 10.130.2.3:59760->192.168.2.126:53: i/o timeout
    2020-07-08T14:19:22.979214064-04:00 [ERROR] plugin/errors: 2 quay.io. AAAA: read udp 10.130.2.3:38789->192.168.2.126:53: i/o timeout
    2020-07-08T14:19:51.065620448-04:00 [ERROR] plugin/errors: 2 quay.io. AAAA: read udp 10.130.2.3:46334->192.168.2.126:53: i/o timeout
    2020-07-08T14:19:51.065771132-04:00 [ERROR] plugin/errors: 2 quay.io. A: read udp 10.130.2.3:58715->192.168.2.126:53: i/o timeout

===========================================

oauth pod log:

    2020-07-08T18:00:05.833632973Z E0708 18:00:05.833558 1 reflector.go:382] k8s.io/client-go.2/tools/cache/reflector.go:125: Failed to watch *v1.ConfigMap: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dextension-apiserver-authentication&resourceVersion=142622&timeout=8m25s&timeoutSeconds=505&watch=true: dial tcp 172.30.0.1:443: connect: connection refused
    2020-07-08T18:00:05.845414673Z E0708 18:00:05.845358 1 reflector.go:382] k8s.io/client-go.2/tools/cache/reflector.go:125: Failed to watch *v1.ConfigMap: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dextension-apiserver-authentication&resourceVersion=141655&timeout=9m1s&timeoutSeconds=541&watch=true: dial tcp 172.30.0.1:443: connect: connection refused
    2020-07-08T18:00:06.069692896Z E0708 18:00:06.069578 1 webhook.go:111] Failed to make webhook authenticator request: Post https://172.30.0.1:443/apis/authentication.k8s.io/v1/tokenreviews: dial tcp 172.30.0.1:443: connect: connection refused
    2020-07-08T18:00:06.069759844Z E0708 18:00:06.069707 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Post https://172.30.0.1:443/apis/authentication.k8s.io/v1/tokenreviews: dial tcp 172.30.0.1:443: connect: connection refused]

==========================

console pod log:

    2020-07-08T18:19:53.699110874Z 2020-07-08T18:19:53Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ugdci08204808.qe.devcluster.openshift.com/oauth/token failed: Head https://oauth-openshift.apps.ugdci08204808.qe.devcluster.openshift.com: dial tcp 192.168.0.7:443: connect: no route to host
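The errors above point at pod-level networking rather than the console itself: the CoreDNS pod (10.130.2.3) cannot reach its upstream resolver (192.168.2.126), and the oauth pod cannot reach the kubernetes service VIP (172.30.0.1). A minimal sketch of how one might start narrowing that down with a standard `oc` client; the pod IP comes from the dns log above, while `<node-name>` is a placeholder, not a value from this bug:

```shell
# Map the CoreDNS pod that is timing out (pod IP 10.130.2.3 in the log) to its node
oc -n openshift-dns get pods -o wide | grep 10.130.2.3

# Check the openshift-sdn pods on that node; both the DNS upstream timeouts and the
# "connection refused" to 172.30.0.1:443 suggest broken pod networking on specific nodes
oc -n openshift-sdn get pods -o wide --field-selector spec.nodeName=<node-name>

# Confirm the service VIP the oauth pod is failing to reach
oc -n default get svc kubernetes
```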
(In reply to Mike Fiedler from comment #2)
> Also seen upgrading 4.3.27 -> 4.4.11 -> 4.5.0.rc7 for the profile "UPI on Azure with RHEL7.8 (FIPS off) & Etcd Encryption on"

Hi @Mike, I saw you said this issue was also reproduced on the Azure platform; not sure if you have the must-gather logs. I suspect it may be another bug we hit yesterday on Azure: https://bugzilla.redhat.com/show_bug.cgi?id=1854383#c3

From the must-gather we can see that one of the router pods was scheduled to a RHEL worker:

    name: router-default-789d8bf48-v29qg
    nodeName: ugdci08204808-xtxvb-rhel-0

It should be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1848945

(In reply to zhaozhanqi from comment #5)
> (In reply to Mike Fiedler from comment #2)
> > Also seen upgrading 4.3.27 -> 4.4.11 -> 4.5.0.rc7 for the profile "UPI on Azure with RHEL7.8 (FIPS off) & Etcd Encryption on"
>
> Hi @Mike, I saw you said this issue was also reproduced on the Azure platform; not sure if you have the must-gather logs. I suspect it may be another bug we hit yesterday on Azure: https://bugzilla.redhat.com/show_bug.cgi?id=1854383#c3

If we had hit https://bugzilla.redhat.com/show_bug.cgi?id=1854383#c3, the ingress operator should be Degraded.

*** This bug has been marked as a duplicate of bug 1848945 ***
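For reference, a hedged sketch of how the duplicate diagnosis could be verified on a cluster in this state, again assuming a standard `oc` client; the pod and node names on another cluster will differ from those quoted above:

```shell
# Where did the default router pods get scheduled?
oc -n openshift-ingress get pods -o wide

# Which workers are RHEL vs RHCOS (the osImage column shows the node OS)
oc get nodes -o custom-columns=NAME:.metadata.name,OS:.status.nodeInfo.osImage

# Per the comment above, bug 1854383 would show up as a Degraded ingress operator
oc get clusteroperator ingress
```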