Bug 1870565
Summary: | [4.6] co authentication status oscillates with error "WellKnownReadyControllerDegraded failed to GET well-known https://a.b.c.d:6443/.well-known/oauth-authorization-server: dial tcp a.b.c.d:6443: i/o timeout" | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | pmali
Component: | kube-apiserver | Assignee: | Stefan Schimanski <sttts>
Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 4.6 | CC: | aos-bugs, mfojtik, sttts, wsun, xxia
Target Milestone: | --- | |
Target Release: | 4.6.0 | |
Hardware: | All | |
OS: | All | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-10-27 16:29:49 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
pmali
2020-08-20 11:35:34 UTC
It seems that the KAS is having some kind of trouble with connectivity; the conncheck container shows

```
2020-08-20T11:27:02.626596944Z W0820 11:27:02.626549 1 connection_checker.go:123] Unable to update status of kube-apiserver-zhsunupgrade819-ctcx8-master-0-to-load-balancer-api-external: rpc error: code = ResourceExhausted desc = trying to send message larger than max (2986286 vs. 2097152)
```

quite regularly, as well as

```
2020-08-20T11:27:03.279927499Z I0820 11:27:03.279898 1 connection_checker.go:216] Failure | TCPConnectError | 9.996883073s | Failed to establish a TCP connection to api-int.zhsunupgrade819.qe.azure.devcluster.openshift.com:6443: dial tcp 10.0.0.4:6443: i/o timeout
```

which appears to match the following error from the KAS container:

```
2020-08-20T11:27:02.230108683Z E0820 11:27:02.230029 17 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), Code:8, Message:"trying to send message larger than max (2980240 vs. 2097152)", Details:[]*anypb.Any(nil)}
2020-08-20T11:27:02.230308195Z I0820 11:27:02.230264 17 trace.go:205] Trace[1183161086]: "Update" url:/apis/controlplane.operator.openshift.io/v1alpha1/namespaces/openshift-kube-apiserver/podnetworkconnectivitychecks/kube-apiserver-zhsunupgrade819-ctcx8-master-0-to-load-balancer-api-internal/status,user-agent:cluster-kube-apiserver-operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:::1 (20-Aug-2020 11:27:00.240) (total time: 1990ms):
2020-08-20T11:27:02.230308195Z Trace[1183161086]: ---"Conversion done" 165ms (11:27:00.412)
2020-08-20T11:27:02.230308195Z Trace[1183161086]: [1.990242374s] [1.990242374s] END
```

It's possible that the CAO is hitting the KAS while it is incapable of responding, but I am not sure what to make of the above.
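For context on the two failure modes in those logs: the ResourceExhausted error means the status update payload (2986286 bytes) exceeded a 2 MiB (2097152-byte) gRPC message-size limit, while the TCPConnectError is simply a TCP dial to the internal API endpoint that did not complete within the checker's roughly 10-second budget. A minimal Go sketch of that kind of probe, using the endpoint and timeout taken from the log above; this is only an illustration, not the connectivity checker's actual implementation:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Endpoint and 10s timeout mirror the failing check from the log above;
	// adjust both for your own cluster. Illustrative probe only.
	target := "api-int.zhsunupgrade819.qe.azure.devcluster.openshift.com:6443"
	start := time.Now()

	conn, err := net.DialTimeout("tcp", target, 10*time.Second)
	if err != nil {
		// An "i/o timeout" here corresponds to the TCPConnectError failures
		// logged by connection_checker.go.
		fmt.Printf("Failure | TCPConnectError | %v | %v\n", time.Since(start), err)
		return
	}
	defer conn.Close()
	fmt.Printf("Success | TCPConnect | %v | connected to %s\n", time.Since(start), conn.RemoteAddr())
}
```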
IPI-installed a 4.5.7 cluster on Azure with FIPS enabled and etcd encryption on, then upgraded to a 4.6 nightly build successfully.

```
$ oc get infrastructures.config.openshift.io cluster -o yaml
...
  platformSpec:
    type: Azure
...

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-31-194600   True        False         10s     Cluster version is 4.6.0-0.nightly-2020-08-31-194600

$ oc get clusterversion -o json|jq ".items[0].status.history"
[
  {
    "completionTime": "2020-09-01T05:13:18Z",
    "image": "registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-31-194600",
    "startedTime": "2020-09-01T04:11:57Z",
    "state": "Completed",
    "verified": false,
    "version": "4.6.0-0.nightly-2020-08-31-194600"
  },
  {
    "completionTime": "2020-09-01T03:46:27Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:776b7e8158edf64c82f18f5ec4d6ef378ac3de81ba0dc2700b885ceb62e71279",
    "startedTime": "2020-09-01T03:15:16Z",
    "state": "Completed",
    "verified": false,
    "version": "4.5.7"
  }
]

$ oc get nodes
NAME                                           STATUS   ROLES    AGE    VERSION
kewang0151-h4szj-master-0                      Ready    master   122m   v1.19.0-rc.2+f71a7ab-dirty
kewang0151-h4szj-master-1                      Ready    master   122m   v1.19.0-rc.2+f71a7ab-dirty
kewang0151-h4szj-master-2                      Ready    master   122m   v1.19.0-rc.2+f71a7ab-dirty
kewang0151-h4szj-worker-northcentralus-9fkpz   Ready    worker   106m   v1.19.0-rc.2+f71a7ab-dirty
kewang0151-h4szj-worker-northcentralus-t6kpr   Ready    worker   106m   v1.19.0-rc.2+f71a7ab-dirty
kewang0151-h4szj-worker-northcentralus-thnhm   Ready    worker   106m   v1.19.0-rc.2+f71a7ab-dirty

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2020-08-31-194600   True        False         False      3m5s
cloud-credential                           4.6.0-0.nightly-2020-08-31-194600   True        False         False      126m
cluster-autoscaler                         4.6.0-0.nightly-2020-08-31-194600   True        False         False      114m
config-operator                            4.6.0-0.nightly-2020-08-31-194600   True        False         False      115m
console                                    4.6.0-0.nightly-2020-08-31-194600   True        False         False      13m
csi-snapshot-controller                    4.6.0-0.nightly-2020-08-31-194600   True        False         False      17m
dns                                        4.6.0-0.nightly-2020-08-31-194600   True        False         False      30m
etcd                                       4.6.0-0.nightly-2020-08-31-194600   True        False         False      119m
image-registry                             4.6.0-0.nightly-2020-08-31-194600   True        False         False      104m
ingress                                    4.6.0-0.nightly-2020-08-31-194600   True        False         False      47m
insights                                   4.6.0-0.nightly-2020-08-31-194600   True        False         False      115m
kube-apiserver                             4.6.0-0.nightly-2020-08-31-194600   True        False         False      119m
kube-controller-manager                    4.6.0-0.nightly-2020-08-31-194600   True        False         False      119m
kube-scheduler                             4.6.0-0.nightly-2020-08-31-194600   True        False         False      118m
kube-storage-version-migrator              4.6.0-0.nightly-2020-08-31-194600   True        False         False      17m
machine-api                                4.6.0-0.nightly-2020-08-31-194600   True        False         False      112m
machine-approver                           4.6.0-0.nightly-2020-08-31-194600   True        False         False      119m
machine-config                             4.6.0-0.nightly-2020-08-31-194600   True        False         False      10m
marketplace                                4.6.0-0.nightly-2020-08-31-194600   True        False         False      17m
monitoring                                 4.6.0-0.nightly-2020-08-31-194600   True        False         False      16m
network                                    4.6.0-0.nightly-2020-08-31-194600   True        False         False      121m
node-tuning                                4.6.0-0.nightly-2020-08-31-194600   True        False         False      46m
openshift-apiserver                        4.6.0-0.nightly-2020-08-31-194600   True        False         False      21m
openshift-controller-manager               4.6.0-0.nightly-2020-08-31-194600   True        False         False      45m
openshift-samples                          4.6.0-0.nightly-2020-08-31-194600   True        False         False      46m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-08-31-194600   True        False         False      120m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-08-31-194600   True        False         False      120m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-08-31-194600   True        False         False      13m
service-ca                                 4.6.0-0.nightly-2020-08-31-194600   True        False         False      121m
storage                                    4.6.0-0.nightly-2020-08-31-194600   True        False         False      47m
```

All operators look fine, moving the bug to Verified.
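Beyond a one-shot `oc get co`, the oscillation from the bug summary can also be double-checked by watching the authentication ClusterOperator's Degraded condition over a longer window. A hedged sketch using client-go's dynamic client follows; the kubeconfig path, 30-second poll interval, and 30-minute window are illustrative assumptions, not part of the original verification:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Illustrative assumptions: kubeconfig path, 30s poll interval, 30m window.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "config.openshift.io",
		Version:  "v1",
		Resource: "clusteroperators",
	}

	// Poll the authentication ClusterOperator and print its Degraded condition;
	// a stable "False" over the window indicates the status is no longer flapping.
	for end := time.Now().Add(30 * time.Minute); time.Now().Before(end); time.Sleep(30 * time.Second) {
		co, err := client.Resource(gvr).Get(context.TODO(), "authentication", metav1.GetOptions{})
		if err != nil {
			fmt.Printf("%s get failed: %v\n", time.Now().Format(time.RFC3339), err)
			continue
		}
		conditions, _, _ := unstructured.NestedSlice(co.Object, "status", "conditions")
		for _, c := range conditions {
			cond, ok := c.(map[string]interface{})
			if !ok {
				continue
			}
			if cond["type"] == "Degraded" {
				fmt.Printf("%s Degraded=%v reason=%v\n",
					time.Now().Format(time.RFC3339), cond["status"], cond["reason"])
			}
		}
	}
}
```

A condition that stays `Degraded=False` for the whole window is consistent with the verified state shown above.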
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196