Bug 2035757
| Summary: | [IPI on Alibabacloud] one master node turned NotReady which leads to installation failed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianli Wei <jiwei> |
| Component: | Installer | Assignee: | aos-install |
| Installer sub component: | openshift-installer | QA Contact: | Jianli Wei <jiwei> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | anusaxen, brlu, bteng, gpei, kwoodson, mrbraga, mstaeble, rioliu |
| Version: | 4.10 | Keywords: | TestBlocker |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:36:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jianli Wei
2021-12-27 12:11:27 UTC
@mrbraga FYI, I retried with "build openshift/installer#5535" (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1482198246280400896) by launching 5 clusters. 2 of them succeeded; the other 3 failed, but the failures do not appear to be the node NotReady issue.

#1 QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67219/ (SUCCESS)
debug msg=Time elapsed per stage:
debug msg= cluster: 2m40s
debug msg= bootstrap: 1m8s
debug msg=Bootstrap Complete: 22m34s
debug msg= API: 2m42s
debug msg= Bootstrap Destroy: 46s
debug msg= Cluster Operators: 16m19s
info msg=Time elapsed: 43m30s

#2 QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67240/ (SUCCESS)
debug msg=Time elapsed per stage:
debug msg= cluster: 3m32s
debug msg= bootstrap: 1m2s
debug msg=Bootstrap Complete: 22m17s
debug msg= API: 3m30s
debug msg= Bootstrap Destroy: 34s
debug msg= Cluster Operators: 13m12s
info msg=Time elapsed: 40m39s

#3 QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67241/ (FAILURE)
> All nodes are Ready, but the "console" operator does not report a VERSION.
$ oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
jiwei-603-67zxn-master-0                  Ready    master   68m   v1.23.0+60f5a1c
jiwei-603-67zxn-master-1                  Ready    master   68m   v1.23.0+60f5a1c
jiwei-603-67zxn-master-2                  Ready    master   67m   v1.23.0+60f5a1c
jiwei-603-67zxn-worker-us-east-1a-pl58n   Ready    worker   47m   v1.23.0+60f5a1c
jiwei-603-67zxn-worker-us-east-1b-dfcr5   Ready    worker   47m   v1.23.0+60f5a1c
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          70m     Unable to apply 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest: an unknown error has occurred: MultipleErrors
$ oc get co | grep -Ev 'True False False'
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console
$

#4 QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67242/ (FAILURE)
> All nodes are Ready, but the "console" operator does not report a VERSION.
$ oc get nodes
NAME                             STATUS   ROLES    AGE   VERSION
jiwei-604-wv22c-master-0         Ready    master   74m   v1.23.0+60f5a1c
jiwei-604-wv22c-master-1         Ready    master   74m   v1.23.0+60f5a1c
jiwei-604-wv22c-master-2         Ready    master   74m   v1.23.0+60f5a1c
jiwei-604-wv22c-worker-a-b2hdn   Ready    worker   29m   v1.23.0+60f5a1c
jiwei-604-wv22c-worker-c-ldl5h   Ready    worker   49m   v1.23.0+60f5a1c
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          75m     Unable to apply 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest: an unknown error has occurred: MultipleErrors
$ oc get co | grep -Ev 'True False False'
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console
$

#5 QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67243/ (FAILURE)
> Only master nodes are Ready.
$ oc get nodes
NAME                       STATUS   ROLES    AGE   VERSION
jiwei-605-8m5g7-master-0   Ready    master   27m   v1.23.0+60f5a1c
jiwei-605-8m5g7-master-1   Ready    master   18m   v1.23.0+60f5a1c
jiwei-605-8m5g7-master-2   Ready    master   33m   v1.23.0+60f5a1c
$ oc get machines -n openshift-machine-api
NAME                             PHASE         TYPE            REGION        ZONE            AGE
jiwei-605-8m5g7-master-0         Running       ecs.g6.xlarge   cn-hangzhou   cn-hangzhou-k   41m
jiwei-605-8m5g7-master-1         Running       ecs.g6.xlarge   cn-hangzhou   cn-hangzhou-i   41m
jiwei-605-8m5g7-master-2         Running       ecs.g6.xlarge   cn-hangzhou   cn-hangzhou-j   41m
jiwei-605-8m5g7-worker-i-7stnd   Provisioned   ecs.g6.large    cn-hangzhou   cn-hangzhou-i   19m
jiwei-605-8m5g7-worker-k-7t466   Provisioned   ecs.g6.large    cn-hangzhou   cn-hangzhou-k   19m
$ oc get co
NAME                                       VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   False       False         True       24m     APIServicesAvailable: "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request...
baremetal                                  4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      19m
cloud-controller-manager                   4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      32m
cloud-credential                           4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
cluster-autoscaler                         4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
config-operator                            4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      24m
console
csi-snapshot-controller                    4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
dns                                        4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      19m
etcd                                       4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      18m
image-registry
ingress                                                                                              False       True          True       3m40s   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
insights                                   4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      18m
kube-apiserver                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      13m
kube-controller-manager                    4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      14m
kube-scheduler                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      2m26s
kube-storage-version-migrator              4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
machine-api                                4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      16m
machine-approver                           4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
machine-config                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      17m
marketplace                                4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
monitoring                                                                                           False       True          True       2m24s   Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
network                                                                                              False       True          False      31m     The network is starting up
node-tuning                                4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
openshift-apiserver                        4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   False       False         False      22m     APIServicesAvailable: "apps.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request...
openshift-controller-manager               4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      18m
openshift-samples
operator-lifecycle-manager                 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
operator-lifecycle-manager-catalog         4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
operator-lifecycle-manager-packageserver                                                             False       True          False      23m     ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout
service-ca                                 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      24m
storage                                    4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      19m
$

*** Bug 2005647 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
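
For anyone repeating the triage in the comments above, the `oc get co | grep -Ev 'True False False'` check can be approximated offline against saved `oc get co` output. The following is only a minimal sketch: the function name `unhealthy_operators` and the sample text are hypothetical, and it assumes whitespace-separated columns in the order NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED. Unlike the grep, it also flags rows whose VERSION column is empty, since those operators have not reported a version yet.

```python
from typing import List


def unhealthy_operators(oc_get_co_output: str) -> List[str]:
    """Return names of operators that are not Available=True, Progressing=False, Degraded=False.

    Assumption: each row is whitespace-separated as
    NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE [MESSAGE...].
    Rows with a missing VERSION shift left and are therefore reported as unhealthy.
    """
    unhealthy = []
    for line in oc_get_co_output.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAME":
            continue
        # Columns 2..4 hold AVAILABLE, PROGRESSING, DEGRADED when VERSION is present.
        if fields[2:5] != ["True", "False", "False"]:
            unhealthy.append(fields[0])
    return unhealthy


if __name__ == "__main__":
    # Hypothetical, shortened sample in the shape of the output quoted in this bug.
    sample = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.10.0 False False True 24m APIServicesAvailable
baremetal 4.10.0 True False False 19m
console
network False True False 31m The network is starting up
"""
    for name in unhealthy_operators(sample):
        print(name)
```

Saving the output first (for example with `oc get co > co.txt`) and passing the file contents to the function keeps the check independent of a live cluster.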