Bug 1821912
| Summary: | OCP UPI installation Failing with Failed to wait for bootstrapping to complete: timed out waiting for the condition | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Petr Balogh <pbalogh> |
| Component: | Installer | Assignee: | Abhinav Dahiya <adahiya> |
| Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | shmohan |
| Version: | 4.4 | Keywords: | Automation |
| Target Milestone: | --- | ||
| Target Release: | 4.5.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-08 13:39:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Petr Balogh
2020-04-07 19:53:42 UTC
Currently our CI uses 'rhcos_ami:ami-06c85f9d106577272', which I suspect could be the problem. I will try with a new AMI and update the bug.

Tried AWS UPI 4.3 with ami-0d8f77b753c0d96dd (the default chosen by upi-on-aws_install.sh), but no luck. It failed again with:

==========================================
./openshift-install wait-for bootstrap-complete --dir /mnt/shmohan/Downloads/git/ocs-ci/external/openshift-misc-1586304109/v3-launch-templates/functionality-testing/aos-4_3/hosts/install-dir
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.shmohanupi.qe.rh-ocs.com:6443..."
level=error msg="Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get https://api.shmohanupi.qe.rh-ocs.com:6443/apis/config.openshift.io/v1/clusteroperators: dial tcp 18.223.90.69:6443: connect: connection refused"
level=info msg="Use the following commands to gather logs from the cluster"
level=info msg="openshift-install gather bootstrap --help"
level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
=============================================

Same issue with OCP 4.4 AWS UPI as well.

============================
./openshift-install wait-for bootstrap-complete --dir /mnt/shmohan/Downloads/git/ocs-ci/external/openshift-misc-1586308020/v3-launch-templates/functionality-testing/aos-4_4/hosts/install-dir
level=info msg="Waiting up to 20m0s for the Kubernetes API at https://api.shmohanupi.qe.rh-ocs.com:6443..."
level=error msg="Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get https://api.shmohanupi.qe.rh-ocs.com:6443/apis/config.openshift.io/v1/clusteroperators: dial tcp 3.22.137.239:6443: connect: connection refused"
level=info msg="Use the following commands to gather logs from the cluster"
level=info msg="openshift-install gather bootstrap --help"
level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
+ exit 3
=============================

Please stop filing these bugs as urgent unless you're going to learn how to debug bootstrap failures and route them to the appropriate team.

This was blocking OCS QE from testing on top of OCP 4.4 as a dependent product. Sorry for that, but we need to get results on 4.4 ASAP for OCS 4.4, so I needed to get high attention here; hence I set urgent priority/severity.

As for how to debug bootstrap failures, we would highly appreciate a session for our OCS QE Ecosystem team on how to debug those gathered logs. Can someone give us such a session? It will definitely help us and you a lot when filing such bugs.

The issue here really was caused by the old RHCOS AMI being used, so you can close this one. I would still appreciate a reply to the question above about a session on how to debug OCP deployment issues for our team, and I guess more teams would appreciate it as well if we make a recording. Or is there such a recording already?

Thanks

Step #1 is always to review the bootkube log in bootstrap/journals/bootkube.log, which shows what's failing or what it's waiting on. This may very well just be a situation where you need to wait longer.

Apr 07 15:50:36 ip-10-0-10-45 bootkube.sh[14122]: [#91] failed to create some manifests:
Apr 07 15:50:36 ip-10-0-10-45 bootkube.sh[14122]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Apr 07 15:50:36 ip-10-0-10-45 bootkube.sh[14122]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"

*** This bug has been marked as a duplicate of bug 1816178 ***
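For anyone triaging a similar failure, the "Step #1" advice above (review bootstrap/journals/bootkube.log to see what bootkube is failing on or waiting for) could be partly automated. The following is only a minimal sketch, not an official tool: the function name `summarize_missing_kinds` is hypothetical, and the regular expression is modeled solely on the two "unable to get REST mapping" lines quoted in this bug, so other bootkube messages may need different patterns.

```python
import re
from collections import defaultdict

# Assumption: the pattern mirrors only the log lines quoted in this bug, e.g.
#   "<manifest>": unable to get REST mapping for "<manifest>":
#   no matches for kind "<Kind>" in version "<group/version>"
MISSING_KIND = re.compile(
    r'"(?P<manifest>[^"]+)": unable to get REST mapping for "[^"]+": '
    r'no matches for kind "(?P<kind>[^"]+)" in version "(?P<version>[^"]+)"'
)


def summarize_missing_kinds(log_text):
    """Group manifests bootkube could not create by the kind it could not resolve.

    Returns a dict mapping (kind, version) to a sorted list of manifest names.
    """
    pending = defaultdict(set)
    for match in MISSING_KIND.finditer(log_text):
        pending[(match["kind"], match["version"])].add(match["manifest"])
    return {key: sorted(names) for key, names in pending.items()}


# Sample input: one of the lines quoted in this bug.
SAMPLE = (
    'Apr 07 15:50:36 ip-10-0-10-45 bootkube.sh[14122]: '
    '"99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for '
    '"99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind '
    '"MachineConfig" in version "machineconfiguration.openshift.io/v1"\n'
)

for (kind, version), manifests in summarize_missing_kinds(SAMPLE).items():
    print(f"{kind} ({version}) is not served yet; manifests waiting on it:")
    for name in manifests:
        print(f"  {name}")
```

To run it against real data, read the gathered bootstrap/journals/bootkube.log file and pass its contents to the function. If the same kind is still reported missing in the last retries of the log, the component that registers that kind is the one to investigate; if the entries stop appearing, bootkube may simply have needed more time.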