Bug 1712409
| Summary: | Installer returns 0 return code when hitting a FATAL event | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chris Callegari <ccallega> |
| Component: | Installer | Assignee: | Abhinav Dahiya <adahiya> |
| Installer sub component: | openshift-installer | QA Contact: | sheng.lao <shlao> |
| Status: | CLOSED WORKSFORME | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | bleanhar, dgoodwin, jialiu |
| Version: | 4.1.0 | Keywords: | NeedsTestCase |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-07-03 00:56:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
[rhel7] [08:29:05 AM] [ccallega@~]$ ssh -A core.195.207 '/usr/local/bin/installer-gather.sh 10.0.130.71 10.0.152.89 10.0.172.120'
The authenticity of host '35.173.195.207 (35.173.195.207)' can't be established.
ECDSA key fingerprint is SHA256:5zKwAiUCOxxJlQTcFmzGL4DPxVDk+5wQ3Irw68ElKXI.
ECDSA key fingerprint is MD5:e2:3e:31:47:67:38:4f:47:ab:12:7b:b8:eb:89:17:cf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '35.173.195.207' (ECDSA) to the list of known hosts.
Gathering bootstrap journals ...
Gathering bootstrap containers ...
Gathering rendered assets...
Gathering cluster resources ...
Waiting for logs ...
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "apiservices"
error: the server doesn't have a resource type "clusteroperators"
error: the server doesn't have a resource type "clusterversion"
error: the server doesn't have a resource type "csr"
error: the server doesn't have a resource type "configmaps"
error: the server doesn't have a resource type "kubeapiserver"
error: the server doesn't have a resource type "endpoints"
error: the server doesn't have a resource type "machineconfigpools"
error: the server doesn't have a resource type "events"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "machineconfigs"
error: the server doesn't have a resource type "namespaces"
error: the server doesn't have a resource type "kubecontrollermanager"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "openshiftapiserver"
error: the server doesn't have a resource type "roles"
error: the server doesn't have a resource type "rolebindings"
error: the server doesn't have a resource type "secrets"
Error from server (NotFound): the server could not find the requested resource
error: the server doesn't have a resource type "secrets"
error: the server doesn't have a resource type "services"
Gather remote logs
Log bundle written to ~/log-bundle.tar.gz
[rhel7] [08:33:25 AM] [ccallega@~]$ scp core.195.207:~/log-bundle.tar.gz .
log-bundle.tar.gz 100% 2222KB 8.6MB/s 00:00

This cluster is called e0675dd7c3c5. Logs are attached...

Created attachment 1571601: log-bundle.tar.gz

Created attachment 1571602: installer-logs.tar.gz
Hi Chris, Any more tips for reproducing this? We haven't been able to reproduce this.

I've been deeply focused on Disconnected Install and haven't been able to retest. Two other engineers also confirmed the return code before I submitted the Bugzilla. If it can't be reproduced then someone picked it up and fixed it. I'm good to close this BZ.
Description of problem:
Installer returns 0 return code when hitting a FATAL event. The installer should return a non 0 return code when hitting a FATAL event.

Version-Release number of selected component (if applicable):
$ ~/bin/openshift-install version
/home/ccallega/bin/openshift-install v4.1.0-201905161311-dirty
built from commit 3b5a270b5246295938e8cc71a69d7a3b99a4df11
release image quay.io/openshift-release-dev/ocp-release@sha256:6f4cf2db7e63c4dba54496a72b83fec22c49293b520ff0cdb78f1e38b23f1ccb

How reproducible:
Always

Steps to Reproduce:
1. export OPENSHIFT_CLUSTER_NAME=blah
2. ~/bin/openshift-install --log-level debug --dir /tmp/openshift/${OPENSHIFT_CLUSTER_NAME} create install-config
3. Create invalid hash for registry.svc.ci.openshift.org element in pull secret
4. ~/bin/openshift-install --log-level debug --dir /tmp/openshift/${OPENSHIFT_CLUSTER_NAME} create cluster | tee /tmp/openshift/${OPENSHIFT_CLUSTER_NAME}/debug.local

Actual results:
DEBUG
DEBUG Apply complete! Resources: 141 added, 0 changed, 0 destroyed.
DEBUG
DEBUG The state of your infrastructure has been saved to the path
DEBUG below. This state is required to modify and destroy your
DEBUG infrastructure, so keep it safe. To inspect the complete state
DEBUG use the `terraform show` command.
DEBUG
DEBUG State path: /tmp/openshift-install-928363870/terraform.tfstate
DEBUG OpenShift Installer v4.1.0-201905161311-dirty
DEBUG Built from commit 3b5a270b5246295938e8cc71a69d7a3b99a4df11
INFO Waiting up to 30m0s for the Kubernetes API at https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.216.52.32:6443: i/o timeout
DEBUG Still waiting for the Kubernetes API: Get https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.216.164.202:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
...
...
...
DEBUG Fetching "Install Config"...
DEBUG Loading "Install Config"...
DEBUG Loading "SSH Key"...
DEBUG Using "SSH Key" loaded from state file
DEBUG Loading "Base Domain"...
DEBUG Loading "Platform"...
DEBUG Using "Platform" loaded from state file
DEBUG Using "Base Domain" loaded from state file
DEBUG Loading "Cluster Name"...
DEBUG Loading "Base Domain"...
DEBUG Using "Cluster Name" loaded from state file
DEBUG Loading "Pull Secret"...
DEBUG Using "Pull Secret" loaded from state file
DEBUG Loading "Platform"...
DEBUG Using "Install Config" loaded from state file
DEBUG Reusing previously-fetched "Install Config"
INFO Use the following commands to gather logs from the cluster
INFO ssh -A core.195.207 '/usr/local/bin/installer-gather.sh 10.0.130.71 10.0.152.89 10.0.172.120'
INFO scp core.195.207:~/log-bundle.tar.gz .
FATAL waiting for Kubernetes API: context deadline exceeded
[rhel7] [04:03:03 PM] [ccallega@~]$ echo $?
0

Expected results:
FATAL waiting for Kubernetes API: context deadline exceeded
[rhel7] [04:03:03 PM] [ccallega@~]$ echo $?
2 (or something not 0)

Additional info:
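One thing worth checking when retesting step 4: the installer is piped into `tee`, and in bash `$?` after a pipeline is the exit status of the last command (here `tee`) unless `pipefail` is enabled. Whether that explains the 0 seen in this report is an assumption; the report does not say which shell options were set. A minimal sketch, using a hypothetical `fake_installer` function as a stand-in for `openshift-install`:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the installer (not the real binary):
# logs a FATAL line and exits with a non-zero status.
fake_installer() {
  echo "FATAL waiting for Kubernetes API: context deadline exceeded" >&2
  return 1
}

# Default behaviour: $? is the exit status of the last command in the
# pipeline, which is tee, so the installer's failure is hidden.
fake_installer 2>&1 | tee /dev/null
status_plain=$?

# PIPESTATUS[0] holds the exit status of the first command in the pipeline.
fake_installer 2>&1 | tee /dev/null
status_first=${PIPESTATUS[0]}

# With pipefail, the pipeline returns the status of the rightmost command
# that failed, so the installer's failure is visible in $?.
set -o pipefail
fake_installer 2>&1 | tee /dev/null
status_pipefail=$?
set +o pipefail

echo "plain=$status_plain first=$status_first pipefail=$status_pipefail"
# prints: plain=0 first=1 pipefail=1
```

To rule this out for the real installer, either run `create cluster` without the pipe and check `echo $?`, or read `${PIPESTATUS[0]}` immediately after the piped command.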