Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1712409

Summary: Installer returns 0 return code when hitting a FATAL event
Product: OpenShift Container Platform Reporter: Chris Callegari <ccallega>
Component: Installer Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: sheng.lao <shlao>
Status: CLOSED WORKSFORME Docs Contact:
Severity: high    
Priority: unspecified CC: bleanhar, dgoodwin, jialiu
Version: 4.1.0 Keywords: NeedsTestCase
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-07-03 00:56:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
log-bundle.tar.gz none
installer-logs.tar.gz none

Description Chris Callegari 2019-05-21 12:34:18 UTC
Description of problem:
The installer exits with return code 0 after hitting a FATAL event.

It should exit with a nonzero return code when it hits a FATAL event.

Version-Release number of selected component (if applicable):
$ ~/bin/openshift-install version
/home/ccallega/bin/openshift-install v4.1.0-201905161311-dirty
built from commit 3b5a270b5246295938e8cc71a69d7a3b99a4df11
release image quay.io/openshift-release-dev/ocp-release@sha256:6f4cf2db7e63c4dba54496a72b83fec22c49293b520ff0cdb78f1e38b23f1ccb

How reproducible:
Always

Steps to Reproduce:
1. export OPENSHIFT_CLUSTER_NAME=blah
2. ~/bin/openshift-install --log-level debug --dir /tmp/openshift/${OPENSHIFT_CLUSTER_NAME} create install-config
3. Replace the hash for the registry.svc.ci.openshift.org element in the pull secret with an invalid value
4. ~/bin/openshift-install --log-level debug --dir /tmp/openshift/${OPENSHIFT_CLUSTER_NAME} create cluster | tee /tmp/openshift/${OPENSHIFT_CLUSTER_NAME}/debug.local

Actual results:
DEBUG
DEBUG Apply complete! Resources: 141 added, 0 changed, 0 destroyed.
DEBUG
DEBUG The state of your infrastructure has been saved to the path
DEBUG below. This state is required to modify and destroy your
DEBUG infrastructure, so keep it safe. To inspect the complete state
DEBUG use the `terraform show` command.
DEBUG
DEBUG State path: /tmp/openshift-install-928363870/terraform.tfstate
DEBUG OpenShift Installer v4.1.0-201905161311-dirty
DEBUG Built from commit 3b5a270b5246295938e8cc71a69d7a3b99a4df11
INFO Waiting up to 30m0s for the Kubernetes API at https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.216.52.32:6443: i/o timeout
DEBUG Still waiting for the Kubernetes API: Get https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.216.164.202:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
... ... ...
DEBUG Fetching "Install Config"...
DEBUG Loading "Install Config"...
DEBUG   Loading "SSH Key"...
DEBUG   Using "SSH Key" loaded from state file
DEBUG   Loading "Base Domain"...
DEBUG     Loading "Platform"...
DEBUG     Using "Platform" loaded from state file
DEBUG   Using "Base Domain" loaded from state file
DEBUG   Loading "Cluster Name"...
DEBUG     Loading "Base Domain"...
DEBUG   Using "Cluster Name" loaded from state file
DEBUG   Loading "Pull Secret"...
DEBUG   Using "Pull Secret" loaded from state file
DEBUG   Loading "Platform"...
DEBUG Using "Install Config" loaded from state file
DEBUG Reusing previously-fetched "Install Config"
INFO Use the following commands to gather logs from the cluster
INFO ssh -A core@35.173.195.207 '/usr/local/bin/installer-gather.sh 10.0.130.71 10.0.152.89 10.0.172.120'
INFO scp core@35.173.195.207:~/log-bundle.tar.gz .
FATAL waiting for Kubernetes API: context deadline exceeded

[rhel7] [04:03:03 PM]
[ccallega@~]$ echo $?
0

Expected results:
FATAL waiting for Kubernetes API: context deadline exceeded

[rhel7] [04:03:03 PM]
[ccallega@~]$ echo $?
2 (or any other nonzero value)

Additional info:
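One thing worth ruling out when reproducing: step 4 pipes the installer into `tee`, and in bash `$?` after a pipeline is the exit status of the last command (here `tee`), not of the installer, unless `pipefail` is set. Whether that accounts for the 0 reported above is not established in this bug. The sketch below only illustrates the shell behavior; `fake_installer` is a hypothetical stand-in that prints a FATAL line and returns 1, not the real `openshift-install`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `openshift-install create cluster`.
# It prints a FATAL line and returns 1; it is NOT the real installer.
fake_installer() {
    echo "FATAL waiting for Kubernetes API: context deadline exceeded" >&2
    return 1
}

# 1. Plain pipeline: $? is the exit status of tee, the last command.
fake_installer 2>&1 | tee /dev/null
plain_status=$?

# 2. PIPESTATUS[0] holds the exit status of the first command in the pipeline.
fake_installer 2>&1 | tee /dev/null
first_status=${PIPESTATUS[0]}

# 3. With pipefail, the pipeline reports the last nonzero exit status.
set -o pipefail
fake_installer 2>&1 | tee /dev/null
pipefail_status=$?
set +o pipefail

echo "plain=$plain_status first=$first_status pipefail=$pipefail_status"
```

When checking the installer's return code, either run it without the pipe, read `${PIPESTATUS[0]}` immediately after the pipeline, or enable `set -o pipefail` first.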

Comment 1 Chris Callegari 2019-05-21 12:34:50 UTC
[rhel7] [08:29:05 AM]
[ccallega@~]$ ssh -A core@35.173.195.207 '/usr/local/bin/installer-gather.sh 10.0.130.71 10.0.152.89 10.0.172.120'
The authenticity of host '35.173.195.207 (35.173.195.207)' can't be established.
ECDSA key fingerprint is SHA256:5zKwAiUCOxxJlQTcFmzGL4DPxVDk+5wQ3Irw68ElKXI.
ECDSA key fingerprint is MD5:e2:3e:31:47:67:38:4f:47:ab:12:7b:b8:eb:89:17:cf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '35.173.195.207' (ECDSA) to the list of known hosts.
Gathering bootstrap journals ...
Gathering bootstrap containers ...
Gathering rendered assets...
Gathering cluster resources ...
Waiting for logs ...
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "apiservices"
error: the server doesn't have a resource type "clusteroperators"
error: the server doesn't have a resource type "clusterversion"
error: the server doesn't have a resource type "csr"
error: the server doesn't have a resource type "configmaps"
error: the server doesn't have a resource type "kubeapiserver"
error: the server doesn't have a resource type "endpoints"
error: the server doesn't have a resource type "machineconfigpools"
error: the server doesn't have a resource type "events"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "machineconfigs"
error: the server doesn't have a resource type "namespaces"
error: the server doesn't have a resource type "kubecontrollermanager"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "openshiftapiserver"
error: the server doesn't have a resource type "roles"
error: the server doesn't have a resource type "rolebindings"
error: the server doesn't have a resource type "secrets"
Error from server (NotFound): the server could not find the requested resource
error: the server doesn't have a resource type "secrets"
error: the server doesn't have a resource type "services"
Gather remote logs
Log bundle written to ~/log-bundle.tar.gz

[rhel7] [08:33:25 AM]
[ccallega@~]$ scp core@35.173.195.207:~/log-bundle.tar.gz .
log-bundle.tar.gz    100% 2222KB   8.6MB/s   00:00

Comment 2 Chris Callegari 2019-05-21 12:35:43 UTC
This cluster is called e0675dd7c3c5.

Logs are attached...

Comment 3 Chris Callegari 2019-05-21 12:37:06 UTC
Created attachment 1571601 [details]
log-bundle.tar.gz

Comment 4 Chris Callegari 2019-05-21 12:37:47 UTC
Created attachment 1571602 [details]
installer-logs.tar.gz

Comment 7 Brenton Leanhardt 2019-06-17 17:41:14 UTC
Hi Chris,

Any more tips for reproducing this? We haven't been able to reproduce it.

Comment 8 Chris Callegari 2019-06-27 22:04:33 UTC
I've been deeply focused on Disconnected Install and haven't been able to retest. Two other engineers also confirmed the return code before I submitted the Bugzilla.

If it can't be reproduced, then someone picked it up and fixed it. I'm good to close this BZ.