Bug 1712409 - Installer returns 0 return code when hitting a FATAL event
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Abhinav Dahiya
QA Contact: sheng.lao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-21 12:34 UTC by Chris Callegari
Modified: 2019-07-03 00:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-03 00:56:30 UTC


Attachments
log-bundle.tar.gz (2.17 MB, application/gzip)
2019-05-21 12:37 UTC, Chris Callegari
installer-logs.tar.gz (1.27 MB, application/gzip)
2019-05-21 12:37 UTC, Chris Callegari

Description Chris Callegari 2019-05-21 12:34:18 UTC
Description of problem:
The installer exits with return code 0 after hitting a FATAL event.

The installer should exit with a non-zero return code when it hits a FATAL event.

Version-Release number of selected component (if applicable):
$ ~/bin/openshift-install version
/home/ccallega/bin/openshift-install v4.1.0-201905161311-dirty
built from commit 3b5a270b5246295938e8cc71a69d7a3b99a4df11
release image quay.io/openshift-release-dev/ocp-release@sha256:6f4cf2db7e63c4dba54496a72b83fec22c49293b520ff0cdb78f1e38b23f1ccb

How reproducible:
Always

Steps to Reproduce:
1. export OPENSHIFT_CLUSTER_NAME=blah
2. ~/bin/openshift-install --log-level debug --dir /tmp/openshift/${OPENSHIFT_CLUSTER_NAME} create install-config
3. Edit the pull secret so that the registry.svc.ci.openshift.org element contains an invalid hash
4. ~/bin/openshift-install --log-level debug --dir /tmp/openshift/${OPENSHIFT_CLUSTER_NAME} create cluster | tee /tmp/openshift/${OPENSHIFT_CLUSTER_NAME}/debug.local
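
A note on step 4: because the installer's output is piped to tee, a later `echo $?` in bash reports the exit status of tee, the last command in the pipeline, rather than that of openshift-install. Whether this explains the 0 seen in this report is not confirmed anywhere in the bug. The sketch below uses a hypothetical fake_installer function as a stand-in for a failing installer run (it is not part of openshift-install) and shows two ways to recover the first command's status in bash: the PIPESTATUS array and `set -o pipefail`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a failing installer run:
# prints a FATAL line and exits with a non-zero status.
fake_installer() {
  echo "FATAL waiting for Kubernetes API: context deadline exceeded"
  return 1
}

# Plain pipeline: $? is the status of tee, the last command in the pipeline.
fake_installer | tee /dev/null
plain_status=$?

# PIPESTATUS (bash only) holds the exit status of every pipeline stage;
# index 0 is the first command, here the stand-in installer.
fake_installer | tee /dev/null
installer_status=${PIPESTATUS[0]}

# With pipefail, the pipeline's status is that of the last stage that failed.
set -o pipefail
pipefail_status=0
fake_installer | tee /dev/null || pipefail_status=$?
set +o pipefail

echo "plain=$plain_status pipestatus=$installer_status pipefail=$pipefail_status"
```

Applied to step 4, and assuming a bash shell, the equivalent check would be `echo ${PIPESTATUS[0]}` run immediately after the pipeline.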

Actual results:
DEBUG
DEBUG Apply complete! Resources: 141 added, 0 changed, 0 destroyed.
DEBUG
DEBUG The state of your infrastructure has been saved to the path
DEBUG below. This state is required to modify and destroy your
DEBUG infrastructure, so keep it safe. To inspect the complete state
DEBUG use the `terraform show` command.
DEBUG
DEBUG State path: /tmp/openshift-install-928363870/terraform.tfstate
DEBUG OpenShift Installer v4.1.0-201905161311-dirty
DEBUG Built from commit 3b5a270b5246295938e8cc71a69d7a3b99a4df11
INFO Waiting up to 30m0s for the Kubernetes API at https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.216.52.32:6443: i/o timeout
DEBUG Still waiting for the Kubernetes API: Get https://api.e0675dd7c3c5.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.216.164.202:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
... ... ...
DEBUG Fetching "Install Config"...
DEBUG Loading "Install Config"...
DEBUG   Loading "SSH Key"...
DEBUG   Using "SSH Key" loaded from state file
DEBUG   Loading "Base Domain"...
DEBUG     Loading "Platform"...
DEBUG     Using "Platform" loaded from state file
DEBUG   Using "Base Domain" loaded from state file
DEBUG   Loading "Cluster Name"...
DEBUG     Loading "Base Domain"...
DEBUG   Using "Cluster Name" loaded from state file
DEBUG   Loading "Pull Secret"...
DEBUG   Using "Pull Secret" loaded from state file
DEBUG   Loading "Platform"...
DEBUG Using "Install Config" loaded from state file
DEBUG Reusing previously-fetched "Install Config"
INFO Use the following commands to gather logs from the cluster
INFO ssh -A core@35.173.195.207 '/usr/local/bin/installer-gather.sh 10.0.130.71 10.0.152.89 10.0.172.120'
INFO scp core@35.173.195.207:~/log-bundle.tar.gz .
FATAL waiting for Kubernetes API: context deadline exceeded

[rhel7] [04:03:03 PM]
[ccallega@~]$ echo $?
0

Expected results:
FATAL waiting for Kubernetes API: context deadline exceeded

[rhel7] [04:03:03 PM]
[ccallega@~]$ echo $?
2 (or any other non-zero value)

Additional info:

Comment 1 Chris Callegari 2019-05-21 12:34:50 UTC
[rhel7] [08:29:05 AM]
[ccallega@~]$ ssh -A core@35.173.195.207 '/usr/local/bin/installer-gather.sh 10.0.130.71 10.0.152.89 10.0.172.120'
The authenticity of host '35.173.195.207 (35.173.195.207)' can't be established.
ECDSA key fingerprint is SHA256:5zKwAiUCOxxJlQTcFmzGL4DPxVDk+5wQ3Irw68ElKXI.
ECDSA key fingerprint is MD5:e2:3e:31:47:67:38:4f:47:ab:12:7b:b8:eb:89:17:cf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '35.173.195.207' (ECDSA) to the list of known hosts.
Gathering bootstrap journals ...
Gathering bootstrap containers ...
Gathering rendered assets...
Gathering cluster resources ...
Waiting for logs ...
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "apiservices"
error: the server doesn't have a resource type "clusteroperators"
error: the server doesn't have a resource type "clusterversion"
error: the server doesn't have a resource type "csr"
error: the server doesn't have a resource type "configmaps"
error: the server doesn't have a resource type "kubeapiserver"
error: the server doesn't have a resource type "endpoints"
error: the server doesn't have a resource type "machineconfigpools"
error: the server doesn't have a resource type "events"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "machineconfigs"
error: the server doesn't have a resource type "namespaces"
error: the server doesn't have a resource type "kubecontrollermanager"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "openshiftapiserver"
error: the server doesn't have a resource type "roles"
error: the server doesn't have a resource type "rolebindings"
error: the server doesn't have a resource type "secrets"
Error from server (NotFound): the server could not find the requested resource
error: the server doesn't have a resource type "secrets"
error: the server doesn't have a resource type "services"
Gather remote logs
Log bundle written to ~/log-bundle.tar.gz

[rhel7] [08:33:25 AM]
[ccallega@~]$ scp core@35.173.195.207:~/log-bundle.tar.gz .
log-bundle.tar.gz    100% 2222KB   8.6MB/s   00:00

Comment 2 Chris Callegari 2019-05-21 12:35:43 UTC
This cluster is called e0675dd7c3c5.

Logs are attached...

Comment 3 Chris Callegari 2019-05-21 12:37:06 UTC
Created attachment 1571601
log-bundle.tar.gz

Comment 4 Chris Callegari 2019-05-21 12:37:47 UTC
Created attachment 1571602
installer-logs.tar.gz

Comment 7 Brenton Leanhardt 2019-06-17 17:41:14 UTC
Hi Chris,

Any more tips for reproducing this?  We haven't been able to reproduce this.

Comment 8 Chris Callegari 2019-06-27 22:04:33 UTC
I've been deeply focused on Disconnected Install and haven't been able to retest. Two other engineers also confirmed the return code before I submitted this Bugzilla.


If it can't be reproduced, then someone must have picked it up and fixed it. I'm fine with closing this BZ.

