Bug 1504593

Summary: Installer doesn't report the installer status correctly if openshift health checks failed
Product: OpenShift Container Platform
Reporter: Gan Huang <ghuang>
Component: Installer
Assignee: Russell Teague <rteague>
Status: CLOSED ERRATA
QA Contact: Gan Huang <ghuang>
Severity: low
Priority: medium
Version: 3.7.0
CC: aos-bugs, jokerman, mmccomas, rteague
Target Release: 3.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
The OpenShift Health Checker was not part of an installer phase, so its results were not reported in the installer status after playbook execution. The OpenShift Health Checker section of the primary installer path has been moved into its own section, and a new installer phase (Health Check) has been added so that its status is reported in the installer status summary.
Last Closed: 2017-11-28 22:18:47 UTC
Type: Bug

Description Gan Huang 2017-10-20 09:50:27 UTC
Description of problem:
If an OpenShift health check fails during execution, the installer does not report the failed playbook in `INSTALLER STATUS`; instead, it indicates that the whole playbook ran successfully.


Version-Release number of the following components:
openshift-ansible-3.7.0-0.143.7.git.0.3163a4f.el7.noarch.rpm

How reproducible:
always 

Steps to Reproduce:
1. Trigger an installation with the OpenShift health checks enabled.


Actual results:
The installer failed at the OpenShift health check, but `INSTALLER STATUS` did not report the failure.

PLAY RECAP *********************************************************************
localhost                  : ok=27   changed=0    unreachable=0    failed=0   
qe-jliu-37rpm-lb-1.1020-wcd.qe.rhcloud.com : ok=42   changed=13   unreachable=0    failed=0   
qe-jliu-37rpm-master-etcd-1.1020-wcd.qe.rhcloud.com : ok=44   changed=6    unreachable=0    failed=1   
qe-jliu-37rpm-master-etcd-2.1020-wcd.qe.rhcloud.com : ok=43   changed=6    unreachable=0    failed=1   
qe-jliu-37rpm-master-etcd-3.1020-wcd.qe.rhcloud.com : ok=43   changed=6    unreachable=0    failed=1   
qe-jliu-37rpm-node-primary-1.1020-wcd.qe.rhcloud.com : ok=42   changed=6    unreachable=0    failed=1   
qe-jliu-37rpm-node-primary-2.1020-wcd.qe.rhcloud.com : ok=42   changed=6    unreachable=0    failed=1   
qe-jliu-37rpm-node-registry-router-1.1020-wcd.qe.rhcloud.com : ok=42   changed=6    unreachable=0    failed=1   
qe-jliu-37rpm-node-registry-router-2.1020-wcd.qe.rhcloud.com : ok=42   changed=6    unreachable=0    failed=1   


INSTALLER STATUS ***************************************************************
Initialization             : Complete
etcd Install               : Complete
NFS Install                : Not Started
Load balancer Install      : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
GlusterFS Install          : Not Started
Hosted Install             : Complete
Metrics Install            : Not Started
Logging Install            : Not Started
Service Catalog Install    : Not Started


Failure summary:


  1. Hosts:    qe-jliu-37rpm-master-etcd-1.1020-wcd.qe.rhcloud.com, qe-jliu-37rpm-master-etcd-2.1020-wcd.qe.rhcloud.com, qe-jliu-37rpm-master-etcd-3.1020-wcd.qe.rhcloud.com, qe-jliu-37rpm-node-primary-1.1020-wcd.qe.rhcloud.com, qe-jliu-37rpm-node-primary-2.1020-wcd.qe.rhcloud.com, qe-jliu-37rpm-node-registry-router-1.1020-wcd.qe.rhcloud.com, qe-jliu-37rpm-node-registry-router-2.1020-wcd.qe.rhcloud.com
     Play:     Verify Requirements
     Task:     openshift_health_check
     Message:  One or more checks failed
     Details:  check "docker_image_availability":
               One or more required Docker images are not available:
                   registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.7.0,
                   registry.reg-aws.openshift.com:443/openshift3/ose-docker-registry:v3.7.0,
                   registry.reg-aws.openshift.com:443/openshift3/ose-haproxy-router:v3.7.0,
                   registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.7.0
               Configured registries: registry.reg-aws.openshift.com:443, registry.access.redhat.com
               Checked by: timeout 10 skopeo inspect --tls-verify=false docker://{registry}/{image}
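The `docker_image_availability` failure above lists the configured registries and the command used to probe each image. A minimal Python sketch of that kind of check follows, assuming an image counts as available when any configured registry serves it (the real check may treat images that already name a registry differently). The function names `skopeo_command`, `skopeo_available`, and `missing_images` are hypothetical; only the command line is taken from the "Checked by:" output.

```python
import subprocess
from typing import Callable, Iterable, List

# Hypothetical sketch of an image-availability check. The command mirrors the
# "Checked by:" line in the failure details; the function names are illustrative.


def skopeo_command(registry: str, image: str) -> List[str]:
    """Build: timeout 10 skopeo inspect --tls-verify=false docker://{registry}/{image}"""
    return [
        "timeout", "10", "skopeo", "inspect", "--tls-verify=false",
        f"docker://{registry}/{image}",
    ]


def skopeo_available(registry: str, image: str) -> bool:
    """Return True if skopeo can inspect the image within 10 seconds."""
    try:
        result = subprocess.run(skopeo_command(registry, image), capture_output=True)
    except FileNotFoundError:
        # The `timeout` binary itself is missing. A missing `skopeo` instead
        # shows up below as a non-zero exit status from `timeout`.
        return False
    # Non-zero exit covers inspect failures and the 10-second timeout.
    return result.returncode == 0


def missing_images(
    images: Iterable[str],
    registries: List[str],
    is_available: Callable[[str, str], bool] = skopeo_available,
) -> List[str]:
    """Assumption: an image is available if any configured registry serves it."""
    return [
        image for image in images
        if not any(is_available(registry, image) for registry in registries)
    ]
```

Passing a stub as `is_available` lets the logic be exercised without `skopeo` installed; with the default, each (registry, image) pair costs one subprocess call, bounded by the 10-second timeout.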

Expected results:


Comment 1 Russell Teague 2017-10-26 12:26:55 UTC
I already have a PR open to address this.
Proposed: https://github.com/openshift/openshift-ansible/pull/5742

Comment 2 Russell Teague 2017-10-27 16:21:08 UTC
Merged: https://github.com/openshift/openshift-ansible/pull/5742

Comment 3 Russell Teague 2017-10-31 18:53:18 UTC
$ git tag --contains c66536bc27db98232ba1e231cfdee48a72936d5b
openshift-ansible-3.7.0-0.184.0
openshift-ansible-3.7.0-0.185.0
openshift-ansible-3.7.0-0.186.0
openshift-ansible-3.7.0-0.187.0
openshift-ansible-3.7.0-0.188.0

Comment 5 Gan Huang 2017-11-01 06:00:21 UTC
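Tested with openshift-ansible-3.7.0-0.188.0.git.0.aebb674.el7.noarch.rpm

The installer aborted, and `INSTALLER STATUS` now shows the Health Check phase as In Progress, together with the playbook that restarts that phase.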

INSTALLER STATUS ***************************************************************
Initialization             : Complete
Health Check               : In Progress
	This phase can be restarted by running: playbooks/byo/openshift-checks/pre-install.yml
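The status block above can be modeled in a few lines to show why adding a dedicated phase fixes the report. This is a minimal Python sketch, assuming each phase records "In Progress" when it starts and "Complete" when it finishes, so a phase interrupted by a failed check stays "In Progress". `Phase`, `summarize`, and `restart_playbook` are hypothetical names and are not the openshift-ansible callback plugin's code; the phase titles, the 27-column alignment, and the restart playbook path are taken from the output in this report.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical model of the INSTALLER STATUS summary, not the real plugin.
@dataclass
class Phase:
    title: str
    # Playbook that can rerun this phase, if known (hypothetical field).
    restart_playbook: Optional[str] = None


def summarize(phases: List[Phase], recorded: Dict[str, str]) -> List[str]:
    """Render one status line per phase.

    `recorded` maps a phase title to "In Progress" or "Complete"; phases
    with no entry never started. A phase that started but never finished
    (for example, because a health check failed) stays "In Progress".
    """
    lines: List[str] = []
    for phase in phases:
        status = recorded.get(phase.title, "Not Started")
        # Pad titles to 27 columns, matching the alignment in the output.
        lines.append(f"{phase.title:<27}: {status}")
        if status == "In Progress" and phase.restart_playbook:
            lines.append(
                f"\tThis phase can be restarted by running: {phase.restart_playbook}"
            )
    return lines


if __name__ == "__main__":
    phases = [
        Phase("Initialization"),
        Phase("Health Check", "playbooks/byo/openshift-checks/pre-install.yml"),
    ]
    # The health check phase started but was never marked complete.
    recorded = {"Initialization": "Complete", "Health Check": "In Progress"}
    print("\n".join(summarize(phases, recorded)))
```

In this model, a failure inside a play that belongs to no phase leaves nothing "In Progress", so the summary cannot point at it; that matches the behavior originally reported, where the health check ran outside any phase.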

Comment 8 errata-xmlrpc 2017-11-28 22:18:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188