Bug 1390064 - [quick install] a complete installed cluster was reported as a mix of installed and uninstalled env
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Tim Bielawa
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-31 06:52 UTC by liujia
Modified: 2017-03-08 18:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: a-o-i counted extra hosts when determining whether the target HA environment was a mix of installed and uninstalled hosts. Consequence: The comparison failed, and a fully installed environment was incorrectly reported as a mix of installed and uninstalled hosts. Fix: Hosts that are neither masters nor nodes were removed from the comparison. Result: Installed HA environments are correctly detected.
Clone Of:
Environment:
Last Closed: 2017-01-18 12:47:49 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0066 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Description liujia 2016-10-31 06:52:59 UTC
Description of problem:
Triggering an install against an existing environment reports a mix of installed and uninstalled hosts. Only master and node hosts are checked for installation status, but all hosts are counted when deciding whether the cluster is an "all installed env" or a "mix of installed and uninstalled env".

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.4.13-1.git.0.ff1d588.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Trigger an install against an existing cluster in an HA environment:
# atomic-openshift-installer install

Actual results:
After "Gathering information from hosts...", the installer shows the following wrong detection result:
A mix of installed and uninstalled hosts have been detected in your environment.
Please make sure your environment was installed successfully before adding new nodes.

Expected results:
The detected result should be "All specified hosts in specified environment are installed."

Comment 1 Scott Dodson 2016-11-01 15:08:34 UTC
Need quick installer config file and logs from the install run.

Comment 2 Tim Bielawa 2016-11-01 16:43:29 UTC
I was able to reproduce this today:

> *** Installation Summary ***
>
> Hosts:
> - m01.example.com
>   - OpenShift master
>   - OpenShift node (Unscheduled)
>   - Etcd (Embedded)
>   - Storage
> - n01.example.com
>   - OpenShift node (Dedicated)
> - n02.example.com
>   - OpenShift node (Dedicated)
> - l01.example.com
>   - Load Balancer (Preconfigured)
>
> Gathering information from hosts...
>
> A mix of installed and uninstalled hosts have been detected in your environment.
> Please make sure your environment was installed successfully before adding new nodes.
>
> Do you want to (re)install the environment?

To reproduce this I was required to enter a hostname/ip address for the high-availability question. I also answered 'N' for the "reference HAProxy LB" question.

If I skip the high-availability question the reported bug does not surface:

> *** Installation Summary ***
> 
> Hosts:
> - m01.example.com
>   - OpenShift master
>   - OpenShift node (Unscheduled)
>   - Etcd (Embedded)
>   - Storage
> - n01.example.com
>   - OpenShift node (Dedicated)
> - n02.example.com
>   - OpenShift node (Dedicated)
>
> Gathering information from hosts...
> All specified hosts in specified environment are installed.
> Do you want to (re)install the environment?

Interesting.

Comment 3 liujia 2016-11-02 02:12:09 UTC
(In reply to Tim Bielawa from comment #2)

Tim, you are right. If you enter the LB's hostname, the config file contains an LB host, so the number of hosts checked for installation is smaller than the total number of hosts in the config file.

It seems that only master and node hosts are checked for whether they are installed, but all hosts in the config file are counted when deciding whether the cluster is an "all installed env" or a "mix of installed and uninstalled env".
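
The count mismatch described above can be illustrated with a minimal Python sketch. This is not the a-o-i source: the `Host` record, its field names, and the role strings are assumptions made up for the illustration, and the host list mirrors the installation summary from comment 2.

```python
from collections import namedtuple

# Hypothetical host record; field names and role strings are assumptions.
Host = namedtuple("Host", ["name", "roles", "installed"])

hosts = [
    Host("m01.example.com", {"master", "node", "storage"}, True),
    Host("n01.example.com", {"node"}, True),
    Host("n02.example.com", {"node"}, True),
    Host("l01.example.com", {"master_lb"}, None),  # LB is never probed
]


def checked_hosts(all_hosts):
    """Hosts that are actually probed: only masters and nodes."""
    return [h for h in all_hosts if h.roles & {"master", "node"}]


def buggy_status(all_hosts):
    """Compare installed masters/nodes against ALL configured hosts."""
    installed = [h for h in checked_hosts(all_hosts) if h.installed]
    if len(installed) == len(all_hosts):
        return "all installed"
    if not installed:
        return "none installed"
    return "mix"


print(len(checked_hosts(hosts)), len(hosts))  # checked vs. total
print(buggy_status(hosts))      # with the load balancer entry
print(buggy_status(hosts[:3]))  # without the load balancer entry
```

With the load balancer present, three hosts are checked but four are counted, so the comparison can never succeed. Dropping the load balancer entry makes both counts three, which matches the behaviour seen when the high-availability question is skipped.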

Comment 4 liujia 2016-11-02 02:17:02 UTC
I added some print statements around lines 727-737 of cli_installer.py and confirmed that this is the cause.

Comment 5 Tim Bielawa 2016-11-02 17:11:40 UTC
Johnny Liu, a fix has been written and merged into master. The bug should be fixed now.

Comment 10 Tim Bielawa 2016-11-04 18:27:52 UTC
New PR opened which will hopefully work correctly this time.

https://github.com/openshift/openshift-ansible/pull/2729


Also, you can run the atomic-openshift-installer with the '-d' option to produce a debug file in /tmp/installer.txt. The 'INSTALLER_LOG.debug()' lines in the code write their output to that file, so that is where to look for the information they report.
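
As a rough illustration of how a debug switch like this typically works, here is a minimal sketch using Python's standard `logging` module. It is not the installer's actual logging setup: the `make_installer_log` helper, the logger name, and the message format are assumptions, and a temporary file stands in for /tmp/installer.txt.

```python
import logging
import os
import tempfile


def make_installer_log(debug_path=None):
    """Build a logger; DEBUG records reach a file only if debug_path is set."""
    log = logging.getLogger("installer_sketch")  # hypothetical logger name
    log.propagate = False
    # Drop handlers left over from an earlier call so output is not doubled.
    for old in list(log.handlers):
        log.removeHandler(old)
        old.close()
    if debug_path:
        handler = logging.FileHandler(debug_path)
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        log.addHandler(handler)
        log.setLevel(logging.DEBUG)
    else:
        log.addHandler(logging.NullHandler())
        log.setLevel(logging.WARNING)
    return log


# Stand-in for /tmp/installer.txt so the sketch does not touch a real log.
debug_file = os.path.join(tempfile.mkdtemp(), "installer.txt")
INSTALLER_LOG = make_installer_log(debug_file)
INSTALLER_LOG.debug("installed hosts: %d, total hosts: %d", 3, 4)

with open(debug_file) as fh:
    contents = fh.read()
print(contents)
```

Without a debug path the same `debug()` calls are simply discarded, which is why the extra information only shows up when the debug option is given.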

Can you please re-run the tests against the new PR? If you have mock and tito installed, you can run these commands to build and install the a-o-i package locally:

> rm -rf /tmp/tito
> tito build --test --rpm
> sudo dnf erase -y "openshift-ansible*" "atomic-openshift-utils"
> sudo dnf install -y /tmp/tito/noarch/*

Comment 12 Tim Bielawa 2016-11-07 17:45:13 UTC
Waiting on PR to get merged before setting back to MODIFIED.

Comment 13 openshift-github-bot 2016-11-08 16:27:38 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/be5fbab1e3c11458f7ec979346e627923f3efe79
Merge pull request #2729 from tbielawa/BZ1390064

Bug 1390064 - [quick install] a complete installed cluster was reported as a mix of installed and uninstalled env

Comment 15 liujia 2016-11-10 04:45:33 UTC
Version:
atomic-openshift-utils-3.4.20-1.git.0.2031d1e.el7.noarch

Steps:
1. Install OCP 3.4 in an HA environment.
2. Trigger an install against the existing cluster, entering all of the same hosts again:
# atomic-openshift-installer install

Result:
The check reports that all specified hosts in the specified environment are installed.
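
The verified behaviour matches the fix described in the Doc Text (hosts that are neither masters nor nodes are left out of the comparison). A minimal sketch of that idea follows. It is an assumption-laden illustration, not the code merged in the PR: the dictionary keys, `is_master_or_node`, and `detect_status` are hypothetical names, and only the two message strings come from this report.

```python
ALL_INSTALLED = "All specified hosts in specified environment are installed."
MIXED = ("A mix of installed and uninstalled hosts have been detected "
         "in your environment.")


def is_master_or_node(host):
    """True for hosts that take part in the installed/uninstalled check."""
    return bool(host.get("master") or host.get("node"))


def detect_status(hosts, installed_names):
    """Classify the environment using only master and node hosts.

    Returns None when no master/node host is installed; the installer's
    output for that case is not covered by this bug, so it is left out.
    """
    relevant = [h for h in hosts if is_master_or_node(h)]
    installed = [h for h in relevant if h["name"] in installed_names]
    if not installed:
        return None
    if len(installed) == len(relevant):
        return ALL_INSTALLED
    return MIXED


# Hypothetical HA host list: three masters/nodes plus a load balancer.
ha_hosts = [
    {"name": "m01.example.com", "master": True, "node": True},
    {"name": "n01.example.com", "node": True},
    {"name": "n02.example.com", "node": True},
    {"name": "l01.example.com", "master_lb": True},
]
probed_installed = {"m01.example.com", "n01.example.com", "n02.example.com"}

print(detect_status(ha_hosts, probed_installed))
```

Because the load balancer is filtered out before counting, both sides of the comparison cover the same three hosts and the environment is reported as fully installed.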

Comment 17 errata-xmlrpc 2017-01-18 12:47:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

