Bug 1643417 - OCP on OSP provisioning of separate etcd hosts does not work
Summary: OCP on OSP provisioning of separate etcd hosts does not work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Tzu-Mainn Chen
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-26 08:16 UTC by Øystein Bedin
Modified: 2019-01-10 09:05 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-10 09:04:10 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0024 0 None None None 2019-01-10 09:05:50 UTC

Description Øystein Bedin 2018-10-26 08:16:08 UTC
Description of problem:
When 'openshift_openstack_num_etcd' is specified in the inventory, the installation of OCP on OSP fails. This is because openshift_node_group_name is a required field, yet it is left empty for the etcd node(s). Populating the field in the OSP metadata and creating an etcd node group in the inventory lets the install proceed, but it eventually fails with the control plane pods not coming up.
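For reference, the kind of inventory change described above might look like the following sketch. Only openshift_openstack_num_etcd and openshift_node_group_name are named in this report; the node group name node-config-etcd and the file path are illustrative assumptions, not the merged fix:

```yaml
# inventory/group_vars/all.yml (illustrative sketch -- the node group
# name below is an assumption, not taken from the merged PR)

# Request dedicated etcd hosts (an odd number, per the reproduction steps)
openshift_openstack_num_etcd: 3

# Workaround: give the etcd hosts an explicit node group so the
# "openshift_node_group_name must be defined for all nodes" check passes
openshift_node_groups:
  - name: node-config-etcd
    labels: []
    edits: []
```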


Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
Set openshift_openstack_num_etcd in the inventory for an OCP on OSP deployment, then go through the steps to perform a provisioning.


Steps to Reproduce:
1. Set openshift_openstack_num_etcd to 1 or higher (odd number)
2. Follow the steps in the OCP on OSP documentation to perform an install
3. Observe that the separate etcd host(s) come up
4. Observe that the installation eventually fails

Actual results:
Installation of OCP on OSP fails with:


TASK [Validate openshift_node_groups and openshift_node_group_name] **************************************************************************************************************************************************************************
Friday 26 October 2018  08:12:49 +0000 (0:00:08.771)       0:07:26.061 ******** 
fatal: [master-0.openshift.obedin.osp.example.com]: FAILED! => {"msg": "last_checked_host: etcd-0.openshift.obedin.osp.example.com, last_checked_var: openshift_node_group_name;openshift_node_group_name must be defined for all nodes"}


Expected results:
Installation with separate etcd hosts succeeds 

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Tomas Sedovic 2018-10-26 13:46:24 UTC
Assigning to Mainn. I'm mostly working on openshift-install these days.

Comment 2 Tzu-Mainn Chen 2018-10-26 14:39:25 UTC
I had the impression that the separated etcd role was deprecated; I remember that we removed it from our documentation. Tomas, am I just imagining things?

Comment 3 Tzu-Mainn Chen 2018-10-26 16:29:43 UTC
Nevermind; looks like maybe we just left it out. We'll try and fix this.

Could you detail the workaround you put in place prior to the ultimate failure?

Comment 4 Øystein Bedin 2018-10-27 16:33:43 UTC
Hi Tzu-Mainn - I've opened a PR [1] with my "workaround" / fixes. Please review. 

PS: I am still running through some additional tests, but so far the testing has shown good results. 


[1] : https://github.com/openshift/openshift-ansible/pull/10541

Comment 5 Tzu-Mainn Chen 2018-10-29 13:14:43 UTC
I'll take a look today. Thanks!

Comment 6 Tzu-Mainn Chen 2018-10-31 13:24:32 UTC
The PR has merged.

Comment 8 Wei Sun 2018-11-06 05:46:43 UTC
Hi, please help check whether this bug can be verified. Thanks!

Comment 12 weiwei jiang 2019-01-03 10:32:17 UTC
Checked with openshift-ansible-3.11.59-1; etcd as separate hosts works well.

PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
app-node-0.wjiang-ocp.example.com : ok=221  changed=105  unreachable=0    failed=0
etcd-0.wjiang-ocp.example.com : ok=128  changed=41   unreachable=0    failed=0
etcd-1.wjiang-ocp.example.com : ok=110  changed=34   unreachable=0    failed=0
infra-node-0.wjiang-ocp.example.com : ok=220  changed=105  unreachable=0    failed=0
lb-0.wjiang-ocp.example.com : ok=119  changed=24   unreachable=0    failed=0
localhost                  : ok=146  changed=18   unreachable=0    failed=0
master-0.wjiang-ocp.example.com : ok=741  changed=337  unreachable=0    failed=0
master-1.wjiang-ocp.example.com : ok=359  changed=161  unreachable=0    failed=0


INSTALLER STATUS ***************************************************************************************************************************************************************************************************************************************************************
Initialization              : Complete (0:00:28)
Health Check                : Complete (0:00:10)
Node Bootstrap Preparation  : Complete (0:09:24)
etcd Install                : Complete (0:03:13)
Load Balancer Install       : Complete (0:01:06)
Master Install              : Complete (0:13:16)
Master Additional Install   : Complete (0:02:04)
Node Join                   : Complete (0:00:51)
Hosted Install              : Complete (0:08:10)
Web Console Install         : Complete (0:01:10)
Console Install             : Complete (0:00:55)
metrics-server Install      : Complete (0:00:00)
Service Catalog Install     : Complete (0:04:55)
Thursday 03 January 2019  18:28:46 +0800 (0:00:00.024)       1:00:37.405 ******
===============================================================================

Comment 14 errata-xmlrpc 2019-01-10 09:04:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0024

