Description of problem:
When specifying 'openshift_openstack_num_etcd' in the inventory, the installation of OCP on OSP fails. This is because openshift_node_group_name is a required field, yet it is empty for the etcd node(s). Populating the field in the OSP metadata and creating an etcd node_group in the inventory lets the install proceed, but it eventually fails with control plane pods not coming up.

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
Set openshift_openstack_num_etcd in the inventory for an OCP on OSP deployment, then go through the provisioning steps.

Steps to Reproduce:
1. Set openshift_openstack_num_etcd to 1 or higher (odd number)
2. Follow the steps in the OCP on OSP documentation to perform an install
3. Observe that the separate etcd host(s) come up
4. Observe that the installation eventually fails

Actual results:
Installation of OCP on OSP fails with:

TASK [Validate openshift_node_groups and openshift_node_group_name] ********
Friday 26 October 2018 08:12:49 +0000 (0:00:08.771) 0:07:26.061 ********
fatal: [master-0.openshift.obedin.osp.example.com]: FAILED! => {"msg": "last_checked_host: etcd-0.openshift.obedin.osp.example.com, last_checked_var: openshift_node_group_name;openshift_node_group_name must be defined for all nodes"}

Expected results:
Installation with separate etcd hosts succeeds

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
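To illustrate the failure mode described above, here is a minimal Python sketch of the kind of check that produces the "must be defined for all nodes" error. It is not the openshift-ansible implementation; the function name `hosts_missing_node_group`, the sample host names, and the `node-config-etcd` group name are assumptions made for illustration only.

```python
from typing import Dict, List


def hosts_missing_node_group(hostvars: Dict[str, dict],
                             known_groups: List[str]) -> List[str]:
    """Return hosts whose openshift_node_group_name is unset or unknown.

    Hypothetical helper: it mimics the spirit of the validation task,
    not its actual code.
    """
    bad = []
    for host, host_vars in sorted(hostvars.items()):
        name = host_vars.get("openshift_node_group_name")
        if not name or name not in known_groups:
            bad.append(host)
    return bad


# Before the workaround: the etcd host carries no node group name.
known = ["node-config-master", "node-config-infra", "node-config-compute"]
hostvars = {
    "master-0.example.com": {"openshift_node_group_name": "node-config-master"},
    "etcd-0.example.com": {},
}
print(hosts_missing_node_group(hostvars, known))

# After the workaround: an etcd group (name assumed) is defined and assigned.
known_fixed = known + ["node-config-etcd"]
hostvars_fixed = dict(hostvars)
hostvars_fixed["etcd-0.example.com"] = {
    "openshift_node_group_name": "node-config-etcd"
}
print(hosts_missing_node_group(hostvars_fixed, known_fixed))
```

In this sketch the first call reports the etcd host and the second reports none, which corresponds to the validation task passing once the field is populated. As the description notes, passing this check alone did not make the full install succeed.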
Assigning to Mainn. I'm mostly working on openshift-install these days.
I had the impression that the separated etcd role was deprecated; I remember that we removed it from our documentation. Tomas, am I just imagining things?
Never mind; it looks like we may have just left it out. We'll try to fix this. Could you detail the workaround you put in place prior to the ultimate failure?
Hi Tzu-Mainn - I've opened a PR [1] with my "workaround" / fixes. Please review. PS: I am still running through some additional tests, but so far the testing has shown good results. [1] : https://github.com/openshift/openshift-ansible/pull/10541
I'll take a look today. Thanks!
PR has merged
Hi, please help check whether this bug can be verified. Thanks!
Checked with openshift-ansible-3.11.59-1; etcd on separate hosts works well.

PLAY RECAP ********
app-node-0.wjiang-ocp.example.com   : ok=221  changed=105  unreachable=0  failed=0
etcd-0.wjiang-ocp.example.com       : ok=128  changed=41   unreachable=0  failed=0
etcd-1.wjiang-ocp.example.com       : ok=110  changed=34   unreachable=0  failed=0
infra-node-0.wjiang-ocp.example.com : ok=220  changed=105  unreachable=0  failed=0
lb-0.wjiang-ocp.example.com         : ok=119  changed=24   unreachable=0  failed=0
localhost                           : ok=146  changed=18   unreachable=0  failed=0
master-0.wjiang-ocp.example.com     : ok=741  changed=337  unreachable=0  failed=0
master-1.wjiang-ocp.example.com     : ok=359  changed=161  unreachable=0  failed=0

INSTALLER STATUS ********
Initialization             : Complete (0:00:28)
Health Check               : Complete (0:00:10)
Node Bootstrap Preparation : Complete (0:09:24)
etcd Install               : Complete (0:03:13)
Load Balancer Install      : Complete (0:01:06)
Master Install             : Complete (0:13:16)
Master Additional Install  : Complete (0:02:04)
Node Join                  : Complete (0:00:51)
Hosted Install             : Complete (0:08:10)
Web Console Install        : Complete (0:01:10)
Console Install            : Complete (0:00:55)
metrics-server Install     : Complete (0:00:00)
Service Catalog Install    : Complete (0:04:55)

Thursday 03 January 2019 18:28:46 +0800 (0:00:00.024) 1:00:37.405 ******
===============================================================================
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0024