Bug 1399984

Summary: [etcd3]Failed to install ose-3.1 with etcd3
Product: OpenShift Container Platform Reporter: Wenkai Shi <weshi>
Component: Installer Assignee: Scott Dodson <sdodson>
Status: CLOSED WONTFIX QA Contact: Wenkai Shi <weshi>
Severity: high Docs Contact:
Priority: medium    
Version: 3.1.0 CC: aos-bugs, bleanhar, jchaloup, jokerman, mbarrett, mmccomas, weshi, wmeng, xtian
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-22 21:10:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Wenkai Shi 2016-11-30 07:53:40 UTC
Description of problem:
At present, the package for etcd version 3 is named etcd3. To test etcd3, we modified the openshift-ansible code, changing the package name from "etcd" to "etcd3" so that etcd3 is installed by default.
We prepared a new environment with 3 etcd nodes; the installation fails when starting etcd.service.

In the v3.1 openshift-ansible playbook, the etcd cluster is deployed by installing the first etcd server, waiting for it to start, and then deploying the others.
With etcd3, the first etcd node cannot find the other two etcd servers because they have not been deployed yet, so the first etcd server never finishes starting. With etcd2 in the same situation, the first etcd server starts successfully even though the other two etcd nodes are not installed yet.

This issue does not occur in v3.2, v3.3, or v3.4, because the openshift-ansible playbooks in those versions behave differently: they deploy all etcd servers one by one and only then wait for the etcd servers to start.
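
The difference between the two orderings can be sketched with a toy model. This is not etcd or Ansible code. It assumes that an etcd3-style member reports ready only once a majority of the configured cluster is running, and that an etcd2-style member reports ready as soon as its own process is up, which matches the behaviour observed above. All function names are hypothetical.

```python
# Toy model of the two playbook orderings; not real etcd or Ansible code.
# Assumption: an "etcd3-style" member reports ready only once a majority of
# the configured cluster is running, while an "etcd2-style" member reports
# ready as soon as its own process is up.


def quorum(cluster_size):
    """Majority needed by a cluster of the given size."""
    return cluster_size // 2 + 1


def is_ready(running, cluster_size, needs_quorum):
    """Return True if a started member would report ready."""
    if not needs_quorum:
        return running >= 1
    return running >= quorum(cluster_size)


def deploy_first_then_wait(cluster_size, needs_quorum):
    """v3.1-style ordering: start the first member and wait for it first."""
    running = 1
    if not is_ready(running, cluster_size, needs_quorum):
        return False  # models the etcd.service start timeout on the first host
    return is_ready(cluster_size, cluster_size, needs_quorum)


def deploy_all_then_wait(cluster_size, needs_quorum):
    """v3.2+-style ordering: start every member, then wait for readiness."""
    return is_ready(cluster_size, cluster_size, needs_quorum)


if __name__ == "__main__":
    for label, needs_quorum in (("etcd2-style", False), ("etcd3-style", True)):
        print(label,
              "first-then-wait:", deploy_first_then_wait(3, needs_quorum),
              "all-then-wait:", deploy_all_then_wait(3, needs_quorum))
```

Under these assumptions, only the combination of the v3.1-style ordering and an etcd3-style member fails for a 3-node cluster.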

Version-Release number of selected component (if applicable):
openshift-ansible-3.0.98-1
openshift v3.1.1.8
etcd3-3.0.3-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Modify the openshift-ansible code as shown:
[root@ansible ~]# vim /usr/share/ansible/openshift-ansible/roles/etcd/tasks/main.yml
...
- name: Install etcd
  action: "{{ ansible_pkg_mgr }} name=etcd3 state=present"
  when: not etcd_is_containerized | bool
...
2. Prepare an environment with an etcd cluster composed of 3 etcd servers.
3. Run the installation playbook (the command is shown under Actual results).

Actual results:
Installation failed

[root@ansible ~]# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
PLAY [Configure first etcd host] **********************************************
...
TASK: [etcd | Enable etcd] **************************************************** 
failed: [etcd1.example.com] => {"failed": true}
msg: Job for etcd.service failed because a timeout was exceeded. See "systemctl status etcd.service" and "journalctl -xe" for details.
...

Expected results:
Installation succeeds.

Additional info:

[root@ansible ~]# cat hosts
...
[masters]
master.example.com
node.example.com
[nodes]
master1.example.com
master2.example.com

node1.example.com
node2.example.com

[etcd]
etcd1.example.com 
etcd2.example.com
etcd3.example.com

[lb]
lb.example.com

[nfs]
nfs.example.com

Comment 2 Wenkai Shi 2016-11-30 09:27:17 UTC
Additional info:
Sorry for the typo in the inventory above; here is the corrected one.

[root@ansible ~]# cat hosts
...
[masters]
master1.example.com
master2.example.com
[nodes]
master1.example.com
master2.example.com

node1.example.com
node2.example.com

[etcd]
etcd1.example.com 
etcd2.example.com
etcd3.example.com

[lb]
lb.example.com

[nfs]
nfs.example.com

Comment 3 Scott Dodson 2017-01-19 15:54:39 UTC
This shouldn't be an issue with the packaging updates in RHEL 7.3.2 because etcd-3.0.x obsoletes etcd3. Can you please test this again?

Comment 4 Scott Dodson 2017-01-19 15:55:00 UTC
If this is no longer an issue, we should close it as NOTABUG.

Comment 6 Jan Chaloupka 2017-02-13 13:45:59 UTC
Hi Wenkai,

It does not make much sense to run OpenShift 3.1 with etcd-3.*. OpenShift 3.1 is derived from Kubernetes 1.2-alpha-7, which at the time did not know anything about etcd v3. Kubernetes 1.2-alpha-7 uses etcd 2.2.2-4, which is internally (and possibly externally) quite different from v3.

Do we support deployment of ose-1.3 with etcd > 3 at all?

Comment 8 Wenkai Shi 2017-02-14 02:37:41 UTC
(In reply to Jan Chaloupka from comment #6)
> Hi Wenkai,
> 
> It does not make much sense to run OpenShift 3.1 with etcd-3.*. OpenShift
> 3.1 is derived from Kubernetes 1.2-alpha-7, which at the time did not know
> anything about etcd v3. Kubernetes 1.2-alpha-7 uses etcd 2.2.2-4, which is
> internally (and possibly externally) quite different from v3.
> 
> Do we support deployment of ose-1.3 with etcd > 3 at all?

Hi~

We did not officially support deploying OpenShift 3.1 with etcd3, because etcd3 did not exist when OpenShift 3.1 was released. But consider this: if a customer wants to deploy OpenShift 3.1 today, the etcd version installed will be 3.*.

Comment 10 Scott Dodson 2017-03-30 20:02:45 UTC
https://github.com/openshift/openshift-ansible/pull/3811 proposed fix

Comment 12 Wenkai Shi 2017-06-09 09:48:14 UTC
Checked with openshift-ansible-3.0.101-1.git.0.4d5c0f5.el7aos.noarch; the installation failed. It seems that oo_first_etcd is missing.

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
PLAY [Configure etcd certificates] ******************************************** 

GATHERING FACTS *************************************************************** 
FATAL: no hosts matched or all hosts have already failed -- aborting


TASK: [openshift_facts | Verify Ansible version is greater than or equal to 1.9.4] *** 
FATAL: no hosts matched or all hosts have already failed -- aborting
...

Comment 13 Scott Dodson 2017-06-09 20:08:24 UTC
Wenkai,

Are you using ansible-1.9.4 or ansible 2.x? This version of the installer is not compatible with 2.x.

Comment 14 Wenkai Shi 2017-06-12 02:17:23 UTC
(In reply to Scott Dodson from comment #13)
> Wenkai,
> 
> Are you using ansible-1.9.4 or ansible 2.x? This version of the installer is
> not compatible with 2.x.

Hi, I'm using ansible-1.9.4 to check this.

# rpm -q ansible
ansible-1.9.4-1.el7aos.noarch
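
The version constraint discussed here (at least 1.9.4, per the playbook's own check, and not 2.x, per comment #13) could be checked against the `rpm -q ansible` output with a small script. This is only a sketch: the function names are hypothetical, and the rule it encodes is an assumption pieced together from this thread, not something taken from the installer.

```python
import re


def parse_ansible_rpm_version(rpm_output):
    """Extract (major, minor, patch) from e.g. 'ansible-1.9.4-1.el7aos.noarch'."""
    match = re.match(r"ansible-(\d+)\.(\d+)\.(\d+)", rpm_output.strip())
    if match is None:
        raise ValueError("unrecognized rpm output: %r" % rpm_output)
    return tuple(int(part) for part in match.groups())


def is_supported_for_3_1_installer(version):
    """Assumed rule from this thread: at least 1.9.4 and still in the 1.x series."""
    return (1, 9, 4) <= version < (2, 0, 0)


if __name__ == "__main__":
    for line in ("ansible-1.9.4-1.el7aos.noarch", "ansible-2.2.3.0-1.el7.noarch"):
        version = parse_ansible_rpm_version(line)
        print(line, "->", is_supported_for_3_1_installer(version))
```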

Comment 15 Jan Chaloupka 2017-06-19 13:47:23 UTC
Wenkai, have you been able to verify the fix with Ansible-2.x?

Comment 16 Wenkai Shi 2017-06-20 05:47:24 UTC
(In reply to Jan Chaloupka from comment #15)
> Wenkai, have you been able to verify the fix with Ansible-2.x?

I tried with Ansible 2.x; it does not work.

# rpm -q ansible
ansible-2.2.3.0-1.el7.noarch

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [add_host] ****************************************************************
Tuesday 20 June 2017  04:44:43 +0000 (0:00:00.015)       0:00:00.052 ********** 
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: cannot import name bool
fatal: [localhost]: FAILED! => {"failed": true, "msg": "Unexpected failure during module execution.", "stdout": ""}
...

Comment 17 Jan Chaloupka 2017-06-20 09:50:01 UTC
How did you install Ansible? It looks like a Python module is missing.

Comment 18 Wenkai Shi 2017-06-20 09:57:50 UTC
(In reply to Jan Chaloupka from comment #17)
> How did you install Ansible? It looks like a Python module is missing.

I installed it with the yum install command. With this version of Ansible, I can install an OCP 3.6 environment without problems.

Comment 19 Scott Dodson 2017-06-22 21:10:20 UTC
Moving this to WONTFIX until we get a customer case associated with it. The likelihood of anyone installing 3.1 right now is very low.