Description of problem:
The task 'Get member item health status' cannot detect the cluster status when embedded etcd is used, so the migration fails.

Version-Release number of selected component (if applicable):
openshift-ansible/pull/4492

How reproducible:
always

Steps to Reproduce:
1. install OCP v3.5 with dedicated etcd clusters
2. upgrade to v3.6
3. migrate to etcd v3
   ansible-playbook openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml

Actual results:
TASK [etcd_migrate : Get member item health status] ****************************

TASK [etcd_migrate : Check the etcd cluster health] ****************************
fatal: [host-8-175-59.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Etcd member 172.16.120.249 is not healthy

to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry

PLAY RECAP *********************************************************************
host-8-175-59.host.centralci.eng.rdu2.redhat.com : ok=12 changed=2 unreachable=0 failed=1
localhost : ok=10 changed=0 unreachable=0 failed=0

Expected results:

Additional info:
The example inventory:

[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
xxxx
xxxx

[masters]
master.example.com

[nodes]
master.example.com
node.example.com
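The inventory above defines no [etcd] group, which is the embedded-etcd case. A minimal sketch, in Python, of how a pre-flight check could detect that before the migration is attempted. The helper names `hosts_by_group` and `has_external_etcd` are hypothetical and are not part of openshift-ansible; the parser handles only the simple INI layout shown here.

```python
def hosts_by_group(inventory_text):
    """Collect host names per group from a simple INI-style inventory."""
    groups = {}
    current = None
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
            continue
        if current is None or ":" in current:
            # Ungrouped lines and [group:vars] / [group:children]
            # sections do not list hosts of a plain group.
            continue
        # The host name is the first token; per-host variables may follow.
        groups[current].append(line.split()[0])
    return groups


def has_external_etcd(inventory_text):
    """True only if an [etcd] group exists and lists at least one host."""
    return bool(hosts_by_group(inventory_text).get("etcd"))


EXAMPLE = """\
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root

[masters]
master.example.com

[nodes]
master.example.com
node.example.com
"""

print(has_external_etcd(EXAMPLE))  # False: no [etcd] hosts are defined
```

An inventory with an `[etcd]` section that lists at least one host would make the check return True.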
The `etcd_migrate` role should not be run at all if the [etcd] hosts list is empty.

After updating the tasks and running `etcd_migrate` over the oo_first_master, I get the following error:

2017-06-22 09:01:00.347242 I | etcdserver/membership: added member 395f2befffe04859 [https://172.16.186.29:7001] to cluster 0
2017-06-22 09:01:00.347442 N | etcdserver/membership: set the initial cluster version to 3.2
2017-06-22 09:01:00.347452 C | etcdserver/membership: cluster cannot be downgraded (current version: 3.1.9 is lower than determined cluster version: 3.2).

# openshift version
openshift v3.6.121
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

So before the embedded etcd migration is even run, the etcd client x.y version must be at least as high as the etcd server x.y version.
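The x.y requirement can be written as a small comparison. This is a sketch only, using the two versions from the log above; `minor_version` and `client_can_migrate` are hypothetical helper names, not functions from etcd or openshift-ansible.

```python
import re


def minor_version(version):
    """Return (major, minor) from strings such as '3.1.9', '3.2' or 'v3.2.0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def client_can_migrate(client_version, server_version):
    """True if the client x.y is at least as high as the server x.y."""
    return minor_version(client_version) >= minor_version(server_version)


# Versions taken from the log: installed etcd 3.1.9, cluster version 3.2.
print(client_can_migrate("3.1.9", "3.2"))    # False
print(client_can_migrate("3.2.0", "3.2.0"))  # True
```

Only the major and minor components are compared, so a patch-level difference such as 3.2.0 against 3.2.7 does not block the migration in this sketch.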
With the current implementation of the migration, we cannot migrate embedded etcd. The migration workflow (offline migration) is:

1. disable master API
2. disable etcd members
3. migrate each etcd member
4. enable etcd members
5. re-attach leases
6. validate data (optional atm)
7. enable master API

Given that the embedded etcd is part of the master, there is no such thing as "disable/enable etcd". So in order to re-attach leases, one needs to start the master API, which makes the validation impossible (once the master is started, the v3 data starts to change). At the same time, the leases cannot be re-attached because the master API is enabled outside of the etcd_migrate role.

At most we can migrate the v2 data to v3, but without any validation or re-attaching of leases.
Or, we could run the etcd daemon for the duration of the re-attaching (and validation). Once done, we just stop the daemon and continue.

One disadvantage of this approach is the assumption that the etcd rpm will always be installed on the master host. So far the rpm has been installed because of the etcdctl command. If the etcd rpm gets split into an etcd rpm (with the etcd binary) and an etcd-client rpm (with the etcdctl binary), this approach will not work unless we install the etcd binary explicitly.
Jan, we cannot build an etcd cluster with embedded etcd, so the migration need not count etcd members. Scott also mentioned that OCP will not support embedded etcd, but I have not found an official announcement. I think we can give this bug low priority.
Upstream PR: https://github.com/openshift/openshift-ansible/pull/4558
The v2->v3 migration of an embedded etcd is deprecated. Instead, one needs to run:

1. `playbooks/byo/openshift-etcd/embedded2external.yml` to migrate the embedded etcd to an external one (see https://github.com/openshift/openshift-ansible/pull/5672)
2. then `playbooks/byo/openshift-etcd/migrate.yml` to migrate the v2 data to v3 data

Upstream PR to enforce the limitation: https://github.com/openshift/openshift-ansible/pull/5733