Description of problem:
- Once the etcd v3 migrate playbook has failed partway through, with some keys already migrated, re-running the playbook fails at "TASK [etcd : Check if there are any v3 data]".

Version-Release number of the following components:
- openshift-ansible-3.7.44

How reproducible:
100%

Steps to Reproduce:
1. Run "ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml"
2. (The run fails to complete; e.g. my customer's run failed due to bz#1564098.)
3. Re-run "ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml"

Actual results:
- The re-run fails at the following task:

TASK [etcd : Check if there are any v3 data] ******************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/etcd/tasks/migration/check.yml:19
changed: [foo.example.com] => {"changed": true, "cmd": ["etcdctl", "--cert", "/etc/etcd/peer.crt", "--key", "/etc/etcd/peer.key", "--cacert", "/etc/etcd/ca.crt", "--endpoints", "https://xx.xx.xx.xx:2379", "get", "", "--from-key", "--keys-only", "-w", "json", "--limit", "1"], "delta": "0:00:00.035706", "end": "2018-05-09 12:16:57.670671", "failed": false, "rc": 0, "start": "2018-05-09 12:16:57.634965", "stderr": "", "stderr_lines": [], "stdout": "{\"header\" ... snip ... ",\"create_revision\":11511764,\"mod_revision\":11511764,\"version\":1}],\"more\":true,\"count\":1637}"]}

Expected results:
- When the playbook is re-run after a failed attempt, it should complete the remaining tasks.

Additional info:
- It is easy to continue the playbook by removing the following lines:
https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/migration/check.yml#L30-L32

```
- fail:
    msg: "The etcd has at least one v3 key"
  when: "'count' in (l_etcdctl_output.stdout | from_json) and (l_etcdctl_output.stdout | from_json).count != 0"
```

However, the docs mention that v3 data could be overwritten if we migrate while v3 keys already exist.
https://coreos.com/etcd/docs/latest/op-guide/v2-migration.html
"Sometimes an etcd cluster will possibly have v3 data which should not be overwritten. In this case, the migration process may want to confirm no v3 data is committed before proceeding. One way to check the cluster has no v3 keys is to issue the following"

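To make the condition in that `fail` task concrete, here is a minimal Python sketch of the same test applied to the JSON that `etcdctl ... -w json` prints. The function name `has_v3_keys` and both sample strings are hypothetical and heavily abbreviated; in particular, the "empty" sample assumes the `count` field is simply absent when no keys exist, which is what the `'count' in` guard in the task suggests but is not confirmed here.

```python
import json


def has_v3_keys(etcdctl_stdout: str) -> bool:
    """Mirror the playbook's `when` condition: `count` is present and non-zero."""
    response = json.loads(etcdctl_stdout)
    return "count" in response and response["count"] != 0


# Hypothetical, abbreviated outputs (not real etcdctl responses).
partially_migrated = '{"header": {}, "kvs": [{"version": 1}], "more": true, "count": 1637}'
no_v3_data = '{"header": {}}'

print(has_v3_keys(partially_migrated))  # True: the fail task would fire
print(has_v3_keys(no_v3_data))          # False: the migration would proceed
```

With the output shown under "Actual results" (`"count":1637`), the condition is true, so the playbook stops even though the `etcdctl` command itself returned `rc` 0.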
This is a deliberate check to ensure that we never re-migrate a cluster. If the migration fails at any point, you must restore from backup. It's not safe to re-migrate because the migration process does not properly reconcile changes made to either v2 or v3 keys. If you're 100% certain that there's zero chance that any modification has taken place, you can disable that check by commenting it out and re-running the playbooks. But really, the best thing to do is restore from backup and start over.