1576297 – etcd v3 migrate playbook does not have idempotency


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Scott Dodson
QA Contact:	Gaoyun Pei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-09 07:45 UTC by Kenjiro Nakayama
Modified:	2021-06-10 16:07 UTC
CC List:	4 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-09 12:57:01 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1564098	0	unspecified	CLOSED	etcd3 migrate playbook fails with single master topology due to handler error	2021-02-22 00:41:40 UTC

Internal Links: 1564098

Description Kenjiro Nakayama 2018-05-09 07:45:46 UTC

Description of problem:
- If the etcd v3 migrate playbook fails partway through after some keys have already been migrated, re-running it fails at "TASK [etcd : Check if there are any v3 data]".


Version-Release number of the following components:
- openshift-ansible-3.7.44


How reproducible: 100%

Steps to Reproduce:
1. Run "ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml"
2. (The playbook fails to complete; e.g. my customer's run failed due to bz#1564098.)
3. Re-run "ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml"

Actual results:
- The re-run fails at the following task:

   TASK [etcd : Check if there are any v3 data] ******************************************************************************************************************************************************************
   task path: /usr/share/ansible/openshift-ansible/roles/etcd/tasks/migration/check.yml:19
   changed: [foo.example.com] => {"changed": true, "cmd": ["etcdctl", "--cert", "/etc/etcd/peer.crt", "--key", "/etc/etcd/peer.key", "--cacert", "/etc/etcd/ca.crt", "--endpoints", "https://xx.xx.xx.xx:2379", "get", "", "--from-key", "--keys-only", "-w", "json", "--limit", "1"], "delta": "0:00:00.035706", "end": "2018-05-09 12:16:57.670671", "failed": false, "rc": 0, "start": "2018-05-09 12:16:57.634965", "stderr": "", "stderr_lines": [], "stdout": "{\"header\"  ... snip ... ",\"create_revision\":11511764,\"mod_revision\":11511764,\"version\":1}],\"more\":true,\"count\":1637}"]}

Expected results:
- Re-running the playbook should complete the migration, including the tasks that remained after the earlier failure.


Additional info:
- The playbook can be made to continue by removing the following lines:

https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/migration/check.yml#L30-L32
```
- fail:
    msg: "The etcd has at least one v3 key"
  when: "'count' in (l_etcdctl_output.stdout | from_json) and (l_etcdctl_output.stdout | from_json).count != 0"
```
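
For reference, the `when` condition in the snippet above can be sketched in Python as follows. This is only an illustration, not code from openshift-ansible: the function name `has_v3_keys` is hypothetical, and the sample payloads are abbreviated, made-up stand-ins shaped like the `etcdctl ... -w json` output shown under "Actual results". The membership test mirrors the `'count' in ...` guard in the task, which suggests the `count` field may be missing from the JSON when no keys are found (an assumption drawn from the condition itself).

```python
import json


def has_v3_keys(etcdctl_stdout: str) -> bool:
    """Mirror the playbook's `when` condition on the etcdctl JSON output.

    Returns True when the parsed output contains a `count` field that is
    not zero, i.e. when the `fail` task would fire.
    """
    data = json.loads(etcdctl_stdout)
    return "count" in data and data["count"] != 0


# Abbreviated, made-up payloads shaped like the output in this report.
partially_migrated = '{"header": {}, "more": true, "count": 1637}'
no_count_field = '{"header": {}}'

print(has_v3_keys(partially_migrated))  # True
print(has_v3_keys(no_count_field))      # False
```

With the output from this report (`"count":1637`), the condition is true, so the `fail` task stops the re-run.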

However, the docs mention that existing v3 data could be overwritten if we migrate while v3 keys exist.

https://coreos.com/etcd/docs/latest/op-guide/v2-migration.html
"Sometimes an etcd cluster will possibly have v3 data which should not be overwritten. In this case, the migration process may want to confirm no v3 data is committed before proceeding. One way to check the cluster has no v3 keys is to issue the following"

Comment 3 Scott Dodson 2018-05-09 12:57:01 UTC

This is a deliberate check to ensure that we never re-migrate a cluster. If at any point the migration fails you must restore from backup. It's not safe to re-migrate because the migration process does not properly reconcile changes made to either v2 or v3 keys.

If you're 100% certain that there's zero chance that any modification has taken place then you can disable that check by commenting it out and re-run the playbooks. But really the best thing to do is restore from backup and start over.
