Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1552252

Summary:	etcd migration playbook fails after upgrade to OCP 3.6
Product:	OpenShift Container Platform	Reporter:	Mark LaBonte <mlabonte>
Component:	Cluster Version Operator	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED DUPLICATE	QA Contact:	liujia <jiajliu>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	3.6.0	CC:	aos-bugs, jokerman, mmccomas
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-03-06 22:00:03 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1724792

Description Mark LaBonte 2018-03-06 20:08:24 UTC

Description of problem:

After upgrading OCP from 3.5 to 3.6, the etcd migration playbook failed, complaining about missing snapshots.

The OCP documentation doesn't mention any prerequisites for running the migration playbook.



How reproducible:

Steps to Reproduce:
1. Perform a clean install of OCP 3.5
2. Upgrade to OCP 3.6
3. Run /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml

Actual results:

2018-03-06 14:51:35,289 p=32085 u=root |  TASK [etcd_migrate : Get member item health status] ****************************
2018-03-06 14:51:35,471 p=32085 u=root |  ok: [10.8.32.185] => (item=member 8e9e05c52164694d is healthy: got healthy result from https://10.8.32.185:2379)
2018-03-06 14:51:35,471 p=32085 u=root |  skipping: [10.8.32.185] => (item=cluster is healthy)
2018-03-06 14:51:35,483 p=32085 u=root |  TASK [etcd_migrate : Check the etcd cluster health] ****************************
2018-03-06 14:51:35,504 p=32085 u=root |  skipping: [10.8.32.185]
2018-03-06 14:51:35,515 p=32085 u=root |  TASK [etcd_migrate : Check if there is at least one v2 snapshot] ***************
2018-03-06 14:51:36,018 p=32085 u=root |  ok: [10.8.32.185]
2018-03-06 14:51:36,027 p=32085 u=root |  TASK [etcd_migrate : fail] *****************************************************
2018-03-06 14:51:36,179 p=32085 u=root |  fatal: [10.8.32.185]: FAILED! => {
    "changed": false
}

MSG:

Before the migration can proceed the etcd member must write down at least one snapshot under /var/lib/etcd//member/snap directory.

2018-03-06 14:51:36,180 p=32085 u=root |        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry


Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Scott Dodson 2018-03-06 22:00:03 UTC

etcd takes a snapshot every 10,000 changes. In a cleanly provisioned environment there may not have been 10,000 database changes yet, so no snapshot may exist. A snapshot is required for the v2 to v3 migration.
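
To confirm ahead of time whether a member has written a snapshot, a check along the following lines could be run on each etcd host. This is only a sketch, not the playbook's own code: the `*.snap` file suffix and the function name `has_v2_snapshot` are assumptions made for illustration.

```python
from pathlib import Path


def has_v2_snapshot(snap_dir):
    """Return True if snap_dir holds at least one *.snap file.

    Assumption: etcd v2 snapshot files carry a ".snap" suffix.
    """
    directory = Path(snap_dir)
    if not directory.is_dir():
        # A missing directory means no snapshot has been written.
        return False
    return any(entry.is_file() for entry in directory.glob("*.snap"))
```

For example, `has_v2_snapshot("/var/lib/etcd/member/snap")` should return False on a host where the migration would stop with the error above.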

You can work around this by setting a lower ETCD_SNAPSHOT_COUNT value in /etc/etcd/etcd.conf, restarting etcd, checking for a snapshot in /var/lib/etcd/member/snap, then commenting out that line and restarting etcd again.

If you have multiple etcd hosts, you'll need to do this on all of them prior to performing the migration.
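
The two config edits in the workaround could be scripted roughly as follows. This is a sketch under assumptions: it treats etcd.conf as plain KEY=value lines, the helper names are hypothetical, and it only transforms the file's text. Writing the file back, restarting etcd, and waiting for a snapshot to appear are left to the operator.

```python
import re

# Matches an active or commented-out ETCD_SNAPSHOT_COUNT line.
_ANY_LINE = re.compile(r"^[ \t]*#?[ \t]*ETCD_SNAPSHOT_COUNT=.*$", re.MULTILINE)
# Matches only an active (uncommented) ETCD_SNAPSHOT_COUNT line.
_ACTIVE_LINE = re.compile(r"^([ \t]*)(ETCD_SNAPSHOT_COUNT=.*)$", re.MULTILINE)


def set_snapshot_count(conf_text, count):
    """Return conf_text with an active ETCD_SNAPSHOT_COUNT=<count> line."""
    new_line = f"ETCD_SNAPSHOT_COUNT={count}"
    if _ANY_LINE.search(conf_text):
        # Reuse the first existing line, whether commented out or not.
        return _ANY_LINE.sub(lambda _match: new_line, conf_text, count=1)
    # No such line yet: append one, keeping the file newline-terminated.
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return f"{conf_text}{sep}{new_line}\n"


def comment_out_snapshot_count(conf_text):
    """Return conf_text with active ETCD_SNAPSHOT_COUNT lines commented out."""
    return _ACTIVE_LINE.sub(r"\1#\2", conf_text)
```

A possible sequence per host: read /etc/etcd/etcd.conf, apply `set_snapshot_count(text, n)` with a value of `n` low enough for your cluster (no specific value is given in this report), write the file and restart etcd, then once a snapshot exists apply `comment_out_snapshot_count(text)`, write the file, and restart etcd again.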

*** This bug has been marked as a duplicate of bug 1501752 ***