Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1552252

Summary: etcd migration playbook fails after upgrade to OCP 3.6
Product: OpenShift Container Platform
Reporter: Mark LaBonte <mlabonte>
Component: Cluster Version Operator
Assignee: Scott Dodson <sdodson>
Status: CLOSED DUPLICATE
QA Contact: liujia <jiajliu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.6.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-06 22:00:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1724792

Description Mark LaBonte 2018-03-06 20:08:24 UTC
Description of problem:

After upgrading OCP from 3.5 to 3.6, the etcd migration playbook failed, complaining about missing snapshots.

The OCP documentation doesn't mention any prerequisites for running the migration playbook.



How reproducible:

Steps to Reproduce:
1. Perform a clean install of OCP 3.5
2. Upgrade to OCP 3.6
3. Run /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml

Actual results:

2018-03-06 14:51:35,289 p=32085 u=root |  TASK [etcd_migrate : Get member item health status] ****************************
2018-03-06 14:51:35,471 p=32085 u=root |  ok: [10.8.32.185] => (item=member 8e9e05c52164694d is healthy: got healthy result from https://10.8.32.185:2379)
2018-03-06 14:51:35,471 p=32085 u=root |  skipping: [10.8.32.185] => (item=cluster is healthy)
2018-03-06 14:51:35,483 p=32085 u=root |  TASK [etcd_migrate : Check the etcd cluster health] ****************************
2018-03-06 14:51:35,504 p=32085 u=root |  skipping: [10.8.32.185]
2018-03-06 14:51:35,515 p=32085 u=root |  TASK [etcd_migrate : Check if there is at least one v2 snapshot] ***************
2018-03-06 14:51:36,018 p=32085 u=root |  ok: [10.8.32.185]
2018-03-06 14:51:36,027 p=32085 u=root |  TASK [etcd_migrate : fail] *****************************************************
2018-03-06 14:51:36,179 p=32085 u=root |  fatal: [10.8.32.185]: FAILED! => {
    "changed": false
}

MSG:

Before the migration can proceed the etcd member must write down at least one snapshot under /var/lib/etcd//member/snap directory.

2018-03-06 14:51:36,180 p=32085 u=root |        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry


Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Scott Dodson 2018-03-06 22:00:03 UTC
etcd takes a snapshot every 10,000 changes. A cleanly provisioned environment may not have a snapshot yet, because there haven't been 10,000 database changes. A snapshot is required for the v2 to v3 migration.
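
A quick way to see whether a member has written a snapshot yet is to look for snapshot files in its snap directory. The following is a minimal sketch, not the playbook's own check. It assumes that v2 snapshot files end in ".snap" and sit directly under the member's snap directory; the function name has_v2_snapshot is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed if an etcd member has written at least one v2 snapshot.
# Assumption: snapshot files are named *.snap and live directly in the
# snap directory (default /var/lib/etcd/member/snap).

has_v2_snapshot() {
    local snap_dir="${1:-/var/lib/etcd/member/snap}"
    # A missing directory means there is no snapshot.
    [ -d "$snap_dir" ] || return 1
    # find exits 0 even when nothing matches, so test its output instead.
    [ -n "$(find "$snap_dir" -maxdepth 1 -type f -name '*.snap' | head -n 1)" ]
}
```

Called with no argument, has_v2_snapshot checks the default path mentioned in the error message; pass a different directory if the etcd data dir has been relocated.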

You can work around this by setting a lower ETCD_SNAPSHOT_COUNT value in /etc/etcd/etcd.conf, restarting etcd, checking for a snapshot in /var/lib/etcd/member/snap, then commenting out that line and restarting etcd again.

If you have multiple etcd hosts you'll need to do this on all of them prior to performing the migration.
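
The config edits in the workaround above could be scripted roughly as follows. This is only a sketch under several assumptions: GNU sed is available, the config file uses plain KEY=value lines, and the helper names set_snapshot_count and comment_snapshot_count are hypothetical. The functions take the config path as an argument so they can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Sketch of the two config edits from the workaround.
# Assumptions: GNU sed (for -i and -E); KEY=value lines in the config file.

# set_snapshot_count FILE COUNT
# Set ETCD_SNAPSHOT_COUNT=COUNT, replacing an existing (possibly commented)
# line, or appending one if the key is absent.
set_snapshot_count() {
    local conf="$1" count="$2"
    if grep -Eq '^[[:space:]]*#?[[:space:]]*ETCD_SNAPSHOT_COUNT=' "$conf"; then
        sed -i -E "s|^[[:space:]]*#?[[:space:]]*ETCD_SNAPSHOT_COUNT=.*|ETCD_SNAPSHOT_COUNT=${count}|" "$conf"
    else
        printf 'ETCD_SNAPSHOT_COUNT=%s\n' "$count" >> "$conf"
    fi
}

# comment_snapshot_count FILE
# Comment out any active ETCD_SNAPSHOT_COUNT line so etcd falls back to its default.
comment_snapshot_count() {
    local conf="$1"
    sed -i -E 's|^[[:space:]]*(ETCD_SNAPSHOT_COUNT=.*)|#\1|' "$conf"
}
```

On each etcd host, as root, the sequence would then be: run set_snapshot_count /etc/etcd/etcd.conf 100 (the value 100 is only illustrative), restart etcd (for example with systemctl restart etcd, assuming an RPM-based install where the service is named etcd), wait until a snapshot file appears in /var/lib/etcd/member/snap, run comment_snapshot_count /etc/etcd/etcd.conf, and restart etcd again.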

*** This bug has been marked as a duplicate of bug 1501752 ***