Bug 1556936

Summary: After etcd v2 to v3 migration, masters are restarted before persisting config changes to use storage-backend etcd3
Product: OpenShift Container Platform Reporter: bmorriso
Component: Installer Assignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA QA Contact: liujia <jiajliu>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.6.1 CC: aos-bugs, bleanhar, erich, fcami, jchaloup, jliggitt, jokerman, mgugino, mmccomas, pdwyer, sdodson, wmeng
Target Milestone: ---   
Target Release: 3.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openshift-ansible-3.7.38-1.git.0.77e88ab.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1557499 Environment:
Last Closed: 2018-04-12 06:05:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1557499    

Description bmorriso 2018-03-15 14:58:31 UTC
Description of problem:

We have now seen two instances where, after a cluster was migrated from etcd v2 to v3, a master reverted to using v2 data after a restart of the atomic-openshift-master-api and atomic-openshift-master-controller services.

In both cases, the clusters had been upgraded hours or days prior, and only after a restart of these services did they revert to using the old data.
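
For illustration only (not part of the original report): the failure mode in the summary suggests checking, before any master restart, that the on-disk master config already selects etcd3. Below is a minimal Python sketch of such a check. It assumes the setting lives under kubernetesMasterConfig.apiServerArguments.storage-backend in the parsed master-config.yaml; that key path and the helper names are assumptions for this sketch, not something stated in this bug. Loading the YAML file into a dict is left to the caller.

```python
from typing import Any, Mapping


def storage_backend(master_config: Mapping[str, Any]) -> str:
    """Return the configured storage backend, or "" if none is set.

    Assumed layout (hypothetical for this sketch):
      kubernetesMasterConfig:
        apiServerArguments:
          storage-backend: [etcd3]
    """
    kube = master_config.get("kubernetesMasterConfig") or {}
    args = kube.get("apiServerArguments") or {}
    values = args.get("storage-backend") or []
    # Accept a bare string as well as the list-of-strings form.
    if isinstance(values, str):
        return values
    # If several values are listed, use the last one.
    return values[-1] if values else ""


def safe_to_restart(master_config: Mapping[str, Any]) -> bool:
    """True only if the persisted config explicitly selects etcd3."""
    return storage_backend(master_config) == "etcd3"
```

A missing key is deliberately treated as "not migrated", so a master whose config was never updated fails the check instead of being restarted.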


Version-Release number of selected component (if applicable):
oc v3.6.173.0.96
kubernetes v1.6.1+5115d708d7


How reproducible:
We have seen this twice so far on two different clusters. 

Comment 1 Jordan Liggitt 2018-03-15 16:23:47 UTC
etcd is at v3.1.3

Comment 3 Jan Chaloupka 2018-03-16 16:25:55 UTC
Upstream PR that fixes it: https://github.com/openshift/openshift-ansible/pull/7551

Comment 4 Michael Gugino 2018-03-16 17:04:51 UTC
This is already fixed in 3.7 and 3.6. The fix for master has been picked: https://github.com/openshift/openshift-ansible/pull/7556

Comment 5 Michael Gugino 2018-03-16 17:33:58 UTC
Fix for 3.9: https://github.com/openshift/openshift-ansible/pull/7559

Comment 6 Michael Gugino 2018-03-16 17:34:34 UTC
The 3.6 and 3.7 fixes were merged 8 days ago:

3.7: https://github.com/openshift/openshift-ansible/pull/7313

3.6: https://github.com/openshift/openshift-ansible/pull/7226

Comment 7 Michael Gugino 2018-03-16 17:35:32 UTC
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1544399

Comment 16 liujia 2018-03-20 10:08:22 UTC
Tried both RPM and containerized etcd migration. Both work well on openshift-ansible-3.6.173.0.110-1.git.0.ca81843.el7.noarch. After migration, newly created data was stored in etcd v3 only.

Based on comment 11, comment 13, and comment 15 combined, changing the bug status.

Comment 19 errata-xmlrpc 2018-04-12 06:05:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106