
Bug 1557499

Summary: After etcd v2 to v3 migration, masters are restarted before persisting config changes to use storage-backend etcd3
Product: OpenShift Container Platform
Reporter: Brenton Leanhardt <bleanhar>
Component: Installer
Assignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA
QA Contact: Weihua Meng <wmeng>
Severity: high
Priority: high
Version: 3.7.0
CC: aos-bugs, bleanhar, bmorriso, jchaloup, jialiu, jliggitt, jokerman, mgugino, mmccomas, sdodson, wmeng
Target Release: 3.7.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1556936
Last Closed: 2018-04-05 09:40:50 UTC
Type: Bug
Bug Depends On: 1556936

Comment 1 Brenton Leanhardt 2018-03-16 17:42:17 UTC
https://github.com/openshift/openshift-ansible/pull/7313

Comment 4 Weihua Meng 2018-03-20 16:12:46 UTC
Fixed.

openshift-ansible-3.7.39-1.git.0.75ad335.el7.noarch
etcd-3.2.15-1.el7.x86_64

Steps:
1. Fresh install of HA OCP v3.5.5.31.63.
2. Upgrade to 3.6 with openshift-ansible-3.6.173.0.110-1.git.0.ca81843.el7.noarch.
3. Create service account sa123 in project wmeng1.
The data is in etcd2, and there is no etcd3 data on any etcd host:
[root@wmengetcdv2-master-etcd-2 ~]# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng1/sa123 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
[root@wmengetcdv2-master-etcd-2 ~]#  etcdctl2 get /kubernetes.io/serviceaccounts/wmeng1/sa123
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"sa123","namespace":"wmeng1","selfLink":"/api/v1/namespaces/wmeng1/serviceaccounts/sa123","uid":"10dc656a-2c3d-11e8-b8e4-42010af00037","creationTimestamp":"2018-03-20T12:49:02Z"},"secrets":[{"name":"sa123-token-b7kf5"},{"name":"sa123-dockercfg-pf1zd"}],"imagePullSecrets":[{"name":"sa123-dockercfg-pf1zd"}]}

4. Migrate etcd from v2 to v3
with /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml from rpm openshift-ansible-3.7.39-1.git.0.75ad335.el7.noarch.
The playbook finished successfully.

5. Check the storage backend in the master config:
# ansible -i rpmetcdgce35.inv masters -m shell -a "cat /etc/origin/master/master-config.yaml |grep -A 1 backend"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3
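The bug fixed here was an ordering problem: the masters were restarted before master-config.yaml had been updated to use the etcd3 storage backend. As a hedged illustration of the intended order (persist the config first, restart second), the hypothetical helper below refuses to run a restart command unless the line after `storage-backend:` is `- etcd3`. It is not taken from openshift-ansible or from the PR linked above; it assumes the YAML layout shown in the grep output.

```shell
# Hypothetical helper (not part of openshift-ansible): run a restart command
# only after master-config.yaml already selects the etcd3 storage backend.
# Assumes the layout from the grep output above: "storage-backend:" followed
# by a "- etcd3" list item on the next line.
restart_if_persisted() {
  local config="$1"
  shift
  if awk '/storage-backend:/ {
            getline
            if ($0 ~ /^[ \t]*-[ \t]*etcd3[ \t]*$/) found = 1
          }
          END { exit !found }' "$config"; then
    "$@"   # config is persisted, so the restarted service picks up etcd3
  else
    echo "not restarting: $config does not select etcd3 yet" >&2
    return 1
  fi
}
```

Usage on a master: `restart_if_persisted /etc/origin/master/master-config.yaml systemctl restart atomic-openshift-master-api`.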

6. The master API service was restarted and is running on every master:
# ansible -i rpmetcdgce35.inv masters -m shell -a "systemctl status atomic-openshift-master-api | grep Active"
wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 09:29:34 EDT; 1min 58s ago

wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 09:29:34 EDT; 1min 58s ago

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 09:29:34 EDT; 1min 59s ago
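With several masters it is easy to miss a host that returned no status line at all. A hypothetical helper for the ad-hoc output above: every host that answered `| SUCCESS |` must also have an `Active: active (running)` line. It only parses the text format shown here and assumes one status line per host.

```shell
# Hypothetical helper: read the ansible ad-hoc output above on stdin and
# succeed only if every host that answered "| SUCCESS |" also reported
# "Active: active (running)". Assumes one status line per host.
all_running() {
  local out hosts running
  out=$(cat)
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps that
  # from aborting scripts that run with "set -e".
  hosts=$(printf '%s\n' "$out" | grep -c '| SUCCESS |' || true)
  running=$(printf '%s\n' "$out" | grep -c 'Active: active (running)' || true)
  [ "$hosts" -gt 0 ] && [ "$hosts" -eq "$running" ]
}
```

Usage: `ansible -i rpmetcdgce35.inv masters -m shell -a "systemctl status atomic-openshift-master-api | grep Active" | all_running`.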

7. The data is now in both etcd2 and etcd3 on all etcd hosts:
[root@wmengetcdv2-master-etcd-3 ~]# etcdctl2 get /kubernetes.io/serviceaccounts/wmeng1/sa123
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"sa123","namespace":"wmeng1","selfLink":"/api/v1/namespaces/wmeng1/serviceaccounts/sa123","uid":"10dc656a-2c3d-11e8-b8e4-42010af00037","creationTimestamp":"2018-03-20T12:49:02Z"},"secrets":[{"name":"sa123-token-b7kf5"},{"name":"sa123-dockercfg-pf1zd"}],"imagePullSecrets":[{"name":"sa123-dockercfg-pf1zd"}]}

[root@wmengetcdv2-master-etcd-3 ~]# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng1/sa123 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
/kubernetes.io/serviceaccounts/wmeng1/sa123

8. Restart all master API and controllers services:
# ansible -i rpmetcdgce35.inv masters -m service -a 'name=atomic-openshift-master-api state=restarted'
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS => {
# ansible -i rpmetcdgce35.inv masters -m service -a 'name=atomic-openshift-master-controllers state=restarted'
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS => {

9. The master config still uses etcd3:
# ansible -i rpmetcdgce35.inv masters -m shell -a "cat /etc/origin/master/master-config.yaml | grep -A 1 backend"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

10. All master API and controllers services restarted and are running.

11. Create service account sa789 in project wmeng3; its data is in etcd3 and not in etcd2.

12. Upgrade the cluster to OCP 3.7 with openshift-ansible-3.7.39-1.git.0.75ad335.el7.noarch.

13. Check the storage backend in the master config:
# ansible -i rpmetcdgce35.inv masters -m shell -a "cat /etc/origin/master/master-config.yaml | grep -A 1 backend"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

14. Check that the master API service was restarted on all masters:
# ansible -i rpmetcdgce35.inv masters -m shell -a "systemctl status atomic-openshift-master-api | grep Active"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 10:24:13 EDT; 1h 9min ago

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 10:24:12 EDT; 1h 9min ago

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 10:24:13 EDT; 1h 9min ago
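The output in step 14 shows the API services running, but confirming that they restarted after the upgrade means comparing start times by eye. A hypothetical helper can make that comparison against a reference time, for example when the upgrade started. It relies on `YYYY-MM-DD HH:MM:SS` timestamps sorting correctly as strings and ignores the time-zone field, so all hosts are assumed to report the same zone.

```shell
# Hypothetical helper: read "systemctl status ... | grep Active" output on
# stdin (ansible host lines are ignored) and succeed only if every Active
# line is "active (running)" with a start time at or after REF.
restarted_since() {
  awk -v ref="$1" '
    /Active:/ {
      n++
      if (index($0, "Active: active (running) since ") == 0) { bad++; next }
      for (i = 1; i <= NF; i++) if ($i == "since") break
      ts = $(i + 2) " " $(i + 3)   # skip the weekday field after "since"
      if (ts < ref) bad++          # string compare of YYYY-MM-DD HH:MM:SS
    }
    END { exit !(n > 0 && bad == 0) }'
}
```

Usage: pipe the ad-hoc command from step 14 into `restarted_since '2018-03-20 10:00:00'`, substituting the actual upgrade start time.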

15. Check sa123 in etcd2 and etcd3:
[root@wmengetcdv2-master-etcd-1 ~]# etcdctl2 get /kubernetes.io/serviceaccounts/wmeng1/sa123
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"sa123","namespace":"wmeng1","selfLink":"/api/v1/namespaces/wmeng1/serviceaccounts/sa123","uid":"10dc656a-2c3d-11e8-b8e4-42010af00037","creationTimestamp":"2018-03-20T12:49:02Z"},"secrets":[{"name":"sa123-token-b7kf5"},{"name":"sa123-dockercfg-pf1zd"}],"imagePullSecrets":[{"name":"sa123-dockercfg-pf1zd"}]}

[root@wmengetcdv2-master-etcd-1 ~]# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng1/sa123 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
/kubernetes.io/serviceaccounts/wmeng1/sa123

16. Check sa456:
# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng2/sa456 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
/kubernetes.io/serviceaccounts/wmeng2/sa456

# etcdctl2 get /kubernetes.io/serviceaccounts/wmeng2/sa456
Error:  100: Key not found (/kubernetes.io/serviceaccounts/wmeng2) [31130]
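Steps 3, 7, 15 and 16 all ask the same question: is a key present in the v2 store, the v3 store, or both? A hypothetical helper that answers it from captured output is sketched below. It assumes the `etcdctl2`/`etcdctl3` wrappers used on these hosts are available in the calling shell, that the v2 "not found" error contains `Key not found` as above (so stderr must be captured as well), and that any other non-empty v2 output is the stored object.

```shell
# Hypothetical helper: report whether KEY is in the etcd v2 store, the v3
# store, or both, given output captured from the etcdctl2/etcdctl3 wrappers.
#   $1  output of: etcdctl2 get KEY 2>&1
#   $2  output of: etcdctl3 get KEY --prefix --keys-only ...
#   $3  KEY
classify_key() {
  local v2_out="$1" v3_out="$2" key="$3" in_v2=no in_v3=no
  case "$v2_out" in
    '' | *'Key not found'*) ;;   # v2 reports a missing key as error 100
    *) in_v2=yes ;;              # anything else is assumed to be the object
  esac
  # --prefix may also list longer keys (e.g. sa4560), so require an exact line.
  if printf '%s\n' "$v3_out" | grep -qxF -- "$key"; then
    in_v3=yes
  fi
  echo "v2=$in_v2 v3=$in_v3"
}
```

Usage: `k=/kubernetes.io/serviceaccounts/wmeng2/sa456; classify_key "$(etcdctl2 get "$k" 2>&1)" "$(etcdctl3 get "$k" --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379)" "$k"`. Given the output in step 16, this would print `v2=no v3=yes`.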

17. Deploy an S2I example application and check its pods:
# oc get pods
NAME                            READY     STATUS      RESTARTS   AGE
cakephp-mysql-example-1-build   0/1       Completed   0          8m
cakephp-mysql-example-1-gqz6p   1/1       Running     0          5m
mysql-1-t5xkg                   1/1       Running     0          8m
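The pod check in step 17 can also be expressed as a small filter over the `oc get pods` table: pods that have finished (such as the build pod) should be `Completed`, and every other pod should be `Running` with all containers ready. This is a hypothetical helper that assumes the default column order shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: read "oc get pods" output on stdin (header row first).
# A Completed pod is accepted; any other pod must be Running with all
# containers ready (READY column such as 1/1).
pods_healthy() {
  awk 'NR > 1 {
         n++
         if ($3 == "Completed") next
         split($2, ready, "/")
         if ($3 != "Running" || ready[1] != ready[2]) bad++
       }
       END { exit !(n > 0 && bad == 0) }'
}
```

Usage: `oc get pods | pods_healthy && echo "pods look healthy"`.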

The fix looks good to me.
No regression issues found.

Comment 8 errata-xmlrpc 2018-04-05 09:40:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636