Bug 1557499 - After etcd v2 to v3 migration, masters are restarted before persisting config changes to use storage-backend etcd3
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.7.z
Assignee: Vadim Rutkovsky
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On: 1556936
Blocks:
Reported: 2018-03-16 17:36 UTC by Brenton Leanhardt
Modified: 2018-04-05 09:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1556936
Environment:
Last Closed: 2018-04-05 09:40:50 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0636 0 None None None 2018-04-05 09:41:24 UTC

Comment 1 Brenton Leanhardt 2018-03-16 17:42:17 UTC
https://github.com/openshift/openshift-ansible/pull/7313

Comment 4 Weihua Meng 2018-03-20 16:12:46 UTC
Fixed.

openshift-ansible-3.7.39-1.git.0.75ad335.el7.noarch
etcd-3.2.15-1.el7.x86_64

Steps:
1. fresh install HA OCP v3.5.5.31.63
2. upgrade to 3.6 with openshift-ansible-3.6.173.0.110-1.git.0.ca81843.el7.noarch
3. create sa sa123 in project wmeng1
data in etcd2 and no etcd3 data on all etcd hosts
[root@wmengetcdv2-master-etcd-2 ~]# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng1/sa123 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
[root@wmengetcdv2-master-etcd-2 ~]#  etcdctl2 get /kubernetes.io/serviceaccounts/wmeng1/sa123
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"sa123","namespace":"wmeng1","selfLink":"/api/v1/namespaces/wmeng1/serviceaccounts/sa123","uid":"10dc656a-2c3d-11e8-b8e4-42010af00037","creationTimestamp":"2018-03-20T12:49:02Z"},"secrets":[{"name":"sa123-token-b7kf5"},{"name":"sa123-dockercfg-pf1zd"}],"imagePullSecrets":[{"name":"sa123-dockercfg-pf1zd"}]}

4. etcd migration v2 to v3
with /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml in rpm openshift-ansible-3.7.39-1.git.0.75ad335.el7.noarch
finished successfully.

5. etcd check
# ansible -i rpmetcdgce35.inv masters -m shell -a "cat /etc/origin/master/master-config.yaml |grep -A 1 backend"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

6. master api is restarted and running
# ansible -i rpmetcdgce35.inv masters -m shell -a "systemctl status atomic-openshift-master-api | grep Active"
wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 09:29:34 EDT; 1min 58s ago

wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 09:29:34 EDT; 1min 58s ago

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 09:29:34 EDT; 1min 59s ago

7. data in both etcd2 and etcd3 in all etcd hosts
[root@wmengetcdv2-master-etcd-3 ~]# etcdctl2 get /kubernetes.io/serviceaccounts/wmeng1/sa123
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"sa123","namespace":"wmeng1","selfLink":"/api/v1/namespaces/wmeng1/serviceaccounts/sa123","uid":"10dc656a-2c3d-11e8-b8e4-42010af00037","creationTimestamp":"2018-03-20T12:49:02Z"},"secrets":[{"name":"sa123-token-b7kf5"},{"name":"sa123-dockercfg-pf1zd"}],"imagePullSecrets":[{"name":"sa123-dockercfg-pf1zd"}]}

[root@wmengetcdv2-master-etcd-3 ~]# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng1/sa123 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
/kubernetes.io/serviceaccounts/wmeng1/sa123

8. restart all master api and controllers
# ansible -i rpmetcdgce35.inv masters -m service -a 'name=atomic-openshift-master-api state=restarted'
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS => {
# ansible -i rpmetcdgce35.inv masters -m service -a 'name=atomic-openshift-master-controllers state=restarted'
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS => {

9. master config is still etcd3
# ansible -i rpmetcdgce35.inv masters -m shell -a "cat /etc/origin/master/master-config.yaml | grep -A 1 backend"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

10. all master api and controllers are restarted and running

11. create sa sa789 in project wmeng3, data is in etcd3 and not in etcd2

12. upgrade cluster ocp 3.7 with openshift-ansible-3.7.39-1.git.0.75ad335.el7.noarch

13. check storage backend in master config
# ansible -i rpmetcdgce35.inv masters -m shell -a "cat /etc/origin/master/master-config.yaml | grep -A 1 backend"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
    storage-backend:
    - etcd3

14. check all master api are restarted.
# ansible -i rpmetcdgce35.inv masters -m shell -a "systemctl status atomic-openshift-master-api | grep Active"
wmengetcdv2-master-etcd-3.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 10:24:13 EDT; 1h 9min ago

wmengetcdv2-master-etcd-2.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 10:24:12 EDT; 1h 9min ago

wmengetcdv2-master-etcd-1.0320-ybh.qe.rhcloud.com | SUCCESS | rc=0 >>
   Active: active (running) since Tue 2018-03-20 10:24:13 EDT; 1h 9min ago

15. check sa123 in etcd2 and etcd3
[root@wmengetcdv2-master-etcd-1 ~]# etcdctl2 get /kubernetes.io/serviceaccounts/wmeng1/sa123
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"sa123","namespace":"wmeng1","selfLink":"/api/v1/namespaces/wmeng1/serviceaccounts/sa123","uid":"10dc656a-2c3d-11e8-b8e4-42010af00037","creationTimestamp":"2018-03-20T12:49:02Z"},"secrets":[{"name":"sa123-token-b7kf5"},{"name":"sa123-dockercfg-pf1zd"}],"imagePullSecrets":[{"name":"sa123-dockercfg-pf1zd"}]}

[root@wmengetcdv2-master-etcd-1 ~]# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng1/sa123 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
/kubernetes.io/serviceaccounts/wmeng1/sa123

16. check sa456
# etcdctl3 get /kubernetes.io/serviceaccounts/wmeng2/sa456 --prefix --keys-only --endpoints=wmengetcdv2-master-etcd-1:2379
/kubernetes.io/serviceaccounts/wmeng2/sa456

# etcdctl2 get /kubernetes.io/serviceaccounts/wmeng2/sa456
Error:  100: Key not found (/kubernetes.io/serviceaccounts/wmeng2) [31130]

17. deploy s2i and check
# oc get pods
NAME                            READY     STATUS      RESTARTS   AGE
cakephp-mysql-example-1-build   0/1       Completed   0          8m
cakephp-mysql-example-1-gqz6p   1/1       Running     0          5m
mysql-1-t5xkg                   1/1       Running     0          8m

The fix looks good to me.
No regression issue found.
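
For anyone repeating this verification, the two recurring checks in the steps above (the storage-backend value in master-config.yaml, and whether a key is present in the etcd2 and/or etcd3 store) could be scripted roughly as below. This is only a sketch and not part of openshift-ansible: the function names storage_backend and key_location are hypothetical, the parsing assumes the two-line YAML layout shown in the grep output, and the etcdctl2 output is assumed to be captured with stderr included.

```shell
# Hypothetical helper: print the storage-backend value from a master config file.
# Assumes the layout seen in the grep output above:
#     storage-backend:
#     - etcd3
storage_backend() {
    awk '
        # On the line after "storage-backend:", strip the list marker and print.
        found { sub(/^[ \t]*-[ \t]*/, ""); print; exit }
        /^[ \t]*storage-backend:[ \t]*$/ { found = 1 }
    ' "$1"
}

# Hypothetical helper: report which store holds a key, given the raw output of
# the etcdctl2 and etcdctl3 lookups (passed as strings, so the logic can be
# exercised without a cluster).
key_location() {
    v2_out=$1
    v3_out=$2
    in_v2=no
    in_v3=no
    # etcdctl2 reports "Error:  100: Key not found ..." for a missing key.
    case $v2_out in
        ''|Error:*) ;;
        *) in_v2=yes ;;
    esac
    # etcdctl3 get --keys-only printed nothing in step 3 when the key was absent.
    if [ -n "$v3_out" ]; then
        in_v3=yes
    fi
    echo "etcd2=$in_v2 etcd3=$in_v3"
}

# Possible usage on a master (KEY and HOST are placeholders):
#   storage_backend /etc/origin/master/master-config.yaml
#   key_location "$(etcdctl2 get KEY 2>&1)" \
#                "$(etcdctl3 get KEY --prefix --keys-only --endpoints=HOST:2379)"
```

With the outputs recorded in step 16, key_location would report the key as present only in etcd3, which is the expected state for objects created after the migration.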

Comment 8 errata-xmlrpc 2018-04-05 09:40:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636

