Description of problem:
Upgraded v3.6 to v3.7 on a non-HA containerized OCP cluster. The upgrade succeeded: new master-api and master-controllers containers were created and their services are running. However, this is only an illusion; in fact, it is the original master service that is doing the work. After stopping the original master service, OCP no longer works. For example, "oc get" returns nothing because port 8443 is no longer in "LISTEN" status.

===============after upgrade
# docker ps
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
259c947e2ec0        openshift3/ose:v3.7.0                   "/usr/bin/openshift s"   9 minutes ago       Up 8 minutes                            atomic-openshift-master-controllers
66efdbd1938e        openshift3/node:v3.7.0                  "/usr/local/bin/origi"   9 minutes ago       Up 9 minutes                            atomic-openshift-node
4016451d84d5        openshift3/ose:v3.7.0                   "/usr/bin/openshift s"   10 minutes ago      Up 10 minutes                           atomic-openshift-master-api
cec043f69378        openshift3/openvswitch:v3.7.0           "/usr/local/bin/ovs-r"   10 minutes ago      Up 10 minutes                           openvswitch
9e0d00a1c244        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          10 minutes ago      Up 10 minutes                           etcd_container
a2c4b8ba4e62        openshift3/ose:v3.6.173.0.59            "/usr/bin/openshift s"   12 minutes ago      Up 12 minutes                           atomic-openshift-master
# netstat -na|grep 8443
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN
tcp        0      0 10.240.0.85:45988       10.240.0.85:8443        ESTABLISHED
...
# oc get node
NAME                                  STATUS                     AGE       VERSION
qe-jliu-con2-master-etcd-1            Ready,SchedulingDisabled   1h        v1.7.6+a08f5eeb62
qe-jliu-con2-node-registry-router-1   Ready                      1h        v1.7.6+a08f5eeb62

===========after stopping the original master service and keeping only the api and controllers services
# systemctl stop atomic-openshift-master
# docker ps
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
66efdbd1938e        openshift3/node:v3.7.0                  "/usr/local/bin/origi"   14 minutes ago      Up 13 minutes                           atomic-openshift-node
4016451d84d5        openshift3/ose:v3.7.0                   "/usr/bin/openshift s"   14 minutes ago      Up 14 minutes                           atomic-openshift-master-api
cec043f69378        openshift3/openvswitch:v3.7.0           "/usr/local/bin/ovs-r"   14 minutes ago      Up 14 minutes                           openvswitch
9e0d00a1c244        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          14 minutes ago      Up 14 minutes                           etcd_container
# oc get node
The connection to the server qe-jliu-con2-master-etcd-1:8443 was refused - did you specify the right host or port?
# netstat -na|grep 8443
tcp        0      0 10.240.0.85:45988       10.240.0.85:8443        TIME_WAIT
tcp        0      0 10.240.0.85:46006       10.240.0.85:8443        TIME_WAIT

Version-Release number of the following components:
openshift-ansible-docs-3.7.0-0.178.0.git.0.27a1039.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Container install v3.6 for a non-HA deployment.
2. Upgrade v3.6 to v3.7.
3.

Actual results:
The master api and master controllers services do not work.

Expected results:
The master api and master controllers services should work in place of the original master service.

Additional info:
Re-running the upgrade also hits the issue, even if the original master service is not stopped manually.
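The key symptom above is that nothing is in LISTEN state on port 8443 once the legacy unit is stopped; only client sockets in TIME_WAIT remain. A minimal Python sketch of that check, assuming `netstat -na` output in the column layout shown (proto, recv-q, send-q, local address, foreign address, state). The function name `is_listening` is hypothetical and not part of any OpenShift tooling:

```python
def is_listening(netstat_output: str, port: int = 8443) -> bool:
    """Return True if any TCP socket in `netstat -na` text is LISTENing on `port`.

    Assumes the column layout: proto recv-q send-q local foreign state.
    Only the *local* address is checked, so client connections whose
    foreign address is :8443 (ESTABLISHED, TIME_WAIT) do not count.
    """
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            local, state = fields[3], fields[5]
            if local.endswith(suffix) and state == "LISTEN":
                return True
    return False


# Samples copied from the report: before and after stopping the legacy unit.
before = """tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN
tcp 0 0 10.240.0.85:45988 10.240.0.85:8443 ESTABLISHED"""
after = """tcp 0 0 10.240.0.85:45988 10.240.0.85:8443 TIME_WAIT
tcp 0 0 10.240.0.85:46006 10.240.0.85:8443 TIME_WAIT"""

print(is_listening(before))  # True
print(is_listening(after))   # False
```

Matching on the `:8443` suffix (with the colon) avoids false positives such as a listener on 18443.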
Checking the environment in which I tested the upgrade.

#### Listing master services ####
# systemctl list-units atomic-openshift-master*
  UNIT                                        LOAD   ACTIVE SUB     DESCRIPTION
  atomic-openshift-master-api.service         loaded active running Atomic OpenShift Master API
  atomic-openshift-master-controllers.service loaded active running Atomic OpenShift Master Controllers
● atomic-openshift-master.service             loaded failed failed  atomic-openshift-master.service

#### Listing nodes ####
# oc get nodes
NAME           STATUS    AGE       VERSION
172.16.186.5   Ready     2d        v1.7.0+80709908fd

#### Listing master services ####
# systemctl status atomic-openshift-master-api.service atomic-openshift-master-controllers.service atomic-openshift-master
● atomic-openshift-master-api.service - Atomic OpenShift Master API
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master-api.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-10-23 10:21:36 EDT; 1 day 21h ago
     Docs: https://github.com/openshift/origin
 Main PID: 29553 (docker-current)
   Memory: 3.2M
   CGroup: /system.slice/atomic-openshift-master-api.service
           └─29553 /usr/bin/docker-current run --rm --privileged --net=host --name atomic-openshift-master-api --env-file=/etc/sysconfig/atomic-openshift-master-api -v /var/lib/origin:/var/lib/origin -v /var/log:/var/log -v /var/run/do...

Oct 25 08:12:19 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: I1025 12:12:19.981958 1 rest.go:349] Starting watch for /api/v1/secrets, rv=180901 labels= fields= timeout=6m1s
Oct 25 08:12:20 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: E1025 12:12:20.007223 1 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Oct 25 08:12:20 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: W1025 12:12:20.007476 1 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/inf...n compacted
Oct 25 08:12:20 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: I1025 12:12:20.484626 1 rest.go:349] Starting watch for /api/v1/services, rv=9402 labels= fields= timeout=5m19s
Oct 25 08:12:21 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: I1025 12:12:21.029228 1 rest.go:349] Starting watch for /api/v1/secrets, rv=181474 labels= fields= timeout=9m9s
Oct 25 08:12:23 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: I1025 12:12:23.476394 1 rest.go:349] Starting watch for /api/v1/endpoints, rv=9402 labels= fields= timeout=7m43s
Oct 25 08:12:27 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: I1025 12:12:27.311980 1 rest.go:349] Starting watch for /api/v1/serviceaccounts, rv=181070 labels= fields= timeout=8m19s
Oct 25 08:12:27 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: E1025 12:12:27.318241 1 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Oct 25 08:12:27 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: W1025 12:12:27.318430 1 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/inf...n compacted
Oct 25 08:12:28 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-api[29553]: I1025 12:12:28.322543 1 rest.go:349] Starting watch for /api/v1/serviceaccounts, rv=181482 labels= fields= timeout=5m10s

● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-10-23 10:21:46 EDT; 1 day 21h ago
     Docs: https://github.com/openshift/origin
 Main PID: 29651 (docker-current)
   Memory: 3.2M
   CGroup: /system.slice/atomic-openshift-master-controllers.service
           └─29651 /usr/bin/docker-current run --rm --privileged --net=host --name atomic-openshift-master-controllers --env-file=/etc/sysconfig/atomic-openshift-master-controllers -v /var/lib/origin:/var/lib/origin -v /var/run/docker....

Oct 25 08:12:07 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:07.037379 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"router-1-deploy", UI...
Oct 25 08:12:07 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:07.037392 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-2-de...
Oct 25 08:12:09 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: W1025 12:12:09.088748 1 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/infor...n compacted
Oct 25 08:12:11 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:11.039688 1 scheduler.go:168] Failed to schedule pod: default/docker-registry-2-deploy
Oct 25 08:12:11 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:11.039743 1 factory.go:734] Updating pod condition for default/docker-registry-2-deploy to (PodScheduled==False)
Oct 25 08:12:11 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:11.039920 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-2-de...
Oct 25 08:12:16 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: W1025 12:12:16.501476 1 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/infor...n compacted
Oct 25 08:12:19 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:19.042191 1 scheduler.go:168] Failed to schedule pod: default/docker-registry-2-deploy
Oct 25 08:12:19 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:19.042252 1 factory.go:734] Updating pod condition for default/docker-registry-2-deploy to (PodScheduled==False)
Oct 25 08:12:19 jchaloup-openshift-master-vw9dm-r1.localdomain atomic-openshift-master-controllers[29651]: I1025 12:12:19.042277 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-2-de...

● atomic-openshift-master.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-10-23 11:17:31 EDT; 1 day 20h ago
 Main PID: 4178 (code=exited, status=2)

Hint: Some lines were ellipsized, use -l to show in full.

The atomic-openshift-master.service is not running, yet `oc get nodes` returns the list of nodes and `oc get all --all-namespaces` returns various resources. Let me check your summary again.
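The healthy state in this environment is: the api and controllers units active/running, and the legacy atomic-openshift-master.service not active (here it is failed and disabled). A minimal Python sketch that checks this from `systemctl list-units` text, assuming systemd's default column order (UNIT LOAD ACTIVE SUB DESCRIPTION) with a leading "●" on failed units. The helper names `parse_list_units` and `split_master_only` are hypothetical:

```python
from typing import Dict, Tuple

LEGACY = "atomic-openshift-master.service"
API = "atomic-openshift-master-api.service"
CONTROLLERS = "atomic-openshift-master-controllers.service"


def parse_list_units(output: str) -> Dict[str, Tuple[str, str]]:
    """Map unit name -> (ACTIVE, SUB) from `systemctl list-units` text."""
    units = {}
    for raw in output.splitlines():
        # systemctl prefixes failed units with a "●" marker; drop it.
        line = raw.strip().lstrip("●").strip()
        fields = line.split(maxsplit=4)
        if len(fields) < 4 or not fields[0].endswith(".service"):
            continue  # header, legend, summary or blank line
        unit, _load, active, sub = fields[:4]
        units[unit] = (active, sub)
    return units


def split_master_only(units: Dict[str, Tuple[str, str]]) -> bool:
    """True if api and controllers are running and the legacy unit is not active."""
    legacy_active = units.get(LEGACY, ("inactive", "dead"))[0]
    return (legacy_active != "active"
            and units.get(API) == ("active", "running")
            and units.get(CONTROLLERS) == ("active", "running"))


# Sample copied from the `systemctl list-units atomic-openshift-master*` output above.
sample = """\
  UNIT                                        LOAD   ACTIVE SUB     DESCRIPTION
  atomic-openshift-master-api.service         loaded active running Atomic OpenShift Master API
  atomic-openshift-master-controllers.service loaded active running Atomic OpenShift Master Controllers
● atomic-openshift-master.service             loaded failed failed  atomic-openshift-master.service
"""
units = parse_list_units(sample)
print(units[LEGACY])             # ('failed', 'failed')
print(split_master_only(units))  # True
```

On the reporter's cluster before the fix, the legacy unit would presumably report active/running, and the check would return False.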
# docker ps
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
674bc4714a7c        openshift3/ose:v3.7.0                   "/usr/bin/openshift s"   45 hours ago        Up 45 hours                             atomic-openshift-master-controllers
41560663bba9        openshift3/ose:v3.7.0                   "/usr/bin/openshift s"   45 hours ago        Up 45 hours                             atomic-openshift-master-api
b6cad80a682a        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          2 days ago          Up 2 days                               etcd_container
Liujia, can you share your inventory file?
Upstream PR: https://github.com/openshift/openshift-ansible/pull/5929
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/fffb5e5e516d018a8d4bd063bc439a0a81447e31
Merge pull request #5929 from ingvagabund/remove-master-service-during-non-ha-to-ha-upgrade

Automatic merge from submit-queue.

remove master.service during the non-ha to ha upgrade

Bug: 1506165
Verified on openshift-ansible-3.7.0-0.189.0.git.0.d497c5e.el7.noarch. After the upgrade, only the master api and controllers services are running. After restarting docker, still only the api and controllers services are running.
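The verification above ("only api and controllers") can also be read off `docker ps`: before the fix, a third `atomic-openshift-master` container on the old v3.6 image kept serving port 8443. A minimal Python sketch, assuming the default `docker ps` layout where IMAGE is the second column and NAMES is the last (container names contain no spaces). The helper names are hypothetical:

```python
from typing import Dict

SPLIT_MASTER = {"atomic-openshift-master-api", "atomic-openshift-master-controllers"}


def master_containers(docker_ps_output: str) -> Dict[str, str]:
    """Map each atomic-openshift-master* container name to its image.

    Assumes default `docker ps` output: IMAGE is the 2nd column and NAMES
    is the last one; the quoted COMMAND column is never inspected.
    """
    containers = {}
    for line in docker_ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "CONTAINER":
            continue  # blank line or header row
        image, name = fields[1], fields[-1]
        if name.startswith("atomic-openshift-master"):
            containers[name] = image
    return containers


def only_split_master(containers: Dict[str, str]) -> bool:
    """True if exactly the api and controllers containers are running."""
    return set(containers) == SPLIT_MASTER


# Trimmed from the "after upgrade" output in the original report.
after_upgrade = """\
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
259c947e2ec0 openshift3/ose:v3.7.0 "/usr/bin/openshift s" 9 minutes ago Up 8 minutes atomic-openshift-master-controllers
66efdbd1938e openshift3/node:v3.7.0 "/usr/local/bin/origi" 9 minutes ago Up 9 minutes atomic-openshift-node
4016451d84d5 openshift3/ose:v3.7.0 "/usr/bin/openshift s" 10 minutes ago Up 10 minutes atomic-openshift-master-api
a2c4b8ba4e62 openshift3/ose:v3.6.173.0.59 "/usr/bin/openshift s" 12 minutes ago Up 12 minutes atomic-openshift-master
"""
found = master_containers(after_upgrade)
print(found["atomic-openshift-master"])  # openshift3/ose:v3.6.173.0.59
print(only_split_master(found))          # False
```

Parsing the human-readable table is fragile; `docker ps --format '{{.Names}} {{.Image}}'` yields a stable two-column layout if you control the command.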
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188