Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1713228

Summary: Upgrade fails if an apiserver on the same node as the CVO fails
Product: OpenShift Container Platform
Reporter: Tomáš Nožička <tnozicka>
Component: Cluster Version Operator
Assignee: Abhinav Dahiya <adahiya>
Status: CLOSED NOTABUG
QA Contact: liujia <jiajliu>
Severity: medium
Docs Contact:
Priority: low
Version: 4.1.0
CC: aos-bugs, bleanhar, ccoleman, erich, jokerman, mmccomas, wking, wsun, xxia
Target Milestone: ---
Keywords: NeedsTestCase
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-30 17:31:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Tomáš Nožička 2019-05-23 08:10:30 UTC
Clayton discovered that if the apiserver on the node the CVO is on fails, we never make progress on an upgrade.

The CVO talks to the apiserver over localhost because the service network isn't available before it creates the networking operator. Using localhost means that if one of the three apiservers fails and it is the one the CVO is using (the one on the same node), the CVO never makes progress on the upgrade, nor can it roll back.

Given that it uses localhost, we likely need to run it in HA mode on all masters with leader election.

Comment 1 Xingxing Xia 2019-05-23 10:42:22 UTC
Reproduced it in 4.1.0-0.nightly-2019-05-21-060354 -> 4.1.0-0.nightly-2019-05-22-050858
$ oc get po -n openshift-kube-apiserver
kube-apiserver-ip-10-0-128-254.sa-east-1.compute.internal      2/2     Running     0          21m                                                                    
kube-apiserver-ip-10-0-133-211.sa-east-1.compute.internal      2/2     Running     0          22m
kube-apiserver-ip-10-0-157-53.sa-east-1.compute.internal       2/2     Running     0          19m

Checked `oc get po -n openshift-cluster-version -o wide` and found the CVO pod is on 10.0.128.254.
Thus made pod kube-apiserver-ip-10-0-128-254.sa-east-1.compute.internal fail by:
$ ssh-ocp4 core.128.254
[core@ip-10-0-128-254 ~]$ sudo mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml ~/

$ oc get po -n openshift-kube-apiserver
kube-apiserver-ip-10-0-133-211.sa-east-1.compute.internal      2/2     Running     0          22m
kube-apiserver-ip-10-0-157-53.sa-east-1.compute.internal       2/2     Running     0          19m

Then run `oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-22-050858 --force`
$ watch oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS                                                                               
version   4.1.0-0.nightly-2019-05-21-060354   True        False         33m     Cluster version is 4.1.0-0.nightly-2019-05-21-060354

^ It keeps showing the above; there is no upgrade progress.

Comment 2 Xingxing Xia 2019-05-23 10:44:38 UTC
Workaround: repeat the commands below until the CVO pod is rescheduled to a master other than 10.0.128.254:
$ oc delete po -n openshift-cluster-version -l k8s-app=cluster-version-operator
$ oc get po -n openshift-cluster-version -o wide

Then `oc get clusterversion` starts to show upgrade progress.
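A minimal sketch of automating this workaround, assuming the `k8s-app=cluster-version-operator` label used above and the broken node IP from this report (`cvo_on_node` is a hypothetical helper, not part of any product tooling):

```shell
# Hypothetical helper: reads `oc get po -o wide` output on stdin and
# succeeds when the CVO pod line mentions the given node IP.
cvo_on_node() {
    grep cluster-version-operator | grep -q "$1"
}

# Against a live cluster, the loop would delete the CVO pod until it is
# rescheduled off the broken master (not runnable outside a cluster):
#   while oc get po -n openshift-cluster-version -o wide | cvo_on_node 10.0.128.254; do
#       oc delete po -n openshift-cluster-version -l k8s-app=cluster-version-operator
#       sleep 30
#   done
```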

Comment 3 W. Trevor King 2019-05-23 11:57:02 UTC
> Given it uses localhost we likely need to run it in HA mode on all master with leader election.

Or we could have it exit if the local API server was unreachable, in which case it would be automatically rescheduled, possibly to a node with a working API server.  If it landed on a node with a broken API server, it would just die again.
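A rough sketch of that approach, assuming the standard kube-apiserver port 6443 and `/healthz` endpoint (this is illustrative only, not the CVO's actual code):

```shell
# check_apiserver: probe an apiserver base URL; nonzero exit if unreachable.
# Port 6443 and the /healthz path are standard kube-apiserver defaults,
# assumed here rather than taken from the CVO source.
check_apiserver() {
    curl -ks --max-time 5 "$1/healthz" >/dev/null
}

# The CVO's entrypoint could then bail out so the kubelet restarts the pod
# and the scheduler can place it elsewhere (illustrative only):
#   check_apiserver https://localhost:6443 || exit 1
```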

Comment 4 Tomáš Nožička 2019-05-23 15:01:07 UTC
We briefly discussed the same idea before, but I didn't like multiple restarts on the happy path. Also, the image already being present on one node should make the scheduler prefer to retry it there.

But you need leader election even at a scale of 1 with the Recreate strategy; hopefully the CVO already does it. So this would just be a matter of changing the scale from 1 to 3 and setting anti-affinity.
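A hedged sketch of what that change might look like from the CLI, assuming the CVO runs as a Deployment named cluster-version-operator in the openshift-cluster-version namespace with the k8s-app label used above (the patch body is illustrative, not the actual fix):

```shell
# Scale the CVO from 1 to 3 replicas.
oc -n openshift-cluster-version scale deployment/cluster-version-operator --replicas=3

# Spread the replicas across masters with required pod anti-affinity,
# so no two CVO pods land on the same node.
oc -n openshift-cluster-version patch deployment/cluster-version-operator -p '
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                k8s-app: cluster-version-operator
'
```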

Comment 5 Brenton Leanhardt 2019-07-29 17:41:06 UTC
This is more severe for single master clusters.

Comment 8 Scott Dodson 2019-09-30 17:31:03 UTC
This is only a problem if the cluster is single-master, which is unsupported; if it's multi-master, things will eventually fail over and the upgrade will continue.