Description of problem:
During a downgrade from 4.7 to 4.6, the 4.7 CVO, running with the new-in-4.7 self-managed-high-availability cluster profile, ignores the 4.6 CVO manifests because they do not contain that cluster profile.

How reproducible:
Upgrade 4.6 -> 4.7.0-nightly -> 4.6

Actual results:
4.6 downgrade incomplete

Expected results:
Successful 4.6 downgrade
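For context on the mechanism (a sketch, not part of the original report): manifests in the release payload opt into cluster profiles through include.release.openshift.io/<profile> annotations, and the 4.7 CVO only applies manifests carrying its own profile. A rough way to see the mismatch, using illustrative release pullspecs, is to extract both payloads and grep for the 4.7 profile annotation:

oc adm release extract --to=/tmp/manifests-4.7 quay.io/openshift-release-dev/ocp-release:4.7.0-x86_64
grep -rl 'include.release.openshift.io/self-managed-high-availability' /tmp/manifests-4.7 | wc -l

oc adm release extract --to=/tmp/manifests-4.6 quay.io/openshift-release-dev/ocp-release:4.6.16-x86_64
grep -rl 'include.release.openshift.io/self-managed-high-availability' /tmp/manifests-4.6 | wc -l

The 4.6 manifests predate the profile annotation, which is why the 4.7 CVO skips them during the downgrade.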
TestBlocker was added to bug 1916586 (see comment 2 for the reason); now that bug has been closed, so let me add TestBlocker to this 4.6 bug 1925199 for the same reason (it blocks testing of epic MSTR-1055).

I also gave 4.6.16 -> 4.7.0-0.nightly-2021-02-05-074735 -> 4.6.16 a try today; the downgrade result is:

oc get clusterversion
version   4.6.16   True   False   2m1s   Cluster version is 4.6.16

oc get co --no-headers | grep -v "4\.7.*True.*False.*False"
baremetal                                    4.7.0-0.nightly-2021-02-05-074735   True   False   False   143m
cloud-credential                             4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h8m
cluster-autoscaler                           4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
console                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   96m
csi-snapshot-controller                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   96m
dns                                          4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h2m
image-registry                               4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h56m
ingress                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h56m
insights                                     4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h4m
kube-storage-version-migrator                4.7.0-0.nightly-2021-02-05-074735   True   False   False   104m
machine-api                                  4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h53m
machine-approver                             4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
machine-config                               4.7.0-0.nightly-2021-02-05-074735   True   False   False   90m
marketplace                                  4.7.0-0.nightly-2021-02-05-074735   True   False   False   103m
monitoring                                   4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h41m
network                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   119m
node-tuning                                  4.7.0-0.nightly-2021-02-05-074735   True   False   False   141m
openshift-samples                            4.7.0-0.nightly-2021-02-05-074735   True   False   False   141m
operator-lifecycle-manager                   4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
operator-lifecycle-manager-catalog           4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
operator-lifecycle-manager-packageserver     4.7.0-0.nightly-2021-02-05-074735   True   False   False   93m
service-ca                                   4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h4m
storage                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   96m
> oc get co --no-headers | grep -v "4\.7.*True.*False.*False"

Sorry, that was a typo; I intended to type `oc get co --no-headers | grep -v "4\.6.*True.*False.*False"`, given that `oc get clusterversion` showed the downgrade to 4.6 had completed.
My follow-up testing worked fine. Note that I used a cluster-bot cluster environment with only this fix (build openshift/cluster-version-operator#512), then performed the upgrade/downgrade sequence 4.6 -> 4.7.0-nightly -> 4.6. The failure that this fix addresses is easy to identify: without the fix, when one performs the downgrade from 4.7 -> 4.6, the 4.7 CVO pod continues to run because the 4.6 pod that should replace it is never started. The downgrade never completes, and the message 'Done syncing for deployment "openshift-cluster-version/cluster-version-operator"' is logged continually.
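To make the symptom easier to spot when reproducing this (a sketch; the label selector and jsonpath fields are taken from the pod and ClusterVersion objects shown elsewhere in this bug), compare the image the running CVO pod uses with the image the cluster is trying to reconcile to, and watch the 4.7 CVO log for the repeated 'Done syncing' message.

Image actually running in the CVO pod:

oc -n openshift-cluster-version get pod -l k8s-app=cluster-version-operator -o jsonpath='{.items[*].spec.containers[*].image}{"\n"}'

Image the cluster is trying to reconcile to:

oc get clusterversion version -o jsonpath='{.status.desired.image}{"\n"}'

Without the fix, the 4.7 CVO keeps logging the 'Done syncing' line instead of handing off to a 4.6 pod:

oc -n openshift-cluster-version logs deployment/cluster-version-operator | grep 'Done syncing for deployment'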
FYI, the machine-config problem mentioned above is a different issue, tracked in bug 1911841.
Verifying it with the upgrade/downgrade path 4.6.0-0.nightly-2021-02-25-051452 -> 4.7.0 -> 4.6.0-0.nightly-2021-02-25-051452:

# oc project openshift-cluster-version
Now using project "openshift-cluster-version" on server "https://api.yangyang0225.qe.gcp.devcluster.openshift.com:6443".

# oc get all
NAME                                            READY   STATUS      RESTARTS   AGE
pod/cluster-version-operator-7cccc87cc5-x6mvs   1/1     Running     0          18h
pod/version--5trfj-c9n8s                        0/1     Completed   0          18h

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   172.30.244.202   <none>        9099/TCP   21h

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   1/1     1            1           21h

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-7cccc87cc5   1         1         1       21h
replicaset.apps/cluster-version-operator-7f6c4b684c   0         0         0       20h
replicaset.apps/cluster-version-operator-895997499    0         0         0       21h

NAME                       COMPLETIONS   DURATION   AGE
job.batch/version--5trfj   1/1           4s         18h
job.batch/version--f6tnc   1/1           17s        20h

# oc describe pod/cluster-version-operator-7cccc87cc5-x6mvs
Name:                 cluster-version-operator-7cccc87cc5-x6mvs
Namespace:            openshift-cluster-version
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 yangyang0225-hw64t-master-2.c.openshift-qe.internal/10.0.0.5
Start Time:           Thu, 25 Feb 2021 04:55:28 -0500
Labels:               k8s-app=cluster-version-operator
                      pod-template-hash=7cccc87cc5
Annotations:          <none>
Status:               Running
IP:                   10.0.0.5
IPs:
  IP:           10.0.0.5
Controlled By:  ReplicaSet/cluster-version-operator-7cccc87cc5
Containers:
  cluster-version-operator:
    Container ID:  cri-o://5c4dd3032e62c1930e6ba17e736892c47a7ccd3b84c51e746b587a32899f113d
    Image:         registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434   <--- It's the 4.6 payload
    Image ID:      registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434
    Port:          <none>
    Host Port:     <none>
    Args:
      start
      --release-image=registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434
      --enable-auto-update=false
      --enable-default-cluster-version=true
      --serving-cert-file=/etc/tls/serving-cert/tls.crt
      --serving-key-file=/etc/tls/serving-cert/tls.key
      --v=5
    State:          Running
      Started:      Thu, 25 Feb 2021 04:55:32 -0500
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     20m
      memory:  50Mi
    Environment:
      KUBERNETES_SERVICE_PORT:  6443
      KUBERNETES_SERVICE_HOST:  api-int.yangyang0225.qe.gcp.devcluster.openshift.com
      NODE_NAME:                 (v1:spec.nodeName)
    Mounts:
      /etc/cvo/updatepayloads from etc-cvo-updatepayloads (ro)
      /etc/ssl/certs from etc-ssl-certs (ro)
      /etc/tls/serving-cert from serving-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hp8kx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  etc-ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:
  etc-cvo-updatepayloads:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cvo/updatepayloads
    HostPathType:
  serving-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-version-operator-serving-cert
    Optional:    false
  default-token-hp8kx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hp8kx
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:          <none>

When downgrading to 4.6, the 4.6 CVO pod starts and is running. Moving it to verified state.
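As a cross-check of the digest flagged above (a sketch; it assumes pull access to registry.ci.openshift.org), `oc adm release info` can confirm which release that payload is:

oc adm release info registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434

The name/version reported in its output should be the 4.6 nightly used as the downgrade target.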
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.20 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0674