Description of problem:
During a downgrade from 4.7 to 4.6, the 4.7 CVO, running with the new-in-4.7 self-managed-high-availability cluster profile, ignores the 4.6 CVO manifests because they do not contain that cluster profile.

How reproducible:
Upgrade 4.6 -> 4.7.0-nightly -> 4.6

Actual results:
4.6 downgrade incomplete

Expected results:
Successful 4.6 downgrade
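For context on the mechanism (a sketch, not part of the original report): manifests in the release payload opt into cluster profiles through include.release.openshift.io/<profile> annotations, and the 4.7 CVO only applies manifests carrying its own profile. A rough way to see the mismatch, using illustrative release pullspecs, is to extract both payloads and grep for the 4.7 profile annotation:

oc adm release extract --to=/tmp/manifests-4.7 quay.io/openshift-release-dev/ocp-release:4.7.0-x86_64
grep -rl 'include.release.openshift.io/self-managed-high-availability' /tmp/manifests-4.7 | wc -l

oc adm release extract --to=/tmp/manifests-4.6 quay.io/openshift-release-dev/ocp-release:4.6.16-x86_64
grep -rl 'include.release.openshift.io/self-managed-high-availability' /tmp/manifests-4.6 | wc -l

The 4.6 manifests predate the profile annotation, which is why the 4.7 CVO skips them during the downgrade.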
TestBlocker was added to bug 1916586 (see comment 2 for the reason); now that bug has been closed, so let me add TestBlocker to this 4.6 bug 1925199 for the same reason (it blocks testing of epic MSTR-1055).

I also gave 4.6.16 -> 4.7.0-0.nightly-2021-02-05-074735 -> 4.6.16 a try today; the downgrade result is:

oc get clusterversion
version   4.6.16   True   False   2m1s   Cluster version is 4.6.16

oc get co --no-headers | grep -v "4\.7.*True.*False.*False"
baremetal                                    4.7.0-0.nightly-2021-02-05-074735   True   False   False   143m
cloud-credential                             4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h8m
cluster-autoscaler                           4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
console                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   96m
csi-snapshot-controller                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   96m
dns                                          4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h2m
image-registry                               4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h56m
ingress                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h56m
insights                                     4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h4m
kube-storage-version-migrator                4.7.0-0.nightly-2021-02-05-074735   True   False   False   104m
machine-api                                  4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h53m
machine-approver                             4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
machine-config                               4.7.0-0.nightly-2021-02-05-074735   True   False   False   90m
marketplace                                  4.7.0-0.nightly-2021-02-05-074735   True   False   False   103m
monitoring                                   4.7.0-0.nightly-2021-02-05-074735   True   False   False   6h41m
network                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   119m
node-tuning                                  4.7.0-0.nightly-2021-02-05-074735   True   False   False   141m
openshift-samples                            4.7.0-0.nightly-2021-02-05-074735   True   False   False   141m
operator-lifecycle-manager                   4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
operator-lifecycle-manager-catalog           4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h3m
operator-lifecycle-manager-packageserver     4.7.0-0.nightly-2021-02-05-074735   True   False   False   93m
service-ca                                   4.7.0-0.nightly-2021-02-05-074735   True   False   False   7h4m
storage                                      4.7.0-0.nightly-2021-02-05-074735   True   False   False   96m
> oc get co --no-headers | grep -v "4\.7.*True.*False.*False"

Sorry, that was a typo; I intended to type `oc get co --no-headers | grep -v "4\.6.*True.*False.*False"`, given that `oc get clusterversion` showed the downgrade to 4.6 had completed.
My follow-up testing worked fine. Note that I used a cluster-bot cluster environment with only this fix (build openshift/cluster-version-operator#512), then performed the upgrade/downgrade sequence 4.6 -> 4.7.0-nightly -> 4.6. The failure that this fix addresses is easy to identify: without the fix, when one performs the downgrade from 4.7 -> 4.6, the 4.7 CVO pod continues to run because the 4.6 pod that should replace it is never started. The downgrade never completes, and the message 'Done syncing for deployment "openshift-cluster-version/cluster-version-operator"' is logged continually.
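To make the symptom easier to spot when reproducing this (a sketch; the label selector and jsonpath fields are taken from the pod and ClusterVersion objects shown elsewhere in this bug), compare the image the running CVO pod uses with the image the cluster is trying to reconcile to, and watch the 4.7 CVO log for the repeated 'Done syncing' message.

Image actually running in the CVO pod:

oc -n openshift-cluster-version get pod -l k8s-app=cluster-version-operator -o jsonpath='{.items[*].spec.containers[*].image}{"\n"}'

Image the cluster is trying to reconcile to:

oc get clusterversion version -o jsonpath='{.status.desired.image}{"\n"}'

Without the fix, the 4.7 CVO keeps logging the 'Done syncing' line instead of handing off to a 4.6 pod:

oc -n openshift-cluster-version logs deployment/cluster-version-operator | grep 'Done syncing for deployment'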
FYI, the machine-config problem mentioned above is a different issue, tracked in bug 1911841.
Verifying it with the upgrade/downgrade path 4.6.0-0.nightly-2021-02-25-051452 -> 4.7.0 -> 4.6.0-0.nightly-2021-02-25-051452:

# oc project openshift-cluster-version
Now using project "openshift-cluster-version" on server "https://api.yangyang0225.qe.gcp.devcluster.openshift.com:6443".

# oc get all
NAME                                            READY   STATUS      RESTARTS   AGE
pod/cluster-version-operator-7cccc87cc5-x6mvs   1/1     Running     0          18h
pod/version--5trfj-c9n8s                        0/1     Completed   0          18h

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   172.30.244.202   <none>        9099/TCP   21h

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   1/1     1            1           21h

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-7cccc87cc5   1         1         1       21h
replicaset.apps/cluster-version-operator-7f6c4b684c   0         0         0       20h
replicaset.apps/cluster-version-operator-895997499    0         0         0       21h

NAME                       COMPLETIONS   DURATION   AGE
job.batch/version--5trfj   1/1           4s         18h
job.batch/version--f6tnc   1/1           17s        20h

# oc describe pod/cluster-version-operator-7cccc87cc5-x6mvs
Name:                 cluster-version-operator-7cccc87cc5-x6mvs
Namespace:            openshift-cluster-version
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 yangyang0225-hw64t-master-2.c.openshift-qe.internal/10.0.0.5
Start Time:           Thu, 25 Feb 2021 04:55:28 -0500
Labels:               k8s-app=cluster-version-operator
                      pod-template-hash=7cccc87cc5
Annotations:          <none>
Status:               Running
IP:                   10.0.0.5
IPs:
  IP:           10.0.0.5
Controlled By:  ReplicaSet/cluster-version-operator-7cccc87cc5
Containers:
  cluster-version-operator:
    Container ID:  cri-o://5c4dd3032e62c1930e6ba17e736892c47a7ccd3b84c51e746b587a32899f113d
    Image:         registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434   <--- It's the 4.6 payload
    Image ID:      registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434
    Port:          <none>
    Host Port:     <none>
    Args:
      start
      --release-image=registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434
      --enable-auto-update=false
      --enable-default-cluster-version=true
      --serving-cert-file=/etc/tls/serving-cert/tls.crt
      --serving-key-file=/etc/tls/serving-cert/tls.key
      --v=5
    State:          Running
      Started:      Thu, 25 Feb 2021 04:55:32 -0500
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     20m
      memory:  50Mi
    Environment:
      KUBERNETES_SERVICE_PORT:  6443
      KUBERNETES_SERVICE_HOST:  api-int.yangyang0225.qe.gcp.devcluster.openshift.com
      NODE_NAME:                 (v1:spec.nodeName)
    Mounts:
      /etc/cvo/updatepayloads from etc-cvo-updatepayloads (ro)
      /etc/ssl/certs from etc-ssl-certs (ro)
      /etc/tls/serving-cert from serving-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hp8kx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  etc-ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:
  etc-cvo-updatepayloads:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cvo/updatepayloads
    HostPathType:
  serving-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-version-operator-serving-cert
    Optional:    false
  default-token-hp8kx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hp8kx
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:          <none>

When downgrading to 4.6, the 4.6 CVO pod starts and is running. Moving it to verified state.
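As a cross-check of the digest flagged above (a sketch; it assumes pull access to registry.ci.openshift.org), `oc adm release info` can confirm which release that payload is:

oc adm release info registry.ci.openshift.org/ocp/release@sha256:368af7cd3e8288bf3150d55909811f4d9bdb0a8791b54961346c09c0b73da434

The name/version reported in its output should be the 4.6 nightly used as the downgrade target.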
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.20 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0674