Bug 1992478
| Summary: | Upgrading descheduler operator from 4.8 to 4.9 fails | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jan Chaloupka <jchaloup> |
| Component: | kube-scheduler | Assignee: | Mike Dame <mdame> |
| Status: | CLOSED DUPLICATE | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.9 | CC: | aos-bugs, jchaloup, knarra, maszulik, mfojtik, sttts |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1991938 | Environment: | |
| Last Closed: | 2021-09-01 17:28:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jan Chaloupka
2021-08-11 07:50:31 UTC
Moving the bug back to the assigned state as I still hit the same issue.
```
Conditions:
  Last Transition Time:  2021-08-24T16:21:59Z
  Last Update Time:      2021-08-24T16:21:59Z
  Message:               risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
  Reason:                InstallComponentFailed
  Status:                False
  Type:                  Installed
Message:                 risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
Phase:                   Failed
Plan:
  Resolving:  clusterkubedescheduleroperator.4.9.0-202108221722
  Resource:
    Group:             operators.coreos.com
    Kind:              ClusterServiceVersion
    Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
    Name:              clusterkubedescheduleroperator.4.9.0-202108221722
    Source Name:       qe-app-registry
    Source Namespace:  openshift-marketplace
    Version:           v1alpha1
  Status:     Created
  Resolving:  clusterkubedescheduleroperator.4.9.0-202108221722
  Resource:
    Group:             apiextensions.k8s.io
    Kind:              CustomResourceDefinition
    Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
    Name:              kubedeschedulers.operator.openshift.io
    Source Name:       qe-app-registry
    Source Namespace:  openshift-marketplace
    Version:           v1
  Status:     Unknown
```
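The status above can be read off the failed InstallPlan in the operator's namespace; a minimal sketch, assuming the namespace used elsewhere in this bug and a placeholder plan name:

```
oc -n openshift-kube-descheduler-operator get installplan
oc -n openshift-kube-descheduler-operator describe installplan <installplan-name>
```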
Is this a 4.9 blocker+ because it blocks upgrades of clusters with the descheduler installed? Compare https://github.com/openshift/cluster-kube-descheduler-operator/pull/210#issuecomment-906164149

From the cluster provided by Rama, etcd stores the cluster object as operator.openshift.io/v1, not operator.openshift.io/v1beta1:
```
etcdctl get /kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
/kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
{"apiVersion":"operator.openshift.io/v1","kind":"KubeDescheduler","metadata":{"creationTimestamp":"2021-08-26T15:06:39Z","generation":2,"managedFields":[{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:deschedulingIntervalSeconds":{},"f:image":{}}},"manager":"Mozilla","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:logLevel":{},"f:operatorLogLevel":{}},"f:status":{".":{},"f:readyReplicas":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:profiles":{}}},"manager":"kubectl-edit","operation":"Update","time":"2021-08-26T16:22:42Z"},{"apiVersion":"operator.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:generations":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","subresource":"status","time":"2021-08-27T07:07:42Z"}],"name":"cluster","namespace":"openshift-kube-descheduler-operator","uid":"0e6a6282-3efa-47ad-8e32-8177a1f5ce1b"},"spec":{"deschedulingIntervalSeconds":3600,"logLevel":"Normal","operatorLogLevel":"Normal","profiles":["LifecycleAndUtilization","TopologyAndDuplicates","LifecycleAndUtilization"]},"status":{"generations":[{"group":"apps","hash":"","lastGeneration":4,"name":"cluster","namespace":"openshift-kube-descheduler-operator","resource":"deployments"}],"readyReplicas":0}}
```
So one can do the following:
1. Create a migration manifest:
```
apiVersion: migration.k8s.io/v1alpha1
kind: StorageVersionMigration
metadata:
  name: operator-kubedescheduler-storage-version-migration
spec:
  resource:
    group: operator.openshift.io
    resource: kubedeschedulers
    version: v1beta1
```
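Assuming the manifest is saved as storage-version-migration.yaml (the file name is only an example), it can be applied and its progress watched through the migration resource itself:

```
oc apply -f storage-version-migration.yaml
oc get storageversionmigration operator-kubedescheduler-storage-version-migration -o yaml
```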
2. Update the CRD and remove v1beta1 from the .status.storedVersions field:
```
oc proxy --port=8080 &
curl -d '[{ "op": "replace", "path":"/status/storedVersions", "value": ["v1"] }]' -H "Content-Type: application/json-patch+json" -X PATCH http://localhost:8080/apis/apiextensions.k8s.io/v1/customresourcedefinitions/kubedeschedulers.operator.openshift.io/status
```
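After the patch, the stored versions can be re-checked with the same kind of query as above; they should now contain only v1:

```
oc get crd kubedeschedulers.operator.openshift.io -o jsonpath='{.status.storedVersions}{"\n"}'
```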
However, at this point the InstallPlan is in a Failed state and the catalog operator no longer retries it, so the upgrade does not finish.
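One possible way (not verified in this bug) to get the upgrade moving again is to delete the failed InstallPlan and, if the catalog operator does not generate a new one, to delete and re-create the Subscription so that OLM resolves the upgrade from scratch; the namespace and the placeholder name below are assumptions:

```
oc -n openshift-kube-descheduler-operator delete installplan <failed-installplan-name>
# if no new InstallPlan appears, delete and re-create the Subscription for
# cluster-kube-descheduler-operator so the catalog operator resolves the upgrade again
```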
The fix for this is in https://github.com/openshift/cluster-kube-descheduler-operator/pull/215. However, we have two bugs: this one and https://bugzilla.redhat.com/show_bug.cgi?id=1991938 for 4.8, and the fix is only going into the 4.8 branch. So, since this is unrelated to 4.9 (we have identified that the fix needs to go into the 4.8 operator), I am going to close this BZ, since it is blocking the 4.8 bug from merging. If there is any objection, feel free to reopen.

*** This bug has been marked as a duplicate of bug 1991938 ***