Bug 1992478 - Upgrading descheduler operator from 4.8 to 4.9 fails
Summary: Upgrading descheduler operator from 4.8 to 4.9 fails
Keywords:
Status: CLOSED DUPLICATE of bug 1991938
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Mike Dame
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-11 07:50 UTC by Jan Chaloupka
Modified: 2021-09-01 17:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1991938
Environment:
Last Closed: 2021-09-01 17:28:16 UTC
Target Upstream Version:
Embargoed:


Links
System  ID                                                    Private  Priority  Status  Summary  Last Updated
Github  openshift cluster-kube-descheduler-operator pull 210  0        None      None    None     2021-08-12 07:32:23 UTC

Description Jan Chaloupka 2021-08-11 07:50:31 UTC
+++ This bug was initially created as a clone of Bug #1991938 +++

Description of problem:
Upgrading descheduler operator from 4.8 to 4.9 fails with error “CRD removes version v1beta1 that is listed as a stored version on the existing CRD”
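
The message reflects a safety check on CRD updates: every version recorded in the existing CRD's .status.storedVersions must still be defined by the new CRD. A minimal Python sketch of that rule follows. The function name and the example version lists are illustrative assumptions; they are not taken from the OLM source or read from this cluster.

```python
def removed_stored_versions(stored_versions, new_crd_versions):
    """Return stored versions that the new CRD no longer defines.

    A non-empty result corresponds to the "risk of data loss" rejection.
    """
    offered = set(new_crd_versions)
    return [v for v in stored_versions if v not in offered]


# Assumed state: the existing CRD still lists v1beta1 as stored,
# while the 4.9 bundle ships a CRD that only defines v1.
print(removed_stored_versions(["v1beta1", "v1"], ["v1"]))  # prints ['v1beta1']
```

An empty result would mean the new CRD can replace the old one without dropping a version that may still back stored objects.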


Version-Release number of selected component (if applicable):
clusterkubedescheduleroperator.4.9.0-202108050954


How reproducible:
Hit once

Steps to Reproduce:
1. Install 4.6 descheduler operator
2. Upgrade from 4.6 -> 4.7 -> 4.8 -> 4.9

Actual results:
Upgrading from 4.8 to 4.9 fails
[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                DISPLAY                            VERSION              REPLACES                                            PHASE
clusterkubedescheduleroperator.4.8.0-202107291502   Kube Descheduler Operator          4.8.0-202107291502   clusterkubedescheduleroperator.4.7.0-202107292319   Replacing
clusterkubedescheduleroperator.4.9.0-202108050954   Kube Descheduler Operator          4.9.0-202108050954   clusterkubedescheduleroperator.4.8.0-202107291502   Pending
elasticsearch-operator.5.2.0-28                     OpenShift Elasticsearch Operator   5.2.0-28             elasticsearch-operator.5.1.1-27                     Succeeded


status:
  bundleLookups:
  - catalogSourceRef:
      name: qe-app-registry
      namespace: openshift-marketplace
    identifier: clusterkubedescheduleroperator.4.9.0-202108050954
    path: registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:f2be2b8652b6e98693cafc4ed311c95d2ce6af9c458bedad41341703dae3e1d1
    properties: '{"properties":[{"type":"olm.gvk","value":{"group":"operator.openshift.io","kind":"KubeDescheduler","version":"v1"}},{"type":"olm.package","value":{"packageName":"cluster-kube-descheduler-operator","version":"4.9.0-202108050954"}}]}'
    replaces: clusterkubedescheduleroperator.4.8.0-202107291502
  catalogSources: []
  conditions:
  - lastTransitionTime: "2021-08-10T10:03:01Z"
    lastUpdateTime: "2021-08-10T10:03:01Z"
    message: 'risk of data loss updating "kubedeschedulers.operator.openshift.io":
      new CRD removes version v1beta1 that is listed as a stored version on the existing
      CRD'
    reason: InstallComponentFailed
    status: "False"
    type: Installed
  message: 'risk of data loss updating "kubedeschedulers.operator.openshift.io": new
    CRD removes version v1beta1 that is listed as a stored version on the existing
    CRD'
  phase: Failed
  plan:
  - resolving: clusterkubedescheduleroperator.4.9.0-202108050954
    resource:
      group: operators.coreos.com
      kind: ClusterServiceVersion


Expected results:
Upgrade should be successful

Additional info:

Comment 2 RamaKasturi 2021-08-24 16:37:44 UTC
Moving the bug back to the ASSIGNED state, as I still hit the same issue:

Conditions:
    Last Transition Time:  2021-08-24T16:21:59Z
    Last Update Time:      2021-08-24T16:21:59Z
    Message:               risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
    Reason:                InstallComponentFailed
    Status:                False
    Type:                  Installed
  Message:                 risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
  Phase:                   Failed
  Plan:
    Resolving:  clusterkubedescheduleroperator.4.9.0-202108221722
    Resource:
      Group:             operators.coreos.com
      Kind:              ClusterServiceVersion
      Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
      Name:              clusterkubedescheduleroperator.4.9.0-202108221722
      Source Name:       qe-app-registry
      Source Namespace:  openshift-marketplace
      Version:           v1alpha1
    Status:              Created
    Resolving:           clusterkubedescheduleroperator.4.9.0-202108221722
    Resource:
      Group:             apiextensions.k8s.io
      Kind:              CustomResourceDefinition
      Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
      Name:              kubedeschedulers.operator.openshift.io
      Source Name:       qe-app-registry
      Source Namespace:  openshift-marketplace
      Version:           v1
    Status:              Unknown

Comment 3 Stefan Schimanski 2021-08-26 07:30:22 UTC
Is this a 4.9 blocker+ because it blocks upgrades of clusters with descheduler installed?

Comment 5 Jan Chaloupka 2021-08-27 09:50:39 UTC
On the cluster provided by Rama, etcd stores the cluster object as operator.openshift.io/v1, not operator.openshift.io/v1beta1:

etcdctl get /kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
/kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
{"apiVersion":"operator.openshift.io/v1","kind":"KubeDescheduler","metadata":{"creationTimestamp":"2021-08-26T15:06:39Z","generation":2,"managedFields":[{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:deschedulingIntervalSeconds":{},"f:image":{}}},"manager":"Mozilla","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:logLevel":{},"f:operatorLogLevel":{}},"f:status":{".":{},"f:readyReplicas":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:profiles":{}}},"manager":"kubectl-edit","operation":"Update","time":"2021-08-26T16:22:42Z"},{"apiVersion":"operator.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:generations":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","subresource":"status","time":"2021-08-27T07:07:42Z"}],"name":"cluster","namespace":"openshift-kube-descheduler-operator","uid":"0e6a6282-3efa-47ad-8e32-8177a1f5ce1b"},"spec":{"deschedulingIntervalSeconds":3600,"logLevel":"Normal","operatorLogLevel":"Normal","profiles":["LifecycleAndUtilization","TopologyAndDuplicates","LifecycleAndUtilization"]},"status":{"generations":[{"group":"apps","hash":"","lastGeneration":4,"name":"cluster","namespace":"openshift-kube-descheduler-operator","resource":"deployments"}],"readyReplicas":0}}
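
Note that the managedFields entries still mention v1beta1 even though the object itself is persisted as v1; those entries record the API version each writer used, not the storage version. A small Python sketch, assuming the etcd value is available as a JSON string, separates the two. The helper name and the trimmed sample are illustrative.

```python
import json


def storage_and_field_versions(raw):
    """Return (apiVersion of the stored object, sorted apiVersions seen in managedFields)."""
    obj = json.loads(raw)
    managed = obj.get("metadata", {}).get("managedFields", [])
    return obj["apiVersion"], sorted({entry["apiVersion"] for entry in managed})


# Trimmed copy of the object above, keeping only the fields inspected here.
sample = json.dumps({
    "apiVersion": "operator.openshift.io/v1",
    "kind": "KubeDescheduler",
    "metadata": {"managedFields": [
        {"apiVersion": "operator.openshift.io/v1beta1", "manager": "kubectl-edit"},
        {"apiVersion": "operator.openshift.io/v1", "manager": "cluster-kube-descheduler-operator"},
    ]},
})
print(storage_and_field_versions(sample))
```

Only the first element of the result says how the object is stored; the second only shows which versions clients wrote through.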

Comment 6 Jan Chaloupka 2021-08-27 12:11:19 UTC
One possible workaround is the following:
1. Create a StorageVersionMigration manifest:
```
apiVersion: migration.k8s.io/v1alpha1
kind: StorageVersionMigration
metadata:
  name: operator-kubedescheduler-storage-version-migration
spec:
  resource:
    group: operator.openshift.io
    resource: kubedeschedulers
    version: v1beta1
```
2. Update the CRD to remove v1beta1 from the .status.storedVersions field:
```
oc proxy --port=8080 &
curl -d '[{ "op": "replace", "path":"/status/storedVersions", "value": ["v1"] }]' -H "Content-Type: application/json-patch+json" -X PATCH http://localhost:8080/apis/apiextensions.k8s.io/v1/customresourcedefinitions/kubedeschedulers.operator.openshift.io/status
```

However, at this point the InstallPlan is already in the Failed state, and the catalog operator no longer retries it to finish the upgrade.
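
The JSON Patch body used in step 2 can also be generated rather than typed by hand. This is a minimal Python sketch under the assumption that the CRD currently lists both v1beta1 and v1 as stored versions. The function name is hypothetical, and the code only builds the request body; it does not contact the cluster.

```python
import json


def stored_versions_patch(current, drop):
    """Build a JSON Patch that replaces .status.storedVersions without `drop`."""
    remaining = [v for v in current if v != drop]
    if not remaining:
        # Removing every entry would also drop the current storage version.
        raise ValueError("refusing to remove the last stored version")
    return json.dumps(
        [{"op": "replace", "path": "/status/storedVersions", "value": remaining}]
    )


# Assumed current state: both versions are still listed as stored.
print(stored_versions_patch(["v1beta1", "v1"], "v1beta1"))
```

The printed string can be passed to curl -d in place of the literal patch shown in step 2.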

Comment 7 Mike Dame 2021-09-01 17:28:16 UTC
The fix for this is in https://github.com/openshift/cluster-kube-descheduler-operator/pull/215

However, we have two bugs: this one (for 4.9) and https://bugzilla.redhat.com/show_bug.cgi?id=1991938 (for 4.8), and the fix is only going into the 4.8 branch.

Since this is unrelated to 4.9 (we have identified that the fix needs to go into the 4.8 operator) and this BZ is blocking the 4.8 bug from merging, I am going to close it.

If there is any objection, feel free to reopen.

*** This bug has been marked as a duplicate of bug 1991938 ***

