Bug 1992478
| Summary: | Upgrading descheduler operator from 4.8 to 4.9 fails | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jan Chaloupka <jchaloup> |
| Component: | kube-scheduler | Assignee: | Mike Dame <mdame> |
| Status: | CLOSED DUPLICATE | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.9 | CC: | aos-bugs, jchaloup, knarra, maszulik, mfojtik, sttts |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1991938 | Environment: | |
| Last Closed: | 2021-09-01 17:28:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description Jan Chaloupka 2021-08-11 07:50:31 UTC
Moving the bug back to assigned state as I still hit the same issue.

    Conditions:
      Last Transition Time:  2021-08-24T16:21:59Z
      Last Update Time:      2021-08-24T16:21:59Z
      Message:               risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
      Reason:                InstallComponentFailed
      Status:                False
      Type:                  Installed
    Message:                 risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
    Phase:                   Failed
    Plan:
      Resolving:  clusterkubedescheduleroperator.4.9.0-202108221722
      Resource:
        Group:             operators.coreos.com
        Kind:              ClusterServiceVersion
        Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
        Name:              clusterkubedescheduleroperator.4.9.0-202108221722
        Source Name:       qe-app-registry
        Source Namespace:  openshift-marketplace
        Version:           v1alpha1
      Status:  Created
      Resolving:  clusterkubedescheduleroperator.4.9.0-202108221722
      Resource:
        Group:             apiextensions.k8s.io
        Kind:              CustomResourceDefinition
        Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
        Name:              kubedeschedulers.operator.openshift.io
        Source Name:       qe-app-registry
        Source Namespace:  openshift-marketplace
        Version:           v1
      Status:  Unknown

Is this a 4.9 blocker+ because it blocks upgrades of clusters with descheduler installed?
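For reference, the stored-version conflict in the condition above can be checked directly on the cluster. A minimal sketch, assuming cluster-admin access with `oc`; the namespace and commands here are illustrative additions, not part of the original report:

```
# Show which API versions the existing CRD records as stored versions;
# the CSV upgrade is refused while "v1beta1" is still listed here.
oc get crd kubedeschedulers.operator.openshift.io \
  -o jsonpath='{.status.storedVersions}{"\n"}'

# Inspect the InstallPlan that OLM created for the 4.9 CSV
# (namespace assumed to be the operator's install namespace).
oc get installplan -n openshift-kube-descheduler-operator
```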
Compare https://github.com/openshift/cluster-kube-descheduler-operator/pull/210#issuecomment-906164149

From the cluster provided by Rama, etcd stores the cluster object as operator.openshift.io/v1, not operator.openshift.io/v1beta1:

    etcdctl get /kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
    /kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
    {"apiVersion":"operator.openshift.io/v1","kind":"KubeDescheduler","metadata":{"creationTimestamp":"2021-08-26T15:06:39Z","generation":2,"managedFields":[{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:deschedulingIntervalSeconds":{},"f:image":{}}},"manager":"Mozilla","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:logLevel":{},"f:operatorLogLevel":{}},"f:status":{".":{},"f:readyReplicas":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:profiles":{}}},"manager":"kubectl-edit","operation":"Update","time":"2021-08-26T16:22:42Z"},{"apiVersion":"operator.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:generations":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","subresource":"status","time":"2021-08-27T07:07:42Z"}],"name":"cluster","namespace":"openshift-kube-descheduler-operator","uid":"0e6a6282-3efa-47ad-8e32-8177a1f5ce1b"},"spec":{"deschedulingIntervalSeconds":3600,"logLevel":"Normal","operatorLogLevel":"Normal","profiles":["LifecycleAndUtilization","TopologyAndDuplicates","LifecycleAndUtilization"]},"status":{"generations":[{"group":"apps","hash":"","lastGeneration":4,"name":"cluster","namespace":"openshift-kube-descheduler-operator","resource":"deployments"}],"readyReplicas":0}}

So one can do the following:

1. Create the migration manifest:

```
apiVersion: migration.k8s.io/v1alpha1
kind: StorageVersionMigration
metadata:
  name: operator-kubedescheduler-storage-version-migration
spec:
  resource:
    group: operator.openshift.io
    resource: kubedeschedulers
    version: v1beta1
```

2. Update the CRD and remove v1beta1 from the .status.storedVersions field:

```
oc proxy --port=8080 &
curl -d '[{ "op": "replace", "path":"/status/storedVersions", "value": ["v1"] }]' \
  -H "Content-Type: application/json-patch+json" \
  -X PATCH http://localhost:8080/apis/apiextensions.k8s.io/v1/customresourcedefinitions/kubedeschedulers.operator.openshift.io/status
```

However, at this point the InstallPlan is in a failed state and the catalog operator no longer retries it to finish the upgrade. The fix for this is in https://github.com/openshift/cluster-kube-descheduler-operator/pull/215.

We have two bugs for this issue: this one and https://bugzilla.redhat.com/show_bug.cgi?id=1991938 for 4.8, and the fix is only going into the 4.8 branch. So, since this is unrelated to 4.9 (we have identified that the fix needs to go into the 4.8 operator), I am going to close this BZ since it is blocking the 4.8 bug from merging. If there is any objection, feel free to reopen.

*** This bug has been marked as a duplicate of bug 1991938 ***
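One way to confirm that the two workaround steps above took effect is to check the StorageVersionMigration object and the patched stored versions. A minimal sketch, assuming the resource names from the manifest above, `oc` access, and that the kube-storage-version-migrator is running; these commands are illustrative, not from the original report:

```
# Check the migration created in step 1 (StorageVersionMigration is cluster-scoped).
oc get storageversionmigration operator-kubedescheduler-storage-version-migration \
  -o jsonpath='{.status.conditions}{"\n"}'

# After the patch in step 2, only "v1" should remain as a stored version.
oc get crd kubedeschedulers.operator.openshift.io \
  -o jsonpath='{.status.storedVersions}{"\n"}'
```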