Bug 1992478
| Summary: | Upgrading descheduler operator from 4.8 to 4.9 fails | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jan Chaloupka <jchaloup> |
| Component: | kube-scheduler | Assignee: | Mike Dame <mdame> |
| Status: | CLOSED DUPLICATE | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.9 | CC: | aos-bugs, jchaloup, knarra, maszulik, mfojtik, sttts |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1991938 | Environment: | |
| Last Closed: | 2021-09-01 17:28:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jan Chaloupka
2021-08-11 07:50:31 UTC
Moving the bug back to the assigned state as I still hit the same issue.
```
Conditions:
  Last Transition Time:  2021-08-24T16:21:59Z
  Last Update Time:      2021-08-24T16:21:59Z
  Message:               risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
  Reason:                InstallComponentFailed
  Status:                False
  Type:                  Installed
Message:                 risk of data loss updating "kubedeschedulers.operator.openshift.io": new CRD removes version v1beta1 that is listed as a stored version on the existing CRD
Phase:                   Failed
Plan:
  Resolving:  clusterkubedescheduleroperator.4.9.0-202108221722
  Resource:
    Group:             operators.coreos.com
    Kind:              ClusterServiceVersion
    Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
    Name:              clusterkubedescheduleroperator.4.9.0-202108221722
    Source Name:       qe-app-registry
    Source Namespace:  openshift-marketplace
    Version:           v1alpha1
  Status:     Created
  Resolving:  clusterkubedescheduleroperator.4.9.0-202108221722
  Resource:
    Group:             apiextensions.k8s.io
    Kind:              CustomResourceDefinition
    Manifest:          {"kind":"ConfigMap","name":"bba9d26db310eed6d2f206561382b49752abb4a151e9f83b2ed75ec314da13a","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"clusterkubedescheduleroperator.4.8.0-202108181331","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"operator.openshift.io\",\"kind\":\"KubeDescheduler\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"cluster-kube-descheduler-operator\",\"version\":\"4.9.0-202108221722\"}}]}"}
    Name:              kubedeschedulers.operator.openshift.io
    Source Name:       qe-app-registry
    Source Namespace:  openshift-marketplace
    Version:           v1
  Status:     Unknown
```
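The status above can be read off the failed InstallPlan in the operator's namespace; a minimal sketch, assuming the namespace used elsewhere in this bug and a placeholder plan name:

```
oc -n openshift-kube-descheduler-operator get installplan
oc -n openshift-kube-descheduler-operator describe installplan <installplan-name>
```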
Is this a 4.9 blocker+ because it blocks upgrades of clusters with the descheduler installed? Compare https://github.com/openshift/cluster-kube-descheduler-operator/pull/210#issuecomment-906164149

From the cluster provided by Rama, etcd stores the cluster object as operator.openshift.io/v1, not operator.openshift.io/v1beta1:
```
etcdctl get /kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
/kubernetes.io/operator.openshift.io/kubedeschedulers/openshift-kube-descheduler-operator/cluster
{"apiVersion":"operator.openshift.io/v1","kind":"KubeDescheduler","metadata":{"creationTimestamp":"2021-08-26T15:06:39Z","generation":2,"managedFields":[{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:deschedulingIntervalSeconds":{},"f:image":{}}},"manager":"Mozilla","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:logLevel":{},"f:operatorLogLevel":{}},"f:status":{".":{},"f:readyReplicas":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","time":"2021-08-26T15:06:39Z"},{"apiVersion":"operator.openshift.io/v1beta1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:profiles":{}}},"manager":"kubectl-edit","operation":"Update","time":"2021-08-26T16:22:42Z"},{"apiVersion":"operator.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:generations":{}}},"manager":"cluster-kube-descheduler-operator","operation":"Update","subresource":"status","time":"2021-08-27T07:07:42Z"}],"name":"cluster","namespace":"openshift-kube-descheduler-operator","uid":"0e6a6282-3efa-47ad-8e32-8177a1f5ce1b"},"spec":{"deschedulingIntervalSeconds":3600,"logLevel":"Normal","operatorLogLevel":"Normal","profiles":["LifecycleAndUtilization","TopologyAndDuplicates","LifecycleAndUtilization"]},"status":{"generations":[{"group":"apps","hash":"","lastGeneration":4,"name":"cluster","namespace":"openshift-kube-descheduler-operator","resource":"deployments"}],"readyReplicas":0}}
```
So one can do the following:
1. Create a migration manifest:
```
apiVersion: migration.k8s.io/v1alpha1
kind: StorageVersionMigration
metadata:
  name: operator-kubedescheduler-storage-version-migration
spec:
  resource:
    group: operator.openshift.io
    resource: kubedeschedulers
    version: v1beta1
```
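Assuming the manifest is saved as storage-version-migration.yaml (the file name is only an example), it can be applied and its progress watched through the migration resource itself:

```
oc apply -f storage-version-migration.yaml
oc get storageversionmigration operator-kubedescheduler-storage-version-migration -o yaml
```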
2. Update the CRD and remove v1beta1 from the .status.storedVersions field:
```
oc proxy --port=8080 &
curl -d '[{ "op": "replace", "path":"/status/storedVersions", "value": ["v1"] }]' -H "Content-Type: application/json-patch+json" -X PATCH http://localhost:8080/apis/apiextensions.k8s.io/v1/customresourcedefinitions/kubedeschedulers.operator.openshift.io/status
```
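After the patch, the stored versions can be re-checked with the same kind of query as above; they should now contain only v1:

```
oc get crd kubedeschedulers.operator.openshift.io -o jsonpath='{.status.storedVersions}{"\n"}'
```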
However, at this point the InstallPlan is in a Failed state and the catalog operator no longer retries it, so the upgrade does not finish.
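One possible way (not verified in this bug) to get the upgrade moving again is to delete the failed InstallPlan and, if the catalog operator does not generate a new one, to delete and re-create the Subscription so that OLM resolves the upgrade from scratch; the namespace and the placeholder name below are assumptions:

```
oc -n openshift-kube-descheduler-operator delete installplan <failed-installplan-name>
# if no new InstallPlan appears, delete and re-create the Subscription for
# cluster-kube-descheduler-operator so the catalog operator resolves the upgrade again
```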
The fix for this is in https://github.com/openshift/cluster-kube-descheduler-operator/pull/215. However, we have two bugs: this one and https://bugzilla.redhat.com/show_bug.cgi?id=1991938 for 4.8, and the fix is only going into the 4.8 branch. So, since this is unrelated to 4.9 (we have identified that the fix needs to go into the 4.8 operator), I am going to close this BZ, since it is blocking the 4.8 bug from merging. If there is any objection, feel free to reopen.

*** This bug has been marked as a duplicate of bug 1991938 ***