Bug 1887740
| Summary: | cannot install descheduler operator after uninstalling it | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | RamaKasturi <knarra> |
| Component: | kube-scheduler | Assignee: | Jan Chaloupka <jchaloup> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.6 | CC: | aos-bugs, jchaloup, krizza, mfojtik, nhale, scuppett, vdinh |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:25:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (RamaKasturi, 2020-10-13 09:03:40 UTC)
Created attachment 1721149 [details]: catalog operator logs
Created attachment 1721150 [details]: OLM operator logs
Created attachment 1721152 [details]: descheduler operator install plan
Restarting the catalog-operator (resp. the olm-operator) pod does not help. Removing the InstallPlan object does not help either; the InstallPlan is not recreated after it is deleted.

Created attachment 1721156 [details]: catalog operator logs (delete followed by re-creation of descheduler operator)
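A side note, not part of the original comment: an InstallPlan is generated on behalf of a Subscription, so when a deleted plan is not recreated it can help to confirm which plan the Subscription currently references and what state the Subscription reports. A minimal sketch, reusing the Subscription name that appears later in this report:

```
# Sketch: show the InstallPlan referenced by the descheduler Subscription and
# the Subscription's reported state (names taken from the transcripts below).
oc get subscription cluster-kube-descheduler-operator \
  -n openshift-kube-descheduler-operator \
  -o jsonpath='{.status.installPlanRef.name}{"\t"}{.status.state}{"\n"}'
```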
There is a bunch of messages like this:
```
time="2020-10-13T09:30:07Z" level=info msg="error updating InstallPlan status" id=S4nLU ip=install-vdjqm namespace=openshift-kube-descheduler-operator phase=Installing updateError="Operation cannot be fulfilled on installplans.operators.coreos.com \"install-vdjqm\": the object has been modified; please apply your changes to the latest version and try again"
E1013 09:30:07.681290 1 queueinformer_operator.go:290] sync {"update" "openshift-kube-descheduler-operator/install-vdjqm"} failed: error updating InstallPlan status: Operation cannot be fulfilled on installplans.operators.coreos.com "install-vdjqm": the object has been modified; please apply your changes to the latest version and try again
```
Aren't there some read/write conflicts in OLM itself?
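For context (an aside, not from the original comment): the "object has been modified" message is the standard Kubernetes optimistic-concurrency error, returned when an update is submitted with a stale resourceVersion, which usually means something else is writing the same InstallPlan concurrently. A minimal sketch for watching the churned object, reusing the install-vdjqm name from the log above:

```
# Sketch: print the InstallPlan's current resourceVersion and phase, then watch
# it for further modifications that would explain the repeated update conflicts.
oc get installplan install-vdjqm -n openshift-kube-descheduler-operator \
  -o jsonpath='{.metadata.resourceVersion}{"\t"}{.status.phase}{"\n"}'
oc get installplan install-vdjqm -n openshift-kube-descheduler-operator -w
```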
Hi Rama, I couldn't reproduce it in my cluster: https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/117614/artifact/workdir/install-dir/auth/kubeconfig/*view*/

```
[root@preserve-olm-env data]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-10-12-223649   True        False         6h41m   Cluster version is 4.6.0-0.nightly-2020-10-12-223649
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-5f654b87f-mnc74 -- olm --version
OLM version: 0.16.1
git commit: 6f59080264afd89fa786ca872f759470d8764b22
```

1) Install it from the UI, it works well.

```
[root@preserve-olm-env data]# oc get sub -A
NAMESPACE                              NAME                                PACKAGE                             SOURCE            CHANNEL
openshift-kube-descheduler-operator   cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.6
[root@preserve-olm-env data]# oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                     APPROVAL    APPROVED
install-5k5tj   clusterkubedescheduleroperator.4.6.0-202010061132.p0   Automatic   true
[root@preserve-olm-env data]# oc get csv -n openshift-kube-descheduler-operator
NAME                                                    DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202010061132.p0   Kube Descheduler Operator   4.6.0-202010061132.p0              Succeeded
```

2) Uninstall it, it works well.

```
[root@preserve-olm-env data]# oc get sub -A
No resources found
[root@preserve-olm-env data]# oc get ip -n openshift-kube-descheduler-operator
No resources found in openshift-kube-descheduler-operator namespace.
[root@preserve-olm-env data]# oc get sa -n openshift-kube-descheduler-operator
NAME       SECRETS   AGE
builder    2         2m11s
default    2         2m11s
deployer   2         2m11s
```

3) Reinstall it, it works well.

```
[root@preserve-olm-env data]# oc get sub -A
NAMESPACE                              NAME                                PACKAGE                             SOURCE            CHANNEL
openshift-kube-descheduler-operator   cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.6
[root@preserve-olm-env data]# oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                     APPROVAL    APPROVED
install-pjsg6   clusterkubedescheduleroperator.4.6.0-202010061132.p0   Automatic   true
[root@preserve-olm-env data]# oc get csv -n openshift-kube-descheduler-operator
NAME                                                    DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202010061132.p0   Kube Descheduler Operator   4.6.0-202010061132.p0              Succeeded
[root@preserve-olm-env data]# oc get pods -n openshift-kube-descheduler-operator
NAME                                   READY   STATUS    RESTARTS   AGE
descheduler-operator-ccd58fcb7-lh2zd   1/1     Running   0          38s
[root@preserve-olm-env data]# oc get sa -n openshift-kube-descheduler-operator
NAME                    SECRETS   AGE
builder                 2         4m59s
default                 2         4m59s
deployer                2         4m59s
openshift-descheduler   2         46s
```

In your cluster: https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/117438/artifact/workdir/install-dir/auth/kubeconfig/*view*/

```
[root@preserve-olm-env data]# oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-rc.2   True        False         27h     Cluster version is 4.6.0-rc.2
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-59546d8c85-dj5qf -- olm --version
OLM version: 0.16.1
git commit: 6f59080264afd89fa786ca872f759470d8764b22
[root@preserve-olm-env data]# oc get sub -A
NAMESPACE                              NAME                                PACKAGE                             SOURCE            CHANNEL
openshift-kube-descheduler-operator   cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.6
```
```
[root@preserve-olm-env data]# oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                     APPROVAL    APPROVED
install-rctv7   clusterkubedescheduleroperator.4.6.0-202010061132.p0   Automatic   true
```

It's weird, the CSV and its Replaces are the same here. Could you give more details? Thanks!

```
[root@preserve-olm-env data]# oc get csv -n openshift-kube-descheduler-operator
NAME                                                    DISPLAY                     VERSION                 REPLACES                                                PHASE
clusterkubedescheduleroperator.4.6.0-202010061132.p0   Kube Descheduler Operator   4.6.0-202010061132.p0   clusterkubedescheduleroperator.4.6.0-202010061132.p0   Pending
[root@preserve-olm-env data]# oc describe csv -n openshift-kube-descheduler-operator
Name:         clusterkubedescheduleroperator.4.6.0-202010061132.p0
Namespace:    openshift-kube-descheduler-operator
Labels:       olm.api.623e59b3c80e3376=provided
              operators.coreos.com/cluster-kube-descheduler-operator.openshift-kube-descheduler-op=
Annotations:  alm-examples: ...
Status:
  Conditions:
    Last Transition Time:  2020-10-13T07:15:17Z
    Last Update Time:      2020-10-13T07:15:17Z
    Message:               requirements not yet checked
    Phase:                 Pending
    Reason:                RequirementsUnknown
    Last Transition Time:  2020-10-13T07:15:17Z
    Last Update Time:      2020-10-13T07:15:17Z
    Message:               one or more requirements couldn't be found
    Phase:                 Pending
    Reason:                RequirementsNotMet
    Last Transition Time:  2020-10-13T07:15:17Z
    Last Update Time:      2020-10-13T07:15:17Z
    Message:               one or more requirements couldn't be found
    Phase:                 Pending
    Reason:                RequirementsNotMet
  Requirement Status:
    Group:    operators.coreos.com
    Kind:     ClusterServiceVersion
    Message:  CSV minKubeVersion (1.19.0) less than server version (v1.19.0+d59ce34)
    Name:     clusterkubedescheduleroperator.4.6.0-202010061132.p0
    Status:   Present
    Version:  v1alpha1
    Group:    apiextensions.k8s.io
    Kind:     CustomResourceDefinition
    Message:  CRD is present and Established condition is true
    Name:     kubedeschedulers.operator.openshift.io
    Status:   Present
    Uuid:     f05e9e93-73cb-484e-ad77-32f28e8f6190
    Version:  v1
    Dependents:
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["*"],"apiGroups":["operator.openshift.io"],"resources":["*"]}
      Status:   NotSatisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["*"],"apiGroups":["kubedeschedulers.operator.openshift.io"],"resources":["*"]}
      Status:   NotSatisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["*"],"apiGroups":[""],"resources":["services","pods","configmaps","secrets","names","nodes","pods/eviction","events"]}
      Status:   NotSatisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","watch","list"],"apiGroups":["scheduling.k8s.io"],"resources":["priorityclasses"]}
      Status:   NotSatisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["*"],"apiGroups":["apps"],"resources":["deployments","replicasets"]}
      Status:   NotSatisfied
      Version:  v1
    Group:
    Kind:     ServiceAccount
    Message:  Policy rule not satisfied for service account
    Name:     openshift-descheduler
    Status:   PresentNotSatisfied
    Version:  v1
Events:
  Type    Reason               Age                  From                        Message
  ----    ------               ----                 ----                        -------
  Normal  RequirementsUnknown  115m (x2 over 115m)  operator-lifecycle-manager  requirements not yet checked
  Normal  RequirementsNotMet   115m (x2 over 115m)  operator-lifecycle-manager  one or more requirements couldn't be found
```

The SA does exist:

```
[root@preserve-olm-env data]# oc get sa -n openshift-kube-descheduler-operator
NAME                    SECRETS   AGE
builder                 2         134m
default                 2         134m
deployer                2         134m
openshift-descheduler   2         130m
```

Anyway, a workaround:

1) Uninstall it.
```
[root@preserve-olm-env data]# oc get sa -n openshift-kube-descheduler-operator
NAME       SECRETS   AGE
builder    2         165m
default    2         165m
deployer   2         165m
[root@preserve-olm-env data]# oc get sub -n openshift-kube-descheduler-operator
No resources found in openshift-kube-descheduler-operator namespace.
[root@preserve-olm-env data]# oc get ip -n openshift-kube-descheduler-operator
No resources found in openshift-kube-descheduler-operator namespace.
[root@preserve-olm-env data]# oc get csv -n openshift-kube-descheduler-operator
No resources found in openshift-kube-descheduler-operator namespace.
```

2) Delete the Job and ConfigMap in the openshift-marketplace project.

```
[root@preserve-olm-env data]# oc get job
No resources found in openshift-marketplace namespace.
[root@preserve-olm-env data]# oc get cm
NAME                        DATA   AGE
marketplace-operator-lock   0      27h
marketplace-trusted-ca      1      28h
```

3) Delete the OLM pods.

```
[root@preserve-olm-env data]# oc delete pods --all -n openshift-operator-lifecycle-manager
pod "catalog-operator-59546d8c85-dj5qf" deleted
pod "olm-operator-6984b748cf-bbs28" deleted
pod "packageserver-9b65c8b76-wl7lf" deleted
pod "packageserver-9b65c8b76-xx9pv" deleted
[root@preserve-olm-env data]# oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-59546d8c85-hgqs4   1/1     Running   0          22s
olm-operator-6984b748cf-m26vz       1/1     Running   0          22s
packageserver-9b65c8b76-djzx7       1/1     Running   0          21s
packageserver-9b65c8b76-mdl78       1/1     Running   0          21s
```

4) Reinstall it, it works well.

```
[root@preserve-olm-env data]# oc get sub -A
NAMESPACE                              NAME                                PACKAGE                             SOURCE            CHANNEL
openshift-kube-descheduler-operator   cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.6
[root@preserve-olm-env data]# oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                     APPROVAL    APPROVED
install-jwvr5   clusterkubedescheduleroperator.4.6.0-202010061132.p0   Automatic   true
[root@preserve-olm-env data]# oc get csv -n openshift-kube-descheduler-operator
NAME                                                    DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202010061132.p0   Kube Descheduler Operator   4.6.0-202010061132.p0              Succeeded
[root@preserve-olm-env data]# oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
descheduler-operator-5d88f758f6-lhvbs   1/1     Running   0          49s
[root@preserve-olm-env data]# oc get sa -n openshift-kube-descheduler-operator
NAME                    SECRETS   AGE
builder                 2         3h8m
default                 2         3h8m
deployer                2         3h8m
openshift-descheduler   2         59s
```

Just tried the workaround suggested by JianZhang and I could successfully deploy the operator, but we still do not know what is causing this issue. Thanks!

Setting target release to the active development branch (4.7.0). For any fixes, where required and requested, cloned BZs will be created for those release maintenance streams where appropriate once they are identified.

Hi, the main issue here is due to the skipRange that is specified in the CSV. It is ">=4.3.0-0 < 4.6.0", I believe. The version of the installed CSV is 4.6.0-202010061132.p0, which falls into that range. Essentially, when major, minor, and patch are equal, a pre-release version has lower precedence than a normal version, so 4.6.0-202010061132.p0 < 4.6.0. Because of this, the solver gets confused and adds this version into the `replaces` field, so we end up with the situation where this version is replacing itself. As a result, the CSV is stuck in the Pending state.
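As an illustration (not part of the original comment), the skipRange in question is carried as the olm.skipRange annotation on the CSV, so both it and the self-referencing replaces field can be read straight off the stuck object. A minimal sketch, reusing the CSV name from the transcripts above:

```
# Sketch: print the olm.skipRange annotation and spec.replaces of the stuck CSV;
# with the bad range, replaces points back at the CSV itself.
oc get csv clusterkubedescheduleroperator.4.6.0-202010061132.p0 \
  -n openshift-kube-descheduler-operator \
  -o jsonpath='{.metadata.annotations.olm\.skipRange}{"\t"}{.spec.replaces}{"\n"}'
```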
You can fix this by changing the skipRange to ">=4.3.0-0 < 4.6.0-0". Thanks, Vu

Thank you very much for debugging and providing the fix! Another thing we have learned about OLM.

Verified the bug with the payload below, and uninstalling & installing the descheduler operator works fine.

```
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-11-10-023606]$ ./oc version
Client Version: 4.7.0-0.nightly-2020-11-10-023606
Server Version: 4.7.0-0.nightly-2020-11-10-023606
Kubernetes Version: v1.19.2+7e80e12
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-11-10-023606]$ ./oc get csv
NAME                                                    DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.7.0-202011031553.p0   Kube Descheduler Operator   4.7.0-202011031553.p0              Succeeded
```

Also, as per the PR, olm.skipRange has been set to '>=4.3.0-0 < 4.7.0-0'. Uninstalled & installed about 4 times and did not see any issue, so moving the bug to the verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633