Bug 1896102 - OLM not updating operator to the next version due to a stuck installplan in the "UpgradePending" state
Keywords:
Status: CLOSED DUPLICATE of bug 1860185
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-09 18:13 UTC by James Harrington
Modified: 2024-03-25 17:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-09 21:43:56 UTC
Target Upstream Version:
Embargoed:



Description James Harrington 2020-11-09 18:13:21 UTC
Description of problem:

OLM isn't upgrading the cloud-ingress-operator operator on the cluster to the latest version. The subscription status shows `currentCSV` as "cloud-ingress-operator.v0.1.175-e727583"; however, the CSV on the cluster is "cloud-ingress-operator.v0.1.177-8cad995".

The subscription status shows that the installplan install-spvzh for CSV version cloud-ingress-operator.v0.1.175-e727583 is in the "UpgradePending" state.

Looking at the installplans on the cluster, we see that install-f9mcs, which is for CSV version cloud-ingress-operator.v0.1.177-8cad995, was approved and installed, as was install-spvzh for cloud-ingress-operator.v0.1.175-e727583.
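
The mismatch described above can be checked mechanically. The following is only a minimal sketch, assuming the Subscription's `.status` and the list of CSV names in the namespace have already been exported (for example via `oc get ... -o json`). The helper name `subscription_is_stale` is hypothetical and not part of OLM; the sample values are copied from this report.

```python
# Hypothetical helper (not part of OLM): flags a Subscription whose status
# still points at a CSV that is not present in the namespace.
def subscription_is_stale(sub_status, csv_names):
    """Return True if the subscription reports UpgradePending for a CSV
    that is not among the CSVs actually present in the namespace."""
    pending = sub_status.get("state") == "UpgradePending"
    missing = sub_status.get("currentCSV") not in csv_names
    return pending and missing


# Values copied from this report.
sub_status = {
    "currentCSV": "cloud-ingress-operator.v0.1.175-e727583",
    "state": "UpgradePending",
}
csv_names = ["cloud-ingress-operator.v0.1.177-8cad995"]

print(subscription_is_stale(sub_status, csv_names))
```

A check like this only describes the symptom; it does not explain why the subscription stopped tracking the installed CSV.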


$ oc get ip -n openshift-cloud-ingress-operator install-spvzh -o json | jq '.status | "\(.conditions) \(.phase)"'

"[{\"lastTransitionTime\":\"2020-06-19T15:20:38Z\",\"lastUpdateTime\":\"2020-06-19T15:20:38Z\",\"status\":\"True\",\"type\":\"Installed\"}] Complete"

$ oc get ip -n openshift-cloud-ingress-operator install-f9mcs -o json | jq '.status | "\(.conditions) \(.phase)"'

"[{\"lastTransitionTime\":\"2020-06-19T15:20:44Z\",\"lastUpdateTime\":\"2020-06-19T15:20:44Z\",\"status\":\"True\",\"type\":\"Installed\"}] Complete"

Please NOTE:

Full disclosure: this operator's CSV references a CRD that it doesn't deploy. That CRD is present on the cluster and the requirements for the CSV are satisfied. We are fixing this.


Version-Release number of selected component (if applicable):

oc get pods -n openshift-operator-lifecycle-manager -o json | jq '.items[].spec.containers[].image'
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d97a825602c5285fc6847534aeb7ead1b99059b709c513ad806686b52e27d2b4"
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d97a825602c5285fc6847534aeb7ead1b99059b709c513ad806686b52e27d2b4"
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d97a825602c5285fc6847534aeb7ead1b99059b709c513ad806686b52e27d2b4"
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d97a825602c5285fc6847534aeb7ead1b99059b709c513ad806686b52e27d2b4"


How reproducible:

Not every time; we are unable to reproduce it reliably at the moment.

Steps to Reproduce:
1.
2.
3.

Actual results:

OLM appears to be stuck and cannot move the installplan install-spvzh into an AtLatestKnown state

Expected results:

OLM to upgrade the cloud-ingress-operator

Additional info:

The catalog pod is returning a newer version that replaces cloud-ingress-operator.v0.1.177-8cad995:

oc run grpcurl-query -n openshift-operator-lifecycle-manager --rm=true  --restart=Never --attach=true --image=quay.io/rogbas/grpcurl -- -plaintext 10.204.132.87:50051  api.Registry/ListBundles | jq -c '. |select(.replaces|match("8cad995"))| {packageName,csvName,channelName,replaces}'
{"packageName":"cloud-ingress-operator","csvName":"cloud-ingress-operator.v0.1.179-ae0b008","channelName":"production","replaces":"cloud-ingress-operator.v0.1.177-8cad995"}
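
The same lookup can be done offline on saved `ListBundles` output. This is a rough sketch, assuming one JSON object per line as printed by `jq -c`. The function name `bundles_replacing` is hypothetical, and it uses an exact match on `replaces` rather than the regex match in the jq filter above.

```python
import json


# Hypothetical helper: select the bundles that declare `installed_csv`
# as the CSV they replace.
def bundles_replacing(bundles, installed_csv):
    return [b for b in bundles if b.get("replaces") == installed_csv]


# One line of bundle output, copied from this report.
raw = (
    '{"packageName":"cloud-ingress-operator",'
    '"csvName":"cloud-ingress-operator.v0.1.179-ae0b008",'
    '"channelName":"production",'
    '"replaces":"cloud-ingress-operator.v0.1.177-8cad995"}'
)
bundles = [json.loads(line) for line in raw.splitlines() if line.strip()]

successors = bundles_replacing(
    bundles, "cloud-ingress-operator.v0.1.177-8cad995"
)
for bundle in successors:
    print(bundle["csvName"])
```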

Install plans on cluster 

oc get ip -n openshift-cloud-ingress-operator
NAME            CSV                                       APPROVAL    APPROVED
install-f9mcs   cloud-ingress-operator.v0.1.177-8cad995   Automatic   true
install-mxw2p   cloud-ingress-operator.v0.1.172-64a442f   Automatic   true
install-spvzh   cloud-ingress-operator.v0.1.175-e727583   Automatic   true
install-v7fnl   cloud-ingress-operator.v0.1.174-184d837   Automatic   true

CSV on cluster 

oc get csv -n openshift-cloud-ingress-operator
NAME                                               DISPLAY                           VERSION           REPLACES                                           PHASE
cloud-ingress-operator.v0.1.177-8cad995            cloud-ingress-operator   

CSV interesting metadata:

oc get csv cloud-ingress-operator.v0.1.177-8cad995 -n openshift-cloud-ingress-operator -o json | jq '.status.requirementStatus[] | "\(.name) \(.kind) \(.status)"' 
"subjectpermissions.managed.openshift.io CustomResourceDefinition Present"
"cloud-ingress-operator ServiceAccount Present"

oc get csv cloud-ingress-operator.v0.1.177-8cad995 -n openshift-cloud-ingress-operator -o json | jq '.status.requirementStatus[] | select(.dependents!=null) | .dependents[] | "\(.kind) \(.status)"'
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"
"PolicyRule Satisfied"

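The two requirement queries above can be folded into a single check. This is a sketch only, assuming `.status.requirementStatus` has been saved as a list of objects with `name`, `kind`, `status` and an optional `dependents` list. The helper name `unmet_requirements` is hypothetical, and treating only "Present" and "Satisfied" as healthy is an assumption based on the values shown in this report. The report does not say which requirement owns the PolicyRule dependents; attaching them to the ServiceAccount here is also an assumption.

```python
# Assumption: these are the only statuses that count as healthy.
OK_STATUSES = {"Present", "Satisfied"}


# Hypothetical helper: list every requirement or dependent whose status
# is not in OK_STATUSES, as (kind, requirement name, status) tuples.
def unmet_requirements(requirement_status):
    unmet = []
    for req in requirement_status:
        if req.get("status") not in OK_STATUSES:
            unmet.append((req.get("kind"), req.get("name"), req.get("status")))
        for dep in req.get("dependents") or []:
            if dep.get("status") not in OK_STATUSES:
                unmet.append((dep.get("kind"), req.get("name"), dep.get("status")))
    return unmet


# Sample data mirroring the output in this report (11 PolicyRule entries).
requirement_status = [
    {
        "name": "subjectpermissions.managed.openshift.io",
        "kind": "CustomResourceDefinition",
        "status": "Present",
    },
    {
        "name": "cloud-ingress-operator",
        "kind": "ServiceAccount",
        "status": "Present",
        "dependents": [{"kind": "PolicyRule", "status": "Satisfied"}] * 11,
    },
]

print(unmet_requirements(requirement_status))
```

An empty result is consistent with the statement that the CSV's requirements are satisfied.
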
Comment 3 Kevin Rizza 2020-11-09 21:43:56 UTC
Based on our investigation, I'm going to close this out as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1860185

While this bug is already resolved, it appears that the problem state was tripped during install on a previous version. Reinstalling will resolve this problem, and it won't be encountered again given the current OCP version of this cluster (which now includes the fix).

*** This bug has been marked as a duplicate of bug 1860185 ***

