Description of problem: Unable to pull images during the Elasticsearch Operator upgrade
Version-Release number of selected component (if applicable): 4.6.x
How reproducible: Always
Steps to Reproduce:
1. Try to upgrade the Elasticsearch Operator
2. The operation starts
3. The event log shows: Back-off pulling image "registry.stage.redhat.io/openshift4/ose-elasticsearch-operator-bundle@sha256:6d2587129c746ec28d384540322b40b05833e7e00b25cca584e004af9a1d292e"
Actual results: Unable to update the Operator
Expected results: Able to update the Operator
Additional info: This is also happening to a customer with ServiceMesh
Moving to the RELEASE team, which is responsible for the final bundled images in 4.6.
For the reports I have researched so far, the bundles in question were never published and should not be referenced in index images that customers use. Furthermore, when I check the index images used as catalog sources for OperatorHub, they correctly refer only to publicly released bundles. For customers experiencing this, the index image in question is registry.redhat.io/redhat/redhat-operator-index:v4.6 (which is a floating tag updated whenever new content is published). The only hypothesis I have for what happened here is that somehow our pipeline for publishing index images mistakenly published content from our stage environment, and then later published the correct content, leaving a window where customers got the stage content. It could be helpful to find out what shasum the affected customers have for this image on their cluster, and then to have them re-pull the image to see if it fixes their problem (if the problem still exists). Unfortunately I'm not familiar with how/where exactly they would do this.
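On the open question of how to find the digest a cluster actually pulled, here is a rough sketch rather than a verified procedure. It assumes the catalog pod runs in the openshift-marketplace namespace and that the output of `oc get pods -n openshift-marketplace -o json` has been saved as a string; the helper name, the sample pod name, and the all-zero digest are hypothetical placeholders. It reads the standard `status.containerStatuses[].imageID` field from the pod list.

```python
import json
from typing import Dict

# Floating-tag index image named in this bug.
INDEX_REPO = "registry.redhat.io/redhat/redhat-operator-index"


def index_image_digests(pod_list_json: str, repo: str = INDEX_REPO) -> Dict[str, str]:
    """Map pod name -> resolved imageID for containers running the index image."""
    digests = {}
    for pod in json.loads(pod_list_json).get("items", []):
        name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # "image" is the reference as requested; "imageID" is what was pulled.
            if status.get("image", "").startswith(repo):
                digests[name] = status.get("imageID", "")
    return digests


# Hypothetical sample data; a real pod list carries many more fields.
FAKE_DIGEST = "sha256:" + "0" * 64
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"name": "redhat-operators-abcde"},
            "status": {"containerStatuses": [
                {"image": INDEX_REPO + ":v4.6",
                 "imageID": INDEX_REPO + "@" + FAKE_DIGEST},
            ]},
        },
        {
            "metadata": {"name": "unrelated-pod"},
            "status": {"containerStatuses": [
                {"image": "example.invalid/other:latest",
                 "imageID": "example.invalid/other@" + FAKE_DIGEST},
            ]},
        },
    ]
})

if __name__ == "__main__":
    for pod_name, image_id in index_image_digests(SAMPLE).items():
        print(pod_name, image_id)
```

Comparing the reported imageID before and after the catalog pod is recreated would show whether the cluster picked up a different build of the v4.6 tag.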
The stage catalog was inadvertently published as prod, so that's the root cause of this. That has since been fixed, so the next step is for the OLM team to help the impacted customers recover their upgrades. Since OLM automatically refreshes the catalog, that part should be fine, but I'm guessing the in-flight upgrades are still going to be stuck without some sort of manual intervention to help them progress and pick up the correct bundle reference.
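To help spot which in-flight upgrades are still pointing at stage content, something like the following could be run over collected event messages. This is only a sketch: it assumes the messages have already been gathered as plain strings (for example from the event log quoted in the description), and the function name and the second sample message are made up for illustration.

```python
import re
from typing import Iterable, List

STAGE_HOST = "registry.stage.redhat.io"

# Matches a quoted, digest-pinned image reference such as "host/repo@sha256:<hex>".
IMAGE_RE = re.compile(r'"([^"\s]+@sha256:[0-9a-f]+)"')


def stage_image_refs(messages: Iterable[str]) -> List[str]:
    """Return unique image references whose registry host is the stage registry."""
    refs: List[str] = []
    for msg in messages:
        for ref in IMAGE_RE.findall(msg):
            host = ref.split("/", 1)[0]
            if host == STAGE_HOST and ref not in refs:
                refs.append(ref)
    return refs


# First message is taken from the description of this bug; the second is hypothetical.
STAGE_REF = (
    "registry.stage.redhat.io/openshift4/ose-elasticsearch-operator-bundle"
    "@sha256:6d2587129c746ec28d384540322b40b05833e7e00b25cca584e004af9a1d292e"
)
MESSAGES = [
    'Back-off pulling image "' + STAGE_REF + '"',
    'Pulling image "registry.redhat.io/example/bundle@sha256:' + "0" * 64 + '"',
]

if __name__ == "__main__":
    for ref in stage_image_refs(MESSAGES):
        print("stuck on stage bundle:", ref)
```

Any reference it reports belongs to an upgrade that resolved its bundle while the stage catalog was live and probably still needs the manual refresh.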
Thanks! LGTM, verify it.
The OCP 4.6 and later documentation has been updated in the Support guide to add the following "Refreshing failing subscriptions" section to the "Troubleshooting Operator issues" topic: https://docs.openshift.com/container-platform/4.7/support/troubleshooting/troubleshooting-operator-issues.html The same "Refreshing failing subscriptions" section has also been added to the "Deleting Operators from a cluster" topic in the Operators guide: https://docs.openshift.com/container-platform/4.7/operators/admin/olm-deleting-operators-from-cluster.html Please let me know if there are any outstanding documentation requests related to this issue.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days