Bug 1967621 - Operator fails to install and OLM tries to delete nonexistent catalog pods under openshift-marketplace/redhat-marketplace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: All
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Alexander Greene
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On:
Blocks: 1973582 1989723
 
Reported: 2021-06-03 13:32 UTC by Alfredo Pizarro
Modified: 2024-10-01 18:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When updating a Catalog Source, OLM issues a Get call immediately followed by a Delete call on several resources related to the Catalog Source.
Consequence: In some instances, a resource has already been deleted from the cluster but still exists in the cache. The Get call therefore succeeds, but the subsequent Delete call fails because the resource no longer exists on the cluster.
Fix: OLM now ignores the error returned by the Delete call if the resource is not found.
Result: OLM no longer reports an error when updating a catalog because of a stale cache entry that produces a "Resource Not Found" error from the Delete call.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:32:38 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2021:3759  0        None      None    None     2021-10-18 17:33:02 UTC

Description Alfredo Pizarro 2021-06-03 13:32:49 UTC
Description of problem:

An OCS operator install is stuck, and OLM reports errors when it tries to delete catalog pods under openshift-marketplace:

OCS install status:
    lastTransitionTime: "2021-05-26T21:42:51Z"
    lastUpdateTime: "2021-05-26T21:42:51Z"
    message: install timeout
    phase: Failed
    reason: InstallCheckFailed


Catalog source logs:

2021-05-27T17:04:57.517829141Z E0527 17:04:57.517756       1 queueinformer_operator.go:290] sync {"update" "openshift-marketplace/redhat-marketplace"} failed: couldn't ensure registry server - error ensuring updated catalog source pod: : error deleting duplicate catalog polling pod: redhat-marketplace-t9mzw: error deleting pod: redhat-marketplace-t9mzw: pods "redhat-marketplace-t9mzw" not found

However, no pod with that name exists:
$ omg get pods
NAME                                                             READY  STATUS     RESTARTS  AGE
5bae77dfc8df1dc4e8403e4e24d9e6ee44122c82b9795de0312812d818tfxj6  0/1    Succeeded  0         6d
64df73cd6ca959f7f62e31221ffc25dfafbaf2627dded53543399c431c86qqd  0/1    Succeeded  0         15d
965304d2cdcc277b3c03a16e3490f65f85ba839251fb7ed304cd6a1ac1685ml  0/1    Succeeded  0         15d
certified-operators-5fhz4                                        1/1    Running    0         7h30m
community-operators-kftpt                                        1/1    Running    0         1h8m
marketplace-operator-6cc74874c7-kz89g                            1/1    Running    0         141d
redhat-marketplace-72k4h                                         1/1    Running    0         5d
redhat-operators-d7frj                                           1/1    Running    0         16h


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 Alexander Greene 2021-08-03 17:27:38 UTC
A fix was merged into the downstream distribution in this commit: https://github.com/openshift/operator-framework-olm/commit/817d4ede702f42ddfe5e0f30c9605fa7f5640493

Moving to modified state.

Comment 7 Bruno Andrade 2021-08-05 00:18:30 UTC
Looks good for now; I didn't find the issue on this version. Marking as verified.

oc exec catalog-operator-5b746cd5b7-bcbxj  -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.18.3
git commit: 552292e080289112aa03776b6e1f3957e72d46b3
OCP: 4.9.0-0.nightly-2021-08-04-131508


oc get pods -n openshift-marketplace 
NAME                                                              READY   STATUS      RESTARTS   AGE
478558eeddbb42083d83c915b4695642e264c74bdb0bd647f7b706f387nw47t   0/1     Completed   0          6m52s
certified-operators-2sdnf                                         0/1     Running     0          9s
community-operators-fl5bg                                         1/1     Running     0          73m
marketplace-operator-7fcfd4df55-cmzm8                             1/1     Running     0          78m
redhat-marketplace-8cqdk                                          1/1     Running     0          73m
redhat-operators-wqd4g                                            0/1     Running     0          9s


oc get ip -n openshift-storage                       
NAME            CSV                   APPROVAL    APPROVED
install-b9854   ocs-operator.v4.8.0   Automatic   true

oc get csv -n openshift-storage
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.8.0   OpenShift Container Storage   4.8.0                Succeeded

Comment 10 errata-xmlrpc 2021-10-18 17:32:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

