Bug 1947212 - Upgrades may get stuck if permissions reduced
Summary: Upgrades may get stuck if permissions reduced
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Kevin Rizza
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-07 23:00 UTC by Ben Luddy
Modified: 2023-03-09 01:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:02:04 UTC
Target Upstream Version:
Embargoed:


Description Ben Luddy 2021-04-07 23:00:12 UTC
Description of problem:

During an upgrade, if the scope of an operator's permissions is reduced (e.g. removal of an entry from the ClusterServiceVersion's .spec.permissions or .spec.clusterPermissions), both the old and the new ClusterServiceVersion may enter phase "Pending" and stay there indefinitely. These symptoms were first reported with a different root cause in https://bugzilla.redhat.com/show_bug.cgi?id=1934080.

The timeline looks something like this, assuming CSV "old" in "Replacing" and CSV "new" in "Pending", with an in-progress InstallPlan:

1. An InstallPlan Role step is applied, removing some rules in an update to the existing Role.
2. CSV "old" is reconciled, its RBAC requirements are no longer met, and it transitions from "Replacing" to "Pending". Without intervention, that requirement will remain unsatisfied because InstallPlan execution is a one-way process.
3. CSV "new" is reconciled. Because its .spec.replaces is "old", and "old" exists, it refuses to progress unless "old" is in phase "Replacing".
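
The timeline above can be sketched as a toy model. This is not OLM's reconciliation code: the CSV class, the reconcile function, and the verb-set representation of RBAC are hypothetical simplifications, and the "InstallReady" target phase for an unblocked CSV is an assumption. The sketch only encodes the two rules described in steps 2 and 3, to show that once the Role has been narrowed, repeated reconciliation never changes either phase.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class CSV:
    """Hypothetical stand-in for a ClusterServiceVersion."""
    name: str
    phase: str
    required_verbs: Set[str]
    replaces: Optional[str] = None


def reconcile(csv: CSV, csvs: Dict[str, CSV], granted: Set[str]) -> None:
    """Apply only the two toy rules taken from steps 2 and 3."""
    satisfied = csv.required_verbs <= granted
    if csv.phase == "Replacing" and not satisfied:
        # Step 2: RBAC requirements are no longer met.
        csv.phase = "Pending"
        return
    if csv.phase == "Pending" and csv.replaces in csvs:
        # Step 3: refuse to progress unless the replaced CSV is "Replacing".
        if csvs[csv.replaces].phase != "Replacing":
            return
        if satisfied:
            # Assumed next phase; not reached in the stuck scenario.
            csv.phase = "InstallReady"


def run_timeline(rounds: int = 5) -> Dict[str, str]:
    csvs = {
        "old": CSV("old", "Replacing", {"get", "list"}),
        "new": CSV("new", "Pending", {"get"}, replaces="old"),
    }
    # Step 1: the InstallPlan Role step has already narrowed the Role.
    granted = {"get"}
    for _ in range(rounds):
        for name in ("old", "new"):
            reconcile(csvs[name], csvs, granted)
    return {name: csv.phase for name, csv in csvs.items()}


if __name__ == "__main__":
    print(run_timeline())
```

In this model, "old" can only leave "Pending" if the removed verb is granted again, which the one-way InstallPlan never does, and "new" can only progress if "old" returns to "Replacing".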

Version-Release number of selected component (if applicable): 4.6

How reproducible:

Sometimes, depending on the winner of the races between catalog-operator applying steps from an InstallPlan and olm-operator's CSV reconciliation.

Steps to Reproduce:

1. Create an index image containing a channel with two entries. The first entry should define some permission in its CSV, for example:

      clusterPermissions:
      - serviceAccountName: service-account
        rules:
        - apiGroups:
          - ""
          resources:
          - configmaps
          verbs:
          - get
          - list

and the second should reduce the scope of that permission, for example:

      clusterPermissions:
      - serviceAccountName: service-account
        rules:
        - apiGroups:
          - ""
          resources:
          - configmaps
          verbs:
          - get

2. Create a CatalogSource pointing to the index that was created in (1).
3. Create a Subscription (and OperatorGroup if necessary) for the CatalogSource created in (2) with .spec.startingCSV set to the name of the _first_ entry.
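
When preparing the two channel entries, it may help to confirm that the second one really does reduce scope. The following is a minimal sketch, assuming the rules have already been parsed into plain dictionaries with the same keys as the YAML above. The helper names expand and removed_grants are hypothetical, and the comparison is purely literal: wildcards, resourceNames, and nonResourceURLs are not handled.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Rule = Dict[str, List[str]]
Grant = Tuple[str, str, str]  # (apiGroup, resource, verb)


def expand(rules: List[Rule]) -> Set[Grant]:
    """Flatten rules into individual (apiGroup, resource, verb) grants."""
    grants: Set[Grant] = set()
    for rule in rules:
        grants.update(
            product(
                rule.get("apiGroups", []),
                rule.get("resources", []),
                rule.get("verbs", []),
            )
        )
    return grants


def removed_grants(old_rules: List[Rule], new_rules: List[Rule]) -> Set[Grant]:
    """Return grants present in the old rules but absent from the new ones."""
    return expand(old_rules) - expand(new_rules)


# The two example entries from the reproduction steps.
first_entry = [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list"]}]
second_entry = [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]}]

if __name__ == "__main__":
    print(removed_grants(first_entry, second_entry))
```

A non-empty result means the upgrade narrows permissions and is a candidate for triggering this bug.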

Actual results:

The first entry is installed, then (sometimes) the upgrade to the second entry never completes -- both "old" and "new" CSVs show phase "Pending" indefinitely.
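
One way to spot this state is to look for a CSV and the CSV it replaces both reporting phase "Pending". The sketch below assumes its input is the parsed JSON of a ClusterServiceVersion list for a single namespace (for example, the output of `oc get csv -n <namespace> -o json`); the function name find_pending_pairs is hypothetical. A single snapshot cannot tell a transient "Pending" from a permanent one, so a match is only a hint that should be rechecked over time.

```python
from typing import Any, Dict, List, Tuple


def find_pending_pairs(csv_list: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (old, new) name pairs where both CSVs are in phase "Pending"."""
    items = csv_list.get("items", [])
    phases = {
        item["metadata"]["name"]: item.get("status", {}).get("phase")
        for item in items
    }
    pairs = []
    for item in items:
        name = item["metadata"]["name"]
        replaces = item.get("spec", {}).get("replaces")
        if (
            replaces in phases
            and phases[name] == "Pending"
            and phases[replaces] == "Pending"
        ):
            pairs.append((replaces, name))
    return pairs


# Hand-written sample data shaped like a CSV list, not real cluster output.
sample = {
    "items": [
        {"metadata": {"name": "old"}, "spec": {}, "status": {"phase": "Pending"}},
        {
            "metadata": {"name": "new"},
            "spec": {"replaces": "old"},
            "status": {"phase": "Pending"},
        },
    ]
}

if __name__ == "__main__":
    print(find_pending_pairs(sample))
```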

Expected results:

The first entry is installed, then an upgrade to the second entry succeeds (the first CSV is automatically deleted and the second CSV has a good status).

Comment 1 Per da Silva 2022-01-11 20:36:57 UTC

*** This bug has been marked as a duplicate of bug 1942818 ***

Comment 2 Per da Silva 2022-01-12 00:32:45 UTC
The above closure was erroneous. Re-opening.

Comment 5 Shiftzilla 2023-03-09 01:02:04 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-8859
