Bug 1774732 - OLM fails to upgrade Kiali operator when using Manual approvals
Summary: OLM fails to upgrade Kiali operator when using Manual approvals
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.z
Assignee: Bowen Song
QA Contact: Tom Buskey
URL:
Whiteboard:
Depends On: 1775216
Blocks:
Reported: 2019-11-20 19:45 UTC by Edgar Hernández
Modified: 2019-11-27 16:47 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1775213 1775216
Environment:
Last Closed: 2019-11-27 16:34:11 UTC
Target Upstream Version:
Embargoed:


Attachments
OLM logs for manual strategy (79.96 KB, text/plain)
2019-11-20 19:45 UTC, Edgar Hernández
Kiali operator cluster roles as created after installing 1.9.1 (11.05 KB, text/plain)
2019-11-20 19:47 UTC, Edgar Hernández
Kiali operator ClusterRoleBindings as created after installing 1.9.1 (811 bytes, text/plain)
2019-11-20 19:47 UTC, Edgar Hernández
Kiali operator ServiceAccount as created after installing 1.9.1 (667 bytes, text/plain)
2019-11-20 19:48 UTC, Edgar Hernández
FYI: OLM logs with automatic upgrade strategy (which does a successful upgrade) (132.35 KB, application/octet-stream)
2019-11-20 19:49 UTC, Edgar Hernández
Install plan before upgrade (first install of the operator) (24.75 KB, application/octet-stream)
2019-11-26 22:14 UTC, Edgar Hernández
Install plan when upgrading (29.42 KB, application/octet-stream)
2019-11-26 22:15 UTC, Edgar Hernández
catalog-operator log (22.28 KB, application/octet-stream)
2019-11-26 22:16 UTC, Edgar Hernández

Description Edgar Hernández 2019-11-20 19:45:32 UTC
Created attachment 1638205
OLM logs for manual strategy

Description of problem:

OLM fails to correctly upgrade an operator when manual approval is used for upgrades. After the install plan is approved, the CSV is applied correctly, but none of the other dependent resources are created. As a result, the affected operator doesn't work correctly due to RBAC issues.

Version-Release number of selected component (if applicable):

OCP 4.2.1.

How reproducible:

This was discovered when working on changes for the Kiali operator. Install version 1.9.1 of the Kiali operator using a custom source and using **manual** approvals for upgrades. Then, after uploading a new manifest with version 1.10.0 and approving the install, OLM errors when applying a ClusterRole, sends the operator to a Pending state, and never finishes the installation.


Steps to Reproduce:
1. Open the OCP console and go to the OperatorHub page.
2. Search for the Kiali operator and click Install.
3. On the Install page, choose "Manual" for the update approval strategy. Leave the defaults for the other options. Then, install the operator.
4. Approve the install plan of the Kiali operator 1.9.1 and wait for its install. It should succeed.
5. Push a new version for the Kiali operator (say 1.10.0), and wait for OLM to scan for updates (or force a scan).
6. In the OCP console, go to the Subscription page of the Kiali operator. Wait for it to show "upgrading".
7. Approve the new install plan for the Kiali operator to start the upgrade.
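
For reference, steps 3 and 7 correspond roughly to the objects below when done outside the console. This is only a sketch and is not taken from the attachments: the subscription name, package name, channel, catalog source, and namespaces are placeholder assumptions. The relevant OLM fields are `installPlanApproval` on the Subscription and `spec.approved` on the InstallPlan.

```python
import json


def manual_subscription(name, namespace, package, channel, source, source_namespace):
    """Build a Subscription manifest that requires manual approval of install plans."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": package,
            "channel": channel,
            "source": source,
            "sourceNamespace": source_namespace,
            # "Manual" means every new InstallPlan waits for explicit approval.
            "installPlanApproval": "Manual",
        },
    }


def approval_patch():
    """Merge-patch body that approves a pending InstallPlan (step 7)."""
    return {"spec": {"approved": True}}


if __name__ == "__main__":
    # All values below are placeholders, not the ones used in this report.
    sub = manual_subscription(
        name="kiali",
        namespace="openshift-operators",
        package="kiali",
        channel="stable",
        source="my-kiali-catalog",
        source_namespace="openshift-marketplace",
    )
    print(json.dumps(sub, indent=2))
    print(json.dumps(approval_patch()))
```

The patch body could be applied with something like `oc patch installplan <name> -n <namespace> --type merge -p '{"spec":{"approved":true}}'`, which should be equivalent to clicking Approve in the console.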

Actual results:

After approving the install plan for the upgrade, OLM tries to apply the upgrade, but it fails. Then, apparently, it sends the Kiali operator to a "Pending" state and it never finishes the install.

The previous version (1.9.1) of the Kiali operator appears to be uninstalled successfully. The new version (say 1.10.0) is deployed, but the pod never starts because of missing ClusterRoles, ServiceAccounts, and other dependent resources.
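
One way to see which dependent resources were not applied is to look at the per-step status recorded in the InstallPlan. The sketch below assumes the plan has been exported as JSON (for example with `oc get installplan <name> -o json`) and that each entry under `status.plan` carries a `resource` (with `kind` and `name`) and a `status` such as `Created`, `Present`, `NotPresent`, or `Unknown`. The sample data is made up for illustration and is not taken from the attached install plans.

```python
def unfinished_steps(install_plan):
    """Return (kind, name, status) for plan steps that are neither Created nor Present."""
    done = {"Created", "Present"}
    steps = install_plan.get("status", {}).get("plan") or []
    result = []
    for step in steps:
        resource = step.get("resource", {})
        # A step without a recorded status is treated as "Unknown".
        status = step.get("status", "Unknown")
        if status not in done:
            result.append((resource.get("kind"), resource.get("name"), status))
    return result


if __name__ == "__main__":
    # Hypothetical InstallPlan fragment; resource names are illustrative only.
    sample = {
        "status": {
            "plan": [
                {"resource": {"kind": "ClusterServiceVersion", "name": "example-operator.v1.10.0"},
                 "status": "Created"},
                {"resource": {"kind": "ClusterRole", "name": "example-operator-role"},
                 "status": "Unknown"},
                {"resource": {"kind": "ServiceAccount", "name": "example-operator"},
                 "status": "NotPresent"},
            ]
        }
    }
    for kind, name, status in unfinished_steps(sample):
        print(f"{kind}/{name}: {status}")
```

In the failure described here, the CSV step would be expected to show as applied while the RBAC-related steps would not.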

Expected results:

After the manual approval of the install plan, OLM should properly upgrade the operator.

ALTERNATIVE: Perhaps, the subscription page could show a "retry" button, in case automatic retry is not feasible.

Additional info:

It looks like the issue is specific to Manual upgrades. I tried the Automatic upgrade strategy and it successfully applies the upgrade. So, *probably*, the issue is general and NOT specific to the Kiali operator.

I'm attaching:
- Logs of OLM under manual upgrade. Errors start at line 228.
- The following Kiali operator resources that OLM should be managing: ClusterRoles, ClusterRoleBindings, ServiceAccount

Comment 1 Edgar Hernández 2019-11-20 19:47:05 UTC
Created attachment 1638207
Kiali operator cluster roles as created after installing 1.9.1

Comment 2 Edgar Hernández 2019-11-20 19:47:52 UTC
Created attachment 1638209
Kiali operator ClusterRoleBindings as created after installing 1.9.1

Comment 3 Edgar Hernández 2019-11-20 19:48:18 UTC
Created attachment 1638210
Kiali operator ServiceAccount as created after installing 1.9.1

Comment 4 Edgar Hernández 2019-11-20 19:49:12 UTC
Created attachment 1638211
FYI: OLM logs with automatic upgrade strategy (which does a successful upgrade)

Comment 5 Edgar Hernández 2019-11-20 19:51:35 UTC
In the logs for the automatic upgrade, the upgrade seems to start at line 780.

Comment 6 Evan Cordell 2019-11-22 21:28:35 UTC
Could you please provide the `InstallPlan` objects that you are seeing in the manual approval case? And can you provide logs from `catalog-operator` during the failed upgrade?

Comment 7 Edgar Hernández 2019-11-26 22:14:26 UTC
Created attachment 1639973
Install plan before upgrade (first install of the operator)

Comment 8 Edgar Hernández 2019-11-26 22:15:05 UTC
Created attachment 1639974
Install plan when upgrading

Comment 9 Edgar Hernández 2019-11-26 22:16:03 UTC
Created attachment 1639975
catalog-operator log

Comment 10 Edgar Hernández 2019-11-26 22:20:53 UTC
I'm providing the requested data. However, I can no longer reproduce the issue.
I moved to OCP 4.2.4 and the issue seems to be gone.

Originally, I saw this issue on OCP 4.2.1.

