Created attachment 1638205 [details]
OLM logs for manual strategy

Description of problem:
OLM fails to correctly upgrade an operator when manual approval is used for upgrades. After the install plan is approved, the CSV is applied correctly, but none of the other dependent resources are created. As a result, the affected operator doesn't work correctly due to RBAC issues.

Version-Release number of selected component (if applicable):
OCP 4.2.1.

How reproducible:
This was discovered while working on changes for the Kiali operator. Install version 1.9.1 of the Kiali operator from a custom source, using **manual** approvals for upgrades. Then, after uploading a new manifest with version 1.10.0 and approving the install plan, OLM errors out when applying a ClusterRole, sends the operator to a Pending state, and never finishes the installation.

Steps to Reproduce:
1. Open the OCP console and go to the OperatorHub page.
2. Search for the Kiali operator and click Install.
3. On the Install page, choose "Manual" for the update approval strategy. Leave the defaults for the other options. Then, install the operator.
4. Approve the install plan of the Kiali operator 1.9.1 and wait for it to install. It should succeed.
5. Push a new version of the Kiali operator (say 1.10.0), and wait for OLM to scan for updates (or force a scan).
6. In the OCP console, go to the Subscription page of the Kiali operator. Wait for it to show "upgrading".
7. Approve the new install plan for the Kiali operator to start the upgrade.

Actual results:
After the install plan for the upgrade is approved, OLM tries to apply the upgrade, but it fails. Then, apparently, it sends the Kiali operator to a "Pending" state and never finishes the install. The previous version 1.9.1 of the Kiali operator appears to be uninstalled successfully. The new version (say 1.10.0) is deployed, but the pod never starts because of missing ClusterRoles, service accounts and other dependent resources.

Expected results:
After the manual approval of the install plan, OLM should properly upgrade the operator.

ALTERNATIVE: Perhaps the Subscription page could show a "retry" button, in case automatic retry is not feasible.

Additional info:
The issue appears to be specific to Manual upgrades. I tried the Automatic upgrade strategy and it applies the upgrade successfully. So, *probably*, the issue is general and NOT specific to the Kiali operator.

I'm attaching:
- Logs of OLM under manual upgrade. Errors start at line 228.
- The following Kiali operator resources that OLM should be managing: ClusterRoles, ClusterRoleBindings, ServiceAccount
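For anyone who wants to reproduce this without the console, steps 3 and 7 can roughly be expressed as a Subscription with manual approval plus a patch that approves the install plan. The following is only a minimal Python sketch under assumptions, not what was run for this report: the subscription name, package name, channel, catalog source and namespaces passed in the example are placeholders, and the helper function names are made up for illustration. Only the `installPlanApproval` and `approved` fields come from the OLM API.

```python
import json


def manual_subscription(name, namespace, package, channel, source, source_namespace):
    """Build an OLM Subscription that requires manual approval of install plans."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": package,
            "channel": channel,
            "source": source,
            "sourceNamespace": source_namespace,
            # Equivalent of choosing "Manual" on the console Install page (step 3).
            "installPlanApproval": "Manual",
        },
    }


def approval_patch():
    """Merge patch that approves a pending InstallPlan (steps 4 and 7)."""
    return {"spec": {"approved": True}}


if __name__ == "__main__":
    # All argument values below are placeholders, not taken from the report.
    sub = manual_subscription(
        name="kiali",
        namespace="openshift-operators",
        package="kiali",
        channel="stable",
        source="my-custom-catalog",
        source_namespace="openshift-marketplace",
    )
    print(json.dumps(sub, indent=2))
    print(json.dumps(approval_patch()))
```

The printed manifest could be applied with `oc apply -f`, and the patch with something like `oc patch installplan <name> -n <namespace> --type merge -p '<patch>'`, where the install plan name is whatever OLM generated.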
Created attachment 1638207 [details] Kiali operator cluster roles as created after installing 1.9.1
Created attachment 1638209 [details] Kiali operator ClusterRoleBindings as created after installing 1.9.1
Created attachment 1638210 [details] Kiali operator ServiceAccount as created after installing 1.9.1
Created attachment 1638211 [details] FYI: OLM logs with automatic upgrade strategy (which does a successful upgrade)
About the logs with the automatic upgrade, the upgrade seems to start at line 780.
Could you please provide the `InstallPlan` objects that you are seeing in the manual approval case? And can you provide logs from `catalog-operator` during the failed upgrade?
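In case it helps when collecting these, here is a small Python sketch that summarizes InstallPlan objects from the JSON produced by something like `oc get installplan -n <namespace> -o json`. It is an illustration only: the function name is hypothetical and the sample data in the example is made up, not taken from the attached install plans. The fields it reads (`spec.approval`, `spec.approved`, `spec.clusterServiceVersionNames`, `status.phase`) are the ones that seem most relevant to the manual approval case.

```python
import json


def summarize_install_plans(install_plan_list):
    """Return one summary dict per InstallPlan in a `kind: List` style document."""
    rows = []
    for item in install_plan_list.get("items", []):
        spec = item.get("spec", {})
        status = item.get("status", {})
        rows.append({
            "name": item.get("metadata", {}).get("name"),
            "csvs": spec.get("clusterServiceVersionNames", []),
            "approval": spec.get("approval"),   # "Manual" or "Automatic"
            "approved": spec.get("approved"),   # True once the plan is approved
            "phase": status.get("phase"),       # missing if status is not set yet
        })
    return rows


if __name__ == "__main__":
    # Made-up sample input, for illustration only.
    sample = {
        "items": [
            {
                "metadata": {"name": "install-example"},
                "spec": {
                    "approval": "Manual",
                    "approved": False,
                    "clusterServiceVersionNames": ["example-operator.v1.10.0"],
                },
                "status": {"phase": "RequiresApproval"},
            }
        ]
    }
    for row in summarize_install_plans(sample):
        print(json.dumps(row))
```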
Created attachment 1639973 [details] Install plan before upgrade (first install of the operator)
Created attachment 1639974 [details] Install plan when upgrading
Created attachment 1639975 [details] catalog-operator log
I'm providing the requested data. However, I can no longer reproduce the issue. I moved to OCP 4.2.4 and the issue seems to be gone. Originally, I saw this issue on OCP 4.2.1.