Bug 1885398

Summary: CSV with only Webhook conversion can't be installed
Product: OpenShift Container Platform
Reporter: Marcel Apfelbaum <mapfelba>
Component: OLM
Assignee: Alexander Greene <agreene>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: agreene, jiazha, krizza, mapfelba, nhale
Version: 4.6
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: When deploying an operator with an API Service, Conversion Webhook, or Admission Webhook, OLM should retrieve the CA from an existing resource in order to calculate a CA Hash annotation. This annotation influences a Deployment Hash that OLM relies on to confirm that the deployment is installed correctly. OLM did not retrieve the CA from Conversion Webhooks, resulting in a bad Deployment Hash that caused OLM to attempt to reinstall the CSV.
Consequence: If a CSV defines a Conversion Webhook but does not include an API Service or an Admission Webhook, the CSV cycles through the Pending, ReadyToInstall, and Installing phases indefinitely.
Fix: OLM now uses the existing Conversion Webhook to retrieve the value of the CA and correctly calculate the Deployment Hash.
Result: OLM can now install CSVs that define a Conversion Webhook without an API Service or Admission Webhook.
Last Closed: 2021-02-24 15:23:10 UTC
Type: Bug
Bug Blocks: 1886448    
Attachments:
Proposed patch (flags: none)

Description Marcel Apfelbaum 2020-10-05 19:42:35 UTC
Description of problem:
A CSV with only a Conversion Webhook and no other Mutating/Validating Webhooks can't be installed.
Installation is stuck in the loop: ReadyToInstall -> Installing -> Pending -> ReadyToInstall
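A minimal sketch of the reported loop, modeling the CSV phases as a state machine. Only the phase names come from this report; the transition function `next` and the `run` helper are hypothetical simplifications, assuming that a failed deployment hash check sends the CSV back to Pending.

```go
package main

import "fmt"

// Phase mirrors the CSV phase names quoted in this report.
type Phase string

const (
	Pending        Phase = "Pending"
	ReadyToInstall Phase = "ReadyToInstall"
	Installing     Phase = "Installing"
	Succeeded      Phase = "Succeeded"
)

// next is a hypothetical, simplified transition function. hashOK reports whether
// the recomputed deployment hash matches the one on the installed deployment.
func next(p Phase, hashOK bool) Phase {
	switch p {
	case Pending:
		return ReadyToInstall
	case ReadyToInstall:
		return Installing
	case Installing:
		if hashOK {
			return Succeeded
		}
		// Hash mismatch: the install is judged bad and the CSV starts over.
		return Pending
	default:
		return p
	}
}

// run applies at most maxSteps transitions and records every phase visited.
func run(hashOK bool, maxSteps int) []Phase {
	p := Pending
	trace := []Phase{p}
	for i := 0; i < maxSteps && p != Succeeded; i++ {
		p = next(p, hashOK)
		trace = append(trace, p)
	}
	return trace
}

func main() {
	fmt.Println(run(false, 6)) // [Pending ReadyToInstall Installing Pending ReadyToInstall Installing Pending]
	fmt.Println(run(true, 6))  // [Pending ReadyToInstall Installing Succeeded]
}
```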

The only info in events is:
"calculated deployment install is bad"

Version-Release number of selected component (if applicable):
Latest

How reproducible:
100%

Steps to Reproduce:
1. Deploy an operator whose CSV defines only a Conversion Webhook.
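The reproduction condition, as stated in the Doc Text (a Conversion Webhook with no API Service and no Admission Webhook), can be written as a small predicate. The `WebhookType` constants are local stand-ins for the webhook type values used in CSV webhook definitions, and `hitsBug` is a hypothetical helper, not part of OLM.

```go
package main

import "fmt"

// WebhookType is a local stand-in for the type of a CSV webhook definition.
type WebhookType string

const (
	ValidatingAdmissionWebhook WebhookType = "ValidatingAdmissionWebhook"
	MutatingAdmissionWebhook   WebhookType = "MutatingAdmissionWebhook"
	ConversionWebhook          WebhookType = "ConversionWebhook"
)

// hitsBug reports whether a CSV matches the affected configuration: at least one
// conversion webhook, no API services, and no admission webhooks.
func hitsBug(apiServiceCount int, webhooks []WebhookType) bool {
	if apiServiceCount > 0 {
		return false
	}
	hasConversion := false
	for _, w := range webhooks {
		switch w {
		case ValidatingAdmissionWebhook, MutatingAdmissionWebhook:
			// An admission webhook supplies a CA, so the CSV is not affected.
			return false
		case ConversionWebhook:
			hasConversion = true
		}
	}
	return hasConversion
}

func main() {
	fmt.Println(hitsBug(0, []WebhookType{ConversionWebhook}))                             // true
	fmt.Println(hitsBug(0, []WebhookType{ConversionWebhook, ValidatingAdmissionWebhook})) // false
}
```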


Additional info:
The root cause is masked by the error handling at:
   https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/operator.go#L1535
which hides the actual error.
In this case the cause was a missing code path in:
   https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/apiservices.go#L264
It does not handle the desc.Type == v1alpha1.ConversionWebhook case.
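A sketch of a CA lookup that handles every descriptor type, including the conversion webhook case. The `DescType` constants, the `Cluster` struct, and `existingCA` are hypothetical stand-ins, not OLM's types and not the attached patch. The Kubernetes field paths in the comments are where each resource carries its CA bundle.

```go
package main

import "fmt"

// DescType is a local stand-in for the descriptor types OLM distinguishes.
type DescType string

const (
	APIService                 DescType = "APIService"
	ValidatingAdmissionWebhook DescType = "ValidatingAdmissionWebhook"
	MutatingAdmissionWebhook   DescType = "MutatingAdmissionWebhook"
	ConversionWebhook          DescType = "ConversionWebhook"
)

// Cluster is a hypothetical in-memory view of CA bundles already present on
// cluster resources, keyed by resource name.
type Cluster struct {
	APIServiceCABundles map[string][]byte // APIService spec.caBundle
	WebhookCABundles    map[string][]byte // admission webhook clientConfig.caBundle
	CRDConversionCA     map[string][]byte // CRD spec.conversion.webhook.clientConfig.caBundle
}

// existingCA returns the CA bundle for one descriptor. The ConversionWebhook
// case is the branch this report describes as missing.
func existingCA(c Cluster, t DescType, name string) ([]byte, error) {
	var (
		ca []byte
		ok bool
	)
	switch t {
	case APIService:
		ca, ok = c.APIServiceCABundles[name]
	case ValidatingAdmissionWebhook, MutatingAdmissionWebhook:
		ca, ok = c.WebhookCABundles[name]
	case ConversionWebhook:
		ca, ok = c.CRDConversionCA[name]
	default:
		return nil, fmt.Errorf("unsupported descriptor type %q", t)
	}
	if !ok || len(ca) == 0 {
		return nil, fmt.Errorf("no existing CA found for %s %q", t, name)
	}
	return ca, nil
}

func main() {
	c := Cluster{CRDConversionCA: map[string][]byte{"widgets.example.com": []byte("ca-bytes")}}
	ca, err := existingCA(c, ConversionWebhook, "widgets.example.com")
	fmt.Println(string(ca), err) // ca-bytes <nil>
}
```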

Comment 1 Marcel Apfelbaum 2020-10-05 19:46:15 UTC
Created attachment 1719144
Proposed patch

Proposed by Alexander Greene
Verified by Marcel Apfelbaum

Comment 4 yhui 2020-10-20 16:54:24 UTC
Version:
[hui@localhost 1020]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-10-17-034503   True        False         10h     Cluster version is 4.7.0-0.nightly-2020-10-17-034503
[hui@localhost 1020]$ oc exec olm-operator-69b864f866-6sjj4 -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.16.1
git commit: e2c0f2c47573ec5dfc509502881fa3dd8eb7bae9

Test procedure:
1. Download the latest example operator from https://github.com/awgreene/webhook-operator/tree/olm.

2. Follow the guide to install the operator using OLM.
[root@preserve-olm-env webhook-operator]# kubectl apply -f olm/ocp/install/00_catsrc.yaml
catalogsource.operators.coreos.com/webhook-operator-catalog created
[root@preserve-olm-env webhook-operator]# kubectl apply -f olm/ocp/install/01_sub.yaml
subscription.operators.coreos.com/webhook-operator-subscription created

3. Check that the subscription, CSV, and pods have been created successfully.
[root@preserve-olm-env webhook-operator]# oc get sub -n openshift-operators
NAME                            PACKAGE            SOURCE                     CHANNEL
webhook-operator-subscription   webhook-operator   webhook-operator-catalog   alpha
[root@preserve-olm-env webhook-operator]# oc get csv -n openshift-operators
NAME                      DISPLAY            VERSION   REPLACES   PHASE
webhook-operator.v0.0.1   Webhook Operator   0.0.1                Succeeded
[root@preserve-olm-env webhook-operator]# oc get pods -n openshift-operators
NAME                                        READY   STATUS    RESTARTS   AGE
webhook-operator-webhook-5598fb7797-m5rhp   2/2     Running   0          47m

4. Edit the CSV to remove the validating and mutating webhooks, leaving only the conversion webhook.
[root@preserve-olm-env webhook-operator]# oc edit csv webhook-operator.v0.0.1 -n openshift-operators
clusterserviceversion.operators.coreos.com/webhook-operator.v0.0.1 edited

5. Delete the pods. The pods are recreated and running again.
[root@preserve-olm-env webhook-operator]# oc get pods -n openshift-operators
NAME                                        READY   STATUS    RESTARTS   AGE
webhook-operator-webhook-5598fb7797-cmrx6   2/2     Running   0          18s

An operator whose CSV defines only a conversion webhook can be installed successfully.
Verified the bug on 4.7.0.

Comment 8 errata-xmlrpc 2021-02-24 15:23:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633