Description of problem (please be as detailed as possible and provide log snippets):

When setting up ODF for an external Ceph cluster on an OpenShift cluster where cert-manager is running, a validating webhook gets created. This validating webhook is called rook-ceph-webhook, and it denies the .json configuration with the following error message:

```
Error while reconciling: admission webhook "cephcluster-wh-rook-ceph-admission-controller-openshift-storage.rook.io" denied the request: invalid create : external mode enabled cannot have mon,dashboard,monitoring,network,disruptionManagement,storage fields in CR
```

Link to webhook code: https://github.com/rook/rook/blob/master/pkg/apis/ceph.rook.io/v1/cluster.go#L49

If one deletes the created webhook, the Ceph cluster connects to OpenShift as normal, with no observed issues. It continues to work even after the webhook is reapplied to the OpenShift cluster.

If cert-manager is not already installed, installing the ODF operator does not trigger creation of the webhook at all. If cert-manager is installed afterwards and the ODF operator is then upgraded, the webhook gets created.

Upgrades with the webhook present do not seem to affect the Ceph cluster in any way. We have so far not observed any issues when upgrading the ODF operator with cert-manager installed. Thus, as far as we can see, the issue only affects the initial connection to the Ceph cluster.


Version of all relevant components (if applicable):
OpenShift 4.11.3 4.11.5
ODF 4.11.3 4.11.4
Cert-Manager quay.io/jetstack/cert-manager-controller:v1.10.0


Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
This blocks the integration of ODF external with OCP.


Is there any workaround available to the best of your knowledge?
* Install OCP cluster
* Install ODF operator
  - Observe that rook-ceph-webhook is not created.
* Connect to external Ceph cluster
* Add cert-manager helm chart.
  - Observe that the rook-ceph-webhook is not created
  - On the next ODF operator upgrade, the rook-ceph-webhook will be created
* This allows for upgrades of the ODF operator without losing the connection to the external Ceph cluster.


Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1


Is this issue reproducible?
Yes.


Can this issue reproduce from the UI?
If a current OCP cluster exists, then yes.


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCP cluster
2. Add the cert-manager helm chart
3. Install the ODF operator
   - Observe that the rook-ceph-webhook gets created
4. Try to connect the ODF external Ceph cluster.


Actual results:
The ODF operator starts using cert-manager to set up resources in the cluster (Issuer, Certificate). It then creates the "rook-ceph-webhook", which blocks the integration of ODF external.


Expected results:
ODF should not start using other software in the cluster, since doing so both makes unannounced changes in the cluster and blocks one from connecting to the external Ceph cluster.


Additional info:
Support case: https://access.redhat.com/support/cases/#/case/03381863
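For readers unfamiliar with the rule behind the error message quoted in this report, the following is a minimal Python sketch of the check it describes. It is not Rook's implementation (which is written in Go, see the linked cluster.go); the function name, the field list copied from the error text, and the assumption that external mode is signalled by `spec.external.enable` are illustrative only.

```python
from typing import Any, Dict, List

# Field names copied from the webhook's error message. This tuple is an
# illustration, not Rook's actual Go implementation.
FORBIDDEN_IN_EXTERNAL_MODE = (
    "mon",
    "dashboard",
    "monitoring",
    "network",
    "disruptionManagement",
    "storage",
)


def forbidden_fields_set(spec: Dict[str, Any]) -> List[str]:
    """Return the forbidden spec fields that are non-empty in external mode."""
    # Assumption: external mode is signalled by spec.external.enable being true.
    if not spec.get("external", {}).get("enable", False):
        return []
    # A field counts as "set" here when it is present and non-empty.
    return [name for name in FORBIDDEN_IN_EXTERNAL_MODE if spec.get(name)]


# Hypothetical CephCluster spec, for illustration only.
example_spec = {"external": {"enable": True}, "monitoring": {"enabled": True}}
print(forbidden_fields_set(example_spec))  # prints ['monitoring']
```

A non-empty result corresponds to the situation in which the webhook would deny the request; an empty result means this sketch finds nothing to object to.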
As mentioned in gChat, we'll disable the webhook in the downstream product until we officially support it.

Yes, we'll backport to 4.10 or earlier if possible.
Marking as a blocker since it affects the first install experience, the workaround is difficult, and the fix is simple and low risk.
(In reply to Subham Rai from comment #5)
> As mentioned in gChat, we'll disable the webhook in the downstream product
> until we officially support it.
>
> Yes, we'll backport to 4.10 or earlier if possible.

The above issue does not exist in 4.10, since 4.10 does not have the webhook with cert-manager; those changes were introduced in 4.11. So I'll backport the fix only as far back as 4.11.
Update:
==========
Verified with the versions below:

openshift installer (4.12.0-0.nightly-2023-01-10-062211)
ocs-registry:4.12.0-167

1. install OCP (4.12.0-0.nightly-2023-01-10-062211)
2. install cert-manager
3. deploy ODF (4.12.0-167)

ODF deployment is successful without any issues.

> no rook-ceph-webhook is created

$ oc get validatingwebhookconfigurations.admissionregistration.k8s.io
NAME                                                 WEBHOOKS   AGE
admissionwebhook.noobaa.io-2hrfx                     1          13m
alertmanagerconfigs.openshift.io                     1          157m
autoscaling.openshift.io                             2          167m
cert-manager-webhook                                 1          24m
cluster-baremetal-validating-webhook-configuration   1          167m
controlplanemachineset.machine.openshift.io          1          167m
machine-api                                          2          168m
multus.openshift.io                                  1          170m
performance-addon-operator                           1          172m
prometheusrules.openshift.io                         1          157m
snapshot.storage.k8s.io                              1          168m
validation.csi.vsphere.vmware.com                    1          167m

$ oc get csv
NAME                              DISPLAY                                       VERSION   REPLACES   PHASE
mcg-operator.v4.12.0              NooBaa Operator                               4.12.0               Succeeded
ocs-operator.v4.12.0              OpenShift Container Storage                   4.12.0               Succeeded
odf-csi-addons-operator.v4.12.0   CSI Addons                                    4.12.0               Succeeded
odf-operator.v4.12.0              OpenShift Data Foundation                     4.12.0               Succeeded
openshift-cert-manager.v1.7.1     cert-manager Operator for Red Hat OpenShift   1.7.1-1              Succeeded

$ oc get storagesystem
NAME                                        STORAGE-SYSTEM-KIND                  STORAGE-SYSTEM-NAME
ocs-external-storagecluster-storagesystem   storagecluster.ocs.openshift.io/v1   ocs-external-storagecluster

$ oc get storagecluster
NAME                          AGE     PHASE   EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   7m37s   Ready   true       2023-01-17T09:20:50Z   4.12.0

$ oc describe storagecluster
Name:         ocs-external-storagecluster
Namespace:    openshift-storage
Status:
  Conditions:
    Last Heartbeat Time:   2023-01-17T09:28:27Z
    Last Transition Time:  2023-01-17T09:20:51Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2023-01-17T09:28:27Z
    Last Transition Time:  2023-01-17T09:22:49Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2023-01-17T09:28:27Z
    Last Transition Time:  2023-01-17T09:22:49Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2023-01-17T09:28:27Z
    Last Transition Time:  2023-01-17T09:20:51Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2023-01-17T09:28:27Z
    Last Transition Time:  2023-01-17T09:22:49Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Upgradeable
  External Secret Hash:  054181a0aa3997cfe5a1c170a0f97cebda33176bfaf358ad2a1648ffccfb1f25265818fc446f8c4831eacff2dc0715e5d621177a3c3bf632779585c621c86d6f
  External Storage:
    Granted Capacity:  0
  Images:
    Ceph:
      Desired Image:  quay.io/rhceph-dev/rhceph@sha256:957294824e1cbf89ca24a1a2aa2a8e8acd567cfb5a25535e2624989ad1046a60
    Noobaa Core:
      Actual Image:   quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:82bcc82a78933ae759127bc8917edbe91737c41e01f18638278f939a4548c8d3
      Desired Image:  quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:82bcc82a78933ae759127bc8917edbe91737c41e01f18638278f939a4548c8d3
    Noobaa DB:
      Actual Image:   quay.io/rhceph-dev/rhel8-postgresql-12@sha256:3d805540d777b09b4da6df99e7cddf9598d5ece4af9f6851721a9961df40f5a1
      Desired Image:  quay.io/rhceph-dev/rhel8-postgresql-12@sha256:3d805540d777b09b4da6df99e7cddf9598d5ece4af9f6851721a9961df40f5a1
  Kms Server Connection:
  Phase:  Ready

> From the rook-ceph-operator log, the webhook resources are deleted as expected.
2023-01-17 09:10:55.226404 I | rookcmd: starting Rook v4.12.0-0.f4e99907f9b9f05a190303465f61d12d5d24cace with arguments '/usr/local/bin/rook ceph operator'
2023-01-17 09:10:55.226461 I | rookcmd: flag values: --enable-machine-disruption-budget=false, --help=false, --kubeconfig=, --log-level=INFO, --operator-image=, --service-account=
2023-01-17 09:10:55.226464 I | cephcmd: starting Rook-Ceph operator
2023-01-17 09:10:55.357359 I | cephcmd: base ceph version inside the rook operator image is "ceph version 16.2.10-94.el8cp (48ce8ed67474ea50f10c019b9445be7f49749d23) pacific (stable)"
2023-01-17 09:10:55.371796 I | op-k8sutil: ROOK_CURRENT_NAMESPACE_ONLY="true" (env var)
2023-01-17 09:10:55.371812 I | operator: watching the current namespace "openshift-storage" for a Ceph CRs
2023-01-17 09:10:55.371845 I | operator: setting up schemes
2023-01-17 09:10:55.373409 I | operator: setting up the controller-runtime manager
I0117 09:10:56.424146 1 request.go:601] Waited for 1.04665259s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/tuned.openshift.io/v1?timeout=32s
2023-01-17 09:10:58.227602 I | operator: delete webhook resources since webhook is disabled
2023-01-17 09:10:58.227619 I | operator: deleting validating webhook rook-ceph-webhook
2023-01-17 09:10:58.231537 I | operator: deleting webhook cert manager Certificate rook-admission-controller-cert
2023-01-17 09:10:58.289564 I | operator: deleting webhook cert manager Issuer %sselfsigned-issuer
2023-01-17 09:10:58.393056 I | operator: deleting validating webhook service %srook-ceph-admission-controller
2023-01-17 09:10:58.396828 I | ceph-cluster-controller: successfully started
2023-01-17 09:10:58.396894 I | ceph-cluster-controller: hotplug orchestration disabled
2023-01-17 09:10:58.396903 I | ceph-crashcollector-controller: successfully started
2023-01-17 09:10:58.396920 I | ceph-block-pool-controller: successfully started
2023-01-17 09:10:58.396934 I | ceph-object-store-user-controller: successfully started
2023-01-17 09:10:58.396947 I | ceph-object-realm-controller: successfully started

> The logs contain unformatted %s placeholders; I will raise a separate bug for the logging issue.
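The two checks made in this verification (rook-ceph-webhook is absent from the webhook listing, and some operator log lines contain a literal %s) could be scripted along the following lines. This is a rough Python sketch that assumes the `oc` listing and the operator log have already been saved as text; the function names are hypothetical, and the sample strings are short excerpts of the output shown in this comment.

```python
from typing import List


def webhook_names(listing: str) -> List[str]:
    """Extract the NAME column from saved `oc get validatingwebhookconfigurations` output."""
    names = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAME":
            continue
        names.append(fields[0])
    return names


def lines_with_raw_placeholder(log_text: str) -> List[str]:
    """Return log lines that still contain a literal, unformatted %s."""
    return [line for line in log_text.splitlines() if "%s" in line]


# Excerpts of the output quoted in this comment.
listing = """\
NAME                               WEBHOOKS   AGE
admissionwebhook.noobaa.io-2hrfx   1          13m
cert-manager-webhook               1          24m
"""
log_text = """\
2023-01-17 09:10:58.227619 I | operator: deleting validating webhook rook-ceph-webhook
2023-01-17 09:10:58.289564 I | operator: deleting webhook cert manager Issuer %sselfsigned-issuer
"""

print("rook-ceph-webhook" in webhook_names(listing))
print(len(lines_with_raw_placeholder(log_text)))
```

The first check only looks at resource names, so it would not notice a webhook that was created under a different name.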