Bug 1904070 - Operator with webhooks fail to deploy on 4.5, works on master
Summary: Operator with webhooks fail to deploy on 4.5, works on master
Keywords:
Status: CLOSED DUPLICATE of bug 1920665
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nick Hale
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1921000
Reported: 2020-12-03 13:18 UTC by Alexander Greene
Modified: 2021-01-27 11:55 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1921000 (view as bug list)
Environment:
Last Closed: 2021-01-26 20:28:30 UTC
Target Upstream Version:
Embargoed:



Description Alexander Greene 2020-12-03 13:18:41 UTC
Description of problem:
Based on: https://github.com/operator-framework/operator-lifecycle-manager/issues/1839
Operators with webhooks fail to deploy on 4.5 but work on master.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Apply the following manifests, with BUNDLE_VERSION set in your shell:
kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: opentelemetry-operator-manifests
  namespace: olm
spec:
  sourceType: grpc
  image: quay.io/jpkroehling/opentelemetry-operator-index:${BUNDLE_VERSION}
EOF
kubectl wait --for=condition=ready pod -l olm.catalogSource=opentelemetry-operator-manifests -n olm
kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opentelemetry-operator-subscription
  namespace: operators
spec:
  channel: "alpha"
  installPlanApproval: Automatic
  name: opentelemetry-operator
  source: opentelemetry-operator-manifests
  sourceNamespace: olm
EOF

Actual results:
kubectl get events -n operators shows that the operator fails to deploy due to TLS issues. Messages are similar to this:

55s         Warning   Failed                pod/opentelemetry-operator-controller-manager-9bdd9fcc4-ctr8k     Error: cannot find volume "apiservice-cert" to mount into container "kube-rbac-proxy"
55s         Warning   Failed                pod/opentelemetry-operator-controller-manager-9bdd9fcc4-ctr8k     Error: cannot find volume "apiservice-cert" to mount into container "manager"
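
The error above says that a container mounts a volume the pod spec never declares. A minimal sketch of that consistency check follows, assuming the pod spec has been loaded into a Python dict (for example from `kubectl get pod <name> -o json`). The function name, the sample `pod_spec`, and the `/certs` mount path are hypothetical and exist only for illustration; this is not OLM or kubelet code.

```python
def find_unresolved_volume_mounts(pod_spec):
    """Return (container, volume) pairs whose volumeMount names no declared volume."""
    declared = {volume["name"] for volume in pod_spec.get("volumes", [])}
    missing = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in declared:
                missing.append((container["name"], mount["name"]))
    return missing


# Hypothetical pod spec shaped like the failing pod: both containers mount
# "apiservice-cert", but the pod declares no volume with that name.
# The mountPath value is made up for this example.
pod_spec = {
    "volumes": [],
    "containers": [
        {
            "name": "kube-rbac-proxy",
            "volumeMounts": [{"name": "apiservice-cert", "mountPath": "/certs"}],
        },
        {
            "name": "manager",
            "volumeMounts": [{"name": "apiservice-cert", "mountPath": "/certs"}],
        },
    ],
}

for container_name, volume_name in find_unresolved_volume_mounts(pod_spec):
    print(f"container {container_name!r} mounts undeclared volume {volume_name!r}")
```

Running the same check against the Deployment's pod template in the CSV, rather than the live pod, may help tell whether the volume is missing from the bundle or was expected to be injected at install time.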

Expected results:


Additional info:

Comment 3 Andrew Bays 2021-01-26 12:13:54 UTC
I am experiencing something similar when trying to deploy my team's operator against 4.6.12.  We were forced to use operator-sdk 1.1.0 because 1.2.0+ produced a validation error when calling the "create webhook" command:

# operator-sdk create webhook --group osp-director --version v1beta1 --kind BaremetalSet --programmatic-validation
...
FATA[0000] failed to create webhook with version "3-alpha": operator-sdk create webhook requires an api with the group, kind and version provided

Anyhow, once the scaffolding was created within the operator, I modified the "config" directory and its subdirectories to include webhook and cert-manager content.  I created my image, index and bundle and pushed them to quay.  When I deploy the operator via OLM, the controller manager pod gets stuck in ContainerCreating with the "Error: cannot find volume 'apiservice-cert' to mount into container 'manager'" error.

I can work around this by manually creating the secret before deploying the operator, but then I notice some other peculiarities.  The pod deploys just fine, but no ValidatingWebhookConfiguration (the type of webhook I am trying to deploy) is created in my cluster.  When I look at the CSV for the operator, I do indeed see the webhook represented there:

# oc get csv osp-director-operator.v0.0.1 -o json | jq -r '.spec.webhookdefinitions'
[
  {
    "admissionReviewVersions": [
      "v1beta1"
    ],
    "deploymentName": "osp-director-operator-controller-manager",
    "failurePolicy": "Fail",
    "generateName": "vbaremetalset.kb.io",
    "rules": [
      {
        "apiGroups": [
          "osp-director.openstack.org"
        ],
        "apiVersions": [
          "v1beta1"
        ],
        "operations": [
          "CREATE",
          "UPDATE"
        ],
        "resources": [
          "baremetalsets"
        ]
      }
    ],
    "sideEffects": "None",
    "type": "ValidatingAdmissionWebhook",
    "webhookPath": "/validate-osp-director-openstack-org-v1beta1-baremetalset"
  }
]

Yet, it is not present:

# oc get validatingwebhookconfiguration | grep vbaremetalset
#
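
The comparison above (webhook defined in the CSV, nothing matching in the cluster) can be sketched as a small check over the two JSON documents. It assumes the CSV comes from `oc get csv ... -o json` and the configurations from the `items` list of `oc get validatingwebhookconfiguration -o json`. It also assumes that each entry under `webhooks` in a created configuration carries the CSV's `generateName` as its `name`; that naming is an assumption about OLM's behavior, not something this report confirms. The function name and sample data are hypothetical.

```python
def missing_webhook_configurations(csv, webhook_configs):
    """Return generateName values from the CSV that match no webhook in the cluster.

    Assumption: a created configuration lists the CSV's generateName as the
    `name` of one of its `webhooks` entries.
    """
    definitions = csv.get("spec", {}).get("webhookdefinitions", [])
    expected = [definition["generateName"] for definition in definitions]

    present = set()
    for config in webhook_configs:
        for hook in config.get("webhooks", []):
            present.add(hook["name"])

    return [name for name in expected if name not in present]


# Trimmed-down CSV carrying only the field this check reads.
csv = {"spec": {"webhookdefinitions": [{"generateName": "vbaremetalset.kb.io"}]}}

# An empty list mirrors the empty grep result above.
print(missing_webhook_configurations(csv, []))
```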

Also, it seems OLM is creating the webhook service for me, but the port spec is wrong:

# oc get svc osp-director-operator-controller-manager-service -o yaml
...
spec:
  clusterIP: 172.30.222.94
  ports:
  - name: "443"
    port: 443
    protocol: TCP
    targetPort: 443   <---- should be 9443
...

Yet the deployment spec in the CSV has the right port value:

# oc get deployments osp-director-operator-controller-manager -o yaml
...
      - command:
        - /manager
        image: quay.io/abays/osp-director-operator:0.0.1
        imagePullPolicy: Always
        name: manager
        ports:
        - containerPort: 9443
          name: webhook-server
          protocol: TCP
...
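
The mismatch between the Service and the Deployment can be checked mechanically. The sketch below assumes both objects have been loaded from `oc get ... -o json` into dicts. It flags every Service `targetPort` that matches neither a declared `containerPort` nor a declared port name. The function name is hypothetical, and the check is only a heuristic, since a container may listen on a port it does not declare. One possible explanation for the 443, offered as an assumption rather than a confirmed cause, is that the webhook definition in the CSV above sets no `containerPort`.

```python
def unmatched_target_ports(service, deployment):
    """Return Service targetPorts that match no declared container port or port name."""
    declared = set()
    containers = deployment["spec"]["template"]["spec"]["containers"]
    for container in containers:
        for port in container.get("ports", []):
            declared.add(port["containerPort"])
            if "name" in port:
                declared.add(port["name"])

    unmatched = []
    for port in service["spec"].get("ports", []):
        # targetPort defaults to the Service port when it is omitted.
        target = port.get("targetPort", port["port"])
        if target not in declared:
            unmatched.append(target)
    return unmatched


# Values copied from the Service and Deployment output in this comment.
service = {
    "spec": {
        "ports": [{"name": "443", "port": 443, "protocol": "TCP", "targetPort": 443}]
    }
}
deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "manager",
                        "ports": [
                            {
                                "containerPort": 9443,
                                "name": "webhook-server",
                                "protocol": "TCP",
                            }
                        ],
                    }
                ]
            }
        }
    }
}

print(unmatched_target_ports(service, deployment))
```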

Any pointers or insights regarding what might be going on would be greatly appreciated.  Thank you!

Comment 4 Nick Hale 2021-01-26 20:27:41 UTC
Hi Andrew,

This bug (referring to the mount issues) is a duplicate of another low-priority bug that hasn't been and will not be patched in 4.6.

Check out this comment for some more details: https://bugzilla.redhat.com/show_bug.cgi?id=1920665#c1

The port issue looks unique -- please open a fresh BZ so we can focus on it directly.

Comment 5 Nick Hale 2021-01-26 20:28:30 UTC

*** This bug has been marked as a duplicate of bug 1920665 ***

