Description of problem:

When installing the Security Profiles Operator (SPO) from the console for the first time, the spod DaemonSet fails to be created and reports the error:

failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.openshift-security-profiles.svc:443/mutate-v1-pod-binding?timeout=10s": no endpoints available for service "webhook-service"

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-07-05-083948

How reproducible:
Hit twice.

Steps to Reproduce:
1. Install the Security Profiles Operator from the console for the first time, keeping the default configuration. The console reports that the installation succeeded.

2. Check the operator status with oc:

# oc get all -n openshift-security-profiles
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/security-profiles-operator-54965b9c94-88pgk           1/1     Running   0          19m
pod/security-profiles-operator-54965b9c94-fkk69           1/1     Running   0          19m
pod/security-profiles-operator-54965b9c94-pl8pp           1/1     Running   0          19m
pod/security-profiles-operator-webhook-544f4f5978-9mz8d   1/1     Running   0          19m
pod/security-profiles-operator-webhook-544f4f5978-pdnsq   1/1     Running   0          19m
pod/security-profiles-operator-webhook-544f4f5978-t2crm   1/1     Running   0          19m

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/metrics           ClusterIP   172.30.43.126    <none>        443/TCP   19m
service/webhook-service   ClusterIP   172.30.187.216   <none>        443/TCP   19m

NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/spod   0         0         0       0            0           kubernetes.io/os=linux   19m

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/security-profiles-operator           3/3     3            3           19m
deployment.apps/security-profiles-operator-webhook   3/3     3            3           19m

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/security-profiles-operator-54965b9c94           3         3         3       19m
replicaset.apps/security-profiles-operator-webhook-544f4f5978   3         3         3       19m

# oc describe ds spod -n openshift-security-profiles
Name:           spod
Selector:       app=security-profiles-operator,name=spod
Node-Selector:  kubernetes.io/os=linux
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
…
Events:
  Type     Reason        Age                  From                  Message
  ----     ------        ----                 ----                  -------
  Warning  FailedCreate  10m (x10 over 10m)   daemonset-controller  Error creating: Internal error occurred: failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.openshift-security-profiles.svc:443/mutate-v1-pod-binding?timeout=10s": no endpoints available for service "webhook-service"
  Warning  FailedCreate  10m                  daemonset-controller  Error creating: Internal error occurred: failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.openshift-security-profiles.svc:443/mutate-v1-pod-binding?timeout=10s": dial tcp 10.131.0.16:9443: connect: connection refused
  Warning  FailedCreate  4m18s (x7 over 10m)  daemonset-controller  Error creating: Internal error occurred: failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.openshift-security-profiles.svc:443/mutate-v1-pod-binding?timeout=10s": context deadline exceeded
3. Check the catalogsource qe-app-registry status:

# oc get catalogsource qe-app-registry -o yaml -n openshift-marketplace
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-07-06T02:22:31Z"
  generation: 1
…
status:
  connectionState:
    address: qe-app-registry.openshift-marketplace.svc:50051
    lastConnect: "2022-07-06T07:25:40Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2022-07-06T07:54:53Z"
  message: 'couldn''t ensure registry server - error ensuring updated catalog source
    pod: : creating update catalog source pod: Internal error occurred: failed calling
    webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.openshift-security-profiles.svc:443/mutate-v1-pod-binding?timeout=10s":
    context deadline exceeded'
  reason: RegistryServerError
  registryService:
    createdAt: "2022-07-06T02:22:31Z"
    port: "50051"
    protocol: grpc
    serviceName: qe-app-registry
    serviceNamespace: openshift-marketplace

4. Uninstall the Security Profiles Operator from the console; it reports that the uninstall succeeded.

5. Delete the SPO default namespace openshift-security-profiles:

# oc delete ns openshift-security-profiles
namespace "openshift-security-profiles" deleted

6. Install the Security Profiles Operator manually with oc in another namespace:

# oc create -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: security-profiles-operator
EOF
namespace/security-profiles-operator created

# oc create -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: security-profiles-operator
  namespace: security-profiles-operator
spec:
  targetNamespaces:
  - security-profiles-operator
EOF
operatorgroup.operators.coreos.com/security-profiles-operator created

# oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: security-profiles-operator-sub
  namespace: security-profiles-operator
spec:
  channel: release-0.4
  name: security-profiles-operator
  source: qe-app-registry
  sourceNamespace: openshift-marketplace
EOF
subscription.operators.coreos.com/security-profiles-operator-sub created

# oc get csv -n security-profiles-operator
NAME                                    DISPLAY                            VERSION     REPLACES   PHASE
elasticsearch-operator.v5.5.0           OpenShift Elasticsearch Operator   5.5.0                  Succeeded
security-profiles-operator.v0.4.3-dev   Security Profiles Operator         0.4.3-dev              Succeeded

# oc get ip -n security-profiles-operator
NAME            CSV                                     APPROVAL    APPROVED
install-4w4rq   security-profiles-operator.v0.4.3-dev   Automatic   true
7. Check the Security Profiles Operator status:

# oc get all -n security-profiles-operator
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/security-profiles-operator-575df5bbd4-82z7l           1/1     Running   0          5m1s
pod/security-profiles-operator-575df5bbd4-mgncc           1/1     Running   0          5m1s
pod/security-profiles-operator-575df5bbd4-tbll9           1/1     Running   0          5m1s
pod/security-profiles-operator-webhook-544f4f5978-gr8wc   1/1     Running   0          4m54s
pod/security-profiles-operator-webhook-544f4f5978-l6fh9   1/1     Running   0          4m54s
pod/security-profiles-operator-webhook-544f4f5978-z7989   1/1     Running   0          4m54s
pod/spod-27vps                                            3/3     Running   0          4m43s
pod/spod-7wgtm                                            3/3     Running   0          4m43s
pod/spod-85wxg                                            3/3     Running   0          4m43s
pod/spod-9n92k                                            3/3     Running   0          4m44s
pod/spod-jn66z                                            3/3     Running   0          4m44s
pod/spod-l6c59                                            3/3     Running   0          4m43s

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/metrics           ClusterIP   172.30.98.125    <none>        443/TCP   4m54s
service/webhook-service   ClusterIP   172.30.169.184   <none>        443/TCP   4m54s

NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/spod   6         6         6       6            6           kubernetes.io/os=linux   4m54s

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/security-profiles-operator           3/3     3            3           5m1s
deployment.apps/security-profiles-operator-webhook   3/3     3            3           4m54s

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/security-profiles-operator-575df5bbd4           3         3         3       5m1s
replicaset.apps/security-profiles-operator-webhook-544f4f5978   3         3         3       4m54s

# oc describe ds spod -n security-profiles-operator
Name:           spod
Selector:       app=security-profiles-operator,name=spod
Node-Selector:  kubernetes.io/os=linux
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 6
Current Number of Nodes Scheduled: 6
Number of Nodes Scheduled with Up-to-date Pods: 6
Number of Nodes Scheduled with Available Pods: 6
Number of Nodes Misscheduled: 0
Pods Status:  6 Running / 0 Waiting / 0 Succeeded / 0 Failed
…
Events:
  Type     Reason            Age                   From                  Message
  ----     ------            ----                  ----                  -------
  Warning  FailedCreate      9m59s (x10 over 10m)  daemonset-controller  Error creating: Internal error occurred: failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.security-profiles-operator.svc:443/mutate-v1-pod-binding?timeout=10s": no endpoints available for service "webhook-service"
  Warning  FailedCreate      9m57s                 daemonset-controller  Error creating: Internal error occurred: failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.security-profiles-operator.svc:443/mutate-v1-pod-binding?timeout=10s": dial tcp 10.129.2.19:9443: connect: connection refused
  Normal   SuccessfulCreate  9m52s                 daemonset-controller  Created pod: spod-9n92k
  Normal   SuccessfulCreate  9m51s                 daemonset-controller  Created pod: spod-jn66z
  Normal   SuccessfulCreate  9m51s                 daemonset-controller  Created pod: spod-l6c59
  Normal   SuccessfulCreate  9m51s                 daemonset-controller  Created pod: spod-7wgtm
  Normal   SuccessfulCreate  9m51s                 daemonset-controller  Created pod: spod-27vps
  Normal   SuccessfulCreate  9m51s                 daemonset-controller  Created pod: spod-85wxg
8. Check the catalogsource qe-app-registry status again:

# oc get catalogsource qe-app-registry -o yaml -n openshift-marketplace
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-07-06T02:22:31Z"
  generation: 1
…
status:
  connectionState:
    address: qe-app-registry.openshift-marketplace.svc:50051
    lastConnect: "2022-07-06T08:46:04Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2022-07-06T08:45:34Z"
  registryService:
    createdAt: "2022-07-06T02:22:31Z"
    port: "50051"
    protocol: grpc
    serviceName: qe-app-registry
    serviceNamespace: openshift-marketplace

Actual results:
2. The spod DaemonSet fails to be created, and this state persists for at least 40 minutes. The DaemonSet reports the error:
   Warning FailedCreate 10m (x10 over 10m) daemonset-controller Error creating: Internal error occurred: failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.openshift-security-profiles.svc:443/mutate-v1-pod-binding?timeout=10s": no endpoints available for service "webhook-service"
3. The catalogsource qe-app-registry reports the error:
   message: 'couldn''t ensure registry server - error ensuring updated catalog source pod: : creating update catalog source pod: Internal error occurred: failed calling webhook "binding.spo.io": failed to call webhook: Post "https://webhook-service.openshift-security-profiles.svc:443/mutate-v1-pod-binding?timeout=10s": context deadline exceeded'
   reason: RegistryServerError
6. The SPO installation succeeds.
7. The SPO status is fully correct: the spod DaemonSet is created normally, although similar errors appear briefly during startup.
8. The catalogsource qe-app-registry recovers normally.

Expected results:
2. The spod DaemonSet should be created successfully.
3. The catalogsource qe-app-registry should not report an error.

Additional info:
From steps 6-8 we can see that the SPO can be installed successfully when installed manually, although similar errors appear at the beginning.
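For anyone triaging a similar failure, two quick checks can confirm the state described above; a minimal sketch, assuming the default object names from the outputs in this report:

# An empty ENDPOINTS column here matches the "no endpoints available" error
oc get endpoints webhook-service -n openshift-security-profiles

# Find the registration for "binding.spo.io" and inspect its rules,
# namespaceSelector, and failurePolicy
oc get mutatingwebhookconfigurations -o yaml | grep -B 5 -A 20 'binding.spo.io'

The second check also explains why a catalog source pod in openshift-marketplace is affected: a webhook registered for pod CREATE across all namespaces blocks pod creation cluster-wide for as long as its service has no backing endpoints.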
# oc logs -f security-profiles-operator-webhook-544f4f5978-9mz8d -n openshift-security-profiles
I0706 07:56:44.108999 1 logr.go:261] "msg"="Set logging verbosity to 0"
I0706 07:56:44.109038 1 logr.go:261] "msg"="Profiling support enabled: false"
I0706 07:56:44.109082 1 logr.go:261] setup "msg"="starting component: security-profiles-operator-webhook" "buildDate"="2022-06-08T01:51:09Z" "buildTags"="netgo,osusergo,seccomp,no_bpf" "cgoldFlags"="-lseccomp" "compiler"="gc" "dependencies"="github.com/PuerkitoBio/purell v1.1.1 h1:WEQqlqaGbrPkxLJWfBwQmfEAE1Z7ONdDLqrN38tNFfI=,github.com/PuerkitoBio/urlesc v0.0.0-20170810143723-de5bf2ad4578 h1:d+Bc7a5rLufV/sSk/8dngufqelfh6jnri85riMAaF/M=,github.com/ReneKroon/ttlcache/v2 v2.11.0 h1:OvlcYFYi941SBN3v9dsDcC2N8vRxyHcCmJb3Vl4QMoM=,github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=,github.com/blang/semver v3.5.1+incompatible h1:cQNTCjp13qL8KC3Nbxr/y2Bqb63oX6wdnnjpJbkM4JQ=,github.com/cert-manager/cert-manager v1.8.0 h1:A5FH4FUYGE/4lFYO6QzAWRxvSZfKlb9DZukv6lBPEiw=,github.com/cespare/xxhash/v2 v2.1.2 h1:YRXhKfTDauu4ajMg1TPgFO5jnlC2HCbmLXMcTG5cbYE=,github.com/containers/common v0.48.1-0.20220510094751-400832f41771 h1:rHd882jzJK1fIXCJWvc1zTX5CIv2aOyzzkqj6mezLLw=,github.com/cpuguy83/go-md2man/v2 v2.0.1 h1:r/myEWzV9lfsM1tFLgDyu0atFtJ1fXn261LKYj/3DxU=,github.com/crossplane/crossplane-runtime v0.16.0 h1:NstJdHeK3C+u3By0vQjOG1Y6+v53JYOy00IgCL9GHAw=,github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=,github.com/emicklei/go-restful v2.9.5+incompatible h1:spTtZBk5DYEvbxMVutUuTyh1Ao2r4iyvLdACqsl/Ljk=,github.com/evanphx/json-patch v4.12.0+incompatible h1:4onqiflcdA9EOZ4RxV643DvftH5pOlLGNtQ5lPWQu84=,github.com/fsnotify/fsnotify v1.5.1 h1:mZcQUHVQUQWoPXXtuf9yuEXKudkV2sx1E06UadKWpgI=,github.com/go-logr/logr v1.2.3 h1:2DntVwHkVopvECVRSlL5PSo9eG+cAkDCuckLubN+rq0=,github.com/go-openapi/jsonpointer v0.19.5 h1:gZr+CIYByUqjcgeLXnQu2gHYQC9o73G2XUeOFYEICuY=,github.com/go-openapi/jsonreference v0.19.5 h1:1WJP/wi4OjB4iV8KVbH73rQaoialJrqv8gitZLxGLtM=,github.com/go-openapi/swag v0.19.14 h1:gm3vOOXfiuw5i9p5N9xJvfjvuofpyvLA9Wr6QfK5Fng=,github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=,github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da h1:oI5xCqsCo564l8iNU+DwB5epxmsaqB+rhGL0m5jtYqE=,github.com/golang/protobuf v1.5.2 h1:ROPKBNFfQgOUMifHyP+KYbvpjbdoFNs+aK7DXlji0Tw=,github.com/google/gnostic v0.5.7-v3refs h1:FhTMOKj2VhjpouxvWJAV1TL304uMlb9zcDqkl6cEI54=,github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ=,github.com/google/gofuzz v1.2.0 h1:xRy4A+RhZaiKjJ1bPfwQ8sedCA+YS2YcCHW6ec7JMi0=,github.com/google/uuid v1.3.0 h1:t6JiXgmwXMjEs8VusXIJk2BXHsn+wx8BZdTaoZ5fu7I=,github.com/imdario/mergo v0.3.12 h1:b6R2BslTbIEToALKP7LxUvijTsNI9TAe80pLWN2g/HU=,github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY=,github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=,github.com/mailru/easyjson v0.7.6 h1:8yTIVnZgCoiM1TgqoeTl+LfU5Jg6/xL3QhGQnimLYnA=,github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 h1:I0XW9+e1XWDxdcEniV4rQAIOPUGDq67JSCiRCgGCZLI=,github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=,github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=,github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=,github.com/nxadm/tail v1.4.8
h1:nPr65rt6Y5JFSKQO7qToXr7pePgD6Gwiw05lkbyAQTE=,github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417 h1:3snG66yBm59tKhhSPQrQ/0bCrv1LQbKt40LnUPiUxdc=,github.com/openshift/api v0.0.0-20220209124712-b632c5fc10c0 h1:Jy6cKRjMOC4c2EzTfYWofQnwtt3eGVsti88KM0qZhfQ=,github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=,github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.57.0 h1:dslXhV7NbAFID2fh0ZLMjodbMYuitiJzDEpYNOoyRrg=,github.com/prometheus/client_golang v1.12.2 h1:51L9cDoUHVrXx4zWYlcLQIZ+d+VXHgqnYKkIuq4g/34=,github.com/prometheus/client_model v0.2.0 h1:uq5h0d+GuxiXLJLNABMgp2qUWDPiLvgCzz2dUR+/W/M=,github.com/prometheus/common v0.32.1 h1:hWIdL3N2HoUx3B8j3YN9mWor0qhY/NlEKZEaXxuIRh4=,github.com/prometheus/procfs v0.7.3 h1:4jVXhlkAyzOScmCkXBTOLRLTz8EeU+eyjrwB/EPq0VU=,github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=,github.com/seccomp/libseccomp-golang v0.9.2-0.20210429002308-3879420cc921 h1:58EBmR2dMNL2n/FnbQewK3D14nXr0V9CObDSvMJLq+Y=,github.com/sirupsen/logrus v1.8.1 h1:dJKuHgqk1NNQlqoA6BTlM1Wf9DOH3NBjQyu0h9+AZZE=,github.com/spf13/afero v1.8.0 h1:5MmtuhAgYeU6qpa7w7bP0dv6MBYuup0vekhSpSkoq60=,github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=,github.com/urfave/cli/v2 v2.8.1 h1:CGuYNZF9IKZY/rfBe3lJpccSoIY1ytfvmgQT90cNOl4=,github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 h1:bAn7/zixMGCfxrRTfdpNzjtPYqr8smhKouy9mxVdGPU=,golang.org/x/net v0.0.0-20220225172249-27dd8689420f h1:oA4XRj0qtSt8Yo1Zms0CUlsT3KG69V2UGQWPBxujDmc=,golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8 h1:RerP+noqYHUQ8CMRcPlC2nvTa4dcBIjegkuWdcUDuqg=,golang.org/x/sync v0.0.0-20210220032951-036812b2e83c h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ=,golang.org/x/sys v0.0.0-20220422013727-9388b58f7150 h1:xHms4gcpe1YE7A3yIllJXP16CMAGuqwO2lX1mTyyRRc=,golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 h1:JGgROgKl9N8DuW20oFS5gxc+lE67/N3FcwmBPMe7ArY=,golang.org/x/text v0.3.7 h1:olpwvP2KacW1ZWvsR7uQhoyTYvKAupfQrRGBFM352Gk=,golang.org/x/time v0.0.0-20220210224613-90d013bbcef8 h1:vVKdlvoWBphwdxWKrFZEuM0kGgGLxUOYcY4U/2Vjg44=,gomodules.xyz/jsonpatch/v2 v2.2.0 h1:4pT439QV83L+G9FkcCriY6EkpcK6r6bK+A5FBUMI7qY=,google.golang.org/genproto v0.0.0-20220304144024-325a89244dc8 h1:U9V52f6rAgINH7kT+musA1qF8kWyVOxzF8eYuOVuFwQ=,google.golang.org/grpc v1.47.0 h1:9n77onPX5F3qfFCqjy9dhn8PbNQsIKeVU04J9G7umt8=,google.golang.org/protobuf v1.28.0 h1:w43yiav+6bVFTBQFZX0r7ipe9JQ1QsbMgHwbBziscLw=,gopkg.in/inf.v0 v0.9.1 h1:73M5CoZyi3ZLMOyDlQh031Cx6N9NDJ2Vvfl76EDAgDc=,gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 h1:uRGJdciOHaEIrze2W8Q3AKkepLTh2hOroT7a+7czfdQ=,gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=,gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=,k8s.io/api v0.24.1 h1:BjCMRDcyEYz03joa3K1+rbshwh1Ay6oB53+iUx2H8UY=,k8s.io/apiextensions-apiserver v0.24.0 h1:JfgFqbA8gKJ/uDT++feAqk9jBIwNnL9YGdQvaI9DLtY=,k8s.io/apimachinery v0.24.1 h1:ShD4aDxTQKN5zNf8K1RQ2u98ELLdIW7jEnlO9uAMX/I=,k8s.io/client-go v0.24.1 h1:w1hNdI9PFrzu3OlovVeTnf4oHDt+FJLd9Ndluvnb42E=,k8s.io/component-base v0.24.0 h1:h5jieHZQoHrY/lHG+HyrSbJeyfuitheBvqvKwKHVC0g=,k8s.io/klog/v2 v2.60.1 h1:VW25q3bZx9uE3vvdL6M8ezOX79vA2Aq1nEWLqNQclHc=,k8s.io/kube-openapi v0.0.0-20220328201542-3ee0da9b0b42 h1:Gii5eqf+GmIEwGNKQYQClCayuJCe2/4fZUvF7VG99sU=,k8s.io/utils v0.0.0-20220210201930-3a6ce19ff2f9 h1:HNSDgDCrr/6Ly3WEGKZftiE7IY19Vz2GdbOCyI4qqhc=,sigs.k8s.io/controller-runtime v0.12.1 
h1:4BJY01xe9zKQti8oRjj/NeHKRXthf1YkYJAgLONFFoI=,sigs.k8s.io/gateway-api v0.4.1 h1:Tof9/PNSZXyfDuTTe1XFvaTlvBRE6bKq1kmV6jj6rQE=,sigs.k8s.io/json v0.0.0-20211208200746-9f7c6b3444d2 h1:kDi4JBNAsJWfz1aEXhO8Jg87JJaPNLh5tIzYHgStQ9Y=,sigs.k8s.io/release-utils v0.6.0 h1:wJDuzWJqPH4a5FAxAXE2aBvbB6UMIW7iYMhsKnIMQkA=,sigs.k8s.io/structured-merge-diff/v4 v4.2.1 h1:bKCqE9GvQ5tiVHn5rfn1r+yao3aLQEaLzkkmAkf+A6Y=,sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=" "gitCommit"="unknown" "gitCommitDate"="unknown" "gitTreeState"="clean" "goVersion"="1.18+nofips.git.4aa1efed4853ea067d665a952eee77c52faac774" "ldFlags"="-s -w -linkmode external -extldflags \"-static\" -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.buildDate=2022-06-08T01:51:09Z -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.version=0.4.3-dev" "libbpf"="none" "libseccomp"="2.5.2" "platform"="linux/amd64" "version"="0.4.3-dev"
I0706 07:56:46.180362 1 request.go:601] Waited for 1.047914024s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/config.openshift.io/v1?timeout=32s
I0706 07:56:47.832679 1 logr.go:261] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I0706 07:56:47.832941 1 logr.go:261] setup "msg"="registering webhooks"
I0706 07:56:47.833026 1 server.go:145] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-v1-pod-binding"
I0706 07:56:47.833094 1 server.go:145] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-v1-pod-recording"
I0706 07:56:47.833186 1 logr.go:261] setup "msg"="starting webhook"
I0706 07:56:47.833222 1 server.go:213] controller-runtime/webhook/webhooks "msg"="Starting webhook server"
I0706 07:56:47.833335 1 internal.go:362] "msg"="Starting server" "addr"={"IP":"::","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0706 07:56:47.833395 1 logr.go:261] controller-runtime/certwatcher "msg"="Updated current TLS certificate"
I0706 07:56:47.833432 1 leaderelection.go:248] attempting to acquire leader lease openshift-security-profiles/security-profiles-operator-webhook-lock...
I0706 07:56:47.833483 1 logr.go:261] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=9443
I0706 07:56:47.833522 1 logr.go:261] controller-runtime/certwatcher "msg"="Starting certificate watcher"
2022/07/06 07:56:49 http: TLS handshake error from 10.128.0.5:33804: remote error: tls: bad certificate
W0706 07:56:51.412496 1 reflector.go:324] k8s.io/client-go.1/tools/cache/reflector.go:167: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
E0706 07:56:51.412519 1 reflector.go:138] k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1alpha1.ProfileBinding: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
W0706 07:56:52.797294 1 reflector.go:324] k8s.io/client-go.1/tools/cache/reflector.go:167: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
E0706 07:56:52.797315 1 reflector.go:138] k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1alpha1.ProfileBinding: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
W0706 07:56:55.525644 1 reflector.go:324] k8s.io/client-go.1/tools/cache/reflector.go:167: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
E0706 07:56:55.525669 1 reflector.go:138] k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1alpha1.ProfileBinding: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
W0706 07:56:59.216581 1 reflector.go:324] k8s.io/client-go.1/tools/cache/reflector.go:167: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
E0706 07:56:59.216606 1 reflector.go:138] k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1alpha1.ProfileBinding: failed to list *v1alpha1.ProfileBinding: profilebindings.security-profiles-operator.x-k8s.io is forbidden: User "system:serviceaccount:openshift-security-profiles:spo-webhook" cannot list resource "profilebindings" in API group "security-profiles-operator.x-k8s.io" at the cluster scope
E0706 07:57:01.405000 1 binding.go:111] binding "msg"="could not list profile bindings" "error"="list profile bindings: Timeout: failed waiting for *v1alpha1.ProfileBinding Informer to sync"
E0706 07:57:03.799522 1 binding.go:111] binding "msg"="could not list profile bindings" "error"="list profile bindings: Timeout: failed waiting for *v1alpha1.ProfileBinding Informer to sync"
E0706 07:57:03.799568 1 binding.go:111] binding "msg"="could not list profile bindings" "error"="list profile bindings: Timeout: failed waiting for *v1alpha1.ProfileBinding Informer to sync"
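The "forbidden" messages in this log can be cross-checked against RBAC directly from the CLI; a minimal sketch, assuming the service account and API group named in the log (impersonation requires cluster-admin or equivalent):

# Ask the API server whether the webhook's service account may list
# ProfileBinding objects at the cluster scope
oc auth can-i list profilebindings.security-profiles-operator.x-k8s.io \
  --as=system:serviceaccount:openshift-security-profiles:spo-webhook

A "no" answer would confirm that the informer cannot sync because the required ClusterRole or ClusterRoleBinding is missing or points at the wrong subject.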
It looks like, for some reason, the RBAC rules were not deployed correctly. What confuses me is that, according to the InstallPlan in the must-gather you attached, they were deployed fine. I wonder if there is a race between when the webhook starts and when the RBAC resources are created, although the webhook deployments and MutatingWebhookConfigurations are both created by the operator itself, which should only be started after all the resources in the manifests are created.

I'll try to reproduce locally -- how often are you able to reproduce it yourself? The smoke test I ran just now didn't show anything out of the ordinary.

But the biggest point this bug raises is that I think we shouldn't enable the webhooks at all by default. The webhooks should only be selectively enabled for namespaces, at least initially (see the sketch below). I'll bring this up upstream.
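To illustrate the kind of opt-in scoping meant here, below is a sketch of a namespaceSelector on the binding webhook. The object name, label key, and failurePolicy are illustrative assumptions, not the operator's shipped configuration:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  # Hypothetical name, for illustration only
  name: spo-binding-webhook-example
webhooks:
- name: binding.spo.io
  # Only intercept pod creation in namespaces that opt in via a label,
  # rather than in every namespace in the cluster
  namespaceSelector:
    matchLabels:
      spo.x-k8s.io/enable-binding: "true"
  # Ignore (rather than block) pod creation if the webhook is unreachable
  failurePolicy: Ignore
  clientConfig:
    service:
      name: webhook-service
      namespace: openshift-security-profiles
      path: /mutate-v1-pod-binding
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  admissionReviewVersions: ["v1"]
  sideEffects: None

With either restriction in place, a workload in an unlabeled namespace (such as the catalog source pod in openshift-marketplace) would no longer fail to be created while the webhook service has no endpoints.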
Oh, I think this is fallout from moving the default installation namespace from "security-profiles-operator" to "openshift-security-profiles", which I apparently didn't test correctly:

- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    creationTimestamp: "2022-07-29T06:59:54Z"
    labels:
      app: security-profiles-operator
    name: spo-webhook
    resourceVersion: "127857"
    uid: 0cd9f059-354c-49ea-a547-f0f1efee7aa3
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: spo-webhook
  subjects:
  - kind: ServiceAccount
    name: spo-webhook
    namespace: security-profiles-operator   <---- should be openshift-security-profiles
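Until a build with the fix is available, one possible manual workaround suggested by this finding (an untested sketch; verify the binding name and subject index on your cluster first) is to repoint the subject at the correct namespace:

# Rewrite the subject namespace on the spo-webhook ClusterRoleBinding so it
# matches the namespace the webhook's service account actually lives in
oc patch clusterrolebinding spo-webhook --type=json \
  -p='[{"op": "replace", "path": "/subjects/0/namespace", "value": "openshift-security-profiles"}]'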
I pushed a patch downstream; let's see if it helps with the next build.
Update: this is now fixed downstream and I verified the fix.
Verification passed with 4.13.0-0.nightly-2022-12-20-174734 + security-profiles-operator-bundle-container-0.5.0-62, following the steps in the bug. The bug did not reproduce.

Note: the steps below are not the standard way to remove the operator. It is recommended to delete all seccompprofiles and selinuxprofiles before uninstalling the operator (see the cleanup sketch at the end of this comment).

1. Install the Security Profiles Operator from the console for the first time, keeping the default configuration. The console reports that the installation succeeded.

2. Check the operator status with oc:

$ oc get all -n openshift-security-profiles
NAME                                                     READY   STATUS    RESTARTS   AGE
pod/security-profiles-operator-759986f6df-7rd5f          1/1     Running   0          2m58s
pod/security-profiles-operator-759986f6df-gmmzr          1/1     Running   0          2m58s
pod/security-profiles-operator-759986f6df-wmqr5          1/1     Running   0          2m58s
pod/security-profiles-operator-webhook-d8b74c54d-bb8z2   1/1     Running   0          2m52s
pod/security-profiles-operator-webhook-d8b74c54d-bwxsl   1/1     Running   0          2m52s
pod/security-profiles-operator-webhook-d8b74c54d-z6h4p   1/1     Running   0          2m52s
pod/spod-2lqj2                                           3/3     Running   0          2m52s
pod/spod-4v2pz                                           3/3     Running   0          2m52s
pod/spod-6krlc                                           3/3     Running   0          2m52s
pod/spod-9km5g                                           3/3     Running   0          2m52s
pod/spod-ljcqn                                           3/3     Running   0          2m52s
pod/spod-xzhjt                                           3/3     Running   0          2m52s

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/metrics           ClusterIP   172.30.66.232   <none>        443/TCP   2m53s
service/webhook-service   ClusterIP   172.30.13.149   <none>        443/TCP   2m53s

NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/spod   6         6         6       6            6           kubernetes.io/os=linux   2m53s

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/security-profiles-operator           3/3     3            3           2m59s
deployment.apps/security-profiles-operator-webhook   3/3     3            3           2m53s

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/security-profiles-operator-759986f6df          3         3         3       2m59s
replicaset.apps/security-profiles-operator-webhook-d8b74c54d   3         3         3       2m53s

$ oc describe ds spod -n openshift-security-profiles
Name:           spod
Selector:       app=security-profiles-operator,name=spod
Node-Selector:  kubernetes.io/os=linux
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 6
Current Number of Nodes Scheduled: 6
Number of Nodes Scheduled with Up-to-date Pods: 6
Number of Nodes Scheduled with Available Pods: 6
Number of Nodes Misscheduled: 0
Pods Status:  6 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=security-profiles-operator
                    name=spod
  Annotations:      openshift.io/scc: privileged
  Service Account:  spod
  Init Containers:
   non-root-enabler:
    Image:      registry.redhat.io/compliance/openshift-security-profiles-rhel8-operator@sha256:2967bba1a86879d277fc626a82513c0226676b46d9aef4efe4352ade17e118da
    Port:       <none>
    Host Port:  <none>
    Args:
      non-root-enabler
    Limits:
      ephemeral-storage:  50Mi
      memory:             64Mi
    Requests:
      cpu:                100m
      ephemeral-storage:  10Mi
      memory:             32Mi
    Environment:
      SPO_VERBOSITY:  0
    Mounts:
      /opt/spo-profiles from operator-profiles-volume (ro)
      /var/lib from host-varlib-volume (rw)
      /var/run/secrets/metrics from metrics-cert-volume (rw)
   selinux-shared-policies-copier:
    Image:      registry.redhat.io/compliance/openshift-selinuxd-rhel8@sha256:ee7612f14997e4dc873f8744bef8f03569ad2f904280b06dc5382960f41cf8ec
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
      -c
    Args:
      set -x
      chown 65535:0 /etc/selinux.d
      chmod 750 /etc/selinux.d
      semodule -i /usr/share/selinuxd/templates/*.cil
      semodule -i /opt/spo-profiles/selinuxd.cil
      semodule -i /opt/spo-profiles/selinuxrecording.cil
    Limits:
      ephemeral-storage:  50Mi
      memory:             1Gi
    Requests:
      cpu:                100m
      ephemeral-storage:  10Mi
      memory:             32Mi
    Environment:
      SPO_VERBOSITY:  0
    Mounts:
      /etc/selinux from host-etcselinux-volume (rw)
      /etc/selinux.d from selinux-drop-dir (rw)
      /opt/spo-profiles from operator-profiles-volume (ro)
      /sys/fs/selinux from host-fsselinux-volume (rw)
      /var/lib/selinux from host-varlibselinux-volume (rw)
  Containers:
   security-profiles-operator:
    Image:      registry.redhat.io/compliance/openshift-security-profiles-rhel8-operator@sha256:2967bba1a86879d277fc626a82513c0226676b46d9aef4efe4352ade17e118da
    Port:       8085/TCP
    Host Port:  0/TCP
    Args:
      daemon
      --with-selinux=true
    Limits:
      ephemeral-storage:  200Mi
      memory:             128Mi
    Requests:
      cpu:                100m
      ephemeral-storage:  50Mi
      memory:             64Mi
    Liveness:   http-get http://:liveness-port/healthz delay=0s timeout=1s period=10s #success=1 #failure=1
    Startup:    http-get http://:liveness-port/healthz delay=0s timeout=1s period=3s #success=1 #failure=10
    Environment:
      NODE_NAME:            (v1:spec.nodeName)
      OPERATOR_NAMESPACE:   (v1:metadata.namespace)
      SPOD_NAME:            spod
      SPO_VERBOSITY:        0
    Mounts:
      /etc/selinux.d from selinux-drop-dir (rw)
      /tmp/security-profiles-operator-recordings from profile-recording-output-volume (rw)
      /var/lib/kubelet/seccomp/operator from host-operator-volume (rw)
      /var/run/grpc from grpc-server-volume (rw)
      /var/run/selinuxd from selinuxd-private-volume (rw)
   selinuxd:
    Image:      registry.redhat.io/compliance/openshift-selinuxd-rhel8@sha256:ee7612f14997e4dc873f8744bef8f03569ad2f904280b06dc5382960f41cf8ec
    Port:       <none>
    Host Port:  <none>
    Args:
      daemon
      --datastore-path
      /var/run/selinuxd/selinuxd.db
      --socket-path
      /var/run/selinuxd/selinuxd.sock
      --socket-uid
      0
      --socket-gid
      65535
    Limits:
      ephemeral-storage:  400Mi
      memory:             1Gi
    Requests:
      cpu:                100m
      ephemeral-storage:  200Mi
      memory:             512Mi
    Environment:
      SPO_VERBOSITY:  0
    Mounts:
      /etc/selinux from host-etcselinux-volume (rw)
      /etc/selinux.d from selinux-drop-dir (ro)
      /sys/fs/selinux from host-fsselinux-volume (rw)
      /var/lib/selinux from host-varlibselinux-volume (rw)
      /var/run/selinuxd from selinuxd-private-volume (rw)
   metrics:
    Image:      registry.redhat.io/openshift4/ose-kube-rbac-proxy@sha256:ac54cb8ff880a935ea3b4b1efc96d35bbf973342c450400d6417d06e59050027
    Port:       9443/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:9443
      --upstream=http://127.0.0.1:8080
      --v=10
      --tls-cert-file=/var/run/secrets/metrics/tls.crt
      --tls-private-key-file=/var/run/secrets/metrics/tls.key
    Limits:
      ephemeral-storage:  20Mi
      memory:             128Mi
    Requests:
      cpu:                50m
      ephemeral-storage:  10Mi
      memory:             32Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/metrics from metrics-cert-volume (ro)
  Volumes:
   host-varlib-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib
    HostPathType:  Directory
   host-operator-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/security-profiles-operator
    HostPathType:  DirectoryOrCreate
   operator-profiles-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      security-profiles-operator-profile
    Optional:  false
   selinux-drop-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   selinuxd-private-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   host-fsselinux-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/selinux
    HostPathType:  Directory
   host-etcselinux-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/selinux
    HostPathType:  Directory
   host-varlibselinux-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/selinux
    HostPathType:  Directory
   profile-recording-output-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/security-profiles-operator-recordings
    HostPathType:  DirectoryOrCreate
   host-auditlog-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/audit
    HostPathType:  DirectoryOrCreate
   host-syslog-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  DirectoryOrCreate
   metrics-cert-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-cert
    Optional:    false
   sys-kernel-debug-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/kernel/debug
    HostPathType:  Directory
   host-etc-osrelease-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/os-release
    HostPathType:  File
   tmp-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   grpc-server-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Events:
  Type    Reason            Age    From                  Message
  ----    ------            ----   ----                  -------
  Normal  SuccessfulCreate  3m31s  daemonset-controller  Created pod: spod-ljcqn
  Normal  SuccessfulCreate  3m31s  daemonset-controller  Created pod: spod-9km5g
  Normal  SuccessfulCreate  3m31s  daemonset-controller  Created pod: spod-6krlc
  Normal  SuccessfulCreate  3m31s  daemonset-controller  Created pod: spod-xzhjt
  Normal  SuccessfulCreate  3m31s  daemonset-controller  Created pod: spod-4v2pz
  Normal  SuccessfulCreate  3m31s  daemonset-controller  Created pod: spod-2lqj2

3. Check the catalogsource qe-app-registry status:

$ oc get catalogsource qe-app-registry -o yaml -n openshift-marketplace
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-12-21T02:15:02Z"
  generation: 1
  name: qe-app-registry
  namespace: openshift-marketplace
  resourceVersion: "200690"
  uid: bb49df4b-c8e0-4b3f-b844-2c21f8cf5681
spec:
  displayName: Production Operators
  image: quay.io/openshift-qe-optional-operators/aosqe-index:v4.13
  publisher: OpenShift QE
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 15m
status:
  connectionState:
    address: qe-app-registry.openshift-marketplace.svc:50051
    lastConnect: "2022-12-21T08:34:23Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2022-12-21T08:33:53Z"
  registryService:
    createdAt: "2022-12-21T02:15:02Z"
    port: "50051"
    protocol: grpc
    serviceName: qe-app-registry
    serviceNamespace: openshift-marketplace

4. Uninstall the Security Profiles Operator from the console; it reports that the uninstall succeeded.

5. Delete the SPO default namespace openshift-security-profiles:

$ oc delete ns openshift-security-profiles
namespace "openshift-security-profiles" deleted
6. Install the Security Profiles Operator manually with oc in another namespace:

$ oc apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: security-profiles-operator
  labels:
    pod-security.kubernetes.io/enforce: privileged
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: security-profiles-operator
  namespace: security-profiles-operator
spec:
  targetNamespaces:
  - security-profiles-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: security-profiles-operator-sub
  namespace: security-profiles-operator
spec:
  channel: release-alpha-rhel-8
  name: security-profiles-operator
  source: qe-app-registry
  sourceNamespace: openshift-marketplace
EOF
namespace/security-profiles-operator created
operatorgroup.operators.coreos.com/security-profiles-operator created
subscription.operators.coreos.com/security-profiles-operator-sub created

$ oc project security-profiles-operator
Now using project "security-profiles-operator" on server "https://api.xiyuan21-1.qe.devcluster.openshift.com:6443".

$ oc get ip
NAME            CSV                                 APPROVAL    APPROVED
install-nlwcc   security-profiles-operator.v0.5.0   Automatic   true

$ oc get csv
NAME                                DISPLAY                      VERSION   REPLACES   PHASE
security-profiles-operator.v0.5.0   Security Profiles Operator   0.5.0                Succeeded

$ oc get all -n security-profiles-operator
NAME                                                     READY   STATUS    RESTARTS   AGE
pod/security-profiles-operator-6976ccc7b8-6z7ql          1/1     Running   0          91s
pod/security-profiles-operator-6976ccc7b8-g6dvz          1/1     Running   0          91s
pod/security-profiles-operator-6976ccc7b8-kxm45          1/1     Running   0          91s
pod/security-profiles-operator-webhook-d8b74c54d-bhxhl   1/1     Running   0          86s
pod/security-profiles-operator-webhook-d8b74c54d-hwtn7   1/1     Running   0          86s
pod/security-profiles-operator-webhook-d8b74c54d-jnrpt   1/1     Running   0          86s
pod/spod-29w89                                           3/3     Running   0          86s
pod/spod-75b9r                                           3/3     Running   0          86s
pod/spod-7cfgn                                           3/3     Running   0          86s
pod/spod-d4bzw                                           3/3     Running   0          86s
pod/spod-mcvvj                                           3/3     Running   0          86s
pod/spod-zcb74                                           3/3     Running   0          86s

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/metrics           ClusterIP   172.30.198.95   <none>        443/TCP   86s
service/webhook-service   ClusterIP   172.30.51.136   <none>        443/TCP   86s

NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/spod   6         6         6       6            6           kubernetes.io/os=linux   86s

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/security-profiles-operator           3/3     3            3           93s
deployment.apps/security-profiles-operator-webhook   3/3     3            3           87s

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/security-profiles-operator-6976ccc7b8          3         3         3       92s
replicaset.apps/security-profiles-operator-webhook-d8b74c54d   3         3         3       87s
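As noted at the top of this comment, the standard removal flow deletes the profile objects before uninstalling the operator; a minimal cleanup sketch, assuming the default CRD names:

# Remove all seccomp and SELinux profiles in every namespace before
# uninstalling the operator, so nothing is left behind in the namespace
oc delete seccompprofiles --all --all-namespaces
oc delete selinuxprofiles --all --all-namespaces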
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Security Profiles Operator release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:8762