I'm working on this, will have an update shortly.
This issue is occurring because the helm-operator-1.3.0 base image does not have liveness and readiness probes enabled. You may have to test with the released 1.3.0 tag instead of building from master. Please test with:

* the binary extracted from the downstream image: registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-operator-sdk:v4.7.0-202101071845.p0 (or wherever you typically get test images)
* the v1.3.0 released binary: https://github.com/operator-framework/operator-sdk/releases/tag/v1.3.0
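For reference, one way to pull that binary out of the downstream image is `oc image extract`. This is only an illustrative sketch: the in-image path /usr/local/bin/operator-sdk is an assumption about the image layout, and you need registry access to that image.

```
# Illustrative only: extract the operator-sdk binary from the downstream image.
# The in-image path below is an assumption; adjust it to the actual layout.
mkdir -p ./downstream-bin
oc image extract \
  registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-operator-sdk:v4.7.0-202101071845.p0 \
  --path /usr/local/bin/operator-sdk:./downstream-bin
chmod +x ./downstream-bin/operator-sdk
./downstream-bin/operator-sdk version
```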
@Jia Fan, I tried running the operator with these 2 base images:

1. quay.io/operator-framework/helm-operator:v1.3.0
2. registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-helm-operator:v4.7.0-202101081525.p0

I'm not able to reproduce this error. After creating the helm-operator project, I ran:

1. make install
2. make deploy
3. Applied the CR

Output:

```
{"level":"info","ts":1610135963.618943,"logger":"cmd","msg":"Version","Go Version":"go1.15.5","GOOS":"linux","GOARCH":"amd64","helm-operator":"v1.3.0","commit":"1abf57985b43bf6a59dcd18147b3c574fa57d3f6"}
{"level":"info","ts":1610135963.6198597,"logger":"cmd","msg":"WATCH_NAMESPACE environment variable not set. Watching all namespaces.","Namespace":""}
I0108 19:59:24.670513 1 request.go:645] Throttling request took 1.034612949s, request: GET:https://172.30.0.1:443/apis/ingress.operator.openshift.io/v1?timeout=32s
{"level":"info","ts":1610135966.2819138,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"127.0.0.1:8080"}
{"level":"info","ts":1610135966.2827315,"logger":"helm.controller","msg":"Watching resource","apiVersion":"demo.my.domain/v1","kind":"Nginx","namespace":"","reconcilePeriod":"1m0s"}
I0108 19:59:26.283017 1 leaderelection.go:243] attempting to acquire leader lease nginx-operator-system/nginx-operator...
{"level":"info","ts":1610135966.2835867,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
I0108 19:59:43.699785 1 leaderelection.go:253] successfully acquired lease nginx-operator-system/nginx-operator
{"level":"info","ts":1610135983.700022,"logger":"controller-runtime.manager.controller.nginx-controller","msg":"Starting EventSource","source":"kind source: demo.my.domain/v1, Kind=Nginx"}
{"level":"info","ts":1610135983.8005197,"logger":"controller-runtime.manager.controller.nginx-controller","msg":"Starting Controller"}
{"level":"info","ts":1610135983.8005683,"logger":"controller-runtime.manager.controller.nginx-controller","msg":"Starting workers","worker count":4}
I0108 19:59:44.890134 1 request.go:645] Throttling request took 1.044493238s, request: GET:https://172.30.0.1:443/apis/image.openshift.io/v1?timeout=32s
```

I also modified the CR and applied it again to check if the manager is controlling the pods. Modified the replica count from 2 to 1:

```
➜ nginx-operator oc get pods -n default
NAME                            READY   STATUS        RESTARTS   AGE
nginx-sample-646f977b4f-6l67r   0/1     Terminating   0          63s
nginx-sample-646f977b4f-wzht4   1/1     Running       0          63s
```

Since the logs say `failed to install release: rendered manifests contain a resource that already exists`, maybe the role.yaml was not applied correctly. Can you try removing all the previous operator deployments and checking whether the error persists?
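If stale resources from an earlier deployment are the cause, a minimal cleanup sketch could look like the following. It assumes the default operator-sdk scaffolded Makefile targets and a CR named nginx-sample in the default namespace (both assumptions).

```
# Sketch only: remove previous deployments of the operator before retrying.
# Assumes the default scaffolded Makefile targets and the example CR name/namespace.
oc delete nginxes.demo.my.domain nginx-sample -n default   # delete the CR so the Helm release is uninstalled
make undeploy    # remove the operator Deployment, RBAC, namespace, etc.
make uninstall   # remove the CRDs
```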
Can you also attach the project which is giving the above error? It would be helpful to check whether we are missing something in the scaffold.
Varsha, I tested this with 3 versions of operator-sdk:

Version 1: operator-sdk version: "v1.3.0-8-g10fd87b", commit: "10fd87b3fd53198070a8f7c76acf06cc9d4cf84b", kubernetes version: "v1.19.4", go version: "go1.14", GOOS: "linux", GOARCH: "amd64" (the latest upstream master)

Version 2: operator-sdk version: "v4.7.0-202101071845.p0-dirty", commit: "4f9bb0e23307fe974ed4e8784bc3b527e8ebdac3", kubernetes version: "v1.18.8", go version: "go1.15.5", GOOS: "linux", GOARCH: "amd64" (copied from the downstream release image registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-operator-sdk:v4.7.0-202101071845.p0)

Version 3: operator-sdk version: "v1.3.0-5-g50fcdb8", commit: "50fcdb8dd189ffe00d34a077adfcbee7309d5fec", kubernetes version: "v1.19.4", go version: "go1.14", GOOS: "linux", GOARCH: "amd64" (downstream https://github.com/openshift/ocp-release-operator-sdk/pull/87)

Version 2 and version 3 generate the default rbac/role.yaml without "resources: serviceaccounts", so reconciling the CR is forbidden by the policy. Version 1 generates the default rbac/role.yaml with "resources: serviceaccounts", runs well, and reconciles the CR successfully, since the CR needs to create one serviceaccount.

1) The rbac/role.yaml generated by the version 1 operator-sdk (v1.3.0-8-g10fd87b):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
##
## Base operator rules
##
# We need to get namespaces so the operator can read namespaces to ensure they exist
- apiGroups:
  ........
##
## Rules for apps.my.domain/v1beta1, Kind: Nginx
##
- apiGroups:
  - apps.my.domain
  resources:
  - nginxes
  - nginxes/status
  - nginxes/finalizers
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- verbs:
  - "*"
  apiGroups:
  - ""
  resources:
  - "serviceaccounts"
  - "services"
- verbs:
  - "*"
  apiGroups:
  - "apps"
  resources:
  - "deployments"
# +kubebuilder:scaffold:rules

2) The rbac/role.yaml generated by the version 2 / version 3 operator-sdk:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
##
## Base operator rules
##
# We need to get namespaces so the operator can read namespaces to ensure they exist
- apiGroups:
  ......
##
## Rules for demo.my.domain/v1, Kind: Nginx
##
- apiGroups:
  - demo.my.domain
  resources:
  - nginxes
  - nginxes/status
  - nginxes/finalizers
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - services/finalizers
  - endpoints
  - persistentvolumeclaims
  - events
  - configmaps
  - secrets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  - daemonsets
  - replicasets
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
# +kubebuilder:scaffold:rules
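A quick, illustrative way to see the effect of the missing rule is to ask the API server what the manager's service account may do. The namespace and service-account names below are placeholders for whatever the manager Deployment actually runs as.

```
# Illustrative check: can the manager's service account create serviceaccounts?
# Replace the namespace/serviceaccount with the ones your manager Deployment uses.
oc auth can-i create serviceaccounts \
  --as=system:serviceaccount:nginx-operator-system:default
```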
@Jia Fan, yes, the error does occur if the right permission for the service account is not scaffolded. I locally checked out and tested the PR (https://github.com/openshift/ocp-release-operator-sdk/pull/87); the service account seems to have been scaffolded when run on OCP 4.7. `rbac/role.yaml` looks like this:

```
##
## Rules for demo.my.domain/v1, Kind: Nginx
##
- apiGroups:
  - demo.my.domain
  resources:
  - nginxes
  - nginxes/status
  - nginxes/finalizers
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- verbs:
  - "*"
  apiGroups:
  - ""
  resources:
  - "serviceaccounts"
  - "services"
- verbs:
  - "*"
  apiGroups:
  - "apps"
  resources:
  - "deployments"
# +kubebuilder:scaffold:rules
```

To be specific, the commit from the PR which I tested is `e534ae0bf45356506fc823f22667b36e572ee90e`. Was the version 2 commit (50fcdb8dd189ffe00d34a077adfcbee7309d5fec) from the downstream SDK (ocp-release-operator-sdk)? I am not able to find it in the logs.
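For anyone repeating this check, a rough sketch of building the SDK from the downstream PR follows; the `make build` target and the build/ output path are assumptions about that repository's layout.

```
# Sketch only: build operator-sdk from the downstream PR branch and check its version.
# "make build" and the build/ output directory are assumptions about the repo layout.
git clone https://github.com/openshift/ocp-release-operator-sdk.git
cd ocp-release-operator-sdk
git fetch origin pull/87/head:pr-87
git checkout pr-87
make build
./build/operator-sdk version
```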
@Jia Fan, updating the findings here. All three versions of the images seem to be right, as this is expected behavior. When `operator-sdk create api` is run for a helm-operator project:

1. If $KUBECONFIG is set, the resources required to run the helm chart in the cluster are scaffolded with the required permissions.
2. If $KUBECONFIG is not set, the user has to add the required permissions manually.

In this case, when not running against a cluster, we need to manually add `serviceaccounts` to role.yaml to provide the necessary permissions. This is expected behavior of the helm-operator scaffolder, so this is not a bug in the implementation, but rather a bug against the documentation: https://bugzilla.redhat.com/show_bug.cgi?id=1915095
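A minimal sketch of the manual step for case 2, assuming the default scaffold layout (config/rbac/role.yaml); the appended rule simply mirrors the serviceaccounts permission discussed above, and editing the file directly is of course equivalent.

```
# Sketch only: when scaffolding without a reachable cluster, add the serviceaccounts
# rule to the generated ClusterRole by hand. The path assumes the default scaffold layout.
cat >> config/rbac/role.yaml <<'EOF'
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
EOF
```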
Checked with the 3 operator-sdk versions from comment 6 after setting KUBECONFIG; all of them generate rbac/role.yaml with the "serviceaccounts" rules set. The operator runs well. The documentation bug tracks the remaining problem. Verified this bug.
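For the record, the verification flow amounts to roughly the following; the KUBECONFIG path and project directory are assumptions, while the group/version/kind match the example project.

```
# Sketch of the verification: scaffold with a reachable cluster, then confirm the
# serviceaccounts rule is present in the generated role.
export KUBECONFIG=$HOME/.kube/config   # path is an assumption
mkdir nginx-operator && cd nginx-operator
operator-sdk init --plugins helm --domain my.domain
operator-sdk create api --group demo --version v1 --kind Nginx
grep -n serviceaccounts config/rbac/role.yaml
```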
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633