Bug 1984781 - nginx pod CrashLoopBackOff for Helm-based Operators
Summary: nginx pod CrashLoopBackOff for Helm-based Operators
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Alex Dellapenta
QA Contact: Cuiping HUO
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-22 08:35 UTC by Cuiping HUO
Modified: 2022-01-12 20:24 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-12 20:24:21 UTC
Target Upstream Version:
Embargoed:


Description Cuiping HUO 2021-07-22 08:35:00 UTC
Description of problem:
After creating a Helm-based Operator by following https://sdk.operatorframework.io/docs/building-operators/helm/quickstart/, the nginx pod always goes into CrashLoopBackOff with a Permission denied error.

Version-Release number of selected component (if applicable):
$ operator-sdk version
operator-sdk version: "v1.8.0-ocp", commit: "016423f32c9757a10dc7a9e953818bc20ae3eba4", kubernetes version: "v1.20.2", go version: "go1.16.5", GOOS: "linux", GOARCH: "amd64"

cluster version: 4.8.0-0.nightly-2021-07-21-150743

How reproducible:
Always

Steps to Reproduce:
Follow the documentation at https://sdk.operatorframework.io/docs/building-operators/helm/quickstart/:
1. mkdir nginx-operator
   cd nginx-operator
   operator-sdk init --domain example.com --plugins helm
2. operator-sdk create api --group demo --version v1alpha1 --kind Nginx
3. make docker-build docker-push IMG=quay.io/cuipinghuo/nginx-operator-bundle:v0.0.2
4. make deploy IMG="quay.io/cuipinghuo/nginx-operator-bundle:v0.0.2"
5. oc apply -f config/samples/demo_v1alpha1_nginx.yaml
6. Check the nginx pod status.
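Step 6 can be scripted instead of checked by eye. This is a minimal sketch, not part of the quickstart: the helper name `check_nginx_sample` is hypothetical, and it assumes the sample CR was applied in `nginx-operator-system` (the namespace shown later in this report) and that the chart names its Deployment `nginx-sample`, as in the `oc get all` output under "Additional info". `oc rollout status` exits non-zero when the timeout expires, which is what happens while the pod is stuck in CrashLoopBackOff.

```shell
#!/bin/sh
# Hypothetical helper: wait for the nginx-sample Deployment created by the
# Helm-based operator to finish rolling out. Returns non-zero on timeout,
# e.g. while the pod is in CrashLoopBackOff.
check_nginx_sample() {
  ns="${1:-nginx-operator-system}"   # assumed namespace from `make deploy`
  limit="${2:-120s}"                 # how long to wait before giving up
  oc rollout status "deployment/nginx-sample" -n "$ns" --timeout="$limit"
}
```

For example, `check_nginx_sample || echo "nginx-sample did not become ready"`. If the operator has not reconciled the CR yet, the Deployment may not exist and the command fails immediately, so it is best run a few seconds after step 5.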

Actual results:
The nginx pod is in CrashLoopBackOff.

Expected results:
The nginx pod should be running

Additional info:
# oc get all -l "app.kubernetes.io/instance=nginx-sample"
NAME                                READY   STATUS             RESTARTS   AGE
pod/nginx-sample-7b99f754b9-qbmxv   0/1     CrashLoopBackOff   6          9m3s

NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/nginx-sample   ClusterIP   172.30.229.135   <none>        80/TCP    11m

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-sample   0/1     1            0           11m

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-sample-7b99f754b9   1         1         0       9m3s


# oc logs -f nginx-sample-7b99f754b9-qbmxv 
2021/07/22 08:25:18 [warn] 1#1: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
2021/07/22 08:25:18 [emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
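The two "user" directive warnings show that the nginx master process is not running as root, so it cannot create `/var/cache/nginx/client_temp` in the upstream `nginx` image. On OpenShift this typically means the pod was admitted under the `restricted` SCC and given an arbitrary UID from the namespace's range. The following is a hedged diagnostic sketch: the helper names are hypothetical and the default namespace is an assumption. It prints each nginx-sample pod's service account and the SCC recorded in the pod's `openshift.io/scc` annotation, plus the namespace UID range stored in the `openshift.io/sa.scc.uid-range` annotation.

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helper names). Requires a logged-in `oc`.

# Print name, service account, and admitting SCC for each nginx-sample pod.
# OpenShift records the SCC that admitted a pod in the openshift.io/scc annotation.
show_nginx_pod_security() {
  ns="${1:-nginx-operator-system}"   # assumed namespace
  oc get pod -n "$ns" -l app.kubernetes.io/instance=nginx-sample \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.serviceAccountName}{"\t"}{.metadata.annotations.openshift\.io/scc}{"\n"}{end}'
}

# Print the UID range that range-based SCCs such as restricted assign from.
show_namespace_uid_range() {
  ns="${1:-nginx-operator-system}"
  oc get namespace "$ns" \
    -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}{"\n"}'
}
```

If the first helper reports `restricted` for the crash-looping pod, the arbitrary-UID explanation above applies, and the service account column shows which account would need a different SCC.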

Comment 1 Venkat Ramaraju 2021-07-26 21:07:08 UTC
Hi,

Which Helm image are you pulling in your Dockerfile? Since the 4.8 image hasn't been officially released yet, I used quay.io/openshift/origin-helm-operator:4.8 in the Dockerfile.
With that, I'm not able to reproduce your error. 

$ oc get all -l "app.kubernetes.io/instance=nginx-sample"
NAME                                READY   STATUS    RESTARTS   AGE
pod/nginx-sample-7d765fb9cf-v5qk7   1/1     Running   0          85s

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/nginx-sample   ClusterIP   172.30.202.72   <none>        80/TCP    86s

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-sample   1/1     1            1           86s

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-sample-7d765fb9cf   1         1         1       86s


However, reading the logs you provided, the error seems to be related to permissions. Have you made any permission changes to the Dockerfile or the /etc/nginx/nginx.conf file?

Comment 2 Cuiping HUO 2021-07-27 06:24:08 UTC
Hello Venkat,
We use registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-helm-operator:v4.8 for testing before GA.
I also used quay.io/openshift/origin-helm-operator:4.8 to retest this scenario,
and the permission error still exists.
I did not change any other part of the Dockerfile (except the image) or the /etc/nginx/nginx.conf file.

# cat Dockerfile
# Build the manager binary
#FROM registry.redhat.io/openshift4/ose-helm-operator:v4.8
FROM quay.io/openshift/origin-helm-operator:4.8
ENV HOME=/opt/helm
COPY watches.yaml ${HOME}/watches.yaml
COPY helm-charts  ${HOME}/helm-charts
WORKDIR ${HOME}

Comment 3 Cuiping HUO 2021-07-29 06:47:39 UTC
https://access.redhat.com/solutions/3419001 describes a similar issue. Following the solution in that article, adding the anyuid SCC to the service account used by the nginx-sample pod solves this problem.


# oc adm policy add-scc-to-user anyuid system:serviceaccount:nginx-operator-system:default
securitycontextconstraints.security.openshift.io/anyuid added to: ["system:serviceaccount:nginx-operator-system:default"]
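The command above can also be written with the service-account shorthand `oc adm policy add-scc-to-user anyuid -z default -n nginx-operator-system`. The same access can be expressed declaratively as an RBAC Role that allows the `use` verb on the `anyuid` SCC, which is easier to keep under version control next to the operator manifests. This is a sketch under assumptions: the helper name and the `use-anyuid-scc` object names are hypothetical, and the namespace and service account default to the values used in this comment. Because SCCs are evaluated at pod admission, a pod that is already crash-looping has to be recreated to pick up the new SCC. Also note that `anyuid` lets the container run as the image's own user (root for `nginx:1.16.0`), so it trades some isolation for compatibility compared with an image built to run under an arbitrary UID.

```shell
#!/bin/sh
# Declarative sketch of the anyuid grant (hypothetical names).
# Creates a Role allowing `use` of the anyuid SCC and binds it to a service account.
grant_anyuid_via_rbac() {
  ns="${1:-nginx-operator-system}"   # namespace used in this comment
  sa="${2:-default}"                 # service account used in this comment
  oc apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-anyuid-scc
  namespace: ${ns}
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["anyuid"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: use-anyuid-scc
  namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: use-anyuid-scc
subjects:
- kind: ServiceAccount
  name: ${sa}
  namespace: ${ns}
EOF
}
```

For example, run `grant_anyuid_via_rbac`, then `oc delete pod -n nginx-operator-system -l app.kubernetes.io/instance=nginx-sample` so the ReplicaSet recreates the pod under the new SCC.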

# oc apply -f config/samples/demo_v1alpha1_nginx.yaml
nginx.demo.example.com/nginx-sample created

# oc get po
NAME                                                 READY   STATUS    RESTARTS   AGE
nginx-operator-controller-manager-67ccc6fd45-ttkn7   2/2     Running   0          106s
nginx-sample-7d765fb9cf-779m9                        1/1     Running   0          65s


# oc describe po nginx-sample-7d765fb9cf-779m9
Name:         nginx-sample-7d765fb9cf-779m9
Namespace:    nginx-operator-system
Priority:     0

Events:
  Type    Reason          Age    From                                                                Message
  ----    ------          ----   ----                                                                -------
  Normal  Scheduled       8m59s  default-scheduler                                                   Successfully assigned nginx-operator-system/nginx-sample-7d765fb9cf-779m9 to yangyang47-4-9x57k-worker-b-w9ksx.c.openshift-qe.internal
  Normal  AddedInterface  8m57s  multus                                                              Add eth0 [10.129.2.58/23] from openshift-sdn
  Normal  Pulled          8m57s  kubelet, yangyang47-4-9x57k-worker-b-w9ksx.c.openshift-qe.internal  Container image "nginx:1.16.0" already present on machine
  Normal  Created         8m57s  kubelet, yangyang47-4-9x57k-worker-b-w9ksx.c.openshift-qe.internal  Created container nginx
  Normal  Started         8m57s  kubelet, yangyang47-4-9x57k-worker-b-w9ksx.c.openshift-qe.internal  Started container nginx

Comment 4 Venkat Ramaraju 2021-07-29 15:01:09 UTC
Can this be closed now?

Comment 8 Alex Dellapenta 2021-08-06 21:25:07 UTC
I've discussed this with Jesus, so I am taking this BZ and moving it to the Documentation component. Moving to ON_QA for the following docs PR:

https://github.com/openshift/openshift-docs/pull/35034

