Description of problem:
knative-camel-operator can be installed on 4.6.15, but fails on 4.6.18 and 4.7.0. The camel-controller-manager pod gets into CreateContainerError state, with the following error:

container create failed: time="2021-03-02T12:02:28Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

Version-Release number of selected component (if applicable):
4.6.18, 4.7.0

How reproducible:
Always

Steps to Reproduce:
1. On 4.6.18 or 4.7.0, install the Serverless Operator from the Red Hat OperatorHub (currently 1.13.0)
2. Create KnativeServing in the knative-serving namespace and KnativeEventing in the knative-eventing namespace
3. Install the "Knative Apache Camel Operator" (currently 0.18.0) from the community OperatorHub
4. Notice the operator installation fails, with the camel-controller-manager pod in CreateContainerError state

Actual results:
camel-controller-manager pod is stuck in CreateContainerError with the following error:

container create failed: time="2021-03-02T12:02:28Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

Expected results:
camel-controller-manager pod starts up normally

Additional info:
This seems to be a regression in 4.6.18 compared to 4.6.15, as the same operator with the same image worked fine there. Possibly introduced by https://github.com/projectatomic/runc/commit/e541951c107025363752afe4fb483d3b8d71addd ?
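Not part of the original report, but a rough sketch of how the failing pod from the steps above can be inspected; the namespace and pod name below are placeholders, not values taken from this bug:

```
# Find where the camel-controller-manager pod ended up (the namespace varies by install).
oc get pods --all-namespaces | grep camel-controller-manager

# Show the pod events carrying the CreateContainerError details
# (substitute the namespace and full pod name found above).
oc describe pod <camel-controller-manager-pod> -n <namespace>

# The events should contain the runtime error quoted above, e.g.
#   chdir to cwd ("/home/nonroot") set in config.json failed: permission denied
```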
The image used by the knative-camel-operator is gcr.io/knative-releases/knative.dev/eventing-camel/cmd/controller@sha256:874b498fc53ee5060c4f897c3fdf193a457d7c51c6ae6acc336d57518e848882
Specifically, it seems the regression is between 4.6.16 (which also works) and 4.6.17 (on which it fails with CreateContainerError).
Ah, it seems you've hit the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1915397. Make sure the WORKDIR is accessible by the user the container runs as.
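As a quick check (not from the original thread), the working directory and user an image declares can be read straight from its config; the image reference below is just the controller image quoted earlier:

```
# Pull the controller image referenced above and print its configured WorkingDir and User.
img=gcr.io/knative-releases/knative.dev/eventing-camel/cmd/controller@sha256:874b498fc53ee5060c4f897c3fdf193a457d7c51c6ae6acc336d57518e848882
podman pull -q "$img"
podman inspect --format '{{.Config.WorkingDir}} (user {{.Config.User}})' "$img"
# For the image in this bug, WorkingDir should come back as /home/nonroot,
# matching the path in the error message.
```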
The image is an upstream image based on gcr.io/distroless/static:nonroot. This affects any image based on gcr.io/distroless/static:nonroot that doesn't modify WORKDIR, e.g.

oc new-app quay.io/maschmid/helloworld:latest

which is just

```
FROM gcr.io/distroless/static:nonroot
ADD hello_world /hello_world
CMD ["/hello_world"]
```

This image works on 4.6.15, but doesn't on 4.7.0.
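A speculative local reproduction, under the assumption that a podman host whose runtime carries the stricter chdir check behaves like the 4.6.17+ nodes; the uid is arbitrary, mimicking the random uid OpenShift assigns under the restricted SCC:

```
# Run the quoted image as an arbitrary non-65532 uid; with a runtime that enforces
# the chdir-to-cwd check, container creation should fail the same way as on the cluster.
podman run --rm --user 1000650000:0 quay.io/maschmid/helloworld:latest
# Expected failure (approximately):
#   chdir to cwd ("/home/nonroot") set in config.json failed: permission denied
```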
Running `id=$(podman pull -1 gcr.io/distroless/static:nonroot)` and then `podman inspect $id` returns:

[{
  ...
  "Config": {
    "User": "65532",
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
    ],
    "WorkingDir": "/home/nonroot"
  },
  ...
}]

So the WORKDIR is actually modified, just not by your Dockerfile. I am suspicious that running `oc new-app` is running this container as a random uid, and that uid is not 65532. I would recommend running as that user, or doing something similar to what CNV did to work around this issue:

```
RUN chgrp -R 0 /home/nonroot && \
    chmod -R g=u /home/nonroot
```

for whatever group your container ends up running as.
podman pull -q gcr.io/distroless/static:nonroot should be the first command
Does the workaround work for you?
As we don't have direct control over the image, we're trying to work around this by setting "runAsUser: 65532" in the operator: https://github.com/operator-framework/community-operators/pull/3262
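Purely as an illustration of what that workaround does (the actual change lives in the linked PR; the deployment name and namespace below are assumptions, and whether the uid is permitted depends on the SCC the pod runs under):

```
# Force the controller pod to run as the uid the distroless image expects.
oc patch deployment camel-controller-manager -n knative-sources --type merge \
  -p '{"spec":{"template":{"spec":{"securityContext":{"runAsUser":65532}}}}}'
```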
That PR has merged. Is that workaround sufficient / can we close this?
Specifically for the knative-camel-operator, the issue is fixed by our "workaround". I'd leave it up to you whether you want to track the general problem of making images based on gcr.io/distroless/static:nonroot "just work" on OpenShift like they did before 4.6.17. (I'd consider this a serious regression, as this behavior can cause applications to break when upgrading to a new OCP micro release, but I understand that was an unfortunate tradeoff that had to be made to fix a different regression vs OCP 3.x.)
Yeah, I deem this to be an unfortunate trade-off. Since the new behavior is more correct, the regression has to be allowed to happen.
I've had a change of heart. I believe we can fix this case because it *was* previously valid. I've attached the PR. If it is accepted by upstream I will backport it to 4.5+
*** Bug 1944312 has been marked as a duplicate of this bug. ***
I have worked around the issue and submitted the patch to 4.5-4.8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438