1934177 – knative-camel-operator CreateContainerError "container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Peter Hunt
QA Contact:	Weinan Liu
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1944312
Depends On:
Blocks:

Reported:	2021-03-02 16:28 UTC by Marek Schmidt
Modified:	2024-10-01 17:35 UTC
CC List:	8 users
Fixed In Version:	runc-1.0.0-84.rhaos4.6.git7116f03
Doc Type:	Bug Fix
Doc Text:	Cause: A change in the order in which runc sets up the workdir of a container.
	Consequence: Container creation errors occurred if the workdir wasn't owned by the user running runc.
	Fix: Update runc to attempt the chdir to the workdir multiple times, in case one attempt does not work.
	Result: Container creation succeeds regardless of whether the workdir is owned by the container user or the user running runc.
Clone Of:
Environment:
Last Closed:	2021-07-27 22:49:01 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	opencontainers runc pull 2894	0	None	open	libct/init_linux: retry chdir to fix EPERM	2021-04-06 17:53:21 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 22:51:31 UTC

Description Marek Schmidt 2021-03-02 16:28:51 UTC

Description of problem:

knative-camel-operator can be installed on 4.6.15, but fails on 4.6.18 and 4.7.0. 

The camel-controller-manager pod gets into CreateContainerError state, with the following error:

container create failed: time="2021-03-02T12:02:28Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

Version-Release number of selected component (if applicable):
4.6.18, 4.7.0

How reproducible:
Always

Steps to Reproduce:
1. On 4.6.18 or 4.7.0, install Serverless Operator from the Red Hat OperatorHub (currently 1.13.0)
2. Create KnativeServing in knative-serving namespace and KnativeEventing in knative-eventing namespace
3. Install "Knative Apache Camel Operator" (currently 0.18.0) from the community OperatorHub
4. Notice the operator installation fails, with the camel-controller-manager pod in CreateContainerError state.


Actual results:
camel-controller-manager pod is stuck in CreateContainerError with the following error:

container create failed: time="2021-03-02T12:02:28Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

Expected results:
camel-controller-manager pod starts up normally

Additional info:

This seems to be a regression in 4.6.18 compared to 4.6.15, as the same operator with the same image worked fine there.

Possibly introduced by https://github.com/projectatomic/runc/commit/e541951c107025363752afe4fb483d3b8d71addd  ?

Comment 1 Marek Schmidt 2021-03-02 16:34:32 UTC

The image used by the knative-camel-operator is gcr.io/knative-releases/knative.dev/eventing-camel/cmd/controller@sha256:874b498fc53ee5060c4f897c3fdf193a457d7c51c6ae6acc336d57518e848882

Comment 2 Marek Schmidt 2021-03-02 22:06:57 UTC

Specifically, the regression seems to be between 4.6.16 (which also works) and 4.6.17 (on which it fails with CreateContainerError).

Comment 3 Peter Hunt 2021-03-03 14:43:58 UTC

Ah, it seems you've hit the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1915397

Make sure the WORKDIR is accessible by the user the container runs as.

Comment 5 Marek Schmidt 2021-03-03 17:56:26 UTC

The image is an upstream image based on gcr.io/distroless/static:nonroot  

This affects any image based on gcr.io/distroless/static:nonroot that doesn't modify WORKDIR, e.g.

oc new-app quay.io/maschmid/helloworld:latest

which is just

FROM gcr.io/distroless/static:nonroot
ADD hello_world /hello_world
CMD ["/hello_world"]


This image works on 4.6.15, but doesn't on 4.7.0

Comment 6 Peter Hunt 2021-03-03 19:12:34 UTC

running
`id=$(podman pull -1 gcr.io/distroless/static:nonroot)` and then `podman inspect $id` returns:
[{
...
 "Config": {
            "User": "65532",
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
            ],
            "WorkingDir": "/home/nonroot"
        },
...
}]

So the WORKDIR is actually modified, just not by your Dockerfile.

I suspect that `oc new-app` is running this container as a random uid, and that uid is not 65532. I would recommend running as that user, or doing something similar to what CNV did to work around this issue:
```
RUN chgrp -R 0 /home/nonroot && \
    chmod -R g=u /home/nonroot
```
for whatever group your container ends up running as
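
To illustrate why the random uid matters, here is a minimal Python sketch of the classic owner/group/other check for entering a directory. It is a simplification that ignores capabilities and ACLs. The 65532:65532 ownership with mode 0700 for /home/nonroot and the uid 1000620000 are assumptions made for illustration, not values taken from this report. The sketch also assumes the container runs with gid 0, which is the group the chgrp in the workaround targets.

```python
import stat


def can_chdir(uid, gids, owner_uid, owner_gid, mode):
    """Return True if a process with uid/gids has search (x) permission on a directory.

    Simplified model: exactly one of the owner, group, or other classes applies.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


def apply_g_equals_u(mode):
    """Mimic `chmod g=u`: copy the owner permission bits into the group bits."""
    user_bits = (mode & stat.S_IRWXU) >> 3
    return (mode & ~stat.S_IRWXG) | user_bits


# Assumed state of /home/nonroot in the image: owner 65532, group 65532, mode 0700.
before = can_chdir(1000620000, {0}, 65532, 65532, 0o700)

# State after `chgrp -R 0` and `chmod -R g=u`: group 0, mode 0770.
after = can_chdir(1000620000, {0}, 65532, 0, apply_g_equals_u(0o700))

print(before, after)  # prints "False True"
```

Under these assumptions the arbitrary uid is denied before the workaround and admitted afterwards, because the group class now carries the same bits as the owner class.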

Comment 7 Peter Hunt 2021-03-03 19:23:57 UTC

podman pull -q gcr.io/distroless/static:nonroot
should be the first command

Comment 8 Peter Hunt 2021-03-09 21:47:50 UTC

Does the workaround work for you?

Comment 9 Marek Schmidt 2021-03-10 12:44:49 UTC

As we don't have direct control over the image, we're trying to work around this by setting "runAsUser: 65532" in the operator:

https://github.com/operator-framework/community-operators/pull/3262
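
The content of that PR is not reproduced in this report. As a rough sketch only, the Python below builds the kind of patch body that sets a pod-level runAsUser on a Deployment's pod template. The function name and the choice of the pod-level securityContext (rather than a per-container one) are assumptions for illustration.

```python
import json


def run_as_user_patch(uid):
    """Build a patch body that sets the pod-level runAsUser in a Deployment template."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "securityContext": {"runAsUser": uid},
                }
            }
        }
    }


# 65532 is the user the distroless nonroot image declares in its config.
print(json.dumps(run_as_user_patch(65532), indent=2))
```

Running as 65532 matches the image's declared user, so the container user can reach /home/nonroot.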

Comment 10 Peter Hunt 2021-03-15 17:00:34 UTC

That PR merged. Is that workaround sufficient, and can we close this?

Comment 11 Marek Schmidt 2021-03-19 08:19:05 UTC

Specifically for the knative-camel-operator the issue is fixed by our "workaround".

I'd leave it up to you if you want to track a general problem of making images based on gcr.io/distroless/static:nonroot "just work" on OpenShift like it did before 4.6.17.

(I'd consider this to be a serious regression, as this behavior can break applications when upgrading to a new OCP micro release, but I understand that it was an unfortunate tradeoff that had to be made to fix a different regression vs OCP 3.x)

Comment 12 Peter Hunt 2021-03-19 18:27:47 UTC

Yeah, I deem this to be an unfortunate trade-off. Since this behavior is more correct, the regression must be allowed to happen.

Comment 13 Peter Hunt 2021-04-06 17:53:24 UTC

I've had a change of heart. I believe we can fix this case because it *was* previously valid. I've attached the PR.
If it is accepted by upstream I will backport it to 4.5+
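
The idea named in the PR title ("retry chdir to fix EPERM") and in the Doc Text can be sketched as follows. This is a Python illustration under stated assumptions, not runc's Go code: enter_workdir, simulate, and the fake process state are all hypothetical names, and the real change may differ in detail.

```python
def enter_workdir(cwd, chdir, setup_user):
    """Try chdir before switching user and, if that is denied, once more afterwards."""
    pending = bool(cwd)
    if pending:
        try:
            chdir(cwd)
            pending = False
        except PermissionError:
            # The runtime's own user cannot enter cwd; retry as the container user.
            pass
    setup_user()
    if pending:
        # A failure here is final and propagates to the caller.
        chdir(cwd)


def simulate(allowed_user):
    """Run enter_workdir against a fake process where only allowed_user may enter cwd."""
    state = {"user": "runtime", "cwd": "/"}

    def fake_chdir(path):
        if state["user"] != allowed_user:
            raise PermissionError(path)
        state["cwd"] = path

    def fake_setup_user():
        state["user"] = "container"

    enter_workdir("/home/nonroot", fake_chdir, fake_setup_user)
    return state["cwd"]


# Works whether the runtime user or the container user is the one with access.
print(simulate("runtime"), simulate("container"))
```

If neither user can enter the directory, the second attempt raises and container creation would still fail, which matches the Doc Text's description of trying more than once rather than skipping the check.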

Comment 14 Peter Hunt 2021-04-06 17:54:56 UTC

*** Bug 1944312 has been marked as a duplicate of this bug. ***

Comment 15 Peter Hunt 2021-04-16 21:34:49 UTC

I have worked around the issue and submitted the patch to 4.5-4.8

Comment 24 errata-xmlrpc 2021-07-27 22:49:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
