Please provide the rendered template with `oc get pod -n test-prometheus prometheus-0`
To add color here, this is the result of a file permission error within the container: the user the container process is running as does not have permission on the working directory defined for the container image. The difference in behavior is caused by a change in user; it possibly ran as root in 3.11 and runs as non-root in 4.x.
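A quick way to confirm the user mismatch on 4.x is to compare the UID assigned by the restricted SCC with the image's own defaults; a minimal sketch, assuming the pod and namespace names from the comment above:

```shell
# UID the restricted SCC assigned to the container (a high, non-root UID on 4.x):
oc get pod -n test-prometheus prometheus-0 \
  -o jsonpath='{.spec.containers[0].securityContext.runAsUser}{"\n"}'

# Image metadata (default user, working dir, etc.), for comparison:
oc image info quay.io/prometheus/prometheus:v2.20.1
```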
I suspect this might have something to do with the Dockerfile that creates the image having a VOLUME at the WORKDIR, and some variation in how Docker and CRI-O handle that. https://github.com/prometheus/prometheus/blob/master/Dockerfile#L23
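If it helps, the pairing can be checked straight from the upstream repo (the raw URL below just mirrors the Dockerfile linked above):

```shell
# Show the WORKDIR and VOLUME directives the comment above refers to:
curl -s https://raw.githubusercontent.com/prometheus/prometheus/master/Dockerfile \
  | grep -nE 'WORKDIR|VOLUME'
```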
I could not recreate this with a simplified Prometheus pod using the same image:

```shell
$ cat prometheus.yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus
spec:
  containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus:v2.20.1
    command:
    - /bin/sh
    - "-c"
    - "echo 'this should be in the logs' && sleep 86400"
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-data-test-prometheus
  volumes:
  - name: prometheus-data-test-prometheus
    emptyDir: {}

$ oc get pod -oyaml prometheus
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.20"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.20"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted
  creationTimestamp: "2020-09-01T15:18:00Z"
  managedFields:
  ...
  name: prometheus
  namespace: demo2
  resourceVersion: "64710"
  selfLink: /api/v1/namespaces/demo2/pods/prometheus
  uid: e57a9224-7a34-4a30-874d-0c5918cd35c4
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - echo 'this should be in the logs' && sleep 86400
    image: quay.io/prometheus/prometheus:v2.20.1
    imagePullPolicy: IfNotPresent
    name: prometheus
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000580000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-data-test-prometheus
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-nbmv8
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-nbsvq
  nodeName: ip-10-0-175-173.us-west-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000580000
    seLinuxOptions:
      level: s0:c24,c14
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: prometheus-data-test-prometheus
  - name: default-token-nbmv8
    secret:
      defaultMode: 420
      secretName: default-token-nbmv8
status:
  conditions:
  ...
  containerStatuses:
  - containerID: cri-o://82038ebb37461109a07d963def3ebb8600df5bb5ff84fb5ef096efa4e69cb25c
    image: quay.io/prometheus/prometheus:v2.20.1
    imageID: quay.io/prometheus/prometheus@sha256:788260ebd13613456c168d2eed8290f119f2b6301af2507ff65908d979c66c17
    lastState: {}
    name: prometheus
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-09-01T15:18:02Z"
  hostIP: 10.0.175.173
  phase: Running
  podIP: 10.128.2.20
  podIPs:
  - ip: 10.128.2.20
  qosClass: BestEffort
  startTime: "2020-09-01T15:18:00Z"

$ oc get pod
NAME         READY   STATUS    RESTARTS   AGE
prometheus   1/1     Running   0          4m28s

$ oc get events
LAST SEEN   TYPE     REASON           OBJECT           MESSAGE
4m2s        Normal   Scheduled        pod/prometheus   Successfully assigned demo2/prometheus to ip-10-0-175-173.us-west-1.compute.internal
4m          Normal   AddedInterface   pod/prometheus   Add eth0 [10.128.2.20/23]
4m          Normal   Pulled           pod/prometheus   Container image "quay.io/prometheus/prometheus:v2.20.1" already present on machine
4m          Normal   Created          pod/prometheus   Created container prometheus
4m          Normal   Started          pod/prometheus   Started container prometheus
```

In the customer's situation, the volume mounted at /prometheus is a PVC vs. an emptyDir, so there might be something there.

If I go onto the node and check the SELinux labels:

```shell
# pwd
/var/lib/kubelet/pods/e57a9224-7a34-4a30-874d-0c5918cd35c4/volumes/kubernetes.io~empty-dir/prometheus-data-test-prometheus
# ls -alZ
total 0
drwxrwsrwx. 2 root 1000580000 system_u:object_r:container_file_t:s0:c14,c24  6 Sep  1 15:18 .
drwxr-xr-x. 3 root root       system_u:object_r:var_lib_t:s0                45 Sep  1 15:18 ..
```

So the SELinux labels and the chown to the runAsUser are working.
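For the customer's case, the analogous check would be against the PVC-backed path rather than the emptyDir; a sketch with placeholder values (the kubernetes.io~nfs plugin directory is an assumption based on an NFS-backed PV):

```shell
# Pod UID and PV name below are placeholders; adjust the plugin directory to
# match the actual volume type on the node.
ls -alZ /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/<pv-name>
```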
Forgot to mention in my previous comment: the recreate attempt without NFS was just to ensure that the baseline functionality works, which suggests the issue is likely related to this particular storage setup.
Sending to storage to validate the NFS storage configuration being used.
I haven't been able to reproduce with a simple PVC, supplemental groups, and the restricted SCC. First, I create a directory on a host to back a hostPath PV:

```shell
oc debug node/$nodename
sh-4.2# chroot /host
sh-4.4# mkdir -p /mnt/data
sh-4.4# touch /mnt/data/hello
sh-4.4# chgrp -R 7777 /mnt/data/
sh-4.4# chmod -R 2770 /mnt/data
sh-4.4# ls -la /mnt/data
total 0
drwxrws---. 2 root 7777 19 Sep 28 23:36 .
drwxr-xr-x. 3 root root 18 Sep 28 23:15 ..
-rwxrws---. 1 root 7777  0 Sep 28 23:36 hello
```

Then run `oc apply -f /tmp/yaml` on this YAML:

```
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  name: test-pod
  labels:
    name: test-pod
spec:
  restartPolicy: Never
  securityContext:
    supplementalGroups: [7777]
    runAsUser: 1000100000
  containers:
  - name: test
    image: quay.io/haircommander/centos-test:latest
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: task-pv-storage
      mountPath: /test
  nodeSelector:
    kubernetes.io/hostname: $nodename # from above
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: task-pv-claim
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```

It starts up just fine:

```shell
$ oc get pod/test-pod
NAME       READY   STATUS    RESTARTS   AGE
test-pod   1/1     Running   0          13s
```

Can someone provide me a fuller reproducer?
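Since the customer's volume is NFS-backed rather than a hostPath, a fuller reproducer would probably need an NFS PV in place of task-pv-volume; a minimal sketch, with the server and export path as placeholders:

```
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume-nfs
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    # placeholder export; whether the server squashes root may matter here
    server: nfs.example.com
    path: /exports/prometheus-data
```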
Unfortunately, until I can reproduce this, it will be hard to solve. I will try again next sprint.
The issue is coming from https://github.com/opencontainers/runc/commit/5e0e67d76cc99d76c8228d48f38f37034503f315 (which I believe was introduced in 4.5). This fails because runc attempts to chdir into a volume it has no access to (it is owned by, and in the group of, the container user, not root). The aforementioned commit changes the order of operations so that we chdir before switching to the correct user. I have attached the PR submitted upstream that restores the original order, hopefully satisfying both cases.
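As a rough illustration of why the order matters (not runc's actual code; the paths, UIDs, and root-squashing behaviour below are assumptions on my part):

```shell
# The PVC-backed workdir is only accessible to the container user/group, e.g.:
#   drwxrwx---. 1000580000 1000580000  /prometheus        (hypothetical)

# Order introduced by the commit above: enter the workdir while still running
# as root, then switch user. On a volume where root has no access (e.g. a
# root-squashed NFS export) the chdir fails and the container never starts:
cd /prometheus                                    # -> Permission denied

# Order before the commit (and restored by the upstream fix): switch to the
# container user first, then enter the workdir:
setpriv --reuid 1000580000 --regid 1000580000 --clear-groups \
  sh -c 'cd /prometheus && pwd'                   # -> /prometheus
```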
The fix has been merged; I have backported the relevant PR to our fork and triggered a build in Brew. It should land in the next 4.6.z release and in the initial 4.7 release.
If this BZ needs to be included in the 4.7 release notes as a bug fix, please enter Doc Text by EOB 2/12. Thank you, Michael
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633