Bug 1874057
| Summary: | Pod stuck in CreateContainerError - error msg="container_linux.go:348: starting container process caused \"chdir to cwd (\\\"/mount-point\\\") set in config.json failed: permission denied\"" | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andreas Nowak <anowak> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | CRI-O | QA Contact: | Weinan Liu <weinliu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | high | CC: | aos-bugs, dwalsh, eparis, hekumar, jokerman, jsafrane, kiyyappa, nagrawal, pehunt, sgarciam, sgordon, tsweeney, weinliu |
| Version: | 4.5 | Keywords: | ServiceDeliveryImpact |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | runc-1.0.0-82.rhaos4.6.git086e841.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:16:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1915397, 1915870 | ||
Comment 1
Seth Jennings
2020-08-31 15:15:22 UTC
To add color here: this is the result of a file permission error within the container. The user the container process runs as does not have permission on the working directory defined for the container image. The difference in behavior is caused by a change in user; possibly the container ran as root in 3.11 and as non-root in 4.x. I suspect this has something to do with the Dockerfile that builds the image declaring a VOLUME at the WORKDIR, and some difference in how Docker and CRI-O handle that (https://github.com/prometheus/prometheus/blob/master/Dockerfile#L23). I could not recreate the problem with a simplified prometheus pod using the same image:
```shell
$ cat prometheus.yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus
spec:
  containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus:v2.20.1
    command:
    - /bin/sh
    - "-c"
    - "echo 'this should be in the logs' && sleep 86400"
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-data-test-prometheus
  volumes:
  - name: prometheus-data-test-prometheus
    emptyDir: {}
```
```shell
$ oc get pod -oyaml prometheus
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.20"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.20"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted
  creationTimestamp: "2020-09-01T15:18:00Z"
  managedFields:
  ...
  name: prometheus
  namespace: demo2
  resourceVersion: "64710"
  selfLink: /api/v1/namespaces/demo2/pods/prometheus
  uid: e57a9224-7a34-4a30-874d-0c5918cd35c4
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - echo 'this should be in the logs' && sleep 86400
    image: quay.io/prometheus/prometheus:v2.20.1
    imagePullPolicy: IfNotPresent
    name: prometheus
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000580000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-data-test-prometheus
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-nbmv8
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-nbsvq
  nodeName: ip-10-0-175-173.us-west-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000580000
    seLinuxOptions:
      level: s0:c24,c14
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: prometheus-data-test-prometheus
  - name: default-token-nbmv8
    secret:
      defaultMode: 420
      secretName: default-token-nbmv8
status:
  conditions:
  ...
  containerStatuses:
  - containerID: cri-o://82038ebb37461109a07d963def3ebb8600df5bb5ff84fb5ef096efa4e69cb25c
    image: quay.io/prometheus/prometheus:v2.20.1
    imageID: quay.io/prometheus/prometheus@sha256:788260ebd13613456c168d2eed8290f119f2b6301af2507ff65908d979c66c17
    lastState: {}
    name: prometheus
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-09-01T15:18:02Z"
  hostIP: 10.0.175.173
  phase: Running
  podIP: 10.128.2.20
  podIPs:
  - ip: 10.128.2.20
  qosClass: BestEffort
  startTime: "2020-09-01T15:18:00Z"
```
```shell
$ oc get pod
NAME         READY   STATUS    RESTARTS   AGE
prometheus   1/1     Running   0          4m28s
$ oc get events
LAST SEEN   TYPE     REASON           OBJECT           MESSAGE
4m2s        Normal   Scheduled        pod/prometheus   Successfully assigned demo2/prometheus to ip-10-0-175-173.us-west-1.compute.internal
4m          Normal   AddedInterface   pod/prometheus   Add eth0 [10.128.2.20/23]
4m          Normal   Pulled           pod/prometheus   Container image "quay.io/prometheus/prometheus:v2.20.1" already present on machine
4m          Normal   Created          pod/prometheus   Created container prometheus
4m          Normal   Started          pod/prometheus   Started container prometheus
```
In the customer's (CU) situation, the volume mounted at /prometheus is a PVC rather than an emptyDir, so the difference may lie there.
If I go onto the node and check the SELinux labels:

```shell
# pwd
/var/lib/kubelet/pods/e57a9224-7a34-4a30-874d-0c5918cd35c4/volumes/kubernetes.io~empty-dir/prometheus-data-test-prometheus
# ls -alZ
total 0
drwxrwsrwx. 2 root 1000580000 system_u:object_r:container_file_t:s0:c14,c24 6 Sep 1 15:18 .
drwxr-xr-x. 3 root root system_u:object_r:var_lib_t:s0 45 Sep 1 15:18 ..
```
So the SELinux labels are applied correctly, and the directory's group is set to the pod's fsGroup/runAsUser ID (1000580000) with the setgid bit.
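Note that the pod's `seLinuxOptions` level is `s0:c24,c14` while the directory label shows `s0:c14,c24`. MCS categories form an unordered set, so these describe the same level. The POSIX sh helpers below are a hypothetical sketch (the names `normalize_mcs`, `same_mcs` and `mcs_from_context` are illustrative, not from any tool) that compare two levels while ignoring category order; they only handle comma-separated categories, not ranges such as `c0.c1023`.

```shell
#!/bin/sh
# Hypothetical helpers for comparing SELinux MCS levels.
# Assumption: levels look like "s0:c24,c14" (no category ranges).

# Print "sensitivity:sorted,categories" so order no longer matters.
normalize_mcs() (
  level=$1
  sens=${level%%:*}
  cats=
  case $level in
    *:*) cats=$(printf '%s\n' "${level#*:}" | tr ',' '\n' | sort | paste -s -d, -) ;;
  esac
  printf '%s:%s\n' "$sens" "$cats"
)

# Succeed (status 0) when both levels have the same sensitivity and category set.
same_mcs() {
  [ "$(normalize_mcs "$1")" = "$(normalize_mcs "$2")" ]
}

# Extract the level from a full context (user:role:type:level).
mcs_from_context() {
  printf '%s\n' "$1" | cut -d: -f4-
}
```

For example, `same_mcs s0:c24,c14 "$(mcs_from_context system_u:object_r:container_file_t:s0:c14,c24)"` succeeds for the labels shown above.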
Forgot to mention in my previous comment: the reproduction attempt without NFS was just to confirm that the baseline functionality works and that the problem is likely related to this particular storage setup. Sending to Storage to validate the NFS storage configuration being used.

I haven't been able to reproduce this with a simple PVC, supplemental groups, and the restricted SCC.

First, I create a mount point on the host:
```shell
oc debug node/$nodename
sh-4.2# chroot /host
sh-4.4# mkdir -p /mnt/data
sh-4.4# touch /mnt/data/hello
sh-4.4# chgrp -R 7777 /mnt/data/
sh-4.4# chmod -R 2770 /mnt/data
sh-4.4# ls -la /mnt/data
total 0
drwxrws---. 2 root 7777 19 Sep 28 23:36 .
drwxr-xr-x. 3 root root 18 Sep 28 23:15 ..
-rwxrws---. 1 root 7777 0 Sep 28 23:36 hello
```
Then I run `oc apply -f /tmp/yaml` on this YAML:

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  name: test-pod
  labels:
    name: test-pod
spec:
  restartPolicy: Never
  securityContext:
    supplementalGroups: [7777]
    runAsUser: 1000100000
  containers:
  - name: test
    image: quay.io/haircommander/centos-test:latest
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: task-pv-storage
      mountPath: /test
  nodeSelector:
    kubernetes.io/hostname: $nodename # replace with the node name used above
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: task-pv-claim
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```
and it starts up just fine
```shell
$ oc get pod/test-pod
NAME READY STATUS RESTARTS AGE
test-pod 1/1 Running 0 13s
```
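This setup works because of ordinary owner/group/other permission rules: /mnt/data is owned by root:7777 with mode 2770, and the pod runs as uid 1000100000 carrying supplemental group 7777, so it gets search permission through the group bits. The POSIX sh function below is a hypothetical sketch of that check (`can_search` is an illustrative name); it deliberately ignores capabilities such as CAP_DAC_OVERRIDE, ACLs and SELinux, and the primary group 0 used in the examples is an assumption.

```shell
#!/bin/sh
# can_search MODE OWNER GROUP UID [GID...]
# Hypothetical model of the classic DAC check for search (x) permission on a
# directory. MODE is octal (e.g. 2770); GIDs are the primary group followed by
# any supplementary groups. Prints "yes" or "no".
# Ignores capabilities (e.g. CAP_DAC_OVERRIDE), ACLs and SELinux.
can_search() (
  mode=$((0$1))          # leading 0 makes the shell parse MODE as octal
  owner=$2 group=$3 uid=$4
  shift 4
  if [ "$uid" -eq "$owner" ]; then
    bit=64               # owner class: only the owner bits apply (0100)
  else
    bit=1                # "other" class (0001), unless a group matches
    for g in "$@"; do
      if [ "$g" -eq "$group" ]; then
        bit=8            # group class (0010)
        break
      fi
    done
  fi
  if [ $((mode & bit)) -ne 0 ]; then echo yes; else echo no; fi
)
```

With the values above, `can_search 2770 0 7777 1000100000 0 7777` prints `yes`, while dropping the supplemental group (`can_search 2770 0 7777 1000100000 0`) prints `no`.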
Can someone provide a fuller reproducer?

Unfortunately, until I can reproduce it, this one will be hard to solve. I will try again next sprint.

The issue is coming from https://github.com/opencontainers/runc/commit/5e0e67d76cc99d76c8228d48f38f37034503f315 (which I believe was introduced in 4.5). This is failing because runc attempts to chdir to a volume it has no access to (the volume is owned by, and in the group of, the container user, not root). The aforementioned commit changes the order of operations so that runc performs the chdir before switching to the correct user. I have attached a PR submitted upstream that changes the order again, hopefully satisfying both cases.

The fix has been merged. I have backported the relevant PR to our fork and triggered a build in Brew. It should ship in the next 4.6.z release and in the initial 4.7 release.

If this BZ needs to be included in the 4.7 release notes as a bug fix, please enter Doc Text by EOB 2/12. Thank you, Michael

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
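The chdir ordering problem described in the comments can be modeled with a small toy in POSIX sh. This is a hypothetical sketch, not runc's code: identities are plain labels (`runtime` is the runtime before it switches user, `container` is the container user), and the working directory is modeled by the list of identities allowed to enter it. `after` switches user and then chdirs, `before` chdirs first (the behavior attributed above to commit 5e0e67d), and `retry` chdirs first and, on permission denied, retries after switching user. Treating `retry` as the shape of the merged fix, and treating "only the runtime can enter" as the case the earlier reordering targeted, are both assumptions.

```shell
#!/bin/sh
# Toy model of container-init ordering; NOT runc's implementation.

# try_chdir WHO ALLOWED... : succeed if WHO may enter the directory,
# otherwise fail (standing in for EACCES).
try_chdir() {
  who=$1
  shift
  for allowed in "$@"; do
    [ "$allowed" = "$who" ] && return 0
  done
  return 1
}

# start_container ORDER ALLOWED...
# ORDER: after | before | retry (see the lead-in for the meaning of each).
start_container() (
  order=$1
  shift
  ok=1
  case $order in
    after)  try_chdir container "$@" && ok=0 ;;
    before) try_chdir runtime "$@" && ok=0 ;;
    retry)  { try_chdir runtime "$@" || try_chdir container "$@"; } && ok=0 ;;
    *)      echo "unknown order: $order" >&2; exit 2 ;;
  esac
  if [ "$ok" -eq 0 ]; then
    echo "started"
  else
    echo "chdir to cwd failed: permission denied"
  fi
  exit "$ok"
)
```

For a volume only the container user can enter (this bug), `start_container before container` fails while `start_container retry container` starts; for a directory only the runtime can enter, `after` fails and `retry` still starts, which is why trying both orders covers both cases.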