Description of problem:
Running an executable that lives on an Azure File volume via `oc exec` returns exit code 139, or fails with "exec failed: container_linux.go:366: starting container process caused: interrupted system call".

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-18-214951

How reproducible:
Hit once in our CI; reproducible on a 4.7.0-0.nightly-2021-01-18-214951 ipi-on-azure-MAG-fully_private-NAT cluster.

Steps to Reproduce:
1. Create a new project
2. Create an Azure File PVC with dynamic provisioning
3. Create a pod using this PVC
4. Copy an executable file to this PVC
5. Run this file with `oc exec`

Actual results:
$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig -n kop2x exec mypod -- /mnt/iaas/hello
command terminated with exit code 139

$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig -n kop2x exec mypod -- /mnt/iaas/hello
time="2021-01-19T03:32:28Z" level=error msg="exec failed: container_linux.go:366: starting container process caused: interrupted system call"
command terminated with exit code 1

Running an executable that is not on the Azure File volume works:

$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig -n kop2x exec mypod -- /hello
Hello OpenShift Storage

Using `oc rsh` to get into the pod and execute the same file also succeeds:

$ oc -n kop2x --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig rsh mypod
sh-4.4$ df
Filesystem                                                                                            1K-blocks     Used Available Use% Mounted on
overlay                                                                                               133682156 13110772 120571384  10% /
tmpfs                                                                                                     65536        0     65536   0% /dev
tmpfs                                                                                                   4075872        0   4075872   0% /sys/fs/cgroup
shm                                                                                                       65536        0     65536   0% /dev/shm
tmpfs                                                                                                   4075872    44184   4031688   2% /etc/passwd
//cluster27fae.file.core.windows.net/storage-o2bf3-wph46-dy-pvc-b1a99c2f-1870-4434-98ad-21ed59687a2d   1048576     2368   1046208   1% /mnt/iaas
/dev/sda4                                                                                             133682156 13110772 120571384  10% /etc/hosts
tmpfs                                                                                                   4075872       28   4075844   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                                                                                   4075872        0   4075872   0% /proc/acpi
tmpfs                                                                                                   4075872        0   4075872   0% /proc/scsi
tmpfs                                                                                                   4075872        0   4075872   0% /sys/firmware
sh-4.4$ ls /mnt/iaas/
hello  testfile
sh-4.4$ /mnt/iaas/hello
Hello OpenShift Storage

Expected results:
The executable on the Azure File volume runs and prints its output ("Hello OpenShift Storage"), whether invoked via `oc exec` or from an `oc rsh` session.

Additional info:

$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_admin.kubeconfig get sc sc-kop2x -oyaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2021-01-19T03:27:15Z"
  managedFields:
  - apiVersion: storage.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:storageclass.beta.kubernetes.io/is-default-class: {}
      f:provisioner: {}
      f:reclaimPolicy: {}
      f:volumeBindingMode: {}
    manager: kubectl-create
    operation: Update
    time: "2021-01-19T03:27:15Z"
  name: sc-kop2x
  resourceVersion: "50356"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/sc-kop2x
  uid: cc8f5de7-9c49-4fea-a6c6-6006d4ba1a79
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
volumeBindingMode: Immediate

$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_admin.kubeconfig get pvc mypvc -n kop2x -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-file
  creationTimestamp: "2021-01-19T03:27:16Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    name: dynamic-pvc
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:pv.kubernetes.io/bind-completed: {}
          f:pv.kubernetes.io/bound-by-controller: {}
          f:volume.beta.kubernetes.io/storage-provisioner: {}
      f:spec:
        f:volumeName: {}
      f:status:
        f:accessModes: {}
        f:capacity:
          .: {}
          f:storage: {}
        f:phase: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-01-19T03:27:16Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:name: {}
      f:spec:
        f:accessModes: {}
        f:resources:
          f:requests:
            .: {}
            f:storage: {}
        f:storageClassName: {}
        f:volumeMode: {}
    manager: kubectl-create
    operation: Update
    time: "2021-01-19T03:27:16Z"
  name: mypvc
  namespace: kop2x
  resourceVersion: "50366"
  selfLink: /api/v1/namespaces/kop2x/persistentvolumeclaims/mypvc
  uid: b1a99c2f-1870-4434-98ad-21ed59687a2d
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: sc-kop2x
  volumeMode: Filesystem
  volumeName: pvc-b1a99c2f-1870-4434-98ad-21ed59687a2d
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound

$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_admin.kubeconfig get pod mypod -n kop2x -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.83"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.83"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted
  creationTimestamp: "2021-01-19T03:27:17Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:containers:
          k:{"name":"testpod"}:
            .: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":80,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/mnt/iaas"}:
                .: {}
                f:mountPath: {}
                f:name: {}
                f:subPath: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:initContainers:
          .: {}
          k:{"name":"init"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/mnt/iaas"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext:
          .: {}
          f:fsGroup: {}
          f:seLinuxOptions:
            f:level: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"dynamic"}:
            .: {}
            f:name: {}
            f:persistentVolumeClaim:
              .: {}
              f:claimName: {}
    manager: kubectl-create
    operation: Update
    time: "2021-01-19T03:27:17Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:k8s.v1.cni.cncf.io/network-status: {}
          f:k8s.v1.cni.cncf.io/networks-status: {}
    manager: multus
    operation: Update
    time: "2021-01-19T03:27:20Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:initContainerStatuses: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.128.2.83"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-01-19T03:27:21Z"
  name: mypod
  namespace: kop2x
  resourceVersion: "50411"
  selfLink: /api/v1/namespaces/kop2x/pods/mypod
  uid: ba2b7382-0696-402e-88bb-283616f937a5
spec:
  containers:
  - image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imagePullPolicy: IfNotPresent
    name: testpod
    ports:
    - containerPort: 80
      name: http-server
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1001050000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/iaas
      name: dynamic
      subPath: subpath
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-kz542
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-t6q28
  initContainers:
  - command:
    - sh
    - -c
    - mkdir /mnt/iaas/subpath; cp /hello /mnt/iaas/subpath/hello
    image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imagePullPolicy: IfNotPresent
    name: init
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1001050000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/iaas
      name: dynamic
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-kz542
      readOnly: true
  nodeName: storage-o2bf3-wph46-worker-centralus3-bvhs7
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1001050000
    seLinuxOptions:
      level: s0:c32,c29
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: dynamic
    persistentVolumeClaim:
      claimName: mypvc
  - name: default-token-kz542
    secret:
      defaultMode: 420
      secretName: default-token-kz542
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:20Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:21Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:21Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:17Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://45c00648738287b94e0c7ff761674824f1a4f4f9b2a3ea1bbf0614e4244f35a2
    image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imageID: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    lastState: {}
    name: testpod
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-01-19T03:27:21Z"
  hostIP: 10.0.1.5
  initContainerStatuses:
  - containerID: cri-o://12d39e0351c00b8e51e914d5eeaf5fdde2e911cb60fd48e14876b4ecce4d632b
    image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imageID: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    lastState: {}
    name: init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://12d39e0351c00b8e51e914d5eeaf5fdde2e911cb60fd48e14876b4ecce4d632b
        exitCode: 0
        finishedAt: "2021-01-19T03:27:20Z"
        reason: Completed
        startedAt: "2021-01-19T03:27:20Z"
  phase: Running
  podIP: 10.128.2.83
  podIPs:
  - ip: 10.128.2.83
  qosClass: BestEffort
  startTime: "2021-01-19T03:27:17Z"
Is the program you're using perhaps not handling EINTR gracefully?
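For context, "handling EINTR gracefully" means retrying an interrupted call instead of treating EINTR as a fatal error. A minimal Go sketch of such a wrapper around read(2) follows; the helper name and file path are illustrative only, and as later comments show, the EINTR in this bug actually comes from execve(2) itself, before the program's own code even runs.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// readRetry wraps read(2) in a retry-on-EINTR loop: if the call is
// interrupted by a signal before any data is transferred, try again.
func readRetry(fd int, buf []byte) (int, error) {
	for {
		n, err := unix.Read(fd, buf)
		if err != unix.EINTR {
			return n, err
		}
	}
}

func main() {
	// Path taken from the reproducer above; any readable file works.
	fd, err := unix.Open("/mnt/iaas/testfile", unix.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	buf := make([]byte, 4096)
	n, err := readRetry(fd, buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes\n", n)
}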
Hi Peter, I can't get the source code of the `hello` binary; I can only find the binary itself here: https://github.com/jhou1/hello-openshift/tree/master/bin. I tried two other executable files:

$ oc -n a6wlz exec mypod -- cat /mnt/iaas/test
#!/bin/sh
echo "Hello world!"

$ oc -n a6wlz exec mypod -- /mnt/iaas/test
Hello world!

$ oc -n a6wlz exec mypod -- /mnt/iaas/test1
time="2021-02-01T07:17:58Z" level=error msg="exec failed: container_linux.go:366: starting container process caused: interrupted system call"
command terminated with exit code 1

This is the source code for test1:

$ cat test1.go
package main

import "fmt"

func main() {
	fmt.Println("Hello world!")
}

$ go build test1.go
$ oc cp test1 a6wlz/mypod:/mnt/iaas
$ oc -n a6wlz exec mypod -- ls /mnt/iaas
hello  test  test1  testfile
> return 139

Exit code 139 means the process was killed by a signal: 139 - 128 = 11, i.e. SIGSEGV, so the binary crashed on startup rather than being OOM-killed (an OOM kill would normally surface as exit code 137, SIGKILL). Still, it may be worth reviewing and possibly raising the memory limit.
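The exit-code arithmetic can be checked with a minimal Go sketch (the value is hardcoded for illustration):

package main

import (
	"fmt"
	"syscall"
)

func main() {
	exitCode := 139 // as reported by `oc exec` above
	if exitCode > 128 {
		// Shell-style convention: exit codes above 128 mean
		// "terminated by signal (code - 128)".
		sig := syscall.Signal(exitCode - 128)
		// Prints: exit code 139 = 128 + signal 11 (segmentation fault)
		fmt.Printf("exit code %d = 128 + signal %d (%v)\n", exitCode, int(sig), sig)
	}
}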
As for "interrupted system call", the error comes from runc, and it seems that this might be caused by go 1.13 used to compile it. The remedy is to use go 1.14+.
We should be using Go 1.15 in 4.8 now, so I don't think this will be an issue there. Since this was originally targeted at 4.7, I'm moving it out of the release.
I was looking into this during the week. Apparently, the execve(2) syscall can actually return EINTR on Azure (when using a CIFS share). No one wraps execve(2) in a retry-on-EINTR loop (I checked the Go runtime and glibc). Nevertheless, we're going to do that in runc because of this bug. A WIP PR is https://github.com/opencontainers/runc/pull/3045; it should be merged soon, and then I'll backport it to the 1.0 branch. From looking at the code, the same issue exists in crun; the fix there is https://github.com/containers/crun/pull/702.
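In outline, the retry loop looks like the following minimal Go sketch. This is not the actual code from the runc PR above, just an illustration of the approach; the target path and argv values are hypothetical.

package main

import (
	"os"

	"golang.org/x/sys/unix"
)

// execveRetry re-issues execve(2) while it fails with EINTR. A successful
// Exec never returns, so the loop can only spin while the call keeps
// being interrupted.
func execveRetry(argv0 string, argv, envv []string) error {
	for {
		err := unix.Exec(argv0, argv, envv)
		if err != unix.EINTR {
			return err
		}
	}
}

func main() {
	// Hypothetical target: the binary on the CIFS-backed Azure File mount.
	if err := execveRetry("/mnt/iaas/hello", []string{"hello"}, os.Environ()); err != nil {
		panic(err)
	}
}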
The fix PR is merged and the backport PR (https://github.com/opencontainers/runc/pull/3074) is in review; the fix will be part of the runc 1.0.1 release.
Fixed in runc 1.0.1.
private-templates/functionality-testing/aos-4_12/ipi-on-azure/versioned-installer-MAG-fully_private_cluster-NAT

Fixed on 4.12.0-0.nightly-2022-09-02-194931:

$ oc exec mypod -- /hello
Hello OpenShift Storage
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399