Bug 1917662 - oc exec cmd run executed file in azure file volume return 139 or exec failed: container_linux.go:366: starting container process caused: interrupted system call
Summary: oc exec cmd run executed file in azure file volume return 139 or exec failed:...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.12.0
Assignee: Kir Kolyshkin
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-01-19 05:34 UTC by Qin Ping
Modified: 2023-01-17 22:17 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:46:21 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:46:40 UTC

Description Qin Ping 2021-01-19 05:34:03 UTC
Description of problem:
oc exec cmd run executed file in azure file volume return 139 or exec failed: container_linux.go:366: starting container process caused: interrupted system call

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-18-214951

How reproducible:
Hit once in our ci, and can be reproduced with 4.7.0-0.nightly-2021-01-18-214951 ipi-on-azure-MAG-fully_private-NAT cluster

Steps to Reproduce:
1. Create a new project
2. Create an azure file PVC with dynamic provisioning
3. Create a pod using this PVC
4. Copy an executable file this PVC
5. Run this file with oc exec cmd

Actual results:
$ oc   --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig -n kop2x exec mypod  -- /mnt/iaas/hello
command terminated with exit code 139

$ oc   --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig -n kop2x exec mypod  -- /mnt/iaas/hello
time="2021-01-19T03:32:28Z" level=error msg="exec failed: container_linux.go:366: starting container process caused: interrupted system call"
command terminated with exit code 1

Run an executable file not in azure file volume:
$ oc   --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig -n kop2x exec mypod  -- /hello
Hello OpenShift Storage


Tried to rsh into this pod and execute this file, it succeeded.
$ oc -n kop2x --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_testuser-0.kubeconfig rsh mypod
sh-4.4$ df
Filesystem                                                                                           1K-blocks     Used Available Use% Mounted on
overlay                                                                                              133682156 13110772 120571384  10% /
tmpfs                                                                                                    65536        0     65536   0% /dev
tmpfs                                                                                                  4075872        0   4075872   0% /sys/fs/cgroup
shm                                                                                                      65536        0     65536   0% /dev/shm
tmpfs                                                                                                  4075872    44184   4031688   2% /etc/passwd
//cluster27fae.file.core.windows.net/storage-o2bf3-wph46-dy-pvc-b1a99c2f-1870-4434-98ad-21ed59687a2d   1048576     2368   1046208   1% /mnt/iaas
/dev/sda4                                                                                            133682156 13110772 120571384  10% /etc/hosts
tmpfs                                                                                                  4075872       28   4075844   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                                                                                  4075872        0   4075872   0% /proc/acpi
tmpfs                                                                                                  4075872        0   4075872   0% /proc/scsi
tmpfs                                                                                                  4075872        0   4075872   0% /sys/firmware
sh-4.4$ ls /mnt/iaas/
hello  testfile
sh-4.4$ /mnt/iaas/hello 
Hello OpenShift Storage


Expected results:

Additional info:
$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_admin.kubeconfig get sc sc-kop2x -oyaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2021-01-19T03:27:15Z"
  managedFields:
  - apiVersion: storage.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:storageclass.beta.kubernetes.io/is-default-class: {}
      f:provisioner: {}
      f:reclaimPolicy: {}
      f:volumeBindingMode: {}
    manager: kubectl-create
    operation: Update
    time: "2021-01-19T03:27:15Z"
  name: sc-kop2x
  resourceVersion: "50356"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/sc-kop2x
  uid: cc8f5de7-9c49-4fea-a6c6-6006d4ba1a79
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
volumeBindingMode: Immediate

$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_admin.kubeconfig get pvc mypvc -n kop2x -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-file
  creationTimestamp: "2021-01-19T03:27:16Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    name: dynamic-pvc
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:pv.kubernetes.io/bind-completed: {}
          f:pv.kubernetes.io/bound-by-controller: {}
          f:volume.beta.kubernetes.io/storage-provisioner: {}
      f:spec:
        f:volumeName: {}
      f:status:
        f:accessModes: {}
        f:capacity:
          .: {}
          f:storage: {}
        f:phase: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-01-19T03:27:16Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:name: {}
      f:spec:
        f:accessModes: {}
        f:resources:
          f:requests:
            .: {}
            f:storage: {}
        f:storageClassName: {}
        f:volumeMode: {}
    manager: kubectl-create
    operation: Update
    time: "2021-01-19T03:27:16Z"
  name: mypvc
  namespace: kop2x
  resourceVersion: "50366"
  selfLink: /api/v1/namespaces/kop2x/persistentvolumeclaims/mypvc
  uid: b1a99c2f-1870-4434-98ad-21ed59687a2d
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: sc-kop2x
  volumeMode: Filesystem
  volumeName: pvc-b1a99c2f-1870-4434-98ad-21ed59687a2d
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound

$ oc --kubeconfig=/home/piqin/workdir/preserve-storage-server1-piqin/ocp4_admin.kubeconfig get pod mypod -n kop2x -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.83"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.128.2.83"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted
  creationTimestamp: "2021-01-19T03:27:17Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:containers:
          k:{"name":"testpod"}:
            .: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":80,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/mnt/iaas"}:
                .: {}
                f:mountPath: {}
                f:name: {}
                f:subPath: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:initContainers:
          .: {}
          k:{"name":"init"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/mnt/iaas"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext:
          .: {}
          f:fsGroup: {}
          f:seLinuxOptions:
            f:level: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"dynamic"}:
            .: {}
            f:name: {}
            f:persistentVolumeClaim:
              .: {}
              f:claimName: {}
    manager: kubectl-create
    operation: Update
    time: "2021-01-19T03:27:17Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:k8s.v1.cni.cncf.io/network-status: {}
          f:k8s.v1.cni.cncf.io/networks-status: {}
    manager: multus
    operation: Update
    time: "2021-01-19T03:27:20Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:initContainerStatuses: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.128.2.83"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-01-19T03:27:21Z"
  name: mypod
  namespace: kop2x
  resourceVersion: "50411"
  selfLink: /api/v1/namespaces/kop2x/pods/mypod
  uid: ba2b7382-0696-402e-88bb-283616f937a5
spec:
  containers:
  - image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imagePullPolicy: IfNotPresent
    name: testpod
    ports:
    - containerPort: 80
      name: http-server
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1001050000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/iaas
      name: dynamic
      subPath: subpath
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-kz542
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-t6q28
  initContainers:
  - command:
    - sh
    - -c
    - mkdir /mnt/iaas/subpath; cp /hello /mnt/iaas/subpath/hello
    image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imagePullPolicy: IfNotPresent
    name: init
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1001050000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/iaas
      name: dynamic
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-kz542
      readOnly: true
  nodeName: storage-o2bf3-wph46-worker-centralus3-bvhs7
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1001050000
    seLinuxOptions:
      level: s0:c32,c29
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: dynamic
    persistentVolumeClaim:
      claimName: mypvc
  - name: default-token-kz542
    secret:
      defaultMode: 420
      secretName: default-token-kz542
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:20Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:21Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:21Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-01-19T03:27:17Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://45c00648738287b94e0c7ff761674824f1a4f4f9b2a3ea1bbf0614e4244f35a2
    image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imageID: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    lastState: {}
    name: testpod
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-01-19T03:27:21Z"
  hostIP: 10.0.1.5
  initContainerStatuses:
  - containerID: cri-o://12d39e0351c00b8e51e914d5eeaf5fdde2e911cb60fd48e14876b4ecce4d632b
    image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    imageID: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    lastState: {}
    name: init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://12d39e0351c00b8e51e914d5eeaf5fdde2e911cb60fd48e14876b4ecce4d632b
        exitCode: 0
        finishedAt: "2021-01-19T03:27:20Z"
        reason: Completed
        startedAt: "2021-01-19T03:27:20Z"
  phase: Running
  podIP: 10.128.2.83
  podIPs:
  - ip: 10.128.2.83
  qosClass: BestEffort
  startTime: "2021-01-19T03:27:17Z"

Comment 2 Peter Hunt 2021-01-29 19:25:32 UTC
is the program you're using not gracefully handle EINTR?

Comment 3 Qin Ping 2021-02-01 07:23:19 UTC
Hi Peter,

I can't get the source code of `hello` binary file, only can find the binary file from here: https://github.com/jhou1/hello-openshift/tree/master/bin.

I tried another two executable files:
$ oc -n a6wlz exec mypod -- cat /mnt/iaas/test
#!/bin/sh
echo "Hello world!"
$ oc -n a6wlz exec mypod -- /mnt/iaas/test
Hello world!

$ oc -n a6wlz exec mypod -- /mnt/iaas/test1
time="2021-02-01T07:17:58Z" level=error msg="exec failed: container_linux.go:366: starting container process caused: interrupted system call"
command terminated with exit code 1

This is the source code for test1:
$ cat test1.go 
package main

import "fmt"

func main() {
	fmt.Println("Hello world!")
}

go build test1.go

$ oc cp test1 a6wlz/mypod:/mnt/iaas
$ oc -n a6wlz exec mypod -- ls /mnt/iaas
hello
test
test1
testfile

Comment 6 Kir Kolyshkin 2021-05-21 18:24:32 UTC
> return 139

This means the process was killed with signal 9, most probably as a result of OOM kill.

I suggest to review and maybe raise the memory limit.

Comment 7 Kir Kolyshkin 2021-05-21 18:25:43 UTC
As for "interrupted system call", the error comes from runc, and it seems that this might be caused by go 1.13 used to compile it. The remedy is to use go 1.14+.

Comment 8 Peter Hunt 2021-06-09 14:25:52 UTC
we should be using go 1.15 in 4.8 now, so I don't think this should be an issue there. Since this was originally targeted at 4.7, I'm moving this out of the release.

Comment 9 Kir Kolyshkin 2021-07-02 01:55:33 UTC
Was looking into it this week. Apparently, execve(2) syscall can actually return EINTR on Azure (when using CIFS share).

No one ever wraps execve(2) into retry-on-eintr loop (I checked golang runtime and glibc).

Nevertheless, we're going to do that in runc, because of this bug.

A WIP PR is https://github.com/opencontainers/runc/pull/3045, should be merged soon and then I'll backport to 1.0 branch.

From looking at the code, the same issue exists in crun, the fix is in https://github.com/containers/crun/pull/702.

Comment 10 Kir Kolyshkin 2021-07-09 20:03:39 UTC
Fix PR is merged, backport PR (https://github.com/opencontainers/runc/pull/3074) in review, the fix will be part of runc 1.0.1 release.

Comment 12 Kir Kolyshkin 2022-01-28 23:08:02 UTC
Fixed in runc 1.0.1

Comment 14 Weinan Liu 2022-09-05 08:08:01 UTC
private-templates/functionality-testing/aos-4_12/ipi-on-azure/versioned-installer-MAG-fully_private_cluster-NAT
Fixed on 4.12.0-0.nightly-2022-09-02-194931

$ oc exec mypod  -- /hello
Hello OpenShift Storage

Comment 17 errata-xmlrpc 2023-01-17 19:46:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399


Note You need to log in before you can comment on or make changes to this bug.