Description of problem:
Created a PVC, then created a Pod referencing the PVC. The container in the Pod ran the command `touch /mnt/SUCCESS && exit 0 || exit 1` and the Pod then completed, which was expected from the storage angle because a file named SUCCESS was created in the backend storage. But I got this warning from the events:

Warning  FailedCreatePodSandBox  1s  kubelet, cnv-executor-qwang-cnv14-node-a9e382-1.example.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_pod-g_qwang2_99a32ac7-b793-11e9-a860-fa163e026b79_1(cd418d6c2d28e9a505b496e443e6fc365d5f0d5fdb019565544203d56bdea947): Multus: Err in tearing down failed plugins: cannot set "openshift-sdn" ifname to "eth0": no netns: failed to Statfs "/proc/87891/ns/net": no such file or directory

My questions are:
1. There is no "openshift-multus" namespace in my cluster, and no pod uses multiple NICs. Is Multus supported in CNV 1.4?
2. There are kube-multus pods running in the "kube-system" namespace. What are these pods for?
3. Why does the Pod need Multus in this case instead of another network interface?
4. Why is a pod sandbox re-created when the container runs `touch /mnt/SUCCESS && exit 0 || exit 1`? If it runs `dd if=/dev/zero of=/tmp/test.data bs=5M count=100 2>&1` instead, I don't see this warning and everything seems fine.

Version-Release number of selected component (if applicable):
CNV v1.4.1

How reproducible:
100%

Steps to Reproduce:
1. Use this manifest to create a PVC

$ cat pvc-gluster.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-gluster
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

2. Verify the PVC and PV are bound

[cloud-user@cnv-executor-qwang-cnv14-master-a9e382-1 ~]$ oc get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
pvc-gluster   Bound    pvc-3cd5ac36-b793-11e9-a860-fa163e026b79   1Gi        RWO            glusterfs-storage   5m

3. Create a Pod referencing the PVC

$ cat pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: pod-g
spec:
  containers:
    - name: pod-test
      image: gcr.io/google_containers/busybox:1.24
      command:
        - "/bin/sh"
      args:
        - "-c"
        - "touch /mnt/SUCCESS && exit 0 || exit 1"
      volumeMounts:
        - name: g-pvc
          mountPath: "/mnt"
  restartPolicy: "OnFailure"
  volumes:
    - name: g-pvc
      persistentVolumeClaim:
        claimName: pvc-gluster

4. Check the Pod status/logs

Actual results:
4.
[cloud-user@cnv-executor-qwang-cnv14-master-a9e382-1 ~]$ oc describe pod pod-g
Name:               pod-g
Namespace:          qwang2
Priority:           0
PriorityClassName:  <none>
Node:               cnv-executor-qwang-cnv14-node-a9e382-1.example.com/172.16.0.29
Start Time:         Mon, 05 Aug 2019 11:13:37 -0400
Labels:             <none>
Annotations:        k8s.v1.cni.cncf.io/networks-status=
                    openshift.io/scc=anyuid
Status:             Succeeded
IP:                 10.130.0.22
Containers:
  pod-test:
    Container ID:  cri-o://ec7fbea43772096815f75b60d900ae4cb5fced0e4b4e82fbf385386c0c0e5b65
    Image:         gcr.io/google_containers/busybox:1.24
    Image ID:      gcr.io/google_containers/busybox@sha256:4bdd623e848417d96127e16037743f0cd8b528c026e9175e22a84f639eca58ff
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      touch /mnt/SUCCESS && exit 0 || exit 1
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 05 Aug 2019 11:13:46 -0400
      Finished:     Mon, 05 Aug 2019 11:13:46 -0400
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt from g-pvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sfzc5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  g-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-gluster
    ReadOnly:   false
  default-token-sfzc5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sfzc5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason                  Age   From                                                          Message
  ----     ------                  ----  ----                                                          -------
  Normal   Scheduled               21s   default-scheduler                                             Successfully assigned qwang2/pod-g to cnv-executor-qwang-cnv14-node-a9e382-1.example.com
  Normal   Pulled                  12s   kubelet, cnv-executor-qwang-cnv14-node-a9e382-1.example.com   Container image "gcr.io/google_containers/busybox:1.24" already present on machine
  Normal   Created                 12s   kubelet, cnv-executor-qwang-cnv14-node-a9e382-1.example.com   Created container
  Normal   Started                 12s   kubelet, cnv-executor-qwang-cnv14-node-a9e382-1.example.com   Started container
  Normal   SandboxChanged          11s   kubelet, cnv-executor-qwang-cnv14-node-a9e382-1.example.com   Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  1s    kubelet, cnv-executor-qwang-cnv14-node-a9e382-1.example.com   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_pod-g_qwang2_99a32ac7-b793-11e9-a860-fa163e026b79_1(cd418d6c2d28e9a505b496e443e6fc365d5f0d5fdb019565544203d56bdea947): Multus: Err in tearing down failed plugins: cannot set "openshift-sdn" ifname to "eth0": no netns: failed to Statfs "/proc/87891/ns/net": no such file or directory

Expected results:
Don't have this warning.

Additional info:
[cloud-user@cnv-executor-qwang-cnv14-master-a9e382-1 ~]$ oc get pods -o wide -n openshift-sdn
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE                                                   NOMINATED NODE
ovs-mvmfh   1/1     Running   0          1h    172.16.0.14   cnv-executor-qwang-cnv14-node-a9e382-2.example.com     <none>
ovs-qcvhd   1/1     Running   0          1h    172.16.0.29   cnv-executor-qwang-cnv14-node-a9e382-1.example.com     <none>
ovs-zwswg   1/1     Running   0          1h    172.16.0.17   cnv-executor-qwang-cnv14-master-a9e382-1.example.com   <none>
sdn-5cs88   1/1     Running   0          1h    172.16.0.14   cnv-executor-qwang-cnv14-node-a9e382-2.example.com     <none>
sdn-846tp   1/1     Running   0          1h    172.16.0.29   cnv-executor-qwang-cnv14-node-a9e382-1.example.com     <none>
sdn-8ndpg   1/1     Running   0          1h    172.16.0.17   cnv-executor-qwang-cnv14-master-a9e382-1.example.com   <none>
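For questions 1-3, a minimal sketch of how to check whether Multus is the plugin that owns the pod network on the node. This assumes the standard CNI configuration location used by kubelet/CRI-O; the grep pattern is only illustrative for finding the kube-multus pods mentioned above:

# List the Multus pods mentioned in question 2 (pattern is illustrative)
$ oc get pods -n kube-system -o wide | grep -i multus

# On the node, kubelet/CRI-O normally reads CNI configuration from /etc/cni/net.d.
# If a Multus conf file sorts first there, Multus wraps the default plugin
# (openshift-sdn here) for every pod sandbox, which would explain why Multus
# appears in the sandbox error even though no pod requests extra NICs.
$ ls -l /etc/cni/net.d/
$ cat /etc/cni/net.d/*.conf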
Would you grant Sebastian credentials to your cluster (over email)? Multus must keep working in CNV 1.4.1.
I think this is a known issue in Multus that was fixed by using a local cache. The problem is that the network namespace is removed before Multus runs the delete function.

I also see that the annotation k8s.v1.cni.cncf.io/networks-status= is empty, and this is a bit weird. Can you try to start a pod that doesn't finish immediately and check the annotation?

Also, the start and finish times are equal; it looks like the pod exited too quickly for the removal:

State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Mon, 05 Aug 2019 11:13:46 -0400
  Finished:     Mon, 05 Aug 2019 11:13:46 -0400
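For reference, a minimal sketch of a longer-running variant of the same pod that stays up long enough to inspect the annotation. The name pod-sleep is illustrative; the rest mirrors the reproducer manifest from the description:

$ cat pod-sleep.yaml
kind: Pod
apiVersion: v1
metadata:
  name: pod-sleep            # illustrative name
spec:
  containers:
    - name: pod-test
      image: gcr.io/google_containers/busybox:1.24
      command:
        - "/bin/sh"
      args:
        - "-c"
        # keep the container running ~45s so the networks-status annotation
        # can be checked while the Pod is still Running
        - "max=10; for i in `seq 2 $max`; do echo $i; sleep 5; done"
      volumeMounts:
        - name: g-pvc
          mountPath: "/mnt"
  restartPolicy: "OnFailure"
  volumes:
    - name: g-pvc
      persistentVolumeClaim:
        claimName: pvc-gluster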
A comparatively long-running command doesn't have this problem. I noticed that while the Pod is Running, it has the network annotation:

k8s.v1.cni.cncf.io/networks-status=[{
    "name": "openshift.1",
    "ips": [
        "10.129.0.25"
    ],
    "default": true,
    "dns": {}
}]

Once the command finishes and the Pod is Completed, it becomes:

k8s.v1.cni.cncf.io/networks-status=

Following is the same Pod `describe`d while Running vs. Completed:

[cloud-user@cnv-executor-qwang-cnv14-master-a9e382-1 ~]$ oc describe pod pod-g-3
Name:               pod-g-3
Namespace:          qwang3
Priority:           0
PriorityClassName:  <none>
Node:               cnv-executor-qwang-cnv14-node-a9e382-2.example.com/172.16.0.14
Start Time:         Tue, 06 Aug 2019 01:13:37 -0400
Labels:             <none>
Annotations:        k8s.v1.cni.cncf.io/networks-status=[{
                        "name": "openshift.1",
                        "ips": [
                            "10.129.0.25"
                        ],
                        "default": true,
                        "dns": {}
                    }]
                    openshift.io/scc=anyuid
Status:             Running
IP:                 10.129.0.25
Containers:
  pod-test:
    Container ID:  cri-o://ec7465913012d050bff4f30279bfefce401f493439f3f99ef46406740ea1ab15
    Image:         gcr.io/google_containers/busybox:1.24
    Image ID:      gcr.io/google_containers/busybox@sha256:4bdd623e848417d96127e16037743f0cd8b528c026e9175e22a84f639eca58ff
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      max=10; for i in `seq 2 $max`; do echo $i; sleep 5; done
    State:          Running
      Started:      Tue, 06 Aug 2019 01:13:44 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt from g-pvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4qf2v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  g-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-gluster
    ReadOnly:   false
  default-token-4qf2v:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4qf2v
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type    Reason     Age   From                                                          Message
  ----    ------     ----  ----                                                          -------
  Normal  Scheduled  10s   default-scheduler                                             Successfully assigned qwang3/pod-g-3 to cnv-executor-qwang-cnv14-node-a9e382-2.example.com
  Normal  Pulled     1s    kubelet, cnv-executor-qwang-cnv14-node-a9e382-2.example.com   Container image "gcr.io/google_containers/busybox:1.24" already present on machine
  Normal  Created    1s    kubelet, cnv-executor-qwang-cnv14-node-a9e382-2.example.com   Created container
  Normal  Started    1s    kubelet, cnv-executor-qwang-cnv14-node-a9e382-2.example.com   Started container

[cloud-user@cnv-executor-qwang-cnv14-master-a9e382-1 ~]$ oc describe pod pod-g-3
Name:               pod-g-3
Namespace:          qwang3
Priority:           0
PriorityClassName:  <none>
Node:               cnv-executor-qwang-cnv14-node-a9e382-2.example.com/172.16.0.14
Start Time:         Tue, 06 Aug 2019 01:13:37 -0400
Labels:             <none>
Annotations:        k8s.v1.cni.cncf.io/networks-status=
                    openshift.io/scc=anyuid
Status:             Succeeded
IP:                 10.129.0.25
Containers:
  pod-test:
    Container ID:  cri-o://ec7465913012d050bff4f30279bfefce401f493439f3f99ef46406740ea1ab15
    Image:         gcr.io/google_containers/busybox:1.24
    Image ID:      gcr.io/google_containers/busybox@sha256:4bdd623e848417d96127e16037743f0cd8b528c026e9175e22a84f639eca58ff
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      max=10; for i in `seq 2 $max`; do echo $i; sleep 5; done
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 06 Aug 2019 01:13:44 -0400
      Finished:     Tue, 06 Aug 2019 01:14:29 -0400
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt from g-pvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4qf2v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  g-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-gluster
    ReadOnly:   false
  default-token-4qf2v:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4qf2v
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type    Reason     Age   From                                                          Message
  ----    ------     ----  ----                                                          -------
  Normal  Scheduled  1m    default-scheduler                                             Successfully assigned qwang3/pod-g-3 to cnv-executor-qwang-cnv14-node-a9e382-2.example.com
  Normal  Pulled     58s   kubelet, cnv-executor-qwang-cnv14-node-a9e382-2.example.com   Container image "gcr.io/google_containers/busybox:1.24" already present on machine
  Normal  Created    58s   kubelet, cnv-executor-qwang-cnv14-node-a9e382-2.example.com   Created container
  Normal  Started    58s   kubelet, cnv-executor-qwang-cnv14-node-a9e382-2.example.com   Started container
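One way to read just that annotation before and after completion is a jsonpath query; this is a sketch using the pod and namespace names from the output above (dots in the annotation key must be escaped):

# Prints the current value of the networks-status annotation (empty once the Pod completes)
$ oc get pod pod-g-3 -n qwang3 \
    -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/networks-status}{"\n"}'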
Thanks, Qixuan and Sebastian. Based on your report, I understand that this is a known multus/openshift bug that is solved in OCP 4, so I'm closing this bug.